Prediction of Manipulation Actions

Abstract

By looking at a person’s hands, one can often tell what the person is going to do next, how his/her hands are moving and where they will be, because an actor’s intentions shape his/her movement kinematics during action execution. Similarly, active systems with real-time constraints must not simply rely on passive video-segment classification, but they have to continuously update their estimates and predict future actions. In this paper, we study the prediction of dexterous actions. We recorded videos of subjects performing different manipulation actions on the same object, such as “squeezing”, “flipping”, “washing”, “wiping” and “scratching” with a sponge. In psychophysical experiments, we evaluated human observers’ skills in predicting actions from video sequences of different length, depicting the hand movement in the preparation and execution of actions before and after contact with the object. We then developed a recurrent neural network based method for action prediction using as input image patches around the hand. We also used the same formalism to predict the forces on the finger tips using for training synchronized video and force data streams. Evaluations on two new datasets show that our system closely matches human performance in the recognition task, and demonstrate the ability of our algorithms to predict in real time what and how a dexterous action is performed.

Publication
International Journal of Computer Vision

Related