Visual Detection of Opportunities: Determining Sequences of Contact States for Object Manipulation


The goal of traditional computer vision is, given an image or a video, to describe the pictured three-dimensional world in terms of objects and scene structure, together with their physical properties and spatial relationships. This goal is motivated by the need for visual representations that enable successful interactions with our environment and with other agents. While this is one important side of the story, research in cognitive science has found evidence that, besides such visual scene representations, making contact plays a central role in understanding human-object interactions [1, 2]. In this project, we are interested in learning robot-object interactions and in how they emerge from a sequence of contact states with the environment.

The project aims to answer the following research questions. How can we learn visual representations, informed by an abstract understanding of behavioral strategies, for predicting hand-object contact configurations? To what degree does the agent's environment determine the development of behavioral strategies that lead to such hand-object configurations? In this context, preliminary work shows how high-level activity understanding supports learning future motion trajectories more effectively [3].

The hypothesis motivating this research is that visual representations designed with information arising through interaction will (I) significantly reduce the amount of training data typically required and (II) generalize better. This is expected because such representations are object-agnostic, unlike approaches that rely on a semantic understanding of objects.


[1] Zago, M., McIntyre, J., Senot, P., & Lacquaniti, F. (2009). Visuo-motor coordination and internal models for object interception. Experimental Brain Research, 192, 571-604.
[2] Tresilian, J. R. (1995). Perceptual and cognitive processes in time-to-contact estimation: Analysis of prediction-motion and relative judgment tasks. Perception & Psychophysics, 57(2), 231-245.
[3] Halawa, M., Hellwich, O., & Bideau, P. (2022, October). Action-based contrastive learning for trajectory prediction. In European Conference on Computer Vision (pp. 143-159). Cham: Springer Nature Switzerland.


We currently have two open PhD positions for this research chair.



Published on January 11, 2024
Updated on January 11, 2024