Axis 3: Perception & interaction

A major objective of artificial intelligence is to enhance the ability of humans to interact with their environment. This involves solving several problems, including perceiving, analysing and learning the informational structure of this environment, and acting on it adequately and efficiently. Importantly, the human environment also includes other humans, which raises specific questions about the automatic analysis of human behaviour and the design of efficient systems for enhanced interaction between humans. The Grenoble teams have long-standing expertise in human-machine and human-human interaction, with an increasing use of machine learning techniques, while maintaining strong and long-established links between computer science and cognitive psychology. This leads to three programmes that separately address the visual analysis of the external world, interaction with humans and objects in the sensory-motor framework associated with robotics, and communication with humans through speech and language.

3.1. Robotics

The possibility of acting on the environment raises specific problems and new challenges for learning and processing techniques, classically addressed in the sensory-motor framework provided by robotics. This programme will study how interacting with the external world makes it possible to actively modify and optimize the information acquisition process, and will explore how machine learning techniques can be combined with control theory to design robotic systems that are both efficient and safe. Human robotics is also a central component of this programme, with the aim of developing companion robots and collaborative intelligent systems that provide powerful extensions of, and support to, human intelligence.

3.2. Natural language and speech processing

Speech and language provide the most natural support for interaction between humans. The Grenoble teams have strong expertise in coupling data-driven technologies for speech and language processing with physical and cognitive models of speech production and language acquisition. This programme will build on this expertise to develop more versatile speech and language technologies that can learn from fewer examples, adapt to various kinds of perturbations, and extend to new languages, dialects or social contexts. The expected outputs have applications in various domains, including human-machine communication, education and clinical use.

3.3. Computer vision

Analysing visual scenes involves complex machine learning problems related to feature extraction and to the representation of objects and trajectories. Current challenges in the field concern the analysis of complex structures in space and time, related to such problems as 3D image processing and representation, the analysis of complex human actions, and the contextual recognition of sequences of actions or of complex relationships between objects. This programme aims to combine learning algorithms with innovative developments in the physical and biological representation of objects and their dynamics, with a specific focus on the analysis of human activities.

  • Data Driven 3D Vision - Edmond Boyer
  • Towards self-supervised visual learning - Cordelia Schmid