WP7. Lifelong improvement of multi-modal human-robot interaction
The main goal of this work package is to implement agents that enable lifelong social interaction, through natural interaction channels, between the social robot and the people living in the smart home.
Creating robots that continuously learn and adapt their behavior to the user's personality, preferences, and profile is a challenging target. In the assistive robotics context, behavior adaptation must address both short-term changes, which allow the robot to adapt to the particular context of each HRI episode, and long-term changes, which allow the robot to establish long-term (lifelong) relationships with certain people. These special conditions of lifelong HRI require new features and data sources to be incorporated into our current developments; this design and integration work is addressed in the following tasks:
T7.1 Auditory perception for multiple sources localization and audio enhancement
In real environments, the abilities of the robot's auditory system are of special interest when interacting with users. An undetermined number of sound sources are usually present simultaneously in a room: for example, a person speaking while a radio is on and the noise of an electrical appliance can be heard as well. In such a scene, the robot should be able to enhance the audio coming from one of these sources, which will guarantee good performance in interaction tasks such as the dialogue process, voice-based emotion recognition, or people tracking. Knowledge about the other sources is also useful, as it can contribute to building the robot's inner model of its surroundings (WP3). These sources could even be labeled in order to categorize the acoustic scene. Special effort will therefore be devoted to developing methods for the unsupervised detection of sound sources and for audio separation, areas in which the UJA team has recent experience. More specifically, the following questions will be addressed: i) the evaluation of the number and geometrical arrangement of the microphones to be used; ii) the development of a novel method for the unsupervised detection of an unknown number of sound sources in real time, possibly incorporating an expert system to adjust for the acoustic properties of the room; iii) the labeling of the detected sound sources and the enhancement of the audio source of interest; and iv) how the features of the active sound sources are analyzed so that they can be incorporated into the inner model.
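To make point (ii) concrete, the following is a minimal sketch of one classical way to estimate how many sources are active: clustering per-frame GCC-PHAT time-delay estimates from a microphone pair. It is an illustration under simplifying assumptions (two microphones, single-band processing), not the method to be developed in this task; all function and parameter names are hypothetical.

```python
import numpy as np

def gcc_phat_lag(frame_a, frame_b, max_lag):
    """Return the lag (in samples) of the GCC-PHAT peak between two frames."""
    n = 2 * len(frame_a)
    cross = np.fft.rfft(frame_a, n) * np.conj(np.fft.rfft(frame_b, n))
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

def estimate_num_sources(sig_a, sig_b, frame_len=1024, max_lag=20):
    """Histogram frame-wise TDOAs; count runs of well-populated bins as sources."""
    lags = []
    for start in range(0, len(sig_a) - frame_len, frame_len):
        fa = sig_a[start:start + frame_len]
        fb = sig_b[start:start + frame_len]
        if np.sqrt(np.mean(fa ** 2)) < 1e-3:  # skip near-silent frames
            continue
        lags.append(gcc_phat_lag(fa, fb, max_lag))
    hist, _ = np.histogram(lags, bins=np.arange(-max_lag, max_lag + 2))
    active = (hist > 0.1 * max(len(lags), 1)).astype(int)
    # Adjacent active bins belong to the same source, so count rising edges.
    return int(np.sum(np.diff(np.concatenate(([0], active))) == 1))
```

A real-time version would additionally track these TDOA clusters over time, which is where the labeling and enhancement steps (iii) would attach.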
T7.2 Selective attentional mechanism based on audio-visual information fusion
Social robots should direct and share attention with the people interacting with them. Several existing approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. In this task we will employ a Bayesian approach to improve the attentional mechanism, taking audio and visual cues related to the spatial localization of the user into account in a holistic way, based on the integration of saliency maps.
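The following sketch illustrates the core of such a Bayesian fusion: audio and visual saliency maps over azimuth are treated as independent likelihoods and combined with a recursive prior. The grid resolution and all names are assumptions for illustration, not the task's final design.

```python
import numpy as np

AZIMUTHS = np.linspace(-90, 90, 181)          # 1-degree attention grid

def fuse_saliency(prior, audio_sal, visual_sal):
    """Posterior over azimuth, assuming the two cues are conditionally independent."""
    audio_lik = audio_sal / audio_sal.sum()
    visual_lik = visual_sal / visual_sal.sum()
    posterior = prior * audio_lik * visual_lik
    return posterior / posterior.sum()

def attend(posterior):
    """Direct attention to the maximum a posteriori direction."""
    return AZIMUTHS[np.argmax(posterior)]

# Usage: start from a flat prior and reuse each posterior as the next prior,
# so attention integrates evidence over time instead of jumping frame to frame.
prior = np.full_like(AZIMUTHS, 1.0 / len(AZIMUTHS))
```

In practice a small diffusion step would be applied to the prior between frames so the belief never collapses onto a single cell and attention can still shift to new events.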
T7.3 Person perception and lifelong internalizing
Within this project, following a game-theoretic approach, the robot will be endowed with the ability to exhibit pro-active behavior, exchanging roles with the perceived person during the HRI process. This socially aware robot will also be able to understand implicit communication in order to appropriately create a context for the interaction. The person-perception algorithms are already deployed on the Consortium's robots. These functionalities will be evaluated and strengthened following research lines already initiated by our research groups in the THERAPIST and ADAPTA projects.
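As an illustration only, one simple way to frame the role exchange as a game is to let the robot choose between a leader (pro-active) and a follower role by maximizing expected payoff under its belief about the user's engagement. The two-role simplification and the payoff values are hypothetical.

```python
ROLES = ("leader", "follower")

# payoff[robot_role][user_state]: expected interaction quality (illustrative values)
PAYOFF = {
    "leader":   {"engaged": 0.4, "passive": 0.9},  # take initiative if the user is passive
    "follower": {"engaged": 0.8, "passive": 0.2},  # yield the lead if the user drives
}

def choose_role(p_engaged: float) -> str:
    """Pick the role with the highest expected payoff under the engagement belief."""
    def expected(role):
        return (PAYOFF[role]["engaged"] * p_engaged
                + PAYOFF[role]["passive"] * (1 - p_engaged))
    return max(ROLES, key=expected)
```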
T7.4 Para-verbal cues for emotion and health perception
In HRI, the use of audio cues such as pitch, energy, or tempo is one of the most robust approaches to detecting emotions such as sadness, happiness, fear, or anger. Fusing this information with visual cues will lead to better knowledge of the emotional state of the user, which in turn will allow more natural and richer HRI. Furthermore, the emotional state of the user can be correlated with physiological signals (e.g. pulse), which will be monitored with the wearable smart bracelet acquired within the LICOG subproject (all data provided by this sensor will be available within the person model at the DSR (WP3)). It is reasonable to expect that the fusion of emotional and physiological data will lead to a robust perception of the person's welfare.
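A minimal sketch of this feature-level fusion follows: pitch (via autocorrelation), energy, and a crude speech-rate proxy are extracted from one audio frame and concatenated with the bracelet's pulse reading into a single vector for an emotion classifier. The feature set and all names are illustrative assumptions.

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=75, fmax=400):
    """Crude pitch estimate: lag of the autocorrelation peak within the voice range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

def paraverbal_features(frame, fs, pulse_bpm):
    """One fused feature vector: [pitch, energy, rate proxy, pulse]."""
    energy = float(np.mean(frame ** 2))
    zero_crossings = int(np.sum(np.abs(np.diff(np.sign(frame)))) // 2)
    rate_proxy = zero_crossings / (len(frame) / fs)   # rough tempo/voicing proxy
    return np.array([pitch_autocorr(frame, fs), energy, rate_proxy, pulse_bpm])
```

Vectors of this kind would then feed any standard classifier trained on labeled emotional speech, with visual cues appended in the same manner.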
T7.5 Speech corpus acquisition and multi-modal conversational modelling and learning
The core of this task is the construction of a dialogue system able to handle speech-based human-robot interaction. This dialogue system will conduct the dialogue and manage interaction turns. Apart from the dialogue system, speech recognition and synthesis modules will also be developed, extending conversational agents already employed in the Consortium. In a first stage of this task, a domain-specific speech corpus will be acquired in order to train the acoustic and language models that feed the speech recognition system. Finally, the use of a multi-modal communication channel combining the work conducted in this task with the work developed in task T7.3 will be explored.
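To illustrate the turn-management role of the dialogue system, here is a minimal finite-state sketch that alternates robot and user turns and hands text to the recognition and synthesis modules. The states, prompts, and interfaces are hypothetical placeholders, not the Consortium's existing agents.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueManager:
    state: str = "greeting"
    history: list = field(default_factory=list)

    # state -> (robot prompt, next state); a real system would be data-driven
    POLICY = {
        "greeting":  ("Hello! How are you feeling today?", "wellbeing"),
        "wellbeing": ("Would you like me to remind you of today's plan?", "closing"),
        "closing":   ("Alright, I will be around if you need me.", "done"),
    }

    def robot_turn(self) -> str:
        prompt, self.state = self.POLICY[self.state]
        self.history.append(("robot", prompt))
        return prompt          # handed to the speech synthesis module

    def user_turn(self, utterance: str) -> None:
        self.history.append(("user", utterance))  # text from the speech recognizer
```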
T7.6 Learning, adaptation, and customizable interaction
Adapting the robot's behavior to the human user in a collaborative human-robot context is an open problem, with several research questions still under discussion: (i) what a-priori information needs to be learned about the person; (ii) how to learn optimal models; and (iii) how to obtain a context- and user-customized interaction protocol. In this task we will address these issues and propose a methodology for developing and evaluating supervised-learning approaches to robot behavior adaptation, so that the robot can exhibit a lifelong personality adapted to the user profile.
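As a minimal sketch of the supervised-learning idea, a model can be trained to map (user profile, context) features to the interaction style that worked best in past labeled sessions; retraining as new sessions accumulate gives the lifelong loop. The features, labels, and the choice of a decision tree are illustrative assumptions.

```python
from sklearn.tree import DecisionTreeClassifier

# Features per past session: [age_group, hearing_level, time_of_day, task_urgency]
X = [
    [0, 2, 0, 1],
    [1, 0, 1, 0],
    [2, 1, 1, 1],
    [2, 2, 0, 0],
]
# Labels: interaction style rated best in that session
y = ["loud_slow", "normal", "loud_slow", "quiet_patient"]

model = DecisionTreeClassifier(max_depth=3).fit(X, y)

def adapted_style(profile_and_context):
    """Predict the interaction style for a new session from profile and context."""
    return model.predict([profile_and_context])[0]
```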
T7.7 Robot Intentions
This task deals with decision-making, natural spontaneity, and the expression of robot intentions. The robot will be endowed with the abilities to: (i) take the context and user profile (dependencies, preferences, disability level) into account while performing its tasks; (ii) change its interaction style depending on the context and scenario; and (iii) take the initiative, then establish and conduct an interactive session with a human (T3.1).
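The following sketch illustrates how abilities (i)-(iii) might combine in a single decision step: the robot decides whether to take the initiative from the context and the user profile, then selects an interaction style. Thresholds, field names, and the style labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    prefers_proactive: bool
    disability_level: int      # 0 (none) .. 3 (high)

def take_initiative(profile: UserProfile, user_idle_minutes: float) -> bool:
    """Start an interactive session (T3.1) if the user welcomes it or may need help."""
    return profile.prefers_proactive or (profile.disability_level >= 2
                                         and user_idle_minutes > 30)

def interaction_style(profile: UserProfile, scenario: str) -> str:
    """Adapt the style to the scenario first, then to the user's preference."""
    if scenario == "emergency":
        return "direct"                     # terse, instruction-oriented
    return "companionship" if profile.prefers_proactive else "reserved"
```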