INTERSPEECH 2015 Special Session on
Active Perception in Human and Machine Speech Communication
This special session is organized in the framework of INTERSPEECH 2015 in Dresden.
Communication between humans is a continuous, dynamic process driven by the goals of the communication partners, in which sender and receiver interact and thereby shape the exchange. In such situations, both the perception and the speech of the interlocutors may be adjusted to the context. In the recent past, scientific results have been reported on the context adaptation of talking behavior (e.g. in the project "Listening Talker" and the special session "Intelligibility-enhancing speech modifications" at INTERSPEECH 2013).
In this special session, we want to look at the reciprocal process: how listeners adapt in order to better understand the talker, i.e. how they actively perceive the message. A main goal of this session is to explore how behavioral research can enrich the development of active machine listening.
Current research on active machine perception focuses mainly on visual perception, i.e. the capability of physically embodied systems to reconfigure their sensors so that they actively select those parts of the scene they need to perceive in order to achieve their goals within a task-oriented activity.
In contrast, active auditory perception has received relatively little research attention. This is surprising for three reasons: (1) auditory perception on robots, especially automatic speech recognition, remains a bottleneck of current robotic systems, owing to difficult noise conditions that are particularly severe in human-robot interaction (HRI), not least because of the robot's own noise (ego-noise); (2) human listening appears to rely substantially on feedback, judging from the strong efferent innervation of the inner ear, which is currently the focus of much research attention; and (3) speech processing is part of a capability that is unique to humans, namely intention-driven, goal-oriented communication, which adds a completely new dimension to active perception: the possibility to jointly create, together with the interaction partner, an optimal situation for mutual understanding within the current environment.
With respect to the first issue, the physical capabilities of robots provide an ideal basis for actively optimizing the robot's acoustic perception. This entails, first of all and similarly to active visual perception, assuming a spatial position and orientation that yield optimal acoustic signals. In addition, it entails the capability to (temporarily) suppress the robot's own noise, such as that of the cooling system, the motors, or the speech synthesis, in acoustically difficult interaction situations.
However, going beyond such ego-actions, an interactive robot system can also socially influence the sound source, i.e. the way the interaction partner produces the signal. On the one hand, the robot can explicitly ask the interaction partner to change specific parameters of his or her speech, e.g. to speak more loudly or more slowly, or to avoid specific words that are out of vocabulary. On the other hand, the system can make use of more subtle strategies based on the human capability of entrainment, i.e. the tendency to adapt one's own behavior to the behavior of the interaction partner. For example, it has been shown that interlocutors tend to speak more slowly or more loudly when their communication partner does so. Such entrainment has also been shown to occur towards computers and robots and is modulated by a range of variables. Entrainment takes place at all linguistic processing levels, from the acoustic-phonetic to the pragmatic. First systems have already been developed that exploit this human bias and deliberately provide the interaction partner with speech output exhibiting characteristics that are desirable from the robot's perspective in the current situation.
This Special Session aims to bring together researchers from different disciplines in order to (1) assess current research efforts in this direction from the perspective of active audition and (2) derive new research directions from this perspective. The session is planned as an oral session with an optional panel discussion at the end.
Topics of interest include:
- research on active speech perception in humans
- research on adaptive interaction behavior in humans (e.g. entrainment as an implicit method to guide the interaction partner’s communicative behavior)
- research on adaptation to the environment (e.g. adaptation of volume to noise, adaptation to situation)
- systems changing their orientation, position in the room or appearance to actively shape their perception
- systems that actively control their attention during communication
- systems that adapt their interaction with the communication partner to obtain the information they need
- systems that encourage their interaction partner to adapt his or her interaction behavior so that the system's needs are better fulfilled (e.g. "please speak more slowly")
- research on entrainment in communication, with a focus on eliciting the desired information
Important dates:
Submission: March 20, 2015
Acceptance notification: June 1, 2015
Camera-ready paper: June 10, 2015
Conference dates: September 6-10, 2015
Papers submitted to this Special Session must follow the same schedule and submission procedure as regular INTERSPEECH papers (INTERSPEECH paper submission). When submitting your paper, please check the box for the Special Session on "Active Perception in Human and Machine Speech Communication" in the INTERSPEECH submission system.
The papers will undergo the same review process, by anonymous and independent reviewers, as all other INTERSPEECH submissions.
Organizers:
Martin Heckmann, Honda Research Institute Europe GmbH
e-mail: martin.heckmann at honda-ri.de
Britta Wrede, University of Bielefeld
e-mail: bwrede at techfak.uni-bielefeld.de
Dorothea Kolossa, Ruhr-Universität Bochum
e-mail: dorothea.kolossa at rub.de
Alexander Raake, Technische Universität Berlin
e-mail: alexander.raake at telekom.de