
LaMI: Large Language Models for Multi-Modal Human-Robot Interaction

Chao Wang, Stephan Hasler, Daniel Tanneberg, Felix Ocker, Frank Joublin, Antonello Ceravola, Jörg Deigmöller, Michael Gienger, "LaMI: Large Language Models for Multi-Modal Human-Robot Interaction", CHI 2024, 2024.

Abstract

In current approaches to designing human-robot interaction, engineers specialized in robotics establish rules based on the context of an application scenario and the multimodal input from a user in order to define how the robot should react in a specific situation and to generate an output accordingly. This is a challenging task, as manually setting up the robot's interactive behavior for each situation is complex and requires considerable effort by a specialized engineer. Large language models (LLMs), on the other hand, are capable of social interaction with users. In this study, we propose a framework that translates a human's multimodal input and output into text, so that social interaction can be learned by observing human-human interaction, and that later uses this prior knowledge to drive the robot's multimodal output in human-robot interaction.
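To make the idea of textualizing multimodal signals concrete, the following is a minimal illustrative sketch in Python, not the paper's actual implementation: the `Observation` schema, the JSON action format, and the `query_llm` placeholder are assumptions introduced only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Observation:
    """One time step of perceived multimodal user input (hypothetical schema)."""
    speech: str        # transcribed utterance
    gaze_target: str   # e.g. "robot", "cup on table"
    gesture: str       # e.g. "pointing at cup", "none"


def observation_to_text(obs: Observation) -> str:
    """Translate multimodal perception into a plain-text event description."""
    return (f'The person says: "{obs.speech}". '
            f"They are looking at {obs.gaze_target} "
            f"and their gesture is: {obs.gesture}.")


def build_prompt(history: list[str], obs: Observation) -> str:
    """Assemble the interaction history and the new observation into an LLM prompt."""
    instruction = (
        "You control a social robot. Given the interaction so far, reply with a JSON "
        'object of the form {"speech": ..., "gesture": ..., "gaze_target": ...}.'
    )
    return "\n".join([instruction, *history, observation_to_text(obs)])


def parse_robot_action(llm_reply: str) -> dict:
    """Map the LLM's textual reply back onto multimodal robot output channels."""
    action = json.loads(llm_reply)
    return {
        "speech": action.get("speech", ""),
        "gesture": action.get("gesture", "none"),
        "gaze_target": action.get("gaze_target", "person"),
    }


def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM; returns a canned reply for the example."""
    return ('{"speech": "Sure, here is the cup.", '
            '"gesture": "hand over cup", "gaze_target": "person"}')


if __name__ == "__main__":
    obs = Observation(speech="Could you pass me the cup?",
                      gaze_target="cup on table",
                      gesture="pointing at cup")
    print(parse_robot_action(query_llm(build_prompt([], obs))))
```

The key design point this sketch illustrates is that both directions of the interaction pass through text: perception is verbalized before it reaches the model, and the model's textual reply is parsed back into separate speech, gesture, and gaze channels for the robot.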


