Diversity and Adaptation of Cooperative Agents

Rodrigo Canaan, "Diversity and Adaptation of Cooperative Agents", New York University, 2021.

Abstract

This thesis explores techniques and metrics for creating game-playing Artificial Intelligence (AI) systems that can model and adapt to diverse player behavior within a short time frame compatible with human play. We are interested in cooperative games and their relationship to artificial co-creativity, and provide a survey of metrics commonly used to evaluate co-creativity systems. We then focus on Hanabi, a cooperative card game by Antoine Bauza which has won multiple industry awards. The game serves as an ideal testbed for our research because the unique nature of its hidden information and communication channel encourages thinking in terms of a theory of mind. We show, through experiments with a Reinforcement Learning agent based on the Rainbow variant of the DQN algorithm, that existing near-state-of-the-art Hanabi agents perform poorly when paired with partners outside of their training sets. We develop an evolutionary framework to evolve agents represented by a series of rules; at the time of publication, these agents achieved higher self-play and mixed-play scores than published hand-crafted rule-based agents for the game. We then extend this evolutionary framework by applying techniques from the field of Quality Diversity algorithms, which optimise for fitness (game scores) while simultaneously “illuminating” a specified behavior space with a diverse population of solutions. While the resulting agents display behavioral diversity among themselves, they are not yet imbued with the ability to adapt to unknown partners. Our final step is to build a “Bayesian Meta-Agent” that takes this behaviorally diverse population as initial hypotheses of partner behavior, updates its beliefs about ad-hoc partners through Bayesian inference, and selects an appropriate response policy based on its beliefs. The meta-agent is able to improve its performance over the course of 10 games when compared to a generalist baseline that does not update its initially uniform belief. This makes the agent a good first step towards systems that can model and learn from human behavior within a time frame compatible with human interaction.
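
The belief update and response selection described for the meta-agent can be illustrated with a minimal sketch. This is not the thesis's implementation: the three partner hypotheses, their action probabilities, the payoff table and the function names `update_belief` and `select_response` are all hypothetical, chosen only to show Bayes' rule applied over a fixed population of candidate partner models.

```python
from typing import Dict, List


def update_belief(belief: List[float], likelihoods: List[float]) -> List[float]:
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalised = [p * l for p, l in zip(belief, likelihoods)]
    total = sum(unnormalised)
    if total == 0.0:
        # The observation is impossible under every hypothesis: keep the prior.
        return list(belief)
    return [u / total for u in unnormalised]


def select_response(belief: List[float], payoff: List[List[float]]) -> int:
    """Return the index of the response with the highest expected score.

    payoff[r][h] is the (assumed) score of response r paired with hypothesis h.
    """
    expected = [sum(b * s for b, s in zip(belief, row)) for row in payoff]
    return max(range(len(payoff)), key=lambda r: expected[r])


# Hypothetical partner models: probability of each action type per hypothesis.
action_probs: List[Dict[str, float]] = [
    {"hint": 0.7, "play": 0.2, "discard": 0.1},
    {"hint": 0.2, "play": 0.7, "discard": 0.1},
    {"hint": 0.1, "play": 0.2, "discard": 0.7},
]

# Hypothetical scores: rows are response policies, columns are hypotheses.
# Row 2 plays the role of a generalist that does reasonably with everyone.
payoff = [
    [18.0, 10.0, 8.0],
    [9.0, 17.0, 9.0],
    [14.0, 14.0, 13.0],
]

uniform = [1.0 / 3.0] * 3
belief = list(uniform)

# Actions observed from an unknown partner (illustrative data only).
for action in ["play", "play", "hint", "play"]:
    belief = update_belief(belief, [h[action] for h in action_probs])

generalist_choice = select_response(uniform, payoff)
adapted_choice = select_response(belief, payoff)
```

With the uniform belief the generalist row has the highest expected score, while after a few observations the belief concentrates on the second hypothesis and the specialised response for it is selected instead. A real agent would condition the likelihoods on the full game state rather than on action type alone.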


