
A Reinforcement learning model develops causal inference and cue integration abilities

Thomas Weisswange, Constantin Rothkopf, Tobias Rodemann, Jochen Triesch, "A Reinforcement learning model develops causal inference and cue integration abilities", Proceedings of the 2009 Bernstein Conference on Computational Neuroscience, Frankfurt, 2009.

Abstract

In recent years it has been suggested that the performance of human subjects in a large variety of perceptual tasks can be modelled using Bayesian inference (e.g. [1]). The success of these methods stems from their capacity to explicitly represent the uncertainties involved. Recently, such methods have been extended to the task of model selection, in which the observer not only has to integrate different cues into a single estimate but first needs to select which causal model best describes the stimuli [2]. As an example, consider the task of orienting towards a putative object. The stimuli consist of an auditory and a visual cue. Depending on the spatial distance between the position measurements provided by the two modalities, it is more probable that the signals originated either from a single common source or from two different sources. An open problem in this area is how the brain acquires the required models and how it learns to perform the proper kind of inference. Since infants and young children have been shown not to integrate cues initially [3,4], it seems likely that extended learning processes play an important role in our developing ability to integrate cues and select appropriate models. In the present study we investigate whether the framework of reinforcement learning (RL) can be used to study these questions. A one-dimensional version of an orienting task is considered, in which an auditory and a visual cue are placed at either the same or different positions. Each cue is corrupted by Gaussian noise, with the variance of the auditory noise being larger than that of the visual noise, reflecting the different uncertainties of the two sensory modalities. A positive reward is given if the agent orients to the true position of the object. If the orienting movement misses the object, we assume that an additional corrective movement has to be carried out. The cost for each additional movement is proportional to the distance between the current position and the true position of the target. Action selection is probabilistic, following the softmax rule. Learning takes place using the SARSA algorithm [5]. The simulations show that the reinforcement learning agent is indeed capable of learning to integrate the cues, taking their relative reliabilities into account, when this interpretation leads to better detection of the target. Furthermore, the agent learns that if the position estimates provided by the two modalities are too far apart, it is better not to integrate the two signals but to select an action based only on the more reliable cue. This behaviour therefore implicitly corresponds to the selection of different causal models. Our results suggest that generic reinforcement learning processes may contribute to the development of the ability to integrate different sensory cues and select causal models.
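The sketch below is a minimal illustration of the kind of setup the abstract describes, not the authors' implementation: a discretized one-dimensional orienting task in which a tabular SARSA agent with softmax action selection receives noisy visual and auditory position cues and is rewarded for orienting to the target. All parameter values (grid size, noise levels, rewards, learning rate, temperature, the probability of a common cause, and the convention that the rewarded target is the visually cued object when the two cues have different sources) are assumptions made here for illustration.

```python
"""Illustrative sketch of a 1-D audio-visual orienting task learned with SARSA.
Not the authors' code; all parameters and conventions below are assumptions."""
import numpy as np

rng = np.random.default_rng(0)

N = 21            # number of discrete positions on the 1-D grid (assumed)
SIGMA_V = 1.0     # std. dev. of visual noise (assumed; smaller than auditory)
SIGMA_A = 3.0     # std. dev. of auditory noise (assumed)
P_COMMON = 0.5    # probability that both cues come from the same object (assumed)
R_HIT = 10.0      # reward for orienting to the true object position (assumed)
C_MOVE = 1.0      # cost per unit distance of the corrective movement (assumed)
ALPHA, GAMMA, TAU = 0.1, 0.0, 1.0   # step size, discount, softmax temperature

# Tabular Q-values: state = (discretized visual cue, discretized auditory cue),
# action = orienting position on the grid.
Q = np.zeros((N, N, N))

def observe():
    """Sample a trial: true target position plus noisy visual/auditory cues."""
    target = rng.integers(N)
    if rng.random() < P_COMMON:
        aud_source = target              # common cause: same position
    else:
        aud_source = rng.integers(N)     # independent auditory source (assumed)
    x_v = int(np.clip(round(target + rng.normal(0, SIGMA_V)), 0, N - 1))
    x_a = int(np.clip(round(aud_source + rng.normal(0, SIGMA_A)), 0, N - 1))
    return target, x_v, x_a

def softmax_action(q_row):
    """Probabilistic action selection with the softmax rule."""
    p = np.exp((q_row - q_row.max()) / TAU)
    p /= p.sum()
    return rng.choice(len(q_row), p=p)

def reward(action, target):
    """Positive reward for hitting the target, otherwise a distance-proportional cost."""
    return R_HIT if action == target else -C_MOVE * abs(action - target)

# With one orienting decision per trial and GAMMA = 0, the SARSA update
# reduces to a one-step TD update towards the immediate reward.
for trial in range(200_000):
    target, x_v, x_a = observe()
    a = softmax_action(Q[x_v, x_a])
    r = reward(a, target)
    Q[x_v, x_a, a] += ALPHA * (r - Q[x_v, x_a, a])

# After learning, the greedy choice for nearby cues should lie between the two
# cue positions (weighted towards vision), while widely separated cues should
# be resolved in favour of the more reliable visual cue.
print("cues (10, 12) ->", Q[10, 12].argmax())
print("cues (10, 20) ->", Q[10, 20].argmax())
```

Under these assumptions, the learned policy exhibits the two behaviours reported in the abstract: reliability-weighted integration when the cue positions are close, and reliance on the more reliable cue alone when they are far apart, i.e. an implicit choice between the one-source and two-source causal models.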



