Fabio Muratore, Theo Gruner, Florian Wiese, Boris Belousov, Michael Gienger, Jan Peters,
"Neural Posterior Domain Randomization",
Conference on Robot Learning (CoRL), 2021.
Combining domain randomization and reinforcement learning is a widely
used approach to train control policies that can bridge the gap between
simulation and reality. However, existing methods make avoidable
assumptions. Typically, one type of probability distribution (e.g.,
normal or uniform) is chosen beforehand for every domain parameter.
Another common assumption is the differentiability of the simulator.
These design decisions simplify modeling and implementation, but also
limit applicability as well as the expressiveness of the belief over
domain parameters. Building on novel neural likelihood-free inference
methods, we introduce Neural Posterior Domain Randomization (NPDR), an
algorithm that alternates between learning a policy from a randomized simulator
and adapting the posterior distribution over the simulator’s
parameters in a Bayesian way. Our approach only requires a parameterized
generative model, coarse prior ranges, a policy (optionally with an
optimization routine), and a small set of real-world observations to
compute features from. Most importantly, the generative model,
e.g. a physics simulator, does not have to be differentiable, and the
domain parameter distribution is not restricted to a specific type. We
show that the presented method efficiently performs
distributional system identification on three robotic systems.
Moreover, we demonstrate that NPDR can learn transferable policies
with fewer than X real-world rollouts.
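As a rough illustration of the alternation the abstract describes, the loop below sketches one way to combine policy learning on a randomized simulator with a sequential neural posterior update (here via the sbi package's SNPE), conditioned on features from real-world rollouts. The helpers train_policy, simulate, and real_rollout_features are hypothetical toy stand-ins, not the paper's implementation; the simulator is treated as a black box, so no gradients with respect to the domain parameters are needed.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

def train_policy(proposal):
    # Hypothetical placeholder: in NPDR this runs RL (or another policy
    # optimization routine) on the simulator randomized via `proposal`.
    return None

def simulate(theta, policy):
    # Hypothetical black-box generative model: returns summary features
    # of a simulated rollout; here just a noisy copy of the parameters.
    return theta + 0.05 * torch.randn_like(theta)

def real_rollout_features(policy, n_rollouts):
    # Hypothetical placeholder: features computed from a small set of
    # real-world observations.
    return torch.tensor([1.2, 0.4])

# Coarse prior ranges over the domain parameters (e.g., mass, friction).
prior = BoxUniform(low=torch.tensor([0.5, 0.0]), high=torch.tensor([2.0, 1.0]))

inference = SNPE(prior=prior)
proposal = prior
for round_idx in range(3):
    policy = train_policy(proposal)              # policy learning step
    theta = proposal.sample((500,))              # draw domain parameters
    x = torch.stack([simulate(t, policy) for t in theta])
    if round_idx == 0:
        inference.append_simulations(theta, x)
    else:
        inference.append_simulations(theta, x, proposal=proposal)
    density_estimator = inference.train()        # fit the neural posterior
    posterior = inference.build_posterior(density_estimator)
    x_o = real_rollout_features(policy, n_rollouts=5)
    proposal = posterior.set_default_x(x_o)      # Bayesian update on real data
```

Because the posterior is represented by a neural density estimator rather than a fixed parametric family, the belief over domain parameters is not restricted to, e.g., normal or uniform distributions.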