Fabio Muratore, Theo Gruner, Florian Wiese, Boris Belousov, Michael Gienger, Jan Peters,
"Neural Posterior Domain Randomization",
Conference on Robot Learning (CoRL), 2021.
Combining domain randomization and reinforcement learning is a widely
used approach to train control policies that can bridge the gap between
simulation and reality. However, existing methods make avoidable
assumptions. Typically, one type of probability distribution (e.g.,
normal or uniform) is chosen beforehand for every domain parameter.
Another common assumption is the differentiability of the simulator.
These design decisions simplify modeling and implementation, but also
limit applicability as well as the expressiveness of the belief over
domain parameters. Building on novel neural likelihood-free inference
methods, we introduce Neural Posterior Domain Randomization (NPDR), an
algorithm that alternates between learning a policy from a randomized simulator
and adapting the posterior distribution over the simulator’s
parameters in a Bayesian way. Our approach only requires a parameterized
generative model, coarse prior ranges, a policy (optionally with an
optimization routine), and a small set of real-world observations to
compute features from. Most importantly, the generative model,
e.g. a physics simulator, does not have to be differentiable, and the
domain parameter distribution is not restricted to a specific type. We
show that the presented method efficiently performs
distributional system identification on three robotic systems.
Moreover, we demonstrate that NPDR can learn transferable policies
with fewer than X real-world rollouts.
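As a rough illustration of the alternation the abstract describes, the loop below sketches one way to combine policy learning on a randomized simulator with a sequential neural posterior update (here via the sbi package's SNPE), conditioned on features from real-world rollouts. The helpers train_policy, simulate, and real_rollout_features are hypothetical toy stand-ins, not the paper's implementation; the simulator is treated as a black box, so no gradients with respect to the domain parameters are needed.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

def train_policy(proposal):
    # Hypothetical placeholder: in NPDR this runs RL (or another policy
    # optimization routine) on the simulator randomized via `proposal`.
    return None

def simulate(theta, policy):
    # Hypothetical black-box generative model: returns summary features
    # of a simulated rollout; here just a noisy copy of the parameters.
    return theta + 0.05 * torch.randn_like(theta)

def real_rollout_features(policy, n_rollouts):
    # Hypothetical placeholder: features computed from a small set of
    # real-world observations.
    return torch.tensor([1.2, 0.4])

# Coarse prior ranges over the domain parameters (e.g., mass, friction).
prior = BoxUniform(low=torch.tensor([0.5, 0.0]), high=torch.tensor([2.0, 1.0]))

inference = SNPE(prior=prior)
proposal = prior
for round_idx in range(3):
    policy = train_policy(proposal)              # policy learning step
    theta = proposal.sample((500,))              # draw domain parameters
    x = torch.stack([simulate(t, policy) for t in theta])
    if round_idx == 0:
        inference.append_simulations(theta, x)
    else:
        inference.append_simulations(theta, x, proposal=proposal)
    density_estimator = inference.train()        # fit the neural posterior
    posterior = inference.build_posterior(density_estimator)
    x_o = real_rollout_features(policy, n_rollouts=5)
    proposal = posterior.set_default_x(x_o)      # Bayesian update on real data
```

Because the posterior is represented by a neural density estimator rather than a fixed parametric family, the belief over domain parameters is not restricted to, e.g., normal or uniform distributions.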