Abstract

Throughout our lives, we humans acquire an intuitive understanding of our physical environment, a capacity that supports our ability to imagine and plan. Driven by curiosity, we learn about object motion and properties through self-curated, targeted experiments that teach us what we do not yet know. Recently, neural network models have been proposed that, like humans, learn forward object dynamics from observations. Unlike humans, however, these models do not actively interact with surrounding objects; they learn from human-curated datasets as passive observers. In this work in progress, we propose a closed-loop system that teaches itself forward object dynamics without any human intervention. Our model consists of two parts: a forward dynamics model that models the transition between states, and a policy model that tries to predict the dynamics model’s error, conditioned on object interactions, as its intrinsic reward. We show that our method trains forward dynamics models that generalize to unseen physical scenarios and approach the upper bound set by models trained on human-curated data. The model generates complex behaviors with a preference for novel objects.
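The closed loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the system proposed here: the one-dimensional toy environment `true_dynamics`, the per-action linear `ForwardModel`, and the tabular `ErrorPredictingPolicy` (a running average that stands in for a learned policy network) are all hypothetical. It shows only the general idea of a policy that estimates the forward model's error and acts to seek it out.

```python
import random

def true_dynamics(state, action):
    # Hypothetical toy world: action 0 leaves the object at rest,
    # action 1 pushes it, which changes the state.
    return state if action == 0 else 0.5 * state + 1.0

class ForwardModel:
    """Per-action linear model: next_state ~ w[a] * state + b[a]."""
    def __init__(self, n_actions):
        self.w = [1.0] * n_actions  # starts as "nothing ever moves"
        self.b = [0.0] * n_actions

    def predict(self, state, action):
        return self.w[action] * state + self.b[action]

    def update(self, state, action, next_state, lr=0.05):
        # One SGD step on the squared prediction error.
        err = self.predict(state, action) - next_state
        self.w[action] -= lr * err * state
        self.b[action] -= lr * err
        return err * err  # squared error measured before the update

class ErrorPredictingPolicy:
    """Keeps a running estimate of the forward model's error per action
    and treats that estimate as the intrinsic reward (assumed, simplified)."""
    def __init__(self, n_actions, epsilon=0.1):
        self.predicted_error = [1.0] * n_actions  # optimistic start
        self.epsilon = epsilon

    def act(self, rng):
        n = len(self.predicted_error)
        if rng.random() < self.epsilon:  # occasional random exploration
            return rng.randrange(n)
        # Otherwise choose the action with the highest predicted error.
        return max(range(n), key=self.predicted_error.__getitem__)

    def update(self, action, observed_error, lr=0.2):
        delta = observed_error - self.predicted_error[action]
        self.predicted_error[action] += lr * delta

def run(steps=2000, seed=0):
    rng = random.Random(seed)
    model = ForwardModel(2)
    policy = ErrorPredictingPolicy(2)
    counts = [0, 0]
    for _ in range(steps):
        state = rng.uniform(-1.0, 1.0)
        action = policy.act(rng)                          # policy picks an interaction
        next_state = true_dynamics(state, action)         # environment responds
        error = model.update(state, action, next_state)   # dynamics model learns
        policy.update(action, error)                      # policy learns the error
        counts[action] += 1
    return model, policy, counts

if __name__ == "__main__":
    model, policy, counts = run()
    print("actions taken:", counts)
    print("predicted next state for push at 0.5:", model.predict(0.5, 1))
```

In this toy setting no human chooses which transitions the forward model sees; the policy's error estimates alone decide which interaction is tried next, so data collection and model training form a single loop.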