Sebastian Blaes, Marin Vlastelica Pogančić, Jiajie Zhu, Georg Martius
We present a novel intrinsically motivated agent that learns how to control the environment in a sample efficient manner, that is with as few environment interactions as possible, by optimizing learning progress. It learns what can be controlled, how to allocate time and attention as well as the relations between objects using surprise-based motivation. The effectiveness of our method is demonstrated in a synthetic and robotic manipulation environment yielding considerably improved performance and smaller sample complexity compared to an intrinsically motivated, non-hierarchical and state-of-the-art hierarchical baseline. In a nutshell, our work combines several task-level planning agent structures (backtracking search on task-graph, probabilistic road-maps, allocation of search efforts) with intrinsic motivation to achieve learning from scratch.