NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 7484
Title: Shaping Belief States with Generative Environment Models for RL


This paper examines the use of generative models for learning representations that improve data efficiency in RL. Specifically, the authors train a generative model to predict multiple frames into the future (overshooting), and they show that when the model is stochastic (but not when it is deterministic), overshooting yields useful representations of the environment that can improve RL efficiency.

The reviews were fairly divergent in the first round. Two of the reviewers liked the paper, but one did not feel it provided a truly novel contribution, arguing that it only brought together previously proposed ideas for using predictive training to improve RL representations. In discussion, the reviewers concluded that the paper does demonstrate the utility of overshoot prediction for stochastic models, and that an empirical demonstration of this kind can be useful. It was also agreed, however, that open code would provide a means for others to verify and build on these empirical results. In the end, the AC and SAC decided that this paper makes a sufficient contribution to be accepted at NeurIPS, but given the discussions, we strongly encourage the authors to release any code they can. For empirical demonstrations of the utility of ideas like this, open code can greatly enhance the impact of the contributions.
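For context on the technique under discussion, the following is a minimal sketch of latent overshooting in a stochastic latent-variable world model, written with PyTorch. It is not the authors' architecture or loss; the class and function names (StochasticWorldModel, overshoot_loss), the dimensions, and the specific KL-plus-reconstruction objective are illustrative assumptions only.

    # Illustrative sketch only -- not the paper's exact model or loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.distributions import Normal, kl_divergence

    class StochasticWorldModel(nn.Module):
        def __init__(self, obs_dim=64, act_dim=4, latent_dim=32, hidden=128):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, 2 * latent_dim)   # posterior q(z_t | o_t)
            self.prior = nn.Sequential(                         # prior p(z_{t+1} | z_t, a_t)
                nn.Linear(latent_dim + act_dim, hidden), nn.ELU(),
                nn.Linear(hidden, 2 * latent_dim))
            self.decoder = nn.Sequential(                       # observation model p(o_t | z_t)
                nn.Linear(latent_dim, hidden), nn.ELU(),
                nn.Linear(hidden, obs_dim))

        def _dist(self, params):
            mean, log_std = params.chunk(2, dim=-1)
            return Normal(mean, F.softplus(log_std) + 1e-4)

        def posterior(self, obs):
            return self._dist(self.encoder(obs))

        def prior_step(self, z, act):
            return self._dist(self.prior(torch.cat([z, act], dim=-1)))

    def overshoot_loss(model, obs, acts, overshoot=3):
        """obs: [T, B, obs_dim], acts: [T, B, act_dim]."""
        T = obs.shape[0]
        total, count = 0.0, 0
        for t in range(T - 1):
            # Sample the posterior at time t, then roll the prior forward
            # open-loop for up to `overshoot` steps (multi-step prediction).
            z = model.posterior(obs[t]).rsample()
            for d in range(1, overshoot + 1):
                if t + d >= T:
                    break
                prior_d = model.prior_step(z, acts[t + d - 1])
                post_d = model.posterior(obs[t + d])
                # Multi-step latent consistency (KL) plus reconstruction of the future frame.
                kl = kl_divergence(post_d, prior_d).sum(-1).mean()
                z = prior_d.rsample()
                recon = F.mse_loss(model.decoder(z), obs[t + d])
                total = total + kl + recon
                count += 1
        return total / max(count, 1)

    # Example usage on random data (shapes are assumptions):
    model = StochasticWorldModel()
    obs = torch.randn(10, 8, 64)   # T=10 timesteps, batch of 8
    acts = torch.randn(10, 8, 4)
    loss = overshoot_loss(model, obs, acts, overshoot=3)
    loss.backward()

The stochastic latent is the point of contrast in the paper's claim: with a deterministic core there is no prior/posterior divergence term of this kind, and the multi-step objective reduces to plain reconstruction.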