NeurIPS 2020

The Value Equivalence Principle for Model-Based Reinforcement Learning

Meta Review

The paper introduces the concept of Value Equivalence (VE) in model-based RL. VE states that two models are equivalent with respect to a set of value functions F and a set of policies Pi if the effect of the Bellman operators induced by the policies in Pi on the value functions in F is the same. The paper theoretically studies several aspects of the VE concept. This is a good theoretical work in MBRL, particularly in what might be called the problem of decision-aware model learning.

Three out of four reviewers are in favour of accepting this paper. There are, however, certain concerns, some of which I briefly summarize below. Most of the concerns relate to how this paper positions itself with respect to the current literature. These issues are important, and the current version of the paper does not do a very good job of acknowledging them and positioning itself accordingly. We had extensive discussions among the reviewers, as well as between the AC and SAC. I believe that with an honest effort, the authors can revise the paper and address these concerns without requiring another round of review. I therefore recommend acceptance, trusting the authors to seriously consider the following points when revising their paper.

- The goal of the paper is very similar to that of the Value-Aware Model Learning (VAML) framework (as well as the more recent Gradient-Aware Model-based Policy Search and Policy-Aware Model Learning methods), though their focuses are somewhat different. It is important to emphasize and discuss this relation to VAML more prominently in the main body of the paper, instead of deferring it to an appendix.

- The literature on state abstraction is relevant to this work, even though the goals are not exactly the same. This is especially the case for deep RL approaches that learn both a state abstraction and a model.

- The experimental part of the paper could be improved. This is not critical, but it would be a welcome improvement.
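As a sketch of the VE condition summarized above, the definition can be written as follows (the precise symbols are assumed here, not quoted verbatim from the paper): two models m and m~ are value equivalent with respect to a set of policies Pi and a set of functions F if the Bellman operators they induce agree on F.

```latex
% Sketch of the Value Equivalence condition (notation assumed, not copied
% verbatim from the paper). Models $m$ and $\tilde{m}$ are value equivalent
% w.r.t. a policy set $\Pi$ and a function set $\mathcal{F}$ iff
\[
  \mathcal{T}^{m}_{\pi} v \;=\; \mathcal{T}^{\tilde{m}}_{\pi} v
  \qquad \text{for all } \pi \in \Pi \text{ and all } v \in \mathcal{F},
\]
% where $\mathcal{T}^{m}_{\pi}$ is the Bellman operator induced by policy
% $\pi$ under model $m = (r^{m}, P^{m})$:
\[
  (\mathcal{T}^{m}_{\pi} v)(s)
  \;=\;
  \mathbb{E}_{a \sim \pi(\cdot \mid s)}
  \Big[ r^{m}(s, a) + \gamma \, \mathbb{E}_{s' \sim P^{m}(\cdot \mid s, a)}
  \big[ v(s') \big] \Big].
\]
```

Note that the condition constrains the models only through their effect on the chosen sets Pi and F, which is what distinguishes VE from requiring the models to match in full.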