Review for NeurIPS paper: Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

NeurIPS 2020

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Meta Review

summary: The paper presents a practical algorithm for model-based reinforcement learning that addresses the exploration problem with an optimistic approach. In particular, they convert epistemic uncertainty into “hallucinated controls” that are optimized, thereby leading to optimistic behavior.

pros:
- contribution: tractable algorithm for optimistic RL for continuous state and action spaces
- relevant theoretical guarantees (sublinear regret)
- important topic
- well situated in literature

cons:
- experimental evaluation not entirely convincing
- some reviewers found it potentially somewhat incremental

meta review: All reviewers agree that this is a good, potentially impactful paper.
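
For readers unfamiliar with the "hallucinated controls" idea mentioned in the summary, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes the optimistic next state takes the form mean + beta * std * eta, with the hallucinated control eta restricted to [-1, 1]; the model, the reward, the value of beta, and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the model, reward, and beta below are assumptions,
# not taken from the paper under review.

def model_mean(state, action):
    """Hypothetical posterior mean of the next state."""
    return 0.9 * state + action

def model_std(state, action):
    """Hypothetical epistemic standard deviation of the next state."""
    return 0.1 + 0.05 * abs(state)

def reward(state):
    """Hypothetical reward: being closer to the target state 1.0 is better."""
    return -(state - 1.0) ** 2

def optimistic_step(state, action, eta, beta=1.0):
    """Next state when the hallucinated control eta picks a point
    inside the model's confidence interval (eta is clipped to [-1, 1])."""
    eta = max(-1.0, min(1.0, eta))
    return model_mean(state, action) + beta * model_std(state, action) * eta

def best_joint_control(state, actions, etas, beta=1.0):
    """Grid search over real actions and hallucinated controls jointly,
    maximizing the one-step reward of the optimistic next state."""
    candidates = [(a, e) for a in actions for e in etas]
    return max(
        candidates,
        key=lambda c: reward(optimistic_step(state, c[0], c[1], beta)),
    )

# Hallucinated controls on a grid from -1.0 to 1.0 in steps of 0.1.
eta_grid = [i / 10 for i in range(-10, 11)]
action, eta = best_joint_control(state=0.0, actions=[0.0, 0.5], etas=eta_grid)
```

In this toy setting the search selects the action that moves toward the target together with the hallucinated control at the favourable edge of the confidence interval, which is the sense in which optimizing over both yields optimistic behavior. A practical method would optimize over whole trajectories rather than a single step.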