NeurIPS 2020

Towards Safe Policy Improvement for Non-Stationary MDPs

Meta Review

With reviewer scores of (9, 7, 6, 6), this submission is overwhelmingly likely to be accepted. The submission describes a novel method for Safe Policy Improvement in Non-stationary (SPIN) MDPs. The method alternates between Policy Evaluation and Policy Improvement with safety guarantees. The reviewers generally agree that the writing is clear and that the submission presents new technical results (based on Assumption #1) and a good empirical evaluation in two domains, including recommender systems.