NeurIPS 2020

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Meta Review

The reviewers were split on this paper, with two indicating accept and one indicating reject. The core contribution is to improve current methods for lifelong multi-task reinforcement learning by transferring knowledge between tasks. The proposed method provides a mechanism to limit interference between tasks, with theoretical and empirical validation. The reviews also found weaknesses, including (a) ambiguity about the algorithm's performance in a non-stationary setting, (b) an unclear relationship with PG-ELLA, (c) a lack of comparisons to stronger baselines, and (d) overly broad claims. The author rebuttal addressed several of these points, providing an experiment in a non-i.i.d. setting for (a) and a better explanation of the relationship with PG-ELLA for (b). The post-rebuttal discussion also covered areas where the paper should be improved. These include a better discussion of the assumptions behind the work (stationarity is commonly assumed in the literature but is constraining in a lifelong learning setting), the chosen baselines (adequate but could be stronger), and the language (several claims should be stated more carefully, with references where appropriate). The remaining weaknesses are minor and should be corrected by the authors in the final version. I therefore recommend acceptance.