Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper extends the idea of learning intrinsic rewards to the cooperative multi-agent setting with centralized training and decentralized execution. This setting has become popular in recent years, both for its potential in real-world applications and for being amenable to progress toward tractable solutions. The approach presented in this work is conceptually simple and well motivated. The authors empirically show that it outperforms existing state-of-the-art approaches on challenging StarCraft benchmark tasks. Reviewers raised several concerns about the paper, including clarity (experimental details, a precise description of the approach, and its distinction from existing approaches) and the need for further analysis. Most of these points were addressed by the authors in the rebuttal. In particular, the authors provided an additional diagnostic experiment that sheds light on how individual agents learn different internal proxy rewards. Overall, the paper is assessed as making a valuable contribution to NeurIPS: the proposed approach is well motivated and works well in practice. I strongly urge the authors to carefully consider all reviewer feedback when preparing the camera-ready version.