NeurIPS 2020

Differentiable Meta-Learning of Bandit Policies

Meta Review

The rebuttal helped clarify the questions raised in the reviews. The consensus reached in the discussion is that this is a borderline-plus paper. The reviewers appreciate the contribution's practicality, relevance, and usefulness, but they remain concerned about its narrow scope and would rather have seen the policy-gradient method applied to parameterized policies for more complex learning problems. On the whole, this is a worthwhile addition to the program. The rebuttal did not successfully answer one question, namely the one about the setup in the experiments section, where the learning process operating at two levels remained confusing. In my opinion, however, the experimental setup is sound, and I was not confused by its description in the paper or in the rebuttal. I encourage the authors to further clarify and separate the two levels in the camera-ready version.