NeurIPS 2020

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Meta Review

Four knowledgeable reviewers refereed the paper. Several issues were raised, and not all reviewers recommended acceptance. The first points that was brought up in several reviews is the motivation for looking at asymptotically optimal algorithms, and how much these are relevant in practice. The authors successfully addressed this in the rebuttal, and I recommend that this answer is taken as the starting point to reinforce the motivation part of the paper. Futher, reviewer #3 raised the missing comparison with Chu, Li, Reyzin and Schapire, 2011 as a major objection. However, this is in my opinion successfully addressed by the rebuttal through a chain of citations and by pointing out that past bounds are weaker than the reviewer believed. I am therefore lowering the confidence on the low score of reviewer 3. What remains is that the paper extends a series of works proving instance-optimal regret guarantees based on information theoretic lower bounds, now to address the contextual bandit setting, providing finite-time guarantees. This is therefore a worthwhile contribution, passing the acceptance threshold. I strongly encourage the authors to integrate the discussion points from the rebuttal (relation asymptotic optimality and practicality, comparing the dependence of the new and previous bounds in their various problem parameter dependences).