NeurIPS 2020

Inference for Batched Bandits

Meta Review

The author rebuttal was deemed satisfactory, and most reviewers tended to the positive side. On my own reading of the paper, I concur with them and recommend acceptance.

One of the reviewers summarized the paper well in the discussion: "I liked the result on the non-normality when the margin vanishes (Theorem 2), but I am still not positive about the theoretical contributions related to BOLS (Theorems 3 and 4); the limited adaptivity simplifies the theoretical analysis greatly." Another reviewer opined: "Given the batched setting, with limited adaptivity, it is not difficult to anticipate the algorithmic contributions of the paper. Nevertheless I think the paper is a good contribution to NeurIPS; the proof of non-normality when the margin vanishes is interesting and new. The addition of uniform bounds and the Nie et al. paper should round it well." The last reviewer added: "My point about related work was addressed reasonably, though I think their experiment may have been a strawman. It is obvious that when CLT bounds apply, they are tighter than time-uniform bounds; the point is that the former do not apply to adaptively stopped experiments, which is half the point of sequential analysis and bandits."

Overall, the reviewers were very knowledgeable and thoughtful, and I concur with their points of view. The authors would benefit from taking all of their opinions seriously, and the community would benefit if the reviewers' opinions were accurately reflected in the final version. In particular, I request that the authors properly cite the relationship of this work to the various suggested papers, for example those on bias in bandits (Nie et al., Shin et al.) and the anytime-valid confidence bounds mentioned by the last reviewer.