Review for NeurIPS paper: Latent Bandits Revisited

NeurIPS 2020


Meta Review

All reviewers are inclined towards acceptance. Their main reasons are the clear connection to practical settings where some a priori model information is available; the development and analysis of a Thompson sampling algorithm, alongside natural optimism-based strategies; the treatment of misspecification of the latent space; and a reasonably comprehensive experimental evaluation of latent bandit algorithms. Hence, I recommend acceptance.

However, from my earlier reading of prior work on latent bandits, I am not convinced of the validity of the remark "The closest work to ours is that of Maillard and Mannor [23], which proposes and analyzes non-contextual UCB algorithms under the assumption that the mean rewards for each latent state are known". From what I have seen, M&M do not actually assume that the mean rewards are known (e.g., their A-UCB strategy). Moreover, their "B" sets play exactly the same role as a context, and their Cs are the latent/unknown classes (called s in this paper). I therefore urge the author(s) to take a closer look at prior work and carefully correct the comparison to related work.