Sun Dec 8th through Sat Dec 14th, 2019, at Vancouver Convention Center
The paper is easy to read, and the authors convey their main ideas clearly. The main issue with this paper is that the lower bounds are hard to interpret. The authors do not show clearly how these lower bounds match other results on top-k bandits or the best-arm identification problem. The regret bounds also depend linearly on the number of arms K. This poses problems when the number of arms is very large, and therefore this analysis only seems interesting in the case K << T. So while these are finite-time bounds, the time horizon still has to be significantly bigger than the number of arms. Typo in line 4 of Algorithm 1, in the subscript of the argmin.
This paper adds a link between the principle of optimism in the face of uncertainty (widely used in the regret minimization setting) and pure exploration problems, and proposes algorithms based on this principle. The idea is natural, and it is not surprising that the proposed algorithms achieve non-asymptotic guarantees. The paper is clearly written and technically sound. However, the analysis is a combination of known techniques, mainly from the bandit literature and the cited reference, which makes the paper not that novel. The paper proposes several algorithms, but the numerical experiments are not comprehensive.
(1) This work studies the complexity of pure exploration when the confidence level is a constant. The approach is to solve the minimax optimization problem in the lower bound by viewing it as a two-player game and defining the players' learning dynamics. The authors also discuss a possible trade-off between the power of optimization oracles and the sample complexity guarantee. This seems to me a novel contribution to the field. (2) Overall, the paper is well written and not too difficult to follow. I was able to follow the proof sketches in Section 3, and the technical claims in the paper seem valid to me. (3) My main concern with this work is the significance of the contributions, due to a lack of comparison with previous approaches and results; see Part 5 for details. *** added after author feedback *** After reading the author feedback, I thank the authors for addressing my concerns and encourage them to add the discussion to the final version.