Thresholding Bandit with Optimal Aggregate Regret

Tao, Chao; Blanco, Saúl; Peng, Jian; Zhou, Yuan

Thresholding Bandit with Optimal Aggregate Regret

Chao Tao, Saúl Blanco, Jian Peng, Yuan Zhou

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

AuthorFeedback Bibtex MetaReview Metadata Paper Reviews Supplemental

Abstract

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.

Abstract

Name Change Policy