Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

Part of Advances in Neural Information Processing Systems 28 (NIPS 2015)

Bibtex »Metadata »Paper »Reviews »Supplemental »

Authors

Ofer Dekel, Ronen Eldan, Tomer Koren

Abstract

Bandit convex optimization is one of the fundamental problems in the field of online learning. The best algorithm for the general bandit convex optimization problem guarantees a regret of $\widetilde{O}(T^{5/6})$, while the best known lower bound is $\Omega(T^{1/2})$. Many attemptshave been made to bridge the huge gap between these bounds. A particularly interesting special case of this problem assumes that the loss functions are smooth. In this case, the best known algorithm guarantees a regret of $\widetilde{O}(T^{2/3})$. We present an efficient algorithm for the banditsmooth convex optimization problem that guarantees a regret of $\widetilde{O}(T^{5/8})$. Our result rules out an $\Omega(T^{2/3})$ lower bound and takes a significant step towards the resolution of this open problem.