Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)
Elad Hazan, Satyen Kale
We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar (\alpha). We prove that the regret of \newtron is (O(\log T)) when (\alpha) is a constant that does not vary with horizon (T), and at most (O(T^{2/3})) if (\alpha) is allowed to increase to infinity with (T). For (\alpha) = (O(\log T)), the regret is bounded by (O(\sqrt{T})), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when (\alpha) is a small constant.