
Logarithmic Regret from Sublinear Hints

Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)


Authors

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Abstract

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a _hint_ $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at _every_ time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of _optimistic_ regret bounds, and to the problem of online learning with abstention.
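To make the protocol concrete, the following is a minimal sketch (not the paper's algorithm) of online linear optimization with hints: the synthetic cost sequence, the correlated-hint model, and the hint-following strategy below are illustrative assumptions, used only to show when hints are revealed, how the loss $\langle c_t, x_t \rangle$ accumulates, and how regret against the best fixed point in the unit ball is measured.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 1000, 5

# Hypothetical cost vectors on the unit sphere and hints that are
# positively correlated with them (an assumed hint model, not from the paper).
costs = rng.normal(size=(T, d))
costs /= np.linalg.norm(costs, axis=1, keepdims=True)
hints = 0.8 * costs + 0.2 * rng.normal(size=(T, d))
hints /= np.linalg.norm(hints, axis=1, keepdims=True)

total_loss = 0.0
cumulative_cost = np.zeros(d)
for t in range(T):
    h_t = hints[t]            # hint revealed before the algorithm plays
    x_t = -h_t                # illustrative strategy: play against the hint (stays in the unit ball)
    c_t = costs[t]            # cost vector revealed only after x_t is played
    total_loss += c_t @ x_t   # loss <c_t, x_t>
    cumulative_cost += c_t

# Regret is measured against the best fixed point u in the unit ball in hindsight,
# which is u* = -cumulative_cost / ||cumulative_cost||.
best_fixed_loss = -np.linalg.norm(cumulative_cost)
regret = total_loss - best_fixed_loss
print(f"regret after {T} rounds: {regret:.3f}")
```

In this sketch a hint is available at every round; the paper's question is how much of this advantage survives when only a sublinear number of rounds (roughly $\sqrt{T}$ of them, chosen under a query model) come with hints.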