Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track
Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer
We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of O(T23(Klog(|Π|))13) and makes at most O(K) calls per round to an offline optimization oracle, where K denotes the number of actions, T denotes the number of rounds and Π denotes the set of policies. This is the first result to improve the prior best bound of O((TK)23(log(|Π|))13) as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007 which was obtained for the stochastic case.