Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track
Volkan Cevher, Ashok Cutkosky, Ali Kavis, Georgios Piliouras, Stratis Skoulakis, Luca Viano
Motivated by alternating game-play in two-player games, we study an alternating variant of \textit{Online Linear Optimization} (OLO). In alternating OLO, a \textit{learner} at each round $t \in [T]$ selects a vector $x_t$ and then an \textit{adversary} selects a cost vector $c_t \in [-1,1]^n$. The learner then experiences cost $(c_t + c_{t-1})^\top x_t$ instead of $c_t^\top x_t$ as in standard OLO. We establish that under this small twist, the $\Omega(\sqrt{T})$ lower bound on the regret no longer holds. More precisely, we present two online learning algorithms for alternating OLO that respectively achieve $O((\log n)^{4/3} T^{1/3})$ regret for the $n$-dimensional simplex and $O(\rho \log T)$ regret for the ball of radius $\rho > 0$. Our results imply that in alternating game-play, an agent can always guarantee $\tilde{O}((\log n)^{4/3} T^{1/3})$ regret regardless of the strategies of the other agent, while the regret bound improves to $O(\log T)$ when the agent has only two actions.
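For concreteness, the regret these bounds refer to is presumably the standard notion, measured against the best fixed point in hindsight but with the alternating costs; a sketch of the definition under that reading, with the convention $c_0 = 0$ assumed here:
\[
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} (c_t + c_{t-1})^\top x_t \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} (c_t + c_{t-1})^\top x,
\]
where $\mathcal{X}$ is the feasible set (the $n$-dimensional simplex or the ball of radius $\rho$).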
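The protocol itself is easy to simulate. Below is a minimal sketch of alternating OLO over the simplex; the learner shown is plain multiplicative weights (Hedge) run on the combined cost, used only to illustrate the interaction order and the $(c_t + c_{t-1})^\top x_t$ cost structure, not the algorithms from the paper, and the function name and the uniform-random adversary are our own choices.

```python
import numpy as np

def simulate_alternating_olo(T=1000, n=5, eta=0.1, seed=0):
    """Sketch of the alternating-OLO protocol over the n-simplex.

    Illustrative only: the learner is plain Hedge on the combined cost
    c_t + c_{t-1} (not the paper's algorithm), and the adversary draws
    costs uniformly from [-1, 1]^n (in general it may depend on x_t).
    """
    rng = np.random.default_rng(seed)
    weights = np.ones(n)
    c_prev = np.zeros(n)            # convention: c_0 = 0
    total_cost = 0.0
    cum_combined = np.zeros(n)      # running sum of (c_t + c_{t-1})
    for t in range(1, T + 1):
        x_t = weights / weights.sum()          # learner commits to x_t first
        c_t = rng.uniform(-1.0, 1.0, size=n)   # adversary then picks c_t
        combined = c_t + c_prev
        total_cost += combined @ x_t           # cost is (c_t + c_{t-1})^T x_t
        cum_combined += combined
        weights *= np.exp(-eta * combined)     # Hedge update on the combined cost
        c_prev = c_t
    # regret vs. the best fixed vertex of the simplex in hindsight
    return total_cost - cum_combined.min()

if __name__ == "__main__":
    print(simulate_alternating_olo())
```

Since the objective is linear, the best fixed point in hindsight over the simplex is attained at a vertex, which is why the comparator reduces to the minimum coordinate of the cumulative combined cost.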