Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri
The notion of
policy regret'' in online learning is supposed to capture the reactions of the adversary to the actions taken by the learner, which more traditional notions such as external regret do not take into account. We revisit this notion of policy regret, and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play which does well with respect to one must do poorly with respect to the other. We then focus on the game theoretic setting, when the adversary is a self-interested agent. In this setting we show that the external regret and policy regret are not in conflict, and in fact that a wide class of algorithms can ensure both as long as the adversary is also using such an algorithm. We also define a new notion of equilibrium which we call apolicy equilibrium'', and show that no-policy regret algorithms will have play which converges to such an equilibrium. Relating this back to external regret, we show that coarse correlated equilibria (which no-external regret players will converge to) are a strict subset of policy equilibria. So in game-theoretic settings every sequence of play with no external regret also has no policy regret, but the converse is not true.