Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)
Martin Zinkevich, Amy Greenwald, Michael Littman
Although variants of value iteration have been proposed for ﬁnding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demon- strate by construction that existing variants of value iteration cannot ﬁnd stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value it- eration based on a new (non-stationary) equilibrium concept that we call “cyclic equilibria.” We prove that value iteration identiﬁes cyclic equi- libria in a class of games in which it fails to ﬁnd stationary equilibria. We also demonstrate empirically that value iteration ﬁnds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.