Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Wang, Xiaofeng; Sandholm, Tuomas

Advances in Neural Information Processing Systems 15 (NIPS 2002)

Abstract

Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exacerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforcement learning involves two interrelated problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the first algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm’s parameters are easy to set to meet the convergence conditions.
