Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Xiaofeng Wang, Tuomas Sandholm
Multiagent learning is a key problem in AI. In the presence of multi- ple Nash equilibria, even agents with non-conﬂicting interests may not be able to learn an optimal coordination policy. The problem is exac- cerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforfcement learning involves two inter- related problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the ﬁrst algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm’s parameters are easy to set to meet the convergence conditions.