Learning Near-Pareto-Optimal Conventions in Polynomial Time

Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)

Bibtex Metadata Paper


Xiaofeng Wang, Tuomas Sandholm


We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different pref- erences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design ef´Čücient near-optimal al- gorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions).