Shuyue Hu, Chin-wing Leung, Ho-fung Leung
Modelling the dynamics of multi-agent learning has long been an important research topic, but all of the previous works focus on 2-agent settings and mostly use evolutionary game theoretic approaches. In this paper, we study an n-agent setting with n tends to infinity, such that agents learn their policies concurrently over repeated symmetric bimatrix games with some other agents. Using mean field theory, we approximate the effects of other agents on a single agent by an averaged effect. A Fokker-Planck equation that describes the evolution of the probability distribution of Q-values in the agent population is derived. To the best of our knowledge, this is the first time to show the Q-learning dynamics under an n-agent setting can be described by a system of only three equations. We validate our model through comparisons with agent-based simulations on typical symmetric bimatrix games and different initial settings of Q-values.