{"title": "Multiplicative Weights Updates with Constant Step-Size in Graphical Constant-Sum Games", "book": "Advances in Neural Information Processing Systems", "page_first": 3528, "page_last": 3538, "abstract": "Since Multiplicative Weights (MW) updates are the discrete analogue of the continuous Replicator Dynamics (RD), some researchers had expected their qualitative behaviours would be similar. We show that this is false in the context of graphical constant-sum games, which include two-person zero-sum games as special cases. In such games which have a fully-mixed Nash Equilibrium (NE), it was known that RD satisfy the permanence and Poincare recurrence properties, but we show that MW updates with any constant step-size eps > 0 converge to the boundary of the state space, and thus do not satisfy the two properties. Using this result, we show that MW updates have a regret lower bound of Omega( 1 / (eps T) ), while it was known that the regret of RD is upper bounded by O( 1 / T ).\n\nInterestingly, the regret perspective can be useful for better understanding of the behaviours of MW updates. In a two-person zero-sum game, if it has a unique NE which is fully mixed, then we show, via regret, that for any sufficiently small eps, there exist at least two probability densities and a constant Z > 0, such that for any arbitrarily small z > 0, each of the two densities fluctuates above Z and below z infinitely often.", "full_text": "Multiplicative Weights Updates with Constant\nStep-Size in Graphical Constant-Sum Games\n\nYun Kuen Cheung \u2217\n\nSingapore University of Technology and Design\n\nSingapore\n\nyunkuen_cheung@sutd.edu.sg\n\nAbstract\n\nSince Multiplicative Weights (MW) updates are the discrete analogue of the contin-\nuous Replicator Dynamics (RD), some researchers had expected their qualitative\nbehaviours would be similar. 
We show that this is false in the context of graphical constant-sum games, which include two-person zero-sum games as special cases. In such games which have a fully-mixed Nash Equilibrium (NE), it was known that RD satisfy the permanence and Poincaré recurrence properties, but we show that MW updates with any constant step-size ε > 0 converge to the boundary of the state space, and thus satisfy neither property. Using this result, we show that MW updates have a regret lower bound of Ω(1/(εT)), whereas the regret of RD is known to be upper bounded by O(1/T).

Interestingly, the regret perspective can be useful for better understanding the behaviours of MW updates. In a two-person zero-sum game with a unique NE which is fully mixed, we show, via regret, that for any sufficiently small ε, there exist at least two probability densities and a constant Z > 0 such that, for any arbitrarily small z > 0, each of the two densities fluctuates above Z and below z infinitely often.

1 Introduction

The concept of Nash Equilibrium (NE) has been central in game theory. The existence proof of Nash [20] is non-constructive, while the definition of NE itself is static; neither sheds insight on how a NE can be computed or reached. Consequently, many researchers have devoted effort to justifying the concept of NE by providing algorithms/dynamics which might compute/reach a NE. Among them, Multiplicative Weights (MW) updates have drawn much attention, due to their simplicity and naturalness, and, perhaps more importantly, their distributive implementability², which is essential in games we observe in reality, where the information communicated between players is often very limited. MW updates have also made profound impacts in algorithm design; see [1] for details. 
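To make the update rule concrete before the formal definitions of Section 2, here is a minimal sketch of one MW update step (our own illustration; the helper name mw_step and the numbers are hypothetical, not from the paper): each player rescales her current mixed strategy by exponentials of her own observed payoffs and renormalizes, which uses exactly the locally observable information that the distributive-implementability footnote refers to.

```python
import numpy as np

def mw_step(x, payoffs, eps):
    """One Multiplicative Weights update for a single player.

    x       : current mixed strategy (non-negative, sums to 1)
    payoffs : payoff to each of the player's pure strategies
              against the opponents' current play
    eps     : constant step-size eps > 0
    """
    w = x * np.exp(eps * payoffs)  # reweight each strategy by its payoff
    return w / w.sum()             # renormalize back onto the simplex

# A player observing payoffs (0.5, -0.5) shifts mass toward strategy 1.
x = mw_step(np.array([0.5, 0.5]), np.array([0.5, -0.5]), eps=0.1)
```

Note that no global information (the game matrix, the other players' updates) appears anywhere in the step.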
However, various PPAD-hardness and communication complexity results [8, 6, 13] serve as strong indicators that no algorithm/dynamic, MW updates included, can efficiently compute/reach NE for general games. But can MW updates do so for interesting sub-families of games?

The best we could hope for is pointwise convergence toward a NE, but this is known not to hold even in the simplest scenario of two-person zero-sum games. A weaker notion of convergence, called empirical convergence (i.e., the average of the time-series history converges), has been sought. While this notion might seem less natural, it is interesting from the perspective of statistics; this notion, coined "ergodic convergence" in the study of dynamical systems, is central in a branch of mathematics called ergodic theory, whose initial development was motivated by the study of time-average behaviours of systems of interest in statistical physics. Freund and Schapire [10] showed that MW updates converge to NE empirically in any two-person zero-sum game using the notion of regret; their analysis also yields a simple and beautiful proof of John von Neumann's Minimax Theorem. 

∗ Most work done while the author was at Max-Planck Institute for Informatics, Saarland Informatics Campus.
² In the context of game dynamics, distributive implementability means each player only needs information she can observe locally (e.g., payoffs to each of her own strategies) to run the updates, and does not need to know other global information such as the value of the underlying game matrix and the updates of other players.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Daskalakis and Papadimitriou [9] and Cai and Daskalakis [4] generalized this to graphical constant-sum games and separable zero-sum games.

It should be noted, however, that their precise result is the following: if the finishing time T is known a priori³, then there is a step-size ε, which depends on T, such that the regret at time T is O(1/√T); hence the empirical average at time T forms an O(1/√T)-approximate NE. When ε is a fixed constant and T ↗ ∞, their analyses can only show a regret bound of O(ε), and it is not clear whether empirical convergence toward NE occurs. Recently, several works have shown that distributed dynamics achieve o_T(1) regret in two-person zero-sum games [7, 24] and general games [28], but all of them require the step-size to be diminishing or adaptive. However, in some applications of biological system models, e.g., population dynamics (see [26, 15]) and evolution [5], a diminishing or adaptive step-size is unnatural. Also, from an algorithmic perspective, it is natural to ask what happens if the step-size is constant, and whether the use of diminishing or adaptive step-sizes is unavoidable. Indeed, last year, Palaiopanos et al. [21] addressed the same question in the context of congestion games.

In this paper, we study MW updates with constant step-size in graphical constant-sum games. It was known that Replicator Dynamics (RD), the continuous analogue of MW updates, are permanent in such games [14], i.e., all probability densities are bounded away from zero throughout. Also, RD in such games satisfy the Poincaré recurrence property [23, 18]. In algorithm design and modelling of biological systems, continuous-time dynamics like RD are hardly feasible, so we are interested in the behaviours of MW updates. Due to the analogy between MW updates and RD, it seems natural to expect that MW updates retain the above good properties of RD. Unfortunately, this is not true. 
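The failure of permanence can already be observed numerically in Matching Pennies, a two-person zero-sum game with a unique fully mixed NE (1/2, 1/2). The following self-contained sketch (our illustration; the starting points and ε = 0.1 are chosen only for demonstration) runs MW updates for both players and tracks the adapted Hamiltonian H(t) = −Σi (1/εi) Σj x∗ij·ln xij(t) used in Section 3: H(t) never decreases, and the smallest probability density drifts toward zero instead of recurring.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # Matching Pennies, payoffs to Player 1
eps = 0.1
x = np.array([0.6, 0.4])  # Player 1 start (fully mixed, not the NE)
y = np.array([0.5, 0.5])  # Player 2 start

def mw(p, payoffs):
    w = p * np.exp(eps * payoffs)
    return w / w.sum()

H = []
min_density = 1.0
for t in range(2000):
    # H(t) with the fully mixed NE x* = (1/2, 1/2) for both players
    H.append(-(0.5 / eps) * (np.log(x).sum() + np.log(y).sum()))
    min_density = min(min_density, x.min(), y.min())
    u1 = A @ y        # payoffs to Player 1's pure strategies
    u2 = -(A.T @ x)   # zero-sum: Player 2's payoffs are the negation
    x, y = mw(x, u1), mw(y, u2)  # simultaneous updates
```

In such a run H(t) increases monotonically and the trajectory spirals away from the equilibrium, in line with the boundary-convergence result of Section 3.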
We show that MW updates in such games satisfy neither permanence nor Poincaré recurrence; indeed, we show that MW updates converge toward the boundary of the state space (unless the starting point is a NE), which is stronger than the failure of the above two properties.

We note a recent independent work of Bailey and Piliouras [2], who show the same result in a more general setting: first, they generalize to discrete Follow-The-Regularized-Leader (FTRL) dynamics, and second, they allow the step-sizes to be mildly decreasing. Precisely, they show that if a FTRL dynamic guarantees that every update stays in the interior of the state space (MW is an example of such a FTRL dynamic), then the dynamic converges toward the boundary; otherwise, they show that the FTRL dynamic gets arbitrarily close to the boundary infinitely often (which is weaker than "convergence toward the boundary"). In our paper, we proceed further by using this result to get a better understanding of the behaviours of MW updates and the regret, as we discuss next.

We show a regret lower bound of Ω(1/(εT)) plus a positive term which is proportional to the average variance of payoffs among strategies over time. This lower bound should be compared to RD's regret upper bound of O(1/T) by Sorin [27, Theorem 3.1].⁴ Mertikopoulos et al. [18] generalized Sorin's upper bound to continuous FTRL dynamics.

Interestingly, the regret perspective can be useful for better understanding the behaviours of MW updates. In a two-person zero-sum game with a unique NE which is fully mixed, we show, via regret, that for any sufficiently small ε, there exist at least two probability densities and a constant Z > 0 such that, for any arbitrarily small z > 0, each of the two densities fluctuates above Z and below z infinitely often.

Continuous vs. Discrete Dynamics. 
As we will see, from a high-level perspective, we are exploiting the interplay between continuous dynamics and their analogous discrete dynamics. Such interplay has been exploited before in other contexts; see, for instance, Sorin [27], Kwon and Mertikopoulos [17] and Benaïm [3]. Briefly speaking, in [27, 17] it was shown that the disparity between the two dynamics is under control up to a certain time by choosing a suitable time-dependent step-size. Thus, if a good property holds for the continuous dynamic (which is often easier to show in the continuous-time setting), it might carry over to the discrete analogue, sometimes with a depreciation due to the disparity. In the contexts we study, we show that the disparity must accumulate indefinitely, and eventually leads to very different qualitative behaviours between RD and MW updates.

Other Related Work. The long-term (asymptotic) behaviours of learning/evolution dynamics in games, biological and other systems have attracted attention from researchers across multiple disciplines for decades; see the text of Hofbauer and Sigmund [15] for an extensive summary. A large number of works have focused on continuous dynamics. Even in simple games like Paper-Rock-Scissors or small three-player graphical games, learning dynamics exhibit rich and sometimes surprising long-term behaviours; see, for instance, [29, 11, 12, 25, 16, 22, 19]. Discrete dynamics are more relevant from an algorithmic perspective, and certainly deserve more attention in this algorithmic era. 

³ It is now standard that the knowing-the-finishing-time assumption can be removed by employing a "doubling trick", but then the step-size will be diminishing over time.
⁴ Sorin [27] showed this upper bound in the unilateral setting, i.e., the bound holds for any player who uses RD, regardless of how the environment (e.g., game payoffs, other players' behaviours) varies over time.

Our work suggests that the long-term behaviours of continuous dynamics and their discrete counterparts can be very different, and a better understanding of such behaviours might shed insights on game-theoretic benchmarks such as regret.

2 Preliminary

Types of Games, and Nash Equilibrium. In a general bimatrix game with two players, suppose Players A and B have strategy sets SA and SB respectively. The game is depicted by a bimatrix M = [(aij, bij)] for i ∈ SA, j ∈ SB: when Player A chooses strategy i and Player B chooses strategy j, aij and bij are the payoffs to Players A and B respectively. Such a game is called a two-person constant-sum game if for all i ∈ SA and j ∈ SB, aij + bij = C for some real number C; it is called a two-person zero-sum game if it is a two-person constant-sum game with C = 0.

In a game with m players, we number the players 1, 2, ..., m, let Si denote the strategy set of Player i, and set ni := |Si|. For any s = (s1, s2, ..., sm) ∈ ×i∈[m] Si, where si is the strategy chosen by Player i, let ui(s) denote the payoff to Player i. Such a game is called a separable zero-sum multiplayer game if for any s ∈ ×i∈[m] Si, Σi∈[m] ui(s) = 0.

A game with m players is a graphical polymatrix game if it is defined as follows: on an undirected graph G = ([m], E), each edge (i, j) ∈ E corresponds to a bimatrix game between Players i and j with strategy sets Si and Sj respectively. It is worth noting that the strategy set of a Player i is the same in all the bimatrix games she participates in, and every time she plays the game, she must choose the same strategy in all these bimatrix games. Such a game is a graphical constant-sum game if the bimatrix game corresponding to every edge is a two-person constant-sum game (different bimatrix games may have different constants C).

Theorem 1 ([4]). 
Every separable zero-sum multiplayer game can be transformed into a graphical constant-sum game, while preserving all the payoffs.

Due to the above theorem, we focus on developing our results for graphical constant-sum games; all these results automatically carry over to separable zero-sum multiplayer games. Also, by suitable scaling, we can assume that the payoff to every player always lies within the interval [−1, +1].

A mixed strategy of a Player i is a probability distribution over her strategy set, represented by a vector xi ∈ R^{ni} with Σj∈Si xij = 1, where xij is the probability density with which strategy j is chosen. When each Player i chooses a mixed strategy xi independently, the payoff function extends naturally by ui(x1, x2, ..., xm) = E_{s ∼ ×j∈[m] xj}[ui(s)]. The boundary of the mixed strategy space is ∪i∈[m], j∈Si {x | xij = 0}.

We say (x1, x2, ..., xm) is a Nash equilibrium (NE) if no player can change her mixed strategy to achieve a higher payoff. Precisely, for any Player i, let x−i denote the mixed strategies chosen by all players other than Player i; then for any mixed strategy x′i of Player i, ui(x′i, x−i) ≤ ui(xi, x−i). A NE is said to be fully mixed if no probability density in any of the xi is zero. For any Player i, let ej denote a pure strategy j of hers (i.e., a mixed strategy with probability one on strategy j).

Replicator Dynamic and Multiplicative Weights Updates. In a game with m players where each player employs the Replicator Dynamics (RD), each Player i maintains a mixed strategy xi(t) which is updated continuously with time t. Let x(t) = (x1(t), x2(t), ..., xm(t)). The update rule is given
The update rule is given\n\n3\n\n\fby a differential equation system, for each strategy j \u2208 Si,\n\nxij(t) = xij(t) \u00b7 [ui(ej, x\u2212i(t)) \u2212 ui(x(t))] .\n\nd\ndt\n\nIf times are discrete at non-negative integers, and if each Player i employs a Multiplicative Weights\n(MW) updates with step-size \u03b5i > 0, the update rule is\n\nxij(t + 1) =\n\n(cid:80)\n\nxij(t) \u00b7 exp (\u03b5i \u00b7 ui(ej, x\u2212i(t)))\nk\u2208Si\n\nxik(t) \u00b7 exp (\u03b5i \u00b7 ui(ek, x\u2212i(t)))\n\n.\n\nIt is well-known that MW updates are discrete analogue of RD.\nThroughout this paper, we always assume that for all i \u2208 [m] and j \u2208 Si, every starting density xij(0)\nis strictly positive, i.e., x(0) is fully mixed. Also, in all our results, we always assume \u03b5i \u2264 1/4.\nThe regret of Player i is\n\n(cid:34)(cid:32)\n(cid:34)(cid:32)\n\nmax\nj\u2208Si\n\n1\nT\n\n1\nT\n\nmax\nj\u2208Si\n\n(cid:90) T\nT\u22121(cid:88)\n\n0\n\n(cid:33)\n(cid:33)\n\n\u2212\n\n(cid:90) T\n\u2212 T\u22121(cid:88)\n\n0\n\n(cid:35)\n(cid:35)\n\nui(ej, x\u2212i(t)) dt\n\nui(x(t)) dt\n\nui(ej, x\u2212i(t))\n\nui(x(t))\n\nt=0\n\nt=0\n\nfor continuous-time model, T > 0;\n\nfor discrete-time model, T \u2208 N.\n\n3 Permanence and Poincar\u00e9 Recurrence\n\nWe \ufb01rst present two prior results concerning RD in general games and graphical constant-sum games.\nTheorem 2 ([27]). In any multiplayer game where the payoff function to Player i is Lebesgue\nintegrable in the mixed strategies of all players and the game parameters, if Player i employs RD,\nwhile the game parameters and mixed strategies of all other players are measurable functions of time,\nthen for any T > 0, the regret of Player i is at most 1\nTheorem 3 ([23]; see also [15, 18]). If a graphical constant-sum game admits a fully-mixed NE x\u2217\nij\u00b7\nx\u2217\n\nand all players employ RD, then for any fully mixed starting point x(0), H(t) := \u2212(cid:80)m\n(cid:80)\nln(xij(t)) is a constant for all t \u2265 0. 
Consequently, for all t ≥ 0, xij(t) ≥ exp(−H(0)/x∗ij) > 0, so the system is permanent (i.e., all xij's are bounded away from zero throughout), and the ω-set of the dynamic is bounded away from the boundary. Also, the dynamic satisfies the Poincaré recurrence property.

For MW updates, to cope with the scenarios where different players use different step-sizes, we make a slight modification of the Hamiltonian function H of [23]:

    H(t) := −Σi∈[m] (1/εi) Σj∈Si x∗ij · ln(xij(t)).

Lemma 4. If a graphical constant-sum game admits a fully-mixed NE x∗ and all players employ MW updates, then H(t+1) ≥ H(t). More specifically, let Vi(t) denote

    Σk∈Si xik(t) · (exp(εi · ui(ek, x−i(t))) − 1)² − ( Σk∈Si xik(t) · (exp(εi · ui(ek, x−i(t))) − 1) )²;

then

    Σi∈[m] Vi(t)/εi ≥ H(t+1) − H(t) ≥ (1/4) · Σi∈[m] Vi(t)/εi.

Before proving the lemma, we note that Vi(t) is indeed the variance of the following random variable, and thus is always non-negative: the random variable realizes the value (exp(εi · ui(ek, x−i(t))) − 1) with probability xik(t), for each k ∈ Si. Moreover, if xi(t) is fully mixed, then Vi(t) is zero if and only if ui(ek, x−i(t)) is identical for all k ∈ Si.

Proof. 
We first expand H(t+1) − H(t) = −Σi∈[m] (1/εi) Σj∈Si x∗ij · ln(xij(t+1)/xij(t)) as follows:

    H(t+1) − H(t)
      = −Σi∈[m] (1/εi) Σj∈Si x∗ij · [ εi · ui(ej, x−i(t)) − ln( Σk∈Si xik(t) · exp(εi · ui(ek, x−i(t))) ) ]
      = −Σi∈[m] Σj∈Si x∗ij · ui(ej, x−i(t)) + Σi∈[m] (1/εi) · Li,

where Li := ln( Σk∈Si xik(t) · exp(εi · ui(ek, x−i(t))) ); the final equality holds since for each i ∈ [m], Σj∈Si x∗ij = 1.

Noting that each (exp(εi · ui(ek, x−i(t))) − 1) lies in the interval [e^{−εi} − 1, e^{εi} − 1], and that on this interval the function ln(1+y) + y²/4 is concave while the function ln(1+y) + y² is convex, by Jensen's inequality we have, for each i ∈ [m],

    Vi(t) ≥ Li − Σk∈Si xik(t) · εi · ui(ek, x−i(t)) ≥ Vi(t)/4.

Thus,

    Σi∈[m] Vi(t)/εi ≥ H(t+1) − H(t) − Σi∈[m] Σk∈Si (xik(t) − x∗ik) · ui(ek, x−i(t)) ≥ (1/4) · Σi∈[m] Vi(t)/εi.   (1)

By following the proof of Theorem 3 in [23], one can show that the above double summation is zero. We defer this part of the proof to Section 7.

Theorem 5. If a graphical constant-sum game admits a fully-mixed NE x∗ and all players employ MW updates, while the starting point x(0) is not a NE, then

(a) for any δ > 0, there exists a time T_δ such that for all t ≥ T_δ, there exist some i ∈ [m] and j ∈ Si with xij(t) ≤ δ. 
Thus, the ω-set of the dynamic is a subset of the boundary;

(b) let U be an open neighbourhood of the starting point x(0) which is bounded away from the boundary; then the MW updates enter U only finitely often, i.e., there exists a time T such that for all t ≥ T, x(t) ∉ U. In other words, the Poincaré recurrence property does not hold.

Theorem 5 should be compared with Theorem 3. The key message is that although MW updates are the discrete analogue of RD, their qualitative behaviours differ significantly. The main technical reason is that the discretization from RD to MW updates introduces second-order terms which accumulate in a bad way and keep pushing the MW updates toward the boundary. Such bad accumulation exists even when ε is arbitrarily tiny, which might be surprising to people not familiar with numerical methods, since they might have the misconception that once the step-size ε is sufficiently small, the discretization always yields a good approximation of its continuous counterpart. Theorem 5(b) is a direct corollary of Theorem 5(a).

Proof. Suppose that H(t) is bounded by some constant q throughout. By Lemma 4, H(t) ≥ H(0) for all t ≥ 0. 
Thus, for all t ≥ 0, x(t) always lies in the domain

    D := { (x1, x2, ..., xm) | −Σi∈[m] (1/εi) Σj∈Si x∗ij · ln xij ∈ [H(0), q] }.

Consider the function

    V(x1, ..., xm) := Σi∈[m] (1/εi) · [ Σk∈Si xik · (exp(εi · ui(ek, x−i)) − 1)² − ( Σk∈Si xik · (exp(εi · ui(ek, x−i)) − 1) )² ].

We argue that inf_{x∈D} V(x) is positive, which follows readily from the following sequence of observations. By the definition of D, it is bounded away from the boundary. Note that D is a compact set, since it is the preimage of a closed interval under a continuous function. Also, D is bounded away from any NE (this is a simple corollary of Lemma 4, since the set of NE is closed). Since for any fully mixed point x, V(x) = 0 if and only if x is a NE, and since V is a continuous function on D, we conclude that {V(x) | x ∈ D} is bounded away from zero.

Let inf_{x∈D} V(x) = r > 0. By Lemma 4, after t = ⌈4(q − H(0))/r⌉ + 1 steps of the MW updates, H(t) > q, a contradiction.

Thus, we can conclude that for any q > 0, there exists a time T_q such that for all t ≥ T_q, H(t) > q. This implies that for all t ≥ T_q, there exist i ∈ [m] and j ∈ Si such that (x∗ij/εi) · ln(1/xij(t)) > q / Σi∈[m] ni, and hence xij(t) < exp(−εi·q / Σi∈[m] ni). Theorem 5 follows by setting q = (1/min_i εi) · ln(1/δ) · Σi∈[m] ni.

4 Regret Lower Bound

Theorem 6. 
In any graphical constant-sum game which admits a fully-mixed NE x∗, if all players employ MW updates and the starting point x(0) is not a NE, then there exists a sufficiently large T such that for all t ≥ T, there exists a Player i (which can change with t) with regret at least

    (1/(εi·t)) · ln( 1 + (1/(2(ni − 1))) · (min_{k∈Si} xik(0) / max_{k∈Si} xik(0)) ).

In particular, if xi(0) is uniform, then the regret is at least 1/(4εi(ni − 1)t).

We compare Theorem 6 with a lower bound result of Daskalakis et al. [7, Theorem 2]. Our result focuses on MW updates, and it applies to any graphical constant-sum game with a fully-mixed NE. The result in [7] establishes a general lower bound of Ω(1/T) regret for any type of distributed protocol, which is more general than ours. To do so, they constructed a specific class of zero-sum games and showed that any distributed protocol must incur Ω(1/T) regret in one of these games.

Proof. First, we show the following inequality; (*) follows from the fact that ln(y) is a concave function of y, and a use of Jensen's inequality:

    (ln xij(T) − ln xij(0)) / T
      = (1/T) Σ_{t=0}^{T−1} ln( xij(t+1) / xij(t) )
      = (εi/T) Σ_{t=0}^{T−1} ui(ej, x−i(t)) − (1/T) Σ_{t=0}^{T−1} ln( Σk∈Si xik(t) · exp(εi · ui(ek, x−i(t))) )
      ≤ (εi/T) Σ_{t=0}^{T−1} ui(ej, x−i(t)) − (1/T) Σ_{t=0}^{T−1} Σk∈Si xik(t) · εi · ui(ek, x−i(t))   (*)
      = (εi/T) Σ_{t=0}^{T−1} ui(ej, x−i(t)) − (εi/T) Σ_{t=0}^{T−1} ui(x(t)).

Note that the final expression, when maximized over all j ∈ Si, is exactly εi times the regret of Player i.

By Theorem 5, with δ = (1/2) · min_{i∈[m], j∈Si} xij(0), there exists a time T such that for all t ≥ T, there exists i 
∈ [m] and j ∈ Si such that xij(t) ≤ xij(0)/2. Thus, for that Player i, there exists some strategy k ∈ Si \ {j} such that xik(t) ≥ xik(0) + (1/(2(ni − 1))) · xij(0), and hence

    ln xik(T) − ln xik(0) ≥ ln( 1 + (1/(2(ni − 1))) · (xij(0)/xik(0)) ) ≥ ln( 1 + (1/(2(ni − 1))) · (min_{k∈Si} xik(0) / max_{k∈Si} xik(0)) ).

Thus, the regret of Player i is at least 1/(εi·T) times the RHS of the above inequality.

Indeed, by using (1), the inequality (*) can be improved, and then we have

    (ln xij(T) − ln xij(0)) / (εi·T) + (1/(4εi·T)) Σ_{t=0}^{T−1} Vi(t) ≤ (1/T) Σ_{t=0}^{T−1} ui(ej, x−i(t)) − (1/T) Σ_{t=0}^{T−1} ui(x(t)).

Thus, (1/(4εi·T)) Σ_{t=0}^{T−1} Vi(t) can serve as a lower bound on the regret. In the last section, we show that if the starting point is fully mixed but not a NE, and if the MW updates were to stay away from the boundary, then Vi(t) would be bounded away from zero, and thus a regret lower bound of Ω_{εi}(1) would follow. But MW updates do converge to the boundary, so it is not clear how to derive a tight lower bound on this sum. In particular, we cannot rule out the possibility that MW updates converge to a subgame NE (a subgame is obtained from the original game by removing at least one strategy of some player); if this happens, Vi(t) converges to zero.

We note that essentially the same proof yields the following more general result about general games, which states that if the dynamic is not Poincaré recurrent, then the regret is eventually at least Ω(1/T).

Proposition 7. 
In any general game where all players employ MW updates with starting point x(0), if there exists an open neighbourhood B around x(0) and a time T such that for all t ≥ T, x(t) ∉ B, then for all t ≥ T, there exists a Player i (which can change with t) with regret at least Ω(1/T), where the hidden constant in Ω(·) depends only on ε and the radius of B.

We do not have any improvement on the generic regret upper bound. In the next section, we will use this generic bound, which is 2εi + C(x(0))/(εi·T) for Player i, where C(x(0)) is a constant depending on the starting point (see [10]).⁵ We bound this regret by 2.1 · εi for all sufficiently large T.

5 Infinitely Often Almost Extinction, Infinitely Often Resurgence

In this section, we focus on two-person zero-sum (or constant-sum) games. Theorem 5 applies, i.e., beyond some finite time, there must always exist some tiny probability density. A natural question to ask is: will one density be tiny forever, or do several densities take turns being tiny? In this section, we prove that the former cannot happen when ε is sufficiently small. For any two-person zero-sum game G, let val(G) denote its game value w.r.t. Player 1. In this section, we write ε = max{ε1, ε2}.

Given a two-person zero-sum game G, name the two players 1 and 2, with strategy sets S1 and S2 respectively. For i = 1, 2 and each j ∈ Si, let Gij denote the two-person zero-sum game formed from G by removing strategy j of Player i. Now, define

    θ(G) := min{ min_{j∈S1} { val(G) − val(G1j) }, min_{k∈S2} { val(G2k) − val(G) } }.

⁵ The well-known O(1/√T) upper bound indeed comes from this bound, by picking εi = Θ(1/√T).

Using von Neumann's Minimax Theorem, it is easy to prove that θ(G) ≥ 0. 
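As a concrete illustration (our own sketch; game_value and theta are hypothetical helper names, not from the paper), θ(G) can be computed by solving the standard LP for the value of G and of each subgame. For Rock-Paper-Scissors, whose unique NE is fully mixed, val(G) = 0, each val(G1j) = −1/3 and each val(G2k) = +1/3, so θ(G) = 1/3 > 0.

```python
import numpy as np
from scipy.optimize import linprog

def game_value(A):
    """Value of the zero-sum game with payoff matrix A (to the row player),
    via the standard LP: maximize v s.t. p^T A >= v componentwise, p a distribution."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # maximize v  <=>  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - p^T A[:, j] <= 0 for each column j
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=[[1.0] * m + [0.0]], b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def theta(A):
    """theta(G): smallest change in game value caused by deleting one strategy."""
    v = game_value(A)
    row_gaps = [v - game_value(np.delete(A, j, axis=0)) for j in range(A.shape[0])]
    col_gaps = [game_value(np.delete(A, k, axis=1)) - v for k in range(A.shape[1])]
    return min(min(row_gaps), min(col_gaps))

RPS = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
```

Here theta(RPS) evaluates to 1/3, so the step-size threshold θ(G)/7 of Theorem 10 below is 1/21 for this game.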
Intuitively this should also be clear: removing one strategy of Player 1 surely cannot benefit her, while removing one strategy of Player 2 leaves Player 2 fewer choices, and hence might benefit Player 1. The following two lemmas can be easily proved using the linear-programming (LP) formulation of two-person zero-sum games and the Minimax Theorem; see Section 7 for their proofs.

Lemma 8. If G is a two-person zero-sum game with a unique NE which is fully mixed, then θ(G) > 0.

Lemma 9. In a two-person zero-sum game G, if both players employ MW updates, then for all sufficiently large T, we have (1/T) Σ_{t=0}^{T−1} u1(x(t)) ∈ val(G) ± 2.1 · ε.

Theorem 10. In a two-person zero-sum game with a unique NE which is fully mixed, if both players employ MW updates with step-sizes εi < θ(G)/7, then there exist at least two probability densities xij(t) which exhibit the following pattern: (a) for any δ > 0, xij(t) < δ for infinitely many t; and (b) xij(t) ≥ θ(G)/7 for infinitely many t.

We give some intuition before the proof. By Theorem 5, there exists T such that for all t ≥ T, there must exist a density at time t which is below δ. Thus, it is possible to find a fixed i and j ∈ Si such that property (a) holds. Suppose this i = 2; the case i = 1 is symmetric. Suppose that for this xij, property (b) does not hold, i.e., xij(t) remains below some κ after some time T′. Intuitively, this implies that from time T′ onwards, the game essentially becomes Gij, modulo a perturbation of magnitude O(κ). By Lemma 9, the long-run average payoff to Player 1 from time T′ onward is within the interval val(Gij) ± O(κ) ± O(ε). On the other hand, by Lemma 9 again, the long-run average payoff to Player 1 from time 0 onward is within the interval val(G) ± O(ε). 
These two average payoffs should match, but when ε and κ are both small, the two intervals do not overlap, a contradiction.

Proof. To avoid a clutter of algebra, we let v := val(G) and vij := val(Gij), and let κ := θ(G)/7. Suppose that the concerned density is x1j. By the regret upper bound, for all k ∈ S1 and all sufficiently large T″ > T′,

    (1/(T″ − T′ + 1)) · [ Σ_{t=T′}^{T″} u1(ek, x2(t)) − Σ_{t=T′}^{T″} u1(x(t)) ] ≤ 2.1 · ε.

Note that we can rewrite (1/(T″ − T′ + 1)) Σ_{t=T′}^{T″} u1(ek, x2(t)) as u1(ek, (1/(T″ − T′ + 1)) Σ_{t=T′}^{T″} x2(t)). By the Minimax Theorem, we can guarantee that there is some k ∈ S1 \ {j} such that u1(ek, (1/(T″ − T′ + 1)) Σ_{t=T′}^{T″} x2(t)) ≥ v1j. Thus,

    (1/(T″ − T′ + 1)) Σ_{t=T′}^{T″} u1(x(t)) ≥ v1j − 2.1 · ε.   (2)

Next, consider Player 2. 
By the regret upper bound, for all k ∈ S2, and for all sufficiently large T′′ > T′,

(1/(T′′ − T′ + 1)) [ Σ_{t=T′}^{T′′} u2(ek, x1(t)) − Σ_{t=T′}^{T′′} u2(x(t)) ] ≤ 2.1 · ε,

in which the first average can be rewritten as Wk := u2(ek, (1/(T′′ − T′ + 1)) Σ_{t=T′}^{T′′} x1(t)).

Note that in the summation in the term Wk, x1j(t) < κ by assumption. For each t, we construct a new probability distribution on S1, denoted by x′1(t), as follows: x′1j(t) = 0, and for any k ∈ S1 \ {j}, x′1k(t) = x1k(t) + (1/(|S1| − 1)) · x1j(t). Since the payoff value is always within the interval ±1, we have

W′k − 2κ := u2(ek, (1/(T′′ − T′ + 1)) Σ_{t=T′}^{T′′} x′1(t)) − 2κ ≤ Wk.

By the Minimax Theorem, we can guarantee that there is some k ∈ S2 such that W′k ≥ −v1j. Combining all the inequalities above yields

(1/(T′′ − T′ + 1)) Σ_{t=T′}^{T′′} u1(x(t)) ≤ v1j + 2.1 · ε + 2κ.    (3)

Inequalities (2) and (3) imply that the long-run average payoff to Player 1 from time T′ onward is within the interval v1j ± 2.1 · ε ± 2κ. But Lemma 9 states that the long-run average payoff to Player 1 from time 0 onward is within the interval v ± 2.1 · ε.
Since the former average is obtained by ignoring only finitely many terms at the beginning, these two averages are essentially the same in the long run, i.e., the two intervals must overlap. However, this is not possible when ε, κ ≤ θ(G)/7: the centers of the two intervals differ by |v − v1j| ≥ θ(G), while the sum of their radii is at most 2.1 · ε + (2.1 · ε + 2κ) ≤ 6.2 · θ(G)/7 < θ(G).

Finally, note that among the times t where xij(t) ≥ θ(G)/7, there must be another density, say xi′j′, satisfying xi′j′(t) < δ infinitely often. By reiterating the above argument for this xi′j′, we are done.

6 Discussion and Some Open Problems

In this paper, we provide a better understanding of MW updates with constant step-size in graphical constant-sum games. Yet, a number of interesting problems remain unsolved. We raise some:

• While we provide a lower bound on the regret, the best upper bound we know is still the generic ε + oT(1) one, which applies in rather general scenarios and does not exploit any structure of graphical constant-sum games. Can better lower or upper bounds be established?

• We can only prove that the fluctuating pattern described in Theorem 10 exists for two-person constant-sum games, but not for general graphical constant-sum games. The technical reason is that we need several nice properties of von Neumann's LP formulation and the Minimax Theorem^6 to establish Lemmas 8 and 9, which are not known for graphical constant-sum games. Can we generalize?

• Even for two-person zero-sum games, Theorem 10 has not yet provided the complete picture. Will all densities exhibit such a fluctuating pattern, or only some of them?
If it is the latter, given the game and the starting point, can we determine (by a mathematical proof, or by a polynomial-time algorithm) which densities exhibit such a fluctuating phenomenon?

7 Missing Proofs

The Double Summation in the Proof of Lemma 4. In a graphical constant-sum game, suppose the underlying graph is G = ([m], E), and each edge (i, ℓ) ∈ E corresponds to a constant-sum game; we will use the matrix Aiℓ to denote the payoffs to Player i in this game. In the calculation below, we write x• ≡ x•(t), i.e., we ignore the parameter t.

First, we rewrite the double summation as below:

Σ_{i=1}^{m} Σ_{k∈Si} (xik − x*ik) · ui(ek, x−i) = Σ_{i=1}^{m} Σ_{ℓ:(i,ℓ)∈E} (xi − x*i)ᵀ Aiℓ xℓ.

Since x* is fully mixed, by the definition of NE, every entry of the vector Σ_{ℓ:(i,ℓ)∈E} Aiℓ x*ℓ must be identical.
Thus, since the entries of xi − x*i sum to zero, Σ_{ℓ:(i,ℓ)∈E} (xi − x*i)ᵀ Aiℓ x*ℓ = 0, and hence

Σ_{i=1}^{m} Σ_{k∈Si} (xik − x*ik) · ui(ek, x−i)
  = Σ_{i=1}^{m} Σ_{ℓ:(i,ℓ)∈E} (xi − x*i)ᵀ Aiℓ (xℓ − x*ℓ)
  = Σ_{(i,ℓ)∈E} [ (xi − x*i)ᵀ Aiℓ (xℓ − x*ℓ) + (xℓ − x*ℓ)ᵀ Aℓi (xi − x*i) ]
  = Σ_{(i,ℓ)∈E} [ ( xiᵀ Aiℓ xℓ + xℓᵀ Aℓi xi ) + ( x*iᵀ Aiℓ x*ℓ + x*ℓᵀ Aℓi x*i )
                  − ( xiᵀ Aiℓ x*ℓ + x*ℓᵀ Aℓi xi ) − ( x*iᵀ Aiℓ xℓ + xℓᵀ Aℓi x*i ) ].

Note that in the final expression, there are four terms, each of which is the sum of payoffs to Players i and ℓ in the two-person constant-sum game corresponding to the edge (i, ℓ), with the players using some mixed strategies. Therefore, the four terms are all equal to the same constant, and since two carry a plus sign and two a minus sign, the overall expression is zero.

Proof of Lemma 8. We prove the case when Player 1 has strategy j removed; the case for Player 2 is symmetric.
The game value val(G) can be described as follows: Player 1 picks a probability distribution x*1 such that no matter what Player 2's choice x2 is, u1(x*1, x2) is always at least v; val(G) is the maximum possible value of v, and x*1 forms the mixed strategy of the player in a NE.
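This maximin characterization of val(G), and the gap val(G1j) < val(G) that Lemma 8 asserts, can be checked numerically. The sketch below is an illustration only: it uses Rock-Paper-Scissors (my own choice of game, whose unique NE is fully mixed with value 0), and a brute-force grid search over the simplex as a hypothetical stand-in for the paper's LP formulation.

```python
import itertools
import numpy as np

# Rock-Paper-Scissors as Player 1's payoff matrix: the unique NE is
# fully mixed ((1/3, 1/3, 1/3) for each player) and val(G) = 0.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def game_value(B, grid=200):
    """Approximate val = max_{x1} min_k u1(x1, ek) by grid search over
    Player 1's mixed strategies (a brute-force stand-in for the LP)."""
    m = B.shape[0]
    best = -np.inf
    # enumerate simplex points with coordinates that are multiples of 1/grid
    for ks in itertools.product(range(grid + 1), repeat=m - 1):
        if sum(ks) > grid:
            continue
        x = np.array(list(ks) + [grid - sum(ks)], dtype=float) / grid
        best = max(best, (x @ B).min())  # Player 2 best-responds with a column
    return best

v = game_value(A)            # approximately val(G) = 0
v_1j = game_value(A[1:, :])  # G1j: Player 1's strategy "Rock" removed
print(v, v_1j)               # the gap v - v_1j is bounded away from 0
```

Here the restricted game has value −1/3, exhibiting a positive gap θ-style, as Lemma 8 guarantees for games with a unique fully mixed NE.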
Due to the assumption that the unique NE is fully mixed, there is a unique fully mixed x*1 that attains the maximum possible value of v; in other words, any x1 with x1j = 0 (which is equivalent to strategy j being removed) must attain a value of v strictly less than val(G), i.e., val(G1j) < val(G).

Proof of Lemma 9. The proof follows closely the logic behind the derivations of inequalities (2) and (3).

^6 The root of these properties is the "absolute conflict" nature of two-person constant-sum games, which does not exist in general graphical constant-sum games.

Acknowledgments

The author would like to acknowledge Singapore NRF 2018 Fellowship NRF-NRFF2018-07 and MOE AcRF Tier 2 Grant 2016-T2-1-170. The author thanks the anonymous reviewers for their helpful suggestions and comments, and for pointing out the prior work on continuous replicator/FTRL dynamics and the interplay between them and their discrete counterparts.

References

[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.

[2] James P. Bailey and Georgios Piliouras. Multiplicative weights update in zero-sum games. In EC, pages 321–338, 2018.

[3] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Jacques Azéma, Michel Émery, Michel Ledoux, and Marc Yor, editors, Séminaire de Probabilités XXXIII, pages 1–68, 1999.

[4] Yang Cai and Constantinos Daskalakis. On minmax theorems for multiplayer games. In SODA, pages 217–234, 2011.

[5] Erick Chastain, Adi Livnat, Christos H. Papadimitriou, and Umesh V. Vazirani. Multiplicative updates in coordination games and the theory of evolution. In ITCS, pages 57–58, 2013.

[6] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. J.
ACM, 56(3):14:1–14:57, 2009.

[7] Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92:327–348, 2015.

[8] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM J. Comput., 39(1):195–259, 2009.

[9] Constantinos Daskalakis and Christos H. Papadimitriou. On a network generalization of the minmax theorem. In ICALP, Part II, pages 423–434, 2009.

[10] Yoav Freund and Robert E. Schapire. Game theory, on-line prediction and boosting. In COLT, pages 325–332, 1996.

[11] Andreas Gaunersdorfer. Time averages for heteroclinic attractors. SIAM J. Appl. Math., 52:1476–1489, 1992.

[12] Andreas Gaunersdorfer and Josef Hofbauer. Fictitious play, Shapley polygons, and the replicator equation. Games and Economic Behavior, 11:279–303, 1995.

[13] Sergiu Hart and Yishay Mansour. How long to equilibrium? The communication complexity of uncoupled equilibrium procedures. Games and Economic Behavior, 69(1):107–126, 2010.

[14] Josef Hofbauer and Karl Sigmund. Permanence for replicator equations. In Dynamical Systems, pages 70–91. Springer Berlin Heidelberg, 1987.

[15] Josef Hofbauer and Karl Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, 1998.

[16] Josef Hofbauer, Sylvain Sorin, and Yannick Viossat. Time average replicator and best-reply dynamics. Math. Oper. Res., 34(2):263–269, 2009.

[17] Joon Kwon and Panayotis Mertikopoulos. A continuous-time approach to online optimization. Journal of Dynamics and Games, 4(2):125–148, 2017.

[18] Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. Cycles in adversarial regularized learning. In SODA, pages 2703–2717, 2018.

[19] Sai Ganesh Nagarajan, Sameh Mohamed, and Georgios Piliouras.
Three body problems in evolutionary game dynamics: Convergence, periodicity and limit cycles. In AAMAS, pages 685–693, 2018.

[20] John Nash. Non-cooperative games. The Annals of Mathematics, 54(2):286–295, 1951.

[21] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos. In NIPS, pages 5874–5884, 2017.

[22] Georgios Piliouras and Leonard J. Schulman. Learning dynamics and the co-evolution of competing sexual species. In ITCS, pages 59:1–59:3, 2018.

[23] Georgios Piliouras and Jeff S. Shamma. Optimization despite chaos: Convex relaxations to complex limit sets via Poincaré recurrence. In SODA, pages 861–873, 2014.

[24] Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In NIPS, pages 3066–3074, 2013.

[25] Yuzuru Sato, Eizo Akiyama, and J. Doyne Farmer. Chaos in learning a simple two-person game. PNAS, 99(7):4748–4751, 2002.

[26] Peter Schuster, Karl Sigmund, Josef Hofbauer, and Robert Wolff. Selfregulation of behaviour in animal societies. Part I: Symmetric contests. Biological Cybernetics, 40(1):1–8, 1981.

[27] Sylvain Sorin. Exponential weight algorithm in continuous time. Math. Program., 116(1-2):513–528, 2009.

[28] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In NIPS, pages 2989–2997, 2015.

[29] E. C. Zeeman. Population dynamics from game theory. Lecture Notes in Mathematics, 819:472–497, 1980.