{"title": "Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks", "book": "Advances in Neural Information Processing Systems", "page_first": 9233, "page_last": 9243, "abstract": "While Nash equilibrium in extensive-form games is well understood, very little is known about the properties of extensive-form correlated equilibrium (EFCE), both from a behavioral and from a computational point of view. In this setting, the strategic behavior of players is complemented by an external device that privately recommends moves to agents as the game progresses; players are free to deviate at any time, but will then not receive future recommendations. Our contributions are threefold. First, we show that an EFCE\ncan be formulated as the solution to a bilinear saddle-point problem. To showcase how this novel formulation can inspire new algorithms to compute EFCEs, we propose a simple subgradient descent method which exploits this formulation and structural properties of EFCEs. Our method has better scalability than the prior approach based on linear programming. Second, we propose two benchmark games, which we hope will serve as the basis for future evaluation of EFCE solvers. These games were chosen so as to cover two natural application domains for EFCE: conflict resolution via a mediator, and bargaining and negotiation. Third, we document the qualitative behavior of EFCE in our proposed games. 
We show that the social-welfare-maximizing equilibria in these games are highly nontrivial and exhibit surprisingly subtle sequential behavior that so far has not received attention in the literature.", "full_text": "Correlation in Extensive-Form Games:\n\nSaddle-Point Formulation and Benchmarks\u2217\n\nGabriele Farina\n\nComputer Science Department\nCarnegie Mellon University\n\ngfarina@cs.cmu.edu\n\nChun Kai Ling\n\nComputer Science Department\nCarnegie Mellon University\nchunkail@cs.cmu.edu\n\nFei Fang\n\nInstitute for Software Research\n\nCarnegie Mellon University\n\nfeif@cs.cmu.edu\n\nTuomas Sandholm\n\nComputer Science Department, CMU\n\nStrategic Machine, Inc.\n\nStrategy Robot, Inc.\n\nOptimized Markets, Inc.\nsandholm@cs.cmu.edu\n\nAbstract\n\nWhile Nash equilibrium in extensive-form games is well understood, very little\nis known about the properties of extensive-form correlated equilibrium (EFCE),\nboth from a behavioral and from a computational point of view. In this setting, the\nstrategic behavior of players is complemented by an external device that privately\nrecommends moves to agents as the game progresses; players are free to deviate\nat any time, but will then not receive future recommendations. Our contributions\nare threefold. First, we show that an EFCE can be formulated as the solution to a\nbilinear saddle-point problem. To showcase how this novel formulation can inspire\nnew algorithms to compute EFCEs, we propose a simple subgradient descent\nmethod which exploits this formulation and structural properties of EFCEs. Our\nmethod has better scalability than the prior approach based on linear programming.\nSecond, we propose two benchmark games, which we hope will serve as the basis\nfor future evaluation of EFCE solvers. These games were chosen so as to cover\ntwo natural application domains for EFCE: con\ufb02ict resolution via a mediator, and\nbargaining and negotiation. 
Third, we document the qualitative behavior of EFCE\nin our proposed games. We show that the social-welfare-maximizing equilibria\nin these games are highly nontrivial and exhibit surprisingly subtle sequential\nbehavior that so far has not received attention in the literature.\n\n1\n\nIntroduction\n\nNash equilibrium (NE) [Nash, 1950], the most seminal concept in non-cooperative game theory,\ncaptures a multi-agent setting where each agent is sel\ufb01shly motivated to maximize their own payoff.\nThe assumption underpinning NE is that the interaction is completely decentralized: the behavior of\neach agent is not regulated by any external orchestrator. Contrasted with the other\u2014often utopian\u2014\nextreme of a fully managed interaction, where an external dictator controls the behavior of each agent\nso that the whole system moves to a desired state, the social welfare that can be achieved by NE is\ngenerally lower, sometimes dramatically so [Koutsoupias and Papadimitriou, 1999; Roughgarden and\nTardos, 2002]. Yet, in many realistic interactions, some intermediate form of centralized control can\nbe achieved. In particular, in his landmark paper, Aumann [1974] proposed the concept of correlated\n\n\u2217The full version of this paper is available on arXiv.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fequilibrium (CE), where a mediator (the correlation device) can recommend behavior, but not enforce\nit. 
In a CE, the correlation device is constructed so that the agents\u2014which are still modeled as fully\nrational and sel\ufb01sh just like in an NE\u2014have no incentive to deviate from the private recommendation.\nAllowing correlation of actions while ensuring sel\ufb01shness makes CE a good candidate solution\nconcept in multi-agent and semi-competitive settings such as traf\ufb01c control, load balancing [Ashlagi\net al., 2008], and carbon abatement [Ray and Gupta, 2009], and it can lead to win-win outcomes.\nIn this paper, we study the natural extension of correlated equilibrium in extensive-form (i.e., sequen-\ntial) games, known as extensive-form correlated equilibrium (EFCE) [von Stengel and Forges, 2008].\nLike CE, EFCE assumes that the strategic interaction is complemented by an external mediator;\nhowever, in an EFCE the mediator only privately reveals the recommended next move to each acting\nplayer, instead of revealing the whole plan of action throughout the game (i.e., recommended move\nat all decision points) for each player at the beginning of the game. Furthermore, while each agent is\nfree to defect from the recommendation at any time, this comes at the cost of future recommendations.\nWhile the properties of correlation in normal-form games are well-studied, they do not automatically\ntransfer to the richer world of sequential interactions. It is known in the study of NE that sequential\ninteractions can pose different challenges, especially in settings where the agents retain private\ninformation. Conceptually, the players can strategically adjust to dynamic observations about the\nenvironment and their opponents as the game progresses. 
Despite tremendous interest and progress\nin recent years for computing NE in sequential interactions with private information, with signi\ufb01cant\nmilestones achieved in poker games [Bowling et al., 2015; Brown and Sandholm, 2017; Morav\u02c7c\u00edk\net al., 2017; Brown and Sandholm, 2019b] and other large, real-world domains, not much has been\ndone to increase our understanding of (extensive-form) correlated equilibria in these settings.\nContributions Our primary objective with this paper is to spark more interest in the community\ntowards a deeper understanding of the behavioral and computational aspects of EFCE.\n\u2022 In Section 3 we show that an EFCE in a two-player general-sum game is the solution to a bilinear\nsaddle-point problem (BSPP). This conceptual reformulation complements the EFCE construction\nby von Stengel and Forges [2008], and allows for the development of new and ef\ufb01cient algorithms.\nAs a proof of concept, by using our reformulation we devise a variant of projected subgradient\ndescent which outperforms linear-programming(LP)-based algorithms proposed by von Stengel\nand Forges [2008] in large game instances.\n\u2022 In Section 5 we propose two benchmark games; each game is parametric, so that these games can\nscale in size as desired. The \ufb01rst game is a general-sum variant of the classic war game Battleship.\nThe second game is a simpli\ufb01ed version of the Sheriff of Nottingham board game. 
These games\nwere chosen so as to cover two natural application domains for EFCE: con\ufb02ict resolution via a\nmediator, and bargaining and negotiation.\n\u2022 By analyzing EFCE in our proposed benchmark games, we show that even if the mediator cannot\nenforce behavior, it can induce signi\ufb01cantly higher social welfare than NE and successfully deter\nplayers from deviating in at least two (often connected) ways: (1) using certain sequences of actions\nas \u2018passcodes\u2019 to verify that a player has not deviated: defecting leads to incomplete or wrong\npasscodes which indicate deviation, and (2) inducing opponents to play punitive actions against\nplayers that have deviated from the recommendation, if such a deviation is detected. Crucially,\nboth deterrents are unique to sequential interactions and do not apply to non-sequential games.\nThis corroborates the idea that the mediation of sequential interactions is a qualitatively different\nproblem than that of non-sequential games and further justi\ufb01es the study of EFCE as an interesting\ndirection for the community. To our knowledge, these are the \ufb01rst experimental results and\nobservations on EFCE in the literature.\n\nThe source code for our game generators and subgradient method is published online2.\n\n2 Preliminaries\n\nExtensive-form games (EFGs) are sequential games that are played over a rooted game tree. Each\nnode in the tree belongs to a player and corresponds to a decision point for that player. Outgoing\nedges from a node v correspond to actions that can be taken by the player to which v belongs. Each\nterminal node in the game tree is associated with a tuple of payoffs that the players receive should\n\n2https://github.com/Sandholm-Lab/game-generators\n\nhttps://github.com/Sandholm-Lab/efce-subgradient\n\n2\n\n\fthe game end in that state. To capture imperfect information, the set of vertices of each player is\npartitioned into information sets. 
The vertices in the same information set are indistinguishable to the player that owns those vertices. For example, in a game of poker, a player cannot distinguish between certain states that only differ in the opponent's private hand. As a result, the strategy of the player (specifying which action to take) is defined on the information sets instead of the vertices. For the purpose of this paper, we only consider perfect-recall EFGs. This property means that each player does not forget any of their previous actions, nor any private or public observation that the player has made. The perfect-recall property can be formalized by requiring that for any two vertices in the same information set, the paths from those vertices to the root of the game tree contain the exact same sequence of actions for the acting player at the information set.\nA pure normal-form strategy for Player i defines a choice of action for every information set that belongs to i. A player can play a mixed strategy, i.e., sample from a distribution over their pure normal-form strategies. However, this representation contains redundancies: some information sets for Player i may become unreachable after the player makes certain decisions higher up in the tree. Omitting these redundancies leads to the notion of reduced-normal-form strategies, which are known to be strategically equivalent to normal-form strategies (see, e.g., [Shoham and Leyton-Brown, 2009] for more details). Both the normal-form and the reduced-normal-form representation are exponentially large in the size of the game tree.\nHere, we fix some notation. Let Z be the set of terminal states (or equivalently, outcomes) in the game and ui(z) be the utility obtained by Player i if the game terminates at z ∈ Z. Let Πi be the set of pure reduced-normal-form strategies for Player i. 
We define Πi(I), Πi(I, a) and Πi(z) to be the set of reduced-normal-form strategies that (a) can lead to information set I, (b) can lead to I and prescribe action a at information set I, and (c) can lead to the terminal state z, respectively. We denote by Σi the set of information set-action pairs (I, a) (also referred to as sequences), where I is an information set for Player i and a is an action at I. For a given terminal state z, let σi(z) be the last (I, a) pair belonging to Player i encountered on the path from the root of the tree to z.\nExtensive-Form Correlated Equilibrium Extensive-form correlated equilibrium (EFCE) is a solution concept for extensive-form games introduced by von Stengel and Forges [2008].3 Like in the traditional correlated equilibrium (CE), introduced by Aumann [1974], a correlation device selects private signals for the players before the game starts. These signals are sampled from a correlated distribution µ—a joint probability distribution over Π1 × Π2—and represent recommended player strategies. However, while in a CE the recommended moves for the whole game tree are privately revealed to the players when the game starts, in an EFCE the recommendations are revealed incrementally as the players progress in the game tree. In particular, a recommended move is only revealed when the player reaches the decision point in the game for which the recommendation is relevant. Moreover, if a player ever deviates from the recommended move, they will stop receiving recommendations. To concretely implement an EFCE, one places recommendations into 'sealed envelopes', each of which may only be opened at its respective information set. Sealed envelopes may be implemented using cryptographic techniques (see Dodis et al. [2000] for one such example).\nIn an EFCE, the players know less about the set of recommendations that were sampled by the correlation device. 
The bene\ufb01ts are twofold. First, the players can be more easily induced to play\nstrategies that hurt them (but bene\ufb01t the overall social welfare), as long as \u201con average\u201d the players\nare indifferent as to whether or not to follow the recommendations: the set of EFCEs is a superset\nof that of CEs. Second, since the players observe less, the set of probability distributions for the\ncorrelation device for which no player has an incentive to deviate can be described succinctly in\ncertain classes of games: von Stengel and Forges [2008, Theorem 1.1] show that in two-player,\nperfect-recall extensive-form games with no chance moves, the set of EFCEs can be described by\na system of linear equations and inequalities of polynomial size in the game description. On the\nother hand, the same result cannot hold in more general settings: von Stengel and Forges [2008,\nSection 3.7] also show that in games with more than two players and/or chance moves, deciding\nthe existence of an EFCE with social welfare greater than a given value is NP-hard. It is important\nto note that this last result only implies that the characterization of the set of all EFCEs cannot be\nof polynomial size in general (unless P = NP). However, the problem of \ufb01nding one EFCE can be\n\n3Other CE-related solution concepts in sequential games include the agent-form correlated equilibrium\n(AFCE), where agents continue to receive recommendations even upon defection, and normal-form coarse CE\n(NFCCE). 
NFCCE does not allow for defections during the game; in fact, before the game starts, players must decide either to commit to following all recommendations upfront (before receiving them), or to receive none.\n\n\fsolved in polynomial time: Huang [2011] and Huang and von Stengel [2008] show how to adapt the Ellipsoid Against Hope algorithm [Papadimitriou and Roughgarden, 2008; Jiang and Leyton-Brown, 2015] to compute an EFCE in polynomial time in games with more than two players and/or with chance moves. Unfortunately, that algorithm is only theoretical, and known not to scale beyond extremely small instances [Leyton-Brown, 2019].\n\n3 Extensive-Form Correlated Equilibria as Bilinear Saddle-Point Problems\n\nOur objective for this section is to cast the problem of finding an EFCE in a two-player game as a bilinear saddle-point problem, that is, a problem of the form min_{x∈X} max_{y∈Y} x⊤Ay, where X and Y are compact convex sets. In the case of EFCE, X and Y are convex polytopes that belong to a space whose dimension is polynomial in the game tree size. This reformulation is meaningful:\n• From a conceptual angle, it brings the problem of computing an EFCE closer to several other solution concepts in game theory that are known to be expressible as BSPPs. In particular, the BSPP formulation shows that an EFCE can be viewed as an NE in a two-player zero-sum game between a deviator, who is trying to decide how to best defect from recommendations, and a mediator, who is trying to come up with an incentive-compatible set of recommendations.\n• From a geometric point of view, the BSPP formulation better captures the combinatorial structure of the problem: X and Y have a well-defined meaning in terms of the input game tree. 
This has algorithmic implications: for example, because of the structure of Y (which will be detailed later), the inner maximization problem can be solved via a single bottom-up game-tree traversal.\n• From a computational standpoint, it opens the way to the plethora of optimization algorithms (both general-purpose and those specific to game theory) that have been developed to solve BSPPs. Examples include Nesterov's excessive gap technique [Nesterov, 2005], Nemirovski's mirror prox algorithm [Nemirovski, 2004], and regret-based methods such as mirror descent, follow-the-regularized-leader (e.g., Hazan [2016]), and CFR and its variants [Zinkevich et al., 2007; Farina et al., 2019; Brown and Sandholm, 2019a].\n\nFurthermore, it is easy to show that by dualizing the inner maximization problem in the BSPP formulation, one recovers the linear program introduced by von Stengel and Forges [2008] (we show this in Appendix A in the full paper). In this sense, our formulation subsumes the existing one.\nTriggers and Deviations One effective way to reason about extensive-form correlated equilibria is via the notion of trigger agents, which was introduced (albeit used in a different context) by Gordon et al. [2008] and Dudik and Gordon [2009]:\nDefinition 1. 
Let ˆσ := (ˆI, ˆa) ∈ Σi be a sequence for Player i, and let ˆµ be a distribution over Πi(ˆI). A (ˆσ, ˆµ)-trigger agent for Player i is a player that follows all recommendations given by the mediator unless they get recommended ˆa at ˆI; in that case, the player 'gets triggered', stops following the recommendations, and instead plays based on a pure strategy sampled from ˆµ until the game ends.\n\nA correlated distribution µ is an EFCE if and only if any trigger agent for Player i can get utility at most equal to the utility that Player i earns by following the recommendations of the mediator at all decision points. In order to express the utility of the trigger agent, it is necessary to compute the probability of the game ending in each of the terminal states. As we show in Appendix B in the full paper, this can be done concisely by partitioning the set of terminal nodes in the game tree into three different sets. In particular, let Z_{ˆI,ˆa} be the set of terminal nodes whose path from the root of the tree includes taking action ˆa at ˆI, and let Z_{ˆI} be the set of terminal nodes whose path from the root passes through ˆI but that are not in Z_{ˆI,ˆa}. We have\nLemma 1. Consider a (ˆσ, ˆµ)-trigger agent for Player 1, where ˆσ = (ˆI, ˆa). 
The value of the trigger agent, defined as the expected difference between the utility of the trigger agent and the utility of an agent that always follows recommendations sampled from the correlated distribution µ, is computed as\n\nv_{1,ˆσ}(µ, ˆµ) := Σ_{z ∈ Z_{ˆI,ˆa} ∪ Z_{ˆI}} u_1(z) ξ_1(ˆσ; z) y_{1,ˆσ}(z) − Σ_{z ∈ Z_{ˆI,ˆa}} u_1(z) ξ_1(σ_1(z); z),\n\nwhere ξ_1(ˆσ; z) := Σ_{π_1 ∈ Π_1(ˆσ)} Σ_{π_2 ∈ Π_2(z)} µ(π_1, π_2) and y_{1,ˆσ}(z) := Σ_{ˆπ_1 ∈ Π_1(z)} ˆµ(ˆπ_1).\n\n(A symmetric result holds for Player 2, with symbols ξ_2(ˆσ; z) and y_{2,ˆσ}(z).) It now seems natural to perform a change of variables, and pick distributions for the random variables y_{1,ˆσ}(·), y_{2,ˆσ}(·), ξ_1(·;·) and ξ_2(·;·) instead of µ and ˆµ. Since there are only a polynomial number (in the game tree size) of combinations of arguments for these new random variables, this approach allows one to remove the redundancy of realization-equivalent normal-form plans and focus on a significantly smaller search space. In fact, the definition of ξ = (ξ_1, ξ_2) also appears in [von Stengel and Forges, 2008], where it is referred to as the (sequence-form) correlation plan. In the case of the y_{1,ˆσ} and y_{2,ˆσ} random variables, it is clear that the change of variables is possible via the sequence form [von Stengel, 2002]; we let Y_{i,ˆσ} be the sequence-form polytope of feasible values for the vector y_{i,ˆσ}. Hence, the only hurdle is characterizing the space spanned by ξ_1 and ξ_2 as µ varies across the probability simplex. 
In two-player perfect-recall games with no chance moves, this is exactly one of the merits of the landmark work by von Stengel and Forges [2008]. In particular, the authors prove that in those games the space of feasible ξ can be captured by a polynomial number of linear constraints. In more general cases the same does not hold (see the second half of Section 2), but we prove the following (Appendix C in the full paper):\nLemma 2. In a two-player game, as µ varies over the probability simplex, the joint vector of ξ_1(·;·), ξ_2(·;·) variables spans a convex polytope X in R^n, where n is at most quadratic in the game size.\nSaddle-Point Reformulation According to Lemma 1, for each Player i and (ˆσ, ˆµ)-trigger agent for them, the value of the trigger agent is a biaffine expression in the vectors y_{i,ˆσ} and ξ_i, and can be written as v_{i,ˆσ}(ξ_i, y_{i,ˆσ}) = ξ_i⊤ A_{i,ˆσ} y_{i,ˆσ} − b_{i,ˆσ}⊤ ξ_i for a suitable matrix A_{i,ˆσ} and vector b_{i,ˆσ}, where the two terms in the difference correspond to the expected utility for deviating at ˆσ according to the (sequence-form) strategy y_{i,ˆσ} and the expected utility for not deviating at ˆσ. 
Given a correlation plan ξ = (ξ_1, ξ_2) ∈ X, the maximum value of any deviation for any player can therefore be expressed as\n\nv*(ξ) := max_{i, ˆσ, y_{i,ˆσ}} v_{i,ˆσ}(ξ_i, y_{i,ˆσ}) = max_{i ∈ {1,2}} max_{ˆσ ∈ Σ_i} max_{y_{i,ˆσ} ∈ Y_{i,ˆσ}} { ξ_i⊤ A_{i,ˆσ} y_{i,ˆσ} − b_{i,ˆσ}⊤ ξ_i }.\n\nWe can convert the maximization above into a continuous linear optimization problem by introducing the multipliers λ_{i,ˆσ} ∈ [0, 1] (one for each Player i ∈ {1, 2} and trigger ˆσ ∈ Σ_i), and write\n\nv*(ξ) = max_{λ_{i,ˆσ}, z_{i,ˆσ}} Σ_{i ∈ {1,2}} Σ_{ˆσ ∈ Σ_i} ( ξ_i⊤ A_{i,ˆσ} z_{i,ˆσ} − λ_{i,ˆσ} b_{i,ˆσ}⊤ ξ_i ),\n\nwhere the maximization is subject to the linear constraints [C1] Σ_{i ∈ {1,2}} Σ_{ˆσ ∈ Σ_i} λ_{i,ˆσ} = 1 and [C2] z_{i,ˆσ} ∈ λ_{i,ˆσ} Y_{i,ˆσ} for all i ∈ {1, 2}, ˆσ ∈ Σ_i. These linear constraints define a polytope Y.\nA correlation plan ξ is an EFCE if and only if v_{i,ˆσ}(ξ_i, y_{i,ˆσ}) ≤ 0 for every trigger agent, i.e., v*(ξ) ≤ 0. Therefore, to find an EFCE, we can solve the optimization problem min_{ξ ∈ X} v*(ξ), which is a bilinear saddle-point problem over the convex domains X and Y, both of which are convex polytopes that belong to R^n, where n is at most quadratic in the input game size (Lemma 2). 
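To make the saddle-point machinery concrete, the following is a minimal, self-contained sketch (our illustration, not the algorithm of Section 4) of attacking a generic BSPP min_{x∈X} max_{y∈Y} x⊤Ay with projected subgradient descent in the special case where X and Y are probability simplices: the inner maximum is then attained at a vertex, so the maximizer's best response and a subgradient of the outer objective are available in closed form. All function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                     # sort in decreasing order
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def solve_bspp(A, iters=2000):
    """Approximately solve min_{x in simplex} max_{y in simplex} x^T A y.

    The inner max over the simplex is attained at a vertex e_j, so the
    outer objective is v(x) = max_j (A^T x)_j and A[:, j*] is a
    subgradient of v at x.
    """
    m, _ = A.shape
    x = np.full(m, 1.0 / m)
    avg = np.zeros(m)
    for t in range(1, iters + 1):
        j = int(np.argmax(x @ A))                      # maximizer's best response
        x = project_simplex(x - A[:, j] / np.sqrt(t))  # diminishing step size
        avg += x
    return avg / iters                                 # averaged iterate
```

On the matching-pennies matrix A = [[1, −1], [−1, 1]] the averaged iterate approaches the saddle point (1/2, 1/2). In the EFCE setting the two simplices are replaced by the polytopes X and Y above, which is exactly where the projection difficulties discussed in Section 4 arise.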
If an EFCE exists, the optimal value is non-positive and the optimal solution is an EFCE (as it satisfies v*(ξ) ≤ 0). In fact, since EFCEs always exist (the set of EFCEs is a superset of the set of CEs [von Stengel and Forges, 2008]), and one can select triggers to be terminal sequences for Player 1, the optimal value of the BSPP is always 0. The BSPP can be interpreted as the NE of a zero-sum game between the mediator, who decides on a suitable correlation plan ξ, and a deviator, who selects the y_{i,ˆσ}'s to maximize each v_{i,ˆσ}(ξ_i, y_{i,ˆσ}). The value of this game is always 0.\nFinally, we can enforce a lower bound τ on the sum of the players' utilities by introducing an additional variable λ_sw ∈ [0, 1] and minimizing the new convex objective\n\nv*_sw(ξ) := max_{λ_sw ∈ [0,1]} { (1 − λ_sw) · v*(ξ) + λ_sw [ τ − Σ_{z ∈ Z} u_1(z) ξ_1(σ_1(z); z) − Σ_{z ∈ Z} u_2(z) ξ_2(σ_2(z); z) ] }.  (1)\n\n4 Computing an EFCE using Subgradient Descent\n\nvon Stengel and Forges [2008] show that a SW-maximizing EFCE of a two-player game without chance may be expressed as the solution of an LP and solved using generic methods such as the simplex algorithm or interior-point methods. However, this does not scale to large games, as these methods require storing and inverting large matrices. Another way of computing SW-maximizing EFCEs was provided by Dudik and Gordon [2009]. However, their algorithm assumes that sampling from correlation plans is possible using a Markov chain Monte Carlo algorithm and does not factor in the convergence of the Markov chain. Furthermore, even though their formulation generalizes beyond our setting of two-player games without chance, our gradient descent method admits more complex objectives. 
In particular, it allows the mediator to maximize general concave objectives (in the correlation plan) instead of only linear objectives with potentially some regularization. Here, we showcase the benefits of exploiting the combinatorial structure of the BSPP formulation of Section 3 by proposing a simple algorithm based on subgradient descent; in Section 6 we show that this method scales better than a state-of-the-art commercial LP solver in large games.\nFor brevity, we only provide a sketch of our algorithm, which computes a feasible EFCE; the extension to the slightly more complicated objective v*_sw(ξ) (Equation 1) is straightforward—see Appendix D in the full paper for more details. First, observe that the objective v*(ξ) is convex, since it is the maximum of linear functions of ξ. This suggests that we may perform subgradient descent on v*, where a subgradient is given by A_{i*,ˆσ*} y*_{i*,ˆσ*} − b_{i*,ˆσ*}, where (i*, ˆσ*, y*_{i*,ˆσ*}) is a triplet which maximizes the objective function v*(ξ). The computation of such a triplet can be done via a straightforward bottom-up traversal of the game tree. In order to maintain feasibility (that is, ξ ∈ X), it is necessary to project onto X, which is challenging in practice because we are not aware of any distance-generating function that allows for efficient projection onto this polytope. This is so even in games without chance (where X can be expressed by a polynomial number of constraints [von Stengel and Forges, 2008]). 
Furthermore, iterative methods such as Dykstra's algorithm add a dramatic overhead to the cost of each iterate.\nTo overcome this hurdle, we observe that in games with no chance moves, the set X of correlation plans—as characterized by von Stengel and Forges [2008] via the notion of consistency constraints—can be expressed as the intersection of three sets: (i) X1, the set of vectors ξ that only satisfy the consistency constraints for Player 1; (ii) X2, the set of vectors ξ that only satisfy the consistency constraints for Player 2; and (iii) R^n_+, the non-negative orthant. X1 and X2 are polytopes defined by equality constraints only. Therefore, an exact projection (in the Euclidean sense) onto X1 and X2 can be carried out efficiently by precomputing a suitable factorization of the constraint matrices that define X1 and X2. In particular, we are able to leverage the specific combinatorial structure of the constraints that form X1 and X2 to design an efficient and parallel sparse factorization algorithm (see Appendix D in the full paper for the full details). Furthermore, projection onto the non-negative orthant can be done conveniently, as it just amounts to computing a component-wise maximum between ξ and the zero vector. Since X = X1 ∩ X2 ∩ R^n_+, and since projecting onto X1, X2 and R^n_+ individually is easy, we can adopt the recent algorithm proposed by Wang and Bertsekas [2013], designed to handle exactly this situation. In that algorithm, gradient steps are interlaced with projections onto X1, X2 and R^n_+ in a cyclical manner. This is similar to projected gradient descent, but instead of projecting onto the intersection of X1, X2 and R^n_+ (which we believe to be difficult), we project onto just one of them in round-robin fashion. This simple method was shown to converge by Wang and Bertsekas [2013]. 
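The round-robin scheme just described can be sketched compactly. The toy instance below is our illustration (hypothetical names, and a 1/t diminishing step size as an assumption): it minimizes a strongly convex quadratic over the intersection of two affine sets and the non-negative orthant, which play the roles of X1, X2 and R^n_+. Each affine projector inverts CC⊤ once up front, mirroring the precomputed factorizations discussed above.

```python
import numpy as np

def affine_projector(C, d):
    """Euclidean projection onto {x : C x = d}; C C^T is inverted once."""
    CCt_inv = np.linalg.inv(C @ C.T)
    return lambda x: x - C.T @ (CCt_inv @ (C @ x - d))

def nonneg_projector(x):
    """Projection onto the non-negative orthant: component-wise max with 0."""
    return np.maximum(x, 0.0)

def cyclic_projected_descent(grad, projectors, x0, iters=5000):
    """Gradient step, then projection onto ONE constraint set per iteration,
    cycling through the sets round-robin (Wang-Bertsekas-style scheme)."""
    x = x0.astype(float)
    for t in range(1, iters + 1):
        x = x - grad(x) / t                       # diminishing step size
        x = projectors[t % len(projectors)](x)    # round-robin projection
    return x

# Toy instance: min 0.5*||x - c||^2 over {sum(x) = 1} ∩ {x0 = x1} ∩ R^3_+.
c = np.array([2.0, -1.0, 0.5])
proj1 = affine_projector(np.array([[1.0, 1.0, 1.0]]), np.array([1.0]))
proj2 = affine_projector(np.array([[1.0, -1.0, 0.0]]), np.array([0.0]))
x = cyclic_projected_descent(lambda v: v - c,
                             [proj1, proj2, nonneg_projector],
                             np.zeros(3))
# x approaches (1/3, 1/3, 1/3), the constrained minimizer.
```

Each iteration performs a single gradient step and a single cheap projection, rather than a full (and here difficult) projection onto the intersection.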
However, no convergence bound is currently known.\n\n5 Introducing the First Benchmarks for EFCE\n\nIn this section we introduce the first two benchmark games for EFCE. These games are naturally parametric, so that they can scale in size as desired and hence be used to evaluate different EFCE solvers. In addition, we show that the EFCEs in these games are interesting behaviorally: the correlation plan in a social-welfare-maximizing EFCE is highly nontrivial and even seemingly counter-intuitive. We believe some of these induced behaviors may prove practical in real-world scenarios and hope our analysis can spark an interest in EFCEs and other equilibria in sequential settings.\n\n5.1 Battleship: Conflict Resolution via a Mediator\n\nIn this section we introduce our first proposed benchmark game to illustrate the power of correlation in extensive-form games. Our game is a general-sum variant of the classic game Battleship. The players take turns secretly placing a set of ships S (of varying sizes and values) on separate grids of size H × W. After placement, players take turns firing at their opponent—ships which have been hit at all the tiles they lie on are considered destroyed. The game continues until either one player has lost all of their ships, or each player has completed r shots. At the end of the game, the payoff of each player is computed as the sum of the values of the opponent's ships that were destroyed, minus γ times the value of the ships which they lost, where γ ≥ 1 is called the loss multiplier of the game. 
The social welfare (SW) of the game is the sum of the utilities of all players.\n\nIn order to illustrate a few interesting features of the social-welfare-maximizing EFCE in this game, we will focus on the instance of the game with a board of size 3 × 1, in which each player commands just 1 ship of value 1 and length 1, there are 2 rounds of shooting per player, and the loss multiplier is γ = 2. In this game, the social-welfare-maximizing Nash equilibrium is such that each player places their ship and shoots uniformly at random. This way, the probabilities that Player 1 and Player 2 will end the game by destroying the opponent's ship are 5/9 and 1/3, respectively (Player 1 has an advantage since they act first). The probability that both players will end the game with their ships unharmed is a meagre 1/9. Correspondingly, the maximum SW reached by any NE of the game is −8/9.\nIn the EFCE model, it is possible to induce the players to end the game with a peaceful outcome—that is, no damage to either ship—with probability 5/18, 2.5 times the probability under NE, resulting in a much higher SW of −13/18. Before we continue with more details as to how the mediator (correlation device) is able to achieve this result in the case where γ = 2, we remark that the benefit of EFCE is even higher when the loss multiplier γ increases: Figure 1 (left) shows, as a function of γ, the probability with which Players 1 and 2 terminate the game by sinking their opponent's ship, if they play according to the SW-maximizing EFCE. For all values of γ, the SW-maximizing NE remains the same, while with a mediator the probability of reaching a peaceful outcome increases as γ increases and asymptotically approaches 1/3, and the gap between the expected utilities of the two players vanishes. 
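As a quick sanity check, the welfare figures quoted above for γ = 2 can be re-derived with exact rational arithmetic from the outcome probabilities alone (this snippet is our illustration, not part of the paper's experiments): with unit-value ships, any outcome in which a ship is sunk contributes 1 − γ = −1 to social welfare, while a peaceful outcome contributes 0.

```python
from fractions import Fraction as F

gamma = 2                      # loss multiplier
sw_sunk = 1 - gamma            # sinking: +1 to the shooter, -gamma to the owner

# SW-maximizing NE: uniformly random placement and shooting.
p1_sinks, p2_sinks, peace_ne = F(5, 9), F(1, 3), F(1, 9)
assert p1_sinks + p2_sinks + peace_ne == 1
sw_ne = (p1_sinks + p2_sinks) * sw_sunk        # peaceful outcome adds 0
print(sw_ne)                                   # -8/9

# SW-maximizing EFCE: the peaceful outcome has probability 5/18.
peace_efce = F(5, 18)
sw_efce = (1 - peace_efce) * sw_sunk
print(sw_efce)                                 # -13/18
```

The mediator thus improves expected social welfare by −13/18 − (−8/9) = 1/6.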
This is remarkable, considering Player 1's advantage from acting first.

Figure 1: (Left) Probabilities of the players sinking their opponent when playing according to the SW-maximizing EFCE. For γ ≥ 2, the probability of the game ending with no sunken ship coincides with the probability of Player 2 sinking Player 1. (Right) Example of a playthrough of Battleship assuming both players are recommended to place their ship in the same position a. Edge labels represent the probability of an action being recommended. Squares and hexagons denote actions taken by Players 1 and 2, respectively. Blue and red nodes represent cases where Player 1 and Player 2 sink their opponent, respectively. The Shoot action is abbreviated 'Sh.'.

We now resume our analysis of the SW-maximizing EFCE in the instance with γ = 2. In a nutshell, the correlation plan is constructed so that the players are recommended to deliberately miss, and deviations from this are punished by the mediator, who reveals to the opponent the ship location that was recommended to the deviating player. First, the mediator recommends to each player a ship placement that is sampled uniformly at random and independently across players. This results in 9 possible scenarios (one per pair of ship placements), each occurring with probability 1/9. Due to the symmetric nature of ship placements, only two scenarios are relevant: either the two players are recommended to place their ships in the same spot, or in different spots. Figure 1 (right) shows the probability of each recommendation from the mediator in the former case, assuming that the players do not deviate. The latter case is symmetric (see Appendix E in the full paper for details). Now, we explain the first of the two mechanisms by which the mediator compels non-violent behavior. We focus on the first shot made by Player 1 (i.e., the root in Figure 3).
The mediator suggests that Player 1 shoot at Player 2's ship with a low probability of 2/27, and deliberately miss with high probability. One may wonder how this behavior can be incentive-compatible (that is, what incentives compel Player 1 not to defect), since the player may choose to fire at random at either of the 2 locations that were not recommended and get an almost 1/2 chance of winning the game immediately. The key is that if Player 1 does so and does not hit the opponent's ship, then the mediator can punish them by recommending that Player 2 shoot at the position where Player 1 was recommended to place their ship. Since players value their own ships more than destroying their opponent's, Player 1 is incentivized to avoid such a situation by accepting the recommendation to (most probably) miss. This is the first example of a deterrent used by the mediator: inducing the opponent to play punitive actions against a player that has deviated from the recommendations, whenever that deviation can be detected. A similar situation arises in the first move of Player 2, where Player 2 is recommended to deliberately miss, hitting each of the 2 empty spots with probability 1/2. A more detailed analysis is available in Appendix E in the full paper.

5.2 Sheriff: Bargaining and Negotiation

Our second proposed benchmark is a simplified version of the Sheriff of Nottingham board game. The game models the interaction of two players: the Smuggler, who is trying to smuggle illegal items in their cargo, and the Sheriff, who is trying to stop the Smuggler.
At the beginning of the game, the Smuggler secretly loads their cargo with n ∈ {0, . . . , nmax} illegal items. At the end of the game, the Sheriff decides whether to inspect the cargo. If the Sheriff chooses to inspect the cargo and finds illegal goods, the Smuggler must pay a fine worth p · n to the Sheriff. On the other hand, the Sheriff has to compensate the Smuggler with a utility s if no illegal goods are found. Finally, if the Sheriff decides not to inspect the cargo, the Smuggler's utility is v · n whereas the Sheriff's utility is 0. The game is made interesting by two additional elements (which are also present in the board game): bribery and bargaining. After the Smuggler has loaded the cargo and before the Sheriff chooses whether or not to inspect, the two engage in r rounds of bargaining. At each round i = 1, . . . , r, the Smuggler tries to tempt the Sheriff into not inspecting the cargo by proposing a bribe bi ∈ {0, . . . , bmax}, and the Sheriff responds with whether or not they would accept the proposed bribe. Only the proposal and response from round r are executed and have an impact on the final payoffs; that is, all but the r-th round of bargaining are non-consequential, and their purpose is for the two players to settle on a suitable bribe amount. If the Sheriff accepts bribe br, then the Smuggler gets v · n − br, while the Sheriff gets br. See Appendix F in the full paper for a formal description of the game.
We now point out some interesting behavior of EFCE in this game. We refer to the game instance where v = 5, p = 1, s = 1, nmax = 10, bmax = 2, r = 2 as the baseline instance.
Effect of v, p and s. First, we show what happens in the baseline instance when the item value v, item penalty p, and Sheriff compensation (penalty) s are varied in isolation over a continuous range of values. The results are shown in Figure 2.
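To make the payoff rules above concrete, here is a sketch of the terminal payoffs. The function and argument names are ours, and we read an accepted final bribe br as yielding v · n − br to the Smuggler, consistent with the no-inspection utility v · n:

```python
def sheriff_payoffs(n, v, p, s, bribe, accept, inspect):
    """Terminal payoffs (Smuggler, Sheriff) in our reading of the game.
    Only the last round's bribe proposal and response matter."""
    if accept:                # Sheriff takes the final bribe; no inspection
        return v * n - bribe, bribe
    if inspect:
        if n > 0:             # illegal items found: Smuggler pays the fine p*n
            return -p * n, p * n
        return s, -s          # empty cargo: Sheriff compensates the Smuggler
    return v * n, 0           # no inspection, no accepted bribe
```

With the baseline parameters (v = 5, p = 1, s = 1), an accepted bribe of 2 on a cargo of 2 items gives `sheriff_payoffs(2, 5, 1, 1, 2, True, False) == (8, 2)`, matching the blue entry of Table 1.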
In terms of general trends, the effect of each parameter on the Smuggler is fairly consistent with intuition: the Smuggler benefits from a higher item value as well as from higher Sheriff penalties, and suffers when the penalty for smuggling is increased. However, the finer details are much more nuanced. For one, the effect of changing the parameters is not only non-monotonic, but also discontinuous. This behavior has never been documented before, and we find it rather counterintuitive. More counterintuitive observations can be found in Appendix F.

Figure 2: Utility of the players for the SW-maximizing EFCE as v, p, and s are varied. We verified that these plots are not the result of equilibrium selection issues.

Effect of nmax, bmax, and r. Here, we try to empirically understand the impact of nmax and bmax on the SW-maximizing equilibrium. As before, we set v = 5, p = 1, s = 1 and vary nmax and r simultaneously while keeping bmax constant. The results are shown in Table 1. The most striking observation is that increasing the capacity of the cargo nmax may decrease social welfare. For example, consider the case bmax = 2, nmax = 2, r = 1 (shown in blue in Table 1), where the payoffs are (8.0, 2.0). This achieves the maximum attainable social welfare, by smuggling nmax = 2 items and having the Sheriff accept a bribe of 2. When nmax is increased to 5 (red entry in the table), the payoffs to both players drop significantly, and even more so when nmax increases further.
While counterintuitive, this behavior is consistent: the Smuggler may now benefit from loading 3 items every time he was recommended to load 2; the Sheriff reacts by inspecting more, leading to lower payoffs for both players.

That behavior is avoided by increasing the number of rounds r: by increasing to r = 2 (entry shown in purple), the behavior disappears and we revert to achieving a social welfare of 10, just like in the instance with nmax = 2, r = 1. With sufficiently many bargaining steps, the Smuggler, with the aid of the mediator, is able to convince the Sheriff that they have complied with the mediator's recommendation. This is because the mediator spends the first r − 1 bribes to give a 'passcode' to the Smuggler so that the Sheriff can verify compliance: if an 'unexpected' bribe is proposed, then the Smuggler must have deviated, and the Sheriff will inspect the cargo as punishment. With more rounds, it is less likely that the Smuggler will guess the correct passcode.
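The deterrent value of the extra bargaining rounds admits a back-of-the-envelope estimate. If each of the first r − 1 recommended bribes were drawn uniformly from {0, . . . , bmax} (an assumption of ours, purely for illustration; the mediator's actual distribution in the SW-maximizing EFCE need not be uniform), a deviating Smuggler would have to guess all of them:

```python
def passcode_guess_probability(b_max, r):
    """Chance that a deviating Smuggler guesses an (r-1)-round passcode
    when each passcode bribe is drawn uniformly from {0, ..., b_max}.
    Illustrative model only."""
    return 1 / (b_max + 1) ** (r - 1)

# With b_max = 2: a single bargaining round offers no protection, and
# each extra round shrinks the chance of an undetected deviation by 3x.
print(passcode_guess_probability(2, 1))  # 1.0
print(passcode_guess_probability(2, 2))  # ~0.333
print(passcode_guess_probability(2, 3))  # ~0.111
```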
See also Appendix F in the full paper for additional insights.

Table 1: Payoffs for (Smuggler, Sheriff) in the SW-maximizing EFCE.

nmax |    r = 1     |    r = 2     |    r = 3
  1  | (3.00, 2.00) | (3.00, 2.00) | (3.00, 2.00)
  2  | (8.00, 2.00) | (8.00, 2.00) | (8.00, 2.00)
  5  | (2.28, 1.26) | (8.00, 2.00) | (8.00, 2.00)
 10  | (1.76, 0.93) | (7.26, 1.82) | (8.00, 2.00)

6 Experimental Evaluation

Even our proof-of-concept algorithm based on the BSSP formulation and subgradient descent, introduced in Section 3, is able to beat LP-based approaches using the commercial solver Gurobi [Gurobi Optimization, 2018] in large games. This confirms known results about the scalability of methods for computing NE, where in recent years first-order methods have established themselves as the only algorithms able to handle large games.
We experimented on Battleship over a range of parameters while fixing γ = 2. All experiments were run on a machine with 64 cores and 500GB of memory. For our method, we tuned step sizes based on multiples of 10. In Table 2, we report execution times until all constraints (feasibility and deviation) are violated by no more than 10⁻¹, 10⁻², and 10⁻³. Our method outperforms the LP-based approach on larger games. However, while we outperform the LP-based approach for accuracies up to 10⁻³, Gurobi spends most of its time reordering variables and preprocessing, and its solution converges faster at higher levels of precision; this is expected of a gradient-based method like ours. On very large games with more than 100 million variables, both our method and Gurobi fail: in Gurobi's case due to a lack of memory, while in our case each iteration required nearly an hour, which was prohibitive. The main bottleneck in our method was the projection onto X1 and X2.
We also experimented on the Sheriff game and obtained similar findings (Appendix H in the full paper).

(H, W) | r | Ship length | #Actions Pl 1 | #Actions Pl 2 | #Relevant seq. pairs | Time (LP) 10⁻¹ / 10⁻² / 10⁻³ | Time (ours) 10⁻¹ / 10⁻² / 10⁻³
(2, 2) | 3 | 1 | 917  | 741   | 35241 | 2s / 2s / 2s             | 1s / 2s / 3s
(3, 2) | 3 | 1 | 47k  | 15k   | 3.89M | 3m 17s / 3m 6s / 3m 24s  | 8s / 34s / 52s
(3, 2) | 4 | 1 | 306k | 145k  | 26.4M | 42m 39s / 42m 44s / 43m  | 2m 48s / 14m 1s / 23m 24s
(3, 2) | 4 | 2 | 970k | 2.27M | 111M  | out of memory†           | did not achieve‡

Table 2: #Relevant seq. pairs is the dimension of ξ under the compact representation of von Stengel and Forges [2008]. For LPs, we report the fastest of the Barrier, Primal Simplex, and Dual Simplex methods over 3 different formulations (Appendix G in the full paper). † Gurobi ran out of memory and was killed by the system after ∼3000 seconds. ‡ Our method required 1 hour per iteration and did not achieve the required accuracy after 6 hours.

7 Conclusions

In this paper, we proposed two parameterized benchmark games in which EFCE exhibits interesting behaviors. We analyzed those behaviors both qualitatively and quantitatively, and isolated two ways through which a mediator is able to compel the agents to follow its recommendations. We also provided an alternative saddle-point formulation of EFCE and demonstrated its merit with a simple subgradient method that outperforms standard LP-based methods.
We hope that our analysis will bring attention to some of the computational and practical uses of EFCE, and that our benchmark games will be useful for evaluating future algorithms for computing EFCE in large games.

Acknowledgments

This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and the ARO under award W911NF-17-1-0082.
Gabriele\nFarina is supported by a Facebook fellowship. Co-authors Ling and Fang are supported in part by a\nresearch grant from Lockheed Martin.\n\nReferences\nItai Ashlagi, Dov Monderer, and Moshe Tennenholtz. On the value of correlation. Journal of Arti\ufb01cial\n\nIntelligence Research, 33:575\u2013613, 2008.\n\nRobert Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical\n\nEconomics, 1:67\u201396, 1974.\n\nMichael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold\u2019em\n\npoker is solved. Science, 2015.\n\nNoam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats\n\ntop professionals. Science, Dec. 2017.\n\nNoam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret\n\nminimization. In AAAI, 2019.\n\nNoam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885\u2013\n\n890, 2019.\n\nVincent P Crawford and Joel Sobel. Strategic information transmission. Econometrica: Journal of\n\nthe Econometric Society, pages 1431\u20131451, 1982.\n\nYevgeniy Dodis, Shai Halevi, and Tal Rabin. A cryptographic solution to a game theoretic problem.\n\nIn Annual International Cryptology Conference, pages 112\u2013130. Springer, 2000.\n\nMiroslav Dudik and Geoffrey J Gordon. A sampling-based approach to computing equilibria in\n\nsuccinct extensive-form games. In UAI, pages 151\u2013160. AUAI Press, 2009.\n\nGabriele Farina, Christian Kroer, and Tuomas Sandholm. Online convex optimization for sequential\ndecision processes and extensive-form games. In AAAI Conference on Arti\ufb01cial Intelligence, 2019.\n\nGeoffrey J Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. In\nProceedings of the 25th international conference on Machine learning, pages 360\u2013367. ACM,\n2008.\n\nLLC Gurobi Optimization. Gurobi optimizer reference manual, 2018.\n\nElad Hazan. Introduction to online convex optimization. 
Foundations and Trends in Optimization, 2016.

Wan Huang and Bernhard von Stengel. Computing an extensive-form correlated equilibrium in polynomial time. In International Workshop On Internet And Network Economics (WINE), pages 506–513. Springer, 2008.

Wan Huang. Equilibrium computation for extensive games. PhD thesis, London School of Economics and Political Science, January 2011.

Albert Xin Jiang and Kevin Leyton-Brown. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347–359, 2015.

Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. In Symposium on Theoretical Aspects of Computer Science, 1999.

Kevin Leyton-Brown. Personal communication, 2019.

Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 2017.

John Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.

Arkadi Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 2004.

Yurii Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 2005.

Christos H Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14, 2008.

Indrajit Ray and Sonali Sen Gupta. Technical Report, 2009.

Tim Roughgarden and Éva Tardos. How bad is selfish routing? Journal of the ACM (JACM), 49(2):236–259, 2002.

Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations.
Cambridge University Press, 2009.

Bernhard von Stengel and Françoise Forges. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002–1022, 2008.

Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 1996.

Bernhard von Stengel. Computing equilibria for two-person games. In Robert Aumann and Sergiu Hart, editors, Handbook of game theory, volume 3. North Holland, Amsterdam, The Netherlands, 2002.

Mengdi Wang and Dimitri P Bertsekas. Incremental constraint projection-proximal methods for nonsmooth convex optimization. SIAM Journal on Optimization (to appear), 2013.

Martin Zinkevich, Michael Bowling, Michael Johanson, and Carmelo Piccione. Regret minimization in games with incomplete information. In NIPS, 2007.