{"title": "Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium", "book": "Advances in Neural Information Processing Systems", "page_first": 5186, "page_last": 5196, "abstract": "Self-play methods based on regret minimization have become the state of the art for computing Nash equilibria in large two-players zero-sum extensive-form games. These methods fundamentally rely on the hierarchical structure of the players' sequential strategy spaces to construct a regret minimizer that recursively minimizes regret at each decision point in the game tree. In this paper, we introduce the first efficient regret minimization algorithm for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves. Designing such an algorithm is significantly more challenging than designing one for the Nash equilibrium counterpart, as the constraints that define the space of correlation plans lack the hierarchical structure and might even form cycles. We show that some of the constraints are redundant and can be excluded from consideration, and present an efficient algorithm that generates the space of extensive-form correlation plans incrementally from the remaining constraints. This structural decomposition is achieved via a special convexity-preserving operation that we coin scaled extension. We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer. Our algorithm produces feasible iterates. 
Experiments show that it significantly outperforms prior approaches and for larger problems it is the only viable option.", "full_text": "Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium∗

Gabriele Farina
Computer Science Department
Carnegie Mellon University
gfarina@cs.cmu.edu

Chun Kai Ling
Computer Science Department
Carnegie Mellon University
chunkail@cs.cmu.edu

Fei Fang
Institute for Software Research
Carnegie Mellon University
feif@cs.cmu.edu

Tuomas Sandholm
Computer Science Department, CMU
Strategic Machine, Inc.
Strategy Robot, Inc.
Optimized Markets, Inc.
sandholm@cs.cmu.edu

Abstract

Self-play methods based on regret minimization have become the state of the art for computing Nash equilibria in large two-player zero-sum extensive-form games. These methods fundamentally rely on the hierarchical structure of the players' sequential strategy spaces to construct a regret minimizer that recursively minimizes regret at each decision point in the game tree. In this paper, we introduce the first efficient regret minimization algorithm for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves. Designing such an algorithm is significantly more challenging than designing one for the Nash equilibrium counterpart, as the constraints that define the space of correlation plans lack the hierarchical structure and might even form cycles. We show that some of the constraints are redundant and can be excluded from consideration, and present an efficient algorithm that generates the space of extensive-form correlation plans incrementally from the remaining constraints.
This structural decomposition is achieved via a special convexity-preserving operation that we coin scaled extension. We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer. Our algorithm produces feasible iterates. Experiments show that it significantly outperforms prior approaches and for larger problems it is the only viable option.

1 Introduction

In recent years, self-play methods based on regret minimization, such as counterfactual regret minimization (CFR) [Zinkevich et al., 2007] and its faster variants [Tammelin et al., 2015; Brown et al., 2017; Brown and Sandholm, 2019a], have emerged as powerful tools for computing Nash equilibria in large extensive-form games, and have been instrumental in several recent milestones in poker [Bowling et al., 2015; Brown and Sandholm, 2017a,b; Moravčík et al., 2017; Brown and Sandholm, 2019b]. These methods exploit the hierarchical structure of the sequential strategy spaces of the players to construct a regret minimizer that recursively minimizes regret locally at each decision point in the game tree. This has inspired regret-based algorithms for other solution concepts in game theory, such as extensive-form perfect equilibria [Farina et al., 2017], Nash equilibrium with strategy constraints [Farina et al., 2017, 2019a,b; Davis et al., 2019], and quantal-response equilibrium [Farina et al., 2019a].

∗The full version of this paper is available on arXiv.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

In this paper, we give the first efficient regret-based algorithm for finding an extensive-form correlated equilibrium (EFCE) [von Stengel and Forges, 2008] in two-player general-sum games with no chance moves.
EFCE is a natural extension of the correlated equilibrium (CE) solution concept to the setting of extensive-form games. Here, the strategic interaction of rational players is complemented by a mediator that privately recommends behavior, but does not enforce it: it is up to the mediator to make recommendations that the players are incentivized to follow. Designing a regret minimization algorithm that can efficiently search over the space of extensive-form correlated strategies (known as correlation plans) is significantly more difficult than designing one for Nash equilibrium. This is because the constraints that define the space of correlation plans lack the hierarchical structure of sequential strategy spaces and might even form cycles. Existing general-purpose regret minimization algorithms, such as follow-the-regularized-leader [Shalev-Shwartz and Singer, 2007] and mirror descent, as well as those proposed by Gordon et al. [2008] in the context of convex games, are not practical: they require the evaluation of proximal operators (generalized projection problems) or the minimization of linear functions over the space of extensive-form correlation plans. In the former case, no distance-generating function is known that can be minimized efficiently over this space, while in the latter case current linear programming technology does not scale to large games, as we show in the experimental section of this paper. The regret minimization algorithm we present in this paper computes the next iterate in linear time in the dimension of the space of correlation plans.
We show that some of the constraints that define the polytope of correlation plans are redundant and can be eliminated, and present an efficient algorithm that generates the space of correlation plans incrementally from the remaining constraints. This structural decomposition is achieved via a special convexity-preserving operation that we coin scaled extension.
We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer. Experiments show that our algorithm significantly outperforms prior approaches (the LP-based approach [von Stengel and Forges, 2008] and a very recent subgradient descent algorithm [Farina et al., 2019c]), and for larger problems it is the only viable option.

2 Preliminaries

Extensive-form games (EFGs) are played on a game tree. Each node in the game tree belongs to a player, who acts at that node; for the purpose of this paper, we focus on two-player games only. Edges leaving a node correspond to actions that can be taken at that node. In order to capture private information, the game tree is supplemented with information sets. Each node belongs to exactly one information set, and each information set is a nonempty set of tree nodes for the same Player i: the nodes that Player i cannot distinguish among, given what they have observed so far. We will focus on perfect-recall EFGs, that is, EFGs where no player forgets what the player knew earlier. We denote by I1 and I2 the sets of all information sets that belong to Player 1 and Player 2, respectively. All nodes that belong to an information set I ∈ I1 ∪ I2 share the same set of available actions (otherwise the player acting at those nodes would be able to distinguish among them); we denote by AI the set of actions available at information set I. We define the set of sequences of Player i as the set Σi := {(I, a) : I ∈ Ii, a ∈ AI} ∪ {∅}, where the special element ∅ is called the empty sequence.
Given an information set I ∈ Ii, we denote by σ(I) the parent sequence of I, defined as the last pair (I′, a′) ∈ Σi encountered on the path from the root to any node v ∈ I; if no such pair exists (that is, Player i never acts before any node v ∈ I), we let σ(I) = ∅. We (recursively) define a sequence τ ∈ Σi to be a descendant of sequence τ′ ∈ Σi, denoted by τ ⪰ τ′, if τ = τ′ or if τ = (I, a) and σ(I) ⪰ τ′. We use the notation τ ≻ τ′ to mean τ ⪰ τ′ ∧ τ ≠ τ′. Figure 1 shows a small example EFG; black round nodes belong to Player 1, white round nodes belong to Player 2, action names are not shown, gray round sets define information sets, and the numbers along the edges define concise names for sequences (for example, '7' denotes sequence (D, a) where a is the leftmost action at D).

Figure 1: Small example.

Sequence-Form Strategies In the sequence-form representation [Romanovskii, 1962; Koller et al., 1996; von Stengel, 1996], a strategy for Player i is compactly represented via a vector x indexed by sequences σ ∈ Σi. When σ = (I, a), the entry x[σ] ≥ 0 defines the product of the probabilities according to which Player i takes their actions on the path from the root to information set I, up to and including action a; furthermore, x[∅] = 1. Hence, in order to be a valid sequence-form strategy, x must satisfy the 'probability mass conservation' constraint: for all I ∈ Ii, Σ_{a ∈ AI} x[(I, a)] = x[σ(I)]. That is, every information set partitions the probability mass received from the parent sequence onto its actions.
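For concreteness, these mass-conservation constraints can be checked mechanically for the game of Figure 1, where information set A splits the empty sequence into sequences 1 and 2, B and C split sequence 1, and D splits sequence 2. The following sketch (ours, not code from the paper; "empty" stands for ∅) validates a candidate sequence-form strategy:

```python
# Information structure of the Figure 1 game:
# info set -> (parent sequence, sequences at this info set)
INFO_SETS = {
    "A": ("empty", ["1", "2"]),
    "B": ("1", ["3", "4"]),
    "C": ("1", ["5", "6"]),
    "D": ("2", ["7", "8", "9"]),
}

def is_sequence_form(x, tol=1e-9):
    """Check validity of a sequence-form strategy: x[empty] = 1, all entries
    nonnegative, and every info set partitions the mass of its parent sequence."""
    if abs(x["empty"] - 1.0) > tol or any(v < -tol for v in x.values()):
        return False
    return all(
        abs(sum(x[s] for s in seqs) - x[parent]) <= tol
        for parent, seqs in INFO_SETS.values()
    )

# The uniformly-random behavioral strategy, written in sequence form:
uniform = {"empty": 1.0, "1": 0.5, "2": 0.5, "3": 0.25, "4": 0.25,
           "5": 0.25, "6": 0.25, "7": 0.5 / 3, "8": 0.5 / 3, "9": 0.5 / 3}
```

Here the entry for sequence 3, for instance, is 0.5 · 0.5 = 0.25: the probability of taking action 1 at A times the probability of the leftmost action at B.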
In this sense, the constraints that define the space of sequence-form strategies naturally exhibit a hierarchical structure.

2.1 Extensive-Form Correlated Equilibria

Extensive-form correlated equilibrium (EFCE) [von Stengel and Forges, 2008] is a natural extension of the solution concept of correlated equilibrium (CE) [Aumann, 1974] to extensive-form games. In EFCE, a mediator privately reveals recommendations to the players as the game progresses. These recommendations are incremental, in the sense that the recommendation for the move to play at each decision point of the game is revealed only if and when the decision point is reached. This is in contrast with CE, where recommendations for the whole game are privately revealed upfront when the game starts. Players are free to not follow the recommended moves, but once a player does not follow a recommendation, he will not receive further recommendations. In an EFCE, the recommendations are incentive-compatible, that is, the players are motivated to follow all recommendations. EFCE and CE are good candidates to model strategic interactions in which intermediate forms of centralized control can be achieved [Ashlagi et al., 2008].
In a recent preprint, Farina et al. [2019c] show that in two-player perfect-recall extensive-form games, an EFCE that guarantees a social welfare (that is, the sum of the players' utilities) of at least τ is the solution to a bilinear saddle-point problem, that is, an optimization problem of the form min_{x ∈ X} max_{y ∈ Y} x⊤Ay, where X and Y are convex and compact sets and A is a matrix of real numbers. In the case of EFCE, X = Ξ is known as the polytope of correlation plans (see Section 2.2) and Y is the convex hull of certain sequence-form strategy spaces.
In general, Ξ cannot be captured by a polynomially small set of constraints, since computing an optimal EFCE in a two-player perfect-recall game is computationally hard [von Stengel and Forges, 2008].2 However, in the special case of games with no chance moves, this is not the case, and Ξ is the intersection of a polynomial (in the game tree size) number of constraints, as discussed in the next subsection. In fact, most of the current paper is devoted to studying the structure of Ξ. We will largely ignore Y, for which an efficient regret minimizer can already be built, for instance by using the theory of regret circuits [Farina et al., 2019b] (see also Appendix A in the full paper). Similarly, we will not use any property of matrix A (except that it can be computed and stored efficiently).

2.2 Polytope of Extensive-Form Correlation Plans in Games with no Chance Moves

In their seminal paper, von Stengel and Forges [2008] characterize the constraints that define the space of extensive-form correlation plans Ξ in the case of two-player perfect-recall games with no chance moves. The characterization makes use of the following two concepts:

Definition 1 (Connected information sets, I1 ⇌ I2). Let I1, I2 be information sets for Player 1 and Player 2, respectively. We say that I1 and I2 are connected, denoted I1 ⇌ I2, if there exist two nodes u ∈ I1, v ∈ I2 such that u is on the path from the root to v, or v is on the path from the root to u.

Definition 2 (Relevant sequence pair, σ1 ⋈ σ2). Let σ1 ∈ Σ1, σ2 ∈ Σ2. We say that (σ1, σ2) is a relevant sequence pair, and write σ1 ⋈ σ2, if either σ1 or σ2 or both is the empty sequence, or if σ1 = (I1, a1) and σ2 = (I2, a2) and I1 ⇌ I2.
Similarly, given σ1 ∈ Σ1 and I2 ∈ I2, we say that (σ1, I2) forms a relevant sequence-information set pair, and write σ1 ⋈ I2, if σ1 = ∅ or if σ1 = (I1, a1) and I1 ⇌ I2 (a symmetric statement holds for I1 ⋈ σ2).

Definition 3 (von Stengel and Forges [2008]). In a two-player perfect-recall extensive-form game with no chance moves, the space Ξ of correlation plans is a convex polytope containing nonnegative vectors indexed over relevant sequence pairs, and is defined as

Ξ := { ξ ≥ 0 :
  • ξ[∅, ∅] = 1
  • Σ_{a ∈ AI1} ξ[(I1, a), σ2] = ξ[σ(I1), σ2]   ∀ I1 ∈ I1, σ2 ∈ Σ2 s.t. I1 ⋈ σ2
  • Σ_{a ∈ AI2} ξ[σ1, (I2, a)] = ξ[σ1, σ(I2)]   ∀ I2 ∈ I2, σ1 ∈ Σ1 s.t. σ1 ⋈ I2 }.

2A feasible EFCE can be found in theoretical polynomial time [Huang and von Stengel, 2008; Huang, 2011] using the ellipsoid-against-hope algorithm [Papadimitriou and Roughgarden, 2008; Jiang and Leyton-Brown, 2015]. Unfortunately, that algorithm is known to not scale beyond small games.

In particular, Ξ is the intersection of at most 1 + |I1| · |Σ2| + |Σ1| · |I2| constraints, a polynomial number in the input game size.

2.3 Regret Minimization and Relationship with Bilinear Saddle-Point Problems

A regret minimizer is a device that supports two operations: (i) RECOMMEND, which provides the next decision xt+1 ∈ X, where X is a nonempty, convex, and compact subset of a Euclidean space Rn; and (ii) OBSERVELOSS, which receives/observes a convex loss function ℓt that is used to evaluate decision xt [Zinkevich, 2003].
In this paper, we will consider linear loss functions, which we represent in the form of a vector ℓt ∈ Rn. A regret minimizer is an online decision maker in the sense that each decision is made by taking into account only past decisions and their corresponding losses. The quality metric for the regret minimizer is its cumulative regret RT, defined as the difference between the loss cumulated by the sequence of decisions x1, . . . , xT and the loss that would have been cumulated by the best-in-hindsight time-independent decision x̂. Formally, RT := Σ_{t=1}^{T} ⟨ℓt, xt⟩ − min_{x̂ ∈ X} Σ_{t=1}^{T} ⟨ℓt, x̂⟩. A 'good' regret minimizer has RT sublinear in T; this property is known as Hannan consistency. Hannan consistent regret minimizers can be used to converge to a solution of a bilinear saddle-point problem (Section 2.1). To do so, two regret minimizers, one for X and one for Y, are set up so that at each time t they observe loss vectors ℓt_x := −Ayt and ℓt_y := A⊤xt, respectively, where xt ∈ X and yt ∈ Y are the decisions output by the two regret minimizers. A well-known folk theorem asserts that in doing so, at time T the average decisions (x̄T, ȳT) := ((1/T) Σ_{t=1}^{T} xt, (1/T) Σ_{t=1}^{T} yt) have saddle-point gap (a standard measure of how close a point is to being a saddle point) γ(x̄T, ȳT) := max_{x̂ ∈ X} x̂⊤A ȳT − min_{ŷ ∈ Y} (x̄T)⊤A ŷ bounded above by γ(x̄T, ȳT) ≤ (RT_X + RT_Y)/T, where RT_X and RT_Y are the cumulative regrets of the two regret minimizers. Since the regrets grow sublinearly, γ(x̄T, ȳT) → 0 as T → +∞.
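As an illustration of this folk-theorem construction, the sketch below (ours, not code from the paper) runs two regret minimizers in self-play on a toy 2×2 bilinear saddle-point problem; probability simplexes stand in for the sets X and Y, and regret matching [Hart and Mas-Colell, 2000] is used as the regret minimizer for each:

```python
import numpy as np

class RegretMatching:
    """Regret minimizer over the probability simplex (regret matching)."""
    def __init__(self, n):
        self.n = n
        self.regrets = np.zeros(n)

    def recommend(self):
        r = np.maximum(self.regrets, 0.0)
        return r / r.sum() if r.sum() > 0 else np.full(self.n, 1.0 / self.n)

    def observe_loss(self, loss):
        x = self.recommend()                 # the decision that incurred `loss`
        self.regrets += x @ loss - loss      # regret of each action vs. the mix

A = np.array([[3.0, -1.0], [-2.0, 1.0]])    # arbitrary 2x2 payoff matrix
rm_x, rm_y = RegretMatching(2), RegretMatching(2)
x_sum, y_sum = np.zeros(2), np.zeros(2)
T = 20000
for t in range(T):
    x, y = rm_x.recommend(), rm_y.recommend()
    x_sum += x
    y_sum += y
    rm_x.observe_loss(-A @ y)               # loss vector for the X minimizer
    rm_y.observe_loss(A.T @ x)              # loss vector for the Y minimizer
x_bar, y_bar = x_sum / T, y_sum / T
# saddle-point gap of the average decisions
gap = np.max(A @ y_bar) - np.min(A.T @ x_bar)
```

Since the gap is bounded by (RT_X + RT_Y)/T and regret matching has sublinear regret, `gap` shrinks as T grows.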
As discussed in the introduction, this approach has been extremely successful in computational game theory.

3 Scaled Extension: A Convexity-Preserving Operation for Incrementally Constructing Strategy Spaces

In this section, we introduce a new convexity-preserving operation between two sets. We show that it provides an alternative way of constructing the strategy space of a player in an extensive-form game that is different from the construction based on convex hulls and Cartesian products described by Farina et al. [2019b]. Our new construction enables one to incrementally extend the strategy space in a top-down fashion, whereas the construction by Farina et al. [2019b] was bottom-up. Most importantly, as we will show in Section 3.1, this new operation enables one to incrementally, recursively construct the extensive-form correlated strategy space (again in a top-down fashion).

Definition 4. Let X and Y be nonempty, compact and convex sets, and let h : X → R+ be a nonnegative affine real function. The scaled extension of X with Y via h is defined as the set

X ◁h Y := {(x, y) : x ∈ X, y ∈ h(x)Y}.

Since we will be composing multiple scaled extensions together, it is important to verify that the operation above not only preserves convexity, but also preserves the non-emptiness and compactness of the sets (a proof of the following Lemma is available in Appendix B in the full paper):

Lemma 1. Let X, Y and h be as in Definition 4. Then X ◁h Y is nonempty, compact and convex.

3.1 Construction of the Set of Sequence-Form Strategies

The scaled extension operation can be used to construct the polytope of a perfect-recall player's strategy in sequence-form in an extensive-form game.
We illustrate the approach in the small example of Figure 1; the generalization to any extensive-form strategy space is immediate. As noted in Section 2, any valid sequence-form strategy must satisfy probability mass constraints, and can be constructed incrementally in a top-down fashion, as follows (in the following we refer to the same naming scheme as in Figure 1 for the sequences of Player 1):

i. First, the empty sequence is set to value x[∅] = 1.
ii. (Info set A) Next, the value x[∅] is partitioned into the two non-negative values x[1] + x[2] = x[∅].
iii. (Info set B) Next, the value x[1] is partitioned into two non-negative values x[3] + x[4] = x[1].
iv. (Info set C) Next, the value x[1] is partitioned into two non-negative values x[5] + x[6] = x[1].
v. (Info set D) Next, the value x[2] is partitioned into 3 non-negative values x[7] + x[8] + x[9] = x[2].

The incremental choices in the above recipe can be directly translated, in the same order, into set operations by using scaled extensions, as follows:

i. First, the set of all feasible values of sequence x[∅] is the singleton X0 := {1}.
ii. Then, the set of all feasible values of (x[∅], x[1], x[2]) is the set X1 := X0 × Δ2 = X0 ◁h1 Δ2, where h1 is the linear function h1 : X0 ∋ x[∅] ↦ x[∅] (the identity function).
iii. In order to characterize the set of all feasible values of (x[∅], . . . , x[4]) we start from X1, and extend any element (x[∅], x[1], x[2]) ∈ X1 with the two sequences x[3] and x[4], drawn from the set {(x[3], x[4]) ∈ R2+ : x[3] + x[4] = x[1]} = x[1]Δ2. We can express this extension using scaled extension: X2 := X1 ◁h2 Δ2, where h2 : X1 ∋ (x[∅], x[1], x[2]) ↦ x[1].
iv.
Similarly, we can extend every element in X2 to include (x[5], x[6]) ∈ x[1]Δ2: in this case, X3 := X2 ◁h3 Δ2, where h3 : X2 ∋ (x[∅], x[1], x[2], x[3], x[4]) ↦ x[1].
v. The set of all feasible (x[∅], . . . , x[9]) is X4 := X3 ◁h4 Δ3, where h4 : X3 ∋ (x[∅], . . . , x[6]) ↦ x[2].

Hence, the polytope of sequence-form strategies for Player 1 in Figure 1 can be expressed as {1} ◁h1 Δ2 ◁h2 Δ2 ◁h3 Δ2 ◁h4 Δ3, where the scaled extension operation is intended as left associative.

3.2 Regret Minimizer for Scaled Extension

It is always possible to construct a regret minimizer for Z = X ◁h Y, where h(x) = ⟨a, x⟩ + b, starting from a regret minimizer for X ⊆ Rm and a regret minimizer for Y ⊆ Rn. The fundamental technical insight of the construction is that, given any vector ℓ = (ℓx, ℓy) ∈ Rm × Rn, the minimization of a linear function z ↦ ⟨ℓ, z⟩ over Z can be split into two separate linear minimization problems over X and Y:

min_{z ∈ Z} ⟨ℓ, z⟩ = min_{x ∈ X, y ∈ Y} { ⟨ℓx, x⟩ + h(x)⟨ℓy, y⟩ }
                  = min_{x ∈ X} { ⟨ℓx, x⟩ + h(x) · min_{y ∈ Y} ⟨ℓy, y⟩ }
                  = min_{x ∈ X} { ⟨ℓx + a · min_{y ∈ Y} ⟨ℓy, y⟩, x⟩ } + b · min_{y ∈ Y} ⟨ℓy, y⟩.

Thus, it is possible to break the problem of minimizing regret over Z into two regret minimization subproblems over X and Y (more details in Appendix C in the full paper). In particular:

Proposition 1. Let RM_X and RM_Y be two regret minimizers over X and Y respectively, and let RT_X, RT_Y denote their cumulative regret at time T. Then, Algorithm 1 provides a regret minimizer over Z whose cumulative regret RT_Z is bounded above as RT_Z ≤ RT_X + h∗ RT_Y, where h∗ := max_{x ∈ X} h(x).

Algorithm 1 can be composed recursively to construct a regret minimizer for any set that is expressed via a chain of scaled extensions, such as the polytope of sequence-form strategies (Section 3.1) or that of extensive-form correlation plans (Section 4). When used on the polytope of sequence-form strategies, Algorithm 1 coincides with the CFR algorithm if all regret minimizers for the individual simplexes in the chain of scaled extensions are implemented using the regret matching algorithm [Hart and Mas-Colell, 2000].

4 Unrolling the Structure of the Correlated Strategy Polytope

In this section, we study the combinatorial structure of the polytope of correlated strategies (Section 2.2) of a two-player perfect-recall extensive-form game with no chance moves. The central result of this section, Theorem 1, asserts that the correlated strategy polytope Ξ can be expressed via a chain of scaled extensions. This matches the similar result regarding the sequence-form strategy polytope that we discussed in Section 3.1.
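To make the recursive composition concrete, here is a minimal Python sketch (ours, not the authors' implementation) of Algorithm 1 with h(x) = ⟨a, x⟩, chained to rebuild the Figure 1 polytope {1} ◁h1 Δ2 ◁h2 Δ2 ◁h3 Δ2 ◁h4 Δ3 from Section 3.1, with regret matching at the simplexes; by the remark above, this instance behaves like CFR, and every iterate it recommends is feasible:

```python
import numpy as np

class RegretMatching:
    """Regret minimizer over the probability simplex (regret matching)."""
    def __init__(self, n):
        self.n = n
        self.regrets = np.zeros(n)
    def recommend(self):
        r = np.maximum(self.regrets, 0.0)
        return r / r.sum() if r.sum() > 0 else np.full(self.n, 1.0 / self.n)
    def observe_loss(self, loss):
        x = self.recommend()
        self.regrets += x @ loss - loss

class Singleton:
    """Trivial regret minimizer over the set {1}."""
    def recommend(self):
        return np.ones(1)
    def observe_loss(self, loss):
        pass

class ScaledExtension:
    """Algorithm 1: regret minimizer over X ◁h Y, with h(x) = <a, x> (b = 0)."""
    def __init__(self, rm_x, rm_y, a):
        self.rm_x, self.rm_y, self.a = rm_x, rm_y, np.asarray(a, float)
    def recommend(self):
        x = self.rm_x.recommend()
        y = self.rm_y.recommend()
        return np.concatenate([x, (self.a @ x) * y])    # (x_t, h(x_t) y_t)
    def observe_loss(self, loss):
        m = len(self.a)
        lx, ly = loss[:m], loss[m:]
        y = self.rm_y.recommend()
        self.rm_x.observe_loss(lx + (ly @ y) * self.a)  # modified loss for X
        self.rm_y.observe_loss(ly)

# Chain for Figure 1, entries ordered [x∅, x1, ..., x9]; each scaling
# function h extracts the parent-sequence entry (x∅, then x1, x1, x2).
rm = Singleton()
for a_vec, k in [([1.0], 2),
                 ([0, 1, 0], 2),
                 ([0, 1, 0, 0, 0], 2),
                 ([0, 0, 1, 0, 0, 0, 0], 3)]:
    rm = ScaledExtension(rm, RegretMatching(k), a_vec)

rng = np.random.default_rng(0)
for _ in range(100):
    x = rm.recommend()          # always a feasible sequence-form strategy
    rm.observe_loss(rng.normal(size=10))
```

The feasibility of every recommendation, by construction, is exactly the 'feasible iterates' property highlighted in the abstract.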
Algorithm 1 Regret minimizer over the scaled extension X ◁h Y.

1: function RECOMMEND()
2:   xt ← RM_X.RECOMMEND()
3:   yt ← RM_Y.RECOMMEND()
4:   return (xt, h(xt) yt)

1: function OBSERVELOSS(ℓt = (ℓt_x, ℓt_y))
2:   yt ← RM_Y.RECOMMEND()
3:   ℓ̃t_x ← ℓt_x + ⟨ℓt_y, yt⟩ · a
4:   RM_X.OBSERVELOSS(ℓ̃t_x)
5:   RM_Y.OBSERVELOSS(ℓt_y)

However, unlike the sequence-form strategy polytope, the constraints that define the correlated strategy polytope do not exhibit a natural hierarchical structure: the constraints that define Ξ (Definition 3) are such that the same entry of the correlation plan ξ can appear in multiple constraints, and furthermore the constraints will in general form cycles. This makes the problem of unrolling the structure of Ξ significantly more challenging.
The key insight is that some of the constraints that define Ξ are redundant (that is, implied by the remaining constraints) and can therefore be safely eliminated. Our algorithm identifies one such set of redundant constraints, and removes them. The set is chosen in such a way that the remaining constraints can be laid down in a hierarchical way that can be captured via a chain of scaled extensions.

4.1 Example

Before we delve into the technical details of the construction, we illustrate the key idea of the algorithm in a small example. In particular, consider the small game tree of Figure 2 (left), where we used the same conventions as in Section 2 and Figure 1.
All sequence pairs are relevant; the set of constraints that define Ξ is shown in Figure 2 (middle). In this game, Ξ is defined by the following constraints:

ξ[∅, ∅] = 1,
ξ[σ1, 1] + ξ[σ1, 2] = ξ[σ1, ∅]   ∀ σ1 ∈ {∅, 1, 2},
ξ[σ1, 3] + ξ[σ1, 4] = ξ[σ1, ∅]   ∀ σ1 ∈ {∅, 1, 2},
ξ[1, σ2] + ξ[2, σ2] = ξ[∅, σ2]   ∀ σ2 ∈ {∅, 1, 2, 3, 4}.

Figure 2: (Left) Example game (Section 4.1). (Middle) Constraints that define Ξ in the example game. (Right) Fill-in order of ξ. The cell at the intersection of row σ1 and column σ2 represents the entry ξ[σ1, σ2] of ξ.

In order to generate all possible correlation plans ξ ∈ Ξ, we proceed as follows. First, we assign ξ[∅, ∅] = 1. Then, we partition ξ[∅, ∅] into two non-negative values (ξ[1, ∅], ξ[2, ∅]) ∈ ξ[∅, ∅]Δ2 in accordance with the constraint ξ[1, ∅] + ξ[2, ∅] = ξ[∅, ∅]. Next, using the constraints ξ[σ1, 1] + ξ[σ1, 2] = ξ[σ1, ∅] and ξ[σ1, 3] + ξ[σ1, 4] = ξ[σ1, ∅], we pick values (ξ[σ1, 1], ξ[σ1, 2]) ∈ ξ[σ1, ∅]Δ2 and (ξ[σ1, 3], ξ[σ1, 4]) ∈ ξ[σ1, ∅]Δ2 for σ1 ∈ {1, 2}. So far, our strategy for filling the correlation plan has been to split entries according to the information structure of the players.
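These fill-in steps can be simulated directly. The sketch below (ours, not code from the paper; "e" stands for ∅) samples a random correlation plan for this example game by splitting entries as above and summing for the remaining ξ[∅, ·] row, which is exactly the constraint ξ[1, σ2] + ξ[2, σ2] = ξ[∅, σ2] displayed above:

```python
import random

def random_split(total, k):
    """Sample a random point of total * Δk (a nonnegative k-way split)."""
    cuts = sorted(random.random() for _ in range(k - 1))
    pts = [0.0] + cuts + [1.0]
    return [total * (pts[i + 1] - pts[i]) for i in range(k)]

def fill_correlation_plan():
    xi = {("e", "e"): 1.0}                        # xi[∅, ∅] = 1
    # split according to Player 1's information set A
    xi[("1", "e")], xi[("2", "e")] = random_split(xi[("e", "e")], 2)
    # split according to Player 2's information sets, for sigma1 in {1, 2}
    for s1 in ("1", "2"):
        xi[(s1, "1")], xi[(s1, "2")] = random_split(xi[(s1, "e")], 2)
        xi[(s1, "3")], xi[(s1, "4")] = random_split(xi[(s1, "e")], 2)
    # fill xi[∅, sigma2] by summing already-filled-in entries
    for s2 in ("1", "2", "3", "4"):
        xi[("e", s2)] = xi[("1", s2)] + xi[("2", s2)]
    return xi

xi = fill_correlation_plan()
```

However the random splits come out, the two constraints that are never explicitly imposed (ξ[∅, 1] + ξ[∅, 2] = ξ[∅, ∅] and ξ[∅, 3] + ξ[∅, 4] = ξ[∅, ∅]) hold automatically.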
As shown in Section 3.1, these steps can be expressed via scaled extension operations.
Next, we fill in the four remaining entries in ξ, that is ξ[∅, σ2] for σ2 ∈ {1, 2, 3, 4}, in accordance with the constraint ξ[1, σ2] + ξ[2, σ2] = ξ[∅, σ2]. In this step, we are not splitting any value; rather, we fill in ξ[∅, σ2] in the only possible way (that is, ξ[∅, σ2] = ξ[1, σ2] + ξ[2, σ2]), by means of a linear combination of already-filled-in entries. This operation can also be expressed via scaled extensions, with the singleton set {1}: {(ξ[1, σ2], ξ[2, σ2], ξ[∅, σ2])} = {(ξ[1, σ2], ξ[2, σ2])} ◁h {1}, where h : (ξ[1, σ2], ξ[2, σ2]) ↦ ξ[1, σ2] + ξ[2, σ2] (note that h respects the requirements of Definition 4).
This way, we have filled in all entries in ξ. However, only 9 out of the 11 constraints have been taken into account in the construction, and we still need to verify that the two leftover constraints ξ[∅, 1] + ξ[∅, 2] = ξ[∅, ∅] and ξ[∅, 3] + ξ[∅, 4] = ξ[∅, ∅] are automatically satisfied by our way of filling in the entries of ξ. Luckily, this is always the case: by construction, ξ[∅, 1] + ξ[∅, 2] = (ξ[1, 1] + ξ[1, 2]) + (ξ[2, 1] + ξ[2, 2]) = ξ[1, ∅] + ξ[2, ∅] = ξ[∅, ∅] (the proof for ξ[∅, 3] + ξ[∅, 4] is analogous). We summarize the construction steps pictorially in Figure 2 (right).

Remark 1.
A similar construction that starts by assigning values for ξ[∅, σ2] (σ2 ∈ {1, 2, 3, 4}) using the constraints ξ[∅, 1] + ξ[∅, 2] = ξ[∅, ∅] and ξ[∅, 3] + ξ[∅, 4] = ξ[∅, ∅], and then fills out ξ[σ1, σ2] for (σ1, σ2) ∈ {1, 2} × {1, 2, 3, 4}, would not have been successful: if (ξ[1, 1], ξ[1, 2]) and (ξ[1, 3], ξ[1, 4]) are filled in independently, there is no way of guaranteeing that ξ[1, 1] + ξ[1, 2] = ξ[1, 3] + ξ[1, 4] (= ξ[1, ∅]) as required by the constraints.

4.2 An Unfavorable Case that Cannot Happen in Games with No Chance Moves

We now show that there exist game instances in which the general approach used in the previous subsection fails. In particular, consider a relevant sequence pair (σ1, σ2) such that both σ1 and σ2 are parent sequences of two information sets of Player 1 and Player 2 respectively, and assume that all sequence pairs in the game are relevant. Then, no matter what the order of operations is, the situation described in Remark 1 cannot be avoided. Luckily, in two-player perfect-recall games with no chance moves, one can prove that this occurrence never happens (see Appendix D in the full paper for a proof):

Proposition 2. Consider a two-player perfect-recall game with no chance moves, and let (σ1, σ2) be a relevant sequence pair, let I1, I′1 be two distinct information sets of Player 1 such that σ(I1) = σ(I′1) = σ1, and let I2, I′2 be two distinct information sets of Player 2 such that σ(I2) = σ(I′2) = σ2. It is not possible that both I1 ⇌ I2 and I′1 ⇌ I′2.

In other words, if I1 ⇌ I2, then any pair of sequences (σ′1, σ′2) where σ′1 belongs to I′1 and σ′2 belongs to I′2 is irrelevant. As we show in the next subsection, this is enough to yield a polynomial-time algorithm to 'unroll' the process of filling in the entries of ξ ∈ Ξ in any two-player perfect-recall extensive-form game with no chance moves. The following definition is crucial for that algorithm:

Definition 5. Let (σ1, σ2) be a relevant sequence pair, and let I1 ∈ I1 be an information set for Player 1 such that σ(I1) = σ1. Information set I1 is called critical for σ2 if there exists at least one I2 ∈ I2 with σ(I2) = σ2 such that I1 ⇌ I2. (A symmetric definition holds for an I2 ∈ I2.)

It is a simple corollary of Proposition 2 that for any relevant sequence pair, at least one player has at most one critical information set for the opponent's sequence. We call such a player critical for that relevant sequence pair.

4.3 A Polynomial-Time Algorithm that Decomposes Ξ using Scaled Extensions

In this section, we present the central result of the paper: an efficient algorithm that expresses Ξ as a chain of scaled extensions of simpler sets.
In particular, as we have already seen in Section 4.1, each set in the decomposition is either a simplex (when splitting an already-filled-in entry) or the singleton set {1} (when summing already-filled-in entries and assigning the result to a new entry of ξ).

The algorithm consists of a recursive function, DECOMPOSE, which takes three arguments: a relevant sequence pair (σ1, σ2), a subset S of the set of all relevant sequence pairs, and a set D of vectors with entries indexed by the elements in S. S represents the set of indices of ξ that have already been filled in, while D is the set of all partially-filled-in correlation plans (see Section 4.1). The decomposition for the whole polytope Ξ is obtained by evaluating DECOMPOSE((∅, ∅), S = {(∅, ∅)}, D = {(1)}), which corresponds to the starting situation in which only the entry ξ[∅, ∅] has been filled in (with the value 1, as per Definition 3). Each call to DECOMPOSE returns a pair (S′, D′) of updated indices and partial vectors, to reflect the new entries that were filled in during the call. Each call to DECOMPOSE((σ1, σ2), S, D) works as follows:

• First, the algorithm finds one critical player for the relevant sequence pair (σ1, σ2) (see end of Section 4.2). Assume without loss of generality that Player 1 is critical (the other case is symmetric), and let I∗ ⊆ I1 be the set of critical information sets for σ2 that belong to Player 1. By definition of critical player, I∗ is either a singleton or the empty set.

• For each I ∈ I1 such that σ(I) = σ1 and I ▷◁ σ2, we:

– Fill in all entries {ξ[(I, a), σ2] : a ∈ A_I} by splitting ξ[σ1, σ2].
This is reflected by updating the set of filled-in indices S ← S ∪ {((I, a), σ2)} and extending D via a scaled extension: D ← D ◁^h ∆^{|A_I|}, where h extracts ξ[σ1, σ2] from any partially-filled-in vector.

– Then, for each a ∈ A_I we assign (S, D) ← DECOMPOSE(((I, a), σ2), S, D).

After this step, all the indices in {(σ′1, σ′2) : σ′1 ≻ σ1, σ′2 ⪰ σ2} ∪ {(σ1, σ2)} have been filled in, and none of the indices in {(σ1, σ′2) : σ′2 ≻ σ2} have been filled in yet.

• Finally, we fill out all indices in {(σ1, σ′2) : σ′2 ≻ σ2}. We do so by iterating over all information sets J ∈ I2 such that σ(J) ⪰ σ2 and σ1 ▷◁ J. For each such J, we split into two cases, according to whether I∗ = {I∗} (for some I∗ ∈ I1, as opposed to I∗ being empty) and J ⇌ I∗, or not:

– If I∗ = {I∗} and J ⇌ I∗, then for all a ∈ A_J we fill in the sequence pair ξ[σ1, (J, a)] by assigning its value in accordance with the constraint ξ[σ1, (J, a)] = Σ_{a∗ ∈ A_{I∗}} ξ[(I∗, a∗), (J, a)], via the scaled extension D ← D ◁^h {1}, where the linear function h maps a partially-filled-in vector to the value of Σ_{a∗ ∈ A_{I∗}} ξ[(I∗, a∗), (J, a)].

– Otherwise, we fill in the entries {ξ[σ1, (J, a)] : a ∈ A_J} by splitting the value ξ[σ1, σ(J)]. In other words, we let D ← D ◁^h ∆^{|A_J|}, where h extracts the entry ξ[σ1, σ(J)] from a partially-filled-in vector in D.

• At this point, all the entries corresponding to indices S̃ = {(σ′1, σ′2) : σ′1 ⪰ σ1, σ′2 ⪰ σ2} have been filled in, and we return (S ∪ S̃, D).

Every call to DECOMPOSE increases the cardinality of S by at least one. Since S is a subset of the set of relevant sequence pairs, and since the total number of relevant sequence pairs is polynomial in the input game tree size, the algorithm runs in polynomial time. See Appendix E in the full paper for pseudocode, as well as a proof of correctness of the algorithm. Since every change to D is done via scaled extensions (with either a simplex or the singleton set {1}), we conclude that:

Theorem 1. In a two-player perfect-recall EFG with no chance moves, the space of correlation plans Ξ can be expressed via a sequence of scaled extensions with simplexes and singleton sets:

    Ξ = {1} ◁^{h1} X1 ◁^{h2} X2 ◁^{h3} · · · ◁^{hn} Xn,    (1)

where, for i = 1, . . . , n, either Xi = ∆^{si} or Xi = {1}, and hi(·) = ⟨ai, ·⟩ is a linear function. Furthermore, an exact algorithm exists to compute such an expression in polynomial time.

We can recursively use Algorithm 1 on the expression (1) to obtain a regret minimizer for Ξ. The resulting algorithm, shown in Algorithm 3 of Appendix F in the full paper, is contingent on a choice of "local" regret minimizers RMi for each of the simplex domains ∆^{si} in (1). By virtue of Algorithm 1, if each local regret minimizer RMi for ∆^{si} runs in linear time (i.e., computes recommendations and observes losses by running an algorithm whose complexity is linear in si)³, then the overall regret minimization algorithm for Ξ runs in linear time in the number of relevant sequence pairs of the game. Furthermore, Proposition 1 immediately implies that if each RMi is Hannan consistent, then so is our overall algorithm for Ξ. Putting these observations together, we conclude:

Theorem 2.
For any two-player extensive-form game with no chance moves, there exists a Hannan consistent regret minimizer for Ξ that runs in linear time in the number of relevant sequence pairs.

5 Experimental Evaluation

We experimentally evaluate the scalability of our regret-minimization algorithm for computing an extensive-form correlated equilibrium. In particular, we implement a regret minimizer for the space of correlation plans by computing the structural decomposition of Ξ into a chain of scaled extensions (Section 4.3) and repeatedly applying the construction of Section 3.2. This regret minimizer is then used on the saddle-point formulation of an EFCE (Section 2.1) as explained in Section 2.3, with two modifications that are standard in the literature on regret minimization algorithms for game theory [Tammelin et al., 2015; Burch et al., 2019]: (i) alternating updates and (ii) linear averaging of the iterates⁴. We use regret-matching-plus [Tammelin et al., 2015] to minimize the regret over the simplex domains in the structural decomposition. These variants are known to be beneficial in the case of Nash equilibrium, and we observed the same for EFCE. We compare our algorithm to two known algorithms from the literature. The first is based on linear programming [von Stengel and Forges, 2008]. The second is a very recent subgradient descent algorithm for this problem [Farina et al., 2019c], which leverages a recent subgradient descent technique [Wang and Bertsekas, 2013]. All algorithms were run on a machine with 16 GB of RAM and an Intel i7 processor with 8 cores. We used the Gurobi commercial solver (while allowing it to use any number of threads) to solve the LP when evaluating the scalability of the LP-based method proposed by von Stengel and Forges [2008].

Game instances. We test the scalability of our algorithm in a benchmark game for EFCE that was recently proposed by Farina et al.
[2019b]: a parametric variant of the classical war game Battleship. Table 1 shows some statistics about the three game instances that we use, including the number of relevant sequence pairs in the game (Definition 2). 'Board size' refers to the size of the Battleship playfield; each player has a field of that size in which to place his ship. 'Num turns' refers to the maximum number of shots that each player can take (in turns). 'Ship length' is the length of the one ship that each player has. Despite the seemingly small board sizes and the presence of only one ship per player, the game trees for these instances are quite large, with each player having tens of thousands to millions of sequences.

Table 1: Game metrics for the different instances of the Battleship game we test on.

    Instance   Board size   Num turns   Ship length   |Σ1|    |Σ2|     Num. rel. seq. pairs
    Small      (3, 2)       3           1             15k     47k      3.89M
    Medium     (3, 2)       4           1             145k    306k     26.4M
    Large      (3, 2)       4           2             970k    2.27M    111M

Figure 3: Experimental results. The y-axis shows the maximum utility increase upon deviation.

Scalability of the Linear Programming Approach [von Stengel and Forges, 2008]. Only the small instance could be solved by Gurobi; see Figure 3 (left). (Out of the LP algorithms provided by Gurobi, the barrier method was faster than the primal- and dual-simplex methods.) On the medium and large instances, Gurobi was killed by the system for trying to allocate too much memory. Farina et al. [2019c] report that the large instance needs more than 500GB of memory in order for Gurobi to run.

³Linear-time regret minimizers for simplexes include regret-matching [Hart and Mas-Colell, 2000], regret-matching-plus [Tammelin et al., 2015], mirror descent, and follow-the-regularized-leader (e.g., Hazan [2016]).
⁴The linear average of n vectors ξ1, . . . , ξn is (Σⁿ_{t=1} t · ξt)/(Σⁿ_{t=1} t) = 2(Σⁿ_{t=1} t · ξt)/(n(n + 1)).
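As an implementation aside, the linear averaging of the iterates (footnote 4) does not require storing past iterates: the weighted average (Σ_t t · ξt)/(Σ_t t) can be maintained incrementally. A minimal sketch; the class name LinearAverager is ours, not from the paper:

```python
# Incremental form of the linear average (sum_t t*xi_t) / (sum_t t) from
# footnote 4. `LinearAverager` is an illustrative name, not from the paper.
class LinearAverager:
    def __init__(self, dim: int):
        self.avg = [0.0] * dim   # current linear average of the iterates seen
        self.weight = 0.0        # running sum 1 + 2 + ... + t = t(t+1)/2

    def observe(self, t: int, xi):
        """Fold in iterate xi_t with weight t (call with t = 1, 2, ...)."""
        self.weight += t
        c = t / self.weight      # fraction of total weight carried by xi_t
        self.avg = [(1.0 - c) * a + c * x for a, x in zip(self.avg, xi)]
        return self.avg
```

After n calls, self.weight equals n(n + 1)/2, matching the closed form 2(Σ_t t · ξt)/(n(n + 1)) in footnote 4.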
The Gurobi run time shown in Figure 3 does not include the time needed to construct and destruct the Gurobi LP objects, which is negligible.

Scalability of the Very Recent Subgradient Technique [Farina et al., 2019c]. The subgradient descent algorithm was able to solve the small and medium instances if the algorithm's step size was tuned well. An advantage of our technique is that it has no parameters to tune. Another issue is that the iterates of the subgradient algorithm are not feasible (they need not lie in Ξ), while ours are. Furthermore, on the large instance, the subgradient technique was already essentially unusable because each iteration took over an hour (mainly due to computing the projection).

Figure 3 shows the experimental performance of the subgradient descent algorithm. We used a step size of 10⁻³ in the small instance and of 10⁻⁶ in the medium instance. Since the iterates produced by the subgradient technique are not feasible, extra care has to be taken when comparing the performance of the subgradient method to that of our approach or Gurobi. Figure 5 in Appendix G in the full paper reports the infeasibility of the iterates produced by the subgradient technique over time.

Scalability of Our Approach. We implemented the structural decomposition algorithm of Section 4.3. Our parallel implementation using 8 threads has a runtime of 2 seconds on the small instance, 6 seconds on the medium instance, and 40 seconds on the large instance (each result was averaged over 10 runs). Finally, we evaluated the performance of the regret minimizer constructed according to Section 3.2; the results are in Figure 3 (left) for the small instance and Figure 3 (right) for the medium and large instances. The plots do not include the time needed to construct and destruct the regret minimizers in memory, which again is negligible.
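For concreteness, the local regret minimizer we run on each simplex domain of the decomposition, regret-matching-plus [Tammelin et al., 2015], can be sketched in its standard textbook form (class and method names are ours):

```python
# Standard regret-matching-plus on a single probability simplex
# (Tammelin et al., 2015); names are ours, for illustration only.
class RegretMatchingPlus:
    def __init__(self, dim: int):
        self.q = [0.0] * dim  # cumulative regrets, clipped at zero

    def recommend(self):
        """Next strategy: proportional to the positive cumulative regrets."""
        s = sum(self.q)
        if s <= 0.0:                      # no positive regret yet: play uniform
            return [1.0 / len(self.q)] * len(self.q)
        return [qi / s for qi in self.q]

    def observe_loss(self, loss):
        """Update regrets against the loss vector for the last recommendation."""
        x = self.recommend()
        expected = sum(li * xi for li, xi in zip(loss, x))
        # RM+: add instantaneous regrets (expected loss minus action loss),
        # clipping the accumulator at zero after each update
        self.q = [max(0.0, qi + (expected - li)) for qi, li in zip(self.q, loss)]
```

Both operations run in time linear in the simplex dimension, which is the property the linear-time construction behind Theorem 2 relies on.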
As expected, on the small instance, the rate\nof convergence of our regret minimizer (a \ufb01rst-order method) is slower than that of the barrier method\n(a second-order method). However, the barrier method incurs a large overhead at the beginning, since\nGurobi spends time factorizing the constraint matrix and computing a good ordering of variables for\nthe elimination tree. The LP-based approach could not solve the medium or large instance, while ours\ncould. Even on the largest instance, no more than 2GB of memory was reserved by our algorithm.\n\n6 Conclusions\n\nWe introduced the \ufb01rst ef\ufb01cient regret minimization algorithm for \ufb01nding an extensive-form correlated\nequilibrium in large two-player general-sum games with no chance moves. This is more challenging\nthan designing an algorithm for Nash equilibrium because the constraints that de\ufb01ne the space of\ncorrelation plans lack the hierarchical structure of sequential strategy spaces and might even form\ncycles. We showed that some of the constraints are redundant and can be excluded from consideration,\nand presented an ef\ufb01cient algorithm that generates the space of extensive-form correlation plans\nincrementally from the remaining constraints. We achieved this decomposition via a special convexity-\npreserving operation that we coined scaled extension. We showed that a regret minimizer can be\ndesigned for a scaled extension of any two convex sets, and that from the decomposition we then\nobtain a global regret minimizer. Our algorithm produces feasible iterates. 
Experiments showed that it significantly outperforms prior approaches—the LP-based approach and a very recent subgradient descent algorithm—and for larger problems it is the only viable option.

[Figure 3: left panel, small Battleship instance (Gurobi, Ours, Subgradient); right panel, medium and large Battleship instances (Ours, Subgradient). Both panels plot the maximum deviation against runtime in seconds.]

Acknowledgments

This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and the ARO under award W911NF-17-1-0082. Gabriele Farina is supported by a Facebook fellowship. Co-authors Ling and Fang are supported in part by a research grant from Lockheed Martin.

References

Itai Ashlagi, Dov Monderer, and Moshe Tennenholtz. On the value of correlation. Journal of Artificial Intelligence Research, 33:575–613, 2008.

Robert Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.

Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold'em poker is solved. Science, 347(6218), January 2015.

Noam Brown and Tuomas Sandholm. Safe and nested subgame solving for imperfect-information games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 689–699, 2017.

Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, page eaao1733, Dec. 2017.

Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2019.

Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker.
Science, 365(6456):885–890, 2019.

Noam Brown, Christian Kroer, and Tuomas Sandholm. Dynamic thresholding and pruning for regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2017.

Neil Burch, Matej Moravcik, and Martin Schmid. Revisiting CFR+ and alternating updates. Journal of Artificial Intelligence Research, 64:429–443, 2019.

Trevor Davis, Kevin Waugh, and Michael Bowling. Solving large extensive-form games with strategy constraints. In AAAI Conference on Artificial Intelligence (AAAI), 2019.

Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Regret minimization in behaviorally-constrained zero-sum games. In International Conference on Machine Learning (ICML), 2017.

Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Online convex optimization for sequential decision processes and extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), 2019.

Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Regret circuits: Composability of regret minimizers. In International Conference on Machine Learning (ICML), 2019.

Gabriele Farina, Chun Kai Ling, Fei Fang, and Tuomas Sandholm. Correlation in extensive-form games: Saddle-point formulation and benchmarks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.

Geoffrey J Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. In Proceedings of the 25th International Conference on Machine Learning, pages 360–367. ACM, 2008.

Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127–1150, 2000.

Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.

Wan Huang and Bernhard von Stengel. Computing an extensive-form correlated equilibrium in polynomial time.
In International Workshop on Internet and Network Economics (WINE), pages 506–513. Springer, 2008.

Wan Huang. Equilibrium computation for extensive games. PhD thesis, London School of Economics and Political Science, January 2011.

Albert Xin Jiang and Kevin Leyton-Brown. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347–359, 2015.

Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior, 14(2), 1996.

Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337), May 2017.

Christos H Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14, 2008.

I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3, 1962.

Shai Shalev-Shwartz and Yoram Singer. A primal-dual perspective of online learning algorithms. Machine Learning, 69(2-3):115–142, 2007.

Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas hold'em. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.

Bernhard von Stengel and Françoise Forges. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002–1022, 2008.

Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220–246, 1996.

Mengdi Wang and Dimitri P Bertsekas. Incremental constraint projection-proximal methods for nonsmooth convex optimization. SIAM J.
Optim. (to appear), 2013.

Martin Zinkevich, Michael Bowling, Michael Johanson, and Carmelo Piccione. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2007.

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In International Conference on Machine Learning (ICML), pages 928–936, Washington, DC, USA, 2003.