{"title": "On Strategy Stitching in Large Extensive Form Multiplayer Games", "book": "Advances in Neural Information Processing Systems", "page_first": 100, "page_last": 108, "abstract": "Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold'em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.", "full_text": "On Strategy Stitching in Large Extensive Form\n\nMultiplayer Games\n\nRichard Gibson and Duane Szafron\n\nDepartment of Computing Science, University of Alberta\n\nEdmonton, Alberta, T6G 2E8, Canada\n\n{rggibson | dszafron}@ualberta.ca\n\nAbstract\n\nComputing a good strategy in a large extensive form game often demands an ex-\ntraordinary amount of computer memory, necessitating the use of abstraction to\nreduce the game size. Typically, strategies from abstract games perform better in\nthe real game as the granularity of abstraction is increased. This paper investi-\ngates two techniques for stitching a base strategy in a coarse abstraction of the full\ngame tree, to expert strategies in \ufb01ne abstractions of smaller subtrees. 
We provide\na general framework for creating static experts, an approach that generalizes some\nprevious strategy stitching efforts. In addition, we show that static experts can cre-\nate strong agents for both 2-player and 3-player Leduc and Limit Texas Hold\u2019em\npoker, and that a speci\ufb01c class of static experts can be preferred among a number\nof alternatives. Furthermore, we describe a poker agent that used static experts\nand won the 3-player events of the 2010 Annual Computer Poker Competition.\n\n1 Introduction\n\nMany sequential decision-making problems are commonly modelled as an extensive form game.\nExtensive games are very versatile due to their ability to represent multiple agents, imperfect infor-\nmation, and stochastic events.\n\nFor many real-world problems, however, the extensive form game representation is too large to be\nfeasibly handled by current techniques. To address this limitation, strategies are often computed\nin abstract versions of the game that group similar states together into single abstract states. For\nvery large games, these abstractions need to be quite coarse, leaving many different states indistin-\nguishable. However, for smaller subtrees of the full game, strategies can be computed in much \ufb01ner\nabstractions. Such \u201cexpert\u201d strategies can then be pieced together, typically connecting to a \u201cbase\nstrategy\u201d computed in the full coarsely-abstracted game. A disadvantage of this approach is that\nwe may make assumptions about the other agents\u2019 strategies. In addition, by computing the base\nstrategy and the experts separately, we may lose \u201ccohesion\u201d among the different components.\n\nWe investigate stitched strategies in extensive form games, focusing on the trade-offs between the\nsizes of the abstractions versus the assumptions made and the cohesion among the computed strate-\ngies. 
We de\ufb01ne two strategy stitching techniques: (i) static experts that are computed in very \ufb01ne\nabstractions with varying degrees of assumptions and little cohesion, and (ii) dynamic experts that\nare contained in abstractions with lower granularity, but make fewer assumptions and have perfect\ncohesion. This paper generalizes previous strategy stitching efforts [1, 2, 11] under a more general\nstatic expert framework. We use poker as a testbed to demonstrate that, despite recent mixed results,\nstatic experts can create much stronger overall agents than the base strategy alone. Furthermore, we\nshow that under a \ufb01xed memory limitation, a speci\ufb01c class of static experts is preferred to several\nalternatives. As a \ufb01nal validation of these results, we describe entries to the 2010 Annual Computer\nPoker Competition1 (ACPC) that used static experts to win the 3-player events.\n\n2 Background\n\nAn extensive form game [9] is a rooted directed tree, where nodes represent decision states, edges\nrepresent actions, and terminal nodes hold end-game utility values for players. For each player, the\ndecision states are partitioned into information sets such that game states within an information set\nare indistinguishable to the player. Non-singleton information sets arise due to hidden information\nthat is only available to a subset of the players, such as private cards in poker. More formally:\n\nDe\ufb01nition 2.1 (Osborne and Rubinstein [9, p. 200]) A \ufb01nite extensive game \u0393 with imperfect information\nhas the following components:\n\n\u2022 A \ufb01nite set N of players.\n\u2022 A \ufb01nite set H of sequences, the possible histories of actions, such that the empty sequence is in\nH and every pre\ufb01x of a sequence in H is also in H. Z \u2286 H are the terminal histories (those\nwhich are not a pre\ufb01x of any other sequence). 
A(h) = {a | ha \u2208 H} are the actions available\nafter a nonterminal history h \u2208 H.\n\n\u2022 A function P that assigns to each nonterminal history h \u2208 H\\Z a member of N \u222a {C}. P is\nthe player function. P (h) is the player who takes an action after the history h. If P (h) = C,\nthen chance determines the action taken after history h. De\ufb01ne Hi := {h \u2208 H | P (h) = i}.\n\n\u2022 A function fC that associates with every history h for which P (h) = C a probability measure\nfC(\u00b7|h) on A(h) (fC(a|h) is the probability that a occurs given h), where each such probability\nmeasure is independent of every other such measure.\n\n\u2022 For each player i \u2208 N a partition Ii of Hi with the property that A(h) = A(h\u2032) whenever h\nand h\u2032 are in the same member of the partition. For I \u2208 Ii, we denote by A(I) the set of A(h)\nand by P (I) the player P (h) for any h \u2208 I. Ii is the information partition of player i; a set\nI \u2208 Ii is an information set of player i.\n\n\u2022 For each player i \u2208 N a utility function ui from the terminal histories Z to the real numbers\nR. If N = {1, 2} and u1 = \u2212u2, it is a 2-player zero-sum extensive game. De\ufb01ne \u2206u,i :=\nmaxz ui(z) \u2212 minz ui(z) to be the range of the utilities for player i.\n\nA strategy for player i, \u03c3i, is a function such that for each information set I \u2208 Ii, \u03c3i(I) is a\nprobability distribution over A(I). Let \u03a3i be the set of all strategies for player i. For h \u2208 I, we\nde\ufb01ne \u03c3i(h) := \u03c3i(I). A strategy pro\ufb01le \u03c3 consists of a strategy \u03c3i for each player i \u2208 N . 
We let\n\u03c3_{\u2212i} refer to all the strategies in \u03c3 except \u03c3_i, and denote u_i(\u03c3) to be the expected utility for player i\ngiven that all players play according to \u03c3.\n\nIn a 2-player zero-sum game, a best response to a player 1 strategy \u03c3_1 is a player 2 strategy\n\u03c3_2^{BR} = argmax_{\u03c3_2} u_2(\u03c3_1, \u03c3_2) (similarly for a player 2 strategy \u03c3_2). The best response value of \u03c3_1\nis u_2(\u03c3_1, \u03c3_2^{BR}), which measures the exploitability of \u03c3_1. The exploitability of a strategy tells us\nhow much that strategy loses to a worst-case opponent. Outside of 2-player zero-sum games, the\nworst-case scenario for player i would be for all other players to minimize player i\u2019s utility instead\nof maximizing their own. In large games, this value is dif\ufb01cult to compute since opponents cannot\nshare private information. Thus, we only investigate exploitability for 2-player zero-sum games.\n\nCounterfactual regret minimization (CFR) [14] is an iterative procedure for computing strategy pro\ufb01les\nin extensive form games. In 2-player zero-sum games, CFR produces an approximate Nash\nequilibrium pro\ufb01le. In addition, CFR strategies have also been found to compete very well in games\nwith more than 2 players [1]. CFR\u2019s memory requirements are proportional to the number of information\nsets in the game times the number of actions available at an information set.\n\nThe extensive form game representation of many real-world problems is too large to feasibly compute\na strategy directly. 
A common approach in these games is to \ufb01rst create an abstract game by\ncombining information sets into single abstract states or by disallowing certain actions:\n\n1http://www.computerpokercompetition.org\n\nFigure 1: (a) An abstraction of an extensive game, where states connected by a bold curve are in the\nsame information set and thin curves denote merged abstract information sets. In the unabstracted\ngame, player 1 cannot distinguish between whether chance generated b or c and player 2 cannot\ndistinguish between a and b. In the abstract game, neither player can distinguish between any of\nchance\u2019s outcomes. (b) An example of a game \u0393\u2032 derived from the unabstracted game \u0393 in (a) for a\ndynamic expert strategy. Here, the abstraction from (a) is used as the base abstraction, and the null\nabstraction is employed on the subtree with G1,1 = \u2205 and G2,1 = {al, bl, cl} (bold states).\n\nDe\ufb01nition 2.2 (Waugh et al. 
[12]) An abstraction for player i is a pair \u03b1_i = \u27e8\u03b1_i^I, \u03b1_i^A\u27e9, where\n\n\u2022 \u03b1_i^I is a partition of H_i de\ufb01ning a set of abstract information sets coarser than I_i (i.e., every\nI \u2208 I_i is a subset of some set in \u03b1_i^I), and\n\n\u2022 \u03b1_i^A is a function on histories where \u03b1_i^A(h) \u2286 A(h) and \u03b1_i^A(h) = \u03b1_i^A(h\u2032) for all histories h and\nh\u2032 in the same abstract information set. We will call this the abstract action set.\n\nThe null abstraction for player i is \u03c6_i = \u27e8I_i, A\u27e9. An abstraction \u03b1 is a set of abstractions \u03b1_i,\none for each player. Finally, for any abstraction \u03b1, the abstract game, \u0393^\u03b1, is the extensive game\nobtained from \u0393 by replacing I_i with \u03b1_i^I and A(h) with \u03b1_i^A(h) when P(h) = i, for all i \u2208 N.\n\nFigure 1a shows an example of an abstracted extensive form game with no action abstraction. By\nreducing the number of information sets, computing strategies in an abstract game with an algorithm\nsuch as CFR requires less memory than computing strategies in the real game.\nIntuitively, if a\nstrategy pro\ufb01le \u03c3 for the abstract game performs well in \u0393^\u03b1, and if \u03b1_i^I is de\ufb01ned such that merged\ninformation sets are \u201cstrategically similar,\u201d then \u03c3 is also likely to perform well in \u0393. Identifying\nstrategically similar information sets can be delicate though and typically becomes a domain-speci\ufb01c\ntask. 
Nevertheless, we often would like to have as much granularity in our abstraction as will \ufb01t in\nmemory to allow computed strategies to be as diverse as necessary.\n\n3 Strategy Stitching\n\nTo achieve abstractions with \ufb01ner granularity, a natural approach is to break the game up into sub-\ntrees, abstract each of the subtrees, and compute a strategy for each abstract subtree independently.\nWe introduce a formalism for doing so that generalizes Waugh et al.\u2019s strategy grafting [11] and two\npoker-speci\ufb01c methods described in Section 5. First, select a subset S \u2286 N of players. Secondly,\nfor each i \u2208 S, compute a base strategy \u03c3i for playing the full game. Next, divide the game into\nsubtrees:\n\nDe\ufb01nition 3.1 (Waugh et al. [11]) Gi = {Gi,0, Gi,1, ..., Gi,p} is a grafting partition for player i if\n\n\u2022 Gi is a partition of Hi (possibly containing empty parts),\n\u2022 \u2200I \u2208 Ii, \u2203j \u2208 {0, 1, ..., p} such that I \u2286 Gi,j, and\n\u2022 \u2200j \u2208 {1, 2, ..., p}, h \u2208 Gi,j, and h\u2032 \u2208 Hi, if h is a pre\ufb01x of h\u2032, then h\u2032 \u2208 Gi,j \u222a Gi,0.\n\nFor each i \u2208 S, choose a grafting partition Gi so that each partition has an equal number of parts p.\nThen, compute a strategy, or static expert, for each subtree using any strategy computation technique,\nsuch as CFR. 
Finally, since the subtrees are disjoint, create a static expert strategy by combining the\nstatic experts without any overlap to the base strategy in the undivided game:\n\nFigure 2: Two examples of a game \u0393^j for a static expert derived from the unabstracted game \u0393 in\nFigure 1a. In both (a) and (b), G2,j = {al, bl, cl} (bold states). If player 1 takes action r, player 2\nno longer controls his or her decisions. Player 2\u2019s actions are instead generated by the base strategy\n\u03c32, computed beforehand. In (a), we have S = {2}. On the other hand, in (b), S = N = {1, 2},\nG1,j = \u2205, and hence all of player 1\u2019s actions are seeded by the base strategy \u03c31.\n\nDe\ufb01nition 3.2 Let S \u2286 N be a nonempty subset of players. For each i \u2208 S, let \u03c3i be a strategy for\nplayer i and Gi = {Gi,0, Gi,1, ..., Gi,p} be a grafting partition for player i. For j \u2208 {1, 2, ..., p},\nde\ufb01ne \u0393^j to be an extensive game derived from the original game \u0393 where, for all i \u2208 S and\nh \u2208 Hi\\Gi,j, we set P (h) = C and fC (a|h) = \u03c3i(h, a). 
That is, each player i \u2208 S only controls\nactions for histories in Gi,j and is forced to play according to \u03c3i elsewhere. Let the static expert of\n{Gi,j | i \u2208 S}, \u03c3^j, be a strategy pro\ufb01le of the game \u0393^j. Finally, de\ufb01ne the static expert strategy for\nplayer i, \u03c3_i^S, as\n\n\u03c3_i^S(h, a) := \u03c3_i(h, a) if h \u2208 G_{i,0},\n\u03c3_i^S(h, a) := \u03c3_i^j(h, a) if h \u2208 G_{i,j}.\n\nWe call {\u03c3i | i \u2208 S} the base or seeding strategies and {Gi | i \u2208 S} the grafting pro\ufb01le for the\nstatic expert strategy \u03c3_i^S.\n\nFigure 2 shows two examples of a game \u0393^j for a single static expert. This may be the only subtree\nfor which a static expert is computed (p = 1), or there could be more subtrees contained in the\ngrafting partition(s) (p > 1). Under a \ufb01xed memory limitation, we can employ \ufb01ner abstractions for\nthe subtrees \u0393^j than we can in the full game \u0393. This is because \u0393^j removes some of the information\nsets belonging to players in S, freeing up memory for computing strategies on the subtrees.\n\nWhen |S| = 1, the static expert approach is identical to strategy grafting [11, De\ufb01nition 8], with\nthe exception that each static expert need not be an approximate Nash equilibrium. We relax the\nde\ufb01nition for static experts because Nash equilibria are dif\ufb01cult to compute in multiplayer games,\nand may not be the best solution concept outside of 2-player zero-sum games anyway. Choosing\n|S| > 1, however, is dangerous because we \ufb01x opponent probabilities and assume that our opponents\nare \u201cstatic\u201d at certain locations. For example, in Figure 2b, it may not be wise for player 2 to assume\nthat player 1 must follow \u03c31. Doing so can dramatically skew player 2\u2019s beliefs about the action\ngenerated by chance and hurt the expert\u2019s performance against opponents that do not follow \u03c31. 
As\nwe will see in Section 6, having more static experts with |S| > 1 can result in a more exploitable\nstatic expert strategy. On the other hand, by removing information sets for multiple players, the static\nexpert approach creates smaller subtrees than strategy grafting does. As a result, we can employ even\n\ufb01ner abstractions within the subtrees. Section 6 shows that despite the risks, the abstraction gains\noften lead to static experts with S = N being preferred.\n\nRegardless of the choice of S, the base strategy lacks \u201ccohesion\u201d with the static experts since its\ncomputation is based on its own play at the subtrees rather than the experts\u2019 play. Though the\nexperts are identically seeded, the base strategy may want to play towards the expert subtrees more\noften to increase utility. This observation motivates our introduction of dynamic experts that are\ncomputed concurrently with a base. The full extensive game is divided into subtrees and each\nsubtree is supplied its own abstraction:\n\nDe\ufb01nition 3.3 Let \u03b1^0, \u03b1^1, ..., \u03b1^p be abstractions for the game \u0393 and for each i \u2208 N, let G_i =\n{G_{i,0}, G_{i,1}, ..., G_{i,p}} be a grafting partition for player i satisfying I \u2229 G_{i,j} \u2208 {\u2205, I} for all j \u2208\n{0, ..., p} and I \u2208 \u03b1_i^{j,I}. Thus, each abstract information set is contained entirely in some part of\nthe grafting partition. Let \u0393\u2032 be the abstract game obtained from \u0393 by replacing I_i with \u22c3_{j=0}^p {I \u2208\n\u03b1_i^{j,I} | I \u2286 G_{i,j}} and A(h) with \u03b1_i^{j,A}(h) when P(h) = i and h \u2208 G_{i,j}, for all i \u2208 N. Let the\ndynamic expert strategy for player i, \u03c3\u2032_i, be a strategy for player i of the game \u0393\u2032. Finally de\ufb01ne the\ndynamic expert of G_{i,j}, \u03c3_i^j, to be \u03c3\u2032_i restricted to the histories in G_{i,j}, \u03c3\u2032_i|_{G_{i,j}}. The abstraction \u03b1^0\nis denoted as the base abstraction and the dynamic expert \u03c3_i^0 is denoted as the base strategy.\n\nFigure 1b contains an abstract game tree \u0393\u2032 for a dynamic expert strategy. We can view a dynamic\nexpert strategy as a strategy computed in an abstraction with differing granularity dependent on\nthe history of actions taken. Note that our de\ufb01nition is somewhat redundant to the de\ufb01nition of\nabstraction as we are simply de\ufb01ning a new abstraction for \u0393 based on the abstractions \u03b1^0, \u03b1^1, ..., \u03b1^p.\nNonetheless, we supply De\ufb01nition 3.3 to provide the terms in bold that we will use throughout.\n\nUnder memory constraints, a dynamic expert strategy typically sacri\ufb01ces abstraction granularity\nin the base strategy to achieve \ufb01ner granularity in the experts. We hope doing so achieves better\nperformance at parts of the game that we believe may be more important. For instance, importance\ncould depend on the predicted relative frequencies of reaching different subtrees. The base strategy\u2019s\nabstraction is reduced to guarantee perfect cohesion between the base and the experts; the base\nstrategy knows about the experts and can calculate its probabilities \u201cdynamically\u201d during strategy\ncomputation based on the feedback from the experts. In Section 6, we contrast static and dynamic\nexperts to compare this trade-off between abstraction size and strategy cohesion.\n\n4 Texas and Leduc Hold\u2019em\n\nA hand of Texas Hold\u2019em poker (or simply Hold\u2019em) begins with each player being dealt two private\ncards, and two players posting mandatory bets or blinds. 
There are four betting rounds, the pre-\ufb02op,\n\ufb02op, turn, and river where \ufb01ve community cards are successively revealed. Of the players that did\nnot fold, the player with the highest ranked poker hand wins all of the bets. Full rules can be found\non-line.2 We focus on the Limit Hold\u2019em variant that \ufb01xes the bet sizes and the number of bets\nallowed per round. We denote the players\u2019 actions as f (fold), c (check or call), and r (bet or raise).\nLeduc Hold\u2019em [10] (or simply Leduc) is a smaller version of Hold\u2019em, played with a six card deck\nconsisting of two Jacks, two Queens, and two Kings with only two betting rounds, pre-\ufb02op and \ufb02op.\nRather than using blinds, antes are posted by all players at the beginning of a hand. Only one private\ncard is dealt to each player and one community card is dealt on the \ufb02op.\n\nWhile Leduc is small enough to bypass abstraction, Hold\u2019em is a massive game in terms of the\nnumber of information sets; 2-player Limit Hold\u2019em has approximately 3 \u00d7 1014 information sets,\nand 3-player has roughly 5 \u00d7 1017. Applying CFR to these enormous state spaces necessitates\nabstraction. A common abstraction technique in poker is to group many different card dealings\ninto single abstract states or buckets. This is commonly done by ordering all possible poker hands\nfor a speci\ufb01c betting round according to some metric, such as expected hand strength (E[HS]) or\nexpected hand strength squared (E[HS2]), and then grouping hands with similar metric values into\nthe same bucket [7]. Percentile bucketing with N buckets and M hands puts the top M/N hands\ninto 1 bucket, the next best M/N into a second bucket, etc., so that the buckets are approximately\nequal in size. 
More advanced bucketing schemes that use multiple metrics and clustering techniques\nare possible, but our experiments use simple percentile bucketing with no action abstraction.\n\n2http://en.wikipedia.org/wiki/Texas_hold_'em\n\n5 Related Work\n\nOur general framework for applying static experts to any extensive form game captures some previous\npoker-speci\ufb01c strategy stitching approaches. First, the PsOpti family of agents [2], which play\n2-player Limit Hold\u2019em, contain a base strategy called the \u201cpre-\ufb02op model\u201d and 7 static experts with\nS = N, or \u201cpost-\ufb02op models.\u201d Due to resource and technology limitations, the abstractions used to\nbuild the pre-\ufb02op and post-\ufb02op models were quite coarse, making the family no match for today\u2019s\ntop agents. Secondly, Abou Risk and Szafron [1] attach 6 static experts with S = N (which they call\n\u201cheads-up experts\u201d) to a base strategy for playing 3-player Limit Hold\u2019em. Each expert focuses on\na subtree immediately following a fold action, allowing much \ufb01ner abstractions for these 2-player\nscenarios. However, their results were mixed as the stitched strategy was not always better than the\nbase strategy alone. Nonetheless, our positive results for static experts with S = N in Section 6\nprovide evidence that the PsOpti approach and heads-up experts are indeed credible.\n\nIn addition, Gilpin and Sandholm [5] create a poker agent for 2-player Limit Hold\u2019em that uses a\n2-phase strategy different from the approaches discussed thus far. The \ufb01rst phase is used to play the\npre-\ufb02op and \ufb02op rounds, and is computed similarly to the PsOpti pre-\ufb02op model. For the turn and\nriver rounds, a second phase strategy is computed on-line. One drawback of this approach is that the\non-line computations must be quick enough to play in real time. 
Despite \ufb01xing the \ufb02op cards, this\nconstraint forced the authors to still employ a very coarse abstraction during the second phase.\n\nFurthermore, there have been a few other related approaches to creating poker agents. While 2-\nplayer poker is well studied, Ganzfried and Sandholm [3, 4] developed algorithms for computing\nNash equilibria in multiplayer games and applied it to a small 3-player jam/fold poker game. Addi-\ntionally, Gilpin et al. [6] use an automated abstraction building tool to dynamically bucket hands in\n2-player Limit Hold\u2019em. Here, we are not concerned with equilibrium properties or the abstraction\nbuilding process itself. In fact, strategy stitching is orthogonal to both strategy computation and\nabstraction improvements, and could be used in conjunction with more sophisticated techniques.\n\n6 Empirical Evaluation\n\nIn this section, we create several stitched strategies in both Leduc and Hold\u2019em using the chance-\nsampled variant of CFR [14]. CFR is state of the art in terms of memory ef\ufb01ciency for strategy\ncomputation, allowing us to employ abstractions with higher granularity than otherwise possible.\nResults may differ with other techniques for computing strategies and building abstractions. While\nCFR requires iterations quadratic in the number of information sets to converge [14, Theorem 4],\nwe restrict our resources only in terms of memory. Even though Leduc is small enough to not\nnecessitate strategy stitching, the Leduc experiments were conducted to evaluate our hypothesis that\nstatic experts with S = N can improve play. We ran many experiments and for brevity, only a\nrepresentative sample of the results are summarized.\n\nTo be consistent with post-\ufb02op models [2] and heads-up experts [1], our grafting pro\ufb01les are de\ufb01ned\nonly in terms of the players\u2019 actions. 
For each history h \u2208 H, de\ufb01ne b := b(h) to be the subsequence\nof h obtained by removing all actions generated by chance. We refer to a b-expert for player i as an\nexpert constructed for the subtree Gi(b) := {h \u2208 Hi | b is a pre\ufb01x of b(h)} containing all histories\nwhere the players initially follow b. For example, the experts for the games in Figures 1b, 2a, and\n2b are l-experts because the game is split after player 1 takes action l.\n\nLeduc. Our Leduc experiments use three different base abstractions, one of which is simply the\nnull abstraction. The second and third abstractions are the \u201cJQ-K\u201d and \u201cJ-QK\u201d abstractions that, on\nthe pre-\ufb02op, cannot distinguish between whether the private card is a Jack or Queen, or whether the\nprivate card is a Queen or King respectively. In addition, these two abstractions can only distinguish\nbetween whether the \ufb02op card pairs with the private card or not rather than knowing the identity of\nthe \ufb02op card. Because Leduc is such a small game, we do not consider a \ufb01xed memory restriction\nand instead just compare the techniques within the same base abstraction.\n\nFor both 2-player and 3-player, for each of the three base abstractions, and for each player i, we\nbuild a base strategy, a dynamic expert strategy, an S = {i} static expert strategy, and two S = N\nstatic expert strategies. Recall choosing S = {i} means that during computation of each static\nexpert, we only \ufb01x player i\u2019s action probabilities outside of the expert subtree, whereas S = N\nmeans that we \ufb01x all players outside of the subtree. For 2-player Leduc, we use r, cr, ccr, and cccr-\nexperts for both players. Thus, the base strategy plays until the \ufb01rst raise occurs, at which point\nan expert takes over for the remainder of the hand. 
As an exception, only one of our two S = N\nstatic expert strategies, named \u201cAll,\u201d uses all four experts; the other, named \u201cPre-\ufb02op,\u201d just uses the\nr and cr-experts. For 3-player Leduc, we use r, cr, ccr, cccr, ccccr, and cccccr-experts, except the\n\u201cPre-\ufb02op\u201d static strategies use just the three experts r, cr, and ccr. The null abstraction is employed\non every expert subtree. Each run of CFR is stopped after 100 million iterations, which for 2-player\nyields strategies within a milli-ante of equilibrium in the abstract game.\n\nTable 1: The size, earnings, and exploitability of the 2-player (2p) Leduc strategies in the JQ-K base\nabstraction, and the size and earnings of the 3-player (3p) strategies in the J-QK base abstraction.\nThe sizes are measured in terms of the maximum number of information sets present within a single\nCFR computation. Earnings, as described in the text, and exploitability are in milli-antes per hand.\n\nStrategy (2p)         Size    Earns.   Exploit.\nBase                   132     24.73     496.31\nDynamic                444     45.75     159.84\nStatic.S={i}           226     28.87     167.61\nStatic.S=N.All         186     29.20     432.74\nStatic.S=N.Pre-\ufb02op    186     37.77     214.44\n\nStrategy (3p)         Size    Earns.\nBase                  1890    -68.46\nDynamic               6903    113.04\nStatic.S={i}          3017     96.14\nStatic.S=N.All        2145    117.01\nStatic.S=N.Pre-\ufb02op   2145    119.73\n\nEach strategy is evaluated against all combinations and orderings of opponent strategies where all\nstrategies use different base abstractions, and the scores are averaged together. For example, for\neach of our 2-player strategy pro\ufb01les \u03c3 in the JQ-K base abstraction, we compute 1/2 (u_1(\u03c3_1, \u03c3\u2032_2) +\nu_2(\u03c3\u2032_1, \u03c3_2)), averaged over all pro\ufb01les \u03c3\u2032 that use either the null or J-QK base abstraction. Leduc is\na small enough game that the utilities can be computed exactly. 
A selection of these scores, along\nwith 2-player exploitability values, are reported in Table 1.\n\nFirstly, by increasing abstraction granularity, all of the JQ-K strategies employing experts earn\nmore than the base strategy alone. Secondly, Dynamic and Static.S=N earn more overall than\nStatic.S={i}, despite the 2-player Static.S=N being more exploitable due to the opponent action\nassumptions. In fact, despite requiring much less memory to compute, Static.S=N surprisingly\nearns more than Dynamic in 3-player Leduc. Finally, we see that only using two pre-\ufb02op static\nexperts as opposed to all four reduces the number of dangerous assumptions to provide a stronger\nand less exploitable strategy. However, as expected, Dynamic and Static.S={i} are less exploitable.\n\nHold\u2019em. Our Hold\u2019em experiments enforce a \ufb01xed memory restriction per run of CFR, which\nwe arti\ufb01cially set to 24 million information sets for 2-player and 162 million information sets for\n3-player. We compute stitched strategies of each type using as many percentile E[HS2] buckets as\npossible within the restriction. Our 2-player abstractions distribute buckets as close to uniformly\nas possible across the betting rounds while remembering buckets from previous rounds (known as\n\u201cperfect recall\u201d). Our 3-player abstractions are similar, except they use 169 pre-\ufb02op buckets that are\nforgotten on later rounds (known as \u201cimperfect recall;\u201d see [1] and [13] for more regarding CFR and\nimperfect recall).\n\nFor 2-player, our dynamic strategy has just an r-expert, our S = {i} static strategy uses r, cr, ccr,\nand cccr-experts, and our S = N static strategy employs r and cr-experts. These choices were\nbased on preliminary experiments to make the most effective use of the limited memory available\nfor each stitching approach. 
Following Abou Risk and Szafron [1], our 3-player stitched strategies all have f, rf, rrf, and rcf-experts as these appear to be the most commonly reached 2-player scenarios [1, Table 4]. Our abstractions range quite dramatically in terms of number of buckets. For example, in 3-player, our dynamic strategy\u2019s base abstraction has just 8 river buckets with 7,290 river buckets for each expert, whereas our static strategies have 16 river buckets in the base abstraction with up to 194,481 river buckets for the S = N static rcf-expert abstraction. For reference, all of the 2-player base and experts are built from 720 million iterations of CFR, while we run CFR for 100 million and 5 billion iterations for the 3-player base and experts respectively.\n\nWe evaluate our 2-player strategies by playing 500,000 duplicate hands (players play both sides of the dealt cards) of poker between each pair of strategies. In addition to our base and stitched strategies, we also included a base strategy called \u201cBase.797M\u201d in an abstraction with over 797 million information sets that we expected to beat all of the strategies we were evaluating. Furthermore, using a specialized best response tool [8], we computed the exploitability of our 2-player strategies. For 3-player, we play 500,000 triplicate hands (each set of dealt cards played 6 times, one for each of the player seatings) between each combination of 3 strategies. We also included two other strategies: \u201cACPC-09,\u201d the 2009 ACPC 3-player event winner that did not use experts (Abou Risk and Szafron [1] call it \u201cIR16\u201d), and \u201cACPC-10,\u201d a static expert strategy that won a 3-player event at the 2010 ACPC and is outlined at the end of this section. 
The results are provided in Table 2.\n\nTable 2: Earnings and 95% confidence intervals over 500,000 duplicate hands of 2-player Hold\u2019em per pairing, and over 500,000 triplicate hands of 3-player Hold\u2019em per combination. The exploitability of the 2-player strategies is also provided. All values are in milli-big-blinds per hand.\n\nStrategy (2p) | Earnings | Exploitability\nBase | \u221210.47 \u00b1 1.99 | 310.04\nDynamic | \u22124.43 \u00b1 1.98 | 307.76\nStatic.S={i} | \u221213.13 \u00b1 2.00 | 301.00\nStatic.S=N | \u22124.57 \u00b1 1.95 | 288.82\nBase.797M | 32.59 \u00b1 2.14 | 135.43\n\nStrategy (3p) | Earnings\nBase | \u22126.09 \u00b1 0.71\nDynamic | \u22124.91 \u00b1 0.75\nStatic.S={i} | \u22125.20 \u00b1 0.70\nStatic.S=N | 3.06 \u00b1 0.70\nACPC-09 | \u221214.15 \u00b1 0.89\nACPC-10 | 27.29 \u00b1 0.86\n\nFirstly, in 2-player, we see that Static.S=N and Dynamic outperform Static.S={i} considerably, agreeing with the previous Leduc results. In fact, Static.S={i} fails to even improve upon the base strategy. For 3-player, Static.S=N is noticeably ahead of both Dynamic and Static.S={i} as it is the only strategy, aside from ACPC-10, to win money. By forcing one player to fold, the static experts with S = N essentially reduce the size of the game tree from a 3-player to a 2-player game, allowing many more buckets to be used. This result indicates that at least for poker, the gains in abstraction bucketing outweigh the risks of forced action assumptions and lack of cohesion between the base strategy and the experts. Furthermore, Static.S=N is slightly less exploitable in 2-player than the base strategy and the other two stitched strategies. While there are one and two opponent static actions assumed by the r and cr-experts respectively, trading these few assumptions for an increase in abstraction granularity is beneficial. 
In summary, static experts with S = N are preferred to both dynamic and static experts with S = {i} in the experiments we ran.\n\nAn additional validation of the quality of the static expert approach was provided by the 2010 ACPC. The winning entries in both 3-player events employed static experts with S = N. The base strategy, computed from 70 million iterations of CFR, used 169, 900, 100, and 25 buckets on each of the respective rounds. Four experts were used, f, rf, rrf, and rcf, computed from 10 billion iterations of CFR, each containing 169, 60,000, 180,000, and 26,160 buckets on the respective rounds. In addition, clustering techniques on strength distribution were used instead of percentile bucketing. Two strategies were created, where one was trained to play slightly more aggressively for the total bankroll event. Each version finished in first place in its respective competition.\n\n7 Conclusions\n\nWe discussed two strategy stitching techniques for extensive games, including static experts that generalize strategy grafting and some previous techniques used in poker. Despite the accompanying potential dangers and lack of cohesion, we have shown static experts with S = N outperform the dynamic and static experts with S = {i} that we considered, especially when memory limitations are present. However, additional static experts with several forced actions can lead to a more exploitable strategy. Static experts with S = N is currently our preferred method for creating multiplayer poker strategies and would be our first option for playing other large extensive games.\n\nFuture work includes finding a way to create more cohesion between the base strategy and static experts. One possibility is to rebuild the base strategy after the experts have been created so that the base strategy\u2019s play is more unified with the experts. 
In addition, we have yet to experiment with 3-player \u201chybrid\u201d static experts where |S| = 2. Finally, there are many ways to combine the stitching techniques described in this paper. One possibility is to use a dynamic expert strategy as a base strategy of a static expert strategy. In addition, static experts could themselves be dynamic expert strategies for the appropriate subtrees. Such combinations may produce even stronger strategies than those produced in this paper.\n\nAcknowledgments\n\nWe would like to thank Westgrid and Compute Canada for their computing resources that were used during this work. We would also like to thank the members of the Computer Poker Research Group at the University of Alberta for their helpful pointers throughout this project. This research was funded by NSERC and Alberta Ingenuity, now part of Alberta Innovates - Technology Futures.\n\nReferences\n\n[1] N. Abou Risk and D. Szafron. Using counterfactual regret minimization to create competitive multiplayer poker agents. In AAMAS, pages 159\u2013166, 2010.\n\n[2] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In IJCAI, pages 661\u2013668, 2003.\n\n[3] S. Ganzfried and T. Sandholm. Computing an approximate jam/fold equilibrium for 3-agent no-limit Texas Hold\u2019em tournaments. In AAMAS, 2008.\n\n[4] S. Ganzfried and T. Sandholm. Computing equilibria in multiplayer stochastic games of imperfect information. In IJCAI, 2009.\n\n[5] A. Gilpin and T. Sandholm. Better automated abstraction techniques for imperfect information games, with application to Texas Hold\u2019em poker. In AAMAS, 2007.\n\n[6] A. Gilpin, T. Sandholm, and T.B. S\u00f8rensen. Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold\u2019em poker. In AAAI, 2007.\n\n[7] M. Johanson. 
Robust strategies and counter-strategies: Building a champion level computer poker player. Master\u2019s thesis, University of Alberta, 2007.\n\n[8] M. Johanson, K. Waugh, M. Bowling, and M. Zinkevich. Accelerating best response calculation in large extensive games. In IJCAI, 2011. To appear.\n\n[9] M. Osborne and A. Rubinstein. A Course in Game Theory. The MIT Press, Cambridge, Massachusetts, 1994.\n\n[10] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, and C. Rayner. Bayes\u2019 bluff: Opponent modelling in poker. In UAI, pages 550\u2013558, 2005.\n\n[11] K. Waugh, M. Bowling, and N. Bard. Strategy grafting in extensive games. In NIPS-22, pages 2026\u20132034, 2009.\n\n[12] K. Waugh, D. Schnizlein, M. Bowling, and D. Szafron. Abstraction pathologies in extensive games. In SARA, pages 781\u2013788, 2009.\n\n[13] K. Waugh, M. Zinkevich, M. Johanson, M. Kan, D. Schnizlein, and M. Bowling. A practical use of imperfect recall. In SARA, pages 175\u2013182, 2009.\n\n[14] M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione. Regret minimization in games with incomplete information. In NIPS-20, pages 905\u2013912, 2008.", "award": [], "sourceid": 95, "authors": [{"given_name": "Richard", "family_name": "Gibson", "institution": null}, {"given_name": "Duane", "family_name": "Szafron", "institution": null}]}