{"title": "Learning Optimal Commitment to Overcome Insecurity", "book": "Advances in Neural Information Processing Systems", "page_first": 1826, "page_last": 1834, "abstract": "Game-theoretic algorithms for physical security have made an impressive real-world impact. These algorithms compute an optimal strategy for the defender to commit to in a Stackelberg game, where the attacker observes the defender's strategy and best-responds. In order to build the game model, though, the payoffs of potential attackers for various outcomes must be estimated; inaccurate estimates can lead to significant inefficiencies. We design an algorithm that optimizes the defender's strategy with no prior information, by observing the attacker's responses to randomized deployments of resources and learning his priorities. In contrast to previous work, our algorithm requires a number of queries that is polynomial in the representation of the game.", "full_text": "Learning Optimal Commitment to Overcome Insecurity\n\nAvrim Blum\n\nCarnegie Mellon University\n\nNika Haghtalab\n\nCarnegie Mellon University\n\navrim@cs.cmu.edu\n\nnika@cmu.edu\n\nAriel D. Procaccia\n\nCarnegie Mellon University\narielpro@cs.cmu.edu\n\nAbstract\n\nGame-theoretic algorithms for physical security have made an impressive real-\nworld impact. These algorithms compute an optimal strategy for the defender\nto commit to in a Stackelberg game, where the attacker observes the defender\u2019s\nstrategy and best-responds. In order to build the game model, though, the payoffs\nof potential attackers for various outcomes must be estimated; inaccurate esti-\nmates can lead to signi\ufb01cant inef\ufb01ciencies. We design an algorithm that optimizes\nthe defender\u2019s strategy with no prior information, by observing the attacker\u2019s re-\nsponses to randomized deployments of resources and learning his priorities. 
In contrast to previous work, our algorithm requires a number of queries that is polynomial in the representation of the game.\n\n1 Introduction\n\nThe US Coast Guard, the Federal Air Marshal Service, the Los Angeles Airport Police, and other major security agencies are currently using game-theoretic algorithms, developed in the last decade, to deploy their resources on a regular basis [13]. This is perhaps the biggest practical success story of computational game theory — and it is based on a very simple idea. The interaction between the defender and a potential attacker can be modeled as a Stackelberg game, in which the defender commits to a (possibly randomized) deployment of his resources, and the attacker responds in a way that maximizes his own payoff. The algorithmic challenge is to compute an optimal defender strategy — one that would maximize the defender’s payoff under the attacker’s best response.\n\nWhile the foregoing model is elegant, implementing it requires a significant amount of information. Perhaps the most troubling assumption is that we can determine the attacker’s payoffs for different outcomes. In deployed applications, these payoffs are estimated using expert analysis and historical data — but an inaccurate estimate can lead to significant inefficiencies. The uncertainty about the attacker’s payoffs can be encoded into the optimization problem itself, either through robust optimization techniques [12], or by representing payoffs as continuous distributions [5].\n\nLetchford et al. [8] take a different, learning-theoretic approach to dealing with uncertain attacker payoffs. Studying Stackelberg games more broadly (which are played by two players, a leader and a follower), they show that the leader can efficiently learn the follower’s payoffs by iteratively committing to different strategies, and observing the attacker’s sequence of responses. 
In the context of security games, this approach may be questionable when the attacker is a terrorist, but it is a perfectly reasonable way to calibrate the defender’s strategy for routine security operations when the attacker is, say, a smuggler. And the learning-theoretic approach has two major advantages over modifying the defender’s optimization problem. First, the learning-theoretic approach requires no prior information. Second, the optimization-based approach deals with uncertainty by inevitably degrading the quality of the solution, as, intuitively, the algorithm has to simultaneously optimize against a range of possible attackers; this problem is circumvented by the learning-theoretic approach.\n\nBut let us revisit what we mean by “efficiently learn”. The number of queries (i.e., observations of follower responses to leader strategies) required by the algorithm of Letchford et al. [8] is polynomial in the number of pure leader strategies. The main difficulty in applying their results to Stackelberg security games is that even in the simplest security game, the number of pure defender strategies is exponential in the representation of the game. For example, if each of the defender’s resources can protect one of two potential targets, there is an exponential number of ways in which resources can be assigned to targets.¹\n\nOur approach and results. We design an algorithm that learns an (additively) ε-optimal strategy for the defender with probability 1 − δ, by asking a number of queries that is polynomial in the representation of the security game, and logarithmic in 1/ε and 1/δ. Our algorithm is completely different from that of Letchford et al. [8]. 
Its novel ingredients include:\n\n• We work in the space of feasible coverage probability vectors, i.e., we directly reason about the probability that each potential target is protected under a randomized defender strategy. Denoting the number of targets by n, this is an n-dimensional space. In contrast, Letchford et al. [8] study the exponential-dimensional space of randomized defender strategies. We observe that, in the space of feasible coverage probability vectors, the region associated with a specific best response for the attacker (i.e., a specific target being attacked) is convex.\n\n• To optimize within each of these convex regions, we leverage techniques — developed by Tauman Kalai and Vempala [14] — for optimizing a linear objective function in an unknown convex region using only membership queries. In our setting, it is straightforward to build a membership oracle, but it is quite nontrivial to satisfy a key assumption of the foregoing result: that the optimization process starts from an interior point of the convex region. We do this by constructing a hierarchy of nested convex regions, and using smaller regions to obtain interior points in larger regions.\n\n• We develop a method for efficiently discovering new regions. In contrast, Letchford et al. [8] find regions (in the high-dimensional space of randomized defender strategies) by sampling uniformly at random; their approach is inefficient when some regions are small.\n\n2 Preliminaries\n\nA Stackelberg security game is a two-player general-sum game between a defender (or the leader) and an attacker (or the follower). In this game, the defender commits to a randomized allocation of his security resources to defend potential targets. The attacker, in turn, observes this randomized allocation and attacks the target with the best expected payoff. 
The defender and the attacker receive payoffs that depend on the target that was attacked and whether or not it was defended. The defender’s goal is to choose an allocation that leads to the best payoff.\n\nMore precisely, a security game is defined by a 5-tuple (T, D, R, A, U):\n\n• T = {1, . . . , n} is a set of n targets.\n\n• R is a set of resources.\n\n• D ⊆ 2^T is a collection of subsets of targets, each called a schedule, such that for every schedule D ∈ D, targets in D can be simultaneously defended by one resource. It is natural to assume that if a resource is capable of covering schedule D, then it can also cover any subset of D. We call this property closure under the subset operation; it is also known as “subsets of schedules are schedules (SSAS)” [7].\n\n• A : R → 2^D, called the assignment function, takes a resource as input and returns the set of all schedules that the resource is capable of defending. An allocation of resources is valid if every resource r is allocated to a schedule in A(r).\n\n• The payoffs of the players are given by functions U_d(t, p_t) and U_a(t, p_t), which return the expected payoffs of the defender and the attacker, respectively, when target t is attacked and it is covered with probability p_t (as formally explained below). We make two assumptions that are common to all papers on security games. First, these utility functions are linear. Second, the attacker prefers it if the attacked target is not covered, and the defender prefers it if the attacked target is covered, i.e., U_d(t, p_t) and U_a(t, p_t) are respectively increasing and decreasing in p_t. We also assume w.l.o.g. that the utilities are normalized to have values in [−1, 1]. If the utility functions have coefficients that are rational with denominator at most a, then the game’s (utility) representation length is L = n log n + n log a.\n\n¹Subsequent work by Marecki et al. [9] focuses on exploiting revealed information during the learning process — via Monte Carlo Tree Search — to optimize total leader payoff. While their method provably converges to the optimal leader strategy, no theoretical bounds on the rate of convergence are known.\n\nA pure strategy of the defender is a valid assignment of resources to schedules. The set of pure strategies is determined by T, D, R, and A. Let there be m pure strategies; we use the following n × m zero-one matrix M to represent the set of all pure strategies. Every row in M represents a target and every column represents a pure strategy. M_{ti} = 1 if and only if target t is covered using some resource in the ith pure strategy. A mixed strategy (hereinafter, called strategy) is a distribution over the pure strategies. To represent a strategy we use a 1 × m vector s, such that s_i is the probability with which the ith strategy is played, and Σ_{i=1}^{m} s_i = 1.\n\nGiven a defender’s strategy, the coverage probability of a target is the probability with which it is defended. Let s be a defender’s strategy; then the coverage probability vector is p^T = M s^T, where p_t is the coverage probability of target t. We call a probability vector implementable if there exists a strategy that imposes that coverage probability on the targets.\n\nLet p^s be the corresponding coverage probability vector of strategy s. The attacker’s best response to s is defined by b(s) = arg max_t U_a(t, p^s_t). Since the attacker’s best-response is determined by the coverage probability vector irrespective of the strategy, we slightly abuse notation by using b(p^s) to denote the best-response, as well. 
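To make these definitions concrete, here is a small, self-contained sketch (the matrix M, the mixed strategy, and the attacker utilities below are hypothetical values invented for illustration; they are not taken from the paper):

```python
# Hypothetical toy game: 3 targets and 3 pure strategies (one resource that
# defends exactly one target per pure strategy), so M is the 3x3 identity.
M = [
    [1, 0, 0],  # target 1 is covered only under pure strategy 1
    [0, 1, 0],  # target 2 is covered only under pure strategy 2
    [0, 0, 1],  # target 3 is covered only under pure strategy 3
]

def coverage(M, s):
    """Coverage probability vector p^T = M s^T: p_t is the probability
    that target t is defended under the mixed strategy s."""
    return [sum(row[i] * s[i] for i in range(len(s))) for row in M]

def best_response(p, Ua):
    """The attacker attacks a target maximizing U_a(t, p_t)."""
    return max(range(len(p)), key=lambda t: Ua(t, p[t]))

# Linear attacker utilities, decreasing in the coverage probability:
# value[t] if target t is attacked while undefended, 0 if it is defended.
value = [0.5, 1.0, 0.2]
Ua = lambda t, pt: value[t] * (1 - pt)

s = [0.2, 0.7, 0.1]       # a mixed strategy: sum(s) == 1
p = coverage(M, s)        # [0.2, 0.7, 0.1], since M is the identity here
t = best_response(p, Ua)  # expected payoffs approx. 0.4, 0.3, 0.18 -> index 0
```

Note that, exactly as in the text, the best response depends on s only through the coverage vector p, which is why one can reason in the n-dimensional space of coverage probabilities rather than the m-dimensional space of mixed strategies.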
We say that target t is “better” than t′ for the defender if the highest payoff he receives when t is attacked is more than the highest payoff he receives when t′ is attacked. We assume that if multiple targets are tied for the best-response, then ties are broken in favor of the “best” target.\n\nThe defender’s optimal strategy is defined as the strategy with highest expected payoff for the defender, i.e. arg max_s U_d(b(s), p^s_{b(s)}). An optimal strategy p is called conservative if no other optimal strategy has a strictly lower sum of coverage probabilities. For two coverage probability vectors we use q ⪯ p to denote that for all t, q_t ≤ p_t.\n\n3 Problem Formulation and Technical Approach\n\nIn this section, we give an overview of our approach for learning the defender’s optimal strategy when U_a is not known. To do so, we first review how the optimal strategy is computed in the case where U_a is known.\n\nComputing the defender’s optimal strategy, even when U_a(·) is known, is NP-hard [6]. In practice the optimal strategy is computed using two formulations: Mixed Integer Programming [11] and Multiple Linear Programs [1]; the latter provides some insight for our approach. The Multiple LP approach creates a separate LP for every t ∈ T. This LP, as shown below, solves for the optimal defender strategy under the restriction that the strategy is valid (second and third constraints) and the attacker best-responds by attacking t (first constraint). Among these solutions, the optimal strategy is the one where the defender has the highest payoff.\n\nmaximize U_d(t, Σ_{i:M_{ti}=1} s_i)\ns.t. ∀t′ ≠ t, U_a(t′, Σ_{i:M_{t′i}=1} s_i) ≤ U_a(t, Σ_{i:M_{ti}=1} s_i)\n Σ_i s_i = 1\n ∀i, s_i ≥ 0\n\nWe make two changes to the above LP in preparation for finding the optimal strategy in polynomially many queries, when U_a is unknown. First, notice that when U_a is unknown, we do not have an explicit definition of the first constraint. However, implicitly we can determine whether t has a better payoff than t′ by observing the attacker’s best-response to s. Second, the above LP has exponentially many variables, one for each pure strategy. However, given the coverage probabilities, the attacker’s actions are independent of the strategy that induces that coverage probability. So, we can restate the LP to use variables that represent the coverage probabilities and add a constraint that enforces the coverage probabilities to be implementable.\n\nmaximize U_d(t, p_t)\ns.t. t is attacked\n p is implementable (1)\n\nThis formulation requires optimizing a linear function over a region of the space of coverage probabilities, by using membership queries. We do so by examining some of the characteristics of the above formulation and then leveraging an algorithm introduced by Tauman Kalai and Vempala [14] that optimizes over a convex set, using only an initial point and a membership oracle. Here, we restate their result in a slightly different form.\n\nTheorem 2.1 [14, restated]. 
For any convex set H ⊆ R^n that is contained in a ball of radius R, given a membership oracle, an initial point with margin r in H, and a linear function ℓ(·), with probability 1 − δ we can find an ε-approximate optimal solution for ℓ in H, using O(n^{4.5} log(nR²/(rεδ))) queries to the oracle.\n\n4 Main Result\n\nIn this section, we design and analyze an algorithm that (ε, δ)-learns the defender’s optimal strategy in a number of best-response queries that is polynomial in the number of targets and the representation, and logarithmic in 1/ε and 1/δ. Our main result is:\n\nTheorem 1. Consider a security game with n targets and representation length L, such that for every target, the set of implementable coverage probability vectors that induce an attack on that target, if non-empty, contains a ball of radius 1/2^L. For any ε, δ > 0, with probability 1 − δ, Algorithm 2 finds a defender strategy that is optimal up to an additive term of ε, using O(n^{6.5}(log(n/(εδ)) + L)) best-response queries to the attacker.\n\nThe main assumption in Theorem 1 is that the set of implementable coverage probabilities for which a given target is attacked is either empty or contains a ball of radius 1/2^L. This implies that if it is possible to make the attacker prefer a target, then it is possible to do so with a small margin. This assumption is very mild in nature and its variations have appeared in many well-known algorithms. For example, interior point methods for linear optimization require an initial feasible solution that is within the region of optimization with a small margin [4]. Letchford et al. 
[8] make a similar assumption, but their result depends linearly, instead of logarithmically, on the minimum volume of a region (because they use uniformly random sampling to discover regions).\n\nTo informally see why such an assumption is necessary, consider a security game with n targets, such that an attack on any target but target 1 is very harmful to the defender. The defender’s goal is therefore to convince the attacker to attack target 1. The attacker, however, only attacks target 1 under a very specific coverage probability vector, i.e., the defender’s randomized strategy has to be just so. In this case, the defender’s optimal strategy is impossible to approximate.\n\nThe remainder of this section is devoted to proving Theorem 1. We divide our intermediate results into sections based on the aspect of the problem that they address. The proofs of most lemmas are relegated to the appendix; here we mainly aim to provide the structure of the theorem’s overall proof.\n\n4.1 Characteristics of the Optimization Region\n\nOne of the requirements of Theorem 2.1 is that the optimization region is convex. Let P denote the space of implementable probability vectors, and let P_t = {p : p is implementable and b(p) = t}. The next lemma shows that P_t is indeed convex.\n\nLemma 1. For all t ∈ T, P_t is the intersection of finitely many half-spaces.\n\nProof. P_t is defined by the set of all p ∈ [0, 1]^n such that there is s that satisfies the LP with the following constraints. There are m half-spaces of the form s_i ≥ 0, 2 half-spaces Σ_i s_i ≤ 1 and Σ_i s_i ≥ 1, 2n half-spaces of the form M s^T − p^T ≤ 0 and M s^T − p^T ≥ 0, and n − 1 half-spaces of the form U_a(t, p_t) − U_a(t′, p_{t′}) ≥ 0. Therefore, the set of (s, p) ∈ R^{m+n} such that p is implemented by strategy s and causes an attack on t is the intersection of 3n + m + 1 half-spaces. 
P_t is the reflection of this set on n dimensions; therefore, it is also the intersection of at most 3n + m + 1 half-spaces.\n\nLemma 1, in particular, implies that P_t is convex. The lemma’s proof also suggests a method for finding the minimal half-space representation of P. Indeed, the set S = {(s, p) ∈ R^{m+n} : valid strategy s implements p} is given by its half-space representation. Using the Double Description Method [2, 10], we can compute the vertex representation of S. Since P is a linear transformation of S, its vertex representation is the transformation of the vertex representation of S. Using the Double Description Method again, we can find the minimal half-space representation of P.\n\nNext, we establish some properties of P and the half-spaces that define it. The proofs of the following two lemmas appear in Appendices A.1 and A.2, respectively.\n\nLemma 2. Let p ∈ P. Then for any 0 ⪯ q ⪯ p, q ∈ P.\n\nLemma 3. Let A be a set of positive volume that is the intersection of finitely many half-spaces. Then the following two statements are equivalent.\n\n1. For all p ∈ A, p ⪰ ε. And for all ε ⪯ q ⪯ p, q ∈ A.\n\n2. A can be defined as the intersection of e_i · p ≥ ε for all i, and a set H of half-spaces, such that for any h · p ≥ b in H, h ⪯ 0, and b ≤ −ε.\n\nUsing Lemmas 2 and 3, we can refer to the set of half-spaces that define P by {(e_i, 0) : for all i} ∪ H_P, where for all (h∗, b∗) ∈ H_P, h∗ ⪯ 0, and b∗ ≤ 0.\n\n4.2 Finding Initial Points\n\nAn important requirement for many optimization algorithms, including the one developed by Tauman Kalai and Vempala [14], is having a “well-centered” initial feasible point in the region of optimization. 
There are two challenges involved in discovering an initial feasible point in the interior of every region. First, establishing that a region is non-empty, possibly by finding a boundary point. Second, obtaining a point that has a significant margin from the boundary. We carry out these tasks by executing the optimization in a hierarchy of sets, where at each level the optimization task only considers a subset of the targets and the feasibility space. We then show that optimization in one level of this hierarchy helps us find initial points in new regions that are well-centered in higher levels of the hierarchy.\n\nTo this end, let us define restricted regions. These regions are obtained by first perturbing the defining half-spaces of P so that they conform to a given representation length, and then trimming the boundaries by a given width (see Figure 1). In the remainder of this paper, we use γ = 1/((n+1)2^{L+1}) to denote the accuracy of the representation and the width of the trimming procedure for obtaining restricted regions. More precisely:\n\nDefinition 1 (restricted regions). The set R^k ⊆ R^n is defined by the intersection of the following half-spaces: For all i, (e_i, kγ). For all (h∗, b∗) ∈ H_P, a half-space (h, b + kγ), such that h = γ⌊(1/γ)h∗⌋ and b = γ⌈(1/γ)b∗⌉. Furthermore, for every t ∈ T, define R^k_t = R^k ∩ P_t.\n\nThe next lemma, whose proof appears in Appendix A.3, shows that the restricted regions are subsets of the feasibility space, so we can make best-response queries within them.\n\nLemma 4. 
For any k ≥ 0, R^k ⊆ P.\n\nThe next two lemmas, whose proofs are relegated to Appendices A.4 and A.5, show that in R^k one can reduce each coverage probability individually down to kγ, and the optimal conservative strategy in R^k indeed reduces the coverage probabilities of all targets outside the best-response set to kγ.\n\nLemma 5. Let p ∈ R^k, and let q be such that kγ ⪯ q ⪯ p. Then q ∈ R^k.\n\nLemma 6. Let s and its corresponding coverage probability p be a conservative optimal strategy in R^k. Let t∗ = b(s) and B = {t : U_a(t, p_t) = U_a(t∗, p_{t∗})}. Then for any t ∉ B, p_t = kγ.\n\nFigure 1: A security game with one resource that can cover one of two targets. The attacker receives utility 0.5 from attacking target 1 and utility 1 from attacking target 2, when they are not defended; he receives 0 utility from attacking a target that is being defended. The defender’s utility is the zero-sum complement.\n(a) Utilities of the game:\nTarget | Defender | Attacker\n1 | 0.5(1 − p_1) | −0.5(1 − p_1)\n2 | 1 − p_2 | −(1 − p_2)\n(b) Regions (plot): axes p_1 and p_2; shows the regions R^1_1, R^2_1, R^1_2, R^2_2 inside P_1 and P_2, the feasibility half-space p_1 + p_2 ≤ 1, the utility half-space boundary 0.5(1 − p_1) = 1 − p_2 separating attacks on targets 1 and 2, and the optimal strategy.\n\nThe following lemma, whose proof appears in Appendix A.6, shows that if every non-empty P_t contains a large enough ball, then R^n_t ≠ ∅.\n\nLemma 7. For any t and k ≤ n such that P_t contains a ball of radius r > 1/2^L, R^k_t ≠ ∅.\n\nThe next lemma provides the main insight behind our search for the region with the highest-paying optimal strategy. It implies that we can restrict our search to strategies that are optimal for a subset of targets in R^k, if the attacker also agrees to play within that subset of targets. At any point, if the attacker chooses a target outside the known regions, he is providing us with a point in a new region. Crucially, Lemma 8 requires that we optimize exactly inside each restricted region, and we show below (Algorithm 1 and Lemma 11) that this is indeed possible.\n\nLemma 8. Assume that for every t, if P_t is non-empty, then it contains a ball of radius 1/2^L. Given K ⊆ T and k ≤ n, let p ∈ R^k be the coverage probability of the strategy that has kγ probability mass on targets in T \ K and is optimal if the attacker were to be restricted to attacking targets in K. Let p∗ be the optimal strategy in P. If b(p) ∈ K then b(p∗) ∈ K.\n\nProof. Assume on the contrary that b(p∗) = t∗ ∉ K. Since P_{t∗} ≠ ∅, by Lemma 7, there exists p′ ∈ R^k_{t∗}. For ease of exposition, replace p with its corresponding conservative strategy in R^k. Let B be the set of targets that are tied for the attacker’s best-response in p, i.e. B = arg max_{t∈T} U_a(t, p_t). Since b(p) ∈ K and ties are broken in favor of the “best” target, i.e. t∗, it must be that t∗ ∉ B. Then, for any t ∈ B, U_a(t, p_t) > U_a(t∗, kγ) ≥ U_a(t∗, p′_{t∗}) ≥ U_a(t, p′_t). Since U_a is decreasing in the coverage probability, for all t ∈ B, p′_t > p_t. Note that there is a positive gap between the attacker’s payoff for attacking a best-response target versus another target, i.e. Δ = min_{t′∈K\B, t∈B} U_a(t, p_t) − U_a(t′, p_{t′}) > 0, so it is possible to increase p_t by a small amount without changing the best response. More precisely, since U_a is continuous and decreasing in the coverage probability, for every t ∈ B, there exists δ < p′_t − p_t such that for all t′ ∈ K \ B, U_a(t′, p_{t′}) < U_a(t, p′_t − δ) < U_a(t, p_t). Let q be such that for t ∈ B, q_t = p′_t − δ and for t ∉ B, q_t = p_t = kγ (by Lemma 6 and the fact that p was replaced by its conservative equivalent). By Lemma 5, q ∈ R^k. Since for all t ∈ B and t′ ∈ K \ B, U_a(t, q_t) > U_a(t′, q_{t′}), b(q) ∈ B. Moreover, because U_d is increasing in the coverage probability for all t ∈ B, U_d(t, q_t) > U_d(t, p_t). So, q has higher payoff for the defender when the attacker is restricted to attacking K. This contradicts the optimality of p in R^k. Therefore, b(p∗) ∈ K.\n\nIf the attacker attacks a target t outside the set of targets K whose regions we have already discovered, we can use the new feasible point in R^k_t to obtain a well-centered point in R^{k−1}_t, as the next lemma formally states.\n\nLemma 9. For any k and t, let p be any strategy in R^k_t. Define q such that q_t = p_t − γ/2 and for all i ≠ t, q_i = p_i + γ/(4√n). Then, q ∈ R^{k−1}_t and q has distance γ/2n from the boundaries of R^{k−1}_t.\n\nThe lemma’s proof is relegated to Appendix A.7.\n\n4.3 An Oracle for the Convex Region\n\nWe use a three-step procedure for defining a membership oracle for P or R^k_t. Given a vector p, we first use the half-space representation of P (or R^k) described in Section 4.1 to determine whether p ∈ P (or p ∈ R^k). We then find a strategy s that implements p by solving a linear system with constraints M s^T = p^T, 0 ⪯ s, and ‖s‖₁ = 1. Lastly, we make a best-response query to the attacker for strategy s. 
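As a rough illustration of this three-step oracle (a hypothetical sketch, not the paper’s implementation: the game is a toy one with a single resource that can cover any one target, so the half-space test and the linear system have closed forms, and the attacker is simulated locally rather than queried):

```python
value = [0.5, 1.0]  # hypothetical attacker values for two undefended targets

def implementable(p):
    # Step 1 (specialized): in this toy game the feasibility half-spaces of P
    # reduce to p_t >= 0 for all t and sum_t p_t <= 1.
    return all(x >= 0 for x in p) and sum(p) <= 1

def strategy_for(p):
    # Step 2 (specialized): solve M s^T = p^T, s >= 0, ||s||_1 = 1. Here M is
    # the identity plus an all-zero "idle" column, so s is p with the leftover
    # probability mass appended.
    return list(p) + [1 - sum(p)]

def attacker(s):
    # Simulated best-response query: the attacker observes s, derives the
    # coverage vector, and attacks argmax_t U_a(t, p_t) = value[t] * (1 - p_t).
    p = s[:len(value)]
    return max(range(len(value)), key=lambda t: value[t] * (1 - p[t]))

def membership_oracle(p, t):
    # Step 3: p lies in P_t iff p is implementable and a best-response query
    # on a strategy implementing p comes back with target t.
    return implementable(p) and attacker(strategy_for(p)) == t
```

For instance, membership_oracle([0.6, 0.2], 1) holds (the attacker prefers the lightly covered, high-value target with index 1), while membership_oracle([0.7, 0.7], 1) fails because that point is not implementable with a single resource.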
If the attacker responds by attacking t, then p ∈ P_t (or p ∈ R^k_t), else p ∉ P_t (or p ∉ R^k_t).\n\n4.4 The Algorithms\n\nIn this section, we define algorithms that use the results from previous sections to prove Theorem 1. First, we define Algorithm 1, which receives an approximately optimal strategy in R^k_t as input, and finds the optimal strategy in R^k_t. As noted above, obtaining exact optimal solutions in R^k_t is required in order to apply Lemma 8, thereby ensuring that we discover new regions when lucrative undiscovered regions still exist.\n\nAlgorithm 1 LATTICE-ROUNDING (approximately optimal strategy p)\n\n1. For all i ≠ t, make best-response queries to binary search for the smallest p′_i ∈ [kγ, p_i], up to accuracy 1/2^{5n(L+1)}, such that t = b(p′), where for all j ≠ i, p′_j ← p_j.\n2. For all i, set r_i and q_i respectively to the smallest and second smallest rational numbers with denominator at most 2^{2n(L+1)} that are larger than p′_i − 1/2^{5n(L+1)}.\n3. Define p∗ such that p∗_t is the unique rational number with denominator at most 2^{2n(L+1)} in [p_t, p_t + 1/2^{4n(L+1)}) (refer to the proof for uniqueness), and for all i ≠ t, p∗_i ← r_i.\n4. Query j ← b(p∗).\n5. If j ≠ t, let p∗_j ← q_j. Go to step 4.\n6. Return p∗.\n\nThe next two lemmas, whose proofs appear in Appendices A.8 and A.9, establish the guarantees of Algorithm 1. The first is a variation of a well-known result in linear programming [3] that is adapted specifically for our problem setting.\n\nLemma 10. Let p∗ be a basic optimal strategy in R^k_t; then for all i, p∗_i is a rational number with denominator at most 2^{2n(L+1)}.\n\nLemma 11. 
For any k and t, let p be a 1/2^{6n(L+1)}-approximate optimal strategy in R^k_t. Algorithm 1 finds the optimal strategy in R^k_t in O(nL) best-response queries.\n\nAt last, we are ready to prove our main result, which provides guarantees for Algorithm 2, given below.\n\nTheorem 1 (restated). Consider a security game with n targets and representation length L, such that for every target, the set of implementable coverage probability vectors that induce an attack on that target, if non-empty, contains a ball of radius 1/2^L. For any ε, δ > 0, with probability 1 − δ, Algorithm 2 finds a defender strategy that is optimal up to an additive term of ε, using O(n^{6.5}(log(n/(εδ)) + L)) best-response queries to the attacker.\n\nProof Sketch. For each K ⊆ T and k, the loop at step 5 of Algorithm 2 finds the optimal strategy if the attacker was restricted to attacking targets of K in R^k. Every time the IF clause at step 5a is satisfied, the algorithm expands the set K by a target t′ and adds x^{t′} to the set of initial points X, which is an interior point of R^{k−1}_{t′} (by Lemma 9). Then the algorithm restarts the loop at step 5. Therefore every time the loop at step 5 is started, X is a set of initial points in K that have margin γ/2n in R^k. This loop is restarted at most n − 1 times. We reach step 6 only when the best-response to the optimal strategy that only considers targets of K is in K. By Lemma 8, the optimal strategy is in P_t for some t ∈ K. By applying Theorem 2.1 to K,\n\nAlgorithm 2 OPTIMIZE (accuracy ε, confidence δ)\n\n1. γ ← 1/((n+1)2^{L+1}), δ′ ← δ/n², and k ← n.\n2. Use R, D, and A to compute oracles (half-spaces) for P, R^0, . . . , R^n.\n3. Query t ← b(kγ).\n4. K ← {t}, X ← {x^t}, where x^t_t = kγ − γ/2 and for i ≠ t, x^t_i = kγ + γ/(4√n).\n5. 
For t ∈ K,\n\n(a) If during steps 5b to 5e a target t′ ∉ K is attacked as a response to some strategy p:\ni. Let x^{t′}_{t′} ← p_{t′} − γ/2 and for i ≠ t′, x^{t′}_i ← p_i + γ/(4√n).\nii. X ← X ∪ {x^{t′}}, K ← K ∪ {t′}, and k ← k − 1.\niii. Restart the loop at step 5.\n(b) Use Theorem 2.1 with set of targets K. With probability 1 − δ′ find a q^t that is a 1/2^{6n(L+1)}-approximate optimal strategy restricted to set K.\n(c) Use the Lattice Rounding on q^t to find q^{t∗}, that is the optimal strategy in R^k_t restricted to K.\n(d) For all t′ ∉ K, q^{t∗}_{t′} ← kγ.\n(e) Query q^{t∗}.\n\n6. For all t ∈ K, use Theorem 2.1 to find p^{t∗} that is an ε-approximate strategy, with probability 1 − δ′, in P_t.\n7. Return the p^{t∗} that has the highest payoff to the defender.\n\nwith an oracle for P, using the initial set of points X, which has γ/2n margin in R^0, we can find the ε-optimal strategy with probability 1 − δ′. There are at most n² applications of Theorem 2.1 and each succeeds with probability 1 − δ′, so our overall procedure succeeds with probability 1 − n²δ′ ≥ 1 − δ.\n\nRegarding the number of queries, every time the loop at step 5 is restarted, |K| increases by 1. So, this loop is restarted at most n − 1 times. In a successful run of the loop for set K, the loop makes |K| calls to the algorithm of Theorem 2.1 to find a 1/2^{6n(L+1)}-approximate optimal solution. 
In each call, X has initial points with margin γ/2n and, furthermore, the total feasibility space is bounded by a sphere of radius √n (because of probability vectors), so each call makes O(n^{4.5}(log(n/δ) + L)) queries. The last call looks for an ε-approximate solution and takes another O(n^{4.5}(log(n/(εδ)) + L)) queries. In addition, our algorithm makes n^2 calls to Algorithm 1, for a total of O(n^3·L) queries. In conclusion, our procedure makes a total of O(n^{6.5}(log(n/(εδ)) + L)) = poly(n, L, log(1/(εδ))) queries.

5 Discussion

Our main result focuses on the query complexity of our problem. We believe that, indeed, best-response queries are our most scarce resource, and it is therefore encouraging that an (almost) optimal strategy can be learned with a polynomial number of queries.

It is worth noting, though, that some steps in our algorithm are computationally inefficient. Specifically, our membership oracle needs to determine whether a given coverage probability vector is implementable. We also need to explicitly compute the feasibility half-spaces that define P. Informally speaking, (worst-case) computational inefficiency is inevitable, because computing an optimal strategy to commit to is computationally hard even in simple security games [6].

Nevertheless, deployed security game algorithms build on integer programming techniques to achieve satisfactory runtime performance in practice [13]. While beyond the reach of theoretical analysis, a synthesis of these techniques with ours could yield truly practical learning algorithms for dealing with payoff uncertainty in security games.

Acknowledgments. This material is based upon work supported by the National Science Foundation under grants CCF-1116892, CCF-1101215, CCF-1215883, and IIS-1350598.

References

[1] V. Conitzer and T. Sandholm. Computing the optimal strategy to commit to.
In Proceedings of the 7th ACM Conference on Electronic Commerce (EC), pages 82–90, 2006.

[2] K. Fukuda and A. Prodon. Double description method revisited. In Combinatorics and Computer Science, pages 91–111. Springer, 1996.

[3] P. Gács and L. Lovász. Khachiyan's algorithm for linear programming. Mathematical Programming Studies, 14:61–68, 1981.

[4] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 2nd edition, 1993.

[5] C. Kiekintveld, J. Marecki, and M. Tambe. Approximation methods for infinite Bayesian Stackelberg games: Modeling distributional payoff uncertainty. In Proceedings of the 10th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 1005–1012, 2011.

[6] D. Korzhyk, V. Conitzer, and R. Parr. Complexity of computing optimal Stackelberg strategies in security resource allocation games. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), pages 805–810, 2010.

[7] D. Korzhyk, Z. Yin, C. Kiekintveld, V. Conitzer, and M. Tambe. Stackelberg vs. Nash in security games: An extended investigation of interchangeability, equivalence, and uniqueness. Journal of Artificial Intelligence Research, 41:297–327, 2011.

[8] J. Letchford, V. Conitzer, and K. Munagala. Learning and approximating the optimal strategy to commit to. In Proceedings of the 2nd International Symposium on Algorithmic Game Theory (SAGT), pages 250–262, 2009.

[9] J. Marecki, G. Tesauro, and R. Segal. Playing repeated Stackelberg games with unknown opponents. In Proceedings of the 11th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 821–828, 2012.

[10] T. S. Motzkin, H. Raiffa, G. L. Thompson, and R. M. Thrall. The double description method. Annals of Mathematics Studies, 2(28):51–73, 1953.

[11] P. Paruchuri, J. P. Pearce, J. Marecki, M. Tambe, F. Ordóñez, and S. Kraus. Playing games for security: An efficient exact algorithm for solving Bayesian Stackelberg games. In Proceedings of the 7th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 895–902, 2008.

[12] J. Pita, M. Jain, M. Tambe, F. Ordóñez, and S. Kraus. Robust solutions to Stackelberg games: Addressing bounded rationality and limited observations in human cognition. Artificial Intelligence, 174(15):1142–1171, 2010.

[13] M. Tambe. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2012.

[14] A. Tauman Kalai and S. Vempala. Simulated annealing for convex optimization. Mathematics of Operations Research, 31(2):253–266, 2006.