{"title": "Local Aggregative Games", "book": "Advances in Neural Information Processing Systems", "page_first": 5341, "page_last": 5351, "abstract": "Aggregative games provide a rich abstraction to model strategic multi-agent interactions. We focus on learning local aggregative games, where the payoff of each player is a function of its own action and the aggregate behavior of its neighbors in a connected digraph. We show the existence of a pure strategy epsilon-Nash equilibrium in such games when the payoff functions are convex or sub-modular. We prove an information theoretic lower bound, in a value oracle model, on approximating the structure of the digraph with non-negative monotone sub-modular cost functions on the edge set cardinality. We also introduce gamma-aggregative games that generalize local aggregative games, and admit epsilon-Nash equilibrium that are stable with respect to small changes in some specified graph property. Moreover, we provide estimation algorithms for the game theoretic model that can meaningfully recover the underlying structure and payoff functions from real voting data.", "full_text": "Local Aggregative Games\n\nVikas K. Garg\nCSAIL, MIT\n\nvgarg@csail.mit.edu\n\nTommi Jaakkola\n\nCSAIL, MIT\n\ntommi@csail.mit.edu\n\nAggregative games provide a rich abstraction to model strategic multi-agent interactions. We introduce\nlocal aggregative games, where the payoff of each player is a function of its own action and the\naggregate behavior of its neighbors in a connected digraph. We show the existence of a pure strategy\n\u0001-Nash equilibrium in such games when the payoff functions are convex or sub-modular. We prove\nan information theoretic lower bound, in a value oracle model, on approximating the structure of the\ndigraph with non-negative monotone sub-modular cost functions on the edge set cardinality. 
We also define a new notion of structural stability, and introduce γ-aggregative games that generalize local aggregative games and admit ε-Nash equilibria that are stable with respect to small changes in some specified graph property. Moreover, we provide algorithms for our models that can meaningfully estimate the game structure and the parameters of the aggregator function from real voting data.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

1 Introduction

Structured prediction methods have been remarkably successful in learning mappings between input observations and output configurations [1; 2; 3]. The central guiding formulation involves learning a scoring function that recovers the configuration as the highest scoring assignment. In contrast, in a game-theoretic setting, myopic strategic interactions among players lead to a Nash equilibrium or locally optimal configuration rather than the highest scoring global configuration. Learning games therefore involves, at best, enforcement of local consistency constraints, as recently advocated [4].

[4] introduced the notion of contextual potential games, and proposed a dual decomposition algorithm for learning these games from a set of pure strategy Nash equilibria. However, since their setting was restricted to learning undirected tree-structured potential games, it cannot handle (a) asymmetries in the strategic interactions, and (b) higher order interactions. Moreover, a wide class of strategic games (e.g. anonymous games [5]) do not admit a potential function, and thus locally optimal configurations do not coincide with pure strategy Nash equilibria. In such games, only the existence of (approximate) mixed strategy equilibria is guaranteed [6].

In this work, we focus on learning local aggregative games to address some of these issues. In an aggregative game [7; 8; 9], every player gets a payoff that depends only on its own strategy and the aggregate of all the other players' strategies. Aggregative games and their generalizations form a very rich class of strategic games that subsumes Cournot oligopoly, public goods, anonymous, mean field, and cost and surplus sharing games [10; 11; 12; 13]. In a local aggregative game, a player's payoff is a function of its own strategy and the aggregate strategy of its neighbors (i.e. only a subset of the other players). We do not assume that the interactions are symmetric or confined to a tree structure, so the game structure may, in general, be a spanning digraph, possibly with cycles.

We consider local aggregative games where each player's payoff is a convex or submodular Lipschitz function of the aggregate of its neighbors. We prove sufficient conditions under which such games admit a pure strategy ε-Nash equilibrium. We then prove an information-theoretic lower bound showing that, for a specified ε, approximating a game structure that minimizes a non-negative monotone submodular cost objective on the cardinality of the edge set may require exponentially many queries under a zero-order or value oracle model. Our result generalizes the approximability of the submodular minimum spanning tree problem [14] to degree-constrained spanning digraphs. We argue that this lower bound might be averted with a dataset of multiple ε-Nash equilibrium configurations sampled from the local aggregative game. We also introduce γ-aggregative games that generalize local aggregative games to accommodate the (relatively weaker) effect of players that are not neighbors. These games are shown to have a desirable stability property that makes their ε-Nash equilibria robust to small fluctuations in the aggregator input.
We formulate learning these games as optimization problems that can be efficiently solved via branch and bound, outer approximation decomposition, or extended cutting plane methods [17; 18]. The information-theoretic hardness results do not apply to our algorithms, since they have access to the (sub)gradients as well, unlike the value oracle model where only the function values may be queried. Our experiments strongly corroborate the efficacy of the local aggregative and γ-aggregative games in estimating the game structure on two real voting datasets, namely, the US Supreme Court Rulings and the Congressional Votes.

2 Setting

We consider an n-player game where each player i ∈ [n] ≜ {1, 2, . . . , n} plays a strategy (or action) from a finite set Ai. For any strategy profile a, ai denotes the strategy of the ith player, and a−i the strategies of the other players. We are interested in local aggregative games, which have the property that the payoff of each player i depends only on its own action and the aggregate action of its neighbors NG(i) = {j ∈ V(G) : (j, i) ∈ E(G)} in a connected digraph G = (V, E), where |V| = n. Since the graph is directed, the neighbor relation need not be symmetric, i.e., (j, i) ∈ E does not imply (i, j) ∈ E. For any strategy profile a, we will denote the strategy vector of the neighbors of player i by aNG(i). We assume that player i has a payoff function of the form ui(ai, fG(a, i)), where fG(a, i) ≜ f(aNG(i)) is a local aggregator function, and ui is convex and Lipschitz in the aggregate fG(a, i) for all ai ∈ Ai. Since fG(a, i) may take only finitely many values, we will assume interpolation between these values such that they form a convex set. We can define the Lipschitz constant of G as

δ(G) ≜ max over i, ai, a′−i, a′′−i of {ui(ai, fG(a′, i)) − ui(ai, fG(a′′, i))},   (1)

where the vectors a′−i and a′′−i differ in exactly one coordinate. Clearly, the payoff of any player in the network does not change by more than δ(G) when one of its neighbors changes its strategy. We can now talk about a class of aggregative games characterized by the Lipschitz constant:

L(Δ, n) = {G : |V(G)| = n, δ(G) ≤ Δ}.

A strategy profile a = (ai, a−i) is said to be a pure strategy ε-Nash equilibrium (ε-PSNE) if no player can improve its payoff by more than ε by unilaterally switching its strategy. In other words, no player i can gain more than ε by playing an alternative strategy a′i if the other players continue to play a−i. More generally, instead of playing deterministic actions in response to the actions of others, each player can randomize its actions. Then, the distributions over players' actions constitute a mixed strategy ε-Nash equilibrium if any unilateral deviation could improve the expected payoff by at most ε. We will prove the existence of ε-PSNE in our setting. We will assume a training set S = {a1, a2, . . . , aM}, where each am is an ε-PSNE sampled from our game. Our objective is to recover the game digraph G and the payoff functions ui, i ∈ [n], from the set S.

The rest of the paper is organized as follows. We first establish some important theoretical paraphernalia on local aggregative games in Section 3. In Section 4, we introduce γ-aggregative games and show that γ-aggregators are structurally stable. We formulate the learning problem in Section 5, and describe our experimental setup and results in Section 6.
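The definitions of this section (the local aggregator, the Lipschitz constant δ(G) of Eq. (1), and ε-PSNE) can be made concrete with a small brute-force sketch. The code below is purely illustrative: the averaging aggregator, the toy payoff u, and the 3-cycle digraph are our own choices for exposition, not constructions from the paper.

```python
from itertools import product

# Toy local aggregative game: binary actions, averaging aggregator over
# in-neighbors. nbrs[i] lists the j with (j, i) an edge. Illustrative only.
def avg(neighbor_actions):
    return sum(neighbor_actions) / len(neighbor_actions)

def payoff(i, a, nbrs, u):
    return u(a[i], avg([a[j] for j in nbrs[i]]))

def is_eps_psne(a, nbrs, u, eps):
    # No player gains more than eps by unilaterally switching its action.
    for i in range(len(a)):
        best_dev = max(payoff(i, a[:i] + (b,) + a[i+1:], nbrs, u) for b in (0, 1))
        if best_dev > payoff(i, a, nbrs, u) + eps:
            return False
    return True

def lipschitz_constant(n, nbrs, u):
    # delta(G) of Eq. (1): maximal payoff change when one neighbor flips.
    delta = 0.0
    for a in product((0, 1), repeat=n):
        for i in range(n):
            for j in nbrs[i]:
                b = a[:j] + (1 - a[j],) + a[j+1:]
                delta = max(delta, abs(payoff(i, a, nbrs, u) - payoff(i, b, nbrs, u)))
    return delta

# 3-cycle digraph; coordination-style payoff u(a_i, agg) = 1 - |a_i - agg|.
nbrs = {0: [2], 1: [0], 2: [1]}
u = lambda ai, agg: 1.0 - abs(ai - agg)
print(lipschitz_constant(3, nbrs, u))
print(is_eps_psne((1, 1, 1), nbrs, u, 0.0))
```

On the 3-cycle each player has a single in-neighbor, so flipping that neighbor flips the whole aggregate and δ(G) attains the maximal payoff swing of 1.0, consistent with the 1/k Lipschitz constant of the averaging aggregator at minimum degree k = 1; the all-ones profile is an exact PSNE.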
We state the theoretical results in the main text, and provide the detailed proofs in the Supplementary (Section 7) for improved readability.

3 Theoretical foundations

Any finite game is guaranteed to admit a mixed strategy ε-equilibrium due to a seminal result by Nash [6]. However, general games may not have any ε-PSNE (for small ε). We first prove a sufficient condition for the existence of ε-PSNE in local aggregative games with a small Lipschitz constant. A similar result holds when the payoff functions ui(·) are non-negative monotone submodular and Lipschitz (see the supplementary material for details).

Theorem 1. Any local aggregative game on a connected digraph G, where G ∈ L(Δ, n) and maxi |Ai| ≤ m, admits a 10Δ√(ln(8mn))-PSNE.

Proof. (Sketch.) The main idea behind the proof is to sample a random strategy profile from a mixed strategy Nash equilibrium of the game, and show that with high probability the sampled profile corresponds to an ε-PSNE when the Lipschitz constant is small. The proof is based on a novel application of Talagrand's concentration inequality.

Theorem 1 implies the minimum degree d (which depends on the number of players n, the local aggregator function A, the Lipschitz constant Δ, and ε) of the game structure that ensures the existence of at least one ε-PSNE. One example is the following local generalization of binary summarization games [8]. Each player i plays ai ∈ {0, 1} and has access to an averaging aggregator that computes the fraction of its neighbors playing action 1. Then, the Lipschitz constant of G is 1/k, where k is the minimum degree of the underlying game digraph. Consequently, an ε-PSNE is guaranteed for k = Ω(√(ln n)/ε). In other words, k needs to grow only slowly (i.e., sub-logarithmically) in the number of players n.

An important follow-up question is to determine the complexity of recovering the underlying game structure in a local aggregative game with an ε-PSNE. We will answer this question in a combinatorial setting with non-negative monotone submodular cost functions on the edge set cardinality. Specifically, we consider the following problem. Given a connected digraph G(V, E), a degree parameter d, and a submodular cost function h : 2^E → R+ that is normalized (i.e. h(∅) = 0) and monotone (i.e. h(S) ≤ h(T) for all S ⊆ T ∈ 2^E), we would like to find a spanning directed subgraph1 Gs of G such that h(E(Gs)) is minimized, the in-degree of each player is at least d, and Gs admits some ε-Nash equilibrium when players play to maximize their individual payoffs. We first establish a technical lemma that provides tight lower and upper bounds on the probability that a directed random graph is disconnected, and thus extends a similar result for Erdős–Rényi random graphs [25] to the directed setting. The lemma will be invoked while proving a bound for the recovery problem, and might be of independent interest beyond this work.

Lemma 2. Consider a directed random graph DG(n, p), where p ∈ (0, 1) is the probability of choosing any directed edge independently of the others. Define q = 1 − p. Let Pn be the probability that DG is connected. Then, the probability that DG is disconnected is 1 − Pn = nq^(2(n−1)) + O(n^2 q^(3n)).

We will now prove an information-theoretic lower bound for the recovery problem under the value oracle model [14].
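The leading term of Lemma 2 can be sanity-checked numerically. The following Monte Carlo sketch is our own illustration, not part of the proof: it samples DG(n, p), tests disconnection in the weak sense of footnote 1 (the underlying undirected graph is disconnected), and compares the empirical frequency against nq^(2(n−1)). The parameters n = 7, p = 0.3 are arbitrary.

```python
import random

# Monte Carlo sanity check of the leading term in Lemma 2 (illustrative).
def weakly_connected(n, edges):
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)   # ignore edge direction
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == n

def disconnect_prob(n, p, trials, seed=0):
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        edges = [(u, v) for u in range(n) for v in range(n)
                 if u != v and rng.random() < p]
        if not weakly_connected(n, edges):
            bad += 1
    return bad / trials

n, p = 7, 0.3
q = 1 - p
est = disconnect_prob(n, p, trials=5000)
approx = n * q ** (2 * (n - 1))   # leading term n * q^(2(n-1)) from Lemma 2
print(est, approx)
```

For these parameters the leading term is roughly 0.097; the empirical frequency is typically slightly larger, since splits into components of size two or more contribute the lower-order terms that Lemma 2 collects in the O(n^2 q^(3n)) remainder.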
A problem with an information-theoretic lower bound of β has the property that any randomized algorithm that approximates the optimum to within a factor β with high probability needs to make a superpolynomial number of queries under the specified oracle model. In the value oracle model, each query Q corresponds to obtaining the cost/value of a candidate set by issuing Q to the value oracle (which acts as a black box). We invoke Yao's minimax principle [28], which relates distributional complexity and randomized complexity. Using Yao's principle, the performance of randomized algorithms can be lower bounded by proving that no deterministic algorithm can perform well on an appropriately defined distribution of hard inputs.

Theorem 3. Let ε > 0, and α, δ ∈ (0, 1). Let n be the number of players in a local aggregative game, where each player i ∈ [n] is provided with some convex Δ-Lipschitz function ui and an aggregator A. Let Dn ≜ Dn(Δ, ε, A, (ui)i∈[n]) be the sufficient in-degree (number of incoming edges) of each player such that the game admits some ε-PSNE when the players play to maximize their individual payoffs ui according to the local information provided by the aggregator A. Assume any non-negative monotone submodular cost function on the edge set cardinality. Then for any d ≥ max{Dn, n^α ln n}/(1 − α), any randomized algorithm that approximates the game structure to a factor n^(1−α)/(1 + δ)d requires exponentially many queries under the value oracle model.

Proof. (Sketch.) The main idea is to construct a digraph that has exponentially many spanning directed subgraphs, and define two carefully designed submodular cost functions over the edges of the digraph, one of which is deterministic in the query size while the other depends on a distribution. We make it hard for a deterministic algorithm to tell one cost function from the other. This can be accomplished by ensuring two conditions: (a) the cost functions map to the same value on almost all queries, and (b) the discrepancy in the optimum value of the functions (on the optimum query) is massive. The proof invokes Lemma 2, exploits the degree constraint for ε-PSNE, argues about the optimal query size, and appeals to Yao's minimax principle.

1A spanning directed graph spans all the vertices, and has the property that the (multi)graph obtained by replacing the directed edges with undirected edges is connected.

Theorem 3 might sound pessimistic from a practical perspective; however, a closer look reveals why the query complexity turned out to be prohibitive. The proof hinged on the fact that all spanning subgraphs with the same edge cardinality that satisfied the sufficiency condition for the existence of an ε-PSNE were equally good with respect to our deterministic submodular function, and we created an instance with exponentially many such spanning subgraphs. However, we might be able to circumvent Theorem 3 by breaking the symmetry, e.g., by using data that specifies multiple distinct ε-Nash equilibria. Then, since the digraph instance would be required to satisfy these equilibria, fooling the deterministic algorithm would be more difficult. Thus data could, in principle, help us avoid the complexity result of Theorem 3.
We will formulate optimization problems that enforce margin separability on the equilibrium profiles, which further limits the number of potential digraphs and thus facilitates learning the aggregative game. Moreover, the hardness result does not apply to our estimation algorithms, which have access to the (sub)gradients in addition to the function values.

4 γ-Aggregative Games

We now describe a generalization of local aggregative games, which we call γ-aggregative games. The main idea behind these games is that a player i ∈ [n] may often be influenced not only by the aggregate behavior of its neighbors, but also, to a lesser extent, by the aggregate behavior of the other players, whose influence on the payoff of i decreases as their distance to i increases. Let dG(i, j) be the number of intermediate nodes on a shortest path from j to i in the underlying digraph G = (V, E). That is, dG(i, j) = 0 if (j, i) ∈ E, and dG(i, j) = 1 + min over k ∈ V \ {i, j} of [dG(i, k) + dG(k, j)] otherwise. Let WG ≜ max over i, j ∈ V of dG(i, j) be the width of G. For any strategy profile a ∈ {0, 1}^n and t ∈ {0, 1, . . . , WG}, let I^t_G(i) = {j : dG(i, j) = t} be the set of nodes that have exactly t intermediaries on a shortest path to i, and let a_{I^t_G(i)} be a strategy profile of the nodes in this set. We define aggregator functions f^t_G(a, i) ≜ f(a_{I^t_G(i)}) that return the aggregate at level t with respect to player i. Let γ ∈ (0, 1) be a discount rate. Define the γ-aggregator function

gG(a, γ, ℓ, i) ≜ ( Σ_{t=0}^{ℓ} γ^t f^t_G(a, i) ) / ( Σ_{t=0}^{ℓ} γ^t ),

which discounts the aggregates based on the distance ℓ ∈ {0, 1, . . . , WG} to i. We assume that player i ∈ [n] has a payoff function of the form ui(ai, ·), which is convex and η-Lipschitz in its second argument for each fixed ai. Finally, we define the Lipschitz constant of the γ-aggregative game as

δγ(G) ≜ max over i, ai, a′−i, a′′−i of {ui(ai, gG(a′, γ, WG, i)) − ui(ai, gG(a′′, γ, WG, i))},

where the vectors a′−i and a′′−i differ in exactly one coordinate.

The main criticism of the concept of ε-Nash equilibrium concerns its lack of stability: if any player deviates (due to an ε-incentive), then in general some other player may have a high incentive to deviate as well, resulting in a non-equilibrium profile. Worse, it may take exponentially many steps to reach an ε-equilibrium again. Thus, the stability of an ε-equilibrium is an important consideration. We will now introduce an appropriate notion of stability, and prove that γ-aggregative games admit stable pure strategy ε-equilibria, in the sense that a deviation by one player does not affect the equilibrium much.

Structurally Stable Aggregator (SSA): Let G = (V, E) be a connected digraph and PG(w) be a property of G, where w denotes the parameters of PG. Let A be an aggregator function that depends on PG. Let M = (a1, a2, . . . , an) be an ε-PSNE when A aggregates information according to PG(w), where ai is the strategy of player i ∈ V = [n]. Suppose now that A aggregates information according to PG(w′). Then, A is an (α, β)_{P,w,w′}-structurally stable aggregator (SSA) with respect to G, where α and β are functions of the gap between w and w′, if it satisfies these conditions: (a) M is an (ε + α)-equilibrium under PG(w′), and (b) the payoff of each player at the equilibrium profile M under PG(w′) is at most β = O(α) worse than that under PG(w).

An SSA with small values of α and β with respect to a small change in w is desirable, since that would discourage the players from deviating from their ε-equilibrium strategies; however, such an aggregator need not exist in general. The following result shows that the γ-aggregator is an SSA.
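A direct implementation of the γ-aggregator is a short exercise. The sketch below uses the mean as the level-wise aggregator f and a BFS over incoming edges to compute the intermediary counts dG(i, j); the path digraph and all names are illustrative assumptions, and skipping empty levels in the normalization is one possible convention that the paper does not pin down.

```python
from collections import deque

# Sketch of the gamma-aggregator g_G(a, gamma, l, i), with the mean as the
# level-wise aggregator f. The digraph is given by in-neighbor lists, and
# d_G(i, j) counts intermediaries on a shortest j -> i path, so direct
# in-neighbors sit at distance 0. Illustrative code only.
def intermediary_dist(in_nbrs, i):
    # BFS backwards from i over incoming edges; hops minus one = intermediaries.
    dist, q = {i: -1}, deque([i])
    while q:
        v = q.popleft()
        for j in in_nbrs[v]:
            if j not in dist:
                dist[j] = dist[v] + 1
                q.append(j)
    del dist[i]
    return dist   # maps j -> d_G(i, j)

def gamma_aggregate(a, in_nbrs, gamma, l, i):
    dist = intermediary_dist(in_nbrs, i)
    num = den = 0.0
    for t in range(l + 1):
        level = [j for j, d in dist.items() if d == t]
        if level:                        # f^t_G: mean action at level t
            num += gamma ** t * sum(a[j] for j in level) / len(level)
            den += gamma ** t            # empty levels skipped (a convention)
    return num / den if den else 0.0

# Path digraph 0 -> 1 -> 2: player 2 sees player 1 at level 0, player 0 at level 1.
in_nbrs = {0: [], 1: [0], 2: [1]}
a = (1, 0, 1)
print(gamma_aggregate(a, in_nbrs, gamma=0.5, l=1, i=2))  # (0 + 0.5*1)/(1 + 0.5) = 1/3
```

With γ = 0.5 and ℓ = 1, player 2 blends the level-0 aggregate (player 1's action, 0) with the discounted level-1 aggregate (player 0's action, 1), giving 1/3; shrinking γ pulls the value back toward the purely local aggregate.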
Then Fi(h) (cid:44) h(|Ni|)\nis submodular since the concave transformation of the cardinality function results in a submodular\ni\u2208[n] Fi(h) is submodular since it is a sum of submodular functions.\nWe will use F (h) as a sparsity-inducing prior. Several choices of h have been advocated in the\nliterature, including suitably normalized geometric, log, smooth log and square root functions [15].\nWe would denote the parameters of the aggregator function f by \u03b8f . The payoff functions will depend\non the choice of this parameterization. For a \ufb01xed aggregator f (such as the sum aggregator), linear\nparameterization is one possibility, where the payoff function for player i \u2208 [n] takes the form,\n\nfunction. Moreover F (h) =(cid:80)\n\nuf\ni (am, Ni\u00b7) = am\n\ni wi1(wf f (am\nNi\n\n) + bf ) + (1 \u2212 am\n\ni )wi0(wf f (am\nNi\n\n) + bf ),\n\nwhere wi\u00b7 = (wi0, wi1)(cid:62) and Ni\u00b7 denote the independent parameters for player i and \u03b8f = (wf , bf )(cid:62)\nare the shared parameters. Our setting is \ufb02exible, and we can easily accommodate more complex\naggregators instead of the standard aggregators (e.g. sum). Exchangeable functions over sets [16]\nprovide one such example. An interesting instantiation is a neural network comprising one hidden\nlayer, an output sum layer, with tied weights. Speci\ufb01cally, let W \u2208 Rn\u00d7(n\u22121) where all entries of W\nare equal to wN N . Let \u03c3 be an element-wise non-linearity (e.g. we used the ReLU function, \u03c3(x) =\nmax{x, 0} for our experiments). 
Then, using the element-wise multiplication operator (cid:12) and a vector\n1 with all ones, ui may be expressed as ufN N\n),\ni )wi0fN N (am\nNi\nwhere the permutation invariant neural aggregator, parameterized by \u03b8fN N = (wN N , bN N )(cid:62),\n\ni wi1fN N (am\nNi\n\n(am, Ni\u00b7) = am\n\n)+(1\u2212am\n\ni\n\nfN N (am\nNi\n\n) = 1(cid:62)\u03c3(W am\u2212i (cid:12) Ni\u00b7 + bN N ).\n\nWe could have more complex functions such as deeper neural nets, with parameter sharing, at\nthe expense of increased computation. We believe this versatility makes local aggregative games\nparticularly attractive, and provides a promising avenue for modeling structured strategic settings.\nEach am is an \u0001-PSNE, so it ensures a locally (near) optimal reward for each player. We will impose\na margin constraint on the difference in the payoffs when player i unilaterally deviates from am\ni .\nNote that Ni = {j \u2208 Ni\u00b7 : Nij = 1}. Then, introducing slack variables \u03bem\ni , and hyperparameters\nC, C(cid:48), Cf > 0, we obtain the following optimization problem in O(n2) variables:\nC\nM\n\nn(cid:88)\n\nFi(h) +\n\nmin\n\n\u03bem\ni\n\n1\n2\n\ni=1\n\nCf\n2M\n\n||wi\u00b7||2 +\ni (am, Ni\u00b7) \u2212 uf\nuf\n\nC(cid:48)\nn\n\nn(cid:88)\n\nn(cid:88)\nM(cid:88)\n||\u03b8f||2 +\ni (1 \u2212 am, Ni\u00b7) \u2265 e(am, a(cid:48)) \u2212 \u03bem\nNi\u00b7 \u2208 {0, 1}n\u22121,\n\ni \u2265 0\n\u03bem\n\ni=1\n\ni=1\n\ni\n\nm=1\n\ns.t.\n\n\u03b8f ,w1\u00b7,...,wn\u00b7,Ni\u00b7,...,Nn\u00b7,\u03be\n\u2200i \u2208 [n], m \u2208 [M ] :\n\u2200i \u2208 [n], m \u2208 [M ] :\n\u2200i \u2208 [n] :\n\nwhere am and a(cid:48) differ in exactly one coordinate, and e is a margin speci\ufb01c loss term, such as\nHamming loss eH (a, \u02dca) = 1{a (cid:54)= \u02dca} or scaled 0-1 loss es(a, \u02dca) = 1{a (cid:54)= \u02dca}/n. 
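To make the margin constraints concrete, the sketch below evaluates the linear parameterization with the sum aggregator and returns the total slack Σ ξm_i that a candidate structure N requires on a toy training set. Everything here (the data, the candidate digraph, the weights) is an illustrative assumption rather than the paper's experimental setup; searching over N and the weights is left to the mixed-integer methods discussed in Section 5.

```python
# Sketch of the margin constraints in the learning formulation, using the
# linear parameterization and the sum aggregator. The tiny dataset and the
# candidate structure below are illustrative, not the paper's setup.
def payoff_linear(i, a, Ni, w_i0, w_i1, w_f, b_f):
    agg = sum(a[j] for j in range(len(a)) if j != i and Ni[j])  # sum aggregator
    return (a[i] * w_i1 + (1 - a[i]) * w_i0) * (w_f * agg + b_f)

def margin_violations(S, N, w0, w1, w_f, b_f, loss):
    # For every training profile a^m and player i, the played action must
    # beat the flipped action by the margin loss; otherwise a slack
    # xi^m_i > 0 is needed. Returns the total slack (sum of violations).
    slack = 0.0
    for a in S:
        for i in range(len(a)):
            flipped = a[:i] + (1 - a[i],) + a[i+1:]
            gap = (payoff_linear(i, a, N[i], w0[i], w1[i], w_f, b_f)
                   - payoff_linear(i, flipped, N[i], w0[i], w1[i], w_f, b_f))
            slack += max(0.0, loss(a, flipped) - gap)
    return slack

es = lambda a, b: (1.0 if a != b else 0.0) / len(a)   # scaled 0-1 loss
S = [(1, 1, 0), (0, 0, 1)]                            # two observed profiles
N = {0: [0, 1, 0], 1: [1, 0, 0], 2: [0, 0, 0]}        # candidate in-neighbor rows
w0, w1 = [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]
print(margin_violations(S, N, w0, w1, w_f=1.0, b_f=0.5, loss=es))
```

A structure and weights with zero total slack certify that every training profile is margin-separated; the positive slack returned here simply shows that this particular candidate does not explain both profiles as ε-PSNE.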
From a game-theoretic perspective, the scaled loss has a natural asymptotic interpretation: as the number of players n → ∞, es(am, a′) → 0, and we get ∀i ∈ [n], m ∈ [M] : uf_i(am, Ni·) ≥ uf_i(1 − am, Ni·) − ξm_i, i.e., each training example am is an ε-PSNE, where ε = max over i ∈ [n], m ∈ [M] of ξm_i.

Once θf is fixed, the problem clearly becomes separable, i.e., each player i can solve an independent sub-problem in O(n) variables. Each sub-problem includes both continuous and binary variables, and may be solved via branch and bound, outer approximation decomposition, or extended cutting plane methods (see [17; 18] for an overview of these techniques). The individual solutions can be forced to agree on θf via a standard dual decomposition procedure, and methods like the alternating direction method of multipliers (ADMM) [19] can be leveraged to facilitate rapid agreement on the continuous parameters wf and bf. The extension to learning γ-aggregative games is immediate.

We now describe some other optimization variants for the local aggregative games. Instead of constraining each player to a hard neighborhood, one might relax the constraints Nij ∈ {0, 1} to Nij ∈ [0, 1], where Nij may be interpreted as the strength of the edge (j, i). The Lovász convex relaxation of F [20] is a natural prior for inducing sparsity in this case. Specifically, for an ordering of values |Ni(0)| ≥ |Ni(1)| ≥ . . . ≥ |Ni(n−1)|, i ∈ [n], this prior is given by

Γh(N) = Σ_{i=1}^{n} Γh(N, i), where Γh(N, i) = Σ_{k=0}^{n−1} [h(k + 1) − h(k)] |Ni(k)|.

Since the transformation h encodes the preference for each degree, Γh(N) acts as a prior that encourages structured sparsity.
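The prior Γh(N) is a few lines of code. The sketch below (illustrative only) evaluates it with the smoothed square-root h used later in the experiments, and shows that a row with fewer strong in-edges incurs a smaller penalty.

```python
import math

# Sketch of the structured-sparsity prior Gamma_h(N) for relaxed edge
# weights N_ij in [0, 1]: each player's weights are sorted in decreasing
# order and weighted by the discrete derivative h(k+1) - h(k) of a concave
# h with h(0) = 0. Illustrative code only.
def gamma_h_prior(N, h):
    total = 0.0
    for row in N:                       # row i holds the weights N_i. (j != i)
        s = sorted(row, reverse=True)   # |N_i(0)| >= |N_i(1)| >= ...
        total += sum((h(k + 1) - h(k)) * s[k] for k in range(len(s)))
    return total

# Smoothed square-root h from the experiments (with alpha = 1 there).
h = lambda k, alpha=1.0: math.sqrt(k + 1) - 1 + alpha * k

dense  = [[1.0, 1.0, 1.0]]   # one player with three strong in-edges
sparse = [[1.0, 0.0, 0.0]]   # one player with a single strong in-edge
print(gamma_h_prior(dense, h), gamma_h_prior(sparse, h))  # sparser row is cheaper
```

Because h is concave, the increments h(k + 1) − h(k) shrink with k, so the largest weights absorb the steepest penalties and full rows pay h(n − 1) while nearly empty rows pay close to nothing; this is exactly the degree preference the text describes.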
One might also enforce other constraints on the structure of the local aggregative game. For instance, an undirected graph could be obtained by adding the constraints Nij = Nji, for i ∈ [n], j ≠ i. Likewise, a minimum in-degree constraint may be enforced on player i by requiring Σ_j Nij ≥ d. Both these constraints are linear in Ni·, and thus do not add to the complexity of the problem. Finally, based on cues such as domain knowledge, one may wish to add a degree of freedom by not enforcing sharing of the parameters of the aggregator among the players.

6 Experiments

We now present strong empirical evidence to demonstrate the efficacy of local aggregative games in unraveling the aggregative game structure of two real voting datasets, namely, the US Supreme Court Rulings dataset and the Congressional Votes dataset. Our experiments span the different variants for recovering the structure of the aggregative games, including settings where (a) the parameters of the aggregator are learned along with the payoffs, (b) the in-degree of each node is lower bounded, (c) γ-discounting is used, or (d) the parameters of the aggregator are fixed. We will also demonstrate that our method compares favorably with the potential games method for tree-structured games [4], even when we relax the digraph setting to let the weights Nij ∈ [0, 1] instead of {0, 1}, or force the game structure to be undirected by adding the constraints Nij = Nji. For our purposes, we used the smoothed square-root concave function h(i) = √(i + 1) − 1 + αi parameterized by α, the sum and neural aggregators, and the scaled 0-1 loss function es(a, ã) = 1{a ≠ ã}/n. We found our model to perform well across a very wide range of hyperparameters. All the experiments described below used the following setting of values: α = 1, C = 100, and Cf = 1. C′ was set to 0.01 in all settings except when the parameters of the aggregator were fixed, in which case we set C′ = 0.01√n.

[Figure 1 shows the recovered digraph over the nine Justices (Thomas, Alito, Scalia, Roberts, Kennedy, Sotomayor, Kagan, Ginsburg, Breyer), with edge agreement percentages between 80% and 94% taken from [21].]

Figure 1: Supreme Court Rulings (full bench): The digraph recovered by the local aggregative and γ-aggregative games (ℓ ≤ 2, all γ) with the sum aggregator as well as the neural aggregator is consistent with the known behavior of the Justices: the conservative and liberal sides of the bench are well segregated from each other, while the moderate Justice Kennedy is positioned near the center. Numbers on the arrows are taken from an independent study [21] on the Justices' mutual voting patterns.

6.1 Dataset 1: Supreme Court Rulings

We experimented with a dataset containing all non-unanimous rulings by the US Supreme Court bench during the year 2013. We denote the Justices of the bench by their last name initials, and add a second character to some names to avoid conflicts in the initials: Alito (A), Breyer (B), Ginsburg (G), Kennedy (K), Kagan (Ka), Roberts (R), Scalia (S), Sotomayor (So), and Thomas (T). We obtained a binary dataset following the procedure described in [4].

[Figure 2 shows four panels of recovered structures over the Justices A, B, G, K, R, S, T: (a) Local Aggregative, (b) Potential Exhaustive Enumeration, (c) Local Aggregative (Undirected & Relaxed), (d) Potential Hamming.]

Figure 2: Comparison with the potential games method [4]: (a) The digraph produced by our method with the sum as well as the neural aggregator is consistent with the expected voting behavior of the Justices on the data used by [4] in their experiments.
(c) Relaxing all Nij \u2208 [0, 1] and enforcing\nNij = Nji still resulted in a meaningful undirected structure. (b) & (d) The tree structures obtained\nby the brute force and the Hamming distance restricted methods [4] fail to capture higher order\ninteractions, e.g., the strongly connected component between Justices A, T, S and R.\n\n.\n\nT\n\nS\n\nA\n\nR\n\nK\n\nG\n\nB\n\nT\n\nR\n\nA\n\nS\n\nK\n\nG\n\nB\n\n(a) Local Aggregative (d >= 2)\n\n(b) \u03b3 \u2212 aggregative ((cid:96) = 2, \u03b3 = 0.9)\n\nFigure 3: Degree constrained and \u03b3-aggregative games: (a) Enforcing the degree of each node to\nbe at least 2 reinforces the intra-republican and the intra-democrat af\ufb01nity, reaf\ufb01rming their respective\njurisprudences, and (b) \u03b3-aggregative games also support this observation: the same digraph as Fig.\n2(a) is obtained unless (cid:96) and \u03b3 are set to high values (plot generated with (cid:96) = 2, \u03b3 = 0.9), when the\nstrong effect of one-hop and two-hop neighbors overpowers the direct connection between B and G.\n\nFig. 1 shows the structure recovered by the local aggregative method. The method was able to\ndistinguish the conservative side of the court (Justices A, R, S, and T) from the left side (B, G, Ka, and\nSo). Also, the structure places Justice Kennedy in between the two extremes, which is consistent with\nhis moderate jurisprudence. To put our method in perspective, we also compare the result of applying\nour method on the same subset of the full bench data that was considered by [4] in their experiments.\nFig. 2 demonstrates how the local aggregative approach estimated meaningful structures consistent\nwith the full bench structure, and compared favorably with both the methods of [4]. Finally, Fig. 3(a)\n\n7\n\n\fand 3(b) demonstrate the effect of enforcing minimum in-degree constraints in the local aggregative\ngames, and increasing (cid:96) and \u03b3 in the \u03b3-aggregative games respectively. 
As expected, the estimated γ-aggregative structure is stable unless γ and ℓ are set to high values, when non-local effects kick in. We provide some additional results on the degree-constrained local aggregative games (Fig. 4) and the γ-aggregative games (Fig. 5). In particular, we see that the γ-aggregative games are indeed robust to small changes in the aggregator input, as expected in light of the stability result of Theorem 4.

[Figure: digraph over the nine Justices, with the degree of each node constrained to be at least 2.]

Figure 4: Degree constrained local aggregative games (full bench): The digraph recovered by the local aggregative method when the degree of each node was constrained to be at least 2. Clearly, the cohesion among the Justices on the conservative side was strengthened by the degree constraint (likewise for the liberal side of the bench). On the other hand, no additional edges were added between the two sides.

[Figure: digraph over the nine Justices estimated by the γ-aggregative method; identical to Fig. 1.]

Figure 5: γ-Aggregative Games (full bench): The digraph estimated by the γ-aggregative method for ℓ = 2, γ = 0.9, and lower values of γ and/or ℓ. Note that an identical structure was obtained by the local aggregative method (Fig. 1). This indicates that despite heavily weighting the effect of the nodes on a shortest path with one or two intermediary hops, the structure in Fig. 1 is very stable. This also substantiates our theoretical result about the stability of the γ-aggregative games.

6.2 Dataset 2: Congressional Votes

We also experimented with the Congressional Votes data [22], which contains the votes by the US Senators on all the bills of the 110th US Congress, Session 2.
Each of the 100 Senators voted in favor of (treated as 1) or against (treated as 0) each bill. Fig. 6 shows that the local aggregative method provides meaningful insights into the voting patterns of the Senators as well. In particular, few connections exist between the nodes in red and those in blue, making the bipartisan structure quite apparent. In some cases, the intra-party connections might be bolstered by same-state affiliations, e.g., Senators Corker (28) and Alexander (2) both represent Tennessee. The cross connections may also capture some interesting collaborations or influences, e.g., Senators Allard (3) and Clinton (22) introduced the Autism Act. Likewise, Collins (26) and Carper (19) reintroduced the Fire Grants Reauthorization Act. The potential methods of [4] failed to estimate some of these strategic interactions. Likewise, Fig. 7 provides some interesting insights regarding Senators who follow a more centrist line than their respective political affiliations would suggest.

[Figure: digraph over 30 numbered Senator nodes, colored red (Republicans) and blue (Democrats).]

Figure 6: Comparison with [4] on the Congressional Votes data: The digraph recovered by the local aggregative method, on the data used by [4], when the parameters of the sum aggregator were fixed (wf = 1, bf = 0). The segregation between the Republicans (shown in red) and the Democrats (shown in blue) strongly suggests that they are aligned according to their party policies.

Figure 7: Complete Congressional Votes data: The digraph recovered on fixing parameters, relaxing Nij to [0, 1], and thresholding at 0.05. The estimated structure not only separates the majority of the reds from the blues, but also closely associates the then independent Senators Sanders (82) and Lieberman (62) with the Democrats.
Moreover, the few reds among the blues generally identify with a more centrist ideology: Collins (26) and Snowe (87) are two prominent examples.

Conclusion

An overwhelming majority of the literature on machine learning is restricted to modeling non-strategic settings. Strategic interactions in several real-world systems, such as decision-making and voting, often exhibit local structure in terms of how players are guided by or respond to each other. In other words, different agents make rational moves in response to their neighboring agents, leading to locally stable configurations such as Nash equilibria. Another challenge with modeling strategic settings is that they are invariably unsupervised. Consequently, standard learning techniques such as structured prediction that enforce global consistency constraints fall short in such settings (cf. [4]). As substantiated by our experiments, local aggregative games nicely encapsulate various strategic applications, and could be leveraged as a tool to glean important insights from voting data. Furthermore, the stability of approximate equilibria is a primary consideration from a conceptual viewpoint, and the γ-aggregative games introduced in this work add a fresh perspective by achieving structural stability.

References

[1] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data, ICML, 2001.

[2] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks, NIPS, 2003.

[3] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables, JMLR, 6(2), pp. 1453-1484, 2005.

[4] V. K. Garg and T. Jaakkola. Learning Tree Structured Potential Games, NIPS, 2016.

[5] C. Daskalakis and C. H. Papadimitriou. Approximate Nash equilibria in anonymous games, Journal of Economic Theory, 156, pp. 207-245, 2015.

[6] J. Nash.
Non-Cooperative Games, Annals of Mathematics, 54(2), pp. 286-295, 1951.

[7] R. Selten. Preispolitik der Mehrproduktenunternehmung in der Statischen Theorie, Springer-Verlag, 1970.

[8] M. Kearns and Y. Mansour. Efficient Nash computation in large population games with bounded influence, UAI, 2002.

[9] R. Cummings, M. Kearns, A. Roth, and Z. S. Wu. Privacy and truthful equilibrium selection for aggregative games, WINE, 2015.

[10] R. Cornes and R. Hartley. Fully Aggregative Games, Economics Letters, 116, pp. 631-633, 2012.

[11] W. Novshek. On the Existence of Cournot Equilibrium, Review of Economic Studies, 52, pp. 86-98, 1985.

[12] M. K. Jensen. Aggregative Games and Best-Reply Potentials, Economic Theory, 43, pp. 45-66, 2010.

[13] J. M. Lasry and P. L. Lions. Mean field games, Japanese Journal of Mathematics, 2(1), pp. 229-260, 2007.

[14] G. Goel, C. Karande, P. Tripathi, and L. Wang. Approximability of Combinatorial Problems with Multi-agent Submodular Cost Functions, FOCS, 2009.

[15] A. J. Defazio and T. S. Caetano. A convex formulation for learning scale-free networks via submodular relaxation, NIPS, 2012.

[16] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. Salakhutdinov, and A. Smola. Deep Sets, arXiv:1703.06114, 2017.

[17] P. Bonami et al. An algorithmic framework for convex mixed integer nonlinear programs, Discrete Optimization, 5(2), pp. 186-204, 2008.

[18] P. Bonami, M. Kilinç, and J. Linderoth. Algorithms and Software for Convex Mixed Integer Nonlinear Programs, Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, 154, Springer, 2012.

[19] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3, 2011.

[20] F. Bach.
Structured sparsity-inducing norms through submodular functions, NIPS, 2010.

[21] J. Bowers, A. Liptak, and D. Willis. Which Supreme Court Justices Vote Together Most and Least Often, The New York Times, 2014.

[22] J. Honorio and L. Ortiz. Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data, JMLR, 16, pp. 1157-1210, 2015.

[23] Y. Azrieli and E. Shmaya. Lipschitz Games, Mathematics of Operations Research, 38(2), pp. 350-357, 2013.

[24] E. Kalai. Large robust games, Econometrica, 72(6), pp. 1631-1665, 2004.

[25] E. N. Gilbert. Random Graphs, The Annals of Mathematical Statistics, 30(4), pp. 1141-1144, 1959.

[26] W. Feller. An Introduction to Probability Theory and its Applications, Vol. 1, Second edition, Wiley, 1957.

[27] M.-F. Balcan and N. J. A. Harvey. Learning Submodular Functions, STOC, 2011.

[28] A. Yao. Probabilistic computations: Toward a unified measure of complexity, FOCS, 1977.

[29] U. Feige, V. S. Mirrokni, and J. Vondrak. Maximizing non-monotone submodular functions, FOCS, 2007.

[30] M. X. Goemans, N. J. A. Harvey, S. Iwata, and V. S. Mirrokni. Approximating submodular functions everywhere, SODA, 2009.

[31] Z. Svitkina and L. Fleischer. Submodular approximation: Sampling based algorithms and lower bounds, FOCS, 2008.