{"title": "Statistical Cost Sharing", "book": "Advances in Neural Information Processing Systems", "page_first": 6221, "page_last": 6230, "abstract": "We study the cost sharing problem for cooperative games in situations where the cost function C is not available via oracle queries, but must instead be learned from samples drawn from a distribution, represented as tuples (S, C(S)), for different subsets S of players. We formalize this approach, which we call statistical cost sharing, and consider the computation of the core and the Shapley value. Expanding on the work by Balcan et al., we give precise sample complexity bounds for computing cost shares that satisfy the core property with high probability for any function with a non-empty core. For the Shapley value, which has never been studied in this setting, we show that for submodular cost functions with bounded curvature kappa it can be approximated from samples from the uniform distribution to a sqrt{1 - kappa} factor, and that the bound is tight. We then define statistical analogues of the Shapley axioms, derive a notion of statistical Shapley value, and show that it can be approximated arbitrarily well from samples from any distribution and for any function.", "full_text": "Statistical Cost Sharing

Eric Balkanski
Harvard University

ericbalkanski@g.harvard.edu

Umar Syed
Google NYC

usyed@google.com

Sergei Vassilvitskii

Google NYC

sergeiv@google.com

Abstract

We study the cost sharing problem for cooperative games in situations where the cost function C is not available via oracle queries, but must instead be learned from samples drawn from a distribution, represented as tuples (S, C(S)), for different subsets S of players. We formalize this approach, which we call STATISTICAL COST SHARING, and consider the computation of the core and the Shapley value. Expanding on the work by Balcan et al. 
[2015], we give precise sample complexity bounds for computing cost shares that satisfy the core property with high probability for any function with a non-empty core. For the Shapley value, which has never been studied in this setting, we show that for submodular cost functions with bounded curvature κ it can be approximated from samples from the uniform distribution to a √(1 − κ) factor, and that the bound is tight. We then define statistical analogues of the Shapley axioms, derive a notion of statistical Shapley value, and show that it can be approximated arbitrarily well from samples from any distribution and for any function.

1 Introduction

The cost sharing problem asks for an equitable way to split the cost of a service among all of the participants. Formally, there is a cost function C defined over all subsets S ⊆ N of a ground set of elements, or players, and the objective is to fairly divide the cost of the ground set C(N) among the players. Unlike traditional learning problems, the goal here is not to predict the cost of the service, but rather to learn which ways of dividing the cost among the players are equitable.
Cost sharing is central to cooperative game theory, and there is a rich literature developing the key concepts and principles to reason about this topic. Two popular cost sharing concepts are the core [Gillies, 1959], where no group of players has an incentive to deviate, and the Shapley value [Shapley, 1953], which is the unique vector of cost shares satisfying four natural axioms.
While both the core and the Shapley value are easy to define, computing them poses additional challenges. One obstacle is that the computation of the cost shares requires knowledge of costs in myriad different scenarios. For example, computing the exact Shapley value requires one to look at the marginal contribution of a player over all possible subsets of others. Recent work [Liben-Nowell
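To make the exponential cost of this computation concrete, here is a minimal sketch (not from the paper): the Shapley value of player i is its marginal contribution C(S ∪ {i}) − C(S) averaged over all n! orderings of the players. The toy cost function below is hypothetical.

```python
import math
from itertools import permutations

def exact_shapley(players, cost):
    """Average marginal contribution of each player over all n! orderings."""
    shares = dict.fromkeys(players, 0.0)
    for order in permutations(players):
        coalition = set()
        for i in order:
            # marginal contribution of i to the players preceding it
            shares[i] += cost(coalition | {i}) - cost(coalition)
            coalition.add(i)
    m = math.factorial(len(players))
    return {i: s / m for i, s in shares.items()}

# Hypothetical 3-player game whose cost saturates at 2 (submodular).
C = lambda S: min(len(S), 2)
phi = exact_shapley([1, 2, 3], C)
# The shares are balanced: they sum to C({1, 2, 3}) = 2.
```

By symmetry each player receives 2/3 here; the n! enumeration is exactly the obstacle that motivates approximating the Shapley value from limited data.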
Recent work [Liben-Nowell\net al., 2012] shows that one can \ufb01nd approximate Shapley values for a restricted subset of cost\nfunctions by looking at the costs for polynomially many speci\ufb01cally chosen subsets. In practice,\nhowever, another roadblock emerges: one cannot simply query for the cost of an arbitrary subset.\nRather, the subsets are passively observed, and the costs of unobserved subsets are simply unknown.\nWe share the opinion of Balcan et al. [2016] that the main dif\ufb01culty with using cost sharing methods\nin concrete applications is the information needed to compute them.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fConcretely, consider the following cost sharing applications.\n\nAttributing Battery Consumption on Mobile Devices. A modern mobile phone or tablet is\ntypically running a number of distinct apps concurrently. In addition to foreground processes, a lot\nof activity may be happening in the background: email clients may be fetching new mail, GPS may\nbe active for geo-fencing applications, messaging apps are polling for new noti\ufb01cations, and so on.\nAll of these activities consume power; the question is how much of the total battery consumption\nshould be attributed to each app? This problem is non-trivial because the operating system induces\ncooperation between apps to save battery power. For example there is no need to activate the GPS\nsensor twice if two different apps request the current location almost simultaneously.\n\nUnderstanding Black Box Learning Deep neural networks are prototypical examples of black\nbox learning, and it is almost impossible to tease out the contribution of a particular feature to the\n\ufb01nal output. Particularly in situations where the features are binary, cooperative game theory gives a\nformal way to analyze and derive these contributions. 
While one can evaluate the objective function\non any subset of features, deep networks are notorious for performing poorly on certain out of sample\nexamples [Goodfellow et al., 2014, Szegedy et al., 2013], which may lead to misleading conclusions\nwhen using traditional cost sharing methods.\nWe model these cost sharing questions as follows. Let N be the set of possible players (apps or\nfeatures), and for a subset S \u2713 N, let C(S) denote the cost of S. This cost represents the total power\nconsumed over a standard period of time, or the rewards obtained by the learner. We are given ordered\npairs (S1, C(S1)), (S2, C(S2)), . . . , (Sm, C(Sm)), where each Si \u2713 N is drawn independently\nfrom some distribution D. The problem of STATISTICAL COST SHARING asks to look for reasonable\ncost sharing strategies in this setting.\n\n1.1 Our results\n\nWe build on the approach from Balcan et al. [2015], which studied STATISTICAL COST SHARING in\nthe context of the core, and assume that only partial data about the cost function is observed. The\nauthors showed that cost shares that are likely to respect the core property can be obtained for certain\nrestricted classes of functions. Our main result is an algorithm that generalizes these results for all\ngames where the core is non-empty and we derive sample complexity bounds showing exactly the\nnumber of samples required to compute cost shares (Theorems 1 and 2). While the main approach\nof Balcan et al. [2015] relied on \ufb01rst learning the cost function and then computing cost shares, we\nshow how to proceed directly, computing cost shares without explicitly learning a good estimate of\nthe cost function. This high level idea was independently discovered by Balcan et al. [2016]; our\napproach here greatly improves the sample complexity bounds, culminating in a result logarithmic in\nthe number of players. 
We also show that approximately satisfying the core with probability one is impossible in general (Theorem 3).
We then focus on the Shapley value, which has never been studied in the STATISTICAL COST SHARING context. We obtain a tight √(1 − κ) multiplicative approximation of the Shapley value for submodular functions with bounded curvature κ over the uniform distribution (Theorems 4 and 11), but show that it cannot be approximated to within any bounded factor in general, even for the restricted class of coverage functions, which are learnable, over the uniform distribution (Theorem 5). We also introduce a new cost sharing method called the data-dependent Shapley value, which is the unique solution (Theorem 6) satisfying four natural axioms resembling the Shapley axioms (Definition 7), and which can be approximated arbitrarily well from samples for any bounded function and any distribution (Theorem 7).

1.2 Related work

There are two avenues of work which we build upon. The first is the notion of cost sharing in cooperative games, first introduced by Von Neumann and Morgenstern [1944]. We consider the Shapley value and the core, two popular solution concepts for cost sharing in cooperative games. The Shapley value [Shapley, 1953] is studied in algorithmic mechanism design [Anshelevich et al., 2008, Balkanski and Singer, 2015, Feigenbaum et al., 2000, Moulin, 1999]. For applications of the Shapley value, see the surveys by Roth [1988] and Winter [2002]. A naive computation of the Shapley value of a cooperative game would take exponential time; recently, methods for efficiently approximating the Shapley value have been suggested [Bachrach et al., 2010, Fatima et al., 2008, Liben-Nowell et al., 2012, Mann, 1960] for some restricted settings.
The core, introduced by Gillies [1959], is another well-studied solution concept for cooperative games. Bondareva [1963] and Shapley [1967] characterized when the core is non-empty. 
The\ncore has been studied in the context of multiple combinatorial games, such as facility location\nGoemans and Skutella [2004] and maximum \ufb02ow Deng et al. [1999]. In cases with no solutions in\nthe core or when it is computationally hard to \ufb01nd one, the balance property has been relaxed to hold\napproximately [Devanur et al., 2005, Immorlica et al., 2008]. In applications where players submit\nbids, cross-monotone cost sharing, a concept stronger than the core that satis\ufb01es the group strategy\nproofness property, has attracted a lot of attention [Immorlica et al., 2008, Jain and Vazirani, 2002,\nMoulin and Shenker, 2001, P\u00e1l and Tardos, 2003]. We note that these applications are suf\ufb01ciently\ndifferent from the ones we are studying in this work.\nThe second is the recent work in econometrics and computational economics that aims to estimate\ncritical concepts directly from a limited data set, and reason about the sample complexity of the\ncomputational problems. Speci\ufb01cally, in all of the above papers, the algorithm must be able to query\nor compute C(S) for an arbitrary set S \u2713 N. In our work, we are instead given a collection of\nsamples from some distribution; importantly the algorithm does not know C(S) for sets S that were\nnot sampled. This approach was \ufb01rst introduced by Balcan et al. [2015], who showed how to compute\nan approximate core for some families of games. Their main technique is to \ufb01rst learn the cost\nfunction C from samples and then to use the learned function to compute cost shares. The authors\nalso showed that there exist games that are not PAC-learnable but that have an approximate core that\ncan be computed. Independently, in recent follow up work, the authors showed how to extend their\napproach to compute a probably approximate core for all games with a non-empty core, and gave\nweak sample complexity bounds [Balcan et al., 2016]. 
We improve upon their bounds, showing that a logarithmic number of samples suffices when the spread of the cost function is bounded.

2 Preliminaries

A cooperative game is defined by an ordered pair (N, C), where N is the ground set of elements, also called players, and C : 2^N → R_{≥0} is the cost function mapping each coalition S ⊆ N to its cost, C(S). The ground set of size n = |N| is called the grand coalition and we denote the elements by N = {1, . . . , n} = [n]. We assume that C(∅) = 0, C(S) ≥ 0 for all S ⊆ N, and that max_S C(S) is bounded by a polynomial in n, which are standard assumptions. We will slightly abuse notation and use C(i) instead of C({i}) for i ∈ N when it is clear from the context.
We recall three specific classes of functions. Submodular functions exhibit the property of diminishing returns: C_S(i) ≥ C_T(i) for all S ⊆ T ⊆ N and i ∈ N, where C_S(i) is the marginal contribution of element i to set S, i.e., C_S(i) = C(S ∪ {i}) − C(S). Coverage functions are the canonical example of submodular functions. A function is coverage if it can be written as C(S) = |∪_{i∈S} T_i| where T_i ⊆ U for some universe U. Finally, we also consider the simple class of additive functions, such that C(S) = Σ_{i∈S} C(i).
A cost allocation is a vector ψ ∈ R^n where ψ_i is the share of element i. We call a cost allocation ψ balanced if Σ_{i∈N} ψ_i = C(N). Given a cooperative game (N, C), the goal in the cost sharing literature is to find "desirable" balanced cost allocations. Most proposals take an axiomatic approach, defining a set of axioms that a cost allocation should satisfy. These lead to the concepts of Shapley value and the core, which we define next. A useful tool to describe and compute these cost sharing concepts is permutations. 
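The function classes just defined can be illustrated with a short sketch; the universe and the sets T_i below are hypothetical.

```python
def coverage(T):
    """C(S) = |union of T_i for i in S| for a family T of subsets of a universe."""
    return lambda S: len(set().union(*(T[i] for i in S))) if S else 0

T = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
C = coverage(T)

def marginal(S, i):
    # C_S(i) = C(S ∪ {i}) − C(S)
    return C(S | {i}) - C(S)

# Diminishing returns: the marginal contribution of element 2 can only
# shrink as the base set grows from {1} to {1, 3}.
assert marginal({1}, 2) >= marginal({1, 3}, 2)
```

Here element 2 contributes "c" on top of {1} but nothing on top of {1, 3}, since "c" is already covered.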
We denote by σ a uniformly random permutation of N and by S_{σ<i} the set of elements that precede i in σ. The Shapley value of element i is then its expected marginal contribution φ_i = E_σ[C_{S_{σ<i}}(i)], and the core is the set of balanced cost allocations ψ such that Σ_{i∈S} ψ_i ≤ C(S) for all S ⊆ N.

3 Approximating the Core from Samples

Since the core property can only be verified on the sampled coalitions, we relax it as follows. Given ε, δ > 0, a cost allocation ψ such that Σ_{i∈N} ψ_i = C(N) is in

• the probably approximately stable core if Pr_{S∼D}[Σ_{i∈S} ψ_i ≤ C(S)] ≥ 1 − δ (see Balcan et al. [2015]),

• the mostly approximately stable core over D if (1 − ε)·Σ_{i∈S} ψ_i ≤ C(S) for all S ⊆ N,

• the probably mostly approximately stable core if Pr_{S∼D}[(1 − ε)·Σ_{i∈S} ψ_i ≤ C(S)] ≥ 1 − δ.

For each of these notions, our goal is to efficiently compute a cost allocation in the approximate core, in the following sense.
Definition 4. A cost allocation ψ is efficiently computable for the class of functions C over distribution D, if for all C ∈ C and any δ, ε > 0, given C(N) and m = poly(n, 1/δ, 1/ε) samples (S_j, C(S_j)) with each S_j drawn i.i.d. from distribution D, there exists an algorithm that computes ψ with probability at least 1 − δ over both the samples and the choices of the algorithm.
We refer to the number of samples required to compute approximate cores as the sample complexity of the algorithm. We first present our result for computing a probably approximately stable core with sample complexity that is linear in the number of players, which was also independently discovered by Balcan et al. [2016].
Theorem 1. The class of functions with a non-empty core has cost shares in the probably approximately stable core that are efficiently computable. The sample complexity is

O( (n + log(1/δ)) / δ ).

The full proof of Theorem 1 is in Appendix B, and can be summarized as follows: We define a class of halfspaces which contains the core. Since we assume that C has a non-empty core, there exists a cost allocation ψ in this class of halfspaces that satisfies both the core property on all the samples and the balance property. Given a set of samples, such a cost allocation can be computed with a simple linear program. 
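The linear program solves for ψ subject to the sampled constraints; the sketch below (hypothetical game and samples, not the paper's code) only checks the two properties such a candidate must satisfy: balance, and the core inequality on every sample.

```python
def balanced(psi, grand_cost, tol=1e-9):
    # Balance: the shares sum to the cost of the grand coalition.
    return abs(sum(psi.values()) - grand_cost) <= tol

def core_holds_on_samples(psi, samples, tol=1e-9):
    # Core property on a sample (S, C(S)): sum of shares in S is at most C(S).
    return all(sum(psi[i] for i in S) <= c + tol for S, c in samples)

# Hypothetical additive game C(S) = |S| on N = {1, 2, 3}: equal shares work.
samples = [({1, 2}, 2.0), ({2}, 1.0), ({1, 2, 3}, 3.0)]
psi = {1: 1.0, 2: 1.0, 3: 1.0}
assert balanced(psi, grand_cost=3.0)
assert core_holds_on_samples(psi, samples)
```

A ψ found this way is only guaranteed on the observed samples; the generalization argument is what lifts the guarantee to fresh draws from D.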
We then use the VC-dimension of the class of halfspaces to show that the performance on the samples generalizes well to the performance on the distribution D.

We next show that the sample complexity dependence on n can be improved from linear to logarithmic if we relax the goal from computing a cost allocation in the probably approximately stable core to computing one in the probably mostly approximately stable core instead. The sample complexity of our algorithm also depends on the spread of the function C, defined as max_S C(S) / min_{S≠∅} C(S) (we assume min_{S≠∅} C(S) > 0).
Theorem 2. The class of functions with a non-empty core has cost allocations in the probably mostly approximately stable core that are efficiently computable with sample complexity

((1 − ε)/ε)² · 128 τ(C)² log(2n) + 8 τ(C)² log(2/δ) = O( (τ(C)(1 − ε)/ε)² (log n + log(1/δ)) ),

where τ(C) = max_S C(S) / min_{S≠∅} C(S) is the spread of C.

The full proof of Theorem 2 is in Appendix B. Its main steps are:

1. We find a cost allocation ψ which satisfies the core property on all samples, restricting the search to cost allocations with bounded ℓ1-norm. Such a cost allocation can be found efficiently since the space of such cost allocations is convex.

2. The analysis begins by bounding the ℓ1-norm of any vector in the core (Lemma 3). Combined with the assumption that the core is non-empty, this implies that a cost allocation satisfying the previous conditions exists.

3. Let [x]_+ denote the function x ↦ max(x, 0). Consider the following "loss" function:

[ (Σ_{i∈S} ψ_i) / C(S) − 1 ]_+ .

This loss function is convenient since it is equal to 0 if and only if the core property is satisfied for S, and it is 1-Lipschitz, which is used in the next step.

4. 
Next, we bound the difference between the empirical loss and the expected loss for all ψ with a known result using the Rademacher complexity of linear predictors with low ℓ1-norm over ρ-Lipschitz loss functions (Theorem 10).

5. Finally, given ψ which approximately satisfies the core property in expectation, we show that ψ is in the probably mostly approximately stable core by Markov's inequality (Lemma 4).

Since we obtained a probably mostly approximately stable core, a natural question is whether it is possible to compute cost allocations that are mostly approximately stable over natural distributions. The answer is negative in general: even for the restricted class of monotone submodular functions, which always have a solution in the core, the core cannot be mostly approximated from samples, even over the uniform distribution. The full proof of this impossibility theorem is in Appendix B.
Theorem 3. Cost allocations ψ in the (1/2 + ε)-mostly approximately stable core, i.e., such that for all S,

(1/2 + ε) · Σ_{i∈S} ψ_i ≤ C(S),

cannot be computed for monotone submodular functions over the uniform distribution, for any constant ε > 0.

4 Approximating the Shapley Value from Samples

We turn our attention to the STATISTICAL COST SHARING problem in the context of the Shapley value. Since the Shapley value exists and is unique for all functions, a natural relaxation is to simply approximate this value from samples. The distributions we consider in this section are the uniform distribution, and more generally product distributions, which are the standard distributions studied in the learning literature for combinatorial functions [Balcan and Harvey, 2011, Balcan et al., 2012, Feldman and Kothari, 2014, Feldman and Vondrak, 2014]. 
It is easy to see that we need some restrictions on the distribution D (for example, if the empty set is drawn with probability one, the Shapley value cannot be approximated).

For submodular functions with bounded curvature, we prove approximation bounds when samples are drawn from the uniform or a bounded product distribution, and also show that the bound for the uniform distribution is tight. However, we show that the Shapley value cannot be approximated from samples even for coverage functions (which are a special case of submodular functions) and the uniform distribution. Since coverage functions are learnable from samples, this implies the counter-intuitive observation that learnability does not imply that the Shapley value is approximable from samples. We defer the full proofs to Appendix C.
Definition 5. An algorithm α-approximates, α ∈ (0, 1], the Shapley value of cost functions C over distribution D, if, for all C ∈ C and all δ > 0, given poly(n, 1/δ, 1/(1 − α)) samples from D, it computes Shapley value estimates φ̃ such that α·φ_i ≤ φ̃_i ≤ φ_i/α for all i ∈ N such that φ_i ≥ 1/poly(n),¹ with probability at least 1 − δ over both the samples and the choices made by the algorithm.
We consider submodular functions with bounded curvature, a common assumption in the submodular maximization literature [Iyer and Bilmes, 2013, Iyer et al., 2013, Sviridenko et al., 2015, Vondrák, 2010]. Intuitively, the curvature of a submodular function bounds by how much the marginal contribution of an element can decrease. This property is useful since the Shapley value of an element can be written as a weighted sum of its marginal contributions over all sets.
Definition 6. A monotone submodular function C has curvature κ ∈ [0, 1] if C_{N\{i}}(i) ≥ (1 − κ)C(i) for all i ∈ N. 
This curvature is bounded if κ < 1.
An immediate consequence of this definition is that C_S(i) ≥ (1 − κ)C_T(i) for all S, T such that i ∉ S ∪ T, by monotonicity and submodularity. The main tool used is estimates ṽ_i of the expected marginal contributions v_i = E_{S∼D | i∉S}[C_S(i)], where ṽ_i = avg({S_j : i ∈ S_j}) − avg({S_j : i ∉ S_j}) is the difference between the average value of samples containing i and the average value of samples not containing i.
Theorem 4. Monotone submodular functions with bounded curvature κ have Shapley value that is √(1 − κ) − ε approximable from samples over the uniform distribution, which is tight, and 1 − κ − ε approximable over any bounded product distribution, for any constant ε > 0.
Consider the algorithm which computes φ̃_i = ṽ_i. Note that φ_i = E_σ[C_{S_{σ<i}}(i)] ≥ (1 − κ)v_i ≥ (1 − κ − ε)ṽ_i, where the first inequality is by curvature and the second by Lemma 5, which shows that the estimates ṽ_i of v_i are arbitrarily good. The other direction follows similarly. The √(1 − κ) result is the main technical component of the upper bound. We describe two main steps:

1. The expected marginal contribution E_{S∼U : i∉S, |S|=j}[C_S(i)] of i to a uniformly random set S of size j is decreasing in j, which is by submodularity.

2. Since a uniformly random set has size concentrated close to n/2, this implies that roughly half of the terms in the summation φ_i = (Σ_{j=0}^{n−1} E_{S∼U_j : i∉S}[C_S(i)])/n are greater than v_i and the other half of the terms are smaller.

For the tight lower bound, we show that there exist two functions that cannot be distinguished from samples w.h.p. 
and that have an element whose Shapley value differs by an α² factor.
We show that the Shapley values of coverage (and submodular) functions are not approximable from samples in general, even though coverage functions are PMAC-learnable [Balcan and Harvey, 2011] from samples over any distribution [Badanidiyuru et al., 2012].
Theorem 5. There exists no constant α > 0 such that coverage functions have Shapley value that is α-approximable from samples over the uniform distribution.

5 Data Dependent Shapley Value

The general impossibility result for computing the Shapley value from samples arises from the fact that the concept was geared towards the query model, where the algorithm can ask for the cost of any set S ⊆ N. In this section, we develop an analogue that is distribution-dependent; we denote it by φ^{C,D} to stress its dependence on both C and D. We define four natural distribution-dependent axioms resembling the Shapley value axioms, and then prove that our proposed value is the unique solution satisfying them. This value can be approximated arbitrarily well in the statistical model for all functions. The proofs are deferred to Appendix D. We start by stating the four axioms.

¹See Appendix C for the general definition.

Definition 7. The data-dependent axioms for cost sharing functions φ^D are:

• Balance: Σ_{i∈N} φ^D_i = E_{S∼D}[C(S)],
• Symmetry: for all i and j, if Pr_{S∼D}[|S ∩ {i, j}| = 1] = 0 then φ^D_i = φ^D_j,
• Zero element: for all i, if Pr_{S∼D}[i ∈ S] = 0 then φ^D_i = 0,
• Additivity: for all i and for all D1, D2, α, β such that α + β = 1, φ^{αD1+βD2}_i = α·φ^{D1}_i + β·φ^{D2}_i, where Pr[S ∼ αD1 + βD2] = α·Pr[S ∼ D1] + β·Pr[S ∼ D2].

The similarity to the original Shapley value axioms is readily apparent. 
The main distinction is that we expect these to hold with regard to D, which captures the frequency with which different coalitions S occur. Interpreting the axioms one by one, the balance property ensures that the expected cost is always accounted for. The symmetry axiom states that if two elements always occur together, they should have the same share, since they are indistinguishable. If an element is never observed, then it should have zero share. Finally, costs should combine in a linear manner according to the distribution.
The data-dependent Shapley value is

φ^D_i := Σ_{S : i∈S} Pr[S ∼ D] · C(S)/|S|.

Informally, for every set S, the cost C(S) is divided equally between the elements of S and is weighted with the probability that S occurs according to D. The main appeal of this cost allocation is the following theorem.
Theorem 6. The data-dependent Shapley value is the unique value satisfying the four data-dependent axioms.

The data-dependent Shapley value can be approximated from samples with the following empirical data-dependent Shapley value:

φ̃^D_i = (1/m) Σ_{S_j : i∈S_j} C(S_j)/|S_j|.

These estimates are arbitrarily good with arbitrarily high probability.
Theorem 7. The empirical data-dependent Shapley value approximates the data-dependent Shapley value arbitrarily well, i.e.,

|φ̃^D_i − φ^D_i| < ε

with poly(n, 1/ε, 1/δ) samples and with probability at least 1 − δ for any δ, ε > 0.

6 Discussion and Future Work

We follow a recent line of work that studies classical algorithmic problems from a statistical perspective, where the input is restricted to a collection of samples. Our results fall into two categories: we give results for approximating the Shapley value and the core, and we propose new cost sharing concepts that are tailored for the statistical framework. 
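As a concrete illustration of the statistical framework, the empirical data-dependent Shapley value from Section 5 is a single pass over the samples; the sample set below is hypothetical.

```python
def empirical_dd_shapley(samples, players):
    """phi~_i = (1/m) * sum over samples S_j containing i of C(S_j) / |S_j|."""
    phi = dict.fromkeys(players, 0.0)
    m = len(samples)
    for S, c in samples:
        for i in S:
            # Split the observed cost equally among the members of S.
            phi[i] += c / (len(S) * m)
    return phi

samples = [({1, 2}, 4.0), ({2}, 1.0), ({1, 2, 3}, 6.0)]
phi = empirical_dd_shapley(samples, [1, 2, 3])
# Balance holds empirically: the shares sum to the average sampled cost.
```

Because every sampled cost is fully redistributed, the shares always sum to the empirical mean of C(S), mirroring the balance axiom.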
We use techniques from multiple \ufb01elds that\nencompass statistical machine learning, combinatorial optimization, and, of course, cost sharing. The\ncost sharing literature being very rich, the number of directions for future work are considerable. Ob-\nvious avenues include studying other cost sharing methods in this statistical framework, considering\nother classes of functions to approximate known methods, and improving the sample complexity\nof previous algorithms. More conceptually, an exciting modeling question arises when designing\n\u201cdesirable\" axioms from data. Traditionally these axioms only depended on the cost function, whereas\nin this model they can depend on both the cost function and the distribution, providing an interesting\ninterplay.\n\n8\n\n\fReferences\nElliot Anshelevich, Anirban Dasgupta, Jon Kleinberg, Eva Tardos, Tom Wexler, and Tim Rough-\ngarden. The price of stability for network design with fair cost allocation. SIAM Journal on\nComputing, 38(4):1602\u20131623, 2008.\n\nYoram Bachrach, Evangelos Markakis, Ezra Resnick, Ariel D Procaccia, Jeffrey S Rosenschein, and\nAmin Saberi. Approximating power indices: theoretical and empirical analysis. Autonomous\nAgents and Multi-Agent Systems, 20(2):105\u2013122, 2010.\n\nAshwinkumar Badanidiyuru, Shahar Dobzinski, Hu Fu, Robert Kleinberg, Noam Nisan, and Tim\nRoughgarden. Sketching valuation functions. In Proceedings of the twenty-third annual ACM-\nSIAM symposium on Discrete Algorithms, pages 1025\u20131035. Society for Industrial and Applied\nMathematics, 2012.\n\nMaria-Florina Balcan and Nicholas JA Harvey. Learning submodular functions. In Proceedings of\n\nthe forty-third annual ACM symposium on Theory of computing, pages 793\u2013802. ACM, 2011.\n\nMaria-Florina Balcan, Florin Constantin, Satoru Iwata, and Lei Wang. Learning valuation functions.\n\nIn COLT, volume 23, pages 4\u20131, 2012.\n\nMaria-Florina Balcan, Ariel D. Procaccia, and Yair Zick. Learning cooperative games. 
In Proceedings\nof the Twenty-Fourth International Joint Conference on Arti\ufb01cial Intelligence, IJCAI 2015, Buenos\nAires, Argentina, July 25-31, 2015, pages 475\u2013481, 2015.\n\nMaria-Florina Balcan, Ariel D Procaccia, and Yair Zick. Learning cooperative games. arXiv preprint\n\narXiv:1505.00039v2, 2016.\n\nEric Balkanski and Yaron Singer. Mechanisms for fair attribution. In Proceedings of the Sixteenth\n\nACM Conference on Economics and Computation, pages 529\u2013546. ACM, 2015.\n\nOlga N Bondareva. Some applications of linear programming methods to the theory of cooperative\n\ngames. Problemy kibernetiki, 10:119\u2013139, 1963.\n\nXiaotie Deng, Toshihide Ibaraki, and Hiroshi Nagamochi. Algorithmic aspects of the core of\ncombinatorial optimization games. Mathematics of Operations Research, 24(3):751\u2013766, 1999.\n\nNikhil R Devanur, Milena Mihail, and Vijay V Vazirani. Strategyproof cost-sharing mechanisms for\n\nset cover and facility location games. Decision Support Systems, 39(1):11\u201322, 2005.\n\nShaheen S Fatima, Michael Wooldridge, and Nicholas R Jennings. A linear approximation method\n\nfor the shapley value. Arti\ufb01cial Intelligence, 172(14):1673\u20131699, 2008.\n\nJoan Feigenbaum, Christos Papadimitriou, and Scott Shenker. Sharing the cost of muliticast trans-\nmissions (preliminary version). In Proceedings of the thirty-second annual ACM symposium on\nTheory of computing, pages 218\u2013227. ACM, 2000.\n\nVitaly Feldman and Pravesh Kothari. Learning coverage functions and private release of marginals.\n\nIn COLT, pages 679\u2013702, 2014.\n\nVitaly Feldman and Jan Vondrak. Optimal bounds on approximation of submodular and xos functions\nby juntas. In Information Theory and Applications Workshop (ITA), 2014, pages 1\u201310. IEEE, 2014.\n\nDonald B Gillies. Solutions to general non-zero-sum games. Contributions to the Theory of Games,\n\n4(40):47\u201385, 1959.\n\nMichel X Goemans and Martin Skutella. 
Cooperative facility location games. Journal of Algorithms,\n\n50(2):194\u2013214, 2004.\n\nIan J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial\n\nexamples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.\n\nNicole Immorlica, Mohammad Mahdian, and Vahab S Mirrokni. Limitations of cross-monotonic\n\ncost-sharing schemes. ACM Transactions on Algorithms (TALG), 4(2):24, 2008.\n\n9\n\n\fRishabh K Iyer and Jeff A Bilmes. Submodular optimization with submodular cover and submodular\nknapsack constraints. In Advances in Neural Information Processing Systems, pages 2436\u20132444,\n2013.\n\nRishabh K Iyer, Stefanie Jegelka, and Jeff A Bilmes. Curvature and optimal algorithms for learning\nand minimizing submodular functions. In Advances in Neural Information Processing Systems,\npages 2742\u20132750, 2013.\n\nKamal Jain and Vijay V Vazirani. Equitable cost allocations via primal-dual-type algorithms. In\nProceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 313\u2013321.\nACM, 2002.\n\nDavid Liben-Nowell, Alexa Sharp, Tom Wexler, and Kevin Woods. Computing shapley value in\nsupermodular coalitional games. In International Computing and Combinatorics Conference,\npages 568\u2013579. Springer, 2012.\n\nIrwin Mann. Values of large games, IV: Evaluating the electoral college by Montecarlo techniques.\n\nRand Corporation, 1960.\n\nHerv\u00e9 Moulin. Incremental cost sharing: Characterization by coalition strategy-proofness. Social\n\nChoice and Welfare, 16(2):279\u2013320, 1999.\n\nHerv\u00e9 Moulin and Scott Shenker. Strategyproof sharing of submodular costs: budget balance versus\n\nef\ufb01ciency. Economic Theory, 18(3):511\u2013533, 2001.\n\nMartin P\u00e1l and \u00c9va Tardos. Group strategy proof mechanisms via primal-dual algorithms.\n\nIn\nFoundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages\n584\u2013593. IEEE, 2003.\n\nAlvin E Roth. 
The Shapley value: essays in honor of Lloyd S. Shapley. Cambridge University Press,\n\n1988.\n\nShai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to\n\nalgorithms. 2014.\n\nLloyd S Shapley. On balanced sets and cores. Naval research logistics quarterly, 14(4):453\u2013460,\n\n1967.\n\nLS Shapley. A value for n-person games1. 1953.\nMaxim Sviridenko, Jan Vondr\u00e1k, and Justin Ward. Optimal approximation for submodular and\nsupermodular optimization with bounded curvature. In Proceedings of the Twenty-Sixth Annual\nACM-SIAM Symposium on Discrete Algorithms, pages 1134\u20131148. SIAM, 2015.\n\nChristian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow,\nand Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL\nhttp://arxiv.org/abs/1312.6199.\n\nJohn Von Neumann and Oskar Morgenstern. Theory of games and economic behavior. 1944.\nJan Vondr\u00e1k. Submodularity and curvature: the optimal algorithm. RIMS Kokyuroku Bessatsu B, 23:\n\n253\u2013266, 2010.\n\nEyal Winter. The shapley value. Handbook of game theory with economic applications, 3:2025\u20132054,\n\n2002.\n\n10\n\n\f", "award": [], "sourceid": 3149, "authors": [{"given_name": "Eric", "family_name": "Balkanski", "institution": "Harvard University"}, {"given_name": "Umar", "family_name": "Syed", "institution": "Google Research"}, {"given_name": "Sergei", "family_name": "Vassilvitskii", "institution": "Google"}]}*