{"title": "Robust Multi-agent Counterfactual Prediction", "book": "Advances in Neural Information Processing Systems", "page_first": 3083, "page_last": 3093, "abstract": "We consider the problem of using logged data to make predictions about what would happen if we changed the `rules of the game' in a multi-agent system. This task is difficult because in many cases we observe actions individuals take but not their private information or their full reward functions. In addition, agents are strategic, so when the rules change, they will also change their actions. Existing methods (e.g. structural estimation, inverse reinforcement learning) assume that agents' behavior comes from optimizing some utility or that the system is in equilibrium. They make counterfactual predictions by using observed actions to learn the underlying utility function (a.k.a. type) and then solving for the equilibrium of the counterfactual environment. This approach imposes heavy assumptions such as the rationality of the agents being observed and a correct model of the environment and agents' utility functions. We propose a method for analyzing the sensitivity of counterfactual conclusions to violations of these assumptions, which we call robust multi-agent counterfactual prediction (RMAC). We provide a first-order method for computing RMAC bounds. We apply RMAC to classic environments in market design: auctions, school choice, and social choice.", "full_text": "Robust Multi-agent Counterfactual Prediction\n\nAlexander Peysakhovich\u2217\nFacebook AI Research\n\nChristian Kroer\u2217\n\nFacebook Core Data Science\n\nAdam Lerer\u2217\n\nFacebook AI Research\n\nAbstract\n\nWe consider the problem of using logged data to make predictions about what\nwould happen if we changed the \u2018rules of the game\u2019 in a multi-agent system. This\ntask is dif\ufb01cult because in many cases we observe actions individuals take but\nnot their private information or their full reward functions. 
In addition, agents are\nstrategic, so when the rules change, they will also change their actions. Existing\nmethods (e.g. structural estimation, inverse reinforcement learning) assume that\nagents\u2019 behavior comes from optimizing some utility or that the system is in\nequilibrium. They make counterfactual predictions by using observed actions\nto learn the underlying utility function (a.k.a.\ntype) and then solving for the\nequilibrium of the counterfactual environment. This approach imposes heavy\nassumptions such as the rationality of the agents being observed and a correct\nmodel of the environment and agents\u2019 utility functions. We propose a method\nfor analyzing the sensitivity of counterfactual conclusions to violations of these\nassumptions, which we call robust multi-agent counterfactual prediction (RMAC).\nWe provide a \ufb01rst-order method for computing RMAC bounds. We apply RMAC to\nclassic environments in market design: auctions, school choice, and social choice.\n\n1\n\nIntroduction\n\nConstructing rules that lead optimizing agents to good collective outcomes is the goal of the \ufb01eld\nof mechanism design (Roth and Peranson, 1999; Roth et al., 2005; Abdulkadiro\u02d8glu et al., 2005;\nKlemperer, 2002; Roth, 2002; Porter et al., 2003). Good mechanism design is particularly important\nfor businesses which make their livelihoods as platforms (e.g. internet ad auctions, ride sharing,\ndating sites). A key challenge for designers in practice is to observe an existing set of rules at work\nand make a counterfactual statement about how outcomes would change if the rules changed (Bottou\net al., 2013; Athey, 2015).\nThe multi-agent counterfactual question is dif\ufb01cult for two reasons. First, participants are strategic.\nAn agent\u2019s optimal action can change due to changes in the rules, and often, can change when other\nagents change what they are doing. 
Second, agents have private information that is not known to the\ndesigner so even knowledge of the rules, and ability to compute optimal actions, is insuf\ufb01cient to\nestimate counterfactual outcomes. The analysis of online ad auctions provides a good example: if we\nobserve data from a series of \ufb01rst-price sealed bid auctions and wanted to predict what would happen\nto revenue if we changed the auction format to second price with a reserve we would need to account\nfor how agent behavior would change in response to these new incentives.\nA common class of approaches to this question assume that observed actions are coming from a\nmulti-agent system where all agents are optimizing some latent reward functions. In other words, that\nthe system is in some form of Nash equilibrium. Further, they assume that once changes are made,\nthe system will again equilibriate. Given these two assumptions, counterfactual prediction becomes a\nquestion of how equilibria change as the mechanism changes. Such assumptions are typical in the\n\n\u2217Equal contribution, author order has been randomized.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\f\ufb01eld of inverse reinforcement learning (Ng et al., 2000) and in structural estimation in economics\n(Berry et al., 1995; Athey and Nekipelov, 2010).\nA downside of this approach is that it requires strong assumptions that are not always completely\ntrue in practice. For example, this process requires assuming that agents are optimizing their utility\ngiven the behavior of others so that an analyst can infer underlying \u2018taste\u2019 parameters from agent\nactions. 
It is well known, however, that human decisions do not always obey the axioms of utility\nmaximization (Camerer et al., 2011) and that both mistakes and biases can persist even when there is\nample opportunity for learning (Erev and Roth, 1998; Fudenberg and Peysakhovich, 2016).\nThe main contribution of this work is a method that computes a robust interval of counterfactual\nestimates under relaxations of the assumptions of rationality and correct speci\ufb01cation of the model.\nOur \ufb01rst contribution is to show that the counterfactual estimation problem corresponds to identifying\nequilibria in a game which we call a revelation game, and that the set of \u0001-equilibria correspond\nto counterfactual predictions when assumptions are relaxed. We consider particular \u0001-equilibria of\nthe revelation game - the \u2018worst\u2019 and \u2018best\u2019 elements of the \u0001-equilibrium set with respect to some\nevaluation function (e.g. revenue). These equilibria form the upper and lower bounds for our robust\nmulti-agent counterfactual prediction (RMAC). 2 We show that computing the RMAC bounds exactly\nis a dif\ufb01cult problem as it is NP-hard even for 2-player Bayesian games.\nAs our second contribution, we propose a \ufb01rst-order method which we refer to as revelation game\n\ufb01ctitious play (RFP) to compute the RMAC bounds and discuss its convergence properties.\nOur third contribution is to apply RFP to generate RMAC in three domains of interest for mechanism\ndesigners: auctions, matching, and (in the Appendix) social choice. In each of them we \ufb01nd that some\ncounterfactual predictions are much more robust than others. We also demonstrate that RMAC can\nbe applied even when standard assumptions about point identi\ufb01cation do not hold (e.g. 
when there\nare multiple equilibria or when the data is consistent with multiple type distributions) to compute\noptimistic and pessimistic counterfactual predictions.\n\n1.1 Related Work\n\nOur work is closely related to the notion of partial identi\ufb01cation (Manski, 2003). The main idea\nbehind partial identi\ufb01cation is that many statistical models are only able to recover a set of parameters\nconsistent with the data, not a single point estimate. The PI literature focuses on models where this\n\u2018identi\ufb01ed set\u2019 can be extracted easily. The revelation game is strongly related in that the equilibrium\nrelaxation we employ makes the counterfactual predictions a set rather than a point. We focus on\n\ufb01nding this set\u2019s worst (in terms of some evaluation function) and best elements.\nExisting work in the \ufb01eld of market design has used econometric techniques to estimate counterfactu-\nals in speci\ufb01c applications (Athey and Nekipelov, 2010; Chawla et al., 2017; Agarwal, 2015). These\napproaches are, like ours, designed with the goal of answering counterfactual questions. However,\nwhile they allow for measures of statistical uncertainty they do not allow analysts to check for\nrobustness of conclusions to violations of assumptions. Haile and Tamer (2003) consider using\n\u2018incomplete\u2019 models of auctions to provide some form of robustness but, like much of the literature\non the econometrics of auctions (and unlike RMAC), requires hand-deriving estimators speci\ufb01cally\ntailored to the auction at hand.\nSince the pioneering work of Myerson (1981) there is a large sub\ufb01eld of game theory dedicated to\ndesigning mechanisms that optimize some quantity (e.g. seller revenue). Myerson-style results often\nrequire the auctioneer to know the distribution of types (valuations) in the population. 
These strong\nassumptions are relaxed in robust mechanism design (Bergemann and Morris, 2005), automated\nmechanism design (Conitzer and Sandholm, 2002), and recent work in using deep learning methods\nto approximate optimal mechanisms (D\u00fctting et al., 2017; Feng et al., 2018). Optimal mechanism\ndesign is related to, but different from, the RMAC problem as it typically assumes access to at least\nsome direct information about the distribution of types, whereas the RMAC problem is to robustly\n\n2The RMAC bounds are different from standard uncertainty bounds (e.g. the standard error of a maximum\nlikelihood estimator). Statistical uncertainty bounds (i.e. standard errors) re\ufb02ect variance introduced by access\nto only \ufb01nite data but still assume the underlying model is completely correct. On the other hand, our robustness\nbounds are intended to measure error that can come from the analyst using a model that is precisely incorrect but\napproximately true.\n\n2\n\n\finfer the underlying types from observed actions. However, these problems are related and combining\ninsights from these literatures with RMAC is an interesting direction for future work.\nThere is recent interest in relaxing equilibrium assumptions in structural models. Nekipelov et al.\n(2015) consider replacing equilibrium assumptions with the assumption that individuals are no-regret\nlearners. This, again, gives a set valued solution concept which can be worked out explicitly for the\nspecial case of auctions. Given the prominence of no-regret learning in algorithmic game theory a\nnatural extension of the work in this paper is to consider expanding RMAC to learning as a solution\nconcept.\n\n2 Bayesian Games\n\nWe consider the standard one-shot Bayesian game setup. There are N players which each have a type\n\u03b8i \u2208 \u0398 drawn from an unknown distribution F. This type is assumed to represent their preferences\nand private information. 
For example, in the case of auctions this type describes the valuations of each player for each object.
Definition 1. A game G has a set of actions for each player A_i with generic element a_i. After each player chooses their action, the players receive utilities given by u^G_i(a_1, . . . , a_N, θ_i).

We focus on systems that come to a stable state; in particular, we assume that they form a Bayesian Nash equilibrium. We denote a strategy σ_i for player i in game G as a mapping which takes as input θ_i and outputs an action a_i. As standard, for a vector x of variables, one for each player, we let x_i be the variable for player i and x_{−i} be the vector for everyone other than i.
Definition 2. A Bayesian Nash equilibrium (BNE) is a strategy profile σ* such that for each player i, all possible types θ_i for that player which have positive probability under F, and any other strategy σ′_i, we have

E_F[u^G_i(σ*_i(θ_i), σ*_{−i}(θ_{−i}), θ_i)] ≥ E_F[u^G_i(σ′_i(θ_i), σ*_{−i}(θ_{−i}), θ_i)].

The BNE assumption can be motivated by, for example, assuming that repeated play (with rematching) has led learning agents to converge to such a state (Fudenberg and Levine, 1998; Dekel et al., 2004; Hartline et al., 2015). Importantly, BNE states that players' actions are optimal given the distribution of partners they could play, not necessarily that they are optimal at each realization of the game with types fixed.
To lighten notation, from here on we deal with games where every player's action set is the same, A_i = A, and every player's type is drawn iid from F.

3 The Revelation Game as a Counterfactual Estimator

Given the formal setup above, we now turn to answering our main question:
Question 1.
Suppose we have a dataset D of actions played in G. What can we say about what\nwould happen if we changed the underlying game to G(cid:48)?\nFormally, when we say that we change the game to G(cid:48) we mean that the action set changes to A(cid:48) and\ni (a1, . . . , aN , \u03b8i). G(cid:48) remains a Bayesian game so the de\ufb01nitions\nthe utility functions change to uG(cid:48)\nand notation above continue to apply.\nAs a concrete example: in the case of online advertising auctions, D will contain a series of auctions\nwith bids taken by different participants. We may wish to ask, what would happen if we changed the\nauction format? It is important to note here that D only contains actions played in the game and not\ntypes (which are never observed by the analyst).\nWe now discuss a set of assumptions typically made either implicitly or explicitly in existing literature.\nWe will refer to these as the standard assumptions.\nAssumption 1 (Equilibrium). Data is drawn from a BNE of G and play in G(cid:48) will form a BNE.\nAssumption 2 (Identi\ufb01cation). For any possible distribution of types F and associated BNE \u03c3\u2217 there\ndoes not exist another distribution of types F(cid:48) and BNE \u03c3(cid:48)\u2217 that induces the same distribution of\nactions.\n\n3\n\n\fAssumption 3 (Uniqueness in G(cid:48)). Given F there is a unique BNE in G(cid:48).\nIf the standard assumptions are satis\ufb01ed then the counterfactual question can be answered as follows.\nBy Assumption 1 each action di is optimal against the distribution of actions implied by D. If D is\nlarge enough then it approximates the true distribution implied by \u03c3 and F. By Assumption 2, there\nis a unique \u03c3 and F that give rise to this distribution. 
Therefore, F can be estimated using D, and the\nequilibrium in G(cid:48), which is unique by Assumption 3, can be estimated using standard methods.\nWe now show this procedure is equivalent to solving for the Nash equilibrium in a modi\ufb01ed game\nwhich we refer to as a revelation game.3 We do not consider that agents will actually play this game,\nrather we will show that this proxy game is a useful abstraction for doing robust counterfactual\ninference.\nThe revelation game has m players, one for each element of D. We refer to these as data-players to\navoid confusion with the players in G and G(cid:48). Each data-player knows that the analyst has a random\nvariable D of actions from the equilibrium of G. D includes the data-player\u2019s own true equilibrium\naction but the other actions are ex-ante unknown. Each data-player has a true type \u03b8j which is\nunknown to the analyst, the types of the other data-players \u2212j are unknown to j but it is commonly\nknown that they are drawn from the distribution F.\nEach data-player j makes a decision: they report a type \u02c6\u03b8j and an action for the counterfactual game\n\u02c6aj. They are paid as follows: \ufb01rst, let the D\u2212j denote the random variable which denotes the actions\nof the other data-players the analyst will observe. 
Now we define the G-Regret of data-player j as

Regret^G_j(θ̂_j, D_{−j}) = max_{a_j} E[u^G_j(a_j, θ̂_j, D_{−j})] − E[u^G_j(d_j, θ̂_j, D_{−j})].

We define the G′-Regret of data-player j as

Regret^{G′}_j(â_j, θ̂_j, â_{−j}) = max_{a_j} E[u^{G′}_j(a_j, θ̂_j, â_{−j})] − E[u^{G′}_j(â_j, θ̂_j, â_{−j})].

The revelation game is a Bayesian game where each data-player j tries to minimize a loss given by the max of the two above regrets:

L^rev_j(θ̂_j, â_j, â_{−j}, D) = max{Regret^G_j(θ̂_j, D_{−j}), Regret^{G′}_j(â_j, θ̂_j, â_{−j})}.

Given these definitions, we can show the following property:
Theorem 1. If Assumptions 1-3 are satisfied then the revelation game has a unique BNE where each agent reveals their true type and counterfactual action.

We leave the proof of the theorem to the Appendix. This property means that if we can solve for the equilibrium of the revelation game, then we have our counterfactual prediction. With this result in hand, we now discuss how to modify the revelation game to make our counterfactual predictions robust.

4 Robust Multi-agent Counterfactual Inference

In reality, Assumptions 1-3 above are rarely satisfied exactly and we would like to see how robust conclusions are to violations of these assumptions. In addition, all modeling makes the important assumption
Assumption 4 (Specification). G and G′ include the correct specifications of individuals' reward functions.

which, like the others, is rarely completely true in practice.
To relax all of these assumptions we will consider the concept of ε-BNE.
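As a concrete illustration, the loss above can be sketched in a few lines of Python. The grid, utility function, and numbers below are illustrative assumptions (a toy 2-player first-price auction with G = G′, so both regrets use the same utility), not the paper's experimental setup:

```python
# Minimal sketch of the revelation-game loss L^rev_j for one data-player,
# in a toy discretized first-price auction. For brevity we take G = G'
# (both first price); all numbers here are illustrative assumptions.

BIDS = [i / 10 for i in range(11)]  # discretized bid grid on [0, 1]

def u_first_price(my_bid, my_type, other_bid):
    """Utility in a 2-player first-price auction, ties split evenly."""
    if my_bid > other_bid:
        return my_type - my_bid
    if my_bid == other_bid:
        return 0.5 * (my_type - my_bid)
    return 0.0

def expected_u(bid, theta, opponent_bids):
    """E[u(bid, theta, .)] against an empirical opponent bid distribution."""
    return sum(u_first_price(bid, theta, b) for b in opponent_bids) / len(opponent_bids)

def regret(my_bid, theta, opponent_bids):
    """max_a E[u(a, theta, .)] - E[u(my_bid, theta, .)]."""
    best = max(expected_u(a, theta, opponent_bids) for a in BIDS)
    return best - expected_u(my_bid, theta, opponent_bids)

def revelation_loss(theta_hat, a_hat, d_j, D_minus_j, a_hat_minus_j):
    """L^rev_j: max of the G-regret of the logged action d_j and the
    G'-regret of the reported counterfactual action a_hat."""
    return max(regret(d_j, theta_hat, D_minus_j),
               regret(a_hat, theta_hat, a_hat_minus_j))

# A reported type 0.8 whose logged bid 0.4 is a grid best response to
# opponents who bid 0.3, so the loss is zero: (0.8, 0.4) is consistent.
loss = revelation_loss(theta_hat=0.8, a_hat=0.4, d_j=0.4,
                       D_minus_j=[0.3, 0.3], a_hat_minus_j=[0.3, 0.3])
```

A report whose logged action is far from any best response for the reported type would instead incur a strictly positive loss.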
ε-BNE requires that, given the behavior of individuals −i, the decision of each individual i yields at most ε regret relative to the optimal strategy. Formally, this replaces the inequality in Definition 2 by

E_F[u^G_i(σ*_i(θ_i), σ*_{−i}(θ_{−i}), θ_i)] ≥ E_F[u^G_i(σ′_i(θ_i), σ*_{−i}(θ_{−i}), θ_i)] − ε.

Allowing for ε-BNE in the revelation game means that we are also allowing for ε-BNE in G and G′, since the revelation game loss is defined as the maximum of the two regrets. The introduction of ε-BNE is how we relax Assumptions 1-4. In the Appendix we give a longer and more formal treatment of the relationship between ε-BNE and the assumptions. Informally, notice that ε-equilibria can arise because agents are imperfect optimizers4 (but are able to learn to avoid actions that cause huge negative regret) or because the utility functions in G or G′ are slightly incorrect (and individuals reach an equilibrium corresponding to some other reward function).
However, like many instances of partial identification (Manski, 2003), ε-BNE is a set-valued solution concept. Rather than enumerate the whole set, we will consider particular boundary equilibria: We assume the existence of an evaluation function V(θ, a) which gives us a scalar evaluation of the counterfactual outcome that the analyst cares about. We overload notation and let V(σ) = E_{(θ,a)∼σ}[V(θ, a)] be the expected value of V given a mixed strategy σ.

3We are indebted to Jason Hartline, who pointed out in an earlier version of this work that our optimization problem can be thought of as equilibrium finding, which makes the exposition much simpler.
Common examples of evaluation functions used in the mechanism design literature include revenue, efficiency, fairness, envy, stability, strategy-proofness, or some combination of these (Roth and Sotomayor, 1992; Guruswami et al., 2005; Budish, 2011; Caragiannis et al., 2016).
We will consider the maximal and minimal elements of the ε-BNE set with respect to V. Formally:
Definition 3. The ε-pessimistic counterfactual prediction of V is

inf_σ V(σ) s.t. σ is an ε-BNE in the revelation game.

The ε-optimistic prediction replaces the inf with sup. The ε-RMAC bounds are the values of V attained at the pessimistic and optimistic predictions.

The figure to the right summarizes the idea behind RMAC. The structural assumptions imply a one-to-one mapping between observed distributions and underlying types, followed by a one-to-one mapping between underlying types and counterfactual behavior. Assuming only ε-equilibrium makes both of these mappings one-to-many, and RMAC bounds select the most optimistic and pessimistic counterfactual distributions consistent with these mappings.

5 Computing Equilibria of the Revelation Game

In practice, we can replace the random variable D of the revelation game with its sample analogue, the observed data. From here forward D will refer to the sample data. Unfortunately, we can derive a negative complexity result for computing ε-RMAC bounds exactly:
Theorem 2. It is NP-hard to compute the robust counterfactual estimate even if each data-point j has only a single feasible type, and there are only two data points. It is also NP-hard even if there is no objective function, a finite number of feasible types, and G′ has only two players.

The proof is provided in the Appendix.
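To make Definition 3 concrete, the bounds can be computed by brute force when the game is tiny. The sketch below uses a hypothetical 2x2 complete-information coordination game with pure strategies only, a drastic simplification of the revelation game, purely to show how the ε-equilibrium set turns a point prediction into an interval:

```python
import itertools

# Hypothetical 2x2 complete-information coordination game; both players
# get the same payoff. Pure strategies only, for illustration.
U1 = [[2.0, 0.0],
      [0.0, 1.0]]
U2 = [[2.0, 0.0],
      [0.0, 1.0]]

def is_eps_eq(i, j, eps):
    """(i, j) is an epsilon-equilibrium if neither player gains more
    than eps from a unilateral deviation."""
    r1 = max(U1[k][j] for k in range(2)) - U1[i][j]
    r2 = max(U2[i][k] for k in range(2)) - U2[i][j]
    return max(r1, r2) <= eps

def rmac_bounds(V, eps):
    """Pessimistic and optimistic values of V over the eps-equilibrium set."""
    vals = [V(i, j) for i, j in itertools.product(range(2), repeat=2)
            if is_eps_eq(i, j, eps)]
    return min(vals), max(vals)

welfare = lambda i, j: U1[i][j] + U2[i][j]
# Even at eps = 0 there are two equilibria, so the "prediction" is
# already an interval; a larger eps can only widen it.
bounds_exact = rmac_bounds(welfare, eps=0.0)  # (2.0, 4.0)
bounds_loose = rmac_bounds(welfare, eps=2.0)  # (0.0, 4.0)
```

The multiplicity at ε = 0 also previews the point made later in the paper: RMAC remains meaningful even when the standard point-identification assumptions fail.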
In the Appendix we also provide a mathematical program with equilibrium constraints for the case of pure-strategy ε-BNE, and a mixed integer program for the special case of two-player games.
Given that computing RMAC bounds for the general case requires solving a mathematical program with equilibrium constraints, we do not expect it to scale beyond small instances. Therefore, we propose to adapt the fictitious play algorithm (Brown, 1951) to compute the optimistic and pessimistic equilibria of the revelation game. We refer to this as Revelation Game Fictitious Play (RFP).
RFP works as follows. For notation, let θ̂^t_i be the estimated type for data point i at iteration t and â^t_i be the estimated counterfactual action at iteration t. As with standard fictitious play, at each time step each i takes an action in the revelation game (i.e. reports a type-action pair). They observe the choices of others and update their t + 1 choice (θ̂^{t+1}_i, â^{t+1}_i) to be the one that minimizes (or maximizes) V out of the set of ε-best responses to the current history of play (when ε = 0, RFP simply chooses the best response to the current history, breaking ties randomly). The pseudocode is shown in Algorithm 1.

Algorithm 1 Revelation Fictitious Play

Input: ε, D, V, G, G′; if pessimistic then α = −1, if optimistic then α = 1
Randomly initialize θ̂^0_i, â^0_i
for t = 0, 1, . . . while not converged do
    Let ā^t_{−i} be the historical distribution of â^{t′}_{−i} for t′ ∈ {0, . . . , t}
    Let σ̄^t_{−i} be the (mixed) strategy profile implied by the historical distribution of (θ̂^{t′}_{−i}, â^{t′}_{−i})
    Let the set of low-regret revelation game actions be
        Ĉ^t_i = {(θ̂_i, â_i) ∈ Ω × A | L^rev_i(θ̂_i, â_i, ā^t_{−i}, D) ≤ ε}
    Breaking ties randomly, update the guesses for each datapoint:
        (θ̂^{t+1}_i, â^{t+1}_i) = argmax_{(θ̂_i, â_i) ∈ Ĉ^t_i} [α V(θ̂_i, â_i, σ̄^t_{−i})]

4Here the ε term is readily interpretable. For example, if our underlying game is an auction where bids are in US dollars, then ε measures how many dollars an individual is giving up by playing their action instead of the best response.

It is well known that fictitious play converges in 2-player zero-sum and potential games, while it may cycle in general. Nonetheless, a well-known result states that if fictitious play converges, then it converges to a Nash equilibrium (Fudenberg and Levine, 1998).
We now show an analogous result for RFP: if pessimistic (optimistic) RFP converges then it converges to an ε-BNE and locally minimizes (maximizes) V, in the sense that no unilateral deviation by a single data-player j in the revelation game that is a strict ε-best response leads to a smaller (larger) V. We denote by σ̄^t the mixed strategy implied by the history of play. As with standard fictitious play we consider convergence of σ̄^t:
Definition 4. RFP converges to a mixed strategy σ* if lim_{t→∞} σ̄^t = σ*.
We use the following notion of local optimality (analogously defined for optimistic V):
Definition 5.
A mixed ε-BNE σ* of the revelation game is locally V-optimal if

V(σ*) ≤ V(θ_j, a_j, σ*_{−j})

for any data-player j and unilateral deviation (θ_j, a_j) where5

E_{(θ_{−j}, a_{−j})∼σ*_{−j}}[L^rev_j(θ_j, a_j, a_{−j}, D)] < ε.

Theorem 3. If RFP converges to σ* then σ* is a locally V-optimal ε-BNE of the revelation game.

We relegate the proof to the Appendix. The argument is an extension of standard fictitious play results to the revelation game.
An important question is whether RFP can be guaranteed to converge in particular classes of Bayesian games. We leave the theoretical study of RFP (or other learning algorithms in the revelation game) to future work and focus the rest of the paper on empirical evaluation.

5Note the strict inequality: the reason is that there may be deviations which have strictly greater than ε regret for all t, but whose regret converges to ε from above, and so they enter the set at the limit.

6 Experiments

We now turn to constructing RMAC bounds for classic problems in market design. In the next two sections we discuss auctions and school choice. In the Appendix we consider two other experiments: 1) an auction setting where point identification is impossible and 2) social choice.

6.1 RMAC in Auctions

We first evaluate RMAC by studying counterfactual revenue in auctions. We consider a first-price 2-player auction G with types drawn from [0, 1] uniformly and bids in the interval [0, 1] discretized at intervals of .01. As our counterfactual games we consider a 2-player second-price auction with varying reserves6 in the interval [0, 1] and N-player first-price auctions.
We use counterfactual expected revenue as our evaluation function. We set the domain of possible types to also be equal to [0, 1].
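As a sanity check on the counterfactual environment, expected revenue of the 2-player second-price auction with reserve r can be estimated by Monte Carlo under truthful bidding (a dominant strategy in second price) and uniform types. This corresponds only to the standard-assumptions (ε = 0) prediction; the sample size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def second_price_revenue(r, n_samples=200_000):
    """Monte Carlo expected revenue of a 2-player second-price auction
    with reserve r, truthful bids, values iid uniform on [0, 1]."""
    v = rng.uniform(size=(n_samples, 2))
    high, low = v.max(axis=1), v.min(axis=1)
    sold = high >= r                    # no sale if both values below r
    price = np.maximum(low, r)          # winner pays max(reserve, 2nd bid)
    return float((sold * price).mean())

rev_no_reserve = second_price_revenue(0.0)  # closed form: E[min] = 1/3
rev_half = second_price_revenue(0.5)        # closed form: 5/12 (optimal r)
```

Robust (ε > 0) predictions instead require searching over the ε-equilibrium set of the revelation game, as RFP does.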
We generate data by first sampling 1000 independent types and their actions from the closed-form first-price equilibrium (bid = .5θ), using these actions as D. We then use D to compute ε-RMAC predictions for several levels of ε. Figure 1 shows our results, with (small) error bars shown as standard deviations of the statistic over replicates.
In Figure 1, we see that in auctions even slight changes to ε can lead to large changes in revenue. In particular, if we consider that the average expected utility accrued to the winner in the 2-player auction is .25, an ε of .01 corresponds to only a 4% misoptimization/misspecification. However, this small ε still gives quite wide revenue bounds.
To see the logic behind this lack of robustness, consider the pessimistic estimate, in which the data is drawn from an ε-equilibrium where individuals are overbidding in the original game and underbidding in the counterfactual game. Assuming a uniform bid distribution for others, an individual's regret for (unilaterally) shading their bid by Δ is ε = Δ²/2, and this shading decreases expected revenue by Δ. Therefore, we expect a worst-case ε-equilibrium in the counterfactual game to decrease revenue by √(2ε). In addition, there will be a similar decrease in revenue from the shift in types inferred from the original game.
In an additional experiment in the Appendix we further show that the robustness of counterfactual estimates for changing the auction reserve price is asymmetric. Specifically, counterfactual estimates for increasing the reserve are robust, while estimates for decreasing it are not.

Figure 1: RMAC revenue predictions using data drawn from the equilibrium of a first-price 2-player auction for various counterfactual auction formats.
The RMAC robustness bounds, even with small ε, are much larger than the standard error bounds (grey ribbon around the RMAC 0 line) estimated from multiple replicates.

6.2 RMAC in School Choice

We move to another commonly studied domain: school choice. Here the problem is to assign items (schools) to agents (students). Agents have preferences over schools, report them, and the output of the mechanism is an assignment.
We look at two real-world school choice mechanisms. The first is the Boston mechanism (Abdulkadiroğlu et al., 2005). In Boston, each student reports their rank order list and the mechanism tries to maximize the number of first-choice assignments that it can. Once it has done this, it tries to maximize the number of second-choice assignments, and so on. The second mechanism is random serial dictatorship (RSD) (Abdulkadiroğlu and Sönmez, 1998). Here students are each given a random number and sorted; the first in line gets to choose their favorite school, the second chooses their favorite among what's left, and so on.
The main tradeoff in practical school choice comes from balancing the total social welfare achieved by a mechanism against its truthfulness. RSD (and other algorithms like student-proposing deferred acceptance) have a dominant strategy for each agent to report their true type.

6A reserve price r in an auction is a price floor: individuals cannot win the auction if they bid below the reserve. In addition, in the case of second-price auctions, the price paid by the winner is the max of r (as long as r is less than the bid) and the second-highest bid.
This means that participants in real-world implementations of such mechanisms do not need to spend cognitive effort on guessing what others might do or searching for information - they can simply tell the truth and go on with their day. On the other hand, equilibria of the Boston mechanism can be more efficient in terms of allocating schools to students, but need not be truthful (Mennle and Seuken, 2015; Abdulkadiroğlu et al., 2011).
We consider a mechanism design problem with 3 students and 3 schools (A, B, C). For both mechanisms the action space is a permutation over A, B, C.
We consider a hypothesis space of types that are permutations of the utility vector (5, 4, 0) - that is, individuals receive utility 5 if they get their first choice, 4 for their second and 0 for their third. We consider the case where all individuals have identical preferences A > B > C. We will take these types, construct a dataset of equilibrium behavior under each mechanism, and ask what would happen if we switched to the other mechanism.
We examine two evaluation functions V: overall social welfare of the allocation and truthfulness of the strategies (i.e. whether types report their true values). We plot the estimated change in welfare and truthfulness from moving from one mechanism to another. This is an exercise that a market designer might perform in order to justify a change of mechanism.
Note that in the case of 'Boston to RSD' at ε = 0 the standard structural assumptions are not satisfied, as multiple type distributions are consistent with the observed actions. Given our utility space, even though everyone has the same preferences, the same types may choose different actions (i.e. play a mixed strategy), since it is better to be assured of getting B than to take a lottery between A, B and C.
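For reference, RSD on this example is easy to simulate. The sketch below assumes truthful reports (RSD's dominant strategy) and the (5, 4, 0) utilities from the setup above, and shows that with identical preferences every lottery order yields the same welfare:

```python
import random

SCHOOLS = ["A", "B", "C"]
UTILITY = {"A": 5, "B": 4, "C": 0}  # payoff when all prefer A > B > C

def rsd(preferences, order):
    """Random serial dictatorship: students pick their favorite
    remaining school in lottery order."""
    remaining, assignment = set(SCHOOLS), {}
    for student in order:
        pick = next(s for s in preferences[student] if s in remaining)
        assignment[student] = pick
        remaining.discard(pick)
    return assignment

# all three students truthfully report identical preferences A > B > C
prefs = {i: ["A", "B", "C"] for i in range(3)}
order = random.sample(range(3), 3)  # the random lottery over students
alloc = rsd(prefs, order)
welfare = sum(UTILITY[s] for s in alloc.values())  # always 5 + 4 + 0 = 9
```

With identical preferences the allocation is a permutation of the schools, so welfare is constant; only who gets which school depends on the lottery.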
So, some proportion of individuals will report (B, A, C). However, such an action profile is also consistent with an equilibrium of truthful types with different preferences. Since the types are not identified from the observed actions, structural estimation using maximum likelihood has multiple optima with different values of V. However, RMAC with small ε will produce an interval that covers both possible type distributions. As shown in Figure 2, switching from Boston to (truthful) RSD may increase truthfulness by 26% (if the types are indeed all A > B > C) or 0% (if the types matched the actions in Boston). Going from Boston to RSD also tends to lead to welfare decreases, although not always (e.g. if players have identical preferences, all mechanisms provide the same welfare). The counterfactual question in the other direction, RSD to Boston, has far tighter RMAC bounds because the types are well-specified by the truthful RSD mechanism.

Figure 2: RMAC intervals for the change in social welfare and change in truthfulness from changing school choice mechanisms. Dark and light curves are for the 10th and 90th percentiles of estimated intervals over replicates with different sampled D. The presence of multiple type distributions consistent with a given action distribution in Boston means that even for small ε, RMAC bounds can be quite wide for Boston to RSD.

7 Conclusion

Multi-agent counterfactual prediction is an important question both in theory and practice. We have introduced RMAC as a way of testing the robustness of counterfactual predictions with respect to violations of the standard assumptions of specification, equilibrium, and point identification. Our method applies a version of fictitious play, but it is well known that modifications to such algorithms can lead to large changes in real-world performance (Conitzer and Sandholm, 2007; Syrgkanis et al., 2015; Kroer et al., 2015).
In addition, more complex environments would require multi-agent learning algorithms that can handle function approximation, such as those based on deep learning (Heinrich and Silver, 2016; Dütting et al., 2017; Lowe et al., 2017; Feng et al., 2018; Brown et al., 2018).

References

Atila Abdulkadiroğlu, Yeon-Koo Che, and Yosuke Yasuda. 2011. Resolving conflicting preferences in school choice: The "Boston mechanism" reconsidered. American Economic Review 101, 1 (2011), 399–410.

Atila Abdulkadiroğlu, Parag A Pathak, Alvin E Roth, and Tayfun Sönmez. 2005. The Boston public school match. American Economic Review 95, 2 (2005), 368–371.

Atila Abdulkadiroğlu and Tayfun Sönmez. 1998. Random serial dictatorship and the core from random endowments in house allocation problems. Econometrica 66, 3 (1998), 689–701.

Nikhil Agarwal. 2015. An empirical model of the medical match. American Economic Review 105, 7 (2015), 1939–78.

Susan Athey. 2015. Machine learning and causal inference for policy evaluation. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 5–6.

Susan Athey and Denis Nekipelov. 2010. A structural model of sponsored search advertising auctions. In Sixth Ad Auctions Workshop, Vol. 15.

Dirk Bergemann and Stephen Morris. 2005. Robust mechanism design. Econometrica 73, 6 (2005), 1771–1813.

Steven Berry, James Levinsohn, and Ariel Pakes. 1995. Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society (1995), 841–890.

Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X Charles, D Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. 2013. Counterfactual reasoning and learning systems: The example of computational advertising.
The Journal of Machine Learning Research 14, 1 (2013), 3207–3260.

George W Brown. 1951. Iterative solution of games by fictitious play. Activity Analysis of Production and Allocation 13, 1 (1951), 374–376.

Noam Brown, Adam Lerer, Sam Gross, and Tuomas Sandholm. 2018. Deep counterfactual regret minimization. arXiv preprint arXiv:1811.00164 (2018).

Eric Budish. 2011. The combinatorial assignment problem: Approximate competitive equilibrium from equal incomes. Journal of Political Economy 119, 6 (2011), 1061–1103.

Colin F Camerer, George Loewenstein, and Matthew Rabin. 2011. Advances in behavioral economics. Princeton University Press.

Ioannis Caragiannis, David Kurokawa, Hervé Moulin, Ariel D Procaccia, Nisarg Shah, and Junxing Wang. 2016. The unreasonable fairness of maximum Nash welfare. In Proceedings of the 2016 ACM Conference on Economics and Computation. ACM, 305–322.

Shuchi Chawla, Jason D Hartline, and Denis Nekipelov. 2017. Mechanism redesign. arXiv preprint arXiv:1708.04699 (2017).

Vincent Conitzer and Tuomas Sandholm. 2002. Complexity of mechanism design. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 103–110.

Vincent Conitzer and Tuomas Sandholm. 2007. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning 67, 1-2 (2007), 23–43.

Eddie Dekel, Drew Fudenberg, and David K Levine. 2004. Learning to play Bayesian games. Games and Economic Behavior 46, 2 (2004), 282–303.

Paul Dütting, Zhe Feng, Harikrishna Narasimhan, and David C Parkes. 2017. Optimal auctions through deep learning. arXiv preprint arXiv:1706.03459 (2017).

Ido Erev and Alvin E Roth. 1998. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria.
American Economic Review (1998), 848–881.

Z Feng, H Narasimhan, and DC Parkes. 2018. Optimal auctions through deep learning. AAMAS (2018).

Drew Fudenberg and David K Levine. 1998. The theory of learning in games. Vol. 2. MIT Press.

Drew Fudenberg and Alexander Peysakhovich. 2016. Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation (TEAC) 4, 4 (2016), 23.

Venkatesan Guruswami, Jason D Hartline, Anna R Karlin, David Kempe, Claire Kenyon, and Frank McSherry. 2005. On profit-maximizing envy-free pricing. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 1164–1173.

Philip A Haile and Elie Tamer. 2003. Inference with an incomplete model of English auctions. Journal of Political Economy 111, 1 (2003), 1–51.

Jason Hartline, Vasilis Syrgkanis, and Eva Tardos. 2015. No-regret learning in Bayesian games. In Advances in Neural Information Processing Systems. 3061–3069.

Johannes Heinrich and David Silver. 2016. Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121 (2016).

Paul Klemperer. 2002. What really matters in auction design. Journal of Economic Perspectives 16, 1 (2002), 169–189.

Christian Kroer, Kevin Waugh, Fatma Kilinç-Karzan, and Tuomas Sandholm. 2015. Faster first-order methods for extensive-form game solving. In Proceedings of the Sixteenth ACM Conference on Economics and Computation. ACM, 817–834.

Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems. 6379–6390.

Charles F Manski. 2003. Partial identification of probability distributions.
Springer Science & Business Media.

Timo Mennle and Sven Seuken. 2015. Trade-offs in school choice: Comparing deferred acceptance, the naïve and the adaptive Boston mechanism.

Roger B Myerson. 1981. Optimal auction design. Mathematics of Operations Research 6, 1 (1981), 58–73.

Denis Nekipelov, Vasilis Syrgkanis, and Eva Tardos. 2015. Econometrics for learning agents. In Proceedings of the Sixteenth ACM Conference on Economics and Computation. ACM, 1–18.

Andrew Y Ng, Stuart J Russell, et al. 2000. Algorithms for inverse reinforcement learning. In ICML. 663–670.

David Porter, Stephen Rassenti, Anil Roopnarine, and Vernon Smith. 2003. Combinatorial auction design. Proceedings of the National Academy of Sciences 100, 19 (2003), 11153–11157.

Alvin E Roth. 2002. The economist as engineer: Game theory, experimentation, and computation as tools for design economics. Econometrica 70, 4 (2002), 1341–1378.

Alvin E Roth and Elliott Peranson. 1999. The redesign of the matching market for American physicians: Some engineering aspects of economic design. American Economic Review 89, 4 (1999), 748–780.

Alvin E Roth, Tayfun Sönmez, et al. 2005. A kidney exchange clearinghouse in New England. American Economic Review 95, 2 (2005), 376–380.

Alvin E Roth and Marilda Sotomayor. 1992. Two-sided matching. Handbook of Game Theory with Economic Applications 1 (1992), 485–541.

Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E Schapire. 2015. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems.
2989–2997.