{"title": "Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand", "book": "Advances in Neural Information Processing Systems", "page_first": 2321, "page_last": 2329, "abstract": "In many large economic markets, goods are sold through sequential auctions. Such domains include eBay, online ad auctions, wireless spectrum auctions, and the Dutch flower auctions. Bidders in these domains face highly complex decision-making problems, as their preferences for outcomes in one auction often depend on the outcomes of other auctions, and bidders have limited information about factors that drive outcomes, such as other bidders' preferences and past actions. In this work, we formulate the bidder's problem as one of price prediction (i.e., learning) and optimization. We define the concept of stable price predictions and show that (approximate) equilibrium in sequential auctions can be characterized as a profile of strategies that (approximately) optimize with respect to such (approximately) stable price predictions. We show how equilibria found with our formulation compare to known theoretical equilibria for simpler auction domains, and we find new approximate equilibria for a more complex auction domain where analytical solutions were heretofore unknown.", "full_text": "Approximating Equilibria in Sequential Auctions with\n\nIncomplete Information and Multi-Unit Demand\n\nAmy Greenwald and Eric Sodomka\n\nDepartment of Computer Science\n\nBrown University\n\nProvidence, RI 02912\n\n{amy,sodomka}@cs.brown.edu\n\nJiacui Li\n\nDepartment of Applied Math/Economics\n\nBrown University\n\nProvidence, RI 02912\n\njiacui li@alumni.brown.edu\n\nAbstract\n\nIn many large economic markets, goods are sold through sequential auctions.\nExamples include eBay, online ad auctions, wireless spectrum auctions, and the\nDutch \ufb02ower auctions. 
In this paper, we combine methods from game theory and decision theory to search for approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders only partially observe the actions of their opponents, and bidders demand multiple goods. We restrict attention to two-phased strategies: first, predict (i.e., learn); second, optimize. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders' strategies), and then, assuming fixed other-bidder strategies, we estimate and solve the ensuing Markov decision processes (MDPs) [18] for optimization. We exploit auction properties to represent the MDP in a more compact state space, and we use Monte Carlo simulation to make estimating the MDP tractable. We show how equilibria found using our search procedure compare to known equilibria for simpler auction domains, and we approximate an equilibrium for a more complex auction domain where analytical solutions are unknown.

1 Introduction

Decision-making entities, whether they are businesses, governments, or individuals, usually interact in game-theoretic environments, in which the final outcome is intimately tied to the actions taken by others in the environment. Auctions are examples of such game-theoretic environments with significant economic relevance. Internet advertising, a significant portion of which is transacted through online auctions, saw global spending increase 24 percent from 2010 to 2011, becoming an $85 billion industry [16]. 
The FCC has conducted auctions for wireless spectrum since 1994, reaching sales of over $60 billion.1 Perishable commodities such as flowers are often sold via auction; the Dutch flower auctions had about $5.4 billion in sales in 2011.2

A game-theoretic equilibrium, in which each bidder best responds to the strategies of its opponents, can be used as a means of prescribing and predicting auction outcomes. Finding equilibria in auctions is potentially valuable to bidders, as they can use the resulting strategies as prescriptions that guide their decisions, and to auction designers, as they can use the resulting strategies as predictions of bidder behavior. While a rich literature exists on computing equilibria for relatively simple auction games [11], auction theory offers few analytical solutions for real-world auctions. Even existing computational methods for approximating equilibria quickly become intractable as the number of bidders and goods, and the complexity of preferences and decisions, increase.

1See http://wireless.fcc.gov/auctions/default.htm?job=auctions_all.
2See http://www.floraholland.com/en/.

In this paper, we combine methods from game theory and decision theory to approximate equilibria in sequential auction domains, in which bidders do not know their opponents' values for goods, bidders partially observe the actions of their opponents, and bidders demand multiple goods. Our method of searching for equilibria is motivated by the desire to reach strategies that real-world bidders might actually use. To this end, we consider strategies that consist of two parts: a prediction (i.e., learning) phase and an optimization phase. We use best-reply dynamics [4] for prediction (i.e., to predict other bidders' strategies), and then, assuming fixed other-bidder strategies, we estimate and solve a Markov decision process (MDP) [18] for optimization. 
We exploit auction properties to represent the MDPs in a more compact state space, and we use Monte Carlo simulation to make estimating the MDPs tractable.

2 Sequential Auctions

We focus on sequential sealed-bid auctions, with a single good being sold at each of K rounds. The number of bidders n and the order in which goods are sold are assumed to be common knowledge. During auction round k, each bidder i submits a private bid b^k_i ∈ B_i to the auctioneer. We let b^k = ⟨b^k_1, ..., b^k_n⟩ denote the vector of bids submitted by all bidders at round k. The bidder who submits the highest bid wins and is assigned a cost based on a commonly known payment rule. At the end of round k, the auctioneer sends a private (or public) signal o^k_i ∈ O_i to each bidder i, which is a tuple specifying information about the auction outcome for round k, such as the winning bid, the bids of all agents, the winner identities, whether or not a particular agent won the good, or any combination thereof. Bidders only observe opponents' bids if those bids are announced by the auctioneer. Regardless, we assume that bidder i is told at least which set of goods she won in the kth round, w^k_i ∈ {∅, {k}}, and how much she paid, c^k_i ∈ R. We let ψ(o^k | b^k) ∈ [0, 1] denote the probability that the auctioneer sends the bidders signals o^k = ⟨o^k_1, ..., o^k_n⟩ given b^k, and we let ψ(o^k_i | b^k) express the probability that player i receives signal o^k_i, given b^k.

An auction history at round k consists of past bids plus all information communicated by the auctioneer through round k − 1. Let h^k_i = ⟨(b^1_i, o^1_i), ..., (b^{k−1}_i, o^{k−1}_i)⟩ be a possible auction history at round k as observed by bidder i. Let H_i be the set of all possible auction histories for bidder i. Each bidder i is endowed with a privately known type θ_i ∈ Θ_i, drawn from a commonly known distribution F, that determines bidder i's valuations for various bundles of goods. A (behavioral) strategy σ_i : Θ_i × H_i → Δ(B_i) for bidder i specifies a distribution over bids for each possible type and auction history. The set Σ_i contains all possible strategies.

At the end of the K auction rounds, bidder i's utility is based on the bundle of goods she won and the amount she paid for those goods. Let X ⊆ {1, ..., K} be a possible bundle of goods, and let v(X; θ_i) denote a bidder's valuation for bundle X when its type is θ_i. No assumptions are made about the structure of this value function. A bidder's utility for type θ_i and history h^K_i after K auction rounds is simply that bidder's value for the bundle of goods it won minus its cost: u_i(θ_i, h^K_i) = v(∪^K_{k=1} w^k_i; θ_i) − Σ^K_{k=1} c^k_i.

Given a sequential auction Γ (defined by all of the above), bidder i's objective is to choose a strategy that maximizes its expected utility. But this quantity depends on the actions of other bidders. A strategy profile σ⃗ = (σ_1, ..., σ_n) = (σ_i, σ_{−i}) defines a strategy for each bidder. (Throughout the paper, subscript i refers to a bidder i, while −i refers to all bidders except i.) Let U_i(σ⃗) = E_{θ_i, h^K_i | σ⃗}[u_i(θ_i, h^K_i)] denote bidder i's expected utility given strategy profile σ⃗.

Definition 1 (ε-Bayes-Nash Equilibrium (ε-BNE)). 
Given a sequential auction Γ, a strategy profile σ⃗ ∈ Σ is an ε-Bayes-Nash equilibrium if U_i(σ⃗) + ε ≥ U_i(σ′_i, σ_{−i}) ∀i ∈ {1, ..., n}, ∀σ′_i ∈ Σ_i.

In an ε-Bayes-Nash equilibrium, each bidder has to come within an additive factor (ε) of best-responding to its opponents' strategies. A Bayes-Nash equilibrium is an ε-Bayes-Nash equilibrium where ε = 0. In this paper, we explore techniques for finding ε-BNE in sequential auctions. We also explain how to experimentally estimate the so-called ε-factor of a strategy profile:

Definition 2 (ε-Factor). Given a sequential auction Γ, the ε-factor of strategy profile σ⃗ for bidder i is ε_i(σ⃗) = max_{σ′_i} U_i(σ′_i, σ_{−i}) − U_i(σ_i, σ_{−i}). In words, the ε-factor measures bidder i's loss in expected utility for not playing his part of σ⃗ when other bidders are playing their parts.

3 Theoretical Results

As the number of rounds, bidders, possible types, or possible actions in a sequential auction increases, it quickly becomes intractable to find equilibria using existing computational methods. Such real-world intractability is one reason bidders often do not attempt to solve for equilibria, but rather optimize with respect to predictions about opponent behavior. Building on past work [2, 8], our first contribution is to fully represent the decision problem for a single bidder i in a sequential auction Γ as a Markov decision process (MDP).

Definition 3 (Full-history MDP). 
A full-history MDP M_i(Γ, θ_i, T) represents the sequential auction Γ from bidder i's perspective, assuming i's type is θ_i, with states S = H_i, actions A = B_i, rewards R(s) = u_i(θ_i, h^K_i) if s = h^K_i is a history of length K and 0 otherwise, and transition function T.

If bidder types are correlated, bidder i's type informs its beliefs about opponents' types and thus opponents' predicted behavior. For notational and computational simplicity, we assume that bidder types are drawn independently, in which case there is one transition function T regardless of bidder i's type. We also assume that bidders are symmetric, meaning their types are all drawn from the same distribution. When bidders are symmetric, we can restrict our attention to symmetric equilibria, where a single set of full-history MDPs, one per type, is solved on behalf of all bidders.

Definition 4 (MDP Assessment). An MDP assessment (π, T) for a sequential auction Γ is a set of policies {π_{θ_i} | θ_i ∈ Θ_i}, one for each full-history MDP M_i(Γ, θ_i, T).

We now explain where the transition function T comes from. At a high level, we define (symmetric) induced transition probabilities Induced(π) to be the transition probabilities that result from agent i using Bayesian updating to infer something about its opponents' private information, and then reasoning about its opponents' subsequent actions, assuming they all follow policy π. The following example provides some intuition for this process.

Example 1. Consider a first-price sequential auction with two rounds, two bidders, two possible types ("H" and "L") drawn independently from a uniform prior (i.e., p(H) = 0.5 and p(L) = 0.5), and two possible actions ("high" and "low"). Suppose Bidder 2 is playing the following simple
Suppose Bidder 2 is playing the following simple\nstrategy: if type H: bid \u201chigh\u201d with probability .9, and bid \u201clow\u201d with probability .1; if type L: bid\n\u201chigh\u201d with probability .1, and bid \u201clow\u201d with probability .9.\nAt round k = 1, from the perspective of Bidder 1, the only uncertainty that exists is about Bidder 2\u2019s\ntype. Bidder 1\u2019s beliefs about Bidder 2\u2019s type is based solely on the type prior, resulting in beliefs\nthat Bidder 2 will bid \u201chigh\u201d and \u201clow\u201d each with equal probability. Suppose Bidder 1 bids \u201clow\u201d\nand loses to Bidder 2, who the auctioneer reports as having bid \u201chigh\u201d. At round k = 2, Bidder\n1 must update its posterior beliefs about Bidder 2 after observing the given outcome. This is done\nusing Bayes\u2019 rule to \ufb01nd that Bidder 2 is of type \u201cH\u201d with probability 0.9. Based on its policy, in\nthe subsequent round, the probability Bidder 2 bids \u201chigh\u201d is 0.9(0.9) + 0.1(0.1) = 0.82, and the\nprobability it bids \u201clow\u201d is 0.9(0.1) + 0.1(0.9) = 0.18. Given this bid distribution for Bidder 2,\nBidder 1 can compute her probability of transitioning to various future states for each possible bid.\n\nMore formally, denoting sk\nde\ufb01ne Pr(sk+1\nstate sk\n\ni as agent i\u2019s state and action at auction round k, respectively,\ni was taken in\ni . 
By twice applying the law of total probability and then noting conditional independencies,\n\ngiven that action ak\n\ni and ak\n\n| sk\n\ni\n\nPr(sk+1\n\ni\n\n| sk\n\ni , ak\n\ni ) =\n\ni\n\n| sk\n\nPr(sk+1\n\ni ) to be the probability of reaching state sk+1\ni , ak\n(cid:88)\n(cid:88)\n(cid:88)\n\ni , ak\u2212i) Pr(ak\u2212i | sk\n\n(cid:88)\n(cid:88)\n\n(cid:88)\n(cid:88)\n\nPr(sk+1\n\ni , ak\ni )\n\ni , ak\n\ni , ak\n\n| sk\n\nak\u2212i\n\nak\u2212i\n\nsk\u2212i\n\n\u03b8\u2212i\n\ni\n\ni\n\ni\n\n=\n\n=\n\n(cid:124)\n\n\u03b8\u2212i\n\n| sk\n\ni , ak\n\nPr(sk+1\n\ni , sk\u2212i, \u03b8\u2212i) Pr(sk\u2212i, \u03b8\u2212i | sk\n\ni , ak\u2212i, sk\u2212i, \u03b8\u2212i) Pr(ak\u2212i | sk\n(cid:125) Pr(sk\u2212i, \u03b8\u2212i | sk\n(cid:124)\n(cid:125) Pr(ak\u2212i | sk\u2212i, \u03b8\u2212i)\n(cid:124)\n(cid:123)(cid:122)\nindependent given that agent\u2019s state at round k: Pr(ak\u2212i | sk\u2212i, \u03b8\u2212i) = (cid:81)\n(cid:81)\n\nThe \ufb01rst term in Equation 1 is de\ufb01ned by the auction rules and depends only on the actions taken at\ni | ak). The second term is a joint distribution over oppo-\nround k: Pr(sk+1\nnents\u2019 actions given opponents\u2019 private information. Each agent\u2019s action at round k is conditionally\nj , \u03b8j) =\nj ). The third term is the joint distribution over opponents\u2019 private information,\n\ni , ak\u2212i) = \u03c8(ok\n\nj(cid:54)=i Pr(ak\n\nj | sk\n\nj | sk\n\nj(cid:54)=i \u03c0\u03b8j (ak\n\n| sk\n\ni , ak\u2212i)\n\ni , ak\n\ni , ak\ni )\n\ni , ak\ni )\n\n(cid:123)(cid:122)\n\n(cid:123)(cid:122)\n\ni , ak\n\nak\u2212i\n\nsk\u2212i\n\n(cid:125)\n\n(1)\n\ni\n\n3\n\n\fgiven agent i\u2019s observations. This term can be computed using Bayesian updating. We compute\ninduced transition probabilities Induced(\u03c0)(sk\nDe\ufb01nition 5 (\u03b4-Stable MDP Assessment). 
An MDP assessment (π, T) for a sequential auction Γ is called δ-stable if d(T, Induced(π)) < δ, for some symmetric distance function d.

When δ = 0, the induced transition probabilities exactly equal the transition probabilities from the MDP assessment (π, T), meaning that if all agents follow (π, T), the transition function T is correct. Define U_i(π, T) ≡ E_{θ_i, h^K_i | π, T}[u_i(θ_i, h^K_i)] to be the expected utility for following an MDP assessment's policy π when the transition function is T. (We abbreviate U_i by U because of symmetry.)

Definition 6 (α-Optimal MDP Assessment). An MDP assessment (π, T) for a sequential auction Γ is called α-optimal if, for all policies π′, U(π, T) + α ≥ U(π′, T).

If each agent is playing a 0-optimal (i.e., optimal) 0-stable (i.e., stable) MDP assessment for the sequential auction Γ, each agent is best responding to its beliefs, and each agent's beliefs are correct. It follows that any optimal stable MDP assessment for the sequential auction Γ corresponds to a symmetric Bayes-Nash equilibrium for Γ. Corollary 2 (below) generalizes this observation to approximate equilibria.3

Suppose we have a black box that tells us the difference in perceived versus actual expected utility for optimizing with respect to the wrong beliefs, i.e., the wrong transition function. More precisely, if we were to give the black box two transition functions T and T′ that differ by at most δ (i.e., d(T, T′) < δ), the black box would return max_π |U(π, T) − U(π, T′)| ≡ D(δ).

Theorem 1. Given such a black box, if (π, T) is an α-optimal δ-stable MDP assessment for the sequential auction Γ, then π is a symmetric ε-Bayes-Nash equilibrium for Γ, where ε = 2D(δ) + α.

Proof. Let T_π = Induced(π), and let π* be such that (π*, T_π) is an optimal MDP assessment.

U(π, T_π) ≥ U(π, T) − D(δ)   (2)
          ≥ U(π*, T) − (α + D(δ))   (3)
          ≥ U(π*, T_π) − (α + 2D(δ))   (4)

Lines 2 and 4 hold because (π, T) is δ-stable. Line 3 holds because (π, T) is α-optimal.

Corollary 2. If (π, T) is an α-optimal δ-stable MDP assessment for the sequential auction Γ, then π is a symmetric ε-Bayes-Nash equilibrium for Γ, where ε = 2δK + α.

In particular, when the distance between other-agent bid predictions and the actual other-agent bids induced by the actual other-agent policies is less than δ, optimizing agents play a 2δK-BNE. This corollary follows from the simulation lemma in Kakade et al. [9], which provides us with a black box.4 In particular, if MDP assessment (π, T) is δ-stable, then |U(π, T) − U(π, Induced(π))| ≤ δK, where d(T, T′) = Σ_{s^{k+1}_i} |T(s^k_i, a^k_i, s^{k+1}_i) − T′(s^k_i, a^k_i, s^{k+1}_i)| and K is the MDP's horizon.

Wellman et al. [24] show that, for simultaneous one-shot auctions, optimizing with respect to predictions about other-agent bids is an ε-Bayes-Nash equilibrium, where ε depends on the distance between other-agent bid predictions and the actual other-agent bids induced by the actual other-agent strategies. 
Corollary 2 is an extension of that result to sequential auctions.

4 Searching for an ε-BNE

We now know that an optimal, stable MDP assessment is a BNE, and moreover, a near-optimal, near-stable MDP assessment is nearly a BNE. Hence, we propose to search for approximate BNE by searching the space of MDP assessments for any that are nearly optimal and nearly stable.

3Note that this result also generalizes to non-symmetric equilibria: we would calculate a vector of induced transition probabilities (one per bidder), given a vector of MDP assessments (one per bidder), instead of assuming that each bidder abides by the same assessment. Similarly, stability would need to be defined in terms of a vector of MDP assessments. We present our theoretical results in terms of symmetric equilibria for notational simplicity, and because we search for symmetric equilibria in Section 5.

4Slightly adjusted, since there is error only in the transition probabilities, not in the rewards.

Our search uses an iterative two-step learning process. We first find a set of optimal policies π with respect to some transition function T (i.e., π = Solve_MDP(T)) using dynamic programming, as described by Bellman's equations [1]. We then update the transition function T to reflect what would happen if all agents followed the new policies π (i.e., T* = Induced(π)). More precisely:

1. Initiate the search from an arbitrary MDP assessment (π^0, T^0)
2. Initialize t = 1 and ε = ∞
3. While (t < τ) and (ε > κ):
   (a) PREDICT: T^t = Induced(π^{t−1})
   (b) OPTIMIZE: for all types θ_i, π^t = Solve_MDP(θ_i, T^t)
   (c) Calculate ε ≡ ε(π^t) (defined below)
   (d) Increment t
4. Return MDP assessment (π^τ, T^τ) and ε

This learning process is not guaranteed to converge, so upon termination, it could return an optimal, δ-stable MDP assessment for some very large δ. However, it has been shown to be successful experimentally in simultaneous auction games [24] and other large games of imperfect information [7].

Monte Carlo Simulations. Recall how we define induced transition functions (Equation 1). In practice, the Bayesian updating involved in this calculation is intractable. Instead, we employ Monte Carlo simulations. First, we further simplify Equation 1 using the law of total probability and noting conditional independencies (Equation 5). Second, we exploit some special structure of sequential auctions: if nothing but the winning price at each round is revealed, then conditional on reaching state s^k_i, the posterior distribution over highest opponent bids is sufficient for computing the probability of that round's outcome (Equation 6).5 Third, we simulate N auction trajectories for the given policy π and multiple draws from the agent's type distribution, and count the number of times each highest opponent bid occurs at each state (Equation 7):

Induced(π)(s^k_i, a^k_i, s^{k+1}_i) = Pr(s^{k+1}_i | s^k_i, a^k_i, max a^k_{−i}) Pr(max a^k_{−i} | s^k_i, a^k_i)   (5)
                                    = Pr(s^{k+1}_i | s^k_i, a^k_i, max a^k_{−i}) Pr(max a^k_{−i} | s^k_i)   (6)
Induced_N(π)(s^k_i, a^k_i, s^{k+1}_i) = ψ(o^k_i | max(a^k_{−i}), a^k_i) · #(max(a^k_{−i}), s^k_i) / #(s^k_i)   (7)

Solving the MDP. As previously stated, we solve the MDPs exactly using dynamic programming, but we can only do so because we exploit the structure of auctions to reduce the number of states in each MDP. Recall that we assume symmetry: i.e., all bidders' types are drawn from the same distribution. 
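The Monte Carlo estimation step above (Equation 7) can be sketched as follows. The state encoding (the tuple of winning prices revealed so far), the `policy` and `type_prior` interfaces, and all names are illustrative assumptions for this sketch, not the paper's actual implementation; the function estimates the empirical factor #(max(a_{−i}), s)/#(s):

```python
import random
from collections import defaultdict

def estimate_induced(policy, type_prior, n_opponents, n_rounds,
                     num_samples, rng):
    """Monte Carlo estimate of Pr(max opponent bid | state), cf. Equation 7.

    policy(type, state) -> bid; a state here is the tuple of winning
    prices announced so far (assuming only the winning price is revealed).
    """
    counts = defaultdict(lambda: defaultdict(int))  # state -> {max bid: count}
    visits = defaultdict(int)                       # state -> #(state)
    for _ in range(num_samples):
        # Draw opponent types independently from the symmetric prior.
        types = rng.choices(list(type_prior),
                            weights=list(type_prior.values()),
                            k=n_opponents)
        state = ()                                  # nothing observed yet
        for _ in range(n_rounds):
            max_bid = max(policy(t, state) for t in types)
            counts[state][max_bid] += 1             # #(max bid, state)
            visits[state] += 1
            state += (max_bid,)                     # auctioneer reveals price
    return {s: {b: c / visits[s] for b, c in bids.items()}
            for s, bids in counts.items()}
```

A mixed strategy, such as Bidder 2's strategy in Example 1, can be modeled by having `policy` randomize internally; the round-1 estimate then approaches the 0.5/0.5 split over "high" and "low" implied by the uniform type prior as `num_samples` grows.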
Under this assumption, when the auctioneer announces that some Bidder j has won an auction for the first time, this provides the same information as if a different Bidder k had won an auction for the first time. We thus collapse these two outcomes into the same state. This can greatly decrease the MDP state space, particularly if the number of players n is larger than the number of auctions K, as is often the case in competitive markets. In fact, by handling this symmetry, the MDP state space is the same for any number of players n ≥ K.6 Second, we exploit the property of losing bid symmetry: if a bidder i loses with a bid of b or a bid of b′, its beliefs about its opponents' bids are unchanged, and thus it receives the same reward for placing the same bid at either resulting state.

5A distribution over the next round's highest opponent bid is only sufficient without the possibility of ties. If ties can occur, a distribution over the number of opponents placing that highest bid is also needed. In our experiments, we do not maintain such a distribution; if there is a tie, the agent in question wins with probability 0.5 (i.e., we assume it tied with only one opponent).

6Even when n < K, the state space can still be significantly reduced, since instead of n different possible winner identities in the kth round, there are only min(n, k + 1). In the extreme case of n = 2, there is no winner identity symmetry to exploit, since n = k + 1 even in the first round.

ε-factor Approximation. Define U_i(π⃗) = E_{θ_i, h^K_i | π⃗}[u_i(θ_i, h^K_i)] to be bidder i's expected utility when each agent plays its part in the vector of MDP assessment policies π⃗. 
Following Definition 2, the ε-factor measures bidder i's loss in expected utility for not playing his part of π⃗ when other bidders are playing their parts: ε_i(π⃗) = max_{π′_i} U_i(π′_i, π_{−i}) − U_i(π_i, π_{−i}). In fact, since we are only interested in finding symmetric equilibria, where π⃗ = (π, ..., π), we calculate ε(π) = max_{π′} U(π′, π⃗_{−i}) − U(π, π⃗_{−i}).

The first term in this definition is the expected utility of the best response, π*, to π⃗_{−i}. This quantity typically cannot be computed exactly, so instead, we compute a near-best response π̂*_N = Solve_MDP(Induced_N(π)), which is optimal with respect to Induced_N(π) ≈ Induced(π), and then measure the gain in expected utility of deviating from π to π̂*_N.

Further, we approximate expected utility through Monte Carlo simulation. Specifically, we compute Û_L(π⃗) = (1/L) Σ^L_{l=1} u(θ_l, h_l) by sampling θ⃗ and simulating (π_θ, ..., π_θ) L times, and then averaging bidder i's resulting utilities. Thus, we approximate ε(π) by ε̂(π) ≈ Û_L(π̂*_N, π⃗_{−i}) − Û_L(π, π⃗_{−i}).

The approximation error in ε̂(π) comes from both imprecision in Induced_N(π), which depends on the sample size N, and imprecision in the expected utility calculation, which depends on the sample size L. The latter is O(1/√L) by the central limit theorem, and can be made arbitrarily small. (In our experiments, we plot the confidence bounds of this error to make sure it is indeed small.) The former arises because π̂*_N is not truly optimal with respect to Induced(π), and goes to zero as N goes to infinity by standard reinforcement learning results [20]. 
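The ε̂(π) computation just described can be sketched as follows; `solve_mdp` and `simulate_utility` stand in for the paper's Solve_MDP routine and an auction simulator, and their interfaces are assumptions made for illustration:

```python
def estimate_epsilon_factor(policy, induced_model, solve_mdp,
                            simulate_utility, num_eval_samples):
    """Approximate the epsilon-factor of a symmetric policy profile.

    induced_model: Induced_N(policy), the Monte Carlo transition estimate.
    solve_mdp(model) -> a policy optimal with respect to `model`.
    simulate_utility(deviator, others, L) -> average utility of a bidder
        playing `deviator` over L simulated auctions against `others`.
    """
    # Near-best response pi-hat*_N, optimal w.r.t. the estimated transitions.
    near_best_response = solve_mdp(induced_model)
    # hat-epsilon = U_L(near-best response vs. pi) - U_L(pi vs. pi).
    u_deviate = simulate_utility(near_best_response, policy, num_eval_samples)
    u_follow = simulate_utility(policy, policy, num_eval_samples)
    return u_deviate - u_follow
```

Since π̂*_N need not be the exact best response, this yields an estimate of, rather than an upper bound on, the true ε-factor; both sample sizes N and L control its accuracy.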
In practice, we make sure that N is large enough so that this error is negligible.

5 Experimental Results

This section presents the results of running our iterative learning method on three auction models studied in the economics literature: Katzman [10], Weber [23], and Menezes and Monteiro [14]. These models are all two-round, second-price, sequential auctions7 with continuous valuation spaces; they differ only in their specific choice of valuations. The authors analytically derive a symmetric pure strategy equilibrium for each model, which we attempt to re-discover using our iterative method. After discretizing the valuation space, our method is sufficiently general to apply immediately in all three settings.

Although these particular sequential auctions are all second price, our method applies to sequential auctions with other rules as well. We picked this format because of the abundance of corresponding theoretical results and the simplicity of exposition in two-round auctions. It is a dominant strategy to bid truthfully in a one-shot second-price auction [22]; hence, when comparing policies in two-round second-price auctions, it suffices to compare first-round policies only.

Static Experiments. We first run one iteration of our learning procedure to check whether the derived equilibria are strict. In other words, we check whether Solve_MDP(Induced_N(π_E)) = π_E, where π_E is a (discretized) derived equilibrium strategy. For each of the three models, Figures 1(a)–1(c) compare first-round bidding functions of the former (blue) with the latter (green).

Our results indicate that the equilibria derived by Weber and Katzman are indeed strict, while that by Menezes and Monteiro (MM) is not, since there exists a set of best responses to the equilibrium strategy, not a unique best response. 
We confirm analytically that the set of bids output by our learning procedure are best responses to the theoretical equilibrium, with the upper bound being the known theoretical equilibrium strategy and the lower bound being the black dotted line.8 To our knowledge, this instability was previously unknown.

Dynamic Experiments. Since MM's theoretical equilibrium is not strict, we apply our iterative learning procedure to search for more stable approximate equilibria. Our procedure converges within a small number of iterations to an ε-BNE with a small ε-factor, and the convergence is robust across different initializations. We chose initial strategies π^0, parametrized by p ∈ R+, that bid x^p when the marginal value of winning an additional good is x. By varying the exponent p, we initialize the learning procedure with bidding strategies whose level of aggressiveness varies.

7Weber's model can be extended to any number of rounds, but assumes unit, not multi-unit, demand.
8These analytical derivations are included in supplemental material.

Figure 1: Comparison of first-round bidding functions of theoretical equilibrium strategies (green) and of the best response from one step of the iterative learning procedure initialized with those equilibrium strategies (blue). (a) Weber. (b) Katzman. (c) MM.

Figure 2: Convergence properties of the learning procedure in the two-round MM model with 3 agents. (a),(b) evaluate convergence through L1 distance of first-round bidding functions; (c) compares the learned best response (blue) with different learning procedure initializations (green). 
(d) plots the evolution of the estimated ε-factor for learning dynamics with one specific initialization; plots for other initializations look very similar. The bracketed values in the legend give the 99% confidence bound for the ε-factor in the final iteration, which is estimated using more sample points (N = L = 10^9) than previous iterations (N = L = 10^6).

Our iterative learning procedure is not guaranteed to converge. Nonetheless, in this experiment, our procedure not only converges with different initialization parameters p (Figure 2(a)), but also converges to the same solution regardless of initial conditions (Figure 2(b)). The distance measure d(π, π′) between two strategies π, π′ in these figures is defined as the L1 distance of their respective first-round bidding functions. Furthermore, the more economically meaningful measure ε(π), measured by ε̂(π), converges quickly to a negligible factor smaller than 1 × 10^−4, which is less than 0.01% of the expected bidder profit (Figure 2(d)).

All existing theoretical work on Bayesian sequential auctions with multi-unit demand is confined to two-round cases due to the increased complexity of additional rounds, but our method removes this constraint. We extend the two-round MM model into a three-round auction model,9 and apply our learning procedure. It requires more iterations for our algorithm to converge in this setup, but it again converges to a rather stable ε-BNE regardless of initial conditions. The final ε-factor is smaller than 0.5% of expected bidder profit (Figure 3(d)). 
Although d(π, π′) no longer fully summarizes strategy differences, it still strongly indicates that the learning procedure converges to very similar strategies regardless of initial conditions (Figure 3(b)).

9 This model is described in the supplemental material.

Figure 3: The same set of graphs as in Figure 2, for the three-round MM model with 3 agents.

6 Related Work

On the theoretical side, Weber [23] derived equilibrium strategies for a basic model in which n bidders compete in k first- or second-price auctions, but bidders are assumed to have unit demand. Février [6] and Yao [25] studied a model in which n bidders have multi-unit demand, but there are only two auctions and a bidder's per-good valuation is the same across the two goods. Liu [13] and Paes Leme et al. [17] studied models of n bidders with multi-unit demand in which bidders have complete information about opponents' valuations and perfect information about opponents' past bids. Syrgkanis and Tardos [21] extended this line of work to the case of incomplete information with unit demand.
On the computational side, Rabinovich et al. [19] generalized fictitious play to finite-action incomplete-information games and applied their technique to simultaneous second-price auctions with utilities expressible as linear functions over a one-dimensional type space.
Cai and Wurman [3] take a heuristic approach to finding equilibria for sequential auctions with incomplete information: opponent valuations are sampled to create complete-information games, which are solved with dynamic programming and a general game solver, and then aggregated into mixed behavior strategies to form a policy for the original incomplete-information game. Fatima et al. [5] find equilibrium bidding strategies in sequential auctions with incomplete information under various rules of information revelation after each round. Additional methods of computing equilibria have been developed for sequential games outside the context of auctions: Ganzfried and Sandholm [7] study the problem of computing approximate equilibria in the context of poker, and Mostafa and Lesser [15] describe an anytime algorithm for approximating equilibria in general incomplete-information games.
From a decision-theoretic perspective, the bidding problem for sequential auctions was previously formulated as an MDP in related domains. In Boutilier et al. [2], an MDP is created in which distinct goods are sold consecutively, complementarities exist across goods, and the bidder is budget-constrained. A similar formulation was studied in Greenwald and Boyan [8], but without budget constraints. There, purchasing costs were modeled as negative rewards, significantly reducing the size of the MDP's state space. Lee et al. [12] represent multi-round games as iterated semi-net-form games, and then use reinforcement learning techniques to find K-level reasoning strategies for those games.
Their experiments are for two-player games with perfect information about opponent actions, but their approach is not conceptually limited to such models.

7 Conclusion

We presented a two-step procedure (predict and optimize) for finding approximate equilibria in a class of complex sequential auctions in which bidders have incomplete information about opponents' types and imperfect information about opponents' bids, and demand multiple goods. Our procedure is applicable under numerous pricing rules, allocation rules, and information-revelation policies. We evaluated our method on models with analytically derived equilibria and on an auction domain in which analytical solutions were heretofore unknown. Our method was able both to show that the known equilibrium for one model was not strict and to guide our own analytical derivation of the non-strict set of equilibria. For a more complex auction with no known analytical solutions, our method converged to an approximate equilibrium with an ε-factor less than 10^-4, and did so robustly with respect to initialization of the learning procedure. While we achieved fast convergence in the MM model, such convergence is not guaranteed. The fact that our procedure converged to nearly identical approximate equilibria even from different initializations is promising, and further exploring convergence properties in this domain is a direction for future work.

Acknowledgements This research was supported by U.S. National Science Foundation Grants CCF-0905139 and IIS-1217761. The authors (and hence, the paper) benefited from lengthy discussions with Michael Wellman, Michael Littman, and Victor Naroditskiy.
Chris Amato also provided useful insights, and James Tavares contributed to the code development.

References
[1] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[2] C. Boutilier, M. Goldszmidt, and B. Sabata. Sequential auctions for the allocation of resources with complementarities. In International Joint Conference on Artificial Intelligence, volume 16, pages 527–534. Lawrence Erlbaum Associates Ltd, 1999.
[3] G. Cai and P. R. Wurman. Monte Carlo approximation in incomplete information, sequential auction games. Decision Support Systems, 39(2):153–168, Apr. 2005.
[4] A. Cournot. Recherches sur les Principes Mathématiques de la Théorie des Richesses. Hachette, 1838.
[5] S. S. Fatima, M. Wooldridge, and N. R. Jennings. Sequential Auctions in Uncertain Information Settings. Agent-Mediated Electronic Commerce and Trading Agent Design and Analysis, pages 16–29, 2009.
[6] P. Février. He who must not be named. Review of Economic Design, 8(1):99–1, Aug. 2003.
[7] S. Ganzfried and T. Sandholm. Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information. In International Joint Conference on Artificial Intelligence, pages 140–146, 2009.
[8] A. Greenwald and J. Boyan. Bidding under uncertainty: Theory and experiments. In Twentieth Conference on Uncertainty in Artificial Intelligence, pages 209–216, Banff, 2004.
[9] S. M. Kakade, M. J. Kearns, and J. Langford. Exploration in metric state spaces. In Proceedings of the 20th International Conference on Machine Learning (ICML), 2003.
[10] B. Katzman.
A Two Stage Sequential Auction with Multi-Unit Demands. Journal of Economic Theory, 86(1):77–99, May 1999.
[11] P. Klemperer. Auctions: Theory and Practice. Princeton University Press, 2004.
[12] R. Lee, S. Backhaus, J. Bono, D. H. Wolpert, R. Bent, and B. Tracey. Modeling Humans as Reinforcement Learners: How to Predict Human Behavior in Multi-Stage Games. In NIPS, 2011.
[13] Q. Liu. Equilibrium of a sequence of auctions when bidders demand multiple items. Economics Letters, 112(2):192–194, 2011.
[14] F. M. Menezes and P. K. Monteiro. Synergies and Price Trends in Sequential Auctions. Review of Economic Design, 8:85–98, 2003.
[15] H. Mostafa and V. Lesser. Approximately Solving Sequential Games With Incomplete Information. In Proceedings of the AAMAS-08 Workshop on Multi-Agent Sequential Decision Making in Uncertain Multi-Agent Domains, pages 92–106, 2008.
[16] Nielsen Company. Nielsen's quarterly Global AdView Pulse report, 2011.
[17] R. Paes Leme, V. Syrgkanis, and E. Tardos. Sequential Auctions and Externalities. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 869–886, 2012.
[18] M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
[19] Z. Rabinovich, V. Naroditskiy, E. H. Gerding, and N. R. Jennings. Computing pure Bayesian Nash equilibria in games with finite actions and continuous types. Technical report, University of Southampton, 2011.
[20] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, volume 9 of Adaptive Computation and Machine Learning. MIT Press, 1998.
[21] V. Syrgkanis and E. Tardos. Bayesian sequential auctions. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 929–944. ACM, 2012.
[22] W. Vickrey. Counterspeculation, Auctions, and Competitive Sealed Tenders. Journal of Finance, 16(1):8–37, 1961.
[23] R. J.
Weber. Multiple-Object Auctions. In R. Engelbrecht-Wiggans, R. M. Stark, and M. Shubik, editors, Competitive Bidding, Auctions, and Procurement, pages 165–191. New York University Press, 1983.
[24] M. Wellman, E. Sodomka, and A. Greenwald. Self-confirming price prediction strategies for simultaneous one-shot auctions. In Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
[25] Z. Yao. Sequential First-Price Auctions with Multi-Unit Demand. Technical report, UCLA, 2007.