{"title": "Convergence Analysis of Prediction Markets via Randomized Subspace Descent", "book": "Advances in Neural Information Processing Systems", "page_first": 3034, "page_last": 3042, "abstract": "Prediction markets are economic mechanisms for aggregating information about future events through sequential interactions with traders. The pricing mechanisms in these markets are known to be related to optimization algorithms in machine learning and through these connections we have some understanding of how equilibrium market prices relate to the beliefs of the traders in a market. However, little is known about rates and guarantees for the convergence of these sequential mechanisms, and two recent papers cite this as an important open question. In this paper we show how some previously studied prediction market trading models can be understood as a natural generalization of randomized coordinate descent which we call randomized subspace descent (RSD). We establish convergence rates for RSD and leverage them to prove rates for the two prediction market models above, answering the open questions. Our results extend beyond standard centralized markets to arbitrary trade networks.", "full_text": "Convergence Analysis of Prediction Markets via\nRandomized Subspace Descent\n\nRafael Frongillo\n\nDepartment of Computer Science\nUniversity of Colorado, Boulder\n\nraf@colorado.edu\n\nMark D. Reid\n\nResearch School of Computer Science\n\nThe Australian National University & NICTA\n\nmark.reid@anu.edu.au\n\nAbstract\n\nPrediction markets are economic mechanisms for aggregating information about\nfuture events through sequential interactions with traders. The pricing mechanisms\nin these markets are known to be related to optimization algorithms in machine\nlearning and through these connections we have some understanding of how\nequilibrium market prices relate to the beliefs of the traders in a market. 
However,\nlittle is known about rates and guarantees for the convergence of these sequential\nmechanisms, and two recent papers cite this as an important open question.\nIn this paper we show how some previously studied prediction market trading\nmodels can be understood as a natural generalization of randomized coordinate\ndescent which we call randomized subspace descent (RSD). We establish convergence\nrates for RSD and leverage them to prove rates for the two prediction\nmarket models above, answering the open questions. Our results extend beyond\nstandard centralized markets to arbitrary trade networks.\n\n1 Introduction\n\nIn recent years, there has been an increasing appreciation of the shared mathematical foundations\nbetween prediction markets and a variety of techniques in machine learning. Prediction markets\nconsist of agents who trade securities that pay out depending on the outcome of some uncertain,\nfuture event. As trading takes place, the prices of these securities reflect an aggregation of the\nbeliefs the traders have about the future event. 
A popular class of mechanisms for updating these\nprices as trading occurs has been shown to be closely related to techniques from online learning [7,\n1, 21], convex optimization [10, 19, 13], probabilistic aggregation [24, 14], and crowdsourcing [3].\nBuilding these connections serves several purposes; however, one important line of research has been\nto use insights from machine learning to better understand how to interpret prices in a prediction\nmarket as aggregations of trader beliefs, and moreover, how the market together with the traders can\nbe viewed as something akin to a distributed machine learning algorithm [24].\nThe analysis in this paper was motivated in part by two pieces of work that considered the equilibria\nof prediction markets with specific models of trader behavior: traders as risk minimizers [13];\nand traders who maximize expected exponential utility using beliefs from exponential families [2].\nIn both cases, the focus was on understanding the properties of the market at convergence, and\nquestions concerning whether and how convergence happened were left as future work. In [2], the\nauthors note that \u201cwe have not considered the dynamics by which such an equilibrium would be\nreached, nor the rate of convergence etc., yet we think such questions provide fruitful directions\nfor future research.\u201d In [13], \u201cOne area of future work would be conducting a detailed analysis of\nthis framework using the tools of convex optimisation. A particularly interesting topic is to find the\nconditions under which the market will converge.\u201d\n\nThe main contribution of this paper is to answer these questions of convergence. We do so by first\nproposing a new and very general model of trading networks and dynamics (\u00a73) that subsumes the\nmodels used in [2] and [13] and provide a key structural result for what we call efficient trades in\nthese networks (Theorem 2). 
As an aside, this structural result provides an immediate generalization\nof an existing aggregation result in [2] to trade networks of \u201ccompatible\u201d agents (Theorem 8). In\n\u00a74, we argue that efficient trades in our network model can be viewed as steps of what we call the\nRandomized Subspace Descent (RSD) algorithm (Algorithm 1). This novel generalization of coordinate\ndescent allows an objective to be minimized by taking steps along affinely constrained subspaces,\nand may be of independent interest beyond prediction market analysis. We provide a convergence\nanalysis of RSD under two sets of regularity constraints (Theorems 3 & 9) and show how these can\nbe used to derive (slow & fast) convergence rates in trade networks (Theorems 4 & 5).\nBefore introducing our general trading networks and convergence rate results, we first introduce the\nnow standard presentation of potential-based prediction markets [1] and the recent variant in which\nall agents determine their trades using risk measures [13]. We will then state informal versions of\nour main results so as to highlight how we address issues of convergence in existing frameworks.\n\n2 Background and Informal Results\n\nPrediction markets are mechanisms for eliciting and aggregating distributed information or beliefs\nabout uncertain future events. The set of events or outcomes under consideration in the market will\nbe denoted \u2126 and may be finite or infinite. For example, each outcome \u03c9 \u2208 \u2126 might represent\na certain presidential candidate winning an election, the location of a missing submarine, or an\nunknown label for an item in a data set. Following [1], the goods that are traded in a prediction\nmarket are k outcome-dependent securities {\u03c6(\u00b7)i}_{i=1}^{k} that pay \u03c6(\u03c9)i dollars should the outcome\n\u03c9 \u2208 \u2126 occur. We denote the set of distributions over \u2126 by \u2206\u2126 and note, for any p \u2208 \u2206\u2126, that the\nexpected pay off for the securities under p is E\u03c9\u223cp [\u03c6(\u03c9)] and the set of all expected pay offs is just\nthe convex hull, denoted \u03a0 := conv(\u03c6(\u2126)). A simple and commonly studied case is when \u2126 =\n[k] := {1, . . . , k} (i.e., when there are exactly k outcomes) and the securities are the Arrow-Debreu\nsecurities that pay out $1 should a specific outcome occur and nothing otherwise (i.e., \u03c6(\u03c9)i = 1 if\n\u03c9 = i and \u03c6(\u03c9)i = 0 for \u03c9 \u2260 i). Here, the securities are just basis vectors for Rk and \u03a0 = \u2206\u2126.\nTraders in a prediction market hold portfolios of securities r \u2208 Rk called positions that pay out a\ntotal of r \u00b7 \u03c6(\u03c9) = \u2211_{i=1}^{k} ri\u03c6(\u03c9)i dollars should outcome \u03c9 occur. We denote the set of positions\nby R = Rk. We will assume that R always contains a position r$ that returns a dollar regardless of\nwhich outcome occurs, meaning r$ \u00b7 \u03c6(\u03c9) = 1 for all \u03c9 \u2208 \u2126. We therefore interpret r$ as \u201ccash\u201d\nwithin the market in the sense that buying or selling r$ guarantees a fixed change in wealth.\nIn order to address the questions about convergence in [2, 13] we will consider a common form of\nprediction market that is run through a market maker. This is an automated agent that is willing to\nbuy or sell securities in return for cash. The specific and well-studied prediction market mechanism\nwe consider is the potential-based market maker [1]. Here, traders interact with the market maker\nsequentially, and the cost for each trade is determined by a convex potential function C : R \u2192 R\napplied to the market maker\u2019s state s \u2208 R. 
Specifically, the cost for a trade dr when the market\nmaker has state s is given by cost(dr; s) = C(s \u2212 dr) \u2212 C(s), i.e., the change in potential value of the\nmarket maker\u2019s position due to the market maker accepting the trade. After a trade, the market maker\nupdates the state to s \u2190 s \u2212 dr.1 As noted in the next section, the usual axiomatic requirements for\na cost function (e.g., in [1]) specify a function that is effectively a risk measure, commonly studied\nin mathematical finance (see, e.g., [9]).\n\n2.1 Risk Measures\n\nAs in [13], agents in our framework will each quantify their uncertainty in positions using what is\nknown as a risk measure. This is a function that assigns dollar values to positions. As Example 1\nbelow shows, this assumption will also cover the case of agents maximizing exponential utility, as\nconsidered in [2].\n\n1 It is more common in the prediction market literature for s to be a liability vector, tracking what the market\nmaker stands to lose instead of gain. Here we adopt positive positions to match the convention for risk measures.\n\nA (convex monetary) risk measure is a function \u03c1 : R \u2192 R satisfying, for all r, r\u2032 \u2208 R:\n\n\u2022 Monotonicity: \u2200\u03c9 r \u00b7 \u03c6(\u03c9) \u2264 r\u2032 \u00b7 \u03c6(\u03c9) \u21d2 \u03c1(r) \u2265 \u03c1(r\u2032).\n\u2022 Cash invariance: \u03c1(r + c r$) = \u03c1(r) \u2212 c for all c \u2208 R.\n\u2022 Convexity: \u03c1(\u03bbr + (1 \u2212 \u03bb)r\u2032) \u2264 \u03bb\u03c1(r) + (1 \u2212 \u03bb)\u03c1(r\u2032) for all \u03bb \u2208 (0, 1).\n\u2022 Normalization: \u03c1(0) = 0.\n\nThe reasonableness of these properties is usually argued as follows (see, e.g., [9]). Monotonicity\nensures that positions that result in strictly smaller payoffs regardless of the outcome are considered\nmore risky. 
Cash invariance captures the idea that if a guaranteed payment of $c is added to the\npayment on each outcome then the risk will decrease by $c. Convexity states that merging positions\nresults in lower risk. Finally, normalization requires that holding no securities should carry no risk.\nThis last condition is only for convenience since any risk without this condition can trivially have its\nargument translated so it holds without affecting the other three properties. A key result concerning\nconvex risk measures is the following representation theorem (cf. [9, Theorem 4.15]).\nTheorem 1 (Risk Representation). A functional \u03c1 : R \u2192 R is a convex risk measure if and only if\nthere is a closed convex function \u03b1 : \u03a0 \u2192 R \u222a {\u221e} such that \u03c1(r) = sup_{\u03c0\u2208relint(\u03a0)} \u27e8\u03c0, \u2212r\u27e9 \u2212 \u03b1(\u03c0).\nHere relint(\u03a0) denotes the relative interior of \u03a0, the interior relative to the affine hull of \u03a0. Notice\nthat if f\u2217 denotes the convex conjugate f\u2217(y) := sup_x \u27e8y, x\u27e9 \u2212 f (x), then this theorem states that\n\u03c1(r) = \u03b1\u2217(\u2212r), that is, \u03c1 and \u03b1 are \u201cdual\u201d in the same way prices and positions are dual [5, \u00a75.4.4].\nThis suggests that the function \u03b1 can be interpreted as a penalty function, assigning a measure of\n\u201cunlikeliness\u201d \u03b1(\u03c0) to each expected value \u03c0 of the securities defined above. Equivalently, \u03b1(Ep [\u03c6])\nmeasures the unlikeliness of distribution p over the outcomes. We can then see that the risk is the\ngreatest expected loss under each distribution, taking into account the penalties assigned by \u03b1.\nExample 1. A well-studied risk measure is the entropic risk relative to a reference distribution\nq \u2208 \u2206\u2126 [9]. This is defined on positions r \u2208 R by \u03c1\u03b2(r) := \u03b2 log E\u03c9\u223cq [exp(\u2212r \u00b7 \u03c6(\u03c9)/\u03b2)]. 
The\ncost function C(r) = \u03c1\u03b2(\u2212r) associated with this risk exactly corresponds to the logarithmic market\nscoring rule (LMSR). Its associated convex function \u03b1\u03b2 over distributions is the scaled relative\nentropy \u03b1\u03b2(p) = \u03b2 KL(p | q). As discussed in [2, 13], the entropic risk is closely related to exponential\nutility U\u03b2(w) := \u2212(1/\u03b2) exp(\u2212\u03b2w). Indeed, \u03c1\u03b2(r) = \u2212U\u03b2 (E\u03c9\u223cq [U\u03b2(r \u00b7 \u03c6(\u03c9))]) which is just\nthe negative certainty equivalent of the position r, i.e., the amount of cash an agent with utility\nU\u03b2 and belief q would be willing to trade for the uncertain position r. Due to the monotonicity of\nU\u03b2^{\u22121}, it follows that maximizing the expected utility E\u03c9\u223cq [U\u03b2(r \u00b7 \u03c6(\u03c9))] of holding position r\nis equivalent to minimizing the entropic risk \u03c1\u03b2(r).\n\nFor technical reasons, in addition to the standard assumptions for convex risk measures, we will also\nmake two weak regularity assumptions. These are similar to properties required of cost functions in\nthe prediction market literature (cf. [1, Theorem 3.2]):\n\n\u2022 Expressiveness: \u03c1 is everywhere differentiable, and closure{\u2207\u03c1(r) : r \u2208 R} = \u03a0.\n\u2022 Strict risk aversion: the Convexity inequality is strict unless r \u2212 r\u2032 = c r$ for some c \u2208 R.\n\nAs discussed in [1], expressiveness is related to the dual formulation given above; roughly, it says\nthat the agent must take into account every possible expected value of the securities when calculating\nthe risk. 
Strict risk aversion says that an agent should strictly prefer a mixture of positions, unless\nof course the difference is outcome-independent.\nUnder these assumptions, the representation result of Theorem 1 and a similar result for cost functions\n[1, Theorem 3.2] coincide and we are able to show that cost functions and risk measures\nare exactly the same object; we write \u03c1C(r) = C(r) when we think of C as a risk measure. Unfolding\nthe definition of cost now using cash invariance, we have \u03c1C(s \u2212 dr + cost(dr; s) r$) =\n\u03c1C(s \u2212 dr) \u2212 cost(dr; s) = C(s \u2212 dr) \u2212 C(s \u2212 dr) + C(s) = \u03c1C(s). Thus, we may view a\npotential-based market maker as a constant-risk agent.\n\n2.2 Trading Dynamics and Aggregation\n\nAs described above, we consider traders who approach the market maker sequentially and at random,\nand select the optimal trade based on their current position, the market state, and the cost function C.\nAs we just observed, we may think of the market maker as a constant-risk agent with \u03c1C = C. Let\nus examine the optimization problem faced by the trader with position r when the current market\nstate is s. This trader will choose a portfolio dr\u2217 from the market maker so as to minimise her risk:\n\ndr\u2217 \u2208 arg min_{dr\u2208Rk} \u03c1(r + dr \u2212 cost(dr; s) r$) = arg min_{dr\u2208Rk} \u03c1(r + dr) + \u03c1C(s \u2212 dr) .   (1)\n\nThe equality holds since, by the cash invariance of \u03c1 and the definition of cost, the objective is \u03c1(r + dr) + \u03c1C(s \u2212\ndr) \u2212 \u03c1C(s), and \u03c1C(s) does not depend on dr. 
Thus, if we think of F (r, s) = \u03c1(r) + \u03c1C(s) as\na kind of \u201csocial risk\u201d, we can define the surplus as simply the net risk taken away by an optimal\ntrade, namely F (r, s) \u2212 F (r + dr\u2217, s \u2212 dr\u2217).\nWe can now state our central question: if a set of N such traders arrive at random and execute\noptimal (or perhaps near-optimal) trades with the market maker, will the market state converge to\nthe optimal risk, and if so how fast? As discussed in the introduction, this is precisely the question\nasked in [2, 13] that we set out to answer. To do so we will draw a close connection to the literature\non distributed optimization algorithms for machine learning. Specifically, if we encode the entire\nstate of our system in the positions R = (r0 = s, r1, . . . , rn) of the market maker and each of the\nn traders, we may view the optimal trade in eq. (1) as performing a coordinate descent step, by\noptimizing only with respect to coordinates 0 and i. We build on this connection in Section 4 and\nleverage a generalization of coordinate descent methods to show the following in Theorem 4: If a\nset of risk-based traders is sampled at random to sequentially trade in the market, the market state\nand prices converge to within \u03b5 of the optimal total risk in O(1/\u03b5) rounds.\nIn fact, under mild smoothness assumptions on the cost potential function C, we can improve this\nrate to O(log(1/\u03b5)). We can also relax the optimality of the trader behavior; as long as traders find\na trade dr which extracts at least a constant fraction of the surplus, the rate remains intact.\nWith convergence rates in hand, the next natural question might be: to what does the market converge?\nAbernethy et al. 
[2] show that when traders maximize expected exponential utility and have\nexponential family beliefs, the market equilibrium price can be thought of as a weighted average of\nthe parameters of the traders, with the weights being a measure of their risk tolerance. Even though\nour setting is far more general than exponential utility and exponential families, the framework we\ndevelop can also be used to show that their results can be extended to interactions between traders\nwho have what we call \u201ccompatible\u201d risks and beliefs. Specifically, for any risk-based trader possessing\na risk \u03c1 with dual \u03b1, we can think of that trader\u2019s \u201cbelief\u201d as the least surprising distribution\np according to \u03b1. This view induces a family of distributions (which happen to be generalized exponential\nfamilies [11]) that are parameterized by the initial positions of the traders. Furthermore,\nthe risk tolerance b is given by how sensitive this belief is to small changes of an agent\u2019s position.\nThe results of [2] are then a special case of our Theorem 8 for agents with \u03c1 being entropic risk (cf.\nExample 1): If each trader i has risk tolerance bi and a belief parameterized by \u03b8i, and the initial\nmarket state is \u03b80, then the equilibrium state of the market, to which the market converges, is given\nby \u03b8\u2217 = (\u03b80 + \u2211_i bi\u03b8i) / (1 + \u2211_i bi).\n\nAs the focus of this paper is on the convergence, the details for this result are given in Appendix C.\nThe main insight that drives the above analysis of the interaction between a risk-based trader and a\nmarket maker is that each trade minimizes a global objective for the market that is the infimal convolution\n[6] of the traders\u2019 and market maker\u2019s risks. In fact, this observation naturally generalizes to\ntrades between three or more agents and the same convergence analysis applies. 
In other words, our\nanalysis also holds when bilateral trade with a fixed market maker is replaced by multilateral trade\namong arbitrarily overlapping subsets of agents. Viewed as a graph with agents as nodes, the standard\nprediction market framework is represented by the star graph, where the central market maker\ninteracts with traders sequentially and individually. More generally we have what we call a trading\nnetwork, in which the structure of trades can form arbitrary connected graphs or even hypergraphs.\nAn obvious choice is the complete graph, which can model a decentralized market, and in fact we\ncan even compare the convergence rate of our dynamics between the centralized and decentralized\nmodels; see Appendix D.2 and the discussion in \u00a75.\n\n3 General Trading Dynamics\n\nThe previous section described the two agent case of what is more generally known as the optimal\nrisk allocation problem [6] where two or more agents express their preferences for positions via\nrisk measures. This is formalized by considering N agents with risk measures \u03c1i : R \u2192 R for\ni \u2208 [N ] := {1, . . . , N} who are asked to split a position r \u2208 R into per-agent positions ri \u2208 R\nsatisfying \u2211_i ri = r so as to minimise the total risk \u2211_i \u03c1i(ri). They note that the value of the total\nrisk is given by the infimal convolution \u2227i\u03c1i of the individual agent risks, that is,\n\n(\u2227i\u03c1i)(r) := inf { \u2211_i \u03c1i(ri) : \u2211_i ri = r , ri \u2208 R } .   (2)\n\nA key property of the infimal convolution, which will underlie much of our analysis, is that its convex\nconjugate is the sum of the conjugates of its constituent functions. See, e.g., 
[23] for a proof.\n\n(\u2227i\u03c1i)\u2217 = \u2211_{i\u2208[N ]} \u03c1i\u2217 .   (3)\n\nOne can think of \u2227i\u03c1i as the \u201cmarket risk\u201d, which captures the risk of the entire market (i.e., as\nif it were a single risk-based agent) as a function of the net position \u2211_i ri of its constituents. By\ndefinition, eq. (2) says that the market is trying to reallocate the risk so as to minimize this net risk.\nThis interpretation is confirmed by eq. (3) when we interpret the duals as penalty functions as above:\nthe penalty of \u03c0 is the sum of the penalties of the market participants.\nAs alluded to above, we allow our agents to interact round by round by conducting trades, which are\nsimply the exchange of outcome-contingent securities. Since by assumption our position space R is\nclosed under linear combinations, a trade between two agents is simply a position which is added to\none agent and subtracted from another. Generalizing from this two agent interaction, a trade among\na set of agents S \u2286 [N ] is just a collection of trade vectors, one for each agent, which sum to 0.\nFormally, let S \u2286 [N ] be a subset of agents. A trade on S is then a vector of positions dr \u2208 RN\n(i.e., a matrix in RN\u00d7k) such that \u2211_{i\u2208S} dri = 0 \u2208 R and dri = 0 for all i \u2209 S. This last condition\nspecifies that agents not in S do not change their position.\nA key quantity in our analysis is a measure of how much the total risk of a collection of traders drops\ndue to trading. Given some subset of traders S, the S-surplus is a function \u03a6S : RN \u2192 R defined\nby \u03a6S(r) = \u2211_{i\u2208S} \u03c1i(ri) \u2212 (\u2227i\u03c1i)(\u2211_{i\u2208S} ri) which measures the maximum achievable drop in risk\n(since \u2227i\u03c1i is an infimum). In particular, \u03a6(r) := \u03a6[N ](r) is the surplus function. 
The trades that\nachieve this optimal drop in risk are called efficient: given current state r \u2208 RN , a trade dr \u2208 RN\non S \u2286 [N ] is efficient if \u03a6S(r + dr) = 0.\nOur following key result shows that efficient trades have remarkable structure: once the state r and\nsubset S are specified, there is a unique efficient trade, up to cash transfers. In other words, the\nsurplus is removed from the position vectors and then redistributed as cash to the traders; the choice\nof trade is merely in how this redistribution takes place. The fact that the derivatives match has strong\nintuition from prediction markets: agents must agree on the price.2 The proof is in Appendix A.1.\nTheorem 2. Let r \u2208 RN and S \u2286 [N ] be given.\ni. The surplus is always finite: 0 \u2264 \u03a6S(r) < \u221e.\nii. The set of efficient trades on S is nonempty.\niii. Efficient trades are unique up to zero-sum cash transfers: Given efficient trades dr\u2217, dr \u2208 RN\non S, we have dr = dr\u2217 + (z1r$, . . . , zN r$) for some z \u2208 RN with \u2211_i zi = 0.\niv. Traders agree on \u201cprices\u201d: A trade dr on S is efficient if and only if for all i, j \u2208 S,\n\u2207\u03c1i(ri + dri) = \u2207\u03c1j(rj + drj).\nv. There is a unique \u201cefficient price\u201d: If dr is an efficient trade on S, for all i \u2208 S we have\n\u2207\u03c1i(ri + dri) = \u2212\u03c0\u2217S, where \u03c0\u2217S = arg min_{\u03c0\u2208\u03a0} \u2211_{i\u2208S} \u03b1i(\u03c0) \u2212 \u27e8\u03c0, \u2211_{i\u2208S} ri\u27e9.\n\n2 As intuition for the term \u201cprice\u201d, consider that the highest price-per-unit agent i would be willing to pay\nfor an infinitesimal quantity of a position dri is dri \u00b7 (\u2212\u2207\u03c1i(ri)), and likewise the lowest price-per-unit to sell.\nThus, the entries of \u2212\u2207\u03c1i(ri) act as the \u201cfair\u201d prices for their corresponding basis positions/securities.\n\nThe above properties of efficient trades drive the remainder of our convergence analysis of network\ndynamics. They also allow us to write a simple closed form for the market price when traders share\na common risk profile (Theorem 8). Details are in Appendix C. Beyond our current focus on rates,\nTheorem 2 has implications for a variety of other economic properties of trade networks. For example,\nin Appendix B we show that efficient trades correspond to fixed points for more general\ndynamics, market clearing equilibria, and equilibria of natural bargaining games among the traders.\nRecall that in the prediction market framework of [13], each round has a single trader, say i > 1,\ninteracting with the market maker who we will assume has index 1. In the notation just defined this\ncorresponds to choosing S = {1, i}. We now wish to consider richer dynamics where groups of two\nor more agents trade efficiently each round. To this end we will call a collection S = {Sj \u2286 [N ]}_{j=1}^{m}\nof groups of traders a trading network and assume there is some fixed distribution D over S with\nfull support. A trade dynamic over S is a process that begins at t = 0 with some initial positions\nr0 \u2208 RN for the N traders, and at each round t, draws a random group of traders St \u2208 S according\nto D, selects some efficient trade drt on St, then updates the trader positions using rt+1 = rt + drt.\nFor the purposes of proving the convergence of trade dynamics, a crucial property is whether all\ntraders can directly or indirectly affect the others. To capture this we will say a trade network\nis connected if the hypergraph on [N ] with edges given by S is connected; i.e., information can\npropagate throughout the entire network. Dynamics over classical prediction markets are always\nconnected since any pair of groups from its network will always contain the market maker.\n\n4 Convergence Analysis of Randomized Subspace Descent\n\nBefore briefly reviewing the literature on coordinate descent, let us see why this might be a useful\nway to think of our dynamics. Recall that we have a set S of subsets of agents, and that in each step,\nan efficient trade dr is chosen which only modifies the positions of agents in the sampled S \u2208 S.\nThinking of (r1, . . . , rN ) as a vector of dimension N \u00b7 k (recall R = Rk), changing rt to\nrt+1 = rt + dr thus only modifies |S| blocks of k entries. Moreover, efficiency ensures that dr\nminimizes the sum of the risks of agents in S. Hence, ignoring for now the constraint that the sum\nof the positions must remain constant, the trade dynamic seems to be performing a kind of block\ncoordinate descent of the surplus function \u03a6.\n\n4.1 Randomized Subspace Descent\n\nSeveral randomized coordinate descent methods have appeared in the literature recently, with increasing\nlevels of sophistication. 
While earlier methods focused on updates which only modified\ndisjoint blocks of coordinates [18, 22], more recent methods allow for more general configurations,\nsuch as overlapping blocks [17, 16, 20]. In fact, these last three methods are closest to what we study\nhere; the authors consider an objective which decomposes as the sum of convex functions on each\ncoordinate, and study coordinate updates which follow a graph structure, all under the constraint\nthat coordinates sum to 0. Despite the similarity of these methods to our trade dynamics, we require\neven more general updates, as we allow coordinate i to correspond to arbitrary subsets Si \u2208 S.\nInstead, we establish a unification of these methods which we call randomized subspace descent\n(RSD), listed in Algorithm 1. Rather than blocks of coordinates or specific linear constraints, RSD\nabstracts away these constructs by simply specifying \u201ccoordinate subspaces\u201d in which the optimization\nis to be performed. Specifically, the algorithm takes a list of projection matrices {\u03a0i}_{i=1}^{n} which\ndefine the subspaces, and at each step t selects a \u03a0i at random and tries to optimize the objective\nunder the constraint that it may only move within the image space of \u03a0i; that is, if the current point\nis xt, then xt+1 \u2212 xt \u2208 im(\u03a0i).\nBefore stating our convergence results for Algorithm 1, we will need a notion of smoothness relative\nto our subspaces. Specifically, we say F is Li-\u03a0i-smooth if for all i there are constants Li > 0 such\nthat for all y \u2208 im(\u03a0i),\n\nF (x + y) \u2264 F (x) + \u27e8\u2207F (x), y\u27e9 + (Li/2) \u2016y\u2016_2^2 .   (4)\n\nFinally, let F min := min_{y\u2208span{im(\u03a0i)}i} F (x0 + y) be the global minimum of F subject to the\nconstraints from the \u03a0i. Then we have the following result for a constant R(x0) which increases in: 
Then we have the following result for a constant R(x0) which increases in:\n\n6\n\n\fsmoothness parameters {Li}m\n\nALGORITHM 1: Randomized Subspace Descent\nInput: Smooth convex function F : Rn \u2192 R, initial point x0 \u2208 Rn, matrices {\u03a0i \u2208 Rn\u00d7n}m\ni=1,\nfor iteration t in {0, 1, 2,\u00b7\u00b7\u00b7} do\n\u03a0i\u2207F (xt)\n\ni=1, distribution p \u2208 \u2206m\n\nend\n\nsample i from p\nxt+1 \u2190 xt \u2212 1\n\nLi\n\n(1) the distance from the point x0 to furthest minimizer of F , (2) the Lipschitz constants of F w.r.t.\nthe \u03a0i, and (3) the connectivity of the hypergraph induced by the projections.\nTheorem 3. Let F , {\u03a0i}i, {Li}i, x0, and p be given as in Algorithm 1, with the condition that F is\n\nLi-\u03a0i-smooth for all i. Then E(cid:2)F (xt) \u2212 F min(cid:3) \u2264 2R2(x0) / t.\n\nThe proof is in Appendix D. Additionally, when F is strongly convex, meaning it has a uniform local\nquadratic lower bound, RSD enjoys faster, linear convergence. Formally, this condition requires F\nto be \u00b5-strongly convex for some constant \u00b5 > 0, that is, for all x, y \u2208 dom F we require\n\nF (y) \u2265 F (x) + \u2207F (x) \u00b7 (y \u2212 x) + \u00b5\n\n2(cid:107)y \u2212 x(cid:107)2 .\nThe statement and details of this stronger result is given in Appendix D.1.\nImportantly for our setting these results only track the progress per iteration. Thus, they apply to\nmore sophisticated update steps than a simple gradient step as long as they improve the objective\nby at least as much. For example, if in each step the algorithm computed the exact minimizer\nxt+1 = arg miny\u2208im(\u03a0i) F (xt + y), both theorems would still hold.\n\n(5)\n\nof RN consisting of all possible trades on S, namely {dr \u2208 RN : dri = 0 for i (cid:54)= S, (cid:80)\n\n4.2 Convergence Rates for Trade Dynamics\nTo apply Theorem 3 to the convergence of trading dynamics, we let F = \u03a6 and x = (r1, . . . 
, rN ) \u2208\nRN \u223c= RN k be the joint position of all agents. For each subset S \u2208 S of agents, we have a subspace\ni\u2208S dri =\n0}, with corresponding projection matrix \u03a0S. For the special case of prediction markets with a\ncentralized market maker, we have N \u2212 1 subspaces S = {{1, i} : i \u2208 {2, . . . , N}} and \u03a01,i\nprojects onto {dr \u2208 RN : dri = \u2212dr1, drj = 0 for j (cid:54)= 1, i}. The intuition of coordinate descent is\nclear now: the subset S of agents seek to minimize the total surplus within the subspace of trades on\nS, and thus the coordinate descent steps of Algorithm 1 will correspond to roughly ef\ufb01cient trades.\nWe now apply Theorem 3 to show that trade dynamics achieve surplus \u0001 > 0 in time O(1/\u0001). Note\nthat we will have to assume the risk measure \u03c1i of agent i is Li-smooth for some Li > 0. This is a\nvery loose restriction, as our risk measures are all differentiable by the expressiveness condition.\nTheorem 4. Let \u03c1i be an Li-smooth risk measure for all i. Then for any connected trade dynamic,\nwe have E [\u03a6(rt)] = O(1/t).\nProof. Taking LS = maxi\u2208S Li, one can check that F is LS-\u03a0S-smooth for all S \u2208 S by eq. (4).\nSince Algorithm 1 has no state aside from xt, and the proof of Theorem 3 depends only the drop\nin F per step, any algorithm selecting the sets S \u2208 S with the same distribution and satisfying\n\u03a0i\u2207F (xt)) will yield the same convergence rate. As trade dynamics satisfy\nF (xt+1) \u2264 F (xt \u2212 1\nF (xt+1) = miny\u2208RN k F (xt \u2212 \u03a0iy), this property trivially holds, and so Theorem 3 applies.\n\nLi\n\nIf we assume slightly more, that our risk measures have local quadratic lower bounds, then we can\nobtain linear convergence. Note that this is also a relatively weak assumption, and holds whenever\nthe risk measure has a Hessian with only one zero eigenvalue (for r$) at each point. 
This is satisfied,\nfor example, by all the variants of entropic risk we discuss in the paper. The proof is in Appendix D.\nTheorem 5. Suppose for each i we have a continuous function \u00b5_i : R \u2192 R_+ such that for all r,\nrisk \u03c1_i is \u00b5_i(r)-strongly convex with respect to (r^$)^\u22a5 in a neighborhood of r; in other words, eq. (5)\nholds for F = \u03c1_i, \u00b5 = \u00b5_i(r), and all y in a neighborhood of r such that (r \u2212 y) \u00b7 r^$ = 0. Then for\nall connected trade dynamics, E[\u03a6(r^t)] = O(2^{\u2212t}).\n\nGraph        |V(G)|     |E(G)|          \u03bb_2(G)\nK_n          n          n(n \u2212 1)/2      n\nP_n          n          n \u2212 1           2(1 \u2212 cos(\u03c0/n))\nC_n          n          n               2(1 \u2212 cos(2\u03c0/n))\nK_{\u2113,k}      \u2113 + k      \u2113k              k\nB_k          2^k        k 2^{k\u22121}       2\n\nTable 1: Algebraic connectivities for common graphs.\n\nFigure 1: Average (in bold) of 30 market simulations for the complete and star graphs. The empirical gap in\niteration complexity is just under 2 (cf. Fig. 3).\n\nAmazingly, the convergence rates in Theorem 4 and Theorem 5 hold for all connected trade dynamics.\nThe constant hidden in the O(\u00b7) does depend on the structure of the network but can be\nexplicitly determined in terms of its algebraic connectivity. This is discussed further in Appendix D.2.\nThe intuition behind these convergence rates is that the agents in whichever group S is chosen\nalways trade to fully minimize their surplus. Because the proofs (in Appendix D) of these results\nmerely track the reduction in surplus per trading round, the bounds apply as long as the update is at\nleast as good as a gradient step. In fact, we can say even more: if only an \u03b5 fraction of the surplus is\ntaken at each round, the rates are still O(1/(\u03b5t)) and O((1 \u2212 \u03b5\u00b5)^t), respectively. 
This suggests that\nour convergence results are robust with respect to the model of rationality one employs; if agents\nhave bounded rationality and can only compute positions which approximately minimize their risk,\nthe rates remain intact (up to constant factors) as long as the inefficiency is bounded.\n\n5 Conclusions & Future Work\n\nUsing the tools of convex analysis to analyse the behavior of markets allows us to make precise,\nquantitative statements about their global behavior. In this paper we have seen that, with appropriate\nassumptions on trader behavior, we can determine the rate at which the market will converge to\nequilibrium prices, thereby closing some open questions raised in [2] and [13].\nIn addition, our newly proposed trading networks model allows us to consider a variety of prediction\nmarket structures. As discussed in \u00a73, the usual prediction market setting is centralized, and corresponds\nto a star graph with the market maker at the center. A decentralized market where any trader\ncan trade with any other corresponds to a complete graph over the traders. We can also model more\nexotic networks, such as two or more market maker-based prediction markets with a risk minimizing\narbitrageur or small-world networks where agents only trade with a limited number of \u201cneighbours\u201d.\nFurthermore, because these arrangements are all instances of trade networks, we can immediately\ncompare the convergence rates across various constraints on how traders may interact. For example,\nin Appendix D.2, we show that a market that trades through a centralized market maker incurs a\nquantifiable efficiency overhead: convergence takes twice as long (see Figure 1). More generally,\nwe show that the rates scale as \u03bb_2(G)/|E(G)|, allowing us to make similar comparisons between\narbitrary networks; see Table 1. 
This raises an interesting question for future work: given some\nconstraints such as a bound on how many traders a single agent can trade with, the total number of\nedges, etc., which network optimizes the convergence rate of the market? These new models and\nthe analysis of their convergence may provide new principles for building and analyzing distributed\nsystems of heterogeneous and self-interested learning agents.\n\nAcknowledgments\n\nWe would like to thank Matus Telgarsky for his generous help, as well as the lively discussions with,\nand helpful comments of, S\u00e9bastien Lahaie, Miro Dud\u00edk, Jenn Wortman Vaughan, Yiling Chen,\nDavid Parkes, and Nageeb Ali. MDR is supported by an ARC Discovery Early Career Research\nAward (DE130101605). Part of this work was developed while he was visiting Microsoft Research.\n\nReferences\n\n[1] Jacob Abernethy, Yiling Chen, and Jennifer Wortman Vaughan. Efficient market making via convex optimization, and a connection to online learning. ACM Transactions on Economics and Computation, 1(2):12, 2013.\n\n[2] Jacob Abernethy, Sindhu Kutty, S\u00e9bastien Lahaie, and Rahul Sami. Information aggregation in exponential family markets. In Proceedings of the fifteenth ACM conference on Economics and computation, pages 395\u2013412. ACM, 2014.\n\n[3] Jacob D Abernethy and Rafael M Frongillo. A collaborative mechanism for crowdsourcing prediction problems. In Advances in Neural Information Processing Systems, pages 2600\u20132608, 2011.\n\n[4] Aharon Ben-Tal and Marc Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449\u2013476, 2007.\n\n[5] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.\n\n[6] Christian Burgert and Ludger R\u00fcschendorf. On the optimal risk allocation problem. 
Statistics & decisions, 24(1/2006):153\u2013171, 2006.\n\n[7] Yiling Chen and Jennifer Wortman Vaughan. A new understanding of prediction markets via no-regret learning. In Proceedings of the 11th ACM conference on Electronic commerce, pages 189\u2013198. ACM, 2010.\n\n[8] Nair Maria Maia de Abreu. Old and new results on algebraic connectivity of graphs. Linear algebra and its applications, 423(1):53\u201373, 2007.\n\n[9] Hans F\u00f6llmer and Alexander Schied. Stochastic Finance: An Introduction in Discrete Time, volume 27 of de Gruyter Studies in Mathematics. Walter de Gruyter & Co., Berlin, 2nd edition, 2004.\n\n[10] Rafael M Frongillo, Nicol\u00e1s Della Penna, and Mark D Reid. Interpreting prediction markets: a stochastic approach. In Proceedings of Neural Information Processing Systems, 2012.\n\n[11] P.D. Gr\u00fcnwald and A.P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. The Annals of Statistics, 32(4):1367\u20131433, 2004.\n\n[12] JB Hiriart-Urruty and C Lemar\u00e9chal. Grundlehren der mathematischen wissenschaften. Convex Analysis and Minimization Algorithms II, 306, 1993.\n\n[13] Jinli Hu and Amos Storkey. Multi-period trading prediction markets with connections to machine learning. In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.\n\n[14] Jono Millin, Krzysztof Geras, and Amos J Storkey. Isoelastic agents and wealth updates in machine learning markets. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1815\u20131822, 2012.\n\n[15] Bojan Mohar. The Laplacian spectrum of graphs. In Graph Theory, Combinatorics, and Applications, 1991.\n\n[16] I Necoara, Y Nesterov, and F Glineur. A random coordinate descent method on large-scale optimization problems with linear constraints. Technical Report, 2014.\n\n[17] Ion Necoara. 
Random coordinate descent algorithms for multi-agent convex optimization over networks. Automatic Control, IEEE Transactions on, 58(8):2001\u20132012, 2013.\n\n[18] Yurii Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341\u2013362, 2012.\n\n[19] Mindika Premachandra and Mark Reid. Aggregating predictions via sequential mini-trading. In Asian Conference on Machine Learning, pages 373\u2013387, 2013.\n\n[20] Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, and Suvrit Sra. Large-scale randomized-coordinate descent methods with non-separable linear constraints. arXiv preprint arXiv:1409.2617, 2014.\n\n[21] Mark D Reid, Rafael M Frongillo, Robert C Williamson, and Nishant Mehta. Generalized mixability via entropic duality. In Proc. of Conference on Learning Theory (COLT), 2015.\n\n[22] Peter Richt\u00e1rik and Martin Tak\u00e1\u010d. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1-2):1\u201338, 2014.\n\n[23] R.T. Rockafellar. Convex analysis. Princeton University Press, 1997.\n\n[24] Amos J Storkey. Machine learning markets. In International Conference on Artificial Intelligence and Statistics, pages 716\u2013724, 2011.", "award": [], "sourceid": 1716, "authors": [{"given_name": "Rafael", "family_name": "Frongillo", "institution": "CU Boulder"}, {"given_name": "Mark", "family_name": "Reid", "institution": "Australia National University"}]}