{"title": "Adapting to a Market Shock: Optimal Sequential Market-Making", "book": "Advances in Neural Information Processing Systems", "page_first": 361, "page_last": 368, "abstract": "We study the profit-maximization problem of a monopolistic market-maker who sets two-sided prices in an asset market. The sequential decision problem is hard to solve because the state space is a function. We demonstrate that the belief state is well approximated by a Gaussian distribution. We prove a key monotonicity property of the Gaussian state update which makes the problem tractable, yielding the first optimal sequential market-making algorithm in an established model. The algorithm leads to a surprising insight: an optimal monopolist can provide more liquidity than perfectly competitive market-makers in periods of extreme uncertainty, because a monopolist is willing to absorb initial losses in order to learn a new valuation rapidly so she can extract higher profits later.", "full_text": "Adapting to a Market Shock: Optimal Sequential\n\nMarket-Making\n\nSanmay Das\n\nDepartment of Computer Science\nRensselaer Polytechnic Institute\n\nTroy, NY 12180\n\nsanmay@cs.rpi.edu\n\nMalik Magdon-Ismail\n\nDepartment of Computer Science\nRensselaer Polytechnic Institute\n\nTroy, NY 12180\n\nmagdon@cs.rpi.edu\n\nAbstract\n\nWe study the pro\ufb01t-maximization problem of a monopolistic market-maker who\nsets two-sided prices in an asset market. The sequential decision problem is hard\nto solve because the state space is a function. We demonstrate that the belief state\nis well approximated by a Gaussian distribution. We prove a key monotonicity\nproperty of the Gaussian state update which makes the problem tractable, yielding\nthe \ufb01rst optimal sequential market-making algorithm in an established model. 
The\nalgorithm leads to a surprising insight: an optimal monopolist can provide more\nliquidity than perfectly competitive market-makers in periods of extreme uncer-\ntainty, because a monopolist is willing to absorb initial losses in order to learn a\nnew valuation rapidly so she can extract higher pro\ufb01ts later.\n\n1 Introduction\n\nDesigning markets to achieve certain goals is gaining renewed importance with the prevalence of\nmany novel markets, ranging from prediction markets [13] to markets for e-services [11]. These\nmarkets tend to be thin (illiquid) when they \ufb01rst appear. Similarly, when a market shock occurs\nto the value of an instrument on a \ufb01nancial exchange, thousands of speculative traders suddenly\npossess new valuations on the basis of which they would like to trade. Periods of uncertainty, like\nthose following a shock, are also periods of illiquidity, so trading may be sparse right after a shock.\nThis is a chicken-and-egg problem. People do not want to trade in thin markets, and yet, having\nmany people trading is what creates liquidity. These markets therefore need to be bootstrapped\ninto a phase where they are suf\ufb01ciently liquid to attract trading. This bootstrapping is often achieved\nthrough market-makers [12]. Market-makers are responsible for providing liquidity and maintaining\norder on the exchange. For example, the NYSE designates a single monopolist specialist (market-\nmaker) for each stock, while the NASDAQ allows multiple market-makers to compete.\nThere has been much debate on whether one of these models is better than the other. This debate\nis again important today for those who are designing new markets. Should they employ a sin-\ngle monopolistic market-maker or multiple competitive market-makers? 
Alternatively, should the market-maker be based on some other criterion, and if so, what is the optimal design for this agent? Market-makers want to maximize profit, which can run contrary to their “social responsibility” of providing liquidity. A monopolist market-maker attempts to maximize expected discounted profits, while competitive (non-colluding) market-makers can only expect zero profit, since any profits should be wiped out by competition. One would therefore expect markets with competitive market-makers to be of better quality. However, this has not been observed in practice, especially in the well-studied case of the NASDAQ vs. the NYSE [1, 9]. Many explanations have been proposed in the empirical literature, each accounting for part of the phenomenon. One reason that has been speculated about anecdotally but never analyzed formally is the learning aspect of the problem. For example, the NYSE’s promotional literature used to tout the benefits of a monopolist for “maintaining a fair and orderly market” in the face of market shocks [6].\n\nThe main challenge in formally analyzing this question is the complexity of the monopolistic market-maker’s sequential decision problem. The market-maker, when setting bid and ask prices, faces a heavily path-dependent exploration-exploitation dilemma: there is a tradeoff between setting prices to extract maximum profit from the next trade and setting prices to gather as much information as possible about the new value of the instrument, so as to generate larger profits from future trades. There is no known solution to this sequential decision problem.\n\nWe present the first such solution within an established model of market-making. We show the surprising fact that a monopolist market-maker leads to higher market liquidity in periods of extreme market shock than does a zero-profit competitive market-maker. 
In various single period settings, it\nhas been shown that monopolists can sometimes provide greater liquidity [6] by averaging expected\npro\ufb01ts across different trade sizes. We show for the \ufb01rst time that this can hold true with \ufb01xed trade\nsizes in a multi-period setting, because the market-maker is willing to take losses following a shock\nin order to learn the new valuation more quickly.\n\n1.1 Market Microstructure Background\n\nMarket microstructure has recently received much attention from a computational perspective [10,\n4, 12]. The driving problem of this paper is price discovery. Suppose an instrument has just begun\ntrading in a market where different people have different beliefs about its value. An example is\nshares in the \u201cBarack Obama wins the presidential election\u201d market. These shares should trade at\nprices that re\ufb02ect the probability that the event will occur: if the outcome pays off $100, the shares\nshould trade at about $55 if the aggregate public belief is 55% that the event will occur. Similarly,\nthe price of a stock should re\ufb02ect the aggregate public belief about future cash \ufb02ows associated\nwith a company. It is well-known that markets are good at aggregating information into prices,\nbut different market structures possess different qualities in this regard. We are concerned with the\nproperties of dealer markets, in which prices are set by one or more market-makers responsible for\nproviding liquidity by taking one side of every trade.\nMarket-making has been studied extensively in the theoretical market microstructure literature [8, 7,\nfor example], but only recently has the dynamic multi-period problem gained attention [2, 3]. 
Since\nwe are interested in the problem of how a market-maker learns a value for an asset, we follow the\ngeneral model of Glosten and Milgrom which abstracts away from the problem of quantities by\nrestricting attention to situations where the market-maker places bid and ask quotes for one unit of\nthe asset at each time step. Das [3] has extended this model to consider the market-maker\u2019s learning\nproblem with competitive pricing, while Darley et al [2] have used similar modeling for simulations\nof the NASDAQ. The Glosten and Milgrom model has become a standard model in this area.\nLiquidity, which is not easy to quantify, is the prime social concern. In practice, it is a function of\nthe depth of the limit order book. In our models, we measure liquidity using the bid-ask spread,\nor alternatively the probability that a trade will occur. This gives a good indication of the level of\ninformational heterogeneity in the market, and of execution costs. The dynamic behavior of the\nspread gives insight into the price discovery process.\n\n1.2 Our Contribution\n\nWe consider the question of optimal sequential price-setting in the Glosten-Milgrom model. The\nmarket-maker sets bid and ask prices at each trading period1 and when a trader arrives she has the\noption of buying or selling at those prices, or of not executing a trade. There are many results\nrelating to the properties of zero-pro\ufb01t (competitive) market-makers [7, 3]. The zero-pro\ufb01t problem\nis a single-period decision-making problem with online belief updates. Within this same framework,\none can formulate the decision problem for a monopolist market-maker who maximizes her total\ndiscounted pro\ufb01t as a reinforcement learning problem. The market maker\u2019s state is her belief about\nthe instrument value, and her action is to set bid and ask prices. 
The market-maker’s actions must trade off profit-taking (exploitation) with price discovery (exploration).\n\n1 The MM is willing to buy at the bid price and sell at the ask price.\n\nThe complexity of the sequential problem arises from the complexity of the state space and the fact that the action space is continuous. The state of the market-maker must represent her belief about the true value of the asset being traded; as such, it is a probability density function. In a parametric setting, the state space is finite-dimensional, but continuous. Even if we assume a Gaussian prior for the market-maker’s belief as well as for the beliefs of all the traders, the market-maker’s beliefs quickly become a complex product of error functions, and the exact dynamic programming problem becomes intractable.\n\nWe solve the Bellman equation for the optimal sequential market-maker within the framework of Gaussian state space evolution, a close approximation to the true state space evolution. We present simulation results which testify to how closely the Gaussian framework approximates the true evolution. The Gaussian approximation alone does not alleviate the difficulties associated with reinforcement learning in continuous action and state spaces.2 However, within our setting, we prove a key monotonicity property for the state update. This property allows us to solve for the value function exactly using a single-pass dynamic program.\n\nThus, our first contribution is a complete solution to the optimal sequential market-making problem within a Gaussian update framework. Our second contribution relates to the phenomenological implications for market behavior. We obtain the surprising result that in periods of extreme shock, when the market-maker has large uncertainty relative to the traders, the monopolist provides greater liquidity than competitive zero-profit market-makers. 
The monopolist increases liquidity, possibly taking short-term losses, in order to learn more quickly, and in doing so offers the better social outcome. Of course, once the monopolist has adapted to the shock, she equilibrates at a higher bid-ask spread than the corresponding zero-profit market-maker with the same beliefs.\n\n2 The Model and the Sequential Decision Problem\n\n2.1 Market Model\n\nAt time 0, a shock occurs causing an instrument to attain value V, which is then held fixed through time (we consider one instrument in the market). This could represent a real market shock to a stock value (a change in public beliefs), an IPO, or the introduction of a new contract in a prediction market. We use a model similar to Das’s [3] extension of the Glosten and Milgrom [7] model. We assume that trading is divided into a sequence of discrete trading time steps, each time step corresponding to the arrival of a trader. The value V is drawn from some distribution gV(v).\n\nThe market-maker (MM), at each time step t ≥ 0, sets bid and ask prices bt ≤ at at which she is willing respectively to buy and sell one unit. Trader t arrives with a noisy estimate wt of V, where wt = V + εt. The {εt} are zero-mean i.i.d. random variables with distribution function Fε. We will assume that Fε is symmetric, so that Fε(−x) = 1 − Fε(x). The trader decides whether to trade at the bid or ask price depending on the value of wt: she buys at at if wt > at (she thinks the instrument is undervalued), sells at bt if wt < bt (she thinks the instrument is overvalued), and does nothing otherwise. MM receives a signal xt ∈ {+1, 0, −1} indicating whether the trader bought, did nothing, or sold. Note that information is conveyed only by the direction of the trade. 
Information can also be conveyed by the patterns and sizes of trades, but the present work abstracts away from those considerations.\n\nThe market-maker’s objective is to maximize profit. In perfect competition, the MM is pushed to setting bid and ask prices that yield zero expected profit. In a monopolistic setting, she wants to optimize the profits she receives over time. As we will see below, this can be a difficult problem to solve. A commonly used alternative is to consider a greedy, or myopically optimal, MM who only maximizes her expected profit from the next trade. This is a good approximation for agents who discount the future heavily, since they are more concerned with immediate reward. We will consider all three types of market-makers: (1) zero-profit, (2) myopic, and (3) optimal.\n\n2 Where one has to resort to unbounded value iteration methods whose convergence and uniqueness properties are little understood.\n\n2.2 State Space\n\nThe state space for the MM is determined by the MM’s belief about the value V, described by a density function pt at time step t. The MM decides on actions (bid and ask prices) (bt, at) based on pt, and receives a signal xt ∈ {+1, 0, −1} as to whether the trader bought, sold, or did nothing.\n\nLet qt(V; bt, at) be the probability of receiving signal xt given bid and ask (bt, at), conditioned on V. Assuming that Fε is continuous at bt − V and at − V, a straightforward calculation yields\n\nqt(V; bt, at) = 1 − Fε(at − V) if xt = +1,  Fε(at − V) − Fε(bt − V) if xt = 0,  Fε(bt − V) if xt = −1,\n\nor, compactly, qt(V; bt, at) = Fε(z⁺t − V) − Fε(z⁻t − V), where (z⁺t, z⁻t) is respectively (+∞, at), (at, bt) or (bt, −∞) when xt = +1, 0, −1. The Bayesian update to pt is then pt+1(v) = pt(v) qt(v; bt, at)/At, where the normalization constant is At = ∫ dv pt(v) qt(v; bt, at). 
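The update above is straightforward to carry out numerically on a discretized value grid. The following sketch is our own illustrative code, not from the paper: it assumes Gaussian noise (one admissible choice of Fε) and a uniform grid fine enough that sums approximate the integrals; the function names are ours.

```python
import math
import numpy as np

def _Phi(z):
    """Standard normal CDF (Gaussian noise is one admissible choice of F_eps)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def belief_update(v, p, bid, ask, x, sigma_eps=1.0):
    """One Bayesian update of the MM's belief density p on the value grid v,
    after observing trade signal x in {+1, 0, -1}: a direct numerical
    version of p_{t+1}(v) = p_t(v) q_t(v; b_t, a_t) / A_t."""
    if x == +1:    # trader bought: w_t > a_t
        q = 1.0 - _Phi((ask - v) / sigma_eps)
    elif x == -1:  # trader sold: w_t < b_t
        q = _Phi((bid - v) / sigma_eps)
    else:          # no trade: b_t <= w_t <= a_t
        q = _Phi((ask - v) / sigma_eps) - _Phi((bid - v) / sigma_eps)
    post = p * q
    dv = v[1] - v[0]
    return post / (post.sum() * dv)   # division by the constant A_t
```

Any symmetric noise CDF could be substituted for `_Phi` without changing the structure of the update.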
Unfolding the recursion gives pt+1(v) = p0(v) ∏τ=0..t qτ(v; bτ, aτ)/Aτ.\n\n2.3 Solving for Market-Maker Prices\n\nLet bt ≤ at, and let rt be the expected profit at time t. The expected discounted return is then R = Στ≥0 γτ rτ, where 0 < γ < 1 is the discount factor. The optimal MM maximizes R. We can compute rt as rt = ∫ dv v Fε(−v) (pt(v + bt) + pt(at − v)). rt decomposes into two terms which can be identified as the bid- and ask-side profits, rt = rbidt(bt) + raskt(at). In perfect competition, MM should not expect any profit on either the bid or the ask side: if the contrary were true, a competing MM could post slightly more attractive prices, capturing the trade at a smaller but still positive profit and wiping out MM’s advantage. This should hold at every time step. Hence the MM will set bid and ask prices such that rbidt(bt) = 0 and raskt(at) = 0. Solving for bt, at, we find that they must satisfy the following fixed-point equations (these are also derived for the case of Gaussian noise by Das [3]):\n\nbt = [∫ dv v pt(v) Fε(bt − v)] / [∫ dv pt(v) Fε(bt − v)] = Ept[V | xt = −1],\nat = [∫ dv v pt(v) Fε(v − at)] / [∫ dv pt(v) Fε(v − at)] = Ept[V | xt = +1]\n\n(assuming the denominators, which are the conditional probabilities of hitting the bid or ask, are non-zero). The myopic monopolist maximizes rt. For the typical case of well-behaved distributions pt(v) and Fε, the bid and ask returns each display a single maximum. 
In this case, we can obtain bmypt and amypt by setting the derivatives to zero (we assume the functions are well behaved so that the derivatives are defined). Letting fε(x) = F′ε(x) be the density function for the noise εt, bmypt and amypt satisfy the fixed-point equations\n\nbt = [∫ dv pt(v)(v fε(bt − v) − Fε(bt − v))] / [∫ dv pt(v) fε(bt − v)],\nat = [∫ dv pt(v)(v fε(at − v) + Fε(v − at))] / [∫ dv pt(v) fε(at − v)].\n\nThe optimal strategy for MM is not as easy to obtain. When γ is large, the expected discounted return R can be significantly higher than the myopic return. The optimal MM might choose to sacrifice short-term return for a substantially larger return over the long term. The only reason to do this is if choosing a sub-optimal short-term strategy leads to a significant decrease in the uncertainty about V (which translates to a narrowing of the probability distribution pt(v)). MM can then exploit this more certain information regarding V in the longer term.\n\nThe optimal strategy for the MM is encapsulated in the Bellman equation for the value functional (where the state pt is a function, (bt, at) is the action, and π is a policy):\n\nV(pt; π) = E[rt | pt, bπt(pt), aπt(pt)] + γ E[V(pt+1; π) | pt, bπt(pt), aπt(pt)].\n\nThis equation reflects the fact that the MM’s expected profit is a function of both her immediate expected return and her future state, which is also affected by her bid and ask prices. The fact that V is a value functional leads to numerous technical problems when solving this Bellman equation. The problem is heavily path-dependent, with the number of paths being exponential in the number of trading periods. 
To make this tractable, we use a Gaussian approximation for the state space evolution.\n\nFigure 1: Gaussian state update (dashed) versus true state update (solid), illustrating that the Gaussian approximation is valid.\n\nFigure 2: Gaussian integrals and normalization constants used in the derivation of the DP and the state updates:\nI(α, β) = ∫ dx N(x) Φ(α − βx) = Φ(α/√(1 + β²)),\nJ(α, β) = ∫ dx x N(x) Φ(α − βx) = −(β/√(1 + β²)) N(α/√(1 + β²)),\nK(α, β) = ∫ dx x² N(x) Φ(α − βx) = I(α, β) − (αβ²/(1 + β²)^(3/2)) N(α/√(1 + β²)),\nL(α, β) = I(α, β) − K(α, β),\nA(z⁺, z⁻) = I((z⁺ − μt)/σε, ρt) − I((z⁻ − μt)/σε, ρt),\nB(z⁺, z⁻) = J((z⁺ − μt)/σε, ρt) − J((z⁻ − μt)/σε, ρt),\nC(z⁺, z⁻) = L((z⁺ − μt)/σε, ρt) − L((z⁻ − μt)/σε, ρt).\n\n2.4 The Gaussian Approximation\n\nFrom a Gaussian prior and performing Bayesian updates, one expects that the state distribution will be closely approximated by a Gaussian (see Figure 1). 
Thus, forcing the MM to maintain a Gaussian belief over the true value at each time t should give a good approximation to the true state space evolution, and the resulting optimal actions should closely match the true optimal actions. In making this reduction, we reduce the state space to a two-parameter function class parameterized by the mean and variance, (μt, σt²). The value function is independent of μt (hence dependent only on σt), and the optimal action is of the form bt = μt − δt, at = μt + δt. Thus,\n\nV(σt) = max over δ of { rt(σt, δ) + γ E[V(σt+1) | δ] }.   (1)\n\nTo compute the expectation on the RHS, we need the probabilistic dynamics in the (approximate) Gaussian state space, i.e., the evolution of μt, σt. Let N(·), Φ(·) denote the standard normal density and distribution. Let pt(v) = (1/σt) N((v − μt)/σt) be Gaussian with mean μt and variance σt². Assume that the noise is also Gaussian with variance σε², so Fε(x) = Φ(x/σε). At time t + 1, after the Bayesian update, we have\n\npt+1(v) = (1/A) · (1/σt) N((v − μt)/σt) [Φ((z⁺ − v)/σε) − Φ((z⁻ − v)/σε)].\n\nThe normalization constant A(z⁺, z⁻) is given in Figure 2, and (z⁺t, z⁻t) are respectively (+∞, at), (at, bt) and (bt, −∞) when xt = +1, 0, −1. The updates μt+1 and σt+1² are obtained from Ept+1[V] = ∫ dv v pt+1(v) and Ept+1[V²] = ∫ dv v² pt+1(v). 
After some tedious algebra (see supplementary information), we obtain\n\nμt+1 = μt + σt · B/A,   (2)\nσt+1² = σt² (1 − (AC + B²)/A²).   (3)\n\nFigure 2 gives the expressions for A, B, C.\n\nTheorem 2.1 (Monotonic state update). σt+1² ≤ σt² (see supplementary information for proof).\n\nEstablishing that σt is decreasing in t allows us to solve the dynamic program efficiently (note that the property of decreasing variance is well known for an update to a Gaussian prior when the observation is also Gaussian; we show it here for threshold observations).\n\n2.5 Solving the Bellman Equation\n\nWe now return to the Bellman equation (1). In light of Theorem 2.1, the RHS of this equation depends only on states σt+1 that are no larger than the state σt on the LHS. We can thus solve the problem numerically by computing V(0) and then building up the solution on a fine grid on the real line, using linear interpolation between previously computed points if the variance update leads to a point not on the grid.\n\nWe need to explicitly construct the states on the RHS with respect to which the expectation is taken. The expectation is with respect to the future state σt+1, which depends directly on the trade outcome xt ∈ {−1, 0, +1}. We define ρt = σt/σε and qt = δt/(σε√(1 + ρt²)), where at = μt + δt and bt = μt − δt. 
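Equations (2) and (3), together with the closed forms in Figure 2, give a constant-time belief update. The sketch below is our own illustrative code (helper names are ours; Gaussian noise assumed), and its example use also illustrates Theorem 2.1:

```python
import math

def _phi(z):   # standard normal density N(.)
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def _Phi(z):   # standard normal distribution Phi(.)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _I(a, b):  # I(alpha, beta) = Phi(alpha / sqrt(1 + beta^2))
    if math.isinf(a):
        return 1.0 if a > 0 else 0.0
    return _Phi(a / math.sqrt(1.0 + b * b))

def _J(a, b):  # J(alpha, beta) from Figure 2
    if math.isinf(a):
        return 0.0
    return -b / math.sqrt(1.0 + b * b) * _phi(a / math.sqrt(1.0 + b * b))

def _L(a, b):  # L(alpha, beta) = I - K from Figure 2
    if math.isinf(a):
        return 0.0
    return a * b * b / (1.0 + b * b) ** 1.5 * _phi(a / math.sqrt(1.0 + b * b))

def gaussian_update(mu, sigma, bid, ask, x, sigma_eps=1.0):
    """Updates (2)-(3): posterior (mu, sigma) after trade signal x in
    {+1, 0, -1}, using the A, B, C integrals of Figure 2."""
    rho = sigma / sigma_eps
    z_plus, z_minus = {+1: (math.inf, ask), 0: (ask, bid), -1: (bid, -math.inf)}[x]
    ap = (z_plus - mu) / sigma_eps
    am = (z_minus - mu) / sigma_eps
    A = _I(ap, rho) - _I(am, rho)      # probability of the observed signal
    B = _J(ap, rho) - _J(am, rho)
    C = _L(ap, rho) - _L(am, rho)
    mu_new = mu + sigma * B / A                                    # eq. (2)
    var_new = sigma * sigma * (1.0 - (A * C + B * B) / (A * A))    # eq. (3)
    return mu_new, math.sqrt(var_new)
```

On any signal the returned standard deviation is no larger than the input one, consistent with Theorem 2.1.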
The following table summarizes the relevant quantities:\n\nxt = +1:  Prob. 1 − Φ(qt),  μt+1 = μt + κtσt,  σt+1 = αtσt\nxt = 0:  Prob. 2Φ(qt) − 1,  μt+1 = μt,  σt+1 = βtσt\nxt = −1:  Prob. 1 − Φ(qt),  μt+1 = μt − κtσt,  σt+1 = αtσt\n\nwhere\n\nαt² = 1 − ρt² N(qt)(N(qt) − qt[1 − Φ(qt)]) / [(1 + ρt²)(1 − Φ(qt))²],\nβt² = 1 − 2ρt² qt N(qt) / [(1 + ρt²)(2Φ(qt) − 1)],\nκt = √(ρt²/(1 + ρt²)) · N(qt)/(1 − Φ(qt)).\n\nNote that qt > 0, αt, βt < 1 and κt > 0. We can now compute E[V(σt+1) | δt] as\n\n2(1 − Φ(qt)) V(αtσt) + (2Φ(qt) − 1) V(βtσt).\n\nThis allows us to complete the specification of the Bellman equation (with x = ρt², where ρt = σt/σε is the MM’s information disadvantage):\n\nV(x; σε) = max over q of { 2σε√(1 + x) [q(1 − Φ(q)) − (x/(1 + x)) N(q)] + γ [2(1 − Φ(q)) V(α²(x, q)x; σε) + (2Φ(q) − 1) V(β²(x, q)x; σε)] },\n\nwhere α²(x, q) and β²(x, q) are as defined above with ρt² = x and qt = q. We define the optimal action q*(x) as the value of q that maximizes the RHS. When x = 0, the myopic and optimal MM coincide, and so V(0) = 2σε q*(1 − Φ(q*))/(1 − γ), where q* = q*(0) ≈ 0.7518 satisfies q*N(q*) = 1 − Φ(q*). Note that if we only maximize the first term in the value function, we obtain the myopic action qmyp(ρ), satisfying the fixed-point equation qmyp = (1 + ρt²)(1 − Φ(qmyp))/N(qmyp). 
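Theorem 2.1 is what makes this Bellman equation solvable in one sweep: both successor states α²x and β²x never exceed x, so V can be built up from V(0) on a grid of increasing x. The sketch below is our own illustrative implementation, not the paper's code; the grid sizes, the q search grid, and the few inner iterations that handle successors landing between grid points are all our choices.

```python
import math

def _phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def solve_value_function(x_max=9.0, nx=200, gamma=0.9, sigma_eps=1.0,
                         n_q=79, inner_iters=20):
    """Single-pass DP for V(x), x = rho^2 (squared information disadvantage).
    By Theorem 2.1 the successor states alpha^2*x and beta^2*x never exceed
    x, so one sweep from x = 0 upward suffices; a few inner iterations handle
    successors that interpolate against the point currently being solved."""
    q_grid = [0.05 * k for k in range(1, n_q + 1)]   # search q in (0, ~4]
    xs = [x_max * i / (nx - 1) for i in range(nx)]
    h = xs[1]
    V = [0.0] * nx
    # base case x = 0: the optimal and myopic MM coincide
    V[0] = max(2.0 * sigma_eps * q * (1.0 - _Phi(q)) for q in q_grid) / (1.0 - gamma)
    for i in range(1, nx):
        x = xs[i]
        V[i] = V[i - 1]                  # initial guess for self-references
        for _ in range(inner_iters):
            best = -math.inf
            for q in q_grid:
                Pq, Nq = _Phi(q), _phi(q)
                # immediate reward at half-spread delta = q*sigma_eps*sqrt(1+x)
                r = 2.0 * sigma_eps * math.sqrt(1.0 + x) * (
                    q * (1.0 - Pq) - x / (1.0 + x) * Nq)
                # successor variance factors from the table above
                a2 = 1.0 - (x / (1.0 + x)) * Nq * (Nq - q * (1.0 - Pq)) / (1.0 - Pq) ** 2
                b2 = 1.0 - (x / (1.0 + x)) * 2.0 * q * Nq / (2.0 * Pq - 1.0)
                cont = 0.0
                for w, xp in ((2.0 * (1.0 - Pq), a2 * x), (2.0 * Pq - 1.0, b2 * x)):
                    j = min(int(xp / h), nx - 2)      # linear interpolation of V
                    t = xp / h - j
                    cont += w * ((1.0 - t) * V[j] + t * V[j + 1])
                best = max(best, r + gamma * cont)
            V[i] = best
    return xs, V
```

With γ = 0.9 and σε = 1, the base value V(0) comes out near 2·0.7518·(1 − Φ(0.7518))/0.1 ≈ 3.4, matching the x = 0 formula above.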
There is a similarly elegant solution for the zero-profit MM under the Gaussian assumption, obtained by setting rt = 0 and yielding the fixed-point equation qzero = [ρt²/(1 + ρt²)] N(qzero)/(1 − Φ(qzero)). Ten standard fixed-point iterations are sufficient to solve these equations accurately.\n\n3 Experimental Results\n\nFirst, we validate the Gaussian approximation by simulating a market as follows. The initial value V is drawn from a Gaussian with mean 0 and standard deviation σ, and we set the discount factor γ = 0.9. Each simulation consists of 100 trading periods, at which point discounted returns become negligible. At each trading step t, a new trader arrives with a valuation wt ∼ N(V, 1) (Gaussian with mean V and variance 1). We report results averaged over more than 10,000 simulations, each with a randomly sampled value of V.\n\nIn each simulation, the market-maker’s state updates are given by the Gaussian approximation (2), (3), according to which she sets bid and ask prices. The trader at time step t trades by comparing wt to bt, at. We simulate the outcomes of the optimal, myopic, and zero-profit MMs. 
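The two fixed-point equations above (for qmyp and qzero) are cheap to iterate in such a simulation. The sketch below is our own: we add mild damping to the myopic iteration for robustness at large ρ, an implementation choice the paper does not describe.

```python
import math

def _phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def q_myopic(rho, iters=50, q0=0.75):
    """Myopic half-spread parameter via q = (1 + rho^2)(1 - Phi(q)) / N(q).
    The plain map oscillates for large rho, so we average successive
    iterates (mild damping -- our choice, not the paper's)."""
    q = q0
    for _ in range(iters):
        q = 0.5 * (q + (1.0 + rho ** 2) * (1.0 - _Phi(q)) / _phi(q))
    return q

def q_zero_profit(rho, iters=50, q0=0.75):
    """Zero-profit half-spread parameter via the contraction
    q = rho^2/(1 + rho^2) * N(q)/(1 - Phi(q))."""
    q = q0
    for _ in range(iters):
        q = rho ** 2 / (1.0 + rho ** 2) * _phi(q) / (1.0 - _Phi(q))
    return q
```

In both cases q is the half-spread in units of σε√(1 + ρ²), i.e. δt = q σε √(1 + ρt²); at ρ = 0 the myopic solution recovers q* ≈ 0.7518.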
An alternative is to maintain the exact state as a product of error functions, and extract the mean and variance for computing the optimal action. This is computationally prohibitive, and leads to no significant differences. If the real world conformed to the MM’s belief, a new value Vt would be drawn from N(μt, σt) at each trading period t, and the trader would then receive a sample wt ∼ N(Vt, 1). All our computations are exact within this “Gaussian” world; the point here is to test the degree to which the Gaussian and real worlds differ.\n\nFigure 3: MM properties derived from the solution of the Bellman equation. (a) Realized vs. theoretical value function in the Gaussian approximation (thin black line): the realized closely matches the theoretical, validating the Gaussian framework. (b) Bid-ask spreads as a function of the MM information disadvantage ρ, indicating that once ρ exceeds about 1.5, the monopolist offers the greatest liquidity. (c) Realized average return as a function of time: the monopolist is willing to take a significant short-term loss to improve future profits as a result of better price discovery.\n\nThe ideal test of our optimal MM is against the true optimal for the real world, which is intractable. However, if we find that the theoretical value function for the optimal MM in the Gaussian world matches the realized value function in the real world, then we have strong, though not necessarily conclusive, evidence for two conclusions: (1) the Gaussian world is a good approximation to the real world, otherwise the realized and theoretical value functions would not coincide; (2) since the two worlds are nearly the same, the optimal MM in the Gaussian world should closely match the true optimal. 
Figure 3(a) presents results which show that the realized and theoretical value func-\ntions are essentially the same, presenting the desired evidence (note that with independent updates,\nthe posterior should be asymptotically Gaussian). Figure 3(a) also demonstrates that the optimal\nsigni\ufb01cantly outperforms the myopic market-maker. Figure 3(b) shows how the bid-ask spread will\nbehave as a function of the MM information disadvantage.\nSome phenomenological properties of the market are shown in Figure 4.3 For a starting MM in-\nformation disadvantage of \u03c1 = 3, the optimal MM initially has signi\ufb01cantly lower spread, even\ncompared with the zero pro\ufb01t market-maker. The reason for this outcome is illustrated in Figure\n3(c) where we see that the optimal market maker is offering lower spreads and taking on signi\ufb01cant\ninitial loss to be compensated later by signi\ufb01cant pro\ufb01ts due to better price discovery. At equilibrium\nthe optimal MM\u2019s spread and the myopic spread are equal, as expected.\n\n4 Discussion\n\nOur solution to the Bellman equation for the optimal monopolistic MM leads to the striking conclu-\nsion that the optimal MM is willing to take early losses by offering lower spreads in order to make\nsigni\ufb01cantly higher pro\ufb01ts later (Figures 3(b,c) and 4). This is quantitative evidence that the optimal\nMM offers more liquidity than a zero-pro\ufb01t MM after a market shock, especially when the MM is\nat a large information disadvantage. 
In this regime, exploration is more important than exploitation. Competition may actually impede the price discovery process, since competing market-makers have no incentive to take early losses for better price discovery: competitive pricing is not necessarily informationally efficient (there are quicker ways for the market to “learn” a new valuation).\n\n3 With both zero-profit and optimal MMs we reproduce one of the key findings of Das [3]: the market exhibits a two-regime behavior. Price jumps are immediately followed by a regime of high spreads (the price-discovery regime), and then, when the market-maker learns the new valuation, the market settles into an equilibrium regime of lower spreads (the efficient-market regime).\n\nFigure 4: Realized market properties based on simulating the three MMs. (a) Realized spread over time (σ = 3): the optimal MM starts with the lowest spread, and converges quickest to equilibrium. (b) Liquidity over time (σ = 3), measured by the probability of a trade: initial liquidity is highest for the optimal MM. (c) Time to spread stabilization: as the MM’s information disadvantage increases, the optimal MM is significantly better.\n\nOur solution is based on reducing a functional state space to a finite-dimensional one in which the Bellman equation can be solved efficiently. When the state is a probability distribution, updated according to independent events, we expect the Gaussian approximation to closely match the real state evolution. 
Hence, our methods may be generally applicable to problems of this form.\nWhile this paper presents a stylized model, simple trading models have been shown to produce rich\nmarket behavior in many cases (for example, [5]). The results presented here are an example of\nthe kinds of insights that can be be gained from studying market properties in these models while\napproaching agent decision problems from the perspective of machine learning. At the same time,\nthis paper is not purely theoretical. The eventual algorithm we present is easy to implement, and we\nare in the process of evaluating this algorithm in test prediction markets. Another direction we are\npursuing is to endow the traders with intelligence, so they may learn the true value too. We believe\nthe Gaussian approximation admits a solution for a monopolistic market-maker and adaptive traders.\n\nReferences\n[1] W.G. Christie and P.H. Schulz. Why do NASDAQ market makers avoid odd-eighth quotes? J. Fin., 49(5),\n\n1994.\n\n[2] V. Darley, A. Outkin, T. Plate, and F. Gao. Sixteenths or pennies? Observations from a simulation of the\n\nNASDAQ stock market. In IEEE/IAFE/INFORMS Conf. on Comp. Intel. for Fin. Engr., 2000.\n\n[3] S. Das. A learning market-maker in the Glosten-Milgrom model. Quant. Fin., 5(2):169\u2013180, April 2005.\n[4] E. Even-Dar, S.M. Kakade, M. Kearns, and Y. Mansour. (In)stability properties of limit order dynamics.\n\nIn Proc. ACM Conf. on Elect. Comm., 2006.\n\n[5] J.D. Farmer, P. Patelli, and I.I Zovko. The predictive power of zero intelligence in \ufb01nancial markets.\n\nPNAS, 102(11):2254\u20132259, 2005.\n\n[6] L.R. Glosten. Insider trading, liquidity, and the role of the monopolist specialist. J. Bus., 62(2), 1989.\n[7] L.R. Glosten and P.R. Milgrom. Bid, ask and transaction prices in a specialist market with heteroge-\n\nneously informed traders. J. Fin. Econ., 14:71\u2013100, 1985.\n\n[8] S.J. Grossman and M.H. Miller. Liquidity and market structure. J. 
Fin., 43:617–633, 1988.\n[9] Roger D. Huang and Hans R. Stoll. Dealer versus auction markets: A paired comparison of execution costs on NASDAQ and the NYSE. J. Fin. Econ., 41(3):313–357, 1996.\n[10] S.M. Kakade, M. Kearns, Y. Mansour, and L. Ortiz. Competitive algorithms for VWAP and limit-order trading. In Proc. ACM Conf. on Elect. Comm., pages 189–198, 2004.\n[11] Juong-Sik Lee and Boleslaw Szymanski. Auctions as a dynamic pricing mechanism for e-services. In Cheng Hsu, editor, Service Enterprise Integration, pages 131–156. Kluwer, New York, 2006.\n[12] D. Pennock and R. Sami. Computational aspects of prediction markets. In N. Nisan, T. Roughgarden, E. Tardos, and V.V. Vazirani, editors, Algorithmic Game Theory. Cambridge University Press, 2007.\n[13] Justin Wolfers and Eric Zitzewitz. Prediction markets. J. Econ. Persp., 18(2):107–126, 2004.\n", "award": [], "sourceid": 272, "authors": [{"given_name": "Sanmay", "family_name": "Das", "institution": null}, {"given_name": "Malik", "family_name": "Magdon-Ismail", "institution": null}]}