{"title": "On Stochastic and Worst-case Models for Investing", "book": "Advances in Neural Information Processing Systems", "page_first": 709, "page_last": 717, "abstract": "In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While it is often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions.", "full_text": "On Stochastic and Worst-case Models for Investing\n\nElad Hazan\n\nIBM Almaden Research Center\n\n650 Harry Rd, San Jose, CA 95120\nehazan@cs.princeton.edu\n\nSatyen Kale\n\nYahoo! Research\n\n4301 Great America Parkway, Santa Clara, CA 95054\n\nskale@yahoo-inc.com\n\nAbstract\n\nIn practice, most investing is done assuming a probabilistic model of stock price\nreturns known as the Geometric Brownian Motion (GBM). While often an acceptable\napproximation, the GBM model is not always valid empirically. This\nmotivates a worst-case approach to investing, called universal portfolio management,\nwhere the objective is to maximize wealth relative to the wealth earned by\nthe best fixed portfolio in hindsight.\nIn this paper we tie the two approaches, and design an investment strategy which\nis universal in the worst-case, and yet capable of exploiting the mostly valid GBM\nmodel. 
Our method is based on new and improved regret bounds for online convex\noptimization with exp-concave loss functions.\n\n1 Introduction\n\n\u201cAverage-case\u201d Investing: Much of mathematical finance theory is devoted to the modeling of\nstock prices and devising investment strategies that maximize wealth gain, minimize risk while doing\nso, and so on. Typically, this is done by estimating the parameters in a probabilistic model of stock\nprices. Investment strategies are thus geared to such average case models (in the formal computer\nscience sense), and are naturally susceptible to drastic deviations from the model, as witnessed in\nthe recent stock market crash.\n\nEven so, empirically the Geometric Brownian Motion (GBM) ([Osb59, Bac00]) has enjoyed great\npredictive success and every year trillions of dollars are traded assuming this model. Black and\nScholes [BS73] used this same model in their Nobel prize winning work on pricing options on\nstocks.\n\n\u201cWorst-case\u201d Investing: The fragility of average-case models in the face of rare but dramatic\ndeviations led Cover [Cov91] to take a worst-case approach to investing in stocks. The performance\nof an online investment algorithm for arbitrary sequences of stock price returns is measured with\nrespect to the best CRP (constant rebalanced portfolio, see [Cov91]) in hindsight. A universal portfolio\nselection algorithm is one that obtains sublinear (in the number of trading periods T) regret,\nwhich is the difference in the logarithms of the final wealths obtained by the two.\n\nCover [Cov91] gave the first universal portfolio selection algorithm with regret bounded by\nO(log T). There has been much follow-up work after Cover\u2019s seminal work, such as [HSSW96,\nMF92, KV03, BK97, HKKA06], which focused on either obtaining alternate universal algorithms\nor improving the efficiency of Cover\u2019s algorithm. 
However, the best regret bound is still O(log T).\n\nThis dependence of the regret on the number of trading periods is not entirely satisfactory for two\nmain reasons. First, a priori it is not clear why the online algorithm should have high regret (growing\nwith the number of iterations) in an unchanging environment. As an extreme example, consider a\nsetting with two stocks where one has an \u201cupward drift\u201d of 1% daily, whereas the second stock\nremains at the same price. One would expect to \u201cfigure out\u201d this pattern quickly and focus on the\nfirst stock, thus attaining a constant fraction of the wealth of the best CRP in the long run, i.e.\nconstant regret, unlike the worst-case bound of O(log T).\n\nThe second problem arises from trading frequency. Suppose we need to invest over a fixed period of\ntime, say a year. Trading more frequently potentially leads to higher wealth gain, by capitalizing on\nshort term stock movements. However, increasing trading frequency increases T, and thus one may\nexpect more regret. The problem is actually even worse: since we measure regret as a difference of\nlogarithms of the final wealths, a regret bound of O(log T) implies a poly(T) factor ratio between the\nfinal wealths. In reality, however, experiments [AHKS06] show that some known online algorithms\nactually improve with increasing trading frequency.\n\nBridging Worst-case and Average-case Investing: Both these issues are resolved if one can show\nthat the regret of a \u201cgood\u201d online algorithm depends on total variation in the sequence of stock\nreturns, rather than purely on the number of iterations. If the stock return sequence has low variation,\nwe expect our algorithm to be able to perform better. 
If we trade more frequently, then the per\niteration variation should go down correspondingly, so the total variation stays the same.\n\nWe analyze a portfolio selection algorithm and prove that its regret is bounded by O(log Q), where\nQ (formally defined in Section 1.2) is the sum of squared deviations of the returns from their mean.\nSince Q \u2264 T (after appropriate normalization), we improve over previous regret bounds and retain\nthe worst-case robustness. Furthermore, in an average-case model such as GBM, the variation can\nbe tied very nicely to the volatility parameter, which explains the experimental observation that the regret\ndoesn\u2019t increase with increasing trading frequency. Our algorithm is efficient, and its implementation\nrequires constant time per iteration (independent of the number of game iterations).\n\n1.1 New Techniques and Comparison to Related Work\n\nCesa-Bianchi, Mansour and Stoltz [CBMS07] initiated work on relating worst case regret to the\nvariation in the data for the related learning problem of prediction from expert advice, and conjectured\nthat the optimal regret bounds should depend on the observed variation of the cost sequence.\nRecently, this conjecture was proved and regret bounds of $\\tilde{O}(\\sqrt{Q})$ were obtained in the full information\nand bandit linear optimization settings [HK08, HK09], where Q is the variation in the cost\nsequence. In this paper we give an exponential improvement in regret, viz. O(log Q), for the case\nof online exp-concave optimization, which includes portfolio selection as a special case.\n\nAnother approach to connecting worst-case to average-case investing was taken by Jamshidian\n[Jam92] and Cross and Barron [CB03]. They considered a model of \u201ccontinuous trading\u201d, where\nthere are T \u201ctrading intervals\u201d, and in each the online investor chooses a fixed portfolio which is\nrebalanced k times with k \u2192 \u221e. 
They prove familiar regret bounds of O(log T) (independent of\nk) in this model w.r.t. the best fixed portfolio which is rebalanced T \u00d7 k times. In this model our\nalgorithm attains the tighter regret bounds of O(log Q), although our algorithm has more flexibility.\nFurthermore their algorithms, being extensions of Cover\u2019s algorithm, may require exponential time\nin general\u00b9.\n\nOur bounds of O(log Q) regret require completely different techniques compared to the $\\tilde{O}(\\sqrt{Q})$\nregret bounds of [HK08, HK09]. These previous bounds are based on first-order gradient descent\nmethods which are too weak to obtain O(log Q) regret. Instead we have to use the second-order\nNewton step ideas based on [HKKA06] (in particular, the Hessian of the cost functions).\n\nThe second-order techniques of [HKKA06] are, however, not sensitive enough to obtain O(log Q)\nbounds. This is because progress was measured in terms of the distance between successive portfolios\nin the usual Euclidean norm, which is insensitive to variation in the cost sequence. In this paper,\nwe introduce a different analysis technique, based on analyzing the distance between successive\npredictions using norms that keep changing from iteration to iteration and are actually sensitive to\nthe variation.\n\nA key technical step in the analysis is a lemma (Lemma 6) which bounds the sum of differences of\nsuccessive Cesaro means of a sequence of vectors by the logarithm of its variation. This lemma,\nwhich may be useful in other contexts when variation bounds on the regret are desired, is proved\nusing the Karush-Kuhn-Tucker conditions, and also improves the regret bounds in previous papers.\n\n\u00b9 Cross and Barron give an efficient implementation for some interesting special cases, under assumptions\non the variation in returns and bounds on the magnitude of the returns, and assuming k \u2192 \u221e. A truly efficient\nimplementation of their algorithm can probably be obtained using the techniques of Kalai and Vempala.\n\n1.2 The model and statement of results\n\nPortfolio management. In the universal portfolio management model [Cov91], an online investor\niteratively distributes her wealth over n assets before observing the change in asset price. In each\niteration t = 1, 2, . . . the investor commits to an n-dimensional distribution of her wealth, $x_t \\in \\Delta_n = \\{x : \\sum_i x_i = 1,\\ x \\geq 0\\}$.\nShe then observes a price relatives vector $r_t \\in \\mathbb{R}^n_+$, where $r_t(i)$ is\nthe ratio between the closing price of the ith asset on trading period t and the opening price. In the\ntth trading period, the wealth of the investor changes by a factor of $(r_t \\cdot x_t)$. The overall change in\nwealth is thus $\\prod_t (r_t \\cdot x_t)$. Since in a typical market wealth grows at an exponential rate, we measure\nperformance by the exponential growth rate, which is $\\log \\prod_t (r_t \\cdot x_t) = \\sum_t \\log(r_t \\cdot x_t)$. A constant\nrebalanced portfolio (CRP) is an investment strategy which rebalances the wealth in every iteration\nto keep a fixed distribution. Thus, for a CRP $x \\in \\Delta_n$, the change in wealth is $\\prod_t (r_t \\cdot x)$.\nThe regret of the investor is defined to be the difference between the exponential growth rate of her\ninvestment strategy and that of the best CRP strategy in hindsight, i.e.\n\n$$\\text{Regret} := \\max_{x^* \\in \\Delta_n} \\sum_t \\log(r_t \\cdot x^*) - \\sum_t \\log(r_t \\cdot x_t).$$\n\nNote that the regret doesn\u2019t change if we scale all the returns in any particular period by the same\namount. So we assume w.l.o.g. that in all periods t, $\\max_i r_t(i) = 1$. 
We assume that there is a known\nparameter r > 0, such that for all periods t, $\\min_i r_t(i) \\geq r$. We call r the market variability\nparameter. This is the only restriction we put on the stock price returns; they could be chosen\nadversarially as long as they respect the market variability bound.\n\nOnline convex optimization. In the online convex optimization problem [Zin03], which generalizes\nuniversal portfolio management, the decision space is a closed, bounded, convex set $K \\subseteq \\mathbb{R}^n$, and\nwe are sequentially given a series of convex cost\u00b2 functions $f_t : K \\to \\mathbb{R}$ for t = 1, 2, . . .. The\nalgorithm iteratively produces a point $x_t \\in K$ in every round t, without knowledge of $f_t$ (but using\nthe past sequence of cost functions), and incurs the cost $f_t(x_t)$. The regret at time T is defined to be\n\n$$\\text{Regret} := \\sum_{t=1}^T f_t(x_t) - \\min_{x \\in K} \\sum_{t=1}^T f_t(x).$$\n\nUsually, we will let $\\sum_t$ denote $\\sum_{t=1}^T$. In this paper, we restrict our attention to convex cost functions\nwhich can be written as $f_t(x) = g(v_t \\cdot x)$ for some univariate convex function g and a parameter\nvector $v_t \\in \\mathbb{R}^n$ (for example, in the portfolio management problem, $K = \\Delta_n$, $f_t(x) = -\\log(r_t \\cdot x)$,\n$g = -\\log$, and $v_t = r_t$).\n\nThus, the cost functions are parametrized by the vectors $v_1, v_2, \\ldots, v_T$. Our bounds will be expressed\nas a function of the quadratic variability of the parameter vectors $v_1, v_2, \\ldots, v_T$, defined\nas\n\n$$Q(v_1, \\ldots, v_T) := \\min_{\\mu} \\sum_{t=1}^T \\|v_t - \\mu\\|^2.$$\n\nThis expression is minimized at $\\mu = \\frac{1}{T} \\sum_{t=1}^T v_t$, and thus the quadratic variation is just T \u2212 1 times\nthe sample variance of the sequence of vectors $\\{v_1, \\ldots, v_T\\}$. Note however that the sequence can be\ngenerated adversarially rather than by some stochastic process. We shall refer to this as simply Q if\nthe vectors are clear from the context.\n\n\u00b2 Note the difference from the portfolio selection problem: here we have convex cost functions, rather than\nconcave payoff functions. The portfolio selection problem is obtained by using \u2212 log as the cost function.\n\nMain theorem. In the setup of the online convex optimization problem above, we have the following\nalgorithmic result:\n\nTheorem 1. Let the cost functions be of the form $f_t(x) = g(v_t \\cdot x)$. Assume that there are parameters\nR, D, a, b > 0 such that the following conditions hold:\n\n1. for all t, $\\|v_t\\| \\leq R$,\n2. for all $x \\in K$, we have $\\|x\\| \\leq D$,\n3. for all $x \\in K$, and for all t, either $g'(v_t \\cdot x) \\in [0, a]$ or $g'(v_t \\cdot x) \\in [-a, 0]$, and\n4. for all $x \\in K$, and for all t, $g''(v_t \\cdot x) \\geq b$.\n\nThen there is an algorithm that guarantees the following regret bound:\n\n$$\\text{Regret} = O\\left(\\frac{a^2 n}{b} \\log(1 + bQ + bR^2) + aRD \\log(2 + Q/R^2) + D^2\\right).$$\n\nNow we apply Theorem 1 to the portfolio selection problem. First, we estimate the relevant parameters.\nWe have $\\|r_t\\| \\leq \\sqrt{n}$ since all $r_t(i) \\leq 1$, thus $R = \\sqrt{n}$. For any $x \\in \\Delta_n$, $\\|x\\| \\leq 1$, so D = 1.\nSince $r \\leq v_t \\cdot x \\leq 1$ for every $x \\in \\Delta_n$, we have $g'(v_t \\cdot x) = -\\frac{1}{v_t \\cdot x}$, and thus $g'(v_t \\cdot x) \\in [-\\frac{1}{r}, 0]$, so $a = \\frac{1}{r}$.\nFinally, $g''(v_t \\cdot x) = \\frac{1}{(v_t \\cdot x)^2} \\geq 1$, so\nb = 1. Applying Theorem 1 we get the following corollary:\n\nCorollary 2. For the portfolio selection problem over n assets, there is an algorithm that attains\nthe following regret bound:\n\n$$\\text{Regret} = O\\left(\\frac{n}{r^2} \\log(Q + n)\\right).$$\n\n2 Bounding the Regret by the Observed Variation in Returns\n\n2.1 Preliminaries\n\nAll matrices are assumed to be real symmetric matrices in $\\mathbb{R}^{n \\times n}$, where n is the number of stocks. We\nuse the notation $A \\succeq B$ to say that A \u2212 B is positive semidefinite. 
We require the notion of a norm\nof a vector x induced by a positive definite matrix M, defined as $\\|x\\|_M = \\sqrt{x^\\top M x}$. The following\nsimple generalization of the Cauchy-Schwarz inequality is used in the analysis:\n\n$$\\forall x, y \\in \\mathbb{R}^n : \\quad x \\cdot y \\leq \\|x\\|_M \\|y\\|_{M^{-1}}.$$\n\nWe denote by |A| the determinant of a matrix A, and by $A \\bullet B = \\mathrm{Tr}(AB) = \\sum_{ij} A_{ij} B_{ij}$. As\nwe are concerned with logarithmic regret bounds, potential functions which behave like harmonic\nseries come into play. A generalization of harmonic series to high dimensions is the vector-harmonic\nseries, which is a series of quadratic forms that can be expressed as (here $A \\succ 0$ is a positive definite\nmatrix, and $v_1, v_2, \\ldots$ are vectors in $\\mathbb{R}^n$):\n\n$$v_1^\\top (A + v_1 v_1^\\top)^{-1} v_1,\\ \\ v_2^\\top (A + v_1 v_1^\\top + v_2 v_2^\\top)^{-1} v_2,\\ \\ \\ldots,\\ \\ v_t^\\top \\Big(A + \\sum_{\\tau=1}^t v_\\tau v_\\tau^\\top\\Big)^{-1} v_t,\\ \\ \\ldots$$\n\nThe following lemma is from [HKKA06]:\n\nLemma 3. For a vector harmonic series given by an initial matrix A and vectors $v_1, v_2, \\ldots, v_T$, we\nhave\n\n$$\\sum_{t=1}^T v_t^\\top \\Big(A + \\sum_{\\tau=1}^t v_\\tau v_\\tau^\\top\\Big)^{-1} v_t \\leq \\log\\left[\\frac{|A + \\sum_{\\tau=1}^T v_\\tau v_\\tau^\\top|}{|A|}\\right].$$\n\nThe reader can note that in one dimension, if all vectors $v_t = 1$ and A = 1, then the series above\nreduces exactly to the regular harmonic series whose sum is bounded, of course, by log(T + 1).\n\n2.2 Algorithm and analysis\n\nWe analyze the following algorithm and prove that it attains logarithmic regret with respect to the\nobserved variation (rather than number of iterations). The algorithm follows the generic algorithmic\nscheme of \u201cFollow-The-Regularized-Leader\u201d (FTRL) with squared Euclidean regularization.\n\nAlgorithm Exp-Concave-FTL. 
In iteration t, use the point $x_t$ defined as:\n\n$$x_t \\triangleq \\arg\\min_{x \\in \\Delta_n} \\left( \\sum_{\\tau=1}^{t-1} f_\\tau(x) + \\frac{1}{2} \\|x\\|^2 \\right) \\tag{1}$$\n\nNote that the mathematical program which the algorithm solves is convex, and can be solved in time\npolynomial in the dimension and number of iterations. The running time, however, for solving this\nconvex program can be quite high.\nIn the full version of the paper, for the specific problem of\nportfolio selection, where $f_t(x) = -\\log(r_t \\cdot x)$, we give a faster implementation whose per iteration\nrunning time is independent of the number of iterations, using the more sophisticated \u201conline\nNewton method\u201d of [HKKA06]. In particular, we have the following result:\n\nTheorem 4. For the portfolio selection problem, there is an algorithm that runs in $O(n^3)$ time per\niteration whose regret is bounded by\n\n$$\\text{Regret} = O\\left(\\frac{n}{r^3} \\log(Q + n)\\right).$$\n\nIn this paper, we retain the simpler algorithm and analysis for an easier exposition. We now proceed\nto prove Theorem 1.\n\nProof. [Theorem 1] First, we note that the algorithm is running a \u201cFollow-the-leader\u201d procedure\non the cost functions $f_0, f_1, f_2, \\ldots$ where $f_0(x) = \\frac{1}{2}\\|x\\|^2$ is a fictitious period 0 cost function. In\nother words, in each iteration, it chooses the point that would have minimized the total cost under\nall the observed functions so far (and, additionally, a fictitious initial cost function $f_0$). 
This point is\nreferred to as the leader in that round.\n\nThe first step in analyzing such an algorithm is to use a stability lemma from [KV05], which bounds\nthe regret of any Follow-the-leader algorithm by the difference in costs (under $f_t$) of the current prediction\n$x_t$ and the next one $x_{t+1}$, plus an additional error term which comes from the regularization.\nThus, we have\n\n$$\\begin{aligned} \\text{Regret} &\\leq \\sum_t \\big[f_t(x_t) - f_t(x_{t+1})\\big] + \\tfrac{1}{2}\\big(\\|x^*\\|^2 - \\|x_0\\|^2\\big) \\\\ &\\leq \\sum_t \\nabla f_t(x_t) \\cdot (x_t - x_{t+1}) + \\tfrac{1}{2} D^2 \\\\ &= \\sum_t g'(v_t \\cdot x_t)\\,[v_t \\cdot (x_t - x_{t+1})] + \\tfrac{1}{2} D^2. \\end{aligned} \\tag{2}$$\n\nThe second inequality is because $f_t$ is convex. The last equality follows because $\\nabla f_t(x_t) = g'(x_t \\cdot v_t) v_t$.\nNow, we need a handle on $x_t - x_{t+1}$. For this, define $F_t = \\sum_{\\tau=0}^{t-1} f_\\tau$, and note that $x_t$\nminimizes $F_t$ over K. Consider the difference in the gradients of $F_{t+1}$ evaluated at $x_{t+1}$ and $x_t$:\n\n$$\\begin{aligned} \\nabla F_{t+1}(x_{t+1}) - \\nabla F_{t+1}(x_t) &= \\sum_{\\tau=0}^t \\big[\\nabla f_\\tau(x_{t+1}) - \\nabla f_\\tau(x_t)\\big] \\\\ &= \\sum_{\\tau=1}^t \\big[g'(v_\\tau \\cdot x_{t+1}) - g'(v_\\tau \\cdot x_t)\\big] v_\\tau + (x_{t+1} - x_t) \\\\ &= \\sum_{\\tau=1}^t \\big[\\nabla g'(v_\\tau \\cdot \\zeta^t_\\tau) \\cdot (x_{t+1} - x_t)\\big] v_\\tau + (x_{t+1} - x_t) \\qquad \\text{(3)} \\\\ &= \\sum_{\\tau=1}^t g''(v_\\tau \\cdot \\zeta^t_\\tau)\\, v_\\tau v_\\tau^\\top (x_{t+1} - x_t) + (x_{t+1} - x_t). \\qquad \\text{(4)} \\end{aligned}$$\n\nEquation (3) follows by applying the Taylor expansion of the (multi-variate) function $g'(v_\\tau \\cdot x)$ at\npoint $x_t$, for some point $\\zeta^t_\\tau$ on the line segment joining $x_t$ and $x_{t+1}$. Equation (4) follows from\nthe observation that $\\nabla g'(v_\\tau \\cdot x) = g''(v_\\tau \\cdot x) v_\\tau$.\n\nDefine $A_t = \\sum_{\\tau=1}^t g''(v_\\tau \\cdot \\zeta^t_\\tau) v_\\tau v_\\tau^\\top + I$, where I is the identity matrix, and $\\Delta x_t = x_{t+1} - x_t$. Then\nequation (4) can be re-written as:\n\n$$\\nabla F_{t+1}(x_{t+1}) - \\nabla F_t(x_t) - g'(v_t \\cdot x_t) v_t = A_t \\Delta x_t. \\tag{5}$$\n\nNow, since $x_t$ minimizes the convex function $F_t$ over the convex set K, a standard inequality of\nconvex optimization (see [BV04]) states that for any point $y \\in K$, we have $\\nabla F_t(x_t) \\cdot (y - x_t) \\geq 0$.\nThus, for $y = x_{t+1}$, we get that $\\nabla F_t(x_t) \\cdot (x_{t+1} - x_t) \\geq 0$. Similarly, we get that $\\nabla F_{t+1}(x_{t+1}) \\cdot (x_t - x_{t+1}) \\geq 0$.\nPutting these two inequalities together, we get that\n\n$$(\\nabla F_{t+1}(x_{t+1}) - \\nabla F_t(x_t)) \\cdot \\Delta x_t \\leq 0. \\tag{6}$$\n\nThus, using the expression for $A_t \\Delta x_t$ from (5) we have\n\n$$\\begin{aligned} \\|\\Delta x_t\\|^2_{A_t} &= A_t \\Delta x_t \\cdot \\Delta x_t \\\\ &= (\\nabla F_{t+1}(x_{t+1}) - \\nabla F_t(x_t) - g'(v_t \\cdot x_t) v_t) \\cdot \\Delta x_t \\\\ &\\leq g'(v_t \\cdot x_t)\\,[v_t \\cdot (x_t - x_{t+1})] \\quad \\text{(from (6))}. \\end{aligned} \\tag{7}$$\n\nAssume that $g'(v_t \\cdot x) \\in [-a, 0]$ for all $x \\in K$ and all t. The other case is handled similarly.\nInequality (7) implies that $g'(v_t \\cdot x_t)$ and $v_t \\cdot (x_t - x_{t+1})$ have the same sign. Thus, we can upper\nbound\n\n$$g'(v_t \\cdot x_t)\\,[v_t \\cdot (x_t - x_{t+1})] \\leq a (v_t \\cdot \\Delta x_t). \\tag{8}$$\n\nDefine $\\tilde{v}_t = v_t - \\mu_t$, where $\\mu_t = \\frac{1}{t+1} \\sum_{\\tau=1}^t v_\\tau$. Then, we have\n\n$$\\sum_t v_t \\cdot \\Delta x_t = \\sum_t \\tilde{v}_t \\cdot \\Delta x_t + \\sum_{t=2}^T x_t \\cdot (\\mu_{t-1} - \\mu_t) - x_1 \\cdot \\mu_1 + x_{T+1} \\cdot \\mu_T. \\tag{9}$$\n\nNow, define $\\rho = \\rho(v_1, \\ldots, v_T) = \\sum_{t=1}^{T-1} \\|\\mu_{t+1} - \\mu_t\\|$. Then we bound\n\n$$\\begin{aligned} \\sum_{t=2}^T x_t \\cdot (\\mu_{t-1} - \\mu_t) - x_1 \\cdot \\mu_1 + x_{T+1} \\cdot \\mu_T &\\leq \\sum_{t=2}^T \\|x_t\\| \\|\\mu_{t-1} - \\mu_t\\| + \\|x_1\\| \\|\\mu_1\\| + \\|x_{T+1}\\| \\|\\mu_T\\| \\\\ &\\leq D\\rho + 2DR. \\end{aligned} \\tag{10}$$\n\nWe will bound $\\rho$ momentarily. For now, we turn to bounding the first term of (9) using the Cauchy-Schwarz\ngeneralization as follows:\n\n$$\\tilde{v}_t \\cdot \\Delta x_t \\leq \\|\\tilde{v}_t\\|_{A_t^{-1}} \\|\\Delta x_t\\|_{A_t}. \\tag{11}$$\n\nBy the usual Cauchy-Schwarz inequality,\n\n$$\\sum_t \\|\\tilde{v}_t\\|_{A_t^{-1}} \\|\\Delta x_t\\|_{A_t} \\leq \\sqrt{\\sum_t \\|\\tilde{v}_t\\|^2_{A_t^{-1}}} \\cdot \\sqrt{\\sum_t \\|\\Delta x_t\\|^2_{A_t}} \\leq \\sqrt{\\sum_t \\|\\tilde{v}_t\\|^2_{A_t^{-1}}} \\cdot \\sqrt{\\sum_t a (v_t \\cdot \\Delta x_t)}$$\n\nfrom (7) and (8). We conclude, using (9), (10) and (11), that\n\n$$\\sum_t a (v_t \\cdot \\Delta x_t) \\leq a \\sqrt{\\sum_t \\|\\tilde{v}_t\\|^2_{A_t^{-1}}} \\cdot \\sqrt{\\sum_t a (v_t \\cdot \\Delta x_t)} + aD\\rho + 2aDR.$$\n\nThis implies (using the AM-GM inequality applied to the first term on the RHS) that\n\n$$\\sum_t a (v_t \\cdot \\Delta x_t) \\leq a^2 \\sum_t \\|\\tilde{v}_t\\|^2_{A_t^{-1}} + 2aD\\rho + 4aDR.$$\n\nPlugging this into the regret bound (2) we obtain, via (8),\n\n$$\\text{Regret} \\leq a^2 \\sum_t \\|\\tilde{v}_t\\|^2_{A_t^{-1}} + 2aD\\rho + 4aDR + \\frac{1}{2} D^2.$$\n\n[...] 0, with probability at least $1 - 2e^{-\\delta}$, we have\n\n$$\\text{Regret} \\leq O(n(\\log(\\|\\sigma\\|^2 + n) + \\delta)).$$\n\nTheorem 10 shows that one expects to achieve constant regret independent of the trading frequency,\nas long as the total trading period is fixed. This result is only useful if increasing trading frequency\nimproves the performance of the best constant rebalanced portfolio. Indeed, this has been observed\nempirically (see e.g. [AHKS06], and more empirical evidence is given in the full version of this\npaper).\n\nTo obtain a theoretical justification for increasing trading frequency, we consider an example where\nwe have two stocks that follow independent Black-Scholes models with the same drifts, but different\nvolatilities $\\sigma_1, \\sigma_2$. The same drift assumption is necessary because in the long run, the best CRP is\nthe one that puts all its wealth on the stock with the greater drift. 
We normalize the drifts to be equal\nto 0; this doesn\u2019t change the performance in any qualitative manner.\n\nSince the drift is 0, the expected return of either stock in any trading period is 1; and since the\nreturns in each period are independent, the expected final change in wealth, which is the product\nof the returns, is also 1. Thus, in expectation, any CRP (indeed, any portfolio selection strategy)\nhas overall return 1. We therefore turn to a different criterion for selecting a CRP. The risk of an\ninvestment strategy is measured by the variance of its payoff; thus, if different investment strategies\nhave the same expected payoff, then the one to choose is the one with minimum variance. We\ntherefore choose the CRP with the least variance. We prove the following lemma in the full version\nof the paper:\n\nLemma 11. In the setup where we trade two stocks with zero drift and volatilities $\\sigma_1, \\sigma_2$, the variance\nof the minimum variance CRP decreases as the trading frequency increases.\n\nThus, increasing the trading frequency decreases the variance of the minimum variance CRP, which\nimplies that it gets less risky to trade more frequently; in other words, the more frequently we trade,\nthe more likely the payoff will be close to the expected value. On the other hand, as we show\nin Theorem 10, the regret does not change even if we trade more often; thus, one expects to see\nimproving performance of our algorithm as the trading frequency increases.\n\n4 Conclusions and Future Work\n\nWe have presented an efficient algorithm for regret minimization with exp-concave loss functions\nwhose regret strictly improves upon the state of the art. For the problem of portfolio selection,\nthe regret is bounded in terms of the observed variation in stock returns rather than the number of\niterations.\n\nRecently, DeMarzo, Kremer and Mansour [DKM06] presented a novel game-theoretic framework\nfor option pricing. 
Their method prices options using low regret algorithms, and it is possible that\nour analysis can be applied to options pricing via their method (although that would require a much\ntighter optimization of the constants involved).\n\nIncreasing trading frequency in practice means increasing transaction costs. We have assumed no\ntransaction costs in this paper. It would be very interesting to extend our portfolio selection algorithm\nto take into account transaction costs as in the work of Blum and Kalai [BK97].\n\nReferences\n\n[AHKS06] Amit Agarwal, Elad Hazan, Satyen Kale, and Robert E. Schapire. Algorithms for portfolio management based on the Newton method. In ICML, pages 9\u201316, 2006.\n[Bac00] L. Bachelier. Th\u00e9orie de la sp\u00e9culation. Annales Scientifiques de l\u2019\u00c9cole Normale Sup\u00e9rieure, 3(17):21\u201386, 1900.\n[BK97] Avrim Blum and Adam Kalai. Universal portfolios with and without transaction costs. In COLT, pages 309\u2013313, New York, NY, USA, 1997. ACM.\n[BS73] Fischer Black and Myron Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637\u2013654, 1973.\n[BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.\n[CB03] Jason E Cross and Andrew R Barron. Efficient universal portfolios for past dependent target classes. Mathematical Finance, 13(2):245\u2013276, 2003.\n[CBMS07] Nicol\u00f2 Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz. Improved second-order bounds for prediction with expert advice. Mach. Learn., 66(2-3):321\u2013352, 2007.\n[Cov91] T. Cover. Universal portfolios. Math. Finance, 1:1\u201319, 1991.\n[DKM06] Peter DeMarzo, Ilan Kremer, and Yishay Mansour. Online trading algorithms and robust option pricing. In STOC \u201906: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 477\u2013486, New York, NY, USA, 2006. ACM.\n[HK08] Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In Proceedings of 21st COLT, 2008.\n[HK09] Elad Hazan and Satyen Kale. Better algorithms for benign bandits. In SODA, pages 38\u201347, Philadelphia, PA, USA, 2009. Society for Industrial and Applied Mathematics.\n[HKKA06] Elad Hazan, Adam Kalai, Satyen Kale, and Amit Agarwal. Logarithmic regret algorithms for online convex optimization. In COLT, pages 499\u2013513, 2006.\n[HSSW96] David P. Helmbold, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. On-line portfolio selection using multiplicative updates. In ICML, pages 243\u2013251, 1996.\n[Jam92] F. Jamshidian. Asymptotically optimal portfolios. Mathematical Finance, 2:131\u2013150, 1992.\n[KS04] Ioannis Karatzas and Steven E. Shreve. Brownian Motion and Stochastic Calculus. Springer Verlag, New York, NY, USA, 2004.\n[KV03] Adam Kalai and Santosh Vempala. Efficient algorithms for universal portfolios. J. Mach. Learn. Res., 3:423\u2013440, 2003.\n[KV05] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291\u2013307, 2005.\n[MF92] Neri Merhav and Meir Feder. Universal sequential learning and decision from individual data sequences. In COLT, pages 413\u2013427, 1992.\n[Osb59] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 2:145\u2013173, 1959.\n[Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928\u2013936, 2003.\n", "award": [], "sourceid": 470, "authors": [{"given_name": "Elad", "family_name": "Hazan", "institution": null}, {"given_name": "Satyen", "family_name": "Kale", "institution": null}]}