{"title": "Robust Portfolio Optimization", "book": "Advances in Neural Information Processing Systems", "page_first": 46, "page_last": 54, "abstract": "We propose a robust portfolio optimization approach based on quantile statistics. The proposed method is robust to extreme events in asset returns, and accommodates large portfolios under limited historical data. Specifically, we show that the risk of the estimated portfolio converges to the oracle optimal risk with parametric rate under weakly dependent asset returns. The theory does not rely on higher order moment assumptions, thus allowing for heavy-tailed asset returns. Moreover, the rate of convergence quantifies that the size of the portfolio under management is allowed to scale exponentially with the sample size of the historical data. The empirical effectiveness of the proposed method is demonstrated under both synthetic and real stock data. Our work extends existing ones by achieving robustness in high dimensions, and by allowing serial dependence.", "full_text": "Robust Portfolio Optimization\n\nHuitong Qiu\nDepartment of Biostatistics\nJohns Hopkins University\nBaltimore, MD 21205\nhqiu7@jhu.edu\n\nFang Han\nDepartment of Biostatistics\nJohns Hopkins University\nBaltimore, MD 21205\nfhan@jhu.edu\n\nHan Liu\nDepartment of Operations Research\nand Financial Engineering\nPrinceton University\nPrinceton, NJ 08544\nhanliu@princeton.edu\n\nBrian Caffo\nDepartment of Biostatistics\nJohns Hopkins University\nBaltimore, MD 21205\nbcaffo@jhsph.edu\n\nAbstract\n\nWe propose a robust portfolio optimization approach based on quantile statistics.\nThe proposed method is robust to extreme events in asset returns, and accommo-\ndates large portfolios under limited historical data. Specifically, we show that the\nrisk of the estimated portfolio converges to the oracle optimal risk with parametric\nrate under weakly dependent asset returns. 
The theory does not rely on higher or-\nder moment assumptions, thus allowing for heavy-tailed asset returns. Moreover,\nthe rate of convergence quantifies that the size of the portfolio under management\nis allowed to scale exponentially with the sample size of the historical data. The\nempirical effectiveness of the proposed method is demonstrated under both syn-\nthetic and real stock data. Our work extends existing ones by achieving robustness\nin high dimensions, and by allowing serial dependence.\n\n1 Introduction\nMarkowitz\u2019s mean-variance analysis sets the basis for modern portfolio optimization theory [1].\nHowever, the mean-variance analysis has been criticized for being sensitive to estimation errors in\nthe mean and covariance matrix of the asset returns [2, 3]. Compared to the covariance matrix,\nthe mean of the asset returns is more influential and harder to estimate [4, 5]. Therefore, many\nstudies focus on the global minimum variance (GMV) formulation, which only involves estimating\nthe covariance matrix of the asset returns.\nEstimating the covariance matrix of asset returns is challenging due to the high dimensionality and\nheavy-tailedness of asset return data. Specifically, the number of assets under management is usually\nmuch larger than the sample size of exploitable historical data. Moreover, extreme events\nare typical in financial asset prices, leading to heavy-tailed asset returns.\nTo overcome the curse of dimensionality, structured covariance matrix estimators have been proposed for\nasset return data. [6] considered estimators based on factor models with observable factors. [7,\n8, 9] studied covariance matrix estimators based on latent factor models. [10, 11, 12] proposed to\nshrink the sample covariance matrix towards highly structured covariance matrices, including the\nidentity matrix, order 1 autoregressive covariance matrices, and one-factor-based covariance matrix\nestimators. 
These estimators are commonly based on the sample covariance matrix. (Sub)Gaussian\ntail assumptions are required to guarantee consistency.\nFor heavy-tailed data, robust estimators of covariance matrices are desired. Classic robust covariance\nmatrix estimators include M-estimators, minimum volume ellipsoid (MVE) and minimum covari-\nance determinant (MCD) estimators, S-estimators, and estimators based on data outlyingness and\ndepth [13]. These estimators are specifically designed for data with very low dimensions and large\nsample sizes. For generalizing the robust estimators to high dimensions, [14] proposed the Orthogo-\nnalized Gnanadesikan-Kettenring (OGK) estimator, which extends [15]\u2019s estimator by re-estimating\nthe eigenvalues; [16, 17] studied shrinkage estimators based on Tyler\u2019s M-estimator. However, al-\nthough OGK is computationally tractable in high dimensions, consistency is only guaranteed under\nfixed dimension. The shrunken Tyler\u2019s M-estimator involves iteratively inverting large matrices.\nMoreover, its consistency is only guaranteed when the dimension is of the same order as the sam-\nple size. The aforementioned robust estimators are analyzed under independent data points. Their\nperformance under time series data is questionable.\nIn this paper, we build on a quantile-based scatter matrix estimator (see footnote 1), and propose a robust portfolio\noptimization approach. Our contributions are in three aspects. First, we show that the proposed\nmethod accommodates high dimensional data by allowing the dimension to scale exponentially\nwith sample size. Secondly, we verify that consistency of the proposed method is achieved without\nany tail conditions, thus allowing for heavy-tailed asset return data. 
Thirdly, we consider weakly\ndependent time series, and demonstrate how the degree of dependence affects the consistency of the\nproposed method.\n2 Background\nIn this section, we introduce the notation system, and provide a review of the gross-exposure con-\nstrained portfolio optimization that will be exploited in this paper.\n2.1 Notation\nLet v = (v1, . . . , vd)T be a d-dimensional real vector, and M = [Mjk] \u2208 Rd1\u00d7d2 be a d1 \u00d7 d2\nmatrix with Mjk as the (j, k) entry. For 0 < q < \u221e, we define the \u2113q vector norm of v as\n\u2016v\u2016q := (\u2211_{j=1}^{d} |vj|^q)^{1/q} and the \u2113\u221e vector norm of v as \u2016v\u2016\u221e := max_{j=1,...,d} |vj|. Let the matrix\n\u2113max norm of M be \u2016M\u2016max := max_{jk} |Mjk|, and the Frobenius norm be \u2016M\u2016F := (\u2211_{jk} Mjk^2)^{1/2}.\nLet X = (X1, . . . , Xd)T and Y = (Y1, . . . , Yd)T be two random vectors. We write X d= Y if X\nand Y are identically distributed. We use 1, 2, . . . to denote vectors with 1, 2, . . . at every entry.\n2.2 Gross-exposure Constrained GMV Formulation\nUnder the GMV formulation, [18] found that imposing a no-short-sale constraint improves portfolio\nefficiency. [19] relaxed the no-short-sale constraint by a gross-exposure constraint, and showed that\nportfolio efficiency can be further improved.\nLet X \u2208 Rd be a random vector of asset returns. A portfolio is characterized by a vector of\ninvestment allocations, w = (w1, . . . , wd)T, among the d assets. The gross-exposure constrained\nGMV portfolio optimization can be formulated as\n\n    min_w wT\u03a3w s.t. 1Tw = 1, \u2016w\u20161 \u2264 c.    (2.1)\n\nHere \u03a3 is the covariance matrix of X, 1Tw = 1 is the budget constraint, and \u2016w\u20161 \u2264 c is the gross-\nexposure constraint. c \u2265 1 is called the gross exposure constant, which controls the percentage\nof long and short positions allowed in the portfolio [19]. 
The optimization problem (2.1) can be\nconverted into a quadratic programming problem, and solved by standard software [19].\n\nFootnote 1: A scatter matrix is defined to be any matrix proportional to the covariance matrix by a constant.\n\n3 Method\nIn this section, we introduce the quantile-based portfolio optimization approach. Let Z \u2208 R be a\nrandom variable with distribution function F, and {zt}_{t=1}^{T} be a sequence of observations from Z.\nFor a constant q \u2208 [0, 1], we define the q-quantiles of Z and {zt}_{t=1}^{T} to be\n\n    Q(Z; q) = Q(F; q) := inf{z : P(Z \u2264 z) \u2265 q},\n    Q\u0302({zt}_{t=1}^{T}; q) := z(k) where k = min{t : t/T \u2265 q}.\n\nHere z(1) \u2264 . . . \u2264 z(T) are the order statistics of {zt}_{t=1}^{T}. We say Q(Z; q) is unique if there\nexists a unique z such that P(Z \u2264 z) = q. We say Q\u0302({zt}_{t=1}^{T}; q) is unique if there exists a unique\nz \u2208 {z1, . . . , zT} such that z = z(k). Following the estimator Qn [20], we define the population\nand sample quantile-based scales to be\n\n    \u03c3Q(Z) := Q(|Z \u2212 Z\u0303|; 1/4) and \u03c3\u0302Q({zt}_{t=1}^{T}) := Q\u0302({|zs \u2212 zt|}_{1 \u2264 s < t \u2264 T}; 1/4).\n\nHere Z\u0303 is an independent copy of Z. Based on \u03c3Q and \u03c3\u0302Q, we can further define robust scat-\n0. Secondly, the pro-\njection (3.3) is more robust compared to the OGK estimate [14]. OGK induces positive definiteness\nby re-estimating the eigenvalues using the variances of the principal components. Robustness is lost\nwhen the data, possibly containing outliers, are projected onto the principal directions for estimating\nthe principal components.\n\nRemark 3.3. We adopt the 1/4 quantile in the definitions of \u03c3Q and \u03c3\u0302Q to achieve 50% breakdown\npoint. 
However, we note that our methodology and theory carry through if 1/4 is replaced by any\nabsolute constant q \u2208 (0, 1).\n4 Theoretical Properties\nIn this section, we provide theoretical analysis of the proposed portfolio optimization approach. For\nan optimized portfolio, w\u0302opt, based on an estimate, R, of RQ, the next lemma shows that the error\nbetween the risks R(w\u0302opt; RQ) and R(wopt; RQ) is essentially related to the estimation error in R.\nLemma 4.1. Let w\u0302opt be the solution to\n\n    min_w R(w; R) s.t. 1Tw = 1, \u2016w\u20161 \u2264 c\n\nfor an arbitrary matrix R. Then, we have\n\n    |R(w\u0302opt; RQ) \u2212 R(wopt; RQ)| \u2264 2c^2 \u2016R \u2212 RQ\u2016max,\n\nwhere wopt is the solution to the oracle portfolio optimization problem (3.2), and c is the gross-\nexposure constant.\n\nNext, we derive the rate of convergence for R(w\u0303opt; RQ), which relates to the rate of convergence\nin \u2016R\u0303Q \u2212 RQ\u2016max. To this end, we first introduce a dependence condition on the asset return series.\nDefinition 4.2. Let {Xt}t\u2208Z be a stationary process. Denote by F^0_{\u2212\u221e} := \u03c3(Xt : t \u2264 0) and\nF^\u221e_n := \u03c3(Xt : t \u2265 n) the \u03c3-fields generated by {Xt}t\u22640 and {Xt}t\u2265n, respectively. The \u03c6-mixing\ncoefficient is defined by\n\n    \u03c6(n) := sup_{B \u2208 F^0_{\u2212\u221e}, A \u2208 F^\u221e_n, P(B) > 0} |P(A | B) \u2212 P(A)|.    (4.1)\n\nThe process {Xt}t\u2208Z is \u03c6-mixing if and only if lim_{n\u2192\u221e} \u03c6(n) = 0.\nCondition 1. {Xt \u2208 Rd}t\u2208Z is a stationary process such that for any j \u2260 k \u2208 {1, . . . 
, d},\n{Xtj}t\u2208Z, {Xtj + Xtk}t\u2208Z, and {Xtj \u2212 Xtk}t\u2208Z are \u03c6-mixing processes satisfying \u03c6(n) \u2264 1/n^{1+\u03f5}\nfor any n > 0 and some constant \u03f5 > 0.\n\nThe parameter \u03f5 determines the rate of decay in \u03c6(n), and characterizes the degree of dependence\nin {Xt}t\u2208Z. Next, we introduce an identifiability condition on the distribution function of the asset\nreturns.\n\nCondition 2. Let X\u0303 = (X\u03031, . . . , X\u0303d)T be an independent copy of X1. For any j \u2260 k \u2208 {1, . . . , d},\nlet F1;j, F2;j,k, and F3;j,k be the distribution functions of |X1j \u2212 X\u0303j|, |X1j + X1k \u2212 X\u0303j \u2212 X\u0303k|, and\n|X1j \u2212 X1k \u2212 X\u0303j + X\u0303k|. We assume there exist constants \u03ba > 0 and \u03b7 > 0 such that\n\n    inf_{|y \u2212 Q(F; 1/4)| \u2264 \u03ba} (d/dy) F(y) \u2265 \u03b7\n\nfor any F \u2208 {F1;j, F2;j,k, F3;j,k : j \u2260 k = 1, . . . , d}.\nCondition 2 guarantees the identifiability of the 1/4 quantiles, and is standard in the literature on\nquantile statistics [22, 23]. Based on Conditions 1 and 2, we can present the rates of convergence\nfor R\u0302Q and R\u0303Q.\nTheorem 4.3. Let {Xt}t\u2208Z be an absolutely continuous stationary process satisfying Conditions\n1 and 2. Suppose log d/T \u2192 0 as T \u2192 \u221e. 
Then, for any \u03b1 \u2208 (0, 1) and T large enough, with\nprobability no smaller than 1 \u2212 8\u03b1^2, we have\n\n    \u2016R\u0302Q \u2212 RQ\u2016max \u2264 rT.    (4.2)\n\nHere the rate of convergence rT is defined by\n\n    rT = max{ (2/\u03b7^2) [ \u221a(4(1 + 2C\u03f5)(log d \u2212 log \u03b1)/T) + 4C\u03f5/T ]^2,\n              (4\u03c3Q_max/\u03b7) [ \u221a(4(1 + 2C\u03f5)(log d \u2212 log \u03b1)/T) + 4C\u03f5/T ] },    (4.3)\n\nwhere \u03c3Q_max := max{\u03c3Q(Xj), \u03c3Q(Xj + Xk), \u03c3Q(Xj \u2212 Xk) : j \u2260 k \u2208 {1, . . . , d}} and C\u03f5 :=\n\u2211_{k=1}^{\u221e} 1/k^{1+\u03f5}. Moreover, if RQ \u2208 S\u03bb for S\u03bb defined in (3.3), we further have\n\n    \u2016R\u0303Q \u2212 RQ\u2016max \u2264 2rT.    (4.4)\n\nThe implications of Theorem 4.3 are as follows.\n\n1. When the parameters \u03b7, \u03f5, and \u03c3Q_max do not scale with T, the rate of convergence reduces\nto OP(\u221a(log d/T)). Thus, the number of assets under management is allowed to scale\nexponentially with sample size T. Compared to similar rates of convergence obtained\nfor sample-covariance-based estimators [24, 25, 9], we do not require any moment or tail\nconditions, thus accommodating heavy-tailed asset return data.\n\n2. The effect of serial dependence on the rate of convergence is characterized by C\u03f5. Specif-\nically, as \u03f5 approaches 0, C\u03f5 = \u2211_{k=1}^{\u221e} 1/k^{1+\u03f5} increases towards infinity, inflating rT. \u03f5 is\nallowed to scale with T such that C\u03f5 = o(T / log d).\n\n3. The rate of convergence rT is inversely related to the lower bound, \u03b7, on the marginal\ndensity functions around the 1/4 quantiles. 
This is because when \u03b7 is small, the distribu-\ntion functions are flat around the 1/4 quantiles, making the population quantiles harder to\nestimate.\n\nCombining Lemma 4.1 and Theorem 4.3, we obtain the rate of convergence for R(w\u0303opt; RQ).\n\nTheorem 4.4. Let {Xt}t\u2208Z be an absolutely continuous stationary process satisfying Conditions 1\nand 2. Suppose that log d/T \u2192 0 as T \u2192 \u221e and RQ \u2208 S\u03bb. Then, for any \u03b1 \u2208 (0, 1) and T large\nenough, we have\n\n    |R(w\u0303opt; RQ) \u2212 R(wopt; RQ)| \u2264 2c^2 rT,    (4.5)\n\nwhere rT is defined in (4.3) and c is the gross-exposure constant.\n\nTheorem 4.4 shows that the risk of the estimated portfolio converges to the oracle optimal risk with\nparametric rate rT. The number of assets, d, is allowed to scale exponentially with sample size T.\nMoreover, the rate of convergence does not rely on any tail conditions on the distribution of the asset\nreturns.\nFor the rest of this section, we build the connection between the proposed robust portfolio opti-\nmization and its moment-based counterpart. Specifically, we show that they are consistent under the\nelliptical model.\nDefinition 4.5. [26] A random vector X \u2208 Rd follows an elliptical distribution with location \u00b5 \u2208\nRd and scatter S \u2208 Rd\u00d7d if and only if there exist a nonnegative random variable \u03be \u2208 R, a matrix\nA \u2208 Rd\u00d7r with rank(A) = r, a random vector U \u2208 Rr independent of \u03be and uniformly\ndistributed on the r-dimensional sphere, S^{r\u22121}, such that\n\n    X d= \u00b5 + \u03beAU.\n\nHere S = AAT has rank r. We denote X \u223c ECd(\u00b5, S, \u03be). \u03be is called the generating variate.\n\nCommonly used elliptical distributions include Gaussian distribution and t-distribution. 
Elliptical\ndistributions have been widely used for modeling financial return data, since they naturally capture\nmany stylized properties including heavy tails and tail dependence [27, 28, 29, 30, 31, 32]. The next\ntheorem relates RQ and R(w; RQ) to their moment-based counterparts, \u03a3 and R(w; \u03a3), under the\nelliptical model.\nTheorem 4.6. Let X = (X1, . . . , Xd)T \u223c ECd(\u00b5, S, \u03be) be an absolutely continuous elliptical\nrandom vector and X\u0303 = (X\u03031, . . . , X\u0303d)T be an independent copy of X. Then, we have\n\n    RQ = mQ S    (4.6)\n\nfor some constant mQ only depending on the distribution of X. Moreover, if 0 < E\u03be^2 < \u221e, we\nhave\n\n    RQ = cQ\u03a3 and R(w; RQ) = cQ R(w; \u03a3),    (4.7)\n\nwhere \u03a3 = Cov(X) is the covariance matrix of X, and cQ is a constant given by\n\n    cQ = Q{ (Xj \u2212 X\u0303j)^2 / Var(Xj); 1/4 }\n       = Q{ (Xj + Xk \u2212 X\u0303j \u2212 X\u0303k)^2 / Var(Xj + Xk); 1/4 }\n       = Q{ (Xj \u2212 Xk \u2212 X\u0303j + X\u0303k)^2 / Var(Xj \u2212 Xk); 1/4 }.    (4.8)\n\nHere the last two equalities hold when Var(Xj + Xk) > 0 and Var(Xj \u2212 Xk) > 0.\n\nBy Theorem 4.6, under the elliptical model, minimizing the robust risk metric, R(w; RQ), is equiv-\nalent to minimizing the standard moment-based risk metric, R(w; \u03a3). Thus, the robust portfolio\noptimization (3.2) is equivalent to its moment-based counterpart (2.1) at the population level. Plug-\nging (4.7) into (4.5) leads to the following theorem.\nTheorem 4.7. Let {Xt}t\u2208Z be an absolutely continuous stationary process satisfying Conditions 1\nand 2. Suppose that X1 \u223c ECd(\u00b5, S, \u03be) follows an elliptical distribution with covariance matrix\n\u03a3, and log d/T \u2192 0 as T \u2192 \u221e. 
Then, we have\n\n    |R(w\u0303opt; \u03a3) \u2212 R(wopt; \u03a3)| \u2264 (2c^2/cQ) rT,\n\nwhere c is the gross-exposure constant, cQ is defined in (4.8), and rT is defined in (4.3).\n\nThus, under the elliptical model, the optimal portfolio, w\u0303opt, obtained from the robust portfolio\noptimization also leads to parametric rate of convergence for the standard moment-based risk.\n5 Experiments\nIn this section, we investigate the empirical performance of the proposed portfolio optimization\napproach. In Section 5.1, we demonstrate the robustness of the proposed approach using synthetic\nheavy-tailed data. In Section 5.2, we simulate portfolio management using the Standard & Poor\u2019s\n500 (S&P 500) stock index data.\nThe proposed portfolio optimization approach (QNE) is compared with three competitors. These\ncompetitors are constructed by replacing the covariance matrix \u03a3 in (2.1) by commonly used co-\nvariance/scatter matrix estimators:\n\n1. OGK: The orthogonalized Gnanadesikan-Kettenring estimator constructs a pilot scatter\nmatrix estimate using a robust \u03c4-estimator of scale, then re-estimates the eigenvalues using\nthe variances of the principal components [14].\n\n2. Factor: The principal factor estimator iteratively solves for the specific variances and the\nfactor loadings [33].\n\n3. Shrink: The shrinkage estimator shrinks the sample covariance matrix towards a one-\nfactor covariance estimator [10].\n\n5.1 Synthetic Data\nFollowing [19], we construct the covariance matrix of the asset returns using a three-factor model:\n\n    Xj = bj1f1 + bj2f2 + bj3f3 + \u03b5j, j = 1, . . . , d,    (5.1)\n\nwhere Xj is the return of the j-th stock, bjk is the loading of the j-th stock on factor fk, and \u03b5j is\nthe idiosyncratic noise independent of the three factors. Under this model, the covariance matrix of\nthe stock returns is given by\n\n    \u03a3 = B\u03a3f BT + diag(\u03c31^2, . . . 
, \u03c3d^2),    (5.2)\n\nwhere B = [bjk] is a d \u00d7 3 matrix consisting of the factor loadings, \u03a3f is the covariance matrix\nof the three factors, and \u03c3j^2 is the variance of the noise \u03b5j. We adopt the covariance in (5.2) in our\nsimulations. Following [19], we generate the factor loadings B from a trivariate normal distribution,\nN3(\u00b5b, \u03a3b), where the mean, \u00b5b, and covariance, \u03a3b, are specified in Table 1. After the factor\nloadings are generated, they are fixed as parameters throughout the simulations. The covariance\nmatrix, \u03a3f, of the three factors is also given in Table 1. The standard deviations, \u03c31, . . . , \u03c3d, of the\nidiosyncratic noises are generated independently from a truncated gamma distribution with shape\n3.3586 and scale 0.1876, restricting the support to [0.195, \u221e). Again these standard deviations are\nfixed as parameters once they are generated. According to [19], these parameters are obtained by\nfitting the three-factor model, (5.1), using three-year daily return data of 30 Industry Portfolios from\nMay 1, 2002 to Aug. 29, 2005. The covariance matrix, \u03a3, is fixed throughout the simulations. Since\nwe are only interested in risk optimization, we set the mean of the asset returns to be \u00b5 = 0. 
The\ndimension of the stocks under consideration is fixed at d = 100.\nGiven the covariance matrix \u03a3, we generate the asset return data from the following three distribu-\ntions.\n\nD1: multivariate Gaussian distribution, Nd(0, \u03a3);\nD2: multivariate t distribution with 3 degrees of freedom and covariance matrix \u03a3;\nD3: elliptical distribution with log-normal generating variate, log N (0, 2), and covariance ma-\ntrix \u03a3.\n\nTable 1: Parameters for generating the covariance matrix in Equation (5.2).\n\nParameters for factor loadings                    Parameters for factor returns\n\u00b5b        \u03a3b                                     \u03a3f\n0.7828    0.02915    0.02387    0.01018           1.2507    -0.035     -0.2042\n0.5180    0.02387    0.05395   -0.00697          -0.0350     0.3156    -0.0023\n0.4100    0.01018   -0.00697    0.08686          -0.2042    -0.0023     0.1930\n\nFigure 1: Portfolio risks, selected number of stocks, and matching rates to the oracle optimal port-\nfolios.\n[Figure 1 panels: risk and matching rate plotted against the gross-exposure constant (c) under the Gaussian, multivariate t, and elliptical log-normal models; curves: Oracle, QNE, OGK, Factor, Shrink.]\n\nUnder each distribution, we generate asset return series of half a year (T = 126). We estimate\nthe covariance/scatter matrices using QNE and the three competitors, and plug them into (2.1) to\noptimize the portfolio allocations. We also solve (2.1) with the true covariance matrix, \u03a3, to obtain\nthe oracle optimal portfolios as benchmarks. We vary the gross-exposure constant, c, from 1 to 2.\nThe results are based on 1,000 simulations.\n\nFigure 1 shows the portfolio risks R(w\u0302; \u03a3) and the matching rates between the optimized portfolios\nand the oracle optimal portfolios (see footnote 2). Here the matching rate is defined as follows. 
For two portfolios\nP1 and P2, let S1 and S2 be the corresponding sets of selected assets, i.e., the assets for which\nthe weights, wi, are non-zero. The matching rate between P1 and P2 is defined as r(P1, P2) =\n|S1 \u2229 S2|/|S1 \u222a S2|, where |S| denotes the cardinality of set S.\nWe note two observations from Figure 1.\n(i) The four estimators lead to comparable portfolio\nrisks under the Gaussian model D1. However, under heavy-tailed distributions D2 and D3, QNE\nachieves lower portfolio risk. (ii) The matching rates of QNE are stable across the three models,\nand are higher than the competing methods under heavy-tailed distributions D2 and D3. Thus, we\nconclude that QNE is robust to heavy tails in both risk minimization and asset selection.\n\nFootnote 2: Due to the \u21131 regularization in the gross-exposure constraint, the solution is generally sparse.\n\n5.2 Real Data\nIn this section, we simulate portfolio management using the S&P 500 stocks. We collect 1,258\nadjusted daily closing prices (see footnote 3) for 435 stocks that stayed in the S&P 500 index from January 1, 2003\nto December 31, 2007. 
Using the closing prices, we obtain 1,257 daily returns as the daily growth\nrates of the prices.\n\nFootnote 3: The adjusted closing prices account for all corporate actions including stock splits, dividends, and rights\nofferings.\n\nTable 2: Annualized Sharpe ratios, returns, and risks under 4 competing approaches, using S&P 500\nindex data.\n\n                         QNE      OGK      Factor   Shrink\nSharpe ratio    c=1.0    2.04     1.64     1.29     0.92\nSharpe ratio    c=1.2    1.89     1.39     1.22     0.74\nSharpe ratio    c=1.4    1.61     1.24     1.34     0.72\nSharpe ratio    c=1.6    1.56     1.31     1.38     0.75\nSharpe ratio    c=1.8    1.55     1.48     1.41     0.78\nSharpe ratio    c=2.0    1.53     1.51     1.43     0.83\nreturn (in %)   c=1.0    20.46    16.59    13.18    9.84\nreturn (in %)   c=1.2    18.41    13.15    10.79    7.20\nreturn (in %)   c=1.4    15.58    11.30    10.88    6.55\nreturn (in %)   c=1.6    15.02    11.48    10.68    6.49\nreturn (in %)   c=1.8    14.77    12.39    10.57    6.58\nreturn (in %)   c=2.0    14.51    12.27    10.60    6.76\nrisk (in %)     c=1.0    10.02    10.09    10.19    10.70\nrisk (in %)     c=1.2    9.74     9.46     8.83     9.76\nrisk (in %)     c=1.4    9.70     9.10     8.12     9.14\nrisk (in %)     c=1.6    9.63     8.75     7.71     8.68\nrisk (in %)     c=1.8    9.54     8.39     7.51     8.38\nrisk (in %)     c=2.0    9.48     8.13     7.43     8.18\n\nWe manage a portfolio consisting of the 435 stocks from January 1, 2003 to December 31, 2007 (see footnote 4).\nOn days i = 42, 43, . . . , 1,256, we optimize the portfolio allocations using the past 2 months of stock\nreturn data (42 sample points). We hold the portfolio for one day, and evaluate the portfolio return\non day i + 1. In this way, we obtain 1,215 portfolio returns. 
We repeat the process for each of the\nfour methods under comparison, and vary the gross-exposure constant c from 1 to 2 (see footnote 5).\nSince the true covariance matrix of the stock returns is unknown, we adopt the Sharpe ratio for\nevaluating the performances of the portfolios. Table 2 summarizes the annualized Sharpe ratios,\nmean returns, and empirical risks (i.e., standard deviations of the portfolio returns). We observe that\nQNE achieves the largest Sharpe ratios under all values of the gross-exposure constant, indicating\nthe lowest risks under the same returns (or equivalently, the highest returns under the same risk).\n6 Discussion\nIn this paper, we propose a robust portfolio optimization framework, building on a quantile-based\nscatter matrix. We obtain non-asymptotic rates of convergence for the scatter matrix estimators and\nthe risk of the estimated portfolio. 
The relations of the proposed framework with its moment-based\ncounterpart are well understood.\nThe main contribution of the robust portfolio optimization approach lies in its robustness to heavy\ntails in high dimensions. Heavy tails present unique challenges in high dimensions compared to\nlow dimensions. For example, asymptotic theory of M-estimators guarantees consistency at the rate\nOP(\u221a(d/n)) even for non-Gaussian data [34, 35]. If d \u226a n, statistical error diminishes rapidly with\nincreasing n. However, when d \u226b n, statistical error may scale rapidly with dimension. Thus,\nstringent tail conditions, such as subGaussian conditions, are required to guarantee consistency for\nmoment-based estimators in high dimensions [36]. In this paper, based on quantile statistics, we\nachieve consistency for portfolio risk without assuming any tail conditions, while allowing d to\nscale nearly exponentially with n.\nAnother contribution of this work lies in the theoretical analysis of how serial dependence may affect\nconsistency of the estimation. We measure the degree of serial dependence using the \u03c6-mixing\ncoefficient, \u03c6(n). We show that the effect of the serial dependence on the rate of convergence is\nsummarized by the parameter C\u03f5, which characterizes the size of \u2211_{n=1}^{\u221e} \u03c6(n).\n\nFootnote 4: We drop the data after 2007 to avoid the financial crisis, when the stock prices are likely to violate the\nstationarity assumption.\n\nFootnote 5: c = 2 imposes a 50% upper bound on the percentage of short positions. In practice, the percentage of\nshort positions is usually strictly controlled to be much lower.\n\nReferences\n[1] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77\u201391, 1952.\n[2] Michael J Best and Robert R Grauer. On the sensitivity of mean-variance-efficient portfolios to changes\nin asset means: some analytical and computational results. 
Review of Financial Studies, 4(2):315\u2013342,\n1991.\n[3] Vijay Kumar Chopra and William T Ziemba. The effect of errors in means, variances, and covariances on\noptimal portfolio choice. The Journal of Portfolio Management, 19(2):6\u201311, 1993.\n[4] Robert C Merton. On estimating the expected return on the market: An exploratory investigation. Journal\nof Financial Economics, 8(4):323\u2013361, 1980.\n[5] Jarl G Kallberg and William T Ziemba. Mis-specifications in portfolio selection problems. In Risk and\nCapital, pages 74\u201387. Springer, 1984.\n[6] Jianqing Fan, Yingying Fan, and Jinchi Lv. High dimensional covariance matrix estimation using a factor\nmodel. Journal of Econometrics, 147(1):186\u2013197, 2008.\n[7] James H Stock and Mark W Watson. Forecasting using principal components from a large number of\npredictors. Journal of the American Statistical Association, 97(460):1167\u20131179, 2002.\n[8] Jushan Bai, Kunpeng Li, et al. Statistical analysis of factor models of high dimension. The Annals of\nStatistics, 40(1):436\u2013465, 2012.\n[9] Jianqing Fan, Yuan Liao, and Martina Mincheva. Large covariance estimation by thresholding principal\northogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology),\n75(4):603\u2013680, 2013.\n[10] Olivier Ledoit and Michael Wolf. Improved estimation of the covariance matrix of stock returns with an\napplication to portfolio selection. Journal of Empirical Finance, 10(5):603\u2013621, 2003.\n[11] Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matri-\nces. Journal of Multivariate Analysis, 88(2):365\u2013411, 2004.\n[12] Olivier Ledoit and Michael Wolf. Honey, I shrunk the sample covariance matrix. The Journal of Portfolio\nManagement, 30(4):110\u2013119, 2004.\n[13] Peter J Huber. Robust Statistics. Wiley, 1981.\n[14] Ricardo A Maronna and Ruben H Zamar. 
Robust estimates of location and dispersion for high-\ndimensional datasets. Technometrics, 44(4):307\u2013317, 2002.\n[15] Ramanathan Gnanadesikan and John R Kettenring. Robust estimates, residuals, and outlier detection with\nmultiresponse data. Biometrics, 28(1):81\u2013124, 1972.\n[16] Yilun Chen, Ami Wiesel, and Alfred O Hero. Robust shrinkage estimation of high-dimensional covariance\nmatrices. IEEE Transactions on Signal Processing, 59(9):4097\u20134107, 2011.\n[17] Romain Couillet and Matthew R McKay. Large dimensional analysis and optimization of robust shrinkage\ncovariance matrix estimators. Journal of Multivariate Analysis, 131:99\u2013120, 2014.\n[18] Ravi Jagannathan and T Ma. Risk reduction in large portfolios: Why imposing the wrong constraints\nhelps. The Journal of Finance, 58(4):1651\u20131683, 2003.\n[19] Jianqing Fan, Jingjin Zhang, and Ke Yu. Vast portfolio selection with gross-exposure constraints. Journal\nof the American Statistical Association, 107(498):592\u2013606, 2012.\n[20] Peter J Rousseeuw and Christophe Croux. Alternatives to the median absolute deviation. Journal of the\nAmerican Statistical Association, 88(424):1273\u20131283, 1993.\n[21] M. H. Xu and H. Shao. Solving the matrix nearness problem in the maximum norm by applying a\nprojection and contraction method. Advances in Operations Research, 2012:1\u201315, 2012.\n[22] Alexandre Belloni and Victor Chernozhukov. \u21131-penalized quantile regression in high-dimensional sparse\nmodels. The Annals of Statistics, 39(1):82\u2013130, 2011.\n[23] Lan Wang, Yichao Wu, and Runze Li. Quantile regression for analyzing heterogeneity in ultra-high\ndimension. Journal of the American Statistical Association, 107(497):214\u2013222, 2012.\n[24] Peter J Bickel and Elizaveta Levina. Covariance regularization by thresholding. 
The Annals of Statistics,\n36(6):2577\u20132604, 2008.\n[25] T Tony Cai, Cun-Hui Zhang, and Harrison H Zhou. Optimal rates of convergence for covariance matrix\nestimation. The Annals of Statistics, 38(4):2118\u20132144, 2010.\n[26] Kai-Tai Fang, Samuel Kotz, and Kai Wang Ng. Symmetric Multivariate and Related Distributions. Chap-\nman and Hall, 1990.\n[27] Harry Joe. Multivariate Models and Dependence Concepts. Chapman and Hall, 1997.\n[28] Rafael Schmidt. Tail dependence for elliptically contoured distributions. Mathematical Methods of Op-\nerations Research, 55(2):301\u2013327, 2002.\n[29] Svetlozar Todorov Rachev. Handbook of Heavy Tailed Distributions in Finance. Elsevier, 2003.\n[30] Svetlozar T Rachev, Christian Menn, and Frank J Fabozzi. Fat-tailed and Skewed Asset Return Distribu-\ntions: Implications for Risk Management, Portfolio Selection, and Option Pricing. Wiley, 2005.\n[31] Kevin Dowd. Measuring Market Risk. Wiley, 2007.\n[32] Torben Gustav Andersen. Handbook of Financial Time Series. Springer, 2009.\n[33] Jushan Bai and Shuzhong Shi. Estimating high dimensional covariance matrices and its applications.\nAnnals of Economics and Finance, 12(2):199\u2013215, 2011.\n[34] Sara Van De Geer and SA Van De Geer. Empirical Processes in M-estimation. Cambridge University\nPress, Cambridge, 2000.\n[35] Alastair R Hall. Generalized Method of Moments. Oxford University Press, Oxford, 2005.\n[36] Peter B\u00fchlmann and Sara Van De Geer. Statistics for High-dimensional Data: Methods, Theory and\nApplications. Springer, 2011.\n", "award": [], "sourceid": 27, "authors": [{"given_name": "Huitong", "family_name": "Qiu", "institution": "Johns Hopkins University"}, {"given_name": "Fang", "family_name": "Han", "institution": "Johns Hopkins University"}, {"given_name": "Han", "family_name": "Liu", "institution": "Princeton University"}, {"given_name": "Brian", "family_name": "Caffo", "institution": null}]}