{"title": "Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1575, "page_last": 1585, "abstract": "We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. Our model captures well both the time-varying characteristic and the asymmetrical heavy-tail property of financial time series. It combines the merits of a popular sequential neural network model, i.e., LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures individually the serial dependences of higher moments, rather than just the volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles or VaR of our model outperform the GARCH family. Further, the proposed approach does not suffer from the issue of quantile crossing, nor does it expose to the ill-posedness comparing to the parametric probability density function approach.", "full_text": "Parsimonious Quantile Regression of Financial Asset\n\nTail Dynamics via Sequential Learning\n\nXing Yan3\n\nWeizhong Zhang1\n\nLin Ma1\n\nWei Liu1\n\nQi Wu2,\u2217\n\n1Tencent AI Lab\n\n2School of Data Science, City University of Hong Kong\n\n3Department of SEEM, The Chinese University of Hong Kong\n\nxyan@se.cuhk.edu.hk {zhangweizhongzju,forest.linma}@gmail.com\n\nwl2223@columbia.edu qiwu55@cityu.edu.hk\n\nAbstract\n\nWe propose a parsimonious quantile regression framework to learn the dynamic\ntail behaviors of \ufb01nancial asset returns. Our model captures well both the time-\nvarying characteristic and the asymmetrical heavy-tail property of \ufb01nancial time\nseries. 
It combines the merits of a popular sequential neural network model, i.e., LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures the serial dependence of each of the higher moments individually, rather than just that of the volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles or VaR from our model outperform the GARCH family. Further, the proposed approach does not suffer from the issue of quantile crossing, nor is it exposed to the ill-posedness of the parametric probability density function approach.

1 Introduction

In general, machine learning models aim to predict a single value of the output variable y given the input x, usually an estimate of the conditional mean E[y|x]. In many situations, we are also interested in the characteristics of the conditional distribution p(y|x). A typical domain in which these characteristics must be learned is financial returns. Data from financial markets is highly stochastic and noisy, and it is impossible to predict future financial returns accurately. What we can predict, and what we really care about, are their conditional distributional characteristics like volatility, heavy tails, and Value-at-Risk, which are all widely used measures of risk. The huge and growing demands of risk management and of understanding market behavior make it extremely important to predict these characteristics.

In the scope of discrete-time econometric models, the benchmark for forecasting the conditional distribution of the time-t asset return rt given the past return history is the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model and its variants. First appearing in [9] and [2] to model time-varying volatility, GARCH-type models have since grown into a large family, including popular variants like EGARCH [24], GJR-GARCH [16], TGARCH [32], etc. 
They all describe the\ndistribution p(rt|rt\u22121, rt\u22122, . . . ) by making strong assumptions on the probability density function\nof it, e.g., assuming it is Gaussian, and let the distribution parameters depend on past information.\nUsually, t-distribution is assumed to model heavy tails. Quantile regression [19][20] is another\ntype of method to forecast the conditional distributional characteristics. It predicts the quantiles of\np(y|x) without making any distributional assumption. In this paper, we model and predict conditional\nquantiles and heavy tails of \ufb01nancial return series in a parsimonious quantile regression framework\nthat describes the distribution p(rt|rt\u22121, rt\u22122, . . . ) in a parametric quantile function way.\n\n\u2217Corresponding author.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fIt is known that \ufb01nancial asset returns are heavy-tail distributed, both conditionally and uncondition-\nally [7]. This has important consequences for both the pricing of assets and the management of their\nrisks [15]. More importantly, their tail behaviors are not only asymmetrical but also time-varying.\nIn GARCH family, t-distribution is heavy-tailed but symmetric, and more critically, the degrees of\nfreedom which control the tail heaviness cannot vary with time. Previous studies [17][22][1][25]\nmodelled time-varying conditional skewness and kurtosis in an autoregressive way, like volatility\nmodelling in GARCH. However, they all assumed complicated probability density functions, some\nof which even have no analytical forms, which make model estimation dif\ufb01cult. 
Besides, models that allow the data to speak for itself, rather than being restricted to linear auto-regressiveness, are needed.

In this paper, we approach the problem by parameterizing the conditional quantile function of asset returns, instead of assuming they are drawn from a given probability density function, an assumption that brings issues of tractability and ill-posedness. Besides the probability density function, quantiles are another representation of the asymmetry and tailedness of a distribution. If one can model and estimate conditional quantiles for a fine grid of probability levels, one achieves almost the same effect as modelling the conditional mean, volatility, skewness, and kurtosis simultaneously. Quantile regression has the potential to undertake this interesting task, but the traditional version suffers from some issues. One is the lack of monotonicity in the estimated quantiles, also known as quantile crossing, despite some imperfect solutions proposed in [27] and [5]. Other issues include an increasing number of parameters when estimating more quantiles, and a lack of interpretability. Recently, some works have addressed the large-scale [31] and high-dimensional [26] settings of quantile regression.

In this paper, we propose a parametric heavy-tailed quantile function (HTQF) to model a distribution with asymmetric left and right heavy tails. The Q-Q plot of the proposed HTQF against the standard normal distribution has an inverted-S shape, and the degrees of tail heaviness are controlled by two parameters in a flexible way. Our HTQF overcomes the disadvantages of the probability density function approach in GARCH-type models when modelling asymmetric heavy tails. For financial asset returns, we let the quantile function of p(rt|rt−1, rt−2, . . . 
) be an HTQF and let the\nparameters of it be time-varying and depend on past information through a Long Short-term Memory\n(LSTM) unit [18], which is a popular sequential neural network model and has been applied to many\npractical problems like video understanding [11][29][30], video prediction [4], and video retrieval\n[12]. Parameters of the LSTM can be learned in a quantile regression framework with multiple\nprobability levels. After training, the conditional quantiles of rt and the interpretable parameters of\nHTQF representing tail heaviness can be estimated.\nOur model has signi\ufb01cant advantages over GARCH-type models and traditional quantile regression.\nTo summarize, our contributions are: (1) We propose a novel parametric quantile function to represent\na distribution with asymmetric heavy tails, and leverage it to model the conditional distribution of\n\ufb01nancial return series. (2) In the quantile regression framework, coupled with an LSTM unit, our\nmethod can learn the time-varying tail behaviors successfully and predict conditional quantiles more\naccurately, as veri\ufb01ed by our experiments. (3) We overcome the disadvantages of traditional quantile\nregression, including the quantile crossing, the increasing number of parameters when estimating\nmore quantiles, and the lack of interpretability.\n\n2 GARCH-type Models\nFor a univariate \ufb01nancial return series {rt}, GARCH model was \ufb01rst proposed in [9] and [2] to model\nits time-varying volatility or volatility clustering. By making a prior assumption on the conditional\ndistribution of the residual \u03b5t = rt \u2212 \u00b5t (\u00b5t is the conditional mean of rt), letting it be normal\nN (0, \u03c32\nt ), the time-t volatility \u03c3t is modelled to depend on past residuals \u03b5t\u22121, . . . , \u03b5t\u2212q and past\nvolatilities \u03c3t\u22121, . . . , \u03c3t\u2212p. 
Formally, GARCH(p, q) is specified as follows:

rt = µt + εt,   εt | ψt−1 ∼ N(0, σ_t^2),   (1)

σ_t^2 = ω + α_1 ε_{t−1}^2 + · · · + α_q ε_{t−q}^2 + β_1 σ_{t−1}^2 + · · · + β_p σ_{t−p}^2,   (2)

where ψt−1 denotes the past information set. The parameters ω, α_i, β_j can be estimated by maximum likelihood. Because of its success in modelling and forecasting conditional volatility, many extensions and variants have been proposed, such as EGARCH [24], GJR-GARCH [16], TGARCH [32], etc. Most of them make reasonable and interpretable changes to Equation (2) and achieve better performance.

Alternatives can also be made in Equation (1). A better choice of the distribution assumption is Student's t-distribution: εt = σt zt, zt | ψt−1 ∼ t(ν), where ν is the degrees of freedom. The t-distribution has symmetric heavy tails on the left and right sides. We denote GARCH-type models with the t-distribution assumption by GARCH-t, EGARCH-t, etc. Besides, one can also choose different ways to model the conditional mean µt; e.g., to use GARCH-type models alone, it can be set to a constant µt = µ. One can also adopt the linear autoregressive form µt = γ_0 + γ_1 rt−1 + · · · + γ_s rt−s. We denote GARCH-type models with this linear autoregressive specification of the conditional mean and with the t-distribution assumption by AR-GARCH-t, AR-EGARCH-t, etc.

Although GARCH-type models were initially designed to model and forecast conditional volatility, they can naturally be used to predict conditional quantiles because they fully describe the conditional distribution. 
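As a minimal illustration of the recursion in Equation (2), the following sketch filters conditional volatilities for given GARCH(1,1) coefficients. It is our own sketch, not the paper's code; the helper name and the coefficient values are illustrative, and in practice ω, α, β are fitted by maximum likelihood.

```python
import numpy as np

def garch_volatility(returns, mu, omega, alpha, beta, sigma0):
    """Conditional volatilities sigma_1..sigma_T from a GARCH(1,1) recursion:
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2,
    with residuals eps_t = r_t - mu and initial volatility sigma_0."""
    eps = np.asarray(returns, dtype=float) - mu
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = sigma0 ** 2
    for t in range(len(eps)):
        sigma2[t + 1] = omega + alpha * eps[t] ** 2 + beta * sigma2[t]
    return np.sqrt(sigma2[1:])  # each entry is conditional on the past only

# Illustrative run on synthetic returns; alpha + beta < 1 keeps the
# recursion covariance-stationary.
rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal(500)
sig = garch_volatility(r, mu=0.0, omega=1e-6, alpha=0.05, beta=0.9, sigma0=0.01)
```

Forecasting the next-period volatility is then just one more application of the same recursion to the latest residual and volatility.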
Actually they are widely employed in \ufb01nance to predict Value-at-Risk (VaR), which are\nthe left-tail side quantiles, e.g., 0.01 or 0.05-quantile, representing downside risk of asset prices.\nAnother big family of models that have similarities with GARCH-type models are stochastic volatility\n(SV) models. Some comparisons between GARCH-type and SV models were made in [28][13][3][14].\nSV models are applied in situations when volatility contains independent risk driver. In continuous\ntime, if driven by Brownian Motion, they are Markovian, which is essentially different from GARCH\nfamily and our proposed model and may not be suitable for modelling serial dependence of volatility.\nWhat are comparable with GARCH-type and our models and are consistent with the focus of this\npaper, are long-memory volatility models driven by, e.g., fractional Brownian Motion or Hawkes\nprocess, and preferably in discrete time. CAViaR [10] is another similar model for estimating\nconditional quantiles inspired by GARCH. It models conditional quantiles separately for different\nprobability levels instead of making assumptions on the full conditional distribution. So it is somewhat\ndif\ufb01cult to estimate the conditional moments, also different from GARCH-type and our models.\n\n3 Traditional Quantile Regression\n\nQuantiles are important characteristics of a distribution. For a continuous distribution density p(y),\nfor a given probability level \u03c4 \u2208 (0, 1), e.g., \u03c4 = 0.1 or 0.9, the \u03c4-quantile q of p(y) is de\ufb01ned as\nq = F \u22121(\u03c4 ) where F (y) is the cumulative distribution function of p(y). Quantile regression [19][20]\naims to estimate the \u03c4-quantile q of the conditional distribution p(y|x). To do this, without making\nany assumption on p(y|x), a parametric function q = f\u03b8(x) is chosen, for example, a linear one\nq = w(cid:62)x + b. 
Note that q is an unobservable quantity; a specially designed loss function (named the pinball loss in this paper) between y and q makes the estimation feasible in quantile regression:

Lτ (y, q) = τ |y − q| if y > q,  and  (1 − τ )|y − q| if y ≤ q.   (3)

Then we minimize the expected loss in the traditional regression way to get the estimated parameter θ̂:

min_θ E[Lτ (y, f_θ(x))].   (4)

Given a dataset {x_i, y_i}, i = 1, . . . , N, the empirical average loss (1/N) Σ_{i=1}^{N} Lτ (y_i, f_θ(x_i)) is minimized instead.

When we want to estimate multiple conditional quantiles q_1, q_2, . . . , q_K for different probability levels τ_1 < τ_2 < · · · < τ_K, K different parametric functions q_k = f_{θ_k}(x) are chosen and the losses are summed up to be minimized simultaneously:

min_{θ_1,...,θ_K}  (1/K) (1/N) Σ_{k=1}^{K} Σ_{i=1}^{N} L_{τ_k}(y_i, f_{θ_k}(x_i)).   (5)

However, this combination may lead to an embarrassing issue called quantile crossing: for some x and τ_j < τ_k, it is possible that f_{θ_j}(x) > f_{θ_k}(x), which contradicts probability theory. It occurs because θ_j and θ_k are in fact estimated independently in the optimization. To overcome this, additional constraints on the monotonicity of the quantiles can be added to the optimization to ensure non-crossing [27]. Another, simpler solution is post-processing, i.e., sorting or rearranging the originally estimated quantiles to be monotone [5]. Another two shortcomings of this traditional quantile regression remain. One is the increasing number of parameters when estimating quantiles for a larger set of τ, i.e., when K is larger; for a more elaborate description of a distribution, a large K is necessary in some cases. 
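The pinball loss of Equation (3) takes a few lines of code, and a useful sanity check is that, for a constant predictor q, the empirical average loss is minimized near the sample τ-quantile. This is a sketch of ours, not the paper's implementation:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Pinball loss of Equation (3): tau*|y - q| if y > q, else (1 - tau)*|y - q|."""
    diff = y - q
    return np.where(diff > 0, tau * diff, (tau - 1.0) * diff)

# Sanity check: scan constant predictors q over a grid; the average pinball
# loss should bottom out near the empirical tau-quantile of the sample.
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
tau = 0.9
grid = np.linspace(-3.0, 3.0, 601)
avg_loss = np.array([pinball_loss(y, q, tau).mean() for q in grid])
best_q = grid[int(np.argmin(avg_loss))]  # close to np.quantile(y, 0.9)
```

This property is exactly why minimizing Equation (4) yields a conditional quantile estimate rather than a conditional mean.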
The other shortcoming is that the explicit mapping from x to the conditional quantile has\nno interpretability, making it dif\ufb01cult to combine domain knowledge.\n\n3\n\n\f(a)\n\n(b)\n\n(c)\n\nFigure 1: Q-Q plots against N (0, 1): (a) t(2); (b) HTQF with u = 1.0 and v = 0.1; (c) HTQF with\nu = 0.6 and v = 1.2. For all three distributions, \u00b5 = 1 and \u03c3 = 1.5. For HTQF, A = 4.\n\n4 Our Model\n\nWe \ufb01rst describe the proposed parametric quantile function, then show how it is used to model the\nconditional distribution p(rt|rt\u22121, rt\u22122, . . . ) of \ufb01nancial return series and how the dependence on\npast information is modelled. Our proposed model is completed in a quantile regression framework.\n\n4.1 Heavy-tailed Quantile Function\n\nThere are three common ways to fully express a continuous distribution, through probability density\nfunction (PDF), cumulative distribution function (CDF), or quantile function. In \ufb01nancial data\nmodelling, much attention is paid to how to choose an appropriate parametric PDF that is consistent\nwith the empirical facts of \ufb01nancial returns, like heavy tails. In our model, we design a parametric\nquantile function that allows varying tails and is intuitively easy to be understood.\nOur idea starts from the Q-Q plot, which is a popular method to determine whether a set of observa-\ntions follows a normal distribution or not. The theory behind this is quite simple: the \u03c4-quantile of\na normal distribution N (\u00b5, \u03c32) is \u00b5 + \u03c3Z\u03c4 , where Z\u03c4 is the \u03c4-quantile of the standard normal one.\nWhen \u03c4 takes different values in (0, 1), their Q-Q plot forms a straight line. 
If the Q-Q plot yields an inverted-S shape, it indicates that the corresponding distribution is heavy-tailed (see Figure 1 (a) for an example of the Q-Q plot of the t-distribution with 2 degrees of freedom against N(0, 1)).

We construct a parsimonious parametric quantile function, as a function of Zτ, that has a controllable shape in the Q-Q plot against the standard normal distribution. Specifically, the up tail and down tail of the inverted S shape in the Q-Q plot are controlled by two parameters respectively. Our proposed heavy-tailed quantile function (HTQF) has the following form:

Q(τ | µ, σ, u, v) = µ + σ Zτ (e^{u Zτ}/A + 1)(e^{−v Zτ}/A + 1),   (6)

where µ, σ are location and scale parameters respectively, and A is a relatively large positive constant. u > 0 controls the up tail of the inverted S shape, i.e., the right tail of the corresponding distribution, and v > 0 controls the down tail, i.e., the left tail. The larger u or v is, the heavier the corresponding tail. When u = v = 0, the HTQF becomes the quantile function of a normal distribution. To understand this, note that in Equation (6), Zτ is first multiplied by the two factors fu(Zτ) = e^{u Zτ}/A + 1 and fv(Zτ) = e^{−v Zτ}/A + 1, and then multiplied by σ and shifted by µ (for simplicity one can set µ = 0 and σ = 1). The factor fu is a monotonically increasing and convex function of Zτ and satisfies fu → 1 as Zτ → −∞, so Zτ fu(Zτ) exhibits the up tail of the inverted S only. The same analysis applies to the factor fv. Thus, Zτ fu(Zτ)fv(Zτ) exhibits the whole inverted S of the Q-Q plot. The roles of A are to keep fu(0) and fv(0) close to 1 and to ensure the HTQF is monotonically increasing with Zτ . 
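Equation (6) is cheap to evaluate given the standard normal quantile Zτ. The sketch below (our own, with illustrative parameter values, using SciPy's normal inverse CDF) evaluates the HTQF and checks its monotonicity; note that with u = v = 0 the two factors reduce to the constant 1/A + 1, so the function is linear in Zτ, i.e., a normal quantile function up to scale.

```python
import numpy as np
from scipy.stats import norm

def htqf(tau, mu=0.0, sigma=1.0, u=1.0, v=1.0, A=4.0):
    """Heavy-tailed quantile function of Equation (6):
    Q(tau) = mu + sigma * Z_tau * (exp(u*Z_tau)/A + 1) * (exp(-v*Z_tau)/A + 1),
    where Z_tau is the standard normal tau-quantile; u controls the right
    tail and v the left tail."""
    z = norm.ppf(tau)
    return mu + sigma * z * (np.exp(u * z) / A + 1.0) * (np.exp(-v * z) / A + 1.0)

taus = np.linspace(0.001, 0.999, 999)
q_heavy = htqf(taus, u=1.0, v=1.0)   # heavy left and right tails
# u = v = 0: both factors equal the constant (1/A + 1), so the result is
# linear in Z_tau -- a normal quantile function with inflated scale.
q_gauss = htqf(taus, u=0.0, v=0.0)
```

Monotonicity in τ for any fixed parameters is what rules out quantile crossing by construction.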
Figure 1 (b) and (c) show the Q-Q plots of HTQFs with different values of u and v against N(0, 1). They exhibit different degrees of tailedness, and the tails change flexibly according to u and v. In addition, for an HTQF with fixed parameter values, there exists a unique probability distribution associated with it, because its inverse function exists and is a CDF. Please refer to the proof in the supplementary material.

4.2 Quantile Regression with HTQF

For the distribution p(rt|rt−1, rt−2, . . . ), different from GARCH-type models, we make no assumption on its PDF. Instead, we assume its quantile function is an HTQF, denoted by Q(τ|µt, σt, ut, vt), where µt, σt, ut, vt are time-varying parameters representing the location, scale, and heavy tails of the corresponding distribution. Hence, the conditional τ_k-quantile of rt is obtained simply by plugging τ_k into the function: q_k^t = Q(τ_k|µt, σt, ut, vt), k = 1, . . . , K.

Obviously, the parameters µt, σt, ut, vt should depend on the past series rt−1, rt−2, . . . . To model that, we select a subsequence of fixed length from rt−1, rt−2, . . . to construct a feature vector sequence, and apply an LSTM unit to it. LSTM [18] is a popular and powerful sequential neural network model in machine learning, so it is a natural choice in our method (see the supplementary material for a brief introduction and [23] for a comprehensive review of LSTM). In detail, a fixed length L is chosen, and then a feature vector sequence of length L is constructed from rt−1, . . . , rt−L:

x_1^t, . . . , x_L^t = [rt−L, (rt−L − r̄t)^2, (rt−L − r̄t)^3, (rt−L − r̄t)^4]⊤, . . . , [rt−1, (rt−1 − r̄t)^2, (rt−1 − r̄t)^3, (rt−1 − r̄t)^4]⊤,   (7)

where r̄t = (1/L) Σ_{i=1}^{L} rt−i. The intuition behind this construction is straightforward: it extracts the information contained in raw quantities associated with the first, second, third, and fourth central moments of the past L samples. After this construction, we model the four HTQF parameters µt, σt, ut, vt as the output of an LSTM unit fed with the input x_1^t, . . . , x_L^t:

ht = LSTM_Θ(x_1^t, . . . , x_L^t),   [µt, σt, ut, vt]⊤ = tanh(W^o ht + b^o),   (8)

where Θ denotes the LSTM parameters and ht is the last hidden state; W^o, b^o are the output layer parameters. At last, for multiple probability levels 0 < τ_1 < τ_2 < · · · < τ_K < 1, the pinball losses between rt and its conditional quantiles q_k^t = Q(τ_k|µt, σt, ut, vt) are summed up and minimized together, as in traditional quantile regression:

min_{Θ, W^o, b^o}  (1/K) (1/(T − L)) Σ_{k=1}^{K} Σ_{t=L+1}^{T} L_{τ_k}(rt, Q(τ_k|µt, σt, ut, vt)).   (9)

Combining Equations (6)(7)(8)(9) completes our proposed quantile regression model using LSTM and HTQF, denoted by LSTM-HTQF. After training, for a new subsequent series {rt′}, the time-varying HTQF parameters µt′, σt′, ut′, vt′ can be calculated directly with the learned model parameters Θ̂, Ŵ^o, b̂^o. Among them, {ut′} and {vt′} can represent how the tails behave temporally. 
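The feature construction of Equation (7) can be sketched in a few lines (a sketch of ours, not the authors' code; the helper name `feature_sequence` is hypothetical):

```python
import numpy as np

def feature_sequence(r_past):
    """Build the length-L feature sequence of Equation (7) from the past
    returns r_{t-L}, ..., r_{t-1} (ordered oldest to newest).  Each 4-dim
    vector stacks a past return with its centered 2nd, 3rd, and 4th powers,
    i.e. raw quantities tied to the first four central moments."""
    r_past = np.asarray(r_past, dtype=float)
    centered = r_past - r_past.mean()   # r_bar_t = (1/L) sum_i r_{t-i}
    return np.stack([r_past, centered ** 2, centered ** 3, centered ** 4], axis=1)

x = feature_sequence(np.random.default_rng(0).standard_normal(60))   # L = 60
# x has shape (60, 4): one 4-dim feature vector per time step, ready to be
# fed to an LSTM step by step
```

The resulting (L, 4) array is exactly the per-step input sequence consumed by the LSTM in Equation (8).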
In addition, the conditional quantiles q_k^{t′} can be predicted, and the summed loss in Equation (9) is evaluated again for testing the performance on the new subsequent series, since no ground truth of the quantiles is available.

4.3 Discussions

The advantages of our model over GARCH-type models are obvious. The proposed HTQF is more intuitive and flexible for modelling asymmetric heavy tails than the PDFs in GARCH-type models, like the skewed generalized t-distribution in [1]. To have varying tails, a PDF must take a complicated analytical form, which makes model estimation difficult. Even for the simplest one, the t-distribution, the analytical complexity of its PDF makes model estimation infeasible if one assumes time-varying degrees of freedom, while our HTQF parameters can easily be set to be time-varying in the quantile regression framework. Besides, the LSTM can help to learn nonlinear dependence on past information, which the linear auto-regressiveness in GARCH-type models cannot. We quantitatively compare our model to several classical GARCH-type models in the experiments.

Compared to traditional quantile regression, our model overcomes the three shortcomings mentioned in Section 3. First, it is not hard to prove that the HTQF is a monotonically increasing function of Zτ, and thus of τ, so quantile crossing can never happen. Second, no matter how large K is, i.e., however many quantiles need to be estimated, we only need the HTQF's four parameters µt, σt, ut, vt to determine all of them, which is a big saving in the number of parameters. Last, our model is interpretable and incorporates domain knowledge in finance. For quantitative evaluation, we implement the traditional quantile regression in our experiments, also coupled with an LSTM unit. Mathematically describing it, in Equation (8) the output µt, σt, ut, vt is replaced by the K quantiles [q_1^t, . . . , q_K^t]⊤ = tanh(W^o ht + b^o), and the summed loss (1/K) (1/(T − L)) Σ_{k=1}^{K} Σ_{t=L+1}^{T} L_{τ_k}(rt, q_k^t) is minimized as in Equation (9). For a new subsequent time t′, the predicted quantiles q_1^{t′}, . . . , q_K^{t′} of rt′ are sorted to ensure no crossing. We denote this model by LSTM-TQR.

Generally, the feature vector sequence x_1^t, x_2^t, . . . , x_L^t should be designed to contain any information that is related to the conditional distribution of rt or is helpful for prediction, like trading volume, related assets, or fundamentals. To keep consistency with the GARCH family and ensure the fairness of the comparisons in the experiments, we construct x_1^t, x_2^t, . . . , x_L^t only from past returns rt−1, rt−2, . . . . In real applications of our method, more information can be included in the feature vector sequence. Our method is widely applicable to quantile prediction or time series modelling in many other non-financial fields. Time series data exhibiting asymmetric time-varying tail behavior and nonlinear serial dependence of the conditional distribution, e.g., hydrologic data, internet traffic data, and electricity price and demand, is most suited. One can also change the standard normal distribution in the Q-Q plot (Zτ in the HTQF in Equation (6)) to another baseline distribution, to let the HTQF have a controllable shape in the Q-Q plot against the specified distribution, like an exponential or lognormal one, the choice of which relies on domain knowledge.

5 Experiments

Our experiments are conducted on three types of time series datasets: simulated data, daily asset returns (of stock indexes, exchange rates, and treasury yields), and intraday 5-minute commodity futures returns. 
For daily returns, for every time series, the data of maximum possible length is used; e.g., S&P 500 index returns start from January 4, 1950 and end at July 2, 2018, which is the longest series, with more than 17,000 observations, while the shortest has nearly 8,000 observations. For intraday commodity futures returns, the most recent year of 5-minute returns is used, and each series has about 20,000 observations. All returns are calculated as rt = Pt/Pt−1 − 1, where Pt is the price, rate, or yield at time t.

Each time series is divided into three successive parts, for training, validation, and testing respectively. The training set is four fifths of the original series, and the validation and test sets are each one tenth. The training set is normalized to have sample mean 0 and sample variance 1, and the validation and test sets are then normalized in exactly the same way. The validation set is used for tuning hyper-parameters, and for stopping training when the loss on the validation set begins to increase, to prevent overfitting. Our model has two hyper-parameters: the length L of the past series rt−1, . . . , rt−L on which the time-t HTQF parameters µt, σt, ut, vt depend, and the hidden state dimension H of the LSTM unit. We denote our model with them by LSTM-HTQF(L, H). Similarly, the LSTM-TQR model described in Section 4.3 also has L and H as hyper-parameters. Our competing models are mainly GARCH-type models, from which we select some popular ones for comparison: GARCH, GARCH-t, EGARCH-t, GJR-GARCH-t, AR-EGARCH-t, and AR-GJR-GARCH-t. In all of them, s, p, q are hyper-parameters that will be tuned (see Section 2 for details).

The tuning of the hyper-parameters is done over the following sets: L ∈ {40, 60, 80, 100}, H ∈ {8, 16}, and s, p, q ∈ {1, 2, 3}. The constant A in the HTQF is set to 4 arbitrarily. We choose K = 21 probability levels for the τ set: [τ1, . . . , τ21] = [0.01, 0.05, 0.1, . . . , 0.9, 0.95, 0.99]. Performance is evaluated using the pinball loss on the test set; GARCH-type models can easily do this because the conditional PDF is modelled. For comparisons from different perspectives, test performances over two different τ sets are evaluated: one is the full τ set, and the other is [0.01, 0.05, 0.1], whose quantiles are VaR representing downside risk.

5.1 Simulated Data

The purpose of the simulation experiment is to verify whether our method can learn the true temporal behavior of the conditional distribution of a given time series. We generate our simulated time series in a way similar to the GARCH-t model but, differently, let the degrees of freedom νt be time-varying. Specifically, starting from r0 = 0 and σ0 = 1, the time series {rt}, together with the scale parameter {σt} and tail parameter {νt}, is generated as follows:

πt = (0.136 + 0.257 r_{t−1}^2 + 0.717 π_{t−1}^2)^{1/2},   νt = max{8 − 2πt, 3},   (10)

σt = (0.293 + 0.161 r_{t−1}^2 + 0.575 σ_{t−1}^2)^{1/2},   rt = σt zt,   zt sampled from t(νt).   (11)

Figure 2: Comparisons between the true parameters {σt}, {νt} (black lines) and the learned HTQF parameters {σ̂t}, {ût} (red lines). Upper part of (a): {σt} vs. {σ̂t} on the training set; lower part of (a): {σt} vs. {σ̂t} on the test set; upper part of (b): {νt} vs. {ût} on the training set; lower part of (b): {νt} vs. {ût} on the test set. Linear transformations are made before plotting.

In total, 10,000 data points are generated. Some example pieces of the generated {σt} and {νt} are shown in Figure 2, where the left two black lines are {σt} and the right two black ones are {νt}. The upper two are from the training set, while the lower two are from the test set. 
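The generating process of Equations (10)–(11) can be sketched directly. This is our own transcription; r_0 = 0 and σ_0 = 1 follow the paper, while the initial value π_0 = 1 is our assumption, since the paper does not state it.

```python
import numpy as np

def simulate_series(T=10_000, seed=0):
    """Simulate r_t with GARCH-like scale sigma_t and time-varying degrees
    of freedom nu_t, following Equations (10)-(11).  r_0 = 0, sigma_0 = 1
    as in the paper; pi_0 = 1 is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    r = np.zeros(T + 1)
    sigma = np.ones(T + 1)
    pi = np.ones(T + 1)
    nu = np.zeros(T + 1)
    for t in range(1, T + 1):
        pi[t] = np.sqrt(0.136 + 0.257 * r[t - 1] ** 2 + 0.717 * pi[t - 1] ** 2)
        nu[t] = max(8.0 - 2.0 * pi[t], 3.0)              # time-varying tail
        sigma[t] = np.sqrt(0.293 + 0.161 * r[t - 1] ** 2 + 0.575 * sigma[t - 1] ** 2)
        r[t] = sigma[t] * rng.standard_t(nu[t])          # z_t ~ t(nu_t)
    return r[1:], sigma[1:], nu[1:]

r_sim, sigma_sim, nu_sim = simulate_series(T=2_000, seed=1)
```

Large past shocks push πt up and νt down toward 3, so volatile episodes are also the heavy-tailed ones, which is exactly the dynamic the model is asked to recover.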
The red lines are the HTQF parameters {σ̂t} and {ût} learned by our method LSTM-HTQF(20, 8) (20 and 8 are set arbitrarily, without tuning). {σt} and {σ̂t} are plotted together, as are {νt} and {ût}. We apply linear transformations to the raw quantities to bring them into similar ranges so that they can be plotted together. One can see that the learned HTQF scale and tail parameters {σ̂t}, {ût} are highly linearly correlated with the true parameters {σt}, {νt}, on both the training set and the test set. This means that our method has successfully learned the temporal behavior of the conditional distribution of rt. In fact, the linear correlation coefficients between the two lines in the four subplots are 0.8751, -0.8974, 0.9548, and -0.8808 respectively. The negative signs are due to the fact that the heavier the tail, the larger ût but the smaller νt. After running linear regressions between them, we obtain R-squared values of 0.7658, 0.8054, 0.9116, and 0.7758. The other learned tail parameter {v̂t} behaves similarly to {ût} and is not shown here, because the t-distribution used for generating the data is symmetric.

5.2 Real-world Market Data

In this experiment, we first select representative stock indexes, exchange rates, and treasury yields from around the world, including S&P 500, NASDAQ 100, HSI, Nikkei 225, DAX, FTSE 100, the exchange rates of USD to EUR/GBP/CHF/JPY/AUD, and U.S. treasury yields of 2/10/30 years. We report the pinball losses of every method on the test sets of every asset return series, as shown in Table 1. In parts (a) and (c) of Table 1, the losses over the full τ set are reported, while in parts (b) and (d) the losses over only [0.01, 0.05, 0.1] (whose quantiles are VaR) are reported. It is clearly shown that on most assets our LSTM-HTQF outperforms the competitors. 
Moreover, in parts (b) and (d), the performance\nimprovements are more signi\ufb01cant than in (a) and (c), which is consistent with the intuition that\nthe tails are better modelled by our method. Note that the pinball loss is such a measure between\nobservation and quantile that, even for ground truth quantile, the loss is not zero and is bounded by a\npositive number. So a small decrease in the loss may actually be a substantial improvement. We also\nconduct the Kupiec\u2019s unconditional coverage test [21], the Christoffersen\u2019s independence test [6], and\nthe mixed conditional coverage test for backtesting the VaR forecasts of various models, and show\nthe results in the supplementary material (see [8] for a description of details of these statistical tests).\nTo investigate the tail dynamics captured by the LSTM-HTQF model, we plot the HTQF parameters\n{\u02c6ut} and {\u02c6vt} on the S&P 500 test set in Figure 3 (a), where the blue line is the right tail parameter\n\n7\n\n\fTable 1: The pinball losses on the test sets of daily data of stock indexes, exchange rates, and treasury\nyields. The losses are evaluated over two different \u03c4 sets: (a)(c) [0.01, 0.05, 0.1, . . . , 0.9, 0.95, 0.99];\n(b)(d) [0.01, 0.05, 0.1]. USnY represents the U.S. 
treasury yield of n years.

(a)
Method\Stock Index   S&P 500   NASDAQ 100   HSI      Nikkei 225   DAX      FTSE 100
GARCH                0.2316    0.1406       0.1623   0.2868       0.1968   0.1987
GARCH-t              0.2314    0.1396       0.1612   0.2855       0.1961   0.1987
EGARCH-t             0.2308    0.1395       0.1611   0.2851       0.1957   0.1983
GJR-GARCH-t          0.2314    0.1396       0.1612   0.2855       0.1961   0.1987
AR-EGARCH-t          0.2304    0.1391       0.1611   0.2847       0.1952   0.1982
AR-GJR-GARCH-t       0.2310    0.1393       0.1612   0.2852       0.1963   0.1981
LSTM-TQR             0.2325    0.1380       0.1601   0.2822       0.1938   0.1961
LSTM-HTQF            0.2299    0.1387       0.1598   0.2854       0.1932   0.1959

(b)
Method\Stock Index   S&P 500   NASDAQ 100   HSI      Nikkei 225   DAX      FTSE 100
GARCH                0.1039    0.0669       0.0729   0.1339       0.0853   0.0855
GARCH-t              0.1048    0.0667       0.0719   0.1330       0.0850   0.0861
EGARCH-t             0.1037    0.0668       0.0717   0.1324       0.0840   0.0854
GJR-GARCH-t          0.1048    0.0667       0.0719   0.1330       0.0850   0.0861
AR-EGARCH-t          0.1041    0.0666       0.0715   0.1327       0.0834   0.0856
AR-GJR-GARCH-t       0.1052    0.0666       0.0717   0.1333       0.0854   0.0852
LSTM-TQR             0.1032    0.0644       0.0709   0.1284       0.0812   0.0830
LSTM-HTQF            0.1025    0.0646       0.0702   0.1289       0.0810   0.0827

(c)
Method\Asset         USDEUR   USDGBP   USDCHF   USDJPY   USDAUD   US2Y     US10Y    US30Y
GARCH                0.2952   0.2260   0.2861   0.1935   0.2329   0.2361   0.2222   0.2025
GARCH-t              0.2948   0.2258   0.2858   0.1931   0.2338   0.2366   0.2206   0.2009
EGARCH-t             0.2940   0.2258   0.2855   0.1923   0.2370   0.2352   0.2202   0.2032
GJR-GARCH-t          0.2948   0.2258   0.2858   0.1931   0.2338   0.2366   0.2206   0.2009
AR-EGARCH-t          0.2941   0.2258   0.2854   0.1916   0.2367   0.2353   0.2199   0.2007
AR-GJR-GARCH-t       0.2947   0.2259   0.2857   0.1924   0.2346   0.2367   0.2203   0.2005
LSTM-TQR             0.2943   0.2250   0.2872   0.1928   0.2318   0.2350   0.2195   0.1966
LSTM-HTQF            0.2937   0.2247   0.2849   0.1925   0.2322   0.2351   0.2193   0.1966

(d)
Method\Asset         USDEUR   USDGBP   USDCHF   USDJPY   USDAUD   US2Y     US10Y    US30Y
GARCH                0.1232   0.0942   0.1232   0.0902   0.0996   0.1041   0.0913   0.0965
GARCH-t              0.1236   0.0941   0.1229   0.0876   0.0978   0.1026   0.0923   0.0984
EGARCH-t             0.1223   0.0938   0.1218   0.0879   0.0975   0.1054   0.1062   0.0959
GJR-GARCH-t          0.1236   0.0941   0.1229   0.0876   0.0978   0.1026   0.0923   0.0984
AR-EGARCH-t          0.1224   0.0938   0.1218   0.0877   0.0976   0.1026   0.1053   0.0960
AR-GJR-GARCH-t       0.1226   0.0941   0.1229   0.0875   0.0980   0.1026   0.0916   0.0982
LSTM-TQR             0.1231   0.0923   0.1231   0.0879   0.0965   0.0975   0.0899   0.0948
LSTM-HTQF            0.1224   0.0930   0.1199   0.0869   0.0958   0.0971   0.0897   0.0946

{ˆut} and the red one is the left {ˆvt}. We can see roughly similar patterns in the two lines, both with clustering and spikes, though they differ in details.
Finally, we collect intraday 5-minute returns of five commodity futures from the Chinese futures market: steel rebar, natural rubber, soybean, cotton, and sugar. To reduce the difficulty, overnight jumps are removed. As with the daily asset returns, the losses on the test sets are reported in Table 2, which also shows that our LSTM-HTQF outperforms the competitors on most assets. The plot of {ˆut} and {ˆvt} on the steel rebar test set is shown in Figure 3 (b), which indicates that high-frequency financial asset returns also have time-varying heavy tails. The difference in tail dynamics from the S&P 500 may be attributed to the different time scales of the two time series.

6 Conclusions

In summary, in this paper, we proposed a parametric HTQF to represent the asymmetric heavy-tailed conditional distribution of financial return series. The dependence of HTQF's parameters on past

Figure 3: The HTQF parameters {ˆut} and {ˆvt} on the test set of: (a) S&P 500 daily data; (b) steel rebar 5-minute data. The blue line is {ˆut} and the red one is {ˆvt}.

Table 2: The pinball losses on the test sets of 5-minute return data of commodity futures.
The losses are evaluated over two different τ sets: (a) [0.01, 0.05, 0.1, . . . , 0.9, 0.95, 0.99]; (b) [0.01, 0.05, 0.1].

(a)
Method\Commodity     Steel Rebar   Natural Rubber   Soybean   Cotton   Sugar
GARCH                0.1770        0.1701           0.2424    0.1621   0.1958
EGARCH-t             0.1643        0.1564           0.2392    0.1524   0.1859
GJR-GARCH-t          0.1648        0.1576           0.2393    0.1526   0.1859
AR-EGARCH-t          0.1646        0.1572           0.2391    0.1522   0.1857
AR-GJR-GARCH-t       0.1652        0.1586           0.2391    0.1524   0.1857
LSTM-TQR             0.1644        0.1543           0.2389    0.1504   0.1844
LSTM-HTQF            0.1639        0.1548           0.2385    0.1501   0.1842

(b)
Method\Commodity     Steel Rebar   Natural Rubber   Soybean   Cotton   Sugar
GARCH                0.0882        0.0885           0.1077    0.0783   0.0994
EGARCH-t             0.0797        0.0797           0.1062    0.0720   0.0935
GJR-GARCH-t          0.0801        0.0807           0.1059    0.0721   0.0935
AR-EGARCH-t          0.0805        0.0810           0.1063    0.0719   0.0937
AR-GJR-GARCH-t       0.0807        0.0825           0.1060    0.0721   0.0937
LSTM-TQR             0.0769        0.0765           0.1059    0.0710   0.0922
LSTM-HTQF            0.0767        0.0770           0.1065    0.0704   0.0916

information is modelled by an LSTM unit. The pinball loss between the observation and the conditional quantiles places the learning of the LSTM parameters within a quantile regression framework, which overcomes the disadvantages of traditional quantile regression.
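To make concrete how minimizing the pinball loss yields conditional quantiles, here is a minimal sketch that fits a single constant τ-quantile of a heavy-tailed sample by subgradient descent. Our model minimizes the same loss, but over HTQF quantile curves whose parameters are produced by an LSTM; the function below is our own illustration, not the paper's code:

```python
import numpy as np

def fit_constant_quantile(y, tau, lr=0.05, steps=10000):
    """Fit a constant tau-quantile q by subgradient descent on the average
    pinball loss; the minimizer is the empirical tau-quantile of y."""
    y = np.asarray(y, dtype=float)
    q = float(np.mean(y))  # initial guess
    for _ in range(steps):
        # subgradient of the mean pinball loss w.r.t. q: F_hat(q) - tau,
        # where F_hat is the empirical CDF of y
        grad = float(np.mean(y <= q)) - tau
        q -= lr * grad
    return q

rng = np.random.default_rng(0)
y = rng.standard_t(df=4, size=5000)   # heavy-tailed simulated returns
q05 = fit_constant_quantile(y, 0.05)  # close to np.quantile(y, 0.05)
```

Replacing the constant q with a model output (here, the HTQF evaluated at τ, driven by an LSTM) and averaging the loss over a set of τ levels gives exactly the training objective described above.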
After learning, conditional quantiles or VaR can be predicted with better accuracy; in addition, plotting the HTQF parameters reveals the dynamic tail behaviors of financial asset returns, some of which display clustering and spikes but differ between the left and right tails.
Although our paper focuses on the tail dynamics, in the future, more advanced models that can learn more elaborate dynamics of the conditional distribution of financial time series are worth pursuing, e.g., by improving the flexibility of the HTQF or by modifying how the LSTM is used. Moreover, it is important to discover how the LSTM and the HTQF in our model work respectively and how each contributes to the performance improvements. It is also interesting to interpret what tail dynamics of financial assets we have captured, and what consequences they have for understanding market behaviors, for asset pricing, and for risk management. To make those clear, we may need more in-depth analysis of our model and more statistical testing and analysis of its VaR or quantile forecasts in the future.

Acknowledgement

Qi Wu acknowledges the financial support from the Hong Kong Research Grants Council, in particular the Early Career Scheme 24200514 and the General Research Funds 14211316 and 14206117.

References

[1] Turan G Bali, Hengyong Mo, and Yi Tang. The role of autoregressive conditional skewness and kurtosis in the estimation of conditional VaR. Journal of Banking & Finance, 32(2):269–282, 2008.

[2] Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327, 1986.

[3] M Angeles Carnero, Daniel Peña, and Esther Ruiz. Persistence and kurtosis in GARCH and stochastic volatility models.
Journal of Financial Econometrics, 2(2):319–342, 2004.

[4] Xinpeng Chen, Jingyuan Chen, Lin Ma, Jian Yao, Wei Liu, Jiebo Luo, and Tong Zhang. Fine-grained video attractiveness prediction using multimodal deep learning on a large real-world dataset. In Companion of The Web Conference 2018, pages 671–678. International World Wide Web Conferences Steering Committee, 2018.

[5] Victor Chernozhukov, Iván Fernández-Val, and Alfred Galichon. Quantile and probability curves without crossing. Econometrica, 78(3):1093–1125, 2010.

[6] Peter F Christoffersen. Evaluating interval forecasts. International Economic Review, pages 841–862, 1998.

[7] Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2):223–236, 2001.

[8] Alexandra Dias. Market capitalization and value-at-risk. Journal of Banking & Finance, 37(12):5248–5260, 2013.

[9] Robert F Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society, pages 987–1007, 1982.

[10] Robert F Engle and Simone Manganelli. CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics, 22(4):367–381, 2004.

[11] Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, and Junzhou Huang. End-to-end learning of motion representation for video understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6016–6025, 2018.

[12] Yang Feng, Lin Ma, Wei Liu, Tong Zhang, and Jiebo Luo. Video re-localization. In Proceedings of the European Conference on Computer Vision (ECCV), pages 51–66, 2018.

[13] Jeff Fleming and Chris Kirby. A closer look at the relation between GARCH and stochastic autoregressive volatility.
Journal of Financial Econometrics, 1(3):365–419, 2003.

[14] Philip Hans Franses, Marco Van Der Leij, and Richard Paap. A simple test for GARCH against a stochastic volatility model. Journal of Financial Econometrics, 6(3):291–306, 2007.

[15] Paul Glasserman and Qi Wu. Persistence and procyclicality in margin requirements. Management Science, 2018.

[16] Lawrence R Glosten, Ravi Jagannathan, and David E Runkle. On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48(5):1779–1801, 1993.

[17] Bruce E Hansen. Autoregressive conditional density estimation. International Economic Review, pages 705–730, 1994.

[18] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[19] Roger Koenker and Gilbert Bassett Jr. Regression quantiles. Econometrica: Journal of the Econometric Society, pages 33–50, 1978.

[20] Roger Koenker and Kevin F Hallock. Quantile regression. Journal of Economic Perspectives, 15(4):143–156, 2001.

[21] Paul H Kupiec. Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives, 3(2):73–84, 1995.

[22] Ángel León, Gonzalo Rubio, and Gregorio Serna. Autoregresive conditional volatility, skewness and kurtosis. The Quarterly Review of Economics and Finance, 45(4-5):599–618, 2005.

[23] Zachary C Lipton, John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015.

[24] Daniel B Nelson. Conditional heteroskedasticity in asset returns: A new approach. Econometrica: Journal of the Econometric Society, pages 347–370, 1991.

[25] Michael Rockinger and Eric Jondeau. Entropy densities with an application to autoregressive conditional skewness and kurtosis.
Journal of Econometrics, 106(1):119–142, 2002.

[26] Vidyashankar Sivakumar and Arindam Banerjee. High-dimensional structured quantile regression. In International Conference on Machine Learning, pages 3220–3229, 2017.

[27] Ichiro Takeuchi, Quoc V Le, Timothy D Sears, and Alexander J Smola. Nonparametric quantile estimation. Journal of Machine Learning Research, 7(Jul):1231–1264, 2006.

[28] Stephen J Taylor. Modeling stochastic volatility: A review and comparative study. Mathematical Finance, 4(2):183–204, 1994.

[29] Bairui Wang, Lin Ma, Wei Zhang, and Wei Liu. Reconstruction network for video captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7622–7631, 2018.

[30] Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, and Yong Xu. Bidirectional attentive fusion with context gating for dense video captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7190–7198, 2018.

[31] Jiyan Yang, Xiangrui Meng, and Michael Mahoney. Quantile regression for large-scale applications. In International Conference on Machine Learning, pages 881–887, 2013.

[32] Jean-Michel Zakoian. Threshold heteroskedastic models. Journal of Economic Dynamics and Control, 18(5):931–955, 1994.