{"title": "Gaussian Process Conditional Copulas with Applications to Financial Time Series", "book": "Advances in Neural Information Processing Systems", "page_first": 1736, "page_last": 1744, "abstract": "The estimation of dependencies between multiple variables is a central problem in the analysis of financial time series. A common approach is to express these dependencies in terms of a copula function. Typically the copula function is assumed to be constant, but this may be inaccurate when there are covariates that could have a large influence on the dependence structure of the data. To account for this, a Bayesian framework for the estimation of conditional copulas is proposed. In this framework the parameters of a copula are non-linearly related to some arbitrary conditioning variables. We evaluate the ability of our method to predict time-varying dependencies on several equities and currencies and observe consistent performance gains compared to static copula models and other time-varying copula methods.", "full_text": "Gaussian Process Conditional Copulas with Applications to Financial Time Series

José Miguel Hernández-Lobato
Engineering Department
University of Cambridge
jmh233@cam.ac.uk

James Robert Lloyd
Engineering Department
University of Cambridge
jrl44@cam.ac.uk

Daniel Hernández-Lobato
Computer Science Department
Universidad Autónoma de Madrid
daniel.hernandez@uam.es

Abstract

The estimation of dependencies between multiple variables is a central problem in the analysis of financial time series. A common approach is to express these dependencies in terms of a copula function. Typically the copula function is assumed to be constant, but this may be inaccurate when there are covariates that could have a large influence on the dependence structure of the data. To account for this, a Bayesian framework for the estimation of conditional copulas is proposed.
In this framework the parameters of a copula are non-linearly related to some arbitrary conditioning variables. We evaluate the ability of our method to predict time-varying dependencies on several equities and currencies and observe consistent performance gains compared to static copula models and other time-varying copula methods.

1 Introduction

Understanding dependencies within multivariate data is a central problem in the analysis of financial time series, underpinning common tasks such as portfolio construction and the calculation of value-at-risk. Classical methods describe these dependencies by a covariance matrix (possibly time-varying) that is estimated from the data [4, 5, 7, 1]. A more general approach is to model dependencies with copula functions [6]. Copulas have become popular because they separate the estimation of the marginal distributions from the estimation of the dependence structure, which is completely determined by the copula.

Standard copula methods are likely to estimate dependencies inaccurately when the actual dependencies are strongly influenced by other covariates. For example, dependencies can vary with time or be affected by observations of other time series. Standard copula methods cannot handle such conditional dependencies. To address this limitation, we propose a probabilistic framework to estimate conditional copulas. Specifically, we assume parametric copulas whose parameters are specified by unknown non-linear functions of arbitrary conditioning variables. These latent functions are modeled with Gaussian process (GP) priors [17].

GPs have previously been used to model conditional copulas in [12], but that work only applies to copulas specified by a single parameter. We extend this work to accommodate copulas with multiple parameters.
This is an important improvement because it allows the use of a richer set of copulas, including Student's t and asymmetric copulas. We demonstrate our method by choosing time as the conditioning variable and evaluating its ability to estimate time-varying dependencies on several currency and equity time series. Our method achieves consistently superior predictive performance compared to static copula models and other dynamic copula methods. These include models that allow their parameters to change with time, e.g. regime-switching models [11] and methods proposing GARCH-style updates to copula parameters [20, 11].

Figure 1: Left, Gaussian copula density for $\tau = 0.3$. Middle, Student's t copula density for $\tau = 0.3$ and $\nu = 1$. Right, symmetrized Joe Clayton copula density for $\tau^U = 0.1$ and $\tau^L = 0.6$. The latter copula model is asymmetric along the main diagonal of the unit square.

2 Copulas and Conditional Copulas

Copulas provide a powerful framework for constructing multivariate probabilistic models by separating the modeling of univariate marginal distributions from the modeling of dependencies between variables [6]. We focus on bivariate copulas because higher-dimensional copulas are typically constructed using bivariate copulas as building blocks [e.g. 2, 12].

Sklar's theorem [18] states the following. Let $X$ and $Y$ be two one-dimensional random variables with continuous marginal cumulative distribution functions (cdfs) $F_X$ and $F_Y$. Then their joint cdf can be written as $F_{X,Y}(x, y) = C_{X,Y}[F_X(x), F_Y(y)]$, where $C_{X,Y}$ is the unique copula for $X$ and $Y$. Since $F_X(X)$ and $F_Y(Y)$ are each uniformly distributed on $[0, 1]$, $C_{X,Y}$ is the cdf of a probability distribution on the unit square $[0, 1] \times [0, 1]$ with uniform marginals. Figure 1 shows the copula densities of three parametric copula models: the Gaussian, Student's t and symmetrized Joe Clayton (SJC) copulas.
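Read in reverse, Sklar's theorem gives a recipe for sampling from a copula. Draw from a joint distribution, then push each coordinate through its own marginal cdf. The sketch below illustrates this for the Gaussian copula of Figure 1 (left). It is a minimal illustration in plain Python (standard library only), not the authors' code, and the helper names are hypothetical. The conversion from Kendall's $\tau = 0.3$ to the linear correlation $\rho = \sin(0.5\tau\pi)$ is the relation the paper states for its parameterization.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi is STD_NORMAL.cdf and Phi^{-1} is STD_NORMAL.inv_cdf


def sample_gaussian_copula(rho, n, seed=0):
    # Draw (x, y) from a bivariate normal with standard normal marginals and
    # correlation rho, then apply Phi to each coordinate (probability integral
    # transform): u and v become uniform on [0, 1] but stay dependent.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(x), STD_NORMAL.cdf(y)))
    return pairs


def gaussian_copula_density(u, v, rho):
    # c(u, v | rho) = phi_2(a, b; rho) / (phi(a) * phi(b)), a = Phi^{-1}(u), b = Phi^{-1}(v)
    a, b = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    exponent = (2.0 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2)) / (2.0 * (1.0 - rho ** 2))
    return math.exp(exponent) / math.sqrt(1.0 - rho ** 2)


rho = math.sin(0.5 * 0.3 * math.pi)  # Kendall's tau = 0.3, as in Figure 1 (left)
pairs = sample_gaussian_copula(rho, 20000)
mean_u = sum(u for u, _ in pairs) / len(pairs)
var_u = sum((u - mean_u) ** 2 for u, _ in pairs) / len(pairs)
print(round(mean_u, 3), round(var_u, 4))
```

The empirical mean and variance of $u$ should be close to those of a uniform variable on $[0, 1]$ (1/2 and 1/12), even though $u$ and $v$ are dependent. For $\rho > 0$ the density is largest near (0, 0) and (1, 1).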
Copula models can be learnt in a two-step process [10]. First, the marginals $F_X$ and $F_Y$ are learnt by fitting univariate models. Second, the data are mapped to the unit square by $U = F_X(X)$, $V = F_Y(Y)$ (i.e. a probability integral transform), and $C_{X,Y}$ is then fit to the transformed data.

2.1 Conditional Copulas

When one has access to a covariate vector $Z$, one may wish to estimate a conditional version of a copula model, i.e.

$$F_{X,Y|Z}(x, y \mid z) = C_{X,Y|Z}\big[F_{X|Z}(x \mid z), F_{Y|Z}(y \mid z) \mid z\big]. \qquad (1)$$

The same two-step process can be used to estimate $F_{X,Y|Z}(x, y \mid z)$. The marginals $F_{X|Z}$ and $F_{Y|Z}$ can be estimated using standard methods for univariate conditional distribution estimation. However, the estimate of $C_{X,Y|Z}$ must have uniform marginal distributions for every value of $z$; this problem has only been considered recently [12]. We propose a general Bayesian non-parametric framework for the estimation of conditional copulas based on GPs, together with an alternating expectation propagation (EP) algorithm for efficient approximate inference.

3 Gaussian Process Conditional Copulas

Let $\mathcal{D}_Z = \{z_i\}_{i=1}^n$ and $\mathcal{D}_{U,V} = \{(u_i, v_i)\}_{i=1}^n$, where $(u_i, v_i)$ is a sample drawn from $C_{X,Y|z_i}$. We assume that $C_{X,Y|Z}$ is a parametric copula model $C_{\mathrm{par}}[u, v \mid \theta_1(z), \ldots, \theta_k(z)]$ specified by $k$ parameters $\theta_1, \ldots, \theta_k$ that may be functions of the conditioning variable $z$. Let $\theta_i(z) = \sigma_i[f_i(z)]$, where $f_i$ is an arbitrary real function and $\sigma_i$ is a function that maps the real line to a set $\Theta_i$ of valid configurations for $\theta_i$. For example, $C_{\mathrm{par}}$ could be a Student's t copula.
In this case, $k = 2$, and $\theta_1$ and $\theta_2$ are the correlation and the degrees of freedom of the Student's t copula, with $\Theta_1 = (-1, 1)$ and $\Theta_2 = (0, \infty)$. One could then choose $\sigma_1(\cdot) = 2\Phi(\cdot) - 1$, where $\Phi$ is the standard Gaussian cdf, and $\sigma_2(\cdot) = \exp(\cdot)$ to satisfy the constraint sets $\Theta_1$ and $\Theta_2$, respectively.

Once we have specified the parametric form of $C_{\mathrm{par}}$ and the mapping functions $\sigma_1, \ldots, \sigma_k$, we need to learn the latent functions $f_1, \ldots, f_k$. We perform a Bayesian non-parametric analysis by placing GP priors on these functions and computing their posterior distribution given the observed data. Let $\mathbf{f}_i = (f_i(z_1), \ldots, f_i(z_n))^\top$. The prior distribution for $\mathbf{f}_i$ given $\mathcal{D}_Z$ is $p(\mathbf{f}_i \mid \mathcal{D}_Z) = \mathcal{N}(\mathbf{f}_i \mid \mathbf{m}_i, \mathbf{K}_i)$. Here $\mathbf{m}_i = (m_i(z_1), \ldots, m_i(z_n))^\top$ for some mean function $m_i(z)$, and $\mathbf{K}_i$ is the $n \times n$ covariance matrix generated by the squared exponential covariance function, i.e.

$$[\mathbf{K}_i]_{jk} = \mathrm{Cov}[f_i(z_j), f_i(z_k)] = \beta_i \exp\{-(z_j - z_k)^\top \mathrm{diag}(\boldsymbol{\lambda}_i)(z_j - z_k)\} + \gamma_i, \qquad (2)$$

where $\boldsymbol{\lambda}_i$ is a vector of inverse length-scales and $\beta_i$, $\gamma_i$ are amplitude and noise parameters. The posterior distribution for $\mathbf{f}_1, \ldots, \mathbf{f}_k$ given $\mathcal{D}_{U,V}$ and $\mathcal{D}_Z$ is

$$p(\mathbf{f}_1, \ldots, \mathbf{f}_k \mid \mathcal{D}_{U,V}, \mathcal{D}_Z) = \frac{\left[\prod_{i=1}^n c_{\mathrm{par}}\big(u_i, v_i \mid \sigma_1[f_1(z_i)], \ldots, \sigma_k[f_k(z_i)]\big)\right]\left[\prod_{i=1}^k \mathcal{N}(\mathbf{f}_i \mid \mathbf{m}_i, \mathbf{K}_i)\right]}{p(\mathcal{D}_{U,V} \mid \mathcal{D}_Z)}, \qquad (3)$$

where $c_{\mathrm{par}}$ is the density of the parametric copula model and $p(\mathcal{D}_{U,V} \mid \mathcal{D}_Z)$ is a normalization constant often called the model evidence. Given a particular value of $Z$ denoted by $z^\star$, we can make predictions about the conditional distribution of $U$ and $V$ using the standard GP prediction formula

$$p(u^\star, v^\star \mid z^\star) = \int c_{\mathrm{par}}\big(u^\star, v^\star \mid \sigma_1[f_1^\star], \ldots, \sigma_k[f_k^\star]\big)\, p(\mathbf{f}^\star \mid \mathbf{f}_1, \ldots, \mathbf{f}_k, z^\star, \mathcal{D}_Z)\, p(\mathbf{f}_1, \ldots, \mathbf{f}_k \mid \mathcal{D}_{U,V}, \mathcal{D}_Z)\, d\mathbf{f}_1 \cdots d\mathbf{f}_k\, d\mathbf{f}^\star. \qquad (4)$$

In (4) the following quantities are used:
- $\mathbf{f}^\star = (f_1^\star, \ldots, f_k^\star)^\top$ with $f_i^\star = f_i(z^\star)$;
- $p(\mathbf{f}^\star \mid \mathbf{f}_1, \ldots, \mathbf{f}_k, z^\star, \mathcal{D}_Z) = \prod_{i=1}^k p(f_i^\star \mid \mathbf{f}_i, z^\star, \mathcal{D}_Z)$;
- $p(f_i^\star \mid \mathbf{f}_i, z^\star, \mathcal{D}_Z) = \mathcal{N}\big(f_i^\star \mid m_i(z^\star) + \mathbf{k}_i^\top \mathbf{K}_i^{-1}(\mathbf{f}_i - \mathbf{m}_i),\ k_i - \mathbf{k}_i^\top \mathbf{K}_i^{-1} \mathbf{k}_i\big)$;
- $k_i = \mathrm{Cov}[f_i(z^\star), f_i(z^\star)]$;
- $\mathbf{k}_i = (\mathrm{Cov}[f_i(z^\star), f_i(z_1)], \ldots, \mathrm{Cov}[f_i(z^\star), f_i(z_n)])^\top$.

Unfortunately, (3) and (4) cannot be computed analytically, so we approximate them using expectation propagation (EP) [13].

3.1 An Alternating EP Algorithm for Approximate Bayesian Inference

The joint distribution for $\mathbf{f}_1, \ldots, \mathbf{f}_k$ and $\mathcal{D}_{U,V}$ given $\mathcal{D}_Z$ can be written as a product of $n + k$ factors:

$$p(\mathbf{f}_1, \ldots, \mathbf{f}_k, \mathcal{D}_{U,V} \mid \mathcal{D}_Z) = \left[\prod_{i=1}^n g_i(f_{1i}, \ldots, f_{ki})\right]\left[\prod_{i=1}^k h_i(\mathbf{f}_i)\right], \qquad (5)$$

where $f_{ji} = f_j(z_i)$, $h_i(\mathbf{f}_i) = \mathcal{N}(\mathbf{f}_i \mid \mathbf{m}_i, \mathbf{K}_i)$ and $g_i(f_{1i}, \ldots, f_{ki}) = c_{\mathrm{par}}[u_i, v_i \mid \sigma_1[f_{1i}], \ldots, \sigma_k[f_{ki}]]$.

EP approximates each factor $g_i$ with an approximate Gaussian factor $\tilde{g}_i$ that need not integrate to one. That is, $\tilde{g}_i(f_{1i}, \ldots, f_{ki}) = s_i \prod_{j=1}^k \exp\{-(f_{ji} - \tilde{m}_{ji})^2 / [2\tilde{v}_{ji}]\}$, where $s_i > 0$, $\tilde{m}_{ji}$ and $\tilde{v}_{ji}$ are parameters to be calculated by EP. The other factors $h_i$ already have a Gaussian form, so they do not need to be approximated. Since all the $\tilde{g}_i$ and $h_i$ are Gaussian, their product is, up to a normalization constant, a multivariate Gaussian distribution $q(\mathbf{f}_1, \ldots, \mathbf{f}_k)$. This distribution approximates the exact posterior (3) and factorizes across $\mathbf{f}_1, \ldots, \mathbf{f}_k$. The predictive distribution (4) is approximated by first integrating $p(\mathbf{f}^\star \mid \mathbf{f}_1, \ldots, \mathbf{f}_k, z^\star, \mathcal{D}_Z)$ with respect to $q(\mathbf{f}_1, \ldots, \mathbf{f}_k)$.
This results in a factorized Gaussian distribution $q^\star(\mathbf{f}^\star)$ that approximates $p(\mathbf{f}^\star \mid \mathcal{D}_{U,V}, \mathcal{D}_Z)$. Finally, (4) is approximated by Monte Carlo: we sample from $q^\star$ and then average $c_{\mathrm{par}}(u^\star, v^\star \mid \sigma_1[f_1^\star], \ldots, \sigma_k[f_k^\star])$ over the samples.

EP iteratively updates each $\tilde{g}_i$ until convergence. It first computes $q^{\setminus i} \propto q / \tilde{g}_i$ and then minimizes the Kullback-Leibler divergence [3] between $g_i q^{\setminus i}$ and $\tilde{g}_i q^{\setminus i}$. This involves updating $\tilde{g}_i$ so that the first and second marginal moments of $g_i q^{\setminus i}$ and $\tilde{g}_i q^{\setminus i}$ match. However, the moments of $g_i q^{\setminus i}$ cannot be computed analytically because of the complicated form of $g_i$. One solution is to compute these $k$-dimensional integrals numerically. However, the cost of doing so typically grows exponentially in $k$, which is prohibitive for $k > 1$. Instead, we perform an additional approximation when computing the marginal moments of $f_{ji}$ with respect to $g_i q^{\setminus i}$. Without loss of generality, assume that we want to compute the expectation of $f_{1i}$ with respect to $g_i q^{\setminus i}$. We make the following approximation:

$$\int f_{1i}\, g_i(f_{1i}, \ldots, f_{ki})\, q^{\setminus i}(f_{1i}, \ldots, f_{ki})\, df_{1i} \cdots df_{ki} \approx C \times \int f_{1i}\, g_i(f_{1i}, \bar{f}_{2i}, \ldots, \bar{f}_{ki})\, q^{\setminus i}(f_{1i}, \bar{f}_{2i}, \ldots, \bar{f}_{ki})\, df_{1i}, \qquad (6)$$

where $\bar{f}_{1i}, \ldots, \bar{f}_{ki}$ are the means of $f_{1i}, \ldots, f_{ki}$ with respect to $q^{\setminus i}$. The constant $C$ approximates the width of the integrand around its maximum in all dimensions except $f_{1i}$. In practice all moments are normalized by the 0-th moment, so $C$ can be ignored. The right-hand side of (6) is a one-dimensional integral that can be easily computed using numerical techniques. The approximation above is similar to approximating an integral by the product of the maximum value of the integrand and an estimate of its width.
However, instead of maximizing $g_i(f_{1i}, \ldots, f_{ki})\, q^{\setminus i}(f_{1i}, \ldots, f_{ki})$ with respect to $f_{2i}, \ldots, f_{ki}$, we maximize $q^{\setminus i}$. This is a much easier task because $q^{\setminus i}$ is Gaussian and its maximizer is its own mean vector. In practice, $g_i(f_{1i}, \ldots, f_{ki})$ is very flat compared to $q^{\setminus i}$. As a result, the maximizer of $q^{\setminus i}$ is a good approximation to the maximizer of $g_i(f_{1i}, \ldots, f_{ki})\, q^{\setminus i}(f_{1i}, \ldots, f_{ki})$.

Since $q$ factorizes across $\mathbf{f}_1, \ldots, \mathbf{f}_k$ (as does $q^{\setminus i}$), our implementation of EP decouples into $k$ EP sub-routines among which we alternate. The $j$-th sub-routine approximates the posterior distribution of $\mathbf{f}_j$, using as input the means of $q^{\setminus i}$ generated by the other EP sub-routines. Each sub-routine finds a Gaussian approximation to a set of $n$ one-dimensional factors, one factor per data point. In the $j$-th EP sub-routine, the $i$-th factor is given by $g_i(f_{1i}, \ldots, f_{ki})$. Each element of $\{f_{1i}, \ldots, f_{ki}\} \setminus \{f_{ji}\}$ is kept fixed to the current mean of $q^{\setminus i}$, as estimated by the other EP sub-routines. We iteratively alternate between sub-routines, running each one until convergence before running the next one. Convergence is achieved very quickly; we only run each EP sub-routine four times.

The EP sub-routines are implemented using the parallel EP update scheme described in [21]. To speed up GP-related computations, we use the generalized FITC approximation [19, 14]. Each $n \times n$ covariance matrix $\mathbf{K}_i$ is approximated by $\mathbf{K}_i' = \mathbf{Q}_i + \mathrm{diag}(\mathbf{K}_i - \mathbf{Q}_i)$, where $\mathbf{Q}_i = \mathbf{K}^i_{nn_0}[\mathbf{K}^i_{n_0n_0}]^{-1}[\mathbf{K}^i_{nn_0}]^\top$. Here $\mathbf{K}^i_{n_0n_0}$ is the $n_0 \times n_0$ covariance matrix generated by evaluating (2) at $n_0 \ll n$ pseudo-inputs, and $\mathbf{K}^i_{nn_0}$ is the $n \times n_0$ matrix of covariances between training points and pseudo-inputs. The cost of EP is $\mathcal{O}(k n n_0^2)$. Each time we call the $j$-th EP sub-routine, we optimize the corresponding kernel hyper-parameters $\boldsymbol{\lambda}_j$, $\beta_j$ and $\gamma_j$ and the pseudo-inputs by maximizing the EP approximation of the model evidence [17].

4 Related Work

The model proposed here is an extension of the conditional copula model of [12]. For bivariate data and a copula with a single parameter, the two models are identical. We have extended the approximate inference for this model to accommodate copulas with multiple parameters. This was previously computationally infeasible because it required the numerical calculation of multidimensional integrals within an inner loop of EP inference. We have also shown that, by conditioning the copula on time, this model produces excellent predictive results on financial time series.

4.1 Dynamic Copula Models

In [11] a dynamic copula model is proposed based on a two-state hidden Markov model (HMM) with hidden state $S_t \in \{0, 1\}$. The model assumes that the data-generating process switches between two regimes of low and high correlation. At any time $t$ the copula density is Student's t, with different parameters for the two values of the hidden state $S_t$. Maximum likelihood estimation of the copula parameters and transition probabilities is performed using an EM algorithm [e.g. 3].

A time-varying correlation (TVC) model based on the Student's t copula is described in [20, 11]. The correlation parameter¹ of a Student's t copula is assumed to satisfy $\rho_t = (1 - \alpha - \beta)\rho + \alpha\varepsilon_{t-1} + \beta\rho_{t-1}$, where $\varepsilon_{t-1}$ is the empirical correlation of the previous 10 observations. The parameters satisfy $-1 \le \rho \le 1$, $0 \le \alpha, \beta \le 1$ and $\alpha + \beta \le 1$. The number of degrees of freedom $\nu$ is assumed to be constant. The previous formula is the GARCH equation applied to correlation instead of variance.
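The TVC recursion can be sketched in a few lines of plain Python. This is not the implementation of [20, 11], and the function names are hypothetical. Two details are assumptions because the text does not specify them:
- $\varepsilon_{t-1}$ is computed as the Pearson correlation of the previous 10 pairs exactly as they are passed in;
- $\rho_t$ is held at $\rho$ until a full window of 10 observations is available.

Under the stated constraints, $\rho_t$ is a convex combination of $\rho$, $\varepsilon_{t-1}$ and $\rho_{t-1}$, so it stays in $[-1, 1]$.

```python
import math
import random


def pearson(xs, ys):
    # Sample (Pearson) correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tvc_correlation_path(u, v, rho_bar, alpha, beta, window=10):
    # rho_t = (1 - alpha - beta) * rho_bar + alpha * eps_{t-1} + beta * rho_{t-1},
    # with eps_{t-1} the correlation of the `window` pairs observed before time t.
    if not (-1.0 <= rho_bar <= 1.0 and alpha >= 0.0 and beta >= 0.0
            and alpha + beta <= 1.0):
        raise ValueError('TVC parameter constraints violated')
    # Assumption: rho_t is held at rho_bar until a full window is available.
    rho = [rho_bar] * len(u)
    for t in range(window, len(u)):
        eps = pearson(u[t - window:t], v[t - window:t])  # pairs t-window, ..., t-1
        rho[t] = (1.0 - alpha - beta) * rho_bar + alpha * eps + beta * rho[t - 1]
    return rho


rng = random.Random(0)
u = [rng.random() for _ in range(50)]
v = [0.5 * a + 0.5 * rng.random() for a in u]  # positively dependent toy data
path = tvc_correlation_path(u, v, rho_bar=0.3, alpha=0.05, beta=0.9)
print([round(r, 3) for r in path[-5:]])
```

Setting $\alpha = \beta = 0$ recovers a constant correlation, which is the static copula that TVC generalizes.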
Estimation of $\rho$, $\alpha$, $\beta$ and $\nu$ is easily performed by maximum likelihood.

In [15] a dynamic copula based on the SJC copula (DSJCC) is introduced. In this method, the parameters $\tau^U$ and $\tau^L$ of an SJC copula are assumed to depend on time according to

$$\tau^U(t) = 0.01 + 0.98\,\Lambda\big[\omega_U + \alpha_U \varepsilon_{t-1} + \beta_U \tau^U(t-1)\big], \qquad (7)$$

$$\tau^L(t) = 0.01 + 0.98\,\Lambda\big[\omega_L + \alpha_L \varepsilon_{t-1} + \beta_L \tau^L(t-1)\big], \qquad (8)$$

where $\Lambda[\cdot]$ is the logistic function, $\varepsilon_{t-1} = \frac{1}{10}\sum_{j=1}^{10} |u_{t-j} - v_{t-j}|$, and $(u_t, v_t)$ is a copula sample at time $t$. The constants 0.01 and 0.98 are used to avoid numerical instabilities. These formulae are the GARCH equation for correlations, with an additional logistic function to constrain parameter values. The parameters $\omega_U$, $\alpha_U$, $\beta_U$, $\omega_L$, $\alpha_L$ and $\beta_L$ are estimated by maximum likelihood.

We go beyond this prior work by allowing copula parameters to depend on arbitrary conditioning variables rather than on time alone. Also, the models above assume either Markov independence or GARCH-like updates to copula parameters. These assumptions have been shown empirically to be effective for estimating univariate variances. However, the consistent performance gains of our proposed method suggest that they are less applicable to the estimation of dependencies.

4.2 Other Dynamic Covariance Models

A direct extension of the GARCH equations to multiple time series, VEC, was proposed by [5]. Let $x(t)$ be a multivariate time series assumed to satisfy $x(t) \sim \mathcal{N}(0, \Sigma(t))$.
VEC($p$, $q$) models the dynamics of $\Sigma(t)$ by an equation of the form

$$\mathrm{vech}(\Sigma(t)) = c + \sum_{k=1}^{p} A_k\, \mathrm{vech}\big(x(t-k)x(t-k)^\top\big) + \sum_{k=1}^{q} B_k\, \mathrm{vech}(\Sigma(t-k)), \qquad (9)$$

where vech is the operation that stacks the lower triangular part of a matrix into a column vector. The VEC model has a very large number of parameters. A more commonly used model is therefore the BEKK($p$, $q$) model [7], which assumes the following dynamics:

$$\Sigma(t) = C^\top C + \sum_{k=1}^{p} A_k^\top x(t-k)x(t-k)^\top A_k + \sum_{k=1}^{q} B_k^\top \Sigma(t-k) B_k. \qquad (10)$$

This model also has many parameters, and many restricted versions of these models have been proposed to avoid over-fitting (see e.g. section 2 of [1]).

An alternative solution to over-fitting due to over-parameterization is the Bayesian approach of [23], where Bayesian inference is performed in a dynamic BEKK(1, 1) model. Other Bayesian approaches include the non-parametric generalized Wishart process [22, 8]. In these works $\Sigma(t)$ is modeled by a generalized Wishart process, i.e.

$$\Sigma(t) = \sum_{i=1}^{\nu} L\, u_i(t)\, u_i(t)^\top L^\top, \qquad (11)$$

where the $u_{id}(\cdot)$ are distributed as independent GPs.

5 Experiments

We evaluate the proposed Gaussian process conditional copula models (GPCC) on a one-step-ahead prediction task with synthetic data and financial time series. We use time as the conditioning variable and consider three parametric copula families: Gaussian (GPCC-G), Student's t (GPCC-T) and symmetrized Joe Clayton (GPCC-SJC). The parameters of these copulas are listed in Table 1, along with the transformations used to model them. Figure 1 shows the densities of these three parametric copula models.
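The transformations in Table 1 map an unconstrained GP value to the valid range of each copula parameter. A minimal sketch of these maps in plain Python follows. The function names are hypothetical and not taken from the authors' code. The ranges they produce sit slightly inside the theoretical ones:
- the correlation-type parameter lies in (-0.99, 0.99);
- the degrees of freedom lie in (1, 1 + 10^6);
- the SJC tail parameters lie in (0.01, 0.99).

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Gaussian cdf


def correlation_link(f):
    # tau = 0.99 * (2 * Phi[f] - 1): correlation parameter of GPCC-G and GPCC-T
    return 0.99 * (2.0 * PHI(f) - 1.0)


def dof_link(g):
    # nu = 1 + 1e6 * Phi[g]: degrees of freedom of GPCC-T
    return 1.0 + 1e6 * PHI(g)


def tail_link(g):
    # 0.01 + 0.98 * Phi[g]: upper or lower dependence parameter of GPCC-SJC
    return 0.01 + 0.98 * PHI(g)


def kendall_to_linear(tau):
    # Footnote 1: the linear correlation is rho = sin(0.5 * tau * pi)
    return math.sin(0.5 * tau * math.pi)


for latent in (-3.0, 0.0, 3.0):
    print(latent, correlation_link(latent), dof_link(latent), tail_link(latent))
```

For example, a latent value of 0 maps to $\tau = 0$, $\nu = 500001$ and a tail parameter of 0.5. Large negative latent values push $\nu$ towards 1, i.e. towards heavy tails.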
Our code and data are publicly available at http://jmhl.org.

¹The parameterization used in this paper is related by $\rho = \sin(0.5\tau\pi)$.

Copula | Parameters | Transformation | Synthetic parameter function
Gaussian | correlation, $\tau$ | $0.99(2\Phi[f(t)] - 1)$ | $\tau(t) = 0.3 + 0.2\cos(t\pi/125)$
Student's t | correlation, $\tau$ | $0.99(2\Phi[f(t)] - 1)$ | $\tau(t) = 0.3 + 0.2\cos(t\pi/125)$
Student's t | degrees of freedom, $\nu$ | $1 + 10^6\,\Phi[g(t)]$ | $\nu(t) = 1 + 2(1 + \cos(t\pi/250))$
SJC | upper dependence, $\tau^U$ | $0.01 + 0.98\,\Phi[g(t)]$ | $\tau^U(t) = 0.1 + 0.3(1 + \cos(t\pi/125))$
SJC | lower dependence, $\tau^L$ | $0.01 + 0.98\,\Phi[g(t)]$ | $\tau^L(t) = 0.1 + 0.3(1 + \cos(t\pi/125 + \pi/2))$

Table 1: Copula parameters, modeling formulae and parameter functions used to generate synthetic data. $\Phi$ is the standard Gaussian cumulative distribution function; $f$ and $g$ are GPs.

The three variants of GPCC were compared against three dynamic copula methods and three constant copula models. The three dynamic methods are the HMM-based model, TVC and DSJCC, all introduced in Section 4. The three constant copula models use Gaussian, Student's t and SJC copulas with parameter values that do not change with time (CONST-G, CONST-T and CONST-SJC). We perform a one-step-ahead rolling-window prediction task on bivariate time series $\{(u_t, v_t)\}$. Each model is trained on the first $n_W$ data points, and the predictive log-likelihood of the $(n_W + 1)$-th data point is recorded, where $n_W = 1000$. This is then repeated, shifting the training and test windows forward by one data point.
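The rolling-window protocol can be sketched as follows. This illustrates the protocol under stated assumptions and is not the experimental code. The model is a hypothetical stand-in: a constant Gaussian copula whose correlation is the sample correlation of the normal scores in each window. The window length is a parameter rather than the $n_W = 1000$ used in the experiments. Any of the compared methods could be plugged in through the `fit` and `logpdf` arguments.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_copula_logpdf(u, v, rho):
    # log c(u, v | rho) for the bivariate Gaussian copula
    a, b = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    quad = (2.0 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2)) / (2.0 * (1.0 - rho ** 2))
    return quad - 0.5 * math.log(1.0 - rho ** 2)


def fit_constant_gaussian(window):
    # Hypothetical static model: correlation of the normal scores in the window
    a = [STD_NORMAL.inv_cdf(u) for u, _ in window]
    b = [STD_NORMAL.inv_cdf(v) for _, v in window]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def rolling_window_score(uv, n_w, fit, logpdf):
    # Average one-step-ahead predictive log-likelihood: fit on uv[t - n_w:t],
    # score the next point uv[t], then shift the window forward by one point.
    scores = []
    for t in range(n_w, len(uv)):
        params = fit(uv[t - n_w:t])
        u, v = uv[t]
        scores.append(logpdf(u, v, params))
    return sum(scores) / len(scores)


rng = random.Random(0)
toy = [(rng.uniform(0.001, 0.999), rng.uniform(0.001, 0.999)) for _ in range(300)]
print(rolling_window_score(toy, 100, fit_constant_gaussian, gaussian_copula_logpdf))
```

The toy pairs are independent, so the score should be close to 0, which is the log-density of the independence copula.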
The methods are then compared by average predictive log-likelihood. This is an appropriate performance measure for copula estimation because copulas are probability distributions.

5.1 Synthetic Data

We generated three synthetic datasets of length 5001 from copula models (Gaussian, Student's t, SJC) whose parameters vary as periodic functions of time, as specified in Table 1. Table 2 reports the average predictive log-likelihood of each method on each synthetic time series. The results of the best performing method on each synthetic time series are shown in bold. The results of any other method are underlined when their differences from the best performing method are not statistically significant according to a paired t test at $\alpha = 0.05$.

GPCC-T and GPCC-SJC obtain the best results on the Student's t and SJC time series, respectively. However, HMM is the best performing method on the Gaussian time series. This technique successfully captures the two regimes of low and high correlation that correspond to the peaks and troughs of the sinusoid mapping time $t$ to correlation $\tau$. The proposed methods GPCC-[G,T,SJC] are more flexible, and hence less efficient than HMM, in this particular problem. However, HMM performs significantly worse on the Student's t and SJC time series. There, the copula parameter functions have different periods, which a two-state model cannot capture. Figure 2 shows how GPCC-T successfully tracks $\tau(t)$ and $\nu(t)$ in the Student's t time series. The plots display the mean (red) and confidence bands (orange, 0.1 and 0.9 quantiles) of the predictive distribution of $\tau(t)$ and $\nu(t)$, as well as the ground truth values (blue).
Finally, Table 2 also shows that the static copula methods CONST-[G,T,SJC] are usually outperformed by all the dynamic techniques GPCC-[G,T,SJC], DSJCC, TVC and HMM.

5.2 Foreign Exchange Time Series

We evaluated each method on the daily logarithmic returns of the nine currencies shown in Table 3, all priced with respect to the U.S. dollar. The data range from 02-01-1990 to 15-01-2013, a total of 6011 observations. We evaluated the methods on eight bivariate time series, pairing each currency with the Swiss franc (CHF). CHF is known to be a safe-haven currency, meaning that investors flock to it during times of uncertainty [16]. Consequently, we expect the correlations between CHF and other currencies to vary strongly across time in response to changes in financial conditions.

We first process our data using an asymmetric AR(1)-GARCH(1,1) process with non-parametric innovations [9] to estimate the univariate marginal cdfs at all time points. We train this GARCH model on $n_W = 2016$ data points and then predict the cdf of the next data point. Subsequent cdfs are predicted by shifting the training window forward by one data point, in a rolling-window methodology. The cdf estimates are used to transform the raw logarithmic returns $(x_t, y_t)$ into a pseudo-sample $(u_t, v_t)$ of the underlying copula, as described in Section 2. Note that any method for predicting univariate cdfs could have been used to produce pseudo-samples from the copula. We then perform the rolling-window predictive likelihood experiment on the transformed data. The results are shown in Table 4. Overall the best technique is GPCC-T, followed by GPCC-G. The dynamic copula methods GPCC-[G,T,SJC], HMM and TVC outperform the static methods CONST-[G,T,SJC] in all the analyzed series.
The dynamic method DSJCC occasionally performed poorly: it was worse than all three static methods on 3 of the 8 series (AUD, CAD and NZD).

Method | Gaussian | Student | SJC
GPCC-G | 0.3347 | 0.3879 | 0.2513
GPCC-T | 0.3397 | 0.4656 | 0.2610
GPCC-SJC | 0.3355 | 0.4132 | 0.2771
HMM | 0.3555 | 0.4422 | 0.2547
TVC | 0.3277 | 0.4273 | 0.2534
DSJCC | 0.3329 | 0.4096 | 0.2612
CONST-G | 0.3129 | 0.3201 | 0.2339
CONST-T | 0.3178 | 0.4218 | 0.2499
CONST-SJC | 0.3002 | 0.3812 | 0.2502

Table 2: Avg. test log-likelihood of each method on each synthetic time series.

Figure 2: Predictions made by GPCC-T for $\nu(t)$ and $\tau(t)$ on the synthetic time series sampled from a Student's t copula.

Code | Currency Name
CHF | Swiss Franc
AUD | Australian Dollar
CAD | Canadian Dollar
JPY | Japanese Yen
NOK | Norwegian Krone
SEK | Swedish Krona
EUR | Euro
NZD | New Zealand Dollar
GBP | British Pound

Table 3: Currencies.

Method | AUD | CAD | JPY | NOK | SEK | EUR | GBP | NZD
GPCC-G | 0.1260 | 0.0562 | 0.1221 | 0.4106 | 0.4132 | 0.8842 | 0.2487 | 0.1045
GPCC-T | 0.1319 | 0.0589 | 0.1201 | 0.4161 | 0.4192 | 0.8995 | 0.2514 | 0.1079
GPCC-SJC | 0.1168 | 0.0469 | 0.1064 | 0.3941 | 0.3905 | 0.8287 | 0.2404 | 0.0921
HMM | 0.1164 | 0.0478 | 0.1009 | 0.4069 | 0.3955 | 0.8700 | 0.2374 | 0.0926
TVC | 0.1181 | 0.0524 | 0.1038 | 0.3930 | 0.3878 | 0.7855 | 0.2301 | 0.0974
DSJCC | 0.0798 | 0.0259 | 0.0891 | 0.3994 | 0.3937 | 0.8335 | 0.2320 | 0.0560
CONST-G | 0.0925 | 0.0398 | 0.0771 | 0.3413 | 0.3426 | 0.6803 | 0.2085 | 0.0745
CONST-T | 0.1078 | 0.0463 | 0.0898 | 0.3765 | 0.3760 | 0.7732 | 0.2231 | 0.0875
CONST-SJC | 0.1000 | 0.0425 | 0.0852 | 0.3536 | 0.3544 | 0.7113 | 0.2165 | 0.0796

Table 4: Avg. test log-likelihood of each method on the currency data.

Figure 3: Left and middle, predictions made by GPCC-T for $\nu(t)$ and $\tau(t)$ on the time series EUR-CHF when trained on data from 10-10-2006 to 09-08-2010. There is a significant reduction in $\nu(t)$ at the onset of the 2008-2012 global recession. Right, predictions made by GPCC-SJC for $\tau^U(t)$ and $\tau^L(t)$ when trained on the same time-series data.
The predictions for $\tau^L(t)$ are much more erratic than those for $\tau^U(t)$.

The proposed method GPCC-T can capture changes across time in the parameters of the Student's t copula. The left and middle plots in Figure 3 show predictions for $\nu(t)$ and $\tau(t)$ generated by GPCC-T. In the left plot, we observe a reduction in $\nu(t)$ at the onset of the 2008-2012 global recession, indicating that the return series became more prone to outliers. The plot for $\tau(t)$ (middle) also shows large changes across time. In particular, we observe large drops in the dependence level between EUR-USD and CHF-USD during the fall of 2008 (at the onset of the global recession) and the summer of 2010 (corresponding to the worsening European sovereign debt crisis).

For comparison, the right plot of Figure 3 shows the predictions for $\tau^L(t)$ and $\tau^U(t)$ made by GPCC-SJC. Here the prediction for $\tau^U(t)$ is similar to the one made by GPCC-T for $\tau(t)$, but the prediction for $\tau^L(t)$ is much noisier and more erratic. This suggests that GPCC-SJC is less robust than GPCC-T. All the copula densities in Figure 1 take large values near the points (0, 0) and (1, 1), i.e. positive correlation. However, the Student's t copula is the only one of these three copulas that can take high values near the points (0, 1) and (1, 0), i.e. negative correlation.
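The claim about the corners (0, 1) and (1, 0) can be checked numerically. The sketch below (plain Python, hypothetical helper names) compares the Gaussian copula density with the Student's t copula density for $\nu = 1$, the setting of Figure 1 (middle). We choose $\nu = 1$ because its marginal is the Cauchy distribution, whose quantile function $\tan(\pi(u - 1/2))$ has a closed form, so no special-function library is needed. Both copulas use $\tau = 0.3$, converted to $\rho = \sin(0.5\tau\pi)$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_copula_density(u, v, rho):
    # Bivariate normal pdf divided by its two standard normal marginal pdfs
    a, b = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    exponent = (2.0 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2)) / (2.0 * (1.0 - rho ** 2))
    return math.exp(exponent) / math.sqrt(1.0 - rho ** 2)


def student_t1_copula_density(u, v, rho):
    # Student's t copula with nu = 1: bivariate t pdf divided by the product of
    # its Cauchy marginal pdfs, evaluated at the Cauchy quantiles of u and v.
    x, y = math.tan(math.pi * (u - 0.5)), math.tan(math.pi * (v - 0.5))
    q = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    joint = (1.0 + q) ** -1.5 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    marginals = (1.0 / (math.pi * (1.0 + x * x))) * (1.0 / (math.pi * (1.0 + y * y)))
    return joint / marginals


rho = math.sin(0.5 * 0.3 * math.pi)
for u, v in [(0.99, 0.99), (0.01, 0.99), (0.99, 0.01)]:
    print((u, v), round(gaussian_copula_density(u, v, rho), 4),
          round(student_t1_copula_density(u, v, rho), 4))
```

At (0.01, 0.99) the Gaussian copula density is close to zero, while the $\nu = 1$ Student's t density is orders of magnitude larger. As $\nu \to \infty$ the Student's t copula tends to the Gaussian one, and this extra mass in the off-diagonal corners disappears.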
The left plot of Figure 3 shows how $\nu(t)$ takes very low values at the end of the time period, increasing the robustness of GPCC-T to negatively correlated outliers.

5.3 Equity Time Series

As a further comparison, we evaluated each method on the logarithmic returns of 8 equity pairs. The data come from the same date range and are processed using the same AR(1)-GARCH(1,1) model discussed previously. The equities were chosen to include pairs with both high correlation (e.g. RBS and BARC) and low correlation (e.g. AXP and BA). The results are shown in Table 5; again the best technique is GPCC-T, followed by GPCC-G.

Method | AXP-BA | CNW-CSX | ED-EIX | HD-HON | HPQ-IBM | BARC-HSBC | RBS-BARC | RBS-HSBC
GPCC-G | 0.1247 | 0.1133 | 0.1450 | 0.2072 | 0.1536 | 0.2424 | 0.3401 | 0.1860
GPCC-T | 0.1289 | 0.1187 | 0.1499 | 0.2059 | 0.1591 | 0.2486 | 0.3501 | 0.1882
GPCC-SJC | 0.1210 | 0.1095 | 0.1399 | 0.1935 | 0.1462 | 0.2342 | 0.3234 | 0.1753
HMM | 0.1260 | 0.1119 | 0.1458 | 0.2040 | 0.1511 | 0.2486 | 0.3414 | 0.1818
TVC | 0.1251 | 0.1119 | 0.1459 | 0.2011 | 0.1511 | 0.2449 | 0.3336 | 0.1823
DSJCC | 0.0935 | 0.0750 | 0.1196 | 0.1721 | 0.1163 | 0.2188 | 0.3051 | 0.1582
CONST-G | 0.1162 | 0.1027 | 0.1288 | 0.1962 | 0.1325 | 0.2307 | 0.2979 | 0.1663
CONST-T | 0.1239 | 0.1091 | 0.1408 | 0.2007 | 0.1481 | 0.2426 | 0.3301 | 0.1775
CONST-SJC | 0.1175 | 0.1046 | 0.1307 | 0.1891 | 0.1373 | 0.2268 | 0.2992 | 0.1639

Table 5: Average test log-likelihood of each method on each pair of stocks.

Figure 4: Prediction for $\nu(t)$ on RBS-BARC.

Figure 4 shows predictions for $\nu(t)$ generated by GPCC-T. We observe low values of $\nu$ during 2010, suggesting that a Gaussian copula would be a bad fit to the data. Indeed, GPCC-G performs significantly worse than GPCC-T on this equity pair.

6 Conclusions and Future Work

We have proposed an inference scheme to fit a conditional copula model to multivariate data where the copula is specified by multiple parameters. The copula parameters are modeled as unknown non-linear functions of arbitrary conditioning variables.
We evaluated this framework by estimating time-varying copula parameters for bivariate financial time series. Our method consistently outperforms static copula models and other dynamic copula models.

In this initial investigation we have focused on bivariate copulas. Higher-dimensional copulas are typically constructed using bivariate copulas as building blocks [2, 12]. Our framework could be applied to these constructions, and our empirical predictive performance gains will likely transfer to this setting. Evaluating the effectiveness of this approach against other models of multivariate covariance would be a profitable area of empirical research.

One could also extend the analysis presented here by including additional conditioning variables besides time. For example, including a prediction of univariate volatility as a conditioning variable would allow copula parameters to change in response to changing volatility. This would pose inference challenges as the input dimension of the GPs increases, but could yield richer models.

Acknowledgements

We thank David López-Paz and Andrew Gordon Wilson for interesting discussions. José Miguel Hernández-Lobato acknowledges support from Infosys Labs, Infosys Limited. Daniel Hernández-Lobato acknowledges support from the Spanish Dirección General de Investigación, project ALLS (TIN2010-21575-C02-02).

References

[1] L. Bauwens, S. Laurent, and J. V. K. Rombouts. Multivariate GARCH models: a survey. Journal of Applied Econometrics, 21(1):79–109, 2006.

[2] T. Bedford and R. M. Cooke. Probability density decomposition for conditionally dependent random variables modeled by vines. Annals of Mathematics and Artificial Intelligence, 32(1-4):245–268, 2001.

[3] C. M. Bishop.
Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2007.\n[4] T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307\u2013327, 1986.\n[5] T. Bollerslev, R. F. Engle, and J. M. Wooldridge. A capital asset pricing model with time-varying covariances. The Journal of Political Economy, pages 116\u2013131, 1988.\n[6] G. Elidan. Copulas and machine learning. In Invited survey to appear in the proceedings of the Copulae in Mathematical and Quantitative Finance workshop, 2012.\n[7] R. F. Engle and K. F. Kroner. Multivariate simultaneous generalized ARCH. Econometric Theory, 11(1):122\u2013150, 1995.\n[8] E. B. Fox and D. B. Dunson. Bayesian nonparametric covariance regression. arXiv:1101.2017, 2011.\n[9] J. M. Hern\u00e1ndez-Lobato, D. Hern\u00e1ndez-Lobato, and A. Su\u00e1rez. GARCH processes with non-parametric innovations for market risk estimation. In Artificial Neural Networks ICANN 2007, volume 4669 of Lecture Notes in Computer Science, pages 718\u2013727. Springer Berlin Heidelberg, 2007.\n[10] H. Joe. Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis, 94(2):401\u2013419, 2005.\n[11] E. Jondeau and M. Rockinger. The Copula-GARCH model of conditional dependencies: An international stock market application. Journal of International Money and Finance, 25(5):827\u2013853, 2006.\n[12] D. Lopez-Paz, J. M. Hern\u00e1ndez-Lobato, and Z. Ghahramani. Gaussian process vine copulas for multivariate dependence. In S. Dasgupta and D. McAllester, editors, JMLR W&CP 28(2): Proceedings of The 30th International Conference on Machine Learning, pages 10\u201318. JMLR, 2013.\n[13] T. P. Minka. Expectation Propagation for approximate Bayesian inference.
In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 362\u2013369, 2001.\n[14] A. Naish-Guzman and S. Holden. The generalized FITC approximation. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1057\u20131064. MIT Press, Cambridge, MA, 2008.\n[15] A. J. Patton. Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527\u2013556, 2006.\n[16] A. Ranaldo and P. S\u00f6derlind. Safe haven currencies. Review of Finance, 14(3):385\u2013407, 2010.\n[17] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.\n[18] A. Sklar. Fonctions de r\u00e9partition \u00e0 n dimensions et leurs marges. Publ. Inst. Statis. Univ. Paris, 8(1):229\u2013231, 1959.\n[19] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Y. Weiss, B. Sch\u00f6lkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1257\u20131264. MIT Press, Cambridge, MA, 2006.\n[20] Y. K. Tse and A. K. C. Tsui. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics, 20(3):351\u2013362, 2002.\n[21] M. A. J. van Gerven, B. Cseke, F. P. de Lange, and T. Heskes. Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior. NeuroImage, 50(1):150\u2013161, 2010.\n[22] A. G. Wilson and Z. Ghahramani. Generalised Wishart processes. In F. Cozman and A. Pfeffer, editors, Proceedings of the Twenty-Seventh Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), Barcelona, Spain, 2011. AUAI Press.\n[23] Y. Wu, J. M. Hern\u00e1ndez-Lobato, and Z. Ghahramani. Dynamic covariance models for multivariate financial time series. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 558\u2013566. JMLR Workshop and Conference Proceedings, 2013.", "award": [], "sourceid": 876, "authors": [{"given_name": "Jos\u00e9 Miguel", "family_name": "Hern\u00e1ndez-Lobato", "institution": "University of Cambridge"}, {"given_name": "James", "family_name": "Lloyd", "institution": "University of Cambridge"}, {"given_name": "Daniel", "family_name": "Hern\u00e1ndez-Lobato", "institution": "Universidad Aut\u00f3noma de Madrid"}]}