{"title": "Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.", "book": "Advances in Neural Information Processing Systems", "page_first": 2535, "page_last": 2543, "abstract": "Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these interactions when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal we analyze the properties of the kernel cross-spectral density operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvement in terms of detection errors as well as the suitability of this approach for testing dependency in complex dynamical systems. Finally, we use this approach to characterize complex interactions in electrophysiological neural time series.", "full_text": "Statistical analysis of coupled time series with Kernel\n\nCross-Spectral Density operators.\n\nMPI for Intelligent Systems and MPI for Biological Cybernetics, T\u00a8ubingen, Germany\n\nmichel.besserve@tuebingen.mpg.de\n\nMichel Besserve\n\nNikos K. Logothetis\n\nMPI for Biological Cybernetics, T\u00a8ubingen\n\nnikos.logothetis@tuebingen.mpg.de\n\nBernhard Sch\u00a8olkopf\n\nMPI for Intelligent Systems, T\u00a8ubingen\n\nbs@tuebingen.mpg.de\n\nAbstract\n\nMany applications require the analysis of complex interactions between time se-\nries. These interactions can be non-linear and involve vector valued as well as\ncomplex data structures such as graphs or strings. 
Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series.

1 Introduction

Complex dynamical systems can often be observed by monitoring time series of one or more variables. Finding and characterizing dependencies between several of these time series is key to understanding the underlying mechanisms of these systems. This problem can be addressed easily in linear systems [4]; non-linear systems, however, are much more challenging. Higher order statistics can provide helpful tools in specific contexts [15] and have been used extensively in system identification, causal inference and blind source separation (see for example [10, 13, 5]), but it is difficult to derive a general approach with solid theoretical results accounting for a broad range of interactions. In particular, studying the relationships between time series of arbitrary objects such as texts or graphs within a general framework is largely unaddressed.

On the other hand, the dependency between independent identically distributed (i.i.d.) samples of arbitrary objects can be studied elegantly in the framework of positive definite kernels [19]. 
It relies on defining cross-covariance operators between variables mapped implicitly to Reproducing Kernel Hilbert Spaces (RKHS) [7]. It has been shown that, when a characteristic kernel is used for the mapping [9], the properties of RKHS operators are related to statistical independence between the input variables and allow testing for it in a principled way with the Hilbert-Schmidt Independence Criterion (HSIC) test [11]. However, the suitability of this test relies heavily on the assumption that i.i.d. samples of the random variables are used. This assumption is obviously violated in any non-trivial setting involving time series, and as a consequence using HSIC in this context can lead to incorrect conclusions. Zhang et al. established a framework in the context of Markov chains [22], showing that a structured HSIC test still provides good asymptotic properties for absolutely regular processes. However, this methodology has not been assessed extensively on empirical time series. Moreover, beyond the detection of interactions, it is important to be able to characterize the nature of the coupling between time series. It was recently suggested that generalizing the concept of cross-spectral density to RKHSs could help formulate non-linear dependency measures for time series [2]. However, no statistical assessment of this measure has been established. In this paper, after recalling the concept of the kernel spectral density operator, we characterize its statistical properties. In particular, we define independence tests based on this concept, as well as a similarity measure to compare different types of couplings. We use these tests in Section 4 to compute the statistical dependencies between simulated time series of various types of objects, as well as recordings of neural activity in the visual cortex of non-human primates. 
We show that our technique reliably detects complex interactions and provides a characterization of these interactions in the frequency domain.

2 Background and notations

Random variables in Reproducing Kernel Hilbert Spaces

Let X1 and X2 be two (possibly non-vectorial) input domains. Let k1(., .) : X1 × X1 → C and k2(., .) : X2 × X2 → C be two positive definite kernels, associated to two separable Hilbert spaces of functions, H1 and H2 respectively. For i ∈ {1, 2}, they define a canonical mapping from x ∈ Xi to x = ki(., x) ∈ Hi, such that for all f ∈ Hi, f(x) = ⟨f, x⟩_Hi (see [19] for more details). In the same way, this mapping can be extended to random variables, so that the random variable Xi ∈ Xi is mapped to the random element Xi ∈ Hi. Statistical objects extending the classical mean and covariance to random variables in the RKHS are defined as follows:

- the Mean Element (see [1, 3]): µi = E[Xi],
- the Cross-covariance operator (see [6]): Cij = Cov[Xi, Xj] = E[Xi ⊗ Xj*] − µi ⊗ µj*,

where we use the tensor product notation f ⊗ g* to represent the rank-one operator defined by f ⊗ g* = ⟨g, .⟩ f (following [3]). As a consequence, the cross-covariance can be seen as an operator in L(Hj, Hi), the Hilbert space of Hilbert-Schmidt linear operators from Hj to Hi (isomorphic to Hi ⊗ Hj*). Interestingly, the link between Cij and covariance in the input domains is given by the Hilbert-Schmidt scalar product:

⟨Cij, fi ⊗ fj*⟩_HS = Cov[fi(Xi), fj(Xj)], for all (fi, fj) ∈ Hi × Hj.

Moreover, the Hilbert-Schmidt norm of the operator in this space has been proved to be a measure of independence between two random variables, whenever the kernels are characteristic [11]. 
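For i.i.d. samples, the empirical version of this squared Hilbert-Schmidt norm is the biased HSIC statistic of [11]. A minimal numpy sketch (function names are ours, not from the paper):

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    # Gaussian RBF Gram matrix of a 1-D sample.
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def hsic_biased(K1, K2):
    """Biased empirical estimate of the squared HS norm of the
    cross-covariance operator: (1/n^2) Tr(K1 H K2 H)."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K1 @ H @ K2 @ H) / n**2
```

For characteristic kernels the population quantity vanishes exactly under independence, so values near zero are consistent with independent samples.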
Extensions of this result have been provided in [22] for Markov chains. If the time series are assumed to be k-th order Markovian, then the results of the classical HSIC can be generalized to a structured HSIC using universal kernels based on the state vectors (x1(t), . . . , x1(t + k), x2(t), . . . , x2(t + k)). The statistical performance of this methodology has not been studied extensively, in particular its sensitivity to the dimension of the state vector. The following sections propose an alternative methodology.

Kernel Cross-Spectral Density operator

Consider a bivariate discrete time random process on X1 × X2: {(X1(t), X2(t))}_{t∈Z}. We assume stationarity of the process and thus use the following translation-invariant notations for the mean elements and cross-covariance operators:

E Xi(t) = µi,  Cov[Xi(t + τ), Xj(t)] = Cij(τ).

The cross-spectral density operator was introduced for stationary signals in [2] based on second order cumulants. Under mild assumptions, it is a Hilbert-Schmidt operator defined for all normalized frequencies ν ∈ [0, 1] as:

S12(ν) = Σ_{k∈Z} C12(k) exp(−2πiνk) = Σ_{k∈Z} C12(k) z^{−k}, for z = e^{2πiν}.

This object summarizes all the cross-spectral properties between the families of processes {f(X1)}_{f∈H1} and {g(X2)}_{g∈H2}, in the sense that the cross-spectrum between f(X1) and g(X2) is given by S^{f,g}_12(ν) = ⟨f, S12(ν)g⟩. We therefore refer to this object as the Kernel Cross-Spectral Density operator (KCSD).

3 Statistical properties of KCSD

Measuring independence with the KCSD

One interesting characteristic of the KCSD is given by the following theorem [2]:

Theorem 1. Assume the kernels k1 and k2 are characteristic [9]. The processes X1 and X2 are pairwise independent (i.e. 
for all integers t and t', X1(t) and X2(t') are independent) if and only if ‖S12(ν)‖_HS = 0 for all ν ∈ [0, 1].

While this theorem states that the KCSD can be used to test pairwise independence between time series, it does not in general imply independence between arbitrary sets of random variables taken from each time series. However, if the joint probability distribution of the time series is encoded by a Directed Acyclic Graph (DAG), the following proposition shows that independence in this broader sense is achieved under mild assumptions.

Proposition 2. If the joint probability distribution of the time series is encoded by a DAG with no confounder, under the Markov property and faithfulness assumptions, pairwise independence between the time series implies the mutual independence relationship {X1(t)}_{t∈Z} ⊥⊥ {X2(t)}_{t∈Z}.

Proof. The proof uses the fact that the faithfulness and Markov property assumptions provide an equivalence between the independence of two sets of random variables and the d-separation of the corresponding sets of nodes in the DAG (see [17]). We start by assuming pairwise independence between the time series. For arbitrary times t and t', assume the DAG contains an arrow linking the nodes X1(t) and X2(t'). This is an unblocked path linking these two nodes; thus they are not d-separated. As a consequence of faithfulness, X1(t) and X2(t') are not independent. 
Since this contradicts our initial assumption, there cannot exist any arrow between X1(t) and X2(t'). Since this holds for all t and t', there is no path linking the nodes of the two time series, and we have {X1(t)}_{t∈Z} ⊥⊥ {X2(t)}_{t∈Z} according to the Markov property (any joint probability distribution on the nodes factorizes in two terms, one for each time series).

As a consequence, the use of the KCSD to test for independence is justified under the widely used faithfulness and Markov assumptions of graphical models. As a comparison, the structured HSIC proposed in [22] is theoretically able to capture all dependencies within a range of k samples by assuming k-th order Markovian time series.

Fourth order kernel cumulant operator

The statistical properties of the KCSD require assumptions regarding the higher order statistics of the time series. Analogously to covariance, higher order statistics can be generalized as operators in (tensor products of) RKHSs. An important example in our setting is the joint quadricumulant (4th order cumulant) (see [4]). We skip the general expression of this cumulant to focus on its simplified form for four centered scalar random variables:

κ(X1, X2, X3, X4) = E[X1X2X3X4] − E[X1X2]E[X3X4] − E[X1X3]E[X2X4] − E[X1X4]E[X2X3].   (1)

This object can be generalized to the case of random variables mapped in two RKHSs. The quadricumulant operator K1234 is a linear operator in the Hilbert space L(H1 ⊗ H2*, H1 ⊗ H2*), such that

κ(f1(X1), f2(X2), f3(X3), f4(X4)) = ⟨f1 ⊗ f2*, K1234 f3 ⊗ f4*⟩, for arbitrary elements fi.

The properties of this operator will be useful in the next sections due to the following lemma.

Lemma 3. [Property of the tensor quadricumulant] Let X^c_1, X^c_3 be centered random elements in the Hilbert space H1 and X^c_2, X^c_4 centered random elements in H2 (the centered random element is defined by X^c_j = Xj − µj); then

E[⟨X^c_1, X^c_3⟩_H1 ⟨X^c_2, X^c_4⟩_H2] = Tr K1234 + ⟨C1,2, C3,4⟩ + Tr C1,3 Tr C2,4 + ⟨C1,4, C3,2⟩.

In the case of two jointly stationary time series, we define the translation-invariant quadricumulant between the two stationary time series as:

K12(τ1, τ2, τ3) = K1234(X1(t + τ1), X2(t + τ2), X1(t + τ3), X2(t)).

Estimation with the Kernel Periodogram

In the following, we address the problem of estimating the properties of cross-spectral density operators from finite samples. The idea is to select samples from a time series with a tapering window function w : R → R whose support is included in [0, 1]. By scaling this window according to wT(k) = w(k/T), and multiplying it with the time series, T samples of the sequence can be selected. The windowed periodogram estimate of the KCSD operator for T successive samples of the time series is

P^T_12(ν) = (1/(T ‖w‖²)) F_T[X^c_1](ν) ⊗ F_T[X^c_2](ν)*,

with X^c_i(k) = Xi(k) − µi and ‖w‖² = ∫₀¹ w²(t) dt, where F_T[X^c_1](ν) = Σ_{k=1}^{T} wT(k) X^c_1(k) z^{−k}, for z = e^{2πiν}, is the windowed Fourier transform of the time series in the RKHS. Properties of the windowed Fourier transform are related to the regularity of the tapering window. In particular, we will choose a tapering window of bounded variation. In such a case, the following lemma holds (see supplementary material for the proof).

Lemma 4. 
[A property of bounded variation functions] Let w be a bounded function of bounded variation; then for all k,

|Σ_{t=−∞}^{+∞} wT(t + k) wT(t) − Σ_{t=−∞}^{+∞} wT(t)²| ≤ C |k|.

Using this assumption, the above periodogram estimate is asymptotically unbiased, as shown in the following theorem.

Theorem 5. Let w be a bounded function of bounded variation. If Σ_{k∈Z} |k| ‖C12(k)‖_HS < +∞, Σ_{k∈Z} |k| Tr(Cii(k)) < +∞ and Σ_{(k,i,j)∈Z³} |Tr[K12(k, i, j)]| < +∞, then

lim_{T→+∞} E P^T_12(ν) = S12(ν), for ν ≢ 0 (mod 1/2).

Proof. By definition,

P^T_12(z) = (1/(T ‖w‖²)) (Σ_{k∈Z} wT(k) X^c_1(k) z^{−k}) ⊗ (Σ_{n∈Z} wT(n) X^c_2(n) z^{−n})*
          = (1/(T ‖w‖²)) Σ_{k∈Z} Σ_{n∈Z} z^{n−k} wT(k) wT(n) X^c_1(k) ⊗ X^c_2(n)*
          = (1/(T ‖w‖²)) Σ_{δ∈Z} z^{−δ} Σ_{n∈Z} wT(n + δ) wT(n) X^c_1(n + δ) ⊗ X^c_2(n)*, using δ = k − n.

Thus, taking expectations and using Lemma 4,

E P^T_12(z) = (1/(T ‖w‖²)) Σ_{δ∈Z} z^{−δ} (Σ_{n∈Z} wT(n)² + O(|δ|)) C12(δ)
            = (1/‖w‖²) (Σ_{n∈Z} wT(n)²/T) Σ_{δ∈Z} z^{−δ} C12(δ) + (1/T) O(Σ_{δ∈Z} |δ| ‖C12(δ)‖_HS) → S12(ν) as T → +∞.

However, the squared Hilbert-Schmidt norm of P^T_12(ν) is an asymptotically biased estimator of the population KCSD squared norm, according to the following theorem.

Theorem 6. Under the assumptions of Theorem 5, for ν ≢ 0 (mod 1/2),

lim_{T→+∞} E ‖P^T_12(ν)‖²_HS = ‖S12(ν)‖²_HS + Tr(S11(ν)) Tr(S22(ν)).

The proof of Theorem 6 is based on the decomposition in Lemma 3 and is provided in the supplementary information.

This estimate requires specific bias estimation techniques to develop an independence test; we will call it the biased estimate of the KCSD squared norm. Having the KCSD defined in a Hilbert space also enables us to define a similarity between two KCSD operators, so that it is possible to compare quantitatively whether different dynamical systems have similar couplings. The following theorem shows how periodograms enable us to estimate the scalar product between two KCSD operators, which reflects their similarity.

Theorem 7. Assume the assumptions of Theorem 5 hold for two independent samples of bivariate time series {(X1(t), X2(t))}_{t∈Z} and {(X3(t), X4(t))}_{t∈Z}, mapped with the same couple of reproducing kernels. Then

lim_{T→+∞} E ⟨P^T_12(ν), P^T_34(ν)⟩_HS = ⟨S12(ν), S34(ν)⟩_HS, for ν ≢ 0 (mod 1/2).

The proof of Theorem 7 is similar to that of Theorem 6, provided in the supplementary information. Interestingly, this estimate of the scalar product between KCSD operators is unbiased. This comes from the assumption that the two bivariate series are independent. This provides a new opportunity to estimate the Hilbert-Schmidt norm as well, in case two independent samples of the same bivariate series are available.

Corollary 8. 
Assume the assumptions of Theorem 5 hold for the bivariate time series {(X1(t), X2(t))}_{t∈Z}, and let {(X̃1(t), X̃2(t))}_{t∈Z} be an independent copy of the same time series, providing the periodogram estimates P^T_12(ν) and P̃^T_12(ν), respectively. Then

lim_{T→+∞} E ⟨P^T_12(ν), P̃^T_12(ν)⟩_HS = ‖S12(ν)‖²_HS, for ν ≢ 0 (mod 1/2).

In many experimental settings, such as in neuroscience, it is possible to measure the same time series in several independent trials. In such a case, Corollary 8 states that estimating the Hilbert-Schmidt norm of the KCSD without bias is possible using two independent trials. We will call this estimate the unbiased estimate of the KCSD squared norm.

These estimates can be computed efficiently for T equispaced frequency samples using the fast Fourier transform of the centered kernel matrices of the two time series. In general, the choice of the kernel is a trade-off between the capacity to capture complex dependencies (a characteristic kernel being better in this respect) and the convergence rate of the estimate (simpler kernels related to lower order statistics usually require fewer samples). Related theoretical analysis can be found in [8, 12]. Unless otherwise stated, the Gaussian RBF kernel with bandwidth parameter σ, k(x, y) = exp(−‖x − y‖²/(2σ²)), will be used as a characteristic kernel for vector spaces. Let Kij denote the kernel matrix between the i-th and j-th time series (such that (Kij)_{k,l} = k(xi(k), xj(l))), W the windowing matrix (such that (W)_{k,l} = wT(k) wT(l)), and M the centering matrix M = I − 1_T 1_T^T / T; then we can define the windowed centered kernel matrices K̃ij = (M Kij M) ◦ W. 
De\ufb01ning the Discrete Fourier Transform matrix F, such that\n\u221a\n(F)k,l = exp(\u2212i2\u03c0kl/T )/\n\n\u03bd=(0,1,...,(T\u22121))/T = (cid:107)w(cid:107)\u22124 diag(cid:0)F \u02dcK13F\u22121(cid:1) \u25e6 diag(cid:0)F\u22121 \u02dcK24F(cid:1),\n\nT , the estimated scalar product is\n\n(cid:10)PT\n\n12, PT\n34\n\n(cid:11)\n\nwhich can be ef\ufb01ciently computed using the Fast Fourier Transform (\u25e6 is the Hadamard product).\nThe biased and unbiased squared norm estimates can be trivially retrieved from the above expression.\n\nShuf\ufb02ing independence tests\n\nAccording to Theorem 1, pairwise independence between time series requires the cross-spectral\ndensity operator to be zero for all frequencies. We can thus test independence by testing whether\nthe Hilbert-Schmidt norm of the operator vanishes for each frequency. We rely on Theorem 6 and\nCorollary 8 to compute biased and unbiased estimates of this norm. To achieve this, we generate\na distribution of the Hilbert-Schmidt norm statistics under the null hypothesis by cutting the time\ninterval in non-overlapping blocks and matching the blocks of each time series in pairs at random.\nDue to the central limit theorem, for a suf\ufb01ciently large number of time windows, the empirical\naverage of the statistics approaches a Gaussian distribution. We thus test whether the empirical\nmean differs from the one under the null distribution using a t-statistic. To prevent false positive\nresulting from multiple hypothesis testing, we control the Family-wise Error Rate (FWER) of the\ntests performed for each frequency. Following [16], we estimate a global maximum distribution on\nthe family of t-statistics across frequencies under the null hypothesis, and use the percentile of this\ndistribution to assess the signi\ufb01cance of the original t-statistics.\n\n5\n\n\fFigure 1: Results for the phase-amplitude coupling system. Top-left: example time course. 
Top-middle: estimate of the KCSD squared norm with a linear kernel. Top-right: estimate of the KCSD squared norm with an RBF kernel. Bottom-left: performance of the biased kcsd test as a function of the number of samples. Bottom-middle: performance of the unbiased kcsd test as a function of the number of samples. Bottom-right: rates of type I and type II errors for several independence tests.

4 Experiments

In the following, we validate the performance of our test, called kcsd, on several datasets, in the biased and unbiased cases. There is no general time series analysis tool in the literature to compare with our approach on all these datasets, so our main source of comparison will be the HSIC test of independence (assuming data is i.i.d.). This enables us to compare both approaches using the same kernels. For vector data, one can compare the performance of our approach with a linear dependency measure: we do this by implementing our test using a linear kernel (instead of an RBF kernel), and we call it linear kcsd. Finally, we use the alternative approach of structured HSIC [22] by cutting the time series into time windows (using the same approach as in our independence test) and considering each of them as a single multivariate sample. This will be called block hsic. The bandwidth of the HSIC methods is chosen proportional to the median norm of the sample points in the vector space. The p-value threshold for all independence tests will be set to 5%.

Phase amplitude coupling

We first simulate a non-linear dependency between two time series by generating two oscillations at frequencies f1 and f2, and introducing a modulation of the amplitude of the second oscillation by the phase of the first one. 
This is achieved using the following discrete time equations:

φ1(k + 1) = φ1(k) + .1 ε1(k) + 2πf1Ts,    x1(k) = cos(φ1(k)),
φ2(k + 1) = φ2(k) + .1 ε2(k) + 2πf2Ts,    x2(k) = (2 + C sin φ1(k)) cos(φ2(k)),

where the εi are i.i.d. standard normal. A simulation with f1 = 4 Hz and f2 = 20 Hz for a sampling frequency 1/Ts = 100 Hz is plotted in Figure 1 (top-left panel). For the parameters of the periodogram, we used a window length of 50 samples (.5 s). We used a Gaussian RBF kernel to compute non-linear dependencies between the two time series after standardizing each of them (dividing them by their standard deviation). The top-middle and top-right panels of Figure 1 plot the mean and standard errors of the estimate of the squared Hilbert-Schmidt norm for this system (for C = .1), for a linear and a Gaussian RBF kernel (with σ = 1) respectively. The bias of the biased estimate appears clearly in both cases at the two power peaks of the signals. In the second (unbiased) estimate, the spectrum exhibits a zero mean for all but one peak (at 4 Hz for the RBF kernel), which corresponds to the expected frequency of non-linear interaction between the time series. The observed negative values are also a direct consequence of the unbiasedness of our estimate (Corollary 8). The influence of the bandwidth parameter of the kernel was studied in the case of weakly coupled time series (C = .4). 
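As a concrete illustration, the coupled-oscillator equations above can be simulated in a few lines (a minimal sketch; the function and parameter names are ours, not from the paper's code):

```python
import numpy as np

def simulate_pac(n=2000, f1=4.0, f2=20.0, fs=100.0, C=2.0, seed=0):
    """Simulate the phase-amplitude coupled system above: the phase of a
    slow oscillation (f1) modulates the amplitude of a faster one (f2)."""
    rng = np.random.default_rng(seed)
    Ts = 1.0 / fs
    phi1 = np.zeros(n)
    phi2 = np.zeros(n)
    for k in range(n - 1):
        # Phase increments with i.i.d. Gaussian jitter, as in the equations.
        phi1[k + 1] = phi1[k] + 0.1 * rng.standard_normal() + 2 * np.pi * f1 * Ts
        phi2[k + 1] = phi2[k] + 0.1 * rng.standard_normal() + 2 * np.pi * f2 * Ts
    x1 = np.cos(phi1)
    x2 = (2 + C * np.sin(phi1)) * np.cos(phi2)
    return x1, x2
```

Setting C = 0 removes the coupling and yields two independent series, which is how the type I error setting of the experiments can be reproduced.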
Figure 2: Markov chain dynamical system. Upper left: Markov transition probabilities, fluctuating between the values indicated in the two graphs. Upper right: example of simulated time series. Bottom left: the biased and unbiased KCSD norm estimates in the frequency domain. Bottom right: type I and type II errors for the hsic and kcsd tests.

The bottom-left and bottom-middle panels of Figure 1 show the influence of this parameter on the number of samples required to actually reject the null hypothesis and detect the dependency, for the biased and unbiased estimates respectively. We observed that choosing a hyper-parameter close to the standard deviation of the signal (here 1.5) was an optimal strategy, and that the test relying on the unbiased estimate outperformed the one relying on the biased estimate. We thus used the unbiased estimate in our subsequent analyses. The coupling parameter C was further varied to test the performance of the independence tests both in the case where the null hypothesis of independence is true (C = 0) and when it should be rejected (C = .4 for weak coupling, C = 2 for strong coupling). These two settings enable us to quantify the type I and type II errors of the tests, respectively. The bottom-right panel of Figure 1 reports these errors for several independence tests, showing the superiority of our method, especially for type II errors. 
In particular, methods based on HSIC fail to detect weak dependencies in the time series.

Time varying Markov chain

We now illustrate the use of our test in a hybrid setting. We generate a symbolic time series x2 over the alphabet S = [1, 2, 3], controlled by a scalar time series x1. The coupling is achieved by modulating across time the transition probabilities of the Markov transition matrix generating the symbolic time series x2, using the current value of the scalar time series x1. This model is described by the following equations, with f1 = 1 Hz:

φ1(k + 1) = φ1(k) + .1 ε1(k) + 2πf1Ts,
x1(k + 1) = sin(φ1(k + 1)),
p(x2(k + 1) = Si | x2(k) = Sj) = Mij + ΔMij x1(k).

Since x1 is bounded between −1 and 1, the Markov transition matrix fluctuates across time between the two models represented in Figure 2 (top-left panel). A model without these fluctuations (ΔM = 0) was simulated as well, to measure the type I error. The time course of such a hybrid system is illustrated in the top-right panel of the same figure. In order to measure the dependency between these two time series, we use a k-spectrum kernel [14] for x2 and an RBF kernel for x1. For the k-spectrum kernel, we use k = 2 (using k = 1, i.e. counting occurrences of single symbols, was less efficient) and we computed the kernel between words of 3 successive symbols of the time series. We used an RBF kernel with σ = 1, decimated the signals by a factor 2, and cut the signals into time windows of 100 samples. The biased and unbiased estimates of the KCSD norm are represented at the bottom-left of Figure 2 and show a clear peak at the modulating frequency (1 Hz). 
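As an illustration of the FFT formula of Section 3, the estimated scalar product between two periodogram operators can be sketched as follows for scalar-valued series with an RBF kernel (a direct, unoptimized transcription of the diag(F K̃13 F⁻¹) ◦ diag(F⁻¹ K̃24 F) expression; the Hann taper and all names are our assumptions, not the paper's implementation):

```python
import numpy as np

def rbf_gram(x, y, sigma=1.0):
    # RBF kernel matrix between two scalar time series.
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def kcsd_scalar_product(x1, x2, x3, x4, sigma=1.0):
    """Estimate <P^T_12(nu), P^T_34(nu)>_HS at the frequencies nu = m/T."""
    T = len(x1)
    w = np.hanning(T)                        # tapering window w_T (assumed)
    W = np.outer(w, w)                       # windowing matrix
    M = np.eye(T) - np.ones((T, T)) / T      # centering matrix
    K13 = (M @ rbf_gram(x1, x3, sigma) @ M) * W  # windowed centered Gram
    K24 = (M @ rbf_gram(x2, x4, sigma) @ M) * W
    m = np.arange(T)
    F = np.exp(-2j * np.pi * np.outer(m, m) / T) / np.sqrt(T)
    Finv = F.conj().T                        # F is unitary
    d1 = np.diag(F @ K13 @ Finv)
    d2 = np.diag(Finv @ K24 @ F)
    w2 = np.sum(w**2) / T                    # discrete ||w||^2
    return (d1 * d2) / w2**2
```

With (x3, x4) an independent copy (e.g. a second trial) of (x1, x2), this yields the unbiased squared-norm estimate of Corollary 8; with (x3, x4) = (x1, x2) it yields the biased one, which is real and nonnegative at every frequency.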
The independence test results shown at the bottom-right of Figure 2 again illustrate the superiority of KCSD in terms of type II error, whereas the type I error stays in an acceptable range.

Figure 3: Left: experimental setup of LFP recordings in an anesthetized monkey during visual stimulation with a movie. Right: proportion of detected dependencies for the unbiased kcsd test of interactions between gamma band and wide band LFP, for different kernels.

Neural data: local field potentials from monkey visual cortex

We analyzed dependencies between local field potential (LFP) time series recorded in the primary visual cortex of one anesthetized monkey during visual stimulation by a commercial movie (see Figure 3 for a scheme of the experiment). LFP activity reflects the non-linear interplay between a large variety of underlying mechanisms. Here we investigate this interplay by extracting LFP activity in two frequency bands within the same electrode and quantifying the non-linear interactions between them with our approach. LFPs were filtered into two frequency bands: 1/ a wide band ranging from 1 to 100 Hz, which contains a rich variety of rhythms, and 2/ a high gamma band ranging from 60 to 100 Hz, which has been shown to play a role in the processing of visual information. Both of these time series were sampled at 1000 Hz. Using non-overlapping time windows of 1 s, we computed the Hilbert-Schmidt norm of the KCSD operator between the gamma band and wide band time series originating from the same electrode. We performed statistical testing for all frequencies between 1 and 500 Hz (using a Fourier transform on 2048 points). 
The results of the test, averaged over all recording sites, are plotted in Figure 3. We observe a highly reliable detection of interactions in the gamma band, using either a linear or a non-linear kernel. This is due to the fact that the gamma band LFP is a filtered version of the wide band LFP, making these signals highly correlated in the gamma band. However, in addition to this obvious linear dependency, we observe significant interactions at the lowest frequencies (0.5-2 Hz), which cannot be explained by linear interaction (and are thus not detected by the linear kernel). This characteristic illustrates the non-linear interaction between the high frequency gamma rhythm and other lower frequencies of the brain electrical activity, which has been reported in other studies [21]. This also shows the interpretability of our approach as a test of non-linear dependency in the frequency domain.

5 Conclusion

An independence test for time series, based on the concept of Kernel Cross-Spectral Density estimation, was introduced in this paper. It generalizes the linear approach based on the Fourier transform in several respects. First, it allows quantification of non-linear interactions for time series living in vector spaces. Moreover, it can measure dependencies between more complex objects, including sequences in an arbitrary alphabet, or graphs, as long as an appropriate positive definite kernel can be defined in the space of each time series. This paper provides asymptotic properties of the KCSD estimates, as well as an efficient approach to compute them on real data. The space of KCSD operators constitutes a very general framework to analyze dependencies in multivariate and highly structured dynamical systems. 
Following [13, 18], our independence test can further be combined with recent developments in kernel time series prediction techniques [20] to define general and reliable multivariate causal inference techniques.

Acknowledgments. MB is grateful to Dominik Janzing for fruitful discussions and advice.

References
[1] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic, Boston, 2004.
[2] M. Besserve, D. Janzing, N. Logothetis, and B. Schölkopf. Finding dependencies between frequencies with the kernel cross-spectral density. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2080-2083, 2011.
[3] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66(2-3):259-294, 2007.
[4] D. Brillinger. Time Series: Data Analysis and Theory. Holt, Rinehart, and Winston, New York, 1974.
[5] J.-F. Cardoso. High-order contrasts for independent component analysis. Neural Computation, 11(1):157-192, 1999.
[6] K. Fukumizu, F. Bach, and A. Gretton. Statistical convergence of kernel CCA. In Advances in Neural Information Processing Systems 18, pages 387-394, 2006.
[7] K. Fukumizu, F. Bach, and M. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5:73-99, 2004.
[8] K. Fukumizu, A. Gretton, G. R. Lanckriet, B. Schölkopf, and B. K. Sriperumbudur. Kernel choice and classifiability for RKHS embeddings of probability distributions. In Advances in Neural Information Processing Systems 21, pages 1750-1758, 2009.
[9] K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems 20, pages 489-496, 2008.
[10] G. B. Giannakis and J. M.
Mendel. Identification of nonminimum phase systems using higher order statistics. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(3):360-377, 1989.
[11] A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585-592, 2008.
[12] A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur. Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems 25, pages 1214-1222, 2012.
[13] A. Hyvärinen, S. Shimizu, and P. O. Hoyer. Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity. In Proceedings of the 25th International Conference on Machine Learning, pages 424-431. ACM, 2008.
[14] C. Leslie, E. Eskin, and W. Noble. The spectrum kernel: a string kernel for SVM protein classification. In Pacific Symposium on Biocomputing, 2002.
[15] C. Nikias and A. Petropulu. Higher-Order Spectra Analysis: A Non-linear Signal Processing Framework. Prentice-Hall PTR, Englewood Cliffs, NJ, 1993.
[16] D. Pantazis, T. Nichols, S. Baillet, and R. Leahy. A comparison of random field theory and permutation methods for the statistical analysis of MEG data. NeuroImage, 25:383-394, 2005.
[17] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK, 2000.
[18] J. Peters, D. Janzing, and B. Schölkopf. Causal inference on time series using structural equation models. In Advances in Neural Information Processing Systems 26, 2013.
[19] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[20] V. Sindhwani, H. Q. Minh, and A. C. Lozano.
Scalable matrix-valued kernel learning for high-dimensional nonlinear multivariate regression and Granger causality. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, 2013.
[21] K. Whittingstall and N. K. Logothetis. Frequency-band coupling in surface EEG reflects spiking activity in monkey visual cortex. Neuron, 64:281-289, 2009.
[22] X. Zhang, L. Song, A. Gretton, and A. Smola. Kernel measures of independence for non-IID data. In Advances in Neural Information Processing Systems 21, pages 1937-1944, 2009.