{"title": "Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 8263, "page_last": 8273, "abstract": "Dynamic functional connectivity, as measured by the time-varying covariance of neurological signals, is believed to play an important role in many aspects of cognition. While many methods have been proposed, reliably establishing the presence and characteristics of brain connectivity is challenging due to the high dimensionality and noisiness of neuroimaging data. We present a latent factor Gaussian process model which addresses these challenges by learning a parsimonious representation of connectivity dynamics. The proposed model naturally allows for inference and visualization of the time-varying connectivity. As an illustration of the scientific utility of the model, application to a data set of rat local field potential activity recorded during a complex non-spatial memory task provides evidence of stimuli differentiation.", "full_text": "Modeling Dynamic Functional Connectivity with\n\nLatent Factor Gaussian Processes\n\nLingge Li\u2217\nUC Irvine\n\nlinggel@uci.edu\n\nDustin Pluta\u2217\n\nUC Irvine\n\ndpluta@uci.edu\n\nBabak Shahbaba\n\nUC Irvine\n\nNorbert Fortin\n\nUC Irvine\n\nbabaks@uci.edu\n\nnorbert.fortin@uci.edu\n\nHernando Ombao\n\nKAUST\n\nhernando.ombao@kaust.edu.sa\n\nPierre Baldi\nUC Irvine\n\npfbaldi@ics.uci.edu\n\nAbstract\n\nDynamic functional connectivity, as measured by the time-varying covariance\nof neurological signals, is believed to play an important role in many aspects of\ncognition. While many methods have been proposed, reliably establishing the\npresence and characteristics of brain connectivity is challenging due to the high\ndimensionality and noisiness of neuroimaging data. We present a latent factor Gaus-\nsian process model which addresses these challenges by learning a parsimonious\nrepresentation of connectivity dynamics. 
The proposed model naturally allows\nfor inference and visualization of connectivity dynamics. As an illustration of the\nscienti\ufb01c utility of the model, application to a data set of rat local \ufb01eld potential\nactivity recorded during a complex non-spatial memory task provides evidence of\nstimuli differentiation.\n\n1\n\nIntroduction\n\nThe celebrated discoveries of place cells, grid cells, and similar structures in the hippocampus have\nproduced a detailed, experimentally validated theory of the formation and processing of spatial\nmemories. However, the speci\ufb01c characteristics of non-spatial memories, e.g. memories of odors and\nsounds, are still poorly understood. Recent results from human fMRI and EEG experiments suggest\nthat dynamic functional connectivity (DFC) is important for the encoding and retrieval of memories\n[1, 2, 3, 4, 5, 6], yet DFC in local \ufb01eld potentials (LFP) in animal models has received relatively little\nattention. We here propose a novel latent factor Gaussian process (LFGP) model for DFC estimation\nand apply it to a data set of rat hippocampus LFP during a non-spatial memory task [7]. The model\nproduces strong statistical evidence for DFC and \ufb01nds distinctive patterns of DFC associated with\ndifferent experimental stimuli.\nDue to the high-dimensionality of time-varying covariance and the complex nature of cognitive\nprocesses, effective analysis of DFC requires balancing model parsimony, \ufb02exibility, and robustness\nto noise. DFC models fall into a common framework with three key elements: dimensionality\nreduction, covariance estimation from time series, and identi\ufb01cation of connectivity patterns [8].\nMany neuroimaging studies use a combination of various methods, such as sliding window (SW)\nestimation, principal component analysis (PCA), and the hidden Markov model (HMM) (see e.g.\n[9, 10, 11]). 
In general, these methods are not fully probabilistic, which can make uncertainty\nquanti\ufb01cation and inference dif\ufb01cult in practice.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\u2217These authors contributed equally to this work.\n\n\fBayesian latent factor models provide a probabilistic approach to modeling dynamic covariance\nthat allows for simultaneous dimensionality reduction and covariance process estimation. Examples\ninclude the latent factor stochastic volatility (LFSV) model [12] and the nonparametric covariance\nmodel [13]. In the LFSV model, an autoregressive process is imposed on the latent factors and can be\noverly restrictive. While the nonparametric model is considerably more \ufb02exible, the matrix process\nfor time-varying loadings adds substantial complexity.\nAiming to bridge the gap between these factor models, we propose the latent factor Gaussian process\n(LFGP) model. In this approach, a latent factor structure is placed on the log-covariance process of a\nnon-stationary multivariate time series, rather than on the observed time series itself as in other factor\nmodels. Since covariance matrices lie on the manifold of symmetric positive-de\ufb01nite (SPD) matrices,\nwe utilize the Log-Euclidean metric to allow unconstrained modeling of the vectorized upper triangle\nof the covariance process. Dimension reduction and model parsimony is achieved by representing\neach covariance element as a linear combination of Gaussian process latent factors [14].\nIn this work, we highlight three major advantages of the LFGP model for practical DFC analysis.\nFirst, through the prior on the Gaussian process length scale, we are able to incorporate scienti\ufb01c\nknowledge to target speci\ufb01c frequency ranges that are of scienti\ufb01c interest. 
Second, the model posterior allows us to perform Bayesian inference for scientific hypotheses, for instance, whether the LFP time series is non-stationary, and whether characteristics of DFC differ across experimental conditions. Third, the latent factors serve as a low-dimensional representation of the covariance process, which facilitates visualization of complex phenomena of scientific interest, such as the role of DFC in stimuli discrimination in the context of a non-spatial memory experiment.\n\n2 Background\n\n2.1 Sliding Window Covariance Estimation\n\nSliding window methods have been extensively researched for the estimation and analysis of DFC, particularly in human fMRI studies; applications of these methods have identified significant associations of DFC with disease status, behavioral outcomes, and cognitive differences in humans. See [8] for a recent detailed review of the existing literature. For X(t) ∼ N(0, K(t)) a p-variate time series of length T with covariance process K(t), the sliding window covariance estimate K̂_SW(t) with window length L can be written as the convolution K̂_SW(t) = (h ∗ XX′)(t) = Σ_{s=1}^{T} h(s) X(t−s) X(t−s)′, for the rectangular kernel h(t) = 1_[0,L−1](t)/L, where 1 is the indicator function. Studies of the performance of sliding window estimates recommend the use of a tapered kernel to decrease the impact of outlying measurements and to improve the spectral properties of the estimate [15, 16, 17]. In the present work we employ a Gaussian taper with scale τ, defined as h_τ(t) = (1/ζ) exp{−(1/2) ((t − L/2)/(τL/2))²} 1_[0,L−1](t), where ζ is a normalizing constant. The corresponding tapered SW estimate is K̂_τ(t) = (h_τ ∗ XX′)(t).\n\n2.2 Log-Euclidean Metric\n\nDirect modeling of the covariance process from the SW estimates is complicated by the positive-definite constraint on covariance matrices. To ensure the model estimates are positive definite, it is necessary to employ post-hoc adjustments, or to build the constraints into the model, typically by utilizing the Cholesky or spectral decompositions. The LFGP model instead uses the Log-Euclidean framework of symmetric positive definite (SPD) matrices to naturally ensure positive-definiteness of the estimated covariance process while also simplifying the model formulation and implementation.\nDenote the space of p × p SPD matrices as P_p. For X₁, X₂ ∈ P_p, the Log-Euclidean distance is defined by d_LE(X₁, X₂) = ‖Log(X₁) − Log(X₂)‖, where Log is the matrix logarithm and ‖·‖ is the Frobenius norm. The metric space (P_p, d_LE) is a Riemannian manifold that is isometric to R^q with the usual Euclidean norm, for q = p(p + 1)/2.\nMethods for modeling covariances in regression contexts via the matrix logarithm were first introduced in [18]. The Log-Euclidean framework for analysis of SPD matrices in neuroimaging contexts was first proposed in [19], with further applications in neuroimaging having been developed in recent years [20]. The present work is a novel application of the Log-Euclidean framework for DFC analysis.\n\n2.3 Bayesian Latent Factor Models\n\nFor x_ij, i = 1, ..., n, j = 1, ..., p, the simple Bayesian latent factor model is x_i = f_i Λ + ε_i, with f_i iid∼ N(0, I_r), ε_i iid∼ N(0, Σ), and Λ an r × p matrix of factor loadings [21].
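As a concrete illustration of this simple latent factor model, the following minimal NumPy sketch (not the authors' code; the dimensions, loadings, and noise variances are arbitrary illustrative choices) simulates from it and compares the empirical covariance of x against the implied marginal covariance Λ′Λ + Σ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 50_000, 6, 2                    # observations, features, latent factors

Lambda = rng.normal(size=(r, p))          # factor loadings, r x p
sigma2 = np.full(p, 0.1)                  # diagonal noise variances (Sigma diagonal)

f = rng.normal(size=(n, r))               # f_i ~ N(0, I_r)
eps = rng.normal(scale=np.sqrt(sigma2), size=(n, p))
x = f @ Lambda + eps                      # x_i = f_i Lambda + eps_i

# With Sigma diagonal, all cross-feature correlation in x is carried by the
# factors, and the marginal covariance of x is Lambda' Lambda + diag(sigma2).
implied = Lambda.T @ Lambda + np.diag(sigma2)
empirical = np.cov(x, rowvar=False)
```

The factors f_i are integrated out in the marginal covariance, so comparing `empirical` with `implied` makes the low-rank-plus-diagonal structure of the model concrete.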
Σ is commonly assumed to be a diagonal matrix, implying that the latent factors capture all the correlation structure of the p features of x. The latent factor model shares some similarities with principal component analysis, but includes a stochastic error term, which leads to a different interpretation of the resulting factors [9, 10].\nVariants of the linear factor model have been developed for modeling non-stationary multivariate time series [22, 23]. In general, these models represent the p-variate observed time series as a linear combination of r latent factors f_j(t), j = 1, ..., r, with r × p loading matrix Λ and errors ε(t): X(t) = f(t)Λ + ε(t). From this general modeling framework, numerous methods for capturing the non-stationary dynamics in the underlying time series have been developed, such as latent factor stochastic volatility (LFSV) [12], dynamic conditional correlation [24], and the nonparametric covariance model [13].\n\n2.4 Gaussian Processes\n\nA Gaussian process (GP) is a continuous stochastic process for which any finite collection of points is jointly Gaussian with some specified mean and covariance. A GP can be understood as a distribution on functions belonging to a particular reproducing kernel Hilbert space (RKHS) determined by the covariance operator of the process [25]. Typically, a zero-mean GP is assumed (i.e., the functional data have been centered by subtracting a consistent estimator of the mean), so that the GP is parameterized entirely by the kernel function κ that defines the pairwise covariance. Let f ∼ GP(0, κ(·, ·)).
Then for any x and x′ we have\n\n(f(x), f(x′))ᵀ ∼ N( 0, [κ(x, x), κ(x, x′); κ(x, x′), κ(x′, x′)] ).   (1)\n\nFurther details are given in [26].\n\n3 Latent Factor Gaussian Process Model\n\n3.1 Formulation\n\nWe consider estimation of dynamic covariance from a sample of n independent time series with p variables and T time points. Denote the i-th observed p-variate time series by X_i(t), i = 1, ..., n. We assume that each X_i(t) follows an independent distribution D with zero mean and stochastic covariance process K_i(t). To model the covariance process, we first compute the Gaussian-tapered sliding window covariance estimates for each X_i(t), with fixed window size L and taper τ, to obtain K̂_τ,i. We then apply the matrix logarithm and vectorize to obtain the length-q vector Y_i(t), q = p(p + 1)/2, specified by Y_i(t) = u(Log(K̂_τ,i(t))), where u maps a symmetric matrix to its vectorized upper triangle. We refer to Y_i(t) as the “log-covariance” at time t.\nThe resulting Y_i(t) can be modeled as an unconstrained q-variate time series. The LFGP model represents Y_i(t) as a linear combination of r latent factors F_i(t) through an r × q loading matrix B and independent Gaussian errors ε_i. The loading matrix B is held constant across observations and time. Here F_i(t) is modeled as a product of independent Gaussian processes.
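The preprocessing just described can be sketched end to end: estimate a Gaussian-tapered sliding-window covariance, take its matrix logarithm, and keep the vectorized upper triangle. This is a simplified illustration; the window length, taper scale, and toy data are arbitrary choices, not the paper's settings:

```python
import numpy as np

def tapered_sw_log_cov(X, L=50, tau=0.5):
    """Return the 'log-covariance' series: for each window, the vectorized
    upper triangle of the matrix log of a Gaussian-tapered covariance."""
    T, p = X.shape
    s = np.arange(L)
    h = np.exp(-0.5 * ((s - L / 2) / (tau * L / 2)) ** 2)  # Gaussian taper
    h /= h.sum()                                           # normalize (zeta)
    iu = np.triu_indices(p)
    Y = np.empty((T - L + 1, p * (p + 1) // 2))
    for t in range(T - L + 1):
        W = X[t:t + L]
        K = np.einsum('s,si,sj->ij', h, W, W)       # tapered SW covariance
        K += 1e-8 * np.eye(p)                       # jitter for stability
        w, V = np.linalg.eigh(K)                    # matrix log via eigendecomposition
        Y[t] = (V @ np.diag(np.log(w)) @ V.T)[iu]   # vectorized upper triangle
    return Y

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))                       # toy 4-channel series
Y = tapered_sw_log_cov(X)                           # shape (251, 10)
```

The resulting rows of `Y` are unconstrained vectors in R^q, q = p(p+1)/2, which is what lets the subsequent factor model ignore the positive-definiteness constraint.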
Placing priors p1, p2, p3 on the loading matrix B, the Gaussian noise variance σ², and the Gaussian process hyper-parameters θ, respectively, gives a fully probabilistic latent factor model on the covariance process:\n\nX_i(t) ∼ D(0, K_i(t)),  where K_i(t) = Exp(u⁻¹(Y_i(t)))   (2)\nY_i(t) = F_i(t) · B + ε_i,  where ε_i iid∼ N(0, Iσ²)   (3)\nF_i(t) ∼ GP(0, κ(t; θ))   (4)\nB ∼ p1,  σ² ∼ p2,  θ ∼ p3.   (5)\n\nThe LFGP model employs a latent distribution of curves GP(0, κ(t; θ)) to capture temporal dependence of the covariance process, thus inducing a Gaussian process on the log-covariance Y(t). This conveniently allows multiple observations to be modeled as different realizations of the same induced GP, as done in [27]. The model posteriors are conditioned on different observations despite sharing the same kernel. For better identifiability, the GP variance scale is fixed so that the loading matrix can be unconstrained.\n\n3.2 Properties\n\nTheorem 1. The log-covariance process induced by the LFGP model is weakly stationary when the GP kernel κ(s, t) depends only on |s − t|.\n\nProof. The covariance of the log-covariance process Y(t) depends only on the static loading matrix B = (β_kj), 1 ≤ k ≤ r, 1 ≤ j ≤ q, and the factor covariance kernels. Explicitly, for factor kernels κ(s, t; θ_k), k = 1, ..., r, and assuming ε_i(t) iid∼ N(0, Σ), with Σ = (σ²_jj′), j, j′ ≤ q, constant across observations and time, the covariance of elements of Y(t) is\n\nCov(Y_ij(s), Y_ij′(t)) = Cov( Σ_{k=1}^{r} F_ik(s) β_kj + ε_ij(s),  Σ_{k=1}^{r} F_ik(t) β_kj′ + ε_ij′(t) )   (6)\n= Σ_{k=1}^{r} β_kj β_kj′ κ(s, t; θ_k) + σ²_jj′,   (7)\n\nwhich is weakly stationary when κ(s, t) depends only on |s − t|.\n\nPosterior contraction. To consider posterior contraction of the LFGP model, we make the following assumptions. The true log-covariance process w = u(Log(K(t))) is in the support of the product GP W ∼ F(t)B, for F(t) and B defined above, with known number of latent factors r. The GP kernel κ is α-Hölder continuous with α ≥ 1/2. Y(t) : [0, 1] → R^q is a smooth function in ℓ∞_q([0, 1]) with respect to the Euclidean norm, and the prior p2 for σ² has support on a given interval [a, b] ⊂ (0, ∞). Under the above assumptions, bounds on the posterior contraction rates follow from previous results on posterior contraction of Gaussian process regression for α-smooth functions given in [28, 29]. Specifically,\n\nE_0 Π_n( (w, σ) : ‖w − w_0‖_n + |σ − σ_0| > M ε_n | Y_1, ..., Y_n ) → 0\n\nfor sufficiently large M, with posterior contraction rate ε_n = n^{−α/(2α+q)} log^δ(n) for some δ > 0, where E_0(Π_n(· | Y_1, ..., Y_n)) is the expectation of the posterior under the model priors.\nTo illustrate posterior contraction in the LFGP model, we simulate data for five signals with various sample sizes (n) and numbers of observation time points (t), with a covariance process generated by two latent factors.
To measure model bias, we consider the mean squared error of the posterior median of the reconstructed log-covariance series. To measure posterior uncertainty, the posterior sample variance is used. As shown in Table 1, both the sample size n and the number of observation time points t contribute to posterior contraction.\n\nTable 1: Mean squared error of posterior median (posterior sample variance), ×10⁻²\n\n          n = 1             n = 10           n = 20           n = 50\nt = 25    12.212 (20.225)   7.845 (8.743)    7.089 (7.714)    5.869 (7.358)\nt = 50    6.911 (7.588)     4.123 (5.836)    3.273 (3.989)    3.237 (3.709)\nt = 100   3.728 (5.218)     1.682 (2.582)    1.672 (2.659)    1.672 (1.907)\n\nLarge prior support. The prior distribution of the log-covariance process Y(t) is a linear combination of r independent GPs, each with mean 0 and kernel κ(s, t; θ_k), k = 1, ..., r. That is, each log-covariance element has prior Y_j(t) = Σ_{k=1}^{r} β_jk F_k(t) ∼ GP(0, Σ_k β²_jk κ(s, t; θ_k)). Considering B fixed, the resulting prior for F_i(t)B has support equal to the closure of the reproducing kernel Hilbert space (RKHS) with kernel BᵀK(t, ·)B [26], where K is the covariance tensor formed by stacking κ_k = κ(s, t; θ_k), k = 1, ..., r [25]. Accounting for the prior p1 of B, a function W ∈ ℓ∞_q[0, 1] will have nonzero prior probability Π_0(W) > 0 if W is in the closure of the RKHS with kernel AᵀK(t, ·)A for some A in the support of p1.\n\n3.3 Factor Selection via the Horseshoe Prior\n\nSimilar to other factor models, the number of latent factors in the LFGP model has a crucial effect on model performance, and must be selected somehow. For Bayesian factor analysis, there is an extensive literature on factor selection methods, such as Bayes factors, reversible jump sampling [30], and shrinkage priors [31].
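To build intuition for the shrinkage behavior of the horseshoe prior used here, consider a small Monte Carlo sketch (a hypothetical illustration; the global scale value ρ = 0.1 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.1                                      # small global scale -> strong shrinkage
lam = np.abs(rng.standard_cauchy(100_000))     # lambda ~ Cauchy+(0, 1)
beta = rng.normal(0.0, rho * lam)              # beta | lambda ~ N(0, lambda^2 rho^2)

frac_near_zero = np.mean(np.abs(beta) < 0.05)  # mass concentrated at zero
frac_large = np.mean(np.abs(beta) > 1.0)       # heavy tails: occasional escapes
```

Most draws are shrunk toward zero, yet a non-negligible fraction escapes well past |β| = 1. This is exactly the behavior exploited here: loadings of an extraneous factor can collapse to zero while genuine loadings remain large.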
While we can compare different models in terms of goodness-of-fit, we cannot compare their latent factors in a meaningful way due to identifiability issues. Therefore, we instead iteratively increase the number of factors and fit the new factors on the residuals resulting from the previous fit. In order to avoid overfitting with too many factors, we place a horseshoe prior on the loadings of the new factors, so that the loadings shrink to zero if the new factor is unnecessary.\n\nFigure 1: Violin plots of loading posteriors show that the loadings for the fourth factor (indices 30 to 39) shrink to zero with the horseshoe prior (left). Compared to the posteriors of the first three factors (dashed gray), the posterior of the extraneous factor (solid red) is diffuse around zero as a result of the zero loadings (right).\n\nIntroduced by [32], the horseshoe prior in the regression setting is given by\n\nβ | λ, ρ ∼ N(0, λ²ρ²)   (8)\nλ ∼ Cauchy⁺(0, 1)   (9)\n\nand can be considered a scale mixture of Gaussian distributions. A small global scale ρ encourages shrinkage, while the heavy-tailed Cauchy distribution allows the loadings to escape from zero. The example shown in Figure 1 illustrates the shrinkage effect of the horseshoe prior when iteratively fitting an LFGP model with four factors to simulated data generated from three latent factors. For sampling from the loading posterior distribution, we use the No-U-Turn Sampler [33] as implemented in PyStan [34].\n\n3.4 Scalable Computation\n\nThe LFGP model can be fit via Gibbs sampling, as is commonly done for Bayesian latent variable models. In every iteration, we first sample F | B, σ², θ, Y from the conditional p(F | Y), as F and Y are jointly multivariate Gaussian with a covariance that can be written in terms of B, σ², and θ.
However, it is worth noting that this multivariate Gaussian has a large covariance matrix, which could be computationally expensive to invert. Given F, the parameters B, σ², and θ become conditionally independent. Using conjugate priors for Bayesian linear regression, the posterior p(B, σ² | F, Y) is directly available. For the GP parameter posterior p(θ | F), either a Metropolis random walk or slice sampling [35] can be used within each Gibbs step, because the parameter space is low-dimensional.\nFor efficient GP posterior sampling, it is essential to exploit the structure of the covariance matrix. For each independent latent GP factor F_j, there are n independent sets of observations at T time points. Therefore, the GP covariance matrix Σ_j has dimensions nT × nT. To reduce the computational burden, we note that Σ_j can be decomposed using a Kronecker product, Σ_j = I_n ⊗ K_time(t), where K_time is the T × T temporal covariance. The cost to invert Σ_j using this decomposition is O(T³), a substantial reduction compared to the original cost O((nT)³). For many choices of kernel, such as the squared-exponential or Matérn kernel, K_time(t) has a Toeplitz structure and can be approximated through interpolation [36], further reducing the computational cost.\n\nFigure 2: The full covariance matrix Σ_Y is composed of building blocks of smaller matrices: (a) GP covariance matrix at evenly-spaced time points, (b) covariance matrix of factor F_j for n sets of observations, (c) contribution to the covariance of Y from factor F_j, and (d) full covariance matrix Σ_Y.\n\nCombining the latent GP factors F (dimensions n × T × r) and the loading matrix B (dimensions r × q) induces a GP on Y. The dimensionality of Y is n × T × q, so the full (nTq) × (nTq) covariance matrix Σ_Y is prohibitive to invert. As every column of Y is a weighted sum of the GP factors, the covariance matrix Σ_Y can be written as a sum of Kronecker products, Σ_Y = Σ_{j=1}^{r} A_j ⊗ Σ_j + Iσ², where Σ_j is the covariance matrix of the j-th latent GP factor and A_j is a q × q matrix based on the factor loadings. We can regress the residuals of Y on each column of F iteratively to sample from the conditional distribution p(F | Y), so that the residual covariance is only A_j ⊗ Σ_j + I. The inversion can be done in a computationally efficient way with the following matrix identity:\n\n(C ⊗ D + I)⁻¹ = (P ⊗ Q)(I + Λ₁ ⊗ Λ₂)⁻¹(P ⊗ Q)ᵀ,   (10)\n\nwhere C = PΛ₁Pᵀ and D = QΛ₂Qᵀ are the spectral decompositions. In the identity, obtaining P, Q, Λ₁, Λ₂ costs O(q³) and O((nT)³), which is a substantial reduction from the cost of direct inversion, O((nTq)³); calculating (I + Λ₁ ⊗ Λ₂)⁻¹ is straightforward since Λ₁ and Λ₂ are diagonal.\n\n4 Experiments\n\n4.1 Model Comparisons on Simulated Data\n\nWe here consider three benchmark models: sliding window with principal component analysis (SW-PCA), the hidden Markov model (HMM), and the LFSV model. SW-PCA and the HMM are commonly used in DFC studies but have severe limitations. The sliding window covariance estimates are consistent but noisy, and PCA does not take the estimation error into account. The HMM is a probabilistic model and can be used in conjunction with a time series model, but it is not well suited to capturing smoothly varying dynamics in brain connectivity.\n\nFigure 3: With the jagged dynamics of discrete states, the LFGP model fails to capture the “jumps” but approximates the overall trend (left).
When the underlying dynamics are smooth, the LFGP model can accurately recover the shape up to a scaling constant (right).\n\nTo compare the performance of the different models, we simulate time series data X_t ∼ N(0, K(t)) with time-varying covariance K(t). The covariance K(t) follows deterministic dynamics given by u(log(K(t))) = U(t) · A. We consider three different scenarios for the dynamics U(t): square waves, piece-wise linear functions, and cubic splines. Note that both square waves and piece-wise linear functions give rise to dynamics that are not well represented by the LFGP model when the squared-exponential kernel is used. For each scenario, we randomly generate 100 time series data sets and fit all the models. Each time series has 10 variables with 1000 observations, and the latent dynamics are 4-dimensional, as illustrated in Figure 3. The evaluation metric is reconstruction loss of the covariance as measured by the Log-Euclidean distance. The simulation results in Table 2 show that the proposed LFGP model has the lowest reconstruction loss among the methods considered.\n\nTable 2: Median reconstruction loss (standard deviation) across 100 data sets\n\n               SW-PCA          HMM             LFSV            LFGP\nSquare wave    0.693 (0.499)   1.003 (1.299)   4.458 (2.416)   0.380 (0.420)\nPiece-wise     0.034 (0.093)   0.130 (0.124)   0.660 (0.890)   0.027 (0.088)\nSmooth spline  0.037 (0.016)   0.137 (0.113)   0.532 (0.400)   0.028 (0.123)\n\nFor the SW-PCA model, the sliding window size is 50 and the number of principal components is 4. For the HMM, the number of hidden states is increased gradually until the model does not converge, following the implementation outlined in [37]. For the LFSV model, the R package factorstochvol is used with default settings.
All simulations are run on a 2.7 GHz Intel Core i5 Macbook Pro laptop\nwith 8GB memory.\n\n4.2 Application to Rat Hippocampus Local Field Potentials\n\nTo investigate the neural mechanisms underlying the temporal organization of memories, [7] recorded\nneural activity in the CA1 region of the hippocampus as rats performed a sequence memory task.\nThe task involves the presentation of repeated sequences of 5 stimuli (odors A, B, C, D, and E) at\na single port and requires animals to correctly identify each stimulus as being presented either \u201cin\nsequence\u201d (e.g., ABC...) or \u201cout of sequence\u201d (e.g., ABD...) to receive a reward. Here the model is\napplied to local \ufb01eld potential (LFP) activity recorded from the rat hippocampus, but the key reason\nfor choosing this data set is that it provides a rare opportunity to subsequently apply the model to\nother forms of neural activity data collected using the same task (including spiking activity from\ndifferent regions in rats [38] and whole-brain fMRI in humans).\nLFP signals were recorded in the hippocampi of \ufb01ve rats performing the task. The local \ufb01eld\npotentials are measured by surgically implanted tetrodes and the exact tetrode locations vary across\nrats. Therefore, it may not make sense to compare LFP channels of different rats. This issue actually\nmotivates the latent factor approach because we want to eventually visualize and compare the latent\ntrajectories for all the rats. For the present analysis, we have focused on the data from a particular rat\nexhibiting the best memory task performance. To boost the signal-to-noise ratio, six LFP channels\nthat recorded a majority of the attached neurons were chosen. 
Only trials of odors B and C were\nconsidered, to avoid potential confounders with odor A being the \ufb01rst odor presented, and due to\nsubstantially fewer trials for odors D and E.\n\nFigure 4: Time series of 6 LFP channels for a single trial sampled at 1000Hz include all frequency\ncomponents (left). Posterior draws of latent factors for the covariance process appear to be smoothly\nvarying near the theta frequency range (right).\n\nDuring each trial, the LFP signals are sampled at 1000Hz for one second after odor release. We focus\non 41 trials of odor B and 37 trials of odor C. Figure 4 shows the time series of these six LFP channels\n\n7\n\n\ffor a single trial. We treat all 78 trials as different realizations of the same stochastic process without\ndistinguishing the stimuli explicitly in the model. In order to facilitate interpretation of the latent\nspace representation, we \ufb01t two latent factors which explain about 40% of the variance in the data.\nThe prior for GP length scale is a Gamma distribution concentrated around 100ms on the time scale\nto encourage learning frequency dynamics close to the theta range (4-12 Hz). Notably, oscillations\nin this frequency range have been associated with memory function but have not previously been\nshown to differentiate among the type of stimuli used here, thus providing an opportunity to test\nthe sensitivity of the model. For the loadings and variances, we use the Gaussian-Inverse Gamma\nconjugate priors. 20,000 MCMC draws are taken with the \ufb01rst 5000 draws discarded as burn-in.\n\nFigure 5: Posterior draws of median GP factors visualized as trajectories in latent space can be\nseparated based on the odor, with maximum separation around 250ms (left). The latent trajectories\nare much more intertwined when the model is \ufb01tted to data of the same odor. (right)\n\nFor each odor, we can calculate the posterior median latent factors across trials and visualize them\nas a trajectory in the latent space. 
Figure 5 shows that the two trajectories start in an almost overlapping area, with separation occurring around 250ms. This is corroborated by the experimental data indicating that animals begin to identify the odor 200-250ms after onset. We also observe that the two trajectories converge toward the end of the odor presentation. This is also consistent with the experimental data showing that, by then, animals have correctly identified the odors and are simply waiting to perform the response (thereby resulting in similar neural states). In order to quantify odor separation, we evaluate the difference between the posterior distributions of the odor median latent trajectories by using classifiers on the MCMC draws. We also fit the model to two random subsets of the 58 trials of odor A and train the same classifiers. Table 3 shows the classification results; the posteriors are more separated for different odors.\n\nTable 3: Odor separation as measured by latent space classification accuracy (standard deviation)\n\n                      Different odors   Same odor\nLogistic regression   69.97 (0.78)      63.10 (0.91)\nk-NN                  87.12 (0.33)      78.41 (0.65)\nSVM                   74.53 (0.67)      64.75 (1.21)\n\nAs a comparison, a hidden Markov model was fit to the LFP data from the same six selected tetrodes. Figure 6 compares the covariance estimated with the different models. Eight states were selected with an elbow method using the AIC of the HMM; we note that the minimum AIC is not achieved for fewer than 50 states, suggesting that the dynamics of the LFP covariance may be better described with a continuous model. Moreover, the proportion of time spent in each state for odor B and C trials, given in Table 4, fails to capture odor separation in the LFP data.\nCollectively, these results provide compelling evidence that this model can use LFP activity to differentiate the representation of different stimuli, as well as capture their expected dynamics within trials.
Stimuli differentiation has frequently been accomplished by analyzing spiking activity, but not LFP activity alone. This approach, which may be applicable to other types of neural data including spiking activity and fMRI activity, may significantly advance our ability to understand how information is represented among brain regions.\n\nTable 4: State proportions for odors B and C as estimated by the HMM\n\nOdor   State 1   State 2   State 3   State 4   State 5   State 6   State 7   State 8\nB      0.123     0.089     0.146     0.153     0.109     0.159     0.160     0.061\nC      0.133     0.092     0.144     0.147     0.106     0.164     0.152     0.062\n\nFigure 6: Median covariance matrices over time for odor B trials estimated with sliding window (top), HMM (middle), and LFGP model (bottom) reveal similar patterns in dynamic connectivity in the six LFP channels.\n\n5 Discussion\n\nThe proposed LFGP model is a novel application of latent factor models for directly modeling the dynamic covariance in multivariate non-stationary time series. As a fully probabilistic approach, the model naturally allows for inference regarding the presence of DFC, and for detecting differences in connectivity across experimental conditions. Moreover, the latent factor structure enables visualization and scientific interpretation of connectivity patterns. Currently, the main limitation of the model is scalability with respect to the number of observed signals. Thus, in practical applications it may be necessary to select a relevant subset of the observed signals, or to apply some form of clustering of similar signals. Future work will consider simultaneously reducing the dimension of the signals and modeling the covariance process to improve the scalability and performance of the LFGP model.\nThe Gaussian process regression framework is a new avenue for analysis of DFC in many neuroimaging modalities.
Within this framework, it is possible to incorporate other covariates in the kernel function to naturally account for between-subject variability. In our setting, multiple trials are treated as independent observations or repeated measurements from the same rat, while in human neuroimaging studies, there are often single observations from many subjects. Pooling information across subjects in this setting could yield more efficient inference and lead to more generalizable results.

Acknowledgments

This work was supported by NIH award R01-MH115697 (B.S., H.O., N.J.F.), NSF award DMS-1622490 (B.S.), Whitehall Foundation Award 2010-05-84 (N.J.F.), NSF CAREER award IOS-1150292 (N.J.F.), NSF award BSC-1439267 (N.J.F.), and KAUST research fund (H.O.). We would like to thank Michele Guindani (UC-Irvine), Weining Shen (UC-Irvine), and Moo Chung (Univ. of Wisconsin) for their helpful comments regarding this work.