{"title": "GP CaKe: Effective brain connectivity with causal kernels", "book": "Advances in Neural Information Processing Systems", "page_first": 950, "page_last": 959, "abstract": "A fundamental goal in network neuroscience is to understand how activity in one brain region drives activity elsewhere, a process referred to as effective connectivity. Here we propose to model this causal interaction using integro-differential equations and causal kernels that allow for a rich analysis of effective connectivity. The approach combines the tractability and flexibility of autoregressive modeling with the biophysical interpretability of dynamic causal modeling. The causal kernels are learned nonparametrically using Gaussian process regression, yielding an efficient framework for causal inference. We construct a novel class of causal covariance functions that enforce the desired properties of the causal kernels, an approach which we call GP CaKe. By construction, the model and its hyperparameters have biophysical meaning and are therefore easily interpretable. We demonstrate the efficacy of GP CaKe on a number of simulations and give an example of a realistic application on magnetoencephalography (MEG) data.", "full_text": "GP CaKe: Effective brain connectivity with causal\n\nkernels\n\nLuca Ambrogioni\nRadboud University\n\nl.ambrogioni@donders.ru.nl\n\nMarcel A. J. van Gerven\n\nRadboud University\n\nm.vangerven@donders.ru.nl\n\nMax Hinne\n\nRadboud University\n\nm.hinne@donders.ru.nl\n\nEric Maris\n\nRadboud University\n\ne.maris@donders.ru.nl\n\nAbstract\n\nA fundamental goal in network neuroscience is to understand how activity in one\nbrain region drives activity elsewhere, a process referred to as effective connectivity.\nHere we propose to model this causal interaction using integro-differential equa-\ntions and causal kernels that allow for a rich analysis of effective connectivity. 
The approach combines the tractability and flexibility of autoregressive modeling with the biophysical interpretability of dynamic causal modeling. The causal kernels are learned nonparametrically using Gaussian process regression, yielding an efficient framework for causal inference. We construct a novel class of causal covariance functions that enforce the desired properties of the causal kernels, an approach which we call GP CaKe. By construction, the model and its hyperparameters have biophysical meaning and are therefore easily interpretable. We demonstrate the efficacy of GP CaKe on a number of simulations and give an example of a realistic application on magnetoencephalography (MEG) data.\n\n1 Introduction\n\nIn recent years, substantial effort has been dedicated to the study of the network properties of neural systems, ranging from individual neurons to macroscopic brain areas. It has become commonplace to describe the brain as a network that may be further understood by considering either its anatomical (static) scaffolding, the functional dynamics that reside on top of that, or the causal influence that the network nodes exert on one another [1–3]. The latter is known as effective connectivity and has inspired a surge of data analysis methods that can be used to estimate the information flow between neural sources from their electrical or haemodynamic activity [2, 4]. In electrophysiology, the most popular connectivity methods are variations on the autoregressive (AR) framework [5]. Specifically, Granger causality (GC) and related methods, such as partial directed coherence and the directed transfer function, have been successfully applied to many kinds of neuroscientific data [6, 7]. 
These methods can be either parametric or non-parametric, but are not based on a specific biophysical model [8, 9]. Consequently, the connectivity estimates obtained from these methods are only statistical in nature and cannot be directly interpreted in terms of biophysical interactions [10]. This contrasts with the framework of dynamic causal modeling (DCM), which allows for Bayesian inference (using Bayes factors) with respect to biophysical models of interacting neuronal populations [11]. These models are usually formulated in terms of either deterministic or stochastic differential equations, in which the effective connectivity between neuronal populations depends on a series of scalar parameters that specify the strength of the interactions and the conduction delays [12]. DCMs are usually less flexible than AR models since they depend on an appropriate parametrization of the effective connectivity kernel, which in turn depends on detailed prior biophysical knowledge or Bayesian model comparison.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nIn this paper, we introduce a new method that aims to bridge the gap between biophysically inspired models, such as DCM, and statistical models, such as AR, using the powerful tools of Bayesian nonparametrics [13]. We model the interacting neuronal populations with a system of stochastic integro-differential equations. In particular, the intrinsic dynamic of each population is modeled using a linear differential operator, while the effective connectivity between populations is modeled using causal integral operators. The differential operators can account for a wide range of dynamic behaviors, such as stochastic relaxation and stochastic oscillations. While this class of models cannot account for non-linearities, it has the advantage of being analytically tractable. 
Using the framework of Gaussian process (GP) regression, we can obtain the posterior distribution of the effective connectivity kernel without specifying a predetermined parametric form. We call this new effective connectivity method Gaussian process Causal Kernels (GP CaKe). The GP CaKe method can be seen as a nonparametric extension of linear DCM for which the exact posterior distribution can be obtained in closed form without resorting to variational approximations. In this way, the method combines the flexibility and statistical simplicity of AR modeling with the biophysical interpretability of a linear DCM.\nThe paper is structured as follows. In Section 2 we describe the model for the activity of neuronal populations and their driving interactions. In Section 3 we construct a Bayesian hierarchical model that allows us to learn the causal interaction functions. Next, in Subsection 3.2, we show that these causal kernels may be learned analytically using Gaussian process regression. Subsequently, in Section 4, we validate GP CaKe using a number of simulations and demonstrate its usefulness on MEG data in Section 5. Finally, we discuss the wide array of possible extensions and applications of the model in Section 6.\n\n2 Neuronal dynamics\n\nWe model the activity of a neuronal population x_j(t) using the stochastic differential equation\n\nD_j x_j(t) = I_j(t) + w_j(t) ,   (1)\n\nwhere I_j(t) is the total synaptic input coming from other neuronal populations and w_j(t) is Gaussian white noise with mean 0 and variance σ². The differential operator D_j = α_0 + Σ_{p=1}^{P} α_p dᵖ/dtᵖ specifies the internal dynamic of the neuronal population. For example, oscillatory dynamic can be modeled using the damped harmonic operator D_jᴴ = d²/dt² + β d/dt + ω_0², where ω_0 is the (undamped) peak angular frequency and β is the damping coefficient.\nIn Eq. 1, the term I_j(t) accounts for the effective connectivity between neuronal populations. Assuming that the interactions are linear and stationary over time, the most general form for I_j(t) is given by a sum of convolutions:\n\nI_j(t) = Σ_{i=1}^{N} (c_{i→j} ⋆ x_i)(t) ,   (2)\n\nwhere the function c_{i→j}(t) is the causal kernel, modeling the effective connectivity from population i to population j, and ⋆ indicates the convolution operator. The causal kernel c_{i→j}(t) gives a complete characterization of the linear effective connectivity between the two neuronal populations, accounting for the excitatory or inhibitory nature of the connection, the time delay, and the strength of the interaction. Importantly, in order to preserve the causality of the system, we assume that c_{i→j}(t) is identically equal to zero for negative lags (t < 0).\nInserting Eq. 2 into Eq. 1, we obtain the following system of stochastic integro-differential equations:\n\nD_j x_j(t) = Σ_{i=1}^{N} (c_{i→j} ⋆ x_i)(t) + w_j(t) ,   j = 1 … N ,   (3)\n\nwhich fully characterizes the stochastic dynamic of a functional network consisting of N neuronal populations.\n\n3 The Bayesian model\n\nWe can frame the estimation of the effective connectivity between neuronal populations as a nonparametric Bayesian regression problem. In order to do this, we assign a GP prior distribution to the kernel functions c_{i→j}(t) for every presynaptic population i and postsynaptic population j. A stochastic function f(t) is said to follow a GP distribution when all its marginal distributions p(f(t₁), …, f(tₙ)) are distributed as a multivariate Gaussian [14]. 
Since these marginals are determined by their mean vector and covariance matrix, the GP is fully specified by a mean and a covariance function, respectively m_f(t) = ⟨f(t)⟩ and K_f(t₁, t₂) = ⟨(f(t₁) − m_f(t₁))(f(t₂) − m_f(t₂))⟩. Using the results of the previous subsection, we can summarize the problem of Bayesian nonparametric effective connectivity estimation in the following way:\n\nc_{i→j}(t) ~ GP(0, K(t₁, t₂))   (4)\nw_j(t) ~ N(0, σ²)\nD_j x_j(t) = Σ_{i=1}^{N} (c_{i→j} ⋆ x_i)(t) + w_j(t) ,   (5)\n\nwhere expressions such as f(t) ~ GP(m(t), K(t₁, t₂)) mean that the stochastic process f(t) follows a GP distribution with mean function m(t) and covariance function K(t₁, t₂).\nOur aim is to obtain the posterior distributions of the effective connectivity kernels given a set of samples from all the neuronal processes. As a consequence of time shift invariance, the system of integro-differential equations becomes a system of decoupled linear algebraic equations in the frequency domain. It is therefore convenient to rewrite the regression problem in the frequency domain:\n\nc_{i→j}(ω) ~ CGP(0, K(ω₁, ω₂))\nw_j(ω) ~ CN(0, σ²)\nP_j(ω) x_j(ω) = Σ_{i=1}^{N} x_i(ω) c_{i→j}(ω) + w_j(ω) ,\n\nwhere P_j(ω) = Σ_{p=0}^{P} α_p (−iω)ᵖ is a complex-valued polynomial, since the application of a differential operator in the time domain is equivalent to multiplication with a polynomial in the frequency domain. 
In the previous expression, CN(μ, ν) denotes a circularly-symmetric complex normal distribution with mean μ and variance ν, while CGP(m(ω), K(ω₁, ω₂)) denotes a circularly-symmetric complex-valued GP with mean function m(ω) and Hermitian covariance function K(ω₁, ω₂) [15]. Importantly, the complex-valued Hermitian covariance function K(ω₁, ω₂) can be obtained from K(t₁, t₂) by taking the Fourier transform of both its arguments:\n\nK(ω₁, ω₂) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} e^{−iω₁t₁ − iω₂t₂} K(t₁, t₂) dt₁ dt₂ .   (6)\n\n3.1 Causal covariance functions\n\nIn order to be applicable for causal inference, the prior covariance function K(t₁, t₂) must reflect three basic assumptions about the connectivity kernel: I) temporal localization, II) causality and III) smoothness. Since we perform the GP analysis in the frequency domain, we will work with K(ω₁, ω₂), i.e. the double Fourier transform of the covariance function.\nFirst, the connectivity kernel should be localized in time, as the range of plausible delays in axonal communication between neuronal populations is bounded. In order to enforce this constraint, we need a covariance function K(t₁, t₂) that vanishes when either t₁ or t₂ becomes much larger than a time constant ϑ. In the frequency domain, this temporal localization can be implemented by inducing correlations between the Fourier coefficients of neighboring frequencies. In fact, local correlations in the time domain are associated with a Fourier transform that vanishes for high values of ω. By Fourier duality, this implies that local correlations in the frequency domain are associated with a function that vanishes for high values of t. 
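The frequency-domain rewrite used above rests on the convolution theorem: convolution in the time domain becomes pointwise multiplication of Fourier coefficients. This is easy to check numerically; the arrays below are arbitrary stand-ins, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # stand-in for a source time series
c = rng.standard_normal(16)   # stand-in for a causal kernel

# Time domain: direct linear convolution.
direct = np.convolve(x, c)

# Frequency domain: convolution becomes pointwise multiplication,
# provided both sequences are zero-padded to the full output length.
n = len(x) + len(c) - 1
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(c, n), n)

assert np.allclose(direct, via_fft)
```

The zero-padding to length len(x) + len(c) − 1 is what turns the FFT's circular convolution into the linear convolution appearing in Eq. 2.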
We model these spectral correlations using a squared exponential covariance function:\n\nK_SE(ω₁, ω₂) = e^{−ϑ(ω₂−ω₁)²/2 + i t_s (ω₂−ω₁)} = e^{−ϑζ²/2 + i t_s ζ} ,   (7)\n\nwhere ζ = ω₂ − ω₁. Since we expect the connectivity to be highest after a minimal conduction delay t_s, we introduced a time shift factor i t_s ζ in the exponent that translates the peak of the variance from 0 to t_s, which follows from the Fourier shift theorem. As this covariance function depends solely on the difference between frequencies ζ, it can be written (with a slight abuse of notation) as K_SE(ζ).\nSecond, we want the connectivity kernel to be causal, meaning that information cannot propagate back from the future. In order to enforce causality, we introduce a new family of covariance functions that vanish when the lag t₂ − t₁ is negative. In the frequency domain, a causal covariance function can be obtained by adding an imaginary part to Eq. 7 that is equal to its Hilbert transform H [16]. Causal covariance functions are the Fourier dual of quadrature covariance functions, which define GP distributions over the space of analytic functions, i.e. functions whose Fourier coefficients are zero for all negative frequencies [15]. The causal covariance function is given by the following formula:\n\nK_C(ζ) = K_SE(ζ) + i H K_SE(ζ) .   (8)\n\nFinally, as communication between neuronal populations is mediated by smooth biological processes, such as synaptic release of neurotransmitters and dendritic propagation of potentials, we want the connectivity kernel to be a smooth function of the time lag. Smoothness in the time domain can be imposed by discounting high frequencies. 
Here, we use the following discounting function:\n\nf(ω₁, ω₂) = e^{−ν(ω₁² + ω₂²)/2} .   (9)\n\nThis discounting function induces a process that is smooth (infinitely differentiable) and with time scale equal to ν [14]. Our final covariance function is given by\n\nK(ω₁, ω₂) = f(ω₁, ω₂) (K_SE(ζ) + i H K_SE(ζ)) .   (10)\n\nUnfortunately, the temporal smoothing breaks the strict causality of the covariance function because it introduces leakage from the positive lags to the negative lags. Nevertheless, the covariance function closely approximates a causal covariance function when ν is not much bigger than t_s.\n\n3.2 Gaussian process regression\n\nIn order to explain how to obtain the posterior distribution of the causal kernel, we need to review some basic results of nonparametric Bayesian regression and GP regression in particular. Nonparametric Bayesian statistics deals with inference problems where the prior distribution has infinitely many degrees of freedom [13]. We focus on the following nonparametric regression problem, where the aim is to reconstruct a series of real-valued functions from a finite number of noisy mixed observations:\n\ny_t = Σ_i γ_i(t) f_i(t) + w_t ,   (11)\n\nwhere y_t is the t-th entry of the data vector y, f_i(t) is an unknown latent function and w_t is a random variable that models the observation noise with diagonal covariance matrix D. The mixing functions γ_i(t) are assumed to be known and determine how the latent functions generate the data. In nonparametric Bayesian regression, we specify prior probability distributions over the whole (infinite-dimensional) space of functions f_i(t). Specifically, in the GP regression framework this distribution is chosen to be a zero-mean GP. 
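Before turning to the regression itself, the causal construction of Eq. 8 can be illustrated numerically: in the frequency-difference variable ζ, adding i times the Hilbert transform produces an analytic signal, whose one-sided spectrum is exactly what suppresses negative lags in the dual domain. A minimal sketch using SciPy; the time-shift factor of Eq. 7 is omitted here so that scipy.signal.hilbert, which expects real input, can be applied directly, and the grid and ϑ value are arbitrary:

```python
import numpy as np
from scipy.signal import hilbert

# Frequency-difference grid and the squared exponential profile of Eq. 7
# (theta plays the role of the temporal-localization hyperparameter).
theta = np.pi
zeta = np.linspace(-20.0, 20.0, 1024)
k_se = np.exp(-theta * zeta**2 / 2)

# Analytic signal: k_se + i * H[k_se], the causal covariance of Eq. 8.
k_causal = hilbert(k_se)

# The defining property of an analytic signal is a one-sided spectrum;
# by Fourier duality this is what concentrates the covariance on
# non-negative lags.
spec = np.fft.fft(k_causal)
neg = spec[len(spec) // 2 + 1:]
assert np.abs(neg).max() < 1e-8 * np.abs(spec).max()
```

The real part of the analytic signal reproduces K_SE itself, matching the structure K_C = K_SE + iH K_SE.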
In order to infer the value of the function f(t) at an arbitrary set of target points T^× = {t₁^×, …, t_m^×}, we organize these values in the vector f with entries f_l = f(t_l^×). The posterior expected value of f, which we will denote as m_{f_j|y}, is given by\n\nm_{f_j|y} = K_{f_j}^× Γ_j (Σ_i Γ_i K_{f_i} Γ_i + D)⁻¹ y ,   (12)\n\nwhere the covariance matrix K_f is defined by the entries [K_f]_{uv} = K_f(t_u, t_v) and the cross-covariance matrix K_f^× is defined by the entries [K_f^×]_{uv} = K_f(t_u^×, t_v) [14]. The matrices Γ_i are square and diagonal, with the entries [Γ_i]_{uu} given by γ_i(t_u).\nIt is easy to see that the problem defined by Eq. 5 has the exact same form as the generalized regression problem given by Eq. 11, with ω as dependent variable. In particular, the weight functions γ_i(ω) are given by x_i(ω)/P_j(ω) and the noise term w_j(ω)/P_j(ω) has variance σ²/|P_j(ω)|². Therefore, the expectation of the posterior distributions p(c_{i→j}(ω) | {x₁(ω_h)}, …, {x_N(ω_h)}) can be obtained in closed form from Eq. 12.\n\n4 Effective connectivity simulation study\n\nWe performed a simulation study to assess the performance of the GP CaKe approach in recovering the connectivity kernel from a network of simulated sources. The neuronal time series x_j(t) are generated by discretizing a system of integro-differential equations, as expressed in Eq. 3. Time series data was then generated for each of the sources using the Ornstein-Uhlenbeck process dynamic, i.e.\n\nD⁽¹⁾ = d/dt + α ,   (13)\n\nwhere the positive parameter α is the relaxation coefficient of the process. The bigger α is, the faster the process reverts to its mean (i.e. zero) after a perturbation. The discretization of this dynamic is equivalent to a first-order autoregressive process. As ground truth effective connectivity, we used functions of the form\n\nc_{i→j}(τ) = a_{i→j} τ e^{−τ/s} ,   (14)\n\nwhere τ is a (non-negative) time lag, a_{i→j} is the connectivity strength from i to j and s is the connectivity time scale.\nIn order to recover the connectivity kernels c_{i→j}(t) we first need to estimate the differential operator D⁽¹⁾. For simplicity, we estimated the parameters of the differential operator by maximizing the univariate marginal likelihood of each individual source. This procedure requires that the variance of the structured input from the other neuronal populations is smaller than the variance of the unstructured white noise input, so that the estimation of the intrinsic dynamic is not too strongly affected by the coupling.\nSince most commonly used effective connectivity measures (e.g. Granger causality, partial directed coherence, directed transfer function) are obtained from fitted vector autoregression (VAR) coefficients, we use VAR as a comparison method. Since the least-squares solution for the VAR coefficients is not regularized, we also compare with a ridge regularized VAR model, whose penalty term is learned using cross-validation on separately generated training data. This comparison is particularly natural since our connectivity kernel is the continuous-time equivalent of the lagged AR coefficients between two time series.\n\n4.1 Recovery of the effective connectivity kernels\n\nWe explore the effects of different parameter values to demonstrate the intuitiveness of the kernel parameters. Whenever a parameter is not specifically adjusted, we use the following default values: noise level σ = 0.05, temporal smoothing ν = 0.15 and temporal localization ϑ = π. Furthermore, we set t_s = 0.05 throughout.\nFigure 1 illustrates connectivity kernels recovered by GP CaKe. 
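As an aside, the generative process of this simulation study, Ornstein-Uhlenbeck nodes coupled through the kernel of Eq. 14, can be sketched with a simple Euler scheme. All numerical values below (alpha, sigma, a, s, the step size) are illustrative choices rather than the paper's settings, with the coupling strength exaggerated so that its effect on the driven node is clearly visible:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.01, 50.0
n = int(T / dt)
alpha, sigma = 5.0, 1.0   # relaxation coefficient and noise level (illustrative)
a, s = 2000.0, 0.1        # kernel strength and time scale (strength exaggerated)

# Ground-truth causal kernel c(tau) = a * tau * exp(-tau / s), tau >= 0 (Eq. 14).
tau = np.arange(0.0, 1.0, dt)
c = a * tau * np.exp(-tau / s)

w1 = sigma * np.sqrt(dt) * rng.standard_normal(n)
w2 = sigma * np.sqrt(dt) * rng.standard_normal(n)

# Node 1: uncoupled Ornstein-Uhlenbeck process via Euler-Maruyama;
# this discretization is a first-order autoregressive process.
x1 = np.zeros(n)
for t in range(n - 1):
    x1[t + 1] = x1[t] - alpha * x1[t] * dt + w1[t]

# Synaptic input to node 2: causal convolution of the kernel with x1 (Eq. 2).
I = np.convolve(x1, c)[:n] * dt

def ou_node(drive):
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = x[t] + (-alpha * x[t] + drive[t]) * dt + w2[t]
    return x

x2_coupled = ou_node(I)
x2_free = ou_node(np.zeros(n))   # same noise realization, no causal input

# The causal drive adds variance on top of the intrinsic fluctuations.
assert x2_coupled.var() > 1.5 * x2_free.var()
```

A full experiment would repeat such simulations over many trials and feed the resulting time series to the estimator.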
These kernels have a connection strength of a_{i→j} = 5.0 if i feeds into j and a_{i→j} = 0 otherwise. This applies to both the two-node and the three-node network. As these kernels show, our method recovers the desired shape as well as the magnitude of the effective connectivity for both connected and disconnected edges. At the same time, Fig. 1B demonstrates that the indirect pathway through two connections does not lead to a non-zero estimated kernel. Note furthermore that the kernels become non-zero after the zero-lag mark (indicated by the dashed lines), demonstrating that there is no significant anti-causal information leakage.\nThe effects of the different kernel parameter settings are shown in Fig. 2A, where again the method is estimating connectivity for a two-node network with one active connection, with a_{i→j} = 5.0. We show the mean squared error (MSE) as well as the correlation between the ground truth effective connectivity and the estimates obtained using our method. We do this for different values of the temporal smoothing, the noise level and the temporal localization parameters. Figure 2B shows the estimated kernels that correspond to these settings. As is to be expected, underestimating the temporal smoothness results in increased variance due to the lack of regularization. On the other hand, overestimating the smoothness results in a highly biased estimate as well as anti-causal information leakage. Overestimating the noise level does not induce anti-causal information leakage but leads to substantial bias. Finally, overestimating the temporal localization leads to an underestimation of the duration of the causal influence.\nFigure 3 shows a quantitative comparison between GP CaKe and the (regularized and unregularized) VAR model for the networks shown in Fig. 1A and Fig. 1B. The connection strength a_{i→j} was\n\nFigure 1: Example of estimated connectivity. A. 
The estimated connectivity kernels for two connections: one present (2 → 1) and one absent (1 → 2). B. A three-node network in which node 1 feeds into node 2 and node 2 feeds into node 3. The disconnected edge from 1 to 3 is correctly estimated, as the estimated kernel is approximately zero. For visual clarity, estimated connectivity kernels for other absent connections (2 → 1, 3 → 2 and 3 → 1) are omitted in the second panel. The shaded areas indicate the 95% posterior density interval over 200 trials.\n\nvaried to study its effect on the kernel estimation. It is clear that GP CaKe greatly outperforms both VAR models and that ridge regularization is beneficial for the VAR approach. Note that, when the connection strength is low, the MSE is actually smallest for the fully disconnected model. Conversely, both GP CaKe and VAR always outperform the disconnected estimate with respect to the correlation measure.\n\n5 Brain connectivity\n\nIn this section we investigate the effective connectivity structure of a network of cortical sources. In particular, we focus on sources characterized by alpha oscillations (8–12 Hz), the dominant rhythm in MEG recordings. The participant was asked to watch one-minute-long video clips selected from an American television series. During these blocks the participant was instructed to fixate on a cross in the center of the screen. At the onset of each block a visually presented message instructed the participant to pay attention to either the auditory or the visual stream. The experiment also included a so-called ‘resting state’ condition in which the participant was instructed to fixate on a cross in the center of a black screen. Brain activity was recorded using a 275-channel axial MEG system.\nThe GP CaKe method can be applied to a set of signals whose intrinsic dynamic can be characterized by stochastic differential equations. 
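For intuition about what such a signal looks like, a damped harmonic oscillator driven by white noise (the dynamic assumed below for the alpha-band components) can be simulated with a minimal Euler scheme. The parameter values here are illustrative, not estimated from data:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 1e-4, 30.0
n = int(T / dt)
omega0 = 2 * np.pi * 10.0   # 10 Hz peak angular frequency (alpha band)
beta = 2.0                  # damping coefficient

x = np.zeros(n)
v = np.zeros(n)
w = np.sqrt(dt) * rng.standard_normal(n)
for t in range(n - 1):
    # x'' + beta x' + omega0^2 x = white noise
    v[t + 1] = v[t] + (-beta * v[t] - omega0**2 * x[t]) * dt + w[t]
    x[t + 1] = x[t] + v[t] * dt

# The power spectrum of the simulated signal peaks near the resonance frequency.
freqs = np.fft.rfftfreq(n, dt)
power = np.abs(np.fft.rfft(x))**2
band = freqs > 1.0
f_peak = freqs[band][np.argmax(power[band])]
assert 8.0 < f_peak < 12.0
```

With light damping the spectral peak sits close to ω_0/2π, which is how a narrow-band alpha component manifests itself in a broadband recording.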
Raw MEG measurements can be seen as a mixture of dynamical signals, each characterized by a different intrinsic dynamic. Therefore, in order to apply the method on MEG data, we need to isolate a set of dynamic components. We extracted a series of unmixed neural sources by applying independent component analysis (ICA) on the sensor recordings. These components were chosen to have a clear dipolar pattern, the signature of a localized cortical source. These local sources have a dynamic that can be well approximated with a linear mixture of linear stochastic differential equations [17]. We used the recently introduced temporal GP decomposition in order to decompose the components’ time series into a series of dynamic components [17]. In particular, for each ICA source we independently extracted the alpha oscillation component, which we modeled with a damped harmonic oscillator: D_jᴴ = d²/dt² + β d/dt + ω_0². Note that the temporal GP decomposition automatically estimates the parameters β and ω_0 through a non-linear least-squares procedure [17].\nWe computed the effective connectivity between the sources that corresponded to occipital, parietal and left and right auditory cortices (see Fig. 4A) using GP CaKe with the following parameter settings: temporal smoothing ν = 0.01, temporal shift t_s = 0.004, temporal localization ϑ = 8π and noise level σ = 0.05. To estimate the causal structure of the network, we performed a z-test on the maximum values of the kernels for each of the three conditions. The results were corrected\n\nFigure 2: The effect of the temporal localization, smoothness and noise level parameters on a present connection. A. The correlation and mean squared error between the ground truth connectivity kernel and the estimation by GP CaKe. B. The shapes of the estimated kernels as determined by the indicated parameter. 
Default values for the parameters that remain fixed are σ = 0.05, ν = 0.15 and ϑ = π. The dashed line indicates the zero-lag moment at which point the causal effect deviates from zero. The shaded areas indicate the 95% posterior density interval over 200 trials.\n\nFigure 3: The performance of the recovery of the effective connectivity kernels in terms of the correlation and mean squared error between the actual and the recovered kernel. Left column: results for the two-node graph shown in Fig. 1A. Right column: results for the three-node graph shown in Fig. 1B. The dashed line indicates the baseline that estimates all node pairs as disconnected.\n\nfor multiple comparisons using FDR correction with α = 0.05. The resulting structure is shown in Fig. 4A, with the corresponding causal kernels in Fig. 4B. The three conditions are clearly distinguishable from their estimated connectivity structure. For example, during the auditory attention condition, alpha band causal influence from parietal to occipital cortex is suppressed relative to the other conditions. Furthermore, a number of connections (i.e. 
right to left auditory cortex, as well as both auditory cortices to occipital cortex) are only present during the resting state.\n\nFigure 4: Effective connectivity using MEG for three conditions: I. resting state (R), II. attention to video stream (V) and III. attention to audio stream (A). Shown are the connections between occipital cortex, parietal cortex and left and right auditory cortices. A. The binary network for each of the three conditions. B. The kernels for each of the connections. Note that the magnitude of the kernels depends on the noise level σ, and as the true strength is unknown, this is in arbitrary units.\n\n6 Discussion\n\nWe introduced a new effective connectivity method based on GP regression and integro-differential dynamical systems, referred to as GP CaKe. GP CaKe can be seen as a nonparametric extension of DCM [11] where the posterior distribution over the effective connectivity kernel can be obtained in closed form. 
In order to regularize the estimation, we introduced a new family of causal covariance functions that encode three basic assumptions about the effective connectivity kernel: (1) temporal localization, (2) causality, and (3) temporal smoothness. The resulting estimated kernels reflect the time-modulated causal influence that one region exerts on another. Using simulations, we showed that GP CaKe produces effective connectivity estimates that are orders of magnitude more accurate than those obtained using (regularized) multivariate autoregression. Furthermore, using MEG data, we showed that GP CaKe is able to uncover interesting patterns of effective connectivity between different brain regions, modulated by cognitive state.\nThe strategy for selecting the hyperparameters of the GP CaKe model depends on the specific study. If they are hand-chosen, they should be set in a conservative manner. For example, the temporal localization should be longer than the longest biologically meaningful conduction delay. Analogously, the smoothing parameter should be smaller than the time scale of the system of interest. In ideal cases, such as for the analysis of the subthreshold postsynaptic response of the cellular membrane, these values can be reasonably obtained from biophysical models. When prior knowledge is not available, several off-the-shelf Bayesian hyperparameter selection or marginalization techniques can be applied to GP CaKe directly, since both the marginal likelihood and its gradient are available in closed form. In this paper, instead of proposing a particular hyperparameter selection technique, we decided to focus our exposition on the interpretability of the hyperparameters. 
In fact, biophysical interpretability can help neuroscientists construct informed hyperprior distributions.\nDespite its high performance, the current version of the GP CaKe method has some limitations. First, the method can only be used on signals whose intrinsic dynamics are well approximated by linear stochastic differential equations. Real-world neural recordings are often a mixture of several independent dynamic components. In this case the signal needs to be preprocessed using a dynamic decomposition technique [17]. The second limitation is that the intrinsic dynamics are currently estimated from the univariate signals. This procedure can lead to biases when the neuronal populations are strongly coupled. Therefore, future developments should focus on the integration of dynamic decomposition with connectivity estimation within an overarching Bayesian model.\nThe model can be extended in several directions. First, the causal structure of the neural dynamical system can be constrained using structural information in a hierarchical Bayesian model. Here, structural connectivity may be provided as an a priori constraint, for example derived from diffusion-weighted MRI [18], or learned from the functional data simultaneously [19]. This allows the model to automatically remove connections that do not reflect a causal interaction, thereby regularizing the estimation. 
Alternatively, the anatomical constraints on causal interactions may be integrated into a spatiotemporal model of the brain cortex by using partial integro-differential neural field equations [20] and spatiotemporal causal kernels. In addition, the nonparametric modeling of the causal kernel can be integrated into a more complex and biophysically realistic model where the differential equations are not assumed to be linear [12] or where the observed time series data are filtered through a haemodynamic [21] or calcium impulse response function [22].

Finally, while our model explicitly refers to neuronal populations, we note that the applicability of the GP CaKe framework is in no way limited to neuroscience and may also be relevant for fields such as econometrics and computational biology.

References

[1] A Fornito and E T Bullmore. Connectomics: A new paradigm for understanding brain disease. European Neuropsychopharmacology, 25:733–748, 2015.

[2] K Friston. Functional and effective connectivity: A review. Brain Connectivity, 1(1):13–35, 2011.

[3] S L Bressler and V Menon. Large-scale brain networks in cognition: Emerging methods and principles. Trends in Cognitive Sciences, 14(6):277–290, 2010.

[4] K E Stephan and A Roebroeck. A short history of causal modeling of fMRI data. NeuroImage, 62(2):856–863, 2012.

[5] K Friston, R Moran, and A K Seth. Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 23(2):172–178, 2013.

[6] K Sameshima and L A Baccalá. Using partial directed coherence to describe neuronal ensemble interactions. Journal of Neuroscience Methods, 94(1):93–103, 1999.

[7] M Kamiński, M Ding, W A Truccolo, and S L Bressler.
Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biological Cybernetics, 85(2):145–157, 2001.

[8] M Dhamala, G Rangarajan, and M Ding. Analyzing information flow in brain networks with nonparametric Granger causality. NeuroImage, 41(2):354–362, 2008.

[9] S L Bressler and A K Seth. Wiener–Granger causality: A well established methodology. NeuroImage, 58(2):323–329, 2011.

[10] B Schelter, J Timmer, and M Eichler. Assessing the strength of directed influences among neural signals using renormalized partial directed coherence. Journal of Neuroscience Methods, 179(1):121–130, 2009.

[11] K Friston, B Li, J Daunizeau, and K E Stephan. Network discovery with DCM. NeuroImage, 56(3):1202–1221, 2011.

[12] O David, S J Kiebel, L M Harrison, J Mattout, J M Kilner, and K J Friston. Dynamic causal modeling of evoked responses in EEG and MEG. NeuroImage, 30(4):1255–1272, 2006.

[13] N L Hjort, C Holmes, P Müller, and S G Walker. Bayesian Nonparametrics. Cambridge University Press, 2010.

[14] C E Rasmussen and C K I Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.

[15] L Ambrogioni and E Maris. Complex-valued Gaussian process regression for time series analysis. arXiv preprint arXiv:1611.10073, 2016.

[16] U C Täuber. Critical Dynamics: A Field Theory Approach to Equilibrium and Non-Equilibrium Scaling Behavior. Cambridge University Press, 2014.

[17] L Ambrogioni, M A J van Gerven, and E Maris. Dynamic decomposition of spatiotemporal neural signals. arXiv preprint arXiv:1605.02609, 2016.

[18] M Hinne, L Ambrogioni, R J Janssen, T Heskes, and M A J van Gerven. Structurally-informed Bayesian functional connectivity analysis. NeuroImage, 86:294–305, 2014.

[19] M Hinne, R J Janssen, T Heskes, and M A J van Gerven.
Bayesian estimation of conditional independence graphs improves functional connectivity estimates. PLoS Computational Biology, 11(11):e1004534, 2015.

[20] S Coombes, P beim Graben, R Potthast, and J Wright. Neural Fields. Springer, 2014.

[21] K J Friston, A Mechelli, R Turner, and C J Price. Nonlinear responses in fMRI: The Balloon model, Volterra kernels, and other hemodynamics. NeuroImage, 12(4):466–477, 2000.

[22] C Koch. Biophysics of Computation: Information Processing in Single Neurons. Computational Neuroscience Series. Oxford University Press, 2004.