{"title": "Brain covariance selection: better individual functional connectivity models using population prior", "book": "Advances in Neural Information Processing Systems", "page_first": 2334, "page_last": 2342, "abstract": "Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. 
Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph.", "full_text": "Brain covariance selection: better individual functional connectivity models using population prior\n\nGaël Varoquaux*\nParietal, INRIA\nNeuroSpin, CEA, France\ngael.varoquaux@normalesup.org\n\nAlexandre Gramfort\nParietal, INRIA\nNeuroSpin, CEA, France\nalexandre.gramfort@inria.fr\n\nJean-Baptiste Poline\nLNAO, I2BM, DSV\nNeuroSpin, CEA, France\njbpoline@cea.fr\n\nBertrand Thirion\nParietal, INRIA\nNeuroSpin, CEA, France\nbertrand.thirion@inria.fr\n\nAbstract\n\nSpontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. 
We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph.\n\n1 Introduction\n\nThe study of brain functional connectivity, as revealed through distant correlations in the signals measured by functional Magnetic Resonance Imaging (fMRI), represents an easily accessible, albeit indirect, marker of brain functional architecture; in recent years, it has given rise to fundamental insights on brain organization by representing it as a modular graph with large functionally-specialized networks [1, 2, 3].\nAmong other features, the concept of a functionally-specialized cognitive network has emerged as one of the leading views in current neuroscientific studies: regions that activate simultaneously, spontaneously or as an evoked response, form an integrated network that supports a specific cognitive function [1, 3]. In parallel, graph-based statistical analyses have shown that the graphical models that naturally represent the correlation structure of brain signals exhibit small-world properties: any two regions of the brain can be connected through few intermediate steps, despite the fact that most nodes maintain only a few direct connections [4, 2]. These experimental results are consistent with the view that the local neuronal systems in the brain group together to form large-scale distributed networks [5].\n\n* Funding from INRIA-INSERM collaboration and grant /ANR/-08-BLAN-0250-02 VIMAGINE\n\n
However, the link between large-scale networks corresponding to a known cognitive\nfunction and segregation into functional connectivity subgraphs has never been established.\nAt the individual level, the different brain functional networks are attractive as their coherence, as\nmanifested in their correlation structure, appears impacted by brain pathologies, such as schizophre-\nnia [6], neurodegenerative diseases \u2013e.g. Alzheimer\u2019s disease\u2013[7, 8], or in the study of brain lesions\n[9]. From the clinical standpoint, there is a strong interest in spontaneous-activity data to study and\ndiagnose brain pathologies because they can be recorded even on severely impaired subjects [10].\nFMRI is the tool of choice to study large-scale functional connectivity, as it relies on wide ex-\npertise gained through decades of brain mapping, and MRI scanners are widely available in brain\nresearch institutes and hospitals. However neural activity is observed in fMRI indirectly, at a limited\nspatiotemporal resolution ((3mm)3 \u00d7 3s typically), and is confounded by measurement and physio-\nlogical noise (cardiac and respiratory cycles, motion). For clinical applications as well as inference\nof brain fundamental architecture, the quantitative characterization of spontaneous activity has to\nrely on a probabilistic model of the signal. The question of the robustness of covariance estimation\nprocedures to observation noise as well as inter-individual variability is thus fundamental, and has\nnot been addressed so far.\nThe focus of this work is the estimation of a large-scale Gaussian model to give a probabilistic\ndescription of brain functional signals. 
The difficulties are two-fold: on the one hand, there is a shortage of data to learn a good covariance model from an individual subject, and on the other hand, subject-to-subject variability poses a serious challenge to the use of multi-subject data: this concerns the creation of population-level connectivity templates, the estimation of the normal variability around this template, and the assessment of non-normal variability. In this paper, we provide evidence that optimal regularization schemes can be used in the covariance estimation problem, making it possible to pool data from several subjects. We show that the resulting covariance model yields easily interpretable structures, and in particular we provide the first experimental evidence that the functionally integrated communities of brain connectivity graphs correspond to known cognitive networks. To our knowledge, this is the first experiment that assesses quantitatively the goodness of fit of a full-brain functional connectivity model to new data. For this purpose, we introduce an unbiased cross-validation scheme that tests the generalization power of the inferred model.\nAlthough the proposed framework shares with so-called effective connectivity models (SEM [11], DCM [12]) the formulation in terms of graphical models, it is fundamentally different in that these approaches are designed to test the coefficients of (small) graphical models in a hypothesis-driven framework, while our approach addresses the construction of a large-scale model of brain connectivity that might be valid at the population level, and is completely data-driven. [13] have applied a similar framework with success to modeling task-driven brain activity.\nThe layout of the paper is the following. We first formulate the problem of estimating a high-dimensional Gaussian graphical model from multi-subject data. 
Second, we detail how we extract activity time series for various brain regions from fMRI data. Then, we compare the generalization performance of different estimators based on various regularization procedures. Finally, we study the graph communities of the learnt connectivity model as well as the integration and segregation processes between these communities. The present work opens the way to a systematic use of Gaussian graphical models for the analysis of functional connectivity data.\n\n2 Theoretical background: estimating Gaussian graphical models\n\nFrom a statistical estimation standpoint, the challenge to address is to estimate a covariance or a correlation matrix giving a good description of the brain activation data. We choose to use the framework of Gaussian models as these are the processes with the minimum information –i.e. the maximum entropy– given a covariance matrix.\n\nCovariance selection procedures Let us consider a dataset X ∈ R^{n×p} with p variables and n samples, modeled as a centered multivariate Gaussian process. Estimating its covariance matrix is a difficult statistical problem for two reasons. First, to specify a valid multivariate Gaussian model, this covariance has to be positive definite. Second, if n < p(p + 1)/2, as is the case in our problem, the number of unknown parameters is greater than the number of samples. As a result, the eigenstructure of the sample covariance matrix carries a large estimation error. To overcome these challenges, Dempster [14] proposed covariance selection: learning or setting conditional independence between variables improves the conditioning of the problem. In multivariate Gaussian models, conditional independence between variables is given by the zeros in the precision (inverse covariance) matrix K. 
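This correspondence between zeros of the precision matrix and conditional independence can be illustrated on a toy chain model (a minimal numpy sketch; the 3-variable chain and its coefficients are purely illustrative and not taken from the data):

```python
import numpy as np

# Hypothetical 3-variable chain x1 - x2 - x3: x1 and x3 are conditionally
# independent given x2, which shows up as zeros in the precision matrix K.
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])   # tridiagonal: edge set {(1,2), (2,3)}
Sigma = np.linalg.inv(K)           # the covariance, in contrast, is dense

assert K[0, 2] == 0.0              # no direct edge between x1 and x3
assert abs(Sigma[0, 2]) > 1e-10    # yet x1 and x3 are marginally correlated
```

The sparse precision thus encodes the graph, while the covariance itself stays dense.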
Covariance selection can thus be achieved by imposing a sparse support for the estimated precision matrix, i.e., a small number of non-zero coefficients. In terms of graphical models, this procedure amounts to limiting the number of edges.\nSelecting the non-zero coefficients to optimize the likelihood of the model given the data is a difficult combinatorial optimization problem: it is NP-hard in the number of edges. In order to tackle this problem with more than tens of variables, it can be relaxed into a convex problem using a penalization based on the ℓ1 norm of the precision matrix, which is known to promote sparsity in the estimates [15]. The optimization problem is given by:\n\nK̂_{ℓ1} = argmin_{K ≻ 0} tr(K Σ̂_sample) − log det K + λ ‖K‖_1,   (1)\n\nwhere Σ̂_sample = (1/n) X^T X is the sample covariance matrix, and ‖·‖_1 is the element-wise ℓ1 norm of the off-diagonal coefficients in the matrix. Optimal solutions to this problem can be computed very efficiently in O(p^3) time [15, 16, 17]. Note that this formulation of the problem amounts to the computation of a maximum a posteriori (MAP) with an i.i.d. Laplace prior on the off-diagonal coefficients of the precision matrix.\n\nImposing a common sparsity structure In the application targeted by this contribution, the problem is to estimate the precision matrices in a group of subjects among which one can assume that all the individual precision matrices share the same structure of conditional independence, i.e., the zeros in the different precision matrices should be at the same positions. This amounts to a joint prior that can also lead to the computation of a MAP. To achieve the estimation with the latter constraint, a natural solution consists in estimating all matrices jointly. 
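As a point of reference, the single-subject problem (1) can be sketched with an off-the-shelf ℓ1 solver (a sketch only, not the solver used in this work; scikit-learn's GraphicalLasso is assumed available, and the toy data and the value of alpha are arbitrary):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso  # one available l1 solver

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))        # toy data: n=200 samples, p=10

model = GraphicalLasso(alpha=0.1).fit(X)  # alpha plays the role of lambda
K = model.precision_                      # sparse precision estimate

assert K.shape == (10, 10)
# the estimate is a valid (positive definite) precision matrix
assert np.all(np.linalg.eigvalsh((K + K.T) / 2) > 0)
```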
Following the idea of joint feature selection using the group-Lasso for regression problems [18], the solution we propose consists in penalizing the precisions using a mixed norm ℓ21. Let us denote K^{(s)} the precision for subject s in a population of S subjects. The penalty can be written as λ Σ_{i≠j} ‖K^{(·)}_{ij}‖_2, with ‖K^{(·)}_{ij}‖_2 = ( Σ_{s=1}^{S} (K^{(s)}_{ij})^2 )^{1/2}. This leads to the minimization problem:\n\n( K̂^{(s)} )_{s=1..S} = argmin_{K^{(s)} ≻ 0} Σ_{s=1}^{S} ( tr(K^{(s)} Σ̂^{(s)}_sample) − log det K^{(s)} ) + λ Σ_{i≠j} ‖K^{(·)}_{ij}‖_2   (2)\n\nOne can notice that in the special case where S = 1, (2) is equivalent to (1). By using such a penalization, a group of coefficients {K̂^{(s)}_{ij}, s = 1, . . . , S} is either jointly set to zero or jointly non-zero [18]; one thus enforces the precision matrices to have a common sparse support for all subjects.\nTo our knowledge, two other recent contributions address the problem of jointly estimating multiple graphical models [19, 20]. While the approach of [19] is different from (2) and does not correspond to a group-Lasso formulation, [20] mentions problem (2). Compared to this prior work, not only does the optimization strategy we introduce largely differ, but also the application and the validation settings. Indeed, we are not interested in detecting the presence or the absence of edges on a common graph, but in improving the estimation of a probabilistic model of the individual data. 
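The ℓ21 group penalty of (2) itself is straightforward to compute on a stack of precision matrices (a numpy sketch; the helper name l21_penalty and the toy two-subject stack are illustrative):

```python
import numpy as np

# Sketch of the l21 penalty of Eq. (2): off-diagonal coefficients are
# grouped across subjects, so an edge is shared by the whole population.
def l21_penalty(K_stack, lam):
    p = K_stack.shape[1]
    off = ~np.eye(p, dtype=bool)
    # l2 norm across subjects for every (i, j) pair, then sum over i != j
    group_norms = np.sqrt((K_stack ** 2).sum(axis=0))
    return lam * group_norms[off].sum()

# Two toy subjects sharing the same support, with differing edge strengths
K = np.stack([np.eye(3), np.eye(3)])
K[:, 0, 1] = K[:, 1, 0] = [0.3, 0.4]
# each (0,1) slot contributes sqrt(0.3^2 + 0.4^2) = 0.5; two symmetric slots
assert np.isclose(l21_penalty(K, lam=1.0), 1.0)
```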
Also, the procedure to set the regularization parameter λ is done by evaluating the likelihood of unseen data in a principled nested cross-validation setting.\nIn order to minimize (2), we modified the SPICE algorithm [21], which consists in upper bounding the non-differentiable absolute values appearing in the ℓ1 norm with a quadratic differentiable function. When using a group-Lasso penalty, the non-differentiable ℓ2 norms appearing in the ℓ21 penalty can similarly be upper bounded. The computational complexity of an iteration that updates all coefficients once is now in O(S p^3): it scales linearly with the number of models to estimate. Following the derivation from [16], the iterative optimization procedure is stopped using a condition on the optimality of the solution based on a control of the duality gap. Global optimality of the estimated solution is made possible by the convexity of problem (2).\nAlternatively, a penalization based on a squared ℓ2 norm has been investigated. It consists in regularizing the estimate of the precision matrix by adding a diagonal matrix to the sample covariance before computing its inverse. It amounts to an ℓ2 shrinkage penalizing uniformly the off-diagonal terms:\n\nK̂_{ℓ2} = ( Σ̂_sample + λ I )^{−1}   (3)\n\nAlthough the penalization parameter λ for this shrinkage can be chosen by cross-validation, Ledoit and Wolf [22] have introduced a closed formula that leads to a good choice in practice. Unlike ℓ1 penalization, ℓ2 downplays uniformly the connections between variables, and is thus of less interest for the study of brain structure. It is presented mainly for comparison purposes.\n\n3 Probing brain functional covariance with fMRI\n\nInter-individual variability of resting-state fMRI We are interested in modeling spontaneous brain activity, also called resting-state data, recorded with fMRI. 
Although such data require complex strategies to provide quantitative information on brain function, they are known to reveal intrinsic features of brain functional anatomy, such as cognitive networks [1, 23, 3] or connectivity topology [4, 2].\nA well-known challenge with brain imaging data is that no two brains are alike. Anatomical correspondence between subjects is usually achieved by estimating and applying a deformation field that maps the different anatomies to a common template. In addition to anatomical variability, within a population of subjects, cognitive networks may recruit slightly different regions. Our estimation strategy is based on the hypothesis that although the strength of correlation between connected brain regions may vary across subjects, many of the conditional independence relationships will be preserved, as they reflect the structural wiring.\n\nThe data at hand: multi-subject brain activation time series 20 healthy subjects were scanned twice in a resting task, eyes closed, resulting in a set of 244 brain volumes per session acquired with a repetition time of 2.4 s. As in [8], after standard neuroimaging pre-processing, we extract brain fMRI time series and average them based on an atlas that subdivides the gray matter tissues into standard regions.\nWe have found that the choice of the atlas used to extract time series is crucial. Depending on whether the atlas oversegments brain lobes into regions smaller than subject-to-subject anatomical variability or captures this variability, cross-validation scores vary significantly. Unlike previous studies [4, 8], we choose to rely on an inter-subject probabilistic atlas of anatomical structures. For cortical structures, we use the prior probability of cortical folds in template space1 used in Bayesian sulci labeling and normalization of the cortical surface [24]. 
This atlas covers 122 landmarks spread throughout the whole cortex and naturally matches their anatomical variability in terms of position, shape, and spread. It has been shown to be a good support to define regions of interest for fMRI studies [25]. For sub-cortical structures, such as gray nuclei, we use the Harvard-Oxford sub-cortical probabilistic atlas, as shipped by the FSL software package. The union of both atlases forms an inter-subject probabilistic atlas for 137 anatomically-defined regions.\n\n1 The corresponding atlas can be downloaded at http://lnao.lixium.fr/spip.php?article=229\n\nAs we are interested in modeling only gray-matter correlations, we regress out confound effects obtained by extracting signals in different white matter and cerebro-spinal fluid (CSF) regions, as well as the rigid-body motion time courses estimated during data pre-processing. We use the SPM software to derive voxel-level tissue probability of gray matter, white matter, and CSF from the anatomical images of each subject. Tissue-specific time series for either confound signals or grey-matter signals are obtained by multiplying the subject-specific tissue probability maps with the probabilistic atlas.\nFinally, as the fMRI signals contributing to functional connectivity have been found to lie in frequencies below 0.1 Hz [26], we apply temporal low-pass filtering to the extracted time series. We set the cut-off frequency of the filter using cross-validation with the Ledoit-Wolf ℓ2-shrinkage estimator. We find an optimal choice of 0.3 Hz. Also, we remove residual linear trends due to instrument bias or residual movement signal and normalize the variance of the resulting time series. 
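The confound-removal and normalization steps above can be sketched as follows (a simplified numpy/scipy sketch; the array shapes and random data stand in for the real extracted time series and confound signals):

```python
import numpy as np
from scipy.signal import detrend

# Hypothetical shapes: Y holds region time series (n_timepoints, n_regions),
# C holds confound signals (white matter, CSF, motion) as columns.
rng = np.random.default_rng(0)
Y = rng.standard_normal((244, 137))
C = rng.standard_normal((244, 9))

beta, *_ = np.linalg.lstsq(C, Y, rcond=None)  # least-squares confound fit
Y_clean = Y - C @ beta                         # regress confounds out
Y_clean = detrend(Y_clean, axis=0)             # remove residual linear trends
Y_clean /= Y_clean.std(axis=0)                 # normalize the variance

assert np.allclose(Y_clean.std(axis=0), 1.0)
```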
The covariance matrices that we study thus correspond to correlations.\n\n4 Learning a better model for a subject's spontaneous activity\n\nModel-selection settings Given a subject's resting-state fMRI dataset, our goal is to estimate the best multivariate normal model describing this subject's functional connectivity. For this, we learn the model using the data from one session, and measure the likelihood of the second session's data from the same subject. We use this two-fold cross-validation procedure to tune the regularization parameters. In addition, we can use the data of the remaining subjects as a reference population during the training procedure to inform the model for the singled-out subject.\n\nGeneralization performance for different estimation strategies We compare different estimation strategies. First, we learn the model using only the subject's data. We compare the sample correlation matrix, as well as the Ledoit-Wolf, ℓ2- and ℓ1-penalized estimators. Second, we use the combined data of the subject's training session as well as the population, using the same estimators: we concatenate the data of the population and of the training session to estimate the covariance. Finally, we use the ℓ21-penalized estimator in Eq. (2), to learn different precisions for each subject, with a common sparse structure. As this estimation strategy yields a different correlation matrix for each subject, we use the precision corresponding to the singled-out subject to test –i.e. compute the Gaussian log-likelihood of– the data of the left-out session.\nThe cross-validation results (averaged across 20 subjects) are reported in Table 1. In addition, an example of estimated precision matrices can be seen in Figure 1. We find that, due to the insufficient number of samples in one session, the subject's sample precision matrix performs poorly. 
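The likelihood score used for this comparison can be sketched as the average Gaussian log-likelihood of the test session under the fitted precision (a numpy sketch; gaussian_loglik is an illustrative helper, and the toy data replace actual fMRI sessions):

```python
import numpy as np

# Per-sample log-likelihood of zero-mean test data under precision K.
def gaussian_loglik(K, X_test):
    n, p = X_test.shape
    emp_cov = X_test.T @ X_test / n
    sign, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet - np.trace(K @ emp_cov) - p * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.standard_normal((244, 5))
# The true model (identity precision) should score better than a wrong one.
assert gaussian_loglik(np.eye(5), X) > gaussian_loglik(0.1 * np.eye(5), X)
```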
ℓ2 penalization gives a good conditioning and better performance, but is outperformed by the ℓ1-penalized estimator, which yields a sparsity structure expressing conditional independences between regions. On the other hand, the population's sample precision is well-conditioned due to the high number of samples at the group level and generalizes much better than the subject-level sample precision or the corresponding ℓ2-penalized estimate. Penalizing the population-level covariance matrix does not give a significant performance gain. In particular, the ℓ1-penalized subject-level precision matrix outperforms the precision matrices learned from the group (p < 10^-5).\nWe conclude from these cross-validation results that the generalization power of the models estimated from the population data is limited not by the number of samples but by the fact that they do not reflect the subject's singularities. On the other hand, the estimation of a model solely from the subject's data is limited by estimation error. We find that the ℓ21-penalized estimator strikes a compromise and generalizes significantly better than the other approaches (p < 10^-10). Although each individual dataset is different and generalization scores vary from subject to subject, compared to the second-best performing estimator the ℓ21-penalized estimator gives a net gain for each subject of at least 1.7 in the likelihood of unseen data.\n\nGraphs estimated As can be seen from Figure 1, precision matrices corresponding to models that do not generalize well display a lot of background noise, whereas in models that generalize well, a sparse structure stands out. Although an ℓ1 penalization is sparsity inducing, the optimal graphs estimated with such estimators are not very sparse (see Table 1): a filling factor of 50% amounts to 5 000 edges. 
As a result, the corresponding graphs are not interpretable without thresholding (corresponding visualizations are given in the supplementary materials). To interpret dense brain connectivity graphs, previous work relied on extracting a connectivity backbone using a maximal spanning tree [27], or graph statistics on thresholded adjacency matrices [2].\n\n                          |     Using subject data      |    Uniform group model      |\n                          |  MLE    LW     ℓ2     ℓ1   |  MLE    LW     ℓ2     ℓ1   |  ℓ21\nGeneralization likelihood | -57.1   33.1   38.8   43.0 |  40.6   41.5   41.6   41.8 |  45.6\nFilling factor            |  100%   100%   100%   45%  |  100%   100%   100%   60%  |  8%\nNumber of communities     |   6      5      5      9   |   9      8      7      9   |  16\nModularity                |  .07    .07    .12    .25  |  .23    .23    .18    .32  |  .60\n\nTable 1: Summary statistics for different estimation strategies. MLE is the Maximum Likelihood Estimate, in other words, the sample precision matrix. LW is the Ledoit-Wolf estimate.\n\nOn the contrary, the ℓ21-penalized graph is very sparse, with only 700 edges. Adequate penalization serves as a replacement for backbone extraction; moreover it corresponds to a theoretically well-grounded and accurate model of brain connectivity. After embedding in 3D anatomical space, the estimated graph is very symmetric (see Figure 2). A third of the weight on the edges is on connections between a region and the corresponding one on the opposite hemisphere. In addition, the connectivity model displays strong fronto-parietal connections, while the visual system is globally singled out into one cluster, connected to the rest of the cortex mostly via the middle-temporal area.\n\n5 An application: graph communities to describe functional networks\n\nEven very sparse, high-dimensional functional connectivity graphs are hard to interpret. However, they are deemed of high neuroscientific interest, as their structure can reflect fundamental nervous-system assembly principles. 
Indeed, there is evidence from the study of the fault-resilient structure of anatomical connections in the nervous system that ensembles of neurons cluster together to form communities that are specialized to a cognitive task [5, 4, 27]. This process, known as functional integration, goes along with a reduction of between-community connections, called segregation. So far, studies of full-brain connectivity graphs have focused on the analysis of their statistical properties, namely their small-world characteristics related to the emergence of strongly-connected communities in neural systems. These properties can be summarized by a measure called modularity [4, 2, 28]. As the original measures introduced for integration and segregation are Gaussian entropy and mutual information measures [29, 30], the estimation of a well-conditioned Gaussian graphical model of the functional signal gives us an adequate tool to study large-scale modularity and integration in the brain. A limitation of the studies of statistical properties on graphs estimated from the data is that they may reflect properties of the estimation noise. Given that our graphical description generalizes well to unseen data, it should reflect the intrinsic properties of brain functional connectivity better than the sample correlation matrices previously used [4]. In this section, we study these properties on the optimal precision matrices describing a representative individual as estimated above.\n\nFinding communities to maximize modularity Graph communities are a concept originally introduced in social networks: communities are groups of densely-connected nodes with little between-group connection. Newman and Girvan [28] have introduced an objective function Q, called modularity, to measure the quality of a graph partition into a community structure. 
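The modularity Q of a given partition can be sketched directly from its definition (a numpy sketch, not the implementation used in this work; the toy two-clique graph is illustrative):

```python
import numpy as np

# Newman-Girvan modularity Q for a hard partition of an undirected graph
# given by a symmetric adjacency matrix A.
def modularity(A, labels):
    m = A.sum() / 2.0                              # total edge weight
    degrees = A.sum(axis=1)
    Q = 0.0
    for c in np.unique(labels):
        idx = labels == c
        within = A[np.ix_(idx, idx)].sum() / 2.0   # edges inside community c
        a_c = degrees[idx].sum() / (2.0 * m)       # fraction of edge ends in c
        Q += within / m - a_c ** 2
    return Q

# Two 3-node cliques joined by a single bridge edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
good = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # the two cliques
bad = modularity(A, np.array([0, 1, 0, 1, 0, 1]))   # an arbitrary split
assert good > bad
```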
Choosing the partition to optimize modularity is an NP-hard problem, but Smyth and White formulate it as a graph partitioning problem, and give an algorithm [31] based on a convex approximation leading to spectral embedding and k-means clustering. The number of classes is chosen to optimize modularity.\n\nBrain functional-connectivity communities We apply Smyth and White's algorithm to the brain connectivity graphs. We find that using the ℓ21-penalized precision matrices yields a higher number of communities, and higher modularity values (Table 1), than the other estimation strategies. We discuss in detail the results obtained without regularization, and with the best performing regularization strategies: ℓ1 penalization on individual data, and ℓ21 penalization. The communities extracted from the sample precision matrix are mostly spread throughout the brain, while the graph estimated with ℓ1 penalization on individual data yields communities centered on anatomo-functional regions such as the visual system (figures in supplementary materials). The communities extracted on the ℓ21-penalized precision exhibit finer anatomo-functional structures, but also extract some known functional networks that are commonly found while studying spontaneous as well as task-related activity [3]. In Figure 2, we display the resulting communities, making use, when possible, of the same denominations as the functional networks described in [3]. In particular, the default mode network and the fronto-parietal network are structures reproducibly found in functional-connectivity studies that are non-trivial as they are large-scale, long-distance, and not comprised solely of bilateral regions.\n\nFigure 1: Precision matrices computed with different estimators. 
The precision matrix is shown in false colors in the background and its support is shown in black and white in an inset. [Figure 1 panels: subject sample precision, subject precision ℓ1, group sample precision, group precision ℓ1, group precision ℓ21]\n\nFigure 2: Functional-connectivity graph computed by ℓ21-penalized estimation (full graph, left; communities, right). The graph displayed on the left is not thresholded, but on the top view, connections linking one region to its corresponding one on the opposite hemisphere are not displayed. [Community labels: medial visual, occipital pole visual, lateral visual, default mode, basal ganglia, right thalamus, left putamen, dorsal motor, auditory, ventral motor, pars opercularis (Broca area), fronto-lateral fronto-parietal, left and right posterior inferior temporal 1 and 2, cingulo-insular network]\n\nFigure 3: Between-communities integration graph obtained through ℓ1- (left) and ℓ21-penalization (right). The size of the nodes represents integration within a community and the size of the edges represents mutual information between communities. Region order is chosen via 1D Laplace embedding. The regions comprising the communities for the ℓ1-penalized graph are detailed in the supplementary materials.\n\nIntegration and segregation in the graph communities These functionally-specialized networks are thought to be the expression of integration and segregation processes in the brain circuit architecture. We apply the measures introduced by Tononi et al. [29] on the estimated graphs to quantify this integration and segregation, namely Gaussian entropy of the functional networks, and mutual information. However, following [32], we use conditional integration and conditional mutual information to obtain conditional pair-wise measures, and thus a sparser graph: for two sets of nodes S1 and S2,\n\nIntegration: I_{S1} = (1/2) log det(K_{S1}),   (4)\nMutual information: M_{S1,S2} = I_{S1∪S2} − I_{S1} − I_{S2},   (5)\n\nwhere K_{S1} denotes the precision matrix restricted to the nodes in S1. We use these two measures, pair-wise and within-community, to create a graph between communities.\nThis graph reflects the large-scale brain functional organization. We compare the graphs built using the ℓ1- and ℓ21-penalized precisions (Figure 3). We find that the former is much sparser than the latter, reflecting a larger segregation between the estimated communities. The graph corresponding to the ℓ21 penalization segments the brain into smaller communities and care must be taken in comparing the relative integration of the different systems: for instance the visual system appears more integrated in the ℓ1 graph, but this is because it is split in three in the ℓ21 graph.\nAlthough this graph is a very simplified view of brain functional architecture at rest, it displays some of the key processing streams: starting from the primary visual system (medial visual areas), we can distinguish the dorsal visual pathway, going through the occipital pole to the intra-parietal areas comprised in the default mode network and the fronto-parietal networks, as well as the ventral visual pathway, going through the lateral visual areas to the inferior temporal lobe. 
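The measures (4)-(5) can be sketched directly from submatrices of the precision matrix (a numpy sketch; the helper names and the block-diagonal toy precision are illustrative):

```python
import numpy as np

# Sketch of the conditional integration and mutual information of
# Eqs. (4)-(5), computed from submatrices of a precision matrix K.
def integration(K, S):
    sub = K[np.ix_(S, S)]
    return 0.5 * np.linalg.slogdet(sub)[1]

def mutual_information(K, S1, S2):
    return integration(K, S1 + S2) - integration(K, S1) - integration(K, S2)

# Block-diagonal precision: the two node sets carry no shared information,
# so their mutual information should vanish.
K = np.block([[2 * np.eye(2), np.zeros((2, 2))],
              [np.zeros((2, 2)), 3 * np.eye(2)]])
assert np.isclose(mutual_information(K, [0, 1], [2, 3]), 0.0)
```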
The default-mode and fronto-parietal networks appear as hubs, connecting networks with different functions: the visual streams, but also the motor areas and the frontal regions.

6 Conclusion

We have presented a strategy to overcome the challenge of subject-to-subject variability and learn a detailed model of an individual's full-brain functional connectivity using population data. The learnt graphical model is sparse and reveals the interaction structure between functional modules via conditional independence relationships that generalize to new data. As far as we can tell, this is the first time an unsupervised model of brain functional connectivity is backed by cross-validation. Also, from a machine learning perspective, this work is the first demonstration, to our knowledge, of joint estimation of multiple graphical models in a model-selection setting, and the first time such joint estimation is shown to improve a prediction score for individual graphical models.

From a neuroscience perspective, learning high-dimensional functional-connectivity probabilistic models opens the door to new studies of brain architecture. In particular, the models estimated with our strategy are well suited to exploring the graph-community structure resulting from the functional integration, specialization, and segregation of distributed networks. Our preliminary work suggests that a mesoscopic description of neural ensembles via high-dimensional graphical models can establish the link between the functional networks observed in brain imaging and the fundamental assembly principles of the nervous system. Finally, subject-level Gaussian probabilistic models of functional connectivity between a few regions have proved useful for statistically-controlled inter-individual comparisons on resting state, with medical applications [9].
Extending such studies, so far limited by the amount of data available on individual subjects, to full-brain analysis clears the way to new insights into brain pathologies [6, 8].

References

[1] M. Fox and M. Raichle: Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nat Rev Neurosci 8 (2007) 700–711
[2] E. Bullmore and O. Sporns: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci 10 (2009) 186–198
[3] S. Smith, et al.: Correspondence of the brain's functional architecture during activation and rest. PNAS 106 (2009) 13040
[4] S. Achard, et al.: A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. J Neurosci 26 (2006) 63
[5] O. Sporns, et al.: Organization, development and function of complex brain networks. Trends in Cognitive Sciences 8 (2004) 418–425
[6] G. Cecchi, et al.: Discriminative network models of schizophrenia. In: NIPS 22. (2009) 250–262
[7] W. Seeley, et al.: Neurodegenerative diseases target large-scale human brain networks. Neuron 62 (2009) 42–52
[8] S. Huang, et al.: Learning brain connectivity of Alzheimer's disease from neuroimaging data. In: Advances in Neural Information Processing Systems 22. (2009) 808–816
[9] G. Varoquaux, et al.: Detection of brain functional-connectivity difference in post-stroke patients using group-level covariance modeling. In: IEEE MICCAI. (2010)
[10] M. Greicius: Resting-state functional connectivity in neuropsychiatric disorders. Current Opinion in Neurology 21 (2008) 424
[11] A. McIntosh and F. Gonzalez-Lima: Structural equation modeling and its application to network analysis in functional brain imaging. Human Brain Mapping 2(1) (1994) 2–22
[12] J. Daunizeau, K. Friston, and S. Kiebel: Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Physica D 238 (2009)
[13] J. Honorio and D. Samaras: Multi-task learning of Gaussian graphical models. In: ICML. (2010)
[14] A. Dempster: Covariance selection. Biometrics 28(1) (1972) 157–175
[15] O. Banerjee, et al.: Convex optimization techniques for fitting sparse Gaussian graphical models. In: ICML. (2006) 96
[16] J. Duchi, S. Gould, and D. Koller: Projected subgradient methods for learning sparse Gaussians. In: Proc. of the Conf. on Uncertainty in AI. (2008)
[17] J. Friedman, T. Hastie, and R. Tibshirani: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3) (2008) 432–441
[18] M. Yuan and Y. Lin: Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68(1) (2006) 49
[19] J. Guo, et al.: Joint estimation of multiple graphical models. Preprint (2009)
[20] J. Chiquet, Y. Grandvalet, and C. Ambroise: Inferring multiple graphical structures. Stat and Comput (2010)
[21] A. Rothman, et al.: Sparse permutation invariant covariance estimation. Electron J Stat 2 (2008) 494
[22] O. Ledoit and M. Wolf: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88 (2004) 365–411
[23] C. F. Beckmann and S. M. Smith: Probabilistic independent component analysis for functional magnetic resonance imaging. Trans Med Im 23(2) (2004) 137–152
[24] M. Perrot, et al.: Joint Bayesian cortical sulci recognition and spatial normalization. In: IPMI. (2009)
[25] M. Keller, et al.: Anatomically informed Bayesian model selection for fMRI group data analysis. In: MICCAI. (2009)
[26] D. Cordes, et al.: Mapping functionally related regions of brain with functional connectivity MR imaging. American Journal of Neuroradiology 21(9) (2000) 1636–1644
[27] P. Hagmann, et al.: Mapping the structural core of human cerebral cortex. PLoS Biol 6(7) (2008) e159
[28] M. Newman and M. Girvan: Finding and evaluating community structure in networks. Phys Rev E 69 (2004) 26113
[29] G. Tononi, O. Sporns, and G. Edelman: A measure for brain complexity: relating functional segregation and integration in the nervous system. PNAS 91 (1994) 5033
[30] O. Sporns, G. Tononi, and G. Edelman: Theoretical neuroanatomy: relating anatomical and functional connectivity in graphs and cortical connection matrices. Cereb Cortex 10 (2000) 127
[31] S. White and P. Smyth: A spectral clustering approach to finding communities in graphs. In: 5th SIAM International Conference on Data Mining. (2005) 274
[32] D. Coynel, et al.: Conditional integration as a way of measuring mediated interactions between large-scale brain networks in functional MRI. In: Proc. ISBI. (2010)