{"title": "Generalized Correspondence-LDA Models (GC-LDA) for Identifying Functional Regions in the Brain", "book": "Advances in Neural Information Processing Systems", "page_first": 1118, "page_last": 1126, "abstract": "This paper presents Generalized Correspondence-LDA (GC-LDA), a generalization of the Correspondence-LDA model that allows for variable spatial representations to be associated with topics, and increased flexibility in terms of the strength of the correspondence between data types induced by the model. We present three variants of GC-LDA, each of which associates topics with a different spatial representation, and apply them to a corpus of neuroimaging data. In the context of this dataset, each topic corresponds to a functional brain region, where the region's spatial extent is captured by a probability distribution over neural activity, and the region's cognitive function is captured by a probability distribution over linguistic terms. We illustrate the qualitative improvements offered by GC-LDA in terms of the types of topics extracted with alternative spatial representations, as well as the model's ability to incorporate a-priori knowledge from the neuroimaging literature. We furthermore demonstrate that the novel features of GC-LDA improve predictions for missing data.", "full_text": "Generalized Correspondence-LDA Models (GC-LDA)\n\nfor Identifying Functional Regions in the Brain\n\nTimothy N. Rubin\n\nSurveyMonkey\n\nOluwasanmi Koyejo\n\nUniv. of Illinois, Urbana-Champaign\n\nMichael N. Jones\nIndiana University\n\nTal Yarkoni\n\nUniversity of Texas at Austin\n\nAbstract\n\nThis paper presents Generalized Correspondence-LDA (GC-LDA), a generalization\nof the Correspondence-LDA model that allows for variable spatial representations\nto be associated with topics, and increased \ufb02exibility in terms of the strength\nof the correspondence between data types induced by the model. 
We present\nthree variants of GC-LDA, each of which associates topics with a different spatial\nrepresentation, and apply them to a corpus of neuroimaging data. In the context of\nthis dataset, each topic corresponds to a functional brain region, where the region\u2019s\nspatial extent is captured by a probability distribution over neural activity, and the\nregion\u2019s cognitive function is captured by a probability distribution over linguistic\nterms. We illustrate the qualitative improvements offered by GC-LDA in terms\nof the types of topics extracted with alternative spatial representations, as well\nas the model\u2019s ability to incorporate a-priori knowledge from the neuroimaging\nliterature. We furthermore demonstrate that the novel features of GC-LDA improve\npredictions for missing data.\n\n1\n\nIntroduction\n\nOne primary goal of cognitive neuroscience is to \ufb01nd a mapping from neural activity onto cognitive\nprocesses\u2013that is, to identify functional networks in the brain and the role they play in supporting\nmacroscopic functions. A major milestone towards this goal would be the creation of a \u201cfunctional-\nanatomical atlas\u201d of human cognition, where, for each putative cognitive function, one could identify\nthe brain regions and networks that support the function.\nEfforts to create such functional brain atlases have become increasingly common in recent years. Most studies\nhave proceeded by applying dimensionality reduction or source decomposition methods such as\nIndependent Component Analysis (ICA) [4] and clustering analysis [9] to large fMRI datasets such\nas the Human Connectome Project [10] or the meta-analytic BrainMap database [8]. While such\nwork has provided valuable insights, these approaches also have signi\ufb01cant drawbacks. 
In particular,\nthey typically do not jointly estimate regions along with their mapping onto cognitive processes.\nInstead, they \ufb01rst extract a set of neural regions (e.g., via ICA performed on resting-state data), and\nthen in a separate stage\u2014if at all\u2014estimate a mapping onto cognitive functions. Such approaches do\nnot allow information regarding cognitive function to constrain the spatial characterization of the\nregions. Moreover, many data-driven parcellation approaches involve a hard assignment of each brain\nvoxel to a single parcel or cluster, an assumption that violates the many-to-many nature of functional\nbrain networks. Ideally, a functional-anatomical atlas of human cognition should allow the spatial\nand functional correlates of each atom or unit to be jointly characterized, where the function of each\nregion constrains its spatial boundaries, and vice-versa.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fIn the current work, we propose Generalized Correspondence LDA (GC-LDA) \u2013 a novel generaliza-\ntion of the Correspondence-LDA model [2] for modeling multiple data types, where one data type\ndescribes the other. While the proposed approach is general and can be applied to a variety of data,\nour work is motivated by its application to neuroimaging meta-analysis. To that end, we consider\nseveral GC-LDA models that we apply to the Neurosynth [12] corpus, consisting of the document\ntext and neural activation data from a large body of neuroimaging publications. In this context, the\nmodels extract a set of neural \u201ctopics\u201d, where each topic corresponds to a functional brain region. For\neach topic, the model describes its spatial extent (captured via probability distributions over neural\nactivation) and cognitive function (captured via probability distributions over linguistic terms). 
These\nmodels provide a novel approach for jointly identifying the spatial location and cognitive mapping of\nfunctional brain regions, one that is consistent with the many-to-many nature of functional brain networks.\nFurthermore, to the best of our knowledge, one of the GC-LDA variants provides the \ufb01rst automated\nmeasure of the lateralization of cognitive functions based on large-scale imaging data.\nThe GC-LDA and Correspondence-LDA models are extensions of Latent Dirichlet Allocation (LDA)\n[3]. Several Bayesian methods with similarities (or equivalences) to LDA have been applied to\ndifferent types of neuroimaging data. Poldrack et al. (2012) used standard LDA to derive topics\nfrom the text of the Neurosynth database and then projected the topics onto activation space based on\ndocument-topic loadings [7]. Yeo et al. (2014) used a variant of the Author-Topic model to model the\nBrainMap Database [13]. Manning et al. (2014) described a Bayesian method \u201cTopographic Factor\nAnalysis\u201d to identify brain regions based on the raw fMRI images (but not text) extracted from a set\nof controlled experiments, which can later be mapped onto functional categories [5].\nRelative to the Correspondence-LDA model, the GC-LDA model incorporates: (i) the ability to\nassociate different types of spatial distributions with each topic, (ii) \ufb02exibility in how strictly the\nmodel enforces a correspondence between the textual and spatial data within each document, and (iii)\nthe ability to incorporate a-priori spatial structure, e.g., encouraging relatively homologous functional\nregions located in each brain hemisphere. 
As we show, these aspects of GC-LDA have a significant effect on the quality of the estimated topics, as well as on the models\u2019 ability to predict missing data.\n\n2 Models\n\nIn this paper we propose a set of unsupervised generative models based on the Correspondence-LDA model [2] that we use to jointly model text and brain activations from the Neurosynth meta-analytic database [12]. Each of these models, as well as Correspondence-LDA, can be viewed as special cases of a broader model that we will refer to as Generalized Correspondence-LDA (GC-LDA). In the section below, we describe the GC-LDA model and its relationship to Correspondence-LDA. We then detail the specific instances of the model that we use throughout the remainder of the paper. A summary of the notation used throughout the paper is provided in Table 1.\n\n2.1 Generalized Correspondence LDA (GC-LDA)\n\nEach document d in the corpus is comprised of two types of data: a set of word tokens {w(d)_1, w(d)_2, ..., w(d)_N(d)_w} consisting of unigrams and/or n-grams, and a set of peak activation tokens {x(d)_1, x(d)_2, ..., x(d)_N(d)_x}, where N(d)_w and N(d)_x are the number of word and activation tokens in document d, respectively. In the target application, each token xi is a 3-dimensional vector corresponding to the peak activation coordinates of a value reported in fMRI publications. However, we note that this model can be directly applied to other types of data, such as segmented images, where each xi corresponds to a vector of real-valued features extracted from each image segment (cf. [2]).\nGC-LDA is described by the following generative process (depicted in Figure 1.A):\n\n1. For each topic t \u2208 {1, ..., T}:1\n\n(a) Sample a Multinomial distribution over word types \u03c6(t) \u223c Dirichlet(\u03b2)\n\n2. 
For each document d \u2208 {1, ..., D}:\n\n1To make the model fully generative, one could additionally put a prior on the spatial distribution parameters \u039b(t) and sample them. For the purposes of the present paper we do not specify a prior on these parameters, and therefore leave this out of the generative process.\n\n2\n\n\fTable 1: Table of notation used throughout the paper\n\nwi, xi: The ith word token and peak activation token in the corpus, respectively\nN(d)_w, N(d)_x: The number of word tokens and peak activation tokens in document d, respectively\nD: The number of documents in the corpus\nT: The number of topics in the model\nR: The number of components/subregions in each topic\u2019s spatial distribution (subregions models)\nzi: Indicator variable assigning word token wi to a topic\nyi: Indicator variable assigning activation token xi to a topic\nz(d), y(d): The set of all indicator variables for word tokens and activation tokens in document d\nN^YD_td: The number of activation tokens within document d that are assigned to topic t\nci: Indicator variable assigning activation token xi to a subregion (subregion models)\n\u039b(t): Placeholder for all spatial parameters for topic t\n\u00b5(t), \u03c3(t): Gaussian parameters for topic t\n\u00b5(t)_r, \u03c3(t)_r: Gaussian parameters for subregion r in topic t (subregion models)\n\u03c6(t): Multinomial distribution over word types for topic t\n\u03c6(t)_w: Probability of word type w given topic t\n\u03b8(d): Multinomial distribution over topics for document d\n\u03b8(d)_t: Probability of topic t given document d\n\u03c0(t): Multinomial distribution over subregions for topic t (subregion models)\n\u03c0(t)_r: Probability of subregion r given topic t (subregion models)\n\u03b2, \u03b1, \u03b3: Model hyperparameters\n\u03b4: Model hyperparameter (subregion models)\n\n(a) Sample a Multinomial distribution over topics \u03b8(d) 
\u223c Dirichlet(\u03b1)\n\n(b) For each peak activation token xi, i \u2208 {1, ..., N(d)_x}:\n\ni. Sample indicator variable yi from Multinomial(\u03b8(d))\nii. Sample a peak activation token xi from the spatial distribution: xi \u223c f(\u039b(yi))\n\n(c) For each word token wi, i \u2208 {1, ..., N(d)_w}:\n\ni. Sample indicator variable zi from Multinomial((N^YD_1d + \u03b3)/(N(d)_x + \u03b3T), (N^YD_2d + \u03b3)/(N(d)_x + \u03b3T), ..., (N^YD_Td + \u03b3)/(N(d)_x + \u03b3T)), where N^YD_td is the number of activation tokens y in document d that are assigned to topic t, N(d)_x is the total number of activation tokens in d, and \u03b3 is a hyperparameter\nii. Sample a word token wi from Multinomial(\u03c6(zi))\n\nIntuitively, in the present application of GC-LDA, each topic corresponds to a functional region of the brain, where the linguistic features for the topic describe the cognitive processes associated with the spatial distribution of the topic. The resulting joint distribution of all observed peak activation tokens, word tokens, and latent parameters for each individual document in the GC-LDA model is as follows:\n\np(x, w, z, y, \u03b8) = p(\u03b8|\u03b1) \u00b7 [prod_{i=1..N(d)_x} p(yi|\u03b8(d)) p(xi|\u039b(yi))] \u00b7 [prod_{j=1..N(d)_w} p(zj|y(d), \u03b3) p(wj|\u03c6(zj))]    (1)\n\nNote that when \u03b3 = 0, and the spatial distribution for each topic is specified as a single multivariate Gaussian distribution, the model becomes equivalent to a smoothed version of the Correspondence LDA model described by Blei & Jordan (2003) [2].2\n\n2We note that [2] uses a different generative description for how the zi variables are sampled conditional on the y(d) indicator variables; in [2], zi is sampled uniformly from (1, ..., N(d)_y), and then wi is sampled from the multinomial distribution of the topic y(d)_zi that zi points to. This ends up being functionally equivalent to the generative description for zi given here when \u03b3 = 0. Additionally, in [2], no prior is put on \u03c6(t), unlike in GC-LDA. Therefore, when using GC-LDA with a single multivariate Gaussian and \u03b3 = 0, it is equivalent to a smoothed version of Correspondence-LDA. Dirichlet priors have been demonstrated to be beneficial to model performance [1], so including a prior on \u03c6(t) in GC-LDA should have a positive impact.\n\n3\n\n\fFigure 1: (A) Graphical model for the Generalized Correspondence-LDA model, GC-LDA. (B) Graphical model for GC-LDA with spatial distributions modeled as a single multivariate Gaussian (equivalent to a smoothed version of Correspondence-LDA if \u03b3 = 0)2. (C) Graphical model for GC-LDA with subregions, with spatial distributions modeled as a mixture of multivariate Gaussians\n\nA key aspect of this model is that it induces a correspondence between the number of activation tokens and the number of word tokens within a document that will be assigned to the same topic. 
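To make the sampling scheme concrete, the generative process above can be sketched as a short simulation. This is an illustrative sketch only, not the authors' implementation: the corpus sizes, hyperparameter values, and the diagonal-Gaussian spatial distributions below are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions (not from the paper): T topics, vocabulary of V word types.
T, V = 5, 50
alpha, beta, gamma = 0.1, 0.01, 0.01

# Step 1: per-topic word distributions phi(t) ~ Dirichlet(beta).
phi = rng.dirichlet(beta * np.ones(V), size=T)
# Spatial parameters Lambda(t): here one diagonal Gaussian per topic
# (the "no-subregions" variant).
mu = rng.normal(0.0, 30.0, size=(T, 3))      # MNI-like 3-d coordinates
sigma = np.full((T, 3), 10.0)

def generate_document(n_x=35, n_w=46):
    """Steps 2(a)-(c): sample one document's activation and word tokens."""
    theta = rng.dirichlet(alpha * np.ones(T))            # 2(a)
    y = rng.choice(T, size=n_x, p=theta)                 # 2(b)i: topic per activation
    x = rng.normal(mu[y], sigma[y])                      # 2(b)ii: x_i ~ f(Lambda(y_i))
    # 2(c)i: word-topic weights are the gamma-smoothed activation counts N^YD_td.
    counts = np.bincount(y, minlength=T)
    p_z = (counts + gamma) / (n_x + gamma * T)
    z = rng.choice(T, size=n_w, p=p_z)
    w = np.array([rng.choice(V, p=phi[t]) for t in z])   # 2(c)ii
    return x, w, y, z

x, w, y, z = generate_document()
```

With gamma = 0, p_z places zero mass on any topic that received no activation tokens in the document, which is the strict correspondence of the original Correspondence-LDA; gamma > 0 relaxes it.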
The hyperparameter \u03b3 controls the strength of this correspondence. If \u03b3 = 0, then there is zero probability that a word for document d will be sampled from topic t if no peak activations in d were sampled from t. As \u03b3 becomes larger, this constraint is relaxed. Although intuitively one might want \u03b3 to be zero in order to maximize the correspondence between the spatial and linguistic information, we have found that setting \u03b3 > 0 leads to significantly better model performance. We conjecture that using a non-zero \u03b3 allows the parameter space to be more efficiently explored during inference, and that it improves the model\u2019s ability to handle data sparsity and noise in high dimensional spaces, similar to the role that the \u03b1 and \u03b2 hyperparameters serve in standard LDA [1].\n\n2.2 Versions of GC-LDA Employed in Current Paper\nThere are multiple reasonable choices for the spatial distribution p(xi|\u039b(yi)) in GC-LDA, depending upon the application and the goals of the modeler. For the purposes of the current paper, we considered three variants that are motivated by the target application. The first model, shown in Figure 1.B, employs a single multivariate Gaussian distribution for each topic\u2019s spatial distribution, and is therefore equivalent to a smoothed version of Correspondence-LDA when \u03b3 = 0. The generative process for this model is the same as specified above, with generative step (b.ii) modified as follows: Sample peak activation token xi from a Gaussian distribution with parameters \u00b5(yi) and \u03c3(yi). We refer to this model as the \u201cno-subregions\u201d model.\nThe second and third models both employ Gaussian mixtures with R = 2 components for each topic\u2019s spatial distribution, and are shown in Figure 1.C. 
Employing a Gaussian mixture gives the model more flexibility in terms of the types of spatial distributions that can be associated with a topic. This is notably useful in modeling spatial distributions associated with neural activity, as it allows the model to learn topics where a single cognitive function (captured by the linguistic distribution) is associated with spatially discontiguous patterns of activations. In the second GC-LDA model we present\u2014which we refer to as the \u201cunconstrained subregions\u201d model\u2014the Gaussian mixture components are unconstrained. In the third version of GC-LDA\u2014which we refer to as the \u201cconstrained subregions\u201d model\u2014the Gaussian components are constrained to have symmetric means with respect to their distance from the origin along the horizontal spatial axis (a plane corresponding to the longitudinal fissure in the brain). This constraint is consistent with results from meta-analyses of the fMRI literature, where most studied functions display a high degree of bilateral symmetry [6, 12].\nThe use of mixture models for representing the spatial distribution in GC-LDA requires the additional parameters c, \u03c0, and hyperparameter \u03b4, as well as additional modifications to the description of the generative process. Each topic\u2019s spatial distribution in these models is now associated with a multinomial probability distribution \u03c0(t) giving the probability of sampling each component r from each topic t, where \u03c0(t)_r is the probability of sampling the rth component (which we will refer to as a subregion) from the tth topic. 
Variable ci is an indicator variable that assigns each activation token\nxi to a subregion r of the topic to which it is assigned via yi. A full description of the generative\nprocess for these models is provided in Section 1 of the supplementary materials3.\n\n2.3\n\nInference for GC-LDA\n\nExact probabilistic inference for the GC-LDA model is intractable. We employed collapsed Gibbs\nsampling for posterior inference \u2013 collapsing out \u03b8(d), \u03c6(t), and \u03c0(t) while sampling the indicator\nvariables yi, zi and ci. Spatial distribution parameters \u039b(t) are estimated via maximum likelihood.\nThe per-iteration computational complexity of inference is O(T (NW + NX R)), where T is the\nnumber of topics, R is the number of subregions, and NW and NX are the total number of word\ntokens and activation tokens in the corpus, respectively. Details of the inference methods and sampling\nequations are provided in Section 2 of the supplement.\n\n3 Experimental Evaluation\n\nWe refer to the three versions of GC-LDA described in Section 2 as (1) the \u201cno subregions\u201d model,\nfor the model in which each topic\u2019s spatial distribution is a single multivariate Gaussian distribution,\n(2) the \u201cunconstrained subregions\u201d model, for the model in which each topic\u2019s spatial distribution is a\nmixture of R = 2 unconstrained Gaussian distributions, and (3) the \u201cconstrained subregions\u201d model,\nfor the model in which each topic\u2019s spatial distribution is a mixture of R = 2 Gaussian distributions\nwhose means are constrained to be symmetric along the horizontal spatial dimension with respect to\ntheir distance from the origin.\nOur empirical evaluations of the GC-LDA model are based on the application of these models to the\nNeurosynth meta-analytic database [12]. We \ufb01rst illustrate and contrast the qualitative properties of\ntopics that are extracted by the three versions of GC-LDA4. 
We then provide a quantitative model\ncomparison, in which the models are evaluated in terms of their ability to predict held out data. These\nresults highlight the promise of GC-LDA and this type of modeling for jointly extracting the spatial\nextent and cognitive functions of neuroanatomical brain regions.\nNeurosynth Database: Neurosynth [12] is a publicly available database consisting of data automati-\ncally extracted from a large collection of functional magnetic resonance imaging (fMRI) publications5.\nFor each publication, the database contains the abstract text and all reported 3-dimensional peak\nactivation coordinates (in MNI space) in the study. The text was pre-processed to remove common\nstop-words. For the version of the Neurosynth database employed in the current paper, there were\n11,362 total publications, which had on average 35 peak activation tokens and 46 word tokens after\npreprocessing (corresponding to approximately 400k activation and 520k word tokens in total).\n\n3.1 Visualizing GC-LDA Topics\n\nIn Figure 2 we present several illustrative examples of topics for all three GC-LDA variants that we\nconsidered. For each topic, we illustrate the topic\u2019s distribution over word types via a word cloud,\nwhere the sizes of words are proportional to their probabilities \u03c6(t)\nw in the model. Each topic\u2019s spatial\ndistribution over neural activations is illustrated via a kernel-smoothed representation of all activation\ntokens that were assigned to the topic, overlaid on an image of the brain. For the models that\nrepresent spatial distributions using Gaussian mixtures (the unconstrained and constrained subregions\nmodels), activations are color-coded based on which subregion they are assigned to, and the mixture\nweights for the subregions \u03c0(t)\nare depicted above the activation image on the left. 
In the constrained subregions model (where the means of the two Gaussians were constrained to be symmetric along the horizontal axis) the two subregions correspond to a \u2018left\u2019 and \u2018right\u2019 hemisphere subregion. The following parameter settings were used for generating the images in Figure 2: T = 200, \u03b1 = .1, \u03b2 = .01, \u03b3 = .01, and for the models with subregions, \u03b4 = 1.0.\n\n3Note that these models are still instances of GC-LDA as presented in Figure 1.A; they can be equivalently formulated by marginalizing out the ci variables, such that the probability f(xi|\u039b(t)) depends directly on the parameters of each component, and the component probabilities given by \u03c0(t).\n4A brief discussion of the stability of topics extracted by GC-LDA is provided in Section 3 of the supplement.\n5Additional details and Neurosynth data can be found at http://neurosynth.org/\n\n5\n\n\fFigure 2: Illustrative examples of topics extracted for the three GC-LDA variants. Probability distributions over word types \u03c6(t) are represented via word clouds, where word sizes are proportional to \u03c6(t)_w. Spatial distributions are illustrated using kernel-smoothed representations of all activation tokens assigned to each topic. For the models with subregions, each activation token\u2019s color (blue or red) corresponds to the subregion r that the token is assigned to.\n\nFor nearly all of the topics shown in Figure 2, the spatial and linguistic distributions closely correspond to functional regions that are extensively described in the literature (e.g., motor function in primary motor cortex; face processing in the fusiform gyrus, etc.). 
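The per-topic summaries in Figure 2 can be reproduced schematically: word sizes track \u03c6(t)_w, and the spatial map is a kernel-smoothed density over the activation tokens assigned to the topic. A minimal sketch, assuming a made-up fitted topic (the vocabulary, probabilities, and peak coordinates below are hypothetical, not values from the paper):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Hypothetical fitted quantities for one "motor" topic.
vocab = ["motor", "movement", "finger", "hand", "tapping"]
phi_t = np.array([0.35, 0.25, 0.20, 0.12, 0.08])           # phi(t)_w, sums to 1
peaks = rng.normal(loc=[-38.0, -22.0, 54.0], scale=8.0,    # activation tokens
                   size=(200, 3))                          # assigned to this topic

# Word-cloud ingredient: words ranked by probability (font size ~ phi(t)_w).
top_words = sorted(zip(vocab, phi_t), key=lambda p: -p[1])

# Kernel-smoothed spatial representation of the topic's activations.
kde = gaussian_kde(peaks.T)                  # expects shape (dims, n_points)
density_at_center = kde([[-38.0], [-22.0], [54.0]])[0]
density_far_away = kde([[40.0], [60.0], [-30.0]])[0]
```

Rendering the density on brain slices, with fonts scaled by \u03c6(t)_w, yields a Figure-2-style panel: the density is high near the simulated cluster and vanishes elsewhere.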
We note that a key feature of all versions\nof the GC-LDA model, relative to the majority of existing methods in the literature, is that the\nmodel is able to capture the one-to-many mapping from neural regions onto cognitive functions.\nFor example, in all model variants, we observe topics corresponding to auditory processing and\nlanguage processing (e.g., the topics shown in panels B1 and B3 for the subregions model). While\nthese cognitive processes are distinct, they have partial overlap with respect to the brain networks\nthey recruit \u2013 speci\ufb01cally, the superior temporal sulcus in the left hemisphere.\nFor functional regions that are relatively medial, the no-subregions model is able to capture bilateral\nhomologues by consolidating them into a single distribution (e.g., the topic shown in A2, which\nspans the medial primary somatomotor cortex in both hemispheres). However, for functional regions\nthat are more laterally localized, the model cannot capture bilateral homologues using a single topic.\nFor cognitive processes that are highly lateralized (such as language processing, shown in A1, B1\n\n6\n\n\fand C1) this poses no concern. However, for functional regions that are laterally distant and do have\nspatial symmetry, the model ends up distributing the functional region across multiple topics\u2013see,\ne.g., the topics shown in A3 and A4 in the no-subregions model, which correspond to the auditory\ncortex in the left and right hemisphere respectively. Given that these two topics (and many other pairs\nof topics that are not shown) correspond to a single cognitive function, it would be preferable if they\nwere represented using a single topic. This can potentially be achieved by increasing the \ufb02exibility\nof the spatial representations associated with each topic, such that the model can capture functional\nregions with distant lateral symmetry or other discontiguous spatial features using a single topic. 
This motivates the unconstrained and constrained subregions models, in which topics\u2019 spatial distributions are represented by Gaussian mixtures.\nIn Figure 2, the topics in panels B3 and C3 illustrate how the subregions models are able to handle symmetric functional regions that are located on the lateral surface of the brain. The lexical distribution for each of these individual topics in the subregions models is similar to that of both the topics shown in A3 and A4 of the no-subregions model. However, the spatial distributions in B3 and C3 each capture a summation of the two topics from the no-subregions model. In the case of the constrained subregion model, the symmetry between the means of the spatial distributions for the subregions is enforced, while for the unconstrained model the symmetry is data-driven and falls out of the model.\nWe note that while the unconstrained subregions model picks up spatial symmetry in a significant subset of topics, it does not always do so. In the case of language processing (panel A1), the lack of spatial symmetry is consistent with a large fMRI literature demonstrating that language processing is highly left-lateralized [11]. And in fact, the two subregions in this topic correspond approximately to Wernicke\u2019s and Broca\u2019s areas, which are integral to language comprehension and production, respectively. In other cases (e.g., the topics in panels B2 and B4), the unconstrained subregions model partially captures spatial symmetry with a highly-weighted subregion near the horizontal midpoint, but also has an additional low-weighted region that is lateralized. While this result is not necessarily wrong per se, it is somewhat inelegant from a neurobiological standpoint. Moreover, there are theoretical reasons to prefer a model in which subregions are always laterally-symmetrical. 
Specifically, in instances where the subregions are symmetric (the topic in panel B3 for the unconstrained subregions model and all topics for the constrained subregions model), the subregion weights provide a measure of the relative lateralization of function. For example, the language topic in panel C1 of the constrained subregions model illustrates that while there is neural activation corresponding to linguistic processing in the right hemisphere of the brain, the function is strongly left-lateralized (and vice-versa for face processing, illustrated in panel C2). By enforcing the lateral symmetry in the constrained subregions model, the subregion weights \u03c0(t)_r (illustrated above the left activation images) for each topic inherently correspond to an automated measure of the lateralization of the topic\u2019s function. Thus, the constrained model produces what is, to our knowledge, the first data-driven estimation of region-level functional hemispheric asymmetry across the whole brain.\n\n3.2 Predicting Held Out Data\n\nThis section describes quantitative comparisons between three GC-LDA models in terms of their ability to predict held-out data. We split the Neurosynth dataset into a training and test set, where approximately 20% of all data in the corpus was put into the test set. 
For each document, we randomly removed \u230a.2N(d)_x\u230b peak activation tokens and \u230a.2N(d)_w\u230b word tokens. We trained the models on the remaining data, and then for each model we computed the log-likelihood of the test data, both for the word tokens and peak tokens.\nThe space of possible hyperparameters to explore in GC-LDA is vast, so we restrict our comparison to the aspects of the model which are novel relative to the original Correspondence-LDA model. Specifically, for all three model variants, we compared the log-likelihood of the test data across different values of \u03b3, where \u03b3 \u2208 {0, 0.001, 0.01, 0.1, 1}. We note again here that the no-subregions model with \u03b3 = 0 is equivalent to a smoothed version of Correspondence-LDA [2] (see footnote 2 for additional clarification). The remainder of the parameters were fixed as follows (chosen based on a combination of precedent from the topic modeling literature and preliminary model exploration): T = 100, \u03b1 = .1, and \u03b2 = .01 for all models, and \u03b4 = 1.0 for the models with subregions. All models were trained for 1000 iterations.\n\n7\n\n\fFigure 3 presents the held-out log-likelihoods for all models across different settings of \u03b3, in terms of (i) the total log-likelihood for both activation tokens and word tokens (left), (ii) log-likelihood for activation tokens only (middle), and (iii) log-likelihood for word tokens only (right). For both activation tokens and word tokens, for all three versions of GC-LDA, using a non-zero \u03b3 leads to significant improvement in performance. In terms of predicting activation tokens alone, there is a monotonic relationship between the size of \u03b3 and log-likelihood. This is unsurprising, since increasing \u03b3 reduces the extent that word tokens constrain the spatial fit of the model. 
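Held-out log-likelihoods of this kind reduce to simple mixture computations once the parameters are fixed: a held-out word is scored under sum_t \u03b8(d)_t \u00b7 \u03c6(t)_w, and a held-out peak under sum_t \u03b8(d)_t \u00b7 N(x; \u00b5(t), \u03c3(t)). The following sketch uses a hypothetical two-topic model (all parameter values are invented for illustration; in the no-subregions variant the spatial term is a single Gaussian per topic):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical fitted parameters for one document in a 2-topic, 3-word-type model.
theta_d = np.array([0.7, 0.3])                 # document-topic distribution theta(d)
phi = np.array([[0.6, 0.3, 0.1],               # per-topic word distributions phi(t)
                [0.1, 0.2, 0.7]])
mu = np.array([[-40.0, -20.0, 50.0],           # per-topic Gaussian means
               [42.0, -64.0, -12.0]])
cov = [np.eye(3) * 100.0, np.eye(3) * 100.0]   # per-topic covariances

def word_loglik(held_out_words):
    # log p(w|d) = log sum_t theta_d[t] * phi[t, w], summed over held-out tokens
    return sum(np.log(theta_d @ phi[:, w]) for w in held_out_words)

def activation_loglik(held_out_peaks):
    # log p(x|d) under the mixture of per-topic Gaussians
    dens = np.array([[multivariate_normal.pdf(x, mu[t], cov[t]) for t in range(2)]
                     for x in held_out_peaks])
    return float(np.log(dens @ theta_d).sum())

ll_w = word_loglik([0, 2, 1])
ll_x = activation_loglik([[-38.0, -18.0, 52.0]])
total_ll = ll_w + ll_x
```

Summing `total_ll` over test documents gives the quantity plotted in the left panel of Figure 3; the middle and right panels correspond to `ll_x` and `ll_w` alone.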
In terms of predicting word tokens (and overall log-likelihood), the effect of \u03b3 shows an inverted-U function, with the best performance in the range of .01 to .1. These patterns were consistent across all three variants of GC-LDA. Taken together, our results suggest that using a non-zero \u03b3 results in a significant improvement over the Correspondence-LDA model.\nIn terms of comparisons across model variants, we found that both subregions models were significant improvements over the no-subregions model in terms of total log-likelihood, although the no-subregions model performed slightly better than the constrained subregions model at predicting word tokens. Performance of the two subregions models was overall fairly similar. Generally, the constrained subregions model performs slightly better than the unconstrained model in terms of predicting peak tokens, but slightly worse in terms of predicting word tokens. The differences between the two subregions models in terms of total log-likelihood were negligible. These results do not provide a strong statistical case for choosing one subregions model over the other; instead, they suggest that the modeler ought to choose between models based on their respective theoretical or qualitative properties (e.g., biological plausibility, as discussed in Section 3.1).\n\nFigure 3: Log-likelihoods of held-out data for the three GC-LDA models as a function of model parameter \u03b3. Left: total log-likelihood (activation tokens + word tokens). Middle: log-likelihood of activation tokens only. Right: log-likelihood of word tokens only.\n\n4 Summary\n\nWe have presented Generalized Correspondence-LDA (GC-LDA), a generalization of the Correspondence-LDA model, with a focus on three variants that capture spatial properties motivated by neuroimaging applications. 
We illustrated how this model can be applied to a novel type of metadata—namely, the spatial peak activation coordinates reported in fMRI publications—and how it can be used to generate a relatively comprehensive atlas of functional brain regions. Our quantitative comparisons demonstrate that the GC-LDA model outperforms the original Correspondence-LDA model at predicting both missing word tokens and missing activation peak tokens. This improvement was demonstrated both in terms of the introduction of the γ parameter and with respect to alternative parameterizations of topics' spatial distributions.
Beyond these quantitative results, our qualitative analysis demonstrates that the model can recover interpretable topics corresponding closely to known functional regions of the brain. We also showed that one variant of the model can recover known features regarding the hemispheric lateralization of certain cognitive functions. These models show promise for the field of cognitive neuroscience, both for summarizing existing results and for generating novel hypotheses. We also expect that the novel features of GC-LDA can be carried over to other extensions of Correspondence-LDA in the literature. In future work, we plan to explore other spatial variants of these models that may better capture the morphological features of distinct brain regions – e.g., using hierarchical priors that can capture the hierarchical organization of brain systems. We also hope to improve the model by incorporating features such as the correlation between topics. Applications and extensions of our approach to more standard image processing applications may also be a fruitful area of research.

References

[1] Arthur Asuncion, Max Welling, Padhraic Smyth, and Yee Whye Teh. On smoothing and inference for topic models.
In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 27–34. AUAI Press, 2009.

[2] David M Blei and Michael I Jordan. Modeling annotated data. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 127–134. ACM, 2003.

[3] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[4] Vince D Calhoun, Jingyu Liu, and Tülay Adalı. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. NeuroImage, 45(1):S163–S172, 2009.

[5] Jeremy R Manning, Rajesh Ranganath, Kenneth A Norman, and David M Blei. Topographic factor analysis: A Bayesian model for inferring brain networks from neural data. PLoS ONE, 9(5):e94914, 2014.

[6] Adrian M Owen, Kathryn M McMillan, Angela R Laird, and Ed Bullmore. N-back working memory paradigm: A meta-analysis of normative functional neuroimaging studies. Human Brain Mapping, 25(1):46–59, 2005.

[7] Russell A Poldrack, Jeanette A Mumford, Tom Schonberg, Donald Kalar, Bishal Barman, and Tal Yarkoni. Discovering relations between mind, brain, and mental disorders using topic mapping. PLoS Computational Biology, 8(10):e1002707, 2012.

[8] Stephen M Smith, Peter T Fox, Karla L Miller, David C Glahn, P Mickle Fox, Clare E Mackay, Nicola Filippini, Kate E Watkins, Roberto Toro, Angela R Laird, et al. Correspondence of the brain's functional architecture during activation and rest. Proceedings of the National Academy of Sciences, 106(31):13040–13045, 2009.

[9] Bertrand Thirion, Gaël Varoquaux, Elvis Dohmatob, and Jean-Baptiste Poline. Which fMRI clustering gives good brain parcellations?
Frontiers in Neuroscience, 8(167):13, 2014.

[10] David C Van Essen, Stephen M Smith, Deanna M Barch, Timothy EJ Behrens, Essa Yacoub, Kamil Ugurbil, WU-Minn HCP Consortium, et al. The WU-Minn Human Connectome Project: An overview. NeuroImage, 80:62–79, 2013.

[11] Mathieu Vigneau, Virginie Beaucousin, Pierre-Yves Hervé, Hugues Duffau, Fabrice Crivello, Olivier Houdé, Bernard Mazoyer, and Nathalie Tzourio-Mazoyer. Meta-analyzing left hemisphere language areas: Phonology, semantics, and sentence processing. NeuroImage, 30(4):1414–1432, 2006.

[12] Tal Yarkoni, Russell A Poldrack, Thomas E Nichols, David C Van Essen, and Tor D Wager. Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8(8):665–670, 2011.

[13] BT Thomas Yeo, Fenna M Krienen, Simon B Eickhoff, Siti N Yaakub, Peter T Fox, Randy L Buckner, Christopher L Asplund, and Michael WL Chee. Functional specialization and flexibility in human association cortex. Cerebral Cortex, page bhu217, 2014.