{"title": "Learning Macroscopic Brain Connectomes via Group-Sparse Factorization", "book": "Advances in Neural Information Processing Systems", "page_first": 8849, "page_last": 8859, "abstract": "Mapping structural brain connectomes for living human brains typically requires expert analysis and rule-based models on diffusion-weighted magnetic resonance imaging. A data-driven approach, however, could overcome limitations in such rule-based approaches and improve precision mappings for individuals. In this work, we explore a framework that facilitates applying learning algorithms to automatically extract brain connectomes. Using a tensor encoding, we design an objective with a group-regularizer that prefers biologically plausible fascicle structure. We show that the objective is convex and has unique solutions, ensuring identifiable connectomes for an individual. We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem. We show that this greedy algorithm significantly improves on a standard greedy algorithm, called Orthogonal Matching Pursuit. 
We conclude with an analysis of the solutions found by our method, showing we can accurately reconstruct the diffusion information while maintaining contiguous fascicles with smooth direction changes.", "full_text": "Learning Macroscopic Brain Connectomes via Group-Sparse Factorization\n\nFarzane Aminmansour1, Andrew Patterson1, Lei Le2, Yisu Peng3, Daniel Mitchell1, Franco Pestilli4, Cesar Caiafa5,6, Russell Greiner1 and Martha White1\n\n1Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada\n2Department of Computer Science, Indiana University, Bloomington, Indiana, USA\n3Department of Computer Science, Northeastern University, Boston, Massachusetts, USA\n4Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana, USA\n5Instituto Argentino de Radioastronomía - CCT La Plata, CONICET / CIC-PBA, V. Elisa, Argentina\n6Tensor Learning Unit - Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan\n{aminmans, ap3, daniel7, rgreiner, whitem}@ualberta.ca, {leile}@iu.edu, {peng.yis}@husky.neu.edu, {franpest}@indiana.edu, {ccaiafa}@gmail.com\n\nAbstract\n\nMapping structural brain connectomes for living human brains typically requires expert analysis and rule-based models on diffusion-weighted magnetic resonance imaging. A data-driven approach, however, could overcome limitations in such rule-based approaches and improve precision mappings for individuals. In this work, we explore a framework that facilitates applying learning algorithms to automatically extract brain connectomes. Using a tensor encoding, we design an objective with a group-regularizer that prefers biologically plausible fascicle structure. We show that the objective is convex and has a unique solution, ensuring identifiable connectomes for an individual. 
We develop an efficient optimization strategy for this extremely high-dimensional sparse problem, by reducing the number of parameters using a greedy algorithm designed specifically for the problem. We show that this greedy algorithm significantly improves on a standard greedy algorithm, called Orthogonal Matching Pursuit. We conclude with an analysis of the solutions found by our method, showing we can accurately reconstruct the diffusion information while maintaining contiguous fascicles with smooth direction changes.\n\n1 Introduction\n\nA fundamental challenge in neuroscience is to estimate the structure of white matter connectivity in the human brain, i.e., connectomes [14, 29]. Connectomes are made up of neuronal axon bundles wrapped in myelin sheaths, called fascicles, which connect different areas of the brain. Information about brain tissue can be acquired by measuring the diffusion of water molecules along different spatial directions. Fascicles can be inferred by employing tractography algorithms, which fit mathematical models to the diffusion-weighted signal. Currently, diffusion-weighted magnetic resonance imaging (dMRI) combined with fiber tractography is the only method available to map structural brain connectomes in living human brains [3, 30, 23]. This method has revolutionized our understanding of the network structure of the human brain and the role of white matter in health and disease.\n\nStandard practice in mapping connectomes comprises several steps: a dMRI is acquired (Fig. 1A), a model is fit to the signal in each brain voxel (Fig. 1B), and a tractography algorithm is used to estimate long-range brain connections (Fig. 1C). Multiple models can be used at each of these\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nFigure 1: A: Measurements of white matter using diffusion-weighted magnetic resonance imaging (dMRI). 
B: Multiple models can describe the dMRI signal in each brain voxel. For example, the diffusion-tensor model (DTI; top, [2]) and the constrained-spherical deconvolution model (CSD; bottom, [28]) are commonly used. C: Multiple tractography methods integrate model fits across voxels to estimate long-range brain connections. Many tractography algorithms exist, each with multiple parameters, for both deterministic and probabilistic methods [27]. In practice, several combinations of methods and parameters are used by investigators. D: Left: Two major white matter tracts, the Arcuate Fasciculus in gold and superior lateral fasciculus in lilac, reconstructed in a single brain using deterministic (top) and probabilistic (bottom) tractography. Right: Cortical termination of the superior lateral fasciculus in the same brain estimated with deterministic (top) and probabilistic (bottom) tractography. Arrows show multiple possible choices of model and parameters to generate connectome estimates (D) from dMRI data (A).\n\nsteps and each model allows multiple parameters to be set. Currently, best practice in the field is to choose one model and pick a single set of parameters using heuristics such as recommendations by experts or previous publications. This rule-based approach has several limitations. For example, different combinations of models and parameters generate different solutions (Fig. 1D). Figure 1 exemplifies how, from a single dMRI data set collected in a brain, choosing a single model and parameter set (Fig. 1A-C) can generate vastly different connectome mapping results (Fig. 1D; adapted from [20]). In the figure, we show that both estimates of white matter tracts (Fig. 1D, left) and cortical connections (Fig. 1D, right) vary substantially even within a single brain.\n\nThere have been some supervised learning approaches proposed for tractography. 
These supervised methods, such as those using random forests [17] and neural networks [22, 5], require labelled data. This means tractography solutions must first be given for training, limiting the models mainly to mimicking expert solutions rather than learning structures beyond them. A few methods have used regularized learning strategies, but for different purposes, such as removing false connections in a given tractography solution [12] or using radial regularization for micro-structure [9].\n\nThis work presents a fully unsupervised learning framework for tractography. We exploit a recently introduced encoding for connectome data, called ENCODE [8], which represents dMRI (and white matter fascicles) as a tensor factorization. This factorization was previously used only to represent expert connectomes as a tensor, generated using a standard rule-based tractography process introduced in Fig. 1. We propose to instead learn this tensor from the dMRI data, to learn the structure of brain connectomes. We introduce a regularized objective that attempts to extract a tensor that reflects a biologically plausible fascicle structure while also reconstructing the diffusion information. We address two key challenges: (1) designing regularizers that adequately capture biologically plausible tract structures and (2) optimizing the resulting objective for an extremely high-dimensional and sparse tensor. We develop a group regularizer that captures both spatial and directional continuity of the white matter fascicles. We solve this extremely high-dimensional sparse problem using a greedy algorithm to screen the set of possible solutions upfront. We prove that the objective is convex, with a unique solution, and provide approximation guarantees for the greedy algorithm. We then show that this greedy algorithm much more effectively selects possible solutions, as compared to a standard greedy algorithm called Orthogonal Matching Pursuit (OMP). 
We show, both quantitatively and qualitatively, that the solutions provided by our method effectively reconstruct the diffusion information in each voxel while maintaining contiguous, smooth fascicles.\n\nThe code is available at: https://github.com/framinmansour/Learning-Macroscopic-Brain-Connectomes-via-Group-Sparse-Factorization\n\n2 Encoding Brain Connectomes as Tensors\n\nFigure 2: A: The ENCODE method; from natural brain space to tensor encoding. Left: Two example white matter fascicles (f1 and f2) passing through three voxels (v1, v2 and v3). Right: Encoding of the two fascicles in a three-dimensional tensor. The non-zero entries in Φ indicate fascicle orientation (1st mode), position (voxel, 2nd mode) and identity (3rd mode). B: Model formulation and group sparse regularization. Depiction of how ENCODE facilitates integration of the dMRI signal, Y, the connectome structure, Φ, and a dictionary of predictions of the dMRI signal, D, for each fascicle orientation. The group regularizers (orange and green squares) define pairwise groups of neighbouring voxels and similar orientations. Note that the voxels are linearized to enable Φ and the groups to be visualized. This allows us to flatten four-dimensional hyper-cubes (three dimensions for voxels and one for orientations) to squares.\n\nENCODE [8] maps fascicles from their natural brain space into the three dimensions of a sparse tensor Φ ∈ R^(Na×Nv×Nf) (Fig. 2A, right). The first dimension of Φ (1st mode, size Na) encodes individual white matter fascicle orientations at each position along their path through the brain. Individual segments (nodes) in a fascicle are coded as non-zero entries in the sparse array (dark-blue cubes in Fig. 
2A, right). The second dimension of Φ (2nd mode, size Nv) encodes fascicle spatial positions within the voxels of the dMRI data. Slices in this second dimension represent single voxels (cyan slice in Fig. 2A, right). The third dimension (3rd mode, size Nf) encodes the index of each fascicle within the connectome. Full fascicles are encoded as frontal slices of Φ (cf. yellow and blue in Fig. 2A, right). Within one tract, such as the Arcuate Fasciculus, the model we use has fine-grained orientations Na = 1057, with number of fascicles Nf = 868 and number of voxels Nv = 11,823.\n\nENCODE facilitates the integration of measured dMRI signals with the connectome structure (Fig. 2B, right). dMRI measurements are collected with and without a diffusion sensitization magnetic gradient, along Nθ gradient directions θ ∈ R^3. For the Arcuate Fasciculus, for instance, the data were collected for Nθ = 96 gradient directions. The dMRI signal is then represented as a matrix Y ∈ R^(Nθ×Nv), whose entries give the diffusion signal received from each voxel under each applied gradient direction during the scan. Moreover, ENCODE allows factorizing the dMRI signal as the product of a 3-dimensional tensor Φ ∈ R^(Na×Nv×Nf) and a dictionary of dMRI signals D ∈ R^(Nθ×Na): Y ≈ Φ ×₁ D ×₃ 1. The notation “×n” is the tensor-by-matrix product in mode n (see [15]). The dot product with 1 ∈ R^(Nf) sums over the fascicle dimension.¹ The matrix D is a dictionary of representative diffusion signals: each column represents the diffusion signal we expect to receive from an axon along a possible fascicle orientation a, for a sensitizing magnetic gradient in each direction θ. 
More specifically, the entries are computed as D(θ, a) = e^(−bθᵀQ_aθ) − (1/Nθ) Σ_θ e^(−bθᵀQ_aθ), in which Q_a is an approximation of the diffusion tensor for a fascicle segment with orientation a, and the scalar b denotes the diffusion sensitization gradient strength. The quadratic form θᵀQ_aθ gives the diffusion in direction θ generated by a fascicle with orientation a.\n\n3 A Tractography Objective for Learning Brain Connectomes\n\nThe original work on ENCODE assumed the tensor Φ was obtained from a tractography algorithm. In this section, we instead use this encoding to design an objective to learn Φ directly from dMRI data.¹\n\n¹The original encoding uses a set of fascicle weights w ∈ R^(Nf), to get Y ≈ Φ ×₁ D ×₃ w. For a fixed Φ, w was learned to adjust the magnitude of each fascicle dimension. We do not require this additional vector, because these magnitudes can be incorporated into Φ and implicitly learned when Φ is learned.\n\nFirst consider the problem of estimating the tensor Φ to best predict Y, for a given D ∈ R^(Nθ×Na). We can use a standard maximum likelihood approach (see Appendix A for the derivation) to get the following reconstruction objective\n\nΦ̂ = argmin_{Φ ∈ R^(Na×Nv×Nf)} ‖Y − Φ ×₁ D ×₃ 1‖²_F,    (1)\n\nwhere ‖·‖_F is the Frobenius norm, which sums the squared entries of the given matrix. This objective prefers Φ that can accurately recreate the diffusion information in Y. 
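To make the factorization and objective (1) concrete, here is a minimal NumPy sketch (our own toy illustration with arbitrary dimensions and a random stand-in dictionary, not the released code): the mode-3 product with 1 sums Φ over fascicles, and the mode-1 product applies D along the orientation mode.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes for illustration; the paper uses e.g. Na=1057, Nv=11823, Nf=868, Ntheta=96.
Na, Nv, Nf, Ntheta = 12, 8, 5, 9

# Sparse connectome tensor Phi: a few non-zero (orientation, voxel, fascicle) entries.
Phi = np.zeros((Na, Nv, Nf))
Phi[rng.integers(0, Na, 20), rng.integers(0, Nv, 20), rng.integers(0, Nf, 20)] = 1.0

# Stand-in dictionary of canonical diffusion signals, one column per orientation.
D = rng.standard_normal((Ntheta, Na))

# Y ≈ Phi x1 D x3 1: contract the fascicle mode with 1, then apply D on mode 1.
Phi_sum = Phi.sum(axis=2)            # (Na, Nv), the x3 1 contraction
Y_hat = D @ Phi_sum                  # (Ntheta, Nv), the x1 D product

# The two mode products reduce to one tensor contraction over the orientation mode.
assert np.allclose(Y_hat, np.einsum('ta,avf->tv', D, Phi))

# Reconstruction objective (1): squared Frobenius error against a measured Y.
Y = Y_hat + 0.01 * rng.standard_normal((Ntheta, Nv))
err = np.linalg.norm(Y - Y_hat, 'fro') ** 2
```

The `einsum` check makes explicit that, once the fascicle mode is summed out, the factorization is an ordinary matrix product of the dictionary with the collapsed tensor.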
This optimization, however, is highly under-constrained, with many possible (dense) solutions. In particular, this objective alone does not enforce a biologically plausible fascicle structure in Φ. The tensor Φ should be highly sparse, because each voxel is expected to have only a small number of fascicles and orientations [20]. For example, for the Arcuate Fasciculus, we expect at most an activation level in Φ of Nv × 10 × 10 / (Na × Nv × Nf) ≈ 1e−6, using a conservative upper bound of 10 fascicles and 10 orientations on average per voxel. Additionally, the fascicles should be contiguous and should not sharply change orientation.\n\nWe design a group regularizer to encode these properties. Anatomical consistency of fascicles is enforced locally within groups of neighboring voxels and orientations. Overlapping groups are used to encourage this local consistency to result in global consistency. Group regularization prefers to zero all coefficients for a group. This zeroing has the effect of clustering non-zero coefficients in local regions within the tensor, ensuring similar fascicles and orientations are active based on spatial proximity. Further, overlapping groups encourage neighbouring groups to be either both active or both inactive for a fascicle and direction. This promotes contiguous fascicles and smooth direction changes. These groups are depicted in Figure 2B, with groups defined separately for each fascicle (slice). We describe the group regularizer more formally in the remainder of this section.\n\nAssume we have groups of voxels GV ∈ V based on spatial coordinates and groups of orientations GA ∈ A based on orientation similarity. For example, each GV could be a set of 27 voxels in a local cube; these cubes of voxels can overlap between groups, such as {(1, 1, 1), (1, 1, 2), . . . , (3, 3, 3)} ∈ V and {(2, 1, 1), (2, 1, 2), . . . , (4, 3, 3)} ∈ V. 
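For illustration, overlapping cubic voxel groups like the two above can be enumerated as follows (a sketch assuming 3×3×3 cubes with stride 1 and 0-indexed coordinates; the paper's exact construction may differ):

```python
import itertools

def voxel_groups(shape, cube=3):
    """Enumerate overlapping cube-shaped groups of voxel coordinates.

    Each group holds the cube**3 coordinates of a cube anchored at every
    valid position; adjacent anchors overlap, mirroring the example groups
    {(1,1,1),...,(3,3,3)} and {(2,1,1),...,(4,3,3)} (here 0-indexed)."""
    nx, ny, nz = shape
    groups = []
    for x0, y0, z0 in itertools.product(range(nx - cube + 1),
                                        range(ny - cube + 1),
                                        range(nz - cube + 1)):
        groups.append([(x0 + i, y0 + j, z0 + k)
                       for i, j, k in itertools.product(range(cube), repeat=3)])
    return groups

# A 4x3x3 block of voxels yields exactly two overlapping 27-voxel groups.
groups = voxel_groups((4, 3, 3))
```

Every interior voxel then belongs to several groups, which is what lets the local consistency encouraged within each group propagate into global consistency.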
Each GA can be defined by selecting one atom (one orientation) and including in the group all orientations that have a small angle to that central atom, i.e., an angle below a chosen threshold. Consider one orientation, voxel, fascicle triple (a, v, f). Assume a voxel has a non-zero coefficient for a fascicle: Φ_{a,v,f} is not zero for some a. A voxel within the same group GV is likely to have the same fascicle with a similar orientation. A distant voxel, on the other hand, is highly unlikely to share the same fascicle. The goal is to encourage as many pairwise groups (GV, GA) as possible to be inactive (have all zero coefficients for a fascicle) and to concentrate activation in Φ within groups.\n\nWe can enforce this group sparsity by adding a regularizer to (1). Let x_{GA,v,f} ∈ R indicate whether a fascicle f is active for voxel v, for any orientation a ∈ GA. Let x_{GA,GV,f} be the vector composed of these indicators for each v ∈ GV. Either we want the entire vector x_{GA,GV,f} to be zero, meaning the fascicle is not active in any of the voxels v ∈ GV for the orientations a ∈ GA, or we want more than one non-zero entry in this vector, meaning multiple nearby voxels share the same fascicle. This second criterion is largely enforced by encouraging as many blocks as possible to be zero, because each voxel will prefer to activate fascicles and orientations in already active pairs (GV, GA). As with many sparse approaches, we use an ℓ1 regularizer to set entire blocks to zero. 
In particular, as has been previously done for block sparsity [26], we can use an ℓ1 across the blocks x_{GA,GV,f}:\n\nΣ_{f∈F} Σ_{GV∈V} Σ_{GA∈A} ‖x_{GA,GV,f}‖₂.    (2)\n\nThe outer sums can be seen as an ℓ1 norm across the vector of norm values containing ‖x_{GA,GV,f}‖₂. This encourages ‖x_{GA,GV,f}‖₂ = 0, which is only possible if x_{GA,GV,f} = 0.\n\nFinally, we need to define a continuous indicator variable x_{GA,GV,f} to simplify the optimization. A 0-1 indicator is discontinuous, and would be difficult to optimize. Instead, we use the following continuous indicator\n\nx_{GA,GV,f} = [‖Φ_{GA,v1,f}‖₁, . . . , ‖Φ_{GA,vn,f}‖₁]  for each vi ∈ GV.    (3)\n\nAn entry in x_{GA,GV,f} is 0 if fascicle f is not active for (GV, GA). Otherwise, the entry is proportional to the sum of the absolute coefficient values for that fascicle for orientations in GA.\n\nOur proposed group regularizer is\n\nR(Φ) = Σ_{f∈F} Σ_{GV∈V} Σ_{GA∈A} ‖x_{GA,GV,f}‖₂ = Σ_{f∈F} Σ_{GV∈V} Σ_{GA∈A} √( Σ_{v∈GV} ( Σ_{a∈GA} |Φ_{a,v,f}| )² ),    (4)\n\nwhich, combined with equation (1), gives our proposed objective: given the observed Y and the dictionary D, find the Φ that solves\n\nmin_{Φ ∈ R^(Na×Nv×Nf)} ‖Y − Φ ×₁ D ×₃ 1‖²_F + λ R(Φ)    (5)\n\nfor regularization weight λ > 0. This objective balances reconstructing the diffusion data against constraints on the structure of Φ. Crucially, this objective is convex in Φ and has a unique solution, which we show in Theorem 1 in Appendix B. 
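A direct, dense-loop sketch of the regularizer (4) and objective (5) may make the nested group structure concrete (our own illustration; the real problem is far too large for these loops, and groups here are assumed to be lists of linearized voxel and orientation indices):

```python
import numpy as np

def group_regularizer(Phi, voxel_groups, orient_groups):
    """R(Phi) from equation (4): for each fascicle f and pair (G_V, G_A),
    take the inner l1 norm over orientations per voxel, then the l2 norm
    over the voxels in the group; the outer sums act as an l1 over blocks."""
    Na, Nv, Nf = Phi.shape
    total = 0.0
    for f in range(Nf):
        for GV in voxel_groups:            # lists of voxel indices
            for GA in orient_groups:       # lists of orientation indices
                x = [np.abs(Phi[GA, v, f]).sum() for v in GV]
                total += np.linalg.norm(x)  # l2 over the block vector
    return total

def objective(Phi, D, Y, lam, voxel_groups, orient_groups):
    """Full objective (5): Frobenius reconstruction error plus lam * R(Phi)."""
    Y_hat = D @ Phi.sum(axis=2)
    recon = np.linalg.norm(Y - Y_hat, 'fro') ** 2
    return recon + lam * group_regularizer(Phi, voxel_groups, orient_groups)
```

With a single non-zero entry Phi[0, 0, 0] = 3 and one group pair covering everything, R(Phi) is simply 3, matching the norm-of-norms definition by hand.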
Uniqueness ensures identifiable tractography solutions and convexity facilitates obtaining optimal solutions.\n\n4 An Efficient Algorithm for the Tractography Objective\n\nStandard gradient descent algorithms can be used directly on (5) to find the optimal solution. Unfortunately, the number of parameters in the optimization is very large: Nv × Nf × Na is in the billions even for just one tract. At the same time, the number of active coefficients at the end of the optimization is much smaller, only on the order of Nv, because there are only a handful of fascicles and orientations per voxel. Even when initializing Φ to zero, the gradient descent optimization might make all of Φ active during the optimization. Screening algorithms have been developed to prune entries for sparse problems [31, 6]. These generic methods, however, still leave too many active coefficients to make this optimization tractable for wide application, as we have verified empirically.\n\nInstead, we can design a screening algorithm specialized to our objective. Orientations can largely be selected independently for each voxel, based solely on diffusion information. We can infer the likely orientations of fascicles in a voxel that could plausibly explain the diffusion information, without knowing precisely which fascicles are in that voxel. If we can select a plausible set of orientations for each voxel before optimizing the objective, we can significantly reduce the number of parameters. For example, 20 orientations is a large superset, but would reduce the number of parameters by a factor of 10,000, because the full orientation set has Na = 120,000.\n\nOne strategy is to generate these orientations greedily, such as with a method like Orthogonal Matching Pursuit (OMP). This differs from most screening approaches, which usually iteratively prune starting from the full set. 
Generating orientations starting from an empty set, rather than pruning, is a more natural strategy for such an extremely sparse solution, where only 0.017% of the items are used. Consider how OMP might generate orientations. For a given voxel v, the next best orientation is greedily selected based on how much it reduces the residual error for the diffusion. On the first step, it adds the single best orientation for predicting the Nθ = 96 dimensional diffusion vector for voxel v. It generates up to a maximum of k orientations greedily and then stops. Then only coefficients for this set of orientations will be considered for voxel v in the optimization of the tractography objective. This procedure is executed for each voxel, and is very fast.\n\nThough a greedy strategy for generating orientations is promising, the criterion used by OMP is not suitable for this problem. Using residual errors for the criterion prefers orthogonal or dissimilar orientations, to provide a basis with which to easily reconstruct the signal. The orientations in voxels, however, are unlikely to be orthogonal. Instead, it is more likely that there are multiple fascicles with similar orientations in a voxel, with some fascicles overlapping in a different, but not necessarily orthogonal, direction. We must modify the selection criterion to select a number of similar orientations to reconstruct the diffusion information in a voxel.\n\nTo do so, we rely on the more general algorithmic framework for subselecting items from a set, of which OMP is a special case. We need to define a criterion that evaluates the quality of subsets S of the full set of items 𝒮. In our setting, 𝒮 is the full set of orientations and S a subset of those orientations. Our goal is to find S ⊂ 𝒮 with |S| ≤ k such that ḡ(S) is maximal. 
If we can guarantee this criterion ḡ : P(𝒮) → R is (approximately) submodular, then we can rely on a wealth of literature showing the effectiveness of greedy algorithms for picking S to maximize ḡ.\n\nWe build on the criterion used by OMP, g(S), the squared multiple correlation [13]. We propose a simple yet effective modification, and define the Orientation Greedy criterion as\n\nḡ(S) := g(S) + Σ_{s∈S} g({s}).\n\nThis objective balances between preferring a set S with high multiple correlation, and ensuring that each orientation itself is useful. Each orientation on its own likely explains a large proportion of the diffusion for a voxel. This objective will therefore likely prefer to pick two similar orientations that together recreate the diffusion in the voxel well. This contrasts with two orthogonal orientations, which can be linearly combined to produce those two orientations but which do not themselves explain the diffusion information well. The modification is conceptually simple, yet gives the criterion a very different meaning. The simplicity of the modification is also useful for the optimization, since a linear sum of submodular functions is itself submodular. We provide approximation guarantees for this submodular maximization in Appendix D, using results for the multiple correlation [13].\n\nThe full algorithm consists of two key steps. The first step is to screen the orientations, using Orientation Greedy in Algorithm 1. We then use subgradient descent to optimize the Tractography Objective using this much reduced set of parameters. This second step prunes the superset of possible orientations further, often to only a couple of orientations. The resulting solution only has a small number of active fascicles and orientations for each voxel. 
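A sketch of the Orientation Greedy selection for a single voxel, with the squared multiple correlation g approximated by the least-squares fraction of variance explained (an assumption for illustration; the paper's precise g follows [13]):

```python
import numpy as np

def g(S, D, y):
    """Variance-explained proxy for the squared multiple correlation of the
    voxel signal y regressed on the dictionary columns indexed by S."""
    if len(S) == 0:
        return 0.0
    Ds = D[:, list(S)]
    beta, *_ = np.linalg.lstsq(Ds, y, rcond=None)
    resid = y - Ds @ beta
    return 1.0 - (resid @ resid) / (y @ y)

def greedy_orientation(D, y, k):
    """Greedily grow S to maximize g_bar(S) = g(S) + sum_{s in S} g({s})."""
    S = []
    candidates = set(range(D.shape[1]))
    singles = {s: g([s], D, y) for s in candidates}  # cache each g({s})
    while len(S) < k and candidates:
        def g_bar(s):
            trial = S + [s]
            return g(trial, D, y) + sum(singles[t] for t in trial)
        best = max(candidates, key=g_bar)
        S.append(best)
        candidates.remove(best)
    return S
```

Unlike OMP's residual criterion, the added Σ g({s}) term rewards orientations that individually explain the signal, so two similar, non-orthogonal orientations can both be kept in the candidate set.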
We provide a detailed derivation and description of the algorithm in Appendix C.\n\nThe optimization given the screened orientations remains convex. The main approximation in the algorithm is introduced by the greedy selection of orientations. We provide approximation guarantees for how effectively the greedy algorithm maximizes the criterion ḡ. This does not, however, characterize whether the criterion itself is a suitable strategy for screening. In the next section, we focus our empirical study on the efficacy of this greedy algorithm, which is critical for obtaining efficient solutions for the tractography objective.\n\n5 Empirical results: Reconstructing the anatomical structure of tracts\n\nWe investigate the properties of the proposed objective on two major structures in the brain. The first is the Arcuate Fasciculus, hereafter Arcuate. The other is the Arcuate combined with one branch of the Superior Longitudinal Fasciculus, SLF1, hereafter ARC-SLF. Due to space constraints, we relegate additional empirical results on ARC-SLF to Appendix E.6. We learn on data generated by an expert connectome solution within the ENCODE model (Appendix E.2). This allows us to objectively investigate the efficacy of the objective and greedy optimization strategy, because we have access to the ground truth Φ that generated the data. To the best of our knowledge, this is the first completely unsupervised data-driven approach for extracting brain connectomes. We therefore focus primarily on understanding the properties of our learning approach for tractography.\n\nWe particularly (a) investigate how effectively our greedy algorithm selects orientations, (b) investigate how accurately the group regularized objective with this screening approach can reconstruct the diffusion information, and (c) visualize the plausibility of the solutions produced by our method, particularly in terms of smoothness of the fascicles. 
Even with screening, this optimization, when learning over all fascicles and voxels, is prohibitively expensive for running thorough experiments. We therefore focus first on evaluating the model given the assignment of fascicles to voxels, meaning that for the following experiments the fascicles are fixed. Because the largest approximation in the algorithm is the greedy selection of orientations, this is the most important step to understand first. For a given set of (greedily chosen) orientations, the objective remains convex with a unique solution. We know, therefore, that further optimizing over fascicles as well would only reduce the reconstruction error.\n\n5.1 Screening\n\nWe define two error metrics to demonstrate the utility of GreedyOrientation over OMP for this task. The first is the total number of orientations present in Φ-expert that are not present in the Φ generated by the screening approach, measuring the exactness of the solution. The second metric is the minimum possible angular distance between each of the orientations in Φ-expert and any set of orientations in the corresponding voxel of the Φ generated by the screening approach, so that the set\n\nFigure 3: (a): Average number of missing orientations per voxel in candidate sets of increasing size. (b): The distribution of angular distances from the ground truth for OMP and GreedyOrientation after the global optimization procedure. The angular distance is the minimum possible distance given some weighted combination of selected orientations. (c): Average angular distance between the weighted sum of predicted node orientations and the ground truth in each voxel for candidate sets of increasing size.\n\nFigure 4: (a): Comparing the distribution of reconstruction error for ground truth, OMP, and GreedyOrientation over voxels after optimization. 
(b): The improvement of reconstruction error during the steps of gradient descent shows that the objective is not able to improve the OMP-selected orientation sets, while it steadily improves the GreedyOrientation choices.\n\nwould provide the best possible approximation of that orientation. The details of the algorithm can be found in Appendix E.5.\n\nWe demonstrate the screening method’s performance using both error metrics in Figure 3. In Figure 3a, we show the effect of increasing the size of our candidate set of orientations on the number of missing orientations compared to the ground truth. GreedyOrientation’s advantage likely arises because OMP continually adds dissimilar orientations, and thus is less likely to add the exactly correct orientations, because these are too similar to orientations already in the candidate set. Figure 3b shows the minimum angular distance given a linear combination of orientations in the candidate set compared to the ground truth. GreedyOrientation has high probability mass near zero, showing that it generates appropriate candidate sets. Finally, Figure 3c shows the angular distance between the orientations weighted with the optimized weights and the ground truth for different sizes of the orientation candidate set.\n\nWe can clearly see that increasing the size of the orientation set in OMP results in a larger angular distance, since more dissimilar orientations are included. On the other hand, the angular distance of candidate sets chosen by GreedyOrientation decreases quickly and then stabilizes, which indicates that the GreedyOrientation forward-selection criterion is well defined: the best candidate orientations for approximating the ground truth are among the first selected. 
Moreover, we can infer the smallest good choice of k, since a larger value would not significantly affect the final connectome structure. Although the best choice was k = 10, we set k = 5 in our experiments, which means that we used a coarser approximation than the best choice.\n\nWe additionally demonstrate the effect of each screening method on the final reconstruction error after optimization. Figure 4a shows the distribution of reconstruction error over voxels. Starting the\n\nFigure 5: Solutions learned after the group sparse optimization for both screening strategies, compared to ground truth: (a) ground truth, (b) OMP after optimization, (c) Greedy after optimization.\n\noptimization with GreedyOrientation leads to much lower bias in the final optimization result than OMP, as demonstrated by the shift of these distributions away from the Ground Truth distribution. In Figure 4b, we show the reconstruction error at each step of optimization. The reconstruction error when initialized with orientations generated by OMP decreases at a rate several orders of magnitude slower than with GreedyOrientation.\n\n5.2 Group Sparse Optimization\n\nAfter Φ has been initialized with one of the locally greedy screening algorithms, we learn the appropriate weighting of Φ by optimizing the global objective. We applied batch gradient descent with 15 iterations and a dynamic step size, which started from 1e-5 and was decreased each time the algorithm could not improve the total objective error. The ℓ1 and group regularizer coefficients were chosen to be 10 for most of the experiments; we tested the following values of the regularization coefficient [10⁻³, 10⁻², . . . 
, 102, 103] and found that results were negligibly affected. For (cid:96)1 regularizer,\nwe applied a proximal operator to truncate weights less than the threshold of 0.001. The derivation of\nthe gradient and optimization procedure can be found in Appendices C.2 and E.3, respectively. The\nvisualization algorithm, for a given \u03a6, is given in Appendix E.4.\nFigure 5 visualizes the results of \u03a6 after optimization with both OMP and GreedyOrientation\ninitialization strategies. Comparing the GreedyOrientation predicted \u03a6 with expert \u03a6 shows that the\ngroup regularizer performed well in regenerating macrostructure of the Arcuate. Figure 5b shows\nthat the OMP initialization strategy for \u03a6 is not appropriate for this setting, and prevents the global\noptimization procedure from generating the desired macrostructure.\nTo get a better sense for the generated fascicles, we illustrate the best and the worst fascicles for\n\u03a6 initialized with GreedyOrientation and OMP in Figure 6. GreedyOrientation produces plausible\nfascicles in terms of orientation, in some cases seemingly even more so than the ground truth which\nwas obtained with a tractography algorithm. In the best case, in Figure 6a the reconstruction is\nhighly accurate. In the worst case, in Figure 6b, GreedyOrientation produces fascicles with sharply\nchanging direction. Looking closer, the worst reconstructed fascicles tend to be long winding fascicles\nwith abrupt direction changes. Because the objective attempts to minimize these features during\noptimization, these tracts are very dif\ufb01cult to reconstruct. Fascicles such as these are unlikely to occur\nin the brain, and are likely a result of imperfect tractography methods that were used for creating the\nground truth data for this experiment. Solutions with OMP are generally poor.\n\n6 Conclusion and Discussion\n\nIn this work, we considered the problem of learning macroscopic brain connectomes from dMRI\ndata. 
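The optimization loop of Section 5.2 — batch gradient descent with the ℓ1 truncation applied as a proximal step, and a step size that shrinks when the objective fails to improve — can be sketched as follows. This is a simplified sketch under stated assumptions: the halving schedule, the function names, and the toy quadratic usage are ours, and the paper's actual gradient is derived in Appendix C.2.

```python
import numpy as np

def l1_truncate(w, threshold=0.001):
    # Proximal step used here: weights with magnitude below the
    # threshold are set to zero (as described in Section 5.2).
    w = w.copy()
    w[np.abs(w) < threshold] = 0.0
    return w

def proximal_gradient_descent(grad, objective, w, step=1e-5, iters=15):
    # Batch gradient descent; halving the step size on a failed
    # improvement stands in for the paper's dynamic step-size rule,
    # whose exact schedule is not specified.
    best = objective(w)
    for _ in range(iters):
        w_new = l1_truncate(w - step * grad(w))
        f_new = objective(w_new)
        if f_new < best:
            w, best = w_new, f_new
        else:
            step *= 0.5
    return w

# Toy usage on the quadratic objective 0.5 * ||w||^2:
w_final = proximal_gradient_descent(
    grad=lambda w: w,
    objective=lambda w: 0.5 * np.sum(w ** 2),
    w=np.array([1.0, -0.8]),
    step=0.5, iters=15)
```

On this toy problem the iterates shrink geometrically until the truncation step zeroes them exactly, illustrating how the proximal operator produces exact sparsity rather than merely small weights.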
This involves inferring the locations and orientations of fascicles given local measurements of the diffusion of water molecules within the white-matter tissue. We proposed a new way to formulate this learning problem, using a tensor encoding. Our proposed group-sparse objective facilitates the use of optimization algorithms to automatically extract brain structure, without relying on expert tractography solutions. We proposed an efficient greedy screening algorithm for this objective, and proved approximation guarantees for the algorithm. We finally demonstrated that our specialized screening algorithm resulted in much better orientations than a generic greedy subselection algorithm, called OMP. The solutions with our group-sparse objective, in conjunction with these selected orientations, resulted in smooth fascicles and low reconstruction error on the diffusion data. We also highlighted some failures of the solution, noting that more needs to be done to obtain fully plausible solutions.

(a) Best 5 GreedyOrientation Fascicles (b) Worst 5 GreedyOrientation Fascicles (c) Best 5 OMP Fascicles (d) Worst 5 OMP Fascicles

Figure 6: Top five best and worst fascicles for OMP and GreedyOrientation after optimization, according to reconstruction error. Solid lines show the predicted Φ and dashed lines the ground truth.

Our tractography learning formulation has the potential to open new avenues for learning-based approaches to obtaining brain connectomes. This preliminary work was necessarily limited, focused on providing a sound formulation and an initial empirical investigation into the efficacy of the approximations. The next step is to demonstrate the real utility of a full tractography solution using this formulation.
This will involve learning solutions across brain datasets; understanding strengths and weaknesses compared to current tractography approaches; potentially incorporating new regularizers and algorithms; and even incorporating different types of data. All of this can build on the central idea introduced in this work: using a factorization encoding to automatically learn brain structure from data.

Acknowledgments

This research was funded by NSERC, Amii and CIFAR. Computing was generously provided by Compute Canada and Cybera.

F.P. was supported by NSF IIS-1636893, NSF BCS-1734853, NSF AOC 1916518, NIH NCATS UL1TR002529, a Microsoft Research Award, Google Cloud Platform, and the Indiana University Areas of Emergent Research initiative “Learning: Brains, Machines, Children.”