{"title": "High resolution neural connectivity from incomplete tracing data using nonnegative spline regression", "book": "Advances in Neural Information Processing Systems", "page_first": 3099, "page_last": 3107, "abstract": "Whole-brain neural connectivity data are now available from viral tracing experiments, which reveal the connections between a source injection site and elsewhere in the brain. These hold the promise of revealing spatial patterns of connectivity throughout the mammalian brain. To achieve this goal, we seek to fit a weighted, nonnegative adjacency matrix among 100 \u03bcm brain \u201cvoxels\u201d using viral tracer data. Despite a multi-year experimental effort, injections provide incomplete coverage, and the number of voxels in our data is orders of magnitude larger than the number of injections, making the problem severely underdetermined. Furthermore, projection data are missing within the injection site because local connections there are not separable from the injection signal. We use a novel machine-learning algorithm to meet these challenges and develop a spatially explicit, voxel-scale connectivity map of the mouse visual system. Our method combines three features: a matrix completion loss for missing data, a smoothing spline penalty to regularize the problem, and (optionally) a low rank factorization. We demonstrate the consistency of our estimator using synthetic data and then apply it to newly available Allen Mouse Brain Connectivity Atlas data for the visual system. Our algorithm is significantly more predictive than current state of the art approaches which assume regions to be homogeneous. We demonstrate the efficacy of a low rank version on visual cortex data and discuss the possibility of extending this to a whole-brain connectivity matrix at the voxel scale.", "full_text": "High resolution neural connectivity from incomplete\n\ntracing data using nonnegative spline regression\n\nKameron Decker Harris\n\nApplied Mathematics, U. 
of Washington\n\nkamdh@uw.edu\n\nStefan Mihalas\n\nAllen Institute for Brain Science\n\nApplied Mathematics, U. of Washington\n\nstefanm@alleninstitute.org\n\nEric Shea-Brown\n\nApplied Mathematics, U. of Washington\n\nAllen Institute for Brain Science\n\netsb@uw.edu\n\nAbstract\n\nWhole-brain neural connectivity data are now available from viral tracing experi-\nments, which reveal the connections between a source injection site and elsewhere\nin the brain. These hold the promise of revealing spatial patterns of connectivity\nthroughout the mammalian brain. To achieve this goal, we seek to \ufb01t a weighted,\nnonnegative adjacency matrix among 100 \u00b5m brain \u201cvoxels\u201d using viral tracer data.\nDespite a multi-year experimental effort, injections provide incomplete coverage,\nand the number of voxels in our data is orders of magnitude larger than the number\nof injections, making the problem severely underdetermined. Furthermore, projec-\ntion data are missing within the injection site because local connections there are\nnot separable from the injection signal.\nWe use a novel machine-learning algorithm to meet these challenges and develop a\nspatially explicit, voxel-scale connectivity map of the mouse visual system. Our\nmethod combines three features: a matrix completion loss for missing data, a\nsmoothing spline penalty to regularize the problem, and (optionally) a low rank\nfactorization. We demonstrate the consistency of our estimator using synthetic data\nand then apply it to newly available Allen Mouse Brain Connectivity Atlas data for\nthe visual system. Our algorithm is signi\ufb01cantly more predictive than current state\nof the art approaches which assume regions to be homogeneous. 
We demonstrate\nthe ef\ufb01cacy of a low rank version on visual cortex data and discuss the possibility\nof extending this to a whole-brain connectivity matrix at the voxel scale.\n\n1\n\nIntroduction\n\nAlthough the study of neural connectivity is over a century old, starting with pioneering neuroscientists\nwho identi\ufb01ed the importance of networks for determining brain function, most knowledge of\nanatomical neural network structure is limited to either detailed description of small subsystems\n[2, 9, 14, 26] or to averaged connectivity between larger regions [7, 21]. We focus our attention\non spatial, structural connectivity at the mesoscale: a coarser scale than that of single neurons or\ncortical columns but \ufb01ner than whole brain regions. Thanks to the development of new tracing\ntechniques, image processing algorithms, and high-throughput methods, data at this resolution are\nnow accessible in animals such as the \ufb02y [12, 19] and mouse [15, 18]. We present a novel regression\ntechnique tailored to the challenges of learning spatially re\ufb01ned mesoscale connectivity from neural\ntracing experiments. We have designed this technique with neural data in mind and will use this\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fFigure 1: A, We seek to \ufb01t a matrix W which reproduces neural tracing experiments. Each column\nof W represents the expected signal in target voxels given an injection of one unit into a single source\nvoxel. B, In the work of Oh et al. [18], a regionally homogeneous connectivity matrix was \ufb01t using\na prede\ufb01ned regional parcellation to constrain the problem. We propose that smoothness of W is\na better prior. C, The mouse\u2019s visual \ufb01eld can be represented in azimuth/altitude coordinates. This\nrepresentation is maintained in the retinotopy, a smoothly varying map replicated in many visual\nareas (e.g. [8]). 
D, Assuming locations in VISp (the primary visual area) project most strongly to positions which represent the same retinotopic coordinates in a secondary visual area, then we expect the mapping between upstream and downstream visual areas to be smooth.

language to describe our method, but it is a general technique to assimilate spatial network data or infer smooth kernels of integral equations. Obtaining a spatially-resolved mesoscale connectome will reveal detailed features of connectivity, for example unlocking cell-type specific connectivity and microcircuit organization throughout the brain [13].

In mesoscale anterograde tracing experiments, a tracer virus is first injected into the brain. This infects neurons primarily at their cell bodies and dendrites and causes them to express a fluorescent protein in their cytoplasm, including in their axons. Neurons originating in the source injection site are then imaged to reveal their axonal projections throughout the brain. Combining many experiments with different sources then reveals the pathways that connect those sources throughout the brain. This requires combining data across multiple animals, which appears justified at the mesoscale [18].

We assume there exists some underlying nonnegative, weighted adjacency matrix W ⪰ 0 that is common across animals. Each experiment can be thought of as an injection x, and its projections y, so that y ≈ W x as in Fig. 1A. Uncovering the unknown W from multiple experiments (x_i, y_i) for i = 1, …, n_inj is then a multivariate regression problem: each x_i is an image of the brain which represents the strength of the signal within the injection site. Likewise, every y_i is an image of the strength of signal elsewhere, which arises due to the axonal projections of neurons with cell bodies in the injection site.
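The regression setup above can be sketched numerically. This is an illustrative toy with made-up sizes and random binary injections, not the Allen data; `W_true`, `X`, and `Y` are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(0)

n_vox = 50   # toy voxel count; the real problem has 10^4 to 10^5 voxels
n_inj = 5    # far fewer experiments than voxels

# Hypothetical nonnegative "ground truth" connectivity matrix.
W_true = rng.random((n_vox, n_vox))

# Each experiment: a binary injection image x_i and its projection y_i = W x_i.
X = (rng.random((n_vox, n_inj)) < 0.1).astype(float)
Y = W_true @ X  # noiseless forward model, for illustration only

# Recovering W from (X, Y) is multivariate regression; with n_inj << n_vox
# the problem is severely underdetermined.
```

Each column of `Y` is one projection image, and the estimation task is to invert this forward map under the constraints the paper develops next.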
The unknown matrix W is a linear operator which takes images of the brain (injections) and returns images of the brain (projections).

In a previous paper, Oh et al. [18] were able to obtain a 213 × 213 regional weight matrix using 469 experiments with mice (Fig. 1B). They used nonnegative least squares to find the unknown regional weights in an overdetermined regression problem. Our aim is to obtain a much higher-resolution connectivity map on the scale of voxels, and this introduces many more challenges.

First, the number of voxels in the brain is much larger than the number of injection experiments we can expect to perform; for the mouse with 100 µm voxels this is O(10^5) versus O(10^3) [15, 18]. Also, the injections that are performed will inevitably leave gaps in their coverage of the brain. Thus specifying W is underdetermined. Second, there is no way to separately image the injections and projections. In order to construct them, experimenters image the brain once by serial tomography and fluorescence microscopy. The injection sites can be annotated by finding infected cell bodies, but there is no way to disambiguate fluorescence of the cell bodies and dendrites from that of local projections. Projection strength is thus unknown within the injection sites and the neighborhood occupied by dendrites. Third, fitting full-brain voxel-wise connectivity is challenging since the number of elements in W is the square of the number of voxels in the brain. Thus we need compressed representations of W as well as efficient algorithms to perform inference. The paper proceeds as follows.

In Section 2, we describe our assumption that the mesoscale connectivity W is smoothly varying in space, as could be expected from the presence of topographic maps across much of cortex.
Later,\nwe show that using this assumption as a prior yields connectivity maps with improved cross-validation\nperformance.\nIn Section 3, we present an inference algorithm designed to tackle the dif\ufb01culties of underdetermina-\ntion, missing data, and size of the unknown W . To deal with the gaps and ill-conditioning, we use\nsmoothness as a regularization on W . We take an agnostic approach, similar to matrix completion\n[5], to the missing projection data and use a regression loss function that ignores residuals within the\ninjection site. Finally, we present a low rank version of the estimator that will allow us to scale to\nlarge matrices.\nIn Section 4, we test our method on synthetic data and show that it performs well for sparse data that\nis consistent with the regression priors. This provides evidence that it is a consistent estimator. We\ndemonstrate the necessity of both the matrix completion and smoothing terms for good reconstruction.\nIn Section 5, we then apply the spline-smoothing method to recently available Allen Institute for\nBrain Science (Allen Institute) connectivity data from mouse visual cortex [15, 18]. We \ufb01nd that our\nmethod is able to outperform current spatially uniform regional models, with signi\ufb01cantly reduced\ncross-validation errors. We also \ufb01nd that a low rank version is able to achieve approximately 23\u00d7\ncompression of the original data, with the optimal solution very close to the full rank optimum. 
Our\nmethod is a superior predictor to the existing regional model for visual system data, and the success\nof the low rank version suggests that this approach will be able to reveal whole-brain structural\nconnectivity at unprecedented scale.\nAll of our supplemental material and data processing and optimization code is available for download\nfrom: https://github.com/kharris/high-res-connectivity-nips-2016.\n\n2 Spatial smoothness of mesoscale connectivity\n\nThe visual cortex is a collection of relatively large cortical areas in the posterior part of the mammalian\nbrain. Visual stimuli sensed in the retina are relayed through the thalamus into primary visual cortex\n(VISp), which projects to higher visual areas. We know this partly due to tracing projections between\nthese areas, but also because neurons in the early visual areas respond to visual stimuli in a localized\nregion of the visual \ufb01eld called their receptive \ufb01elds [11].\nAn interesting and important feature of visual cortex is the presence of topographic maps of the visual\n\ufb01eld called the retinotopy [6, 8, 10, 20, 25]. Each eye sees a 2-D image of the world, where two\ncoordinates, such as azimuth and altitude, de\ufb01ne a point in the visual \ufb01eld (Fig. 1C). Retinotopy\nrefers to the fact that cells are organized in cortical space by the position of their receptive \ufb01elds;\nnearby cells have similar receptive \ufb01eld positions. Furthermore, these retinotopic maps reoccur in\nmultiple visual areas, albeit with varying orientation and magni\ufb01cation.\nRetinotopy in other areas downstream from VISp, which do not receive many projections directly\nfrom thalamus, are likely a function of projections from VISp. It is reasonable to assume that areas\nwhich code for similar visual locations are most strongly connected. 
Then, because retinotopy is smoothly varying in cortical space and similar retinotopic coordinates are the most strongly connected between visual areas, the connections between those areas should be smooth in cortical space (Fig. 1C and D).

Retinotopy is a specific example of topography, which extends to other sensory systems such as auditory and somatosensory cortex [22]. For this reason, connectivity may be spatially smooth throughout the brain, at least at the mesoscale. This idea can be evaluated via the methods we introduce below: if a smooth model is more predictive of held-out data than another model, then this supports the assumption.

3 Nonnegative spline regression with incomplete tracing data

We consider the problem of fitting an adjacency operator W : T × S → R_+ to data arising from n_inj injections into a source space S which projects to a target space T. Here S and T are compact subsets of the brain, itself a compact subset of R^3. In this mathematical setting, S and T could be arbitrary sets, but typically S = T for the ipsilateral data we present here.¹ The source S and target T are discretized into n_x and n_y cubic voxels, respectively. The discretization of W is then an adjacency matrix W ∈ R_+^{n_y × n_x}. Mathematically, we define the tracing data as a set of pairs x_i ∈ R_+^{n_x} and y_i ∈ R_+^{n_y}, the source and target tracer signals at each voxel for experiments i = 1, …, n_inj. We would like to fit a linear model, a matrix W such that y_i ≈ W x_i. We assume an observation model

y_i = W x_i + η_i,

with η_i iid ∼ N(0, σ²I) multivariate Gaussian random variables with zero mean and covariance matrix σ²I ∈ R^{n_y × n_y}.
The true data are not entirely linear, due to saturation effects of the fluorescence signal, but the linear model provides a tractable way of "credit assignment" of individual source voxels' contributions to the target signal [18].

Finally, we assume that the target projections are unknown within the injection site. In other words, we only know y_j outside the support of x_j, which we denote supp x_j, and we wish to only evaluate error for the observable voxels. Let Ω ∈ R^{n_y × n_inj}, where the jth column Ω_j = 1 − 1_{supp x_j}, the indicator of the complement of the support. We define the orthogonal projector P_Ω : R^{n_y × n_inj} → R^{n_y × n_inj} as P_Ω(A) = A ∘ Ω, the entrywise product of A and Ω. This operator zeros elements of A which correspond to the voxels within each experiment's injection site. The operator P_Ω is similar to what is used in matrix completion [5], here in the context of regression rather than recovery.

These assumptions lead to a loss function which is the familiar ℓ2-loss applied to the projected residuals:

(1/(σ² n_inj)) ‖P_Ω(W X − Y)‖²_F,   (1)

where Y = [y_1, …, y_{n_inj}] and X = [x_1, …, x_{n_inj}] are data matrices. Here ‖·‖_F is the Frobenius norm, i.e. the ℓ2-norm of the matrix as a vector: ‖A‖_F = ‖vec(A)‖_2, where vec(A) takes a matrix and converts it to a vector by stacking consecutive columns.

We next construct a regularization penalty. The matrix W represents the spatial discretization of a two-point kernel W. An important assumption for W is that it is spatially smooth. Function space norms of the derivatives of W, viewed as a real-valued function on T × S, are a natural way to measure the roughness of this function.
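A minimal numerical sketch of the projected loss in Eqn. (1): the mask Ω is applied entrywise so that residuals inside each injection site do not contribute. Sizes, data, and the helper name `masked_loss` are illustrative, not the authors' code:

```python
import numpy as np

def masked_loss(W, X, Y, Omega):
    """l2 loss on projected residuals: ||P_Omega(W X - Y)||_F^2.

    Omega[v, j] = 0 if voxel v lies in experiment j's injection site
    (projection data unknown there), 1 otherwise.
    """
    R = Omega * (W @ X - Y)   # entrywise product implements P_Omega
    return np.sum(R ** 2)     # squared Frobenius norm

rng = np.random.default_rng(1)
ny, nx, ninj = 30, 30, 4          # S = T, toy ipsilateral setting
W = rng.random((ny, nx))
X = (rng.random((nx, ninj)) < 0.2).astype(float)  # binary injection images
Y = W @ X
Omega = 1.0 - X                   # here supp x_j = nonzero entries of x_j

# With Y generated exactly by W, the masked residual vanishes.
print(masked_loss(W, X, Y, Omega))  # -> 0.0
```

Corrupting `Y` only outside the injection sites makes the loss positive again, which is exactly the behavior the matrix-completion-style mask is meant to have.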
For this study, we chose the squared L2-norm of the Laplacian,

∫_{T×S} |ΔW|² dy dx,

which is called the thin plate spline bending energy [24]. In the discrete setting, this becomes the squared ℓ2-norm of a discrete Laplacian applied to W:

‖L vec(W)‖²_2 = ‖L_y W + W L_x^T‖²_F.   (2)

The operator L : R^{n_y n_x} → R^{n_y n_x} is the discrete Laplacian operator or second finite difference matrix on T × S. The equality in Eqn. (2) results from the fact that the Laplacian on the product space T × S can be decomposed as L = L_x ⊗ I_{n_y} + I_{n_x} ⊗ L_y [17]. Using the well-known Kronecker product identity for linear matrix equations

(B^T ⊗ A) vec(X) = vec(Y) ⟺ A X B = Y   (3)

gives the result in Eqn. (2) [23], which allows us to efficiently evaluate the Laplacian action. As for boundary conditions, we do not want to impose any particular values at the boundary, so we choose the finite difference matrix corresponding to a homogeneous Neumann (zero derivative) boundary condition.²

¹Ipsilateral refers to connections within the same cerebral hemisphere. For contralateral (opposite hemisphere) connectivity, S and T are disjoint subsets of the brain corresponding to the two hemispheres.

²It is straightforward to avoid smoothing across region boundaries by imposing Neumann boundary conditions at the boundaries; this is an option in our code available online.

Combining the loss and penalty terms, Eqn.
(1) and (2), gives a convex optimization problem for inferring the connectivity:

W* = arg min_{W ⪰ 0} ‖P_Ω(W X − Y)‖²_F + λ (n_inj/n_x) ‖L_y W + W L_x^T‖²_F.   (P1)

In the final form, we absorb the noise variance σ² into the regularization hyperparameter λ and rescale the penalty so that it has the same dependence on the problem size n_x, n_y, and n_inj as the loss. We solve the optimization (P1) using the L-BFGS-B projected quasi-Newton method, implemented in C++ [3, 4]. The gradient is efficiently computed using matrix algebra.

Note that (P1) is a type of nonnegative least squares problem, since we can use Eqn. (3) to convert it into

w* = arg min_{w ⪰ 0} ‖A w − y‖²_2 + λ (n_inj/n_x) ‖L w‖²_2,

where A = diag(vec(Ω)) (X^T ⊗ I_{n_y}), y = diag(vec(Ω)) vec(Y), and w = vec(W). Furthermore, without the nonnegativity constraint the estimator is linear and has an explicit solution. However, the design matrix A will have dimension (n_y n_inj) × (n_y n_x), with O(n_y³ n_inj) entries if n_x = O(n_y). The dimensionality of the problem prevents us from working directly in the tensor product space. And since the model is a structured matrix regression problem [1], the usual representer theorems [24], which reduce the dimensionality of the estimator to effectively the number of data points, do not immediately apply. However, we hope to elucidate the connection to reproducing kernel Hilbert spaces in future work.

3.1 Low rank version

The largest object in our problem is the unknown connectivity W, since in the underconstrained setting n_inj ≪ n_x, n_y.
In order to improve the scaling of our problem with the number of voxels, we reformulate it with a compressed version of W:

(U*, V*) = arg min_{U, V ⪰ 0} ‖P_Ω(U V^T X − Y)‖²_F + λ (n_inj/n_x) ‖L_y U V^T + U V^T L_x^T‖²_F.   (P2)

Here, U ∈ R_+^{n_y × r} and V ∈ R_+^{n_x × r} for some fixed rank r, so that the optimal connectivity W* = U* V*^T is given in low rank, factored form. Note that we use nonnegative factors rather than constrain U V^T ⪰ 0, since this is a nonlinear constraint.

This has the advantage of automatically computing a nonnegative matrix factorization (NMF) of W. The NMF is of separate scientific interest, to be pursued in future work, since it decomposes the connectivity into a relatively small number of projection patterns, which has an interpretation as a clustering of the connectivity itself.

In going from the full rank problem (P1) to the low rank version (P2), we lose convexity. So the usual optimization methods are not guaranteed to find a global optimum, and the clustering just mentioned is not unique. However, we have also reduced the size of the unknowns to the potentially much smaller matrices U and V, if r ≪ n_y, n_x. If n_x = O(n_y), we have only O(n_y r) unknowns instead of O(n_y²). Evaluating the penalty term still requires computation of n_y n_x terms, but this can be performed without storing them in memory.

We use a simple projected gradient method with Nesterov acceleration in Matlab to find a local optimum for (P2) [3], and will present and compare these results to the solution of (P1) below. As before, computing the gradients is efficient using matrix algebra.
This method has been used before for NMF [16].

4 Test problem

We next apply our algorithms to a test problem consisting of a one-dimensional "brain," where the source and target space S = T = [0, 1]. The true connectivity kernel corresponds to a Gaussian profile about the diagonal plus a bump:

W_true(x, y) = exp{−((x − y)/0.4)²} + 0.9 exp{−((x − 0.8)² + (y − 0.1)²)/(0.2)²}.

Figure 2: Comparison of the true (far left) and inferred connectivity from 5 injections. Unless noted, λ = 100. Second from left, we show what happens when we solve (P1) without the matrix completion term P_Ω. The holes in the projection data cause patchy and incorrect output. Note the colorbar range is 6× that in the other cases. Second from right is the result with P_Ω but without regularization, solving (P1) for λ = 0. There, the solution does not interpolate between injections. Far right is a rank r = 20 result using (P2), which captures the diagonal band and off-diagonal bump that make up W_true. In this case, the low rank result has less relative error (9.6%) than the full rank result (11.1%, not shown).

See the left panel of Fig. 2. The input and output spaces were discretized using n_x = n_y = 200 points. Injections are delivered at random locations within S, with a width of 0.12 + 0.1ε where ε ∼ Uniform(0, 1). The values of x are set to 1 within the injection region and 0 elsewhere, y is set to 0 within the injection region, and we take noise level σ = 0.1. The matrices L_x = L_y are the 5-point finite difference Laplacians for the rectangular lattice.

Example output of (P1) and (P2) is given for 5 injections in Fig. 2.
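The synthetic setup above can be condensed into a self-contained sketch. A plain projected-gradient solver stands in for the paper's L-BFGS-B/C++ and Matlab implementations; the grid size, λ, and step size here are illustrative choices, and the block also verifies the Kronecker-product identity behind Eqn. (2) on a small grid:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # coarser than the paper's n_x = n_y = 200, for speed

# True kernel on [0,1]^2: Gaussian profile about the diagonal plus a bump.
s = np.linspace(0.0, 1.0, n)
yy, xx = np.meshgrid(s, s, indexing="ij")
W_true = (np.exp(-((xx - yy) / 0.4) ** 2)
          + 0.9 * np.exp(-((xx - 0.8) ** 2 + (yy - 0.1) ** 2) / 0.2 ** 2))

# Random injections: x = 1 inside each site; y is zeroed there (unobserved).
n_inj = 5
X = np.zeros((n, n_inj))
for j in range(n_inj):
    c, w = rng.uniform(0.0, 1.0), 0.12 + 0.1 * rng.uniform()
    X[(s > c - w / 2) & (s < c + w / 2), j] = 1.0
Omega = 1.0 - X
Y = Omega * (W_true @ X + 0.1 * rng.standard_normal((n, n_inj)))

# 1-D second-difference Laplacian with homogeneous Neumann boundaries.
L1 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1))
L1[0, 0] = L1[-1, -1] = -1.0

# Check the identity behind Eqn. (2) on a small grid: with column-major vec,
# (L (x) I + I (x) L) vec(W) = vec(L W + W L^T).
m = 6
Lm = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
      + np.diag(np.ones(m - 1), -1))
Lm[0, 0] = Lm[-1, -1] = -1.0
Wm = rng.random((m, m))
big = np.kron(Lm, np.eye(m)) + np.kron(np.eye(m), Lm)
assert np.allclose(big @ Wm.reshape(-1, order="F"),
                   (Lm @ Wm + Wm @ Lm.T).reshape(-1, order="F"))

def objective(W, lam):
    R = Omega * (W @ X - Y)
    A = L1 @ W + W @ L1.T
    return np.sum(R ** 2) + lam * np.sum(A ** 2)

def gradient(W, lam):
    R = Omega * (W @ X - Y)
    A = L1 @ W + W @ L1.T
    return 2.0 * R @ X.T + 2.0 * lam * (L1.T @ A + A @ L1)

# Plain projected gradient descent with a conservative step size.
lam = 0.01
step = 0.4 / np.linalg.norm(X @ X.T, 2)
W = np.zeros((n, n))
f0 = objective(W, lam)
for _ in range(300):
    W = np.maximum(W - step * gradient(W, lam), 0.0)  # project onto W >= 0
```

The loop decreases the objective while keeping the iterate nonnegative; with so few injections the reconstruction quality depends on how well they cover the bump, as Fig. 2 illustrates.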
Unless stated otherwise, λ = 100. The injections, depicted as black bars in the bottom of each sub-figure, do not cover the whole space S but do provide good coverage of the bump; otherwise there is no information about that feature. We depict the result of the full rank algorithm (P1) without the matrix completion term P_Ω, the result including P_Ω but without smoothing (λ = 0), and the result of (P2) with rank r = 20. The full rank solution is not shown, but is similar to the low rank one.

Figure 2 shows the necessity of each term within the algorithm. Leaving out the matrix completion P_Ω leads to dramatically biased output since the algorithm uses incorrect values y_{supp(x)} = 0. If we include P_Ω but neglect the smoothing term by setting λ = 0, we also get incorrect output: without smoothing, the algorithm cannot fill in the injection site holes nor can it interpolate between injections. However, the low rank result accurately approximates the true connectivity W_true, including the diagonal profile and bump, achieving 9.6% relative error measured as ‖W* − W_true‖_F/‖W_true‖_F. The full rank version is similar, but in fact has slightly higher 11.1% relative error.

5 Finding a voxel-scale connectivity map for mouse cortex

We next apply our method to the latest data from the Allen Institute Mouse Brain Connectivity Atlas, obtained with the API at http://connectivity.brain-map.org. Briefly, in each experiment mice were injected with adeno-associated virus expressing a fluorescent protein. The virus infects neurons in the injection site, causing them to produce the protein, which is transported throughout the axonal and dendritic processes. The mouse brains for each experiment were then sliced, imaged, and aligned onto the common coordinates in the Allen Reference Atlas version 3 [15, 18].
These coordinates divide the brain volume into 100 µm × 100 µm × 100 µm voxels, with approximately 5 × 10^5 voxels in the whole brain. The fluorescent pixels in each aligned image were segmented from the background, and we use the fraction of segmented versus total pixels in a voxel to build the vectors x and y. Since cortical dendrites project locally, the signal outside the injection site is mostly axonal, and so the method reveals anterograde axonal projections from the injection site.

From this dataset, we selected 28 experiments which have 95% of their injection volumes contained within the visual cortex (atlas regions VISal, VISam, VISl, VISp, VISpl, VISpm, VISli, VISpor, VISrl, and VISa) and injection volume less than 0.7 mm³. For this study, we present only the results for ipsilateral connectivity, where S = T and n_x = n_y = 7497. To compute the smoothing penalty, we used the 7-point finite-difference Laplacian on the cubic voxel lattice.

Model    | Voxel MSE_rel | Regional MSE_rel
Regional | 107% (70%)    | 48% (6.8%)
Voxel    | 33% (10%)     | 16% (2.3%)

Table 1: Model performance on Allen Institute Mouse Brain Connectivity Atlas data. Cross-validation errors of the voxel model (P1) and regionally homogeneous models are shown, with training errors in parentheses. The errors are computed in both voxel space and regional space, using the relative mean squared error MSE_rel, Eqn. (4).
In either space, the voxel model shows reduced training and cross-validation errors relative to the regional model.

In order to evaluate the performance of the estimator, we employ nested cross-validation with 5 inner and outer folds. The full rank estimator (P1) was fit for λ = 10^3, 10^4, …, 10^12 on the training data. Using the validation data, we then selected the λ_opt that minimized the mean square error relative to the average squared norm of the prediction W X and truth Y, evaluating errors outside the injection sites:

MSE_rel = 2 ‖P_Ω(W X − Y)‖²_F / (‖P_Ω(W X)‖²_F + ‖P_Ω(Y)‖²_F).   (4)

This choice of normalization prevents experiments with small ‖Y‖ from dominating the error. This error metric as well as the ℓ2-loss adopted in Eqn. (P1) both more heavily weight the experiments with larger signal. After selection of λ_opt, the model was refit to the combined training and validation data. In our dataset, λ_opt = 10^5 was selected for all outer folds. The final errors were computed with the test datasets in each outer fold. For comparison, we also fit a regional model within the cross-validation framework, using nonnegative least squares. To do this, similar to the study by Oh et al. [18], we constrained the connectivity W_kl = W_{R_i R_j} to be constant for all voxels k in region R_i and l in region R_j.

The results are shown in Table 1. Errors were computed according to both voxels and regions. For the latter, we integrated the residual over voxels within the regions before computing the error. The voxel model is more predictive of held-out data than the regional model, reducing the voxel and regional MSE_rel by 69% and 67%, respectively. The regional model is designed for inter-region connectivity. To allow an easier comparison with the voxel model, we here include within region connections.
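The error metric in Eqn. (4) is simple to implement; a sketch with toy data (the helper name `mse_rel` is ours, not from the paper's code):

```python
import numpy as np

def mse_rel(W, X, Y, Omega):
    """Relative MSE of Eqn. (4), evaluated outside the injection sites."""
    pred = Omega * (W @ X)
    truth = Omega * Y
    num = 2.0 * np.sum((pred - truth) ** 2)
    den = np.sum(pred ** 2) + np.sum(truth ** 2)
    return num / den

# A perfect prediction gives 0; predicting zero where the truth is
# nonzero gives the maximal value 2.
Y = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.eye(2)
Omega = np.ones_like(Y)
print(mse_rel(Y, X, Y, Omega))                 # -> 0.0
print(mse_rel(np.zeros((2, 2)), X, Y, Omega))  # -> 2.0
```

Because both prediction and truth appear in the denominator, experiments with small ‖Y‖ cannot dominate, which is the normalization rationale stated above.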
We\n\ufb01nd that the regional model is a poor predictor of voxel scale projections, with over 100% relative\nvoxel error, but it performs okay at the regional scale. The training errors, which re\ufb02ect goodness of\n\ufb01t, were also reduced signi\ufb01cantly with the voxel model. We conclude that the more \ufb02exible voxel\nmodel is a better estimator for these Allen Institute data, since it improves both the \ufb01ts to training\ndata as well as cross-validation skill.\nThe inferred visual connectivity also exhibits a number of features that we expect. There are strong\nlocal projections (similar to the diagonal in the test problem, Fig. 2) along with spatially organized\nprojections to higher visual areas. See Fig. 3, which shows example projections from source voxels\nwithin VISp. These are just two of 7497 voxels in the full matrix, and we depict only a 2-D\nprojection of 3-D images. The connectivity exhibits strong local projections, which must be \ufb01lled in\nby the smoothing since within the injection sites the projection data are unknown; it is surprising\nhow well the algorithm does at capturing short-range connectivity that is translationally invariant.\nThere are also long-range bumps in the higher visual areas, medial and lateral, which move with\nthe source voxel. This is a result of retinotopic maps between VISp and downstream areas. The\nsupplementary material presents a view of this high-dimensional matrix in movie form, allowing\none to see the varying projections as the seed voxel moves. We encourage the reader to view the\nsupplemental movies, where movement of bumps in downstream regions hints at the underlying\nretinotopy: https://github.com/kharris/high-res-connectivity-nips-2016.\n\n5.1 Low rank inference successfully approximates full rank solution for visual system\n\nWe next use these visual system data, for which the full rank solution was computed, to test whether\nthe low rank approximation can be applied. 
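One standard way to probe whether a low rank approximation is viable is to look at singular-value energy and the storage ratio of a rank-r factorization. A sketch using the paper's reported sizes; the helper names `rank_for_energy` and `compression` are ours:

```python
import numpy as np

def rank_for_energy(W, frac):
    """Smallest k whose top-k singular values capture `frac` of the
    energy, where energy = sum of squared singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, frac) + 1)

def compression(n, r):
    """Storage ratio: dense n x n matrix vs. rank-r factors U V^T."""
    return n * n / (2.0 * n * r)

# The paper's visual-system numbers: rank 160 on a 7497 x 7497 matrix
# gives roughly 23x compression.
print(round(compression(7497, 160), 1))  # -> 23.4

# Sanity check on a matrix with known singular values 3 and 2.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((40, 2)))
V, _ = np.linalg.qr(rng.standard_normal((40, 2)))
W = U @ np.diag([3.0, 2.0]) @ V.T
print(rank_for_energy(W, 0.95))  # -> 2; one component holds only 9/13
```

A skewed spectrum (most energy in few components) is necessary but, as noted below, not sufficient for a good nonnegative factorization.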
This is an important stepping stone to an eventual inference of spatial connectivity for the full brain.

First, we note that the singular value spectrum of the fitted W*_full (now using all 28 injections and λ = 10^5) is heavily skewed: 95% of the energy can be captured with 21 of 7497 components, and 99% with 67 components. However, this does not directly imply that a nonnegative factorization will perform as well.

Figure 3: Inferred connectivity using all 28 selected injections from visual system data. Left, Projections from a source voxel (blue) located in VISp to all other voxels in the visual areas. The view is integrated over the superior-inferior axis. The connectivity shows strong local connections and weaker connections to higher areas, in particular VISam, VISal, and VISl. Movies of the inferred connectivity (full, low rank, and the low rank residual) for varying source voxel are available in the supplementary material. Center, For a source voxel 800 µm away, the pattern of anterograde projections is similar, but the distal projection centers are shifted, as expected from retinotopy. Right, The residuals between the full rank and rank 160 result from solving (P2), for the same source voxel as in the center. The residuals are an order of magnitude less than typical features of the connectivity.

To test this, we fit a low rank decomposition directly to all 28 visual injection data using (P2) with rank r = 160 and λ = 10^5. The output of the optimization procedure yields U* and V*, and we find that the low rank output is very similar to the full result W*_full fit to the same data (see also Fig.
3, which visualizes the residuals):\n\n(cid:107)U\u2217V \u2217T \u2212 W \u2217\nfull(cid:107)F\n\n(cid:107)W \u2217\n\nfull(cid:107)F\n\n= 13%.\n\nThis close approximation is despite the fact that the low rank solution achieves a roughly 23\u00d7\ncompression of the 7497 \u00d7 7497 matrix.\nAssuming similar compressibility for the whole brain, where the number of voxels is 5 \u00d7 105, would\nmean a rank of approximately 104. This is still a problem in O(109) unknowns, but these bring the\nmemory requirements of storing one matrix iterate in double precision from approximately 1.9 TB to\n75 GB, which is within reach of commonly available large memory machines.\n\n6 Conclusions\n\nWe have developed and implemented a new inference algorithm that uses modern machine learning\nideas\u2014matrix completion loss, a smoothing penalty, and low rank factorization\u2014to assimilate sparse\nconnectivity data into complete, spatially explicit connectivity maps. We have shown that this\nmethod can be applied to the latest Allen Institute data from multiple visual cortical areas, and that it\nsigni\ufb01cantly improves cross-validated predictions over the current state of the art and unveils spatial\npatterning of connectivity. Finally, we show that a low rank version of the algorithm produces very\nsimilar results on these data while compressing the connectivity map, potentially opening the door to\nthe inference of whole brain connectivity from viral tracer data at the voxel scale.\n\nAcknowledgements\n\nWe acknowledge the support of the UW NIH Training Grant in Big Data for Neuroscience and Genetics (KDH),\nBoeing Scholarship (KDH), NSF Grant DMS-1122106 and 1514743 (ESB & KDH), and a Simons Fellowship\nin Mathematics (ESB). We thank Liam Paninski for helpful insights at the outset of this project. We wish to\nthank the Allen Institute founders, Paul G. 
Allen and Jody Allen, for their vision, encouragement, and support. This work was facilitated through the use of advanced computational, storage, and networking infrastructure provided by the Hyak supercomputer system at the University of Washington.

References

[1] Argyriou, A., Micchelli, C. A., and Pontil, M. (2009). When Is There a Representer Theorem? Vector Versus Matrix Regularizers. J. Mach. Learn. Res., 10:2507–2529.

[2] Bock, D. D., Lee, W.-C. A., Kerlin, A. M., Andermann, M. L., Hood, G., Wetzel, A. W., Yurgenson, S., Soucy, E. R., Kim, H. S., and Reid, R. C. (2011). Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337):177–182.

[3] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press, New York, NY, USA.

[4] Byrd, R., Lu, P., Nocedal, J., and Zhu, C. (1995). A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208.

[5] Candes, E. and Tao, T. (2010). The Power of Convex Relaxation: Near-Optimal Matrix Completion. IEEE Transactions on Information Theory, 56(5):2053–2080.

[6] Chaplin, T. A., Yu, H.-H., and Rosa, M. G. (2013). Representation of the visual field in the primary visual area of the marmoset monkey: Magnification factors, point-image size, and proportionality to retinal ganglion cell density. Journal of Comparative Neurology, 521(5):1001–1019.

[7] Felleman, D. J. and Van Essen, D. C. (1991). Distributed Hierarchical Processing in the Primate Cerebral Cortex. Cerebral Cortex, 1(1):1–47.

[8] Garrett, M. E., Nauhaus, I., Marshel, J. H., and Callaway, E. M. (2014). Topography and Areal Organization of Mouse Visual Cortex. Journal of Neuroscience, 34(37):12587–12600.

[9] Glickfeld, L. L., Andermann, M. L., Bonin, V., and Reid, R. C. (2013). Cortico-cortical projections in mouse visual cortex are functionally target specific. Nature Neuroscience, 16(2).

[10] Goodman, C. S. and Shatz, C. J. (1993). Developmental mechanisms that generate precise patterns of neuronal connectivity. Cell, 72, Supplement:77–98.

[11] Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1):106–154.

[12] Jenett, A., Rubin, G. M., Ngo, T.-T. B., Shepherd, D., Murphy, C., Dionne, H., Pfeiffer, B. D., Cavallaro, A., Hall, D., Jeter, J., Iyer, N., Fetter, D., Hausenfluck, J. H., Peng, H., Trautman, E. T., Svirskas, R. R., Myers, E. W., Iwinski, Z. R., Aso, Y., DePasquale, G. M., Enos, A., Hulamm, P., Lam, S. C. B., Li, H.-H., Laverty, T. R., Long, F., Qu, L., Murphy, S. D., Rokicki, K., Safford, T., Shaw, K., Simpson, J. H., Sowell, A., Tae, S., Yu, Y., and Zugates, C. T. (2012). A GAL4-Driver Line Resource for Drosophila Neurobiology. Cell Reports, 2(4):991–1001.

[13] Jonas, E. and Kording, K. (2015). Automatic discovery of cell types and microcircuitry from neural connectomics. eLife, 4:e04250.

[14] Kleinfeld, D., Bharioke, A., Blinder, P., Bock, D. D., Briggman, K. L., Chklovskii, D. B., Denk, W., Helmstaedter, M., Kaufhold, J. P., Lee, W.-C. A., Meyer, H. S., Micheva, K. D., Oberlaender, M., Prohaska, S., Reid, R. C., Smith, S. J., Takemura, S., Tsai, P. S., and Sakmann, B. (2011). Large-Scale Automated Histology in the Pursuit of Connectomes. The Journal of Neuroscience, 31(45):16125–16138.

[15] Kuan, L., Li, Y., Lau, C., Feng, D., Bernard, A., Sunkin, S. M., Zeng, H., Dang, C., Hawrylycz, M., and Ng, L. (2015). Neuroinformatics of the Allen Mouse Brain Connectivity Atlas. Methods, 73:4–17.

[16] Lin, C.-J. (2007). Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 19(10):2756–2779.

[17] Lynch, R. E., Rice, J. R., and Thomas, D. H. (1964). Tensor product analysis of partial difference equations. Bulletin of the American Mathematical Society, 70(3):378–384.

[18] Oh, S. W., Harris, J. A., Ng, L., Winslow, B., Cain, N., Mihalas, S., Wang, Q., Lau, C., Kuan, L., Henry, A. M., Mortrud, M. T., Ouellette, B., Nguyen, T. N., Sorensen, S. A., Slaughterbeck, C. R., Wakeman, W., Li, Y., Feng, D., Ho, A., Nicholas, E., Hirokawa, K. E., Bohn, P., Joines, K. M., Peng, H., Hawrylycz, M. J., Phillips, J. W., Hohmann, J. G., Wohnoutka, P., Gerfen, C. R., Koch, C., Bernard, A., Dang, C., Jones, A. R., and Zeng, H. (2014). A mesoscale connectome of the mouse brain. Nature, 508(7495):207–214.

[19] Peng, H., Tang, J., Xiao, H., Bria, A., Zhou, J., Butler, V., Zhou, Z., Gonzalez-Bellido, P. T., Oh, S. W., Chen, J., Mitra, A., Tsien, R. W., Zeng, H., Ascoli, G. A., Iannello, G., Hawrylycz, M., Myers, E., and Long, F. (2014). Virtual finger boosts three-dimensional imaging and microsurgery as well as terabyte volume image visualization and analysis. Nature Communications, 5.

[20] Rosa, M. G. and Tweedale, R. (2005). Brain maps, great and small: Lessons from comparative studies of primate visual cortical organization. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456):665–691.

[21] Sporns, O. (2010). Networks of the Brain. The MIT Press, 1st edition.

[22] Udin, S. B. and Fawcett, J. W. (1988). Formation of Topographic Maps. Annual Review of Neuroscience, 11(1):289–327.

[23] Van Loan, C. F. (2000). The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123(1–2):85–100.

[24] Wahba, G. (1990). Spline Models for Observational Data. SIAM.

[25] Wang, Q. and Burkhalter, A. (2007). Area map of mouse visual cortex. The Journal of Comparative Neurology, 502(3):339–357.

[26] White, J. G., Southgate, E., Thomson, J. N., and Brenner, S. (1986). The Structure of the Nervous System of the Nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 314(1165):1–340.