{"title": "Neural Reconstruction with Approximate Message Passing (NeuRAMP)", "book": "Advances in Neural Information Processing Systems", "page_first": 2555, "page_last": 2563, "abstract": "Many functional descriptions of spiking neurons assume a cascade structure where inputs are passed through an initial linear filtering stage that produces a low-dimensional signal that drives subsequent nonlinear stages. This paper presents a novel and systematic parameter estimation procedure for such models and applies the method to two neural estimation problems: (i) compressed-sensing based neural mapping from multi-neuron excitation, and (ii) estimation of neural receptive yields in sensory neurons. The proposed estimation algorithm models the neurons via a graphical model and then estimates the parameters in the model using a recently-developed generalized approximate message passing (GAMP) method. The GAMP method is based on Gaussian approximations of loopy belief propagation. In the neural connectivity problem, the GAMP-based method is shown to be computational efficient, provides a more exact modeling of the sparsity, can incorporate nonlinearities in the output and significantly outperforms previous compressed-sensing methods. For the receptive field estimation, the GAMP method can also exploit inherent structured sparsity in the linear weights. The method is validated on estimation of linear nonlinear Poisson (LNP) cascade models for receptive fields of salamander retinal ganglion cells.", "full_text": "Neural Reconstruction with Approximate\n\nMessage Passing (NeuRAMP)\n\nAlyson K. Fletcher\n\nUniversity of California, Berkeley\nalyson@eecs.berkeley.edu\n\nSundeep Rangan\n\nPolytechnic Institute of New York University\n\nsrangan@poly.edu\n\nLav R. Varshney\n\nIBM Thomas J. 
Watson Research Center\n\nlrvarshn@us.ibm.com\n\nAniruddha Bhargava\n\nUniversity of Wisconsin Madison\n\naniruddha@wisc.edu\n\nAbstract\n\nMany functional descriptions of spiking neurons assume a cascade structure where\ninputs are passed through an initial linear \ufb01ltering stage that produces a low-\ndimensional signal that drives subsequent nonlinear stages. This paper presents a\nnovel and systematic parameter estimation procedure for such models and applies\nthe method to two neural estimation problems: (i) compressed-sensing based neu-\nral mapping from multi-neuron excitation, and (ii) estimation of neural receptive\n\ufb01elds in sensory neurons. The proposed estimation algorithm models the neu-\nrons via a graphical model and then estimates the parameters in the model using\na recently-developed generalized approximate message passing (GAMP) method.\nThe GAMP method is based on Gaussian approximations of loopy belief propa-\ngation. In the neural connectivity problem, the GAMP-based method is shown\nto be computational ef\ufb01cient, provides a more exact modeling of the sparsity,\ncan incorporate nonlinearities in the output and signi\ufb01cantly outperforms previ-\nous compressed-sensing methods. For the receptive \ufb01eld estimation, the GAMP\nmethod can also exploit inherent structured sparsity in the linear weights. The\nmethod is validated on estimation of linear nonlinear Poisson (LNP) cascade mod-\nels for receptive \ufb01elds of salamander retinal ganglion cells.\n\n1\n\nIntroduction\n\nFundamental to describing the behavior of neurons in response to sensory stimuli or to inputs from\nother neurons is the need for succinct models that can be estimated and validated with limited data.\nTowards this end, many functional models assume a cascade structure where an initial linear stage\ncombines inputs to produce a low-dimensional output for subsequent nonlinear stages. 
For example,\nin the widely-used linear nonlinear Poisson (LNP) model for retinal ganglion cells (RGCs) [1,2], the\ntime-varying input stimulus vector is \ufb01rst linearly \ufb01ltered and summed to produce a low (typically\none or two) dimensional output, which is then passed through a memoryless nonlinear function that\noutputs the neuron\u2019s instantaneous Poisson spike rate. An initial linear \ufb01ltering stage also appears\nin the well-known integrate-and-\ufb01re model [3]. The linear \ufb01ltering stage in these models reduces\nthe dimensionality of the parameter estimation problem and provides a simple characterization of a\nneuron\u2019s receptive \ufb01eld or connectivity.\nHowever, even with the dimensionality reduction from assuming such linear stages, parameter esti-\nmation may be dif\ufb01cult when the stimulus is high-dimensional or the \ufb01lter lengths are large. Com-\npressed sensing methods have been recently proposed [4] to reduce the dimensionality further. The\nkey insight is that although most experiments for mapping, say visual receptive \ufb01elds, expose the\n\n1\n\n\fFigure 1: Linear nonlinear Poisson (LNP) model for a neuron with n stimuli.\n\nneural system under investigation to a large number of stimulus components, the overwhelming ma-\njority of the components do not affect the instantaneous spiking rate of any one particular neuron due\nto anatomical sparsity [5, 6]. As a result, the linear weights that model the response to these stimu-\nlus components will be sparse; most of the coef\ufb01cients will be zero. For the retina, the stimulus is\ntypically a large image, whereas the receptive \ufb01eld of any individual neuron is usually only a small\nportion of that image. Similarly, for mapping cortical connectivity to determine the connectome,\neach neuron is typically only connected to a small fraction of the neurons under test [7]. 
Due to the sparsity of the weights, estimation can be performed via sparse reconstruction techniques similar to those used in compressed sensing (CS) [8-10].

This paper presents a CS-based estimation of linear neuronal weights via the recently-developed generalized approximate message passing (GAMP) method from [11] and [12]. GAMP, which builds upon earlier work in [13, 14], is a Gaussian approximation of loopy belief propagation. The benefits of the GAMP method for neural mapping are that it is computationally tractable with large sets of data, can incorporate very general graphical model descriptions of the neuron and provides a method for simultaneously estimating the parameters in the linear and nonlinear stages. In contrast, methods such as the common spike-triggered average (STA) perform separate estimation of the linear and nonlinear components. Following the simulation methodology in [4], we show that the GAMP method offers significantly improved reconstruction of cortical wiring diagrams over other state-of-the-art CS techniques.

We also validate the GAMP-based sparse estimation methodology in the problem of fitting LNP models of salamander RGCs. LNP models have been widely used in systems modeling of the retina, and they have provided insights into how ganglion cells communicate to the lateral geniculate nucleus, and further upstream to the visual cortex [15]. Such understanding has also helped clarify the computational purpose of cell connectivity in the retina. The filter shapes estimated by the GAMP algorithm agree with other findings on RGCs using STA methods, such as [16]. What is important here is that the filter coefficients can be estimated accurately with a much smaller number of measurements. 
This feature suggests that GAMP-based sparse modeling may be useful in the future for other neurons and more complex models.

2 Linear Nonlinear Poisson Model

2.1 Mathematical Model

We consider the following simple LNP model for the spiking output of a single neuron under n stimulus components, shown in Fig. 1, cf. [1, 2]. Inputs and outputs are measured in uniform time intervals t = 0, 1, . . . , T − 1, and we let uj[t] denote the jth stimulus input in the tth time interval, j = 1, . . . , n. For example, if the stimulus is a sequence of images, n would be the number of pixels in each image and uj[t] would be the value of the jth pixel over time. We let y[t] denote the number of spikes in the tth time interval, and the general problem is to find a model that explains the relation between the stimuli uj[t] and spike outputs y[t].

As the name suggests, the LNP model is a cascade of three stages: linear, nonlinear and Poisson. In the first (linear) stage, the input stimulus is passed through a set of n linear filters and then summed to produce the scalar output z[t] given by

z[t] = Σ_{j=1}^{n} (wj ∗ uj)[t] = Σ_{j=1}^{n} Σ_{ℓ=0}^{L−1} wj[ℓ] uj[t − ℓ],    (1)

where wj[·] is the linear filter applied to the jth stimulus component and (wj ∗ uj)[t] is the convolution of the filter with the input. We assume the filters have finite impulse response (FIR) with L taps, wj[ℓ], ℓ = 0, 1, . . . , L − 1. In the second (nonlinear) stage of the LNP model, the scalar linear output z[t] passes through a memoryless nonlinear random function to produce a spike rate λ[t]. 
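As a concrete reading of the linear stage in (1), the following numpy sketch computes z[t] from the stimulus and filter-tap arrays. The function name and array layout are our own choices for illustration, not from the paper:

```python
import numpy as np

def linear_stage(u, w):
    """Linear stage of the LNP model, eq. (1):
    z[t] = sum_j sum_l w_j[l] * u_j[t - l]  (causal FIR filtering, zero initial state).

    u : (n, T) array of stimulus components u_j[t]
    w : (n, L) array of FIR filter taps w_j[l]
    Returns z : (T,) array of scalar filter outputs.
    """
    n, T = u.shape
    z = np.zeros(T)
    for j in range(n):
        # np.convolve computes the full causal convolution; keep the first T samples.
        z += np.convolve(u[j], w[j])[:T]
    return z
```

Each stimulus component is filtered by its own FIR filter and the results are summed into the single scalar signal z[t] that drives the later stages.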
We assume a nonlinear mapping of the form

λ[t] = f(v[t]) = log[1 + exp(φ(v[t]; α))],    (2a)
v[t] = z[t] + d[t],  d[t] ∼ N(0, σd²),    (2b)

where d[t] is Gaussian noise to account for randomness in the spike rate and φ(v; α) is the ν-th order polynomial

φ(v; α) = α0 + α1 v + · · · + αν v^ν.    (3)

The form of the function in (2a) ensures that the spike rate λ[t] is always positive. In the third and final stage of the LNP model, the number of spikes is modeled as a Poisson process with mean λ[t]. That is,

Pr(y[t] = k | λ[t]) = e^{−λ[t]} λ[t]^k / k!,  k = 0, 1, 2, . . .    (4)

This LNP model is sometimes called a one-dimensional model since z[t] is a scalar.

2.2 Conventional Estimation Methods

The parameters in the neural model can be written as the vector θ = (w, α, σd²), where w is the nL-dimensional vector of the filter coefficients, the vector α contains the ν + 1 polynomial coefficients in (3) and σd² is the noise variance. The basic problem is to estimate the parameters θ from the input/output data uj[t] and y[t]. We briefly summarize three conventional methods: spike-triggered average (STA), reverse correlation (RC) and maximum likelihood (ML), all described in several texts including [1].

The STA and RC methods are based on simple linear regression. The vector z of linear filter outputs z[t] in (1) can be written as z = Aw, where A is a known block Toeplitz matrix formed from the input data uj[t]. The STA and RC methods then both attempt to find a w such that the output z has high linear correlation with the measured spikes y. 
The RC method finds this solution with the least squares estimate

ŵRC = (A∗A + σ²I)^{−1} A∗y,    (5)

for some parameter σ², and the STA is an approximation given by

ŵSTA = (1/T) A∗y.    (6)

The statistical properties of the estimates are discussed in [17, 18].

Once the estimate ŵ = ŵSTA or ŵRC has been computed, one can compute an estimate ẑ = Aŵ for the linear output z and then use any scalar estimation method to find a nonlinear mapping from z[t] to λ[t] based on the outputs y[t].

A shortcoming of the STA and RC methods is that the filter coefficients w are selected to maximize the linear correlation and may not work well when there is a strong nonlinearity. A maximum likelihood (ML) estimate may overcome this problem by jointly optimizing over nonlinear and linear parameters. To describe the ML estimate, first fix parameters α and σd² in the nonlinear stage. 
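The estimates in (5) and (6) amount to a few lines of linear algebra; a minimal numpy sketch (function names are our own, and A∗ is taken as the real transpose):

```python
import numpy as np

def rc_estimate(A, y, sigma2):
    """Reverse-correlation estimate, eq. (5):
    w_RC = (A^T A + sigma^2 I)^{-1} A^T y  (ridge-regularized least squares)."""
    nL = A.shape[1]
    return np.linalg.solve(A.T @ A + sigma2 * np.eye(nL), A.T @ y)

def sta_estimate(A, y):
    """Spike-triggered average, eq. (6): w_STA = (1/T) A^T y."""
    T = A.shape[0]
    return A.T @ y / T
```

The STA is the RC estimate with the Gram matrix A∗A replaced by its whitened approximation T·I, which is why the two agree for (near-)white stimuli.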
Then, given the vector output z from the linear stage, the spike count components y[t] are independent:

Pr(y | z, α, σd²) = Π_{t=0}^{T−1} Pr(y[t] | z[t], α, σd²),    (7)

where the component distributions are given by

Pr(y[t] | z[t], α, σd²) = ∫_0^∞ Pr(y[t] | λ[t]) p(λ[t] | z[t], α, σd²) dλ[t],    (8)

where p(λ[t] | z[t], α, σd²) can be computed from the relation (2b) and Pr(y[t] | λ[t]) is the Poisson distribution (4). The ML estimate is then given by the solution to the optimization

θ̂ML := arg max_{(w, α, σd²)} Π_{t=0}^{T−1} Pr(y[t] | z[t], α, σd²),  z = Aw.    (9)

In this way, the ML estimate attempts to maximize the goodness of fit by simultaneously searching over the linear and nonlinear parameters.

3 Estimation via Compressed Sensing

3.1 Bayesian Model with Group Sparsity

A difficulty in the above methods is that the number, Ln, of filter coefficients in w may be large and require an excessive number of measurements to estimate accurately. As discussed above, the key idea in this work is that most stimulus components have little effect on the spiking output. 
Most of the filter coefficients wj[ℓ] will be zero, and exploiting this sparsity may reduce the number of measurements needed while maintaining the same estimation accuracy.

The sparse nature of the filter coefficients can be modeled with the following group sparsity structure: Let ξj be a binary random variable with ξj = 1 when stimulus j is in the receptive field of the neuron and ξj = 0 when it is not. We call the variables ξj the receptive field indicators, and model these indicators as i.i.d. Bernoulli variables with

Pr(ξj = 1) = 1 − Pr(ξj = 0) = ρ,    (10)

where ρ ∈ [0, 1] is the average fraction of stimuli in the receptive field. We then assume that, given the vector ξ of receptive field indicators, the filter weight coefficients are independent with distribution

p(wj[ℓ] | ξ) = p(wj[ℓ] | ξj) = { 0 if ξj = 0;  N(0, σx²) if ξj = 1 }.    (11)

That is, the linear weight coefficients are zero outside the receptive field and Gaussian within the receptive field. Since our algorithms are general, other distributions can also be used—we use the Gaussian for illustration. The distribution on w defined by (10) and (11) is often called a group sparse model, since the components of the vector w are zero in groups.

Estimation with this sparse structure leads naturally to a compressed sensing problem. Specifically, we are estimating a sparse vector w through a noisy version y of a linear transform z = Aw, which is precisely the problem of compressed sensing [8-10]. 
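For intuition, the Bernoulli-Gaussian group-sparse prior of (10)-(11) can be sampled directly; the sketch below is illustrative (the function name is ours):

```python
import numpy as np

def sample_group_sparse_weights(n, L, rho, sigma_x, rng=None):
    """Draw filter weights from the group-sparse prior of eqs. (10)-(11):
    stimulus j is in the receptive field with probability rho (xi_j = 1);
    its L taps are then i.i.d. N(0, sigma_x^2), and exactly zero otherwise."""
    if rng is None:
        rng = np.random.default_rng()
    xi = rng.random(n) < rho                    # receptive-field indicators, eq. (10)
    w = rng.normal(0.0, sigma_x, size=(n, L))   # Gaussian taps everywhere, eq. (11)
    w[~xi] = 0.0                                # zero out entire groups outside the field
    return xi, w
```

Note the zeros come in groups of L taps, one group per excluded stimulus, which is exactly the structure the group-sparse reconstruction exploits.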
With a group structure, one can employ a variety of methods including the group Lasso [19-21] and group orthogonal matching pursuit [22]. However, these methods are designed for either AWGN or logistic outputs. In the neural model, the spike count y[t] is a nonlinear, random function of the linear output z[t] described by the probability distribution in (8).

3.2 GAMP-Based Sparse Estimation

To address the nonlinearities in the outputs, we use the generalized approximate message passing (GAMP) algorithm [11] with extensions in [12]. The GAMP algorithm is a general approximate inference method for graphical models with linear mixing. To place the neural estimation problem in the GAMP framework, first fix the stimulus input vector u and the nonlinear output parameters α and σd². Then, the conditional joint distribution of the outputs y, linear filter weights w and receptive field indicators ξ factors as

p(y, ξ, w | u, α, σd²) = [ Π_{j=1}^{n} Pr(ξj) Π_{ℓ=0}^{L−1} p(wj[ℓ] | ξj) ] Π_{t=0}^{T−1} Pr(y[t] | z[t], α, σd²),  z = Aw.    (12)

Figure 2: The neural estimation problem represented as a graphical model with linear mixing. Solid circles are unknown variables, dashed circles are observed variables (in this case, spike counts) and squares are factors in the probability distribution. The linear mixing component of the graph indicates the constraint that z = Aw.

Similar to standard graphical model estimation [23], GAMP is based on first representing the distribution in (12) via a factor graph as shown in Fig. 2. In the factor graph, the solid circles represent the components of the unknown vectors w, ξ, . . ., and the dashed circles the components of the observed or measured variables y. Each square corresponds to one factor in the distribution (12). What is new for the GAMP methodology is that the factor graph also contains a component to indicate the linear constraints that z = Aw, which would normally be represented by a set of additional factor nodes.

Inference on graphical models is often performed by some variant of loopy belief propagation (BP). Loopy BP attempts to reduce the joint estimation of all the variables to a sequence of lower-dimensional estimation problems associated with each of the factors in the graph. Estimation at the factor nodes is performed iteratively, where after each iteration, "beliefs" of the variables are passed to the factors to improve the estimates in the subsequent iterations. Details can be found in [23].

However, exact implementation of loopy BP is intractable for the neural estimation problem: The linear constraints z = Aw create factor nodes that connect each of the variables z[t] to all the variables wj[ℓ] where uj[t − ℓ] is non-zero. In the RGC experiments below, the pixel values uj[t] are non-zero 50% of the time, so each variable z[t] will be connected to, on average, half of the Ln filter weight coefficients through these factor nodes. Since exact implementation of loopy BP grows exponentially in the degree of the factor nodes, loopy BP would be infeasible for the neural problem, even for moderate values of Ln.

The GAMP method reduces the complexity of loopy BP by exploiting the linear nature of the relations between the variables w and z. Specifically, it is shown that when each term z[t] is a linear combination of a large number of terms wj[ℓ], the belief messages across the factor node for the linear constraints can be approximated as Gaussians and the factor node updates can be computed with a central limit theorem approximation. 
Details are in [11] and [12].\n\n4 Receptive Fields of Salamander Retinal Ganglion Cells\n\nThe sparse LNP model with GAMP-based estimation was evaluated on data from recordings of\nneural spike trains from salamander retinal ganglion cells exposed to random checkerboard images,\nfollowing the basic methods of [24].1 In the experiment, spikes from individual neurons were mea-\nsured over an approximately 1900s period at a sampling interval of 10ms. During the recordings,\nthe salamander was exposed to 80 \u00d7 60 pixel random black-white binary images that changed every\n3 to 4 sampling intervals. The pixels of each image were i.i.d. with a 50-50 black-white probability.\nWe compared three methods for \ufb01tting an L = 30 tap one-dimensional LNP model for the RGC\nneural responses: (i) truncated STA, (ii) approximate ML, and (iii) GAMP estimation with the\nsparse LNP model. Methods (i) and (ii) do not exploit sparsity, while method (iii) does.\nThe truncated STA method was performed by \ufb01rst computing a linear \ufb01lter estimate as in (6) for the\nentire 80 \u00d7 60 image and then setting all coef\ufb01cients outside an 11 \u00d7 11 pixel subarea around the\npixel with the largest estimated response to zero. The 11\u00d7 11 size was chosen since it is suf\ufb01ciently\nlarge to contain these neurons\u2019 entire receptive \ufb01elds. 
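The truncation step described above, which keeps only an 11 × 11 pixel window around the pixel with the largest estimated response, can be sketched as follows. This is our illustrative reading, using total filter energy per pixel to locate the peak:

```python
import numpy as np

def truncate_sta(w, height, width, half=5):
    """Truncated-STA post-processing (sketch): keep only the
    (2*half+1) x (2*half+1) = 11 x 11 pixel window around the pixel with the
    largest filter energy, zeroing all other coefficients.

    w : (height*width, L) array of per-pixel filter estimates (one row per pixel).
    """
    energy = (w ** 2).sum(axis=1).reshape(height, width)
    r, c = np.unravel_index(np.argmax(energy), energy.shape)
    mask = np.zeros((height, width), dtype=bool)
    mask[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = True
    w_trunc = w.copy()
    w_trunc[~mask.ravel()] = 0.0  # coefficients outside the window are set to zero
    return w_trunc
```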
This truncation significantly improves the STA estimate by removing spurious estimates that anatomically cannot have any relation to the neural responses; this provides a better comparison to test other methods. From the estimate ŵSTA of the linear filter coefficients, we compute an estimate ẑ = Aŵ of the linear filter output. The output parameters α and σd² are then fit by numerical maximization of the likelihood P(y | ẑ, α, σd²) in (7). We used a (ν = 1)-order polynomial, since higher orders did not improve the prediction. The fact that only a linear polynomial was needed in the output is likely due to the fact that random checkerboard images rarely align with the neuron's filters and therefore do not excite the neural spiking into a nonlinear regime. An interesting future experiment would be to re-run the estimation with swatches of natural images as in [25]. We believe that under such experimental conditions, the advantages of the GAMP-based nonlinear estimation would be even larger.

¹Data from the Leonardo Laboratory at the Janelia Farm Research Campus.

Figure 3: Estimated filter responses and visual receptive field for salamander RGCs using a non-sparse LNP model with STA estimation and a sparse LNP model with GAMP estimation. (a) Filter responses over time. (b) Spatial receptive field.

The RC estimate (5) was also computed, but showed no appreciable difference from the STA estimate for this matrix A. As a result, we discuss only STA results below.

The GAMP-based sparse estimation used the STA estimate for initialization to select the 11 × 11 pixel subarea and the variance σx² in (11). As in the STA case, we used only a (ν = 1)-order linear polynomial in (3). The linear coefficient α1 was set to 1 since other scalings could be absorbed into the filter weights w. The constant term α0 was incorporated as another linear regression coefficient. For a third algorithm, we approximately computed the ML estimate (9) by running the GAMP algorithm, but with all the factors for the priors on the weights w removed.

To illustrate the qualitative differences between the estimates, Fig. 3 shows the estimated responses for the STA and GAMP-based sparse LNP estimates for one neuron using three different lengths of training data: 400, 600 and 1000 seconds of the total 1900 seconds of training data. For brevity, the approximate ML estimate is omitted, but it is similar to the STA estimate. The estimated responses in Fig. 3(a) are displayed as 11 × 11 = 121 curves, each curve representing the linear filter response with L = 30 taps over the 30 × 10 = 300 ms response. Fig. 3(b) shows the estimated spatial receptive fields plotted as the total magnitude of the 11 × 11 filters. One can immediately see that the GAMP-based sparse estimate is significantly less noisy than the STA estimate, as the smaller, unreliable responses are zeroed out in the GAMP-based sparse LNP estimate.

The improved accuracy of the GAMP estimation with the sparse LNP model was verified in the cross-validation, as shown in Fig. 4. In this plot, the length of the training data was varied from 200 to 1000 seconds, with the remaining portion of the 1900-second data used for cross-validation. At each training length, each of the three methods—STA, GAMP-based sparse LNP and approximate ML—was used to produce an estimate θ̂ = (ŵ, α̂, σ̂d²). Fig.
4 shows, for each of these methods, the cross-validation scores P(y | ẑ, α̂, σ̂d²)^{1/T}, which is the geometric mean of the likelihood in (7).

Figure 4: Prediction accuracy of sparse and non-sparse LNP estimates for data from salamander RGC cells. Based on cross-validation scores, the GAMP-based sparse LNP estimation provides a significantly better estimate for the same amount of training.

Figure 5: Comparison of reconstruction methods on cortical connectome mapping with multi-neuron excitation, based on the simulation model in [4]. In this case, connectivity from n = 500 potential pre-synaptic neurons is estimated from m = 300 measurements with 40 neurons excited in each measurement. In the simulation, only 6% of the n potential neurons are actually connected to the post-synaptic neuron under test.

It can be seen that the GAMP-based sparse LNP estimate significantly outperforms the STA and approximate ML estimates that do not assume any sparse structure. Indeed, by the measure of the cross-validation score, the sparse LNP estimate with GAMP after only 400 seconds of data was as accurate as the STA estimate with 1000 seconds of data. Interestingly, the approximate ML estimate is actually worse than the STA estimate, presumably since it overfits the model.

5 Neural Mapping via Multi-Neuron Excitation

The GAMP methodology was also applied to neural mapping from multi-neuron excitation, originally proposed in [4]. A single post-synaptic neuron has connections to n potential pre-synaptic neurons. 
The standard method to determine which of the n neurons are connected to the post-synaptic neuron is to excite one neuron at a time. This process is wasteful, since only a small fraction of the neurons are typically connected. In the method of [4], multiple neurons are excited in each measurement. Then, exploiting the sparsity in the connectivity, compressed sensing techniques can be used to recover the mapping from m < n measurements. Unfortunately, the output stage of spiking neurons is often nonlinear, and most CS methods cannot directly incorporate such nonlinearities into the estimation. The GAMP methodology thus offers the possibility of improved reconstruction performance.

To validate the methodology, we compared the performance of GAMP to various reconstruction methods following the simulation of mapping of cortical neurons with multi-neuron excitation in [4]. The simulation assumes the LNP model of Section 2.1, where the inputs uj[t] are 1 or 0 depending on whether the jth pre-synaptic input is excited in the tth measurement. The filters have a single tap (i.e., L = 1), modeled with a Bernoulli-Weibull distribution with probability ρ = 0.06 of being on (the neuron is connected) and 1 − ρ of being zero (the neuron is not connected). The output has a strong nonlinearity, including thresholding and saturation, the levels of which must be estimated. Connectivity detection amounts to determining which of the n pre-synaptic neurons have non-zero weights.

Fig. 5 plots the missed detection vs. false alarm rate of the various detectors. 
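To make the simulation setup concrete, the sketch below generates a multi-neuron excitation design and scores candidate connections with a simple correlation detector. The Weibull shape parameter, the softplus-style output nonlinearity, and the detector are our own illustrative stand-ins; the paper's exact simulated nonlinearity and the GAMP detector itself are not reproduced here:

```python
import numpy as np

def simulate_mapping(n=500, m=300, k_excite=40, rho=0.06, rng=None):
    """Multi-neuron excitation setup (sketch): each of m measurements excites
    k_excite of the n candidate pre-synaptic neurons; weights follow a
    Bernoulli-Weibull prior (connected with probability rho)."""
    if rng is None:
        rng = np.random.default_rng(0)
    A = np.zeros((m, n))
    for t in range(m):
        # binary excitation pattern: 1 if neuron j is excited in measurement t
        A[t, rng.choice(n, size=k_excite, replace=False)] = 1.0
    connected = rng.random(n) < rho
    w = np.where(connected, rng.weibull(2.0, size=n), 0.0)  # illustrative shape parameter
    rate = np.log1p(np.exp(A @ w - 4.0))  # illustrative saturating/thresholding nonlinearity
    y = rng.poisson(rate)                 # Poisson spike counts
    return A, y, connected

def detection_rates(scores, connected, thresh):
    """Missed-detection and false-alarm rates of a thresholded per-neuron score."""
    detected = scores > thresh
    p_md = np.mean(~detected[connected]) if connected.any() else 0.0
    p_fa = np.mean(detected[~connected]) if (~connected).any() else 0.0
    return p_md, p_fa
```

Sweeping the threshold over the range of scores traces out a missed-detection versus false-alarm curve of the kind plotted in Fig. 5.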
It can be seen that the GAMP-based connectivity detection significantly outperforms both non-sparse RC reconstruction and a state-of-the-art greedy sparse method, CoSaMP [26, 27].

6 Conclusions and Future Work

A general method for parameter estimation in neural models based on generalized approximate message passing was presented. The GAMP methodology is computationally tractable for large data sets, can exploit sparsity in the linear coefficients and can incorporate a wide range of nonlinear modeling complexities in a systematic manner. Experimental validation of the GAMP-based estimation of a sparse LNP model for salamander RGC cells shows significantly improved prediction in cross-validation over simple non-sparse estimation methods such as STA. Benefits over state-of-the-art sparse reconstruction methods are also apparent in simulated models of cortical mapping with multi-neuron excitation.

Going forward, the generality offered by the GAMP model will enable accurate parameter estimation for other complex neural models. For example, the GAMP model can incorporate other prior information such as a correlation between responses in neighboring pixels. Future work may also include experiments with integrate-and-fire models [3]. An exciting future possibility for cortical mapping is to decode memories, which are thought to be stored as the connectome [7, 28].

Throughout this paper, we have presented GAMP as an experimental data analysis method. One might wonder, however, whether the brain itself might use compressive representations and message-passing algorithms to make sense of the world. There have been several previous suggestions that visual and general cortical regions of the brain may use belief propagation-like algorithms [29, 30]. There have also been recent suggestions that the visual system uses compressive representations [31]. As such, we assert the biological plausibility of the brain itself using the algorithms presented herein for receptive field and memory decoding.

7 Acknowledgements

We thank D. B. Chklovskii and T. Hu for formulative discussions on the problem, A. Leonardo for providing experimental data and further discussions, and B. Olshausen for discussions.

References

[1] Peter Dayan and L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.

[2] Odelia Schwartz, Jonathan W. Pillow, Nicole C. Rust, and Eero P. Simoncelli. Spike-triggered neural characterization. J. Vis., 6(4):13, July 2006.

[3] Liam Paninski, Jonathan W. Pillow, and Eero P. Simoncelli. Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Neural Computation, 16(12):2533-2561, December 2004.

[4] Tao Hu and Dmitri B. Chklovskii. Reconstruction of sparse circuits using multi-neuronal excitation (RESCUME). In Yoshua Bengio, Dale Schuurmans, John Lafferty, Chris Williams, and Aron Culotta, editors, Advances in Neural Information Processing Systems 22, pages 790-798. MIT Press, Cambridge, MA, 2009.

[5] James R. Anderson, Bryan W. Jones, Carl B. Watt, Margaret V. Shaw, Jia-Hui Yang, David DeMill, James S. Lauritzen, Yanhua Lin, Kevin D. Rapp, David Mastronarde, Pavel Koshevoy, Bradley Grimm, Tolga Tasdizen, Ross Whitaker, and Robert E. Marc. Exploring the retinal connectome. Mol. Vis., 17:355-379, February 2011.

[6] Elad Ganmor, Ronen Segev, and Elad Schneidman. The architecture of functional interaction networks in the retina. J. Neurosci., 31(8):3044-3054, February 2011.

[7] Lav R. 
Varshney, Per Jesper Sjöström, and Dmitri B. Chklovskii. Optimal information storage in noisy synapses under resource constraints. Neuron, 52(3):409–423, November 2006.

[8] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, February 2006.

[9] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, April 2006.

[10] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, December 2006.

[11] S. Rangan. Generalized approximate message passing for estimation with random linear mixing. arXiv:1010.5141 [cs.IT], October 2010.

[12] S. Rangan, A. K. Fletcher, V. K. Goyal, and P. Schniter. Hybrid approximate message passing with applications to group sparsity. arXiv preprint, 2011.

[13] D. Guo and C.-C. Wang. Random sparse linear systems observed via arbitrary channels: A decoupling principle. In Proc. IEEE Int. Symp. Inform. Th., pages 946–950, Nice, France, June 2007.

[14] David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. PNAS, 106(45):18914–18919, 2009.

[15] David H. Hubel. Eye, Brain, and Vision. W. H. Freeman, 2nd edition, 1995.

[16] Toshihiko Hosoya, Stephen A. Baccus, and Markus Meister. Dynamic predictive coding by the retina. Nature, 436(7047):71–77, July 2005.

[17] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199–213, 2001.

[18] L. Paninski. Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437–464, 2003.

[19] S. Bakin. Adaptive regression and model selection in data mining problems. 
PhD thesis, Australian National University, Canberra, 1999.

[20] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. J. Royal Statist. Soc., 68:49–67, 2006.

[21] Lukas Meier, Sara van de Geer, and Peter Bühlmann. The group lasso for logistic regression. J. Royal Statist. Soc., 70:53–71, 2008.

[22] Aurélie C. Lozano, Grzegorz Świrszcz, and Naoki Abe. Group orthogonal matching pursuit for variable selection and prediction. In Proc. NIPS, Vancouver, Canada, December 2008.

[23] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York, NY, 2006.

[24] Markus Meister, Jerome Pine, and Denis A. Baylor. Multi-neuronal signals from the retina: acquisition and analysis. J. Neurosci. Methods, 51(1):95–106, January 1994.

[25] Joaquin Rapela, Jerry M. Mendel, and Norberto M. Grzywacz. Estimating nonlinear receptive fields from natural images. J. Vis., 6(4):11, May 2006.

[26] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harm. Anal., 26(3):301–321, May 2009.

[27] W. Dai and O. Milenkovic. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inform. Theory, 55(5):2230–2249, May 2009.

[28] Dmitri B. Chklovskii, Bartlett W. Mel, and Karel Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, October 2004.

[29] Tai Sing Lee and David Mumford. Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A, 20(7):1434–1448, July 2003.

[30] Karl Friston. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci., 11(2):127–138, February 2010.

[31] Guy Isely, Christopher J. Hillar, and Friedrich T. Sommer. 
Deciphering subsampled data: Adaptive compressive sampling as a principle of brain communication. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 910–918. MIT Press, Cambridge, MA, 2010.