{"title": "Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions", "book": "Advances in Neural Information Processing Systems", "page_first": 1250, "page_last": 1258, "abstract": "We propose a compressed sensing (CS) calcium imaging framework for monitoring large neuronal populations, where we image randomized projections of the spatial calcium concentration at each timestep, instead of measuring the concentration at individual locations. We develop scalable nonnegative deconvolution methods for extracting the neuronal spike time series from such observations. We also address the problem of demixing the spatial locations of the neurons using rank-penalized matrix factorization methods. By exploiting the sparsity of neural spiking we demonstrate that the number of measurements needed per timestep is significantly smaller than the total number of neurons, a result that can potentially enable imaging of larger populations at considerably faster rates compared to traditional raster-scanning techniques. Unlike traditional CS setups, our problem involves a block-diagonal sensing matrix and a non-orthogonal sparse basis that spans multiple timesteps. We study the effect of these distinctive features in a noiseless setup using recent results relating conic geometry to CS. We provide tight approximations to the number of measurements needed for perfect deconvolution for certain classes of spiking processes, and show that this number displays a phase transition,\" similar to phenomena observed in more standard CS settings; however, in this case the required measurement rate depends not just on the mean sparsity level but also on other details of the underlying spiking process.\"", "full_text": "Sparse nonnegative deconvolution for compressive\ncalcium imaging: algorithms and phase transitions\n\nEftychios A. 
Pnevmatikakis and Liam Paninski\n\nDepartment of Statistics, Center for Theoretical Neuroscience\n\nGrossman Center for the Statistics of Mind, Columbia University, New York, NY\n\n{eftychios, liam}@stat.columbia.edu\n\nAbstract\n\nWe propose a compressed sensing (CS) calcium imaging framework for monitoring\nlarge neuronal populations, where we image randomized projections of the spatial\ncalcium concentration at each timestep, instead of measuring the concentration at\nindividual locations. We develop scalable nonnegative deconvolution methods for\nextracting the neuronal spike time series from such observations. We also address\nthe problem of demixing the spatial locations of the neurons using rank-penalized\nmatrix factorization methods. By exploiting the sparsity of neural spiking we\ndemonstrate that the number of measurements needed per timestep is signi\ufb01cantly\nsmaller than the total number of neurons, a result that can potentially enable\nimaging of larger populations at considerably faster rates compared to traditional\nraster-scanning techniques. Unlike traditional CS setups, our problem involves a\nblock-diagonal sensing matrix and a non-orthogonal sparse basis that spans multiple\ntimesteps. We provide tight approximations to the number of measurements needed\nfor perfect deconvolution for certain classes of spiking processes, and show that\nthis number undergoes a \u201cphase transition,\u201d which we characterize using modern\ntools relating conic geometry to compressed sensing.\n\n1 Introduction\n\nCalcium imaging methods have revolutionized data acquisition in experimental neuroscience; we\ncan now record from large neural populations to study the structure and function of neural circuits\n(see e.g. Ahrens et al. (2013)), or from multiple locations on a dendritic tree to examine the detailed\ncomputations performed at a subcellular level (see e.g. Branco et al. (2010)). 
Traditional calcium\nimaging techniques involve a raster-scanning protocol where at each cycle/timestep the microscope\nscans the image in a voxel-by-voxel fashion, or some other predetermined pattern, e.g. through\nrandom access multiphoton (RAMP) microscopy (Reddy et al., 2008), and thus the number of\nmeasurements per timestep is equal to the number of voxels of interest. Although this protocol\nproduces \u201ceye-interpretable\u201d measurements, it introduces a tradeoff between the size of the imaged\n\ufb01eld and the imaging frame rate; very large neural populations can be imaged only with a relatively\nlow temporal resolution.\nThis unfavorable situation can potentially be overcome by noticing that many acquired measurements\nare redundant; voxels can be \u201cvoid\u201d in the sense that no neurons are located there, and active voxels\nat nearby locations or timesteps will be highly correlated. Moreover, neural activity is typically\nsparse; most neurons do not spike at every timestep. During recent years, imaging practitioners have\ndeveloped specialized techniques to leverage this redundancy. For example, Nikolenko et al. (2008)\ndescribe a microscope that uses a spatial light modulator and allows for the simultaneous imaging\nof different (prede\ufb01ned) image regions. More broadly, the advent of compressed sensing (CS) has\nfound many applications in imaging such as MRI (Lustig et al., 2007), hyperspectral imaging (Gehm\net al., 2007), sub-diffraction microscopy (Rust et al., 2006) and ghost imaging (Katz et al., 2009),\nwith available hardware implementations (see e.g. Duarte et al. (2008)). Recently, Studer et al.\n(2012) presented a \ufb02uorescence microscope based on the CS framework, where each measurement\nis obtained by projection of the whole image on a random pattern. 
This framework can lead to\nsigni\ufb01cant undersampling ratios for biological \ufb02uorescence imaging.\nIn this paper we propose the application of the imaging framework of Studer et al. (2012) to the case\nof neural population calcium imaging to address the problem of imaging large neural populations with\nhigh temporal resolution. The basic idea is to not measure the calcium at each location individually,\nbut rather to take a smaller number of \u201cmixed\u201d measurements (based on randomized projections of\nthe data). Then we use convex optimization methods that exploit the sparse structure in the data in\norder to simultaneously demix the information from the randomized projection observations and\ndeconvolve the effect of the slow calcium indicator to recover the spikes. Our results indicate that the\nnumber of required randomized measurements scales with the expected number of spikes\nrather than the ambient dimension of the signal (number of voxels/neurons), allowing for the fast\nmonitoring of large neural populations. We also address the problem of estimating the (potentially\noverlapping) spatial locations of the imaged neurons and demixing these locations using methods for\nnuclear norm minimization and nonnegative matrix factorization. Our methods scale linearly with\nthe experiment length and are largely parallelizable, ensuring computational tractability. Our results\nindicate that calcium imaging can potentially be scaled up to considerably larger neuron populations\nand higher imaging rates by moving to compressive signal acquisition.\nIn the traditional static compressive imaging paradigm the sensing matrix is dense; every observation\ncomes from the projection of all the image voxels onto a random vector/matrix. Moreover, the underlying\nimage can be either directly sparse (most of the voxels are zero) or sparse in some orthogonal basis\n(e.g. Fourier, or wavelet). 
In our case the sensing matrix has a block-diagonal form (we can only\nobserve the activity at one speci\ufb01c time in each measurement) and the sparse basis (which corresponds\nto the inverse of the matrix implementing the convolution of the spikes from the calcium indicator) is\nnon-orthogonal and spans multiple timelags. We analyze the effect of these distinctive features in\nSec. 3 in a noiseless setting. We show that as the number of measurements increases, the probability of\nsuccessful recovery undergoes a phase transition, and study the resulting phase transition curve (PTC),\ni.e., the number of measurements per timestep required for accurate deconvolution as a function of\nthe number of spikes. Our analysis uses recent results that connect CS with conic geometry through\nthe \u201cstatistical dimension\u201d (SD) of descent cones (Amelunxen et al., 2013). We demonstrate that in\nmany cases of interest, the SD provides a very good estimate of the PTC.\n\n2 Model description and approximate maximum-a-posteriori inference\n\nSee e.g. Vogelstein et al. (2010) for background on statistical models for calcium imaging data. Here\nwe assume that an image or light \ufb01eld (either two- or three-dimensional) is observed at every timestep,\nfor a duration of T timesteps. Each observed \ufb01eld contains a total of d voxels and can be\nvectorized into a single column vector. Thus all the activity can be described by a d \u00d7 T matrix F . Now\nassume that the \ufb01eld contains a total of N neurons, where N is in general unknown. Each\nspike causes a rapid increase in the calcium concentration which then decays with a time constant\nthat depends on the chemical properties of the calcium indicator. 
For each neuron i we assume that\nthe \u201ccalcium activity\u201d ci can be described as a stable autoregressive AR(1) process1 that\n\ufb01lters the neuron\u2019s spikes si(t) according to the fast-rise slow-decay dynamics described above:\n\nci(t) = \u03b3ci(t \u2212 1) + si(t),\n\n(1)\n\nwhere \u03b3 is the discrete time constant which satis\ufb01es 0 < \u03b3 < 1 and can be approximated as\n\u03b3 = 1 \u2212 exp(\u2212\u2206t/\u03c4 ), where \u2206t is the length of each timestep and \u03c4 is the continuous time constant\nof the calcium indicator. In general we assume that each si(t) is binary due to the small length of\nthe timestep in the proposed compressive imaging setting, and we use an i.i.d. prior for each neuron\np(si(t) = 1) = \u03c0i.2 Moreover, let ai \u2208 Rd+ be the (nonnegative) location vector for neuron i, and\nb \u2208 Rd+ be the (nonnegative) vector of baseline concentration for all the voxels. The spatial calcium\nconcentration pro\ufb01le at time t can be described as\n\nf (t) = \u2211_{i=1}^{N} ai ci(t) + b.\n\n(2)\n\n1Generalization to general AR(p) processes is straightforward, but we keep p = 1 for simplicity.\n2This choice is merely for simplicity; more general prior distributions can be incorporated in our framework.\n\nIn conventional raster-scanning experiments, at each timestep we observe a noisy version of the d-dimensional image f (t). Since d is typically large, the acquisition of this vector can take a signi\ufb01cant\namount of time, leading to a lengthy timestep \u2206t and low temporal resolution. Instead, we propose\nto observe the projections of f (t) onto a random matrix Bt \u2208 Rn\u00d7d (e.g. each entry of Bt could be\nchosen as 0 or 1 with probability 0.5):\n\ny(t) = Btf (t) + \u03b5t, \u03b5t \u223c N (0, \u03a3t),\n\n(3)\n\nwhere \u03b5t denotes measurement noise (Gaussian, with diagonal covariance \u03a3t, for simplicity). 
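The generative model of eqs. (1)-(3) is straightforward to simulate. The sketch below (all sizes, rates, and the noise level are our illustrative choices, not the paper's) builds AR(1) calcium traces, mixes them through nonnegative spatial footprints and a baseline, and takes n < d randomized projections per timestep:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes for illustration.
N, T, d, n = 5, 200, 30, 8
gamma = 0.95                     # discrete decay constant, 0 < gamma < 1

# Eq. (1): c_i(t) = gamma * c_i(t-1) + s_i(t), with binary i.i.d. spikes.
S = (rng.random((N, T)) < 0.02).astype(float)
C = np.zeros((N, T))
for t in range(T):
    C[:, t] = (gamma * C[:, t - 1] if t > 0 else 0) + S[:, t]

# Eq. (2): f(t) = sum_i a_i c_i(t) + b, nonnegative footprints and baseline.
A = rng.random((d, N)) * (rng.random((d, N)) < 0.2)   # sparse footprints
b = 0.1 * rng.random(d)
F = A @ C + b[:, None]

# Eq. (3): y(t) = B_t f(t) + eps_t, with a fresh random 0/1 mask per timestep.
sigma = 0.01
Y = np.zeros((n, T))
for t in range(T):
    Bt = (rng.random((n, d)) < 0.5).astype(float)
    Y[:, t] = Bt @ F[:, t] + sigma * rng.standard_normal(n)
```

Each column of Y holds only n compressed observations of the d-voxel field, which is the undersampling the rest of the section exploits.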
If\nn = dim(y(t)) satis\ufb01es n \u226a d, then y(t) represents a compression of f (t) that can potentially be\nobtained more quickly than the full f (t). Now if we can use statistical methods to recover f (t) (or\nequivalently the location ai and spikes si of each neuron) from the compressed measurements y(t),\nthe total imaging throughput will be increased by a factor proportional to the undersampling ratio\nd/n. Our assumption here is that the random projection matrices Bt can be constructed quickly.\nRecent technological innovations have enabled this fast construction by using digital micromirror\ndevices that enable spatial light modulation and can construct different excitation patterns at a high\nfrequency (order of kHz). The total \ufb02uorescence can then be detected with a single photomultiplier\ntube. For more details we refer to Duarte et al. (2008); Nikolenko et al. (2008); Studer et al. (2012).\nWe discuss the statistical recovery problem next. For future reference, note that eqs. (1)-(3) can be\nwritten in matrix form as (vec(\u00b7) denotes the vectorizing operator)\n\nvec(Y ) = Bvec(F ) + \u03b5, S = CGT , F = AC + b1T^T ,\n\n(4)\n\nwhere G is the T \u00d7 T lower bidiagonal matrix with ones on its diagonal and \u2212\u03b3 on its \ufb01rst subdiagonal, and B = blkdiag{B1, . . . , BT}.\n\n2.1 Approximate MAP inference with an interior point method\n\nFor now we assume that A is known. In general MAP inference of S is dif\ufb01cult due to the discrete\nnature of S. Following Vogelstein et al. (2010) we relax S to take continuous values in the interval\n[0, 1] (remember that we assume binary spikes), and appropriately modify the prior for si(t) to\nlog p(si(t)) \u221d \u2212(\u03bbisi(t))1(0 \u2264 si(t) \u2264 1), where \u03bbi is chosen such that the relaxed prior has the\nsame mean \u03c0i. 
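The banded structure of G from (4), the identity S = CG^T, and the non-orthogonality of the induced sparse basis can all be checked numerically. In this small sketch (toy sizes are our assumption) each column of G^{-1} is the calcium response to a single spike, with entry (i, j) equal to gamma^(i-j) for i >= j, so the basis is lower triangular, spans multiple timelags, and is far from orthogonal:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, gamma = 4, 6, 0.9

# G from eq. (4): ones on the diagonal, -gamma on the first subdiagonal.
G = np.eye(T) - gamma * np.eye(T, k=-1)

# Run the AR(1) recursion of eq. (1) on random binary spikes.
S = (rng.random((N, T)) < 0.3).astype(float)
C = np.zeros((N, T))
for t in range(T):
    C[:, t] = (gamma * C[:, t - 1] if t > 0 else 0) + S[:, t]

# Columns of G^{-1} are single-spike calcium responses: exponentially
# decaying, overlapping in time, and mutually non-orthogonal.
Ginv = np.linalg.inv(G)
```

Differencing the traces with G^T recovers the spikes exactly, which is the identity the MAP formulation below relies on.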
To exploit the banded structure of G we seek the MAP estimate of C (instead of S)\nby solving the following convex quadratic problem (we let \u00afy(t) = y(t) \u2212 Btb):\n\nminimize_C (1/2) \u2211_{t=1}^{T} ( \u00afy(t) \u2212 BtAc(t))T \u03a3t^{\u22121} ( \u00afy(t) \u2212 BtAc(t)) \u2212 log p(C)\nsubject to 0 \u2264 CGT \u2264 1, c(1) \u2265 0.\n\n(P-QP)\n\nUsing the prior on S and the relation S = CGT , the log-prior of C can be written as log p(C) \u221d\n\u2212\u03bbT CGT 1T . We can solve (P-QP) ef\ufb01ciently using an interior point method with a log-barrier\n(Vogelstein et al., 2010). The contribution of the likelihood term to the Hessian is a block-diagonal\nmatrix, whereas the barrier term will contribute a block-tridiagonal matrix where each non-zero\nblock is diagonal. As a result the Newton search direction \u2212H\u22121\u2207 can be computed ef\ufb01ciently in\nO(T N^3) time using a block version of standard forward-backward methods for tridiagonal systems\nof linear equations. We note that if N is large this can be inef\ufb01cient. In this case we can use an\naugmented Lagrangian method (Boyd et al., 2011) to derive a fully parallelizable \ufb01rst order method,\nwith O(T N ) complexity per iteration. We refer to the supplementary material for additional details.\nAs a \ufb01rst example we consider a simple setup where all the parameters are assumed to be known.\nWe consider N = 50 neurons observed over T = 1000 timesteps. We assume that A, b are known,\nwith A = IN (corresponding to non-overlapping point neurons, with one neuron in each voxel) and\nb = 0, respectively. This case of known point neurons can be thought of as the compressive analog\nof RAMP microscopy where the neuron locations are predetermined and then imaged in a serial\nmanner. (We treat the case of unknown and possibly overlapping neuron locations in section 2.2.)\nEach neuron was assumed to \ufb01re in an i.i.d. fashion with probability per timestep p = 0.04. 
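A minimal noiseless sketch of this recovery problem can be run at smaller sizes. Here scipy's nonnegative least squares stands in for the interior-point solver described above (both the solver substitution and the sizes are ours, for illustration only); since the true nonnegative spike vector is feasible, the fitted residual is numerically zero, while exact recovery of the spikes depends on n:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
N, T, n, gamma = 10, 40, 4, 0.9       # n << N measurements per timestep

# Point neurons (A = I, b = 0): f(t) = c(t), with C = S G^{-T}.
G = np.eye(T) - gamma * np.eye(T, k=-1)
S_true = (rng.random((N, T)) < 0.04).astype(float)
C_true = np.linalg.solve(G, S_true.T).T

# Block-diagonal sensing: one random +-1 matrix B_t per timestep.
Bts = [rng.choice([-1.0, 1.0], size=(n, N)) for _ in range(T)]
y = np.concatenate([Bts[t] @ C_true[:, t] for t in range(T)])

# Express y as a linear map of the nonnegative spike vector s = vec(S):
# c(t) = sum_u Ginv[t, u] s(u), so block (t, u) of M is Ginv[t, u] * B_t.
Ginv = np.linalg.inv(G)
M = np.zeros((n * T, N * T))
for t in range(T):
    for u in range(t + 1):
        M[t * n:(t + 1) * n, u * N:(u + 1) * N] = Ginv[t, u] * Bts[t]

s_hat, resid = nnls(M, y)             # nonnegativity promotes sparse s
S_hat = s_hat.reshape(T, N).T
```

Whether S_hat matches S_true exactly for a given undersampling n/N mirrors the phase-transition behavior studied in Section 3.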
Each\nmeasurement was obtained by projecting the spatial \ufb02uorescence vector at time t, f (t), onto a random\nmatrix Bt. Each row of Bt is taken as an i.i.d. normalized vector 2\u03b2/\u221aN , where \u03b2 has i.i.d. entries\nfollowing a fair Bernoulli distribution. For each set of measurements we assume that \u03a3t = \u03c32In, and\nthe signal-to-noise ratio (SNR) in dB is de\ufb01ned as SNR = 10 log10(Var[\u03b2T f (t)]/N \u03c32); a quick\ncalculation reveals that SNR = 10 log10(p(1 \u2212 p)/((1 \u2212 \u03b32)\u03c32)).\n\nFigure 1: Performance of the proposed algorithm under different noise levels. A: True traces, B:\nEstimated traces with n = 5 (10\u00d7 undersampling), SNR = 20dB. C: Estimated traces with n = 20\n(2.5\u00d7 undersampling), SNR = 20dB. D: True and estimated spikes from the traces shown in panels\nB and C for a randomly selected neuron. E: Relative error between true and estimated traces for\ndifferent numbers of measurements per timestep under different noise levels. The error decreases with\nthe number of observations and the reconstruction is stable with respect to noise.\n\nFig. 1 examines the solution of (P-QP) as the number of measurements per timestep n varied from\n1 to N and for 8 different SNR values 0, 5, . . . , 30 plus the noiseless case (SNR = \u221e). Fig. 1A\nshows the noiseless traces for all the neurons and panels B and C show the reconstructed traces for\nSNR = 20dB and n = 5, 20 respectively. Fig. 1D shows the estimated spikes for these cases for a\nrandomly picked neuron. For a very small number of measurements (n = 5, i.e., 10\u00d7 undersampling)\nthe inferred calcium traces (Fig. 1B) already closely resemble the true traces. 
However, the inferred\nMAP values of the spikes (computed by S = CGT , essentially a differencing operation here) lie\nin the interior of [0, 1], and the results are not directly interpretable at a high temporal resolution.\nAs n increases (n = 20, red) the estimated spikes lie very close to {0, 1} and a simple thresholding\nprocedure can recover the true spike times. In Fig. 1E the relative error between the estimated and\ntrue traces (\u2016C \u2212 \u02c6C\u2016F /\u2016C\u2016F , with \u2016\u00b7\u2016F denoting the Frobenius norm) is plotted. In general the\nerror decreases with the number of observations and the reconstruction is robust to noise. Finally,\nby observing the noiseless case (dashed curve) we see that when n \u2265 13 the error becomes practically\nzero, indicating fully compressed acquisition of the calcium traces with a roughly 4\u00d7 undersampling\nfactor. We will see below that this undersampling factor is inversely proportional to the \ufb01ring rate:\nwe can recover highly sparse spike signals S using very few measurements n.\n\n2.2 Estimation of the spatial matrix A\n\nThe above algorithm assumes that the underlying neurons have known locations, i.e., the matrix\nA is known. In some cases A can be estimated a-priori by running a conventional raster-scanning\nexperiment at a high spatial resolution and locating the active voxels. However, this approach is\nexpensive and can still be challenging due to noise and possible spatial overlap between different\nneurons. To estimate A within the compressive framework we note that the baseline-subtracted\nspatiotemporal calcium matrix F (see eqs. (2) and (4)) can be written as \u00afF = F \u2212 b1T^T = AC; thus\nrank( \u00afF ) \u2264 N where N is the number of underlying neurons, with typically N \u226a d. 
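The rank bound that motivates this approach can be checked directly: any product of a d x N matrix with an N x T matrix has rank at most N, however large d and T are (sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, N, T = 128, 8, 300        # many voxels, few neurons

A = rng.random((d, N))       # nonnegative spatial footprints
C = rng.random((N, T))       # nonnegative calcium traces
F_bar = A @ C                # baseline-subtracted concentration, d x T

rank = np.linalg.matrix_rank(F_bar)
```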
Since N is also\nin general unknown we estimate \u00afF by solving a nuclear norm penalized problem (Recht et al., 2010):\n\nminimize_{ \u00afF } (1/2) \u2211_{t=1}^{T} ( \u00afy(t) \u2212 Bt \u00aff (t))T \u03a3t^{\u22121} ( \u00afy(t) \u2212 Bt \u00aff (t)) \u2212 log p( \u00afF ) + \u03bbNN \u2016 \u00afF \u2016\u2217\nsubject to \u00afF GT \u2265 0, \u00aff (1) \u2265 0,\n\n(P-NN)\n\nwhere \u2016 \u00b7 \u2016\u2217 denotes the nuclear norm (NN) of a matrix (i.e., the sum of its singular values), which is\na convex approximation to the nonconvex rank function (Fazel, 2002). The prior of \u00afF can be chosen\nin a similar fashion to log p(C), i.e., log p( \u00afF ) \u221d \u2212\u03bbF^T \u00afF GT 1T , where \u03bbF \u2208 Rd. Although more\ncomplex than (P-QP), (P-NN) is again convex and can be solved ef\ufb01ciently using e.g. the ADMM\nmethod of Boyd et al. (2011). From the solution of (P-NN) we can estimate N by appropriately\nthresholding the singular values of the estimated \u00afF .3 Having N we can then use appropriately\nconstrained nonnegative matrix factorization (NMF) methods to alternately estimate A and C. Note\nthat during this NMF step the baseline vector b can also be estimated jointly with A. Since NMF\nmethods are nonconvex, and thus prone to local optima, informative initialization is important. We\ncan use the solution of (P-NN) to initialize the spatial component A using clustering methods, similar\nto methods typically used in neuronal extracellular spike sorting (Lewicki, 1998). 
Details are given\nin the supplement (along with some discussion of the estimation of the other parameters in this\nproblem); we refer to Pnevmatikakis et al. (2013) for full details.\n\nFigure 2: Estimating locations and calcium concentration from compressive calcium imaging measurements. A: True spatiotemporal concentration, B: estimate obtained by solving (P-NN), C: estimate obtained by using\nNMF methods. D: Logarithmic plot of the \ufb01rst singular values of the solution of (P-NN), E: Estimation of the baseline vector, F: true spatial locations, G: estimated spatial locations. The NN-penalized\nmethod estimates the number of neurons and the NMF algorithm recovers the spatial and temporal\ncomponents with high accuracy.\n\nIn Fig. 2 we present an application of this method to an example with N = 8 spatially overlapping\nneurons. For simplicity we consider neurons in a one-dimensional \ufb01eld with a total of d = 128 voxels\nand spatial positions shown in Fig. 2F. At each timestep we obtain just n = 5 noisy\nmeasurements using random projections on binary masks. From the solution to the NN-penalized\nproblem (P-NN) (Fig. 2B) we threshold the singular values (Fig. 2D) and estimate the number of\nunderlying neurons (note the logarithmic gap between the 8th and 9th largest singular values that\nenables this separation). We then use the NMF approach to obtain \ufb01nal estimates of the spatial\nlocations (Fig. 2G), the baseline vector (Fig. 2E), and the full spatiotemporal concentration (Fig. 2C).\nThe estimates match well with the true values. Note that n < N \u226a d, showing that compressive\nimaging with signi\ufb01cant undersampling factors is possible, even in the case of the classical raster-scanning\nprotocol where the spatial locations are unknown.\n\n3 Estimation of the phase transition curve in the noiseless case\n\nThe results presented above indicate that reconstruction of the spikes is possible even with signi\ufb01cant\nundersampling. 
In this section we study this problem from a compressed sensing (CS) perspective\nin the idealized case where the measurements are noiseless. For simplicity, we also assume that\nA = I (similar to a RAMP setup). Unlike the traditional CS setup, where a sparse signal (in some\nbasis) is sensed with a dense fully supported random matrix, in our case the sensing matrix B has a\nblock-diagonal form. A standard justi\ufb01cation of CS approaches proceeds by establishing that the\nsensing matrix satis\ufb01es the \u201crestricted isometry property\u201d (RIP) for certain classes of sparse signals\nwith high probability (w.h.p.); this property in turn guarantees the correct recovery of the parameters\nof interest (Candes and Tao, 2005). Yap et al. (2011) showed that for signals that are sparse in some\northogonal basis, the RIP holds for random block-diagonal matrices w.h.p. with a suf\ufb01cient number of\nmeasurements that scales with the squared coherence between the sparse basis and the elementary\n(identity) basis. For non-orthogonal bases the RIP has only been established for fully dense\nsensing matrices (Candes et al., 2011).\n\n3To reduce potential shrinkage but promote low-rank solutions this \u201cglobal\u201d NN penalty can be replaced by a\nseries of \u201clocal\u201d NN penalties on spatially overlapping patches.\n\nFor signals with sparse variations, Ba et al. 
(2012) established\nperfect and stable recovery conditions under the assumption that the sensing matrix at each timestep\nsatis\ufb01es certain RIPs, and the sparsity level at each timestep has known upper bounds.\nWhile the RIP is a valuable tool for the study of convex relaxation approaches to compressed sensing\nproblems, its estimates are usually up to a constant and can be relatively loose (Blanchard et al.,\n2011). An alternative viewpoint is offered by conic geometric arguments (Chandrasekaran et al.,\n2012; Amelunxen et al., 2013) that examine how many measurements are required such that the\nconvex relaxed program will have a unique solution which coincides with the true sparse solution.\nWe use this approach to study the theoretical properties of our proposed compressed calcium imaging\nframework in an idealized noiseless setting. When noise is absent, the quadratic program (P-QP) for\nthe approximate MAP estimate converges to a linear program4:\n\nminimize_C f (C), subject to Bvec(C) = vec(Y ),\n\n(P-LP)\n\nwith f (C) = (v \u2297 1N )T vec(C) if (G \u2297 Id)vec(C) \u2265 0, and f (C) = \u221e otherwise, where v = GT 1T .\n\nHere \u2297 denotes the Kronecker product and we used the identity vec(CGT ) = (G \u2297 Id)vec(C). To\nexamine the properties of (P-LP) we follow the approach of Amelunxen et al. (2013): for a fully\ndense sensing i.i.d. Gaussian (or random rotation) matrix B, the linear program (P-LP) will succeed\nw.h.p. to reconstruct the true solution C0 if the total number of measurements nT satis\ufb01es\n\nnT \u2265 \u03b4(D(f, C0)) + O(\u221a(T N )).\n\n(5)\n\nHere D(f, C0) is the descent cone of f at C0, induced by the set of non-increasing directions from C0, i.e.,\n\nD(f, C0) = \u222a_{\u03c4\u22650} {y \u2208 RN\u00d7T : f (C0 + \u03c4 y) \u2264 f (C0)},\n\n(6)\n\nand \u03b4(C) is the \u201cstatistical dimension\u201d (SD) of a convex cone C \u2286 Rm, de\ufb01ned as the expected\nsquared length of a standard normal Gaussian vector projected onto the cone:\n\n\u03b4(C) = Eg\u2016\u03a0C(g)\u20162, with g \u223c N (0, Im).\n\nEq. (5), and the analysis of Amelunxen et al. (2013), state that as T N \u2192 \u221e, the probability that\n(P-LP) will succeed to \ufb01nd the true solution undergoes a phase transition, and that the phase transition\ncurve (PTC), i.e., the number of measurements required for perfect reconstruction normalized by\nthe ambient dimension N T (Donoho and Tanner, 2009), coincides with the normalized SD. In our\ncase B is a block-diagonal matrix (not a fully-dense Gaussian matrix), and the SD only provides an\nestimate of the PTC. However, as we show below, this estimate is tight in most cases of interest.\n\n3.1 Computing the statistical dimension\n\nUsing a result from Amelunxen et al. 
(2013) the statistical dimension can also be expressed as the\nexpected squared distance of a standard normal vector from the cone induced by the subdifferential\n(Rockafellar, 1970) \u2202f of f at the true solution C0:\n\n\u03b4(D(f, C0)) = Eg inf_{\u03c4>0} min_{u\u2208\u03c4\u2202f (C0)} \u2016g \u2212 u\u20162, with g \u223c N (0, IN T ).\n\n(7)\n\nAlthough in general (7) cannot be solved in closed form, it can be easily estimated numerically; in the\nsupplementary material we show that the subdifferential \u2202f (C0) takes the form of a convex polytope,\ni.e., an intersection of linear half spaces. As a result, the distance of any vector g from \u2202f (C0) can\nbe found by solving a simple quadratic program, and the statistical dimension can be estimated with\na simple Monte-Carlo simulation (details are presented in the supplement). The characterization\nof (7) also explains the effect of the sparsity pattern on the SD. In the case where the sparse basis\nis the identity, the cone induced by the subdifferential can be decomposed as the union of the\nrespective subdifferential cones induced by each coordinate. It follows that the SD is invariant to\ncoordinate permutations and depends only on the sparsity level, i.e., the number of nonzero elements.\nHowever, this result is in general not true for a nonorthogonal sparse basis, indicating that the precise\nlocation of the spikes (sparsity pattern) and not just their number has an effect on the SD. In our case\nthe calcium signal is sparse in the non-orthogonal basis described by the matrix G from (4).\n\n4To illustrate the generality of our approach we allow for arbitrary nonnegative spike values in this analysis,\nbut we also discuss the binary case that is of direct interest to our compressive calcium framework.\n\n3.2 Relation with the phase transition curve\n\nIn this section we examine the relation of the SD with the PTC for our compressive calcium imaging\nproblem. 
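The Monte-Carlo recipe can be sanity-checked on a cone whose SD is known in closed form (the cone choice below is our illustrative example, not the descent cone of f): for the nonnegative orthant R^m_+ the projection of g simply clips negative coordinates, each coordinate contributes E[g^2 1{g>0}] = 1/2, and therefore delta(R^m_+) = m/2.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n_mc = 50, 20000

# delta(C) = E ||Pi_C(g)||^2 for g ~ N(0, I_m).  For the nonnegative
# orthant, Pi_C clips negative coordinates, so the estimate below
# should concentrate around m / 2.
g = rng.standard_normal((n_mc, m))
delta_hat = (np.maximum(g, 0.0) ** 2).sum(axis=1).mean()
```

For the descent cone of f the projection has no closed form, which is why the paper replaces the clipping step with a small quadratic program per Monte-Carlo sample.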
Let S denote the set of spikes, \u2126 = supp(S), and C the induced calcium traces C = SG\u2212T .\nAs we argued, the statistical dimension of the descent cone D(f, C) depends both on the cardinality\nof the spike set |\u2126| (sparsity level) and on the location of the spikes (sparsity pattern). To examine\nthe effects of the sparsity level and pattern we de\ufb01ne the normalized expected statistical dimension\n(NESD) with respect to a certain distribution (e.g. Bernoulli) \u03c7 from which the spikes S are drawn:\n\n\u02dc\u03b4(k/N T, \u03c7) = E\u2126\u223c\u03c7 [\u03b4(D(f, C))/N T ] , with supp(S) = \u2126, and E\u2126\u223c\u03c7|\u2126| = k.\n\nIn Fig. 3 we examine the relation of the NESD with the phase transition curve of the noiseless problem\n(P-LP). We consider a setup with N = 40 point neurons (A = Id, d = N ) observed over T = 50\ntimesteps and chose the discrete time constant \u03b3 = 0.99. The spike-times of each neuron came from\nthe same distribution and we considered two different distributions: (i) Bernoulli spiking, i.e., each\nneuron \ufb01res i.i.d. spikes with probability k/T , and (ii) desynchronized periodic spiking, where\neach neuron \ufb01res spikes deterministically with discrete frequency k/T per timestep, and each neuron\nhas a random phase. We considered two forms of spikes: (i) with nonnegative values (si(t) \u2265 0),\nand (ii) with binary values (si(t) \u2208 {0, 1}), and we also considered two forms of sensing matrices:\n(i) with time-varying matrices Bt, and (ii) with constant, fully supported matrices B1 = . . . = BT .\nThe entries of each Bt are again drawn from an i.i.d. fair Bernoulli distribution. For each of these\n8 different conditions we varied the expected number of spikes per neuron k from 1 to T and the\nnumber of observed measurements n from 1 to N. Fig. 
3 shows the empirical probability that the\nprogram (P-LP) will succeed in reconstructing the true solution, averaged over 100 repetitions.\nSuccess is declared when the reconstructed spike signal \u02c6S satis\ufb01es5 \u2016 \u02c6S \u2212 S\u2016F /\u2016S\u2016F < 10\u22123. We\nalso plot the empirical PTC (purple dashed line), i.e., the empirical 50% success probability line, and\nthe NESD (solid blue line), approximated with a Monte Carlo simulation using 200 samples, for each\nof the four distinct cases (note that the SD does not depend on the structure of the sensing matrix B).\nIn all cases, our problem undergoes a sharp phase transition as the number of measurements per\ntimestep varies: in the white regions of Fig. 3, S is recovered essentially perfectly, with a transition\nto a high probability of at least some errors in the black regions. Note that the phase transitions are\nde\ufb01ned as functions of the sparsity index k/T ; the signal sparsity sets the compressibility of the data.\nIn addition, in the case of time-varying Bt, the NESD provides a surprisingly good estimate of the\nPTC, especially in the binary case or when the spiking signal is actually sparse (k/T < 0.5), a result\nthat justi\ufb01es our overall approach. Although using time-varying sensing matrices Bt leads to better\nresults, compression is also possible with a constant B. This is an important result for implementation\npurposes, where changing the sensing matrix might be a costly or slow operation. On a more technical\nnote, we also observe the following interesting properties:\n\u2022 Periodic spiking requires more measurements for accurate deconvolution, a property again\npredicted by the SD. This comes from the fact that the sparse basis is not orthogonal and\nshows that for a \ufb01xed sparsity level k/T the sparsity pattern also affects the number of required\nmeasurements. This difference depends on the time constant \u03b3. 
As γ → 0, G → I; the problem
becomes equivalent to a standard nonnegative CS problem, where the spike pattern is irrelevant.
• In the Bernoulli-spiking nonnegative case, the SD is numerically very close to the PTC of the
standard nonnegative CS problem (not shown here), adding to the growing body of evidence for
universal behavior of convex relaxation approaches to CS (Donoho and Tanner, 2009).
• In the binary case the results exhibit a symmetry around the axis k/T = 0.5. In fact, this symmetry
becomes exact as γ → 1. In the supplement we prove that this result is predicted by the SD.

⁵When calculating this error we excluded the last 10 timesteps. As every spike is filtered by the AR process,
it has an effect for multiple time lags in the future, and an optimal encoder has to sense it over multiple time lags.
This number depends only on γ and not on the length T, and thus this behavior becomes negligible as T → ∞.

Figure 3: Relation of the statistical dimension with the phase transition curve for two different
spiking distributions (Bernoulli, periodic), two different spike values (nonnegative, binary), and two
classes of sensing matrices (time-varying, constant). For each panel: x-axis, normalized sparsity k/T;
y-axis, undersampling index n/N. Each panel shows the empirical success probability for each pair
(k/T, n/N), the empirical 50%-success line (dashed purple line), and the SD (blue solid line). When
B is time-varying, the SD provides a good estimate of the empirical PTC.

As mentioned above, our analysis is only approximate, since B is block-diagonal and not fully
dense. However, this approximation is tight in the time-varying case.
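The simulation setup described above (Bernoulli versus desynchronized periodic spiking, AR(1) calcium dynamics C = S G^{-T}, and random sensing matrices) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the seed, spike rate k, and measurement count n are arbitrary example values, and the "fair Bernoulli" sensing entries are interpreted here as ±1 with equal probability.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, gamma = 40, 50, 0.99   # setup from the text (A = Id, d = N)
k, n = 5, 20                 # example spike rate and measurement count

# (i) Bernoulli spiking: each neuron fires i.i.d. spikes with probability k/T.
S_bern = (rng.random((N, T)) < k / T).astype(float)

# (ii) desynchronized periodic spiking: deterministic firing every T // k
# timesteps, with an independent random phase per neuron.
period = T // k
S_per = np.zeros((N, T))
for i in range(N):
    S_per[i, rng.integers(period)::period] = 1.0

# Calcium traces C = S G^{-T}: row-wise AR(1) filtering c(t) = gamma*c(t-1) + s(t),
# i.e. G has ones on the diagonal and -gamma on the first subdiagonal.
G = np.eye(T) - gamma * np.eye(T, k=-1)
C = S_bern @ np.linalg.inv(G).T

# Time-varying compressive measurements y_t = B_t c_t, with +-1 entries
# (assumed interpretation of the fair Bernoulli distribution).
Y = np.column_stack(
    [rng.choice([-1.0, 1.0], size=(n, N)) @ C[:, t] for t in range(T)]
)
```

Recovering S from Y would then require solving the nonnegative program (P-LP); the sketch covers only the forward model used to generate the phase-transition diagrams.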
Still, it is possible to construct
adversarial counterexamples where the SD approach fails to provide a good estimate of the PTC.
For example, if all neurons fire in a completely synchronized manner, then the required number
of measurements grows at a rate that is not predicted by (5). We present such an example in the
supplement and note that more research is needed to understand such extreme cases.

4 Conclusion

We proposed a framework for compressive calcium imaging. Using convex relaxation tools from
compressed sensing and low-rank matrix factorization, we developed an efficient method for extracting
neurons' spatial locations and the temporal locations of their spikes from a limited number of
measurements, enabling the imaging of large neural populations at potentially much higher imaging
rates than currently available. We also studied a noiseless version of our problem from a compressed
sensing point of view, using newly introduced tools involving the statistical dimension of convex
cones. Our analysis can in certain cases capture the number of measurements needed for perfect
deconvolution, and helps explain the effects of different spike patterns on reconstruction performance.
Our approach suggests potential improvements over the standard raster-scanning protocol (unknown
locations) as well as the more efficient RAMP protocol (known locations). However, our analysis is
idealized and neglects several issues that can arise in practice. The results of Fig. 1 suggest a tradeoff
between effective compression and SNR level. In the compressive framework the cycle length can be
relaxed more easily due to the parallel nature of the imaging (each location is targeted during the
whole "cycle"). The summed activity is then collected by the photomultiplier tube, which introduces the
noise.
While the nature of this addition has to be examined in practice, we expect that the observed
SNR will allow for significant compression. Another important issue is motion correction for brain
movement, especially under in vivo conditions. While new methods may have to be derived for this problem,
the approach of Cotton et al. (2013) could be adapted to our setting. We hope that our work
will inspire experimentalists to leverage the proposed advanced signal processing methods to develop
more efficient imaging protocols.

Acknowledgements
LP is supported by an NSF CAREER award. This work is also supported by ARO MURI W911NF-12-
1-0594.

References
Ahrens, M. B., M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller (2013). Whole-brain functional imaging at
cellular resolution using light-sheet microscopy. Nature Methods 10(5), 413–420.
Amelunxen, D., M. Lotz, M. B. McCoy, and J. A. Tropp (2013). Living on the edge: A geometric theory of
phase transitions in convex optimization. arXiv preprint arXiv:1303.6672.
Ba, D., B. Babadi, P. Purdon, and E. Brown (2012). Exact and stable recovery of sequences of signals with
sparse increments via differential l1-minimization. In Advances in Neural Information Processing Systems
25, pp. 2636–2644.
Blanchard, J. D., C. Cartis, and J. Tanner (2011). Compressed sensing: How sharp is the restricted isometry
property? SIAM Review 53(1), 105–125.
Boyd, S., N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011).
Distributed optimization and statistical learning
via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1),
1–122.
Branco, T., B. A. Clark, and M. Häusser (2010). Dendritic discrimination of temporal input sequences in cortical
neurons. Science 329, 1671–1675.
Candes, E. J., Y. C. Eldar, D. Needell, and P. Randall (2011). Compressed sensing with coherent and redundant
dictionaries. Applied and Computational Harmonic Analysis 31(1), 59–73.
Candes, E. J. and T. Tao (2005). Decoding by linear programming. IEEE Transactions on Information
Theory 51(12), 4203–4215.
Chandrasekaran, V., B. Recht, P. A. Parrilo, and A. S. Willsky (2012). The convex geometry of linear inverse
problems. Foundations of Computational Mathematics 12(6), 805–849.
Cotton, R. J., E. Froudarakis, P. Storer, P. Saggau, and A. S. Tolias (2013). Three-dimensional mapping of
microcircuit correlation structure. Frontiers in Neural Circuits 7.
Donoho, D. and J. Tanner (2009). Observed universality of phase transitions in high-dimensional geometry, with
implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society
A: Mathematical, Physical and Engineering Sciences 367(1906), 4273–4293.
Duarte, M. F., M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk (2008).
Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine 25(2), 83–91.
Fazel, M. (2002). Matrix rank minimization with applications. Ph.D. thesis, Stanford University.
Gehm, M., R. John, D. Brady, R. Willett, and T. Schulz (2007). Single-shot compressive spectral imaging with a
dual-disperser architecture. Optics Express 15(21), 14013–14027.
Katz, O., Y. Bromberg, and Y. Silberberg (2009). Compressive ghost imaging. Applied Physics Letters 95(13).
Lewicki, M. (1998).
A review of methods for spike sorting: the detection and classification of neural action
potentials. Network: Computation in Neural Systems 9, R53–R78.
Lustig, M., D. Donoho, and J. M. Pauly (2007). Sparse MRI: The application of compressed sensing for rapid
MR imaging. Magnetic Resonance in Medicine 58(6), 1182–1195.
Nikolenko, V., B. Watson, R. Araya, A. Woodruff, D. Peterka, and R. Yuste (2008). SLM microscopy: Scanless
two-photon imaging and photostimulation using spatial light modulators. Frontiers in Neural Circuits 2, 5.
Pnevmatikakis, E., T. Machado, L. Grosenick, B. Poole, J. Vogelstein, and L. Paninski (2013). Rank-penalized
nonnegative spatiotemporal deconvolution and demixing of calcium imaging data. In Computational and
Systems Neuroscience Meeting (COSYNE). (Journal paper in preparation for PLoS Computational Biology.)
Recht, B., M. Fazel, and P. Parrilo (2010). Guaranteed minimum-rank solutions of linear matrix equations via
nuclear norm minimization. SIAM Review 52(3), 471–501.
Reddy, G., K. Kelleher, R. Fink, and P. Saggau (2008). Three-dimensional random access multiphoton
microscopy for functional imaging of neuronal activity. Nature Neuroscience 11(6), 713–720.
Rockafellar, R. (1970). Convex Analysis. Princeton University Press.
Rust, M. J., M. Bates, and X. Zhuang (2006). Sub-diffraction-limit imaging by stochastic optical reconstruction
microscopy (STORM). Nature Methods 3(10), 793–796.
Studer, V., J. Bobin, M. Chahid, H. S. Mousavi, E. Candes, and M. Dahan (2012). Compressive fluorescence
microscopy for biological and hyperspectral imaging. Proceedings of the National Academy of Sciences 109(26),
E1679–E1687.
Vogelstein, J., A. Packer, T. Machado, T. Sippy, B. Babadi, R. Yuste, and L. Paninski (2010). Fast non-negative
deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology 104(6),
3691–3704.
Yap, H. L., A.
Eftekhari, M. B. Wakin, and C. J. Rozell (2011). The restricted isometry property for block
diagonal matrices. In 45th Annual Conference on Information Sciences and Systems (CISS), pp. 1–6.