{"title": "High-dimensional neural spike train analysis with generalized count linear dynamical systems", "book": "Advances in Neural Information Processing Systems", "page_first": 2044, "page_last": 2052, "abstract": "Latent factor models have been widely used to analyze simultaneous recordings of spike trains from large, heterogeneous neural populations.  These models assume  the signal of interest in the population is a low-dimensional latent intensity that evolves over time, which is observed in high dimension via noisy point-process observations.  These techniques have been well used to capture neural correlations across a population and to provide a smooth, denoised, and concise representation of high-dimensional spiking data.  One limitation of many current models is that the observation model is assumed to be Poisson, which lacks the flexibility to capture under- and over-dispersion that is common in recorded neural data, thereby introducing bias into estimates of covariance.  Here we develop the generalized count linear dynamical system, which relaxes the Poisson assumption by using a more general exponential family for count data.  In addition to containing Poisson, Bernoulli, negative binomial, and other common count distributions as special cases, we show that this model can be tractably learned by extending recent advances in variational inference techniques.  We apply our model to data from primate motor cortex and demonstrate performance improvements over state-of-the-art methods, both in capturing the variance structure of the data and in held-out prediction.", "full_text": "High-dimensional neural spike train analysis with\n\ngeneralized count linear dynamical systems\n\nYuanjun Gao\n\nDepartment of Statistics\n\nColumbia University\nNew York, NY 10027\n\nyg2312@columbia.edu\n\nLars Buesing\n\nDepartment of Statistics\n\nColumbia University\nNew York, NY 10027\n\nlars@stat.columbia.edu\n\nKrishna V. 
Shenoy\n\nDepartment of Electrical Engineering\n\nStanford University\nStanford, CA 94305\n\nshenoy@stanford.edu\n\nJohn P. Cunningham\nDepartment of Statistics\n\nColumbia University\nNew York, NY 10027\n\njpc2181@columbia.edu\n\nAbstract\n\nLatent factor models have been widely used to analyze simultaneous recordings of\nspike trains from large, heterogeneous neural populations. These models assume\nthe signal of interest in the population is a low-dimensional latent intensity that\nevolves over time, which is observed in high dimension via noisy point-process\nobservations. These techniques have been well used to capture neural correlations\nacross a population and to provide a smooth, denoised, and concise representa-\ntion of high-dimensional spiking data. One limitation of many current models\nis that the observation model is assumed to be Poisson, which lacks the \ufb02exibil-\nity to capture under- and over-dispersion that is common in recorded neural data,\nthereby introducing bias into estimates of covariance. Here we develop the gen-\neralized count linear dynamical system, which relaxes the Poisson assumption by\nusing a more general exponential family for count data. In addition to contain-\ning Poisson, Bernoulli, negative binomial, and other common count distributions\nas special cases, we show that this model can be tractably learned by extend-\ning recent advances in variational inference techniques. We apply our model to\ndata from primate motor cortex and demonstrate performance improvements over\nstate-of-the-art methods, both in capturing the variance structure of the data and\nin held-out prediction.\n\n1\n\nIntroduction\n\nMany studies and theories in neuroscience posit that high-dimensional populations of neural spike\ntrains are a noisy observation of some underlying, low-dimensional, and time-varying signal of\ninterest. 
As such, over the last decade researchers have developed and used a number of methods\nfor jointly analyzing populations of simultaneously recorded spike trains, and these techniques have\nbecome a critical part of the neural data analysis toolkit [1]. In the supervised setting, generalized\nlinear models (GLM) have used stimuli and spiking history as covariates driving the spiking of the\nneural population [2, 3, 4, 5]. In the unsupervised setting, latent variable models have been used\nto extract low-dimensional hidden structure that captures the variability of the recorded data, both\ntemporally and across the population of neurons [6, 7, 8, 9, 10, 11].\nIn both these settings, however, a limitation is that spike trains are typically assumed to be conditionally\nPoisson, given the shared signal [8, 10, 11]. The Poisson assumption, while offering algorithmic\nconveniences in many cases, implies the property of equal dispersion: the conditional mean and variance\nare equal. This well-known property is particularly troublesome in the analysis of neural spike\ntrains, which are commonly observed to be either over- or under-dispersed [12] (variance greater\nthan or less than the mean). No doubly stochastic process with a Poisson observation can capture\nunder-dispersion, and while such a model can capture over-dispersion, it must do so at the cost of\nerroneously attributing variance to the latent signal, rather than the observation process.\nTo allow for deviation from the Poisson assumption, some previous work has instead modeled the\ndata as Gaussian [7] or using more general renewal process models [13, 14, 15]; the former of\nwhich does not match the count nature of the data and has been found inferior [8], and the latter of\nwhich requires costly inference that has not been extended to the population setting. 
More general\ndistributions like the negative binomial have been proposed [16, 17, 18], but again these families do\nnot generalize to cases of under-dispersion. Furthermore, these more general distributions have not\nyet been applied to the important setting of latent variable models.\nHere we employ a count-valued exponential family distribution that addresses these needs and\nincludes much previous work as special cases. We call this distribution the generalized count (GC)\ndistribution [19], and we offer here four main contributions: (i) we introduce the GC distribution and\nderive a variety of commonly used distributions that are special cases, using the GLM as a motivating\nexample (\u00a72); (ii) we combine this observation likelihood with a latent linear dynamical systems\nprior to form a GC linear dynamical system (GCLDS; \u00a73); (iii) we develop a variational learning\nalgorithm by extending the current state-of-the-art methods [20] to the GCLDS setting (\u00a73.1); and (iv)\nwe show in data from the primate motor cortex that the GCLDS model provides superior predictive\nperformance and in particular captures data covariance better than Poisson models (\u00a74).\n\n2 Generalized count distributions\n\nWe define the generalized count distribution as the family of count-valued probability distributions:\n\npGC(k; \u03b8, g(\u00b7)) = exp(\u03b8k + g(k)) / (k! M(\u03b8, g(\u00b7))),    k \u2208 N,    (1)\n\nwhere \u03b8 \u2208 R and the function g : N \u2192 R parameterize the distribution, and M(\u03b8, g(\u00b7)) = \u2211_{k=0}^{\u221e} exp(\u03b8k + g(k)) / k!\nis the normalizing constant. The primary virtue of the GC family is that it recovers\nall common count-valued distributions as special cases and naturally parameterizes many common\nsupervised and unsupervised models (as will be shown); for example, the function g(k) = 0\nimplies a Poisson distribution with rate parameter \u03bb = exp{\u03b8}. 
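As a minimal numerical sketch of definition (1), assuming the infinite support is truncated at a hypothetical level k_max, the GC probability mass function can be evaluated directly. The names gc_pmf and k_max are illustrative and do not come from the paper or its released code; the comparison at the end checks that g(k) = 0 reproduces a Poisson distribution with rate exp(θ).

```python
import math

def gc_pmf(theta, g, k_max=100):
    # Log-weights theta*k + g(k) - log(k!) for k = 0..k_max.
    # k_max is an assumed truncation of the infinite support.
    log_w = [theta * k + g(k) - math.lgamma(k + 1) for k in range(k_max + 1)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)  # proportional to the normalizer M(theta, g)
    return [v / total for v in w]

theta = math.log(2.5)
pmf = gc_pmf(theta, lambda k: 0.0)  # g(k) = 0: the Poisson special case

# Reference Poisson pmf with rate lambda = exp(theta).
lam = math.exp(theta)
poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(pmf))]
max_gap = max(abs(a - b) for a, b in zip(pmf, poisson))
print(max_gap)
```

Working with log-weights and subtracting their maximum keeps the normalizer stable when θk + g(k) is large.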
Generalizations of the Poisson\ndistribution have been of interest since at least [21], and the paper [19] introduced the GC family\nand proved two additional properties: \ufb01rst, that the expectation of any GC distribution is mono-\ntonically increasing in \u03b8, for a \ufb01xed g(k); and second \u2013 and perhaps most relevant to this study \u2013\nconcave (convex) functions g(\u00b7) imply under-dispersed (over-dispersed) GC distributions. Further-\nmore, often desired features like zero truncation or zero in\ufb02ation can also be naturally incorporated\nby modifying the g(0) value [22, 23]. Thus, with \u03b8 controlling the (log) rate of the distribution\nand g(\u00b7) controlling the \u201cshape\u201d of the distribution, the GC family provides a rich model class for\ncapturing the spiking statistics of neural data. Other discrete distribution families do exist, such as\nthe Conway-Maxwell-Poisson distribution [24] and ordered logistic/probit regression [25], but the\nGC family offers a rich exponential family, which makes computation somewhat easier and allows\nthe g(\u00b7) functions to be interpreted.\nFigure 1 demonstrates the relevance of modeling dispersion in neural data analysis. The left panel\nshows a scatterplot where each point is an individual neuron in a recorded population of neurons\nfrom primate motor cortex (experimental details will be described in \u00a74). Plotted are the mean and\nvariance of spiking activity of each neuron; activity is considered in 20ms bins. For reference, the\nequi-dispersion line implied by a homogeneous Poisson process is plotted in red, and note further\nthat all doubly stochastic Poisson models would have an implied dispersion above this Poisson line.\nThese data clearly demonstrate meaningful under-dispersion, underscoring the need for the present\nadvance. 
The right panel demonstrates the appropriateness of the GC model class, showing that a\nconvex/linear/concave function g(k) will produce the expected over/equal/under-dispersion. Given\nthe left panel, we expect under-dispersed GC distributions to be most relevant, but many\nneural datasets also demonstrate over- and equi-dispersion [12], highlighting the need for a flexible\nobservation family.\n\nFigure 1: Left panel: mean firing rate and variance of neurons in primate motor cortex during\nthe peri-movement period of a reaching experiment (see \u00a74). The data exhibit under-dispersion,\nespecially for high firing-rate neurons. The two marked neurons will be analyzed in detail in Figure\n2. Right panel: the expectation and variance of the GC distribution with different choices of the\nfunction g.\n\nTo illustrate the generality of the GC family and to lay the foundation for our unsupervised learning\napproach, we consider briefly the case of supervised learning of neural spike train data, where generalized\nlinear models (GLM) have been used extensively [4, 26, 17]. We define the GCGLM as the model\nof a single neuron with count data yi \u2208 N and associated covariates xi \u2208 Rp (i = 1, ..., n):\n\ny_i \u223c GC(\u03b8(x_i), g(\u00b7)), where \u03b8(x_i) = x_i \u03b2.    (2)\n\nHere GC(\u03b8, g(\u00b7)) denotes a random variable distributed according to (1), and \u03b2 \u2208 Rp are the regression\ncoefficients. This GCGLM model is highly general. Table 1 shows that many of the commonly\nused count-data models are special cases of GCGLM, obtained by restricting the g(\u00b7) function to a certain\nparametric form. In addition to this convenient generality, one benefit of our parametrization of the\nGC model is that the curvature of g(\u00b7) directly measures the extent to which the data deviate from\nthe Poisson assumption, allowing us to meaningfully interrogate the form of g(\u00b7). 
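The link between the curvature of g(·) and dispersion can be checked numerically. The sketch below is an illustration under assumed choices: the three g functions, the value of θ, and the truncation k_max are hypothetical and are not taken from the experiments. It computes the mean and variance of the GC distribution for a concave, a linear, and a convex g.

```python
import math

def gc_moments(theta, g, k_max=200):
    # Mean and variance of a GC distribution on the truncated support 0..k_max.
    log_w = [theta * k + g(k) - math.lgamma(k + 1) for k in range(k_max + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    mean = sum(k * v for k, v in enumerate(w)) / total
    second = sum(k * k * v for k, v in enumerate(w)) / total
    return mean, second - mean * mean

theta = 0.5
shapes = {
    'concave': lambda k: -0.1 * k * k,                 # negative second difference
    'linear': lambda k: 0.3 * k,                       # Poisson with log-rate theta + 0.3
    'convex': lambda k: 0.5 * k * math.log(k + 1.0),   # positive second difference
}
results = {name: gc_moments(theta, g) for name, g in shapes.items()}
for name, (mean, var) in results.items():
    # A variance-to-mean ratio below, at, or above 1 indicates
    # under-, equi-, or over-dispersion respectively.
    print(name, round(mean, 3), round(var, 3), round(var / mean, 3))
```

The convex example grows more slowly than log k!, so the normalizer remains finite; a faster-growing convex g would not define a proper distribution on an unbounded support.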
Note that (2) has\nno intercept term because it can be absorbed in the g(\u00b7) function as a linear term \u03b1k (see Table 1).\nUnlike previous GC work [19], our parameterization implies that maximum likelihood parameter\nestimation (MLE) is a tractable convex program, which can be seen by considering:\n\n(\u03b2\u0302, \u011d(\u00b7)) = arg max_{(\u03b2, g(\u00b7))} \u2211_{i=1}^{n} log p(y_i) = arg max_{(\u03b2, g(\u00b7))} \u2211_{i=1}^{n} [(x_i \u03b2) y_i + g(y_i) \u2212 log M(x_i \u03b2, g(\u00b7))].    (3)\n\nFirst note that, although we have to optimize over a function g(\u00b7) that is defined on all non-negative\nintegers, we can exploit the empirical support of the distribution to produce a finite optimization\nproblem. Namely, for any k\u2217 that is not achieved by any data point yi (i.e., the count #{i|yi =\nk\u2217} = 0), the MLE for g(k\u2217) must be \u2212\u221e, and thus we only need to optimize g(k) for k that\nhave empirical support in the data. Thus g(k) is a finite dimensional vector. To avoid the potential\noverfitting caused by truncation of g(\u00b7) beyond the empirical support of the data, we can enforce a\nlarge (finite) support and impose a quadratic penalty on the second difference of g(\u00b7), to encourage\nlinearity in g(\u00b7) (which corresponds to a Poisson distribution). Second, note that we can fix g(0) = 0\nwithout loss of generality, which ensures model identifiability. With these constraints, the remaining\ng(k) values can be fit as free parameters or as convex-constrained (a set of linear inequalities on g(k);\nsimilarly for concave case). 
Finally, problem convexity is ensured as all terms are either linear or\nlinear within the log-sum-exp function log M(\u00b7), leading to fast optimization algorithms [27].\n\nModel name | Typical parameterization | GCGLM parametrization\nLogistic regression (e.g., [25]) | P(y = k) = exp(k(\u03b1 + x\u03b2)) / (1 + exp(\u03b1 + x\u03b2)) | g(k) = \u03b1k; k = 0, 1\nPoisson regression (e.g., [4, 26]) | P(y = k) = (\u03bb^k / k!) exp(\u2212\u03bb); \u03bb = exp(\u03b1 + x\u03b2) | g(k) = \u03b1k\nAdjacent category regression (e.g., [25]) | P(y = k + 1) / P(y = k) = exp(\u03b1_k + x\u03b2) | g(k) = \u2211_{i=1}^{k} (\u03b1_{i\u22121} + log i); k = 0, 1, ..., K\nNegative binomial regression (e.g., [17, 18]) | P(y = k) = [(k + r \u2212 1)! / (k! (r \u2212 1)!)] (1 \u2212 p)^r p^k; p = exp(\u03b1 + x\u03b2) | g(k) = \u03b1k + log (k + r \u2212 1)!\nCOM-Poisson regression (e.g., [24]) | P(y = k) = [\u03bb^k / (k!)^\u03bd] / \u2211_{j=0}^{+\u221e} [\u03bb^j / (j!)^\u03bd]; \u03bb = exp(\u03b1 + x\u03b2) | g(k) = \u03b1k + (1 \u2212 \u03bd) log k!\n\nTable 1: Special cases of GCGLM. For all models, the GCGLM parametrization for \u03b8 is only associated\nwith the slope \u03b8(x) = \u03b2x, and the intercept \u03b1 is absorbed into the g(\u00b7) function. In all\ncases we have g(k) = \u2212\u221e outside the stated support of the distribution. 
Whenever unspecified, the\nsupport of the distribution and the domain of the g(\u00b7) function are non-negative integers N.\n\n3 Generalized count linear dynamical system model\n\nWith the GC distribution in hand, we now turn to the unsupervised setting, namely coupling the GC\nobservation model with a latent, low-dimensional dynamical system. Our model is a generalization\nof linear dynamical systems with Poisson likelihoods (PLDS), which have been extensively used\nfor analysis of populations of neural spike trains [8, 11, 28, 29]. Denoting yrti as the observed\nspike-count of neuron i \u2208 {1, ..., N} at time t \u2208 {1, ..., T} on experimental trial r \u2208 {1, ..., R},\nthe PLDS assumes that the spike activity of neurons is a noisy Poisson observation of an underlying\nlow-dimensional latent state xrt \u2208 Rp (where p \u226a N), such that:\n\ny_rti | x_rt \u223c Poisson(exp{c_i^T x_rt + d_i}).    (4)\n\nHere C = [c_1 ... c_N]^T \u2208 R^{N\u00d7p} is the factor loading matrix mapping the latent state xrt to a\nlog rate, with time and trial invariant baseline log rate d \u2208 RN. Thus the vector Cxrt + d denotes\nthe vector of log rates for trial r and time t. Critically, the latent state xrt can be interpreted as the\nunderlying signal of interest that acts as the \u201ccommon input signal\u201d to all neurons, which is modeled\na priori as a linear Gaussian dynamical system (to capture temporal correlations):\n\nx_r1 \u223c N(\u00b5_1, Q_1),\nx_r(t+1) | x_rt \u223c N(A x_rt + b_t, Q),    (5)\n\nwhere \u00b51 \u2208 Rp and Q1 \u2208 Rp\u00d7p parameterize the initial state. The transition matrix A \u2208 Rp\u00d7p\nand innovations covariance Q \u2208 Rp\u00d7p parameterize the dynamical state update. The optional term\nbt \u2208 Rp allows the model to capture a time-varying firing rate that is fixed across experimental\ntrials. 
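To make the generative model in (4) and (5) concrete, the following sketch draws one trial from a small PLDS. It is an illustration only: the parameter values, the choice bt = 0, a diagonal innovations covariance, µ1 = 0 with Q1 = I, and the helper names sample_poisson and simulate_plds are all assumptions made here for brevity, not details of the released code.

```python
import math
import random

def sample_poisson(rate, rng):
    # Knuth's multiplication method; adequate for the small rates used here.
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_plds(A, q_std, C, d, T, rng):
    # A: p x p transition matrix, C: N x p loading matrix, d: N baseline log rates.
    # Innovations are independent per dimension with standard deviation q_std
    # (an assumed diagonal Q), and b_t is taken to be zero.
    p, N = len(A), len(C)
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]  # x_1 ~ N(0, I)
    states, counts = [], []
    for _ in range(T):
        states.append(x)
        # Equation (4): y_ti ~ Poisson(exp(c_i^T x_t + d_i)).
        log_rates = [sum(C[i][j] * x[j] for j in range(p)) + d[i] for i in range(N)]
        counts.append([sample_poisson(math.exp(v), rng) for v in log_rates])
        # Equation (5): x_{t+1} ~ N(A x_t, Q).
        x = [sum(A[j][m] * x[m] for m in range(p)) + rng.gauss(0.0, q_std)
             for j in range(p)]
    return states, counts

rng = random.Random(0)
A = [[0.95, 0.05], [-0.05, 0.95]]            # slowly rotating, stable dynamics
C = [[0.4, 0.0], [0.0, 0.4], [0.3, -0.3]]    # three neurons, two latent dimensions
d = [-1.0, -0.5, -1.5]
states, counts = simulate_plds(A, 0.1, C, d, T=50, rng=rng)
print(counts[:5])
```

Replacing the Poisson draw with a draw from a GC distribution with log-rate parameter c_i^T x_t would give the corresponding sketch for the generalized model.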
The PLDS has been widely used and has been shown to outperform other models in terms of\npredictive performance, including in particular the simpler Gaussian linear dynamical system [8].\nThe PLDS model is naturally extended to what we term the generalized count linear dynamical\nsystem (GCLDS) by modifying equation (4) using a GC likelihood:\n\ny_rti | x_rt \u223c GC(c_i^T x_rt, g_i(\u00b7)),    (6)\n\nwhere gi(\u00b7) is the g(\u00b7) function in (1) that models the dispersion for neuron i. Similar to the GLM,\nfor identifiability, the baseline rate parameter d is dropped in (6) and we can fix g(0) = 0. As with\nthe GCGLM, one can recover preexisting models, such as an LDS with a Bernoulli observation, as\nspecial cases of GCLDS (see Table 1).\n\n3.1 Inference and learning in GCLDS\n\nAs is common in LDS models, we use expectation-maximization to learn parameters \u0398 =\n{A, {bt}t, Q, Q1, \u00b51, {gi(\u00b7)}i, C}. Because the required expectations do not admit a closed form\nas in previous similar work [8, 30], we required an additional approximation step, which we\nimplemented via a variational lower bound. Here we briefly outline this algorithm and our novel\ncontributions, and we refer the reader to the full details in the supplementary materials.\nFirst, each E-step requires calculating p(xr|yr, \u0398) for each trial r \u2208 {1, ..., R} (the conditional\ndistribution of the latent trajectories xr = {xrt}1\u2264t\u2264T, given observations yr = {yrti}1\u2264t\u2264T,1\u2264i\u2264N\nand parameter \u0398). For ease of notation below we drop the trial index r. These posterior\ndistributions are intractable, and in the usual way we make a normal approximation p(x|y, \u0398) \u2248 q(x) =\nN(m, V). 
We identify the optimal (m, V) by maximizing a variational Bayesian lower bound (the\nso-called evidence lower bound or \u201cELBO\u201d) over the variational parameters m, V as:\n\nL(m, V) = E_q(x)[log(p(x|\u0398) / q(x))] + E_q(x)[log p(y|x, \u0398)]\n= (1/2) (log |V| \u2212 tr[\u03a3^{\u22121} V] \u2212 (m \u2212 \u00b5)^T \u03a3^{\u22121} (m \u2212 \u00b5)) + \u2211_{t,i} E_q(xt)[log p(yti|xt)] + const,    (7)\n\nwhich is the usual form to be maximized in a variational Bayesian EM (VBEM) algorithm [11]. Here\n\u00b5 \u2208 RpT and \u03a3 \u2208 RpT\u00d7pT are the expectation and covariance of x given by the LDS prior in (5). The\nfirst term of (7) is the negative Kullback-Leibler divergence between the variational distribution and\nprior distribution, encouraging the variational distribution to be close to the prior. The second term\ninvolving the GC likelihood encourages the variational distribution to explain the observations well.\nThe integrations in the second term are intractable (this is in contrast to the PLDS case, where all\nintegrals can be calculated analytically [11]). Below we use the ideas of [20] to derive a tractable,\nfurther lower bound. Here the term E_q(xt)[log p(yti|xt)] can be reduced to:\n\nE_q(xt)[log p(yti|xt)] = E_q(\u03b7ti)[log pGC(yti | \u03b7ti, gi(\u00b7))]\n= E_q(\u03b7ti)[yti \u03b7ti + gi(yti) \u2212 log yti! \u2212 log \u2211_{k=0}^{K} (1/k!) exp(k \u03b7ti + gi(k))],    (8)\n\nwhere \u03b7ti = c_i^T xt. Denoting \u03bdtik = k\u03b7ti + gi(k) \u2212 log k! = k c_i^T xt + gi(k) \u2212 log k!, (8) is\nreduced to E_q(\u03bd)[\u03bd_{ti,yti} \u2212 log(\u2211_{0\u2264k\u2264K} exp(\u03bdtik))], where \u03bd_{ti,yti} denotes \u03bdtik evaluated at k = yti. Since \u03bdtik is a linear transformation of xt,\nunder the variational distribution \u03bdtik is also normally distributed, \u03bdtik \u223c N(htik, \u03c1tik). 
We have\nhtik = k c_i^T mt + gi(k) \u2212 log k! and \u03c1tik = k^2 c_i^T Vt ci, where (mt, Vt) are the expectation and covariance\nmatrix of xt under the variational distribution. Now we can derive a lower bound for the expectation by\nJensen\u2019s inequality:\n\nE_q(\u03bdti)[\u03bd_{ti,yti} \u2212 log \u2211_{0\u2264k\u2264K} exp(\u03bdtik)] \u2265 h_{ti,yti} \u2212 log \u2211_{0\u2264k\u2264K} exp(htik + \u03c1tik/2) =: fti(hti, \u03c1ti).    (9)\n\nCombining (7) and (9), we get a tractable variational lower bound:\n\nL(m, V) \u2265 L\u2217(m, V) = E_q(x)[log(p(x|\u0398) / q(x))] + \u2211_{t,i} fti(hti, \u03c1ti).    (10)\n\nFor computational convenience, we complete the E-step by maximizing the new evidence lower\nbound L\u2217 via its dual [20]. Full details are derived in the supplementary materials.\nThe M-step then requires maximization of L\u2217 over \u0398. Similar to the PLDS case, the set of\nparameters involving the latent Gaussian dynamics (A, {bt}t, Q, Q1, \u00b51) can be optimized analytically [8].\nThen, the parameters involving the GC likelihood (C, {gi}i) can be optimized efficiently via convex\noptimization techniques [27] (full details in supplementary material).\nIn practice we initialize our VBEM algorithm with a Laplace-EM algorithm, and we initialize each\nE-step in VBEM with a Laplace approximation, which empirically gives substantial runtime\nadvantages, and always produces a sensible optimum. With the above steps, we have a fully specified\nlearning and inference algorithm, which we now use to analyze real neural data. Code can be found\nat https://bitbucket.org/mackelab/pop_spike_dyn.\n\n4 Experimental results\n\nWe analyze recordings of populations of neurons in the primate motor cortex during a reaching\nexperiment (G20040123), details of which have been described previously [7, 8]. 
In brief, a rhesus\nmacaque monkey executed 56 cued reaches from a central target to 14 peripheral targets. Before the\nsubject was cued to move (the go cue), it was given a preparatory period to plan the upcoming reach.\nEach trial was thus separated into two temporal epochs, each of which has been suggested to have\nits own meaningful dynamical structure [9, 31]. We separately analyze these two periods: the\npreparatory period (the 1200 ms preceding the go cue), and the reaching period (50 ms before to\n370 ms after the movement onset). We analyzed data across all 14 reach targets, and results were\nhighly similar; in the following for simplicity we show results for a single reaching target (one\n56-trial dataset). Spike trains were simultaneously recorded from 96 electrodes (using a Blackrock\nmulti-electrode array). We bin neural activity at 20 ms. To include only units with robust activity, we\nremove all units with mean rates below 1 spike per second, resulting in 81 units for the\npreparatory period, and 85 units for the reaching period. As we have already shown in Figure 1, the\nreaching period data are strongly under-dispersed, even absent conditioning on the latent dynamics\n(implying further under-dispersion in the observation noise). 
Data during the preparatory period are\nparticularly interesting due to their clear cross-correlation structure.\nTo fully assess the GCLDS model, we analyze four LDS models: (i) GCLDS-full: a separate function\ngi(\u00b7) is fitted for each neuron i \u2208 {1, ..., N}; (ii) GCLDS-simple: a single function g(\u00b7) is shared\nacross all neurons (up to a linear term modulating the baseline firing rate); (iii) GCLDS-linear: a\ntruncated linear function gi(\u00b7) is fitted, which corresponds to truncated-Poisson observations; and\n(iv) PLDS: the Poisson case is recovered when gi(\u00b7) is a linear function on all nonnegative integers.\nIn all cases we use the learning and inference of \u00a73.1. We initialize the PLDS using nuclear norm\nminimization [10], and initialize the GCLDS models with the fitted PLDS. For all models we vary\nthe latent dimension p from 2 to 8.\nTo demonstrate the generality of the GCLDS and verify our algorithmic implementation, we first\nconsidered extensive simulated data with different GCLDS parameters (not shown). In all cases\nthe GCLDS model outperformed the PLDS in terms of negative log-likelihood (NLL) on test data, with\nhigh statistical significance. We also compared the algorithms on PLDS data and found very similar\nperformance between GCLDS and PLDS, implying that GCLDS does not significantly overfit,\ndespite the additional free parameters and computation due to the g(\u00b7) functions.\n\nAnalysis of the reaching period. Figure 2 compares the fits of the two neural units highlighted\nin Figure 1. These two neurons are particularly high-firing (during the reaching period), and thus\nshould be most indicative of the differences between the PLDS and GCLDS models. The left column\nof Figure 2 shows the fitted g(\u00b7) functions for the four LDS models being compared. 
It is apparent in\nboth the GCLDS-full and GCLDS-simple cases that the fitted g function is concave (though it was\nnot constrained to be so), agreeing with the under-dispersion observed in Figure 1.\nThe middle column of Figure 2 shows that all four cases produce models that fit the mean activity of\nthese two neurons very well. The black trace shows the empirical mean of the observed data, and all\nfour lines (highly overlapping and thus not entirely visible) follow that empirical mean closely. This\nresult confirms that the GCLDS matches the mean as well as the current state-of-the-art PLDS does.\nMore importantly, we have noted the key feature of the GCLDS is matching the dispersion of the\ndata, and thus we expect it should outperform the PLDS in fitting variance. The right column of\nFigure 2 shows this to be the case: the PLDS significantly overestimates the variance of the data.\nThe GCLDS-full model tracks the empirical variance quite closely in both neurons. The GCLDS-linear\nresult shows that only adding truncation does not materially improve the estimate of variance\nand dispersion: the dotted blue trace is quite far from the true data in black, and indeed it is quite\nclose to the Poisson case. The GCLDS-simple still outperforms the PLDS case, but it does not\nmodel the dispersion as effectively as the GCLDS-full case where each neuron has its own dispersion\nparameter (as Figure 1 suggests). The natural next question is whether this outperformance is simply\nin these two illustrative neurons, or if it is a population effect. Figure 3 shows that indeed the\npopulation is much better modeled by the GCLDS model than by competing alternatives. The left\nand middle panels of Figure 3 show leave-one-neuron-out prediction error of the LDS models. For\neach reaching target we use 4-fold cross-validation and the results are averaged across all 14 reaching\ntargets. Critically, these predictions are made for all neurons in the population. 
To give informative\nperformance metrics, we defined baseline performance as a straightforward, homogeneous Poisson\nprocess for each neuron, and compare the LDS models with the baseline using percentage reduction\nof mean-squared-error and negative log likelihood (thus higher error reduction numbers imply better\nperformance). The mean-squared-error (MSE; left panel) shows that the GCLDS offers a minor\nimprovement (reduction in MSE) beyond what is achieved by the PLDS. Though these standard\nerror bars suggest an insignificant result, a paired t-test is indeed significant (p < 10^\u22128). Nonetheless\nthis minor result agrees with the middle column of Figure 2, since predictive MSE is essentially a\nmeasurement of the mean.\n\nFigure 2: Examples of fitting results for selected high-firing neurons. Each row corresponds to one\nneuron as marked in the left panel of Figure 1. Left column: fitted g(\u00b7) using GCLDS and PLDS; middle\nand right columns: fitted mean and variance of PLDS and GCLDS. See text for details.\n\nFigure 3: Goodness-of-fit for monkey data during the reaching period. Left panel: percentage\nreduction of mean-squared-error (MSE) compared to the baseline (homogeneous Poisson process);\nmiddle panel: percentage reduction of predictive negative log likelihood (NLL) compared to the\nbaseline; right panel: fitted variance of PLDS and GCLDS for all neurons compared to the observed\ndata. Each point gives the observed and fitted variance of a single neuron, averaged across time.\n\nIn the middle panel of Figure 3, we see that the GCLDS-full significantly outperforms alternatives\nin predictive log likelihood across the population (p < 10^\u221210, paired t-test). Again this largely\nagrees with the implication of Figure 2, as negative log likelihood measures both the accuracy of\nmean and variance. 
The right panel of Figure 3 shows that the GCLDS fits the variance of the data\nexceptionally well across the population, unlike the PLDS.\n\nAnalysis of the preparatory period. To augment the data analysis, we also considered the\npreparatory period of neural activity. When we repeated the analyses of Figure 3 on this dataset,\nthe same results occurred: the GCLDS model produced concave (or close to concave) g functions\nand outperformed the PLDS model both in predictive MSE (marginally) and negative log likelihood\n(significantly). For brevity we do not show this analysis here. Instead, we here compare the temporal\ncross-covariance, which is also a common analysis of interest in neural data analysis [8, 16, 32] and,\nas noted, is particularly salient in preparatory activity. Figure 4 shows that the GCLDS model fits both\nthe temporal cross-covariance (left panel) and variance (right panel) considerably better than the PLDS,\nwhich overestimates both quantities.\n\nFigure 4: Goodness-of-fit for monkey data during the preparatory period. Left panel: temporal\ncross-covariance averaged over all 81 units during the preparatory period, compared to the fitted\ncross-covariance by PLDS and GCLDS-full. 
Right panel: fitted variance of PLDS and GCLDS-full\nfor all neurons compared to the observed data (averaged across time).\n\n5 Discussion\n\nIn this paper we showed that the GC family better captures the conditional variability of neural\nspiking data, and further improves inference of key features of interest in the data. We note that\nit is straightforward to incorporate external stimuli and spike history in the model as covariates, as\nhas been done previously in the Poisson case [8]. Beyond the GCGLM and GCLDS, the GC family\nis also extensible to other models that have been used in this setting, such as exponential family\nPCA [10] and subspace clustering [11]. The cost of this performance, compared to the PLDS, is an\nextra parameterization (the gi(\u00b7) functions) and the corresponding algorithmic complexity. While\nwe observed no empirical sacrifice in doing so, data with few\nexamples and reasonably Poisson dispersion may cause GCLDS to overfit.\n\nAcknowledgments\n\nJPC received funding from a Sloan Research Fellowship, the Simons Foundation (SCGB#325171\nand SCGB#325233), the Grossman Center at Columbia University, and the Gatsby Charitable Trust.\nThanks to Byron Yu, Gopal Santhanam and Stephen Ryu for providing the cortical data.\n\nReferences\n\n[1] J. P. Cunningham and B. M Yu, \u201cDimensionality reduction for large-scale neural recordings,\u201d Nature\nneuroscience, vol. 17, no. 11, pp. 1500\u20131509, 2014.\n\n[2] L. Paninski, \u201cMaximum likelihood estimation of cascade point-process neural encoding models,\u201d Network:\nComputation in Neural Systems, vol. 15, no. 4, pp. 243\u2013262, 2004.\n\n[3] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown, \u201cA point process framework\nfor relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects,\u201d\nJournal of neurophysiology, vol. 93, no. 2, pp. 
1074\u20131089, 2005.\n\n[4] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. Chichilnisky, and E. P. Simoncelli, \u201cSpatio-temporal correlations and visual signalling in a complete neuronal population,\u201d Nature, vol. 454, no. 7207, pp. 995\u2013999, 2008.\n\n[5] M. Vidne, Y. Ahmadian, J. Shlens, J. W. Pillow, J. Kulkarni, A. M. Litke, E. Chichilnisky, E. Simoncelli, and L. Paninski, \u201cModeling the impact of common noise inputs on the network activity of retinal ganglion cells,\u201d Journal of computational neuroscience, vol. 33, no. 1, pp. 97\u2013121, 2012.\n\n[6] J. E. Kulkarni and L. Paninski, \u201cCommon-input models for multiple neural spike-train data,\u201d Network: Computation in Neural Systems, vol. 18, no. 4, pp. 375\u2013407, 2007.\n\n[7] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani, \u201cGaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity,\u201d in NIPS, pp. 1881\u20131888, 2009.\n\n[8] J. H. Macke, L. Buesing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani, \u201cEmpirical models of spiking in neural populations,\u201d in NIPS, pp. 1350\u20131358, 2011.\n\n[9] B. Petreska, B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani, \u201cDynamical segmentation of single trials from population neural data,\u201d in NIPS, pp. 756\u2013764, 2011.\n\n[10] D. Pfau, E. A. Pnevmatikakis, and L. Paninski, \u201cRobust learning of low-dimensional dynamics from large neural ensembles,\u201d in NIPS, pp. 2391\u20132399, 2013.\n\n[11] L. Buesing, T. A. Machado, J. P. Cunningham, and L. Paninski, \u201cClustered factor analysis of multineuronal spike data,\u201d in NIPS, pp. 3500\u20133508, 2014.\n\n[12] M. M. 
Churchland, B. M. Yu, J. P. Cunningham, L. P. Sugrue, M. R. Cohen, G. S. Corrado, W. T. Newsome, A. M. Clark, P. Hosseini, B. B. Scott, et al., \u201cStimulus onset quenches neural variability: a widespread cortical phenomenon,\u201d Nature neuroscience, vol. 13, no. 3, pp. 369\u2013378, 2010.\n\n[13] J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani, \u201cInferring neural firing rates from spike trains using gaussian processes,\u201d in NIPS, pp. 329\u2013336, 2007.\n\n[14] R. P. Adams, I. Murray, and D. J. MacKay, \u201cTractable nonparametric bayesian inference in poisson processes with gaussian process intensities,\u201d in ICML, pp. 9\u201316, ACM, 2009.\n\n[15] S. Koyama, \u201cOn the spike train variability characterized by variance-to-mean power relationship,\u201d Neural computation, 2015.\n\n[16] R. L. Goris, J. A. Movshon, and E. P. Simoncelli, \u201cPartitioning neuronal variability,\u201d Nature neuroscience, vol. 17, no. 6, pp. 858\u2013865, 2014.\n\n[17] J. Scott and J. W. Pillow, \u201cFully bayesian inference for neural models with negative-binomial spiking,\u201d in NIPS, pp. 1898\u20131906, 2012.\n\n[18] S. W. Linderman, R. Adams, and J. Pillow, \u201cInferring structured connectivity from spike trains under negative-binomial generalized linear models,\u201d COSYNE, 2015.\n\n[19] J. del Castillo and M. P\u00e9rez-Casany, \u201cOverdispersed and underdispersed poisson generalizations,\u201d Journal of Statistical Planning and Inference, vol. 134, no. 2, pp. 486\u2013500, 2005.\n\n[20] M. Emtiyaz Khan, A. Aravkin, M. Friedlander, and M. Seeger, \u201cFast dual variational inference for non-conjugate latent gaussian models,\u201d in ICML, pp. 951\u2013959, 2013.\n\n[21] C. R. Rao, \u201cOn discrete distributions arising out of methods of ascertainment,\u201d Sankhy\u0101: The Indian Journal of Statistics, Series A, pp. 311\u2013324, 1965.\n\n[22] D. 
Lambert, \u201cZero-inflated poisson regression, with an application to defects in manufacturing,\u201d Technometrics, vol. 34, no. 1, pp. 1\u201314, 1992.\n\n[23] J. Singh, \u201cA characterization of positive poisson distribution and its statistical application,\u201d SIAM Journal on Applied Mathematics, vol. 34, no. 3, pp. 545\u2013548, 1978.\n\n[24] K. F. Sellers and G. Shmueli, \u201cA flexible regression model for count data,\u201d The Annals of Applied Statistics, pp. 943\u2013961, 2010.\n\n[25] C. V. Ananth and D. G. Kleinbaum, \u201cRegression models for ordinal responses: a review of methods and applications,\u201d International journal of epidemiology, vol. 26, no. 6, pp. 1323\u20131333, 1997.\n\n[26] L. Paninski, J. Pillow, and J. Lewi, \u201cStatistical models for neural encoding, decoding, and optimal stimulus design,\u201d Progress in brain research, vol. 165, pp. 493\u2013507, 2007.\n\n[27] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2009.\n\n[28] L. Buesing, J. H. Macke, and M. Sahani, \u201cLearning stable, regularised latent models of neural population dynamics,\u201d Network: Computation in Neural Systems, vol. 23, no. 1-2, pp. 24\u201347, 2012.\n\n[29] L. Buesing, J. H. Macke, and M. Sahani, \u201cEstimating state and parameters in state-space models of spike trains,\u201d in Advanced State Space Methods for Neural and Clinical Data, Cambridge Univ Press., 2015.\n\n[30] V. Lawhern, W. Wu, N. Hatsopoulos, and L. Paninski, \u201cPopulation decoding of motor cortical activity using a generalized linear model with hidden states,\u201d Journal of neuroscience methods, vol. 189, no. 2, pp. 267\u2013280, 2010.\n\n[31] M. M. Churchland, J. P. Cunningham, M. T. Kaufman, J. D. Foster, P. Nuyujukian, S. I. Ryu, and K. V. Shenoy, \u201cNeural population dynamics during reaching,\u201d Nature, vol. 487, no. 7405, pp. 51\u201356, 2012.\n\n[32] M. R. Cohen and A. 
Kohn, \u201cMeasuring and interpreting neuronal correlations,\u201d Nature neuroscience, vol. 14, no. 7, pp. 811\u2013819, 2011.", "award": [], "sourceid": 1235, "authors": [{"given_name": "Yuanjun", "family_name": "Gao", "institution": "Columbia University"}, {"given_name": "Lars", "family_name": "Busing", "institution": "Columbia University"}, {"given_name": "Krishna", "family_name": "Shenoy", "institution": "Stanford University"}, {"given_name": "John", "family_name": "Cunningham", "institution": "University of Columbia"}]}