{"title": "Gaussian process based nonlinear latent structure discovery in multivariate spike train data", "book": "Advances in Neural Information Processing Systems", "page_first": 3496, "page_last": 3505, "abstract": "A large body of recent work focuses on methods for extracting low-dimensional latent structure from multi-neuron spike train data. Most such methods employ either linear latent dynamics or linear mappings from latent space to log spike rates. Here we propose a doubly nonlinear latent variable model that can identify low-dimensional structure underlying apparently high-dimensional spike train data. We introduce the Poisson Gaussian-Process Latent Variable Model (P-GPLVM), which consists of Poisson spiking observations and two underlying Gaussian processes\u2014one governing a temporal latent variable and another governing a set of nonlinear tuning curves. The use of nonlinear tuning curves enables discovery of low-dimensional latent structure even when spike responses exhibit high linear dimensionality (e.g., as found in hippocampal place cell codes). To learn the model from data, we introduce the decoupled Laplace approximation, a fast approximate inference method that allows us to efficiently optimize the latent path while marginalizing over tuning curves. We show that this method outperforms previous Laplace-approximation-based inference methods in both the speed of convergence and accuracy. We apply the model to spike trains recorded from hippocampal place cells and show that it compares favorably to a variety of previous methods for latent structure discovery, including variational auto-encoder (VAE) based methods that parametrize the nonlinear mapping from latent space to spike rates with a deep neural network.", "full_text": "Gaussian process based nonlinear latent structure\n\ndiscovery in multivariate spike train data\n\nAnqi Wu, Nicholas A. Roy, Stephen Keeley, & Jonathan W. 
Pillow\n\nPrinceton Neuroscience Institute\n\nPrinceton University\n\nAbstract\n\nA large body of recent work focuses on methods for extracting low-dimensional\nlatent structure from multi-neuron spike train data. Most such methods employ\neither linear latent dynamics or linear mappings from latent space to log spike\nrates. Here we propose a doubly nonlinear latent variable model that can identify\nlow-dimensional structure underlying apparently high-dimensional spike train data.\nWe introduce the Poisson Gaussian-Process Latent Variable Model (P-GPLVM),\nwhich consists of Poisson spiking observations and two underlying Gaussian\nprocesses\u2014one governing a temporal latent variable and another governing a set\nof nonlinear tuning curves. The use of nonlinear tuning curves enables discovery\nof low-dimensional latent structure even when spike responses exhibit high linear\ndimensionality (e.g., as found in hippocampal place cell codes). To learn the model\nfrom data, we introduce the decoupled Laplace approximation, a fast approxi-\nmate inference method that allows us to ef\ufb01ciently optimize the latent path while\nmarginalizing over tuning curves. We show that this method outperforms previous\nLaplace-approximation-based inference methods in both the speed of convergence\nand accuracy. We apply the model to spike trains recorded from hippocampal place\ncells and show that it compares favorably to a variety of previous methods for latent\nstructure discovery, including variational auto-encoder (VAE) based methods that\nparametrize the nonlinear mapping from latent space to spike rates with a deep\nneural network.\n\nIntroduction\n\n1\nRecent advances in multi-electrode array recording techniques have made it possible to measure\nthe simultaneous spiking activity of increasingly large neural populations. 
These datasets have\nhighlighted the need for robust statistical methods for identifying the latent structure underlying\nhigh-dimensional spike train data, so as to provide insight into the dynamics governing large-scale\nactivity patterns and the computations they perform [1\u20134].\nRecent work has focused on the development of sophisticated model-based methods that seek to\nextract a shared, low-dimensional latent process underlying population spiking activity. These\nmethods can be roughly categorized on the basis of two basic modeling choices: (1) the dynamics of\nthe underlying latent variable; and (2) the mapping from latent variable to neural responses. For choice\nof dynamics, one popular approach assumes the latent variable is governed by a linear dynamical\nsystem [5\u201311], while a second assumes that it evolves according to a Gaussian process, relaxing the\nlinearity assumption and imposing only smoothness in the evolution of the latent state [1, 12\u201314].\nFor choice of mapping function, most previous methods have assumed a \ufb01xed linear or log-linear\nrelationship between the latent variable and the mean response level [1, 5\u20138, 11, 12]. These methods\nseek to \ufb01nd a linear embedding of population spiking activity, akin to PCA or factor analysis. In many\ncases, however, the relationship between neural activity and the quantity it encodes can be highly\nnonlinear. 
Hippocampal place cells provide an illustrative example: if each discrete location in a 2D environment has a single active place cell, population activity spans a space whose dimensionality is equal to the number of neurons; a linear latent variable model cannot find a reduced-dimensional representation of population activity, despite the fact that the underlying latent variable ("position") is clearly two-dimensional.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Figure 1: Schematic diagram of the Poisson Gaussian Process Latent Variable Model (P-GPLVM), illustrating multi-neuron spike train data generated by the model with a one-dimensional latent process.

Several recent studies have introduced nonlinear coupling between latent dynamics and firing rate [7, 9, 10, 15]. These models use deep neural networks to parametrize the nonlinear mapping from latent space to spike rates, but often require repeated trials or long training sets. Table 1 summarizes these different model structures for latent neural trajectory estimation (including the original Gaussian process latent variable model (GPLVM) [16], which assumes Gaussian observations and does not produce spikes).

model            latent   mapping function   output nonlinearity   observation
PLDS [8]         LDS      linear             exp                   Poisson
PfLDS [9, 10]    LDS      neural net         exp                   Poisson
LFADS [15]       RNN      neural net         exp                   Poisson
GPFA [1]         GP       linear             identity              Gaussian
P-GPFA [13, 14]  GP       linear             exp                   Poisson
GPLVM [16]       GP       GP                 identity              Gaussian
P-GPLVM          GP       GP                 exp                   Poisson

Table 1: Modeling assumptions of various latent variable models for spike trains.

In this paper, we propose the Poisson Gaussian process latent variable model (P-GPLVM) for spike train data, which allows for nonlinearity in both the latent state dynamics and in the mapping from the latent states to the spike rates. 
Our model posits a low-dimensional latent variable that evolves in time according to a Gaussian process prior; this latent variable governs firing rates via a set of non-parametric tuning curves, parametrized as exponentiated samples from a second Gaussian process, from which spikes are then generated by a Poisson process (Fig. 1).
The paper is organized as follows: Section 2 introduces the P-GPLVM; Section 3 describes the decoupled Laplace approximation for performing efficient inference for the latent variable and tuning curves; Section 4 describes tuning curve estimation; Section 5 compares P-GPLVM to other models using simulated data and hippocampal place-cell recordings, demonstrating the accuracy and interpretability of P-GPLVM relative to other methods.

2 Poisson-Gaussian process latent variable model (P-GPLVM)

Suppose we have simultaneously recorded spike trains from N neurons. Let Y ∈ R^{N×T} denote the matrix of spike count data, with neurons indexed by i ∈ (1, ..., N) and spikes counted in discrete time bins indexed by t ∈ (1, ..., T). Our goal is to construct a generative model of the latent structure underlying these data, which will here take the form of a P-dimensional latent variable x(t) and a set of mapping functions or tuning curves {h_i(x)}, i ∈ (1, ..., N), which map the latent variable to the spike rates of each neuron.

Latent dynamics

Let x(t) denote a (vector-valued) latent process, where each component x_j(t), j ∈ (1, ..., P), evolves according to an independent Gaussian process (GP),

    x_j(t) ∼ GP(0, k_t),    (1)

with covariance function k_t(t, t') ≜ cov(x_j(t), x_j(t')) governing how each scalar process varies over time. 
Although we can select any valid covariance function for k_t, here we use the exponential covariance function, a special case of the Matérn kernel, given by k_t(t, t') = r exp(−|t − t'|/l), which is parametrized by a marginal variance r > 0 and length-scale l > 0. Samples from this GP are continuous but not differentiable, equivalent to a Gaussian random walk with a bias toward the origin, also known as the Ornstein-Uhlenbeck process [17].
The latent state x(t) at any time t is a P-dimensional vector that we will write as x_t ∈ R^{P×1}. The collection of such vectors over T time bins forms a matrix X ∈ R^{P×T}. Let x_j denote the j'th row of X, which contains the set of states in latent dimension j. From the definition of a GP, x_j has a multivariate normal distribution,

    x_j ∼ N(0, K_t),    (2)

with a T × T covariance matrix K_t generated by evaluating the covariance function k_t at all time bins in (1, ..., T).

Nonlinear mapping

Let h : R^P → R denote a nonlinear function mapping from the latent vector x_t to a firing rate λ_t. We will refer to h(x) as a tuning curve, although unlike traditional tuning curves, which describe firing rate as a function of some external (observable) stimulus parameter, here h(x) describes firing rate as a function of the (unobserved) latent vector x. Previous work has modeled h with a parametric nonlinear function such as a deep neural network [9, 10]. Here we develop a nonparametric approach using a Gaussian process prior over the log of h; the logarithm ensures that spike rates are non-negative.
Let f_i(x) = log h_i(x) denote the log tuning curve for the i'th neuron in our population, which we model with a GP,

    f_i(x) ∼ GP(0, k_x),    (3)

where k_x is a (spatial) covariance function that governs smoothness of the function over its P-dimensional input space. 
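As an illustration, the temporal prior in (Eq. 1)-(Eq. 2) can be simulated directly; below is a minimal numpy sketch, where the hyperparameter values r, l and the time grid are arbitrary choices for illustration:

```python
import numpy as np

def exp_kernel(ts, r=1.0, l=0.5):
    # exponential (Ornstein-Uhlenbeck) covariance: k_t(t, t') = r * exp(-|t - t'| / l)
    d = np.abs(ts[:, None] - ts[None, :])
    return r * np.exp(-d / l)

rng = np.random.default_rng(0)
P, T = 2, 100                              # latent dimensions, time bins
ts = np.linspace(0.0, 10.0, T)
Kt = exp_kernel(ts) + 1e-6 * np.eye(T)     # small jitter for numerical stability
# each row x_j ~ N(0, Kt); stacking rows gives X in R^{P x T} as in Eq. (2)
X = rng.multivariate_normal(np.zeros(T), Kt, size=P)
```

As the text notes, the resulting sample paths are continuous but rough (nowhere differentiable), in contrast to draws from a smoother kernel such as the RBF.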
For simplicity, we use the common Gaussian or radial basis function (RBF) covariance function, k_x(x, x') = ρ exp(−||x − x'||^2 / 2δ^2), where x and x' are arbitrary points in latent space, ρ is the marginal variance and δ is the length-scale. The tuning curve for neuron i is then given by h_i(x) = exp(f_i(x)).
Let f_i ∈ R^{T×1} denote a vector with the t'th element equal to f_i(x_t). From the definition of a GP, f_i has a multivariate normal distribution given latent vectors at all time bins x_{1:T} = {x_t}_{t=1}^T,

    f_i | x_{1:T} ∼ N(0, K_x),    (4)

with a T × T covariance matrix K_x generated by evaluating the covariance function k_x at all pairs of latent vectors in x_{1:T}. Stacking f_i for N neurons, we form a matrix F ∈ R^{N×T} with f_i^T on the i'th row; the element in the i'th row and t'th column is f_{i,t} = f_i(x_t).

Poisson spiking

Lastly, we assume Poisson spiking given the latent firing rates, with spike rates in units of spikes per time bin. Let λ_{i,t} = exp(f_{i,t}) = exp(f_i(x_t)) denote the spike rate of neuron i at time t. The spike count of neuron i at time t, given the log tuning curve f_i and latent vector x_t, is Poisson distributed as

    y_{i,t} | f_i, x_t ∼ Poiss(exp(f_i(x_t))).    (5)

In summary, our model is a doubly nonlinear Gaussian process latent variable model with Poisson observations (P-GPLVM). One GP models the nonlinear evolution of the latent dynamics x, while a second GP generates the log of the tuning curve f as a nonlinear function of x, which is then mapped to a tuning curve h via a nonlinear link function, e.g. the exponential. Fig. 1 provides a schematic of the model.

3 Inference using the decoupled Laplace approximation

For our inference procedure, we estimate the log of the tuning curve, f, rather than attempting to infer the tuning curve h directly; once f is estimated, h can be obtained by exponentiating f. Given the model outlined above, the joint distribution over the observed data and all latent variables is

    p(Y, F, X, θ) = p(Y|F) p(F|X, ρ, δ) p(X|r, l)
                  = [Π_{i=1}^N Π_{t=1}^T p(y_{i,t}|f_{i,t})] [Π_{i=1}^N p(f_i|X, ρ, δ)] [Π_{j=1}^P p(x_j|r, l)],    (6)

where θ = {ρ, δ, r, l} is the hyperparameter set, references to which will now be suppressed for simplicity. This is a Gaussian process latent variable model (GPLVM) with Poisson observations and a GP prior, and our goal is to estimate both F and X. A standard Bayesian treatment of the GPLVM requires the computation of the log marginal likelihood associated with the joint distribution (Eq. 6); both F and X must be marginalized out:

    log p(Y) = log ∫∫ p(Y, F, X) dX dF = log ∫ p(Y|F) [∫ p(F|X) p(X) dX] dF.    (7)

However, propagating the prior density p(X) through the nonlinear mapping makes this inference difficult: the nested integral in (Eq. 7) contains X in a complex nonlinear manner, making analytical integration over X infeasible. To overcome these difficulties, we can use a straightforward MAP training procedure where the latent variables F and X are selected according to

    F_MAP, X_MAP = argmax_{F,X} p(Y|F) p(F|X) p(X).    (8)

Note that point estimates of the hyperparameters θ can also be found by maximizing the same objective function. As discussed above, learning X remains a challenge due to the interplay of the latent variables, i.e. the dependency of F on X. 
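To make the MAP objective in (Eq. 8) concrete, here is a schematic numpy implementation of the log of the joint density (Eq. 6); the RBF helper and all hyperparameter values are illustrative assumptions, and Poisson terms constant in F are dropped:

```python
import numpy as np

def rbf(X, rho=1.0, delta=1.0):
    # k_x(x, x') = rho * exp(-||x - x'||^2 / (2 delta^2)); columns of X are latent states
    D = X.T[:, None, :] - X.T[None, :, :]
    return rho * np.exp(-(D ** 2).sum(-1) / (2 * delta ** 2))

def gp_logpdf(v, K):
    # log N(v; 0, K) computed via a Cholesky factorization
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L, v)
    return -0.5 * a @ a - np.log(np.diag(L)).sum() - 0.5 * v.size * np.log(2 * np.pi)

def log_joint(Y, F, X, Kt):
    # log p(Y|F) + sum_i log p(f_i|X) + sum_j log p(x_j), Eq. (6), up to constants in F
    T = X.shape[1]
    Kx = rbf(X) + 1e-6 * np.eye(T)        # jitter keeps the factorization stable
    ll = (Y * F - np.exp(F)).sum()        # Poisson log-likelihood terms
    lf = sum(gp_logpdf(f, Kx) for f in F) # GP prior on log tuning curves
    lx = sum(gp_logpdf(x, Kt) for x in X) # GP prior on the latent path
    return ll + lf + lx
```

Naive coordinate ascent on F and X with this objective is exactly the strategy the next paragraph warns against; Sections 3.1-3.3 instead use Laplace-style approximations to handle F.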
For our MAP training procedure, fixing one latent variable while optimizing the other in a coordinate descent approach is highly inefficient, since the strong interplay of variables often means getting trapped in bad local optima. In variational GPLVM [18], the authors introduced a non-standard variational inference framework for approximately integrating out the latent variables X, then subsequently training a GPLVM by maximizing an analytic lower bound on the exact marginal likelihood. An advantage of the variational framework is the introduction of auxiliary variables which weaken the strong dependency between X and F. However, the variational approximation is only applicable to Gaussian observations; with Poisson observations, the integral over F remains intractable. In the following, we propose using variations of the Laplace approximation for inference.

3.1 Standard Laplace approximation

We first use Laplace's method to find a Gaussian approximation q(F|Y, X) to the true posterior p(F|Y, X), then do MAP estimation for X only. We employ the Laplace approximation for each f_i individually. Doing a second-order Taylor expansion of log p(f_i|y_i, X) around the maximum of the posterior, we obtain a Gaussian approximation

    q(f_i|y_i, X) = N(f̂_i, A^{-1}),    (9)

where f̂_i = argmax_{f_i} p(f_i|y_i, X) and A = −∇∇ log p(f_i|y_i, X)|_{f_i = f̂_i} is the Hessian of the negative log posterior at that point. By Bayes' rule, the posterior over f_i is given by p(f_i|y_i, X) = p(y_i|f_i) p(f_i|X) / p(y_i|X), but since p(y_i|X) is independent of f_i, we need only consider the unnormalized posterior, defined as Ψ(f_i), when maximizing w.r.t. f_i. Taking the logarithm gives

    Ψ(f_i) = log p(y_i|f_i) + log p(f_i|X) = log p(y_i|f_i) − (1/2) f_i^T K_x^{-1} f_i − (1/2) log |K_x| + const.    (10)

Differentiating (Eq. 10) w.r.t. 
f_i, we obtain

    ∇Ψ(f_i) = ∇ log p(y_i|f_i) − K_x^{-1} f_i,    (11)
    ∇∇Ψ(f_i) = ∇∇ log p(y_i|f_i) − K_x^{-1} = −W_i − K_x^{-1},    (12)

where W_i = −∇∇ log p(y_i|f_i). The approximated log conditional likelihood on X (see Sec. 3.4.4 in [17]) can then be written as

    log q(y_i|X) = log p(y_i|f̂_i) − (1/2) f̂_i^T K_x^{-1} f̂_i − (1/2) log |I_T + K_x W_i|.    (13)

We can then estimate X as

    X_MAP = argmax_X Σ_{i=1}^N log q(y_i|X) + log p(X).    (14)

When using standard LA, the gradient of log q(y_i|X) w.r.t. X should be calculated for a given posterior mode f̂_i. Note that not only is the covariance matrix K_x an explicit function of X, but f̂_i and W_i are also implicitly functions of X: when X changes, the optimum of the posterior f̂_i changes as well. Therefore, log q(y_i|X) contains an implicit function of X which does not allow for a straightforward closed-form gradient expression. Calculating numerical gradients instead yields a very inefficient implementation empirically.

3.2 Third-derivative Laplace approximation

One method to derive this gradient explicitly is described in [17] (see Sec. 5.5.1). We adapt their procedure to our setting to make the implicit dependency of f̂_i and W_i on X explicit. To solve (Eq. 14), we need to determine the partial derivative of our approximated log conditional likelihood (Eq. 13) w.r.t. X, given by the chain rule as

    ∂ log q(y_i|X)/∂X = ∂ log q(y_i|X)/∂X |_explicit + Σ_{t=1}^T [∂ log q(y_i|X)/∂f̂_{i,t}] [∂f̂_{i,t}/∂X].    (15)

When evaluating the second term, we use the fact that f̂_i is the posterior maximum, so ∂Ψ(f_i)/∂f_i = 0 at f_i = f̂_i, where Ψ(f_i) is defined in (Eq. 10). Thus the implicit derivatives of the first two terms in (Eq. 13) vanish, leaving only

    ∂ log q(y_i|X)/∂f̂_{i,t} = −(1/2) tr[(K_x^{-1} + W_i)^{-1} ∂W_i/∂f̂_{i,t}]
                            = −(1/2) [(K_x^{-1} + W_i)^{-1}]_{tt} ∂^3 log p(y_i|f̂_i)/∂f̂_{i,t}^3.    (16)

To evaluate ∂f̂_{i,t}/∂X, we differentiate the self-consistent equation f̂_i = K_x ∇ log p(y_i|f̂_i) (setting (Eq. 11) to zero at f̂_i) to obtain

    ∂f̂_i/∂X = [∂K_x/∂X] ∇ log p(y_i|f̂_i) + K_x [∂∇ log p(y_i|f̂_i)/∂f̂_i] ∂f̂_i/∂X
    ⟹ ∂f̂_i/∂X = (I_T + K_x W_i)^{-1} [∂K_x/∂X] ∇ log p(y_i|f̂_i),    (17)

where we use the chain rule ∂∇ log p(y_i|f̂_i)/∂X = [∂∇ log p(y_i|f̂_i)/∂f̂_i] · ∂f̂_i/∂X and ∂∇ log p(y_i|f̂_i)/∂f̂_i = −W_i from (Eq. 12). The desired implicit derivative is obtained by multiplying (Eq. 16) and (Eq. 17) to form the second term in (Eq. 15).
We can now estimate X_MAP with (Eq. 14) using the explicit gradient expression in (Eq. 15). We call this method the third-derivative Laplace approximation (tLA), as it depends on the third derivative of the data likelihood term (see [17] for further details). However, tLA has a significant computational drawback: for each step along the gradient we have just derived, the posterior mode f̂_i must be re-evaluated. The method may converge quickly in theory, but this nested optimization makes for very slow computation in practice.

3.3 Decoupled Laplace approximation

We propose a novel method to relax the Laplace approximation, which we refer to as the decoupled Laplace approximation (dLA). Our relaxation not only decouples the strong dependency between X and F, but also avoids the nested optimization of searching for the posterior mode of F within each update of X. As in tLA, dLA treats f̂_i as a function of X. However, while tLA assumes f̂_i 
However, while tLA assumes \u02c6fi\nto be an implicit function of X, dLA constructs an explicit mapping between \u02c6fi and X.\nThe standard Laplace approximation uses a Gaussian approximation for the posterior p(fi|yi, X) \u221d\np(yi|fi)p(fi|X) where, in this paper, p(yi|fi) is a Poisson distribution and p(fi|X) is a multivariate\nGaussian distribution. We \ufb01rst do the same second order Taylor expansion of log p(fi|yi, X) around\nthe posterior maximum to \ufb01nd q(fi|yi, X) as in (Eq. 9). Now if we approximate the likelihood\ndistribution p(yi|fi) as a Gaussian distribution q(yi|fi) = N (m, S), we can derive its mean m and\ncovariance S. If p(fi|X) = N (0, Kx) and q(fi|yi, X) = N (\u02c6fi, A\u22121), the relationship between\ntwo Gaussian distributions and their product allow us to solve for m and S from the relationship\nN (\u02c6fi, A\u22121) \u221d N (m, S)N (0, Kx):\n\n(cid:12)(cid:12)(cid:12)(cid:12)explicit\n(cid:33)\n\n5\n\n\fAlgorithm 1 Decoupled Laplace approximation at iteration k\n\nInput: data observation yi, latent variable Xk\u22121 from iteration k \u2212 1\n1. Compute the new posterior mode \u02c6f k\n\ni and the precision matrix Ak by solving (Eq. 10) to obtain\n\nq(fi|yi, Xk\u22121) = N (\u02c6f k\n\ni , Ak\u22121\n\n).\n\nA(X) = Sk\u22121\n\n2. Derive mk and Sk (Eq. 18): Sk = (Ak \u2212 K\u22121\n3. Fix mk and Sk and derive the new mean and covariance for q(fi|yi, Xk\u22121) as functions of X:\nx , we have Wi = Sk\u22121, and can obtain the new approximated conditional\n(cid:80)N\ni=1 q(yi|X)p(X).\n\nx )\u22121, mk = SkAk\u02c6f k\ni .\n\u22121Sk\u22121\n\u22121Ak\u02c6f k\ni .\n\ndistribution q(yi|X) (Eq. 13) with \u02c6fi replaced by \u02c6fi(X).\n\n4. Since A = Wi + K\u22121\n\n\u22121, \u02c6fi(X) = A(X)\n\nmk = A(X)\n\n+ Kx(X)\n\n5. 
Solve Xk = argmaxX\nOutput: new latent variable Xk\n\nA = S\u22121 + K\u22121\n\nx )\u22121, m = SA\u02c6fi.\n\nx , \u02c6fi = A\u22121S\u22121m =\u21d2 S = (A \u2212 K\u22121\n\n(18)\nm and S represent the components of the posterior terms, \u02c6fi and A, that come from the likelihood.\nNow when estimating X, we \ufb01x these likelihood terms m and S, and completely relax the prior,\np(fi|X). We are still solving (Eq. 14) w.r.t. X, but now q(fi|yi, X) has both mean and covariance\napproximated as explicit functions of X. Alg. 1 describes iteration k of the dLA algorithm, with which\nwe can now estimate XMAP. Step 3 indicates that the posterior maximum for the current iteration\n\u02c6fi(X) = A(X)\nis now explicitly updated as a function of X, avoiding the computationally\ndemanding nested optimization of tLA. Intuitively, dLA works by \ufb01nding a Gaussian approximation\ni such that the approximated posterior of fi, q(fi|yi, X), is now a closed-form\nto the likelihood at \u02c6f k\nGaussian distribution with mean and covariance as functions of X, ultimately allowing for the explicit\ncalculation of q(yi|X).\n\n\u22121Ak\u02c6f k\n\ni\n\n4 Tuning curve estimation\nGiven the estimated \u02c6X and \u02c6f from the inference, we can now calculate the tuning curve h for each\nneuron. Let x1:G = {xg}G\ng=1 be a grid of G latent states, where xg \u2208 RP\u00d71. Correspondingly,\nfor each neuron, we have the log of the tuning curve vector evaluated on the grid of latent states,\nfgrid \u2208 RG\u00d71, with the g\u2019th element equal to f (xg). Similar to (Eq. 4), we can write down its\ndistribution as\n(19)\nwith a G \u00d7 G covariance matrix Kgrid generated by evaluating the covariance function kx at all pairs\nof vectors in x1:G. 
Therefore we can write a joint distribution for [f̂, f_grid] as

    [f̂; f_grid] ∼ N(0, [K_x̂, k_grid; k_grid^T, K_grid]),    (20)

where K_x̂ ∈ R^{T×T} is a covariance matrix with elements evaluated at all pairs of estimated latent vectors x̂_{1:T} = {x̂_t}_{t=1}^T in X̂, and (k_grid)_{t,g} = k_x(x̂_t, x_g). Thus we have the following posterior distribution over f_grid:

    f_grid | f̂, x̂_{1:T}, x_{1:G} ∼ N(μ(x_{1:G}), Σ(x_{1:G})),
    μ(x_{1:G}) = k_grid^T K_x̂^{-1} f̂,   Σ(x_{1:G}) = diag(K_grid) − k_grid^T K_x̂^{-1} k_grid,    (21)

where diag(K_grid) denotes a diagonal matrix constructed from the diagonal of K_grid. Setting f̂_grid = μ(x_{1:G}), the spike rate vector

    λ̂_grid = exp(f̂_grid)    (22)

describes the tuning curve h evaluated on the grid x_{1:G}.

5 Experiments

5.1 Simulation data

We first examine performance using two simulated datasets generated with different kinds of tuning curves, namely sinusoids and Gaussian bumps. We compare our algorithm (P-GPLVM) with PLDS, PfLDS, P-GPFA and GPLVM (see Table 1), using the tLA and dLA inference methods.

Figure 2: Results from the sinusoid and Gaussian bump simulated experiments. A) and C) are estimated latent processes. B) and D) display the tuning curves estimated by different methods. E) shows the R^2 performances with error bars. F) shows the convergence R^2 performances of three different Laplace approximation inference methods with error bars. Error bars are plotted every 10 seconds.

We also include an additional variant of the Laplace approximation, which we call the approximated Laplace approximation (aLA), where we use only the explicit (first) term in (Eq. 15) to optimize over X for multiple steps given a fixed f̂_i. 
This allows for a coarse estimate of the gradient w.r.t. X, taking a few steps in X before the posterior mode must be recomputed, partially relaxing the nested optimization so as to speed up the learning procedure.
For comparison between models in our simulated experiments, we compute R-squared (R^2) values between the known latent processes and the estimated latent processes. In all simulation studies, we generate a single trial with 20 simulated neurons and 100 time bins per experiment; each experiment is repeated 10 times and results are averaged across the 10 repeats.
Sinusoid tuning curve: This simulation generates a "grid cell" type response. A grid cell is a type of neuron that is activated when an animal occupies any point on a grid spanning the environment [19]. When an animal moves in a one-dimensional space (P = 1), grid cells exhibit oscillatory responses. Motivated by the response properties of grid cells, the log firing rate of each neuron i is coupled to the latent process through a sinusoid with a neuron-specific phase Φ_i and frequency ω_i,

    f_i = sin(ω_i x + Φ_i).    (23)

We randomly generated Φ_i uniformly from [0, 2π] and ω_i uniformly from [1.0, 4.0].
An example of the estimated latent processes versus the true latent process is presented in Fig. 2A. We used least-squares regression to learn an affine transformation from the latent space to the space of the true locations. Only P-GPLVM finds the global optimum, fitting the valley around t = 70. Fig. 2B displays the true tuning curves and the tuning curves estimated for neurons 4, 10, and 19 with PLDS, PfLDS, P-GPFA and P-GPLVM-dLA. For PLDS, PfLDS and P-GPFA, we replace the estimated f̂ with the observed spike count y in (Eq. 21), and treat the posterior mean as the tuning curve on a grid of latent representations. For P-GPLVM, the tuning curve is estimated via (Eq. 22). 
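The tuning-curve readout just described is ordinary GP posterior prediction; the sketch below (with an illustrative 1D latent grid and kernel settings, and a synthetic f̂ standing in for inferred quantities) mirrors (Eq. 21)-(Eq. 22):

```python
import numpy as np

def rbf(A, B, rho=1.0, delta=1.0):
    # k_x(x, x') = rho * exp(-||x - x'||^2 / (2 delta^2)); rows are latent points
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return rho * np.exp(-sq / (2 * delta ** 2))

def tuning_curve(f_hat, X_hat, X_grid, jitter=1e-6):
    # GP posterior mean of f on the grid (Eq. 21), exponentiated to a rate (Eq. 22)
    K = rbf(X_hat, X_hat) + jitter * np.eye(len(X_hat))
    k = rbf(X_hat, X_grid)                       # T x G cross-covariance k_grid
    f_grid = k.T @ np.linalg.solve(K, f_hat)     # posterior mean on the grid
    return np.exp(f_grid)

# toy usage: a 1D latent path with f = sin(x) observed at T points, read out on G grid points
T, G = 50, 25
X_hat = np.linspace(-2.0, 2.0, T)[:, None]
f_hat = np.sin(X_hat[:, 0])
rate = tuning_curve(f_hat, X_hat, np.linspace(-2.0, 2.0, G)[:, None])
```

Only the posterior mean is used for the point estimate λ̂_grid; the posterior variance in (Eq. 21) would additionally quantify uncertainty in the curve.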
The R^2 performance is shown in the first column of Fig. 2E.
Deterministic Gaussian bump tuning curve: For this simulation, each neuron's tuning curve is modeled as a unimodal Gaussian bump in a 2D space, such that the log of the tuning curve, f, is a deterministic Gaussian function of x. Fig. 2C shows an example of the estimated latent processes. PLDS fits an overly smooth curve, while P-GPLVM finds the small wiggles that are missed by the other methods. Fig. 2D displays the 2D tuning curves for neurons 1, 4, and 12 estimated by PLDS, PfLDS, P-GPFA and P-GPLVM-dLA. The R^2 performance is shown in the second column of Fig. 2E.

Figure 3: Results from the hippocampal data of two rats. A) and B) are estimated latent processes during a 1s recording period for two rats. C) and D) show R^2 and PLL performance with error bars. E) and F) display the true tuning curves and the tuning curves estimated by P-GPLVM-dLA.

Overall, P-GPFA has quite unstable performance due to the ARD kernel function in the GP prior, potentially encouraging a bias toward smoothness even when the underlying latent process is actually quite non-smooth. PfLDS performs better than PLDS in the second case, but when the true latent process is highly nonlinear (sinusoid) and the single-trial dataset is small, PfLDS loses its advantage to stochastic optimization. 
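The evaluation pipeline used in these comparisons (fit an affine map from estimated latents to the ground-truth space by least squares, then score with R^2) can be sketched as follows; the toy trajectories are invented purely for illustration:

```python
import numpy as np

def align_r2(latent_est, latent_true):
    # least-squares affine transformation from estimated to true latent space, then R^2
    Z = np.column_stack([latent_est, np.ones(len(latent_est))])   # append intercept
    W, *_ = np.linalg.lstsq(Z, latent_true, rcond=None)
    resid = latent_true - Z @ W
    ss_res = (resid ** 2).sum()
    ss_tot = ((latent_true - latent_true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# toy check: a true path that is exactly an affine image of the estimate
t = np.linspace(0.0, 1.0, 200)
est = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
true_path = est @ np.array([[2.0, 0.5], [-1.0, 1.0]]) + np.array([0.3, -0.2])
r2 = align_r2(est, true_path)
```

The affine alignment is needed because latent variable models recover the path only up to an invertible transformation of the latent space.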
GPLVM has reasonably good performance with the nonlinearities, but is worse than P-GPLVM, which demonstrates the significance of using the Poisson observation model. For P-GPLVM, the dLA inference algorithm performs best overall w.r.t. both convergence speed and R^2 (Fig. 2F).

5.2 Application to rat hippocampal neuron data

Next, we apply the proposed methods to extracellular recordings from the rodent hippocampus. Neurons were recorded bilaterally from the pyramidal layer of CA3 and CA1 in two rats as they performed a spatial alternation task on a W-shaped maze [20]. We confine our analyses to simultaneously recorded putative place cells during times of active navigation. The total number of simultaneously recorded neurons ranged from 7 to 19 for rat 1 and 24 to 38 for rat 2. Individual trials of 50 seconds were isolated from 15-minute recordings and binned at a resolution of 100 ms.
We used this hippocampal data to identify a 2D latent space using PLDS, PfLDS, P-GPFA, GPLVM and P-GPLVMs (Fig. 3), and compared these to the true 2D location of the rodent. For visualization purposes, we linearized the coordinates along the arms of the maze to obtain 1D representations.
Fig. 3A & B present two segments of 1s recordings for the two animals. 
The P-GPLVM results are smoother and recover short time-scale variations that PLDS ignores. The average R^2 performance of all methods for each rodent is shown in Fig. 3C & D, where P-GPLVM-dLA consistently performs best.
We also assessed model fit quality by prediction on held-out data. We split the time bins in each trial into training time bins (the first 90%) and held-out time bins (the last 10%). We first estimated the parameters of the mapping function or tuning curve in each model using spike trains from all neurons within the training time bins. We then fixed the parameters and inferred the latent process using spike trains from 70% of the neurons within the held-out time bins. Finally, we calculated the predictive log-likelihood (PLL) for the remaining 30% of neurons within the held-out time bins given the inferred latent process. We subtracted the log-likelihood of the population mean firing rate model (a single spike rate) from the predictive log-likelihood, divided by the number of observations, shown in Fig. 3C & D. Both P-GPLVM-aLA and P-GPLVM-dLA perform well. GPLVM has a very negative PLL, omitted from the figures.
Fig. 3E & F present the tuning curves learned by P-GPLVM-dLA, where each row corresponds to a neuron. For this analysis we have the true locations x_true, the estimated locations x_P-GPLVM, a grid of G locations x_{1:G} distributed in the shape of the maze, the spike count observations y_i, and the estimated log tuning curves f̂_i for each neuron i. The light gray dots in the first column of Fig. 3E & F are the binned spike counts mapped from the space of x_true to the space of x_{1:G}; the second column contains the binned spike counts mapped from the space of x_P-GPLVM to the space of x_{1:G}. The black curves in the first column are obtained by replacing x̂ and f̂ with x_true and y, respectively, in the predictive posterior in (Eq. 21) and (Eq. 22). 
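As a simplified sketch of the held-out comparison described above, the function below computes a Poisson predictive log-likelihood per observation relative to a single population mean-rate baseline (the latent-inference step is omitted, the factorial terms cancel in the difference, and the simulated rates are illustrative):

```python
import numpy as np

def relative_pll(y_test, rate_model):
    # Poisson log-likelihood of held-out counts under the model's predicted rates,
    # minus the log-likelihood under a single population mean rate, per observation
    ll_model = (y_test * np.log(rate_model) - rate_model).sum()
    mean_rate = y_test.mean()
    ll_base = (y_test * np.log(mean_rate) - mean_rate).sum()
    return (ll_model - ll_base) / y_test.size

# toy example: held-out counts drawn from strongly modulated rates
rng = np.random.default_rng(0)
rates = np.tile(np.r_[np.full(50, 0.2), np.full(50, 8.0)], (5, 1))  # 5 neurons x 100 bins
y = rng.poisson(rates)
score = relative_pll(y, rates)
```

A positive score means the rate predictions explain the held-out spikes better than a constant mean rate, which is the sense in which the PLL values in Fig. 3C & D are reported.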
The yellow curves in the second column are the tuning curves estimated by using (Eq. 22) to obtain λ̂grid for each neuron. The estimated tuning curves closely match the true tuning curves derived from the observations, discovering different responsive locations for different neurons as the rat moves.

6 Conclusion

We proposed a doubly nonlinear Gaussian process latent variable model for neural population spike trains that can identify nonlinear low-dimensional structure underlying apparently high-dimensional spike train data. We also introduced the decoupled Laplace approximation, a novel fast approximate inference method that allows us to efficiently maximize the marginal likelihood over the latent path while integrating over tuning curves. We showed that this method outperforms previous Laplace-approximation-based inference methods in both speed of convergence and accuracy. We applied the model to both simulated data and spike trains recorded from hippocampal place cells, and showed that it outperforms a variety of previous methods for latent structure discovery.

References

[1] BM Yu, JP Cunningham, G Santhanam, SI Ryu, KV Shenoy, and M Sahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In Adv neur inf proc sys, pages 1881–1888, 2009.

[2] L Paninski, Y Ahmadian, DG Ferreira, S Koyama, KR Rad, M Vidne, J Vogelstein, and W Wu. A new look at state-space models for neural data. J comp neurosci, 29(1-2):107–126, 2010.

[3] JP Cunningham and BM Yu. Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11):1500–1509, 2014.

[4] SW Linderman, MJ Johnson, MA Wilson, and Z Chen. A Bayesian nonparametric approach for uncovering rat hippocampal population codes during spatial navigation.
J neurosci meth, 263:36–47, 2016.

[5] JH Macke, L Buesing, JP Cunningham, BM Yu, KV Shenoy, and M Sahani. Empirical models of spiking in neural populations. In Adv neur inf proc sys, pages 1350–1358, 2011.

[6] L Buesing, JH Macke, and M Sahani. Spectral learning of linear dynamics from generalised-linear observations with application to neural population data. In Adv neur inf proc sys, pages 1682–1690, 2012.

[7] EW Archer, U Koster, JW Pillow, and JH Macke. Low-dimensional models of neural population activity in sensory cortical circuits. In Adv neur inf proc sys, pages 343–351, 2014.

[8] JH Macke, L Buesing, and M Sahani. Estimating state and parameters in state space models of spike trains. Advanced State Space Methods for Neural and Clinical Data, page 137, 2015.

[9] E Archer, IM Park, L Buesing, J Cunningham, and L Paninski. Black box variational inference for state space models. arXiv preprint arXiv:1511.07367, 2015.

[10] Y Gao, EW Archer, L Paninski, and JP Cunningham. Linear dynamical neural population models through nonlinear embeddings. In Adv neur inf proc sys, pages 163–171, 2016.

[11] JC Kao, P Nuyujukian, SI Ryu, MM Churchland, JP Cunningham, and KV Shenoy. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature communications, 6, 2015.

[12] D Pfau, EA Pnevmatikakis, and L Paninski. Robust learning of low-dimensional dynamics from large neural ensembles. In Adv neur inf proc sys, pages 2391–2399, 2013.

[13] H Nam. Poisson extension of Gaussian process factor analysis for modeling spiking neural populations. Master's thesis, Department of Neural Computation and Behaviour, Max Planck Institute for Biological Cybernetics, Tübingen, August 2015.

[14] Y Zhao and IM Park. Variational latent Gaussian process for recovering single-trial dynamics from population spike trains.
arXiv preprint arXiv:1604.03053, 2016.

[15] D Sussillo, R Jozefowicz, LF Abbott, and C Pandarinath. LFADS: latent factor analysis via dynamical systems. arXiv preprint arXiv:1608.06315, 2016.

[16] ND Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Adv neur inf proc sys, pages 329–336, 2004.

[17] C Rasmussen and C Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[18] AC Damianou, MK Titsias, and ND Lawrence. Variational inference for uncertainty on the inputs of Gaussian process models. arXiv preprint arXiv:1409.2287, 2014.

[19] T Hafting, M Fyhn, S Molden, MB Moser, and EI Moser. Microstructure of a spatial map in the entorhinal cortex. Nature, 436(7052):801–806, 2005.

[20] M Karlsson, M Carr, and LM Frank. Simultaneous extracellular recordings from hippocampal areas CA1 and CA3 (or MEC and CA1) from rats performing an alternation task in two W-shaped tracks that are geometrically identical but visually distinct. crcns.org. http://dx.doi.org/10.6080/K0NK3BZJ, 2005.