{"title": "Recurrent linear models of simultaneously-recorded neural populations", "book": "Advances in Neural Information Processing Systems", "page_first": 3138, "page_last": 3146, "abstract": "Population neural recordings with long-range temporal structure are often best understood in terms of a shared underlying low-dimensional dynamical process. Advances in recording technology provide access to an ever larger fraction of the population, but the standard computational approaches available to identify the collective dynamics scale poorly with the size of the dataset. Here we describe a new, scalable approach to discovering the low-dimensional dynamics that underlie simultaneously recorded spike trains from a neural population. Our method is based on recurrent linear models (RLMs), and relates closely to time-series models based on recurrent neural networks. We formulate RLMs for neural data by generalising the Kalman-filter-based likelihood calculation for latent linear dynamical systems (LDS) models to incorporate a generalised-linear observation process. We show that RLMs describe motor-cortical population data better than either directly-coupled generalised-linear models or latent linear dynamical system models with generalised-linear observations. We also introduce the cascaded generalised-linear model (CGLM) to capture low-dimensional instantaneous correlations in neural populations. The CGLM describes the cortical recordings better than either Ising or Gaussian models and, like the RLM, can be fit exactly and quickly. The CGLM can also be seen as a generalisation of a low-rank Gaussian model, in this case factor analysis.
The computational tractability of the RLM and CGLM allows both to scale to very high-dimensional neural data.", "full_text": "Recurrent linear models of simultaneously-recorded neural populations

Marius Pachitariu, Biljana Petreska, Maneesh Sahani

Gatsby Computational Neuroscience Unit

University College London, UK

{marius,biljana,maneesh}@gatsby.ucl.ac.uk

Abstract

Population neural recordings with long-range temporal structure are often best understood in terms of a common underlying low-dimensional dynamical process. Advances in recording technology provide access to an ever-larger fraction of the population, but the standard computational approaches available to identify the collective dynamics scale poorly with the size of the dataset. We describe a new, scalable approach to discovering low-dimensional dynamics that underlie simultaneously recorded spike trains from a neural population. We formulate the Recurrent Linear Model (RLM) by generalising the Kalman-filter-based likelihood calculation for latent linear dynamical systems to incorporate a generalised-linear observation process. We show that RLMs describe motor-cortical population data better than either directly-coupled generalised-linear models or latent linear dynamical system models with generalised-linear observations. We also introduce the cascaded generalised-linear model (CGLM) to capture low-dimensional instantaneous correlations in neural populations. The CGLM describes the cortical recordings better than either Ising or Gaussian models and, like the RLM, can be fit exactly and quickly. The CGLM can also be seen as a generalisation of a low-rank Gaussian model, in this case factor analysis.
The computational tractability of the RLM and CGLM allows both to scale to very high-dimensional neural data.

1 Introduction

Many essential neural computations are implemented by large populations of neurons working in concert, and recent studies have sought both to monitor increasingly large groups of neurons [1, 2] and to characterise their collective behaviour [3, 4]. In this paper we introduce a new computational tool to model coordinated behaviour in very large neural data sets. While we explicitly discuss only multi-electrode extracellular recordings, the same model can be readily used to characterise 2-photon calcium-marker image data, EEG, fMRI or even large-scale biologically-faithful simulations.

Population neural data may be represented at each time point by a vector y_t with as many dimensions as neurons, and as many indices t as time points in the experiment. For spiking neurons, y_t will have positive integer elements corresponding to the number of spikes fired by each neuron in the time interval corresponding to the t-th bin. As others have before [5, 6], we assume that the coordinated activity reflected in the measurement y_t arises from a low-dimensional set of processes, collected into a vector x_t, which is not directly observed. However, unlike the previous studies, we construct a recurrent model in which the hidden processes x_t are driven directly and explicitly by the measured neural signals y_1 . . . y_{t-1}. This assumption simplifies the estimation process. We assume for simplicity that x_t evolves with linear dynamics and affects the future state of the neural signal y_t in a generalised-linear manner, although both assumptions may be relaxed. As in the latent dynamical system, the resulting model enforces a "bottleneck", whereby predictions of y_t based on y_1 . . . 
y_{t-1} must be carried by the low-dimensional x_t.

State prediction in the RLM is related to the Kalman filter [7] and we show in the next section a formal equivalence between the likelihoods of the RLM and the latent dynamical model when observation noise is Gaussian distributed. However, spiking data is not well modelled as Gaussian, and the generalisation of our approach to Poisson noise leads to a departure from the latent dynamical approach. Unlike latent linear models with conditionally Poisson observations, the parameters of our model can be estimated efficiently and without approximation. We show that, perhaps in consequence, the RLM can provide superior descriptions of neural population data.

2 From the Kalman filter to the recurrent linear model (RLM)

Consider a latent linear dynamical system (LDS) model with linear-Gaussian observations. Its graphical model is shown in Fig. 1A. The latent process is parametrised by a dynamics matrix A and an innovations covariance Q that describe the evolution of the latent state x_t:

P(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Q),

where N(x | \mu, \Sigma) represents a normal distribution on x with mean \mu and (co)variance \Sigma. For brevity, we omit here and below the special case of the first time-step, in which x_1 is drawn from a multivariate Gaussian. The output distribution is determined by an observation loading matrix C and a noise covariance R, often taken to be diagonal so that all covariance is modelled by the latent process:

P(y_t | x_t) = N(y_t | C x_t, R).

In the LDS, the joint likelihood of the observations {y_t} can be written as the product:

P(y_1 . . . y_T) = P(y_1) \prod_{t=2}^{T} P(y_t | y_1 . . . y_{t-1}),

and in the Gaussian case can be computed using the usual Kalman filter approach to find the conditional distribution at time t iteratively:

P(y_{t+1} | y_1 . . . y_t) = \int dx_{t+1} P(y_{t+1} | x_{t+1}) P(x_{t+1} | y_1 . . . y_t)
                          = \int dx_{t+1} N(y_{t+1} | C x_{t+1}, R) N(x_{t+1} | A \hat{x}_t, V_{t+1})
                          = N(y_{t+1} | C A \hat{x}_t, C V_{t+1} C^\top + R),

where we have introduced the (filtered) state estimate \hat{x}_t = E[x_t | y_1 . . . y_t] and the (predictive) uncertainty V_{t+1} = E[(x_{t+1} - A \hat{x}_t)(x_{t+1} - A \hat{x}_t)^\top | y_1 . . . y_t]. Both quantities are computed recursively using the Kalman gain K_t = V_t C^\top (C V_t C^\top + R)^{-1}, giving the following recursive recipe to calculate the conditional likelihood of y_{t+1}:

\hat{x}_t = A \hat{x}_{t-1} + K_t (y_t - \hat{y}_t)
V_{t+1} = A (I - K_t C) V_t A^\top + Q
\hat{y}_{t+1} = C A \hat{x}_t
P(y_{t+1} | y_1 . . . y_t) = N(y_{t+1} | \hat{y}_{t+1}, C V_{t+1} C^\top + R).

For the Gaussian LDS, the Kalman gain K_t and state uncertainty V_{t+1} (and thus the output covariance C V_{t+1} C^\top + R) depend on the model parameters (A, C, R, Q) and on the time step, although as time grows they both converge to stationary values. Neither depends on the observations. Thus, we might consider a relaxation of the Gaussian LDS model in which these matrices are taken to be stationary from the outset, and are parametrised independently so that they are no longer constrained to take on the "correct" values as computed for Kalman inference. Let us call this parametric form of the Kalman gain W and the parametric form of the output covariance S. Then the conditional likelihood iteration becomes

\hat{x}_t = A \hat{x}_{t-1} + W (y_t - \hat{y}_t)
\hat{y}_{t+1} = C A \hat{x}_t
P(y_{t+1} | y_1 . . . 
y_t) = N(y_{t+1} | \hat{y}_{t+1}, S).

Figure 1: Graphical representations of the latent linear dynamical system (LDS: A, B) and recurrent linear model (RLM: C). Shaded variables are observed, unshaded circles are latent random variables and squares are variables that depend deterministically on their parents. In B the LDS is redrawn in terms of the random innovations \eta_t = x_t - A x_{t-1}, facilitating the transition towards the RLM. The RLM is then obtained by replacing \eta_t with a deterministically derived estimate W (y_t - \hat{y}_t).

The parameters of this new model are A, C, W and S. This is a relaxation of the Gaussian latent LDS model because W has more degrees of freedom than Q, as does S than R (at least if R is constrained to be diagonal). The new model has a recurrent linear structure in that the random observation y_t is fed back linearly to perturb the otherwise deterministic evolution of the state \hat{x}_t. A graphical representation of this model is shown in Fig. 1C, along with a redrawn graph of the LDS model.
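The stationary iteration above can be made concrete in a few lines. The sketch below is ours, not the authors' code: the function name and the zero initialisation of the state and prediction are illustrative assumptions (the paper deliberately omits the first time-step special case), and the parameters stand in for fitted values.

```python
import numpy as np

def gaussian_rlm_loglik(Y, A, C, W, S):
    """Conditional log-likelihood of the Gaussian RLM.

    Y: (T, N) observations; A: (K, K) dynamics; C: (N, K) loading;
    W: (K, N) parametrised gain; S: (N, N) output covariance.
    Implements  x_hat_t = A x_hat_{t-1} + W (y_t - y_hat_t),
                y_hat_{t+1} = C A x_hat_t,
    with  y_{t+1} | y_1..y_t ~ N(y_hat_{t+1}, S).
    """
    T, N = Y.shape
    K = A.shape[0]
    Sinv = np.linalg.inv(S)
    _, logdetS = np.linalg.slogdet(S)
    xhat = np.zeros(K)   # deterministic state estimate (simplified start)
    yhat = np.zeros(N)   # prediction for the first bin (simplified start)
    ll = 0.0
    for t in range(T):
        resid = Y[t] - yhat
        ll += -0.5 * (resid @ Sinv @ resid + logdetS + N * np.log(2 * np.pi))
        xhat = A @ xhat + W @ resid   # fold the prediction error into the state
        yhat = C @ (A @ xhat)         # one-step-ahead prediction
    return ll
```

Because the state is a deterministic function of the data, this single forward pass evaluates the exact likelihood; no inference over latent trajectories is needed.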
The RLM can be viewed as replacing the random innovation variables \eta_t = x_t - A x_{t-1} with data-derived estimates W (y_t - \hat{y}_t); these estimates are made possible by the fact that \eta_t contributes to the variability of y_t around \hat{y}_t.

3 Recurrent linear models with Poisson observations

The discussion above has transformed a stochastic-latent LDS model with Gaussian output into an RLM with a deterministic latent state, but still with Gaussian output. Our goal, however, is to fit a model with an output distribution better suited to the binned point-processes that characterise neural spiking. Both the linear Kalman-filtering steps above and the eventual stationarity of the inference parameters depend on the joint Gaussian structure of the assumed LDS model. They would not apply if we were to begin a similar derivation from an LDS with Poisson output. However, a tractable approach to modelling point-process data with low-dimensional temporal structure may be provided by introducing a generalised-linear output stage directly to the RLM. This model is given by:

\hat{x}_t = A \hat{x}_{t-1} + W (y_t - \hat{y}_t)
g(\hat{y}_{t+1}) = C A \hat{x}_t
P(y_{t+1} | y_1 . . . y_t) = ExpFam(y_{t+1} | \hat{y}_{t+1})    (1)

where ExpFam is an exponential-family distribution such as Poisson, and the element-wise link function g allows for a nonlinear mapping from x_t to the predicted mean \hat{y}_{t+1}. In the following, we will write f for the inverse link, as is more common for neural models, so that \hat{y}_{t+1} = f(C A \hat{x}_t). The simplest Poisson-based generalised-linear RLM might take as its output distribution

P(y_t | \hat{y}_t) = \prod_i Poisson(y_{ti} | \hat{y}_{ti}); \hat{y}_t = f(C A \hat{x}_{t-1}),

where y_{ti} is the spike count of the ith cell in bin t and the function f is non-negative.
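A minimal forward pass for this Poisson RLM (equation 1) can be sketched as below. This is illustrative only: we assume the exponential inverse link f = exp (the paper leaves f general), clip the log-rates purely for numerical safety, and initialise the state at zero.

```python
import numpy as np
from scipy.special import gammaln

def poisson_rlm_loglik(Y, A, C, W):
    """Exact log-likelihood of the generalised-linear RLM with independent
    Poisson outputs, using f = exp as the inverse link (an assumption).

    Y: (T, N) spike counts; A: (K, K); C: (N, K); W: (K, N).
    """
    T, N = Y.shape
    K = A.shape[0]
    xhat = np.zeros(K)
    yhat = np.ones(N)   # initial predicted rates f(0) = 1
    ll = 0.0
    for t in range(T):
        # Poisson log-likelihood of the observed counts under the prediction
        ll += np.sum(Y[t] * np.log(yhat) - yhat - gammaln(Y[t] + 1.0))
        # deterministic state update driven by the prediction error
        xhat = A @ xhat + W @ (Y[t] - yhat)
        # one-step-ahead rate prediction through the inverse link
        yhat = np.exp(np.clip(C @ (A @ xhat), -30.0, 30.0))
    return ll
```

As in the Gaussian case, the likelihood is a single deterministic pass over the data, which is what makes exact, fast fitting possible.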
However, comparison with the output distribution derived for the Gaussian RLM suggests that this choice would fail to capture the instantaneous covariance that the LDS formulation transfers to the output distribution (and which appears in the low-rank structure of S above). We can address this concern in two ways. One option is to bin the data more finely, thus diminishing the influence of the instantaneous covariance. The alternative is to replace the independent Poissons with a correlated output distribution on spike counts. The cascaded generalised-linear model introduced below is a natural choice, and we will show that it captures instantaneous correlations faithfully with very few hidden dimensions.

In practice, we also sometimes add a fixed input \mu_t to equation 1 that varies in time and determines the average behaviour of the population or the peri-stimulus time histogram (PSTH):

\hat{y}_{t+1} = f(\mu_t + C A \hat{x}_t)    (2)

Note that the matrices A and C retain their interpretation from the LDS models. The matrix A controls the evolution of the dynamical process x_t. The phenomenology of its dynamics is determined by the complex eigenvalues of A. Eigenvalues with moduli close to 1 correspond to long timescales of fluctuation around the PSTH. Eigenvalues with non-zero imaginary part correspond to oscillatory components. Finally, the dynamics will be stable iff all the eigenvalues lie within the unit disc. The matrix C describes the dependence of the high-dimensional neural signals on the low-dimensional latent processes x_t. In particular, equation 2 determines the firing rate of the neurons. This generalised-linear stage ensures that the firing rates are positive through the link function f, and the observation process is Poisson.
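These eigenvalue diagnostics are easy to read off numerically. The sketch below is an illustration of the stated interpretation, not code from the paper: the function name, the bin width dt and the made-up 2x2 rotation-like dynamics matrix are all our assumptions.

```python
import numpy as np

def dynamics_diagnostics(A, dt=0.01):
    """Interpret the complex eigenvalues of a dynamics matrix A.

    For each eigenvalue z: |z| < 1 gives a stable mode whose fluctuations
    decay with time constant tau = -dt / log|z|, and a non-zero imaginary
    part gives an oscillation with period 2*pi*dt / |arg z|.
    """
    out = []
    for z in np.linalg.eigvals(A):
        mod = np.abs(z)
        tau = -dt / np.log(mod) if mod < 1 else np.inf
        period = 2 * np.pi * dt / abs(np.angle(z)) if np.angle(z) != 0 else np.inf
        out.append({"modulus": mod, "timescale_s": tau,
                    "period_s": period, "stable": mod < 1})
    return out

# A rotation scaled by 0.95: a stable, oscillatory pair of modes
theta = 0.1
A = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
modes = dynamics_diagnostics(A)
```

For this example both eigenvalues have modulus 0.95 (stable, with a timescale of roughly 20 bins) and angle ±0.1 radians per bin (an oscillation of about 63 bins per cycle).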
For other types of data, the generalised-linear stage might be replaced by other appropriate link functions and output distributions.

3.1 Relationship to other models

RLMs are related to recurrent neural networks [8]. The differences lie in the state evolution, which in the neural network is nonlinear: x_t = h(A x_{t-1} + W y_{t-1}); and in the recurrent term, which depends on the observation rather than the prediction error. On the data considered here, we found that using sigmoidal or threshold-linear functions h resulted in models comparable in likelihood to the RLM, and so we restricted our attention to simple linear dynamics. We also found that using the prediction-error term W (y_{t-1} - \hat{y}_t) resulted in better models than the simple neural-net formulation, and we attribute this difference to the link between the RLM and Kalman inference.

It is also possible to work within the stochastic-latent LDS framework, replacing the Gaussian output distribution with a generalised-linear Poisson output (e.g. [6]). The main difficulty here is the intractability of the estimation procedure. For an unobserved latent process x_t, an inference procedure needs to be devised to estimate the posterior distribution on the entire sequence x_1 . . . x_t. For linear-Gaussian observations, this inference is tractable and is provided by Kalman smoothing. However, with generalised-linear observations, inference becomes intractable and the necessary approximations [6] are computationally intense and can jeopardise the quality of the fitted models. By contrast, in the RLM x_t is a deterministic function of the data. In effect, the Kalman filter has been built into the model as the accurate estimation procedure, and efficient fitting is possible by direct gradient ascent on the log-likelihood.
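Such gradient ascent (with the heavy momentum discussed below) can be illustrated on a toy scalar RLM. This sketch is ours and deliberately simplified: the paper fits by backpropagation-through-time, whereas here central finite differences stand in for the backpropagated gradient, only the gain w is fit, and all names, hyperparameters and the exp inverse link are assumptions.

```python
import numpy as np

def loglik(w, Y, a=0.8, c=1.0):
    """Poisson RLM log-likelihood with a scalar latent state:
    xhat_t = a xhat_{t-1} + w (y_t - yhat_t),  yhat_{t+1} = exp(c a xhat_t).
    Only the gain w is varied; a and c are held fixed for illustration.
    """
    xhat, yhat, ll = 0.0, 1.0, 0.0
    for y in Y:
        ll += y * np.log(yhat) - yhat   # Poisson term, constants dropped
        xhat = a * xhat + w * (y - yhat)
        yhat = np.exp(np.clip(c * a * xhat, -30.0, 30.0))  # clip for safety
    return ll

def fit_gain(Y, steps=200, lr=1e-4, momentum=0.9, eps=1e-5):
    """Gradient ascent with large momentum on the log-likelihood;
    finite differences replace backpropagation-through-time here."""
    w, v = 0.0, 0.0
    for _ in range(steps):
        g = (loglik(w + eps, Y) - loglik(w - eps, Y)) / (2 * eps)
        v = momentum * v + lr * g
        w += v
    return w
```

The momentum term plays the stabilising role described in the next paragraph: it averages the gradient over many updates rather than truncating the backpropagation.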
Empirically we did not encounter difficulties with local minima during optimisation, as has been reported for LDS models fit by approximate EM [9]. Multiple restarts from different random values of the parameters always led to models with similar likelihoods.

Note that to estimate the matrices A and W, the gradient must be backpropagated through successive iterations of equation 1. This technique, known as backpropagation-through-time, was first described by [10] as a technique to fit recurrent neural network models. Recent implementations have demonstrated state-of-the-art language models [11]. Backpropagation-through-time is thought to be inherently unstable when propagated past many timesteps, and often the gradient is truncated prematurely [11]. We found that using large values of momentum in the gradient ascent alleviated these instabilities and allowed us to use backpropagation without truncation.

4 The cascaded generalised-linear model (CGLM)

The link between the RLM and the LDS raises the possibility that a model for simultaneously-recorded correlated spike counts might be derived in a similar way, starting from a non-dynamical, but low-dimensional, Gaussian model. Stationary models of population activity have attracted recent interest for their own sake (e.g. [1]), and would also provide a way to model correlations introduced by common innovations that were neglected by the simple Poisson form of the RLM. Thus, we consider vectors y of spike counts from N neurons, without explicit reference to the time at which they were collected. A Gaussian model for y can certainly describe correlations between the cells, but is ill-matched to discrete count observations.
Thus, as with the derivation of the RLM from the Kalman filter, we derive here a new generalisation of a low-dimensional, structured Gaussian model to spike-count data.

The distribution of any multivariate variable y can be factorised into a "cascaded" product of multiple one-dimensional distributions:

P(y) = \prod_{n=1}^{N} P(y_n | y_1 . . . y_{n-1})