{"title": "Clustered factor analysis of multineuronal spike data", "book": "Advances in Neural Information Processing Systems", "page_first": 3500, "page_last": 3508, "abstract": "High-dimensional, simultaneous recordings of neural spiking activity are often explored, analyzed and visualized with the help of latent variable or factor models. Such models are however ill-equipped to extract structure beyond shared, distributed aspects of firing activity across multiple cells. Here, we extend unstructured factor models by proposing a model that discovers subpopulations or groups of cells from the pool of recorded neurons. The model combines aspects of mixture of factor analyzer models for capturing clustering structure, and aspects of latent dynamical system models for capturing temporal dependencies. In the resulting model, we infer the subpopulations and the latent factors from data using variational inference and model parameters are estimated by Expectation Maximization (EM). We also address the crucial problem of initializing parameters for EM by extending a sparse subspace clustering algorithm to integer-valued spike count observations. We illustrate the merits of the proposed model by applying it to calcium-imaging data from spinal cord neurons, and we show that it uncovers meaningful clustering structure in the data.", "full_text": "Clustered factor analysis of multineuronal spike data\n\nLars Buesing1, Timothy A. Machado1,2, John P. Cunningham1 and Liam Paninski1\n\n1 Department of Statistics, Center for Theoretical Neuroscience\n\n& Grossman Center for the Statistics of Mind\n\n2 Howard Hughes Medical Institute & Department of Neuroscience\n\nColumbia University, New York, NY\n\n{lars,cunningham,liam}@stat.columbia.edu\n\nAbstract\n\nHigh-dimensional, simultaneous recordings of neural spiking activity are often\nexplored, analyzed and visualized with the help of latent variable or factor mod-\nels. 
Such models are however ill-equipped to extract structure beyond shared, distributed aspects of firing activity across multiple cells. Here, we extend unstructured factor models by proposing a model that discovers subpopulations or groups of cells from the pool of recorded neurons. The model combines aspects of mixture of factor analyzer models for capturing clustering structure, and aspects of latent dynamical system models for capturing temporal dependencies. In the resulting model, we infer the subpopulations and the latent factors from data using variational inference, and model parameters are estimated by Expectation Maximization (EM). We also address the crucial problem of initializing parameters for EM by extending a sparse subspace clustering algorithm to integer-valued spike count observations. We illustrate the merits of the proposed model by applying it to calcium-imaging data from spinal cord neurons, and we show that it uncovers meaningful clustering structure in the data.

1 Introduction

Recent progress in large-scale techniques for recording neural activity has made it possible to study the joint firing statistics of 10^2 up to 10^5 cells at single-neuron resolution. Such data sets grant unprecedented insight into the temporal and spatial structure of neural activity and will hopefully lead to an improved understanding of neural coding and computation.
These recording techniques have spurred the development of statistical analysis tools which help to make accessible the information contained in simultaneously recorded activity time-series. Amongst these tools, latent variable models prove to be particularly useful for analyzing such data sets [1, 2, 3, 4]. They aim to capture shared structure in activity across different neurons and therefore provide valuable summary statistics of high-dimensional data that can be used for exploratory data analysis as well as for visualization purposes. 
The majority of latent variable models, however, being relatively general purpose tools, are not designed to extract additional structure from the data. This leads to latent variables that can be hard to interpret biologically. Furthermore, additional information from other sources, such as spatial structure or genetic cell type information, cannot be readily integrated into these models.
An approach to leveraging simultaneous activity recordings that is complementary to applying unstructured factor models is to infer detailed circuit properties from the data. By modelling the detailed interactions between neurons in a local micro-circuit, multiple tools aim at inferring the existence, type, and strength of synaptic connections between neurons [5, 6]. In spite of algorithmic progress [7], the feasibility of this approach has only been demonstrated in circuits of up to three neurons [8], as large scale data with ground truth connectivity is currently only rarely available. This lack of validation data sets also makes it difficult to assess the impact of model mismatch and unobserved, highly-correlated noise sources ("common input").
Here, we propose a statistical tool for analyzing multi-cell recordings that offers a middle ground between unstructured latent variable models and models for inferring detailed network connectivity. The basic goal of the model is to cluster neurons into groups based on their joint activity statistics. Clustering is a ubiquitous and valuable tool in statistics and machine learning as it often yields interpretable structure (a partition of the data), and is of particular relevance in neuroscience because neurons can often be categorized into distinct groups based on their morphology, physiology, genetic identity or stimulus-response properties. In many experimental setups, side-information allowing for a reliable supervised partitioning of the recorded neurons is not available. 
Hence, the main goal of the paper is to develop a method for clustering neurons based on their activity recordings.
We model the firing time-series of a cluster of neurons using latent factors, assuming that different clusters are described by disjoint sets of factors. The resulting model is similar to a mixture of factor analyzers [9, 10] with Poisson observations, where each mixture component describes a subpopulation of neurons. In contrast to a mixture of factor analyzers model which assumes independent factors, we put a Markovian prior over the factors, capturing temporal dependencies of neural activity as well as interactions between different clusters over time. The resulting model, which we call the mixture of Poisson linear dynamical systems (mixPLDS) model, is able to capture more structure using the cluster assignments compared to latent variable models previously applied to neural recordings, while at the same time still providing low-dimensional latent trajectories for each cluster for exploratory data analysis and visualization. In contrast to the lack of connectivity ground truth for neurons from large-scale recordings, there are indeed large-scale activity recordings available that exhibit rich and biologically interpretable clustering structure, allowing for a validation of the mixPLDS model in practice.

2 Mixture of Poisson linear dynamical systems for modelling neural subpopulations

2.1 Model definition

Let y_{kt} denote the observed spike count of neuron k = 1, ..., K in time-bin t = 1, ..., T. For the mixture of Poisson linear dynamical systems (mixPLDS) model, we assume that each neuron k belongs to exactly one of M groups (subpopulations, clusters), indicated by the discrete (categorical) variable s_k \in \{1, ..., M\}. The s_k are modelled as i.i.d.:

p(s) = \prod_{k=1}^{K} p(s_k) = \prod_{k=1}^{K} \mathrm{Disc}(s_k \mid \phi_0),    (1)

where \phi_0 := (\phi_0^1, ..., \phi_0^M) are the natural parameters of the categorical distribution. In the remainder of the paper we use the convention that the group-index m = 1, ..., M is written as superscript. The activity of each subpopulation m at time t is modeled by a latent variable x_t^m \in R^{d_m}. We assume that these latent variables (we will also call them factors) are jointly normal, and we model interactions between different groups by a linear dynamical system (LDS) prior:

x_t = \begin{pmatrix} x_t^1 \\ \vdots \\ x_t^M \end{pmatrix} = A x_{t-1} + \eta_t = \begin{pmatrix} A^{11} & \cdots & A^{1M} \\ \vdots & \ddots & \vdots \\ A^{M1} & \cdots & A^{MM} \end{pmatrix} \begin{pmatrix} x_{t-1}^1 \\ \vdots \\ x_{t-1}^M \end{pmatrix} + \eta_t,    (2)

where the block-matrices A^{ml} \in R^{d_m \times d_l} capture the interactions between groups m and l. The innovations \eta_t are i.i.d. from N(0, Q) and the starting distribution is given by x_1 \sim N(\mu_1, Q_1). If neuron k belongs to group m, i.e. s_k = m, we model its activity y_{kt} at time t as a Poisson distributed spike count with a log-rate given by an affine combination of the factors of group m:

z_{kt} \mid s_k = m \;=\; C_{k:}^m x_t^m,    (3)
y_{kt} \mid z_{kt}, s_k \;\sim\; \mathrm{Poisson}(\exp(z_{kt} + b_k)),    (4)

where b \in R^K captures the baseline of the firing rates. We denote with C^m \in R^{K \times d_m} the group loading matrix with rows C_{k:}^m for neurons k in group m, and fill in the remaining rows with 0s for all neurons not in group m. We concatenate these into the total loading matrix C := (C^1 \cdots C^M) \in R^{K \times d}, where d := \sum_{m=1}^M d_m is the total latent dimension. If the neurons are sorted with respect to their group membership, then the total loading C has block-diagonal structure. 
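To make the generative process of equations (1)-(4) concrete, the following sketch samples spike counts from a toy mixPLDS. All dimensions, parameter values and the triangular coupling in A are illustrative choices of the sketch, not values from the paper; the initial state is drawn from N(0, Q) instead of N(\mu_1, Q_1) for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): M = 2 groups, d1 = d2 = 2 factors each,
# K = 10 neurons, T = 100 time bins.
M, dims, K, T = 2, [2, 2], 10, 100
d = sum(dims)

# Cluster assignments s_k ~ Disc(phi_0), eq. (1), with uniform natural params.
phi0 = np.log(np.ones(M) / M)
p0 = np.exp(phi0) / np.exp(phi0).sum()
s = rng.choice(M, size=K, p=p0)

# LDS prior over the stacked factors x_t, eq. (2). Upper-triangular coupling
# A[0, 2] lets group 2 drive group 1 while keeping the dynamics stable.
A = 0.85 * np.eye(d)
A[0, 2] = 0.1
Q = 0.1 * np.eye(d)
x = np.zeros((T, d))
x[0] = rng.multivariate_normal(np.zeros(d), Q)   # simplified start distribution
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(d), Q)

# Block-structured loading: neuron k only reads out its own group's factors.
offsets = np.cumsum([0] + dims)
C = np.zeros((K, d))
for k in range(K):
    m = s[k]
    C[k, offsets[m]:offsets[m + 1]] = rng.standard_normal(dims[m])
b = np.full(K, -1.0)                              # baseline log-rates

# Poisson observations, eqs. (3)-(4): y_kt ~ Poisson(exp(z_kt + b_k)).
z = x @ C.T                                       # (T, K) log-rates before baseline
y = rng.poisson(np.exp(z + b))
```

Sorting the rows of C by s recovers the block-diagonal structure described above.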
Further, we denote with y_{k:} := (y_{k,1} \cdots y_{k,T}) the activity time series of neuron k, and use the analogous notation x_{n:}^m := (x_{n,1}^m \cdots x_{n,T}^m) \in R^{1 \times T} for n = 1, ..., d_m. The model parameters are \theta := (A, Q, Q_1, \mu_1, C, b); we consider the hyper-parameters \phi_0 to be given and fixed.
For known clusters s, the mixPLDS model can be regarded as a special case of the Poisson linear dynamical system (PLDS) model [3], where the loading C is block-diagonal. For unknown group memberships s, the mixPLDS model defined above is similar to a mixture of factor analyzers (e.g. see [9, 10]) with Poisson observations over neurons k = 1, ..., K. In the mixPLDS model however, we do not restrict the factors of the mixture components to be independent, but allow for interactions over time which are modeled by a LDS.

2.2 Variational inference and parameter estimation for the mixPLDS model

When applying the mixPLDS model to data y, we are interested in inferring the group memberships s and the latent trajectories x, as well as estimating the parameters \theta. For known parameters \theta, the posterior p(x, s|y, \theta) (even in the special case of a single mixture component M = 1) is not available in closed form and needs approximating. Here we propose to approximate the posterior using variational inference with the following factorization assumption:

p(x, s \mid y, \theta) \approx q(x) q(s).    (5)

We further restrict q(x) to be a normal distribution q(x) = N(x|m, V) with mean m and covariance V. Under the assumption (5), q(s) further factorizes into the product \prod_k q(s_k), where q(s_k) is a categorical distribution with natural parameters \phi_k = (\phi_k^1, ..., \phi_k^M). The variational parameters m, V and \phi = (\phi_1, ..., \phi_K) are obtained by maximizing the variational lower bound of the log marginal likelihood log p(y|\theta):

L(m, V, \phi, \theta) = \frac{1}{2}\left(\log|V| - \mathrm{tr}[\Sigma^{-1}V] - (m - \mu)^\top \Sigma^{-1} (m - \mu)\right) - \sum_{k=1}^{K} D_{KL}[q(s_k) \| p(s_k)]
    + \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{t=1}^{T} \pi_k^m \left(y_{kt} h_{kt}^m - \exp(h_{kt}^m + \rho_{kt}^m/2)\right) + \mathrm{const},    (6)

h_t^m := C^m m_t + b, \quad \rho_t^m := \mathrm{diag}(C^m V_t C^{m\top}), \quad \pi_k^m \propto \exp(\phi_k^m),

where V_t = Cov_{q(x)}[x_t], and \mu \in R^{dT}, \Sigma \in R^{dT \times dT} are the mean and covariance of the LDS prior over x. The first two terms in (6) are (up to constants) the negative Kullback-Leibler divergence between the approximation q(x)q(s) and the prior p(x, s) = p(x)p(s), penalizing a variational posterior that is far away from the prior. The third term in (6) is given by the expected log-likelihood of the data, promoting a posterior approximation that explains the observed data well. We optimize L in a coordinate ascent manner, i.e. we hold \phi fixed and optimize jointly over m, V and vice versa. A naive implementation of the optimization of L over {m, V} is prohibitively costly for data sets with large T, as the posterior covariance V has O((dT)^2) elements and has to be optimized over the set of semi-definite matrices. Instead of solving this large program, we apply a method proposed in [11], where the authors show that Gaussian variational inference for latent Gaussian models with Poisson observations can be solved more efficiently using the dual problem. We generalize their approach to the mixture of Poisson observation model (3) considered here, and we also leverage the Markovian structure of the LDS prior to speed up computations (see below). In the supplementary material, we derive this approach to inference in the mixPLDS model in detail. 
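To make the data term of the bound concrete, the sketch below evaluates the third term of (6) for given variational parameters. The array layout (time-major, one loading matrix per group) and the helper name are illustrative conventions of this sketch, not an interface from the paper.

```python
import numpy as np

def expected_poisson_loglik(y, pi, C_list, m_t, V_t, b):
    """Third term of the bound (6):
    sum_{m,k,t} pi_k^m * (y_kt * h_kt^m - exp(h_kt^m + rho_kt^m / 2)),
    with h_t^m = C^m m_t + b and rho_t^m = diag(C^m V_t C^{m T}).
    Assumed shapes: y (T, K), pi (K, M), C_list[m] (K, d),
    m_t (T, d), V_t (T, d, d), b (K,)."""
    total = 0.0
    for mi, Cm in enumerate(C_list):
        h = m_t @ Cm.T + b                              # (T, K): E_q[z_kt | s_k = m]
        # rho[t, k] = C^m_{k:} V_t C^{m T}_{:k} = Var_q[z_kt | s_k = m]
        rho = np.einsum('kd,tde,ke->tk', Cm, V_t, Cm)
        total += np.sum(pi[:, mi] * (y * h - np.exp(h + rho / 2.0)))
    return total
```

Note how the posterior variance rho enters through exp(h + rho/2), the mean of a log-normal rate; with rho = 0 the expression reduces to the ordinary Poisson log-likelihood (up to the log y! constant absorbed into "const").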
The optimization over \phi is available in closed form and is also given in the supplementary material. We iterate updates over m, V and \phi. In practice, this method converges very quickly, often requiring only two or three iterations to reach a reasonable convergence criterion.
The most computationally intensive part of the proposed variational inference method is the update of m, V. Using properties of the LDS prior (i.e. the prior precision \Sigma^{-1} is block-tri-diagonal), we can show that evaluation of L, its dual and the gradient of the latter all cost O(KTd + Td^3), which is the same complexity as Kalman smoothing in a LDS with Gaussian observations, or a single iteration of Laplace inference over x. While having the same cost as the Laplace approximation, variational inference has the advantage of a non-decreasing variational lower bound L, which can be used for monitoring convergence as well as for model comparison.
We can also get estimates for the model parameters by maximizing the lower bound L over \theta. To this end, we interleave updates of \phi and m, V with maximizations over \theta. The latter correspond to standard parameter updates in a LDS model with Poisson observations and are discussed e.g. in [3]. This procedure implements variational Expectation Maximization (VEM) in the mixPLDS model.

2.3 Initialization by Poisson subspace clustering

In principle, for a given number of groups M with given dimensions d_1, ..., d_M, one can estimate the parameters of the mixPLDS using VEM as described above. In practice we find however that this yields poor results without reasonable initial membership assignments s, i.e. reasonable initial values for the variational parameters \phi. Furthermore, VEM requires the a priori specification of the latent dimensions d_1, ..., d_M. 
Here we show that a simple extension of an existing subspace clustering algorithm provides, given the number of groups M, a sufficiently accurate initializer for \phi, and allows for an informed choice of the dimensions d_1, ..., d_M.
We first illustrate the connection of the mixPLDS model to the subspace clustering problem (for a review of the latter see e.g. [12]). Assume that we observe the log-rates z_{kt} defined in equation (3) directly; we denote the corresponding data matrix as Z \in R^{K \times T}. For unknown loading C, the row Z_{k:} lies on a d_m-dimensional subspace spanned by the "basis-trajectories" x_{1,:}^m, ..., x_{d_m,:}^m if neuron k is in group m. If s and x are unobserved, we only know that the rows of Z lie on a union of M subspaces of dimensions d_1, ..., d_M in an ambient space of dimension T. Reconstructing the subspaces and the subspace assignments is known as a subspace clustering problem, and connections to mixtures of factor analyzers have been pointed out in [13]. The authors of [13] propose to solve the subspace clustering problem by means of the following sparse regression problem:

\min_{W \in R^{K \times K}} \; \frac{1}{2} \|Z - WZ\|_F^2 + \lambda \|W\|_1 \quad \mathrm{s.t.} \quad \mathrm{diag}(W) = 0.    (7)

This optimization can be interpreted as trying to reconstruct each row Z_{k:} from the remaining rows Z_{\setminus k:} using sparse reconstruction weights W. Intuitively, a point on a subspace can be reconstructed using the fewest reconstruction weights by points on the same subspace, i.e. W_{kl} = 0 if k and l lie on different subspaces. The symmetrized, sign-less weights |W| + |W|^\top are then interpreted as the adjacency matrix of a graph, and spectral clustering, with a user-defined number of clusters M, is applied to obtain a subspace clustering solution. 
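The two-stage procedure (sparse regression (7), then spectral clustering of |W| + |W|^T) can be sketched numerically as follows. The proximal-gradient solver, the parameter defaults, and the tiny farthest-point k-means on the spectral embedding are simplifying choices for illustration, not the implementation used in the paper.

```python
import numpy as np

def sparse_subspace_cluster(Z, M, lam=0.1, n_iter=500):
    """Sketch of sparse subspace clustering, eq. (7):
    min_W 0.5*||Z - W Z||_F^2 + lam*||W||_1  s.t. diag(W) = 0,
    solved by proximal gradient (ISTA), followed by spectral clustering
    of the symmetrized affinity |W| + |W|^T into M groups."""
    K = Z.shape[0]
    G = Z @ Z.T                                      # Gram matrix
    step = 1.0 / np.linalg.eigvalsh(G).max()         # 1/L, L = Lipschitz constant
    W = np.zeros((K, K))
    for _ in range(n_iter):
        W = W - step * ((W @ Z - Z) @ Z.T)           # gradient step on the quadratic
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # soft-threshold
        np.fill_diagonal(W, 0.0)                     # enforce diag(W) = 0
    # Spectral clustering: bottom-M eigenvectors of the graph Laplacian.
    Aff = np.abs(W) + np.abs(W).T
    L = np.diag(Aff.sum(1)) - Aff
    _, U = np.linalg.eigh(L)                         # eigenvalues in ascending order
    E = U[:, :M]
    # Tiny k-means with deterministic farthest-point initialization.
    centers = [E[0]]
    for _ in range(1, M):
        dist = np.min([((E - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(50):
        labels = np.argmin(((E[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for m in range(M):
            if (labels == m).any():
                centers[m] = E[labels == m].mean(0)
    return labels
```

On rows drawn from well-separated subspaces, the learned W is (near) block-structured, so the Laplacian embedding is constant within each block and the final clustering step is trivial.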
In the noise-free case (and taking \lambda \to 0 in eqn 7), under linear independence assumptions on the subspaces, [13] shows that this procedure recovers the correct subspace assignments.
If the matrix Z is not observed directly but only through the observation model (3), the subspace clustering approach does not directly apply. The observed data Y generated from the model (3) is corrupted by Poisson noise, and furthermore the non-linear link function transforms the union of subspaces into a union of manifolds. We can circumvent these problems using the simple observation that not only Z but also the rows C_{k:} of the loading matrix C lie on a union of subspaces of dimensions d_1, ..., d_M (where the ambient space has dimension d). This can easily be seen from the block-diagonal structure of C (if the neurons are sorted by their true cluster assignments) mentioned in section 2.1. Hence we can use an estimate C̃ of the loading C as input to the subspace clustering optimization (7). In order to get an initial estimate C̃, we can use a variety of dimensionality reduction methods with exp-Poisson observations, e.g. exponential family PCA [14], a nuclear norm based method [15], subspace identification methods [16] and EM-based PLDS learning [16]; here we use the nuclear norm based method [15] for reasons that will become obvious below. Because of the non-identifiability of latent factor models, these methods only yield an estimate of C · D with an unknown, invertible transformation D \in R^{d \times d}. Nevertheless, the rows of C · D still lie on a union of subspaces (which are however not axis-aligned anymore, as is the case for C), and therefore the cluster assignments can still be recovered. Given these cluster assignments, we can get initial estimates of the non-zero rows of C^m by applying nuclear norm minimization to the individual clusters. 
This method also returns a singular value spectrum associated with each subspace, which can be used to determine the dimension d_m: one can specify e.g. a threshold \sigma_min and set d_m to the number of singular values greater than \sigma_min.

2.4 The full parameter estimation algorithm

We briefly summarize the proposed parameter estimation algorithm for the mixPLDS model. The procedure requires the user to define the number of groups M. This choice can either be informed by biological prior knowledge, or one can use standard model selection methods, such as cross-validation on the variational approximation of the marginal likelihood. We first get an initial estimate C̃ of the total loading matrix by nuclear-norm-penalized Poisson dimensionality reduction. Then, subspace clustering on C̃ yields initial group assignments. Based on these assignments, for each cluster we estimate the group dimension d_m and the group loading C̃^m. Keeping the cluster assignments fixed, we do a few VEM steps in the mixPLDS model with the initial estimate of the loading matrix given by (C̃^1, ..., C̃^M). This last step provides reasonable initial values for the parameters A, Q, Q_1, \mu_1 of the dynamical system prior. Finally, we do full VEM iterations in the mixPLDS model to refine the initial parameters. We monitor the increase of the variational lower bound L and use its increments in a termination criterion for the VEM iterations.

2.5 Non-negativity constraints on the loading C

Each component m of the mixPLDS model, representing a subpopulation of neurons, can be a very flexible model by itself (depending on the latent dimension d_m). This flexibility can in some situations lead to counter-intuitive clustering results. Consider the following example. 
Let half of the recorded neurons oscillate in phase, and the remaining neurons oscillate with a phase shift of \pi relative to the first half. Depending on the context, we might be interested in clustering the first and second half of the neurons into separate groups reflecting oscillation phase. The mixPLDS model could however end up putting all neurons into a single cluster, modelling them with one oscillating latent factor that has positive loadings on the first half of the neurons and negative loadings on the second half (or vice versa). We can prevent this behavior by imposing element-wise non-negativity constraints on the loading matrix C, denoted as C \geq 0 (and by simultaneously constraining the latent dimensions of each group). The constraints guarantee that the influence of each factor on its group has the same sign across all neurons. The suitability of these constraints strongly depends on the biological context. In the application of the mixPLDS model in section 3.2, we found them to be essential for obtaining meaningful results.
We modify the subspace clustering initialization to respect the constraints C \geq 0 in the following way. Instead of solving the unconstrained reconstruction problem (7) with respect to W, we add non-negativity constraints W \geq 0. These sign constraints restrict the points that can be reconstructed from a given set of points to the convex cone of these points (instead of the subspace containing these points). Hence, under these assumptions, all data points in a cluster can be approximately reconstructed by a (non-negative) convex combination of some "time-series basis". We empirically observed that this yields initial loading matrix estimates with only very few negative elements (after possible row-wise sign inversions). 
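A minimal sketch of this modified reconstruction step follows; with W >= 0 the l1 proximal operator becomes one-sided, so a single clipped soft-threshold enforces both sparsity and the sign constraint. Solver and parameter values are again illustrative choices, not the paper's implementation.

```python
import numpy as np

def nonneg_sparse_weights(Z, lam=0.1, n_iter=500):
    """Sketch: reconstruction weights for the non-negative variant of eq. (7),
    min_W 0.5*||Z - W Z||_F^2 + lam*||W||_1  s.t. diag(W) = 0, W >= 0,
    solved by projected proximal gradient descent."""
    K = Z.shape[0]
    G = Z @ Z.T
    step = 1.0 / np.linalg.eigvalsh(G).max()
    W = np.zeros((K, K))
    for _ in range(n_iter):
        W = W - step * ((W @ Z - Z) @ Z.T)        # gradient step
        W = np.maximum(W - step * lam, 0.0)       # one-sided soft-threshold + projection
        np.fill_diagonal(W, 0.0)                  # enforce diag(W) = 0
    return W
```

On the anti-phase example above, a trace z and its sign-flipped counterpart -z lie in opposite convex cones: the non-negative weights cannot use -z to reconstruct z, so the two halves end up in separate affinity blocks.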
For the full mixPLDS model we enforce C \geq 0 by the reparametrization C = exp(\chi) and perform VEM updates on \chi.

3 Experiments

3.1 Artificial data

Here we validate the parameter estimation procedure for the mixPLDS model on artificial data. We generated 35 random ground truth mixPLDS models with M = 3, d_1 = d_2 = d_3 = 2 and 20 observed neurons per cluster. From each ground truth model we sampled a data set consisting of 4 i.i.d. trials with T = 250 time steps each. Ground truth parameters were generated such that the resulting data was sparse (12% of the bins non-empty). We compared the ability of different clustering methods to recover the 3 clusters from each data set. We report the results in fig. 1A in terms of the fraction of misclassified neurons (class labels were determined by majority vote in each cluster). We applied K-Means with careful initialization of the cluster centers [17] to the data.

Figure 1: Finding clusters of neurons in artificial data. A: Performance of different clustering algorithms, reported in terms of the frequency of misclassified neurons, on artificial data sampled from ground truth mixPLDS models. Red bars indicate medians and blue boxes the 25% and 75% percentiles. Standard clustering methods (plotted in black) such as K-Means, spectral clustering ("specCl"), and subspace clustering ("subCl") are substantially outperformed by the two methods proposed here (plotted in red). Poisson subspace clustering ("PsubCl") yielded accurate initial cluster estimates that were significantly improved by application of the full mixPLDS model. B: Misclassification rate as a function of the cluster assignment uncertainty for the mixPLDS model. This shows that the posterior over cluster assignments returned by the mixPLDS model is well calibrated, as neurons with low assignment uncertainty are rarely misclassified.

For K-Means, we pre-processed the data in a standard way by smoothing (Gaussian kernel, standard deviation 10 time-steps), mean-centering and scaling (such that each dimension k = 1, ..., K has variance 1). We found K-Means yielded reasonable clusters when all populations are one-dimensional (i.e. d_m = 1 for all m; data not shown), but it fails when clustering multi-dimensional groups of neurons. An alternative approach is to cluster the cross-correlation matrix of neurons (computed from pre-processed data as above) with standard spectral clustering [18]. We found that this approach works well when all the factors have small variances, as in this case the link function of the observation model is only mildly non-linear. However, with growing variances of the factors (larger dynamic ranges of neurons) spectral clustering performance quickly degrades. Standard sparse subspace clustering [13] on the spike trains (pre-processed as above) yielded very similar results to spectral clustering. We found our novel Poisson subspace clustering algorithm proposed in section 2.3 to robustly outperform the other approaches, as long as reasonable amounts of data were available (roughly T > 100 for the above system). The mixPLDS model initialized with Poisson subspace clustering consistently yielded the best results, as it is able to integrate information over time and denoise the observations.
One advantage of the mixPLDS model is that it not only returns cluster assignments for neurons but also provides a measure of uncertainty over these assignments. 
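As a side note on the baselines above, the shared pre-processing (Gaussian smoothing followed by mean-centering and scaling to unit variance) can be sketched as follows, assuming a (neurons x time) array layout:

```python
import numpy as np

def preprocess(Y, kernel_sd=10.0):
    """Baseline pre-processing sketch: smooth each neuron's spike train with a
    Gaussian kernel (sd in time-steps), then mean-center and scale each neuron
    to unit variance. Y has shape (K, T)."""
    radius = int(4 * kernel_sd)
    t = np.arange(-radius, radius + 1)
    kern = np.exp(-0.5 * (t / kernel_sd) ** 2)
    kern /= kern.sum()                               # normalize to unit mass
    S = np.array([np.convolve(y, kern, mode='same') for y in Y])
    S -= S.mean(axis=1, keepdims=True)               # mean-center per neuron
    sd = S.std(axis=1, keepdims=True)
    return S / np.where(sd > 0, sd, 1.0)             # unit variance (guard zeros)
```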
However, variational inference tends to return over-confident posteriors in general, and the factorization approximation (5) might yield posterior uncertainty that is uninformative. To show that the variational posterior uncertainty is well-calibrated, we computed the entropy of the posterior cluster assignment q(s_k) for all neurons as a measure of assignment uncertainty. We binned the neurons according to their assignment uncertainty and report the misclassification rate for each bin in fig. 1B. 89% of the neurons have low posterior uncertainty and reside in the first bin, having a low misclassification rate of \approx 0.1, whereas few neurons (5%) have an assignment uncertainty larger than 0.3 nats, and these are misclassified at a rate of \approx 0.4.

3.2 Calcium imaging of spinal cord neurons

We tested the mixPLDS model on calcium imaging data obtained from an in vitro, neonatal mouse spinal cord that expressed the calcium indicator GCaMP3 in all motor neurons. When an isolated spinal cord is tonically excited by a cocktail of rhythmogenic drugs (5 µM NMDA, 10 µM 5-HT, 50 µM DA), motor neurons begin to fire rhythmically. In this network state, spatially clustered ensembles of motor neurons fire in phase with each other [19]. Since multiple ensembles that have distinct phase tunings can be visualized in a single imaging field, this data represents a convenient setting for testing our algorithm.

Figure 2: Application of the mixPLDS model to recordings from spinal cord neurons. A, top panel: 500 frames of input data to the mixPLDS model. Middle panel: Same data as in upper panel, but rows are sorted by mixPLDS clusters and factor loading. Inferred latent factors (red: cluster 1, blue: cluster 2, solid: factor 1, dashed: factor 2) are also shown. Bottom panel: Inferred (smoothed) firing rates. B: Loading matrix C of the mixPLDS model showing how factors 1, 2 of cluster 1 and factors 3, 4 of cluster 2 influence the neurons. C: Preferred phases shown as a function of (sorted) neuron index and colored by posterior probability of belonging to cluster 1. Clearly visible are two clusters as well as an (approximately) increasing ordering within a cluster.

The data (90 second long movies) were acquired at 15 Hz from a custom two-photon microscope equipped with a resonant scanner (downsampled from 60 Hz to boost SNR). The frequency of the rhythmic activity was typically 0.2 Hz. In addition, aggregate motor neuron activity was simultaneously acquired with each movie using a suction electrode attached to a ventral root. This electrophysiology recording (referred to here as the ephys-trace) was used as an external phase reference point to compute phase tuning curves for imaged neurons, which we used to validate our mixPLDS results.
A deconvolution algorithm [20] was applied to the recorded calcium time-series to estimate the spiking activity of 70 motor neurons. The output of the deconvolution, a 70 × 1140 (neurons × frames) matrix of posterior expected numbers of spikes, was used as input to the mixPLDS model. The non-empty bins of the first 500 of the 1140 frames of input data (thresholded at 0.1) are shown in fig. 2A (upper panel). We used a mixPLDS model with M = 2 groups with two latent dimensions each, i.e. d_1 = d_2 = 2. We imposed the non-negativity constraints C \geq 0 on the loading matrix; these were found to be crucial for finding a meaningful clustering of the neurons, as discussed above. The mixPLDS clustering reveals two groups with strongly periodic but phase-shifted population activities, as can be seen from the inferred latent factors shown in fig. 
2A (middle panel, factors of cluster 1 shown in red, factors of cluster 2 in blue). For each cluster, the model learned a stronger (higher variance) latent factor (solid line) and a weaker one (dashed line); we interpret the former as capturing the main activity structure in a cluster and the latter as describing deviations. Based on the estimated mixPLDS model, we sorted the neurons for visualization into two clusters according to their most likely cluster assignment argmax_{s_k = 1,2} q(s_k). Within each cluster, we sorted the neurons according to the ratio of the loading coefficient onto the stronger factor over the loading onto the weaker factor. Re-plotting the spike-raster with this sorting in fig. 2A (middle panel) reveals interesting structure. First, it shows that the initial choice of two clusters was well justified for this data set. Second, the sorting reveals that the majority of neurons tend to fire at a preferred phase relative to the oscillation cycle, and the mixPLDS-based sorting corresponds to an increasing ordering of preferred phases. Fig. 2B shows the loading matrix C of the mixPLDS, which is found to be approximately block-diagonal.
On this data set we also have the opportunity to validate the unsupervised clustering by taking into account the simultaneously recorded ephys-trace. We computed for each neuron a phase tuning curve based on the ephys-trace history of the last 80 time steps (estimated via L2-regularized generalized linear model estimation, with an exp-Poisson observation model). For each neuron, we extracted the peak location of this phase tuning curve, which we call the preferred phase. Fig. 
2C shows these preferred phases as a function of (sorted) neuron index, revealing that the two clusters found by the mixPLDS model coincide well with the two modes of the bimodal distribution of preferred phases. Furthermore, within each cluster, the preferred phases are (approximately) increasing, showing that the mixPLDS sorting of neurons reflects the phase relation of the neurons to the global, oscillatory ephys-trace. We emphasize that the latter was not used for fitting the mixPLDS; i.e., this constitutes an independent validation of our results.
We conclude that the mixPLDS model successfully uncovered clustering structure from the recordings that can be validated using the side information from electrophysiological tuning, and furthermore allowed for a meaningful sorting within each cluster that captures neural response properties. In addition, the mixPLDS model leverages the temporal structure in the recordings, automatically optimizing the temporal smoothness level and revealing the main time constants in the data (1.8 and 6.5 s in the above data set) as well as the main oscillation frequencies (0.2 and 0.45 Hz). Furthermore, either the latent trajectories or the inferred firing rates shown in fig. 2A can be used as smoothed proxies for their corresponding population activities in subsequent analyses.

4 Discussion

One can generalize the mixPLDS model in several ways. Here we assumed that, given the latent factors, all neurons fire independently. This is presumably a good assumption if the recorded neurons are spatially distant, but it might break down if neurons are densely sampled from a local population and have strong, monosynaptic connections. This more general case can be accounted for by incorporating direct interaction terms between neurons into the observation model, in the spirit of coupled GLMs (see [21]); inference and parameter learning remain tractable in this model using VEM.
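As a minimal illustration of this coupled-GLM-style extension (our own sketch with made-up dimensions and parameters, not the paper's implementation), the observation model below adds a direct spike-history coupling term between neurons on top of the latent-factor drive:

```python
import numpy as np

# Sketch of a coupled-GLM-style observation model (illustrative only): neuron
# n's rate at time t depends on the latent factors z_t, as in the mixPLDS,
# plus a direct coupling term on the other neurons' spike history at t-1,
# in the spirit of Pillow et al. [21]. All sizes here are hypothetical.
rng = np.random.default_rng(0)
N, D, T = 5, 2, 200                      # neurons, latent dims, time steps

C = rng.normal(0, 0.3, size=(N, D))      # loading matrix (factor drive)
W = rng.normal(0, 0.05, size=(N, N))     # direct coupling weights
np.fill_diagonal(W, 0.0)                 # no self-coupling in this sketch
b = np.full(N, -1.0)                     # baseline log-rates

# Smooth latent factors, stand-in for the mixPLDS linear dynamical system.
z = np.cumsum(rng.normal(0, 0.1, size=(T, D)), axis=0)

y = np.zeros((T, N), dtype=int)
for t in range(1, T):
    log_rate = C @ z[t] + W @ y[t - 1] + b   # factors + one-step spike history
    y[t] = rng.poisson(np.exp(log_rate))
print(y.shape)
```

Given the latent factors, the neurons are no longer conditionally independent here; the extra term W only changes the observation model, which is why, as noted above, variational EM inference carries over.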
Furthermore, in addition to the activity recordings, one might have access to other covariates that are informative about the clustering structure of the population, such as cell location, genetic markers, or cell morphology. Such data can be added as additional observations to the mixPLDS model to facilitate clustering of the cells. An especially relevant example is the stimulus-response properties of cells: we can add a mixture model over receptive-field parameters using the cluster assignments s. This extension would provide a clustering of neurons based on their joint activity statistics (such as shared trial-to-trial variability) as well as on their receptive-field properties.
We presented three technical contributions that we expect to be useful outside the context of the mixPLDS model. First, we proposed a simple extension of the sparse subspace clustering algorithm to Poisson observations. We showed that if the dimension of the union of subspaces is much smaller than the ambient dimension, our method substantially outperforms other approaches. Second, we introduced a version of subspace clustering with non-negativity constraints on the reconstruction weights, which therefore clusters points into convex cones. We expect this variant to be particularly useful when clustering activity traces of cells, as it allows anti-phasic oscillations to be separated. Third, we applied the dual variational inference approach of [11] to a model with a Markovian prior and with mixtures of Poisson observations.
The resulting inference method proved numerically robust, and we expect it to be a valuable tool for analyzing time series of sparse count variables.

Acknowledgements This work was supported by the Simons Foundation (SCGB#325171 and SCGB#325233), the Grossman Center at Columbia University, and the Gatsby Charitable Trust, as well as by grants MURI W911NF-12-1-0594 from the ARO, N00014-14-1-0243 from the ONR, W91NF-14-1-0269 from DARPA, and an NSF CAREER award (L.P.).

References
[1] Anne C Smith and Emery N Brown. Estimating a state-space model from point process observations. Neural Computation, 15(5):965–991, 2003.
[2] Lauren M Jones, Alfredo Fontanini, Brian F Sadacca, Paul Miller, and Donald B Katz. Natural stimuli evoke dynamic sequences of states in sensory cortical ensembles. Proceedings of the National Academy of Sciences, 104(47):18772–18777, 2007.
[3] Jakob H Macke, Lars Buesing, John P Cunningham, Byron M Yu, Krishna V Shenoy, and Maneesh Sahani. Empirical models of spiking in neural populations. In NIPS, pages 1350–1358, 2011.
[4] Byron M Yu, John P Cunningham, Gopal Santhanam, Stephen I Ryu, Krishna V Shenoy, and Maneesh Sahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In NIPS, pages 1881–1888, 2008.
[5] Murat Okatan, Matthew A Wilson, and Emery N Brown. Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17(9):1927–1961, 2005.
[6] Yuriy Mishchenko, Joshua T Vogelstein, Liam Paninski, et al. A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. The Annals of Applied Statistics, 5(2B):1229–1261, 2011.
[7] Suraj Keshri, Eftychios Pnevmatikakis, Ari Pakman, Ben Shababo, and Liam Paninski.
A shotgun sampling solution for the common input problem in neural connectivity inference. arXiv preprint arXiv:1309.3724, 2013.
[8] Felipe Gerhard, Tilman Kispersky, Gabrielle J Gutierrez, Eve Marder, Mark Kramer, and Uri Eden. Successful reconstruction of a physiological circuit with known connectivity from spiking activity alone. PLoS Computational Biology, 9(7):e1003138, 2013.
[9] Michael E Tipping and Christopher M Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443–482, 1999.
[10] Zoubin Ghahramani, Geoffrey E Hinton, et al. The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, 1996.
[11] Mohammad Emtiyaz Khan, Aleksandr Aravkin, Michael Friedlander, and Matthias Seeger. Fast dual variational inference for non-conjugate latent Gaussian models. In Proceedings of the 30th International Conference on Machine Learning, pages 951–959, 2013.
[12] René Vidal. A tutorial on subspace clustering. IEEE Signal Processing Magazine, 28(2):52–68, 2010.
[13] Ehsan Elhamifar and René Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, Nov 2013.
[14] Michael Collins, Sanjoy Dasgupta, and Robert E Schapire. A generalization of principal component analysis to the exponential family. In NIPS, volume 13, page 23, 2001.
[15] David Pfau, Eftychios A Pnevmatikakis, and Liam Paninski. Robust learning of low-dimensional dynamics from large neural ensembles. In NIPS, pages 2391–2399, 2013.
[16] Lars Buesing, Jakob H Macke, and Maneesh Sahani. Spectral learning of linear dynamics from generalised-linear observations with application to neural population data. In NIPS, pages 1691–1699, 2012.
[17] David Arthur and Sergei Vassilvitskii.
k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
[18] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, 14:849–856. Cambridge, MA: MIT Press, 2001.
[19] Timothy A. Machado, Eftychios Pnevmatikakis, Liam Paninski, Thomas M. Jessell, and Andrew Miri. Functional organization of spinal motor neurons revealed by ensemble imaging. In 79th Cold Spring Harbor Symposium on Quantitative Biology: Cognition, 2014.
[20] E. A. Pnevmatikakis, Y. Gao, D. Soudry, D. Pfau, C. Lacefield, K. Poskanzer, R. Bruno, R. Yuste, and L. Paninski. A structured matrix factorization framework for large scale calcium imaging data analysis. arXiv e-prints, September 2014.
[21] Jonathan W Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M Litke, EJ Chichilnisky, and Eero P Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995–999, 2008.