{"title": "Learning a latent manifold of odor representations from neural responses in piriform cortex", "book": "Advances in Neural Information Processing Systems", "page_first": 5378, "page_last": 5388, "abstract": "A major difficulty in studying the neural mechanisms underlying olfactory perception is the lack of obvious structure in the relationship between odorants and the neural activity patterns they elicit. Here we use odor-evoked responses in piriform cortex to identify a latent manifold specifying latent distance relationships between olfactory stimuli. Our approach is based on the Gaussian process latent variable model, and seeks to map odorants to points in a low-dimensional embedding space, where distances between points in the embedding space relate to the similarity of population responses they elicit. The model is specified by an explicit continuous mapping from a latent embedding space to the space of high-dimensional neural population firing rates via nonlinear tuning curves, each parametrized by a Gaussian process. Population responses are then generated by the addition of correlated, odor-dependent Gaussian noise. We fit this model to large-scale calcium fluorescence imaging measurements of population activity in layers 2 and 3 of mouse piriform cortex following the presentation of a diverse set of odorants. The model identifies a low-dimensional embedding of each odor, and a smooth tuning curve over the latent embedding space that accurately captures each neuron's response to different odorants. The model captures both signal and noise correlations across more than 500 neurons. We validate the model using a cross-validation analysis known as co-smoothing to show that the model can accurately predict the responses of a population of held-out neurons to test odorants.", "full_text": "Learning a latent manifold of odor representations\n\nfrom neural responses in piriform cortex\n\nAnqi Wu1\n\nStan L. 
Pashkovski2\n\nSandeep Robert Datta2\n\nJonathan W. Pillow1\n\n1 Princeton Neuroscience Institute, Princeton University, {anqiw, pillow}@princeton.edu\n\n2 Department of Neurobiology, Harvard Medical School, {pashkovs, srdatta}@hms.harvard.edu\n\nAbstract\n\nA major difficulty in studying the neural mechanisms underlying olfactory perception is the lack of obvious structure in the relationship between odorants and the neural activity patterns they elicit. Here we use odor-evoked responses in piriform cortex to identify a latent manifold specifying latent distance relationships between olfactory stimuli. Our approach is based on the Gaussian process latent variable model, and seeks to map odorants to points in a low-dimensional embedding space, where distances between points in the embedding space relate to the similarity of population responses they elicit. The model is specified by an explicit continuous mapping from a latent embedding space to the space of high-dimensional neural population firing rates via nonlinear tuning curves, each parametrized by a Gaussian process. Population responses are then generated by the addition of correlated, odor-dependent Gaussian noise. We fit this model to large-scale calcium fluorescence imaging measurements of population activity in layers 2 and 3 of mouse piriform cortex following the presentation of a diverse set of odorants. The model identifies a low-dimensional embedding of each odor, and a smooth tuning curve over the latent embedding space that accurately captures each neuron's response to different odorants. The model captures both signal and noise correlations across more than 500 neurons. 
We validate the model using a cross-validation analysis known as co-smoothing to show that the model can accurately predict the responses of a population of held-out neurons to test odorants.\n\n1 Introduction\n\nOdorants are physically described by thousands of features in a high-dimensional chemical feature space. Previous studies have focused on reducing the dimensionality of this chemical feature space [1], or on identifying dimensions of olfactory perceptual space using psychophysical measurements in humans [2, 3]. However, the dimensions of olfactory space underlying neural representations in the brain remain largely unknown. Here we take a latent variable modeling approach to the problem of identifying a low-dimensional manifold of olfactory stimuli. Our approach is unsupervised in that it makes no use of chemical features, but seeks to identify a latent embedding of odorants from measurements of odor-evoked neural population activity in mouse piriform cortex. This approach aims to provide insight into odor encoding in the brain by identifying an olfactory space that relates smoothly to changes in large-scale neural firing patterns.\n\nRecent work in computational neuroscience has focused on the development of sophisticated model-based methods for identifying low-dimensional latent manifolds underlying neural population activity [4-12]. Here we extend such methods to the problem of neural coding in the olfactory system.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nSpecifically, we develop a Gaussian process based latent variable model (GPLVM) [13] for identifying latent structure underlying population activity in the olfactory cortex. The model is defined by a latent olfactory space, which serves as a low-dimensional embedding space. 
This latent space seeks to preserve the similarity relationships between odors on the basis of similarities in evoked neural activity patterns. The latent olfactory space is mapped to the space of high-dimensional neural activity patterns via a set of nonlinear tuning curves, one for each neuron, each governed by a Gaussian process prior. The output of these tuning curves specifies a vector of mean responses to an odorant, and we model the neural activity patterns as Gaussian with a low-rank plus diagonal covariance, modulated by an odor-dependent scaling factor. This results in a matrix normal model of the population response across odorants, defined by a diagonal odorant covariance and a low-rank plus diagonal neuron covariance matrix. The main novelty of this work from a modeling perspective consists of extending the GPLVM to incorporate structured noise for capturing correlated, odor-dependent variability in multi-trial population responses to repeated stimuli. Although we have applied it here to the piriform cortex, we feel that this model could be used to gain insights into the latent organization of neural population activity in a wide variety of other brain areas where coding is mixed or poorly understood, e.g., prefrontal cortex [14, 15], parietal cortex [16-18], or entorhinal cortex [19].\n\nIn the following, we formulate the multi-trial Gaussian process latent variable model for correlated neural activity (Sec. 2) and describe an efficient variational expectation maximization (EM) inference method based on black-box variational inference (Sec. 3). We then describe a validation procedure based on co-smoothing, in which we predict the response of a subset of the neural population to a test odor using the tuning curves and the latent embeddings estimated from training data (Sec. 4). 
We validate our model and inference methodology using a simulated experiment, which reveals that repeated stimulus presentations are necessary to obtain accurate estimates of the structured noise covariance (Sec. 5). Finally (Sec. 6), we apply the model to multiple multi-neuron recordings of population activity from layer 2 (L2) and layer 3 (L3) mouse piriform cortex, each with more than 500 simultaneously recorded neurons. The model allows us to infer a low-dimensional embedding of 66 odorants, smooth, low-dimensional neural tuning curves that account for the mean response of each neuron across odorants, and covariance matrices that account for both signal and noise correlations in neural activity patterns across neurons and odorants.\n\n2 Multi-trial Gaussian process latent variable with structured noise\n\nWe consider simultaneously measured calcium fluorescence imaging responses from N neurons in response to D distinct odorants, each presented T times. Let $Y \in \mathbb{R}^{T \times D \times N}$ denote the tensor of neural responses, with neurons indexed by $n \in \{1, ..., N\}$, odorants indexed by $d \in \{1, ..., D\}$ and repeats indexed by $t \in \{1, ..., T\}$. Our goal is to build a generative model characterizing a low-dimensional latent structure underlying these data, and we assume each odor is associated with a latent variable $x_d \in \mathbb{R}^{P \times 1}$ in a $P$-dimensional latent space.\n\nLatent space: Let $X = [x_1, ..., x_D]^\top \in \mathbb{R}^{D \times P}$ denote the matrix of latent locations for the D odorants in a P-dimensional latent embedding space. Let $x_p$ denote the p'th column of $X$, which carries the embedding location of all odorants along the p'th latent dimension. We place a standard normal prior on the embedding locations, $x_p \sim \mathcal{N}(0, I_D)$ for all $p$, reflecting our lack of prior information from the chemical descriptors for each odorant.\n\nNonlinear latent tuning curves: Let $f : \mathbb{R}^{P \times 1} \rightarrow \mathbb{R}$ denote a nonlinear function mapping from the latent space of odorant embeddings $\{x_d\}$ to a single neuron's firing rate. These functions differ from traditional tuning curves in that their input is the latent (unobserved) vector $x_d$ of an odorant, as opposed to an observable stimulus feature (e.g., the orientation of a visual grating, or chemical features of an odorant). Let $f_n(x)$ denote the tuning curve for the n'th neuron, which we parametrize with a Gaussian process (GP) prior:\n\n$f_n(x) \sim \mathcal{GP}(m(x), k(x, x')), \quad n = \{1, ..., N\}$ (1)\n\nwhere $m(x) = b_n^\top x$ is a linear mean function with weights $b_n$, and $k(x, x')$ is a covariance function that governs smoothness of the tuning curve over its P-dimensional input latent space. We use the Gaussian or radial basis function (RBF) covariance function: $k(x, x') = \rho \exp\left(-\|x - x'\|_2^2 / (2\lambda^2)\right)$, where $x$ and $x'$ are arbitrary points in the latent space, $\rho$ is the marginal variance and $\lambda$ is the length scale controlling smoothness of the latent tuning curve.\n\nFigure 1: Schematic diagram of the multi-trial Gaussian process latent variable with structured noise. [Panels: 2D latent odor locations; 2D tuning curves; firing rates; odor noise covariance; neural noise covariance; neural recordings.]\n\nLet $f_n \in \mathbb{R}^{D \times 1}$ denote a vector of firing rates for neuron $n$ in response to the D odorants, with the d'th element equal to $f_n(x_d)$. The GP prior over $f_n(\cdot)$ implies that $f_n$ has a multivariate normal distribution given $X$:\n\n$f_n \mid X \sim \mathcal{N}(m_n, K), \quad n = \{1, ..., N\}$ (2)\n\nwhere $m_n$ is a $D \times 1$ mean vector for neuron $n$, and $K$ is a $D \times D$ covariance matrix generated by evaluating the covariance function $k(\cdot,\cdot)$ at all pairs of rows in $X$. 
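To make the generative assumptions in eqs. (1)-(2) concrete, the following NumPy sketch (illustrative only, not the authors' code; the sizes and the values of $\rho$ and $\lambda$ are arbitrary choices) samples latent odor locations, builds the RBF Gram matrix $K$ over them, and draws i.i.d. zero-mean GP tuning-curve vectors $f_n \sim \mathcal{N}(0, K)$:

```python
import numpy as np

def rbf_kernel(X, Xp, rho=1.0, lam=1.0):
    """RBF covariance k(x, x') = rho * exp(-||x - x'||^2 / (2 lam^2))."""
    sq = ((X[:, None, :] - Xp[None, :, :]) ** 2).sum(-1)
    return rho * np.exp(-sq / (2 * lam ** 2))

rng = np.random.default_rng(0)
D, N, P = 20, 50, 2                      # odors, neurons, latent dims
X = rng.standard_normal((D, P))          # latent odor locations, x_d ~ N(0, I)
K = rbf_kernel(X, X) + 1e-6 * np.eye(D)  # D x D Gram matrix (jitter for stability)
# each neuron's firing-rate vector f_n ~ N(0, K), sampled i.i.d. across neurons
F = np.linalg.cholesky(K) @ rng.standard_normal((D, N))
```

Each column of `F` plays the role of one neuron's (zero-mean) tuning curve evaluated at the D latent locations.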
We assume the mean vector to be $m_n = X b_n$, with weights $b_n \in \mathbb{R}^{P \times 1}$ giving a linear mapping of the P-dimensional latent representation for the mean of the firing rate vector $f_n$. If we assume a prior distribution over $b_n$: $p(b_n) = \mathcal{N}(0, \alpha^{-1} I_P)$ for $n = \{1, ..., N\}$ with $\alpha$ as the precision, we can integrate over $b_n$ to get the distribution of $f_n$ conditioned on $X$ only:\n\n$f_n \mid X \sim \mathcal{N}(0, K + \alpha^{-1} X X^\top), \quad n = \{1, ..., N\}$ (3)\n\nwhere the covariance is a mixture of a linear kernel and a nonlinear RBF kernel. The precision $\alpha$ plays the role of a trade-off parameter between the two kernels. For simplicity, we will denote $K + \alpha^{-1} X X^\top$ as $K$ in the following sections, and we will differentiate the RBF kernel and the mixture kernel in the experimental section. Horizontally stacking $f_n$ for N neurons, we get a firing rate matrix $F \in \mathbb{R}^{D \times N}$ with $f_n$ in the n'th column. Letting $\tilde{f} = \mathrm{vec}(F)$ be the vectorized $F$, we can write the prior for $\tilde{f}$ as\n\n$\tilde{f} \sim \mathcal{N}(0, I_N \otimes K)$ (4)\n\nObservation model: For each repeat in the olfaction dataset, we have the neural population response to all odors, denoted as $Y_t \in \mathbb{R}^{D \times N}$. Instead of taking the average over $\{Y_t\}_{t=1}^T$ and modeling the averaged neural response as a noise-corrupted $F$, we use all the repeats to estimate the latent variables and noise covariance. First we collapse the neuron dimension and odor dimension together to formulate a 2D matrix $\tilde{Y} \in \mathbb{R}^{T \times (DN)}$, with row vectors $\{\tilde{y}_t \in \mathbb{R}^{(DN) \times 1}\}_{t=1}^T$. Given the vectorized firing rate $\tilde{f}$, the rows $\{\tilde{y}_t\}_{t=1}^T$ are i.i.d. samples from\n\n$\tilde{y}_t \mid \tilde{f} \sim \mathcal{N}(\tilde{f}, \Sigma), \quad t = \{1, ..., T\}$ (5)\n\nwhere $\Sigma \in \mathbb{R}^{(DN) \times (DN)}$ is the noise covariance matrix. When $\Sigma$ is a diagonal matrix, the model implies the observed response $y_{t,d,n} = f_{d,n} + \epsilon_{t,d,n}$ with $\epsilon_{t,d,n} \sim \mathcal{N}(0, \sigma^2_{d,n})$ for the n'th neuron and d'th odor in repeat t. 
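The marginalization behind eq. (3) can be checked numerically: sampling $f = g + X b$ with $g \sim \mathcal{N}(0, K)$ and $b \sim \mathcal{N}(0, \alpha^{-1} I_P)$ should yield empirical covariance $K + \alpha^{-1} X X^\top$. A Monte Carlo sketch under toy sizes (all values illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
D, P, alpha = 6, 2, 4.0
X = rng.standard_normal((D, P))
# stand-in RBF Gram matrix over the latent locations (rho = lam = 1)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

# marginalizing b ~ N(0, alpha^{-1} I_P) out of f | b ~ N(X b, K)
# gives f ~ N(0, K + alpha^{-1} X X^T): the "mixture" kernel of eq. (3)
K_mix = K + X @ X.T / alpha

# Monte Carlo estimate of the marginal covariance; emp should approach K_mix
L = np.linalg.cholesky(K + 1e-9 * np.eye(D))
S = 200_000
f = L @ rng.standard_normal((D, S)) + X @ (rng.standard_normal((P, S)) / np.sqrt(alpha))
emp = f @ f.T / S
```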
When we assume that noise correlations exist across multiple neuron and odor pairs, $\Sigma$ is a non-diagonal matrix. In the olfaction dataset, there is a very small number of repeats but a large neural population, so $\tilde{Y}$ lies in a small-sample, high-dimensional regime. Such a dataset is insufficient to support estimating the parameters of a full-rank $\Sigma$ matrix. Moreover, inverting $\Sigma$ requires $O(D^3 N^3)$ computation, prohibiting efficient inference when $N$ is large. Therefore, our solution is to model the noise covariance matrix with a Kronecker structure, i.e., $\Sigma = \Sigma_N \otimes \Sigma_D$, where $\Sigma_N$ is the noise covariance across neurons and $\Sigma_D$ is the noise covariance across odors. Fig. 1 provides a schematic of the model. When applying the multi-trial GPLVM to the olfactory data, each repeat of presentations of all odorants is one trial to fit the model.\n\nMarginal distribution over F: Since we have normal distributions for both the data likelihood (eq. 5) and the prior for $\tilde{f}$ (eq. 4), we can marginalize out $\tilde{f}$ to derive the evidence for $X$. There are multiple ways of deriving the integration. Here, we provide one formulation consisting of multiple multivariate normal distributions, treating the mean and the cross-trial information as random variables:\n\n$p(\tilde{y}_1, ..., \tilde{y}_T \mid K) = \mathcal{N}\Big(\tfrac{1}{\sqrt{T}} \sum_{j=1}^T \tilde{y}_j \,\Big|\, 0,\; \Sigma + T (I_N \otimes K)\Big) \prod_{t=1}^{T-1} \mathcal{N}\Big(\tfrac{1}{\sqrt{t(t+1)}} \sum_{j=1}^t \tilde{y}_j - \sqrt{\tfrac{t}{t+1}}\, \tilde{y}_{t+1} \,\Big|\, 0,\; \Sigma\Big)$ (6)\n\nMore derivation details can be found in the supplement (Appendix A). The evidence distribution consists of two parts: 1) normal distributions for the cross-trial random variables, with the noise covariance as their covariance, and 2) a normal distribution for the average of all repeats, with covariance formed as the sum of the noise covariance and the GP prior covariance. For single-trial data, the evidence distribution is reduced to the first normal distribution only in eq. 6, which is insufficient for estimating a full noise covariance with a Kronecker structure as well as a kernel matrix. Therefore, the cross-trial statistics should be considered for structured noise estimation.\n\n3 Efficient variational inference\n\nGiven the evidence in eq. 6 and the normal prior for $X$, we estimate the latent variable $X$ in $K$ and the model parameters consisting of the noise covariance $\Sigma$ and the hyperparameters of the kernel function. The joint distribution is written as\n\n$p(Y, X \mid \Sigma, \theta) = p(Y \mid X, \Sigma, \theta)\, p(X)$ (7)\n\nwhere $\theta = \{\rho, \lambda\}$ is the hyperparameter set, references to which will now be suppressed for simplification. This is a Gaussian process latent variable model (GPLVM) with multi-trial Gaussian observations and structured noise covariance. Due to the non-conjugacy of the data distribution and the prior over $X$, we employ a variational distribution to approximate the posterior over the latent variable using Black Box Variational Inference (BBVI) [20], and we optimize both the latent variable and model parameters using a variational Expectation-Maximization (EM) algorithm. More details can be found in the supplement (Appendix B).\n\nIn the E-step, we need to evaluate the log marginal likelihood of eq. 6 and compute the inverses of $(DN) \times (DN)$ covariance matrices, which is the computational bottleneck of the evaluation. However, we can evaluate it efficiently using properties of the Kronecker product. For the noise-only normal distributions, the covariance $\Sigma = \Sigma_N \otimes \Sigma_D$ is a Kronecker product of two smaller matrices. The inverse of $\Sigma$ is $\Sigma^{-1} = \Sigma_N^{-1} \otimes \Sigma_D^{-1}$, and the log determinant is $\log |\Sigma| = N \log |\Sigma_D| + D \log |\Sigma_N|$. For the normal distribution with both latent variable and noise, the covariance matrix is a sum of two Kronecker products. In general, efficient evaluation can be carried out for such a formulation. 
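The two Kronecker identities used in the E-step, $(\Sigma_N \otimes \Sigma_D)^{-1} = \Sigma_N^{-1} \otimes \Sigma_D^{-1}$ and $\log|\Sigma_N \otimes \Sigma_D| = N \log|\Sigma_D| + D \log|\Sigma_N|$, can be verified with a small NumPy sketch (toy sizes; `random_spd` is an illustrative helper, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(n):
    """Random symmetric positive-definite matrix (well conditioned)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

D, N = 4, 5
Sigma_D, Sigma_N = random_spd(D), random_spd(N)
Sigma = np.kron(Sigma_N, Sigma_D)        # (DN) x (DN) structured noise covariance

# inverse of a Kronecker product = Kronecker product of the inverses
inv_direct = np.linalg.inv(Sigma)
inv_kron = np.kron(np.linalg.inv(Sigma_N), np.linalg.inv(Sigma_D))

# log|Sigma_N (x) Sigma_D| = N log|Sigma_D| + D log|Sigma_N|
logdet_direct = np.linalg.slogdet(Sigma)[1]
logdet_kron = (N * np.linalg.slogdet(Sigma_D)[1]
               + D * np.linalg.slogdet(Sigma_N)[1])
```

Only the small $D \times D$ and $N \times N$ factors ever need to be inverted or decomposed.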
The key idea is to transform the sum of two full matrices into one full matrix plus a diagonal matrix, and then invert the sum using eigenvalue decomposition.\n\nLet $\Sigma_D = U_D \Lambda_D U_D^\top$ and $\Sigma_N = U_N \Lambda_N U_N^\top$ be the eigendecompositions of $\Sigma_D$ and $\Sigma_N$. The covariance matrix $C$ can be factorized as\n\n$C = T (I_N \otimes K) + \Sigma_N \otimes \Sigma_D = \big(U_N \Lambda_N^{1/2} \otimes U_D \Lambda_D^{1/2}\big)\big(T (\Lambda_N^{-1} \otimes \tilde{K}) + I_N \otimes I_D\big)\big(\Lambda_N^{1/2} U_N^\top \otimes \Lambda_D^{1/2} U_D^\top\big)$ (8)\n\nwhere we define $\tilde{K} = \Lambda_D^{-1/2} U_D^\top K U_D \Lambda_D^{-1/2}$ and $\tilde{C} = T (\Lambda_N^{-1} \otimes \tilde{K}) + I_N \otimes I_D$. The complexity of inverting the first and third factors in eq. 8 is $O(D^3 + N^3)$. The bottleneck is now inverting the middle factor; the problem is thus reduced to inverting the matrix $\tilde{C}$. The second step is to exploit the compatibility of a Kronecker product plus a constant diagonal term with eigenvalue decomposition. Let $T \Lambda_N^{-1} = U_T \Lambda_T U_T^\top$ and $\tilde{K} = U_K \Lambda_K U_K^\top$ be the eigendecompositions of $T \Lambda_N^{-1}$ and $\tilde{K}$. Thus,\n\n$\tilde{C} = T (\Lambda_N^{-1} \otimes \tilde{K}) + I_N \otimes I_D = (U_T \otimes U_K)(\Lambda_T \otimes \Lambda_K + I_N \otimes I_D)(U_T^\top \otimes U_K^\top)$ (9)\n\nFinally, combining eq. 8 and eq. 9, we get\n\n$C = \big(U_N \Lambda_N^{1/2} \otimes U_D \Lambda_D^{1/2}\big)(U_T \otimes U_K)(\Lambda_T \otimes \Lambda_K + I_N \otimes I_D)(U_T^\top \otimes U_K^\top)\big(\Lambda_N^{1/2} U_N^\top \otimes \Lambda_D^{1/2} U_D^\top\big)$ (10)\n\nInverting $C$ now has only $O(D^3 + N^3)$ computational complexity instead of $O(D^3 N^3)$. More detailed derivations can be found in the supplement (Appendix C). 
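This factorization can be sanity-checked numerically. The sketch below (toy sizes; `random_spd` is an illustrative helper, and since $T\Lambda_N^{-1}$ is already diagonal, $U_T = I$) reconstructs $C$ and its inverse from eigendecompositions of the small matrices only:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_spd(n):
    """Random symmetric positive-definite matrix (well conditioned)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

D, N, T = 4, 5, 10
K, Sigma_D, Sigma_N = random_spd(D), random_spd(D), random_spd(N)
C = T * np.kron(np.eye(N), K) + np.kron(Sigma_N, Sigma_D)   # sum of two Kroneckers

# eigendecompose only the small D x D and N x N matrices
lam_D, U_D = np.linalg.eigh(Sigma_D)
lam_N, U_N = np.linalg.eigh(Sigma_N)
Kt = (U_D / np.sqrt(lam_D)).T @ K @ (U_D / np.sqrt(lam_D))  # K tilde (whitened K)
lam_K, U_K = np.linalg.eigh(Kt)

A = np.kron(U_N * np.sqrt(lam_N), U_D * np.sqrt(lam_D))     # outer factor of eq. (8)
Q = np.kron(np.eye(N), U_K)                                 # inner rotation (U_T = I)
d = (T / lam_N)[:, None] * lam_K[None, :] + 1.0             # eigenvalues of C tilde

A_inv = np.kron((U_N / np.sqrt(lam_N)).T, (U_D / np.sqrt(lam_D)).T)
C_inv = A_inv.T @ Q @ np.diag(1.0 / d.ravel()) @ Q.T @ A_inv
```

The expensive object `C_inv` is obtained without ever inverting a $(DN) \times (DN)$ matrix directly.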
With this efficient evaluation of the log conditional likelihood, we can run BBVI quickly in the E-step to learn the optimal approximate posterior $q(X \mid \phi^\dagger) \approx p(X \mid Y, \Sigma, \theta)$ given a fixed set of $\Sigma$ and $\theta$, with $\phi^\dagger$ as the optimal approximation parameters.\n\nIn the M-step, model parameters are optimized using the ELBO given the optimal variational distribution learned in the E-step:\n\n$\Sigma^\dagger, \theta^\dagger = \mathrm{argmax}_{\Sigma, \theta}\; \mathbb{E}_{q(X \mid \phi^\dagger)}\left[\log p(Y \mid X, \Sigma, \theta)\right]$ (11)\n\nwhere the expectation can also be approximated by Monte Carlo integration.\n\nAfter the optimization, we can derive the posterior distribution of the firing rates $F$ given the neural response $Y$ and the optimal $X$, $\Sigma$ and $\theta$ as\n\n$p(F \mid Y, X, \Sigma, \theta) = \mathcal{N}\Big(\tilde{f} \,\Big|\, (I_N \otimes K)\big(\Sigma + T (I_N \otimes K)\big)^{-1} \sum_{t=1}^T \tilde{y}_t,\; (I_N \otimes K)\big(\Sigma + T (I_N \otimes K)\big)^{-1} \Sigma\Big)$ (12)\n\nSimilar to the evaluation in the E-step, the posterior mean of the firing rates can be calculated efficiently using the same Kronecker trick as in eq. 10.\n\n4 Prediction with co-smoothing\n\nWe propose a model to learn latent representations for odors, tuning curves for neurons, and a structured noise covariance from multi-trial neural responses. Next, we employ a co-smoothing idea to evaluate its performance. The question to ask is: when presenting an unseen odor to a neural population, can we use the partially observed neurons' responses to learn the odor's latent representation, and then predict the neural responses of the unobserved neurons given their tuning curves and the latent representation?\n\nFiring rate prediction: We first use the training odors to estimate the firing rates and the latent representations of these training odors as shown in Sec. 3. For a new odor, we collect some repeats of neural responses from a partially observed neural ensemble $Y^*_o \in \mathbb{R}^{T \times 1 \times N_o}$, where T is the number of repeats, $N_o$ is the number of observed neurons and $*$ indicates the test odor. 
We use $Y^*_o$ as well as the optimal firing rates $F$ and latent variables $X$ to estimate the latent representation $x^*$ for the test odor. We use the same variational EM algorithm to learn $q(x^*) \approx p(x^* \mid Y^*_o, Y, X, \Sigma, \theta)$ by fixing the latent variables and noise covariance from the training data as well as the hyperparameters, while updating the latent variable and noise variance related to the test odor. Finally, the predictive firing rate for the test odor from the partially unobserved neural ensemble, denoted as $F^*_u \in \mathbb{R}^{N_u \times 1}$ with $N_u$ as the number of unobserved neurons, is calculated as\n\n$F^*_u = (\Pi_{N_u,N} \otimes K_*)\big(\Sigma + T (I_N \otimes K)\big)^{-1} \sum_{t=1}^T \tilde{y}_t$ (13)\n\nwhere $K_* \in \mathbb{R}^{1 \times D}$ is the kernel matrix evaluated between the test odor's latent representation $x^*$ and the training odors' latent representations $X$, and $\Pi_{N_u,N} \in \mathbb{R}^{N_u \times N}$ is a zero-one matrix indicating the indices of the unobserved neurons in the entire neural ensemble. We can also calculate the firing rates for the observed neurons $F^*_o \in \mathbb{R}^{N_o \times 1}$ using an expression similar to eq. 13. For experimental evaluation purposes, we can compare the predictive firing rate $F^*_u$ with the averaged true response $\frac{1}{T}\sum_{t=1}^T Y^*_{u,t}$. We will show the firing rate prediction in the olfaction data experiment.\n\nSingle-trial neural activity prediction: When the number of repeats is large enough to render a mean response resembling the underlying firing rate, single-trial and trial-average models can both provide good estimates of the latent variables and firing rates for test odors using the co-smoothing approach. The advantage of our multi-trial model will thus be less apparent when only evaluating the predictive performance for firing rates with many repeats. 
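The core of the firing-rate prediction is a GP posterior mean evaluated through the cross-kernel $K_*$ between the test latent and the training latents. A deliberately simplified sketch (assuming independent homoscedastic noise $\sigma^2 I$ rather than the Kronecker-structured $\Sigma$ of the model, and with all sizes and names illustrative):

```python
import numpy as np

def rbf(A, B, rho=1.0, lam=1.0):
    """RBF kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return rho * np.exp(-sq / (2 * lam ** 2))

rng = np.random.default_rng(4)
D, N, P, sig2 = 20, 50, 2, 0.1
X = rng.standard_normal((D, P))            # training-odor latents
K = rbf(X, X)
F = np.linalg.cholesky(K + 1e-6 * np.eye(D)) @ rng.standard_normal((D, N))
Y_bar = F + np.sqrt(sig2) * rng.standard_normal((D, N))   # noisy mean responses

x_star = rng.standard_normal((1, P))       # test-odor latent
K_star = rbf(x_star, X)                    # 1 x D cross-kernel
# GP predictive mean for every neuron at the test odor
F_star = K_star @ np.linalg.solve(K + sig2 * np.eye(D), Y_bar)   # 1 x N
```

Replacing $\sigma^2 I$ with the structured $\Sigma$ and restricting the output to the unobserved neurons recovers the shape of eq. (13).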
We therefore take a further step and predict single-trial neural activity given the estimated firing rates, where the estimated noise covariance encodes trial-by-trial deviations from the noise-free firing rate.\n\nFigure 2: A) R2 values for single-trial prediction for 8 different noise covariance structures, comparing trial-average and multi-trial models. The top two rows indicate the combinations of neuron noise covariance and odor noise covariance parametrization. The y-axis indicates the R2 values. B) Data covariance/correlation (top) and model-recovered covariance/correlation (bottom) for signal (columns 2-3) and noise (columns 4-5). The true kernel matrix in the GP prior is presented at the top in the 1st column and the estimated kernel matrix at the bottom in the 1st column.\n\nLet $\Sigma_D$ and $\Sigma_N$ denote the noise covariance matrices for all odors and all neurons. We can partition them into the following forms:\n\n$\Sigma_D = \begin{bmatrix} \Sigma_D^{11} & \Sigma_D^{12} \\ \Sigma_D^{12\top} & \Sigma_D^{22} \end{bmatrix}, \quad \Sigma_N = \begin{bmatrix} \Sigma_N^{11} & \Sigma_N^{12} \\ \Sigma_N^{12\top} & \Sigma_N^{22} \end{bmatrix}$ (14)\n\n$\Sigma_D$ is partitioned according to the training odors and the test odor. $\Sigma_D^{11}$ is the noise covariance for the training odors, estimated during the training stage; $\Sigma_D^{12}$ is the cross noise covariance between the training odors and the test odor, estimated during the co-smoothing stage; and $\Sigma_D^{22}$ is the test odor noise covariance, estimated during the co-smoothing stage. 
$\Sigma_N$ is partitioned according to the observed neurons and the unobserved neurons. $\Sigma_N^{11}$ is the noise covariance for the observed neurons; $\Sigma_N^{12}$ is the cross noise covariance between the observed neurons and the unobserved neurons; and $\Sigma_N^{22}$ is the unobserved neuron noise covariance. The entire $\Sigma_N$ matrix is learned during the training procedure and is partially used to do co-smoothing. We also denote the single-trial neural response for training as $Y_t$, the single-trial neural response added for co-smoothing as $Y^*_{o,t}$, and the single-trial neural response of the unobserved neurons for the test odor as $Y^*_{u,t}$. Then we can write down the mean of the posterior distribution for $Y^*_{u,t}$, i.e., $p(Y^*_{u,t} \mid Y_t, Y^*_{o,t}, F, F^*_o, F^*_u, \Sigma_D, \Sigma_N)$, as\n\n$\mathrm{vec}(\hat{Y}^*_{u,t}) = \mathrm{vec}(\hat{F}^*_u) + \begin{bmatrix} \Sigma_D^{12} \otimes \begin{bmatrix} \Sigma_N^{12} \\ \Sigma_N^{22} \end{bmatrix} \\ \Sigma_D^{22} \otimes \Sigma_N^{12} \end{bmatrix}^\top \begin{bmatrix} \Sigma_D^{11} \otimes \Sigma_N & \Sigma_D^{12} \otimes \begin{bmatrix} \Sigma_N^{11} \\ \Sigma_N^{12\top} \end{bmatrix} \\ \Sigma_D^{12\top} \otimes \begin{bmatrix} \Sigma_N^{11} \\ \Sigma_N^{12\top} \end{bmatrix}^\top & \Sigma_D^{22} \otimes \Sigma_N^{11} \end{bmatrix}^{-1} \begin{bmatrix} \mathrm{vec}(Y_t) - \mathrm{vec}(F) \\ \mathrm{vec}(Y^*_{o,t}) - \mathrm{vec}(F^*_o) \end{bmatrix}$ (15)\n\nWe will show the predictive performance comparing $\hat{Y}^*_{u,t}$ and $Y^*_{u,t}$ using single repeats in the simulated experiment.\n\n5 Simulated data\n\nFirst, we consider a simulated dataset to illustrate the effect of our multi-trial GPLVM model with structured noise covariance on single-trial predictive performance. We create a simulated example with T = 10 repeats, N = 50 neurons and D = 20 odors according to the generative model described in Sec. 2. We generate 2-dimensional latent variables from a normal prior and construct a covariance matrix from the latents using an RBF kernel function, and then sample tuning curves i.i.d. for 50 neurons from a Gaussian process prior with zero mean and the covariance matrix. 
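Eq. (15) is an instance of the standard Gaussian conditioning identity $\mathbb{E}[z_h \mid z_o] = \mu_h + S_{ho} S_{oo}^{-1} (z_o - \mu_o)$, applied to the jointly Gaussian noise across (odor, neuron) blocks. A self-contained sketch of that identity on a toy joint Gaussian (all sizes and names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# joint Gaussian z = [z_obs, z_hid] ~ N(mu, S); predict z_hid from z_obs via
# E[z_hid | z_obs] = mu_h + S_ho @ S_oo^{-1} @ (z_obs - mu_o)
n_o, n_h = 6, 3
M = rng.standard_normal((n_o + n_h, n_o + n_h))
S = M @ M.T + (n_o + n_h) * np.eye(n_o + n_h)   # joint covariance (SPD)
mu = rng.standard_normal(n_o + n_h)

L = np.linalg.cholesky(S)
z = mu + L @ rng.standard_normal(n_o + n_h)     # one joint sample
S_oo, S_ho = S[:n_o, :n_o], S[n_o:, :n_o]
z_hid_hat = mu[n_o:] + S_ho @ np.linalg.solve(S_oo, z[:n_o] - mu[:n_o])

# residual uncertainty is the Schur complement S_hh - S_ho S_oo^{-1} S_oh
S_cond = S[n_o:, n_o:] - S_ho @ np.linalg.solve(S_oo, S_ho.T)
```

In eq. (15), the partitioned Kronecker blocks play the roles of $S_{ho}$ and $S_{oo}$, and $\mathrm{vec}(\hat{F}^*_u)$ plays the role of $\mu_h$.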
Then we generate two structured noise covariance matrices with rank 2, for neurons and odors respectively. Finally, we generate 10 samples from eq. 5 using the sampled tuning curves and the structured noise covariances.\n\nWe compare multiple combinations of structures for the neuron noise covariance $\Sigma_N$ and the odor noise covariance $\Sigma_D$. Each one can take one of three forms: an identity matrix, a diagonal matrix with heterogeneous noise variances on the diagonal, and a low-rank full matrix plus a heterogeneous diagonal (indicated in Fig. 2A)). Moreover, we compare trial-averaged neural responses against multi-trial neural responses in order to show that learning a structured noise covariance requires the additional cross-trial statistics.\n\nFigure 3: A) 2D latent representations of 22 odors in the functional and local odor sets analyzed by PCA and multi-trial GPLVM. Odors from different functional groups are color-coded. B) Inferred two-dimensional latent tuning curves for five example neurons. C) Mean response to each of the 22 individual odors for these same example neurons. Traces show observed mean spike count for each odor (blue) and inferred latent tuning curve value (red).\n\nThe trial-average results in Fig. 2A) are achieved by fitting the mean response only to the multi-trial GPLVM to learn structured noise. Our quantitative comparison covers the noise models for GPs from [21] and [22]. The R2 values for single-trial prediction performance are shown in Fig. 2A). The red and blue error bars represent the trial-average and multi-trial models respectively. 
When fitting a full noise covariance matrix for odors, the trial-average model performs poorly. The 8th column, with full matrices for both neurons and odors, favors the multi-trial model and achieves the best predictive performance with structured noise covariance matrices. We also show that the best model (the 8th column) effectively captures the noise structure and signal structure for both neurons and odors from the data (Fig. 2B)). The kernel matrix for the prior is also well recovered in Fig. 2B).\n\n6 Olfaction data\n\nTwo-photon calcium imaging of piriform cortex was performed in awake mice previously infected with the GCaMP6s activity reporter. Imaging volumes through piriform layers 2 and 3 were acquired at 7 volumes/sec using a custom microscope equipped with a resonant galvo and high-speed piezo actuator. Detection of active neurons, segmentation, and extraction of the fluorescence signal were performed using Suite2p software. Extracted fluorescence traces were corrected for neuropil contamination. For each cell, the response to an odor presentation constituted a single delta F/F0 value, where F is the average fluorescence signal over the 2 seconds immediately following odor onset and F0 is the fluorescence signal preceding odor onset. Monomolecular odors were diluted in di-propylene glycol (DPG) according to individual vapor pressures obtained from www.thegoodscentscompany.com, to give a nominal concentration of 500 ppm. This vapor-phase concentration was further diluted 1:5 by the carrier airflow to yield 100 ppm at the exit port. Odor presentations lasted for two seconds and were interleaved by 30 seconds of blank (DPG) delivery. The order of presentation of odors was pseudo-randomized for each experiment, such that on any given repeat, odors were presented once in no predictable order. Three different odor sets, each consisting of 22 odorants, were presented to multiple awake mice with 10 repeats for each odor. 
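The delta F/F0 response described above can be sketched directly. This is an illustrative reading of the text, not the authors' pipeline: the 7 volumes/sec rate and 2 s post-onset window come from the description, while the 2 s pre-onset baseline window is an assumption (the text only says F0 precedes odor onset):

```python
import numpy as np

def delta_f_over_f0(trace, onset, fs=7.0, post_s=2.0, pre_s=2.0):
    """Single dF/F0 response value for one cell and one odor presentation.

    trace: fluorescence time series; onset: odor-onset sample index;
    fs: volume rate (7 volumes/sec per the text); pre_s is an assumed
    baseline-window length, since the text only says F0 precedes onset.
    """
    post = int(round(post_s * fs))
    pre = int(round(pre_s * fs))
    F = trace[onset:onset + post].mean()    # mean signal over 2 s after onset
    F0 = trace[onset - pre:onset].mean()    # baseline before onset
    return (F - F0) / F0

# toy check: baseline 1.0, response 1.5 -> dF/F0 should be 0.5
trace = np.concatenate([np.full(14, 1.0), np.full(14, 1.5)])
resp = delta_f_over_f0(trace, onset=14)
```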
For each odor set, we have calcium imaging neural responses collected from about 200 neurons in each of layer 2 (L2) and layer 3 (L3) of the piriform cortex of 3 mice, leading to a dataset with about 500 L2 neurons and 500 L3 neurons for each odor set. Therefore, we deal with three datasets, each with T = 10 repeats, D = 22 odorants, $N \approx 500$ L2 neurons and $N \approx 500$ L3 neurons.\n\nWe standardize each repeat's response across neurons and apply principal component analysis (PCA) and our model with a 2-dimensional latent embedding to these datasets. For PCA, we find the first two principal components of the $D \times (NT)$ response matrix. For our model, the kernel in eq. 3 is an RBF function without a linear component. We set the noise covariances for odors and neurons to be a heterogeneous diagonal matrix and a full matrix with a low-rank structure, as described in Fig. 2A). We fit the model to three different odor sets {\"functional\", \"local\", \"global\"} using both L2 neurons and L3 neurons sharing the same 2D latent variables. Fig. 3A) shows the 2-dimensional latent variables for the 22 odors in the functional and local odor sets. 
More latent representations discovered by t-SNE [23] and multidimensional scaling (MDS) [24] can be found in the supplement (Appendix D).\n\nFigure 4: A) R2 and correlation metric criteria for predictive performance for 5 different noise covariance structures, comparing trial-average and multi-trial models as well as an RBF kernel vs a mixture of kernels. The top two rows indicate the combinations of neuron noise structure and odor noise structure. The influence of the rank of the noise covariance is also presented for the two criteria. B) Data covariance/correlation (top) and model-recovered covariance/correlation (bottom) for signal (the first two columns) and noise (the last two columns).\n\nThe functional odor set contains distinct odors sharing one of six chemical functional groups. Odors sharing the same functional group should be more closely related in chemical space than odors harboring different functional groups. The local odor set contains straight-chain aliphatic odorants that harbor 1 of 4 carbonyl functional groups and range from 3-8 carbons in length. PCA cannot discover the functional classes nor identify the linearized embeddings effectively for either set. Our model (multi-trial GPLVM) can identify 2-dimensional clusters with clear linear boundaries for the functional set and linearized curves of groups of odors for the local set, without knowing any information regarding the chemical features (Fig. 3A)). 
Odors from the same functional group have the same color. We learn the 2D latent variables by constraining L2 and L3 to share the same latent space, but the tuning curves are estimated separately, with different length scales for the GP priors. We observe that L3 neurons have a larger length scale than L2 neurons. This implies wider tuning curves for L3, which leads to better performance for L3 at discriminating different functional groups and identifying the latent odor embeddings. Fig. 3B) shows some example 2D tuning curves from L3 for both odor sets. Fig. 3C) presents averaged firing rates for individual neurons. The blue curves are the mean responses across repeats, which can be considered empirical tuning curves (signal). The red curves are estimated tuning curves. This comparison suggests that our model identifies the signal and fits the data well. Moreover, the 1D empirical curves, plotted along the indices of the odors, are neither smooth nor interpretable. In contrast, the model effectively captures a set of smooth 2D neural tuning curves for individual neurons, which explicitly map the 2D latent representations of odors to high-dimensional neural activities.
The 2D illustration demonstrates the strength of our proposed model in discovering nonlinear latent embeddings for neural ensembles. The resulting 2D tuning curves are more interpretable than simple averages across repeats for single neurons, and the 2D space can thus be interpreted as an underlying embedding of the neural population. Next, we employ the co-smoothing idea described in sec. 4 to evaluate the predictive power of our model with different noise structures.
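To make the tuning-curve construction concrete, here is a minimal self-contained sketch, not the paper's implementation, of how a smooth 2D tuning curve arises as the posterior mean of a GP regression from latent odor embeddings to one neuron's trial-averaged responses. The latents, responses, and hyperparameter values are invented for illustration; the length scale controls tuning width (larger for L3 than L2 in our fits):

```python
import numpy as np

def rbf_kernel(A, B, length_scale, rho=1.0):
    """Squared-exponential kernel between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return rho * np.exp(-0.5 * sq / length_scale**2)

def gp_tuning_curve(X, y, X_grid, length_scale, noise_var=0.1):
    """Posterior-mean GP tuning curve evaluated on a grid over the latent space.

    X: D x 2 latent odor embeddings; y: one neuron's trial-averaged responses.
    """
    K = rbf_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(X_grid, X, length_scale)
    return Ks @ np.linalg.solve(K, y)

# Hypothetical data: 22 odors embedded in 2D, one neuron's mean responses
rng = np.random.default_rng(1)
X = rng.standard_normal((22, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(22)
g = np.linspace(-2, 2, 25)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
curve = gp_tuning_curve(X, y, grid, length_scale=1.0).reshape(25, 25)
```

Increasing `length_scale` smooths `curve` further, mimicking the wider L3 tuning curves; the full model additionally learns the latents X jointly with the tuning curves.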
The better the predictive performance, the better the data are fit and explained by the noise structure. For evaluation, we leave one odor out of each odor set, train on the remaining 21 odors using L3 neurons, and compute the predicted neural activities, an Nu by 1 vector, for the held-out odor. In total, we carry out this training-and-prediction procedure 66 times (leaving each of the 22 odors out once for each of the three odor sets) and average the results. Given the predicted neural activity vector, we use two evaluation criteria: the r-squared value (R2) and a correlation metric. R2 measures how close the true neural activities are to the predicted ones and emphasizes single-neuron performance. However, neurons in the piriform cortex are thought to encode correlational information about odors at the population level rather than in individual neurons, so the correlation/similarity between odors represented in neural space is more informative. We therefore propose a correlation-based metric: we compute the correlation between the predicted neural activity of the test odor and each of the training odors to obtain a 21 by 1 vector, and compare this vector, via another r-squared computation, with the corresponding 21 by 1 vector constructed from the true neural activities. This measures whether the similarity between the test odor and the training odors estimated by the model resembles the true correlation structure in neural space. The correlation metric should yield higher r-squared values than R2 applied to the predicted neural activity vector, since noisy neurons are smoothed out in the correlation metric.
Fig. 4A) presents both R2 and the correlation metric (y-axis) for 5 different noise models. For both metrics, higher y values indicate better performance. The structures of the models are indicated in the top two rows.
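The two criteria can be sketched as follows. This is a toy illustration with synthetic responses; the function names are ours, and the "prediction" is simply the true activity plus noise:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination between two activity vectors."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def correlation_metric(pred_test, Y_train, true_test):
    """Compare odor-odor similarity vectors instead of raw activity.

    Y_train: (D_train x N) training-odor responses;
    pred_test / true_test: (N,) predicted and true test-odor responses.
    """
    pred_sim = np.array([np.corrcoef(pred_test, y)[0, 1] for y in Y_train])
    true_sim = np.array([np.corrcoef(true_test, y)[0, 1] for y in Y_train])
    return r_squared(true_sim, pred_sim)

# Toy check: 21 training odors, 300 neurons, a reasonably accurate prediction
rng = np.random.default_rng(2)
Y_train = rng.standard_normal((21, 300))
true_test = rng.standard_normal(300)
pred_test = true_test + 0.3 * rng.standard_normal(300)
print(r_squared(true_test, pred_test), correlation_metric(pred_test, Y_train, true_test))
```

The correlation metric scores the 21-dimensional similarity profile of the test odor rather than the raw N-dimensional activity vector, so per-neuron noise is averaged away.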
When \ufb01tting the olfaction data, we don\u2019t assume a low-rank matrix for\nodor noise covariance. Since the presentation of odors were randomized, odors across repeats don\u2019t\nimply each other. The red and blue error bars represent trial-average and multi-trial respectively. It\u2019s\nclear that trial-average has much poor performance, especially for non-identity \u2303N matrices. When\n\u2303N is an identity matrix (the 3th column), the trial-average values almost catch up with the multi-trial\nperformance. The circle represents a single RBF kernel, and the square is a mixture of RBF and linear\nkernels with precision  estimated as an element in the hyperparameter set. Among all the models,\nthe 5th model outperforms the others with a full-matrix \u2303N and a non-identity \u2303D. This essentially\nsuggests that there exists correlated noise variability among neurons which cannot be ignored and\ncontribute to information encoding in the piriform cortex. Odorants are more independent in neural\nspace but require odor-speci\ufb01c noise variances. This result validates our prior knowledge about the\nolfactory neurons. Fig. 4B) shows that the best model (the 5th column) effectively captures noise\nstructures and signal structures for both neuron and odor from the data.\nThere are two dimensionality parameters we need to tune in the model. One is the dimensionality\nof the latent space, and the other is the rank of the low-rank component in the structured noise\nmatrix. We automate the selection of the number of latent dimensions via an automatic relevance\ndetermination (ARD) kernel [25] version of RBF over the latent variables, i.e. K in eq. 2 achieved by\ni , and they are\ni , the model automatically learns a sparse\nindependent of each other. By \ufb01tting the length scale 2\nlatent space with most 2\ni s. As a result of ARD, irrelevant\nlatent dimensions are effectively turned off by selecting large length scales for them. 
We initially set the dimensionality to 100, and the model returns 10-15 effective dimensions for all the data. For the rank r of the low-rank structure, we run experiments with r = {1, 2, 4, 8, 12}. Fig. 4A) shows that r = 2 gives the best predictive performance under both R2 and the correlation metric, suggesting that the noise correlations are strong but confined to a low-dimensional subspace.
7 Conclusion
We have proposed a multi-trial Gaussian process latent variable model with structured noise, and used it to infer a latent odor manifold underlying olfactory responses in the piriform cortex. The resulting model maps odorants to points in a low-dimensional embedding space, where the distance between points in this embedding space relates to the similarity of the population responses they elicit. The model is specified by an explicit continuous mapping from a latent embedding space to the space of high-dimensional neural population activity patterns via a set of nonlinear neural tuning curves, each parametrized by a Gaussian process, followed by a low-rank model of correlated, odor-dependent Gaussian noise. We used multiple repeats for analysis instead of trial-averaged responses in order to estimate the structured noise covariance. We applied this model to calcium fluorescence imaging measurements of population activity in layers 2 and 3 of mouse piriform cortex following presentation of a diverse set of odorants. We showed that we can learn a low-dimensional embedding of odorants and a smooth tuning curve over the latent embedding space that accurately captures neural responses to different odorants. The model captured both signal and noise correlations across more than 500 neurons.
Finally, we performed a co-smoothing analysis to show that the model can accurately predict the responses of a population of held-out neurons to test odorants.
In the future, we will further investigate the biological interpretability of the 10-15 effective latent dimensions for olfactory perceptual space and of the rank-2 structured neural noise covariance. Moreover, we will explore the relationship between the chemical features of these odorants and their learned latent embeddings, in order to understand which chemical features are most important for determining an odorant's location within the neural manifold for olfactory representations.

Acknowledgements

This work was supported by grants from the Simons Foundation (SCGB AWD1004351 and AWD543027), the NIH (R01EY017366, R01NS104899) and a U19 NIH-NINDS BRAIN Initiative Award (NS104648-01).

References

[1] Rafi Haddad, Rehan Khan, Yuji K Takahashi, Kensaku Mori, David Harel, and Noam Sobel. A metric for odorant comparison. Nature methods, 5(5):425, 2008.

[2] Alexei Koulakov, Brian E Kolterman, Armen Enikolopov, and Dmitry Rinberg. In search of the structure of human olfactory space. Frontiers in systems neuroscience, 5:65, 2011.

[3] Amir Madany Mamlouk, Christine Chee-Ruiter, Ulrich G Hofmann, and James M Bower. Quantifying olfactory perception: mapping olfactory perception space by using multidimensional scaling and self-organizing maps. Neurocomputing, 52:591-597, 2003.

[4] John P Cunningham and Byron M Yu. Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11):1500, 2014.

[5] Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. Black box variational inference for state space models. arXiv preprint arXiv:1511.07367, 2015.

[6] Yuanjun Gao, Evan W Archer, Liam Paninski, and John P Cunningham. Linear dynamical neural population models through nonlinear embeddings.
In Advances in Neural Information Processing Systems, pages 163-171, 2016.

[7] Matthew R Whiteway and Daniel A Butts. Revealing unobserved factors underlying cortical activity with a rectified latent variable model applied to neural population recordings. Journal of neurophysiology, 117(3):919-936, 2016.

[8] Yuan Zhao and Il Memming Park. Variational latent gaussian process for recovering single-trial dynamics from population spike trains. Neural Computation, 2017.

[9] David Sussillo, Rafal Jozefowicz, LF Abbott, and Chethan Pandarinath. LFADS: latent factor analysis via dynamical systems. arXiv preprint arXiv:1608.06315, 2016.

[10] Yuan Zhao and Il Memming Park. Recursive variational bayesian dual estimation for nonlinear dynamics and non-gaussian observations. arXiv preprint arXiv:1707.09049, 2017.

[11] Anqi Wu, Nicholas G Roy, Stephen Keeley, and Jonathan W Pillow. Gaussian process based nonlinear latent structure discovery in multivariate spike train data. In Advances in Neural Information Processing Systems, pages 3499-3508, 2017.

[12] Zhe Chen. Latent variable modeling of neural population dynamics. In Dynamic Neuroscience, pages 53-82. Springer, 2018.

[13] Neil D Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Advances in neural information processing systems, pages 329-336, 2004.

[14] Mattia Rigotti, Omri Barak, Melissa R Warden, Xiao-Jing Wang, Nathaniel D Daw, Earl K Miller, and Stefano Fusi. The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451):585-590, May 2013.

[15] Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78-84, 2013.

[16] Miriam LR Meister, Jay A Hennig, and Alexander C Huk.
Signal multiplexing and single-neuron computations in lateral intraparietal area during decision-making. Journal of Neuroscience, 33(6):2254-2267, 2013.

[17] David Raposo, Matthew T Kaufman, and Anne K Churchland. A category-free neural population supports evolving demands during decision-making. Nature neuroscience, 2014.

[18] Il Memming Park, Miriam LR Meister, Alexander C Huk, and Jonathan W Pillow. Encoding and decoding in parietal cortex during sensorimotor decision-making. Nature neuroscience, 17(10):1395-1403, 10 2014.

[19] Kiah Hardcastle, Niru Maheswaranathan, Surya Ganguli, and Lisa M Giocomo. A multiplexed, heterogeneous, and adaptive code for navigation in medial entorhinal cortex. Neuron, 94(2):375-387, 2017.

[20] Rajesh Ranganath, Sean Gerrish, and David Blei. Black box variational inference. In Artificial Intelligence and Statistics, pages 814-822, 2014.

[21] Oliver Stegle, Christoph Lippert, Joris M Mooij, Neil D Lawrence, and Karsten M Borgwardt. Efficient inference in matrix-variate gaussian models with iid observation noise. In Advances in neural information processing systems, pages 630-638, 2011.

[22] Barbara Rakitsch, Christoph Lippert, Karsten Borgwardt, and Oliver Stegle. It is all in the noise: Efficient multi-task gaussian process inference with structured residuals. In Advances in neural information processing systems, pages 1466-1474, 2013.

[23] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579-2605, 2008.

[24] Trevor F Cox and Michael AA Cox. Multidimensional scaling. Chapman and Hall/CRC, 2000.

[25] Carl Edward Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning, pages 63-71.
Springer, 2004.