{"title": "Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels", "book": "Advances in Neural Information Processing Systems", "page_first": 1359, "page_last": 1366, "abstract": "", "full_text": "Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels

Ryan Kelly
Center for the Neural Basis of Cognition
Carnegie-Mellon University
Pittsburgh, PA 15213
rkelly@cs.cmu.edu

Tai Sing Lee
Center for the Neural Basis of Cognition
Carnegie-Mellon University
Pittsburgh, PA 15213
tai@cnbc.cmu.edu

Abstract

Decoding is a strategy that allows us to assess the amount of information neurons can provide about certain aspects of the visual scene. In this study, we develop a method based on Bayesian sequential updating and the particle filtering algorithm to decode the activity of V1 neurons in awake monkeys. A distinguishing feature of our method is the use of Volterra kernels to filter the particles, which live in a high-dimensional space. This parametric Bayesian decoding scheme is compared to the optimal linear decoder and is shown to work consistently better. Interestingly, our results suggest that for decoding in real time, spike trains of as few as 10 independent but similar neurons would be sufficient for decoding a critical scene variable in a particular class of visual stimuli. The reconstructed variable can predict the neural activity about as well as the actual signal with respect to the Volterra kernels.

1 Introduction

Cells in the primary visual cortex perform nonlinear operations on visual stimuli. This nonlinearity introduces ambiguity in the response of the neurons. Given a neuronal response, an optimal linear decoder cannot accurately reconstruct the visual stimulus due to these nonlinearities.
Is there a strategy to resolve this ambiguity and recover the information that is encoded in the response of these neurons?

Bayesian decoding schemes, which are nonlinear, might be useful in this context. Bayesian sequential updating, or belief propagation, implemented in the form of particle filtering, has recently been used to estimate the hand trajectories of monkeys from M1 neurons' responses [4] and the location of a rat from the responses of place cells in the hippocampus [3]. However, linear methods have been shown to be quite adequate for decoding LGN, motor cortical, or hippocampal place cells' signals using population vectors or the optimal linear decoder [10, 5, 8]. Bayesian methods, with proper probability model assumptions, could work better than the linear methods, but they apparently are not critical to solving those problems. These methods may be more useful or important in the decoding of nonlinear visual neuronal responses. Here, we implement an algorithm based on Bayesian sequential updating in the form of particle filtering to decode nonlinear visual neurons in awake behaving monkeys. The strategy is similar to the one used by Brown et al. [2] and Brockwell et al. [1] in their decoding of hippocampal place neurons or M1 neurons, except that we introduce the use of Volterra kernels [6, 7, 9] to filter the hypothesis particles and generate feedback messages. The Volterra kernels integrate information from the previous 200 ms.
This window allows us to backtrack and update the hypotheses within a 200 ms window, so the hypothesis space does not grow beyond 200 ms for lengthy signals. We demonstrate that this method is practically feasible and indeed useful for decoding a temporal variable in the stimulus input based on cells' responses, and that it succeeds even when the optimal linear decoder fails.

2 The Approach

Our objective is to infer the time series of a scene variable based on the ongoing response of one or a set of visual neurons. A hypothesis particle is then the entire history of the scene variable of interest up to the present time t, i.e. (x_1, x_2, ..., x_t), given the observed neuronal activity (y_1, y_2, ..., y_t). A key feature of our algorithm is the use of a decoded or estimated hypothesis to predict the response of the neurons at the next time step. The premise is that the scene variable we are inferring is sufficient to predict the activity of the neuron. Since visual neurons have a temporal receptive field and typically integrate information from the past 100-200 ms to produce a response, we cannot make the Markovian assumption made in other Bayesian decoding studies [1, 2, 3, 4]. Instead, we use the receptive field (kernel) to filter each hypothesis particle to generate a prediction of the neural response. We propose to use the Volterra kernels, which have been used in previous studies [6, 7, 9] to characterize the transfer function or receptive field of a neuron, to filter the hypothesis (x̂_t, ..., x̂_1).
The predicted response of the neuron according to the kernels is based on the stimulus in the last 200 ms, optionally incorporating some lag; we eliminated this lag by shifting the response forward 40 ms in time to compensate for the 40 ms the visual signal requires to travel from the retina to V1.

Ongoing observation of the activity of neurons is compared to the predicted response, or proposal, to yield a likelihood measure. The likelihood measure of each hypothesis particle is proportional to how close the hypothesis's predicted response is to the actual observed neural response. As all the existing hypotheses are weighted by their likelihood measures, the posterior distribution of the hypotheses is effectively resampled. The hypotheses that tend to generate incorrect proposals will die off over time. Conversely, the hypotheses that give predicted responses close to the actual response values will not only be kept alive, but will also be allowed to give birth to offspring particles in their vicinity in the hypothesis space, allowing the algorithm to zoom in on the correct hypothesis more precisely.

Figure 1: Two sample sinewave gratings.

After weighting, resampling and reproducing, the hypothesis particles are propagated forward according to the prior statistical distribution on how the scene variable tends to progress. That is, p(x̂_{t+1} | x̂_t) yields a proposed hypothesis about the stimulus at time t+1 based on the existing hypothesis, which is defined at t and earlier times. These hypotheses are then filtered through the Volterra kernels to predict the distribution p(y_t | x_{t-200}, ..., x_{t-1}), thus completing the loop. The entire flow chart of our inference system is shown in Figure 4. Each step is described in detail below.

3 Neurophysiological Experiment

We applied the ideas above to the data obtained by the following experiment.
This experiment sought to understand the encoding and decoding of temporal visual information by V1 neurons. In each experimental session, a movie (2.2 seconds per trial) of a sinewave grating stimulus was presented while the monkey had to maintain fixation on a spot within a 0.8° × 0.8° window. The sinewave grating was constrained to move along one dimension in a direction perpendicular to the grating, with the step size in phase drawn from a random pink noise distribution that follows a 1/f power spectrum in the Fourier domain, approximating the statistical correlational structure of natural temporal stimuli. To ensure continuity of the input signals we took the cosine of the phase, which is related to the image intensity value at a local area within the receptive field. In decoding, cos(phase), a hidden variable, was the scene variable to be inferred. A sample stimulus is given in Figure 2. This scene variable, through the Volterra kernel procedure, can predict the neural responses to this class of stimulus reasonably well.

Figure 2: A sample time series of the scene variable, with a sample spike train below.

400 trials of different sequences were presented. The known paired sequences of stimulus and response in these trials were used to estimate the Volterra kernels by correlating the input x with the neural response y. In addition, one particular stimulus sequence was repeated for 60-80 trials to obtain a PSTH, which was smoothed with a 10 ms window to give an estimate of the instantaneous firing rate. In our decoding work, we take the PSTH as input to our algorithm; this is considered equivalent to assuming simultaneous access to a number of identical, independent neurons. When the neurons are different, a kernel derivation for each neuron is necessary.

4 Volterra Kernels

Volterra kernels have been used to characterize a cell's transfer function.
With Volterra kernels of memory length L, the response y_t can be predicted by convolution of the kernels with the input x_t:

y(t) = y_t = h_0 + Σ_{τ=1..L} h_τ x_{t-τ} + Σ_{τ1=1..L} Σ_{τ2=1..L} h_{τ1,τ2} x_{t-τ1} x_{t-τ2},

where h_0 corresponds to the mean firing rate, h_τ is the first-order kernel and h_{τ1,τ2} the second-order kernel. We restrict all τ's to be positive, so we only consider causal filters. This equation is easily expressed in matrix form as Y = XH, where time is now indexed by matrix row in Y and X. H contains the concatenation of the terms

[h_0 h_1 ··· h_L h_{1,1} h_{1,2} ··· h_{L,L}]',

and row t of X is similarly

[1 x_{t-1} ··· x_{t-L} (x_{t-1} x_{t-1}) ··· (x_{t-L} x_{t-L})].

The standard solution for this regression problem is H = (X'X)^{-1} X'Y. That is, the parameters of the kernels are derived using the regression technique by correlating the input and the output, and are compensated by the covariance in the input. Because of the correlations in the input signal x_t, the matrix X'X is ill conditioned. Instead of directly inverting this matrix, singular value decomposition can be used: X'X = USU', so that (X'X)^{-1} = US^{-1}U', where S is a diagonal matrix. Only the first n largest dimensions as ranked by their eigenvalues are included, where n is chosen to account for 99% of the variance in X [7].

Figure 3: The first and second order Volterra kernels of a V1 cell (left) and a typical prediction of the neuronal response compared to the actual response (right).

Figure 3 depicts an example of the first and second order Volterra kernels and also shows a typical example of their accuracy in predicting the response PSTH y_t.
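As a concrete sketch, the estimation procedure above can be written in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: the helper names (volterra_design_matrix, estimate_kernels), the toy memory length, and the synthetic input/response pair are invented; only the design-matrix layout, the least-squares solution H = (X'X)^{-1} X'Y, and the 99%-variance truncation rule come from the text.

```python
import numpy as np

def volterra_design_matrix(x, L):
    """Row t holds [1, x_{t-1}, ..., x_{t-L}, pairwise products]:
    the constant, first-order, and second-order Volterra terms."""
    rows = []
    for t in range(L, len(x)):
        past = x[t - L:t][::-1]                      # x_{t-1}, ..., x_{t-L}
        pairs = np.outer(past, past)[np.triu_indices(L)]
        rows.append(np.concatenate(([1.0], past, pairs)))
    return np.asarray(rows)

def estimate_kernels(x, y, L, var_frac=0.99):
    """Solve H = (X'X)^{-1} X'Y, inverting X'X via its eigendecomposition
    (SVD of a symmetric matrix) truncated to the leading dimensions
    accounting for var_frac of the variance."""
    X = volterra_design_matrix(x, L)
    U, S, _ = np.linalg.svd(X.T @ X)
    n = int(np.searchsorted(np.cumsum(S) / S.sum(), var_frac)) + 1
    inv = U[:, :n] @ np.diag(1.0 / S[:n]) @ U[:, :n].T
    return inv @ X.T @ y[L:], X

# Toy usage: a response generated by a short causal linear kernel
# plus a baseline rate is recovered accurately.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.1 + np.convolve(x, [0.0, 0.5, 0.3])[:500]      # depends on x_{t-1}, x_{t-2}
H, X = estimate_kernels(x, y, L=5)
y_hat = X @ H
```

Only the upper triangle of the pairwise products is kept, since h_{τ1,τ2} enters the double sum symmetrically in its two indices.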
For a majority of these neurons, the Volterra kernels recovered are capable of predicting the neural response to the input stimulus with a high level of accuracy. This observation forms the basis of success for our particle filtering scheme.

5 Decoding Scheme

We apply Bayesian decoding to the problem of determining the visual stimulus variable x_t for some time step t, given observations of the neural responses (y_1, y_2, ..., y_t). The global flow of the algorithm is shown in Figure 4.

5.1 Particle Prediction

At each time step of the decoding scheme, we can now filter a hypothesis particle (x̂_1, x̂_2, ..., x̂_t) through the Volterra kernels to generate a prediction of the response of the neuron, in order to test the validity of the hypothesis. (y_1, y_2, ..., y_t) remains the observed neural activity of a V1 neuron up to time t, and ŷ^i_t is the predicted neural activity at time t based on hypothesis particle i. This gives us a set of predicted responses at time t, {ŷ^1_t, ŷ^2_t, ..., ŷ^N_t}, where the superscript is the particle index and N is the number of particles.

5.2 Particle Resampling

The actual observed response of the neuron at time t is compared to each particle's prediction as a means to evaluate the likelihood or fitness of the particle. If we assume y_t is the average of spike trains from a single neuron in independent trials, or the average firing rate of a population of independent neurons with identical tuning properties, then the resulting error distribution can be assumed to be Gaussian, with σ representing the uncertainty of the predicted response given the correct values of the stimulus variable. The relative likelihood of an observation given each particle is then given by

p(y_t | x̂^i_1, ..., x̂^i_t) = exp(-(ŷ^i_t - y_t)^2 / 2σ^2) / Σ_j exp(-(ŷ^j_t - y_t)^2 / 2σ^2).

Figure 4: Flow chart of the PF decoding scheme. The effect of one resampling step is shown in the two graphs. Each graph shows the particles' (n=100) values during a trial over 200 ms. The thicknesses of the lines are proportional to the number of particles with the corresponding values. Notice the change in the distribution of particles after resampling: afterwards, many more particles are concentrated around 1 instead of -1.

All the particles together provide a representation of the particle-conditional distribution,

p(y_t | x̂_t, x̂_{t-1}, ..., x̂_1).

This is used to resample the posterior distribution of the hypotheses based on all the observations up to time t-1,

p(x̂_t | y_1, y_2, ..., y_t) ∝ p(y_t | x̂_t) p(x̂_t | y_1, y_2, ..., y_{t-1}),

to produce a current posterior distribution of the hypotheses.

5.3 Particle Propagation

The next step in the decoding scheme is to generate a new value x̂_{t+1} and append it to the hypothesis particle:

p(x̂_{t+1} | y_1, y_2, ..., y_t) = ∫ p(x̂_{t+1} | x̂_t) p(x̂_t | y_1, y_2, ..., y_t) dx̂_t,

where p(x̂_{t+1} | x̂_t) is the state propagation model that provides the prior on how the stimulus changes over time. For the state propagation model used in this study, all initial positions for the stimulus are equally likely. The range of the stimulus (-1 to 1) is divided into 60 equally spaced intervals. A 60x60 probability table is constructed empirically from the training data stimuli, corresponding to a discrete approximation of the conditional prior above. Solving these priors analytically is difficult or even impossible.
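The prediction-resampling-propagation cycle just described can be sketched end to end as follows. This is a schematic under assumptions, not the authors' implementation: predict_response is a toy linear stand-in for filtering by the fitted Volterra kernels, and the 60x60 transition table is uniform here rather than estimated from the training stimuli; the Gaussian likelihood weighting, resampling with replacement, and discretized state space follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)

N, L, SIGMA = 1000, 20, 0.05   # particles, kernel memory, likelihood width
N_BINS = 60                    # discretized stimulus values in [-1, 1]
bin_centers = np.linspace(-1, 1, N_BINS)

# Illustrative stand-ins: a uniform 60x60 transition table and a toy
# exponential "kernel" in place of the fitted Volterra kernels.
transition = np.full((N_BINS, N_BINS), 1.0 / N_BINS)
toy_kernel = np.exp(-np.arange(L) / 5.0)

def predict_response(history):
    """Filter a particle's most recent L stimulus values into a predicted
    firing rate; the real system applies the Volterra kernels here."""
    return float(history[-L:] @ toy_kernel[::-1]) / L

def particle_filter_step(particles, y_t):
    """One prediction / resampling / propagation cycle.
    particles: (N, history_len) array of hypothesized stimulus sequences."""
    # 1. Prediction: each particle proposes a neural response.
    y_hat = np.array([predict_response(p) for p in particles])
    # 2. Resampling: weight by the Gaussian likelihood of the observation,
    #    then resample with replacement in proportion to the weights.
    w = np.exp(-(y_hat - y_t) ** 2 / (2 * SIGMA ** 2))
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]
    # 3. Propagation: draw the next stimulus value from the state model.
    idx = np.clip(np.digitize(particles[:, -1], bin_centers) - 1, 0, N_BINS - 1)
    new_vals = np.array([rng.choice(bin_centers, p=transition[i]) for i in idx])
    return np.column_stack([particles, new_vals])

# Toy usage: start from uniform hypotheses and run a few steps.
particles = rng.choice(bin_centers, size=(N, L))
for y_t in [0.1, 0.12, 0.08]:
    particles = particle_filter_step(particles, y_t)
```

In the real decoder the observed PSTH values y_t would drive the loop, and the surviving particles' recent histories would serve as the reconstruction of the scene variable.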
Besides, the hypothesis space is enormous: there are 60 possible values at each time point, and information from a 200 ms window (20 time points at 10 ms intervals) is being integrated to predict y_t. The particle filtering algorithm is basically a way to approximate these distributions efficiently.

The algorithm consists of cycling through the above steps, i.e. particle prediction, particle resampling, and particle propagation. In summary:

1. Prediction step: Filter all particles by the Volterra kernels to generate the predictions of neural responses.
2. Resampling step: Compare the actual neural response with the predicted response of each particle to assign a likelihood value to each particle. Resample (with replacement) the posterior distribution of the particles based on their likelihoods.
3. Propagation step: Sample from the state model to randomly postulate a new stimulus value x̂_t for each particle and add this value to the end of the particle's sequence to obtain (x̂_1, x̂_2, ..., x̂_t).

In the propagation step, the state model will move the stimulus in ways that it has typically been seen to move. In the prediction step, particles that predict a neural response close to the actual observed response will be highly valued and will likely be duplicated in the resampled set. Conversely, particles that predict a response which is not close to the actual response will not be highly valued and thus will likely be removed from the resultant set.

Figure 5: A scatter plot showing the least squares regression line for the data.

6 Results and Discussion

Let x_t, x_{t-1}, ..., x_1 be the inferred scene variable (cos(phase)), and let s_k(t) be the binary spike response of a neuron during trial k.
The instantaneous firing rate of the neuron is given by

y(t) = (1/m) Σ_{k=1..m} s_k(t),

where m is the number of trials. In general, for cells that respond well to a stimulus, the first- and second-order kernels can predict the response well. We quantify the encoding and decoding errors by the error ratios

e_y = Σ_t (ŷ_t - y_t)^2 / Σ_t y_t^2,    e_x = Σ_t (x̂_t - x_t)^2 / Σ_t (x_t + 1)^2.

Over all cells tested (n=33), the average error ratio e_y in the energy of the actual response is 18.4%. Each of the cells was decoded using the particle filtering algorithm with 1000 particles. The average reconstruction error e_x is 27.14%, and the best cell has 10% error. A correlation exists between the encoding and decoding errors across trials, as shown in Figure 5.

Figure 6: Reconstruction error when the input PSTH is constructed from fewer trials. With 10 spike trains, the PF has almost achieved the minimum error possible for this cell.

Figure 7: Particle filtering (PF) and optimal linear decoder (O.L.D.) reconstructions. The top left is the best PF reconstruction, and the bottom right is the worst out of all the cells tested.

σ affects the rate at which the particle hypothesis space collapses around the correct solution. If σ is too large, all particles will become equally likely, while if σ is too small, only a few particles will survive each time step. Ideally, the particles will converge on a value for a number of time steps equal to the kernel's length. The optimal value for σ was found empirically and was used in all reconstructions.

Figure 7 shows sample reconstructions for some good and bad cells. Decoding accuracy is limited by the performance of the Volterra kernel. When the kernel is unable to predict the neuronal response, particularly for cells that have low firing rates, any decoding scheme will suffer because of insufficient information.
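For completeness, the two error ratios e_y and e_x defined above can be computed directly; the traces below are made up purely to illustrate the normalizations (response energy for e_y, energy of the scene variable shifted into [0, 2] for e_x).

```python
import numpy as np

def encoding_error(y_hat, y):
    """e_y: prediction-error energy relative to the response energy."""
    return np.sum((y_hat - y) ** 2) / np.sum(y ** 2)

def decoding_error(x_hat, x):
    """e_x: reconstruction-error energy, normalized by the energy of
    the scene variable shifted into [0, 2] (x ranges over [-1, 1])."""
    return np.sum((x_hat - x) ** 2) / np.sum((x + 1) ** 2)

# Toy usage: a 2.2 s trial at 10 ms resolution with a slightly
# perturbed, hypothetical reconstruction of the scene variable.
t = np.linspace(0.0, 2.2, 220)
x = np.cos(2 * np.pi * t)            # scene variable: cos(phase)
x_hat = x + 0.1 * np.sin(5 * t)      # imperfect reconstruction
e_x = decoding_error(x_hat, x)
```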
Thus the amount of decoding error is correlated with the kernel's inability to predict neuronal responses. This is consistent with the correlation between the particle filter and kernel errors shown in Figure 5. Such cells do not provide enough relevant information about the visual stimulus in their spiking activities.

Figure 6 shows that reconstruction based on a PSTH constructed from as few as 5-10 spike trains can reach an accuracy not far from reconstruction based on the PSTH of 80 trials. This suggests that as few as 10 independent but similar cells recorded simultaneously might be sufficient for decoding this scene variable.

Figure 8: A scatter plot comparing the two decoding methods.

We find that the optimal linear decoder does not decode these cells well. The decoded output tends to follow the signal somewhat, but at a low amplitude, as shown in Figure 7. The problem for the optimal linear decoder is that at any single moment in time it can only propose a single hypothesis, yet there exist multiple signals that can produce the response. The optimal linear decoder tends to average in these cases. The particle filter keeps alive many independent hypotheses and can thus choose the most likely candidate by integrating information.

The success of the particle filter relies mainly on three factors. First, in the particle prediction step, the Volterra kernels allow the particles to make reasonably accurate proposals based on the observed neural activities. This gives a good measure for evaluating the fitness of each particle. Second, in the resampling step, the weight of each particle embodies all the earlier observations, and because our particle filter keeps track of all proposals within the last 200 ms, earlier hypotheses can continue to be reevaluated and refined.
Finally, in the propagation step, the particle filter utilizes prior knowledge about the manner in which the stimulus moves. This helps further in pruning the hypothesis space.

Acknowledgments

This research is supported by NSF CAREER 9984706, NIH Vision Research core grant EY08098, and NIH 2P41PR06009-11 for biomedical supercomputing. Thanks to Rick Romero, Yuguo Yu, and Anthony Brockwell for helpful discussion and advice.

References

[1] A. E. Brockwell, A. L. Rojas, and R. E. Kass. Bayesian decoding of motor cortical signals by particle filtering. Submitted to J. Neurophysiology, 2003.

[2] E. Brown, L. Frank, D. Tang, M. Quirk, and M. Wilson. A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. J. Neuroscience, 18(18):7411-7425, 1998.

[3] U. T. Eden, L. M. Frank, R. Barbieri, and E. N. Brown. Particle filtering algorithms for neural decoding and adaptive estimation of receptive field plasticity. In Proc. Computational Neuroscience Meeting, CNS '02, Santa Barbara, 2002.

[4] Y. Gao, M. J. Black, E. Bienenstock, S. Shoham, and J. P. Donoghue. Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex, pages 213-220. MIT Press, Cambridge, MA, 2002.

[5] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner. Neuronal population coding of movement direction. Science, 243:234-236, 1989.

[6] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek. Spikes: Exploring the Neural Code. MIT Press, Cambridge, MA, 1997.

[7] R. Romero, Y. Yu, P. Afshar, and T. S. Lee. Adaptation of the temporal receptive fields of macaque v1 neurons. Neurocomputing, 52-54:135-140, 2002.

[8] G. Stanley, F. Li, and Y. Dan. Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neuroscience, 19(18):8036-8042, 1999.

[9] G. B.
Stanley. Adaptive spatiotemporal receptive field estimation in the visual pathway. Neural Computation, 14:2925-2946, 2002.

[10] K. Zhang, I. Ginzburg, B. L. McNaughton, and T. J. Sejnowski. Interpreting neuronal population activity by reconstruction: Unified framework with application to hippocampal place cells. J. Neurophysiology, 79:1017-1044, 1998.
", "award": [], "sourceid": 2491, "authors": [{"given_name": "Ryan", "family_name": "Kelly", "institution": null}, {"given_name": "Tai Sing", "family_name": "Lee", "institution": null}]}