{"title": "Probabilistic Computation in Spiking Populations", "book": "Advances in Neural Information Processing Systems", "page_first": 1609, "page_last": 1616, "abstract": null, "full_text": "Probabilistic computation in spiking populations\n\nRichard S. Zemel (Dept. of Comp. Sci., Univ. of Toronto), Quentin J. M. Huys (Gatsby CNU, UCL), Rama Natarajan (Dept. of Comp. Sci., Univ. of Toronto), Peter Dayan (Gatsby CNU, UCL)\n\nAbstract\n\nAs animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, a dynamical model and sensory evidence to update estimates optimally. These models are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation and manipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate the approach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation.\n\nBayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues. These include a wide range of psychophysical studies of perception [13], motor action [7], and decision-making [3, 5]. Central to Bayesian inference is that computations are sensitive to uncertainties about afferent and efferent quantities, arising from ignorance, noise, or inherent ambiguity (e.g., the aperture problem), and that these uncertainties change over time as information accumulates and dissipates. 
Understanding how neurons represent and manipulate uncertain quantities is therefore key to understanding the neural instantiation of these Bayesian inferences.\n\nMost previous work on representing probabilistic inference in neural populations has focused on the representation of static information [1, 12, 15]. These encompass various strategies for encoding and decoding uncertain quantities, but do not readily generalize to real-world dynamic information processing tasks, particularly the most interesting cases with stimuli changing over the same timescale as spiking itself [11]. Notable exceptions are the recent, seminal, but, as we argue, representationally restricted, models proposed by Gold and Shadlen [5], Rao [10], and Deneve [4].\n\nIn this paper, we first show how probabilistic information varying over time can be represented in a spiking population code. Second, we present a method for producing spiking codes that facilitate further processing of the probabilistic information. Finally, we show the utility of this method by applying it to a temporal sensorimotor integration task.\n\n1 TRAJECTORY ENCODING AND DECODING\n\nWe assume that population spikes R(t) arise stochastically in relation to the trajectory X(t) of an underlying (but hidden) variable. We use X_T and R_T for the whole trajectory and spike trains respectively from time 0 to T. The spikes R_T constitute the observations and are assumed to be probabilistically related to the signal by a tuning function f(X, \theta_i):\n\nP(R(i, T) | X(T)) \propto f(X, \theta_i)   (1)\n\nfor the spike train of the ith neuron, with parameters \theta_i. Therefore, via standard Bayesian inference, R_T determines a distribution over the hidden variable at time T, P(X(T) | R_T).\n\nWe first consider a version of the dynamics and input coding that permits an analytical examination of the impact of spikes. Let X(t) follow a stationary Gaussian process such that the joint distribution P(X(t_1), X(t_2), \ldots, X(t_m)) is Gaussian for any finite collection of times, with a covariance matrix which depends on time differences: C_{t t'} = c(|t - t'|). The function c(|t|) controls the smoothness of the resulting random walks. Then,\n\nP(X(T) | R_T) \propto p(X(T)) \int dX_T \, P(R_T | X_T) P(X_T | X(T))   (2)\n\nwhere P(X_T | X(T)) is the distribution over the whole trajectory X_T conditional on the value of X(T) at its end point. If R_T are a set of conditionally independent inhomogeneous Poisson processes, we have\n\nP(R_T | X_T) \propto \prod_i \prod_{t_i} f(X(t_i), \theta_i) \exp\left(-\int d\tau \sum_i f(X(\tau), \theta_i)\right)   (3)\n\nwhere t_i are the spike times of neuron i in R_T. Let x = [X(t_i)] be the vector of stimulus positions at the times at which we observed a spike and \theta = [\theta(t_i)] be the vector of spike positions. If the tuning functions are Gaussian, f(X, \theta_i) \propto \exp(-(X - \theta_i)^2 / 2\sigma^2), and sufficiently dense that \sum_i f(X, \theta_i) is independent of X (a standard assumption in population coding), then P(R_T | X_T) \propto \exp(-\|x - \theta\|^2 / 2\sigma^2) and in Equation 2, we can marginalize out X_T except at the spike times t_i:\n\nP(X(T) | R_T) \propto p(X(T)) \int dx \, \exp\left(-\frac{[x, X(T)]^T C^{-1} [x, X(T)]}{2} - \frac{\|x - \theta\|^2}{2\sigma^2}\right)   (4)\n\nwhere C is the block covariance matrix between X(t_i) and X(T) at the spike times [t_i] and the final time T. This Gaussian integral gives P(X(T) | R_T) \sim N(\mu(T), \Sigma(T)), with\n\n\mu(T) = C_{Tt} (C_{tt} + I\sigma^2)^{-1} \theta = k \theta, \quad \Sigma(T) = C_{TT} - k C_{tT}   (5)\n\nwhere C_{TT} is the (T, T)th element of the covariance matrix and C_{Tt} is similarly a row vector. The dependence of \mu on past spike times is specified chiefly by the inverse covariance matrix, and acts as an effective kernel (k). This kernel is not stationary, since it depends on factors such as the local density of spiking in the spike train R_T.\n\nFor example, consider the case where X(t) evolves according to a diffusion process with drift:\n\ndX = -\gamma X \, dt + dN(t)   (6)\n\nwhere \gamma prevents it from wandering too far, and N(t) is white Gaussian noise with mean zero and variance \sigma_N^2. 
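The posterior of Equation 5 is simple enough to evaluate directly. The sketch below is our own illustration, not the paper's code: it assumes the stationary Ornstein-Uhlenbeck covariance that the drift-diffusion prior of Equation 6 induces, and all parameter values (gamma, sigma_N, sigma, the spike times, and the preferred positions theta) are made up for the example.

```python
import numpy as np

# Illustrative parameters (ours, not the paper's): drift gamma and noise scale
# sigma_N for the diffusion prior, Gaussian tuning width sigma.
gamma, sigma_N, sigma = 10.0, 1.0, 0.1

def c(dt):
    """Stationary covariance c(|t - t'|) of the drift-diffusion (OU) prior."""
    return (sigma_N ** 2 / (2 * gamma)) * np.exp(-gamma * np.abs(dt))

def posterior(T, spike_times, theta):
    """Mean and variance of P(X(T)|R_T) via Equation 5.

    spike_times: times t_i at which spikes occurred.
    theta: preferred positions theta(t_i) of the neurons that spiked.
    """
    t = np.asarray(spike_times, dtype=float)
    th = np.asarray(theta, dtype=float)
    Ctt = c(t[:, None] - t[None, :])        # covariance among spike times
    CTt = c(T - t)                          # row vector C_{Tt}
    k = CTt @ np.linalg.inv(Ctt + sigma ** 2 * np.eye(len(t)))  # effective kernel
    mu = k @ th                             # mu(T) = k theta
    var = c(0.0) - k @ CTt                  # Sigma(T) = C_TT - k C_tT
    return mu, var

mu, var = posterior(1.0, [0.90, 0.95, 0.99], [0.20, 0.25, 0.22])
```

As expected from the kernel analysis, the most recent spike receives by far the largest weight in k, and the posterior variance lies strictly between zero and the prior variance c(0).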
Figure 1A shows sample kernels for this process.\n\nInspection of Figure 1A reveals some important traits. First, the monotonically decreasing kernel magnitude as the time span between the spike and the current time T grows matches the intuition that recent spikes play a more significant role in determining the posterior over X(T). Second, the kernel is nearly exponential, with a time constant that depends on the time constant of the covariance function and the density of the spikes; two settings of these parameters produced the two groupings of kernels in the figure. Finally, the fully adaptive kernel k can be locally well approximated by a metronomic kernel k^s (shown in red in Figure 1A) that assumes regular spiking. This takes advantage of the general fact, indicated by the grouping of kernels, that the kernel depends weakly on the actual spike pattern, but strongly on the average rate. The merits of the metronomic kernel are that it is stationary and only depends on a single mean rate rather than the full spike train R_T. It also justifies the form of decoder used for the network model in the next section [6].\n\n[Figure 1: five panels, A-E.]\n\nFigure 1: Exact and approximate spike decoding with the Gaussian process prior. Spikes are shown in yellow, the true stimulus in green, and P(X(T) | R_T) in gray. Blue: exact inference with the nonstationary kernel k; red: approximate inference with regular spiking. A: Kernel samples for a diffusion process as defined by Equations 5, 6. B, C: Mean and variance of the inference. D: Exact inference with full kernel k and E: approximation based on metronomic kernel k^s (Equation 7).\n\nFigure 1D shows an example of how well Equation 5 specifies a distribution over X(t) through very few spikes. Finally, Figure 1E shows a factorized approximation with the stationary kernel similar to that used by Hinton and Brown [6] and in our recurrent network:\n\n\hat{P}(X(t) | R(t)) \propto \prod_i f(X, \theta_i)^{\sum_{j=0}^{t} k^s_j R_i(t-j)} = \exp(-E(X(t), R(t), t))   (7)\n\nBy design, the mean is captured very well, but not the variance, which in this example grows too rapidly for long interspike intervals (Figure 1B, C). Using a slower kernel improves performance on the variance, but at the expense of the mean. We thus turn to the network model with recurrent connections that are available to reinstate the spike-conditional characteristics of the full kernel.\n\n2 NETWORK MODEL FORMULATION\n\nAbove we considered how population spikes R_T specify a distribution over X(T). We now extend this to consider how interconnected populations of neurons can specify distributions over time-varying variables. We frame the problem and our approach in terms of a two-level network, connecting one population of neurons to another; this construction is intended to apply to any level of processing. The network maps input population spikes R(t) to output population spikes S(t), where input and output evolve over time. As with the input spikes, S_T indicates the output spike trains from time 0 to T, and these output spikes are assumed to determine a distribution over a related hidden variable.\n\nFor the recurrent and feedforward computation in the network, we start with the deceptively simple goal [9] of producing output spikes in such a way that the distribution Q(X(T) | S_T) they imply over the same hidden variable X(T) as the input faithfully matches P(X(T) | R_T). This might seem a strange goal, since one could surely just listen to the input spikes. 
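To make the stationary-kernel decoder of Equation 7 concrete, here is a minimal sketch of our own (not the paper's implementation): each spike contributes its neuron's log tuning curve, discounted by an exponential metronomic kernel k^s, and the summed negative energy is exponentiated and normalized on a grid. The grid, tuning width sigma, and kernel time constant tau are illustrative assumptions.

```python
import numpy as np

# Illustrative setup (our assumptions): a spatial grid for X, Gaussian tuning
# width sigma, and an exponential stationary kernel with time constant tau.
x_grid = np.linspace(-1.0, 1.0, 201)
sigma, tau = 0.1, 0.02

def log_f(x, theta):
    """Log of the Gaussian tuning function f(X, theta_i), up to a constant."""
    return -(x - theta) ** 2 / (2 * sigma ** 2)

def approx_posterior(t, spike_times, theta):
    """Q(X(t)) propto prod_i f(X, theta_i)^{k^s(t - t_i)}, Equation 7 style."""
    log_q = np.zeros_like(x_grid)
    for ti, th in zip(spike_times, theta):
        if ti <= t:
            w = np.exp(-(t - ti) / tau)    # metronomic kernel weight k^s
            log_q += w * log_f(x_grid, th)
    q = np.exp(log_q - log_q.max())        # exponentiate the negative energy
    return q / (q.sum() * (x_grid[1] - x_grid[0]))  # normalize on the grid

q = approx_posterior(0.10, [0.080, 0.090, 0.095], [0.20, 0.25, 0.22])
```

Because every spike enters through the same fixed kernel, the mode tracks a recency-weighted average of the spiking neurons' preferred positions, while the implied variance depends only on the recent spike count, which is exactly the mean-good, variance-poor behaviour seen in Figure 1B, C.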
However, in order for the output spikes to track the hidden variable, the dynamics of the interactions between the neurons must explicitly capture the dynamics of the process X(T). Once this `identity mapping' problem has been solved, more general, complex computations can be performed with ease. We illustrate this on a multisensory integration task, tracking a hidden variable that depends on multiple sensory cues.\n\nThe aim of the recurrent network is to take the spikes R(t) as inputs, and produce output spikes that capture the probabilistic dynamics. We proceed in two steps. We first consider the probabilistic decoding process which turns S_T into Q(X(t) | S_T). Then we discuss the recurrent and feedforward processing that produce appropriate S_T given this decoder. Note that this decoding process is not required for the network processing; it instead provides a computational objective for the spiking dynamics in the system.\n\nWe use a simple log-linear decoder based on a spatiotemporal kernel [6]:\n\nQ(X(T) | S_T) \propto \exp(-E(X(T), S_T, T)), where   (8)\n\nE(X, S_T, T) = \sum_j \sum_{\tau=0}^{T} S(j, T - \tau) \phi_j(X, \tau)   (9)\n\nis an energy function, and the spatiotemporal kernels are assumed separable: \phi_j(X, \tau) = g_j(X) \psi(\tau). The spatial kernel g_j(X) is related to the receptive field f(X, \theta_j) of neuron j and the temporal kernel \psi(\tau) to k