{"title": "A Neural Implementation of the Kalman Filter", "book": "Advances in Neural Information Processing Systems", "page_first": 2062, "page_last": 2070, "abstract": "There is a growing body of experimental evidence to suggest that the brain is capable of approximating optimal Bayesian inference in the face of noisy input stimuli. Despite this progress, the neural underpinnings of this computation are still poorly understood. In this paper we focus on the problem of Bayesian filtering of stochastic time series. In particular we introduce a novel neural network, derived from a line attractor architecture, whose dynamics map directly onto those of the Kalman Filter in the limit where the prediction error is small. When the prediction error is large we show that the network responds robustly to change-points in a way that is qualitatively compatible with the optimal Bayesian model. The model suggests ways in which probability distributions are encoded in the brain and makes a number of testable experimental predictions.", "full_text": "A Neural Implementation of the Kalman Filter\n\nRobert C. Wilson\n\nDepartment of Psychology\n\nPrinceton University\nPrinceton, NJ 08540\n\nrcw2@princeton.edu\n\nLeif H. Finkel\n\nDepartment of Bioengineering\n\nUniversity of Pennsylvania\n\nPhiladelphia, PA 19103\n\nAbstract\n\nRecent experimental evidence suggests that the brain is capable of approximating\nBayesian inference in the face of noisy input stimuli. Despite this progress, the\nneural underpinnings of this computation are still poorly understood. 
In this paper we focus on the Bayesian filtering of stochastic time series and introduce a novel neural network, derived from a line attractor architecture, whose dynamics map directly onto those of the Kalman filter in the limit of small prediction error. When the prediction error is large we show that the network responds robustly to changepoints in a way that is qualitatively compatible with the optimal Bayesian model. The model suggests ways in which probability distributions are encoded in the brain and makes a number of testable experimental predictions.\n\n1 Introduction\n\nThere is a growing body of experimental evidence consistent with the idea that animals are somehow able to represent, manipulate and, ultimately, make decisions based on probability distributions. While still unproven, this idea has obvious appeal to theorists as a principled way in which to understand neural computation. A key question is how such Bayesian computations could be performed by neural networks. Several authors have proposed models addressing aspects of this issue [15, 10, 9, 19, 2, 3, 16, 4, 11, 18, 17, 7, 6, 8], but as yet, there is no conclusive experimental evidence in favour of any one and the question remains open.\n\nHere we focus on the problem of tracking a randomly moving, one-dimensional stimulus in a noisy environment. We develop a neural network whose dynamics can be shown to approximate those of a one-dimensional Kalman filter, the Bayesian model when all the distributions are Gaussian. Where the approximation breaks down, for large prediction errors, the network performs something akin to outlier or change detection, and this \u2018failure\u2019 suggests ways in which the network can be extended to deal with more complex, non-Gaussian distributions over multiple dimensions.\n\nOur approach rests on the modification of the line attractor network of Zhang [26]. 
In particular, we make three changes to Zhang\u2019s network, modifying the activation rule, the weights and the inputs in such a way that the network\u2019s dynamics map exactly onto those of the Kalman filter when the prediction error is small. Crucially, these modifications result in a network that is no longer a line attractor and thus no longer suffers from many of the limitations of such networks.\n\n2 Review of the one-dimensional Kalman filter\n\nFor clarity of exposition and to define notation, we briefly review the equations behind the one-dimensional Kalman filter. In particular, we focus on tracking the true location of an object, x(t), over time, t, based on noisy observations of its position z(t) = x(t) + nz(t), where nz(t) is zero-mean Gaussian random noise with standard deviation \u03c3z(t), and a model of its dynamics, x(t + 1) = x(t) + v(t) + nv(t), where v(t) is the velocity signal and nv(t) is a Gaussian noise term with zero mean and standard deviation \u03c3v(t). Assuming that \u03c3z(t), \u03c3v(t) and v(t) are all known, the Kalman filter\u2019s estimate of the position, \u02c6x(t), can be computed via the following three equations\n\n\u00afx(t + 1) = \u02c6x(t) + v(t) (1)\n\n1/\u02c6\u03c3x(t + 1)^2 = 1/[\u02c6\u03c3x(t)^2 + \u03c3v(t)^2] + 1/\u03c3z(t + 1)^2 (2)\n\n\u02c6x(t + 1) = \u00afx(t + 1) + [\u02c6\u03c3x(t + 1)^2/\u03c3z(t + 1)^2] [z(t + 1) \u2212 \u00afx(t + 1)] (3)\n\nIn equation 1 the model computes a prediction, \u00afx(t + 1), for the position at time t + 1; equation 2 updates the model\u2019s uncertainty, \u02c6\u03c3x(t + 1), in its estimate; and equation 3 updates the model\u2019s estimate of position, \u02c6x(t + 1), based on this uncertainty and the prediction error [z(t + 1) \u2212 \u00afx(t + 1)].\n\n3 The neural network\n\nThe network is a modification of Zhang\u2019s line attractor model of head direction cells [26]. 
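The three updates in equations 1-3 can be sketched numerically. The following is a minimal Python sketch; the function and variable names are ours, and the noise values are arbitrary illustrations, not taken from the paper:

```python
import random

def kalman_step(x_hat, var_hat, z, v, var_v, var_z):
    """One step of the scalar Kalman filter (equations 1-3).

    x_hat, var_hat : previous estimate and its variance
    z              : new noisy observation
    v, var_v       : mean and variance of the velocity signal
    var_z          : observation noise variance
    """
    x_bar = x_hat + v                                        # (1) prediction
    var_new = 1.0 / (1.0 / (var_hat + var_v) + 1.0 / var_z)  # (2) uncertainty update
    gain = var_new / var_z
    x_new = x_bar + gain * (z - x_bar)                       # (3) estimate update
    return x_new, var_new

# Track a drifting position from noisy observations (illustrative values).
random.seed(0)
x, x_hat, var_hat = 0.0, 0.0, 100.0
for t in range(100):
    x += 0.5 + random.gauss(0, 0.2)          # true dynamics
    z = x + random.gauss(0, 5.0)             # noisy observation
    x_hat, var_hat = kalman_step(x_hat, var_hat, z, v=0.5, var_v=0.04, var_z=25.0)
```

Note how the variance recursion in (2) converges to a steady state regardless of its (large) initial value, which is the behaviour the network below is asked to reproduce.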
We use rate neurons and describe the state of the network at time t with the membrane potential vector, u(t), where each component of u(t) denotes the membrane potential of a single neuron. In discrete time, the update equation is then\n\nu(t + 1) = wJf[u(t)] + I(t + 1) (4)\n\nwhere w scales the strength of the weights, J is the connectivity matrix, f[\u00b7] is the activation rule that maps membrane potential onto firing rate, and I(t + 1) is the input at time t + 1. As in [26], we set J = Jsym + \u03b3(t)Jasym such that the connections are made up of a mixture of symmetric, Jsym, and asymmetric, Jasym, components (the latter defined as the spatial derivative of Jsym), with mixing strength \u03b3(t) that can vary over time. Although the results presented here do not depend strongly on the exact forms of Jsym and Jasym, for concreteness we use the following expressions\n\nJsym_ij = Kw exp{[cos(2\u03c0(i \u2212 j)/N) \u2212 1]/\u03c3w^2} \u2212 c ; Jasym_ij = \u2212[2\u03c0/(N\u03c3w^2)] sin(2\u03c0(i \u2212 j)/N) Jsym_ij (5)\n\nwhere N is the number of neurons in the network and \u03c3w, Kw and c are constants that determine the width and the excitatory and inhibitory connection strengths respectively.\n\nTo approximate the Kalman filter, the activation function must implement divisive inhibition [14, 13]\n\nf[u] = [u]+ / (S + \u00b5 \u2211i [ui]+) (6)\n\nwhere [u]+ denotes rectification of u; \u00b5 determines the strength of the divisive feedback and S determines the gain when there is no previous activity in the network.\n\nWhen w = 1, \u03b3(t) = 0 and I(t) = 0, the network is a line attractor over a wide range of Kw, \u03c3w, c, S and \u00b5, having a continuum of fixed points (as N \u2192 \u221e). 
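Equations 4-6 can be rendered as code. The sketch below is our own, with illustrative (untuned) parameter values; it makes no claim to reproduce the line attractor regime exactly, only the mechanics of the update rule:

```python
import math

N = 100
K_w, sigma_w, c = 1.0, 0.2, 0.05   # weight constants from equation 5 (values illustrative)
S, mu = 1.0, 1.0                   # activation-rule constants from equation 6

def J_entries(i, j):
    """Symmetric and asymmetric weight components (equation 5)."""
    ang = 2.0 * math.pi * (i - j) / N
    j_sym = K_w * math.exp((math.cos(ang) - 1.0) / sigma_w**2) - c
    j_asym = -(2.0 * math.pi / (N * sigma_w**2)) * math.sin(ang) * j_sym
    return j_sym, j_asym

def f(u):
    """Rectification followed by divisive inhibition (equation 6)."""
    r = [max(ui, 0.0) for ui in u]
    denom = S + mu * sum(r)
    return [ri / denom for ri in r]

def step(u, gamma=0.0, w=1.0, inp=None):
    """One discrete-time update, u(t+1) = w J f[u(t)] + I(t+1) (equation 4)."""
    rates = f(u)
    u_new = []
    for i in range(N):
        total = 0.0
        for j in range(N):
            j_sym, j_asym = J_entries(i, j)
            total += (j_sym + gamma * j_asym) * rates[j]
        u_new.append(w * total + (inp[i] if inp else 0.0))
    return u_new

# Start from a small bump of membrane potential and iterate with no input.
u = []
for i in range(N):
    d = min(abs(i - 50), N - abs(i - 50))   # circular distance from neuron 50
    u.append(math.exp(-d**2 / 50.0))
for _ in range(20):
    u = step(u)
```

Because the summed rates are normalized by the divisive term, the recurrent drive is bounded, so the membrane potentials stay in a fixed range rather than diverging.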
Each fixed point has the same shape, taking the form of a smooth membrane potential profile, U(x) = Jsymf[U(x)], centered at location, x, in the network.\n\nWhen \u03b3(t) \u2260 0, the bump of activity can be made to move over time (without losing its shape) [26] and hence, so long as \u03b3(t) = v(t), implement the prediction step of the Kalman filter (equation 1). That is, if the bump at time t is centered at \u02c6x(t), i.e. u(t) = U(\u02c6x(t)), then at time t + 1 it is centered at \u00afx(t + 1) = \u02c6x(t) + \u03b3(t), i.e. u(t + 1) = U(\u02c6x(t) + \u03b3(t)) = U(\u00afx(t + 1)). Thus, in this configuration, the network can already implement the first step of the Kalman filter through its recurrent connectivity. The next two steps, equations 2 and 3, however, remain inaccessible as the network has no way of encoding uncertainty and it is unclear how it will deal with external inputs.\n\n4 Relation to Kalman filter - small prediction error case\n\nIn this section we outline how the neural network dynamics can be mapped onto those of a Kalman filter. In the interests of space we focus only on the main points of the derivation, leaving the full working to the supplementary material.\n\nOur approach is to analyze the network in terms of U, which, for clarity, we define here to be the fixed point membrane potential profile of the network when w = 1, \u03b3(t) = 0, I(t) = 0, S = S0 and \u00b5 = \u00b50. 
Thus, the results described here are independent of the exact form of U so long as it is a smooth, non-uniform profile over the network.\n\nWe begin by making the assumption that both the input, I(t), and the network membrane potential, u(t), take the form of scaled versions of U, with the former encoding the noisy observations, z(t), and the latter encoding the network\u2019s estimate of position, \u02c6x(t), i.e.,\n\nI(t) = A(t)U(z(t)) and u(t) = \u03b1(t)U(\u02c6x(t)) (7)\n\nSubstituting this ansatz for the membrane potential into the left hand side of equation 4 gives\n\nLHS = \u03b1(t + 1)U(\u02c6x(t + 1)) (8)\n\nand into the right hand side of equation 4 gives\n\nRHS = wJf[\u03b1(t)U(\u02c6x(t))] + A(t + 1)U(z(t + 1)) (9)\n\nwhere the first term is the recurrent input and the second the external input. For the ansatz to be self-consistent we require that RHS can be written in the same form as LHS. We now show that this is the case.\n\nAs in the previous section, the recurrent input implements the prediction step of the Kalman filter, which, after a little algebra (see supplementary material), allows us to write\n\nRHS \u2248 CU(\u00afx(t + 1)) + A(t + 1)U(z(t + 1)) (10)\n\nwith the first term now the prediction and the variable C defined as\n\nC = [S/(w(S0 + \u00b50I))] \u00d7 1/[1/\u03b1(t) + \u00b5I/(w(S0 + \u00b50I))] (11)\n\nwhere I = \u2211i [Ui(\u02c6x(t))]+.\n\nIf we now suppose that the prediction error [z(t + 1) \u2212 \u00afx(t + 1)] is small, then we can linearize around the prediction, \u00afx(t + 1), to get (see supplementary material)\n\nRHS \u2248 [C + A(t + 1)] U(\u00afx(t + 1) + [A(t + 1)/(A(t + 1) + C)] [z(t + 1) \u2212 \u00afx(t + 1)]) (12)\n\nwhich is of the same form as equation 8 and thus the ansatz holds. More specifically, equating terms in equations 8 and 12, we can write down expressions for \u03b1(t + 1) and \u02c6x(t + 1)\n\n\u03b1(t + 1) \u2248 C + A(t + 1) = [S/(w(S0 + \u00b50I))] \u00d7 1/[1/\u03b1(t) + \u00b5I/(w(S0 + \u00b50I))] + A(t + 1) (13)\n\n\u02c6x(t + 1) \u2248 \u00afx(t + 1) + [A(t + 1)/\u03b1(t + 1)] [z(t + 1) \u2212 \u00afx(t + 1)] (14)\n\nwhich, if we define w such that\n\nS/(w(S0 + \u00b50I)) = 1, i.e. w = S/(S0 + \u00b50I) (15)\n\nare identical to equations 2 and 3 so long as\n\n(a) \u03b1(t) \u221d 1/\u02c6\u03c3x(t)^2 (b) A(t) \u221d 1/\u03c3z(t)^2 (c) \u00b5I/S \u221d \u03c3v(t)^2 (16)\n\nThus the network dynamics, when the prediction error is small, map directly onto the Kalman filter equations. This is our main result.\n\nFigure 1: Comparison of noiseless network dynamics with dynamics of the Kalman Filter for small prediction errors.\n\n4.1 Implications\n\nReciprocal code for uncertainty in input and estimate Equation 16a provides a link between the strength of activity in the network and the overall uncertainty in the estimate of the Kalman filter, \u02c6\u03c3x(t), with uncertainty decreasing as the activity increases. A similar relation is also implied for the uncertainty in the observations, \u03c3z(t), where equation 16b suggests that this should be reflected in the magnitude of the input, A(t). Interestingly, such a scaling, without a corresponding narrowing of tuning curves, is seen in the brain [20, 5, 2].\n\nCode for velocity signal As with Zhang\u2019s line attractor network [26], the mean of the velocity signal, v(t), is encoded into the recurrent connections of the network, with the degree of asymmetry in the weights, \u03b3(t), proportional to the speed. 
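Under condition 15 the scale-factor recursion in equation 13 reduces to \u03b1(t + 1) = 1/(1/\u03b1(t) + \u00b5I/S) + A(t + 1), which, under the identifications in equation 16, is exactly the Kalman precision update of equation 2. A quick numerical check (our own sketch, with arbitrary constants):

```python
# Identify alpha(t) with 1/sigma_x^2 and A(t) with 1/sigma_z^2 (equation 16),
# and mu*I/S with sigma_v^2 (equation 16c); the recursions should then coincide.
mu_I_over_S = 0.04          # plays the role of sigma_v^2   (arbitrary value)
A = 1.0 / 25.0              # plays the role of 1/sigma_z^2 (arbitrary value)

alpha = 0.01                # network scale factor, alpha ~ 1/sigma_x^2
var = 1.0 / alpha           # the matching Kalman filter variance

for t in range(50):
    # Network update: equation 13 with w chosen so that equation 15 holds
    alpha = 1.0 / (1.0 / alpha + mu_I_over_S) + A
    # Kalman filter update: equation 2
    var = 1.0 / (1.0 / (var + 0.04) + 1.0 / 25.0)
```

At every step alpha and 1/var agree, which is the content of the small-prediction-error result above.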
Such hard coding of the velocity signal represents a limitation of the model, as we would like to be able to deal with arbitrary, time-varying speeds. However, this kind of change could be implemented by pre-synaptic inhibition [24] or by using a \u2018double-ring\u2019 network similar to [25].\n\nEquation 16c implies that the variance of the velocity signal, \u03c3v(t), is encoded in the strength of the divisive feedback, \u00b5 (assuming constant S). This is very different from Zhang\u2019s model, which has no concept of uncertainty, and is also very different from the traditional view of divisive inhibition that sees it as a mechanism for gain control [14, 13].\n\nThe network is no longer a line attractor This can be seen by considering the fixed point values of the scale factor, \u03b1(t), when the input current, I(t) = 0. Requiring \u03b1(t + 1) = \u03b1(t) = \u03b1\u2217 in equation 13 gives values for these fixed points as\n\n\u03b1\u2217 = 0 and \u03b1\u2217 = [(S0 + \u00b50I)/(\u00b5I)] w \u2212 S/(\u00b5I) (17)\n\nThe second solution is exactly zero when w satisfies equation 15, hence the network has only one fixed point, corresponding to the all-zero state, and is not a line attractor. This is a key result as it removes the constraints required for line attractor dynamics, such as infinite precision in the weights and lack of noise in the network, and thus the network is much more biologically plausible.\n\n4.2 An example\n\nIn figure 1 we demonstrate the ability of the network to approximate the dynamics of a one-dimensional Kalman filter. 
The input, shown in figure 1A, is a noiseless bump of current centered at the position of the observation, z(t). The observation noise has standard deviation \u03c3z(t) = 5, the speed v(t) = 0.5 for 1 \u2264 t < 50 and v(t) = \u22120.5 for 50 \u2264 t < 100, and the standard deviation of the random walk dynamics is \u03c3v(t) = 0.2. In accordance with equation 16b, the height of each bump is scaled by 1/\u03c3z(t)^2.\n\nFigure 2: Response of the network when presented with a noisy moving bump input.\n\nIn figure 1B we plot the output activity of the network over time. Darker shades correspond to higher firing rates. We assume that the network gets the correct velocity signal, i.e. \u03b3(t) = v(t), and \u00b5 is set such that equation 16c holds. The other parameters are set to Kw = 1, \u03c3w = 0.2, c = 0.05, S = S0 = 1 and \u00b50 = 1, which gives I = 5.47. As can be seen from the plot, the amount of activity in the network steadily grows from zero over time to an asymptotic value, corresponding to the network\u2019s increasing certainty in its predictions. The position of the bump of activity in the network is also much less jittery than the input bump, reflecting a certain amount of smoothing.\n\nIn figure 1C we compare the positions of the input bumps (gray dots) with the position of the network bump (black line) and the output of the equivalent Kalman filter (red line). The network clearly tracks the Kalman filter estimate extremely well. 
The same is true for the network\u2019s estimate of the uncertainty, computed as 1/\u221a\u03b1(t) and shown as the black line in figure 1D, which tracks the Kalman filter uncertainty (red line) almost exactly.\n\n5 Effect of input noise\n\nWe now consider the effect of noise on the ability of the network to implement a Kalman filter. In particular we consider noise in the input signal, which for this simple one-layer network is equivalent to having noise in the update equation. For brevity, we only present the main results along with the results of simulations, leaving more detailed analysis to the supplementary material.\n\nSpecifically, we consider input signals where the only source of noise is in the input current, i.e. there is no additional jitter in the position of the bump as there was in the noiseless case; thus we write\n\nI(t) = A(t)U(x(t)) + \u03b5(t) (18)\n\nwhere \u03b5(t) is some noise vector. The main effect of the noise is that it perturbs the effective position of the input bump. This can be modeled by extracting the maximum likelihood estimate of the input position given the noisy input and then using this position as the input to the equivalent Kalman filter. 
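For isotropic Gaussian input noise, this maximum likelihood extraction amounts to finding the shift of the profile U that best matches the noisy input in the least-squares sense. A sketch (our own; we stand in a Gaussian bump for U and pick arbitrary sizes):

```python
import math
import random

N = 100

def template(center):
    """Fixed-point-like profile U centered at `center` (a Gaussian stand-in)."""
    out = []
    for i in range(N):
        d = min(abs(i - center), N - abs(i - center))   # circular distance
        out.append(math.exp(-d**2 / 25.0))
    return out

def ml_position(noisy, A):
    """Maximum likelihood bump position under isotropic Gaussian noise:
    the integer shift x minimizing the squared error to A*U(x)."""
    best_x, best_err = None, float("inf")
    for x in range(N):
        u = template(x)
        err = sum((noisy[i] - A * u[i])**2 for i in range(N))
        if err < best_err:
            best_x, best_err = x, err
    return best_x

random.seed(1)
A = 0.5
true_x = 37
noisy = [A * ui + random.gauss(0, 0.1) for ui in template(true_x)]
est = ml_position(noisy, A)
```

Despite the per-neuron noise, the template fit recovers the bump position to within a neuron or two, which is the quantity fed to the equivalent Kalman filter in the comparisons below.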
Because of the noise, this extracted position is not, in general, the same as the noiseless input position and, for zero-mean Gaussian noise with covariance \u03a3, the standard deviation of the perturbation, \u03c3z(t), is approximately given by\n\n\u03c3z(t) \u2248 [1/A(t)] \u221a(2/(U\u2032T \u03a3^\u22121 U\u2032)) (19)\n\nFigure 3: Effect of noise magnitude on performance of network.\n\nNow, for the network to approximate a Kalman filter, equation 16b must hold, which means that we require the magnitude of the covariance matrix to scale in proportion to the strength of the input signal, A(t), i.e. \u03a3 \u221d A(t). Interestingly, this relation is true for Poisson noise, the type of noise that is found all over the brain.\n\nIn figure 2 we demonstrate the ability of the network to approximate a Kalman filter. In panel A we show the input current, which is a moving bump of activity corrupted by independent Gaussian noise of standard deviation \u03c3noise = 0.23, or about two thirds of the maximum height of the fixed point bump, U. This is a high noise setting and it is hard to see the bump location by eye. The network dramatically cleans up this input signal (figure 2B) and the output activity, although still noisy, reflects the position of the underlying stimulus much more faithfully than the input. (Note that the colour scales in A and B are different.)\n\nIn panel C we compare the position of the output bump in the network (black line) with that of the equivalent Kalman filter. To do this we first fit the noisy input bump at each time step to obtain input positions z(t), shown as gray dots. Then, using \u03c3z = 2.23 computed from equation 19, we can compute the estimates of the equivalent Kalman filter (thick red line). 
These closely match those of the network (black line). Similarly, there is good agreement between the two estimates of the uncertainty, \u02c6\u03c3x(t), shown in panel D (black line - network, red line - Kalman filter).\n\n5.1 Performance of the network as a function of noise magnitude\n\nThe noise not only affects the position of the input bump but also, in a slightly more subtle manner, causes a gradual decline in the ability of the network to emulate a Kalman filter. The reason for this (outlined in more detail in the supplementary material) is that the output bump scale factor, \u03b1, decreases as a function of the noise standard deviation, \u03c3noise. This effect is illustrated in figure 3A, where we plot the steady state value of \u03b1 (for constant input strength, A(t)) as a function of \u03c3noise. The average results of simulations of 100 neurons are shown as the red dots, while the black line represents the results of the theory in the supplementary material.\n\nThe reason for the decline in \u03b1 as \u03c3noise goes up is that, because of the rectifying non-linearity in the activation rule, increasing \u03c3noise increases the amount of noisy activity in the network. Because of inhibition (both divisive and subtractive) in the network, this \u2018noisy activity\u2019 competes with the bump activity and decreases it, thus reducing \u03b1.\n\nThis decrease in \u03b1 results in a change in the Kalman gain of the network, by equation 14, making it different from that of the equivalent Kalman filter and thus degrading the network\u2019s performance. We quantify this difference in figure 3B, where we plot the root mean squared error (in units of neural position) between the network and the equivalent Kalman filter as a function of \u03c3noise. As before, the results of simulations are shown as red dots and the theory (outlined in the supplementary material) is the black line. 
To give some sense of the scale on this plot, the horizontal blue line corresponds to the maximum height of the (noise-free) input bump. Thus we may conclude that the performance of the network and the theory are robust up to fairly large values of \u03c3noise.\n\nFigure 4: Response of the network to changepoints.\n\n6 Response to changepoints (and outliers) - large prediction error case\n\nWe now consider the dynamics of the network when the prediction error is large. By large we mean that the prediction error is greater than the width of the bump of activity in the network. Such a big discrepancy could be caused by an outlier or a changepoint, i.e. a sustained, large and abrupt change in the input position at a random time. In the interests of space we focus only on the latter case and such an input, with a changepoint at t = 50, is shown in figure 4A.\n\nIn figure 4B we show the network\u2019s response to this stimulus. As before, prior to the change, there is a single bump of activity whose position approximates that of a Kalman filter. However, after the changepoint, the network maintains two bumps of activity for several time steps: one at the original position, which shrinks over time and essentially predicts where the input would be if the change had not occurred, and a second, which grows over time, at the location of the input after the changepoint.\n\nThus in the period immediately after the changepoint, the network can be thought of as encoding two separate and competing hypotheses about the position of the stimulus, one corresponding to the case where no change has occurred, and the other to the case where a change occurred at t = 50.\n\nIn figure 4C we compare the position of the bump(s) in the network (black dots whose size reflects the size of each bump) to the output from the Kalman filter (red line). 
Before the changepoint, the two agree well, but after the change the Kalman filter becomes suboptimal, taking a long time to move to the new position. The network, however, by maintaining two hypotheses, reacts much better.\n\nFinally, in figure 4D we plot the scale factor, \u03b1i(t), of each bump as computed from the simulations (black dots) and from the approximate analytic solution described in the supplementary material (red line for the bump at 30, blue line for the bump at 80). As can be seen, there is good agreement between theory and simulation, with the largest discrepancy occurring for small values of the scale factor.\n\nThus, when confronted with a changepoint, the network no longer approximates a Kalman filter and instead maintains two competing hypotheses in a way that is qualitatively similar to that of the run-length distribution in [1]. This is an extremely interesting result and hints at ways in which more complex distributions may be encoded in networks of this type.\n\n7 Discussion\n\n7.1 Relation to previous work\n\nOf the papers mentioned in the introduction, two are of particular relevance to the current work. In the first, [8], the authors considered a neural implementation of the Kalman filter using line attractors. Although this work, at first glance, seems similar to what is presented here, there are several major differences, the main one being that our network is not a line attractor at all, while the results in [8] rely on this property. Also, in [8], the Kalman gain is changed manually, whereas in our case it adjusts automatically (equations 13 and 14), and the form of the non-linearity is different.\n\nProbabilistic population coding [16, 4] is more closely related to the model presented here. 
Combined\nwith divisive normalization, these networks can implement a Kalman \ufb01lter exactly, while the model\npresented here can \u2018only\u2019 approximate one. While this may seem like a limitation of our network,\nwe see it as an advantage as the breakdown of the approximation leads to a more robust response to\noutliers and changepoints than a pure Kalman \ufb01lter.\n\n7.2 Extension beyond one-dimensional Gaussians\n\nA major limitation of the current model is that it only applies to one-dimensional Gaussian tracking\n- clearly an unreasonable restriction for the brain. One possible way around this limitation is hinted\nat by the response of the network in the changepoint case where we saw two, largely independent\nbumps of activity in the network. This ability to encode multiple \u2018particles\u2019 in the network may\nallow networks of this kind to implement something like the dynamics of a particle \ufb01lter [12] that\ncan approximate the inference process for non-linear and non-Gaussian systems. Such a possibility\nis an intriguing idea for future work.\n\n7.3 Experimental predictions\n\nThe model makes at least two easily testable predictions about the response of head direction cells\n[21, 22, 23] in rats. The \ufb01rst comes by considering the response of the neurons in the \u2018dark\u2019.\nAssuming that all bearing cues can indeed be eliminated, by setting A(t) = 0 in equation 13, we\nexpect the activity of the neurons to fall off as 1/t and that the shape of the tuning curves will\nremain approximately constant. Note that this prediction is vastly different from the behaviour of a\nline attractor, where we would not expect the level of activity to fall off at all in the dark.\nAnother, slightly more ambitious experiment would involve perturbing the reliability of one of the\nlandmark cues. 
In particular, one could imagine a training phase in which the position of one landmark is jittered over time, such that each time the rat encounters it, it is at a slightly different heading. In the test case, all other, reliable, landmark cues would be removed and the response of head direction cells measured in response to presentation of the unreliable cue alone. The prediction of the model is that this would reduce the strength of the input, A, which in turn reduces the level of activity in the head direction cells, \u03b1. In particular, if \u03c3z is the jitter of the unreliable landmark, then we expect \u03b1 to scale as 1/\u03c3z^2. This prediction is very different from that of a line attractor, which would predict a constant level of activity regardless of the reliability of the landmark cues.\n\n8 Conclusions\n\nIn this paper we have introduced a novel neural network model whose dynamics map directly onto those of a one-dimensional Kalman filter when the prediction error is small. This property is robust to noise, and when the prediction error is large, such as for changepoints, the output of the network diverges from that of the Kalman filter, but in a way that is both interesting and useful. Finally, the model makes two easily testable experimental predictions about head direction cells.\n\nAcknowledgements\n\nWe would like to thank the anonymous reviewers for their very helpful comments on this work.\n\nReferences\n\n[1] R. P. Adams and D. J. C. MacKay. Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK, 2007.\n\n[2] J. S. Anderson, I. Lampl, D. C. Gillespie, and D. Ferster. The contribution of noise to contrast invariance of orientation tuning in cat visual cortex. Science, 290:1968\u20131972, 2000.\n\n[3] M. J. Barber, J. W. Clark, and C. H. Anderson. Neural representation of probabilistic information. Neural Computation, 15:1843\u20131864, 2003.\n\n[4] J. Beck, W. J. Ma, P. E. 
Latham, and A. Pouget. Probabilistic population codes and the exponential family\n\nof distributions. Progress in Brain Research, 165:509\u2013519, 2007.\n\n[5] K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon. Response of neurons in macaque mt\n\nto stochastic motion signals. Visual Neuroscience, 10(1157-1169), 1993.\n\n[6] S. Deneve. Bayesian spiking neurons i: Inference. Neural Computation, 20:91\u2013117, 2008.\n[7] S. Deneve. Bayesian spiking neurons ii: Learning. Neural Computation, 20:118\u2013145, 2008.\n[8] S. Deneve, J.-R. Duhammel, and A. Pouget. Optimal sensorimotor integration in recurrent cortical net-\n\nworks: a neural implementation of kalman \ufb01lters. Journal of Neuroscience, 27(21):5744\u20135756, 2007.\n\n[9] S. Deneve, P. E. Latham, and A. Pouget. Reading population codes: a neural implementation of ideal\n\nobservers. Nature Neuroscience, 2(8):740\u2013745, 1999.\n\n[10] S. Deneve, P. E. Latham, and A. Pouget. Ef\ufb01cient computation and cue integration with noisy population\n\ncodes. Nature Neuroscience, 4(8):826\u2013831, 2001.\n\n[11] J. I. Gold and M. N. Shadlen. Representation of a perceptual decision in developing oculomotor com-\n\nmands. Nature, 404(390-394), 2000.\n\n[12] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-gaussian bayesian\n\nstate estimation. IEE-Proceedings-F, 140:107\u2013113, 1993.\n\n[13] D. J. Heeger. Modeling simple cell direction selectivity with normalized half-squared, linear operators.\n\nJournal of Neurophysiology, 70:1885\u20131897, 1993.\n\n[14] D. J. Heeger. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9:181\u2013198, 1993.\n[15] P. E. Latham, S. Deneve, and A. Pouget. Optimal computation with attractor networks. Journal of\n\nPhysiology Paris, 97(683-694), 2003.\n\n[16] W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget. Bayesian inference with probabilistic population\n\ncodes. 
Nature Neuroscience, 9(11):1432\u20131438, 2006.\n\n[17] R. P. N. Rao. Bayesian computation in recurrent neural circuits. Neural Computation, 16:1\u201338, 2004.\n[18] R. P. N. Rao. Hierarchical bayesian inference in networks of spiking neurons. In Advances in Neural\n\nInformation Processing Systems, volume 17, 2005.\n\n[19] M. Sahani and P. Dayan. Doubly distributional population codes: simultaneous representation of uncer-\n\ntainty and multiplicity. Neural Computation, 15:2255\u20132279, 2003.\n\n[20] G. Sclar and R. D. Freeman. Orientation selectivity in the cat\u2019s striate cortex is invariant with stimulus\n\ncontrast. Experimental Brain Research, 46:457\u2013461, 1982.\n\n[21] J. S. Taube, R. U. Muller, and J. B. Ranck. Head-direction cells recorded from postsubiculum in freely\n\nmoving rats. i. description and quantitative analysis. Journal of Neuroscience, 10(2):420\u2013435, 1990.\n\n[22] J. S. Taube, R. U. Muller, and J. B. Ranck. Head-direction cells recorded from postsubiculum in freely\nmoving rats. ii. effects of environmental manipulations. Journal of Neuroscience, 10(2):436\u2013447, 1990.\n[23] S. I. Wiener and J. S. Taube. Head direction cells and the neural mechanisms of spatial orientation. MIT\n\nPress, 2005.\n\n[24] L.-G. Wu and P. Saggau. Presynaptic inhibition of elicited neurotransmitter release. Trends in Neuro-\n\nscience, 20:204\u2013212, 1997.\n\n[25] X. Xie, R. H. Hahnloser, and H. S. Seung. Double-ring network model of the head-direction system.\n\nPhysical Review E, 66:0419021\u20130419029, 2002.\n\n[26] K. Zhang. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell en-\n\nsemble: a theory. Journal of Neuroscience, 16(6):2112\u20132126, 1996.\n\n9\n\n\f", "award": [], "sourceid": 920, "authors": [{"given_name": "Robert", "family_name": "Wilson", "institution": null}, {"given_name": "Leif", "family_name": "Finkel", "institution": null}]}