{"title": "Bayesian inference in spiking neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 353, "page_last": 360, "abstract": null, "full_text": " Bayesian inference in spiking neurons\n\n\n\n Sophie Deneve\n Gatsby Computational Neuroscience Unit\n University College London\n London, UK WC1N 3AR\n sdeneve@gatsby.ucl.ac.uk\n\n\n\n\n Abstract\n\n We propose a new interpretation of spiking neurons as Bayesian integra-\n tors accumulating evidence over time about events in the external world\n or the body, and communicating to other neurons their certainties about\n these events. In this model, spikes signal the occurrence of new infor-\n mation, i.e. what cannot be predicted from the past activity. As a result,\n firing statistics are close to Poisson, albeit providing a deterministic rep-\n resentation of probabilities. We proceed to develop a theory of Bayesian\n inference in spiking neural networks, recurrent interactions implement-\n ing a variant of belief propagation.\n\n\nMany perceptual and motor tasks performed by the central nervous system are probabilis-\ntic, and can be described in a Bayesian framework [4, 3]. A few important but hidden\nproperties, such as direction of motion, or appropriate motor commands, are inferred from\nmany noisy, local and ambiguous sensory cues. These evidences are combined with priors\nabout the sensory world and body. Importantly, because most of these inferences should\nlead to quick and irreversible decisions in a perpetually changing world, noisy cues have to\nbe integrated on-line, but in a way that takes into account unpredictable events, such as a\nsudden change in motion direction or the appearance of a new stimulus.\n\nThis raises the question of how this temporal integration can be performed at the neural\nlevel. 
It has been proposed that single neurons in sensory cortices represent and compute the log probability that a sensory variable takes on a certain value (e.g., is visual motion in the neuron's preferred direction?) [9, 7]. Alternatively, to avoid normalization issues and provide an appropriate signal for decision making, neurons could represent the log probability ratio of a particular hypothesis (e.g., is motion more likely to be towards the right than towards the left?) [7, 6]. Log probabilities are convenient here, since under some assumptions, independent noisy cues simply combine linearly. Moreover, there is physiological evidence for the neural representation of log probabilities and log probability ratios [9, 6, 7].

However, these models assume that neurons represent probabilities in their firing rates. We argue that it is important to study how probabilistic information is encoded in spikes. Indeed, it seems spurious to marry the idea of an exquisite on-line integration of noisy cues with an underlying rate code that requires averaging over large populations of noisy neurons and long periods of time. In particular, most natural tasks require this integration to take place on the time scale of inter-spike intervals. Spikes signal events more efficiently than analog quantities. In addition, a neural theory of inference with spikes will bring us closer to the physiological level and generate more easily testable predictions.

Thus, we propose a new theory of neural processing in which spike trains provide a deterministic, online representation of a log-probability ratio. Spikes signal events, e.g. that the log-probability ratio has exceeded what could be predicted from previous spikes. This form of coding was loosely inspired by the idea of "energy landscape" coding proposed by Hinton and Brown [2]. 

Institute of Cognitive Science, 69645 Bron, France

However, contrary to [2] and other theories using rate-based representations of probabilities, this model is self-consistent and does not require different models for encoding and decoding: as output spikes provide new, unpredictable, temporally independent evidence, they can be used directly as an input to other Bayesian neurons.

Finally, we show that these neurons can be used as building blocks in a theory of approximate Bayesian inference in recurrent spiking networks. Connections between neurons implement an underlying Bayesian network, consisting of coupled hidden Markov models. Propagation of spikes is a form of belief propagation in this underlying graphical model.

Our theory provides computational explanations of some general physiological properties of cortical neurons, such as spike frequency adaptation, Poisson statistics of spike trains, the existence of strong local inhibition in cortical columns, and the maintenance of a tight balance between excitation and inhibition. Finally, we discuss the implications of this model for the debate about temporal versus rate-based neural coding.


1 Spikes and log posterior odds

1.1 Synaptic integration seen as inference in a hidden Markov chain

We propose that each neuron codes for an underlying "hidden" binary variable, $x_t$, whose state evolves over time. We assume that $x_t$ depends only on the state at the previous time step, $x_{t-dt}$, and is conditionally independent of other past states. The state $x_t$ can switch from 0 to 1 with a constant rate $r_{on} = \lim_{dt \to 0} \frac{1}{dt} P(x_t = 1 \mid x_{t-dt} = 0)$, and from 1 to 0 with a constant rate $r_{off}$. For example, these transition rates could represent how often motion in a preferred direction appears in the receptive field and how long it is likely to stay there.

The neuron infers the state of its hidden variable from $N$ noisy synaptic inputs, considered to be observations of the hidden state. 
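This two-state generative process is easy to simulate directly. The sketch below is illustrative only: the time step, transition rates, number of synapses, and the Poisson observation rates `q_on` and `q_off` are assumed values, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not values from the paper):
dt, T = 0.001, 10.0           # time step and duration (s)
r_on, r_off = 1.0, 1.0        # switching rates of the hidden state (1/s)
N = 10                        # number of synapses
q_on, q_off = 40.0, 20.0      # per-synapse Poisson rates given x=1 / x=0 (Hz)

steps = int(T / dt)
x = np.zeros(steps, dtype=int)       # hidden binary state x_t
s = np.zeros((steps, N), dtype=int)  # synaptic spikes s_t^i
for t in range(1, steps):
    # x_t depends only on x_{t-dt}: switch with probability r*dt
    if x[t - 1] == 0:
        x[t] = 1 if rng.random() < r_on * dt else 0
    else:
        x[t] = 0 if rng.random() < r_off * dt else 1
    # each synapse spikes independently, at a rate set by the current state
    rate = q_on if x[t] == 1 else q_off
    s[t] = rng.random(N) < rate * dt

print("fraction of time in state 1:", x.mean())
print("mean synaptic firing rate (Hz):", s.mean() / dt)
```

The model neuron sees only the spike matrix `s` and must infer the trajectory of `x` from it.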
In this initial version of the model, we assume that these inputs are conditionally independent homogeneous Poisson processes, synapse $i$ emitting a spike between time $t$ and $t + dt$ ($s^i_t = 1$) with constant probability $q^i_{on}\,dt$ if $x_t = 1$, and with another constant probability $q^i_{off}\,dt$ if $x_t = 0$. The synaptic spikes are assumed to be otherwise independent of previous synaptic spikes, previous states and spikes at other synapses. The resulting generative model is a hidden Markov chain (figure 1-A).

However, rather than estimating the state of its hidden variable and communicating this estimate to other neurons (for example by emitting a spike when sensory evidence for $x_t = 1$ goes above a threshold), the neuron reports and communicates its certainty that the current state is 1. This certainty takes the form of the log of the ratio of the probability that the hidden state is 1, and the probability that the state is 0, given all the synaptic inputs received so far: $L_t = \log \frac{P(x_t = 1 \mid s_{0 \to t})}{P(x_t = 0 \mid s_{0 \to t})}$. We use $s_{0 \to t}$ as a shorthand notation for the $N$ synaptic inputs received at present and in the past. We will refer to this quantity as the log odds ratio.

Thanks to the conditional independencies assumed in the generative model, we can compute this log odds ratio iteratively. Taking the limit as $dt$ goes to zero, we get the following differential equation:

$$\dot{L}_t = r_{on}\left(1 + e^{-L_t}\right) - r_{off}\left(1 + e^{L_t}\right) + \sum_i w_i\,\delta(s^i_t - 1) - \theta$$

Figure 1: A. Generative model for the synaptic input. B. Schematic representation of log odds ratio encoding and decoding. 
The dashed circle represents both eventual downstream elements and the self-prediction taking place inside the model neuron. A spike is fired only when $L_t$ exceeds $G_t$. C. One example trial, where the state switches from 0 to 1 (shaded area) and back to 0. Plain: $L_t$, dotted: $G_t$. Black stripes at the top: corresponding spike train. D. Mean log odds ratio (dark line) and mean output firing rate (clear line). E. Output spike raster plot (1 line per trial) and ISI distribution for the neuron shown in C. and D. Clear line: ISI distribution for a Poisson neuron with the same rate.


The synaptic weight $w_i$ describes how informative synapse $i$ is about the state of the hidden variable, i.e. $w_i = \log \frac{q^i_{on}}{q^i_{off}}$. Each synaptic spike ($s^i_t = 1$) gives an impulse to the log odds ratio, which is positive if this synapse is more active when the hidden state is 1 (i.e. it increases the neuron's confidence that the state is 1), and negative if this synapse is more active when $x_t = 0$ (i.e. it decreases the neuron's confidence that the state is 1).

The bias, $\theta$, is determined by how informative it is not to receive any spike, i.e. $\theta = \sum_i \left(q^i_{on} - q^i_{off}\right)$. By convention, we will consider that the bias is positive or zero (if not, we simply need to invert the status of the state $x$).


1.2 Generation of output spikes

The spike train should convey a sparse representation of $L_t$, so that each spike reports new information about the state $x_t$ that is not redundant with that reported by other, preceding, spikes. This proposition is based on three arguments: First, spikes, being metabolically expensive, should be kept to a minimum. Second, spikes conveying redundant information would require a decoding of the entire spike train, whereas independent spikes can be taken into account individually. 
And finally, we seek a self-consistent model, with the spiking output having a similar semantics to its spiking input.

To maximize the independence of the spikes (conditioned on $x_t$), we propose that the neuron fires only when the difference between its log odds ratio $L_t$ and a prediction $G_t$ of this log odds ratio, based on the output spikes emitted so far, reaches a certain threshold. Indeed, supposing that downstream elements predict $L_t$ as well as they can, the neuron only needs to fire when it expects that prediction to be too inaccurate (figure 1-B). In practice, this will happen when the neuron receives new evidence for $x_t = 1$. $G_t$ should thereby follow the same dynamics as $L_t$ when spikes are not received. The equations for $G_t$ and the output $O_t$ ($O_t = 1$ when an output spike is fired) are given by:

$$\dot{G}_t = r_{on}\left(1 + e^{-G_t}\right) - r_{off}\left(1 + e^{G_t}\right) + g_o\,\delta(O_t - 1) \qquad (1)$$
$$O_t = 1 \text{ when } L_t > G_t + \frac{g_o}{2}, \quad 0 \text{ otherwise.} \qquad (2)$$

Here $g_o$, a positive constant, is the only free parameter, the other parameters being constrained by the statistics of the synaptic input.


1.3 Results

Figure 1-C plots a typical trial, showing the behavior of $L$, $G$ and $O$ before, during and after presentation of the stimulus. As random synaptic inputs are integrated, $L$ fluctuates and eventually exceeds $G + 0.5$, leading to an output spike. Immediately after a spike, $G$ jumps to $G + g_o$, which prevents (except in very rare cases) a second spike from immediately following the first. Thus, this "jump" implements a relative refractory period. However, $G$ decays as it tends to converge back to its stable level $g_{stable} = \log \frac{r_{on}}{r_{off}}$. Thus $L$ eventually exceeds $G$ again, leading to a new spike. 
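This trial behavior can be reproduced with a direct discrete-time simulation of $L_t$, $G_t$ and the threshold rule. The sketch below is not the paper's code: all numerical values are illustrative assumptions, and the delta-function impulses are approximated by per-time-step increments.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumptions, not values from the paper):
dt, T = 0.001, 2.0
r_on, r_off = 1.0, 1.0
N, q_on, q_off = 10, 40.0, 20.0
w = np.full(N, np.log(q_on / q_off))   # synaptic weights w_i = log(q_on/q_off)
theta = N * (q_on - q_off)             # bias from the absence of spikes
g_o = 2.0                              # post-spike jump, the free parameter

def drift(z):
    # shared prior dynamics of L and G between spikes
    return r_on * (1.0 + np.exp(-z)) - r_off * (1.0 + np.exp(z))

steps = int(T / dt)
times = np.arange(steps) * dt
stim = (times > 0.5) & (times < 1.5)   # x_t = 1 during this window
L = G = 0.0
spikes = []
for t in range(steps):
    rate = q_on if stim[t] else q_off
    s = rng.random(N) < rate * dt      # synaptic input for this time step
    L += drift(L) * dt + w @ s - theta * dt
    G += drift(G) * dt
    if L > G + g_o / 2:                # fire when the prediction lags behind
        spikes.append(times[t])
        G += g_o                       # jump = relative refractory period

print(f"{len(spikes)} output spikes")
```

With these settings the output spikes concentrate in the stimulus window, qualitatively matching figure 1-C.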
This threshold crossing happens more often during stimulation ($x_t = 1$), as the net synaptic input is then stronger, creating a higher overall level of certainty, $L_t$.

Mean log odds ratio and output firing rate

The mean firing rate $\bar{O}_t$ of the Bayesian neuron during presentation of its preferred stimulus (i.e. when $x_t$ switches from 0 to 1 and back to 0) is plotted in figure 1-D, together with the mean log posterior ratio $\bar{L}_t$, both averaged over trials. Not surprisingly, the log posterior ratio reflects the leaky integration of synaptic evidence, with an effective time constant that depends on the transition probabilities $r_{on}$, $r_{off}$. If the state is very stable ($r_{on} = r_{off} \approx 0$), synaptic evidence is integrated over almost infinite time periods, the mean log posterior ratio tending to either increase or decrease linearly with time. In the example in figure 1-D, the state is less stable, so "old" synaptic evidence is discounted and $L_t$ saturates.

In contrast, the mean output firing rate $\bar{O}_t$ tracks the state of $x_t$ almost perfectly. This is because, as a form of predictive coding, the output spikes reflect the new synaptic evidence, $I_t = \sum_i w_i\,\delta(s^i_t - 1) - \theta$, rather than the log posterior ratio itself. In particular, the mean output firing rate is a rectified linear function of the mean input, i.e. $\bar{O} = \frac{1}{g_o}\left[\bar{I}\right]^+ = \frac{1}{g_o}\left[\sum_i w_i q^i_{on(off)} - \theta\right]^+$.

Analogy with a leaky integrate and fire neuron

We can get an interesting insight into the computation performed by this neuron by linearizing $L$ and $G$ around their mean levels over trials. Here we reduce the analysis to prolonged, statistically stable periods when the state is constant (either ON or OFF). In this case, the mean level of certainty $\bar{L}$ and its output prediction $\bar{G}$ are also constant over time. 
We make the rough approximation that the post-spike jump, $g_o$, and the input fluctuations are small compared to the mean level of certainty $\bar{L}$.

Rewriting $V_t = L_t - G_t + \frac{g_o}{2}$ as the "membrane potential" of the Bayesian neuron:

$$\dot{V} = -k_{\bar{L}} V + I_t - \bar{g}_o - g_o\,\delta(O_t - 1)$$

where $k_{\bar{L}} = r_{on} e^{-\bar{L}} + r_{off} e^{\bar{L}}$, the "leak" of the membrane potential, depends on the overall level of certainty. $\bar{g}_o$ is positive and a monotonically increasing function of $g_o$.

Figure 2: A. Bayesian causal network for $y_t$ (tiger), $x^1_t$ (stripes) and $x^2_t$ (paws). B. A feedforward network computing the log posterior for $x^1_t$. C. A recurrent network computing the log posterior odds for all variables. D. Log odds ratio in a simulated trial with the network in C (see text). Thick line: $L^{x^2}_t$, thin line: $L^{x^1}_t$, dash-dotted: $L^{x^1}_t$ without inhibition. Inset: $L^{x^2}_t$ averaged over trials, showing the effect of feedback.


The linearized Bayesian neuron thus acts in its stable regime as a leaky integrate and fire (LIF) neuron. The membrane potential $V_t$ integrates its input, $J_t = I_t - \bar{g}_o$, with a leak $k_{\bar{L}}$. The neuron fires when its membrane potential reaches a constant threshold $g_o$. After each spike, $V_t$ is reset to 0.

Interestingly, for an appropriately chosen compression factor $g_o$, the mean input to the linearized neuron $\bar{J} = \bar{I} - \bar{g}_o \approx 0$.[1] This means that the membrane potential is purely driven to its threshold by input fluctuations, or a random walk in membrane potential. 
As a consequence, the neuron's firing will be memoryless, and close to a Poisson process. In particular, we found Fano factors close to 1 and quasi-exponential ISI distributions (figure 1-E) over the entire range of parameters tested. Indeed, LIF neurons with balanced inputs have been proposed as a model to reproduce the statistics of real cortical neurons [8]. This balance is implemented in our model by the neuron's effective self-inhibition, even when the synaptic input itself is not balanced.

Decoding

As we said previously, downstream elements could predict the log odds ratio $L_t$ by computing $G_t$ from the output spikes (Eq 1, fig 1-B). Of course, this requires an estimate of the transition probabilities $r_{on}$, $r_{off}$, which could be learned from the observed spike trains.

However, we show next that explicit decoding is not necessary to perform Bayesian inference in spiking networks. Intuitively, this is because the quantity that our model neurons receive and transmit, i.e. new information, is exactly what probabilistic inference algorithms propagate between connected statistical elements.

[1] Even if $g_o$ is not chosen optimally, the influence of the drift $\bar{J}$ is usually negligible compared to the large fluctuations in membrane potential.


2 Bayesian inference in cortical networks

The model neurons, having the same input and output semantics, can be used as building blocks to implement more complex generative models consisting of coupled Markov chains. Consider, for example, the network in figure 2-A. Here, a "parent" variable $x^1_t$ (the presence of a tiger) can cause the state of $n$ other "children" variables ($[x^k_t]_{k=2...n}$), of whom two are represented (the presence of stripes, $x^2_t$, and motion, $x^3_t$). The "children" variables are Bayesian neurons identical to those described previously. The resulting Bayesian network consists of $n + 1$ coupled hidden Markov chains. 
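The coupling between the parent chain and one child chain can be sketched by simulation. The rate values below, and the choice to modulate only the child's switching-on rate by the parent's state, are hypothetical illustrations rather than the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical rates (the paper gives no numerical values):
dt, T = 0.001, 20.0
r_on1, r_off1 = 0.5, 0.5            # parent (tiger) transition rates (1/s)
r_on2 = {0: 0.2, 1: 5.0}            # child's 0->1 rate given the parent state
r_off2 = 1.0                        # child's 1->0 rate

steps = int(T / dt)
x1 = np.zeros(steps, dtype=int)     # parent chain (tiger)
x2 = np.zeros(steps, dtype=int)     # one child chain (stripes)
for t in range(1, steps):
    # the parent evolves as an ordinary two-state Markov chain
    p = (r_on1 if x1[t - 1] == 0 else r_off1) * dt
    x1[t] = 1 - x1[t - 1] if rng.random() < p else x1[t - 1]
    # the child's transition rate is modulated by the current parent state
    p = (r_on2[x1[t]] if x2[t - 1] == 0 else r_off2) * dt
    x2[t] = 1 - x2[t - 1] if rng.random() < p else x2[t - 1]

print("P(stripes | tiger)   :", x2[x1 == 1].mean())
print("P(stripes | no tiger):", x2[x1 == 0].mean())
```

Stripes are far more likely while the tiger is present; this conditional dependency between chains is what inference in the coupled network must account for.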
Inference in this architecture corresponds to computing the log posterior odds ratio for the tiger, $x^1_t$, and the log posterior of observing stripes or motion, $[x^k_t]_{k=2...n}$, given the synaptic inputs received by the entire network so far, i.e. $s^2_{0 \to t}, \ldots, s^n_{0 \to t}$.

Unfortunately, inference and learning in this network (and in general in coupled Markov chains) require very expensive computations, and cannot be performed by simply propagating messages over time and among the variable nodes. In particular, the state of a child variable $x^k_t$ depends on $x^k_{t-dt}$, $s^k_t$, $x^1_t$ and the state of all other children at the previous time step, $[x^j_{t-dt}]$