{"title": "Hierarchical Bayesian Inference in Networks of Spiking Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1113, "page_last": 1120, "abstract": null, "full_text": "Hierarchical Bayesian Inference in Networks of Spiking Neurons\n\nRajesh P. N. Rao\nDepartment of Computer Science and Engineering\nUniversity of Washington, Seattle, WA 98195\nrao@cs.washington.edu\n\nAbstract\n\nThere is growing evidence from psychophysical and neurophysiological studies that the brain utilizes Bayesian principles for inference and decision making. An important open question is how Bayesian inference for arbitrary graphical models can be implemented in networks of spiking neurons. In this paper, we show that recurrent networks of noisy integrate-and-fire neurons can perform approximate Bayesian inference for dynamic and hierarchical graphical models. The membrane potential dynamics of neurons is used to implement belief propagation in the log domain. The spiking probability of a neuron is shown to approximate the posterior probability of the preferred state encoded by the neuron, given past inputs. We illustrate the model using two examples: (1) a motion detection network in which the spiking probability of a direction-selective neuron becomes proportional to the posterior probability of motion in a preferred direction, and (2) a two-level hierarchical network that produces attentional effects similar to those observed in visual cortical areas V2 and V4. The hierarchical model offers a new Bayesian interpretation of attentional modulation in V2 and V4.\n\n1 Introduction\n\nA wide range of psychophysical results have recently been successfully explained using Bayesian models [7, 8, 16, 19]. These models have been able to account for human responses in tasks ranging from 3D shape perception to visuomotor control. 
Simultaneously, there is accumulating evidence from human and monkey experiments that Bayesian mechanisms are at work during visual decision making [2, 5]. The versatility of Bayesian models stems from their ability to combine prior knowledge with sensory evidence in a rigorous manner: Bayes rule prescribes how prior probabilities and stimulus likelihoods should be combined, allowing the responses of subjects or of neurons to be interpreted in terms of the resulting posterior distributions.\n\nAn important question that has only recently received attention is how networks of cortical neurons can implement algorithms for Bayesian inference. One powerful approach has been to build on the known properties of population coding models that represent information using a set of neural tuning curves or kernel functions [1, 20]. Several proposals have been made regarding how a probability distribution could be encoded using population codes ([3, 18]; see [14] for an excellent review). However, the problem of implementing general inference algorithms for arbitrary graphical models using population codes remains unresolved (some encouraging initial results are reported in Zemel et al., this volume). An alternate approach advocates performing Bayesian inference in the log domain such that multiplication of probabilities is turned into addition and division into subtraction, the latter operations being easier to implement in standard neuron models [2, 5, 15] (see also the papers by Deneve and by Yu and Dayan in this volume). For example, a neural implementation of approximate Bayesian inference for a hidden Markov model was investigated in [15]. The question of how such an approach could be generalized to spiking neurons and arbitrary graphical models remained open.\n\nIn this paper, we propose a method for implementing Bayesian belief propagation in networks of spiking neurons. 
We show that recurrent networks of noisy integrate-and-fire neurons can perform approximate Bayesian inference for dynamic and hierarchical graphical models. In the model, the dynamics of the membrane potential is used to implement on-line belief propagation in the log domain [15]. A neuron's spiking probability is shown to approximate the posterior probability of the preferred state encoded by the neuron, given past inputs. We first show that for a visual motion detection task, the spiking probability of a direction-selective neuron becomes proportional to the posterior probability of motion in the neuron's preferred direction. We then show that in a two-level network, hierarchical Bayesian inference [9] produces responses that mimic the attentional effects seen in visual cortical areas V2 and V4.\n\n2 Modeling Networks of Noisy Integrate-and-Fire Neurons\n\n2.1 Integrate-and-Fire Model of Spiking Neurons\n\nWe begin with a recurrently-connected network of integrate-and-fire (IF) neurons receiving feedforward inputs denoted by the vector I. The membrane potential of neuron i changes according to:\n\n    $\\tau \\frac{dv_i}{dt} = -v_i + \\sum_j w_{ij} I_j + \\sum_j u_{ij} v_j$    (1)\n\nwhere $\\tau$ is the membrane time constant, $I_j$ denotes the synaptic current due to input neuron j, $w_{ij}$ represents the strength of the synapse from input j to recurrent neuron i, $v_j$ denotes the synaptic current due to recurrent neuron j, and $u_{ij}$ represents the corresponding synaptic strength. If $v_i$ crosses a threshold T, the neuron fires a spike and $v_i$ is reset to the potential $v_{reset}$. Equation 1 can be rewritten in discrete form as:\n\n    $v_i(t+1) = v_i(t) + \\epsilon \\left( -v_i(t) + \\sum_j w_{ij} I_j(t) + \\sum_j u_{ij} v_j(t) \\right)$    (2)\n\ni.e.\n\n    $v_i(t+1) = \\epsilon \\sum_j w_{ij} I_j(t) + \\sum_j u'_{ij} v_j(t)$    (3)\n\nwhere $\\epsilon$ is the integration rate, $u'_{ii} = 1 + \\epsilon (u_{ii} - 1)$, and for $i \\neq j$, $u'_{ij} = \\epsilon u_{ij}$.\n\nA more general integrate-and-fire model that takes into account some of the effects of nonlinear filtering in dendrites can be obtained by generalizing Equation 3 as follows:\n\n    $v_i(t+1) = f\\left( \\sum_j w_{ij} I_j(t) \\right) + g\\left( \\sum_j u_{ij} v_j(t) \\right)$    (4)\n\nwhere f and g model potentially different dendritic filtering functions for feedforward and recurrent inputs.\n\n2.2 Stochastic Spiking in Noisy IF Neurons\n\nTo model the effects of background inputs and the random openings of membrane channels, one can add a Gaussian white noise term to the right hand side of Equations 3 and 4. This makes the spiking of neurons in the recurrent network stochastic. Plesser and Gerstner [13] and Gerstner [4] have shown that under reasonable assumptions, the probability of spiking in such noisy neurons can be approximated by an \"escape function\" (or hazard function) that depends only on the distance between the (noise-free) membrane potential $v_i$ and the threshold T. Several different escape functions were studied. Of particular interest to the present paper is the following exponential function for spiking probability suggested in [4] for noisy integrate-and-fire networks:\n\n    $P(\\text{neuron } i \\text{ spikes at time } t) = k e^{(v_i(t) - T)/c}$    (5)\n\nwhere k and c are arbitrary constants. We used a model that combines Equations 4 and 5 to generate spikes, with an absolute refractory period of 1 time step.\n\n3 Bayesian Inference using Spiking Neurons\n\n3.1 Inference in a Single-Level Model\n\nWe first consider on-line belief propagation in a single-level dynamic graphical model and show how it can be implemented in spiking networks. The graphical model is shown in Figure 1A and corresponds to a classical hidden Markov model. Let $\\theta(t)$ represent the hidden state of a Markov model at time t with transition probabilities given by $P(\\theta(t) = \\theta_i \\mid \\theta(t-1) = \\theta_j) = P(\\theta_i^t \\mid \\theta_j^{t-1})$ for $i, j = 1, \\ldots, N$. Let I(t) be the observable output governed by the probabilities $P(I(t) \\mid \\theta(t))$. Then, the forward component of the belief propagation algorithm [12] prescribes the following \"message\" for state i from time step t to t + 1:\n\n    $m_i^{t,t+1} = P(I(t) \\mid \\theta_i^t) \\sum_j P(\\theta_i^t \\mid \\theta_j^{t-1}) \\, m_j^{t-1,t}$    (6)\n\nIf $m_i^{0,1} = P(\\theta_i)$ (the prior distribution over states), then it is easy to show using Bayes rule that $m_i^{t,t+1} = P(\\theta_i^t, I(t), \\ldots, I(1))$. If the probabilities are normalized at each update step:\n\n    $m_i^{t,t+1} = P(I(t) \\mid \\theta_i^t) \\sum_j P(\\theta_i^t \\mid \\theta_j^{t-1}) \\, m_j^{t-1,t} / n^{t-1,t}$    (7)\n\nwhere $n^{t-1,t} = \\sum_j m_j^{t-1,t}$, then the message becomes equal to the posterior probability of the state and current input, given all past inputs:\n\n    $m_i^{t,t+1} = P(\\theta_i^t, I(t) \\mid I(t-1), \\ldots, I(1))$    (8)\n\n3.2 Neural Implementation of the Inference Algorithm\n\nBy comparing the membrane potential equation (Eq. 4) with the on-line belief propagation equation (Eq. 7), it is clear that the first equation can implement the second if belief propagation is performed in the log domain [15], i.e., if:\n\n    $v_i(t+1) \\propto \\log m_i^{t,t+1}$    (9)\n\n    $f\\left( \\sum_j w_{ij} I_j(t) \\right) = \\log P(I(t) \\mid \\theta_i^t)$    (10)\n\n    $g\\left( \\sum_j u_{ij} v_j(t) \\right) = \\log\\left( \\sum_j P(\\theta_i^t \\mid \\theta_j^{t-1}) \\, m_j^{t-1,t} / n^{t-1,t} \\right)$    (11)\n\nIn this model, the dendritic filtering functions f and g approximate the logarithm function (footnote 1), the synaptic currents $I_j(t)$ and $v_j(t)$ are approximated by the corresponding instantaneous firing rates, and the recurrent synaptic weights $u_{ij}$ encode the transition probabilities $P(\\theta_i^t \\mid \\theta_j^{t-1})$. 
Normalization by $n^{t-1,t}$ is implemented by subtracting $\\log n^{t-1,t}$ using inhibition.\n\nFootnote 1: An alternative approach, which was also found to yield satisfactory results, is to approximate the log-sum with a linear weighted sum [15], the weights being chosen to minimize the approximation error.\n\n[Figure 1 image omitted: panels A-D showing the graphical models (nodes at times t and t+1 with inputs I(t) and I(t+1)) and the corresponding networks.]\n\nFigure 1: Graphical Models and their Neural Implementation. (A) Single-level dynamic graphical model. Each circle represents a node denoting the state variable $\\theta^t$ which can take on values $\\theta_1, \\ldots, \\theta_N$. (B) Recurrent network for implementing on-line belief propagation for the graphical model in (A). Each circle represents a neuron encoding a state $\\theta_i$. Arrows represent synaptic connections. The probability distribution over state values at each time step is represented by the entire population. (C) Two-level dynamic graphical model. (D) Two-level network for implementing on-line belief propagation for the graphical model in (C). Arrows represent synaptic connections in the direction pointed by the arrow heads. Lines without arrow heads represent bidirectional connections.\n\nFinally, since the membrane potential $v_i(t+1)$ is assumed to be proportional to $\\log m_i^{t,t+1}$ (Equation 9), we have:\n\n    $v_i(t+1) = c \\log m_i^{t,t+1} + T$    (12)\n\nfor some constants c and T. For noisy integrate-and-fire neurons, we can use Equation 5 to calculate the probability of spiking for each neuron i as:\n\n    $P(\\text{neuron } i \\text{ spikes at time } t+1) \\propto e^{(v_i(t+1) - T)/c}$    (13)\n\n    $= e^{\\log m_i^{t,t+1}} = m_i^{t,t+1}$    (14)\n\nThus, the probability of spiking (or equivalently, the instantaneous firing rate) for neuron i in the recurrent network is directly proportional to the posterior probability of the neuron's preferred state and the current input, given all past inputs. 
Figure 1B illustrates the single-level recurrent network model that implements the on-line belief propagation equation (Equation 7).\n\n3.3 Hierarchical Inference\n\nThe model described above can be extended to perform on-line belief propagation and inference for arbitrary graphical models. As an example, we describe the implementation for the two-level hierarchical graphical model in Figure 1C.\n\nAs in the case of the 1-level dynamic model, we define the following \"messages\" within a particular level and between levels: $m_{1,i}^{t,t+1}$ (message from state i to other states at level 1 from time step t to t + 1), $m_{1 \\to 2,i}^{t}$ (\"feedforward\" message from state i at level 1 sent to level 2 at time t), $m_{2,i}^{t,t+1}$ (message from state i to other states at level 2 from time step t to t + 1), and $m_{2 \\to 1,i}^{t}$ (\"feedback\" message from state i at level 2 sent to level 1 at time t). Each of these messages can be calculated based on an on-line version of loopy belief propagation [11] for the multiply connected two-level graphical model in Figure 1C:\n\n    $m_{1 \\to 2,i}^{t} = \\sum_j \\sum_k P(\\theta_{1,k}^t \\mid \\theta_{2,i}^t, \\theta_{1,j}^{t-1}) \\, m_{1,j}^{t-1,t} \\, P(I(t) \\mid \\theta_{1,k}^t)$    (15)\n\n    $m_{2 \\to 1,i}^{t} = \\sum_j P(\\theta_{2,i}^t \\mid \\theta_{2,j}^{t-1}) \\, m_{2,j}^{t-1,t}$    (16)\n\n    $m_{1,i}^{t,t+1} = P(I(t) \\mid \\theta_{1,i}^t) \\sum_j \\sum_k P(\\theta_{1,i}^t \\mid \\theta_{2,j}^t, \\theta_{1,k}^{t-1}) \\, m_{2 \\to 1,j}^{t} \\, m_{1,k}^{t-1,t}$    (17)\n\n    $m_{2,i}^{t,t+1} = m_{1 \\to 2,i}^{t} \\sum_j P(\\theta_{2,i}^t \\mid \\theta_{2,j}^{t-1}) \\, m_{2,j}^{t-1,t}$    (18)\n\nNote the similarity between the last equation and the equation for the single-level model (Equation 6). The equations above can be implemented in a 2-level hierarchical recurrent network of integrate-and-fire neurons in a manner similar to the 1-level case. We assume that neuron i in level 1 encodes $\\theta_{1,i}$ as its preferred state while neuron i in level 2 encodes $\\theta_{2,i}$. 
We also assume specific feedforward and feedback neurons for computing and conveying $m_{1 \\to 2,i}^{t}$ and $m_{2 \\to 1,i}^{t}$ respectively.\n\nTaking the logarithm of both sides of Equations 17 and 18, we obtain equations that can be computed using the membrane potential dynamics of integrate-and-fire neurons (Equation 4). Figure 1D illustrates the corresponding two-level hierarchical network. A modification needed to accommodate Equation 17 is to allow bilinear interactions between synaptic inputs, which changes Equation 4 to:\n\n    $v_i(t+1) = f\\left( \\sum_j w_{ij} I_j(t) \\right) + g\\left( \\sum_j \\sum_k u_{ijk} v_j(t) x_k(t) \\right)$    (19)\n\nMultiplicative interactions between synaptic inputs have previously been suggested by several authors (e.g., [10]) and potential implementations based on active dendritic interactions have been explored. The model suggested here utilizes these multiplicative interactions within dendritic branches, in addition to a possible logarithmic transform of the signal before it sums with other signals at the soma. Such a model is comparable to recent models of dendritic computation (see [6] for more details).\n\n4 Results\n\n4.1 Single-Level Network: Probabilistic Motion Detection and Direction Selectivity\n\nWe first tested the model in a 1D visual motion detection task [15]. A single-level recurrent network of 30 neurons was used (see Figure 1B). Figure 2A shows the feedforward weights for neurons 1, ..., 15; these neurons were recurrently connected to encode transition probabilities biased for rightward motion as shown in Figure 2B. Feedforward weights for neurons 16, ..., 30 were identical to Figure 2A but their recurrent connections encoded transition probabilities for leftward motion (see Figure 2B). As seen in Figure 2C, neurons in the network exhibited direction selectivity. 
Furthermore, the spiking probability of neurons reflected the posterior probabilities over time of motion direction at a given location (Figure 2D), suggesting a probabilistic interpretation of direction selective spiking responses in visual cortical areas such as V1 and MT.\n\n4.2 Two-Level Network: Spatial Attention as Hierarchical Bayesian Inference\n\nWe tested the two-level network implementation (Figure 1D) of hierarchical Bayesian inference using a simple attention task previously used in primate studies [17]. In an input image, a vertical or horizontal bar could occur either on the left side, right side, or both sides (see Figure 3). The corresponding 2-level generative model consisted of two states at level 2 (left or right side) and four states at level 1: vertical left, horizontal left, vertical right, horizontal right. Each of these states was encoded by a neuron at the respective level. The feedforward connections at level 1 were chosen to be vertically or horizontally oriented Gabor filters localized to the left or right side of the image. Since the experiment used static images, the recurrent connections at each level implemented transition probabilities close to 1 for the same state and small random values for other states. 
The transition probabilities $P(\\theta_{1,k}^t \\mid \\theta_{2,i}^t, \\theta_{1,j}^{t-1})$ were chosen such that for $\\theta_2^t$ = left side, the transition probabilities for states $\\theta_1^t$ coding for the right side were set to values close to zero (and vice versa, for $\\theta_2^t$ = right side). As shown in Figure 3, the response of a neuron at level 1 that, for example, prefers a vertical edge on the right mimics the response of a V4 neuron with and without attention (see figure caption for more details). The initial setting of the priors at level 2 is the crucial determinant of attentional modulation in level 1 neurons, suggesting that feedback from higher cortical areas may convey task-specific priors that are integrated into V4 responses.\n\n[Figure 2 image omitted: (A) feedforward weights plotted against spatial location (pixels); (B) recurrent weight matrix with rightward and leftward blocks; (C, D) responses of neurons 8, 10, and 12 to rightward and leftward motion.]\n\nFigure 2: Responses from the Single-Level Motion Detection Network. (A) Feedforward weights for neurons 1, ..., 15 (rightward motion selective neurons). Feedforward weights for neurons 16, ..., 30 (leftward motion selective) are identical. (B) Recurrent weights encoding the transition probabilities $P(\\theta_i^{t+1} \\mid \\theta_j^t)$ for i, j = 1, ..., 30. Probability values are proportional to pixel brightness. (C) Spiking responses of three of the first 15 neurons in the recurrent network (neurons 8, 10, and 12). As is evident, these neurons have become selective for rightward motion as a consequence of the recurrent connections specified in (B). (D) Posterior probabilities over time of motion direction (at a given location) encoded by the three neurons for rightward and leftward motion.\n\n5 Discussion and Conclusions\n\nWe have shown that recurrent networks of noisy integrate-and-fire neurons can perform approximate Bayesian inference for single- and multi-level dynamic graphical models. The model suggests a new interpretation of the spiking probability of a neuron in terms of the posterior probability of the preferred state encoded by the neuron, given past inputs. We illustrated the model using two problems: inference of motion direction in a single-level network and hierarchical inference of object identity at an attended visual location in a two-level network. In the first case, neurons generated direction-selective spikes encoding the probability of motion in a particular direction. In the second case, attentional effects similar to those observed in primate cortical areas V2 and V4 emerged as a result of imposing appropriate priors at the highest level.\n\nThe results obtained thus far are encouraging but several important questions remain. How does the approach scale to more realistic graphical models? The two-level model explored in this paper assumed stationary objects, resulting in simplified dynamics for the two levels in our recurrent network. Experiments are currently underway to test the robustness of the proposed model when richer classes of dynamics are introduced at the different levels. 
Another open question is how active dendritic processes could support probabilistic integration of messages from local, lower-level, and higher-level neurons, as suggested in Section 3. We intend to investigate this question using biophysical (compartmental) models of cortical neurons. Finally, how can the feedforward, feedback, and recurrent synaptic weights in the networks be learned directly from input data? We hope to investigate this question using biologically-plausible approximations to the expectation-maximization (EM) algorithm.\n\n[Figure 3 image omitted: panels A-D; plots show firing rate (spikes/second) against time steps from stimulus onset for the conditions Ref Att Away, Pair Att Away, and Pair Att Ref.]\n\nFigure 3: Responses from the Two-Level Hierarchical Network. (A) Top panel: Input image (lasting the first 15 time steps) containing a vertical bar (\"Reference\") on the right side. Each input was convolved with a retinal spatiotemporal filter. Middle: Three sample spike trains from the 1st level neuron whose preferred stimulus was a vertical bar on the right side. Bottom: Posterior probability of a vertical bar (= spiking probability or instantaneous firing rate of the neuron) plotted over time. (B) Top panel: An input containing two stimuli (\"Pair\"). Below: Sample spike trains and posterior probability for the same neuron as in (A). (C) When \"attention\" is focused on the right side (depicted by the white oval) by initializing the prior probability encoded by the 2nd level right-coding neuron at a higher value than the left-coding neuron, the firing rate for the 1st level neuron in (A) increases to a level comparable to that in (A). (D) Responses from a neuron in primate area V4 without attention (top panel, Ref Att Away and Pair Att Away; compare with (A) and (B)) and with attention (bottom panel, Pair Att Ref; compare with (C)) (from [17]). Similar responses are seen in V2 [17].\n\nAcknowledgments. This research was supported by grants from ONR, NSF, and the Packard Foundation. I am grateful to Wolfram Gerstner, Michael Shadlen, Aaron Shon, Eero Simoncelli, and Yair Weiss for discussions on topics related to this paper.\n\nReferences\n\n[1] C. H. Anderson and D. C. Van Essen. Neurobiological computational systems. In Computational Intelligence: Imitating Life, pages 213-222. New York, NY: IEEE Press, 1994.\n\n[2] R. H. S. Carpenter and M. L. L. Williams. Neural computation of log likelihood in control of saccadic eye movements. Nature, 377:59-62, 1995.\n\n[3] S. Deneve and A. Pouget. Bayesian estimation by interconnected neural networks (abstract no. 237.11). Society for Neuroscience Abstracts, 27, 2001.\n\n[4] W. Gerstner. Population dynamics of spiking neurons: Fast transients, asynchronous states, and locking. Neural Computation, 12(1):43-89, 2000.\n\n[5] J. I. Gold and M. N. Shadlen. Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences, 5(1):10-16, 2001.\n\n[6] M. Hausser and B. Mel. Dendrites: bug or feature? Current Opinion in Neurobiology, 13:372-383, 2003.\n\n[7] D. C. Knill and W. Richards. Perception as Bayesian Inference. Cambridge, UK: Cambridge University Press, 1996.\n\n[8] K. P. Kording and D. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427:244-247, 2004.\n\n[9] T. S. Lee and D. Mumford. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20(7):1434-1448, 2003.\n\n[10] B. W. Mel. NMDA-based pattern discrimination in a modeled cortical neuron. Neural Computation, 4(4):502-517, 1992.\n\n[11] K. Murphy, Y. Weiss, and M. Jordan. 
Loopy belief propagation for approximate inference: An empirical study. In Proceedings of UAI (Uncertainty in AI), pages 467-475. 1999.\n\n[12] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.\n\n[13] H. E. Plesser and W. Gerstner. Noise in integrate-and-fire neurons: From stochastic input to escape rates. Neural Computation, 12(2):367-384, 2000.\n\n[14] A. Pouget, P. Dayan, and R. S. Zemel. Inference and computation with population codes. Annual Review of Neuroscience, 26:381-410, 2003.\n\n[15] R. P. N. Rao. Bayesian computation in recurrent neural circuits. Neural Computation, 16(1):1-38, 2004.\n\n[16] R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki. Probabilistic Models of the Brain: Perception and Neural Function. Cambridge, MA: MIT Press, 2002.\n\n[17] J. H. Reynolds, L. Chelazzi, and R. Desimone. Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience, 19:1736-1753, 1999.\n\n[18] M. Sahani and P. Dayan. Doubly distributional population codes: Simultaneous representation of uncertainty and multiplicity. Neural Computation, 15:2255-2279, 2003.\n\n[19] Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598-604, 2002.\n\n[20] R. S. Zemel, P. Dayan, and A. Pouget. Probabilistic interpretation of population codes. Neural Computation, 10(2):403-430, 1998.\n", "award": [], "sourceid": 2643, "authors": [{"given_name": "Rajesh", "family_name": "Rao", "institution": null}]}