{"title": "Homeostatic plasticity in Bayesian spiking networks as Expectation Maximization with posterior constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 773, "page_last": 781, "abstract": "Recent spiking network models of Bayesian inference and unsupervised learning frequently either assume that inputs arrive in a special format or employ complex computations in neuronal activation functions and synaptic plasticity rules. Here we show in a rigorous mathematical treatment how homeostatic processes, which have previously received little attention in this context, can overcome common theoretical limitations and facilitate the neural implementation and performance of existing models. In particular, we show that homeostatic plasticity can be understood as the enforcement of a 'balancing' posterior constraint during probabilistic inference and learning with Expectation Maximization. We link homeostatic dynamics to the theory of variational inference, and show that nontrivial terms, which typically appear during probabilistic inference in a large class of models, drop out. We demonstrate the feasibility of our approach in a spiking Winner-Take-All architecture of Bayesian inference and learning. Finally, we sketch how the mathematical framework can be extended to richer recurrent network architectures. 
Altogether, our theory provides a novel perspective on the interplay of homeostatic processes and synaptic plasticity in cortical microcircuits, and points to an essential role of homeostasis during inference and learning in spiking networks.", "full_text": "Homeostatic plasticity in Bayesian spiking networks as\nExpectation Maximization with posterior constraints\n\nInstitute for Theoretical Computer Science, Graz University of Technology\n\nStefan Habenschuss\u2217, Johannes Bill\u2217, Bernhard Nessler\n{habenschuss,bill,nessler}@igi.tugraz.at\n\nAbstract\n\nRecent spiking network models of Bayesian inference and unsupervised learning\nfrequently assume either inputs to arrive in a special format or employ complex\ncomputations in neuronal activation functions and synaptic plasticity rules. Here\nwe show in a rigorous mathematical treatment how homeostatic processes, which\nhave previously received little attention in this context, can overcome common\ntheoretical limitations and facilitate the neural implementation and performance of\nexisting models. In particular, we show that homeostatic plasticity can be under-\nstood as the enforcement of a \u2019balancing\u2019 posterior constraint during probabilis-\ntic inference and learning with Expectation Maximization. We link homeostatic\ndynamics to the theory of variational inference, and show that nontrivial terms,\nwhich typically appear during probabilistic inference in a large class of models,\ndrop out. We demonstrate the feasibility of our approach in a spiking Winner-\nTake-All architecture of Bayesian inference and learning. Finally, we sketch how\nthe mathematical framework can be extended to richer recurrent network archi-\ntectures. 
Altogether, our theory provides a novel perspective on the interplay of\nhomeostatic processes and synaptic plasticity in cortical microcircuits, and points\nto an essential role of homeostasis during inference and learning in spiking net-\nworks.\n\n1\n\nIntroduction\n\nExperimental \ufb01ndings from neuro- and cognitive sciences have led to the hypothesis that humans\ncreate and maintain an internal model of their environment in neuronal circuitry of the brain during\nlearning and development [1, 2, 3, 4], and employ this model for Bayesian inference in everyday\ncognition [5, 6]. Yet, how these computations are carried out in the brain remains largely unknown.\nA number of innovative models has been proposed recently which demonstrate that in principle,\nspiking networks can carry out quite complex probabilistic inference tasks [7, 8, 9, 10], and even\nlearn to adapt to their inputs near optimally through various forms of plasticity [11, 12, 13, 14, 15].\nStill, in network models for concurrent online inference and learning, most approaches introduce\ndistinct assumptions: Both [12] in a spiking Winner-take-all (WTA) network, and [15] in a rate based\nWTA network, identi\ufb01ed the limitation that inputs must be normalized before being presented to the\nnetwork, in order to circumvent an otherwise nontrivial (and arguably non-local) dependency of the\nintrinsic excitability on all afferent synapses of a neuron. Nessler et al. [12] relied on population\ncoded input spike trains; Keck et al. [15] proposed feed-forward inhibition as a possible neural\nmechanism to achieve this normalization. A theoretically related issue has been encountered by\nDeneve [7, 11], in which inference and learning is realized in a two-state Hidden Markov Model by\na single spiking neuron. Although synaptic learning rules are found to be locally computable, the\nlearning update for intrinsic excitabilities remains intricate. In a different approach, Brea et al. 
[13]\nhave recently proposed a promising model for Bayes optimal sequence learning in spiking networks\n\n\u2217These authors contributed equally to this work.\n\n1\n\n\fin which a global reward signal, which is computed from the network state and synaptic weights,\nmodulates otherwise purely local learning rules. Also the recent innovative model for variational\nlearning in recurrent spiking networks by Rezende et al. [14] relies on sophisticated updates of\nvariational parameters that complement otherwise local learning rules.\nThere exists great interest in developing Bayesian spiking models which require minimal non-\nstandard neural mechanisms or additional assumptions on the input distribution: such models are\nexpected to foster the analysis of biological circuits from a Bayesian perspective [16], and to pro-\nvide a versatile computational framework for novel neuromorphic hardware [17]. With these goals\nin mind, we introduce here a novel theoretical perspective on homeostatic plasticity in Bayesian\nspiking networks that complements previous approaches by constraining statistical properties of the\nnetwork response rather than the input distribution. In particular we introduce \u2019balancing\u2019 posterior\nconstraints which can be implemented in a purely local manner by the spiking network through a\nsimple rule that is strongly reminiscent of homeostatic intrinsic plasticity in cortex [18, 19]. Im-\nportantly, it turns out that the emerging network dynamics eliminate a particular class of nontrivial\ncomputations that frequently arise in Bayesian spiking networks.\nFirst we develop the mathematical framework for Expectation Maximization (EM) with homeostatic\nposterior constraints in an instructive Winner-Take-all network model of probabilistic inference and\nunsupervised learning. Building upon the theoretical results of [20], we establish a rigorous link\nbetween homeostatic intrinsic plasticity and variational inference. 
In a second step, we sketch how the framework can be extended to recurrent spiking networks; by introducing posterior constraints on the correlation structure, we recover local plasticity rules for recurrent synaptic weights.\n\n2 Homeostatic plasticity in WTA circuits as EM with posterior constraints\n\nWe first introduce, as an illustrative and representative example, a generative mixture model p(z, y|V) with hidden causes z and binary observed variables y, and a spiking WTA network N which receives inputs y(t) via synaptic weights V. As shown in [12], such a network N can implement probabilistic inference p(z|y, V) through its spiking dynamics, and maximum likelihood learning through local synaptic learning rules (see Figure 1A). The mixture model comprises K binary and mutually exclusive components z_k \in \{0, 1\}, \sum_{k=1}^{K} z_k = 1, each specialized on a different N-dimensional input pattern:\n\np(y, z|V) = \prod_{k=1}^{K} e^{\hat{b}_k z_k} \prod_{i=1}^{N} \left[ (\pi_{ki})^{y_i} \cdot (1 - \pi_{ki})^{1 - y_i} \right]^{z_k} ,   (1)\n\n\Leftrightarrow \log p(y, z|V) = \sum_k z_k \left( \sum_i V_{ki} y_i - A_k + \hat{b}_k \right) ,   (2)\n\nwith \sum_k e^{\hat{b}_k} = 1 and \pi_{ki} = \sigma(V_{ki}) and A_k = \sum_i \log(1 + e^{V_{ki}}) ,   (3)\n\nwhere \sigma(x) = (1 + \exp(-x))^{-1} denotes the logistic function, and \pi_{ki} the expected activation of input i under the mixture component k. For simplicity and notational convenience, we will treat the prior parameters \hat{b}_k as constants throughout the paper. Probabilistic inference of hidden causes z_k based on an observed input y can be implemented by a spiking WTA network N of K neurons which fire with the instantaneous spiking probability (for \delta t \to 0),\n\np(z_k \text{ spikes in } [t, t + \delta t]) = \delta t \cdot r_{net} \cdot \frac{e^{u_k(t)}}{\sum_j e^{u_j(t)}} \propto p(z_k = 1 | y, V) ,   (4)\n\nwith the input potential u_k(t) = \sum_i V_{ki} y_i(t) - A_k + \hat{b}_k. Each WTA neuron k receives spiking inputs y_i via synaptic weights V_{ki} and responds with an instantaneous spiking probability which depends exponentially on its input potential u_k in accordance with biological findings [21]. Stochastic winner-take-all (soft-max) competition between the neurons is modeled via divisive normalization (4) [22]. The input is defined as y_i(t) = 1 if input neuron i emitted a spike within the last \tau milliseconds, and 0 otherwise, corresponding to a rectangular post-synaptic potential (PSP) of length \tau. We define z_k(t) = 1 at spike times t of neuron k and z_k(t) = 0 otherwise.\n\nFigure 1: A. Spiking WTA network model. B. Input templates from MNIST database (digits 0-5) are presented in random order to the network as spike trains (the input template switches after every 250 ms, black/white pixels are translated to high/low firing rates between 20 and 90 Hz). C. Sketch of intrinsic homeostatic plasticity maintaining a certain target average activation. D. Homeostatic plasticity induces average firing rates (blue) close to target values (red). E. After a learning period, each WTA neuron has specialized on a particular input motif. F. WTA output spikes during a test phase before and after learning. Learning leads to a sparse output code.\n\nIn addition to the spiking input, each neuron's potential u_k features an intrinsic excitability -A_k + \hat{b}_k. Note that, besides the prior constant \hat{b}_k, this excitability depends on the normalizing term A_k, and hence on all afferent synaptic weights through (3): WTA neurons which encode strong patterns with high probabilities \pi_{ki} require lower intrinsic excitabilities, while neurons with weak patterns require larger excitabilities. 
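The inference step of eqs. (3)-(4) can be sketched numerically. The following is a minimal illustration (not the authors' code): K WTA neurons compute potentials u_k for a binary input vector y and compete via the soft-max (divisive normalization); each output spike is then one sample from the posterior. All parameter values and names are illustrative.

```python
import numpy as np

def wta_posterior(V, b_hat, y):
    """Return p(z_k = 1 | y, V) for all k, i.e. the soft-max of eq. (4)."""
    A = np.log1p(np.exp(V)).sum(axis=1)   # A_k = sum_i log(1 + e^{V_ki}), eq. (3)
    u = V @ y - A + b_hat                 # input potentials u_k
    e = np.exp(u - u.max())               # numerically stabilized soft-max
    return e / e.sum()

rng = np.random.default_rng(0)
K, N = 10, 784                            # e.g. 10 WTA neurons, 28x28 binary input
V = rng.normal(0.0, 0.5, size=(K, N))     # synaptic weights
b_hat = np.full(K, -np.log(K))            # uniform prior, sum_k e^{b_hat_k} = 1
y = (rng.random(N) < 0.1).astype(float)   # binary input (active PSP windows)

p = wta_posterior(V, b_hat, y)
spike = rng.choice(K, p=p)                # one output spike ~ one posterior sample
```

Note that the nonlocal term A_k appears explicitly here; the homeostatic mechanism developed below makes this computation unnecessary in the network.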
In the presence of synaptic plasticity, i.e., time-varying V_{ki}, it is unclear how biologically realistic neurons could communicate ongoing changes in synaptic weights from distal synaptic sites to the soma. This critical issue was apparently identified in [12] and [15]; both papers circumvent the problem (in similar probabilistic models) by constraining the input y (and also the synaptic weights in [15]) in order to maintain constant and uniform values A_k across all WTA neurons.\nHere, we propose a different approach to cope with the nontrivial computations A_k during inference and learning in the network. Instead of assuming that the inputs y meet a normalization constraint, we constrain the network response during inference, by applying homeostatic dynamics to the intrinsic excitabilities. This approach turns out to be beneficial in the presence of time-varying synaptic weights, i.e., during ongoing changes of V_{ki} and A_k. The resulting interplay of intrinsic and synaptic plasticity can be best understood from the standard EM lower bound [23],\n\nF(V, q(z|y)) = L(V) - \langle \mathrm{KL}(q(z|y) \,||\, p(z|y, V)) \rangle_{p^*(y)}  \to E-step ,   (5)\n= \langle \log p(y, z|V) \rangle_{p^*(y) q(z|y)} + \langle H(q(z|y)) \rangle_{p^*(y)}  \to M-step ,   (6)\n\nwhere L(V) = \langle \log p(y|V) \rangle_{p^*(y)} denotes the log-likelihood of the input under the model, KL(\cdot||\cdot) the Kullback-Leibler divergence, and H(\cdot) the entropy. The decomposition holds for arbitrary distributions q. In hitherto proposed neural implementations of EM [11, 12, 15, 24], the network implements the current posterior distribution in the E-step, i.e., q = p and KL(q || p) = 0. In contrast, by applying homeostatic plasticity, the network response will be constrained to implement a variational posterior from a class of "homeostatic" distributions Q: the long-term average activation of each WTA neuron z_k is constrained to an a priori defined target value. Notably, we will see that the resulting network response q^* describes an optimal variational E-step in the sense that q^*(z|y) = \arg\min_{q \in Q} \mathrm{KL}(q(z|y) \,||\, p(z|y, V)). Importantly, homeostatic plasticity fully regulates the intrinsic excitabilities, and as a side effect eliminates the non-local terms A_k in the E-step, while synaptic plasticity of the weights V_{ki} optimizes the underlying probabilistic model p(y, z|V) in the M-step.\nIn summary, the network response implements q^* as the variational E-step, and the M-step can be performed via gradient ascent on (6) with respect to V_{ki}. As derived in section 2.1, this gives rise to the following temporal dynamics and plasticity rules in the spiking network, which instantiate a stochastic version of the variational EM scheme:\n\nu_k(t) = \sum_i V_{ki} y_i(t) + b_k ,\n\n\dot{b}_k(t) = \eta_b \cdot (r_{net} \cdot m_k - \delta(z_k(t) - 1)) ,   (7)\n\n\dot{V}_{ki}(t) = \eta_V \cdot \delta(z_k(t) - 1) \cdot (y_i(t) - \sigma(V_{ki})) ,   (8)\n\nwhere \delta(\cdot) denotes the Dirac delta function, and \eta_b, \eta_V are learning rates (which were kept time-invariant in the simulations with \eta_b = 10 \cdot \eta_V). Note that (8) is a spike-timing dependent plasticity rule (cf. [12]) and is non-zero only at post-synaptic spike times t, for which z_k(t) = 1. The target activations m_k \in (0, 1) can be chosen freely with the obvious constraint that \sum_k m_k = 1. The effect of the homeostatic intrinsic plasticity rule (7) is illustrated in Figure 1C: it aims to keep the long-term average activation of each WTA neuron k close to a certain target value m_k. 
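The coupled dynamics (7)-(8) can be sketched in discrete time, replacing the Dirac deltas by 0/1 spike indicators per time step dt. Note that with homeostasis the potential is simply u_k = \sum_i V_{ki} y_i + b_k, so the nonlocal term A_k never has to be computed. This is an illustrative simulation under assumed parameter values, not the paper's setup; the network rate, targets and learning rates are chosen for a quick demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, dt, r_net = 5, 40, 1e-3, 200.0      # 200 Hz total network rate (assumed)
m = np.full(K, 1.0 / K)                   # homeostatic targets, sum_k m_k = 1
V = rng.normal(0.0, 0.5, (K, N))          # synaptic weights
b = np.zeros(K)                           # arbitrarily initialized excitabilities
eta_b, eta_V = 0.02, 0.002
spikes = np.zeros(K)
steps = 40000

for t in range(steps):
    y = (rng.random(N) < 0.2).astype(float)        # input PSP state (toy input)
    u = V @ y + b                                  # potentials, no A_k needed
    q = np.exp(u - u.max()); q /= q.sum()          # WTA soft-max, eq. (4)
    z = np.zeros(K)
    if rng.random() < r_net * dt:                  # at most one network spike per dt
        k = rng.choice(K, p=q)
        z[k] = 1.0
        spikes[k] += 1
    b += eta_b * (r_net * m * dt - z)              # homeostatic rule (7)
    V += eta_V * z[:, None] * (y - 1 / (1 + np.exp(-V)))  # STDP rule (8)

rates = spikes / (steps * dt)             # empirical firing rates r_k
# homeostasis drives r_k / r_net toward m_k
```

After the run, each neuron's firing rate fraction r_k / r_net should lie near its target m_k, mirroring Figure 1D.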
More precisely, if r_k is a neuron's long-term average firing rate, then homeostatic plasticity will ensure that r_k / r_{net} \approx m_k. Note that (7) is strongly reminiscent of homeostatic intrinsic plasticity in cortex [18, 19].\nWe have implemented these dynamics in a computer simulation of a WTA spiking network N. Inputs y(t) were defined by translating handwritten digits 0-5 (Figure 1B) from the MNIST dataset [25] into input spike trains. Figure 1D shows that, at the end of a 10^4 s learning period, homeostatic plasticity has indeed achieved that r_k \approx r_{net} \cdot m_k. Figure 1E illustrates the patterns learned by each WTA neuron after this period (shown are the \pi_{ki}). Apparently, the WTA neurons have specialized on patterns of different intensity which correspond to different values of A_k. Figure 1F shows the output spiking behavior of the circuit before and after learning in response to a set of test patterns. The specialization to different patterns has led to a distinct sparse output code, in which any particular test pattern evokes output spikes from only one or two WTA neurons. Note that homeostasis forces all WTA neurons to participate in the competition, and thus prevents neurons from becoming underactive if their synaptic weights decrease, and from becoming overactive if their synaptic weights increase, much like the original A_k terms (which are nontrivial to compute for the network). 
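The input encoding described for Figure 1B can be sketched as follows. This is an assumed reconstruction of the encoding, not the authors' code: black/white pixels of a binary template are mapped to high/low Poisson firing rates (90 Hz / 20 Hz here, within the stated 20-90 Hz range) and presented for 250 ms in 1 ms bins.

```python
import numpy as np

def encode_template(pixels, t_present=0.25, dt=1e-3,
                    rate_on=90.0, rate_off=20.0, rng=None):
    """Return a (timesteps, N) 0/1 spike array for one binary input template."""
    rng = rng or np.random.default_rng()
    rates = np.where(pixels > 0.5, rate_on, rate_off)  # Hz per input neuron
    steps = int(round(t_present / dt))
    # Bernoulli approximation of a Poisson process: spike prob = rate * dt per bin
    return (rng.random((steps, pixels.size)) < rates * dt).astype(np.uint8)

rng = np.random.default_rng(2)
template = (rng.random(784) < 0.15).astype(float)      # stand-in for a digit image
spike_trains = encode_template(template, rng=rng)      # shape (250, 784)
```

In the simulations, templates switch every 250 ms, so a full input stream is a concatenation of such arrays over randomly drawn digits.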
Indeed, the learned synaptic parameters and the resulting output behavior correspond to what would be expected from an optimal learning algorithm for the mixture model (1)-(3).^1\n\n2.1 Theory for the WTA model\n\nIn the following, we develop the three theoretical key results for the WTA model (1)-(3):\n\n\u2022 Homeostatic intrinsic plasticity finds the network response distribution q^*(z|y) \in Q closest to the posterior distribution p(z|y, V), from a set of "homeostatic" distributions Q.\n\u2022 The interplay of homeostatic and synaptic plasticity can be understood from the perspective of variational EM.\n\u2022 The critical non-local terms A_k defined by (3) drop out of the network dynamics.\n\nE-step: variational inference with homeostasis\n\nThe variational distribution q(z|y) we consider for the model (1)-(3) is a 2^N \cdot K dimensional object. Since q describes a conditional probability distribution, it is non-negative and normalized for all y. In addition, we constrain q to be a "homeostatic" distribution q \in Q such that the average activation of each hidden variable (neuron) z_k equals an a-priori specified mean activation m_k under the input statistics p^*(y). This is sketched in Figure 2. Formally we define the constraint set,\n\nQ = \{ q : \langle z_k \rangle_{p^*(y) q(z|y)} = m_k, \text{ for all } k = 1 \ldots K \} , \text{ with } \sum_k m_k = 1 .   (9)\n\n^1 Without adaptation of intrinsic excitabilities, the network would start performing erroneous inference, learning would reinforce this erroneous behavior, and performance would quickly break down. We have verified this in simulations for the present WTA model: Consistently across trials, a small subset of WTA neurons became dominantly active while most neurons remained silent.\n\nFigure 2: A. Homeostatic posterior constraints in the WTA model: Under the variational distribution q, the average activation of each variable z_k must equal m_k. B. For each set of synaptic weights V there exists a unique assignment of intrinsic excitabilities b, such that the constraints are fulfilled. C. Theoretical decomposition of the intrinsic excitability b_k into -A_k, \hat{b}_k and \beta_k. D. During variational EM the b_k predominantly "track" the dynamically changing non-local terms -A_k (relative comparison between two WTA neurons from Figure 1).\n\nThe constrained maximization problem q^*(z|y) = \arg\max_{q \in Q} F(V, q(z|y)) can be solved with the help of Lagrange multipliers (cf. [20]). We find that the q^* which maximizes the objective function F during the E-step (and thus minimizes the KL-divergence to the posterior p(z|y, V)) has the convenient form q^*(z|y) \propto p(z|y, V) \cdot \exp(\sum_k \beta^*_k z_k) with some \beta^*_k. Hence, it suffices to consider distributions of the form,\n\nq_\beta(z|y) \propto \exp\left( \sum_k z_k \left( \sum_i V_{ki} y_i + \underbrace{\hat{b}_k - A_k + \beta_k}_{=: b_k} \right) \right) ,   (10)\n\nfor the maximization problem. We identify \beta_k as the variational parameters which remain to be optimized. Note that any distribution of this form can be implemented by the spiking network N if the intrinsic excitabilities are set to b_k = -A_k + \hat{b}_k + \beta_k. The optimal variational distribution q^*(z|y) = q_{\beta^*}(z|y) then has \beta^* = \arg\max_\beta \Psi(\beta), i.e. the variational parameter vector which maximizes the dual [20],\n\n\Psi(\beta) = \sum_k \beta_k m_k - \left\langle \log \sum_z p(z|y, V) \exp\left(\sum_k \beta_k z_k\right) \right\rangle_{p^*(y)} .   (11)\n\nDue to concavity of the dual, a unique global maximizer \beta^* exists, and thus also the corresponding optimal intrinsic excitabilities b^*_k = -A_k + \hat{b}_k + \beta^*_k are unique. 
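Gradient ascent on the dual (11) can be illustrated in a small batch computation. Since the constrained response is a soft-max over u_k = \sum_i V_{ki} y_i + b_k, the excitabilities b can be adapted until the average activation of every neuron matches its target m_k, exactly as the homeostatic rule (12) prescribes. This is an illustrative sketch with assumed toy values, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, S = 8, 30, 500
V = rng.normal(0.0, 1.0, (K, N))               # fixed synaptic weights
Y = (rng.random((S, N)) < 0.3).astype(float)   # S input samples from p*(y)
m = np.full(K, 1.0 / K)                        # homeostatic targets
b = np.zeros(K)                                # excitabilities to be adapted

def mean_activation(b):
    """<z_k>_{p*(y) q(z|y)} under the soft-max response with excitabilities b."""
    u = Y @ V.T + b                            # (S, K) potentials
    q = np.exp(u - u.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    return q.mean(axis=0)

for _ in range(5000):
    b += 0.5 * (m - mean_activation(b))        # dual gradient, cf. rule (12)
```

Because the dual is concave, this iteration converges to the unique b^* at which the homeostatic constraints (9) are met.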
Hence, the posterior constraint q \in Q can be illustrated as in Figure 2B: For each synaptic weight configuration V there exists, under a particular input distribution p^*(y), a unique configuration of intrinsic excitabilities b such that the resulting network output fulfills the homeostatic constraints. The theoretical relation between the intrinsic excitabilities b_k, the original nontrivial term -A_k and the variational parameters \beta_k is sketched in Figure 2C. Importantly, while b_k is implemented in the network, A_k, \beta_k and \hat{b}_k are not explicitly represented in the implementation anymore. Finding the optimal b in the dual perspective, i.e. those intrinsic excitabilities which fulfill the homeostatic constraints, amounts to gradient ascent \partial_\beta \Psi(\beta) on the dual, which leads to the following homeostatic learning rule for the intrinsic excitabilities,\n\n\Delta b_k \propto \partial_{\beta_k} \Psi(\beta) = m_k - \langle z_k \rangle_{p^*(y) q(z|y)} .   (12)\n\nNote that the intrinsic homeostatic plasticity rule (7) in the network corresponds to a sample-based stochastic version of this theoretically derived adaptation mechanism (12). Hence, given enough time, homeostatic plasticity will automatically install near-optimal intrinsic excitabilities b \approx b^* and implement the correct variational distribution q^* up to stochastic fluctuations in b due to the non-zero learning rate \eta_b. The non-local terms A_k have entirely dropped out of the network dynamics, since the intrinsic excitabilities b_k can be arbitrarily initialized, and are then fully regulated by the local homeostatic rule, which does not require knowledge of A_k.\nAs a side remark, note that although the variational parameters \beta_k are not explicitly present in the implementation, they can be theoretically recovered from the network at any point, via \beta_k = b_k + A_k - \hat{b}_k.\n\nFigure 3: A. Input templates from MNIST dataset (digits 0,3 at a ratio 2:1, and digits 0,3,4 at a ratio 1:1:1) used during the first and second learning period, respectively. B. Learned patterns at the end of each learning period. C. Network performance converges in the course of learning. F is a tight lower bound to L. D. Illustration of pattern learning and re-learning dynamics in a 2-D projection in the input space. Each black dot corresponds to the pattern \pi_{ki} of one WTA neuron k. Colored dots are input samples from the training set (blue/green/red \leftrightarrow digits 0/3/4).\n\nNotably, in all our simulations we have consistently found small absolute values of \beta_k, corresponding to a small KL-divergence between q^* and p.^2 Hence, a major effect of the local homeostatic plasticity rule during learning is to dynamically track and effectively implement the non-local terms -A_k. This is shown in Figure 2D, in which the relative excitabilities of two WTA neurons b_k - b_j are plotted against the corresponding non-local A_k - A_j over the course of learning in the first simulation (Figure 1).\n\n^2 This is assuming for simplicity uniform prior parameters \hat{b}_k. Note that a small KL-divergence is in fact often observed during variational EM since F, which contains the negative KL-divergence, is being maximized.\n\nM-step: interplay of synaptic and homeostatic intrinsic plasticity\n\nDuring the M-step, we aim to increase the EM lower bound F in (6) w.r.t. the synaptic parameters V. Gradient ascent yields,\n\n\partial_{V_{ki}} F(V, q(z|y)) = \langle \partial_{V_{ki}} \log p(y, z|V) \rangle_{p^*(y) q(z|y)}   (13)\n= \langle z_k \cdot (y_i - \sigma(V_{ki})) \rangle_{p^*(y) q(z|y)} ,   (14)\n\nwhere q is the variational distribution determined during the E-step, i.e., we can set q = q^*. Note the formal correspondence of (14) with the network synaptic learning rule (8). Indeed, if the network activity implements q^*, it can be shown easily that the expected update of synaptic weights due to the synaptic plasticity (8) is proportional to (14), and hence implements a stochastic version of the theoretical M-step (cf. [12]).\n\n2.2 Dynamical properties of the Bayesian spiking network with homeostasis\n\nTo highlight a number of salient dynamical properties emerging from homeostatic plasticity in the considered WTA model, Figure 3 shows a simulation of the same network N with homeostatic dynamics as in Figure 1, only with different input statistics presented to the network, and uniform m_k = 1/K. During the first 5000 s, different writings of 0's and 3's from the MNIST dataset were presented, with 0's occurring twice as often as 3's. Then the input distribution p^*(y) abruptly switched to include also 4's, with each digit occurring equally often. The following observations can be made: Due to the homeostatic constraint, each neuron responds on average to m_k \cdot T out of T presented inputs. As a consequence, the number of neurons which specialize on a particular digit is directly proportional to the frequency of occurrence of that digit, i.e. 8:4 and 4:4:4 after the first and second learning period, respectively (Figure 3B). In general, if uniform target activations m_k are chosen, output resources are allocated precisely in proportion to input frequency. Figure 3C depicts the time course of the EM lower bound F as well as the average likelihood L (assuming uniform \hat{b}_k) under the model during a single simulation run, demonstrating both convergence and tightness of the lower bound. As expected due to the stabilizing dynamics of homeostasis, we found variability in performance among different trials to be small (not shown). 
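The batch form of the M-step gradient (13)-(14) can be checked numerically: with q set to the soft-max responsibilities, gradient ascent on V increases the data log-likelihood L(V) = \langle \log p(y|V) \rangle_{p^*(y)}, which for this mixture model is a log-sum-exp of the potentials u_k. The setup below is an illustrative sketch with assumed toy data, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(4)
K, N, S = 4, 25, 400
Y = (rng.random((S, N)) < rng.random(N) * 0.8).astype(float)  # toy binary inputs
V = rng.normal(0.0, 0.1, (K, N))
b_hat = np.full(K, -np.log(K))                     # uniform prior

def potentials(V):
    A = np.log1p(np.exp(V)).sum(axis=1)            # A_k, eq. (3)
    return Y @ V.T - A + b_hat                     # (S, K) potentials u_k

def log_likelihood(V):
    """L(V): average of log p(y|V) = logsumexp_k u_k over the data."""
    u = potentials(V)
    mx = u.max(axis=1, keepdims=True)
    return (mx[:, 0] + np.log(np.exp(u - mx).sum(axis=1))).mean()

L_before = log_likelihood(V)
for _ in range(200):
    u = potentials(V)
    q = np.exp(u - u.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)              # E-step responsibilities
    sigma = 1 / (1 + np.exp(-V))
    grad = (q[:, :, None] * (Y[:, None, :] - sigma[None])).mean(axis=0)  # eq. (14)
    V += 0.5 * grad                                # M-step gradient ascent
L_after = log_likelihood(V)
```

Each update is the exact batch gradient (14); the likelihood therefore increases over the run, as in Figure 3C.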
Figure 3D illustrates the dynamics of learning and re-learning of patterns \pi_{ki} in a 2D projection of input patterns onto the first two principal components.\n\n3 Homeostatic plasticity in recurrent spiking networks\n\nThe neural model so far was essentially a feed-forward network, in which every postsynaptic spike can directly be interpreted as one sample of the instantaneous posterior distribution [12]. The lateral inhibition served only to ensure the normalization of the posterior. We will now extend the concept of homeostatic processes as posterior constraints to the broader class of recurrent networks and sketch the utility of the developed framework beyond the regulation of intrinsic excitabilities.\nRecently it was shown in [9, 10] that recurrent networks of stochastically spiking neurons can in principle carry out probabilistic inference through a sampling process. At every point in time, the joint network state z(t) represents one sample of a posterior. However, [9] and [10] did not consider unsupervised learning on spiking input streams.\nFor the following considerations, we divide the definition of the probabilistic model in two parts. First, we define a Boltzmann distribution,\n\np(z) = \exp\left( \sum_k \hat{b}_k z_k + \frac{1}{2} \sum_{j \neq k} \hat{W}_{kj} z_k z_j \right) / \text{norm.} ,   (15)\n\nwith \hat{W}_{kj} = \hat{W}_{jk} as "prior" for the hidden variables z which will be represented by a recurrently connected network of K spiking neurons. For the purpose of this section, we treat \hat{b}_k and \hat{W}_{kj} as constants. Secondly, we define a conditional distribution in the exponential-family form [23],\n\np(y|z, V) = \exp\left( f_0(y) + \sum_{k,i} V_{ki} z_k f_i(y) - A(z, V) \right) ,   (16)\n\nthat specifies the likelihood of observable inputs y, given a certain network state z. This defines the generative model p(y, z|V) = p(z) p(y|z, V).\nWe map this probabilistic model to the spiking network and define that for every k and every point in time t the variable z_k(t) has the value 1, if the corresponding neuron has fired within the time window (t - \tau, t]. In accordance with the neural sampling theory, in order for a spiking network to sample from the correct posterior p(z|y, V) \propto p(z) p(y|z, V) given the input y, each neuron must compute in its membrane potential the log-odd [9],\n\nu_k = \log \frac{p(z_k = 1 | z_{\setminus k}, y, V)}{p(z_k = 0 | z_{\setminus k}, y, V)} = \underbrace{\sum_i V_{ki} f_i(y)}_{\text{feedforward drive}} \underbrace{- A_k(V) + \hat{b}_k}_{\text{intr. excitability}} + \sum_{j \neq k} ( \underbrace{-A_{kj}(V) + \hat{W}_{kj}}_{\text{recurrent weight}} ) z_j - \ldots ,   (17)\n\nwhere z_{\setminus k} = (z_1, \ldots, z_{k-1}, z_{k+1}, \ldots, z_K)^T. The A_k, A_{kj}, \ldots are given by the decomposition of A(z, V) along the binary combinations of z as,\n\nA(z, V) = A_0(V) + \sum_k z_k A_k(V) + \frac{1}{2} \sum_{j \neq k} z_k z_j A_{kj}(V) + \ldots   (18)\n\nNote that we do not aim at this point to give learning rules for the prior parameters \hat{b}_k and \hat{W}_{kj}. Instead we proceed as in the last section and specify a-priori desired properties of the average network response under the input distribution p^*(y),\n\nm_k = \langle z_k \rangle_{p^*(y) q(z|y)} \quad \text{and} \quad c_{kj} = \langle z_k z_j \rangle_{p^*(y) q(z|y)} .   (19)\n\nLet us explore some illustrative configurations for m_k and c_{kj}. One obvious choice is closely related to the goal of maximizing the entropy of the output code by fixing \langle z_k \rangle to 1/K and \langle z_k z_j \rangle to \langle z_k \rangle \langle z_j \rangle = 1/K^2, thus enforcing second order correlations to be zero. Another intuitive choice would be to set all \langle z_k z_j \rangle very close to zero, which excludes that two neurons can be active simultaneously and thus recovers the function of a WTA. It is further conceivable to assign positive correlation targets to groups of neurons, thereby creating populations with redundant codes. Finally, with a topographical organization of neurons in mind, all three basic ideas sketched above might be combined: one could assign positive correlations to neighboring neurons in order to create local cooperative populations, mutual exclusion at intermediate distance, and zero correlation targets between distant neurons.\nWith this in mind, we can formulate the goal of learning for the network in the context of EM with posterior constraints: we constrain the E-step such that the average posterior fulfills the chosen targets, and adapt the forward weights V in the M-step according to (6). Analogous to the first-order case, the variational solution of the E-step under these constraints takes the form,\n\nq_{\beta,\omega}(z|y) \propto p(z|y, V) \cdot \exp\left( \sum_k \beta_k z_k + \frac{1}{2} \sum_{j \neq k} \omega_{kj} z_k z_j \right) ,   (20)\n\nwith symmetric \omega_{kl} = \omega_{lk} as variational parameters. A neural sampling network N with input weights V_{ki} will sample from q_{\beta,\omega} if the intrinsic excitabilities are set to b_k = -A_k + \hat{b}_k + \beta_k, and the symmetric recurrent synaptic weights to W_{kj} = -A_{kj} + \hat{W}_{kj} + \omega_{kj}. The variational parameters \beta, \omega (and hence also b, W) which optimize the dual problem \Psi(\beta, \omega) are uniquely defined and can be found iteratively via gradient ascent. Analogous to the last section, this yields the intrinsic plasticity rule (12) for b_k. 
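The second-order constraints (19) under the Boltzmann-shaped variational family (20) can be illustrated exactly for a tiny network: for K = 3 binary units all 2^K states can be enumerated, and b_k is adapted by rule (12) while the recurrent weights are adapted by the corresponding correlation-constraint gradient (eq. (21) below) until means and correlations hit their targets. This toy sketch drops the input dependence y for brevity; all targets and step sizes are assumed, not taken from the paper.

```python
import numpy as np
from itertools import product

K = 3
Z = np.array(list(product([0, 1], repeat=K)), dtype=float)  # all 2^K states

def moments(b, W):
    """Exact <z_k> and <z_k z_j> under q(z) ∝ exp(b·z + 0.5 z^T W z)."""
    energy = Z @ b + 0.5 * np.einsum('sk,kj,sj->s', Z, W, Z)
    p = np.exp(energy - energy.max())
    p /= p.sum()
    mean = p @ Z
    corr = np.einsum('s,sk,sj->kj', p, Z, Z)
    return mean, corr

m = np.array([0.4, 0.4, 0.4])          # target activations m_k
c = np.full((K, K), 0.16)              # target correlations c_kj = m_k * m_j
b = np.zeros(K)
W = np.zeros((K, K))                   # symmetric, zero diagonal
off = ~np.eye(K, dtype=bool)

for _ in range(4000):
    mean, corr = moments(b, W)
    b += 0.5 * (m - mean)              # rule (12)
    dW = 0.5 * (c - corr)              # anti-Hebbian when corr exceeds target
    W[off] += dW[off]                  # update off-diagonal entries only
```

Here the targets correspond to three independent units with activation 0.4, so the iteration settles at W \approx 0 off-diagonal and b_k \approx logit(0.4); choosing c_kj near zero instead would enforce WTA-like mutual exclusion, as discussed above.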
In addition, we obtain for the recurrent synapses W_{kj},\n\n\Delta W_{kj} \propto c_{kj} - \langle z_k z_j \rangle_{p^*(y) q(z|y)} ,   (21)\n\nwhich translates to an anti-Hebbian spike-timing dependent plasticity rule in the network implementation.\nFor any concrete instantiation of f_0(y), f_i(y) and A(z, V) in (16) it is possible to derive learning rules for V_{ki} for the M-step via \partial_{V_{ki}} F(V, q). Of course not all models entail local synaptic learning rules. In particular it might be necessary to assume conditional independence of the inputs y given the network state z, i.e., p(y|z, V) = \prod_i p(y_i|z, V). Furthermore, in order to fulfill the neural computability condition (17) for neural sampling [9] with a recurrent network of point neurons, it might be necessary to choose A(z, V) such that terms of order higher than 2 vanish in the decomposition. This can be shown to hold, for example, in a model with conditionally independent Gaussian distributed inputs y_i. It is ongoing work to find further biologically realistic network models in the sense of this theory and to assess their computational capabilities through computer experiments.\n\n4 Discussion\n\nComplex and non-local computations, which appear during probabilistic inference and learning, arguably constitute one of the cardinal challenges in the development of biologically realistic Bayesian spiking network models. In this paper we have introduced homeostatic plasticity, which to the best of our knowledge had not been considered before in the context of EM in spiking networks, as a theoretically grounded approach to stabilize and facilitate learning in a large class of network models. Our theory complements previously proposed neural mechanisms and provides, in particular, a simple and biologically realistic alternative to the assumptions on the input distribution made in [12] and [15]. 
Indeed, our results challenge the hypothesis of [15] that feedforward inhibition is critical for correctly learning the structure of the data with biologically plausible plasticity rules. More generally, it turns out that the enforcement of a 'balancing' posterior constraint often simplifies inference in recurrent spiking networks by eliminating nontrivial computations. Our results suggest a crucial role of homeostatic plasticity in the Bayesian brain: to constrain activity patterns in cortex to assist the autonomous optimization of an internal model of the environment.

Acknowledgments. Written under partial support by the European Union - projects #FP7-269921 (BrainScaleS), #FP7-216593 (SECO), #FP7-237955 (FACETS-ITN), #FP7-248311 (AMARSi), #FP7-216886 (PASCAL2) - and the Austrian Science Fund FWF #I753-N23 (PNEUMA).

References

[1] K. P. Körding and D. M. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(6971):244–247, 2004.
[2] G. Orban, J. Fiser, R. N. Aslin, and M. Lengyel. Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences, 105(7):2745–2750, 2008.
[3] J. Fiser, P. Berkes, G. Orban, and M. Lengyel. Statistically optimal perception and learning: from behavior to neural representation. Trends in Cognitive Sciences, 14(3):119–130, 2010.
[4] P. Berkes, G. Orban, M. Lengyel, and J. Fiser. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science, 331:83–87, 2011.
[5] T. L. Griffiths and J. B. Tenenbaum. Optimal predictions in everyday cognition. Psychological Science, 17(9):767–773, 2006.
[6] D. E. Angelaki, Y. Gu, and G. C. DeAngelis. Multisensory integration: psychophysics, neurophysiology and computation. Current Opinion in Neurobiology, 19(4):452–458, 2009.
[7] S. Deneve. Bayesian spiking neurons I: Inference.
Neural Computation, 20(1):91–117, 2008.
[8] A. Steimer, W. Maass, and R. J. Douglas. Belief propagation in networks of spiking neurons. Neural Computation, 21:2502–2523, 2009.
[9] L. Buesing, J. Bill, B. Nessler, and W. Maass. Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, 7(11):e1002211, 2011.
[10] D. Pecevski, L. Buesing, and W. Maass. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. PLoS Computational Biology, 7(12), 2011.
[11] S. Deneve. Bayesian spiking neurons II: Learning. Neural Computation, 20(1):118–145, 2008.
[12] B. Nessler, M. Pfeiffer, and W. Maass. STDP enables spiking neurons to detect hidden causes of their inputs. In Proc. of NIPS 2009, volume 22, pages 1357–1365. MIT Press, 2010.
[13] J. Brea, W. Senn, and J.-P. Pfister. Sequence learning with hidden units in spiking neural networks. In Proc. of NIPS 2011, volume 24, pages 1422–1430. MIT Press, 2012.
[14] D. J. Rezende, D. Wierstra, and W. Gerstner. Variational learning for recurrent spiking networks. In Proc. of NIPS 2011, volume 24, pages 136–144. MIT Press, 2012.
[15] C. Keck, C. Savin, and J. Lücke. Feedforward inhibition and synaptic scaling – two sides of the same coin? PLoS Computational Biology, 8(3):e1002432, 2012.
[16] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011.
[17] J. Schemmel, D. Brüderle, A. Grübl, M. Hock, K. Meier, and S. Millner. A wafer-scale neuromorphic hardware system for large-scale neural modeling. Proc. of ISCAS'10, pages 1947–1950, 2010.
[18] N. S. Desai, L. C. Rutherford, and G. G. Turrigiano. Plasticity in the intrinsic excitability of cortical pyramidal neurons.
Nature Neuroscience, 2(6):515, 1999.
[19] A. Watt and N. Desai. Homeostatic plasticity and STDP: keeping a neuron's cool in a fluctuating world. Frontiers in Synaptic Neuroscience, 2, 2010.
[20] J. Graca, K. Ganchev, and B. Taskar. Expectation maximization and posterior constraints. In Proc. of NIPS 2007, volume 20. MIT Press, 2008.
[21] R. Jolivet, A. Rauch, H. R. Lüscher, and W. Gerstner. Predicting spike timing of neocortical pyramidal neurons by simple threshold models. Journal of Computational Neuroscience, 21:35–49, 2006.
[22] E. P. Simoncelli and D. J. Heeger. A model of neuronal responses in visual area MT. Vision Research, 38(5):743–761, 1998.
[23] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
[24] M. Sato. Fast learning of on-line EM algorithm. Technical Report, ATR Human Information Processing Research Laboratories, 1999.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.