{"title": "Towards a learning-theoretic analysis of spike-timing dependent plasticity", "book": "Advances in Neural Information Processing Systems", "page_first": 2456, "page_last": 2464, "abstract": "This paper suggests a learning-theoretic perspective on how synaptic plasticity benefits global brain functioning. We introduce a model, the selectron, that (i) arises as the fast time constant limit of leaky integrate-and-fire neurons equipped with spiking timing dependent plasticity (STDP) and (ii) is amenable to theoretical analysis. We show that the selectron encodes reward estimates into spikes and that an error bound on spikes is controlled by a spiking margin and the sum of synaptic weights. Moreover, the efficacy of spikes (their usefulness to other reward maximizing selectrons) also depends on total synaptic strength. Finally, based on our analysis, we propose a regularized version of STDP, and show the regularization improves the robustness of neuronal learning when faced with multiple stimuli.", "full_text": "Towards a learning-theoretic analysis of\n\nspike-timing dependent plasticity\n\nDavid Balduzzi\n\nMPI for Intelligent Systems, T\u00a8ubingen, Germany\n\nETH Zurich, Switzerland\n\ndavid.balduzzi@inf.ethz.ch\n\nMPI for Intelligent Systems and MPI for Biological Cybernetics\n\nMichel Besserve\n\nT\u00a8ubingen, Germany\n\nmichel.besserve@tuebingen.mpg.de\n\nAbstract\n\nThis paper suggests a learning-theoretic perspective on how synaptic plasticity\nbene\ufb01ts global brain functioning. We introduce a model, the selectron, that (i)\narises as the fast time constant limit of leaky integrate-and-\ufb01re neurons equipped\nwith spiking timing dependent plasticity (STDP) and (ii) is amenable to theoretical\nanalysis. We show that the selectron encodes reward estimates into spikes and that\nan error bound on spikes is controlled by a spiking margin and the sum of synaptic\nweights. 
Moreover, the ef\ufb01cacy of spikes (their usefulness to other reward maxi-\nmizing selectrons) also depends on total synaptic strength. Finally, based on our\nanalysis, we propose a regularized version of STDP, and show the regularization\nimproves the robustness of neuronal learning when faced with multiple stimuli.\n\n1\n\nIntroduction\n\nFinding principles underlying learning in neural networks is an important problem for both arti\ufb01cial\nand biological networks. An elegant suggestion is that global objective functions may be optimized\nduring learning [1]. For biological networks however, the currently known neural plasticity mech-\nanisms use a very restricted set of data \u2013 largely consisting of spikes and diffuse neuromodulatory\nsignals. How a global optimization procedure could be implemented at the neuronal (cellular) level\nis thus a dif\ufb01cult problem.\nA successful approach to this question has been Rosenblatt\u2019s perceptron [2] and its extension to\nmultilayer perceptrons via backpropagation [3]. Similarly, (restricted) Boltzmann machines, con-\nstructed from simple stochastic units, have provided a remarkably powerful approach to organizing\ndistributed optimization across many layers [4]. By contrast, although there has been signi\ufb01cant\nprogress in developing and understanding more biologically realistic models of neuronal learn-\ning [5\u201310], these do not match the performance of simpler, more analytically and computationally\ntractable models in learning tasks.\n\nOverview. This paper constructs a bridge from biologically realistic to analytically tractable mod-\nels. The selectron is a model derived from leaky integrate and \ufb01re neurons equipped with spike-\ntiming dependent plasticity that is amenable to learning-theoretic analysis. Our aim is to extract\nsome of the principles implicit in STDP by thoroughly investigating a limit case.\nSection \u00a72 introduces the selectron. 
We state a constrained reward maximization problem which\nimplies that selectrons encode empirical reward estimates into spikes. Our \ufb01rst result, section \u00a73,\n\n1\n\n\fis that the selectron arises as the fast time constant limit of well-established models of neuronal\nspiking and plasticity, suggesting that cortical neurons may also be encoding reward estimates into\ntheir spiketrains.\nTwo important questions immediately arise. First, what guarantees can be provided on spikes being\nreliable predictors of global (neuromodulatory) outcomes? Second, what guarantees can be provided\non the usefulness of spikes to other neurons? Sections \u00a74 and \u00a75 answer these questions by providing\nan upper bound on a suitably de\ufb01ned 0/1 loss and a lower bound on the ef\ufb01cacy of a selectron\u2019s\nspikes, measured in terms of its contribution to the expected reward of a downstream selectron.\nBoth bounds are controlled by the sum of synaptic weights kwk1, thereby justifying the constraint\nintroduced in \u00a72. Finally, motivated by our analysis, \u00a76 introduces a regularized STDP rule and\nshows that it learns more robustly than classical STDP. \u00a77 concludes the paper. Proofs of theorems\nare provided in the supplementary material.\n\nRelated work. Spike-timing dependent plasticity and its implications for the neural code have\nbeen intensively studied in recent years. The work closest in spirit to our own is Seung\u2019s \u201chedonistic\u201d\nsynapses, which seek to increase average reward [6]. Our work provides guarantees on the \ufb01nite\nsample behavior of a discrete-time analog of hedonistic neurons. Another related line of research\nderives from the information bottleneck method [9,11] which provides an alternate constraint to the\none considered here. An information-theoretic perspective on synaptic homeostasis and metabolic\ncost, complementing the results in this paper, can be found in [12, 13]. 
Simulations combining synaptic renormalization with burst-STDP can be found in [14].
Important aspects of plasticity that we have not considered here are properties specific to continuous-time models, such as STDP's behavior as a temporal filter [15], and also issues related to convergence [8, 10].
The learning-theoretic properties of neural networks have been intensively studied, mostly focusing on perceptrons, see for example [16]. A non-biologically motivated "large-margin" analog of the perceptron was proposed in [17].

2 The selectron

We introduce the selectron, which can be considered a biologically motivated adaptation of the perceptron, see §3. The mechanism governing whether or not the selectron spikes is a Heaviside function acting on a weighted sum of synaptic inputs; our contribution is to propose a new reward function and corresponding learning rule.
Let us establish some notation. Let X denote the set of N-dimensional {0, 1}-valued vectors forming synaptic inputs to a selectron, and Y = {0, 1} the set of outputs. A selectron spikes according to

y = f_w(x) := H(w⊤x − ϑ), where H(z) := {1 if z > 0; 0 else}   (1)

is the Heaviside function and w is a [0, 1] ⊂ R valued N-vector specifying the selectron's synaptic weights. Let P(x) denote the probability of input x arising.
To model the neuromodulatory system we introduce random variable ν : X → {−1, 0, +1}, where positive values correspond to desirable outcomes, negative to undesirable and zero to neutral. Let P(ν|x) denote the probability of the release of a neuromodulatory signal subsequent to input x.
Definition 1. Define reward function

R(x, f_w, ν) = ν(x) · (w⊤x − ϑ) · f_w(x) = {ν(x) · (w⊤x − ϑ) if y = 1; 0 else.}   (2)

The reward consists of three components. The first term is the neuromodulatory signal, which acts as a supervisor. The second term is the total current w⊤x minus the threshold ϑ. 
It is analogous to the margin in support vector machines or boosting algorithms, see §4 for a precise formulation. The three factors are labeled

ν(x) [neuromodulators] · (w⊤x − ϑ) [margin] · f_w(x) [selectivity].

The third term gates rewards according to whether or not the selectron spikes. The reward is thus selected¹: neuromodulatory signals are ignored by the selectron's reward function when it does not spike, enabling specialization.

Constrained reward maximization. The selectron solves the following optimization problem:

maximize over w:   R̂_n := Σ_{i=1}^n ν(x⁽ⁱ⁾) · (w⊤x⁽ⁱ⁾ − ϑ) · f_w(x⁽ⁱ⁾)   (3)
subject to:   ‖w‖₁ ≤ ω for some ω > 0.

Remark 1 (spikes encode rewards).
Optimization problem (3) ensures that selectrons spike for inputs that, on the basis of their empirical sample, reliably lead to neuromodulatory rewards. Thus, spikes encode expectations about rewards.
The constraint is motivated by the discussion after Theorem 1 and the analysis in §4 and §5. We postpone discussion of how to impose the constraint to §6, and focus on reward maximization here.
The reward maximization problem cannot be solved analytically in general. However, it is possible to use an iterative approach. Although f_w(x) is not continuous, the reward function is a continuous function of w and is differentiable everywhere except for the "corner" where w⊤x − ϑ = 0. 
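To make the objective concrete, the firing rule (1) and the empirical reward maximized in (3) can be sketched numerically. This is an illustrative NumPy sketch, not part of the model specification; all function and variable names are ours:

```python
import numpy as np

def selectron_reward(w, X, nu, theta):
    """Empirical reward (3): sum of nu(x) * (w'x - theta) over those
    sampled inputs x that elicit a spike, i.e. where w'x - theta > 0."""
    drive = X @ w - theta                 # total current minus threshold, per input
    spikes = (drive > 0.0).astype(float)  # Heaviside firing rule (1)
    return float(np.sum(nu * drive * spikes))

# Toy example: 4 synapses, 3 sampled binary inputs with neuromodulatory
# labels in {-1, 0, +1} (+1 rewarded, -1 punished, 0 neutral).
rng = np.random.default_rng(0)
w = np.array([0.9, 0.1, 0.8, 0.2])
X = rng.integers(0, 2, size=(3, 4)).astype(float)
nu = np.array([1.0, -1.0, 0.0])
print(selectron_reward(w, X, nu, theta=1.0))
```

Gating by the spike indicator is what makes the reward selective: inputs that do not elicit a spike contribute nothing, whatever the neuromodulatory signal.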
We therefore apply gradient ascent by computing the derivative of (3) with respect to synaptic weights to obtain online learning rule

Δw_j = α · ν(x) · x_j · f_w(x) = {α · ν(x) if x_j = 1 and y = 1; 0 else}   (4)

where update factor α controls the learning rate.
The learning rule is selective: regardless of the neuromodulatory signal, synapse w_j is updated only if there is both an input x_j = 1 and output spike y = f_w(x) = 1.
The selectron is not guaranteed to find a global optimum. It is prone to initial-condition-dependent local optima because rewards depend on output spikes in learning rule (4). Although this is an undesirable property for an isolated learner, it is less important, and perhaps even advantageous, in large populations where it encourages specialization.
Remark 2 (unsupervised setting).
Define the unsupervised setting by ν(x) = 1 for all x. The reward function reduces to R(x, f_w) = (w⊤x − ϑ) · f_w(x). Without the constraint synapses will saturate. Imposing the constraint yields a more interesting solution where the selectron finds a weight vector summing to ω which balances (i) frequent spikes and (ii) high margins.
Theorem 1 (Controlling the frequency of spikes).
Assuming synaptic inputs are i.i.d. Bernoulli variables with P(spike) = p, then

P(f_w(x) = 1) ≤ p · (‖w‖₁/ϑ)² ≤ p · (ω/ϑ)².

The Bernoulli regime is the discrete-time analog of the homogeneous Poisson setting used to prove convergence of reward-modulated STDP in [8]. Interestingly, in this setting the constraint provides a lever for controlling (lower bounding) rewards per spike:

{reward per spike} = R̂ / P(f_w(x) = 1) ≥ c₁ · R̂ / ω².

If inputs are not Bernoulli i.i.d., then P(y = 1) and ω still covary, although the precise relationship is more difficult to quantify. Although i.i.d. 
inputs are unrealistic, note that recent neurophysiological evidence suggests neuronal firing – even of nearby neurons – is uncorrelated [18].

¹ The name "selectron" was chosen to emphasize this selective aspect.

3 Relation to leaky integrate-and-fire neurons equipped with STDP

The literature contains an enormous variety of neuronal models, which vary dramatically in sophistication and the extent to which they incorporate the details of the underlying biochemical processes. Similarly, there is a large menagerie of models of synaptic plasticity [19]. We consider two well-established models: Gerstner's Spike Response Model (SRM), which generalizes leaky integrate-and-fire neurons [20], and the original spike-timing dependent plasticity learning rule proposed by Song et al. [5], and show that the selectron arises in the fast time constant limit of the two models.
First let us recall the SRM. Suppose neuron n_k last outputted a spike at time t_k and receives input spikes at times t_j from neuron n_j. Neuron n_k spikes or not according to the Heaviside function applied to the membrane potential M_w:

f_w(t) = H(M_w(t) − ϑ) where M_w(t) = η(t − t_k) + Σ_{t_j ≤ t} w_jk · ε(t − t_j) at time t ≥ t_k.

Input and output spikes add

ε(t − t_j) = K (e^{(t_j − t)/τ_m} − e^{(t_j − t)/τ_s})  and  η(t − t_k) = ϑ (K₁ e^{(t_k − t)/τ_m} − K₂ (e^{(t_k − t)/τ_m} − e^{(t_k − t)/τ_s}))

to the membrane potential for t_j ≤ t and t_k ≤ t respectively. Here τ_m and τ_s are the membrane and synapse time constants.
The original STDP update rule [5] is

Δw_jk = {α₊ · e^{(t_j − t_k)/τ₊} if t_j ≤ t_k;  −α₋ · e^{(t_k − t_j)/τ₋} else}   (5)

where τ₊ and τ₋ are time constants. 
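For concreteness, the exponential window in the pair-based update (5) can be sketched as follows; a minimal sketch whose default parameters are the values quoted in §6 (α₊ = 0.03125, α₋ = 0.85α₊, τ₊ = 16.8, τ₋ = 33.7):

```python
import math

def stdp_update(t_pre, t_post, a_plus=0.03125, a_minus=0.85 * 0.03125,
                tau_plus=16.8, tau_minus=33.7):
    """Pair-based STDP window (5): potentiate when the presynaptic spike
    at t_pre precedes the postsynaptic spike at t_post, depress otherwise.
    The magnitude decays exponentially with the time lag."""
    if t_pre <= t_post:  # causal pairing: pre before post -> potentiation
        return a_plus * math.exp((t_pre - t_post) / tau_plus)
    return -a_minus * math.exp((t_post - t_pre) / tau_minus)
```

The asymmetry of the window is the point: causal pairings are potentiated, acausal ones depressed, and updates are only triggered by spike pairs – the gating property preserved in the fast time constant limit.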
STDP potentiates input synapses that spike prior to output spikes and depotentiates input synapses that spike subsequent to output spikes.
Theorem 2 (the selectron is the fast time constant limit of SRM + STDP).
In the fast time constant limit, lim_{τ• → 0}, the SRM transforms into a selectron with

f_w(t) = H(M_w(t) − ϑ) where M_w = Σ_{j : t_j = t_k} w_jk · δ_{t_k}(t).

Moreover, STDP transforms into learning rule (4) in the unsupervised setting with ν(x) = 1 for all x. Finally, STDP arises as gradient ascent on a reward function whose limit is the unsupervised setting of reward function (2).

Theorem 2 shows that STDP implicitly maximizes a time-discounted analog of the reward function in (3). We expect many models of reward-modulated synaptic plasticity to be analytically tractable in the fast time constant limit. An important property shared by STDP and the selectron is that synaptic (de)potentiation is gated by output spikes, see §A.1 for a comparison with the perceptron, which does not gate synaptic learning.

4 An error bound

Maximizing reward function (3) implies that selectrons encode reward estimates into their spikes. Indeed, it recursively justifies incorporating spikes into the reward function via the margin (w⊤x − ϑ), which only makes sense if upstream spikes predict reward. However, in a large system where estimates pile on top of each other there is a tendency to overfit, leading to poor generalizations [21]. It is therefore crucial to provide guarantees on the quality of spikes as estimators.
Boosting algorithms, where the outputs of many weak learners are aggregated into a classifier [22], are remarkably resistant to overfitting as the number of learners increases [23]. 
Cortical learning may be analogous to boosting: individual neurons have access to a tiny fraction of the total brain state, and so are weak learners; and in the fast time constant limit, neurons are essentially aggregators.
We sharpen the analogy using the selectron. As a first step towards understanding how the cortex combats overfitting, we adapt a theorem developed to explain the effectiveness of boosting [24]. The goal is to show how the margin and constraint on synaptic weights improve error bounds.
Definition 2. A selectron incurs a 0/1 loss if a spike is followed by negative neuromodulatory feedback:

l(x, f_w, ν) := {1 if y = 1 and ν(x) = −1; 0 else.}   (6)

The 0/1 loss fails to take the estimates (spikes) of other selectrons into account and is difficult to optimize, so we also introduce the hinge loss:

h_κ(x, f_w, ν) := (κ − (w⊤x − ϑ) · ν(x))₊ · f_w(x), where (x)₊ := {x if x ≥ 0; 0 else.}   (7)

Note that l ≤ h_κ for all κ ≥ 1. Parameter κ controls the saturation point, beyond which the size of the margin makes no difference to h_κ.
An alternate 0/1 loss² penalizes a selectron if it (i) fires when it shouldn't, i.e. when ν(x) = −1, or (ii) does not fire when it should, i.e. when ν(x) = +1. However, since the cortex contains many neurons and spiking is metabolically expensive [25], we propose a conservative loss that only penalizes errors of commission ("first, do no harm") and does not penalize specialization.
Theorem 3 (spike error bound).
Suppose each selectron has ≤ N synapses. For any selectron n_k, let S_k = {n_k} ∪ {n_j : n_j → n_k} denote a 2-layer feedforward subnetwork. For all κ ≥ 1, with probability at least 1 − δ,

E[l(x, f_w, ν)] ≤ (1/n) Σ_i h_κ(x⁽ⁱ⁾, f_w, ν(x⁽ⁱ⁾)) + ω · 2B · √(8(N + 1) log(n + 1) + 1) / √n + 2B · √(2 log(2/δ) / n),

where B = κ + ω − ϑ; the terms on the right are the empirical hinge loss, a capacity term and a confidence term.

Remark 3 (theoretical justification for maximizing margin and constraining ‖w‖₁).
The theorem shows how subsets of distributed systems can avoid overfitting. First, it demonstrates the importance of maximizing the margin (i.e. the empirical reward). Second, it shows the capacity term depends on the number of synapses N and the constraint ω on synaptic weights, rather than the capacity of S_k – which can be very large.

The hinge loss is difficult to optimize directly since gating with output spikes f_w(x) renders it discontinuous. However, in the Bernoulli regime, Theorem 1 implies the bound in Theorem 3 can be rewritten as

E[l(x, f_w, ν)] ≤ p · κ · (ω²/ϑ²) − R̂_n(x⁽ⁱ⁾, f_w, ν(x⁽ⁱ⁾)) + ω · (capacity term) + (confidence term)   (8)

and so ω again provides the lever required to control the 0/1 loss. The constraint ‖w‖₁ ≤ ω is best imposed offline, see §6.

5 A bound on the efficacy of inter-neuronal communication

Even if a neuron's spikes perfectly predict positive neuromodulatory signals, the spikes only matter to the extent they affect other neurons in cortex. Spikes are produced for neurons by neurons. It is therefore crucial to provide guarantees on the usefulness of spikes.
In this section we quantify the effect of one selectron's spikes on another selectron's expected reward. We demonstrate a lower bound on efficacy and discuss its consequences.

² See §A.5 for an error bound.

Definition 3. The efficacy of spikes from selectron n_j on selectron n_k is

∂R^k/∂x_j := (E[R^k | x_j = 1] − E[R^k | x_j = 0]) / (1 − 0),

i.e. 
the expected contribution of spikes from selectron n_j to selectron n_k's expected reward, relative to not spiking. The notation is intended to suggest an analogy with differentiation – the infinitesimal difference made by spikes on a single synapse.
Efficacy is zero if E[R^k | x_j = 1] = E[R^k | x_j = 0], that is, if spikes from n_j make no difference to the expected reward of n_k.
The following theorem relies on the assumption that the average contribution of neuromodulators is higher after n_j spikes than after it does not spike (i.e. upstream spikes predict reward), see §A.6 for a precise statement. When the assumption is false the synapse w_jk should be pruned.
Theorem 4 (spike efficacy bound).
Let p_j := E[Y^j] denote the frequency of spikes from neuron n_j. The efficacy of n_j's spikes on n_k is lower bounded by

∂R^k/∂x_j ≥ c₂ · ( w_j · E[Y^j Y^k] / p_j + 2 E[Y^j Y^k · ((w^{C_j})⊤x − ϑ)] / (p_j(1 − p_j)) − E[Y^k · ((w^{C_j})⊤x − ϑ)] / (1 − p_j) )   (9)

where c₂ is described in §A.6 and w^{C_j}_i := w_i if i ≠ j and 0 if i = j; the three terms are a w_j-weighted co-spike frequency, a co-spike frequency and n_k's spike frequency.
The efficacy guarantee is interpreted as follows. First, the guarantee improves as co-spiking by n_j and n_k increases. However, the denominators imply that increasing the frequency of n_j's spikes worsens the guarantee, insofar as n_j is not correlated with n_k. Similarly, from the third term, increasing n_k's spikes worsens the guarantee if they do not correlate with n_j.
An immediate corollary of Theorem 4 is that Hebbian learning rules, such as STDP and the selectron learning rule (4), improve the efficacy of spikes. However, it also shows that naively increasing the frequency of spikes carries a cost. Neurons therefore face a tradeoff. In fact, in the Bernoulli regime, Theorem 1 implies (9) can be rewritten as

∂R^k/∂x_j ≥ c₂ · ( w_j · E[Y^j Y^k] / p + 2 E[Y^j Y^k · ((w^{C_j})⊤x − ϑ)] / (p(1 − p)) − p · ω² · (ω − ϑ) / ((1 − p) ϑ²) )   (10)

so the constraint ω on synaptic strength can be used as a lever to improve guarantees on efficacy.
Remark 4 (efficacy improved by pruning weak synapses).
The first term in (9) suggests that pruning weak synapses increases the efficacy of spikes, and so may aid learning in populations of selectrons or neurons.

6 Experiments

Cortical neurons are constantly exposed to different input patterns as organisms engage in different activities. It is therefore important that what neurons learn is robust to changing inputs [26, 27]. In this section, as proof of principle, we investigate a simple tweak of classical STDP involving offline regularization. We show that it improves robustness when neurons are exposed to more than one pattern.
Observe that regularizing optimization problem (3) yields

maximize over w:   Σ_{i=1}^n R(x⁽ⁱ⁾, f_w, ν(x⁽ⁱ⁾)) − (λ/2) · (‖w‖₁ − ω)²   (11)

whose learning rule

Δw_j = α · ν(x) · x_j · f_w(x) − λ · (‖w‖₁ − ω) · w_j   (12)

incorporates synaptic renormalization directly into the update. However, (12) requires continuously re-evaluating the sum of synaptic weights. We therefore decouple learning into an online reward maximization phase and an offline regularization phase which resets the synaptic weights.

A similar decoupling may occur in cortex. It has recently been proposed that a function of NREM sleep may be to regulate synaptic weights [28]. 
Indeed, neurophysiological evidence suggests that average cortical firing rates increase during wakefulness and decrease during sleep, possibly reflecting synaptic strengths [29, 30]. Experimental evidence also points to a net increase in dendritic spines (synapses) during waking and a net decrease during sleep [31].

Setup. We trained a neuron on a random input pattern for 10s to 87% accuracy with regularized STDP. See §A.7 for details on the structure of inputs. We then performed 700 trials (350 classical and 350 regularized) exposing the neuron to a new pattern for 20 seconds and observed performance under classical and regularized STDP.

SRM neurons with classical STDP. We used Gerstner's SRM model, recall §3, with parameters chosen to exactly coincide with [32]: τ_m = 10, τ_s = 2.5, K = 2.2, K₁ = 2, K₂ = 4 and ϑ = (1/4) · #synapses. STDP was implemented via (5) with parameters α₊ = 0.03125, τ₊ = 16.8, α₋ = 0.85α₊ and τ₋ = 33.7, also taken from [32]. Synaptic weights were clipped to fall in [0, 1].

Regularized STDP consists of a small tweak of classical STDP in the online phase, and an additional offline regularization phase:

• Online. In the online phase, reduce the depotentiation bias from 0.85α₊ in the classical implementation to α₋ = 0.75α₊.
• Offline. In the offline phase, modify synapses once per second according to

Δw_j = {β · (3/2 − w_j) · (ω − s) if ω < s;  β · (ω − s) else,}   (13)

where s is output spikes per second, ω = 5Hz is the target rate and update factor β = 0.6. The offline update rule is firing rate, and not spike, dependent.

Classical STDP has a depotentiation bias to prevent runaway potentiation feedback loops leading to seizures [5]. 
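The offline phase can be sketched as follows. This is a minimal sketch under our reading of the rate-based update: β denotes the update factor, ω the target rate, and the (3/2 − w_j) factor is the bias that downscales weak synapses more strongly; clipping to [0, 1] follows the setup above.

```python
def offline_update(w, s, omega=5.0, beta=0.6):
    """Offline regularization: adjust each synapse once per second based on
    the output firing rate s (spikes/s) relative to the target rate omega.
    When firing exceeds the target, downscale with a (3/2 - w_j) bias so
    weak synapses shrink more; otherwise upscale uniformly."""
    updated = []
    for wj in w:
        if omega < s:  # firing too much: biased downscaling
            dw = beta * (1.5 - wj) * (omega - s)
        else:          # firing at or below target: uniform upscaling
            dw = beta * (omega - s)
        updated.append(min(1.0, max(0.0, wj + dw)))  # keep weights in [0, 1]
    return updated
```

Because the rule depends only on the firing rate s, it can be driven by uncorrelated Poisson inputs during the offline phase, using the rate as a proxy for ‖w‖₁ rather than evaluating the sum of weights directly.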
Since synapses are frequently renormalized offline we incorporate a weak exploratory (potentiation) bias during the online phase which helps avoid local minima.³ This is in line with experimental evidence showing increased cortical activity during waking [30].
Since computing the sum of synaptic weights is non-physiological, we draw on Theorem 1 and use the neuron's firing rate when responding to uncorrelated inputs as a proxy for ‖w‖₁. Thus, in the offline phase, synapses receive inputs generated as in the online phase but without repeated patterns. Note that (12) has a larger pruning effect on stronger synapses, discouraging specialization. Motivated by Remark 4, we introduce the bias (3/2 − w_j) in the offline phase to ensure weaker synapses are downscaled more than strong synapses. For example, a synapse with weight w_i = 0.5 is downscaled by twice as much as a synapse with weight w_j = 1.0.
Regularized STDP alternates between 2 seconds online and 4 seconds offline, which suffices to renormalize synaptic strengths. The frequency of the offline phase could be reduced by decreasing the update factors α±, presenting stimuli less frequently (than 7 times per second), or adding inhibitory neurons to the system.

Results. A summary of results is presented in the table below: accuracy quantifies the fraction of spikes that co-occur with each pattern. Regularized STDP outperforms classical STDP on both patterns on average. It should be noted that regularized neurons were not only online for 20 seconds but also offline – and exposed to Poisson noise – for 40 seconds. 
Interestingly, exposure to Poisson noise improves performance.

Algorithm      Accuracy on Pattern 1    Accuracy on Pattern 2
Classical      54%                      39%
Regularized    59%                      48%

³ The input stream contains a repeated pattern, so there is a potentiation bias in practice even though the net integral of STDP in the online phase is negative.

[Figure 1: Accuracy after 20 seconds of exposure to a novel pattern. (a) Classical STDP; (b) Regularized STDP. Each panel plots trials against accuracy on patterns #1 and #2.]

Fig. 1 provides a more detailed analysis. Each panel shows a 2D-histogram (darker shades of gray correspond to more trials) plotting accuracies on both patterns simultaneously, and two 1D histograms plotting accuracies on the two patterns separately. The 1D histogram for regularized STDP shows a unimodal distribution for pattern #2, with most of the mass over accuracies of 50-90%. For pattern #1, which has been "unlearned" for twice as long as the training period, most of the mass is over accuracies of 50% to 90%, with a significant fraction "unlearnt". By contrast, classical STDP exhibits extremely brittle behavior. 
It completely unlearns the original pattern in about half the trials,\nand also fails to learn the new pattern in most of the trials.\nThus, as suggested by our analysis, introducing a regularization both improves the robustness of\nSTDP and enables an exploratory bias by preventing runaway feedback leading to epileptic seizures.\n\n7 Discussion\n\nThe selectron provides a bridge between a particular model of spiking neurons \u2013 the Spike Re-\nsponse Model [20] with the original spike-timing dependent plasticity rule [5] \u2013 and models that\nare amenable to learning-theoretic analysis. Our hope is that the selectron and related models lead\nto an improved understanding of the principles underlying learning in cortex. It remains to be seen\nwhether other STDP-based models also have tractable discrete-time analogs.\nThe selectron is an interesting model in its own right: it embeds reward estimates into spikes and\nmaximizes a margin that improves error bounds. It imposes a constraint on synaptic weights that:\nconcentrates rewards/spike, tightens error bounds and improves guarantees on spiking ef\ufb01cacy. Al-\nthough the analysis does not apply directly to continuous-time models, experiments show that a\ntweak inspired by our analysis improves the performance of a more realistic model. An impor-\ntant avenue for future research is investigating the role of feedback in cortex, speci\ufb01cally NMDA\nsynapses, which may have interesting learning-theoretic implications.\nAcknowledgements. We thank Timoth\u00b4ee Masquelier for generously sharing his source code [32]\nand Samory Kpotufe for useful discussions.\n\nReferences\n[1] Friston K, Kilner J, Harrison L: A free energy principle for the brain. J. Phys. Paris 2006, 100:70\u201387.\n[2] Rosenblatt F: The perceptron: a probabilistic model for information storage and organization in the\n\nbrain. 
Psychol Rev 1958, 65(6):386\u2013408.\n\n[3] Rumelhart DE, Hinton GE, Williams RJ: Learning representations by back-propagating errors. Na-\n\nture 1986, 323:533\u2013536.\n\n[4] Hinton G, Osindero S, Teh YW: A Fast Learning Algorithm for Deep Belief Nets. Neural Computation\n\n2006, 18:1527\u20131554.\n\n8\n\n\f[5] Song S, Miller KD, Abbott LF: Competitive Hebbian learning through spike-timing-dependent\n\nsynaptic plasticity. Nature Neuroscience 2000, 3(9).\n\n[6] Seung HS: Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Trans-\n\nmission. Neuron 2003, 40(1063-1073).\n\n[7] Bohte SM, Mozer MC: Reducing spike train variability: A computational theory of spike-timing\n\ndependent plasticity. In Advances in Neural Information Processing Systems (NIPS) 2005.\n\n[8] Legenstein R, Maass W: A criterion for the convergence of learning with spike timing dependent\n\nplasticity. In Advances in Neural Information Processing Systems (NIPS) 2006.\n\n[9] Buesing L, Maass W: Simpli\ufb01ed rules and theoretical analysis for information bottleneck optimiza-\n\ntion and PCA with spiking neurons. In Adv in Neural Information Processing Systems (NIPS) 2007.\n\n[10] Legenstein R, Pecevski D, Maass W: Theoretical analysis of learning with reward-modulated spike-\n\ntiming-dependent plasticity. In Advances in Neural Information Processing Systems (NIPS) 2008.\n\n[11] Tishby N, Pereira F, Bialek W: The information bottleneck method. In Proc. of the 37-th Annual Aller-\n\nton Conference on Communication, Control and Computing. Edited by Hajek B, Sreenivas R 1999.\n\n[12] Balduzzi D, Tononi G: What can neurons do for their brain? Communicate selectivity with spikes.\n\nTo appear in Theory in Biosciences 2012.\n\n[13] Balduzzi D, Ortega PA, Besserve M: Metabolic cost as an organizing principle for cooperative learn-\n\ning. 
Under review, 2012.\n\n[14] Nere A, Olcese U, Balduzzi D, Tononi G: A neuromorphic architecture for object recognition and\n\nmotion anticipation using burst-STDP. PLoS One 2012, 7(5):e36958.\n\n[15] Schmiedt J, Albers C, Pawelzik K: Spike timing-dependent plasticity as dynamic \ufb01lter. In Advances\n\nin Neural Information Processing Systems (NIPS) 2010.\n\n[16] Anthony M, Bartlett PL: Neural Network Learning: Theoretical Foundations. Cambridge Univ Press\n\n1999.\n\n[17] Freund Y, Schapire RE: Large Margin Classi\ufb01cation Using the Perceptron Algorithm. Machine Learn-\n\ning 1999, 37(3):277\u2013296.\n\n[18] Ecker AS, Berens P, Keliris GA, Bethge M, Logothetis NK, Tolias AS: Decorrelated neuronal \ufb01ring in\n\ncortical microcircuits. Science 2010, 327(5965):584\u20137.\n\n[19] Dan Y, Poo MM: Spike timing-dependent plasticity of neural circuits. Neuron 2004, 44:23\u201330.\n[20] Gerstner W: Time structure of the activity in neural network models. Phys. Rev. E 1995, 51:738\u2013758.\n[21] Geman S, Bienenstock E, Doursat R: Neural Networks and the Bias/Variance Dilemma. Neural Comp\n\n1992, 4:1\u201358.\n\n[22] Freund Y, Schapire RE: Experiments with a New Boosting Algorithm. In Machine Learning: Proceed-\n\nings of the Thirteenth International Conference 1996.\n\n[23] Schapire RE, Freund Y, Bartlett P, Lee WS: Boosting the Margin: A New Explanation for the Effec-\n\ntiveness of Voting Methods. The Annals of Statistics 1998, 26(5).\n\n[24] Boucheron S, Bousquet O, Lugosi G: Theory of classi\ufb01cation: A survey of some recent advances.\n\nESAIM: PS 2005, 9:323\u2013375.\n\n[25] Hasenstaub A, Otte S, Callaway E, Sejnowski TJ: Metabolic cost as a unifying principle governing\n\nneuronal biophysics. Proc Natl Acad Sci U S A 2010, 107(27):12329\u201334.\n\n[26] Fusi S, Drew P, Abbott L: Cascade Models of Synaptically Stored Memories. Neuron 2005, 45:599\u2013\n\n611.\n\n[27] Fusi S, Abbott L: Limits on the memory storage capacity of bounded synapses. 
Nature Neuroscience\n\n2007, 10(4):485\u2013493.\n\n[28] Tononi G, Cirelli C: Sleep function and synaptic homeostasis. Sleep Med Rev 2006, 10:49\u201362.\n[29] Vyazovskiy VV, Cirelli C, P\ufb01ster-Genskow M, Faraguna U, Tononi G: Molecular and electrophysio-\nlogical evidence for net synaptic potentiation in wake and depression in sleep. Nat Neurosci 2008,\n11(2):200\u20138.\n\n[30] Vyazovskiy VV, Olcese U, Lazimy Y, Faraguna U, Esser SK, Williams JC, Cirelli C, Tononi G: Cortical\n\n\ufb01ring and sleep homeostasis. Neuron 2009, 63(6):865\u201378.\n\n[31] Maret S, Faraguna U, Nelson AB, Cirelli C, Tononi G: Sleep and waking modulate spine turnover in\n\nthe adolescent mouse cortex. Nat Neurosci 2011, 14(11):1418\u20131420.\n\n[32] Masquelier T, Guyonneau, R and Thorpe SJ: Competitive STDP-Based Spike Pattern Learning. Neural\n\nComputation 2009, 21(5):1259\u20131276.\n\n[33] Roelfsema PR, van Ooyen A: Attention-gated reinforcement learning of internal representations for\n\nclassi\ufb01cation. Neural Comput 2005, 17(10):2176\u20132214.\n\n9\n\n\f", "award": [], "sourceid": 1182, "authors": [{"given_name": "David", "family_name": "Balduzzi", "institution": null}, {"given_name": "Michel", "family_name": "Besserve", "institution": null}]}