{"title": "Enforcing balance allows local supervised learning in spiking recurrent networks", "book": "Advances in Neural Information Processing Systems", "page_first": 982, "page_last": 990, "abstract": "To predict sensory inputs or control motor trajectories, the brain must constantly learn temporal dynamics based on error feedback. However, it remains unclear how such supervised learning is implemented in biological neural networks. Learning in recurrent spiking networks is notoriously difficult because local changes in connectivity may have an unpredictable effect on the global dynamics. The most commonly used learning rules, such as temporal back-propagation, are not local and thus not biologically plausible. Furthermore, reproducing the Poisson-like statistics of neural responses requires the use of networks with balanced excitation and inhibition. Such balance is easily destroyed during learning. Using a top-down approach, we show how networks of integrate-and-fire neurons can learn arbitrary linear dynamical systems by feeding back their error as a feed-forward input. The network uses two types of recurrent connections: fast and slow. The fast connections learn to balance excitation and inhibition using a voltage-based plasticity rule. The slow connections are trained to minimize the error feedback using a current-based Hebbian learning rule. Importantly, the balance maintained by fast connections is crucial to ensure that global error signals are available locally in each neuron, in turn resulting in a local learning rule for the slow connections. This demonstrates that spiking networks can learn complex dynamics using purely local learning rules, using E/I balance as the key rather than an additional constraint. 
The resulting network implements a given function within the predictive coding scheme, with minimal dimensions and activity.", "full_text": "Enforcing balance allows local supervised learning in spiking recurrent networks\n\nRalph Bourdoukan\n\nGroup For Neural Theory, ENS Paris\nRue d'Ulm, 29, Paris, France\nralph.bourdoukan@gmail.com\n\nSophie Deneve\n\nGroup For Neural Theory, ENS Paris\nRue d'Ulm, 29, Paris, France\nsophie.deneve@ens.fr\n\nAbstract\n\nTo predict sensory inputs or control motor trajectories, the brain must constantly learn temporal dynamics based on error feedback. However, it remains unclear how such supervised learning is implemented in biological neural networks. Learning in recurrent spiking networks is notoriously difficult because local changes in connectivity may have an unpredictable effect on the global dynamics. The most commonly used learning rules, such as temporal back-propagation, are not local and thus not biologically plausible. Furthermore, reproducing the Poisson-like statistics of neural responses requires the use of networks with balanced excitation and inhibition. Such balance is easily destroyed during learning. Using a top-down approach, we show how networks of integrate-and-fire neurons can learn arbitrary linear dynamical systems by feeding back their error as a feed-forward input. The network uses two types of recurrent connections: fast and slow. The fast connections learn to balance excitation and inhibition using a voltage-based plasticity rule. The slow connections are trained to minimize the error feedback using a current-based Hebbian learning rule. Importantly, the balance maintained by fast connections is crucial to ensure that global error signals are available locally in each neuron, in turn resulting in a local learning rule for the slow connections. 
This demonstrates that spiking networks can learn complex dynamics using purely local learning rules, using E/I balance as the key rather than an additional constraint. The resulting network implements a given function within the predictive coding scheme, with minimal dimensions and activity.\n\nThe brain constantly predicts relevant sensory inputs or motor trajectories. For example, there is evidence that neural circuits mimic the dynamics of motor effectors using internal models [1]. If the dynamics of the predicted sensory and motor variables change in time, these models may become inaccurate [2] and therefore need to be readjusted through learning based on error feedback.\nFrom a modeling perspective, supervised learning in recurrent networks faces many challenges. Earlier models have succeeded in learning useful functions at the cost of non-local learning rules that are biologically implausible [3, 4]. More recent models based on reservoir computing [5\u20137] transfer the learning from the recurrent network (with now \u201crandom\u201d, fixed weights) to the readout weights. Using this simple scheme, the network can learn to generate complex patterns. However, the majority of these models use abstract rate units and are yet to be translated into more realistic spiking networks. Moreover, to provide a sufficiently large reservoir, the recurrent network needs to be large, balanced, and have rich, high-dimensional dynamics. This typically generates far more activity than strictly required, a redundancy that can be seen as inefficient.\nOn the other hand, supervised learning models involving spiking neurons have essentially concentrated on the learning of precise spike sequences [8\u201310]. With some exceptions [10, 11] these models use feed-forward architectures [12]. 
In a balanced recurrent network with asynchronous, irregular and highly variable spike trains, such as those found in cortex, the activity has been shown to be chaotic [13, 14]. This makes spike timing intrinsically unreliable, rendering a representation of the trajectory by precise spike sequences problematic. Moreover, many configurations of spike times may achieve the same goal [15].\nHere we derive two local learning rules that drive a network of leaky integrate-and-fire (LIF) neurons into implementing a desired linear dynamical system. The network is trained to minimize the objective \u2016x(t) \u2212 \u02c6x(t)\u2016\u00b2 + H(r), where \u02c6x(t) is the output of the network decoded from the spikes, x(t) is the desired output, and H(r) is a cost associated with firing (penalizing unnecessary activity, and thus enforcing efficiency). The dynamical system is linear, \u02d9x = Ax + c, with A being a constant matrix and c a time-varying command signal. We first study the learning of an autoencoder, i.e., a network where the desired output is fed to the network as a feedforward input. The autoencoder learns to represent its inputs as precisely as possible in an unsupervised fashion. After learning, each unit represents the encoding error made by the entire network. We then show that the network can learn more complex computations if slower recurrent connections are added to the autoencoder. Thus, it receives the command c along with an error signal and learns to generate the output \u02c6x with the desired temporal dynamics. 
Despite the spike-based nature of the representation and of the plasticity rules, the learning does not enforce precise spike timing trajectories but, on the contrary, enforces irregular and highly variable spike trains.\n\n1 Learning a balance: global becomes local\n\nUsing a predictive coding strategy [15\u201317], we build a network that learns to accurately represent its inputs while using as few spikes as possible. To introduce the learning rules and explain how they work, we start by describing the optimized network (after learning).\nLet us first consider a set of unconnected integrate-and-fire neurons receiving shared input signals x = (xi) through feedforward connections F = (Fji). We assume that the network performs predictive coding, i.e. it subtracts from each of these input signals an estimate \u02c6x obtained by decoding the output spike trains (Fig 1A). Specifically, \u02c6xi = \u2211j Dij rj, where D = (Dij) are the decoding weights and r = (rj) are the filtered spike trains, which obey \u02d9rj = \u2212\u03bbrj + oj, with oj(t) = \u2211k \u03b4(t \u2212 t^k_j) being the spike train of neuron j, where the t^k_j are the times of its spikes. Note that such an autoencoder automatically maintains an accurate representation, because it responds to any encoding error larger than the firing threshold by increasing its response and in turn decreasing the error. It is also efficient, because neurons respond only when input and decoded signals differ. The autoencoder can be equivalently implemented by lateral connections, rather than feedback targeting the inputs (Fig 1A). These lateral connections combine the feedforward connections and the decoding weights, and they subtract from the feedforward inputs received by each neuron. 
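The decoder just described can be sketched numerically. The following Python fragment (not the paper's code; the network size, the rate of the stand-in spike trains, and the decoding weights D are all illustrative choices) Euler-integrates the filtered spike trains \u02d9rj = \u2212\u03bbrj + oj and reads out \u02c6x = Dr:

```python
import numpy as np

# Sketch of the decoder (not the paper's code): Euler integration of the
# filtered spike trains r_dot = -lambda * r + o and the readout xhat = D r.
# N, J, dt, lam, D and the stand-in Poisson spikes are illustrative choices.

rng = np.random.default_rng(0)
N, J, T, dt, lam = 5, 2, 1000, 1e-3, 0.05     # neurons, output dims, steps

D = rng.standard_normal((J, N))               # decoding weights D_ij
spikes = rng.random((T, N)) < 0.02            # stand-in spike trains o_j

r = np.zeros(N)
xhat_trace = np.empty((T, J))
for t in range(T):
    r += dt * (-lam * r) + spikes[t]          # leaky filter of the spikes
    xhat_trace[t] = D @ r                     # xhat_i = sum_j D_ij r_j
```

Each spike bumps the corresponding filtered trace by one, and the readout is a static linear combination of those traces; the recurrent implementation discussed next folds this readout back into the connectivity.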
The membrane potential dynamics in this recurrent network are described by:\n\n\u02d9V = \u2212\u03bbV + Fs + Wo    (1)\n\nwhere V is the vector of the membrane potentials of the population, s = \u02d9x + \u03bbx is the effective input to the population, W = \u2212FD is the connectivity matrix, and o is the population vector of the spikes. Neuron i has threshold Ti = \u2016Fi\u2016\u00b2/2 [15]. When input channels are independent and the feed-forward weights are distributed uniformly on a sphere, the optimal decoding weights D are equal to the encoding weights F, and hence the optimal recurrent connectivity is W = \u2212FFT [17]. In the following we assume that this is always the case and we choose the feedforward weights accordingly.\nIn this auto-encoding scheme, having a precise representation of the inputs is equivalent to maintaining a precise balance between excitation and inhibition. In fact, the membrane potential of a neuron is the projection of the global error of the network on the neuron\u2019s feedforward weight (Vi = Fi(x \u2212 \u02c6x) [15]). If the output of the network matches the input, the recurrent term in the membrane potential, Fi \u02c6x, should precisely cancel the feedforward term Fix. Therefore, in order to learn the connectivity matrix W, we tackle the problem through balance, which is its physiological characterization. The learning rule that we derive achieves efficient coding by enforcing a precise balance at the single-neuron level. It makes the network converge to a state where each presynaptic spike cancels the recent charge that was accumulated by the postsynaptic neuron (Fig 1B). This accumulation of charge is naturally represented by the postsynaptic membrane potential Vi, which jumps upon the arrival of a presynaptic spike by a magnitude given by the recurrent weight Wij due\n\nFigure 1: A: a network performing predictive coding. 
Top panel: a set of unconnected leaky integrate-and-fire neurons receiving the error between a signal and their own decoded spike trains. Bottom panel: the previous architecture is equivalent to the recurrent network with lateral connections equal to the product of the encoding and the decoding weights. B: illustration of the learning of an inhibitory weight. The trace of the membrane potential of a postsynaptic neuron is shown in blue and red. The blue lines correspond to changes due to the integration of the feedforward input, and the red to changes caused by the integration of spikes from neurons in the population. The black line represents the resting potential of the neuron. In the left panel the presynaptic spike perfectly cancels the accumulated feedforward current during a cycle and therefore there is no learning. In the right panel the inhibitory weight is too strong and thus creates an imbalance in the membrane potential. Therefore, it is depressed by learning. C: learning in a 20-neuron network. Top panels: the two dimensions of the input (blue lines) and the output (red lines) before (left) and after (right) learning. Bottom panels: raster plots of the spikes in the population. D, left panel: after learning each neuron receives a local estimate of the output of the network through lateral connections (red arrows). Right panel: scatter plot of the output of the network projected on the feedforward weights of the neurons versus the recurrent input they receive. E: the evolution of the mean error between the recurrent weights of the network and the optimal recurrent weights \u2212FFT using the rule defined by equation 2 (black line) and the rule in [16] (gray line). Note that our rule is different from [16] because it operates on a finer time-scale and reaches the optimal balanced state more than an order of magnitude faster. 
This speed-up is important because, as we will see below, some computations require a very fast restoration of this balance.\n\nto the instantaneous nature of recurrent synapses. Because the two charges should cancel each other, the greedy learning rule is proportional to the sum of both quantities:\n\n\u03b4Wij \u221d \u2212(Vi + \u03b2Wij)    (2)\n\nwhere Vi is the membrane potential of the postsynaptic neuron, Wij is the recurrent weight from neuron j to neuron i, and the factor \u03b2 controls the overall magnitude of lateral weights. More importantly, \u03b2 regularizes the cost penalizing the total spike count in the population (i.e. H(r) = \u00b5 \u2211i ri, where \u00b5 is the effective linear cost [15]). The example of an inhibitory synapse Wij < 0 is illustrated in figure 1B. If neuron i is too hyperpolarized upon the arrival of a presynaptic spike from neuron j, i.e., if the inhibitory weight Wij is smaller than \u2212Vi/\u03b2, the absolute weight of the synapse (the amplitude of the IPSP) is decreased. The opposite occurs if the membrane is too depolarized. The synaptic weights thus converge when the two quantities balance each other on average: Wij = \u2212\u27e8Vi\u27e9tj/\u03b2, where the tj are the spike times of the presynaptic neuron j.\nFig 1C shows the learning in a 20-neuron network receiving random input signals. For illustration purposes the weights are initialized with very small values. Before learning, the lack of lateral connectivity causes neurons to fire synchronously and regularly. After learning, spike trains are sparse, irregular and asynchronous, despite the near absence of noise in the network. 
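The fixed point of equation 2 can be exposed with a minimal scalar sketch, assuming (purely for illustration) that the postsynaptic potential at presynaptic spike times is drawn from a fixed distribution; the learning rate eps, beta, and V_mean below are our own numbers, not the paper's parameters:

```python
import numpy as np

# Minimal scalar sketch of equation 2: at each presynaptic spike the weight
# moves by -eps * (V_i + beta * W), so it settles at W = -<V_i>/beta, the
# balanced state in which the presynaptic spike cancels the accumulated
# charge on average.  eps, beta, V_mean are illustrative numbers.

rng = np.random.default_rng(1)
beta, eps = 0.5, 0.05
V_mean = 0.3          # average postsynaptic potential at presynaptic spike times
W = 0.0               # recurrent weight from j to i, initially unconnected

for _ in range(5000):
    V_i = V_mean + 0.1 * rng.standard_normal()   # potential when j spikes
    W += -eps * (V_i + beta * W)                 # equation 2
```

With a positive average potential at spike times, the weight converges to a negative (inhibitory) value, illustrating the depression/potentiation behavior of Fig 1B.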
Even though the firing rates decrease globally, the quality of the input representation drastically improves over the course of learning. Moreover, the convergence of recurrent weights to their optimal values is typically quick and monotonic (Fig 1E).\nBy enforcing balance, the learning rule establishes efficient and reliable communication between neurons. Because V = Fx \u2212 FFT r = F(x \u2212 \u02c6x), every neuron has access, through its recurrent input, to the network\u2019s global coding error projected on its feedforward weight (Fig 1D). This local representation of the network\u2019s global performance is crucial in the supervised learning scheme we describe in the following sections.\n\n2 Generating temporal dynamics within the network\n\nWhile in the previous section we presented a novel rule that drives a spiking network into efficiently representing its inputs, we are generally interested in networks that perform more complex computations. It has already been shown that a network with two synaptic time scales can implement an arbitrary linear dynamical system [15]. We briefly summarize this approach in this section.\n\nFigure 2: The construction of a recurrent network that implements a linear dynamical system.\n\nIn the autoencoder presented above, the effective input to the network is s = \u02d9x + \u03bbx (Fig 2A). We assume that x follows linear dynamics \u02d9x = Ax + c, where A is a constant matrix and c(t) is a time-varying command. Thus, the input can be expanded to s = Ax + c + \u03bbx = (A + \u03bbI)x + c (Fig 2B). Because the output of the network \u02c6x approximates x very precisely, they can be interchanged. According to this self-consistency argument, the external input term (A + \u03bbI)x is replaced by (A + \u03bbI)\u02c6x, which only depends on the activity of the network (Fig 2C). 
This replacement amounts to including a global loop that adds the term (A + \u03bbI)\u02c6x to the source input (Fig 2D). As in the autoencoder, this can be achieved using recurrent connections of the form F(A + \u03bbI)FT (Fig 2E). Note that the input to these recurrent connections is the filtered spike train r, not the raw spikes o. As a result, these new connections have slower dynamics than the connections presented in the first section. This motivates us to characterize connections as fast or slow depending on their underlying dynamics. The dynamics of the membrane potentials are now described by:\n\n\u02d9V = \u2212\u03bbV V + Fc + Wsr + Wf o    (3)\n\nwhere \u03bbV is the leak in the membrane potential, which is different from the leak in the decoder \u03bb. It is clear from the previous construction that the slow connectivity Ws = F(A + \u03bbI)FT is involved in generating the temporal dynamics of x. Owing to the slow connections, the network is able to generate the temporal dynamics of the output autonomously, and thus only needs the command c as an external input. For example, if A = 0 (i.e., the network implements a pure integrator), Ws = \u03bbFFT compensates for the leak in the decoder by generating a positive feedback term that prevents the activity from decaying. On the other hand, the fast connectivity matrix Wf = \u2212FFT, trained with the unsupervised, voltage-based rule presented previously, plays the same role as in the autoencoder; it ensures that the global output and the global coding error of the network are available locally to each neuron.\n\n3 Teaching the network to implement a desired dynamical system\n\nOur aim is to develop a supervised learning scheme where a network learns to generate a desired output using an error feedback and a local learning rule. 
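The connectivity construction of the previous section, which the learning rules below must recover, can be sketched numerically; the sizes and the random choice of F (unit-norm rows, as assumed in section 1) are illustrative:

```python
import numpy as np

# Sketch of the connectivity construction of section 2: fast weights
# Wf = -F F^T and slow weights Ws = F (A + lambda*I) F^T.  N, K_dim, lam
# and the random F are illustrative choices.

rng = np.random.default_rng(2)
N, K_dim, lam = 8, 2, 50.0

F = rng.standard_normal((N, K_dim))
F /= np.linalg.norm(F, axis=1, keepdims=True)   # unit-norm feedforward rows

A = np.zeros((K_dim, K_dim))                    # pure integrator: A = 0
Wf = -F @ F.T                                   # fast connections
Ws = F @ (A + lam * np.eye(K_dim)) @ F.T        # slow connections

# For A = 0, Ws reduces to lam * F F^T = -lam * Wf: the positive feedback
# that compensates the decoder leak.
```

The pure-integrator check at the end matches the text: with A = 0 the slow matrix is exactly the leak-compensating positive feedback.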
The learning rule targets the slow recurrent connections responsible for generating the temporal dynamics of the output, as seen in the previous section. Instead of deriving the learning rule for the recurrent connections directly, we first derive a learning rule for the state matrix of the linear dynamical system using simple results from control theory, and then we translate the learning to the recurrent network.\n\n3.1 Learning a linear dynamical system online\n\nConsider the linear dynamical system \u02d9\u02c6x = M\u02c6x + c, where M is a matrix. We derive an online learning rule for the coefficients of the matrix M, such that the output \u02c6x becomes, after learning, equal to the desired output x. The latter undergoes the dynamics \u02d9x = Ax + c. Therefore, we define e = x \u2212 \u02c6x as the error vector between the actual and the desired output. This error is fed to the mistuned system in order to correct and \u201cguide\u201d its behavior (Fig 3A). Thus, the dynamics of the system with this feedback are \u02d9\u02c6x = M\u02c6x + c + K(x \u2212 \u02c6x), where K is a scalar implementing the gain of the loop. The previous equation can be rewritten in the following form:\n\n\u02d9\u02c6x = (M \u2212 KI)\u02c6x + c + Kx    (4)\n\nwhere I is the identity matrix. If we assume that the spectra of the signals are bounded, it is straightforward to show, via a Laplace transform, that \u02c6x \u2192 x when K \u2192 +\u221e. The larger the gain of the feedback, the smaller the error. Intuitively, if K is large, very small errors are immediately detected and therefore corrected by the system. Nevertheless, our aim is not to correct the dynamical system forever, but to teach it to generate the desired output itself, without the error feedback. Thus, the matrix M needs to be modified over time. 
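The tracking effect of the gain K in equation 4 can be sketched as follows; the desired dynamics A, the mistuned M, and all step sizes are illustrative choices, not the paper's simulation:

```python
import numpy as np

# Sketch of equation 4: even with a mistuned, fixed state matrix M, a strong
# error feedback K*(x - xhat) makes the output track the target trajectory.
# A, M, K_gain, dt and the command are illustrative choices.

A = np.array([[0.0, -1.0], [1.0, -0.2]])   # desired (damped oscillator) dynamics
M = np.zeros((2, 2))                       # mistuned system (M != A), kept fixed
K_gain, dt, T = 50.0, 1e-3, 5000

x = np.array([1.0, 0.0])
xhat = np.zeros(2)
err = []
for t in range(T):
    c = np.array([np.sin(0.01 * t), 0.0])  # slowly varying command
    x = x + dt * (A @ x + c)
    xhat = xhat + dt * (M @ xhat + c + K_gain * (x - xhat))
    err.append(float(np.linalg.norm(x - xhat)))
```

After a short transient the residual tracking error scales roughly as the dynamics mismatch divided by K, illustrating why a large gain hides, but does not fix, the mistuned matrix.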
To derive the learning rule for the matrix M, we operate a gradient descent on the loss function L = eT e = \u2016x \u2212 \u02c6x\u2016\u00b2 with respect to the components of the matrix. The component Mij is updated proportionally to the gradient of L,\n\n\u03b4Mij \u221d \u2212\u2202L/\u2202Mij \u221d (\u2202\u02c6x/\u2202Mij)T e    (5)\n\nTo evaluate the term \u2202\u02c6x/\u2202Mij, we solve equation 4 for the simple case where the inputs c are constant. If we assume that K is much larger than the eigenvalues of M, the gradient \u2202\u02c6x/\u2202Mij is approximated by Eij \u02c6x, where Eij is a matrix of zeros except for component ij, which is one. This leads to the very simple learning rule \u03b4Mij \u2248 \u02c6xjei, which we can write in matrix form as:\n\n\u03b4M \u221d e\u02c6xT    (6)\n\nThe learning rule is simply the outer product of the error and the output. To derive the learning rule we assumed constant or slowly varying input. In practice, however, learning can also be achieved using fast-varying inputs (Fig 3).\n\n3.2 Learning rule for the slow connections\n\nIn the previous section we derived a simple learning rule for the state matrix M of a linear dynamical system. We now translate this learning scheme to the recurrent network described in section 2. To this end, we need to determine two things. First, we have to define the form of the error feedback in the recurrent network case. Second, we need to adapt the learning rule of the matrix of the underlying dynamical system to the slow weights of the recurrent neural network.\nIn the previous learning scheme the error is fed into the dynamical system as an additional input. Since the input weight vector Fi of a neuron defines the direction that is relevant for its \u201caction\u201d space, the neuron should only receive the errors in that direction. Thus, the error vector is projected on the feedforward weight vector of a neuron before being fed to it. 
Accordingly, equation 3 becomes:\n\n\u02d9V = \u2212\u03bbV V + Fc + Wsr + Wf o + KFe    (7)\n\nIn the autoencoder, the membrane potential of a neuron represents the error between the input and the output of the entire network along the neuron\u2019s feedforward weight. With the addition of the dynamic error feedback and the slow connections, the membrane potentials now represent the error between the actual and the desired network output trajectories.\nTo translate the learning rule of the dynamical system into a rule for the recurrent network, we assume that any modification of the recurrent weights directly reflects a modification in the underlying dynamical system. This is achieved if the updates \u03b4Ws of the slow connectivity matrix are of the form F(\u03b4M)FT. This ensures that the network always implements a linear dynamical system and guarantees that the analysis is consistent. The learning rule of the slow connections Ws is obtained by replacing \u03b4M by its expression according to equation 6 in F(\u03b4M)FT:\n\n\u03b4Ws \u221d (Fe)(F\u02c6x)T    (8)\n\nAccording to this learning rule, the weight update between two neurons, \u03b4W^s_ij, is proportional to the error feedback Fie received as a current by the postsynaptic neuron i and to Fj \u02c6x, the output of the network projected on the feedforward weight of the presynaptic neuron j. The latter quantity is available to the presynaptic neuron through its inward fast recurrent connections, as shown for the autoencoder in Fig 1D.\nOne might object that the previous learning rule is not biologically plausible because it involves currents present separately in the pre- and post-synaptic neurons. Indeed, the presynaptic term may not be available to the synapse. 
However, as shown in the supplementary information of [15], the filtered spike train rj of the presynaptic neuron is approximately proportional to \u230aFj \u02c6x\u230b+, a rectified version of the presynaptic term in the previous learning rule. By replacing Fj \u02c6x by rj in equation 8 we obtain the following biologically plausible learning rule:\n\n\u03b4W^s_ij = Eirj    (9)\n\nwhere Ei = Fie is the total error current received by the postsynaptic neuron.\n\n3.3 Learning the underlying dynamical system while maintaining balance\n\nFor the previous analysis to hold, the fast connectivity Wf should be learned simultaneously with the slow connections, using the learning rule defined by equation 2. As shown in the first section, the learning of the fast connections establishes a detailed balance at the level of the neuron and guarantees that the output of the network is available to each neuron through the term Fj \u02c6x. The latter is the presynaptic term in the learning rule of equation 8. Despite not being involved in the dynamics per se, these fast connections are crucial in order to learn any temporal dynamics. In other words, learning a detailed balance is a prerequisite for learning dynamics with local plasticity rules in a spiking network. The plasticity of the fast connections quickly remedies any perturbation of the balance caused by the learning of the slow connections.\n\n3.4 Simulation\n\nAs a toy example, we simulated a 20-neuron network learning a 2D damped oscillator using a feedback gain K = 100. The network is initialized with weak fast connections and weak slow connections. The learning is driven by smoothed Gaussian noise as the command c. Note that in the initial state there are no fast recurrent connections and the output of the network does not depend linearly on the input because membrane potentials are too hyperpolarized (Fig 3B). 
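As a complement to the spiking simulation, the underlying rule of equations 4 and 6 can be run on the dynamical system directly. A sketch, with illustrative gains and a smoothed-noise command loosely modeled on the paper's setup; the only claim is that M drifts toward the true state matrix A:

```python
import numpy as np

# Sketch of the online rule delta M ∝ e xhat^T (equation 6), run together
# with the error feedback of equation 4.  A, K_gain, eta, dt and the command
# statistics are illustrative choices, not the paper's parameters.

rng = np.random.default_rng(3)
A = np.array([[0.0, -1.0], [1.0, -0.2]])   # desired (damped oscillator) dynamics
M = np.zeros((2, 2))                       # mistuned system to be trained
K_gain, dt, eta, T = 50.0, 1e-3, 2.0, 150_000

x = np.zeros(2)                            # desired trajectory
xhat = np.zeros(2)                         # output of the learned system
c = np.zeros(2)
for _ in range(T):
    # smoothed Gaussian noise as the command, as in the paper's simulations
    c += dt * (-c) + 0.05 * rng.standard_normal(2)
    x = x + dt * (A @ x + c)
    e = x - xhat                           # error fed back into the system
    xhat = xhat + dt * (M @ xhat + c + K_gain * e)
    M += dt * eta * np.outer(e, xhat)      # equation 6
```

Because the feedback keeps the error small while the command keeps the state excited, the outer-product update slowly absorbs the mismatch and M approaches A, after which the feedback is no longer needed.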
The network\u2019s output is quickly linearized through the learning of the fast connections (equation 2), which enforces a balance on the membrane potential (Fig 3B): initial membrane potentials exhibit large fluctuations\n\nFigure 3: Learning temporal dynamics in a recurrent network. A, top panel: the linear dynamical system characterized by the state matrix M receives feedback signaling the difference between its actual output and a desired output. Bottom panel: a recurrent network displaying slow and fast connections is equivalent to the top architecture if the error feedback is fed into the network through the feedforward matrix F. B: a 20-neuron network learns using equations 2 and 9. Left panel: the evolution of the error between the desired and the actual output during learning. The black and grey arrows represent instances where the time course of the membrane potential is shown in the next plot. Right panel: the time course of the membrane potential of one neuron at two different instances during learning. The gray line corresponds to the initial state while the black line is a few iterations later. C: scatter plots of the learned versus the predicted weights at the end of learning for fast (top panel) and slow (bottom panel) connections. D, top panels: the output of the network (red) and the desired output (blue), before (left) and after (right) learning. The black solid line on the top shows the impulse command that drives the network. Bottom panels: raster plots before and after learning. In the left raster plot there is no spiking activity after the first 50 ms.\n\nthat wane drastically after a few iterations (Fig 3B). On a slower time scale the slow connections learn to minimize the prediction error using the learning rule of equation 9. The error between the output of the network and the desired output decreases drastically (Fig 3B). 
To compute this error, different instances of the connectivity matrices were sampled during learning. The network was then re-simulated using the same instances while fixing K = 0 to measure the performance in the absence of feedback. At the end of learning, both slow and fast connections converge to their predicted values Ws = F(A + \u03bbI)FT and Wf = \u2212FFT (Fig 3C). The presence of feedback is no longer required for the network to have the right dynamics (i.e., if we set K = 0 we still obtain the desired output, see Figs. 3D and 3B). The output of the network is very accurate (representing the state x with a precision of the order of the contribution of a single spike), parsimonious (no unnecessary spikes are emitted to represent the dynamical state at this level of accuracy), and the spike trains are asynchronous and irregular. Note that because the slow connections are weak in the initial state, spiking activity decays quickly once the command impulse is turned off, due to the absence of slow recurrent excitation (Fig 3D).\n\nSimulation parameters. Figure 1: \u03bb = 0.05, \u03b2 = 0.51, learning rate: 0.01. Figure 3: \u03bb = 50, \u03bbV = 1, \u03b2 = 0.52, K = 100, learning rate of the fast connections: 0.03, learning rate of the slow connections: 0.15.\n\n4 Discussion\n\nUsing a top-down approach, we derived a pair of spike-based and current-based plasticity rules that enable precise supervised learning in a recurrent network of LIF neurons. The essence of this approach is that every neuron is a precise computational unit that represents the network error in a one-dimensional subspace of the output space. 
The precise and distributed nature of this code allows the derivation of local learning rules from global objectives.\nTo compute collectively, the neurons need to communicate their contributions to the output of the network to each other. The fast connections are trained in an unsupervised fashion, using a spike-based rule, to optimize this communication. They establish efficient communication by enforcing a detailed balance between excitation and inhibition. The slow connections, however, are trained to minimize the error between the actual output of the network and a target dynamical system. They produce currents with long temporal correlations, implementing the temporal dynamics of the underlying linear dynamical system. The plasticity rule for the slow connections is simply proportional to an error feedback injected as a current in the postsynaptic neuron, and to a quantity akin to the firing rate of the presynaptic neuron. To guide the behavior of the network during learning, the error feedback must be strong and specific. Such strength and specificity are in agreement with data on climbing fibers in the cerebellum [18\u201320], which are believed to carry information about errors during motor learning [21]. However, in this model, the specificity of the error signals is defined by a weight matrix through which the errors are fed to the neurons. Learning these weights is still under investigation. We believe that they could be learned using a covariance-based rule.\nOur approach is substantially different from usual supervised learning paradigms in spiking networks since it does not target the spike times explicitly. Indeed, observing spike times may be misleading, since many combinations of them can produce the same output [15, 16]. Thus, in this framework, variability in spiking is not a lack of precision, but a consequence of the redundancy in the representation. 
Neurons having similar decoding weights may have their spike times interchanged while the global representation is conserved. What is important is the cooperation between the neurons and the precise spike timing relative to the population. For example, using independent Poisson neurons with instantaneous firing rates identical to those of the predictive coding network drastically degrades the quality of the representation [15].\nOur approach is also different from liquid computing in the sense that the network is small, structured, and fires only when needed. In addition, in these studies the feedback error used in the learning rule has no clear physiological correlate, while here it is concretely injected as a current in the neurons. This current is used simultaneously to drive the learning rule and to guide the dynamics of the neuron in the short term. However, it is still unclear what mechanisms could implement such a current-dependent learning rule in biological neurons.\nAn obvious limitation of our framework is that it is currently restricted to linear dynamical systems. One possibility to overcome this limitation would be to introduce non-linearities in the decoder, which would translate into specific non-linearities and structures in the dendrites. A similar strategy has recently been employed to combine the approach of predictive coding and FORCE learning [7] using two-compartment LIF neurons [22]. We are currently exploring less constraining forms of synaptic non-linearities, with the ultimate goal of being able to learn arbitrary dynamics in spiking networks using purely local plasticity rules.\n\nAcknowledgments\n\nThis work was supported by ANR-10-LABX-0087 IEC, ANR-10-IDEX-0001-02 PSL, ERC grant FP7-PREDISPIKE and the James McDonnell Foundation Award - Human Cognition.\n\nReferences\n\n[1] Kawato, M. (1999). Internal models for motor control and trajectory planning. 
Current Opinion in Neurobiology, 9(6), 718-727.

[2] Lackner, J. R., & Dizio, P. (1998). Gravitoinertial force background level affects adaptation to Coriolis force perturbations of reaching movements. Journal of Neurophysiology, 80(2), 546-553.

[3] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive Modeling, 5.

[4] Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2), 270-280.

[5] Jaeger, H. (2001). The echo state approach to analysing and training recurrent neural networks - with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148, 34.

[6] Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531-2560.

[7] Sussillo, D., & Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4), 544-557.

[8] Legenstein, R., Naeger, C., & Maass, W. (2005). What can a neuron learn with spike-timing-dependent plasticity? Neural Computation, 17(11), 2337-2382.

[9] Pfister, J., Toyoizumi, T., Barber, D., & Gerstner, W. (2006). Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning. Neural Computation, 18(6), 1318-1348.

[10] Ponulak, F., & Kasinski, A. (2010). Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting. Neural Computation, 22(2), 467-510.

[11] Memmesheimer, R. M., Rubin, R., Ölveczky, B. P., & Sompolinsky, H. (2014). Learning precisely timed spikes. Neuron, 82(4), 925-938.

[12] Gütig, R., & Sompolinsky, H. (2006). The tempotron: a neuron that learns spike timing-based decisions.
Nature Neuroscience, 9(3), 420-428.

[13] van Vreeswijk, C., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724-1726.

[14] Brunel, N. (2000). Dynamics of networks of randomly connected excitatory and inhibitory spiking neurons. Journal of Physiology-Paris, 94(5), 445-463.

[15] Boerlin, M., Machens, C. K., & Denève, S. (2013). Predictive coding of dynamical variables in balanced spiking networks. PLoS Computational Biology, 9(11), e1003258.

[16] Bourdoukan, R., Barrett, D., Machens, C. K., & Denève, S. (2012). Learning optimal spike-based representations. In Advances in Neural Information Processing Systems (pp. 2285-2293).

[17] Vertechi, P., Brendel, W., & Machens, C. K. (2014). Unsupervised learning of an efficient short-term memory network. In Advances in Neural Information Processing Systems (pp. 3653-3661).

[18] Watanabe, M., & Kano, M. (2011). Climbing fiber synapse elimination in cerebellar Purkinje cells. European Journal of Neuroscience, 34(10), 1697-1710.

[19] Chen, C., Kano, M., Abeliovich, A., Chen, L., Bao, S., Kim, J. J., ... & Tonegawa, S. (1995). Impaired motor coordination correlates with persistent multiple climbing fiber innervation in PKC mutant mice. Cell, 83(7), 1233-1242.

[20] Eccles, J. C., Llinas, R., & Sasaki, K. (1966). The excitatory synaptic action of climbing fibres on the Purkinje cells of the cerebellum. The Journal of Physiology, 182(2), 268-296.

[21] Knudsen, E. I. (1994). Supervised learning in the brain. The Journal of Neuroscience, 14(7), 3985-3997.

[22] Thalmeier, D., Uhlmann, M., Kappen, H. J., & Memmesheimer, R. (2015). Learning universal computations with spikes.
arXiv preprint arXiv:1505.07866.