{"title": "Spike-Train Level Backpropagation for Training Deep Recurrent Spiking Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 7802, "page_last": 7813, "abstract": "Spiking neural networks (SNNs) well support spatiotemporal learning and energy-efficient event-driven hardware neuromorphic processors. As an important class of SNNs, recurrent spiking neural networks (RSNNs) possess great computational power. However, the practical application of RSNNs is severely limited by challenges in training. Biologically-inspired unsupervised learning has limited capability in boosting the performance of RSNNs. On the other hand, existing backpropagation (BP) methods suffer from high complexity of unrolling in time, vanishing and exploding gradients, and approximate differentiation of discontinuous spiking activities when applied to RSNNs. To enable supervised training of RSNNs under a well-defined loss function, we present a novel Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm for training deep RSNNs. The proposed ST-RSBP directly computes the gradient of a rated-coded loss function defined at the output layer of the network w.r.t tunable parameters. The scalability of ST-RSBP is achieved by the proposed spike-train level computation during which temporal effects of the SNN is captured in both the forward and backward pass of BP. Our ST-RSBP algorithm can be broadly applied to RSNNs with a single recurrent layer or deep RSNNs with multiple feed-forward and recurrent layers. 
Based upon challenging speech and image datasets including TI46, N-TIDIGITS, Fashion-MNIST and MNIST, ST-RSBP is able to train RSNNs with an accuracy surpassing that of the current state-of-the-art SNN BP algorithms and conventional non-spiking deep learning models.", "full_text": "Spike-Train Level Backpropagation for Training Deep Recurrent Spiking Neural Networks\n\nWenrui Zhang\nUniversity of California, Santa Barbara\nSanta Barbara, CA 93106\nwenruizhang@ucsb.edu\n\nPeng Li\nUniversity of California, Santa Barbara\nSanta Barbara, CA 93106\nlip@ucsb.edu\n\nAbstract\n\nSpiking neural networks (SNNs) well support spatio-temporal learning and energy-efficient event-driven hardware neuromorphic processors. As an important class of SNNs, recurrent spiking neural networks (RSNNs) possess great computational power. However, the practical application of RSNNs is severely limited by challenges in training. Biologically-inspired unsupervised learning has limited capability in boosting the performance of RSNNs. On the other hand, existing backpropagation (BP) methods suffer from high complexity of unfolding in time, vanishing and exploding gradients, and approximate differentiation of discontinuous spiking activities when applied to RSNNs. To enable supervised training of RSNNs under a well-defined loss function, we present a novel Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm for training deep RSNNs. The proposed ST-RSBP directly computes the gradient of a rate-coded loss function defined at the output layer of the network w.r.t. tunable parameters. The scalability of ST-RSBP is achieved by the proposed spike-train level computation during which temporal effects of the SNN are captured in both the forward and backward pass of BP. Our ST-RSBP algorithm can be broadly applied to RSNNs with a single recurrent layer or deep RSNNs with multiple feedforward and recurrent layers. 
Based upon challenging speech and image datasets including TI46 [25], N-TIDIGITS [3], Fashion-MNIST [40] and MNIST, ST-RSBP is able to train SNNs with an accuracy surpassing that of the current state-of-the-art SNN BP algorithms and conventional non-spiking deep learning models.\n\n1 Introduction\n\nIn recent years, deep neural networks (DNNs) have demonstrated outstanding performance in natural language processing, speech recognition, visual object recognition, object detection, and many other domains [6, 14, 21, 36, 13]. On the other hand, it is believed that biological brains operate rather differently [17]. Neurons in artificial neural networks (ANNs) are characterized by a single, static, and continuous-valued activation function. More biologically plausible spiking neural networks (SNNs) compute based upon discrete spike events and spatio-temporal patterns while enjoying rich coding mechanisms including rate and temporal codes [11]. There is theoretical evidence supporting that SNNs possess greater computational power than traditional ANNs [11]. Moreover, the event-driven nature of SNNs enables ultra-low-power hardware neuromorphic computing devices [7, 2, 10, 28].\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nBackpropagation (BP) is the workhorse for training deep ANNs [22]. Its success in the ANN world has made BP a target of intensive research for SNNs. Nevertheless, applying BP to biologically more plausible SNNs is nontrivial due to the necessity of dealing with complex neural dynamics and the non-differentiability of discrete spike events. It is possible to train an ANN and then convert it to an SNN [9, 10, 16]. However, this suffers from conversion approximations and gives up the opportunity of exploring SNNs' temporal learning capability. One of the earliest attempts to bridge the gap between the discontinuity of SNNs and BP is the SpikeProp algorithm [5]. 
However, SpikeProp is restricted to single-spike learning and has not yet been successful in solving real-world tasks. Recently, training SNNs using BP under a firing rate (or activity level) coded loss function has been shown to deliver competitive performances [23, 39, 4, 33]. Nevertheless, [23] does not consider the temporal correlations of neural activities and deals with spiking discontinuities by treating them as noise. [33] gets around the non-differentiability of spike events by approximating the spiking process via a probability density function of spike state change. [39], [4], and [15] capture the temporal effects by performing backpropagation through time (BPTT) [37]. Among these, [15] adopts a smoothed spiking threshold and a continuous differentiable synaptic model for gradient computation, which is not applicable to widely used spiking neuron models such as the leaky integrate-and-fire (LIF) model. Similar to [23], [39] and [4] compute the error gradient based on the continuous membrane waveforms resulting from smoothing out all spikes. In these approaches, computing the error gradient by smoothing the microscopic membrane waveforms may lose sight of the all-or-none firing characteristics of the SNN that define the higher-level loss function, and may lead to inconsistency between the computed gradient and the target loss, potentially degrading training performance [19].\n\nMost existing SNN training algorithms, including the aforementioned BP works, focus on feedforward networks. Recurrent spiking neural networks (RSNNs), which are an important class of SNNs and are especially competent for processing temporal signals such as time series or speech data [12], deserve equal attention. The Liquid State Machine (LSM) [27] is a special RSNN which has a single recurrent reservoir layer followed by one readout layer. 
To mitigate training challenges, the reservoir\nweights are either \ufb01xed or trained by unsupervised learning like spike-timing-dependent plasticity\n(STDP) [29] with only the readout layer trained by supervision [31, 41, 18]. The inability in training\nthe entire network with supervision and its architectural constraints, e.g. only admitting one reservoir\nand one readout, limit the performance of LSM. [4] proposes an architecture called long short-term\nmemory SNNs (LSNNs) and trains it using BPTT with the aforementioned issue on approximate\ngradient computation. When dealing with training of general RSNNs, in addition to the dif\ufb01culties\nencountered in feedforward SNNs, one has to cope with added challenges incurred by recurrent\nconnections and potential vanishing/exploding gradients.\nThis work is motivated by: 1) lack of powerful supervised training of general RSNNs, and 2) an\nimmediate outcome of 1), i.e. the existing SNN research has limited scope in exploring sophisticated\nlearning architectures like deep RSNNs with multiple feedforward and recurrent layers hybridized\ntogether. As a \ufb01rst step towards addressing these challenges, we propose a novel biologically non-\nplausible Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm which is applicable to\nRSNNs with an arbitrary network topology and achieves the state-of-the-art performances on several\nwidely used datasets. 
The proposed ST-RSBP employs spike-train level computation similar to what is adopted in the recent hybrid macro/micro level BP (HM2-BP) method for feedforward SNNs [19], which demonstrates encouraging performances and outperforms BPTT such as the one implemented in [39].\n\nST-RSBP is rigorously derived and can handle arbitrary recurrent connections in various RSNNs. While capturing the temporal behavior of the RSNN at the spike-train level, ST-RSBP directly computes the gradient of a rate-coded loss function w.r.t. tunable parameters without incurring approximations resulting from altering and smoothing the underlying spiking behaviors. ST-RSBP is able to train RSNNs without costly unfolding of the network through time and without performing BP time point by time point, offering faster training and avoiding vanishing/exploding gradients for general RSNNs. Moreover, as mentioned in Sections 2.2.1 and 2.3 of the Supplementary Materials, since ST-RSBP more precisely computes error gradients than HM2-BP [19], it can achieve better results than HM2-BP even on feedforward SNNs.\n\nWe apply ST-RSBP to train several deep RSNNs with multiple feedforward and recurrent layers to demonstrate the best performances on several widely adopted datasets. Based upon challenging speech and image datasets including TI46 [25], N-TIDIGITS [3] and Fashion-MNIST [40], ST-RSBP trains RSNNs with an accuracy noticeably surpassing that of the current state-of-the-art SNN BP algorithms and conventional non-spiking deep learning models and algorithms. Furthermore, ST-RSBP is also evaluated on feedforward spiking convolutional neural networks (spiking CNNs) with the MNIST dataset and achieves 99.62% accuracy, which is the best among all SNN BP rules.\n\n2 Background\n\n2.1 SNN Architectures and Training Challenges\n\nFig. 
1A shows two SNN architectures often explored in neuroscience: single layer (top) and liquid state machine (bottom) networks, for which different mechanisms have been adopted for training. However, spike-timing-dependent plasticity (STDP) [29] and winner-take-all (WTA) [8] are typically only for unsupervised training and have limited performance. WTA and other supervised learning rules [31, 41, 18] can only be applied to the output layer, obstructing the adoption of more sophisticated deep architectures.\n\nFigure 1: Various SNN networks: (A) one layer SNNs and liquid state machine; (B) multi-layer feedforward SNNs; (C) deep hybrid feedforward/recurrent SNNs.\n\nWhile bio-inspired learning mechanisms are yet to demonstrate competitive performance for challenging real-life tasks, there has been much recent effort aiming at improving SNN performance with supervised BP. Most existing SNN BP methods are only applicable to multi-layer feedforward networks as shown in Fig. 1B. Several such methods have demonstrated promising results [23, 39, 19, 33]. Nevertheless, these methods are not applicable to complex deep RSNNs such as the hybrid feedforward/recurrent networks shown in Fig. 1C, which are the target of this work. Backpropagation through time (BPTT) in principle may be applied to training RSNNs [4], but is bottlenecked by several challenges: (1) unfolding the recurrent connections through time, (2) back propagating errors over both time and space, and (3) back propagating errors over non-differentiable spike events.\n\nFigure 2: Backpropagation in recurrent SNNs: BPTT vs. ST-RSBP.\n\nFig. 2 compares BPTT and ST-RSBP, where we focus on a recurrent layer since a feedforward layer can be viewed as a simplified recurrent layer. 
To apply BPTT, one shall first unfold an RSNN in time to convert it into a larger feedforward network without recurrent connections. The total number of layers in the feedforward network is increased by a factor equal to the number of times the RSNN is unfolded, and hence can be very large. Then, this unfolded network is integrated in time with a sufficiently small time step to capture the dynamics of the spiking behavior. BP is then performed spatio-temporally layer-by-layer across the unfolded network based on the same time step size used for integration, as shown in Fig. 2. In contrast, the proposed ST-RSBP does not convert the RSNN into a larger feedforward SNN. The forward pass of BP is based on time-domain integration of the RSNN of the original size. Following the forward pass, importantly, the backward pass of BP is not conducted point by point in time, but instead, much more efficiently on the spike-train level. We make use of Spike-train Level Post-synaptic Potentials (S-PSPs) discussed in Section 2.2 to capture temporal interactions between any pair of pre/post-synaptic neurons. 
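The time-domain integration used in the forward pass can be illustrated with a minimal discrete-time LIF simulation of one recurrent layer. This is only a sketch under assumed parameters; the function name, leak model, and constants are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def simulate_recurrent_lif(in_spikes, W_ff, W_rec, v_th=1.0, tau_m=64.0, dt=1.0):
    """Integrate a recurrent LIF layer over time and return its spike trains.

    in_spikes: (T, N_in) binary input spike matrix.
    W_ff:      (N, N_in) feedforward weights; W_rec: (N, N) recurrent weights.
    A neuron fires when its membrane potential crosses v_th, then resets to 0;
    recurrent spikes arrive with a one-step delay.
    """
    T, _ = in_spikes.shape
    n = W_ff.shape[0]
    v = np.zeros(n)                      # membrane potentials
    out = np.zeros((T, n))               # output spike trains
    prev_spikes = np.zeros(n)            # spikes from the previous time step
    decay = np.exp(-dt / tau_m)          # leak factor per step
    for t in range(T):
        # leaky integration of feedforward and delayed recurrent input
        v = decay * v + W_ff @ in_spikes[t] + W_rec @ prev_spikes
        fired = (v >= v_th).astype(float)
        v = np.where(fired > 0, 0.0, v)  # reset fired neurons
        out[t] = prev_spikes = fired
    return out
```

The per-neuron firing counts `out.sum(axis=0)` are the rate-coded quantities that the spike-train level quantities of Section 2.2 summarize, so the backward pass never has to revisit this loop step by step.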
ST-RSBP is more scalable and\nhas the added bene\ufb01ts of avoiding exploding/vanishing gradients for general RSNNs.\n\n2.2 Spike-train Level Post-synaptic Potential (S-PSP)\n\nS-PSP captures the spike-train level interactions between a pair of pre/post-synaptic neurons. Note that\neach neuron \ufb01res whenever its post-synaptic potential reaches the \ufb01ring threshold. The accumulated\ncontributions of the pre-synaptic neuron j\u2019s spike train to the (normalized) post-synaptic potential of\nthe neuron i right before all the neuron i\u2019s \ufb01ring times is de\ufb01ned as the (normalized) S-PSP from the\nneuron j to the neuron i as in (6) in the Supplementary Materials. The S-PSP eij characterizes the\naggregated effect of the spike train of the neuron j on the membrane potential of the neuron i and its\n\ufb01ring activities. S-PSPs allow consideration of the temporal dynamics and recurrent connections of\nan RSNN across all \ufb01ring events at the spike-train level without expensive unfolding through time\nand backpropagation time point by time point.\nThe sum of the weighted S-PSPs from all pre-synaptic neurons of the neuron i is de\ufb01ned as the total\npost-synaptic potential (T-PSP) ai. 
ai is the post-synaptic membrane potential accumulated right before all firing times and relates to the firing count oi via the firing threshold ν [19]:\n\na_i = Σ_j w_ij e_ij,  o_i = g(a_i) ≈ a_i / ν.  (1)\n\na_i and o_i are analogous to the pre-activation and activation in traditional ANNs, respectively, and g(·) can be considered as an activation function converting the T-PSP to the output firing count. A detailed description of S-PSP and T-PSP can be found in Section 1 of the Supplementary Materials.\n\n3 Proposed Spike-Train level Recurrent SNNs Backpropagation (ST-RSBP)\n\nWe use the generic recurrent spiking neural network with a combination of feedforward and recurrent layers of Fig. 2 to derive ST-RSBP. For the spike-train level activation of each neuron l in the layer k + 1, (1) is modified to include the recurrent connections explicitly if necessary:\n\na_l^{k+1} = Σ_{j=1..N_k} w_lj^{k+1} e_lj^{k+1} + Σ_{p=1..N_{k+1}} w_lp^{k+1} e_lp^{k+1},  o_l^{k+1} = g(a_l^{k+1}) ≈ a_l^{k+1} / ν^{k+1}.  (2)\n\nN_{k+1} and N_k are the numbers of neurons in the layers k + 1 and k, w_lj^{k+1} is the feedforward weight from the neuron j in the layer k to the neuron l in the layer k + 1, w_lp^{k+1} is the recurrent weight from the neuron p to the neuron l in the layer k + 1, which is non-existent if the layer k + 1 is feedforward, e_lj^{k+1} and e_lp^{k+1} are the corresponding S-PSPs, ν^{k+1} is the firing threshold at the layer k + 1, and o_l^{k+1} and a_l^{k+1} are the firing count and pre-activation (T-PSP) of the neuron l at the layer k + 1, respectively.\n\nThe rate-coded loss is defined at the output layer as:\n\nE = (1/2) ||o − y||₂² = (1/2) ||a/ν − y||₂²,  (3)\n\nwhere y, o and a are vectors of the desired output neuron firing counts (labels), the actual firing counts, and the T-PSPs of the output neurons, respectively. 
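The spike-train level forward quantities and the rate-coded loss of (1)-(3) translate into a few lines of vectorized code. The sketch below assumes the S-PSP matrices have already been computed elsewhere; all names are illustrative, not from the authors' implementation:

```python
import numpy as np

def layer_forward(e_ff, e_rec, W_ff, W_rec, v_th):
    """Spike-train level activation of one layer, cf. eq. (2).

    e_ff[l, j]:  S-PSP from neuron j of the previous layer to neuron l.
    e_rec[l, p]: S-PSP along a recurrent connection within the layer
                 (pass a zero matrix for a feedforward layer).
    Returns the T-PSPs a and the approximate firing counts o = g(a) ~ a / nu.
    """
    a = (W_ff * e_ff).sum(axis=1) + (W_rec * e_rec).sum(axis=1)
    o = a / v_th
    return a, o

def rate_coded_loss(a, y, v_th):
    """E = 0.5 * ||a / nu - y||^2, the rate-coded loss of eq. (3)."""
    return 0.5 * np.sum((a / v_th - y) ** 2)
```

Note that the temporal structure of the spike trains is not lost here: it is folded into the S-PSP values `e_ff` and `e_rec`, which is what lets the backward pass stay at the spike-train level.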
Differentiating (3) with respect to each trainable weight w_ij^k incident upon the layer k leads to:\n\n∂E/∂w_ij^k = (∂E/∂a_i^k)(∂a_i^k/∂w_ij^k) = δ_i^k ∂a_i^k/∂w_ij^k, with δ_i^k = ∂E/∂a_i^k,  (4)\n\nwhere δ_i^k and ∂a_i^k/∂w_ij^k are referred to as the back propagated error and the differentiation of activation, respectively, for the neuron i. ST-RSBP updates w_ij^k by Δw_ij^k = η ∂E/∂w_ij^k, where η is a learning rate.\n\nWe outline the key component of the derivation of ST-RSBP: the back propagated errors. The full derivation of ST-RSBP is presented in Section 2 of the Supplementary Materials.\n\n3.1 Outline of the Derivation of Back Propagated Errors\n\n3.1.1 Output Layer\n\nIf the layer k is the output, the back propagated error of the neuron i is given by differentiating (3):\n\nδ_i^k = ∂E/∂a_i^k = (o_i^k − y_i^k) / ν^k,  (5)\n\nwhere o_i^k is the actual firing count, y_i^k the desired firing count (label), and a_i^k the T-PSP.\n\n3.1.2 Hidden Layers\n\nAt each hidden layer k, the chain rule is applied to determine the error δ_i^k for the neuron i:\n\nδ_i^k = ∂E/∂a_i^k = Σ_{l=1..N_{k+1}} (∂E/∂a_l^{k+1})(∂a_l^{k+1}/∂a_i^k) = Σ_{l=1..N_{k+1}} δ_l^{k+1} ∂a_l^{k+1}/∂a_i^k.  (6)\n\nDefine two error vectors δ^{k+1} and δ^k for the layers k + 1 and k: δ^{k+1} = [δ_1^{k+1}, ···, δ_{N_{k+1}}^{k+1}] and δ^k = [δ_1^k, ···, δ_{N_k}^k], respectively. Assuming δ^{k+1} is given, which is the case for the output layer, the goal is to back propagate from δ^{k+1} to δ^k. 
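In matrix form, the error backpropagation steps above reduce to a few vectorized operations; the derivation later in this section shows that for a recurrent layer the required Jacobian P comes from a linear solve, cf. eqs. (12)-(13). The sketch below uses illustrative names, assumes the matrices Ω, Θ, Φ are given, and for simplicity approximates the differentiation of activation ∂a_i/∂w_ij by the S-PSP e_ij:

```python
import numpy as np

def output_delta(o, y, v_th):
    """delta = (o - y) / nu at the output layer, cf. eq. (5)."""
    return (o - y) / v_th

def solve_P(Omega, Theta, Phi):
    """P = (Omega - Theta)^{-1} Phi for a recurrent layer, cf. eqs. (12)-(13)."""
    return np.linalg.solve(Omega - Theta, Phi)

def hidden_delta(P, delta_next):
    """Propagate errors through P[l, i] = da^{k+1}_l / da^k_i:
    delta^k = P^T @ delta^{k+1}, the matrix form of eq. (6)."""
    return P.T @ delta_next

def weight_update(delta, e, lr=0.001):
    """Descent step for eq. (4), taking dE/dw_ij ~ delta_i * e_ij
    (S-PSP used as the differentiation of activation in this sketch)."""
    return -lr * delta[:, None] * e
```

One linear solve per recurrent layer replaces the per-time-step error propagation of BPTT, which is where the claimed efficiency and the immunity to unrolled-gradient decay come from.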
This entails computing ∂a_l^{k+1}/∂a_i^k in (6).\n\n[Backpropagation from a Hidden Recurrent Layer] Now consider the case in which the errors are back propagated from a recurrent layer k + 1 to its preceding layer k. Note that the S-PSP e_lj from any pre-synaptic neuron j to a post-synaptic neuron l is a function of both the rate and temporal information of the pre/post-synaptic spike trains, which can be made explicit via some function f:\n\ne_lj = f(o_j, o_l, t_j^(f), t_l^(f)),  (7)\n\nwhere o_j, o_l and t_j^(f), t_l^(f) are the pre/post-synaptic firing counts and firing times, respectively.\n\nNow based on (2), ∂a_l^{k+1}/∂a_i^k is also split into two summations:\n\n∂a_l^{k+1}/∂a_i^k = Σ_{j=1..N_k} w_lj^{k+1} de_lj^{k+1}/da_i^k + Σ_{p=1..N_{k+1}} w_lp^{k+1} de_lp^{k+1}/da_i^k,  (8)\n\nwhere the first summation sums over all pre-synaptic neurons in the previous layer k while the second sums over the pre-synaptic neurons in the current recurrent layer, as illustrated in Fig. 3.\n\nFigure 3: Connections for a recurrent layer neuron and the dependencies among its S-PSPs.\n\nOn the right side of (8), de_lj^{k+1}/da_i^k is given by:\n\nde_lj^{k+1}/da_i^k = (1/ν^k) ∂e_li^{k+1}/∂o_i^k + (1/ν^{k+1}) (∂e_li^{k+1}/∂o_l^{k+1}) (∂a_l^{k+1}/∂a_i^k) for j = i;  de_lj^{k+1}/da_i^k = (1/ν^{k+1}) (∂e_lj^{k+1}/∂o_l^{k+1}) (∂a_l^{k+1}/∂a_i^k) for j ≠ i.  (9)\n\nν^k and ν^{k+1} are the firing threshold voltages for the layers k and k + 1, respectively, and we have used that o_i^k ≈ a_i^k/ν^k and o_l^{k+1} ≈ a_l^{k+1}/ν^{k+1} from (1). Importantly, the last term on the right side of (9) exists due to e_lj^{k+1}'s dependency on the post-synaptic firing rate o_l^{k+1} per (7) and o_l^{k+1}'s further dependency on the pre-synaptic activation o_i^k (hence the pre-activation a_i^k), as shown in Fig. 3.\n\nOn the right side of (8), de_lp^{k+1}/da_i^k is due to the recurrent connections within the layer k + 1:\n\nde_lp^{k+1}/da_i^k = (1/ν^{k+1}) (∂e_lp^{k+1}/∂o_l^{k+1}) (∂a_l^{k+1}/∂a_i^k) + (1/ν^{k+1}) (∂e_lp^{k+1}/∂o_p^{k+1}) (∂a_p^{k+1}/∂a_i^k).  (10)\n\nThe first term on the right side of (10) is due to e_lp^{k+1}'s dependency on the post-synaptic firing rate o_l^{k+1} per (7) and o_l^{k+1}'s further dependence on the pre-synaptic activation o_i^k (hence the pre-activation a_i^k). Per (7), it is important to note that the second term exists because of e_lp^{k+1}'s dependency on the pre-synaptic firing rate o_p^{k+1}, which further depends on o_i^k (hence the pre-activation a_i^k), as shown in Fig. 3.\n\nPutting (8), (9), and (10) together leads to:\n\n(1 − (1/ν^{k+1})(Σ_{j=1..N_k} w_lj^{k+1} ∂e_lj^{k+1}/∂o_l^{k+1} + Σ_{p=1..N_{k+1}} w_lp^{k+1} ∂e_lp^{k+1}/∂o_l^{k+1})) ∂a_l^{k+1}/∂a_i^k = (1/ν^k) w_li^{k+1} ∂e_li^{k+1}/∂o_i^k + Σ_{p=1..N_{k+1}} (1/ν^{k+1}) w_lp^{k+1} (∂e_lp^{k+1}/∂o_p^{k+1}) (∂a_p^{k+1}/∂a_i^k).  (11)\n\nIt is evident that all N_{k+1} × N_k partial derivatives involving the recurrent layer k + 1 and its preceding layer k, i.e. ∂a_l^{k+1}/∂a_i^k, l = [1, N_{k+1}], i = [1, N_k], form a coupled linear system via (11), which is written in a matrix form as:\n\nΩ^{k+1,k} · P^{k+1,k} = Φ^{k+1,k} + Θ^{k+1,k} · P^{k+1,k},  (12)\n\nwhere P^{k+1,k} ∈ R^{N_{k+1}×N_k} contains all the desired partial derivatives, Ω^{k+1,k} ∈ R^{N_{k+1}×N_{k+1}} is diagonal, Θ^{k+1,k} ∈ R^{N_{k+1}×N_{k+1}}, Φ^{k+1,k} ∈ R^{N_{k+1}×N_k}, and the detailed definitions of all these matrices can be found in Section 2.1 of the Supplementary Materials. Solving the linear system in (12) gives all ∂a_l^{k+1}/∂a_i^k:\n\nP^{k+1,k} = (Ω^{k+1,k} − Θ^{k+1,k})^{−1} · Φ^{k+1,k}.  (13)\n\nNote that since Ω is a diagonal matrix, the cost of factoring the above linear system can be reduced by approximating the matrix inversion using a first-order Taylor's expansion without matrix factorization. Error propagation from the layer k + 1 to the layer k of (6) is cast in the matrix form: δ^k = (P^{k+1,k})^T · δ^{k+1}.\n\n[Backpropagation from a Hidden Feedforward Layer] The much simpler case of backpropagating errors from a feedforward layer 
k + 1 to its preceding layer k is described in Section 2.1 of the Supplementary Materials.\n\nThe complete ST-RSBP algorithm is summarized in Section 2.4 of the Supplementary Materials.\n\n4 Experiments and Results\n\n4.1 Experimental Settings\n\nAll reported experiments below are conducted on an NVIDIA Titan XP GPU. The experimented SNNs are based on the LIF model and weights are randomly initialized following the uniform distribution U[-1, 1]. Fixed firing thresholds are used in the range of 5 mV to 20 mV depending on the layer. Exponential weight regularization [23], lateral inhibition in the output layer [23], and Adam [20] as the optimizer are adopted. Parameters such as the desired output firing counts, thresholds, and learning rates are empirically tuned. Table 1 lists the typical constant values adopted in the proposed ST-RSBP learning rule in our experiments. The simulation step size is set to 1 ms. The batch size is 1, which means ST-RSBP is applied after each training sample to update the weights. Using three speech datasets and two image datasets, we compare the proposed ST-RSBP with several other methods which either have the best previously reported results on the same datasets or represent the current state-of-the-art performances for training SNNs. Among these, HM2-BP [19] is the best reported BP algorithm for feedforward SNNs based on LIF neurons. 
Table 1: Parameter settings\n\nParameter | Value\nTime Constant of Membrane Voltage τm | 64 ms\nTime Constant of Synapse τs | 8 ms\nRefractory Period | 2 ms\nDesired Firing Count for Target Neuron | 35\nDesired Firing Count for Non-Target Neuron | 5\nThreshold ν | 10 mV\nSynaptic Time Delay | 1 ms\nReset Membrane Voltage Vreset | 0 mV\nLearning Rate η | 0.001\nBatch Size | 1\n\nST-RSBP is evaluated using RSNNs of multiple feedforward and recurrent layers with full connections between adjacent layers and sparse connections inside the recurrent layers. The network models of all other BP methods we compare with are fully connected feedforward networks. The liquid state machine (LSM) networks demonstrated below have sparse input connections, sparse reservoir connections, and a fully connected readout layer. Since HM2-BP cannot train recurrent networks, we compare ST-RSBP with HM2-BP using models of a similar number of tunable weights. Moreover, we also demonstrate that ST-RSBP achieves the best performance among several state-of-the-art SNN BP rules evaluated on the same or similar spiking CNNs. Each experiment reported below is repeated five times to obtain the mean and standard deviation (stddev) of the accuracy.\n\n4.2 TI46-Alpha Speech Dataset\n\nTI46-Alpha is the full alphabets subset of the TI46 Speech corpus [25] and contains spoken English alphabets from 16 speakers. There are 4,142 and 6,628 spoken English examples in 26 classes for training and testing, respectively. 
The continuous temporal speech waveforms are first preprocessed by the Lyon's ear model [26] and then encoded into 78 spike trains using the BSA algorithm [32].\n\nTable 2: Comparison of different SNN models on TI46-Alpha\n\nAlgorithm | Hidden Layers(a) | # Params | # Epochs | Mean | Stddev | Best\nHM2-BP [19] | 800 | 83,200 | 138 | 89.36% | 0.30% | 89.92%\nHM2-BP [19] | 400-400 | 201,600 | 163 | 89.83% | 0.71% | 90.60%\nHM2-BP [19] | 800-800 | 723,200 | 174 | 90.50% | 0.45% | 90.98%\nNon-spiking BP(b) [38] | LSM: R2000 | 52,000 | - | 78% | - | -\nST-RSBP (this work) | R800 | 86,280 | 75 | 91.57% | 0.20% | 91.85%\nST-RSBP (this work) | 400-R400-400 | 363,313 | 57 | 93.06% | 0.21% | 93.35%\n\n(a) We show the number of neurons in each feedforward/recurrent hidden layer. R represents a recurrent layer.\n(b) An LSM model. The state vector of the reservoir is used to train the single readout layer using BP.\n\nTable 2 compares ST-RSBP with several other algorithms on TI46-Alpha. The result from [38] shows that only training the single readout layer of a recurrent LSM is inadequate for this challenging task, demonstrating the necessity of training all layers of a recurrent network using techniques such as ST-RSBP. ST-RSBP outperforms all other methods. In particular, ST-RSBP is able to train a three-hidden-layer RSNN with 363,313 weights to increase the accuracy from 90.98% to 93.35% when compared with the feedforward SNN with 723,200 weights trained by HM2-BP.\n\n4.3 TI46-Digits Speech Dataset\n\nTI46-Digits is the full digits subset of the TI46 Speech corpus [25]. It contains 1,594 training examples and 2,542 testing examples of 10 utterances for each of digits \"0\" to \"9\" spoken by 16 different speakers. The same preprocessing used for TI46-Alpha is adopted. Table 3 shows that the proposed ST-RSBP delivers a high accuracy of 99.39% while outperforming all other methods including HM2-BP. On recurrent network training, ST-RSBP produces large improvements over two other methods. 
For instance, with 19,057 tunable weights, ST-RSBP delivers an accuracy of 98.77% while [35] has an accuracy of 86.66% with 32,000 tunable weights.\n\nTable 3: Comparison of different SNN models on TI46-Digits\n\nAlgorithm | Hidden Layers | # Params | # Epochs | Mean | Stddev | Best\nHM2-BP [19] | 100-100 | 18,800 | 22 | 98.42% | - | -\nHM2-BP [19] | 200-200 | 57,600 | 21 | 98.50% | - | -\nNon-spiking BP [38] | LSM: R500 | 5,000 | - | 78% | - | -\nSpiLinC(a) [35] | LSM: R3200 | 32,000 | - | 86.66% | - | -\nST-RSBP (this work) | R100-100 | 19,057 | 75 | 98.77% | 0.13% | 98.95%\nST-RSBP (this work) | R200-200 | 58,230 | 28 | 99.16% | 0.11% | 99.27%\nST-RSBP (this work) | 200-R200-200 | 98,230 | 23 | 99.25% | 0.13% | 99.39%\n\n(a) An LSM with multiple reservoirs in parallel. Weights between the input and the reservoirs are trained using STDP. The excitatory neurons in the reservoir are tagged with the classes for which they spiked at the highest rate during training and are grouped accordingly. During inference, for a test pattern, the average spike count of every group of tagged neurons is examined and the tag with the highest average spike count represents the predicted class.\n\n4.4 N-Tidigits Neuromorphic Speech Dataset\n\nThe N-Tidigits dataset [3] is the neuromorphic version of the well-known speech dataset Tidigits and consists of recorded spike responses of a 64-channel CochleaAMS1b sensor in response to audio waveforms from the original Tidigits dataset [24]. 2,475 single-digit examples are used for training and the same number of examples are used for testing. There are 55 male and 56 female speakers and each of them speaks two examples for each of the 11 single digits including \"oh\", \"zero\", and the digits \"1\" to \"9\". Table 4 shows that the proposed ST-RSBP achieves excellent accuracies up to 93.90%, which is significantly better than that of HM2-BP and the non-spiking GRN and LSTM in [3]. 
With a similar or smaller number of tunable weights, ST-RSBP outperforms all other methods rather significantly.\n\nTable 4: Comparison of different models on N-Tidigits\n\nAlgorithm | Hidden Layers | # Params | # Epochs | Mean | Stddev | Best\nHM2-BP [19] | 250-250 | 81,250 | - | 89.69% | - | -\nGRN (NS(a)) [3] | 2x G200-100(b) | 109,200 | - | 90.90% | - | -\nPhased-LSTM (NS) [3] | 2x 250L(c) | 610,500 | - | 91.25% | - | -\nST-RSBP (this work) | 250-R250 | 82,050 | 268 | 92.94% | 0.20% | 93.13%\nST-RSBP (this work) | 400-R400-400 | 351,241 | 287 | 93.63% | 0.27% | 93.90%\n\n(a) NS represents a non-spiking algorithm; (b) G represents a GRN layer; (c) L represents an LSTM layer.\n\nTable 5: Comparison of different models on Fashion-MNIST\n\nAlgorithm | Hidden Layers | # Params | # Epochs | Mean | Stddev | Best\nHM2-BP [19] | 400-400 | 477,600 | 15 | 88.99% | - | -\nBP(a) [30] | 5x 256 | 465,408 | - | 87.02% | - | -\nLRA-E(b) [30] | 5x 256 | 465,408 | - | 87.69% | - | -\nDL BP(a) [1] | 3x 512 | 662,026 | - | 89.06% | - | -\nKeras BP(c) | 512-512 | 669,706 | 50 | 89.01% | - | -\nST-RSBP (this work) | 400-R400 | 478,841 | 36 | 90.00% | 0.14% | 90.13%\n\n(a) Fully connected ANN trained with the BP algorithm.\n(b) Fully connected ANN with locally defined errors trained using gradient descent. The loss functions are the L2 norm for hidden layers and categorical cross-entropy for the output layer.\n(c) Fully connected ANN trained using the Keras package with ReLU activation, categorical cross-entropy loss, and the RMSProp optimizer; a dropout layer with rate 0.2 is applied between each pair of dense layers.\n\n4.5 Fashion-MNIST Image Dataset\n\nThe Fashion-MNIST dataset [40] contains 28 x 28 grey-scale images of clothing items, meant to serve as a much more difficult drop-in replacement for the well-known MNIST dataset. 
It contains 60,000 training examples and 10,000 testing examples, with each image falling into one of 10 classes. Using Poisson sampling, we encode each 28 × 28 image into a 2D 784 × L binary matrix, where L = 400 is the duration of each spike sequence in ms and a 1 in the matrix represents a spike. The simulation time step is set to 1 ms. No other preprocessing or data augmentation is applied. Table 5 shows that ST-RSBP outperforms all the other SNN and non-spiking BP methods.

4.6 Spiking Convolutional Neural Networks for MNIST

As mentioned in Section 1, ST-RSBP can compute error gradients more precisely than HM2-BP, even in the case of feedforward CNNs. We demonstrate the performance improvement of ST-RSBP over several other state-of-the-art SNN BP algorithms using spiking CNNs on the MNIST dataset. The preprocessing steps are the same as those for Fashion-MNIST in Section 4.5. The spiking CNN trained by ST-RSBP consists of two 5 × 5 convolutional layers with a stride of 1, each followed by a 2 × 2 pooling layer, one fully connected hidden layer, and an output layer for classification. In the pooling layer, each neuron connects to 2 × 2 neurons in the preceding convolutional layer with a fixed weight of 0.25. In addition, we use elastic distortion [34] for data augmentation, similar to [23, 39, 19]. In Table 6, we compare the results of the proposed ST-RSBP with other BP rules on similar network settings. It shows that ST-RSBP achieves an accuracy of 99.62%, surpassing the best previously reported performance [19] at the same model complexity.

Table 6: Performances of Spiking CNNs on MNIST

| Algorithm | Hidden Layers | Mean | Stddev | Best |
| --- | --- | --- | --- | --- |
| Spiking CNN [23] | 20C5-P2-50C5-P2-200^a | 99.31% | | |
| STBP [39] | 15C5-P2-40C5-P2-300 | 99.42% | | |
| SLAYER [33] | 12C5-P2-64C5-P2 | 99.36% | 0.05% | 99.41% |
| HM2-BP [19] | 15C5-P2-40C5-P2-300 | 99.42% | 0.11% | 99.49% |
| ST-RSBP (this work) | 12C5-P2-64C5-P2 | 99.50% | 0.03% | 99.53% |
| ST-RSBP (this work) | 15C5-P2-40C5-P2-300 | 99.57% | 0.04% | 99.62% |

^a 20C5 denotes a convolutional layer with twenty 5 × 5 filters; P2 denotes a pooling layer with 2 × 2 filters.

5 Discussions and Conclusion

In this paper, we present ST-RSBP, a novel spike-train level backpropagation algorithm that can transparently train all types of SNNs, including RSNNs, without unfolding in time. The employed S-PSP model improves training efficiency at the spike-train level and addresses key challenges of RSNN training: handling temporal effects and accurately computing gradients of loss functions defined over inherently discontinuous spiking activities. Spike-train level processing for RSNNs is the starting point of ST-RSBP; on top of it, we apply the standard BP principle while dealing with the specific issues of derivative computation at the spike-train level.
More specifically, in ST-RSBP the given rate-coded errors can be efficiently computed and backpropagated through the layers without costly unfolding of the network in time and without expensive time-step-by-time-step computation. Moreover, ST-RSBP handles the discontinuity of spikes during BP without altering or smoothing the microscopic spiking behaviors. The problem of network unfolding is addressed by accurate spike-train level BP, such that the effects of all spikes are captured and propagated in an aggregated manner to achieve accurate and fast training.
As such, both rate and temporal information in the SNN are well exploited during the training process.
Using the efficient GPU implementation of ST-RSBP, we demonstrate the best performances for feedforward SNNs, RSNNs, and spiking CNNs on the speech datasets TI46-Alpha, TI46-Digits, and N-Tidigits and on the image datasets MNIST and Fashion-MNIST, outperforming the current state-of-the-art SNN training techniques. Moreover, ST-RSBP outperforms conventional deep learning models such as LSTM, GRN, and traditional non-spiking BP on the same datasets. By releasing the GPU implementation code, we expect this work to advance research on spiking neural networks and neuromorphic computing.

Acknowledgments

This material is based upon work supported by the National Science Foundation (NSF) under Grants No. 1639995 and No. 1948201. This work is also supported by the Semiconductor Research Corporation (SRC) under Task 2692.001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF, SRC, UC Santa Barbara, and their contractors.

References

[1] Abien Fred Agarap. Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375, 2018.

[2] Filipp Akopyan, Jun Sawada, Andrew Cassidy, Rodrigo Alvarez-Icaza, John Arthur, Paul Merolla, Nabil Imam, Yutaka Nakamura, Pallab Datta, Gi-Joon Nam, et al. TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34(10):1537–1557, 2015.

[3] Jithendar Anumula, Daniel Neil, Tobi Delbruck, and Shih-Chii Liu. Feature representations for neuromorphic audio spike streams. Frontiers in Neuroscience, 12:23, 2018.

[4] Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, and Wolfgang Maass.
Long short-term memory and learning-to-learn in networks of spiking neurons. In Advances in Neural Information Processing Systems, pages 787–797, 2018.

[5] Sander M Bohte, Joost N Kok, and Han La Poutre. Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48(1-4):17–37, 2002.

[6] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167. ACM, 2008.

[7] Mike Davies, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, et al. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99, 2018.

[8] Peter U Diehl and Matthew Cook. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 9:99, 2015.

[9] Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In Neural Networks (IJCNN), 2015 International Joint Conference on, pages 1–8. IEEE, 2015.

[10] Steve K Esser, Rathinakumar Appuswamy, Paul Merolla, John V Arthur, and Dharmendra S Modha. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems, pages 1117–1125, 2015.

[11] Wulfram Gerstner and Werner M Kistler. Spiking neuron models: Single neurons, populations, plasticity. Cambridge University Press, 2002.

[12] Arfan Ghani, T Martin McGinnity, Liam P Maguire, and Jim Harkin. Neuro-inspired speech recognition with recurrent spiking neurons.
In International Conference on Artificial Neural Networks, pages 513–522. Springer, 2008.

[13] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[14] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[15] Dongsung Huh and Terrence J Sejnowski. Gradient descent for spiking neural networks. In Advances in Neural Information Processing Systems, pages 1433–1443, 2018.

[16] Eric Hunsberger and Chris Eliasmith. Spiking deep networks with LIF neurons. arXiv preprint arXiv:1510.08829, 2015.

[17] Eugene M Izhikevich and Gerald M Edelman. Large-scale model of mammalian thalamocortical systems. Proceedings of the National Academy of Sciences, 105(9):3593–3598, 2008.

[18] Yingyezhe Jin and Peng Li. AP-STDP: A novel self-organizing mechanism for efficient reservoir computing. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 1158–1165. IEEE, 2016.

[19] Yingyezhe Jin, Wenrui Zhang, and Peng Li. Hybrid macro/micro level backpropagation for training deep spiking neural networks. In Advances in Neural Information Processing Systems, pages 7005–7015, 2018.

[20] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[22] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.

[23] Jun Haeng Lee, Tobi Delbruck, and Michael Pfeiffer.
Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10:508, 2016.

[24] R Gary Leonard and George Doddington. Tidigits speech corpus. Texas Instruments, Inc., 1993.

[25] Mark Liberman, Robert Amsler, Ken Church, Ed Fox, Carole Hafner, Judy Klavans, Mitch Marcus, Bob Mercer, Jan Pedersen, Paul Roossin, Don Walker, Susan Warwick, and Antonio Zampolli. TI 46-word LDC93S9, 1991.

[26] Richard Lyon. A computational model of filtering, detection, and compression in the cochlea. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '82, volume 7, pages 1282–1285. IEEE, 1982.

[27] Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560, 2002.

[28] Paul A Merolla, John V Arthur, Rodrigo Alvarez-Icaza, Andrew S Cassidy, Jun Sawada, Filipp Akopyan, Bryan L Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.

[29] Abigail Morrison, Markus Diesmann, and Wulfram Gerstner. Phenomenological models of synaptic plasticity based on spike timing. Biological Cybernetics, 98(6):459–478, 2008.

[30] Alexander G Ororbia and Ankur Mali. Biologically motivated algorithms for propagating local target representations. arXiv preprint arXiv:1805.11703, 2018.

[31] Filip Ponulak and Andrzej Kasiński. Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting. Neural Computation, 22(2):467–510, 2010.

[32] Benjamin Schrauwen and Jan Van Campenhout. BSA, a fast and accurate spike train encoding scheme. In Proceedings of the International Joint Conference on Neural Networks, volume 4, pages 2825–2830. IEEE, 2003.

[33] Sumit Bam Shrestha and Garrick Orchard. SLAYER: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems, pages 1412–1421, 2018.

[34] Patrice Y Simard, David Steinkraus, John C Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In ICDAR, volume 3, pages 958–962, 2003.

[35] Gopalakrishnan Srinivasan, Priyadarshini Panda, and Kaushik Roy. SpiLinC: Spiking liquid-ensemble computing for unsupervised speech and image recognition. Frontiers in Neuroscience, 12, 2018.

[36] Christian Szegedy, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. In Advances in Neural Information Processing Systems, pages 2553–2561, 2013.

[37] Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.

[38] Parami Wijesinghe, Gopalakrishnan Srinivasan, Priyadarshini Panda, and Kaushik Roy. Analysis of liquid ensembles for enhancing the performance and accuracy of liquid state machines. Frontiers in Neuroscience, 13:504, 2019.

[39] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, and Luping Shi. Spatio-temporal backpropagation for training high-performance spiking neural networks. arXiv preprint arXiv:1706.02609, 2017.

[40] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

[41] Yong Zhang, Peng Li, Yingyezhe Jin, and Yoonsuck Choe. A digital liquid state machine with biologically inspired learning and its application to speech recognition.
IEEE Transactions on Neural Networks and Learning Systems, 26(11):2635–2649, 2015.