{"title": "Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 7005, "page_last": 7015, "abstract": "Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. However, SNNs are yet to reach the same performances of conventional deep artificial neural networks (ANNs), a long-standing challenge due to complex dynamics and non-differentiable spike events encountered in training. The existing SNN error backpropagation (BP) methods are limited in terms of scalability, lack of proper handling of spiking discontinuities, and/or mismatch between the rate-coded loss function and computed gradient. We present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. The temporal effects are precisely captured by the proposed spike-train level post-synaptic potential (S-PSP) at the microscopic level. The rate-coded errors are defined at the macroscopic level, computed and back-propagated across both macroscopic and microscopic levels. Different from existing BP methods, HM2-BP directly computes the gradient of the rate-coded loss function w.r.t tunable parameters. We evaluate the proposed HM2-BP algorithm by training deep fully connected and convolutional SNNs based on the static MNIST [14] and dynamic neuromorphic N-MNIST [26]. HM2-BP achieves an accuracy level of 99.49% and 98.88% for MNIST and N-MNIST, respectively, outperforming the best reported performances obtained from the existing SNN BP algorithms. Furthermore, the HM2-BP produces the highest accuracies based on SNNs for the EMNIST [3] dataset, and leads to high recognition accuracy for the 16-speaker spoken English letters of TI46 Corpus [16], a challenging patio-temporal speech recognition benchmark for which no prior success based on SNNs was reported. 
It also achieves competitive performances surpassing those of conventional deep learning models when dealing with asynchronous spiking streams.", "full_text": "Hybrid Macro/Micro Level Backpropagation for\n\nTraining Deep Spiking Neural Networks\n\nYingyezhe Jin\n\nTexas A&M University\n\nCollege Station, TX 77843\n\njyyz@tamu.edu\n\nWenrui Zhang\n\nTexas A&M University\n\nCollege Station, TX 77843\nzhangwenrui@tamu.edu\n\nPeng Li\n\nTexas A&M University\n\nCollege Station, TX 77843\n\npli@tamu.edu\n\nAbstract\n\nSpiking neural networks (SNNs) are positioned to enable spatio-temporal informa-\ntion processing and ultra-low power event-driven neuromorphic hardware. How-\never, SNNs are yet to reach the same performances of conventional deep arti\ufb01cial\nneural networks (ANNs), a long-standing challenge due to complex dynamics\nand non-differentiable spike events encountered in training. The existing SNN\nerror backpropagation (BP) methods are limited in terms of scalability, lack of\nproper handling of spiking discontinuities, and/or mismatch between the rate-\ncoded loss function and computed gradient. We present a hybrid macro/micro level\nbackpropagation (HM2-BP) algorithm for training multi-layer SNNs. The tempo-\nral effects are precisely captured by the proposed spike-train level post-synaptic\npotential (S-PSP) at the microscopic level. The rate-coded errors are de\ufb01ned at\nthe macroscopic level, computed and back-propagated across both macroscopic\nand microscopic levels. Different from existing BP methods, HM2-BP directly\ncomputes the gradient of the rate-coded loss function w.r.t tunable parameters. We\nevaluate the proposed HM2-BP algorithm by training deep fully connected and\nconvolutional SNNs based on the static MNIST [14] and dynamic neuromorphic\nN-MNIST [26]. HM2-BP achieves an accuracy level of 99.49% and 98.88% for\nMNIST and N-MNIST, respectively, outperforming the best reported performances\nobtained from the existing SNN BP algorithms. 
Furthermore, the HM2-BP pro-\nduces the highest accuracies based on SNNs for the EMNIST [3] dataset, and\nleads to high recognition accuracy for the 16-speaker spoken English letters of\nTI46 Corpus [16], a challenging spatio-temporal speech recognition benchmark for\nwhich no prior success based on SNNs was reported. It also achieves competitive\nperformances surpassing those of conventional deep learning models when dealing\nwith asynchronous spiking streams.\n\nIntroduction\n\n1\nIn spite of recent success in deep neural networks (DNNs) [5, 9, 13], it is believed that biological\nbrains operate rather differently. Compared with DNNs that lack processing of spike timing and\nevent-driven operations, biologically realistic spiking neural networks (SNNs) [11, 19] provide a\npromising paradigm for exploiting spatio-temporal patterns for added computing power, and enable\nultra-low power event-driven neuromorphic hardware [1, 7, 20]. There are theoretical evidences\nsupporting that SNNs possess greater computational power over traditional arti\ufb01cial neural networks\n(ANNs) [19]. SNNs are yet to achieve a performance level on a par with deep ANNs for practical\napplications. The error backpropagation [28] is very successful in training ANNs. 
Attaining the same success of backpropagation (BP) for SNNs is challenged by two fundamental issues: complex temporal dynamics and non-differentiability of discrete spike events.

Problem Formulation: As a common practice in SNNs, rate coding is often adopted to define a loss for each training example at the output layer [15, 32]:

$E = \frac{1}{2}\|o - y\|_2^2, \quad (1)$

where $o$ and $y$ are vectors specifying the actual and desired (label) firing counts of the output neurons. Firing counts are determined by the underlying firing events, which are adjusted discretely by tunable weights, resulting in great challenges in computing the gradient of the loss with respect to the weights.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Prior Works: There exist approaches that stay away from the SNN training challenges by first training an ANN and then approximately converting it to an SNN [6, 7, 10, 24]. [25] takes a similar approach which treats spiking neurons almost like non-spiking ReLU units. The accuracy of those methods may be severely compromised because of imprecise representation of timing statistics of spike trains. Although the latest ANN-to-SNN conversion approach [27] shows promise, the problem of direct training of SNNs remains unsolved.

The SpikeProp algorithm [2] is the first attempt to train an SNN by operating on discontinuous spike activities. It specifically targets temporal learning, for which derivatives of the loss w.r.t. weights are explicitly derived. However, SpikeProp is very much limited to single-spike learning, and its successful applications to realistic benchmarks have not been demonstrated. Similarly, [33] proposed a temporal training rule for understanding learning in SNNs. More recently, the backpropagation approaches of [15] and [32] have shown competitive performances. 
Nevertheless, [15] lacks explicit consideration of temporal correlations of neural activities. Furthermore, it does not handle discontinuities occurring at spiking moments: it treats them as noise and computes the error gradient only for the remaining smoothed membrane voltage waveforms instead of the rate-coded loss. [32] addresses the first limitation of [15] by performing BPTT [31] to capture temporal effects. However, similar to [15], the error gradient is computed for the continuous membrane voltage waveforms resulting from smoothing out all spikes, leading to inconsistency w.r.t the rate-coded loss function. In summary, the existing SNN BP algorithms have three major limitations: i) suffering from limited learning scalability [2], ii) either staying away from spiking discontinuities (e.g. by treating spiking moments as noise [15]) or deriving the error gradient based on the smoothed membrane waveforms [15, 32], and therefore iii) creating a mismatch between the computed gradient and the targeted rate-coded loss [15, 32].

Paper Contributions: We derive the gradient of the rate-coded error defined in (1) by decomposing each derivative into two components

$\frac{\partial E}{\partial w_{ij}} = \underbrace{\frac{\partial E}{\partial a_i}}_{\text{bp over firing rates}} \times \underbrace{\frac{\partial a_i}{\partial w_{ij}}}_{\text{bp over spike trains}}, \quad (2)$

where $a_i$ is the (weighted) aggregated membrane potential for the post-synaptic neuron $i$ per (11). As such, we propose a novel hybrid macro-micro level backpropagation (HM2-BP) algorithm which performs error backpropagation across two levels: 1) backpropagation over firing rates (macro-level), 2) backpropagation over spike trains (micro-level), and 3) backpropagation based on interactions between the two levels, as illustrated in Fig. 
1.
At the microscopic level, for each pre/post-synaptic spike train pair, we precisely compute the spike-train level post-synaptic potential, referred to as S-PSP throughout this paper, to account for the temporal contribution of the given pre-synaptic spike train to the firings of the post-synaptic neuron based on exact spike times. At the macroscopic level, we back-propagate the errors of the defined rate-based loss by aggregating the effects of spike trains on each neuron's firing count via the use of S-PSPs, and leverage this as a practical way of linking spiking events to firing rates. To assist backpropagation, we further propose a decoupled model of the S-PSP for disentangling the effects of firing rates and spike-train timings to allow differentiation of the S-PSP w.r.t. pre- and post-synaptic firing rates at the micro-level. As a result, our HM2-BP approach is able to evaluate the direct impact of weight changes on the rate-coded loss function. Moreover, the resulting weight updates in each training iteration can lead to the introduction or disappearance of multiple spikes.

Figure 1: Hybrid macro-micro level backpropagation.

We evaluate the proposed BP algorithm by training deep fully connected and convolutional SNNs based on the static MNIST [14], dynamic neuromorphic N-MNIST [26], and EMNIST [3] datasets. Our BP algorithm achieves an accuracy level of 99.49%, 98.88% and 85.57% for MNIST, N-MNIST and EMNIST, respectively, outperforming the best reported performances obtained from the existing SNN BP algorithms. 
Furthermore, our algorithm achieves high recognition accuracy of 90.98% for the 16-speaker spoken English letters of TI46 Corpus [16], a challenging spatio-temporal speech recognition benchmark for which no prior success based on SNNs was reported.

2 Hybrid Macro-Micro Backpropagation

The complex dynamics generated by spiking neurons and non-differentiable spike impulses are two fundamental bottlenecks for training SNNs using backpropagation. We address these difficulties at both macro and micro levels.

2.1 Micro-level Computation of Spiking Temporal Effects

The leaky integrate-and-fire (LIF) model is one of the most prevalent choices for describing the dynamics of spiking neurons, where the neuronal membrane voltage $u_i(t)$ at time $t$ for the neuron $i$ is given by

$\tau_m \frac{du_i(t)}{dt} = -u_i(t) + R\,I_i(t), \quad (3)$

where $I_i(t)$ is the input current, $R$ the effective leaky resistance, $C$ the effective membrane capacitance, and $\tau_m = RC$ the membrane time constant. A spike is generated when $u_i(t)$ reaches the threshold $\nu$. After that, $u_i(t)$ is reset to the resting potential $u_r$, which equals 0 in this paper. Each post-synaptic neuron $i$ is driven by a post-synaptic current of the following general form

$I_i(t) = \sum_j w_{ij} \sum_f \alpha(t - t_j^{(f)}), \quad (4)$

where $w_{ij}$ is the weight of the synapse from the pre-synaptic neuron $j$ to the neuron $i$, and $t_j^{(f)}$ denotes a particular firing time of the neuron $j$. We adopt a first-order synaptic model with time constant $\tau_s$

$\alpha(t) = \frac{q}{\tau_s} \exp\left(-\frac{t}{\tau_s}\right) H(t), \quad (5)$

where $H(t)$ is the Heaviside step function, and $q$ the total charge injected into the post-synaptic neuron $i$ through a synapse of a weight of 1. Let $\hat{t}_i$ denote the last firing time of the neuron $i$ w.r.t time $t$: $\hat{t}_i = \hat{t}_i(t) = \max\{t_i^{(f)} \mid t_i^{(f)} < t\}$. Plugging (4) into (3) and integrating (3) with $u(\hat{t}_i) = 0$ as its initial condition, we map the LIF model to the Spike Response Model (SRM) [8]

$u_i(t) = \sum_j w_{ij} \sum_f \epsilon\left(t - \hat{t}_i,\, t - t_j^{(f)}\right), \quad (6)$

with

$\epsilon(s, t) = \frac{1}{C} \int_0^s \exp\left(-\frac{t'}{\tau_m}\right) \alpha(t - t')\, dt'. \quad (7)$

Since $q$ and $C$ can be absorbed into the synaptic weights, we set $q = C = 1$. Integrating (7) yields

$\epsilon(s, t) = \frac{\exp(-\max(t - s, 0)/\tau_s)}{1 - \frac{\tau_s}{\tau_m}} \left[\exp\left(-\frac{\min(s, t)}{\tau_m}\right) - \exp\left(-\frac{\min(s, t)}{\tau_s}\right)\right] H(s)\,H(t). \quad (8)$

$\epsilon$ is interpreted as the normalized (by synaptic weight) post-synaptic potential, which is evoked by a single firing spike of the pre-synaptic neuron $j$.

For any time $t$, the exact \"contribution\" of the neuron $j$'s spike train to the neuron $i$'s post-synaptic potential is given by summing (8) over all pre-synaptic spike times $t_j^{(f)} < t$. We are particularly concerned with the contribution right before each post-synaptic firing time $t_i^{(f)}$, when $u_i(t_i^{(f)}) = \nu$, which we denote by $e_{i|j}(t_i^{(f)})$. Summing $e_{i|j}(t_i^{(f)})$ over all post-synaptic firing times gives the total contribution of the neuron $j$'s spike train to the firing activities of the neuron $i$, as shown in Fig. 2:

$e_{i|j} = \sum_{t_i^{(f)}} \sum_{t_j^{(f)}} \epsilon\left(t_i^{(f)} - \hat{t}_i^{(f)},\, t_i^{(f)} - t_j^{(f)}\right), \quad (9)$

where $\hat{t}_i^{(f)} = \hat{t}_i^{(f)}(t_i^{(f)})$ denotes the last post-synaptic firing time before $t_i^{(f)}$.

Figure 2: The computation of the S-PSP.

Importantly, we refer to $e_{i|j}$ as the (normalized) spike-train level post-synaptic potential (S-PSP). As its name suggests, the S-PSP characterizes the aggregated influence of the pre-synaptic neuron on the post-synaptic neuron's firings at the level of spike trains, providing a basis for relating firing counts to spike events and enabling scalable SNN training that adjusts spike trains rather than single spikes. Clearly, each S-PSP $e_{i|j}$ depends on both rate and temporal information of the pre/post spike trains. To assist the derivation of our BP algorithm, we make the dependency of $e_{i|j}$ on the pre/post-synaptic firing counts $o_i$ and $o_j$ explicit, although $o_i$ and $o_j$ are already embedded in the spike trains:

$e_{i|j} = f(o_j, o_i, t_j^{(f)}, t_i^{(f)}), \quad (10)$

where $t_j^{(f)}$ and $t_i^{(f)}$ represent the pre- and post-synaptic timings, respectively. 
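To make the kernel (8) and the S-PSP (9) concrete, the computation can be sketched in a few lines of Python. This is an illustrative sketch, not the released implementation; the time constants (tau_m = 20, tau_s = 5, in ms) are assumed values for demonstration, and spike times are given in milliseconds.

```python
import math

def eps(s, t, tau_m=20.0, tau_s=5.0):
    """Normalized PSP kernel of eq. (8); assumes tau_s != tau_m.

    s = t_i - t_hat_i (time since last post-synaptic reset),
    t = t_i - t_j     (time since the pre-synaptic spike).
    """
    if s < 0 or t < 0:  # the H(s)H(t) factor
        return 0.0
    m = min(s, t)
    return (math.exp(-max(t - s, 0.0) / tau_s) / (1.0 - tau_s / tau_m)
            * (math.exp(-m / tau_m) - math.exp(-m / tau_s)))

def s_psp(pre_spikes, post_spikes, tau_m=20.0, tau_s=5.0):
    """S-PSP e_{i|j} of eq. (9): a double sum of eps over all
    post-synaptic firing times and pre-synaptic spike times."""
    e = 0.0
    last_post = None  # last post-synaptic firing time before t_i (reset point)
    for t_i in sorted(post_spikes):
        t_hat = last_post if last_post is not None else 0.0
        for t_j in pre_spikes:
            e += eps(t_i - t_hat, t_i - t_j, tau_m, tau_s)
        last_post = t_i
    return e
```

Note that pre-synaptic spikes arriving after a post-synaptic firing time contribute nothing to it, which the `H(t)` check enforces automatically.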
Summing the weighted S-PSPs from all pre-synaptic neurons results in the total post-synaptic potential (T-PSP) $a_i$, which is directly correlated with the neuron $i$'s firing count:

$a_i = \sum_j w_{ij}\, e_{i|j}. \quad (11)$

2.2 Error Backpropagation at Macro and Micro Levels

It is evident that the total post-synaptic potential $a_i$ must be no less than the threshold $\nu$ in order to make the neuron $i$ fire at least once, and the total firing count is $\lfloor a_i / \nu \rfloor$. We relate the firing count $o_i$ of the neuron $i$ to $a_i$ approximately by

$o_i = g(a_i) = \left\lfloor \frac{a_i}{\nu} \right\rfloor = \left\lfloor \frac{\sum_j w_{ij} e_{i|j}}{\nu} \right\rfloor \approx \frac{\sum_j w_{ij} e_{i|j}}{\nu}, \quad (12)$

where the rounding error would be insignificant when $\nu$ is small. Although (12) is linear in the S-PSPs, it is the interaction between the S-PSPs through nonlinearities hidden in the micro-level LIF model that leads to a given firing count $o_i$. Missing from the existing works [15, 32], (12) serves as an important bridge connecting the aggregated micro-level temporal effects with the macro-level count of discrete firing events. In a vague sense, $a_i$ and $o_i$ are analogies to pre-activation and activation in the traditional ANNs, respectively, although they are not directly comparable. (12) allows for rate-coded error backpropagation on top of discrete spikes across the macro and micro levels.

Using (12), the macro-level rate-coded loss of (1) is rewritten as

$E = \frac{1}{2}\|o - y\|_2^2 = \frac{1}{2}\|g(a) - y\|_2^2, \quad (13)$

where $y$, $o$ and $a$ are vectors specifying the desired firing counts (label vector), the actual firing counts, and the weighted sums of S-PSPs of the output neurons, respectively. 
We now derive the gradient of $E$ w.r.t $w_{ij}$ at each layer of an SNN.

• Output layer: For the $i$th neuron in the output layer $m$, we have

$\frac{\partial E}{\partial w_{ij}} = \underbrace{\frac{\partial E}{\partial a_i^m}}_{\text{macro-level bp}} \times \underbrace{\frac{\partial a_i^m}{\partial w_{ij}}}_{\text{micro-level bp}}, \quad (14)$

where variables associated with neurons in the layer $m$ have $m$ as the superscript. As shown in Fig. 3, the first term of (14) represents the macro-level backpropagation of the rate-coded error, with the second term being the micro-level error backpropagation. From (13), the macro-level error backpropagation is given by

$\delta_i^m = \frac{\partial E}{\partial a_i^m} = (o_i^m - y_i^m)\, g'(a_i^m) = \frac{o_i^m - y_i^m}{\nu}. \quad (15)$

Figure 3: Macro/micro backpropagation in the output layer.

Similar to the conventional backpropagation, we use $\delta_i^m$ to denote the back-propagated error. According to (11) and (10), $a_i^m$ can be unwrapped as

$a_i^m = \sum_{j=1}^{r^{m-1}} w_{ij}\, e_{i|j}^m = \sum_{j=1}^{r^{m-1}} w_{ij}\, f(o_j^{m-1}, o_i^m, t_j^{(f)}, t_i^{(f)}), \quad (16)$

where $r^{m-1}$ is the number of neurons in the $(m-1)$th layer. 
Differentiating (16) and making use of (12) leads to the micro-level error propagation based on the total post-synaptic potential (T-PSP) $a_i^m$:

$\frac{\partial a_i^m}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left(\sum_{l=1}^{r^{m-1}} w_{il}\, e_{i|l}^m\right) = e_{i|j}^m + \sum_{l=1}^{r^{m-1}} w_{il}\, \frac{\partial e_{i|l}^m}{\partial o_i^m} \frac{\partial o_i^m}{\partial w_{ij}} = e_{i|j}^m + \frac{e_{i|j}^m}{\nu} \sum_{l=1}^{r^{m-1}} w_{il}\, \frac{\partial e_{i|l}^m}{\partial o_i^m}. \quad (17)$

Although the network is feed-forward, there are non-linear interactions between S-PSPs. The second term of (17) captures the hidden dependency of the S-PSPs on the post-synaptic firing count $o_i^m$.

• Hidden layers: For the $i$th neuron in the hidden layer $k$, we have

$\frac{\partial E}{\partial w_{ij}} = \underbrace{\frac{\partial E}{\partial a_i^k}}_{\text{macro-level bp}} \times \underbrace{\frac{\partial a_i^k}{\partial w_{ij}}}_{\text{micro-level bp}} = \delta_i^k\, \frac{\partial a_i^k}{\partial w_{ij}}. \quad (18)$

The macro-level error backpropagation at a hidden layer is much more involved, as shown in Fig. 4:

$\delta_i^k = \frac{\partial E}{\partial a_i^k} = \sum_{l=1}^{r^{k+1}} \frac{\partial E}{\partial a_l^{k+1}} \frac{\partial a_l^{k+1}}{\partial a_i^k} = \sum_{l=1}^{r^{k+1}} \delta_l^{k+1}\, \frac{\partial a_l^{k+1}}{\partial a_i^k}. \quad (19)$

According to (11), (10) and (12), we unwrap $a_l^{k+1}$ and get

$a_l^{k+1} = \sum_{p=1}^{r^k} w_{lp}\, e_{l|p}^{k+1} = \sum_{p=1}^{r^k} w_{lp}\, f(o_p^k, o_l^{k+1}, t_p^{(f)}, t_l^{(f)}) = \sum_{p=1}^{r^k} w_{lp}\, f(g(a_p^k), o_l^{k+1}, t_p^{(f)}, t_l^{(f)}). \quad (20)$

Therefore, $\frac{\partial a_l^{k+1}}{\partial a_i^k}$ becomes

$\frac{\partial a_l^{k+1}}{\partial a_i^k} = w_{li}\, \frac{\partial e_{l|i}^{k+1}}{\partial o_i^k} \frac{\partial o_i^k}{\partial a_i^k} = w_{li}\, \frac{\partial e_{l|i}^{k+1}}{\partial o_i^k}\, g'(a_i^k) = \frac{w_{li}}{\nu}\, \frac{\partial e_{l|i}^{k+1}}{\partial o_i^k}, \quad (21)$

where the dependency of $e_{l|i}^{k+1}$ on the pre-synaptic firing count $o_i^k$ is considered but the one on the firing timings is ignored, which is supported by the decoupled S-PSP model in (25). Plugging (21) into (19), we have

$\delta_i^k = \frac{1}{\nu} \sum_{l=1}^{r^{k+1}} \delta_l^{k+1}\, w_{li}\, \frac{\partial e_{l|i}^{k+1}}{\partial o_i^k}. \quad (22)$

Figure 4: Macro-level backpropagation at a hidden layer.

The micro-stage backpropagation at hidden layers is identical to that at the output layer, i.e. (17). Finally, we obtain the derivative of $E$ with respect to $w_{ij}$ as follows

$\frac{\partial E}{\partial w_{ij}} = \delta_i^k\, e_{i|j}^k \left(1 + \frac{1}{\nu} \sum_{l=1}^{r^{k-1}} w_{il}\, \frac{\partial e_{i|l}^k}{\partial o_i^k}\right), \quad (23)$

where

$\delta_i^k = \begin{cases} \frac{o_i^m - y_i^m}{\nu}, & \text{for the output layer,} \\ \frac{1}{\nu} \sum_{l=1}^{r^{k+1}} \delta_l^{k+1}\, w_{li}\, \frac{\partial e_{l|i}^{k+1}}{\partial o_i^k}, & \text{for hidden layers.} \end{cases} \quad (24)$

Unlike [15, 32], here decomposing the rate-coded error backpropagation into the macro and micro levels enables computation of the gradient of the actual loss function with respect to the tunable weights, leading to highly competitive performances. 
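The hidden-layer backward step implied by (22) and (23) can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the released implementation: the derivatives of the S-PSPs w.r.t. firing counts are replaced by the e/o estimates that Section 2.3 derives, silent neurons are guarded against division by zero, and all array names are hypothetical.

```python
import numpy as np

def hm2bp_hidden_grads(delta_next, w_next, e_next, e_psp, w, o, nu):
    """One hidden layer of HM2-BP.

    delta_next: (n_next,)         errors delta^{k+1} from the layer above
    w_next:     (n_next, n_post)  weights of the layer above
    e_next:     (n_next, n_post)  S-PSPs e^{k+1}_{l|i} of the layer above
    e_psp:      (n_post, n_pre)   S-PSPs e^k_{i|j} for this layer's synapses
    w:          (n_post, n_pre)   weights of this layer
    o:          (n_post,)         firing counts o_i^k of this layer
    nu:         firing threshold
    """
    o_safe = np.maximum(o, 1.0)                 # guard against silent neurons
    # d e^{k+1}_{l|i} / d o_i^k is approximated by e / o (decoupled S-PSP model).
    de_next = e_next / o_safe[None, :]
    # Eq. (22): delta_i^k = (1/nu) * sum_l delta_l^{k+1} w_li de^{k+1}_{l|i}/do_i^k
    delta = (w_next * de_next).T @ delta_next / nu
    # Eq. (23): micro-level correction factor, with de^k_{i|l}/do_i^k ~ e / o.
    de_own = e_psp / o_safe[:, None]
    correction = 1.0 + (w * de_own).sum(axis=1) / nu
    grad = (delta * correction)[:, None] * e_psp
    return grad, delta
```

The returned `delta` feeds the next (lower) layer's call, mirroring how (24) chains the macro-level errors layer by layer.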
Our HM2-BP algorithm can introduce/remove multiple spikes in one update, greatly improving learning efficiency in comparison with SpikeProp [2]. To complete the derivation of HM2-BP, the derivatives $\frac{\partial e_{i|j}^k}{\partial o_i^k}$ and $\frac{\partial e_{i|j}^k}{\partial o_j^k}$ needed in (17) and (22) are yet to be estimated, which is non-trivial, as shall be presented in Section 2.3.

2.3 Decoupled Micro-Level Model for S-PSP

The derivatives of the S-PSP $e_{i|j}^k$ with respect to the pre- and post-synaptic neuron firing counts are key components in our HM2-BP rule. According to (9), the S-PSP $e_{i|j}^k$ is dependent on both rate and temporal information of the pre- and post-synaptic spikes. The firing counts of pre- and post-synaptic neurons (i.e., the rate information) are represented by the two nested summations in (9). The exact firing timing information determines the (normalized) post-synaptic potential $\epsilon$ of each pre/post-synaptic spike train pair, as seen from (8). The rate and temporal information of spike trains are strongly coupled together, making the exact computation of $\frac{\partial e_{i|j}^k}{\partial o_i^k}$ and $\frac{\partial e_{i|j}^k}{\partial o_j^k}$ challenging.

To address this difficulty, we propose a decoupled model for $e_{i|j}^k$ to untangle the rate and temporal effects. The model is motivated by the observation that $e_{i|j}^k$ is linear in both $o_j^k$ and $o_i^k$ in the limit of high firing counts. 
For finite firing rates, we decompose $e_{i|j}^k$ into an asymptotic rate-dependent effect using the product of $o_j^k$ and $o_i^k$, and a correction factor $\hat{\alpha}$ accounting for temporal correlations between the pre- and post-synaptic spike trains:

$e_{i|j}^k = \hat{\alpha}(t_j^{(f)}, t_i^{(f)})\, o_j^k\, o_i^k. \quad (25)$

$\hat{\alpha}$ is a function of exact spike timing. Since the SNN is trained incrementally with small weight updates set by a well-controlled learning rate, $\hat{\alpha}$ does not change substantially in one training iteration. Therefore, we approximate $\hat{\alpha}$ by using the values of $e_{i|j}^k$, $o_j^k$, and $o_i^k$ available before the next training update:

$\hat{\alpha}(t_j^{(f)}, t_i^{(f)}) \approx \frac{e_{i|j}^k}{o_j^k\, o_i^k}.$

With the micro-level temporal effect considered by $\hat{\alpha}$, we estimate the two derivatives by

$\frac{\partial e_{i|j}^k}{\partial o_i^k} \approx \hat{\alpha}\, o_j^k = \frac{e_{i|j}^k}{o_i^k}, \qquad \frac{\partial e_{i|j}^k}{\partial o_j^k} \approx \hat{\alpha}\, o_i^k = \frac{e_{i|j}^k}{o_j^k}.$

Our hybrid training method follows the typical backpropagation methodology. First of all, a forward pass is performed by analytically simulating the LIF model (3) layer by layer. Then the firing counts of the output layer are compared with the desirable firing levels to compute the macro-level error. After that, the error in the output layer is propagated backwards at both the macro and micro levels to determine the gradient. Finally, an optimization method (e.g. Adam [12]) is used to update the network parameters given the computed gradient.

3 Experiments and Results

Experimental Settings and Datasets The weights of the experimented SNNs are randomly initialized by using the uniform distribution U[-a, a], where a is 1 for fully connected layers and 0.5 for convolutional layers. We use fixed firing thresholds in the range of 5 to 20 depending on the layer. 
We adopt the exponential weight regularization scheme in [15] and introduce lateral inhibition in the output layer to speed up training convergence [15], which slightly modifies the gradient computation for the output layer (see Supplementary Material). We use Adam [12] as the optimizer and its parameters are set according to the original Adam paper. We impose greater sample weights for incorrectly recognized data points during training as a supplement to the Adam optimizer. More training settings are reported in the released source code.

The MNIST handwritten digit dataset [14] consists of 60k samples for training and 10k for testing, each of which is a 28 × 28 grayscale image. We convert each pixel value of an MNIST image into a spike train using Poisson sampling, based on which the probability of spike generation is proportional to the pixel intensity. The N-MNIST dataset [26] is a neuromorphic version of the MNIST dataset generated by tilting a Dynamic Vision Sensor (DVS) [17] in front of static digit images on a computer monitor. The movement-induced pixel intensity changes at each location are encoded as spike trains. Since the intensity can either increase or decrease, two kinds of spike events, ON-events and OFF-events, are recorded. Due to the relative shifts of each image, an image size of 34 × 34 is produced. Each sample of the N-MNIST is a spatio-temporal pattern with 34 × 34 × 2 spike sequences lasting for 300ms. We reduce the time resolution of the N-MNIST samples by 600x to speed up simulation. The Extended MNIST-Balanced (EMNIST) [3] dataset, which includes both letters and digits, is more challenging than MNIST. EMNIST has 112,800 training and 18,800 testing samples for 47 classes. We convert and encode EMNIST in the same way as we do for MNIST. 
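The Poisson encoding described above can be sketched as follows. This is a minimal illustration assuming a 1 ms time step; `max_rate_hz` is a hypothetical scaling parameter, since the exact rate scaling is not specified here.

```python
import numpy as np

def poisson_encode(image, duration_ms=400, max_rate_hz=250, seed=0):
    """Encode a grayscale image (values in [0, 1]) into binary spike trains.

    Each pixel fires as a Bernoulli approximation of a Poisson process whose
    rate is proportional to its intensity; returns a (num_pixels, duration_ms)
    0/1 matrix with a 1 ms time step, matching the 784 x L representation
    used for MNIST.
    """
    rng = np.random.default_rng(seed)
    rates = image.reshape(-1) * max_rate_hz     # spikes per second, per pixel
    p_spike = rates / 1000.0                    # spike probability per 1 ms bin
    return (rng.random((p_spike.size, duration_ms))
            < p_spike[:, None]).astype(np.uint8)
```

A fully black pixel then never spikes, while a fully white pixel spikes at roughly `max_rate_hz` on average.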
We also use the 16-speaker spoken English letters of the TI46 Speech corpus [16] to benchmark our algorithm, demonstrating its capability of handling spatio-temporal patterns. There are 4,142 and 6,628 spoken English letters for training and testing, respectively. The continuous temporal speech waveforms are first preprocessed by Lyon's ear model [18] and then encoded into 78 spike trains using the BSA algorithm [29].

We train each network for 200 epochs except for the ones used for EMNIST, where we use 50 training epochs. The best recognition rate of each setting is collected, and each experiment is run at least five times to report the error bar. For each setting, we also report the best performance over all the conducted experiments.

Fully Connected SNNs for the Static MNIST Using Poisson sampling, we encode each 28 × 28 image of the MNIST dataset into a 2D 784 × L binary matrix, where L = 400ms is the duration of each spike sequence, and a 1 in the matrix represents a spike. The simulation time step is set to be 1ms. No pre-processing or data augmentation is done in our experiments. Table 1 compares the performance of SNNs trained by the proposed HM2-BP rule with other algorithms. HM2-BP achieves 98.93% test accuracy, outperforming STBP [32], which is the best previously reported algorithm for fully-connected SNNs. The proposed rule also achieves the best accuracy earlier than STBP (100 epochs vs. 200 epochs). 
We attribute the overall improvement to the hybrid macro-micro processing that handles the temporal effects and discontinuities at two levels in a way such that explicit back-propagation of the rate-coded error becomes possible and practical.

Table 1: Comparison of different SNN models on MNIST

Model | Hidden layers | Accuracy | Best | Epochs
Spiking MLP (converted*) [24] | 500-500 | 94.09% | 94.09% | 50
Spiking MLP (converted*) [10] | 500-200 | 98.37% | 98.37% | 160
Spiking MLP (converted*) [6] | 1200-1200 | 98.64% | 98.64% | 50
Spiking MLP [25] | 300-300 | 97.80% | 97.80% | 50
Spiking MLP [15] | 800 | 98.71%a | 98.71% | 200
Spiking MLP (STBP) [32] | 800 | 98.89% | 98.89% | 200
Spiking MLP (this work) | 800 | 98.84 ± 0.02% | 98.93% | 100

We only compare SNNs without any pre-processing (i.e., data augmentation) except for [24]. * means the model is converted from an ANN. a [15] achieves 98.88% with hidden layers of 300-300.

Fully Connected SNNs for N-MNIST The simulation time step is 0.6ms for N-MNIST. Table 2 compares the results obtained by different models on N-MNIST. The first two results are obtained by conventional CNNs with the frame-based method, which accumulates spike events over short time intervals as snapshots and recognizes digits based on sequences of snapshot images. The relatively poor performances of the first two models may be attributed to the fact that frame-based representations tend to be blurry and do not fully exploit spatio-temporal patterns of the input. The two non-spiking LSTM models, which are trained directly on spike inputs, do not perform too well, suggesting that LSTMs may be incapable of dealing with asynchronous and sparse spatio-temporal spikes. 
The SNN trained by our proposed approach naturally processes spatio-temporal spike patterns, achieving the state-of-the-art accuracy of 98.88%, outperforming the previous best ANN (97.38%) and SNN (98.78%) with significantly fewer training epochs required.

Table 2: Comparison of different models on N-MNIST

Model | Hidden layers | Accuracy | Best | Epochs
Non-spiking CNN [23] | - | 95.02 ± 0.30% | - | -
Non-spiking CNN [22] | - | 98.30% | 98.30% | 15-20
Non-spiking LSTM [23] | - | 96.93 ± 0.12% | - | -
Non-spiking Phased-LSTM [23] | - | 97.28 ± 0.10% | - | -
Spiking CNN (converted*) [22] | - | 95.72% | 95.72% | 15-20
Spiking MLP [4] | 10000 | 92.87% | 92.87% | -
Spiking MLP [15] | 800 | 98.74% | 98.74% | 200
Spiking MLP (STBP) [32] | 800 | 98.78% | 98.78% | 200
Spiking MLP (this work) | 800 | 98.84 ± 0.02% | 98.88% | 60

Only structures of SNNs are shown for clarity. * means the SNN model is converted from an ANN.

Spiking Convolution Network for the Static MNIST We construct a spiking CNN consisting of two 5 × 5 convolutional layers with a stride of 1, each followed by a 2 × 2 pooling layer, and one fully connected hidden layer. The neurons in the pooling layer are simply LIF neurons, each of which connects to 2 × 2 neurons in the preceding convolutional layer with a fixed weight of 0.25. Similar to [15, 32], we use elastic distortion [30] for data augmentation. As shown in Table 3, our proposed method achieves an accuracy of 99.49%, surpassing the best previously reported performance [32] with the same model complexity after 190 epochs.

Table 3: Comparison of different spiking CNNs on MNIST

Model | Network structure | Accuracy | Best
Spiking CNN (converteda) [6] | 12C5-P2-64C5-P2-10 | 99.12% | 99.12%
Spiking CNN (convertedb) [7] | - | 92.70%c | 92.70%
Spiking CNN (converteda) [27] | - | 99.44% | 99.44%
Spiking CNN [15] | 20C5-P2-50C5-P2-200-10 | 99.31% | 99.31%
Spiking CNN (STBP) [32] | 15C5-P2-40C5-P2-300-10 | 99.42% | 99.42%
Spiking CNN (this workd) | 15C5-P2-40C5-P2-300-10 | 99.32 ± 0.05% | 99.36%
Spiking CNN (this work) | 15C5-P2-40C5-P2-300-10 | 99.42 ± 0.11% | 99.49%

a converted from a trained ANN. b converted from a trained probabilistic model with binary weights. c performance of a single spiking CNN; 99.42% obtained for ensemble learning of 64 spiking CNNs. d performance without data augmentation.

Fully Connected SNNs for EMNIST Table 4 shows that HM2-BP significantly outperforms the non-spiking ANN and the spike-based backpropagation (eRBP) rule reported in [21], with fewer training epochs.

Fully Connected SNNs for TI46 Speech HM2-BP produces excellent results on the 16-speaker spoken English letters of the TI46 Speech corpus [16], as shown in Table 5. This is a challenging spatio-temporal speech recognition benchmark, and no prior success based on SNNs was reported.

In-depth Analysis of the MNIST and N-MNIST Results Fig. 5(a) plots the HM2-BP convergence curves for the best settings of the first three experiments reported in the paper. The convergence is logged in the code. Data augmentation contributes to the fluctuation of convergence in the case of the spiking convolution network. We conduct an experiment to see if the assumption used in approximating $\hat{\alpha}$ of (25) is valid. Fig. 5(b) shows that the value of $\hat{\alpha}$ of a randomly selected synapse does not change substantially over epochs during the training of a two-layer SNN (10 inputs and 1 output). At the high firing frequency limit, the S-PSP is proportional to $o_j^k \cdot o_i^k$, making the multiplicative dependency on the two firing rates a good choice in (25).

Figure 5: (a) HM2-BP convergence for the first three reported experiments; (b) $\hat{\alpha}$ v.s. 
epoch.

Training Complexity Comparison and Implementation   Unlike [32], our hybrid method does not unwrap the gradient computation in the time domain, making it roughly O(N_T) times more efficient than [32], where N_T is the number of time points in each input example. The proposed method can be easily implemented. We have made our CUDA implementation available online^1, the first publicly available high-speed GPU framework for direct training of deep SNNs.

Table 4: Comparison of different models on EMNIST

Model                     Hidden Layers   Accuracy        Best     Epochs
ANN [21]                  200-200         81.77%          81.77%   30
Spiking MLP (eRBP) [21]   200-200         78.17%          78.17%   30
Spiking MLP (HM2-BP)      200-200         84.31 ± 0.10%   84.43%   10
Spiking MLP (HM2-BP)      800             85.41 ± 0.09%   85.57%   19

Table 5: Performances of HM2-BP on TI46 (16-speaker speech)

Hidden Layers   Accuracy        Best     Epochs
800             89.36 ± 0.30%   89.92%   138
400-400         89.83 ± 0.71%   90.60%   163
800-800         90.50 ± 0.45%   90.98%   174

4 Conclusion and Discussions

In this paper, we present a novel hybrid macro/micro level error backpropagation scheme to train deep SNNs directly based on spiking activities. The spike timings are exactly captured in the spike-train level post-synaptic potentials (S-PSP) at the microscopic level. The rate-coded error is defined, efficiently computed, and back-propagated across both the macroscopic and microscopic levels. We further propose a decoupled S-PSP model to assist gradient computation at the micro level. In contrast to previous methods, our hybrid approach directly computes the gradient of the rate-coded loss function with respect to tunable parameters.
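To make the notion of a rate-coded loss concrete, the following toy sketch (our own illustration, not the paper's CUDA implementation; the neuron constants and all names are assumptions) simulates a small layer of current-based LIF neurons and evaluates a squared error defined on output spike counts, i.e., on firing rates rather than on individual spike times:

```python
import numpy as np

def lif_spike_counts(weights, in_spikes, tau_m=20.0, v_th=1.0, dt=1.0):
    """Simulate a layer of leaky integrate-and-fire neurons.

    weights:   (n_out, n_in) synaptic weights
    in_spikes: (T, n_in) binary input spike trains
    Returns per-neuron output spike counts over the T time steps.
    """
    n_out = weights.shape[0]
    v = np.zeros(n_out)           # membrane potentials
    counts = np.zeros(n_out)      # output spike counts (the "rate code")
    decay = np.exp(-dt / tau_m)   # membrane leak per step
    for t in range(in_spikes.shape[0]):
        v = v * decay + weights @ in_spikes[t]  # leak + weighted input spikes
        fired = v >= v_th
        counts += fired
        v[fired] = 0.0            # reset membrane after a spike
    return counts

# Rate-coded loss: squared error between output firing counts and targets.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.3, size=(3, 10))
spikes = (rng.random((100, 10)) < 0.1).astype(float)  # sparse random input trains
target_counts = np.array([5.0, 0.0, 0.0])             # desired output spike counts

o = lif_spike_counts(w, spikes)
loss = 0.5 * np.sum((o - target_counts) ** 2)
print(o, loss)
```

The distinguishing feature of HM2-BP is that the gradient of exactly this kind of count-based (macro-level) loss is computed, with the micro-level spike timing entering through the S-PSP rather than through unrolling every time step.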
Using our efficient GPU implementation of the proposed method, we demonstrate the best performances for both fully connected and convolutional SNNs on the static MNIST, the dynamic N-MNIST, and the more challenging EMNIST and 16-speaker spoken English letters of TI46 datasets, outperforming the best previously reported SNN training techniques. Furthermore, the proposed approach achieves performances surpassing those of conventional deep learning models when dealing with asynchronous spiking streams.

The performances achieved by the proposed BP method may be attributed to the fact that it addresses key challenges of SNN training in terms of scalability, handling of temporal effects, and gradient computation of loss functions with inherent discontinuities. Coping with these difficulties through error backpropagation at both the macro and micro levels provides a unique perspective on the training of SNNs. More specifically, orchestrating the information flow based on a combination of temporal effects and firing rate behaviors across the two levels in an interactive manner allows the rate-coded loss function to be defined at the macro level, and errors to be back-propagated from the macro level to the micro level and back to the macro level. This paradigm provides a practical solution to the difficulties brought by the discontinuities inherent in an SNN while capturing the micro-level timing information via the S-PSP. As such, both rate and temporal information in the SNN are exploited during the training process, leading to the state-of-the-art performances.
By releasing our GPU implementation code, we expect this work to help move the community forward towards enabling high-performance spiking neural networks and neuromorphic computing.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. CCF-1639995 and the Semiconductor Research Corporation (SRC) under Task 2692.001. The authors would like to thank High Performance Research Computing (HPRC) at Texas A&M University for providing computing support. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF, SRC, Texas A&M University, and their contractors.

^1 https://github.com/jinyyy666/mm-bp-snn

References

[1] Ben Varkey Benjamin, Peiran Gao, Emmett McQuinn, Swadesh Choudhary, Anand R Chandrasekaran, Jean-Marie Bussat, Rodrigo Alvarez-Icaza, John V Arthur, Paul A Merolla, and Kwabena Boahen. Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations. Proceedings of the IEEE, 102(5):699–716, 2014.

[2] Sander M Bohte, Joost N Kok, and Han La Poutre. Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48(1-4):17–37, 2002.

[3] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik. EMNIST: an extension of MNIST to handwritten letters. arXiv preprint arXiv:1702.05373, 2017.

[4] Gregory K Cohen, Garrick Orchard, Sio-Hoi Ieng, Jonathan Tapson, Ryad B Benosman, and André Van Schaik. Skimming digits: neuromorphic classification of spike-encoded images. Frontiers in Neuroscience, 10:184, 2016.

[5] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167.
ACM, 2008.

[6] Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In Neural Networks (IJCNN), 2015 International Joint Conference on, pages 1–8. IEEE, 2015.

[7] Steve K Esser, Rathinakumar Appuswamy, Paul Merolla, John V Arthur, and Dharmendra S Modha. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems, pages 1117–1125, 2015.

[8] Wulfram Gerstner and Werner M Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

[9] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[10] Eric Hunsberger and Chris Eliasmith. Spiking deep networks with LIF neurons. arXiv preprint arXiv:1510.08829, 2015.

[11] Eugene M Izhikevich and Gerald M Edelman. Large-scale model of mammalian thalamocortical systems. Proceedings of the National Academy of Sciences, 105(9):3593–3598, 2008.

[12] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[14] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[15] Jun Haeng Lee, Tobi Delbruck, and Michael Pfeiffer.
Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10:508, 2016.

[16] Mark Liberman, Robert Amsler, Ken Church, Ed Fox, Carole Hafner, Judy Klavans, Mitch Marcus, Bob Mercer, Jan Pedersen, Paul Roossin, Don Walker, Susan Warwick, and Antonio Zampolli. The TI46 speech corpus. http://catalog.ldc.upenn.edu/LDC93S9. Accessed: 2014-06-30.

[17] Patrick Lichtsteiner, Christoph Posch, and Tobi Delbruck. A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2):566–576, 2008.

[18] Richard F Lyon. A computational model of filtering, detection, and compression in the cochlea. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'82, volume 7, pages 1282–1285. IEEE, 1982.

[19] Wolfgang Maass. Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.

[20] Paul A Merolla, John V Arthur, Rodrigo Alvarez-Icaza, Andrew S Cassidy, Jun Sawada, Filipp Akopyan, Bryan L Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.

[21] Emre O Neftci, Charles Augustine, Somnath Paul, and Georgios Detorakis. Event-driven random backpropagation: Enabling neuromorphic deep learning machines. Frontiers in Neuroscience, 11:324, 2017.

[22] Daniel Neil and Shih-Chii Liu. Effective sensor fusion with event-based sensors and deep network architectures. In Circuits and Systems (ISCAS), 2016 IEEE International Symposium on, pages 2282–2285. IEEE, 2016.

[23] Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu. Phased LSTM: Accelerating recurrent network training for long or event-based sequences.
In Advances in Neural Information Processing Systems, pages 3882–3890, 2016.

[24] Peter O'Connor, Daniel Neil, Shih-Chii Liu, Tobi Delbruck, and Michael Pfeiffer. Real-time classification and sensor fusion with a spiking deep belief network. Frontiers in Neuroscience, 7:178, 2013.

[25] Peter O'Connor and Max Welling. Deep spiking networks. arXiv preprint arXiv:1602.08323, 2016.

[26] Garrick Orchard, Ajinkya Jayawant, Gregory K Cohen, and Nitish Thakor. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9:437, 2015.

[27] Bodo Rueckauer, Yuhuang Hu, Iulia-Alexandra Lungu, Michael Pfeiffer, and Shih-Chii Liu. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience, 11:682, 2017.

[28] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533, 1986.

[29] Benjamin Schrauwen and Jan Van Campenhout. BSA, a fast and accurate spike train encoding scheme. In Proceedings of the International Joint Conference on Neural Networks, volume 4, pages 2825–2830. IEEE, 2003.

[30] Patrice Y Simard, David Steinkraus, John C Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In ICDAR, volume 3, pages 958–962, 2003.

[31] Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.

[32] Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, and Luping Shi. Spatio-temporal backpropagation for training high-performance spiking neural networks. arXiv preprint arXiv:1706.02609, 2017.

[33] Friedemann Zenke and Surya Ganguli. SuperSpike: Supervised learning in multilayer spiking neural networks.
Neural Computation, 30(6):1514–1541, 2018.