__ Summary and Contributions__: The study proposes a new learning algorithm, Temporal Spike Sequence Learning Backpropagation (TSSL-BP), for synaptic optimization in Spiking Neural Networks. TSSL-BP is expected to reduce inconsistencies between gradient computation and loss. In addition, the authors achieve remarkable performance with short stimulus duration. The authors have compared this method to existing learning algorithms and achieved state-of-the-art performance on the standard MNIST, Fashion-MNIST and CIFAR datasets.

__ Strengths__: Some of the existing algorithms smooth the non-differentiable spiking information to accommodate BP-based techniques, which leads to inconsistent gradient computation. The proposed algorithm overcomes this problem by estimating inter-neuron and intra-neuron dependencies. This method can compute the output even with a short-duration stimulus, in contrast to existing methods that use a long duration to determine the label.

__ Weaknesses__: The formulation of the net gradient, which is the sum of the gradients estimated from inter-neuron and intra-neuron dependencies, is vague, and a logical basis for this formulation is required. Also, the performance comparison on the datasets is limited to only a few algorithms.

__ Correctness__: The paper provides a derivation of the proposed method and shows remarkable performance with short stimulus duration and training for only a few epochs. The methodology presented is accurate.

__ Clarity__: The paper was well written. However, some of the sentences are repeated multiple times, especially about inter-neuron and intra-neuron dependencies.

__ Relation to Prior Work__: The study provides brief information on previous work and how it differs from the proposed method. However, some of the recent literature on BP-based techniques for SNNs is left out.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: A method for training spiking neural networks through error back-propagation is presented. The model uses leaky integrate-and-fire neurons and introduces a new method to approximate the gradient at the non-differentiable firing time. The model is benchmarked on simpler machine learning tasks such as MNIST and CIFAR10 and shows state-of-the-art performance.

__ Strengths__: The paper reports a new method to compute gradients in spiking neural networks. Although a number of other methods exist, the presented approach is interesting and works well on practical tasks. The algorithm is nicely illustrated and seems sound overall. Although the mechanism does not seem very biologically plausible, it may help to deepen the general understanding of spiking neural network computation.

__ Weaknesses__: It would have been nice to see results also on more complex machine learning benchmarks like ImageNet etc. Have the authors attempted to target more ambitious benchmark data sets with their model?
It is also unfortunate that none of the presented tasks (MNIST, CIFAR10, etc.) is sequential in nature. This obscures the sequence learning feature of the model that is so prominent in the title. It has also never been demonstrated that spike sequences can actually be learned with this model. This raises doubts about whether accurate sequence learning is possible with this model.
The learning rules are also inherently non-local. This makes them unattractive for neuromorphic hardware and provides few new insights into learning in the brain.

__ Correctness__: The claim that "The training of SNNs is significantly more costly than that of the conventional neural networks" is not correct. Arguably the most efficient neural network implementation we know of is the brain, which is (mostly) spiking. Also, technological solutions like spiking neuromorphic hardware are often more efficient than standard GPU hardware. This claim should be corrected or removed.

__ Clarity__: The paper is well structured and understandable, but it contains a number of typos and would benefit from being carefully proof-read.

__ Relation to Prior Work__: The paper does not provide a very deep discussion of other gradient-based methods to train SNNs. In particular, older work such as the Tempotron model, which is very similar in nature, is ignored. See:
http://courses.cs.tamu.edu/rgutier/cpsc636_s10/gutig2006tempotron.pdf
Also more recent work is not discussed, e.g.:
https://papers.nips.cc/paper/7359-long-short-term-memory-and-learning-to-learn-in-networks-of-spiking-neurons.pdf
and
https://www.mitpressjournals.org/doi/full/10.1162/neco_a_01086
These should be discussed and compared to the presented method.

__ Reproducibility__: Yes

__ Additional Feedback__: POST FEEDBACK:
The authors have further clarified. Very nice work.

__ Summary and Contributions__: The authors present a training method for spiking neural networks that is based on standard backpropagation. The main novelty is the way the paper deals with the non-differentiable spike non-linearity. Instead of approximating this hard non-linearity with a smooth differentiable function, the proposed method sidesteps the issue by relating the post-synaptic current to the pre-synaptic spike times and then relating the pre-synaptic spike times to the pre-synaptic membrane potential.

__ Strengths__: The paper reports impressive results on a number of datasets and the proposed exact BP method is a nice change from the approximate BP methods used till now.

__ Weaknesses__: Some points that need clarification:
1) The authors argue that their approach is better because it allows neural networks to respond within very few timesteps. The results are reported after 5 timesteps. I am not sure whether, for such a fast response, the dynamics of the network play any role. For example, training on MNIST requires the correct output neuron to emit a spike in the second time step. It seems there is just one volley of activity passing instantaneously through the network. The network is thus more similar to a binary ANN than to an SNN. The authors should comment on whether the network dynamics actually play a role in the MNIST and CIFAR10 datasets. Preferably, they should also include firing statistics with the fraction of neurons that spike during these 5 time steps, and whether any neuron manages to spike more than once.
2) L194: I do not get how du_i/dt_m is obtained by differentiating [3]. The spike time t_m is obtained by thresholding u_i, so how do you differentiate through the thresholding function?
In summary, the presented BP method is interesting and, I believe, novel. The major concern is that the dynamics of the network seem to have become irrelevant since the response is obtained almost immediately. Thus a better point of comparison would be binary ANNs and not other SNNs.
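For illustration, the firing statistics requested in point 1 could be computed roughly as follows. This is a minimal sketch and not the authors' code: the function name `firing_statistics`, the layout of the spike record (one list of 0/1 values per neuron, one entry per time step), and the example data are all assumptions made here.

```python
def firing_statistics(spikes):
    """Summarize spiking activity over a short simulation window.

    spikes: list of per-neuron spike trains, each a list of 0/1 values
            with one entry per time step (hypothetical layout).
    """
    n_neurons = len(spikes)
    counts = [sum(train) for train in spikes]          # spikes per neuron
    active = sum(1 for c in counts if c >= 1)          # fired at least once
    multi = sum(1 for c in counts if c >= 2)           # fired more than once
    return {
        "fraction_active": active / n_neurons,
        "fraction_multi_spike": multi / n_neurons,
        "mean_spikes_per_neuron": sum(counts) / n_neurons,
    }


# Made-up record of a 4-neuron layer over 5 time steps.
example = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
]
stats = firing_statistics(example)
print(stats)
```

If `fraction_multi_spike` stayed near zero across layers for the 5-step experiments, that would support the concern that the network behaves like a single feed-forward volley.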
Minor comments:
L21: “rendered energy-efficient VLSI chips..”: the sentence is incomplete
L23: “fully leveraging the theoretical computing advantages of SNNs over traditional artificial neural networks (ANNs) [14]”. I do not follow this assertion. How do liquid state machines prove a theoretical advantage of SNNs over ANNs?
L28: “a par with” -> “on par with”
L39: “combinations of thereof” -> “combinations thereof”
L86: For completeness, write down the form of the synaptic kernel \epsilon
L118 “an desired” -> “a desired”
L155: “these approaches effectively spread each discrete firing spike continuously over time, converting one actual spike to multiple “fictitious” spikes and also generating multiple “fictitious” PSCs”: This argument is a bit hand-wavy and not backed by evidence. A smoothed spike could be tuned so that it injects the same amount of PSC into the post-synaptic neuron as a real discrete spike. Spike time is also still well-defined as the peak of the smoothed spike waveform. See [1,2] for example, where training using smoothed spikes is still able to produce networks with fine control over spike times.
-----
The author comments addressed my concerns. I believe it is a good paper and thus keep my score as accept.
[1] Huh, Dongsung, and Terrence J. Sejnowski. "Gradient descent for spiking neural networks." Advances in Neural Information Processing Systems. 2018.
[2] Zenke, Friedemann, and Surya Ganguli. "Superspike: Supervised learning in multilayer spiking neural networks." Neural Computation 30.6 (2018): 1514-1541.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The authors introduce a new way of performing backpropagation in spiking neural networks, called Temporal Spike Sequence Learning Backpropagation (TSSL-BP). Their method takes both inter-neuron and intra-neuron dependencies into account and allows learning "precise" spiking sequences on the output neurons, leading to improved learning precision on various tasks.

__ Strengths__: - The proposed learning method seems to take the discrete nature of spikes into account better than previous methods, with attention to inter-neuron and intra-neuron effects.
- The learning method leads to a high precision given a low number of necessary time steps, which is an impressive double gain.

__ Weaknesses__: - A main remark I would have is the following. The authors state in the main article that their loss is as in Eq 6. This would mean that the error is only zero if an output fires at exactly the right time. It is 1 otherwise (or 0.5 given the 1/2 in the function). I would estimate that this setup would form a hard, needle-in-a-haystack-like learning problem, in terms of determining a sensible gradient. Indeed, in the supplementary material, the authors say that they actually use the spike response kernel in their error function (Eq 3 in SM). This kernel smooths out the spike a bit, as shown in many of the authors' figures, and gives a lower error if spikes get closer to each other, so in my eyes is likely to facilitate learning with backprop. Hence, I think that this is a very relevant part of the method. The actual formula for the error is only mentioned in the SM - but should be mentioned in the main article in my eyes. Furthermore, about the shape of that kernel, the authors only say in the main article that they adopt "a first order synaptic model as the spike response kernel." As I would guess the spike response kernel is important, I would like to see the mathematical formulation in the main article (perhaps I read over it?) and I would be interested to see how the method fares without this "smoothed" error. (Please note that it is interesting that the authors "criticize" previous methods for assuming a differentiable function for the spike because it smooths out a signal).
- Currently, the results are reported for the full method. However, in order to see what the influence is of, e.g., the intra-neuron backprop part of the learning method, an ablation study would be useful. What kind of things can the method learn better when including this term?
- I see no reference to e-prop: Bellec, G., Scherr, F., Hajek, E., Salaj, D., Legenstein, R., & Maass, W. (2019). Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets. arXiv preprint arXiv:1901.09049. A direct comparison with that method is, I would say, out of scope, but I would be interested in the authors' reflection on how their method differs from that one, which relies on synaptic traces.
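
To make the first point above concrete, the following sketch compares a half squared error on raw spike trains with the same error after both trains are passed through a first-order filter. It is a toy illustration under assumptions made here, not the authors' implementation: the discrete filter `a[t] = (1 - 1/tau_s) * a[t-1] + s[t] / tau_s`, the value `tau_s = 2.0`, the 10-step window, and all function names are hypothetical.

```python
def filter_first_order(spikes, tau_s=2.0):
    """Assumed discrete first-order synaptic filter applied to a 0/1 spike train."""
    a, out = 0.0, []
    for s in spikes:
        a = (1.0 - 1.0 / tau_s) * a + s / tau_s
        out.append(a)
    return out


def half_squared_error(x, y):
    """0.5 * sum over time steps of the squared difference."""
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def one_spike(t, n_steps):
    """Spike train with a single spike at time step t."""
    return [1.0 if k == t else 0.0 for k in range(n_steps)]


n_steps = 10
desired = one_spike(5, n_steps)
raw, smoothed = [], []
for t in (1, 3, 4, 5):  # actual spike moves toward the desired time step 5
    actual = one_spike(t, n_steps)
    raw.append(half_squared_error(desired, actual))
    smoothed.append(
        half_squared_error(filter_first_order(desired), filter_first_order(actual))
    )

print("raw:     ", raw)
print("smoothed:", smoothed)
```

In this toy setting the raw error stays at 1.0 for every mismatched spike time and only drops to 0 on an exact match, whereas the filtered error shrinks as the actual spike approaches the desired one. That graded signal is why the kernel seems to be a relevant part of the method.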

__ Correctness__: As far as I can judge the methodology and claims are correct.

__ Clarity__: The article is in general well-written.

__ Relation to Prior Work__: Yes. As said though, I would be interested in having the authors comment on how their method relates to that of Bellec et al.

__ Reproducibility__: Yes

__ Additional Feedback__: In general I think this is very insightful work, with really impressive results on the given datasets.
POST FEEDBACK:
The authors have adequately addressed my comments. I congratulate them on their interesting work.