
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors present a method for learning to cancel input currents in a spiking network, which they then use to learn linear dynamical systems. The mathematics looks correct, and the paper is written in a fairly clear manner. I know the related literature fairly well, and the work appears original. The learning of linear dynamical systems in the paper is fairly limited, but the model constitutes a step in the right direction. I am impressed by how small the networks are that learn these dynamics. Typical spiking network learning uses huge numbers of neurons to approximate effective population rates. Not so here. It's neat.
Q2: Please summarize your review in 1-2 sentences
It's hard to make spiking networks learn interesting dynamics, and the authors present a method for doing so that is only somewhat unnatural. I think this work is worthy of acceptance.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary:
This paper introduces a new learning framework for leaky integrate-and-fire neurons, which permits a recurrent network to efficiently learn linear dynamical systems. The approach uses weight changes at two timescales: fast weight changes quickly balance excitation and inhibition, while slower weight changes learn the structure of the LDS. A key insight is that the fast plasticity which balances excitation and inhibition distributes a global signal about the network's performance to all neurons, enabling error-driven learning of the LDS with a local learning rule.
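For concreteness, the class of target dynamics is a linear dynamical system \dot{x} = Ax + c, which the network's decoded output is trained to track. A minimal simulation sketch of such a target system; the values of A and c below are illustrative choices, not parameters from the paper:

```python
import numpy as np

# Hypothetical 2-D target system; A and c are illustrative, not the paper's.
A = np.array([[-0.5, 1.0],
              [-1.0, -0.5]])   # stable: eigenvalues -0.5 +/- i
c = np.array([1.0, 0.0])       # constant input drive

dt = 0.01
x = np.zeros(2)
trajectory = [x.copy()]
for _ in range(2000):
    x = x + dt * (A @ x + c)   # Euler step of x' = Ax + c
    trajectory.append(x.copy())
trajectory = np.array(trajectory)
```

With a stable A (eigenvalues with negative real part), the trajectory relaxes to the fixed point x* = -A^{-1}c; the spiking network's decoded estimate \hat{x} is what is trained to follow this state.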
Major comments:
This paper presents the intriguing idea of using the balance of excitation and inhibition to distribute global error information throughout a neural network, permitting supervised learning with a local learning rule.
Moreover, the scheme introduced is based on predictive coding, which, as the paper shows, naturally leads to sparse, irregular spiking activity. On this subtle view, neural firing in response to an identical input will not yield identical precise spike times; but the particular spike times for each input presentation are nonetheless precisely arranged, and cannot be replaced by a rate-coded approximation without a drop in fidelity or efficiency.
The theoretical arguments of the paper are lucid and clearly expressed, and contain many smaller insights, such as how various signals can be incorporated into recurrent weights, or swapped to decoded rather than actual outputs. A useful insight is the idea that a presynaptic term necessary for error-driven learning may be estimated by filtered spikes arriving at the postsynaptic neuron's synapse. Overall this paper shines in its details.
The toy examples given in simulation are sufficient to demonstrate that the technique does indeed work as expected. The paper would be stronger with a more compelling application, but already makes an interesting contribution.
It would be useful to add details of the simulations, perhaps as supplementary materials, to allow for replications. What are the values of \lambda, \mu, \lambda_V, etc?
There are several limitations to the method at present. Most notably, only linear dynamical systems can be represented. Each neuron requires a strong, precisely targeted error feedback signal. It is only the recurrent weights that learn, with the feedforward and feedback weights being fixed by hand. The network and feedback signal must operate directly on the state of the LDS, rather than an output derived from the LDS (e.g. y = Cx where C is a new matrix). The network is closely tied to the task of representing its input accurately, but in general it is not input reconstruction but a transformation from input to output that must be learned. These challenges are reasonable to leave for future work.
Minor comments:
ln 104: "to the its" -> "to its"
ln 175: \hat x should not be transposed, I believe
ln 215: "for these reason" -> "for this reason"
Fig 3B caption ln 351: "top/bottom panel" -> "left/right panel"
ln 351: "using the equations" -> "using equations"
ln 398: "who are believed" -> "which are believed"
Q2: Please summarize your review in 1-2 sentences
This paper presents the intriguing idea of using the balance of excitation and inhibition to distribute global error information throughout a spiking neural network, permitting supervised learning with a local learning rule. This is a highly novel role for balanced excitation/inhibition that many will find interesting.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I greatly appreciate the problem the authors are trying to solve: supervised learning is difficult to implement in a biologically plausible manner; indeed, known algorithms (such as backprop through time) are typically nonlocal. The authors propose a neat solution: unsupervised learning of fast recurrent connections establishes a detailed balance of E and I currents, which is then shown to make relevant aspects of the global output of the network available to each neuron locally. These global-made-local signals can then be used to learn slower connections in order to implement a linear dynamical system.
I find the idea really appealing, but I think the results are rather preliminary and the whole study could have been more thorough.
- While the spike-based rule used to learn a detailed balance can certainly be shown to achieve its goal in an isolated postsynaptic neuron, it is by no means guaranteed to work in a recurrent network (for much the same reasons that make local approximations to error backpropagation close to useless in a recurrent net). Why is it that it works here?
- Learning the slow connections requires very strong feedback (large K). The authors claim that, after learning, since the error has vanished anyway, K could be set to zero without affecting performance. Is that true?
Is there a figure where we can see that? If the error were exactly zero, then I would buy that, but since it's never perfect, one is allowed to wonder about the stability of the K=0 solution to small errors. Indeed, the error can be reduced arbitrarily by making K very large, without even touching the slow weights, so clearly in that case a very small error isn't at all predictive of successful outputs with K=0; for a stable solution to emerge with K small, the error with K large should be further reduced (through learning) by an amount that ought to be made more precise mathematically. I guess what I'm uncomfortable with is that if the authors' claim were true, we could imagine decreasing K progressively as W_slow is being learned, but then at some point we would hit the limit of validity of the first-order expansion of (I - M/K)^{-1} used to derive Eq. 7 and run into trouble. Am I missing something here?
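For reference, a sketch of the expansion in question: it is the standard Neumann series, whose convergence requires the spectral radius of M/K to be below one, so both the series and its first-order truncation degrade as K shrinks toward the scale of M.

```latex
\left(I - \frac{M}{K}\right)^{-1}
  = I + \frac{M}{K} + \frac{M^2}{K^2} + \dots
  \approx I + \frac{M}{K},
\qquad \text{valid only while } \rho\!\left(\frac{M}{K}\right) < 1 .
```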
- Somehow, the authors postpone the difficult problem of local supervised learning to the (in my opinion, not much easier) learning of identical input, output, and error feedback weights, which must all be equal to the same matrix F. Such learning is not addressed here. Could the authors discuss/comment on that?
- In their derivations, the authors assume that the inputs to the dynamical system they are trying to learn are constant (or very slowly changing). What happens if the inputs are faster? (By the way, what was the authors' choice of input c in the learning setup of Fig. 3?)
Some minor comments/typos:
- line 94: what does it mean for the feedforward weights to "evenly span the input space"?
- Isn't there a sign issue in Eq. 2? If W_ij > 0 and V_i > 0, the learning rule is going to strengthen W_ij, which makes the imbalance even worse.
- l175: adjust to bold ("vector font")
- Trying to derive Eq. 7, I instead got \delta M_{ij} \approx (c_j + K x_j) e_i. Did the authors drop the constant term on the basis of K being large?
- The whole paragraph l. 278 - l. 285 was rather obscure to me; in the end, don't the authors want the error to be fed back through F because of the Fe term in Eq. 9? Couldn't that be stated more simply/clearly?
- l221: "different from"
- ref list: Gutig -> Gütig (or Guetig)
- ref list: Deneve -> Denève
- caption of Fig. 3B: replace top/bottom by left/right
- l387: "the the"
- l403: "different from"
UPDATE following the authors' responses: I'm updating my evaluation to 8 due to a satisfying response to my second comment in particular.
Q2: Please summarize your review in 1-2 sentences
This paper provides a solution to the problem of obtaining biologically implementable gradients for supervised learning in recurrent networks. I found the authors' solution, based on a detailed E/I balance, very neat, but I was disappointed by the lack of rigor in the exposition and the preliminary nature of the numerical results.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
- In general, I had difficulty reconciling the text and Eq. 2. In particular, these two sentences are not clear at all and should be revised carefully: (pg 2, line 103) "Since we want to keep ..." and (pg 3, line 159) "The factor 1/2 imposes ...".
- In the discussion, the authors speculate on extending to nonlinear systems; however, the extension to time-varying linear systems (e.g., A_t) is not discussed.
- The equivalence of the two circuits in Fig. 1A is not in a strict circuit-theoretic sense and assumes asymptotic behavior. The text and the figure should reflect this distinction.
- (pg 5, line 250) The statement using the Laplace transform should assume that the frequency is bounded.
- There are quite a few typos. Please check the manuscript carefully.
Q2: Please summarize your review in 1-2 sentences
This paper proposes a learning algorithm for spiking recurrent networks. The paper demonstrates the algorithm's capability of learning stable linear time-invariant dynamical systems. Three features of the approach are (i) the locality of the learning rules, (ii) the maintenance and utilization of balance in network activity, and (iii) the utilization of two different timescales within the network to facilitate learning.
Overall, I think the paper provides a welcome advance for learning in spiking neural networks, although the class of dynamical systems learnable with the proposed model is somewhat narrow. The presentation is not clear, especially in Section 1, and includes vague sentences at critical places and minor technical mistakes.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
We would like to thank the reviewers for their thoughtful comments and their constructive criticism. One frequently raised issue is the simplicity of the toy example that illustrates the learning in Fig. 3. We will remedy this by simulating a 4-D system representing a planar robot arm and showing how the tuning curves for movement direction develop as a result of learning. Another issue is the learning of the feedforward weights and the feedback weights, which are assumed to be equal to each other here. Currently, we have a rule that permits the learning of the optimal feedforward weights. However, we do not yet have a learning rule for the feedback weights. We are investigating this issue, and we think that those weights could be learned with a rule similar to that for the feedforward weights. We did not show the learning of the latter here because we thought that it would be of little direct interest to our problem without a rule for the feedback weights. Finally, we will rewrite the paragraphs and sections that the reviewers marked as unclear, and we will report the parameters used in the simulations.
#Assigned_Reviewer_1
We hope that we have responded to your concerns in the general comments above.
#Assigned_Reviewer_2
Indeed there are no absolute mathematical
guarantees for the fast rule to converge in a recurrent network. However,
this rule is an approximation of another exact, but biologically
implausible rule that is not shown in the paper. It can be shown that the
only fixed points of the latter are FF^T. Although we cannot show that
they are stable, we do observe in numerical simulations that the fast
weights always converge to this solution using random initializations. One
condition for the learning to work is that inputs must be independent
across input channels.
It is not stated clearly in the text, but Fig. 3B, left, shows the reconstruction error when K=0. More precisely, the learning was achieved using a large fixed K. Different instances of the connectivity matrices were sampled during learning. The network was then re-simulated using these instances while fixing K=0, in order to measure the performance in the absence of feedback. It is clear from the plot that at the end of learning the performance is very good. This is the case because the slow lateral connections are learned very accurately (Fig. 3C). In fact, at the end of learning, the currents provided by the feedback are two orders of magnitude smaller than the currents provided by the slow connections and therefore have very little influence.
The rule is derived assuming slowly changing input in order to make the derivations analytically tractable. However, in practice the inputs used can be very fast. For instance, in Fig. 3 the input is filtered Gaussian noise.
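A simple way to generate such an input is to low-pass filter white noise; the sketch below uses a first-order (exponential) filter, with a hypothetical time constant, since the review does not specify the filter actually used:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.001          # simulation step (hypothetical value)
tau = 0.05          # filter time constant (hypothetical value)
n_steps = 10_000

c = np.zeros(n_steps)
for t in range(1, n_steps):
    white = rng.standard_normal() / np.sqrt(dt)         # discretized white noise
    c[t] = c[t - 1] + (dt / tau) * (-c[t - 1] + white)  # exponential low-pass
```

The resulting signal is smooth on the timescale tau but still fluctuates quickly relative to the slow learning dynamics.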
#Assigned_Reviewer_3
We only considered the extension to nonlinear systems because nonlinear dynamics are frequently used to model many cognitive tasks, e.g. decision making, memory retrieval, etc.
The two circuits are strictly equivalent if, in the top panel of Fig. 1A, the leaky integration takes place at the node to the left of the population.
#Assigned_Reviewer_6
It is stated that our work is simply a spiking version of earlier work by Gomi and Kawato (1993, perhaps?) and therefore of limited interest. We strongly disagree with this claim, for three main reasons.
First, translating a learning scheme for rate neurons into a learning scheme for spiking neurons is usually very challenging. Indeed, rate neurons and spiking neurons are not always equivalent in a phenomenological sense (Brunel 2000, Ostojic 2014). An example is FORCE learning (Sussillo and Abbott 2009), which is very hard to translate into a spiking setup.
Second, our approach is substantially different and is not an extension of the cited work. It uses recurrent networks, while the cited paper assumes feedforward architectures, such as a multilayered perceptron. In addition to its prevalence in the brain, recurrence in our case permits cooperation between neurons and makes the neural code very efficient (few spikes and very high accuracy). Moreover, the learning rule in the cited work is a priori nonlocal.
Third, and most importantly, we link widely observed phenomena in the brain, such as the balance between excitation and inhibition and irregular firing, to neural coding and learning.

