Paper ID: | 7967 |
---|---|
Title: | A Similarity-preserving Network Trained on Transformed Images Recapitulates Salient Features of the Fly Motion Detection Circuit |

This paper examines neural circuits for motion detection, which have been extensively modeled and are certainly of interest to the field. A nice connection is drawn with recent work on biologically plausible learning of similarity matching objective functions, which is also of great interest to the field. Although this connection is timely, some of the conclusions and biological predictions were difficult for me to follow.

The paper is decently written, but the presentation is dense and difficult to follow in several places. This is partly due to space constraints on the manuscript, but I nonetheless think there is room for improvement in Sections 2.1, 3, and 4.

Section 2.1 and the preceding paragraph would be easily understood by readers with a background in group theory (I imagine this would be mostly limited to physicists), but I don't think this level of sophistication is necessary to convey the main ideas of the paper. It should certainly be mentioned that Lie groups provide an elegant perspective, but I think this is currently over-emphasized, and jargon terms like "generators" are over-used without much explanation. In reality, the model is learning something quite simple because of the linearity assumption on line 113. A greater effort should be made to provide readers with intuition, e.g., "the model aims to learn a small set of canonical, linear transformations A_1, A_2, etc., which may, for example, denote rotations or translations of input images. Formally, these may be viewed as a Lie algebra..."

Section 3 is a bit long and unclear. I think the main takeaway is that the model learns reasonable transformations corresponding to shifts and rotations. I am unsure why the comparisons to PCA and K-means are important. Perhaps this section could be shortened or streamlined, or, if it does not deserve its own section, absorbed into Section 2 or 4.

Section 4 is the most confusing to me and could be clarified. Lines 256-262 describe a model very informally (no equations), which requires the reader to do a lot of work to parse and understand. It is not clear to me how mechanisms like back-propagating action potentials could support the Hebbian learning rule without interfering with signals flowing from synapse to soma. Making more explicit references to equation 18 would help here (it took me a while to figure out that this was the relevant expression).
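
To illustrate the kind of intuition requested above for Section 2.1 (a small set of linear transformations that map an image to its transformed version), here is a minimal sketch. It is not the authors' model: it assumes a 1D periodic "image" and a hand-written central-difference matrix standing in for a translation generator, and the names `central_difference_generator` and `first_order_shift` are hypothetical. It only shows that, for a small shift, a single linear operator predicts the next frame much better than assuming the frame is unchanged.

```python
import math

def central_difference_generator(n):
    """n x n matrix approximating d/ds on a periodic grid with unit spacing.
    Assumption: this stands in for a translation generator A; it is not learned."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = 0.5
        A[i][(i - 1) % n] = -0.5
    return A

def matvec(A, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def first_order_shift(x, A, theta):
    """First-order prediction of x shifted by theta pixels: x - theta * A x."""
    Ax = matvec(A, x)
    return [xi - theta * axi for xi, axi in zip(x, Ax)]

def rms(u, v):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

n = 64
theta = 0.3  # small sub-pixel shift (illustrative value)

def signal(s):
    # Smooth periodic test pattern (illustrative choice).
    return math.sin(2 * math.pi * s / n) + 0.5 * math.cos(4 * math.pi * s / n)

x_now = [signal(i) for i in range(n)]
x_next = [signal(i - theta) for i in range(n)]  # same pattern moved by theta

A = central_difference_generator(n)
x_pred = first_order_shift(x_now, A, theta)

err_static = rms(x_now, x_next)   # error if we predict "no change"
err_linear = rms(x_pred, x_next)  # error of the linear prediction
print(err_static, err_linear)
```

The linear prediction is only a first-order approximation, so it degrades as `theta` grows or as the pattern becomes less smooth.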

As far as I know, the model is novel. As was recognized earlier by Rao & Ruderman, Lie groups are an elegant way to think about visual invariance, and this model appears to be the first attempt to specify how the brain might learn to approximate such a representation. One thing that I felt was lacking was a discussion of why Lie groups are useful at a conceptual level. The authors offer a very brief technical introduction and then jump right into the learning model.

Its significance as a biological model is a bit hard for me to ascertain, because the authors don't really evaluate it against benchmark physiological data, apart from a few stylized facts. What one would want to see is that known receptive field properties of these neurons emerge from this learning rule. For example, in their discussion of mammalian pyramidal cells, the authors argue that known properties of these neurons would allow them to carry out these computations, but I didn't see any discussion of how the representations posited by their model offer a better account of cortical tuning properties than any alternative model. In the discussion of the Hassenstein-Reichardt detector, it was interesting to see that the model could capture the dependence on three pixels (although this confused me because neurons don't get information about pixels in the real world). But I would have liked to see that the tuning functions for this model actually account for fly motion detector tuning functions better than the Hassenstein-Reichardt detector or any other model (surely there are more recent ones?).

Minor comments:

p. 5: "Now we have identified" -> "Now that we have identified"

------------

Post-rebuttal comments:

The responses to my comments were thoughtful. In terms of model comparison, I was looking for comparisons of how well the models fit data (which data does this model explain that other models don't?). The response did not seem to provide that. Instead, the authors showed that their model was more robust than another model, which doesn't say anything about fit to empirical data. Overall, I've decided to increase my rating by one point.

Quality: I found the notion of similarity-preserving transformations interesting, as well as the posited learning rules. In terms of normative modeling, I found it somewhat odd that the authors considered only the case of K=2 filters (line 212). Given a reasonable set of natural movie statistics, it would be interesting to see what filters are generated by larger values of K, and whether they correspond to those seen in fly vision.

Clarity: I found the paper very dense to read and difficult to understand. While some of this is necessary because of the mathematical machinery involved, I have the sense that the clarity could be improved by putting some of the intuition first. For example, after going through all the math of Section 2, one of the core results is given by Figure 2. After thinking it through, I found the core idea that translations are good at approximating future images from past images very intuitive, but during the process of reading the paper and working through the math, very little of that intuition came across. This might be a matter of taste, but I’d consider myself a fairly mathematical person and I still had a lot of trouble working my way through all the formalism.

Originality and Significance: I would classify this work as original, and possibly significant. How receptive fields form through a combination of local learning rules and natural images is an interesting area of research, and the authors make some progress towards positing plausible objective functions as well as local learning rules.