Paper ID: 1157
Title: Spectral methods for neural characterization using generalized quadratic models
Reviews

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper makes two contributions. First, it continues a trend of recent work by Park and Pillow (2011) and Ramirez & Paninski (2012) to provide a firm grounding of the spike-triggered methods STA and STC within forward, likelihood-based modelling. STA and STC are calculated as moments from the stimulus/response ensemble, and are popular for constructing stimulus/response models due to their speed and simplicity. However, as explored extensively in previous work (e.g. Paninski 2003, Sharpee et al 2004, Samengo & Gollisch 2013), the validity and utility of these moments depend on particular restrictive conditions on the stimulus set. The prior work of Park & Pillow (2011) was able to shed light on this by relating these moments to the parameters of a Generalised Linear Model (GLM), a popular forward model for neural responses. While GLM parameter estimation typically requires log-likelihood maximisation over a full dataset (a potentially slow operation), if one instead maximises the expected log-likelihood (EL) under the stimulus distribution, the STA/STC moments provide a fast and asymptotically consistent means to estimate the GLM. Here, the authors extend this work by asking how moments such as the STA and STC can assist in the estimation of a more complex forward model -- the Generalised Quadratic Model (GQM) -- via the EL framework. In doing so, the paper makes its second contribution: it introduces the GQM as a new, general class of models for characterising neural responses. The GQM extends the popular GLM by allowing for quadratic (rather than just linear) relationships between stimulus and response (together with a point nonlinearity and exponential-family noise).
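For concreteness, here is a minimal sketch of the GQM response model as summarised above (the exponential nonlinearity, Poisson noise, and variable names are illustrative assumptions, not necessarily the paper's exact parameterisation):

    import numpy as np

    def gqm_rate(x, C, b, a, f=np.exp):
        # Conditional rate of a GQM: f(x' C x + b' x + a).
        # x : stimulus vector (d,); C : symmetric quadratic filter (d, d);
        # b : linear filter (d,); a : scalar offset; f : point nonlinearity.
        return f(x @ C @ x + b @ x + a)

    # Illustrative simulation with Gaussian stimuli and Poisson spiking.
    rng = np.random.default_rng(0)
    d, n = 20, 5000
    X = rng.standard_normal((n, d))        # one stimulus per row
    C = 0.05 * np.eye(d)                   # toy quadratic filter
    b = rng.standard_normal(d) / d         # toy linear filter
    y = rng.poisson([gqm_rate(x, C, b, a=-1.0) for x in X])

Setting C to zero recovers the GLM as a special case.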

For Gaussian likelihoods, the authors derive moment-based estimators for the GQM under the conditions of Gaussian-distributed stimuli, and axis-symmetric stimuli. A similar derivation is provided for Poisson likelihoods under a set of assumptions for the stimulus distribution. Low-rank estimates of the parameters are also shown to be asymptotically consistent. Finally, the authors demonstrate an application of these results to real neural data (intracellular voltage and spiking cells), demonstrating a performance gain of the GQM over the GLM. They also show a synthetic example where the vast speed improvements from using moments, as opposed to maximising the likelihood over a full dataset, are evident.

The paper is technically sound, giving thorough derivations of the relationships between the GQM and stimulus/response moments under the set of conditions studied. While this aspect of the work is technical in nature, its significance -- as with the recent work on this topic raised above -- lies in establishing the theoretical groundwork that underlies a common set of analytical techniques. These results also build incrementally upon these previous papers, although the relationship between the exponentiated-quadratic model presented in Park & Pillow (2011) section 2.2, and that given here in section 3.2, should be made clearer. The GQM may generally be a promising tool for neural characterisation, although this paper does not provide a sufficiently thorough evaluation of realistic data requirements to properly assess this.

This paper, in its current form, does fall a little short in its clarity. Part of my confusion in reading it stems from the fact that half of the time the paper is selling the GQM, and half of the time the paper is demonstrating the deeper relationships between moment-based estimators and the GQM parameters. In addition, the title seems to emphasise spectral methods (I assume this refers to the dimensionality reduction in Section 4), but this only figures as a small part of the paper. Because of the jumps in focus between the EL approach in section 3, dimensionality reduction in section 4, and the GQM demo in section 5, it took me a number of reads just to get a handle on the narrative of the paper. Some bracketing commentary at the start and the end of each section to ground the reader would really be helpful. Also, section 3.3 should really be a part of section 4.

Some minor comments:

- line 207: E[x_i . x_j^3] is zero. This should be E[x_i . x_j^2] or E[x_j^3] or something like that in order to be a third moment.

- line 209: what happens when the stimulus is not white? Also, an explanation of how the general fourth-order moments at the end of the expression for Lambda (line 204) reduce to the matrix M = E[x_i^2 . x_j^2] would be helpful (I admit I didn't follow this logic).

- lines 283-292: Typesetting -- mu_x has the subscript in math italics on line 283, but bold in (17) and on line 291.

- Section 4. I'm not sure why the notation here was changed from vector (bold) x to matrix (bold) X, and scalar (italic non-bold) y to matrix (bold) Y. It makes this section needlessly confusing (X and Y are matrices now?) and harder to connect with the work in previous sections. Surely the authors could stick with vector x and scalar y.

- Fig 4: r^2 = 0.55 in the figure legend, but r^2 = 0.50 in the panel.

- line 122: unfinished sentence
Q2: Please summarize your review in 1-2 sentences
The proposed model, the GQM, is a natural extension of the GLM, and connects very well with moment-based estimators (e.g. the STA/STC) via the EL framework. This is a technically sound paper.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents an extension of the widely-used GLM characterization of neural firing to a Generalized Quadratic Model (GQM). The authors show how to fit the GQM using expected log-likelihoods, and apply it to real neural data. The paper is generally clear and well-written.

This is an interesting extension of the GLM and, as far as I can tell, the fitting methods are solid. In trying to evaluate the impact of this paper, it's unclear to me whether the improvement of the GQM over the GLM shown in the paper actually matters in practice. In other words, is this an incremental improvement, or will the GQM allow us to answer scientific questions that were not possible to answer using the GLM? The first data example (intracellular) uses only a short snippet of data and doesn't compare to the GLM. The second data example (extracellular) shows an improvement in r^2 from 44% to 58%, but it's unclear to me whether this is a small or large improvement. This also seems to be a small dataset (167 trials, and the length of each trial is not stated).

I had some difficulties understanding Section 4:
- 'We propose to use...space basis': It's unclear to me why it makes sense to make this choice
- 'span precisely the same space': I'm having trouble seeing this
- Should the moment-based method be used instead of the GQM or as part of the GQM? Should there be some performance comparison between the moment-based method and the GQM on real data?

Minor comments:
- p.3: 'However, the low-rank' is left hanging
- p.5: some of the equation references in the paragraph after equation (14) seem incorrect
Q2: Please summarize your review in 1-2 sentences
This work extends the widely-used GLM method for neural characterization. The paper is technically solid, but its benefits over existing methods are not entirely clear.

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

The paper extends previous work on approximate maximum likelihood-based fitting of GLMs to the more general class of Generalized Quadratic Models. While conceptually similar to multi-filter spike-triggered covariance approaches, the GQM nicely inherits spike-history terms from the GLM. The main contribution of this work is in presenting computationally efficient fitting methods based on optimizing the so-called "expected likelihood" function (an approximate version of the true likelihood function). This paper both establishes the GQM as a good model for neural data, and presents a new and efficient fitting procedure.
Previously, [Park and Pillow 2011] and [Ramirez and Paninski 2013] very nicely described and evaluated the expected likelihood framework for estimating GLM parameters. This paper nicely extends their results to GQMs, which is a valuable contribution, although the performance of the different algorithm/model combinations is poorly explored in the results section.
The paper has parts that are very well written, but feels a bit sloppy/hurried in others. In particular, the Results, figures, and comparisons could be organized better. There is much mixing of methods and results, which would be fine if it were handled a bit better. For instance, Fig 2 is described in Section 3.1, and Fig 3 in Section 3.3. I guess the idea was to put real data in the Results section and simulations in the Methods, but it makes for a tough read, as the simulations and figures are not as well described.
Minor points:
Line 122 is incomplete. What does the rest of this sentence say? I'm dying to know!
Line 135 mentions but does not define the form of A.
Line 174-180. Perhaps \tilde{a}_ml (and friends) would be better named a_MEL for maximum expected likelihood as in [Ramirez and Paninski 2013]?
Line 255. Where can we see the filter referred to here?
Figure 3 (right). Why do the MELE curves go down? Why does the optimization take *less* time for more samples?
Section 5.1. Why is a "predominantly linear" V1 neuron being fit with a generalized *quadratic* model? I can appreciate that there are still quadratic components, but I can't really tell how important they are. And there does not appear to be a GLM for comparison. Perhaps a V1 complex cell would be a better test? And how well does the GQM fit using exact ML perform, since the stimulus does not actually come from a Gaussian distribution?
Section 5.2. Stimulus history filters suddenly make an appearance here. How are they fit? Surely not by the methods described earlier, which only apply to Gaussian or axis-symmetric input distributions?
Q2: Please summarize your review in 1-2 sentences
The "expected likelihood"-based fitting procedure for GLMs is extended to Generalized Quadratic Models under the assumption of axis-symmetric stimulus distributions and Gaussian or Poisson noise models. An important contribution; however, the evaluation is a bit sparse.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their careful reading of our manuscript and many useful suggestions. We first address some points raised by all reviewers, and then consider the concerns of Reviewers 5 and 6 separately.

* Our theoretical contributions:

In our view, there are four primary contributions:

1. We introduce a moment-based method (analogous to STC) for dimensionality reduction for analog data (e.g., membrane potential signals).

2. We provide a model-based, unifying framework for dimensionality reduction for both spiking and analog data.

3. We clarify the link between moment-based dimensionality reduction and ML estimation under GQM models of the data.

4. We derive maximum ELL estimators for a broad class of stimulus distributions.

* On significance:

The GQM is already becoming popular in the literature: e.g., [McFarland, Cui and Butts 2013] and [Rajan, Marre, and Tkacik 2013] are using the model, not to mention the long-standing popularity of second-order Volterra models in neuroscience. We believe that the details of the GQM and our spectral estimation procedures (with their vast speed improvements over other techniques) will impact both theoreticians and practitioners.

* Writing and organization issues:

We apologize to all reviewers for deficiencies in the paper's writing and organization. We agree with many of the reviewers' comments and plan to rewrite and reorganize the manuscript accordingly.

In particular, we will:
* Generally improve flow and reinforce the narrative of the text
* Clean up the notation, and make it consistent
* Merge the current Section 3.3 into Section 4
* Place Fig 2 and Fig 3 nearer their discussions in the text, and expand their captions

* Comparison between GLM and GQM for V1 data:

A common concern of the reviewers is that the results section does not provide a systematic comparison between the GQM and the GLM. We agree that this comparison should be included in Fig. 4 and its discussion. For the V1 data, the GQM systematically provides a better fit than the GLM (r^2 = 55% for the GQM vs. 50% for the GLM in the V1 example shown).
In addition, we note that the GQM captures the highly skewed distributions evident in actual membrane potential (Vm) recordings, despite symmetric stimulation, whereas the GLM can only predict a symmetric Vm distribution from symmetric stimulation. This is particularly important for Vm distributions because the skewed portion of the distribution drives action potential generation, and thus communication to downstream targets, with greatest efficacy.
Similar observations can be made for the RGC data.

Reviewer 5:

We apologize for ambiguities that may have made it difficult to read Section 4. To clarify:

* 'We propose... space basis': We propose to use the first moment and the eigenvectors of the whitened second moment to estimate a basis of the feature space. This is similar to the technique used in STA/STC analysis, except that in our case Y need not take positive integer values. The remainder of the section discusses why the eigenvectors of the whitened covariance span the feature space (a code sketch of this procedure is given at the end of this response).

* 'span precisely the same space': Our argument is that the moment-based estimators derived earlier all arise from estimates of the quantity E[YXX^T]. Critically, however, Y was previously assumed to arise from a quadratic nonlinearity, whereas the nonlinearity f in Section 4 is allowed to be arbitrary. We mistakenly wrote y = f(beta^T x); it should have been "Y has mean f(beta^T x) and finite variance", which includes the Gaussian and Poisson noise cases. In fact, this argument is more general than necessary for the GQM family. The response-weighted mean and covariance are ("asymptotic") sufficient statistics; to our knowledge, this has not been pointed out in the literature. We will clarify this.

* 'Should the moment-based method be used instead of the GQM or as part of the GQM?': Moment-based methods are one means by which the GQM parameters may be estimated, via the expected log-likelihood (ELL). In fact, we use the moment-based methods to initialize the ML fit of the GQM for Fig 5 (see also our response to Reviewer 6).
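To make the first point above concrete, here is a minimal sketch of the proposed moment-based subspace estimate, assuming zero-mean Gaussian stimuli; the whitening step, eigenvalue-selection rule, and variable names are illustrative rather than the paper's exact procedure:

    import numpy as np

    def moment_subspace(X, y, k):
        # X : (n, d) stimuli; y : (n,) responses (spike counts or analog values);
        # k : number of eigenvector dimensions to keep.
        n, d = X.shape
        # Whiten the stimuli by their covariance (assumes zero-mean stimuli).
        L = np.linalg.cholesky(X.T @ X / n)
        Z = np.linalg.solve(L, X.T).T
        # Response-weighted first and second moments (STA/STC-like, but y need
        # not be a spike count).
        mu = (y @ Z) / y.sum()
        M = (Z * y[:, None]).T @ Z / y.sum()
        # Eigenvectors of the whitened second moment whose eigenvalues differ
        # most from 1, together with mu, are taken to span the feature space.
        evals, evecs = np.linalg.eigh(M)
        order = np.argsort(np.abs(evals - 1))[::-1]
        basis = np.column_stack([mu] + [evecs[:, i] for i in order[:k]])
        # Map the basis back to the original (unwhitened) stimulus coordinates.
        return np.linalg.solve(L.T, basis)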

Reviewer 6:

Reviewer 6 is correct that the expected log-likelihood trick doesn't apply to the spike-history-dependent filters, since we do not control the spike history and do not have an analytic description of its distribution. We would like to clarify that we consider these two aspects to be separate contributions: (1) ELL for a GQM without spike history (offering novel moment-based formulas that confer a massive speedup over exact ML); (2) incorporating quadratic spike history into a GQM-style model. To the best of our knowledge, both contributions are novel. The ELL-GQM (without spike history) was used only as an initialization for the ML estimate of the GQM model (with spike history) shown in Fig 5. We apologize for not being clearer. In our experience, adding spike history changes the temporal shape of the filters but leaves the spatial weighting more or less intact, so the moment-based initialization provides a substantial speedup when optimizing the model with spike history.
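To illustrate this two-stage strategy, here is a minimal sketch of refining a moment-based (ELL) estimate by exact ML once spike-history terms are added; the exponential nonlinearity, parameter layout, and function names are hypothetical placeholders rather than our implementation:

    import numpy as np
    from scipy.optimize import minimize

    def fit_gqm_with_history(X, y, H, C0, b0, a0):
        # X : (n, d) stimuli; y : (n,) spike counts; H : (n, p) spike-history covariates.
        # (C0, b0, a0) : moment-based (ELL) estimates of the stimulus-driven terms.
        # History weights start at zero; f = exp and Poisson noise are assumed.
        n, d = X.shape
        p = H.shape[1]

        def unpack(theta):
            C = theta[:d * d].reshape(d, d)
            b = theta[d * d:d * d + d]
            a = theta[d * d + d]
            h = theta[d * d + d + 1:]
            return C, b, a, h

        def negloglik(theta):
            C, b, a, h = unpack(theta)
            z = np.einsum('ni,ij,nj->n', X, C, X) + X @ b + a + H @ h
            return np.sum(np.exp(z) - y * z)   # Poisson NLL up to a constant

        theta0 = np.concatenate([C0.ravel(), b0, [a0], np.zeros(p)])
        res = minimize(negloglik, theta0, method='L-BFGS-B')
        return unpack(res.x)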