
Submitted by
Assigned_Reviewer_2
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents an elegant nonparametric
generalization of widely-used parametric neural population models. This
generalization is able to capture statistical patterns that diverge from
the parametric model predictions. Interestingly, they also show that two
apparently different models (Ising and cascaded logistic) are equivalent
to each other under certain conditions.
QUALITY: The models and
analyses are rigorous and the experiments seem to be fairly thorough.
 My main substantive comment is that it seems weird to model the
firing patterns of a neural population without taking into account the
inputs. For example, the data analyzed in section 5 reflect responses to
gabors that vary in orientation, but the dependence on orientation is not
modeled. I realize that this may be conventional in this literature, but
it still seems wrong to me. Should cite something for the Dirichlet
process. p. 4: Is Eq. 11 correct? p(x_i = 0 | x_{1:i-1}) doesn't equal 0
but rather 1 - p(x_i = 1 | x_{1:i-1}).
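For reference, the complement identity being invoked here, which is also what makes each conditional in a cascaded factorization normalize, is:

```latex
p(x_i = 0 \mid x_{1:i-1}) \;=\; 1 - p(x_i = 1 \mid x_{1:i-1}),
\qquad\text{so}\qquad
\sum_{x_i \in \{0,1\}} p(x_i \mid x_{1:i-1}) = 1 .
```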
CLARITY: The paper is clearly
written.
ORIGINALITY: The model and results in this paper are
original.
SIGNIFICANCE: This paper will be of significance to
neuroscientists working on modeling neural populations.
MINOR
COMMENTS: p. 3: "for which we" -> "when we". p. 4: the Ising
model's partition function does have a closed form (since it's discrete);
it's just usually intractable. Q2: Please summarize your
review in 1-2 sentences
A well-written paper with some interesting theoretical
results. Of interest mainly to neuroscientists working on neural
population modeling. Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
Review for #1163 “Universal models for binary spike
patterns using centered Dirichlet processes.”
The goal of this
paper is to provide a more accurate method for modeling the distribution
of binary spike patterns over a population of neurons. Essentially what
the authors are trying to do is improve upon parametric models of the
pattern distribution (such as a Bernoulli, cascaded logistic or Ising
model) by allowing for deviations from the parametric model (or base
model) if they are justified by the data. This involves postulating a
Dirichlet process centered upon the base model and fitting the parameters
of the base model and the concentration parameter of the Dirichlet process
via gradient ascent (although I imagine other methods could be used for
fitting). Intuitively this constitutes fitting a type of weighted average
between the probability distribution of the base model and the pattern
probabilities estimated by counting alone. Thus the base model provides a
smoothing (over the naïve, counting-based maximum likelihood estimate) and
the weighting with the counting-based estimate allows for deviations from
this distribution if they are strong enough in the data. The authors
discuss several tractable base models (Bernoulli and cascaded logistic)
and then proceed to demonstrate that their method optimally captures (when
compared to the base models alone and also counting-based estimates) the
pattern probability distribution for both simulated data and for a
population of 10 V1 neurons.
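This weighted-average reading can be sketched numerically. The toy below (hypothetical pattern counts and base probabilities, not the paper's actual fitting code) shows a Dirichlet-style posterior predictive interpolating between the empirical frequencies and the base model as the concentration parameter varies:

```python
import numpy as np

def dp_predictive(counts, base_probs, alpha):
    """Posterior predictive of a Dirichlet prior centered on a base
    measure: a weighted average of empirical pattern frequencies and
    base-model pattern probabilities, with mixing set by alpha."""
    n = counts.sum()
    return (counts + alpha * base_probs) / (n + alpha)

# Toy example: 4 patterns from 2 binary neurons; hypothetical numbers.
counts = np.array([30.0, 5.0, 5.0, 10.0])   # observed pattern counts
base = np.array([0.49, 0.21, 0.21, 0.09])   # base-model probabilities

near_empirical = dp_predictive(counts, base, alpha=1.0)   # counts dominate
near_base = dp_predictive(counts, base, alpha=1e6)        # base dominates
```

Small alpha leaves the estimate close to the raw "histogram" of counts; large alpha smooths it back toward the base model.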
I enjoyed reading this paper. It was
very clear what the authors were attempting to do and estimating pattern
probability distributions that deviate from standard parametric models
(Ising, GLM etc.) is an important goal. The formalism itself seems to be
rather standard (very similar to Chapter 25 of Murphy’s Machine Learning
book) but I think the application is new, or at least I’ve never seen
anyone apply such methods to neural data before. In this vein, I would
have liked to have seen more application to real data … but I do
understand the space constraints of a NIPS paper. If the authors can, in
the final version, include another real data example I think that would
make their paper more appealing … but I won't insist upon this; it's just a
recommendation.
I don’t have any major corrections to the paper. I
would recommend some minor additions to the introduction or discussion (a
sentence or two would suffice) motivating what the authors' method is
useful for. If I understand the paper correctly, it seems to me that the
main utility of this work is in identifying the existence of deviations
from the base model as opposed to explaining the source of those
deviations. Parametric base models have explanatory power, for example a
well-fit Ising model indicates that second-order correlations are
sufficient to describe the population’s spiking statistics. The Dirichlet
process, being nonparametric, really doesn’t tell you why the
distribution deviates from the base model … just that it does. If I’m
correct here, I think the authors should put a sentence or two in the
discussion about these points.
There are also a couple of typos, so a
proofreading pass would be good. The most notable is that the last
sentence of the introduction ends mid-sentence.
In summary, while the
mathematical formalism isn’t new, the application is novel and
interesting. I think the paper will be of use to the computational
neuroscience community and should be accepted as a NIPS paper subject to
some minor edits and proofreading.
Q2: Please summarize your review in 1-2
sentences
A nice clear presentation of a method for determining
if the probability distribution of spike patterns (across a neural
population) differs from a parametric "base" model. The method uses a
standard Dirichlet process centered upon neurophysiological base models to
generalize these models if the deviations from the base are strong enough
in the data. While the mathematics are not exactly new, the application is
new to my knowledge and quite interesting. This paper should be accepted
with minor edits. Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper combines two main ideas: (1) an interesting
proposal for a new nonparametric spike train model, and (2) an equivalence
between certain logistic regression models and Ising (maximum entropy)
spike train models. The two pieces don't entirely hang together, and
both parts could be fleshed out some more; the paper as a whole feels like
two not-quite-complete papers that have been shoved together to make a
longer paper.
There are also a number of typos that should be
corrected.
That aside, some more substantive comments. Re: the
first part: I agree about the importance of introducing new, more
flexible spike train models. It's clear that there's more to life
than logistic regression models and Ising models. This paper takes a
good step in the right direction, and demonstrates that incorporating a
nonparametric component is useful. (It would be worth citing Sam Behseta
here, who has done some relevant work in this area recently.) That said,
the authors don't manage to go beyond this and actually show that the new
model can do something new for us, or lead to some novel insight about
population coding. But maybe that's too much to expect in a nips paper. It
should also be noted that there are many alternative ways of moving beyond
the simple logistictype models. Dirichlet mixtures of logistics would be
another possible route. Some more discussion about alternative
possibilities would be useful.
Minor: is the predictive
distribution (5) ever used? If not, it doesn't seem to add much here.
More major: the authors propose a kind of empirical Bayes
approach for estimating the model: they optimize the marginal likelihood
of the data as a function of the base measure parameters \alpha and
\theta. I can't quite tell if the posterior estimated in this way will be
consistent, in the sense that p(X | \hat{\alpha}, G_{\hat{\theta}})
converges to a delta function at the true data-generating
distribution, where \hat{\alpha} and \hat{\theta} denote the parameters
estimated via the maximum marginal likelihood procedure described in sec
2.1. It would be great if the authors could clarify this. (Clearly the
statement is true for fixed \alpha and \theta, but what happens when
\alpha and \theta adapt to the data as well?)
A note about
scalability: the authors claim the proposed methods are highly scalable,
but I worry a bit about the nonconvexity of their marginal likelihood.
Certainly if we use a complicated, high-dimensional model for \theta we
can run into trouble here. This argues against the scalability of the
method. Worth discussing a bit.
The cascaded logistic model is
computationally easy because neuron i's rate only depends on the firing of
neurons 1:i-1. So the order in which these neurons are arranged is
important: neuron i has i parameters to adjust, which means that neuron 1
seems much less flexible than neuron m. Maybe there's some reason why the
ordering turns out not to matter here? If not, then how do we choose the
"right" order? This issue needs to be addressed.
Fig. 5 caption: I
didn't understand what was meant by "perform … perfectly" here.
It's unclear what conclusion to draw from sec 5 / fig 6, beyond
"our code can be run on real data, too."
Q2: Please
summarize your review in 1-2 sentences
This paper combines two main ideas: (1) an interesting
proposal for a new nonparametric spike train model, and (2) an equivalence
between certain logistic regression models and Ising (maximum entropy)
spike train models. The two pieces don't entirely hang together, and both
parts could be fleshed out some more; the paper as a whole feels like two
not-quite-complete papers that have been shoved together to make a longer
paper.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank the reviewers for their detailed and
thoughtful comments. We greatly appreciate the time and effort of all
reviewers, and believe their suggestions will substantially improve
our manuscript. We will first briefly address points raised by
Reviewers #2 and #5, and then address the comments of Reviewer #6 in
greater detail.
Reviewer #2: 
Thank you; we
will add references on DP and correct equation #11. Also, we agree
about the importance of stimulus-conditional distributions; our work
follows others (e.g., using the Ising model) that focus primarily on
marginal distributions, though we should point out that we can also
use the model to describe spike patterns conditioned on fixed
(discrete) stimuli, making it suitable (e.g.) for decoding analyses.
Reviewer #5:
Thank you for the enthusiastic
review. We will attempt to add another neural data example and revise
the Discussion as suggested.
Reviewer #6: 
1. Nonparametric and Cascaded-Logistic vs. Ising pieces not hanging
together:
We agree that the comparison between Ising and the
cascaded logistic model could make an independent story. However,
since the UBM isn't scalable without a scalable base measure, we view
the cascaded logistic model as an essential part of the current paper
(see also comment 6). We will make this connection more clear in our
writing.
2. Citation to Behseta's work: thanks, we will add this
and citations to other relevant literature on NP Bayes methods in
neuroscience.
3. Mixture models
Mixture modeling is
definitely a viable alternative for modeling distributions. We are
aware that Bernoulli mixture models have been used in modeling binary
images, for example. We thank the reviewer for the suggestion, and
plan to modify the paper to include discussion along these lines.
4. When do we use predictive distribution (5)?
The scatter
plots and Jensen-Shannon divergences in the results (Figs. 3 to 6) are
computed with the predictive distribution. We will clarify this in the
paper.
5. Does the posterior concentrate with the empirical-Bayes-like
MAP inference procedure on (\alpha, \theta)?
Thank you for
raising this important issue. Since the posterior concentrates on the
true distribution for fixed \theta and \alpha as the reviewer noted,
it should not (in general) perform worse when \theta and \alpha are
adapted to the data. (That is, a random setting of hyperparameters
shouldn't do better than a maximum likelihood setting of the
hyperparameters.) We concede that we haven't proved posterior
concentration, however. For now, we can guarantee only that the
posterior concentrates when \alpha remains finite (since \alpha
determines the total contribution of the base measure, which will be
overwhelmed by the data so long as \alpha is finite).
In
practice, what we observe is that if data truly come from the base
distribution, \alpha runs off to infinity as sample size increases,
leaving a parametric model (which is the correct model in this case).
By contrast, if data come from a distribution not covered by the
parametric base measure, \alpha converges to zero with more data,
resulting in a pure "histogram" model in the limit of infinite
data. In both scenarios, the posterior concentrates on the true
distribution.
6. Scalability: if we use a complicated,
highdimensional model for theta we can run into trouble here.
We have shown only that the UBM is scalable for the Bernoulli and
cascaded-logistic base measures (the latter of which is certainly
high-dimensional, though we don't regard it as complicated due to its
convexity and the ease of fitting and normalizing it). However, the
UBM approach scales for any normalizable base measure with convex
negative loglikelihood. Nonconvexity of the joint likelihood appears
only in the alpha parameter, as shown in the supplement (Eqs. 34 and
35). This means that we can tractably perform a 1D line search for
\alpha (where \theta is generally highdimensional).
7.
Importance of ordering in cascaded logistic model:
We agree this
is an important point. On page 5 we suggested using the Cuthill-McKee
algorithm as a possible solution to the ordering; we actually spent
some time experimenting with orderings based on this algorithm, but
found we obtained almost identically good fits even with random
orderings, and so elected to leave this out of the paper. But we do
plan to explore the issue of how/whether ordering matters in future
work.
8. Fig. 5 caption: I didn't understand what was meant by
"perform ... perfectly" here.
We apologize for our lack of
clarity. We know theoretically that cascaded logistic is in the
correct model class for Fig. 5, and what we intended to say was that
the convergence does not saturate as in other examples.
We
will address these issues in the final manuscript. Thanks again for
the detailed and constructive comments.
 