__ Summary and Contributions__: This paper concerns "motif" finding in multi-electrode spike train data (that is, particular temporal arrangements of spikes from an ordered set of neurons that recur in the data, shrouded by noise, i.e., random spikes). The proposed technique's closest relative is convolutive nonnegative matrix factorization, which "factorizes" the time-binned spike train matrix into a sum of "motif" matrices convolved with their corresponding trigger times, plus noise. The corresponding parameterized optimization problem is straightforward to state.
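For concreteness, the convNMF reconstruction that the paper generalizes can be sketched in a few lines of NumPy (a minimal illustration with my own variable names, not the paper's notation):

```python
import numpy as np

def convnmf_reconstruct(W, H):
    """Reconstruct a time-binned spike matrix from convNMF factors.

    W: (R, N, L) array -- R motif templates over N neurons and L time lags.
    H: (R, T) array   -- nonnegative trigger-time loadings.
    Returns the (N, T) rate estimate  sum_r sum_l W[r, :, l] * H[r, t - l].
    """
    R, N, L = W.shape
    _, T = H.shape
    X = np.zeros((N, T))
    for r in range(R):
        for l in range(L):
            # shift motif slice l of template r by l bins and accumulate
            X[:, l:] += np.outer(W[r, :, l], H[r, : T - l])
    return X
```

The least-squares fit would then minimize ||data - X||^2 over nonnegative W and H.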
The current paper gives convolutive nonnegative matrix factorization a Bayesian flavor: the authors appeal to a Neyman-Scott process as the generative model, for which they perform posterior inference.
The proposed scheme is, first of all, rate-based: the generative model is a hierarchical process in which rate-based motifs come together to assign each neuron a rate function, and at the next stage each neuron fires according to an inhomogeneous Poisson process with that rate. Second, the rates are built out of Gaussian bumps according to each motif. Posterior inference is based on a collapsed Gibbs sampler.
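To make the generative story concrete, the following is a minimal simulation sketch of a Neyman-Scott process of this flavor (a single sequence type, with parameter names of my own choosing; this is not the paper's exact parameterization):

```python
import numpy as np

def simulate_ppseq(T=100.0, n_neurons=5, event_rate=0.05, bg_rate=0.1,
                   amp=3.0, width=0.2, rng=None):
    """Illustrative Neyman-Scott simulation of sequence-structured spikes.

    Latent events arrive as a homogeneous Poisson process on [0, T].
    Each event induces, for every neuron n, a Poisson(amp) number of
    spikes at times N(event_time + delay[n], width^2), i.e. a Gaussian
    "bump" in that neuron's rate.  Background noise spikes are uniform.
    """
    rng = np.random.default_rng(rng)
    delays = np.linspace(0.0, 1.0, n_neurons)          # per-neuron offsets
    events = rng.uniform(0, T, rng.poisson(event_rate * T))
    spikes = []                                        # (neuron, time) pairs
    for t0 in events:                                  # evoked spikes
        for n in range(n_neurons):
            k = rng.poisson(amp)
            spikes += [(n, s) for s in rng.normal(t0 + delays[n], width, k)]
    for n in range(n_neurons):                         # background spikes
        k = rng.poisson(bg_rate * T)
        spikes += [(n, s) for s in rng.uniform(0, T, k)]
    return events, spikes
```

The inference problem is then to recover the latent events (and per-motif shapes) from the pooled, unlabeled spikes alone.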

__ Strengths__: The primary novelty of the paper is the observation that a particular Neyman-Scott process can play the role of generating rate-based motifs in multi-neuron spike data. The construction of the MCMC-based posterior inference draws on prior ideas from mixture models and, although laborious, is not particularly novel.

__ Weaknesses__: The biggest drawback seems to be that the model is over-parameterized, particularly since it relates to the spike train data via a Poisson process. Therefore: (1) there is a possibility of finding spurious motifs when there are none; experiments determining whether the model overfits should be done. (2) By definition, a motif is a motif only when it recurs in the data (otherwise anything could vacuously be a motif); the model does not address this. (3) Number of motifs: comparing the Dirichlet approach against an AIC/BIC-like criterion would help validation.
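To illustrate point (3), a BIC-style selection of the number of motif types could look like the following sketch (the `fit_model` routine is hypothetical and stands in for fitting the model with R motif types and returning the maximized log-likelihood and parameter count):

```python
import numpy as np

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * np.log(n_obs) - 2.0 * log_likelihood

def select_num_motifs(fit_model, n_obs, candidates=range(1, 6)):
    """Pick the candidate motif count R with the lowest BIC score."""
    scores = {R: bic(*fit_model(R), n_obs) for R in candidates}
    return min(scores, key=scores.get)
```

This penalizes the extra parameters that each additional motif introduces, which directly targets the over-parameterization worry above.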

__ Correctness__: yes

__ Clarity__: yes

__ Relation to Prior Work__: yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper formulates a point-process model for neural sequences (called PP-Seq), which allows a sparse and interpretable representation of neural sequences as clusters of spikes with latent sources. The model is extended to include a "time-warping factor", a local scaling of the time axis within each sequence; this addition of extra parameters is feasible due to the sparsity of the model. The paper then develops a sampling-based inference method that is suitable for the model, making use of an annealing procedure for further efficiency. The method is tested on two synthetic datasets and two real datasets, first for basic demonstrations of the method and then for more challenging applications with less stereotyped sequences (with a more pronounced time-warping effect).

__ Strengths__: This is an excellent paper that touches all the important aspects of method development. The model is well-motivated and theoretically sound; a suitable and efficient inference method is proposed; the method is tested using synthetic data with various statistics of interest; finally, the method is applied to real datasets to demonstrate the usefulness, while being compared to a benchmark.
The proposed model (PP-Seq) is motivated as a continuous-time generalization of another well-grounded model for sequence detection, convolutive NMF, but it offers many novel properties that make this new model a valuable contribution. The model provides an intuitive representation of neural sequences as clusters of spikes around a latent source, and its sparse nature enables the addition of the time-warping factor, a core feature of the method as demonstrated in this paper.
This method will have a significant impact on the experimental neuroscience community; the problem of neural sequence detection is itself an important problem in contemporary neuroscience. Moreover, the time-warping factor in the model will allow flexible and interpretable analysis of hippocampal data, where such temporal scaling is commonly observed. The efficiency of the method is also a plus. For the computational neuroscience community, as well, the proposed point-process model will be a useful baseline for the development of other models.

__ Weaknesses__: The choice of priors for the parameters in the model may be tricky when applying this method to a different dataset; if so, a further understanding of the role of priors in the inference would be a topic for further study. But this is a fine point, and I do not see a major weakness in this work.

__ Correctness__: As far as I can see, the methods appear to be carefully developed and clearly described in the paper. The paper also follows good practice of statistical modeling in general, for example selecting hyperparameters through cross-validation and testing the method on synthetic data.

__ Clarity__: The paper is very well written, starting from the backgrounds and building towards the forefronts of the model, balancing technicalities and insights at the right level.

__ Relation to Prior Work__: It is clearly discussed throughout the paper how the current work is motivated based on an existing method (convNMF), and what were the limitations of the previous method that were improved in this work (continuous-time, sparseness, additional features like time-warping). Results from the proposed method are compared to the previous method, demonstrating the differences.

__ Reproducibility__: Yes

__ Additional Feedback__: Great work --- it was a pleasure to read. I just have some minor comments:
- L106, L112: the \Delta notation is not familiar to me; a brief explanation may help.
- Eq 3: the \omega notation (I understand that this is for "warping") is confusing to someone from physics or electrical engineering background, because it feels like it should be a frequency.
- Fig 3E: is the response "offset" the same as the latency?
=======
** Edit ** The author response addressed my question about prior selection; it would be nice to include the discussion in the revised paper as well. I am still enthusiastic about this paper.

__ Summary and Contributions__: The authors present a model for finding sequences of spikes across a population of recorded neurons. The modeling framework (PP-seq) makes clever use of a Neyman-Scott process formulation with marked Poisson processes to define sequences that occur across a set of noisy neurons. The model definition allows the spikes to be partitioned into groups so that inference can be performed with a collapsed Gibbs sampler.

__ Strengths__: The authors provide a thorough derivation of the statistical model and fitting methods and demonstrate how the model is appropriate for finding sequences in neural data. The model is shown to improve upon existing approaches, and it provides a novel take on the problem. The methodology is well-presented and the results are convincing. This method, especially with the time-warping of sequences, is likely to be very valuable to many in the neuroscience community and the work is very fitting for NeurIPS audience.

__ Weaknesses__: One lingering question is how well the Gibbs sampler converges (as measured by metrics such as \hat{R} or recent improvements like split-\hat{R}). If different realizations of the chain end up giving somewhat different results (as the authors allude to, similar types of chains can get stuck), what recommendations can the authors provide for practitioners when using this method?
UPDATE: The authors' response has answered some of my questions here. To better clarify my original comments, I did not expect the Gibbs sampler to converge to exactly the same result each run (and I don't think it's a catastrophic problem, just as it isn't for convNMF). I was more interested in a few best-practices recommendations for experimentalists applying this approach (e.g., multiple runs). I remain enthusiastic about this method.
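For reference, the split-\hat{R} diagnostic mentioned above can be computed on a scalar summary of the chain as follows (a standard Gelman-Rubin-style sketch, not something taken from the paper):

```python
import numpy as np

def split_rhat(chains):
    """Split-R-hat for an (m, n) array of m chains of length n.

    Each chain is split in half, and R-hat compares the between-split
    variance B to the within-split variance W.  Values near 1 are
    consistent with convergence; values well above 1 are not.
    """
    m, n = chains.shape
    half = n // 2
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]])
    m2, n2 = splits.shape
    means = splits.mean(axis=1)
    B = n2 * means.var(ddof=1)                   # between-split variance
    W = splits.var(axis=1, ddof=1).mean()        # within-split variance
    var_plus = (n2 - 1) / n2 * W + B / n2        # pooled variance estimate
    return np.sqrt(var_plus / W)
```

In practice one would run several independent Gibbs chains, track a scalar such as the log joint probability, and report this statistic alongside the fitted motifs.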

__ Correctness__: The claims and methods presented are correct.

__ Clarity__: The paper was very well-written.
Minor: the terms in the Gibbs sampler in Eq. 5 were a bit hard to parse in the paper, but the steps were made clearer in the supplementary material.

__ Relation to Prior Work__: The paper clearly compares to existing methodology.
One minor point: line 46 mentions that convNMF uses a least-squares criterion, and the abstract says this is suboptimal. I agree completely, but it would help to briefly mention here why it's suboptimal.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper proposes a latent process model for spike trains using a Neyman-Scott process. The method extends convNMF to continuous time and performs fully Bayesian inference by MCMC. The paper also proposes to parallelize the MCMC to accelerate inference.
The method was tested on real neural recordings of HVC and hippocampus.

__ Strengths__: The proposed method uses continuous spike times, which addresses the shortcoming of binned spike counts in previous methods and provides a sparse representation. The proposed MCMC scheme enables fully Bayesian inference on the latent sequences.

__ Weaknesses__: I suppose that the number of motif types R and the number of events K are hyperparameters that need to be predetermined before inference. How were these numbers determined in this study?
The authors claim that the proposed parallel MCMC has decent speed. The global parameter Theta, however, does not benefit from the parallel execution. How were the parameters (including the time-warping parameter) learned?
The author feedback addressed my concerns.

__ Correctness__: The claims, method and experiments are correct to the best of my knowledge.

__ Clarity__: The paper is overall well written.
Can you please elaborate on how alpha and beta were adjusted slowly in the annealing procedure in Line 182?
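(For reference, my mental model of "adjusted slowly" is something like a geometric schedule for a positive hyperparameter across Gibbs sweeps, as in the sketch below; I would like to know whether this matches the paper's actual procedure.)

```python
import numpy as np

def geometric_schedule(start, end, n_steps):
    """Geometric interpolation from start to end over n_steps values.

    A common way to 'slowly adjust' a positive hyperparameter during
    annealing (an assumed pattern, not the paper's stated procedure):
    each step multiplies the value by a constant ratio.
    """
    return start * (end / start) ** np.linspace(0.0, 1.0, n_steps)
```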

__ Relation to Prior Work__: It is clearly discussed how this work differs from previous contributions.

__ Reproducibility__: Yes

__ Additional Feedback__: Fig 1A: can you please explain how the 2D latent events were sampled from a homogeneous Poisson process?
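(My own understanding of the textbook recipe, for comparison with the authors' answer: draw a Poisson-distributed count proportional to the region's area, then place that many points i.i.d. uniformly in the region. A sketch:)

```python
import numpy as np

def sample_homogeneous_pp(rate, width, height, rng=None):
    """Sample a homogeneous Poisson process on [0, width] x [0, height].

    Standard construction: the number of points is Poisson(rate * area),
    and conditioned on the count the points are i.i.d. uniform.
    """
    rng = np.random.default_rng(rng)
    k = rng.poisson(rate * width * height)
    return rng.uniform([0.0, 0.0], [width, height], size=(k, 2))
```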