NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 914
Title: Multivariate Sparse Coding of Nonstationary Covariances with Gaussian Processes

Reviewer 1

The authors provide a model of stochastic processes that aims to capture strong non-stationarities. The model is built from Gaussian processes (GPs) and has multivariate outputs. The algorithms for the Bayesian implementation are carefully discussed. Good results are reported on two real-world applications, to iEEG and to criminology.

Originality: As discussed in the paper, the model is more flexible than previous approaches to modeling non-stationarity. In my opinion, the method and the applications of the paper are therefore original.

Quality: The model is complex and seems to give good numerical results in the applications. The methodology is well developed.

Clarity: I think the clarity could be improved. I would have liked more careful definitions of the various quantities involved, as well as more explanation and discussion. For instance, around equation (3) I did not understand exactly what C is.

Significance: I think the topic is significant and the methods of the paper are useful.

Reviewer 2

Originality: As the paper mentions, there are two approaches to handling non-stationary data with GPs: (1) use a non-stationary covariance function, or (2) partition the input space into local regions and fit a stationary GP to each region. The paper follows the latter. Its novelty is that the probabilistic partition is coupled with ARD, which selects the most relevant data points for each partition member; these points become the training locations for the global GP. The paper lacks comparisons against non-stationary GP models of the first type.

Clarity: I found the presentation very clear and easy to follow. For the sake of comprehension, it would have been ideal to include an example showing the learned partitions, each with its local GP, together with the long-range global GP.

Quality: The paper is technically correct, although some potential issues limit the effectiveness of the proposed model. For instance, the multivariate model is limited to highly correlated signals, since the same underlying global GP is shared by all output variables. A mixture of global GPs could allow more flexibility across signals, as is typically done in multivariate GP models.

Questions:
- Given that the probabilistic model for the partitions is based on observation correlations, how does the model guarantee that each partition is local?
- Do you think it is possible to encode the observation correlations into a Chinese restaurant process in order to determine the number of partitions nonparametrically?

Significance: Given its principled approach and the experimental results, I can see this model being used for SOZ detection, although its scope seems rather limited to applications involving highly correlated signals.
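To make Reviewer 2's second approach (partition the input space, then fit a stationary GP per region) concrete, here is a minimal NumPy sketch in one dimension. It is not the paper's model: the cut points, kernel hyperparameters, and helper names (`se_kernel`, `local_gp_predict`) are hypothetical, the partition is fixed by hand rather than inferred probabilistically, and there is no ARD selection or shared global GP.

```python
import numpy as np


def se_kernel(a, b, lengthscale, variance):
    """Stationary squared-exponential kernel between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def local_gp_predict(x_train, y_train, x_test, boundaries, params, noise=1e-2):
    """Posterior mean of independent stationary GPs, one per input region.

    boundaries: sorted cut points; a point equal to a cut point goes to the
        region on its left (np.searchsorted with the default side="left").
    params: one (lengthscale, variance) pair per region, fixed by hand here.
    noise: observation-noise variance added to the diagonal.
    """
    train_region = np.searchsorted(boundaries, x_train)
    test_region = np.searchsorted(boundaries, x_test)
    mean = np.zeros(len(x_test))
    for r, (ell, var) in enumerate(params):
        tr, te = train_region == r, test_region == r
        if not (tr.any() and te.any()):
            continue  # no data or no queries here: keep the zero prior mean
        K = se_kernel(x_train[tr], x_train[tr], ell, var) + noise * np.eye(tr.sum())
        K_star = se_kernel(x_test[te], x_train[tr], ell, var)
        mean[te] = K_star @ np.linalg.solve(K, y_train[tr])
    return mean


# Usage: a signal that varies slowly for x <= 0 and quickly for x > 0.
x = np.linspace(-3, 3, 120)
y = np.where(x <= 0, np.sin(0.5 * x), np.sin(6 * x))
x_star = np.linspace(-3, 3, 50)
mu = local_gp_predict(x, y, x_star, boundaries=np.array([0.0]),
                      params=[(2.0, 1.0), (0.2, 1.0)])
```

Because each region is fitted independently, the predicted mean can jump at a cut point and no information is shared between regions.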

Reviewer 3

Generally, the idea is clearly conveyed. Although the techniques employed are not new, the modelling is quite intuitive and all derivations are technically sound. However, there is a major problem with the rigor of the related work section. The paper seems to have missed an important work on non-stationary GPs that places GP hyperpriors on the parameters of the squared exponential covariance function (Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo, Heinonen et al., 2016). I believe this work does not belong in the same category as [14] (which the authors claim makes “too strong of a modeling assumption”), because it jointly learns the distribution of the covariance parameters. I feel that this work is relevant and deserves a rigorous comparison with the proposed method in the experiment section. As for [14], the authors have not justified their claim with empirical evidence either. Most comparisons are made against stationary methods, such as a full GP and an ensemble of local stationary GPs with a separate partitioning step. I would like to see evidence that [14] is ineffective in the domain of interest because of its restrictive assumption. It would also be helpful if the authors could show RMSE versus the number of Gibbs sampling iterations to demonstrate the convergence of the proposed method.

Post-rebuttal feedback: Thank you for the response, which has reasonably addressed my concern. I have increased my rating of your work.
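To make the kind of model Reviewer 3 points to more concrete, the sketch below builds a one-dimensional non-stationary squared-exponential covariance of the Gibbs form (the kernel's name is unrelated to Gibbs sampling), in which the lengthscale and amplitude depend on the input. To our understanding, Heinonen et al. (2016) use a kernel of this form with GP priors on the input-dependent parameters and infer them with HMC; here the parameter functions `lengthscale` and `amplitude` are hypothetical fixed choices, so the code shows only the covariance construction, not the inference. With a constant lengthscale ℓ, the kernel reduces to the stationary exp(-(x - x')² / (2ℓ²)).

```python
import numpy as np


def gibbs_kernel(a, b, lengthscale_fn, amplitude_fn):
    """1-D non-stationary squared-exponential kernel of the Gibbs form:

    k(x, x') = s(x) s(x') * sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
                          * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))

    lengthscale_fn (l) and amplitude_fn (s) must return positive arrays.
    """
    la, lb = lengthscale_fn(a)[:, None], lengthscale_fn(b)[None, :]
    sa, sb = amplitude_fn(a)[:, None], amplitude_fn(b)[None, :]
    denom = la ** 2 + lb ** 2
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return sa * sb * np.sqrt(2.0 * la * lb / denom) * np.exp(-sq_dist / denom)


# Hypothetical, hand-picked input-dependent parameters. In a fully Bayesian
# treatment these would themselves receive GP priors and be inferred.
def lengthscale(x):
    return np.exp(-0.5 + 0.8 * np.tanh(x))  # short on the left, long on the right


def amplitude(x):
    return np.ones_like(x)


x = np.linspace(-3, 3, 80)
K = gibbs_kernel(x, x, lengthscale, amplitude)

# Draw one prior sample; a small jitter keeps the Cholesky factorization stable.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
sample = L @ rng.standard_normal(len(x))
```

Plotting several such samples against `x` should show faster variation where the lengthscale is short.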