NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, Vancouver Convention Center
Paper ID: 7726
Title: Combining Generative and Discriminative Models for Hybrid Inference

Reviewer 1

This paper is very well written, providing a clear description of the proposed method and a compelling argument for why it would be useful. The experiments nicely demonstrate the benefits of this method (combining the strengths of generative models with those of deep learning for approximate inference). I really liked this paper. There is some related work in this field; as far as I know, the authors do a good job of putting this work in context and citing relevant papers. I believe this work is both original and highly significant.

Reviewer 2

Overall this is a nice idea: black-box models are used to amortize the residuals left by inference under a linearized approximation of the model. I found the experiments to be well organized, albeit mostly on small-scale/synthetic data.

Summary: This paper introduces a procedure for combining graph neural networks with traditional methods for probabilistic inference (instantiated here in hidden Markov models). When the dynamics of the HMM are linear, inference is exact. For nonlinear dynamics, when we have access to the functional form of the true dynamics of the state space model, we can linearize the transition and emission functions (via a Taylor expansion) and represent them as matrices. Using the transition and emission matrices, we can obtain closed-form update rules for the latent variables given the observations; these form the "GM-messages". However, since the (potentially nonlinear) model has been linearized, there may be a gap between the true posterior and the one representable by GM-messages. To close this gap, the paper models the residuals of the linearized model with a graph neural network (specifically a combination of a GRU and a few MLPs). The combination of the two is used to compute the updates to the latent states (a minimal sketch of this hybrid update is given after the comments below).

The method is compared to (a) Kalman smoothing, (b) a variant of the method that only uses message passing, and (c) a variant that only uses the GNN messages, on three datasets: (i) a linear model, where the number of samples needed to match the true states is tracked -- the key finding is that the hybrid model is capable of finding the optimal posterior distribution; (ii) a nonlinear Lorenz attractor simulation, where the MSE between the true and inferred states is tracked -- here the hybrid model attains the lowest MSE; (iii) a robotics dataset (NCLT), where the goal is to infer the robot's true position from noisy GPS measurements -- here too, the hybrid model attains the lowest MSE.

Comments:
(1) One way to improve the manuscript would be to study how the proposed method scales with the dimensionality of the observations, as well as how the sample complexity grows as the true transition/emission functions move further away from being linear.
(2) One limitation of the work that could be made more explicit is that practitioners need to know the functional form of the transition and emission functions a priori in order to estimate the matrices needed to initialize the GM-messages.
(3) I think the experimental section could also be improved by adding a comparison to Unscented Kalman Filters. Unlike EKFs, which rely on a linearization of the nonlinear model functions, UKFs propagate a set of sigma points through the nonlinearity to obtain higher-fidelity fits to the latent variables (see the short unscented-transform sketch further below). It would be worthwhile to incorporate this baseline into the results on the Lorenz sequences as well as the NCLT dataset.
(4) Missing citation to related work: http://proceedings.mlr.press/v80/marino18a/marino18a.pdf
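To make the hybrid update described in the summary concrete, here is a minimal sketch (not the authors' code): iterative GM-message (gradient) updates on the latent states under a linear(ized) Gaussian state space model, with a learned residual correction added to each message. The GNN is replaced by a tiny untrained MLP purely as a placeholder, and all names and values (F, H, Q, R, gamma, the iteration counts) are illustrative assumptions rather than details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
T, dx, dy = 50, 2, 2

# Linear(ized) model: x_{t+1} = F x_t + w,  y_t = H x_t + v  (illustrative values)
F = np.array([[1.0, 0.1],
              [0.0, 1.0]])
H = np.eye(dy, dx)
Q = 0.05 * np.eye(dx)            # process noise covariance
R = 0.5 * np.eye(dy)             # observation noise covariance
Qi, Ri = np.linalg.inv(Q), np.linalg.inv(R)

# Simulate a trajectory and noisy observations
x_true = np.zeros((T, dx))
for t in range(1, T):
    x_true[t] = F @ x_true[t - 1] + rng.multivariate_normal(np.zeros(dx), Q)
y = x_true @ H.T + rng.multivariate_normal(np.zeros(dy), R, size=T)

def gm_message(x, t):
    """Gradient of the joint log-density w.r.t. x_t: the 'GM-message'."""
    g = H.T @ Ri @ (y[t] - H @ x[t])              # evidence from y_t
    if t > 0:
        g -= Qi @ (x[t] - F @ x[t - 1])           # message from x_{t-1}
    if t < T - 1:
        g += F.T @ Qi @ (x[t + 1] - F @ x[t])     # message from x_{t+1}
    return g

# Stand-in for the learned (GNN) residual: a tiny *untrained* MLP acting on
# the local GM-message. In the paper this is a trained graph neural network.
W1 = 0.01 * rng.standard_normal((8, dx))
W2 = 0.01 * rng.standard_normal((dx, 8))
def learned_residual(msg):
    return W2 @ np.tanh(W1 @ msg)

# Hybrid iterative inference: x_t <- x_t + gamma * (GM-message + residual)
x_est, gamma = np.zeros((T, dx)), 0.01
for _ in range(500):
    for t in range(T):
        m = gm_message(x_est, t)
        x_est[t] += gamma * (m + learned_residual(m))

print("smoothing MSE:", np.mean((x_est - x_true) ** 2))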

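Regarding comment (3): the unscented-transform step that distinguishes a UKF from an EKF can be sketched in a few lines. This is a generic textbook version, not tied to the paper; the scaling parameters and the example nonlinearity f are illustrative assumptions.

import numpy as np

def unscented_transform(m, P, f, alpha=1.0, beta=2.0, kappa=1.0):
    """Propagate a Gaussian N(m, P) through a nonlinearity f via sigma points
    (instead of linearizing f, as an EKF would)."""
    n = m.shape[0]
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)          # scaled matrix square root
    sigma = np.vstack([m, m + S.T, m - S.T])       # the 2n+1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    Y = np.array([f(s) for s in sigma])            # push each point through f
    my = wm @ Y                                    # transformed mean
    d = Y - my
    Py = (wc[:, None] * d).T @ d                   # transformed covariance
    return my, Py

# Illustrative use with an arbitrary nonlinearity (not the paper's dynamics)
f = lambda x: np.array([np.sin(x[0]) * x[1], x[0] ** 2])
m, P = np.zeros(2), 0.1 * np.eye(2)
print(unscented_transform(m, P, f))
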
Reviewer 3

The idea of the paper is straightforward: use a graph neural network as a residual component that corrects the messages from a Gaussian graphical model. The proposed model is simple but effective. My main concern is the experimental part. In the experiments, the dynamics are simple and fixed. The authors may want to train the model on a set of different dynamics (with the same structure but different parameters) and then test it on dynamics with the same structure but yet other parameters (obviously, the parameters can be provided to the model as conditioning variables). That would make the paper more convincing.

===== After Rebuttal =====
The authors' response resolves my concern, and thus I update my score.