| Paper ID | 2348 |
|---|---|
| Title | Iterative Refinement of the Approximate Posterior for Directed Belief Networks |

This paper introduces an iterative refinement procedure for improving the approximate posterior of a recognition network. It shows that training with the refined posterior is competitive with state-of-the-art methods. The benefit of refinement is also evident in an increased effective sample size, which implies lower-variance gradient estimates. The paper shows that inference need not be carried out entirely by a complex recognition network: iterative refinement can aid inference even with a relatively simple approximate posterior.
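For readers unfamiliar with the effective sample size mentioned above, the following is a minimal sketch of the standard estimator computed from unnormalized log importance weights. The function name and the example weights are illustrative assumptions and are not taken from the paper under review.

```python
import math

def effective_sample_size(log_weights):
    """Standard ESS estimate: (sum w)^2 / sum(w^2), from unnormalized log-weights."""
    # Subtract the maximum before exponentiating for numerical stability;
    # the ratio below is invariant to this rescaling.
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return total ** 2 / sum(x * x for x in w)

# Hypothetical weights: equal weights use every sample,
# while one dominant weight behaves like a single sample.
uniform_ess = effective_sample_size([0.0] * 10)
skewed_ess = effective_sample_size([0.0, -50.0, -50.0, -50.0])
```

An ESS close to the number of samples indicates a proposal close to the target, which is why a higher ESS is usually read as a sign of lower-variance importance-weighted estimates.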

The paper is clear and very well written, and all the steps are provided. We would like a discussion of the time complexity of the approach: as shown in Fig. 2b, the number of refinement steps increases with the epochs. Please give both the empirical and the formal complexity.

2-Confident (read it all; understood it all reasonably well)

A Directed Belief Network is augmented by an iterative refinement procedure for improving the approximate posterior of a recognition network. Training with the refined posterior is competitive with methods that are considered SOTA in this particular field.

I think this is basically a fine paper which contributes to DBN research. However, how relevant is this work for today's deep learning community? DBNs have been largely replaced by other deep learning methods which have produced SOTA results, outperforming DBN-based systems in many domains. Few use DBNs any more; most have switched to more successful (and often older) deep NNs, even where little training data is available. To summarize, the significance of this paper for advanced deep learning seems doubtful.

2-Confident (read it all; understood it all reasonably well)

This paper proposes a sequential Monte Carlo algorithm to improve posterior estimation in a class of generative models. The main motivation is to reduce variance during training of autoencoder models that use discrete random variables in their latent space, which are unsuitable for standard variational autoencoder (VAE) training because they prevent the use of the so-called "reparametrization trick". Backpropagation through discrete stochastic units is possible using Monte Carlo gradient estimation, but these techniques tend to suffer from high estimation variance and hence slow convergence. The proposed approach, iterative refinement for variational inference (IRVI), applies a sequential Monte Carlo algorithm at each training step to refine the posterior and thereby reduce the estimation variance. The approach depends on the choice of a refinement procedure, which in the experiments reported in the paper is adaptive importance refinement (AIR). The paper compares the proposed method with other methods described in the literature, including SOTA methods.
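To make the idea of refining an approximate posterior by importance sampling concrete, here is a minimal sketch on a toy problem with a single Bernoulli latent variable. It is not the authors' AIR implementation: the toy joint `joint`, the damped update rule, and the parameters `steps`, `num_samples`, and `damping` are all assumptions chosen for illustration.

```python
import random

def joint(h):
    # Hypothetical unnormalized target p(x, h) for one fixed observation x.
    # The exact posterior is p(h=1 | x) = 0.4 / (0.4 + 0.1) = 0.8.
    return 0.4 if h == 1 else 0.1

def refine(mu, steps=20, num_samples=2000, damping=0.5, rng=None):
    """Refine the Bernoulli parameter mu = q(h=1) by repeated importance sampling."""
    rng = rng or random.Random(0)
    for _ in range(steps):
        # Draw samples from the current approximate posterior q.
        samples = [1 if rng.random() < mu else 0 for _ in range(num_samples)]
        # Importance weights p(x, h) / q(h).
        weights = [joint(h) / (mu if h == 1 else 1.0 - mu) for h in samples]
        total = sum(weights)
        # Self-normalized estimate of the posterior mean of h.
        estimate = sum(w * h for w, h in zip(weights, samples)) / total
        # Damped update (an assumed rule), clamped to keep q a valid proposal.
        mu = (1.0 - damping) * mu + damping * estimate
        mu = min(max(mu, 1e-6), 1.0 - 1e-6)
    return mu

initial_mu = 0.3            # stands in for a recognition network's output
refined_mu = refine(initial_mu)
true_posterior = 0.4 / (0.4 + 0.1)
```

Each step in this sketch costs `num_samples` evaluations of the joint, so the total cost per training example scales with `steps * num_samples`, which is the kind of overhead the wall-clock concern is about.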

The paper is very clearly written and describes technical concepts in a very comprehensible way. The approach is sound and well motivated, and the experimental comparisons with other approaches are fair, though they could have been more extensive in terms of datasets. My greatest concern is the execution time of the proposed approach, since this is a sequential Monte Carlo method that performs multiple refinement passes for each step of the training process. The authors report convergence curves against epochs but not against wall-clock time; the latter should be provided, since the main motivation of the paper is to speed up training for this class of generative models. The experimental section is good in terms of the methods it compares against, but somewhat lacking in terms of datasets. Comparisons on larger images and color images should be included. The paper reports SOTA or near-SOTA likelihood, which is good, but it has been argued in the literature that this measure may not correlate well with the subjective quality of generative models, so larger and richer color image examples would be a valuable addition to the experimental section. In conclusion, I think this paper provides a valuable contribution; it is well written and well placed in comparison with existing approaches. It does, however, need some discussion and empirical evaluation of wall-clock time, and it could use a more extensive choice of datasets in the experimental section.

2-Confident (read it all; understood it all reasonably well)

In recent years, variational methods have become the state of the art for training directed graphical models. Their success in approximating the posterior of directed graphical models lies in better inference and learning compared with previous methods. On the other hand, the limited capacity of the recognition network can constrain the representational power of the generative model and increase the variance of Monte Carlo estimates. As a remedy, the authors propose an iterative refinement procedure that aims to improve the recognition network's posterior, and they show that training with the refined posterior gives state-of-the-art performance. The authors further demonstrate the utility of the refinement procedure by showing that it yields lower-variance gradient estimates.

I do not have any particular remark.

1-Less confident (might not have understood significant parts)

The authors address deficiencies in approximate variational methods of training probabilistic networks. They note that an inaccurate recognition network may lead to a bad estimate of the variational lower bound. They propose to improve the output of the recognition network with importance sampling, and provide empirical results.

In Section 3.1, a more precise justification of the procedure would be helpful. If rigorous mathematical claims are not possible, some explanation of the logic would still help. In particular, when exactly is this procedure guaranteed to improve the bound?

1-Less confident (might not have understood significant parts)