Sunday, December 8 through Saturday, December 14, 2019, at the Vancouver Convention Center
Summary: The authors propose a new composition rule that brings flow models and autoregressive models closer together. The main contributions are a sufficient condition for invertibility and a masked convolution design with a triangular Jacobian, whose inverse can be computed sequentially in a parallelizable manner. Some concerns remain about novelty. Pros: There is a dire need for a generalized framework around recent progress in flow-based modeling, and the paper aims to provide this. The empirical results are strong. Major concerns: A recent paper (Hoogeboom et al., 2019) is neither cited nor discussed, even though it also uses masked convolutions, discusses the relationship to autoregressive models, and likewise computes a sequential but parallelizable inverse. I am therefore concerned about the novelty of the current paper and its lack of discussion of this work. If the authors cannot make a compelling argument for how their method is sufficiently more general than, or different from, Hoogeboom et al., the current work seems too incremental to warrant acceptance to NeurIPS. Reference: Hoogeboom et al., ICML 2019, Emerging Convolutions for Generative Normalizing Flows. ==== Post rebuttal ==== The authors addressed my major concern and discussed emerging convolutions in the context of their work. The significantly reduced time complexity due to the fixed-point-iteration inverse is a substantial improvement, so I will raise my score by one point.
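For concreteness, the "sequential but parallelizable inverse" discussed above can be illustrated with a toy causal map. This is a sketch under my own assumptions, not the authors' (or Hoogeboom et al.'s) exact layer; the names `W`, `forward`, and `inverse` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Strictly lower-triangular weights: output i depends only on inputs < i,
# so the Jacobian of `forward` is unit lower triangular and thus invertible.
W = np.tril(rng.normal(size=(n, n)), k=-1)

def forward(x):
    return x + np.tanh(W @ x)

def inverse(y):
    # Fixed-point iteration x <- y - tanh(W @ x). Because the correction
    # to coordinate i depends only on coordinates < i, iteration k recovers
    # the first k coordinates exactly: at most n iterations suffice, each
    # a single parallelizable matrix-vector product.
    x = y.copy()  # initial guess
    for _ in range(n):
        x = y - np.tanh(W @ x)
    return x

x = rng.normal(size=n)
y = forward(x)
x_rec = inverse(y)
assert np.allclose(x, x_rec)
```

The same structure explains the "sequential, but parallelizable" phrasing: the exact inverse is inherently ordered, but each sweep is a dense parallel operation.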
The paper is fairly clear. Experimental results show the promise of the proposed architecture, and the module, along with its inference/reversed inference, seems simple enough to be practical. However, the main reasons for my borderline score are: (1) significance and novelty of the proposed method: existing work such as Flow++ already uses a similar idea and achieves better results (the authors note that Flow++ treats the output as discrete, but the gap is quite significant); (2) experimental results: the qualitative results are not too impressive, and the ImageNet results only go up to 32x32. The appendix mentions CelebA but shows no results. Additionally, the math proofs in the appendix (i.e., Thm 1) are classic linear algebra results; the authors could simply cite, e.g., Matrix Analysis, and give a rephrased proof in context for clarity. I raised my score to 6 on the authors' promise to release code.
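The classic fact presumably behind Thm 1 (this restatement is my own hedged reading, not the authors' statement) is the chain rule combined with the algebra of triangular matrices:

```latex
% Chain rule: the Jacobian of a composition is the product of Jacobians
J_{g \circ f}(x) \;=\; J_g\!\big(f(x)\big)\, J_f(x).
% A product of lower-triangular matrices is lower triangular, and its
% diagonal is the elementwise product of the two diagonals:
\operatorname{diag}\big(J_{g \circ f}\big)
  \;=\; \operatorname{diag}(J_g) \odot \operatorname{diag}(J_f).
% Since a triangular matrix is nonsingular iff all its diagonal entries
% are nonzero,
\det J_{g \circ f}(x)
  \;=\; \prod_i \big[J_g\big]_{ii}\,\big[J_f\big]_{ii} \;\neq\; 0
\quad \text{whenever every diagonal entry of } J_f \text{ and } J_g
\text{ is nonzero.}
```

This is indeed textbook material (e.g., Horn and Johnson's Matrix Analysis), which supports the suggestion to cite rather than reprove it.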
Originality: While I would not call the use of masked transformations particularly novel in this setting, the authors present a satisfying and simple architecture which should be broadly applicable to many domains and tasks. This stands in contrast to many other invertible models, which use very tailored, domain-specific architectures. Quality: I believe this paper to be of high quality. The strong performance of the proposed architecture on generative modeling is well-backed by experimental results. I feel the classification experiments could have been stronger and presented more clearly. The proposed architecture can be implemented with a structure identical to an iResNet (by replacing the masked convolutions with spectral-normalized convolutions); a comparison of this kind could strengthen the authors' claims. I was also somewhat confused about the effect of the step size in the inverse algorithm. Given that the authors grid-searched over this parameter, it seems important, yet I do not feel they adequately explained its impact on reconstruction accuracy and speed. The performance of the authors' method for inverting their layers also relies on the choice of an initial guess of the solution x_0. The authors neither present a method for choosing this initial guess nor empirically show the effect of making a bad choice. In the absence of theoretical claims about this, some further experiments would be sufficient. Clarity: Overall the paper is clear and well-written. The introduction and background sections motivate the development of invertible models and give the reader a solid background on related methods. In section 3 the authors introduce their approach and discuss how many previous methods fall into their framework. The authors introduce some composition rules that allow one to define invertible models. These rules are clear, as are their proofs.
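To make the step-size and initial-guess questions above concrete, here is a minimal sketch of a damped fixed-point inversion of the general kind at issue. The map `F`, the default `alpha=0.8`, and the choice `x0 = y` are my own illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def F(x):
    # Toy strictly monotone map: F'(x) = 1 + 0.5 / cosh(x)^2 lies in [1, 1.5].
    return x + 0.5 * np.tanh(x)

def invert(y, alpha=0.8, x0=None, tol=1e-12, max_iters=200):
    # Damped fixed-point iteration x <- x + alpha * (y - F(x)).
    # It contracts when |1 - alpha * F'| < 1, i.e. alpha < 2 / max F'
    # (here alpha < 4/3); larger steps converge faster until they diverge,
    # which is plausibly why the step size had to be grid-searched.
    x = y.copy() if x0 is None else x0.copy()
    for _ in range(max_iters):
        r = y - F(x)                      # residual
        if np.max(np.abs(r)) < tol:
            break
        x = x + alpha * r
    return x

rng = np.random.default_rng(1)
x_true = rng.normal(size=5)
y = F(x_true)
assert np.allclose(invert(y), x_true)             # default guess x0 = y
assert np.allclose(invert(y, x0=10 * y), x_true)  # poor guess: slower, still converges
```

In this toy setting a bad x_0 only costs iterations because the iteration is globally contractive; whether the authors' layers enjoy the same property is exactly the question the experiments should answer.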
I was somewhat confused by some aspects of the chosen model architecture. How are "paired" mint layers created? Is an upper-triangular layer followed by a lower-triangular layer? Are their outputs added together? I found this unclear. The authors introduce paired layers to counter a potential objection about the choice of feature ordering. Does using paired layers improve performance compared to non-paired layers? Significance: The biggest contribution of this work is the attempt to formalize and standardize the design of invertible models; most previous work in the area has been very ad hoc. If bijective models become a popular modeling paradigm, the ideas presented in this paper may be quite useful. Similar standardization ideas were presented in the iResNet paper. The biggest advantage this work has over that one is the ability to compute the Jacobian log-determinant exactly. This is appealing to have, but further work has demonstrated that improved performance can be obtained on generative modeling without exact log-determinant computation. -------- post rebuttal ------- I thank the authors for clarifying a few questions I had. This is appreciated, but I will maintain my score.