NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
(Originality): The paper proposes a new elementwise transformation for use in coupling and autoregressive transforms. There have been multiple papers proposing such transformations recently. Notably, [Muller et al, 2018] proposes invertible transformations based on linear and quadratic splines. In an earlier iteration of this work, the authors proposed an extension of this in the form of cubic splines. This work extends these by 1) allowing unconstrained inputs/outputs by letting the transformation be linear outside a range (improving numerical stability) and 2) using rational-quadratic splines instead of linear/quadratic/cubic splines (improving flexibility). This is clearly explained in the paper and related work is cited.
(Quality): The paper is well executed. The paper proposes a new flexible elementwise transformation which can be used as a replacement for the typical additive/affine transformations in coupling/autoregressive transformations, increasing their overall flexibility. This is demonstrated in the experiments, where the authors show that by exchanging the additive/affine transformations in IAF/MAF/Glow for rational-quadratic transformations, they can 1) obtain state-of-the-art results for some tabular datasets, 2) show small but consistent improvements when used as a prior and variational distribution in a VAE and 3) obtain comparable results to Glow for images using 1/10th of the parameters. In the experiments on tabular data, the baselines are updated to be more similar to the proposed method. That is, MAF is updated to use ResMADE and invertible linear layers instead of permutations, while the quadratic spline flows of [Muller et al, 2018] are updated to have linear tails. This allows for a fairer comparison by isolating factors of improvement.
(Clarity): The paper is generally well written and organized. The proposed method is clearly explained and supported by illustrative figures.
While rational-quadratic splines were explained in detail in the appendix, I would have liked to see equations for the forward/inverse transformation and/or Jacobian determinant for the rational-quadratic transformation in the main text, as this is a central point of the paper. Experimental details are given in the appendix, facilitating reproducibility of the results. Some comments/questions:
- Was the coupling layer which transforms all variables used in all your experiments? And does this yield significant improvements over a standard coupling layer?
- I would prefer to have boldface highlighting the top results within the C/AR part of the tables.
(Significance): The proposed elementwise transformation adds a useful new module to the toolbox. As demonstrated in the experiments, this module can improve the flexibility of coupling and autoregressive transforms, in some cases leading to state-of-the-art results.
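For readers without the appendix at hand, the forward map and derivative that this review asks for can be sketched roughly as follows. This is a minimal illustration of a single rational-quadratic bin in the Gregory and Delbourgo (1982) form, not the authors' code; the function name `rq_bin_forward` and its argument names are hypothetical, and the knot positions and derivatives are assumed to be given and positive.

```python
import numpy as np

def rq_bin_forward(x, x0, x1, y0, y1, d0, d1):
    """Evaluate one monotonic rational-quadratic bin at x in [x0, x1].

    (x0, y0) and (x1, y1) are the knots bounding the bin; d0 and d1 are the
    positive derivatives at those knots. Returns (y, dy/dx).
    All names here are illustrative assumptions.
    """
    w = x1 - x0                      # bin width
    h = y1 - y0                      # bin height
    s = h / w                        # slope of the chord through the two knots
    xi = (x - x0) / w                # position inside the bin, in [0, 1]
    den = s + (d1 + d0 - 2.0 * s) * xi * (1.0 - xi)
    y = y0 + h * (s * xi**2 + d0 * xi * (1.0 - xi)) / den
    # Derivative of the map; its log is the elementwise log-Jacobian term.
    dydx = s**2 * (d1 * xi**2 + 2.0 * s * xi * (1.0 - xi)
                   + d0 * (1.0 - xi)**2) / den**2
    return y, dydx
```

At the left knot the function returns `(y0, d0)` and at the right knot `(y1, d1)`, so neighbouring bins join with matching values and slopes. Because the map is a ratio of two quadratics in `xi`, inverting it for a given `y` reduces to solving a quadratic equation, which is what makes the inverse analytic.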
Reviewer 2
1) Originality The authors do a very good job of describing how their method fits within the family of building blocks of flow-based models. The literature review is, in my opinion, very thorough and well-written.
2) Quality The authors' spline-based proposal makes a lot of sense, and the experimental results show that using these rational-quadratic splines improves the flexibility of flows. One way to improve the paper would be to discuss in more detail the computational cost of their method compared to simpler ones (e.g. affine transformations).
3) Clarity The review of flow-based models is extremely well-written (Sections 1 & 2). The final discussion is also very interesting. However, I think the authors should explain more clearly what monotonic rational-quadratic splines are (Section 3.1). Indeed, I think that understanding the paper without reading Appendix A is quite difficult, and some of the results from this Appendix could be moved into the main paper in future iterations. A few sentences are a bit vague:
- l. 127: "Rational-quadratic functions are easily differentiable, more flexible than a polynomial in that they have an infinite Taylor-series expansion". There are families of functions with infinite Taylor-series expansions that are arguably not very flexible (like exp(-alpha*x)).
- l. 136: "We [...] set the boundary derivatives to 1 to match the linear tails, which we found to be important for stable training." An experiment (for example in the appendix) would be nice to understand what you mean by "stable training". Also, do you have an explanation for this phenomenon?
- l. 157: "Unlike the additive and affine transformations, which have limited flexibility, a monotonic spline with sufficiently many bins can approximate any continuous monotonic function on the specified interval". Could you add more details, and/or back the result with a citation?
4) Significance I'll repeat what I wrote in the contributions box: while the paper has essentially a single contribution, I think this is not a bad thing at all. Indeed, the authors thoroughly study (and, arguably, improve) a very specific and important part of flow-based models, which is valuable. It is an "incremental" paper in the good sense of the term. I would therefore recommend acceptance.
--------- Post-rebuttal edit -------------
I've read your reviews and the rebuttal. I'm happy with the clarifications provided by the authors. This will be a nice contribution to the quickly growing field of flows.
Reviewer 3
The paper introduces a transformation which is elementwise strictly monotone (a bijection, i.e. invertible in principle) as well as analytically invertible. The goal is to improve the flow's flexibility without losing access to an analytical inverse---which then enables *both* sampling and likelihood assessment. The technique is compatible with both coupling and autoregressive flows. The general idea is based on dividing a segment (say from -B to B) into bins by predicting bin widths (determined by ordered knots -- computed via a cumulative sum of unordered predictions) and heights. Parameterising the transformation also involves predicting positive derivatives at the knots. This paper employs the parameterisation of Gregory and Delbourgo (1982). The parts the paper does discuss, it discusses reasonably clearly. But the parameterisation itself (key to flexibility) and the inverse transform (one of the big selling points) are treated as supplementary and only explained in an Appendix. The empirical analysis shows a mixture of competitive and state-of-the-art results, and the proposed method is generally never worse than the baselines. In particular, density estimation shows a clearer advantage for the proposed splines. The VAE setting was limited to MNIST and EMNIST -- it was unclear to me why the authors did not use all the typical benchmarks (Freyfaces, Omniglot, Caltech101), especially in light of a comment by the authors that suggests the datasets are too simple for their flow to shine. The authors also show results competitive with Glow in generative modelling (CIFAR and ImageNet) while using a rather compact model (up to 10x smaller parameter footprint). Whether or not the proposed flow will be widely used will largely depend on practical concerns regarding implementation (and availability of code): for example, it was unclear to me what difficulties (if any) underlie the parameterisation and the computation of the inverse.
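The knot construction summarised in this review (unordered predictions turned into ordered knots by a cumulative sum, plus positive derivatives) can be sketched as below. This is only an illustration under stated assumptions: the function name `build_knots` is hypothetical, and the use of a softmax for widths/heights and a softplus for derivatives is one common way to enforce positivity, assumed here rather than taken from the reviews. The boundary derivatives are fixed to 1 to match linear (identity) tails, as in the passage quoted by Reviewer 2.

```python
import numpy as np

def build_knots(raw_w, raw_h, raw_d, B=3.0):
    """Map unconstrained predictions to ordered knots on [-B, B].

    raw_w, raw_h: K unconstrained values each (bin widths and heights).
    raw_d: K - 1 unconstrained values (derivatives at interior knots).
    Returns (xs, ys, ds), each of length K + 1.
    """
    def softmax(v):
        e = np.exp(v - np.max(v))    # subtract the max for numerical stability
        return e / e.sum()

    def softplus(v):
        return np.logaddexp(0.0, v)  # log(1 + exp(v)), always positive

    widths = 2.0 * B * softmax(np.asarray(raw_w, dtype=float))
    heights = 2.0 * B * softmax(np.asarray(raw_h, dtype=float))
    # Cumulative sums of positive widths/heights give strictly increasing knots.
    xs = np.concatenate([[-B], -B + np.cumsum(widths)])
    ys = np.concatenate([[-B], -B + np.cumsum(heights)])
    # Interior derivatives are positive; boundary ones are 1 to match the tails.
    ds = np.concatenate([[1.0], softplus(np.asarray(raw_d, dtype=float)), [1.0]])
    return xs, ys, ds
```

With all-zero inputs and `K = 4`, the bins come out equal in width and height, so the knots lie on the identity line; any other inputs move the knots while keeping them ordered, which is what guarantees monotonicity before the per-bin interpolation is applied.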