Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This submission presents two new and expressive types of normalizing flows for which computing the Jacobian determinant is efficient. The main problem I see is the rather linear presentation of both the background material and the main theoretical contribution. An easy fix for the former would be to introduce a table that lists the different normalizing flows, specifies any assumptions (e.g. volume preserving, diagonal Jacobian), lists computational costs, etc. The benefit of such a simple addition would be immense. An easy fix for the latter would be to produce a diagram illustrating what CONF is and how the different sections relate to each element of it. Another problem is that in a number of cases I see proofs by assertion (...appears to be particularly well suited to capture structure in images and audio signals...) and procedural descriptions ("we do that, then we do that", e.g. for normalization, we use a data dependent initialization transform....). Each of these cases needs to be properly referenced and argued for by describing the problem, detailing possible solutions and then picking the one that you argue is best. Overall, I believe that the submission is sufficiently original, lacks quality and clarity in some respects, and is of sufficient significance to specialists in the normalizing flow community. Following the authors' feedback and the other reviewers' comments, I hope that the authors will address the issues raised in an adequate fashion. Under this assumption I raised my score by 1.
I appreciated the answers to my questions in the author response regarding 2D convolutions and invertibility issues. I'd suggest being explicit about both of these things in the camera-ready version, especially the 2D convolutions. I'd also like to note that I don't think all 2D convolutions are separable. Thus, the final version should note that your methods only handle separable 2D convolutions. ------ Original review ------ The paper presents a novel flow based on invertible convolutions. This is achieved primarily through an interesting synthesis of ideas from flows and linear algebra (specifically Toeplitz, circulant or block Toeplitz/circulant concepts). This could form a foundational building block for future flow-based models. The clarity of some parts could be improved. Detailed comments follow. 1. Details on 2D convolutions. The paper briefly mentions extensions to the 2D case at the end of section 3 but this definitely needs to be expanded. As one question, are the convolutions used in the experiments all separable convolutions (i.e. they can be computed by a 1D convolution along each dimension)? It is unclear from the paragraph whether actual 2D convolutions (and the related block Toeplitz matrices) are used in the experiments or merely mentioned as a possibility for future work. If general 2D convolutions are used, then the details are definitely important here. The 1D case is relatively easy to understand, but the 2D case seems a bit more complicated, and more detail would be helpful to the reader. Intuitions about the 2D case would be quite useful as well. I would be okay with shortening the discussion of the 1D case in order to develop the 2D case a bit more. 2. Clarity and/or intuitions regarding the connection between point-wise nonlinearities and regularizers. It was not immediately clear that when mentioning "regularizers", the idea is regularizing the *latent representation* rather than regularizing the parameters directly. Is this correct?
Usually, regularizers operate explicitly on the parameters of the network. This might still be a reasonable term but I would make sure to clarify this. Also, note that using point-wise nonlinearities does not seem novel in itself; only the interpretation of them as regularizing the latent representation seems possibly novel. --- Minor comments --- How do you ensure that the convolutions are invertible? Some parameter settings for the kernel of the convolution will not be invertible since the corresponding Toeplitz matrix is singular. Maybe this is only enforced softly through the log determinant term but it would be great to at least mention this in the paper. Proposition 2 seems very wordy. Consider splitting it into a few definitions followed by the proposition rather than including everything in one statement. As currently written, the proposition is very difficult to parse as a single idea. It seems that comparing against MADE and MAF is not quite fair because these do not ignore temporal or spatial structure and are more general than convolution-based architectures. Maybe it's still reasonable to compare to them but it would be good to clarify why it makes sense to compare to these.
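The two technical points in this review (separability of 2D kernels and invertibility of the convolution) can be illustrated numerically. The following is a minimal sketch and is not taken from the paper. It assumes circular (periodic) boundary conditions, so that a 1D convolution corresponds to a circulant matrix rather than a general Toeplitz matrix, and the helper names `is_separable` and `circulant_log_abs_det` are hypothetical.

```python
import cmath
import math


def is_separable(kernel, tol=1e-9):
    """Return True if a 2D kernel is an outer product of two 1D kernels."""
    # A kernel is separable exactly when its matrix has rank <= 1,
    # i.e. when every 2x2 minor vanishes.
    rows, cols = len(kernel), len(kernel[0])
    for i in range(rows):
        for j in range(i + 1, rows):
            for k in range(cols):
                for m in range(k + 1, cols):
                    minor = kernel[i][k] * kernel[j][m] - kernel[i][m] * kernel[j][k]
                    if abs(minor) > tol:
                        return False
    return True


def circulant_log_abs_det(kernel, n, tol=1e-12):
    """log|det C| for the n x n circulant matrix of a zero-padded 1D kernel.

    Assumption: periodic boundaries. The eigenvalues of a circulant matrix
    are the DFT of its first column, so the determinant is their product.
    Returns -inf when an eigenvalue is (numerically) zero, i.e. C is singular.
    """
    padded = list(kernel) + [0.0] * (n - len(kernel))
    total = 0.0
    for f in range(n):
        eig = sum(w * cmath.exp(-2j * cmath.pi * f * t / n)
                  for t, w in enumerate(padded))
        if abs(eig) < tol:
            return -math.inf
        total += math.log(abs(eig))
    return total


# A Sobel kernel is the outer product of [1, 2, 1] and [1, 0, -1]: separable.
sobel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
# The discrete Laplacian kernel has rank 2: not separable.
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
print(is_separable(sobel), is_separable(laplacian))

# The kernel [1, -1] sums to zero, so its zero-frequency eigenvalue is zero
# and the circular convolution is not invertible.
print(circulant_log_abs_det([1.0, -1.0], 4))
# The kernel [2, 1] has no zero eigenvalue for n = 4 and is invertible.
print(circulant_log_abs_det([2.0, 1.0], 4))
```

Under these assumptions, a singular kernel drives the log-determinant term to negative infinity, which is one way a likelihood objective could discourage non-invertible settings softly, as the minor comment above speculates. Whether the paper relies on this mechanism is not stated in the reviews.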
Originality: Whilst previous research may have been headed in this direction (perhaps Zheng et al. 2017), this paper has fleshed out those previous ideas fully and produced an excellent architecture. The paper is adequately cited. Quality: The submission is a complete piece of work, in that it clearly explains the background theory and achieves excellent empirical results. The authors take care to compare their model fairly to other normalising flows (e.g. controlling the network capacity to be the same as that used in other architectures when finding flow parameters). Clarity: The paper is well-written and easy to understand. Significance: I expect this architecture to become a useful tool for future ML research and applications.