Reviews: Copulas as High-Dimensional Generative Models: Vine Copula Autoencoders

The paper proposes introducing pair-copula constructions into the autoencoder architecture to create a more robust generative model. Specifically, given a conventionally trained autoencoder that encodes input data into a low-dimensional latent space, the authors propose estimating the distribution of the encoding vectors using vine copulas. It is claimed that this estimation can be done efficiently by sequentially estimating the pair-copula decomposition on the vine trees. Furthermore, the estimated distribution can be sampled easily and the samples passed to the decoder to create new data, so that the whole serves as a generative model.

My biggest issue with the work is the presentation, which needs a lot of improvement. In particular, the concepts, models and algorithms are, in general, poorly presented or developed. The paper assumes extensive background knowledge from its readers. While it is fair to assume some technical background on the topics, the paper should sufficiently summarize the core algorithms and concepts it takes from the referenced works. This is far from the case here. The authors spend paragraphs explaining basic concepts such as the fundamental architecture of autoencoders (Section 3), as well as very basic definitions of copulas and Sklar’s theorem. But rather than going into more detail about the many algorithms that are essential in the context of the paper, the authors chose to gloss over many of them or simply refer the reader to other works. In particular, a more detailed description should cover:

- Model selection of the copula
- Sequential estimation and the construction of pseudo-observations (not defined anywhere in the paper)
- A more detailed explanation of the nonparametric estimation

Consequently, there is limited insight into various aspects of the proposed algorithm:

- How does the selection of the copula affect the complexity and quality of the estimation and of the generated results?
- Similarly, how does the truncation of the vine tree sequence (and the independence assumptions) affect them?
- Continuity and differentiability conditions for the copula decomposition and estimation: how do they affect the model, if at all?

In the end, using a copula decomposition at the center of an autoencoder for generative modeling looks like a very interesting approach. The authors also claim some advantages of VCAE over VAE and DCGAN in terms of complexity, an eyeballing test, and a couple of metrics such as MMD and C2ST. But the paper is poorly presented and lacks detailed analysis or insight.

There are many typos and mistakes in the paper. Here is an incomplete list:

- Page 3 refers to Appendix A.2 for a “visual example of both the estimation and sampling of copula-based distributions”. I don’t see the example in A.2.
- Section 2.2: for the edge e, D_e, j_e, k_e are generally subsets of the index set {1,...,d}, but the authors used “element of” symbols.
- Typo, Section 2.5: “estimato”.
- Section 2.4: “Furthermore, there exists”.
- Figure 2: where is equation (7)? Do you mean (2)?
- Figure 3 description: shouldn’t it be setting c_{2,3|1} to independence?
- Section 2.4: “Let C be a copula C”?

After author rebuttals: I have read all the reviews and author rebuttals. I agree that the paper scores high on novelty/contribution, while the presentation lowers the quality considerably. I trust the authors will address the feedback, so I have increased my overall score to accept.
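
For context on the term "pseudo-observations" raised above: in the copula literature these are commonly the rank-transformed data, where each variable is replaced by its rank divided by n + 1 so that the margins are approximately uniform on (0, 1). The following is a minimal sketch under that assumption; the paper may use a different estimator. The function name `pseudo_observations` and the random stand-in for the encoder outputs are illustrative only and are not taken from the paper.

```python
import numpy as np

def pseudo_observations(x):
    """Map each column of x (n samples, d features) into (0, 1) via scaled ranks.

    Assumes continuous data; ties, if any, are broken arbitrarily.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort yields 0-based ranks per column; add 1 for 1-based ranks.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    # Dividing by n + 1 (not n) keeps every value strictly inside (0, 1).
    return ranks / (n + 1)

# Hypothetical stand-in for latent codes produced by a trained encoder.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
u = pseudo_observations(latent)
```

A copula model would then be fitted to `u` rather than to the raw latent codes, which separates the estimation of the dependence structure from the estimation of the marginals.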

The proposed approach of combining copulas and autoencoders as a generative model is quite novel. It outperforms variational autoencoders and is comparable to DCGAN (one win and one loss), with modest gains in execution speed. Having said that, the comparison is based on only three datasets, so the results are not very conclusive.

Perhaps there needs to be a better notation for the numerator of formula 2. The mean and covariance given as subscripts appear weird. It took me some time to figure out what it means.

In Figure 3, why does the graph for X1 versus X2 for C1 and C2 have circular contours? Shouldn't the contours be square, since the model is nonparametric and X1, X2 have a uniform distribution over a square?

I understand that copulas help generate more diverse images, but I fail to understand why they give sharper images. Is there a good intuition for that?

Figure 4: The caption says that the rightmost plot is a random sample of VCAE on MNIST and SVHN. How can there be a single sample for both SVHN and MNIST?

I couldn't find graphs similar to Figure 6 for MNIST and CELEBA. Why were they omitted?

The images in Figure 7 are too small to recognize the subtle points made by the authors. It would be useful to have larger pictures in the supplement. How can one objectively verify that VCAE is not memorizing the pictures in Fig. 7? I've seen Section D on interpolation in the supplement. Is that enough to verify diversity?

Please proofread the paper again; there are a few typos and grammatical errors.

Paper ID:	3510
Title:	Copulas as High-Dimensional Generative Models: Vine Copula Autoencoders