{"title": "Copulas as High-Dimensional Generative Models: Vine Copula Autoencoders", "book": "Advances in Neural Information Processing Systems", "page_first": 6528, "page_last": 6540, "abstract": "We introduce the vine copula autoencoder (VCAE), a flexible generative model for high-dimensional distributions built in a straightforward three-step procedure.\n First, an autoencoder (AE) compresses the data into a lower dimensional representation.\nSecond, the multivariate distribution of the encoded data is estimated with vine copulas. \nThird, a generative model is obtained by combining the estimated distribution with the decoder part of the AE.\nAs such, the proposed approach can transform any already trained AE into a flexible generative model at a low computational cost.\nThis is an advantage over existing generative models such as adversarial networks and variational AEs which can be difficult to train and can impose strong assumptions on the latent space.\nExperiments on MNIST, Street View House Numbers and Large-Scale CelebFaces Attributes datasets show that VCAEs can achieve competitive results to standard baselines.", "full_text": "Copulas as High-Dimensional Generative Models:\n\nVine Copula Autoencoders\n\nNatasa Tagasovska\n\nDepartment of Information Systems\n\nHEC Lausanne, Switzerland\n\nnatasa.tagasovska@unil.ch\n\nDamien Ackerer\nSwissquote Bank\nGland, Switzerland\n\ndamien.ackerer@swissquote.ch\n\nThibault Vatter\n\nDepartment of Statistics\n\nColumbia University, New York, USA\nthibault.vatter@columbia.edu\n\nAbstract\n\nWe introduce the vine copula autoencoder (VCAE), a \ufb02exible generative model\nfor high-dimensional distributions built in a straightforward three-step procedure.\nFirst, an autoencoder (AE) compresses the data into a lower dimensional represen-\ntation. Second, the multivariate distribution of the encoded data is estimated with\nvine copulas. 
Third, a generative model is obtained by combining the estimated distribution with the decoder part of the AE. As such, the proposed approach can transform any already trained AE into a flexible generative model at a low computational cost. This is an advantage over existing generative models such as adversarial networks and variational AEs, which can be difficult to train and can impose strong assumptions on the latent space. Experiments on MNIST, Street View House Numbers and Large-Scale CelebFaces Attributes datasets show that VCAEs can achieve competitive results to standard baselines.\n\n1 Introduction\nExploiting the statistical structure of high-dimensional distributions behind audio, images, or video data is at the core of machine learning. Generative models aim not only at creating feature representations, but also at providing means of sampling new realistic data points. Two classes are typically distinguished: explicit and implicit generative models. Explicit generative models make distributional assumptions on the data generating process. For example, variational autoencoders (VAEs) assume that the latent features are independent and normally distributed [37]. Implicit generative models make no statistical assumption but leverage another mechanism to transform noise into realistic data. For example, generative adversarial networks (GANs) use a discriminant model penalizing the loss function of a generative model producing unrealistic data [22]. Interestingly, adversarial autoencoders (AAEs) combine both features, as they use a discriminant model penalizing the loss function of an encoder when the encoded data distribution differs from the prior (Gaussian) distribution [48]. All of these new types of generative models have achieved unprecedented results and also proved to be computationally more efficient than the first generation of deep generative models, which require Markov chain Monte Carlo methods [32, 30].
However, adversarial approaches require multiple models to be trained, leading to difficulties and computational burden [62, 26, 24], and variational approaches make (strong) distributional assumptions, potentially detrimental to the generative model's performance [64].\nWe present a novel approach to construct a generative model which is simple, makes no prior distributional assumption (over the input or latent space), and is computationally efficient: the vine copula autoencoder (VCAE). Our approach, schematized in Figure 1, combines three tasks.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n[Figure 1 (schematic): (1) train an autoencoder; (2.a) extract the latent features of X; (2.b) train a vine copula on the latent features; (3.a) simulate random features from the vine; (3.b) decode the simulated features.]\n\nFirst, an autoencoder (AE) is trained to provide high-quality embeddings of the data. Second, the multivariate distribution of the encoded train data is estimated with vine copulas, namely, a flexible tool to construct high-dimensional multivariate distributions [3, 4, 1]. Third, a generative model is obtained by combining the estimated vine copula distribution with the decoder part of the AE.\nIn other words, new data is produced by decoding random samples generated from the vine copula. An already trained AE can thus be transformed into a generative model, where the only additional cost is the estimation of the vine copula.
We show in multiple experiments that this approach performs well in building generative models for the MNIST, Large-Scale CelebFaces Attributes, and Street View House Numbers datasets. To the best of our knowledge, this is the first time that vine copulas are used to construct generative models for very high-dimensional data (such as images).\nNext, we review the related work most relevant to our setting. The most widespread generative models nowadays focus on synthetic image generation, and mainly fall into the GAN or VAE categories; some interesting recent developments include [49, 15, 26, 76, 29, 14, 6]. These modern approaches have been largely inspired by previous generative models such as belief networks [32], independent component analysis [33] or denoising AEs [79]. Part of their success can be attributed to the powerful neural network architectures which provide high-quality feature representations, often using convolutional architectures [41]. A completely different framework to model multivariate distributions has been developed in the statistical literature: the so-called copulas. Thanks to their ability to capture complex dependence structures, copulas have been applied to a wide range of scientific problems, and their successes have led to continual advances in both theory and open-source software availability. We refer to [56, 35] for textbook introductions. More recently, copulas also made their way into machine learning research [43, 20, 47, 78, 45, 13, 74, 38]. However, copulas have not yet been employed in constructing high-dimensional generative models. While [42, 59] use copulas for synthetic data generation, they rely on strong parametric assumptions.
In this work, we illustrate how nonparametric vine copulas allow for arbitrary density estimation [50], which in turn can be used to sample realistic synthetic datasets.\nBecause their training is relatively straightforward, VCAEs have some advantages over GANs. For instance, GANs require complex modifications of the baseline algorithm in order to avoid mode collapse, whereas vines naturally fit multimodal data. Additionally, while GANs suffer from the "exploding gradients" phenomenon (e.g., see [24]) and require careful monitoring of the training and early stopping, this is not an issue with VCAEs as they are built upon standard AEs.\nTo summarize, the contribution of this work is introducing a novel, competitive generative model based on copulas and AEs. There are three main advantages of the proposed approach. First, it offers modeling flexibility by avoiding most distributional assumptions. Second, training and sampling procedures for high-dimensional data are straightforward. Third, it can be used as a plug-in to turn any AE into a generative model, while allowing the AE to simultaneously serve other purposes (e.g., denoising, clustering).\nThe remainder of the paper is organized as follows. Section 2 reviews vine copulas as well as their estimation and simulation algorithms. Section 3 discusses the VCAE approach. Section 4 presents the results of our experiments. Section 5 concludes and discusses future research.
The supplementary material contains further information on the algorithms and experiments, as well as additional experiments.\n\nFigure 1: Conceptual illustration of a VCAE.\n\n2 Vine copulas\n2.1 Preliminaries and motivation\nA copula, from the Latin word for link, flexibly "couples" marginal distributions into a joint distribution. As such, copulas allow constructing joint distributions with the same margins but different dependence structures, or conversely, fixing the dependence structure while changing the individual behaviors. Thanks to this versatility, there has been an exponentially increasing interest in copula-based models over the last two decades. One important reason lies in the following theorem.\nTheorem 1 (Sklar's theorem [71]). The continuous random vector X = (X1, . . . , Xd) has joint distribution F and marginal distributions F1, . . . , Fd if and only if there exists a unique copula^1 C, which is the joint distribution of U = (U1, . . . , Ud) = (F1(X1), . . . , Fd(Xd)).\nAssuming that all densities exist, we can write f(x1, . . . , xd) = c(u1, . . . , ud) × ∏_{k=1}^{d} fk(xk), where ui = Fi(xi) and f, c, f1, . . . , fd are the densities corresponding to F, C, F1, . . . , Fd, respectively. As such, copulas allow decomposing a joint density into a product between the marginal densities fi and the dependence structure represented by the copula density c.\nThis has an important implication for the estimation and sampling of copula-based multivariate distributions: algorithms can generally be built in two steps.
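As an illustration of this two-step principle (and not of the paper's implementation), the following sketch estimates and samples a bivariate copula-based distribution, with a Gaussian copula standing in for the vines introduced below; the toy data, the exponential margins, and the rank-based margin estimates are assumptions of this example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy data: exponential margins coupled by a Gaussian copula (rho = 0.7).
z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=2000)
x = stats.expon.ppf(stats.norm.cdf(z))  # inverse PIT gives exponential margins

# Estimation, step 1: estimate the margins (here: empirical ranks) and build
# pseudo-observations on [0, 1]^2 via the probability integral transform.
n = x.shape[0]
u = stats.rankdata(x, axis=0) / (n + 1)

# Estimation, step 2: fit a copula to the pseudo-observations. A Gaussian
# copula (correlation of the normal scores) stands in for a vine here.
rho = np.corrcoef(stats.norm.ppf(u).T)[0, 1]

# Sampling, step 1: draw from the fitted copula.
z_new = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=2000)
u_new = stats.norm.cdf(z_new)

# Sampling, step 2: inverse PIT through the estimated margins (here, the
# empirical quantiles of the training data).
x_new = np.stack([np.quantile(x[:, j], u_new[:, j]) for j in range(2)], axis=1)
```

The same two steps apply unchanged when the Gaussian copula is replaced by the vine copulas described next.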
For instance, estimation is often done by estimating the marginal distributions first, and then using the estimated distributions to construct pseudo-observations via the probability integral transform before estimating the copula density. Similarly, synthetic samples can be obtained by sampling from the copula density first, and then using the inverse probability integral transform to transform the copula sample back to the natural scale of the data. We give a detailed visual example of both the estimation and sampling of (bivariate) copula-based distributions in Figure 2. We also refer to Appendix A.1 or the textbooks [56] and [35] for more detailed introductions to copulas.\nThe availability of higher-dimensional models is rather limited, yet there exist numerous parametric families in the bivariate case. This has inspired the development of hierarchical models, constructed from cascades of bivariate building blocks: the pair-copula constructions (PCCs), also called vine copulas. Thanks to its flexibility and computational efficiency, this new class of simple yet versatile models has quickly become a hot topic in multivariate analysis [2].\n\n2.2 Vine copula construction\nPopularized in [3, 4, 1], PCCs model the joint distribution of a random vector by decomposing the problem into modeling pairs of conditional random variables, making the construction of complex dependencies both flexible and tractable. Let us exemplify such constructions using a three-dimensional vector of continuously distributed random variables X = (X1, X2, X3).
The joint density f of X can be decomposed as\n\nf = f1 f2 f3 c1,2 c2,3 c1,3|2,   (1)\n\nwhere we omitted the arguments for the sake of clarity; f1, f2, f3 are the marginal densities of X1, X2, X3; c1,2 and c2,3 are the joint densities of (F1(X1), F2(X2)) and (F2(X2), F3(X3)); and c1,3|2 is the joint density of (F1|2(X1|X2), F3|2(X3|X2)) given X2.\nThe above decomposition can be generalized to an arbitrary dimension d and leads to tractable and flexible probabilistic models [34, 3, 4]. While a decomposition is not unique, it can be organized as a graphical model, a sequence of d-1 nested trees, called a regular vine, R-vine, or simply vine. Denoting Tm = (Vm, Em), with Vm and Em the sets of nodes and edges of tree m for m = 1, . . . , d-1, the sequence is a vine if it satisfies a set of conditions guaranteeing that the decomposition leads to a valid joint density. The corresponding tree sequence is then called the structure of the PCC and has important implications for designing efficient algorithms for the estimation and sampling of such models (see Section 2.3 and Section 2.4).\nEach edge e is associated to a bivariate copula c_{je,ke|De} (a so-called pair-copula), with the set De ⊂ {1, . . . , d} and the indices je, ke ∈ {1, . . . , d} forming respectively its conditioning set and conditioned set. Finally, the joint copula density can be written as the product of all pair-copula densities, c = ∏_{m=1}^{d-1} ∏_{e ∈ Em} c_{je,ke|De}.\n\n^1 A copula is a distribution function with uniform margins.\n\nIn the following two sections, we discuss two topics that are important for the application of vines as generative models: estimation and simulation. For further details, we refer to the numerous books and surveys written about them [16, 39, 72, 18, 2], as well as Appendix A.2.\n\n2.3 Sequential estimation\nTo estimate vine copulas, it is common to follow a sequential approach [1, 27, 50], which we outline below.
Assuming that the vine structure is known, the pair-copulas of the first tree, T1, can be directly estimated from the data. But this is not as straightforward for the other trees, since data from the densities c_{je,ke|De} are not observed. However, it is possible to sequentially construct "pseudo-observations" using appropriate data transformations, leading to the following estimation procedure, starting with tree T1: for each edge in the tree, estimate all pairs, construct pseudo-observations for the next tree, and iterate. The fact that the tree sequence T1, T2, . . . , T_{d-1} is a regular vine guarantees that at any step in this procedure, all required pseudo-observations are available. In addition to Appendix A.2.1 and Appendix A.2.2, we refer to [1, 12, 18, 19, 9, 36] for model selection methods and to [17, 73, 11, 27, 69] for more details on the inference and computational challenges related to PCCs.\nImportantly, vines can be truncated after a given number of trees [12, 8, 10] by setting the pair-copulas in further trees to independence.\n\nComplexity. Because there are d-1 pair-copulas in T1, d-2 pair-copulas in T2, . . . , and a single pair-copula in T_{d-1}, the complexity of this algorithm is O(f(n) × d × truncation level), where f(n) is the complexity of estimating a single pair and the truncation level is at most d-1. In our implementation, described in Section 2.5, f(n) = O(n).\n\n2.4 Simulation\nIn addition to their flexibility, vines are easy to sample from using inverse transform sampling. Let C be a copula and U = (U1, . . . , Ud) be a vector of independent U(0, 1) random variables. Then, define V = (V1, . . . , Vd) through V1 = C^{-1}(U1), V2 = C^{-1}(U2|V1), and so on until Vd = C^{-1}(Ud|V1, . . . , V_{d-1}), where C(vk|v1, . . . , v_{k-1}) is the conditional distribution of Vk given V1, . . . , V_{k-1}, k = 2, . . . , d. In other words, V is the inverse Rosenblatt transform [65] of U.
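To make the inverse Rosenblatt transform concrete, here is a bivariate sketch using a Clayton copula, whose conditional quantile function happens to have a closed form; the choice of Clayton and of the parameter value are illustrative assumptions (the paper's nonparametric pairs instead invert interpolated quantities numerically, see Section 2.5):

```python
import numpy as np

def clayton_hinv(w, u, theta):
    """Closed-form inverse of the conditional distribution C(v | u) of a
    bivariate Clayton copula, i.e. solve dC(u, v)/du = w for v."""
    return ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)

rng = np.random.default_rng(1)
theta = 2.0                       # Clayton dependence parameter (illustrative)
u1, u2 = rng.uniform(size=(2, 5000))  # independent U(0,1) inputs

# Inverse Rosenblatt transform: V1 = C^{-1}(U1) = U1, V2 = C^{-1}(U2 | V1).
v1 = u1
v2 = clayton_hinv(u2, v1, theta)
# (v1, v2) is now a sample from the Clayton copula: uniform margins,
# positive (lower-tail) dependence.
```

The same recursion, with one conditional inversion per variable, is what the d-dimensional vine sampler performs tree by tree.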
It is then straightforward to notice that V ∼ C, which can be used to simulate from C. As for the sequential estimation procedure, it turns out that\n\n• the fact that the tree sequence T1, T2, . . . , T_{d-1} is a vine guarantees that all the required conditional bivariate copulas are available (see Algorithm 2.2 of [19]),\n• the complexity of the algorithm is O(n × d × truncation level), since f(n) is trivially the complexity required for one inversion multiplied by the number of generated samples.\n\nFurthermore, there exist analytical expressions or good numerical approximations of such inverses for common parametric copula families. We refer to Section 2.5 for a discussion of the inverse computations for nonparametric estimators.\n\n2.5 Implementation\nTo avoid specifying the marginal distributions, we estimate them using a Gaussian kernel with a bandwidth chosen using the direct plug-in methodology of [70]. The observations can then be mapped to the unit square using the probability integral transform (PIT). See steps 1 and 2 of Figure 2 for an example.\nRegarding the copula families used as building blocks for the vine, one can contrast parametric and nonparametric approaches. As is common in machine learning and statistics, the default choice is the Gaussian copula. In Section 2.6, we show empirically why this assumption (allowing for dependence between the variables, but still in the Gaussian setting) can be too simplistic, resulting in failure to deliver even for three-dimensional datasets.\nAlternatively, using a nonparametric bivariate copula estimator provides the required flexibility. However, the bivariate Gaussian kernel estimator, targeted at densities of unbounded support, cannot be directly applied to pair-copulas, which are supported on the unit square. To get around this issue, the trick is to transform the data to standard normal margins before using a bivariate Gaussian
Bivariate copulas are thus estimated nonparametrically using the transformation estimator\n[67, 47, 50, 21] de\ufb01ned as\n\nN (1(u), 1(v)|1(uj), 1(vj), \u2303)\n\n (1(u)) (1(v))\n\n,\n\n(2)\n\n1\nn\n\nnXj=1\n\nbc(u, v) =\n\nwhere N (\u00b7,\u00b7|1, 2, \u2303) is a two-dimensional Gaussian density with mean 1, 2, and covariance\nmatrix \u2303= n1/3 Cor(1(U ), 1(V )). For the notation we let , and 1 to be the standard\nGaussian density, distribution and quantile function respectively. See step 3 of Figure 2 for an\nexample.\n\nFigure 2: Estimation and sampling algorithm for a pair copula.\n\nAlong with vines-related functions (i.e., for sequential estimation and simulation), the Gaussian\ncopula and (2) are implemented in C++ as part of vinecopulib [51], a header-only C++ library for\ncopula models based on Eigen [25] and Boost [68]. In the following experiments, we use the R\ninterface [61] interface to vinecopulib called rvinecopulib [53], which also include kde1d [52]\nfor univariate density estimation.\nNote that inverses of partial derivatives of the copula distribution corresponding to (2) are required\nto sample from a vine, as described in Section 2.4. Internally, vinecopulib constructs and stores\na grid over [0, 1]2 along with the evaluated density at the grid points. Then, bilinear interpolation\n\nis used to ef\ufb01ciently compute the copula distribution bC(u, v) and its partial derivatives. Finally,\n\nvinecopulib computes the inverses by numerically inverting the bilinearly interpolated quantities\nusing a vectorized version of the bisection method, and we show a copula sample example as step 4\nof Figure 2. The consistency and asymptotic normality of this estimator are derived in [21] under\nassumptions described in Appendix A.3.\nTo recover samples on the original scale, the simulated copulas samples, often called pseudo-samples,\nare then transformed using the inverse PIT, see step 5 of Figure 2. 
In Appendix C.1, we show that this estimator performs well on two toy bivariate datasets that are typically challenging for GANs: a grid of isotropic Gaussians and the Swiss roll.\n\n2.6 Vines as generative models\nTo exemplify the use of vines as generative models, let us consider as a running example a three-dimensional dataset (X1, X2, X3) with X1, X2 ∼ U[-5, 5] and X3 = √(X1² + X2²) + U[-0.1, 0.1]. The joint density can be decomposed as in the right-hand side of (1), and estimated following the procedures described in Section 2.5 and Section 2.3. With the structure and the estimated pair-copulas, we can then use vines as generative models.\nIn Figure 3, we showcase three models. C1 is a nonparametric vine truncated after the first tree; in other words, it sets c2,3|1 to independence. C2 is a nonparametric vine with two trees. C3 is a Gaussian vine with two trees. On the left panel, we show their vine structure, namely the trees and the pair copulas. On the right panel, we present synthetic samples from each of the models in blue, with the green data points corresponding to √(X1² + X2²).\nComparing C1 to C2 allows us to understand the truncation
effect: C2, being more flexible (fitting a richer/deeper model), better captures the features of the joint distribution. This can be deduced from the fact that data generated by C2 looks uniformly spread around the √(X1² + X2²) surface, while data generated by C1 is spread all around. It should be noted that, in both cases, the nonparametric estimator captures the fact that X1 and X2 are independent, as can be seen from the contour densities on the left panel. Regarding C3, it seems clear that Gaussian copulas are not suited to handle such dependencies: for such nonlinearities, the estimated correlations are (close to) zero, as can be seen from the contour densities on the left panel.\nWith this motivation, the next section is dedicated to extending the vine generative approach to high-dimensional data. While vines are theoretically suitable for fitting and sampling in high dimensions, they have only been applied to model a few thousand variables. The reason is mainly that state-of-the-art implementations were geared towards applications such as climate science and financial risk computations. While software such as vinecopulib satisfies the requirements of such problems, even low-resolution images (e.g., 64 × 64 × 3) are beyond its current capabilities.
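The running example and the failure mode of the Gaussian copula can be reproduced in a few lines (a numpy sketch; correlations near zero despite strong dependence are exactly what C3 ends up fitting):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Running example of Section 2.6: X1, X2 ~ U[-5, 5] independent, and
# X3 = sqrt(X1^2 + X2^2) + U[-0.1, 0.1].
x1 = rng.uniform(-5, 5, n)
x2 = rng.uniform(-5, 5, n)
x3 = np.sqrt(x1**2 + x2**2) + rng.uniform(-0.1, 0.1, n)

# The dependence is strong but non-monotone: by symmetry, the linear
# correlation that a Gaussian copula would latch onto is (close to) zero,
# while a simple nonlinear transform reveals the dependence.
r_lin = np.corrcoef(x1, x3)[0, 1]          # ~ 0: invisible to a Gaussian copula
r_abs = np.corrcoef(np.abs(x1), x3)[0, 1]  # clearly positive: real dependence
```

Any model restricted to (conditional) correlations, like C3, is blind to `r_lin ≈ 0` dependence of this kind, whereas the nonparametric pair-copulas of C2 are not.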
To address this challenge, we can rely on the embedded representations provided by neural networks.\n\nFigure 3: Simulation with different truncation levels, top to bottom: 1-level truncated vine, 2-level nonparametric vine, 2-level Gaussian vine.\n\n3 Vine copula autoencoders\nThe other building block of the VCAE is an autoencoder (AE) [7, 31]. These neural network models typically consist of two parts: an encoder f mapping a datum X from the original (input) space to the latent space, and a decoder g mapping a latent code Y from the latent space back to the original space. The AE is trained to reconstruct the original input with minimal reconstruction loss, that is X' ≈ g(f(X)).\nHowever, AEs simply learn the most informative features to minimize the reconstruction loss, and therefore cannot be considered as generative models. In other words, since they do not learn the distributional properties of the latent features [5], they cannot be used to sample new data points. Because of the latent manifold's complex geometry, attempts using simple distributions (e.g., Gaussian) for the latent space may not provide satisfactory results.\nNonparametric vines naturally fill this gap. After training an AE, we use its encoder component to extract lower-dimensional feature representations of the data. Then, we fit a vine without additional restrictions on the latent distribution.
With this simple step, we transform AEs into generators: we sample data from the vine copula, following the procedure from Section 2.4. Finally, we use the decoder to transform the samples from the vine in latent space into simulated images in pixel space. A schematic representation of this idea is given in Figure 1, and pseudo-code for the VCAE algorithm can be found in Appendix B.\nThe vine copula is fitted post hoc for two reasons. First, since the nonparametric estimator is consistent for (almost) any distribution, the only purpose of the AE is to minimize the reconstruction error. The AE's latent space is unconstrained, and the same AE can be used for both conditional and unconditional sampling. Second, it is unclear how to train a model that includes a nonparametric estimator: since it has no parameters, there is no loss function to minimize or gradients to propagate. One possibility would be using spline estimators, which would allow training the model end-to-end by fitting the basis expansion's coefficients. However, spline estimators of copula densities have been empirically shown to perform worse than the transformation kernel estimator [55].\nThere is some leeway in modeling choices related to the vine. For instance, the number of trees as well as the choice of copula family (i.e., Gaussian or nonparametric) have an impact on the synthetic samples, as sharper details are expected from more flexible models. Note that one can adjust the characteristics of the vine until an acceptable fit of the latent features is obtained, even after the AE is trained.\n\n4 Experiments\nTo evaluate VCAEs as generative models, we follow an experimental setup similar to related works on GANs and VAEs. We compare vanilla VAEs to VCAEs using the same architectures, but replacing the variational part of the VAEs by vines to obtain the VCAEs. From the generative adversarial framework, we compare to DCGAN [62].
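As a self-contained illustration of the three-step recipe of Section 3, the following sketch replaces the deep AE by a linear one (PCA) and the nonparametric vine by a Gaussian copula with empirical margins; both substitutions are simplifying assumptions made so the example runs on synthetic data, not the paper's actual models:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Toy high-dimensional data living near a 3-dimensional manifold.
latent_true = rng.normal(size=(1000, 3)) ** 3       # skewed latent factors
W = rng.normal(size=(3, 50))
X = latent_true @ W + 0.01 * rng.normal(size=(1000, 50))

# (1) "Autoencoder": PCA stands in for the deep AE (illustration only).
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
encode = lambda x: (x - mu) @ Vt[:3].T
decode = lambda z: z @ Vt[:3] + mu

# (2) Fit a distribution to the encoded data: empirical margins plus a
# Gaussian copula, standing in for the nonparametric vine of Section 2.
Z = encode(X)
n, d = Z.shape
U = stats.rankdata(Z, axis=0) / (n + 1)             # pseudo-observations
R = np.corrcoef(stats.norm.ppf(U).T)                # copula correlation

# (3) Sample the latent distribution and decode to get synthetic data.
U_new = stats.norm.cdf(rng.multivariate_normal(np.zeros(d), R, size=1000))
Z_new = np.stack([np.quantile(Z[:, j], U_new[:, j]) for j in range(d)], axis=1)
X_new = decode(Z_new)
```

In the VCAE, step (2) is a truncated nonparametric vine and steps (1) and (3) use the trained encoder and decoder networks; the structure of the pipeline is otherwise identical.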
The architectures for all networks are described in Appendix D.\nAdditionally, we explore two modifications of VCAE: (i) Conditional VCAE, that is, sampling from a mixture obtained by fitting one vine per class label, and (ii) DEC-VCAE, namely adding a clustering-related penalty as in [81]. The rationale behind the clustering penalty was to better disentangle the features in the latent space. In other words, we obtain latent representations where the different clusters (i.e., classes) are better separated, thereby facilitating their modeling.\n\n4.1 Experimental setup\nDatasets and metrics\nWe explore three real-world datasets: two small-scale ones, MNIST [40] and Street View House Numbers (SVHN) [57], and one large-scale one, CelebA [44]. While it is generally common to evaluate models by comparing their log-likelihood on a test dataset, this criterion is known to be unsuitable to evaluate the quality of sampled images [75]. As a result, we use an evaluation framework recently developed for GANs [82]. According to [82], the most robust metrics for two-sample testing are the classifier two-sample test (C2ST, [46]) and the maximum mean discrepancy score (MMD, [23]). Furthermore, [82] proposes to use these metrics not only in the pixel space, but over feature mappings in convolution space. Hence, we also compare generative models in terms of Wasserstein distance, MMD score and C2ST accuracy over ResNet-34 features. Additionally, we use the common inception score [66] and Fréchet inception distance (FID, [28]). For all metrics, lower values are better, except for the inception score. We refer the reader to [82] for further details on the metrics and the implementation.\nArchitectures, hyperparameters, and hardware\nFor all models, we fix the AE's architecture as described in Appendix D. Parameters of the optimizers and other hyperparameters are fixed as follows.
Unless stated otherwise, all experiments were run with nonparametric vines truncated after 5 trees. We use deep CNN models for the AEs in all baselines and closely follow DCGAN [62] with batch normalization layers for natural image datasets. For all AE-based methods, we use the Adam optimizer with learning rate 0.005 and weight decay 0.001 for all the natural image experiments, and 0.001 for both parameters on MNIST. For DCGAN, we use the recommended learning rate 0.0002 and β1 = 0.5 for Adam. The size of the latent space z was selected depending on the dataset's size and complexity: we present results with z = 10 for MNIST, z = 20 for SVHN, and z = 100 for CelebA. We chose to present the values that gave reasonable results for all baselines. For training, we used a batch size of 128 for MNIST, 32 for SVHN, and 100 for CelebA. All models were trained on a separate train set, and evaluated on held-out test sets of 2000 samples, which is the evaluation size used in [82]. We used PyTorch 0.4.1 [58], and we provide our code in Appendix E. All experiments were executed on an AWS p2.xlarge instance with an NVIDIA K80 GPU, 4 CPUs, and 61 GB of RAM.\n\n4.2 Results\nMNIST\nIn Figure 4, we present results from VCAE to understand how different copula families impact the quality of the samples. The independence copula corresponds to assuming independence between the latent features, as in VAEs. The images generated using nonparametric vines seem to improve over the other two. Within our framework, the training of the AE and the vine fit are independent, and we can leverage this to perform conditional sampling by fitting a different vine for each class of digit. We show results of vine samples per digit class in Figure 4.\n\nFigure 4: Left - impact of copula family selection on MNIST.
Middle and Right - random samples of Conditional VCAE on MNIST and SVHN.\n\nFigure 5: Left to right, random samples of VAE, VCAE, DEC-VCAE, and DCGAN for SVHN.\n\nSVHN\nThe results in Figure 5 show that the variants of vine generative models visually provide sharper images than vanilla VAEs when architectures and training hyperparameters are the same for all models. All AE-based methods were trained with latent space z = 20 for 200 epochs, while for DCGAN we use z = 100 and evaluate it at its best performance (50 epochs). In Figure 6, we can see that VCAE and DEC-VCAE have very similar and competitive results to DCGAN (at its best) across all metrics, and both clearly outperform vanilla VAE. Finally, the FID scores calculated with respect to 10^4 real test samples are 0.205 for VAE, 0.194 for DCGAN, and 0.167 for VCAE, which shows that VCAE also has a slight advantage on this metric. In Appendix C.2, Figure 12 and Figure 13 show similar results for the MNIST and CelebA datasets, respectively.\n\nFigure 6: Various evaluation scores for all baselines on the SVHN dataset.\n\nCelebA\nIn the large-scale setting, we present results for VCAE, VAE, and DCGAN only, because our GPU ran out of memory on DEC-VCAE. From the random samples in Figure 7, we see that, for the same amount of training (in terms of epochs), VCAE samples are not only sharper but also more diverse. VAEs improve with additional training, but vine-based solutions achieve better results with fewer resources and without constraints on the latent space.
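For reference, a minimal version of the MMD score used in this evaluation can be sketched as follows; this is a biased (V-statistic) estimate with a Gaussian kernel and a median-heuristic bandwidth, applied to raw feature vectors rather than the exact ResNet-34 pipeline of [82]:

```python
import numpy as np

def mmd2(x, y, gamma=None):
    """Biased (V-statistic) estimate of squared MMD between samples x and y
    (rows = observations) with a Gaussian kernel exp(-gamma * ||a - b||^2)."""
    if gamma is None:
        # Median heuristic on the pooled squared distances.
        z = np.vstack([x, y])
        d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        gamma = 1.0 / np.median(d2[d2 > 0])
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

# Demo: samples from the same distribution score lower than shifted ones.
rng = np.random.default_rng(5)
real = rng.normal(size=(150, 5))
fake_good = rng.normal(size=(150, 5))
fake_bad = rng.normal(loc=1.5, size=(150, 5))
score_good = mmd2(real, fake_good)
score_bad = mmd2(real, fake_bad)
```

Lower is better, matching the convention used for the scores reported above.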
Note that, in Appendix C.3, we also study the quality of the latent representation.
To see the effect of the number of trees in the vine structure, we include Figure 8, where we can see from the random samples that the vine with five trees provides images with sharper details.

Figure 7: Random samples for models trained on the CelebA dataset, for VAE and VCAE at 200 epochs, and for DCGAN best results at 30 epochs.

Figure 8: Higher truncation - sharper images (panels: 1 tree vs. 5 trees).

Since, as stated in Section 2.3 and Section 2.4, the complexity of the algorithms increases linearly with the number of trees, we explore the trade-off between computation time and quality of the samples in Appendix C.4. Results show that, as expected, deeper vines, and hence longer computation times, improve the quality of the generated images. Finally, as for SVHN, the FID score shows an advantage of the vine-based method over VAEs, as we find 0.247 for VAE and 0.233 for VCAE. For DCGAN, the FID score is 0.169, which is better than VCAE; however, looking at the random batch samples in Figure 7, although DCGAN outputs sharper images, it is clear that VCAE produces more realistic faces.

Execution times
We conclude the experimental section with Table 1, comparing execution times. We note that VCAE compares favorably to VAE, which is a "fair" observation given that the architectures are alike. Comparison to DCGAN is more difficult, due to the different nature of the two frameworks (i.e., based respectively on AEs or on adversarial training). It should also be noted that the implementation of VCAE is far from optimal for two reasons. First, we use the R interface to vinecopulib in Python through rpy2; as such, there is a communication overhead resulting from switching between R and Python. Second, while vinecopulib uses native C++11 multithreading, it does not run on GPU cores.
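The linear-in-trees cost mentioned above can be made concrete by counting the pair-copulas to estimate: a full d-dimensional regular vine has d(d-1)/2 of them, while truncating after k trees keeps only the sum of (d - t) for t = 1, ..., k. The helper below is illustrative only and is not part of our implementation:

```python
def n_pair_copulas(d, k=None):
    """Number of pair-copulas in a d-dimensional regular vine
    truncated after k trees (k = d - 1 recovers the full vine)."""
    k = d - 1 if k is None else min(k, d - 1)
    return sum(d - t for t in range(1, k + 1))

# Full 10-dimensional vine vs. the 5-tree truncation used in the experiments:
print(n_pair_copulas(10))       # 45
print(n_pair_copulas(10, 5))    # 35
# The saving grows with the latent dimension:
print(n_pair_copulas(100, 5))   # 485, instead of 4950 for the full vine
```

Each additional tree t adds d - t pair-copula fits, so for fixed d the fitting cost grows roughly linearly in the truncation level, which is the trade-off explored in Appendix C.4.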
From our results, this is not problematic, since the execution times are satisfactory, but VCAE could be made much faster if nonparametric vines were implemented in a tensor-based framework.

Table 1: Execution times.

              MNIST (200 epochs)    SVHN (200 epochs)       CelebA (100 epochs)
VAE           50 min                4h 7 min                7h
VCAE          55 min                1h 32 min               6.5h
DEC-VCAE      101 min               2h 35 min               /
DCGAN         120 min (40 epochs)   3h 20 min (50 epochs)   5h (30 epochs)

5 Conclusion
In this paper, we present vine copula autoencoders (VCAEs), a first attempt at using copulas as high-dimensional generative models. VCAEs leverage the capacity of AEs to provide compressed representations of the data, along with the flexibility of nonparametric vines to model arbitrary probability distributions. We highlight the versatility and power of vines as generative models in high-dimensional settings with experiments on various real datasets. Our results show that VCAEs are comparable to existing solutions in terms of sample quality, while at the same time providing straightforward training, along with more control over modeling flexibility and exploration (tuning the truncation level, selecting copula families and parameter values). Several directions for future work and extensions are being considered. First, we have started to experiment with VAEs having flexible distributional assumptions (i.e., by using a vine for the variational distribution). Second, we plan to study hybrid models using adversarial mechanisms. In related work [38] (see Appendix F), we have also investigated the method's potential for sampling sequential data (artificial mobility trajectories). Other extensions include text data, as well as investigating which types of vines synthesize the best samples for different data types.

References
[1] K. Aas, C. Czado, A. Frigessi, and H. Bakken. Pair-Copula Constructions of Multiple Dependence.
Insurance: Mathematics and Economics, 44(2):182–198, 2009.

[2] Kjersti Aas. Pair-copula constructions for financial applications: A review. Econometrics, 4(4):43, October 2016.

[3] Tim Bedford and Roger M. Cooke. Probability Density Decomposition for Conditionally Dependent Random Variables Modeled by Vines. Annals of Mathematics and Artificial Intelligence, 32(1-4):245–268, 2001.

[4] Tim Bedford and Roger M. Cooke. Vines – A New Graphical Model for Dependent Random Variables. The Annals of Statistics, 30(4):1031–1068, 2002.

[5] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[6] Diane Bouchacourt, Ryota Tomioka, and Sebastian Nowozin. Multi-level variational autoencoder: Learning disentangled representations from grouped observations. In AAAI, 2018.

[7] Hervé Bourlard and Yves Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59(4-5):291–294, 1988.

[8] Eike C. Brechmann and Harry Joe. Parsimonious parameterization of correlation matrices using truncated vines and factor analysis. Computational Statistics and Data Analysis, 77:233–251, 2014.

[9] Eike Christian Brechmann and Claudia Czado. COPAR—multivariate time series modeling using the copula autoregressive model. Applied Stochastic Models in Business and Industry, 31(4):495–514, 2015.

[10] Eike Christian Brechmann and Harry Joe. Truncation of vine copulas using fit indices. Journal of Multivariate Analysis, 138:19–33, 2015.

[11] Eike Christian Brechmann and Ulf Schepsmeier. Modeling dependence with C- and D-vine copulas: The R package CDVine. Journal of Statistical Software, 52(3):1–27, 2013.

[12] Eike Christian Brechmann, Claudia Czado, and Kjersti Aas.
Truncated regular vines in high dimensions with application to financial data. Canadian Journal of Statistics, 40(1):68–85, March 2012.

[13] Yale Chang, Yi Li, Adam Ding, and Jennifer Dy. A robust-equitable copula dependence measure for feature selection. In AISTATS, 2016.

[14] Tatjana Chavdarova and François Fleuret. SGAN: An alternative training of generative adversarial networks. In CVPR, 2018.

[15] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NeurIPS, pages 2172–2180, 2016.

[16] Claudia Czado. Pair-Copula Constructions of Multivariate Copulas. In Piotr Jaworski, Fabrizio Durante, Wolfgang Karl Härdle, and Tomasz Rychlik, editors, Copula Theory and Its Applications, Lecture Notes in Statistics, pages 93–109. Springer Berlin Heidelberg, 2010.

[17] Claudia Czado, Ulf Schepsmeier, and Aleksey Min. Maximum likelihood estimation of mixed C-vines with application to exchange rates. Statistical Modelling, 12(3):229–255, 2012.

[18] Claudia Czado, Eike Christian Brechmann, and Lutz Gruber. Selection of Vine Copulas. In Piotr Jaworski, Fabrizio Durante, and Wolfgang Karl Härdle, editors, Copulae in Mathematical and Quantitative Finance: Proceedings of the Workshop Held in Cracow, 10-11 July 2012, volume 36. Springer New York, 2013.

[19] J. Dißmann, Eike Christian Brechmann, Claudia Czado, and Dorota Kurowicka. Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59:52–69, March 2013.

[20] Gal Elidan. Copulas in machine learning. In Copulae in Mathematical and Quantitative Finance, pages 39–60.
Springer, 2013.

[21] Gery Geenens, Arthur Charpentier, and Davy Paindaveine. Probit transformation for nonparametric kernel estimation of the copula density. Bernoulli, 23(3):1848–1873, 2017.

[22] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, pages 2672–2680, 2014.

[23] Arthur Gretton, Karsten M. Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J. Smola. A kernel method for the two-sample-problem. In NeurIPS, 2007.

[24] Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Thomas Hofmann, and Andreas Krause. Evaluating GANs via duality. arXiv preprint arXiv:1811.05512, 2018.

[25] Gaël Guennebaud, Benoît Jacob, and others. Eigen v3, 2010.

[26] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of Wasserstein GANs. In NeurIPS, 2017.

[27] Ingrid Hobæk Haff. Parameter estimation for pair-copula constructions. Bernoulli, 19(2):462–491, 2013.

[28] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.

[29] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.

[30] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[31] Geoffrey E. Hinton and Richard S. Zemel. Autoencoders, minimum description length and Helmholtz free energy. In NeurIPS, pages 3–10, 1994.

[32] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets.
Neural Computation, 18(7):1527–1554, 2006.

[33] Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411–430, 2000.

[34] Harry Joe. Multivariate Models and Dependence Concepts. Chapman & Hall/CRC, 1997.

[35] Harry Joe. Dependence Modeling with Copulas. Chapman and Hall/CRC, 2014.

[36] Matthias Killiches, Daniel Kraus, and Claudia Czado. Model distances for vine copulas in high dimensions. Statistics and Computing, pages 1–19, 2017.

[37] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In ICLR, 2014.

[38] Vaibhav Kulkarni, Natasa Tagasovska, Thibault Vatter, and Benoit Garbinato. Generative models for simulating mobility trajectories. 2018.

[39] Dorota Kurowicka and Harry Joe. Dependence Modeling. World Scientific Publishing Company, Incorporated, 2010. ISBN 978-981-4299-87-9.

[40] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.

[41] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10):1995, 1995.

[42] Haoran Li, Li Xiong, and Xiaoqian Jiang. Differentially private synthesization of multi-dimensional data using copula functions. In Proceedings of the 17th International Conference on Extending Database Technology, pages 475–486, 2014.

[43] Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: semiparametric estimation of high dimensional undirected graphs. JMLR, 10:2295–2328, 2009.

[44] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In ICCV, 2015.

[45] David Lopez-Paz. From Dependence to Causation. PhD thesis, University of Cambridge, 2016.

[46] David Lopez-Paz and Maxime Oquab. Revisiting classifier two-sample tests.
In ICLR, 2016.

[47] David Lopez-Paz, J. M. Hernandez-Lobato, and Bernhard Schölkopf. Semi-supervised domain adaptation with copulas. In NeurIPS, 2013.

[48] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, and Ian Goodfellow. Adversarial autoencoders. In ICLR, 2016.

[49] Luke Metz, Ben Poole, David Pfau, and Jascha Sohl-Dickstein. Unrolled generative adversarial networks. In ICLR, 2016.

[50] Thomas Nagler and Claudia Czado. Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis, 151:69–89, 2016.

[51] Thomas Nagler and Thibault Vatter. vinecopulib: High Performance Algorithms for Vine Copula Modeling in C++, 2017.

[52] Thomas Nagler and Thibault Vatter. kde1d: Univariate Kernel Density Estimation, 2018. R package version 0.2.1.

[53] Thomas Nagler and Thibault Vatter. rvinecopulib: High Performance Algorithms for Vine Copula Modeling, 2018.

[54] Thomas Nagler, Christian Schellhase, and Claudia Czado. Nonparametric estimation of simplified vine copula models: comparison of methods. Dependence Modeling, 5(1):99–120, 2017.

[55] Thomas Nagler, Christian Schellhase, and Claudia Czado. Nonparametric estimation of simplified vine copula models: comparison of methods. Dependence Modeling, 5(1):99–120, 2017.

[56] Roger B. Nelsen. An Introduction to Copulas. Springer Science & Business Media, 2007.

[57] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011.

[58] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.

[59] Neha Patki, Roy Wedge, and Kalyan Veeramachaneni.
The synthetic data vault. In Proceedings of the 3rd IEEE International Conference on Data Science and Advanced Analytics, pages 399–410, 2016.

[60] Robert Clay Prim. Shortest connection networks and some generalizations. The Bell System Technical Journal, 36(6):1389–1401, 1957.

[61] R Core Team. R: A Language and Environment for Statistical Computing, 2017.

[62] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2015.

[63] Alfréd Rényi. On measures of dependence. Acta Mathematica Hungarica, 10(3-4):441–451, 1959.

[64] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In ICML, 2015.

[65] Murray Rosenblatt. Remarks on a multivariate transformation. The Annals of Mathematical Statistics, 23(3):470–472, 1952.

[66] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NeurIPS, 2016.

[67] Olivier Scaillet, Arthur Charpentier, and Jean-David Fermanian. The estimation of copulas: Theory and practice. Technical report, Ensae-Crest and Katholieke Universiteit Leuven, NP-Paribas and Crest, HEC Geneve and Swiss Finance Institute, 2007.

[68] Boris Schäling. The Boost C++ Libraries. 2011.

[69] Ulf Schepsmeier and Jakob Stöber. Derivatives and Fisher information of bivariate copulas. Statistical Papers, 55(2):525–542, May 2014.

[70] Simon J. Sheather and Michael C. Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B (Methodological), pages 683–690, 1991.

[71] A. Sklar. Fonctions de Répartition à n Dimensions et Leurs Marges.
Publications de L'Institut de Statistique de L'Université de Paris, 8:229–231, 1959.

[72] Jakob Stöber and Claudia Czado. Sampling Pair Copula Constructions with Applications to Mathematical Finance. In Jan-Frederik Mai and Matthias Scherer, editors, Simulating Copulas: Stochastic Models, Sampling Algorithms and Applications, Series in Quantitative Finance. World Scientific Publishing Company, Incorporated, 2012.

[73] Jakob Stöber and Ulf Schepsmeier. Estimating standard errors in regular vine copula models. Computational Statistics, 28(6):2679–2707, 2013.

[74] Natasa Tagasovska, Thibault Vatter, and Valérie Chavez-Demoulin. Nonparametric quantile-based causal discovery. arXiv preprint arXiv:1801.10579, 2018.

[75] Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. In ICLR, 2015.

[76] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. In ICLR, 2018.

[77] Ilya O. Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, and Bernhard Schölkopf. AdaGAN: Boosting generative models. In NeurIPS, pages 5424–5433, 2017.

[78] Dustin Tran, David M. Blei, and Edoardo M. Airoldi. Copula variational inference. In NeurIPS, 2015.

[79] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, pages 1096–1103, 2008.

[80] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. 2017.

[81] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In ICML, 2016.

[82] Qiantong Xu, Gao Huang, Yang Yuan, Chuan Guo, Yu Sun, Felix Wu, and Kilian Weinberger. An empirical study on evaluation metrics of generative adversarial networks.
arXiv preprint arXiv:1806.07755, 2018.