{"title": "Structured Generative Adversarial Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 3899, "page_last": 3909, "abstract": "We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y that encodes the designated semantics, and z that contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games that have their equilibrium concentrating at the true joint data distributions p(x, z) and p(x, y), avoiding distributing the probability mass diffusely over data space that MLE-based methods may suffer. We assess SGAN by evaluating its trained networks, and its performance on downstream tasks. We show that SGAN delivers a highly controllable generator, and disentangled representations; it also establishes start-of-the-art results across multiple datasets when applied for semi-supervised image classification (1.27%, 5.73%, 17.26% error rates on MNIST, SVHN and CIFAR-10 using 50, 1000 and 4000 labels, respectively). 
Benefiting from the separate modeling of y and z, SGAN can generate images with high visual quality that strictly follow the designated semantics, and can be extended to a wide spectrum of applications, such as style transfer.", "full_text": "Structured Generative Adversarial Networks\n\n1Zhijie Deng*, 2,3Hao Zhang*, 2Xiaodan Liang, 2Luona Yang, 1,2Shizhen Xu, 1Jun Zhu†, 3Eric P. Xing\n\n1Tsinghua University, 2Carnegie Mellon University, 3Petuum Inc.\n\n{dzj17,xsz12}@mails.tsinghua.edu.cn, {hao,xiaodan1,luonay1}@cs.cmu.edu, dcszj@mail.tsinghua.edu.cn, epxing@cs.cmu.edu\n\nAbstract\n\nWe study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y, which encodes the designated semantics, and z, which contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games whose equilibria concentrate at the true joint data distributions p(x, z) and p(x, y), avoiding distributing probability mass diffusely over the data space, a problem from which MLE-based methods may suffer. We assess SGAN by evaluating its trained networks and its performance on downstream tasks.
We show that SGAN delivers a highly controllable generator and disentangled representations; it also establishes state-of-the-art results across multiple datasets when applied to semi-supervised image classification (1.27%, 5.73%, 17.26% error rates on MNIST, SVHN and CIFAR-10 using 50, 1000 and 4000 labels, respectively). Benefiting from the separate modeling of y and z, SGAN can generate images with high visual quality that strictly follow the designated semantics, and can be extended to a wide spectrum of applications, such as style transfer.\n\n1 Introduction\n\nDeep generative models (DGMs) [12, 8, 26] have gained considerable research interest recently because of their high capacity for modeling complex data distributions and their ease of training and inference. Among various DGMs, variational autoencoders (VAEs) and generative adversarial networks (GANs) can be trained unsupervised to map a random noise z ~ N(0, 1) to the data distribution p(x), and have reported remarkable successes in many domains including image/text generation [17, 9, 3, 27], representation learning [27, 4], and posterior inference [12, 5]. They have also been extended to model the conditional distribution p(x|y), which involves training a neural network generator G that takes as inputs both the random noise z and a condition y, and generates samples that have the desired properties specified by y. Obtaining such a conditional generator would be quite helpful for a wide spectrum of downstream applications, such as classification, where synthetic data from G can be used to augment the training set. However, training a conditional generator is inherently difficult, because it requires not only a holistic characterization of the data distribution, but also fine-grained alignments between different modes of the distribution and different conditions.
Previous works have tackled this problem by using a large amount of labeled data to guide the generator's learning [32, 23, 25], which compromises the generator's usefulness because obtaining the label information might be expensive.\n\n* indicates equal contributions. † indicates the corresponding author. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nIn this paper, we investigate the problem of building conditional generative models under semi-supervised settings, where we have access to only a small set of labeled data. Existing works [11, 15] have explored this direction based on DGMs, but the resulting conditional generators exhibit inadequate controllability, which we define as the generator's ability to conditionally generate samples whose structures strictly agree with those specified by the condition – a more controllable generator can better capture and respect the semantics of the condition.\n\nWhen supervision from labeled data is scarce, the controllability of a generative model is usually influenced by its ability to disentangle the designated semantics from other factors of variation (which we term disentanglability in the following text). In other words, the model first has to learn from a small set of labeled data what semantics or structures the condition y essentially represents, by trying to recognize y in the latent space. As a second step, when performing conditional generation, the semantics shall be exclusively captured and governed by y and not interweaved with other factors. Following this intuition, we build the structured generative adversarial network (SGAN) with enhanced controllability and disentanglability for semi-supervised generative modeling.
SGAN separates the hidden space into two parts y and z, and learns a more structured generator distribution p(x|y, z), where the data are generated conditioned on two latent variables: y, which encodes the designated semantics, and z, which contains other factors of variation. To impose the aforementioned exclusiveness constraint, SGAN first introduces two dedicated inference networks C and I to map x back to the hidden space as C : x -> y and I : x -> z, respectively. Then, SGAN enforces G to generate samples such that, when they are mapped back to the hidden space using C (or I), the inferred latent code always matches the generator's condition, regardless of the variations of the other variable z (or y). To train SGAN, we draw inspiration from the recently proposed adversarially learned inference (ALI) framework [5], and build two adversarial games to drive I, G to match the true joint distribution p(x, z), and C, G to match the true joint distribution p(x, y). Thus, SGAN can be seen as a combination of two adversarial games and two collaborative games, where I and G play against critic networks to match joint distributions in the visible space, while I, C, G collaborate with each other to minimize reconstruction errors in the hidden space. We theoretically show that SGAN converges to the desired equilibrium if trained properly.\n\nTo empirically evaluate SGAN, we first define a mutual predictability (MP) measure to evaluate the disentanglability of various DGMs, and show that in terms of MP, SGAN outperforms all existing models that are able to infer the latent code z, across multiple image datasets. When classifying the generated images using a gold-standard classifier, SGAN achieves the highest accuracy, confirming its improved controllability for conditional generation under semi-supervised settings.
In the semi-supervised image classification task, SGAN outperforms strong baselines and establishes new state-of-the-art results on the MNIST, SVHN and CIFAR-10 datasets. For controllable generation, SGAN can generate images with high visual quality in terms of both visual comparison and inception score, thanks to the disentangled latent space modeling. As SGAN is able to infer the unstructured code z, we further apply SGAN to style transfer, and obtain impressive results.\n\n2 Related Work\n\nDGMs have drawn increasing interest from the community, and have been developed mainly in two directions: VAE-based models [12, 11, 32] that learn the data distribution via maximum likelihood estimation (MLE), and GAN-based methods [19, 27, 21] that train a generator via adversarial learning. SGAN combines the best of MLE-based methods and GAN-based methods, which we discuss in detail in the next section. DGMs have also been applied to conditional generation, such as CGAN [19] and CVAE [11]. DisVAE [32] is a successful extension of CVAE that generates images conditioned on text attributes. In parallel, CGAN has been developed to generate images conditioned on text [24, 23], bounding boxes, key points [25], locations [24], or other images [10, 6, 31], or to generate text conditioned on images [17]. All these models are trained using fully labeled data.\n\nA variety of techniques have been developed toward learning disentangled representations for generative modeling [3, 29]. InfoGAN [3] disentangles hidden dimensions on unlabeled data by mutual information regularization. However, the semantics of each disentangled dimension are uncontrollable because they are discovered after training rather than designated by the user.
We establish some connections between SGAN and InfoGAN in the next section.\n\nThere is also interest in developing DGMs for semi-supervised conditional generation, such as semi-supervised CVAE [11], its many variants [16, 9, 18], ALI [5] and TripleGAN [15], among which the closest to ours are [15, 9]. In [9], VAE is enhanced with a discriminator loss and an independence constraint, and trained via joint MLE and discriminator loss minimization. By contrast, SGAN is an adversarial framework that is trained to match two joint distributions in the visible space, thus avoiding MLE for visible variables. TripleGAN builds a three-player adversarial game to drive the generator to match the conditional distribution p(x|y), while SGAN models the conditional distribution p(x|y, z) instead. TripleGAN therefore lacks constraints to ensure that the semantics of interest are exclusively captured by y, and lacks a mechanism to perform posterior inference for z.\n\n3 Structured Generative Adversarial Networks (SGAN)\n\nWe build our model on generative adversarial networks (GANs) [8], a framework for learning DGMs using a two-player adversarial game. Specifically, given observed data {x_i}_{i=1}^N, GANs try to estimate a generator distribution pg(x) to match the true data distribution pdata(x), where pg(x) is modeled as a neural network G that transforms a noise variable z ~ N(0, 1) into generated data x^ = G(z). GANs assess the quality of x^ by introducing a neural network discriminator D to judge whether a sample is from pdata(x) or the generator distribution pg(x). D is trained to distinguish generated samples from true samples while G is trained to fool D:\n\nmin_G max_D L(D, G) = E_{x~pdata(x)}[log(D(x))] + E_{z~p(z)}[log(1 - D(G(z)))].\n\nGoodfellow et al. [8] show the global optimum of the above problem is attained at pg = pdata.
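As a quick numerical illustration of this objective and its optimum (a numpy-only sketch of our own; the function name is ours, not from the paper), the value of L(D, G) can be estimated from discriminator outputs on a batch:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN value function
    L(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on true samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    D is trained to ascend this value; G is trained to descend it.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

At the global optimum pg = pdata, the best the discriminator can do is output 1/2 on every sample, where the value equals -2 log 2.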
It is noted that the original GAN models the latent space using a single unstructured noise variable z. The semantics and structures that may be of interest are entangled in z, and the generator transforms z into x^ in a highly uncontrollable way – it lacks both disentanglability and controllability.\n\nWe next describe SGAN, a generic extension to GANs that is enhanced with improved disentanglability and controllability for semi-supervised conditional generative modeling.\n\nOverview. We consider a semi-supervised setting, where we observe a large set of unlabeled data X = {x_i}_{i=1}^N. We are interested in both the observed sample x and some hidden structures y of x, and want to build a conditional generator that can generate data x^ that matches the true data distribution of x while obeying the structures specified in y (e.g. generate pictures of digits given 0-9). Besides the unlabeled X, we also have access to a small chunk of data Xl = {x_j^l, y_j^l}_{j=1}^M where the structure y is jointly observed. Therefore, our model needs to characterize the joint distribution p(x, y) instead of the marginal p(x), for both fully and partially observed x.\n\nAs the data generation process is intrinsically complex and usually determined by many factors beyond y, it is necessary to consider other factors that are irrelevant to y, and to separate the hidden space into two parts (y, z), of which y encodes the designated semantics, and z includes any other factors of variation [3]. We make a mild assumption that y and z are independent of each other so that y can be disentangled from z. Our model thus needs to take into consideration the uncertainty of both (x, y) and z, i.e. characterizing the joint distribution p(x, y, z) while being able to disentangle y from z.
Directly estimating p(x, y, z) is difficult, as (1) we have never observed z and have observed y only for part of x; (2) y and z might become entangled at any time as the training proceeds. As an alternative, SGAN builds two inference networks I and C. The two inference networks define two distributions pi(z|x) and pc(y|x) that are trained to approximate the true posteriors p(z|x) and p(y|x) using two different adversarial games. The two games are unified via a shared generator x ~ pg(x|y, z). Marginalizing out z or y obtains pg(x|z) and pg(x|y):\n\npg(x|z) = \int_y p(y) pg(x|y, z) dy,  pg(x|y) = \int_z p(z) pg(x|y, z) dz,  (1)\n\nwhere p(y) and p(z) are appropriate known priors for y and z. As SGAN is able to perform posterior inference for both z and y given x (even for unlabeled data), we can directly impose constraints [13] that enforce the structures of interest being exclusively captured by y, while those irrelevant factors are encoded in z (as we will show later). Fig. 1 illustrates the key components of SGAN, which we elaborate as follows.\n\nGenerator pg(x|y, z). We assume the following generative process from y, z to x: z ~ p(z), y ~ p(y), x ~ p(x|y, z), where p(z) is chosen as a non-informative prior, and p(y) as an appropriate prior that meets our modeling needs (e.g. a categorical distribution for digit class). We parametrize p(x|y, z) using a neural network generator G, which takes y and z as inputs, and outputs generated samples x ~ pg(x|y, z) = G(y, z).
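This generative process, together with the implicit marginalization in Eq. 1, is realized by ancestral sampling: draw y and z from their priors, generate x, and simply drop the variable being marginalized out. A toy numpy sketch (the linear "generator" is a stand-in of our own, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg_xz(G, n=5, dim_y=10, dim_z=4):
    """Draw (x, z) ~ pg(x, z): sample y ~ p(y) (uniform categorical prior),
    z ~ p(z) (Gaussian prior), generate x = G(y, z), then discard y.
    Dropping y implicitly marginalizes it out of pg(x|y, z), as in Eq. 1."""
    y = np.eye(dim_y)[rng.integers(0, dim_y, size=n)]  # one-hot y ~ p(y)
    z = rng.normal(size=(n, dim_z))                    # z ~ N(0, I)
    x = G(y, z)
    return x, z                                        # keep only (x, z)

# stand-in "generator": any deterministic map from (y, z) to data space
W = rng.normal(size=(14, 8))
G = lambda y, z: np.concatenate([y, z], axis=1) @ W
x, z = sample_pg_xz(G)   # x has shape (5, 8); z has shape (5, 4)
```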
G can be seen as a "decoder" in VAE parlance, and its architecture depends on the specific application, such as a deconvolutional neural network for generating images [25, 21].\n\nFigure 1: An overview of the SGAN model: (a) the generator pg(x|y, z); (b) the adversarial game Lxz; (c) the adversarial game Lxy; (d) the collaborative game Rz; (e) the collaborative game Ry.\n\nAdversarial game Lxz. Following the adversarially learned inference (ALI) framework, we construct an adversarial game to match the distributions of joint pairs (x, z) drawn from the two different factorizations: pg(x, z) = p(z)pg(x|z), pi(x, z) = p(x)pi(z|x). Specifically, to draw samples from pg(x, z), we note the fact that we can first draw the tuple (x, y, z) following y ~ p(y), z ~ p(z), x ~ pg(x|y, z), and then take only (x, z) as needed. This implicitly performs the marginalization as in Eq. 1. On the other hand, we introduce an inference network I : x -> z to approximate the true posterior p(z|x). Obtaining (x, z) ~ p(x)pi(z|x) with I is straightforward: x ~ p(x), z ~ pi(z|x) = I(x). Training G and I involves finding the Nash equilibrium of the following minimax game Lxz (we slightly abuse Lxz for both the minimax objective and the name of this adversarial game):\n\nmin_{I,G} max_{Dxz} Lxz = E_{x~p(x)}[log(Dxz(x, I(x)))] + E_{z~p(z),y~p(y)}[log(1 - Dxz(G(y, z), z))],  (2)\n\nwhere we introduce Dxz as a critic network that is trained to distinguish pairs (x, z) ~ pg(x, z) from those that come from pi(x, z). This minimax objective reaches its optimum if and only if the conditional distribution pg(x|z) characterized by G inverts the approximate posterior pi(z|x), implying pg(x, z) = pi(x, z) [4, 5]. As we have never observed z for x, as long as z is assumed to be independent of y, it is reasonable to simply set the true joint distribution p(x, z) = p*_g(x, z) = p*_i(x, z), where we use p*_g and p*_i to denote the optimal distributions when Lxz reaches its equilibrium.\n\nAdversarial game Lxy. The second adversarial game is built to match the true joint data distribution p(x, y) that has been observed on Xl. We introduce the other critic network Dxy to discriminate (x, y) ~ p(x, y) from (x, y) ~ pg(x, y) = p(y)pg(x|y), and build the game Lxy as:\n\nmin_G max_{Dxy} Lxy = E_{(x,y)~p(x,y)}[log(Dxy(x, y))] + E_{y~p(y),z~p(z)}[log(1 - Dxy(G(y, z), y))].  (3)\n\nCollaborative game Ry. Although training the adversarial game Lxy theoretically drives pg(x, y) to concentrate on the true data distribution p(x, y), it turns out to be very difficult to train Lxy to the desired convergence, as (1) the joint distribution p(x, y) characterized by Xl might be biased due to its small size; (2) there is little supervision from Xl to tell G what y essentially represents, and how to generate samples conditioned on y. As a result, G might lack controllability – it might generate low-fidelity samples that are not aligned with their conditions, which will always be rejected by Dxy. A natural solution to these issues is to allow (learned) posterior inference of y to reconstruct y from generated x [5]. By minimizing the reconstruction error, we can backpropagate the gradient to G to enhance its controllability. Once pg(x|y) can generate high-fidelity samples that respect the structures y, we can reuse the generated samples (x, y) ~ pg(x, y) as true samples in the first term of Lxy, to prevent Dxy from collapsing onto a biased p(x, y) characterized by Xl.\n\nTo this end, we introduce the second inference network C : x -> y, which approximates the posterior p(y|x) as y ~ pc(y|x) = C(x); e.g. C reduces to an N-way classifier if y is categorical.
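Concretely, the inner maximization of both adversarial games amounts to a binary cross-entropy for the critic on joint pairs; a numpy-only sketch (our own naming, written for Lxz but identical in form for Lxy):

```python
import numpy as np

def critic_loss(d_on_inferred, d_on_generated):
    """Negated inner objective of Eq. 2 from the critic's viewpoint:
    Dxz should score pairs (x, I(x)) from pi(x, z) as "real" and pairs
    (G(y, z), z) from pg(x, z) as "fake"; the critic minimizes this
    binary cross-entropy, while I and G minimize Lxz itself.

    d_on_inferred:  Dxz(x, I(x))    for x ~ p(x), each in (0, 1)
    d_on_generated: Dxz(G(y, z), z) for y ~ p(y), z ~ p(z), each in (0, 1)
    """
    d_i = np.asarray(d_on_inferred, dtype=float)
    d_g = np.asarray(d_on_generated, dtype=float)
    return -(np.mean(np.log(d_i)) + np.mean(np.log(1.0 - d_g)))
```

When the two factorizations match and the critic is fully confused (outputting 1/2 on every pair), this loss equals 2 log 2.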
To train pc(y|x), we define a collaborative (reconstruction) game Ry in the hidden space of y:\n\nmin_{C,G} Ry = -E_{(x,y)~p(x,y)}[log pc(y|x)] - E_{(x,y)~pg(x,y)}[log pc(y|x)],  (4)\n\nwhich aims to minimize the reconstruction error of y in terms of C and G, on both the labeled data Xl and the generated data (x, y) ~ pg(x, y). On the one hand, minimizing the first term of Ry w.r.t. C guides C toward the true posterior p(y|x). On the other hand, minimizing the second term w.r.t. G endows G with extra controllability – it minimizes the chance that G could generate samples that would otherwise be falsely predicted by C. Note that we also minimize the second term w.r.t. C, which proves effective in semi-supervised learning settings that use synthetic samples to augment the predictive power of C. In summary, minimizing Ry can be seen as a collaborative game between two players C and G that drives pg(x|y) to match p(x|y) and pc(y|x) to match the posterior p(y|x).\n\nCollaborative game Rz. As SGAN allows posterior inference for both y and z, we can explicitly impose constraints Ry and Rz to separate y from z during training.
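Assuming C ends in a softmax over the categories of y, Eq. 4 is a sum of two cross-entropy terms; a minimal numpy sketch (function and argument names are ours, not the paper's code):

```python
import numpy as np

def ry_loss(probs_labeled, y_labeled, probs_generated, y_generated):
    """Ry = -E_{(x,y)~p(x,y)}[log pc(y|x)] - E_{(x,y)~pg(x,y)}[log pc(y|x)]:
    cross-entropy of C's predicted class probabilities against the true
    labels (first term) and against the generator's conditions (second).

    probs_*: (n, K) arrays whose rows are pc(y|x); y_*: length-n int labels."""
    def xent(probs, y):
        probs = np.asarray(probs, dtype=float)
        y = np.asarray(y)
        return -np.mean(np.log(probs[np.arange(len(y)), y]))
    return xent(probs_labeled, y_labeled) + xent(probs_generated, y_generated)
```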
To explain, we first note that optimizing the second term of Ry w.r.t. G actually enforces the structure information to be fully preserved in y, because C is asked to recover the structure y from G(y, z), which is generated conditioned on y, regardless of the uncertainty of z (as z is marginalized out during sampling). Therefore, minimizing Ry implies the following constraint: min_{C,G} E_{y~p(y)} ||pc(y|G(y, z1)), pc(y|G(y, z2))||, for all z1, z2 ~ p(z), where ||a, b|| is some distance function between a and b (e.g. cross entropy if C is an N-way classifier). Conversely, we also want to enforce any other unstructured information that is not of our interest to be fully captured in z, without being entangled with y. So we build the second collaborative game Rz as:\n\nmin_{I,G} Rz = -E_{(x,z)~pg(x,z)}[log pi(z|x)],  (5)\n\nwhere I is required to recover z from samples generated by G conditioned on z, i.e. reconstructing z in the hidden space. Similar to Ry, minimizing Rz implies: min_{I,G} E_{z~p(z)} ||pi(z|G(y1, z)), pi(z|G(y2, z))||, for all y1, y2 ~ p(y), and when we model I as a deterministic mapping [4], the ||.|| distance between distributions is equal to the l2 distance between the outputs of I.\n\nTheoretical Guarantees. We provide some theoretical results about the SGAN framework under the nonparametric assumption. The proofs of the theorems are deferred to the supplementary materials.\n\nTheorem 3.1 The global minimum of max_{Dxz} Lxz is achieved if and only if p(x)pi(z|x) = p(z)pg(x|z). At that point, D*_xz = 1/2. Similarly, the global minimum of max_{Dxy} Lxy is achieved if and only if p(x, y) = p(y)pg(x|y).
At that point, D*_xy = 1/2.\n\nTheorem 3.2 There exists a generator G*(y, z) whose conditional distributions pg(x|y) and pg(x|z) can both achieve equilibrium in their own minimax games Lxy and Lxz.\n\nTheorem 3.3 Minimizing Rz w.r.t. I will keep the equilibrium of the adversarial game Lxz. Similarly, minimizing Ry w.r.t. C will keep the equilibrium of the adversarial game Lxy unchanged.\n\nAlgorithm 1 Training Structured Generative Adversarial Networks (SGAN).\n1: Pretrain C by minimizing the first term of Eq. 4 w.r.t. C using Xl.\n2: repeat\n3:   Sample a batch of x: xu ~ p(x).\n4:   Sample batches of pairs (x, y): (xl, yl) ~ p(x, y), (xg, yg) ~ pg(x, y), (xc, yc) ~ pc(x, y).\n5:   Obtain a batch (xm, ym) by mixing data from (xl, yl), (xg, yg), (xc, yc) with a proper mixing portion.\n6:   for k = 1 -> K do\n7:     Train Dxz by maximizing the first term of Lxz using xu and the second using xg.\n8:     Train Dxy by maximizing the first term of Lxy using (xm, ym) and the second using (xg, yg).\n9:   end for\n10:  Train I by minimizing Lxz using xu and Rz using xg.\n11:  Train C by minimizing Ry using (xm, ym) (see text).\n12:  Train G by minimizing Lxy + Lxz + Ry + Rz using (xg, yg).\n13: until convergence.\n\nTraining. SGAN is fully differentiable and can be trained end-to-end using stochastic gradient descent, following the strategy in [8] that alternately trains the two critic networks Dxy, Dxz and the other networks G, I and C. Though minimizing Ry and Rz w.r.t. G introduces a slight bias, we find empirically that it works well and contributes to disentangling y and z. The training procedure is summarized in Algorithm 1. Moreover, to guarantee that C can be properly trained without bias, we pretrain C by minimizing the first term of Ry until convergence, and do not minimize Ry w.r.t.
C until G has started generating meaningful samples (usually after several epochs of training). As the training proceeds, we gradually increase the portion of synthetic samples (x, y) ~ pg(x, y) and (x, y) ~ pc(x, y) in the stochastic batch, to help the training of Dxy and C (see Algorithm 1); please refer to our code on GitHub for more details of this schedule. We empirically found that this mutual bootstrapping trick yields improved C and G.\n\nDiscussion and connections. SGAN is essentially a combination of two adversarial games Lxy and Lxz, and two collaborative games Ry and Rz, where Lxy and Lxz are optimized to match the data distributions in the visible space, while Ry and Rz are trained to match the posteriors in the hidden space. It combines the best of GAN-based methods and MLE-based methods: on the one hand, estimating density in the visible space using a GAN-based formulation avoids distributing the probability mass diffusely over the data space [5], from which MLE-based frameworks (e.g. VAE) suffer. On the other hand, incorporating reconstruction-based constraints in the latent space helps enforce the disentanglement between the structured information in y and the unstructured information in z, as we argued above.\n\nWe also establish some connections between SGAN and some existing works [15, 27, 3]. We note that the Lxy game in SGAN is connected to the TripleGAN framework [15] when its trade-off parameter α = 0. We will empirically show that SGAN yields better controllability of G, and also improved performance on downstream tasks, due to the separate modeling of y and z. SGAN also connects to InfoGAN in the sense that the second term of Ry (Eq. 4) reduces to the mutual information penalty in InfoGAN under unsupervised settings. However, SGAN and InfoGAN have totally different aims and modeling techniques.
SGAN builds a conditional generator that has the semantics of interest y as a fully controllable input (known before training); InfoGAN, in contrast, aims to disentangle some latent variables whose semantics are interpreted after training (by observation). Though extending InfoGAN to semi-supervised settings seems straightforward, successfully learning the joint distribution p(x, y) with very few labels is non-trivial: InfoGAN only maximizes the mutual information between y and G(y, z), bypassing p(y|x) and p(x, y), so its direct extension to semi-supervised settings may fail due to the lack of p(x, y). Moreover, SGAN has dedicated inference networks I and C, while the network Q(x) in InfoGAN shares parameters with the discriminator, which has been argued to be problematic [15, 9] as it may compete with the discriminator and prevent its success in semi-supervised settings. See our ablation study in Section 4.2 and Fig. 3. Finally, the first term in Ry is similar to the way Improved-GAN models the conditional p(y|x) for labeled data, but SGAN treats the generated data very differently – Improved-GAN labels xg = G(z, y) as a new class y = K + 1, whereas SGAN reuses xg and xc to mutually boost I, C and G, which is key to the success of semi-supervised learning (see Section 4.2).\n\n4 Evaluation\n\nWe empirically evaluate SGAN through experiments on different datasets. We show that separately modeling z and y in the hidden space helps better disentangle the semantics of interest from other irrelevant attributes, and thus yields improved performance for both generative modeling (G) and posterior inference (C, I) (Sections 4.1 and 4.3). Under the SGAN framework, the learned inference networks and generators can further benefit many downstream applications, such as semi-supervised classification, controllable image generation and style transfer (Sections 4.2 and 4.3).\n\nDataset and configurations.
We evaluate SGAN on three image datasets: (1) MNIST [14]: we use the 60K training images as unlabeled data, sample n ∈ {20, 50, 100} labels for semi-supervised learning following [12, 27], and evaluate on the 10K test images. (2) SVHN [20]: a standard train/test split is provided, where we sample n = 1000 labels from the training set for semi-supervised learning [27, 15, 5]. (3) CIFAR-10: a challenging dataset for conditional image generation that consists of 50K training and 10K test images from 10 object classes. We randomly sample n = 4000 labels [27, 28, 15] for semi-supervised learning. For all datasets, our semantics of interest are the digit/object class, so y is a 10-dim categorical variable. We use a 64-dim Gaussian noise as z on MNIST and a 100-dim uniform noise as z on SVHN and CIFAR-10.\n\nImplementation. We implement SGAN using TensorFlow [1] and Theano [2] with distributed acceleration provided by Poseidon [33], which parallelizes lines 7-8 and 10-12 of Algorithm 1. The neural network architectures of C, G and Dxy mostly follow those used in TripleGAN [15], and we design I and Dxz according to [5] but with shallower structures to alleviate the training costs. Empirically, SGAN needs 1.3-1.5x more training time than TripleGAN [15] without parallelization. It is noted that properly weighting the losses of the four games in SGAN during training may lead to performance improvements. However, we simply set them equal without heavy tuning.1\n\n4.1 Controllability and Disentanglability\n\nWe evaluate the controllability and disentanglability of SGAN by assessing its generator network G and inference network I, respectively. Specifically, as SGAN is able to perform posterior inference for z, we define a novel quantitative measure based on z to compare its disentanglability to that of other DGMs: we first use the trained I (or the "recognition network" in VAE-based models) to infer z for unseen x from the test sets.
Ideally, as z and y are modeled as independent, when I is trained to approach the true posterior of z, its output, when used as features, shall have weak predictability for y. Accordingly, we use z as features to train a linear SVM classifier to predict the true y, and define the converged accuracy of this classifier as the mutual predictability (MP) measure; we expect lower MP for models that better disentangle y from z. We conduct this experiment on all three datasets, and report the MP measure averaged over five runs in Fig. 2, comparing the following DGMs (that are able to infer z): (1) ALI [5] and (2) VAE [12], trained without label information; (3) CVAE-full2: the M2 model in [11] trained under the fully supervised setting; (4) SGAN trained under semi-supervised settings. We use 50, 1000 and 4000 labels for the MNIST, SVHN and CIFAR-10 datasets under semi-supervised settings, respectively.\n\n1The code is publicly available at https://github.com/thudzj/StructuredGAN.\n\nClearly, SGAN demonstrates low MP when predicting y using z on all three datasets. Using only 50 labels, SGAN exhibits reasonable MP. In fact, on MNIST with only 20 labels as supervision, SGAN achieves 0.65 MP, outperforming the other baselines by a large margin. The results clearly demonstrate SGAN's ability to disentangle y and z, even when the supervision is very scarce.\n\nOn the other hand, better disentanglability also implies improved controllability of G, because less entangled y and z make it easier for G to recognize the designated semantics – so G should be able to generate samples that deviate less from y during conditional generation. To verify this, following [9], we use a pretrained gold-standard classifier (0.56% error on the MNIST test set) to classify generated images, and use the condition y as ground truth to calculate the accuracy.
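For reference, the MP computation can be sketched as follows; we substitute a dependency-free least-squares linear classifier for the linear SVM used in the paper (the names and this substitution are ours):

```python
import numpy as np

def mutual_predictability(z_train, y_train, z_test, y_test, num_classes):
    """Fit a linear classifier predicting y from the inferred codes z and
    report its test accuracy, i.e. the MP measure: lower MP means y is
    better disentangled from z."""
    Z = np.hstack([z_train, np.ones((len(z_train), 1))])  # add bias feature
    Y = np.eye(num_classes)[np.asarray(y_train)]          # one-hot targets
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)             # one-vs-rest linear fit
    Zt = np.hstack([z_test, np.ones((len(z_test), 1))])
    pred = np.argmax(Zt @ W, axis=1)
    return float(np.mean(pred == np.asarray(y_test)))
```

If z were perfectly predictive of y (no disentanglement at all), MP approaches 1; for codes truly independent of y it hovers around chance level.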
We compare SGAN in Table 1 to CVAE-semi and TripleGAN [15], another strong baseline that is also designed for conditional generation under semi-supervised settings. We use n = 20, 50, 100 labels on MNIST, and observe a significantly higher accuracy for both TripleGAN and SGAN. For comparison, a generator trained by CVAE-full achieves a 0.6% error. When fewer labels are available, SGAN outperforms TripleGAN. The generator in SGAN can generate samples that consistently obey the conditions specified in y, even when there are only two images per class (n = 20) as supervision. These results verify our statement that disentangled semantics further enhance the controllability of the conditioned generator G.

Figure 2: Comparisons of the MP measure for different DGMs (lower is better).

Table 1: Errors (%) of generated samples classified by a classifier with 0.56% test error.

Model     | n = 20 | n = 50 | n = 100
CVAE-semi | 33.05  | 10.72  | 5.66
TripleGAN | 3.06   | 1.80   | 1.29
SGAN      | 1.68   | 1.23   | 0.93

4.2 Semi-supervised Classification

It is natural to use SGAN for semi-supervised prediction. With a little supervision, SGAN can deliver a conditional generator with reasonably good controllability, with which one can synthesize samples from pg(x, y) to augment the training of C when minimizing Ry. Once C becomes more accurate, it tends to make fewer mistakes when inferring y from x. Moreover, as we sample (x, y) ∼ pc(x, y) to train Dxy during the maximization of Lxy, a more accurate C means more available labeled samples (by predicting y from unlabeled x using C) to lower the bias brought by the small set Xl, which in turn can enhance G in the minimization phase of Lxy.
Consequently, a mutual boosting cycle between G and C is formed.
To empirically validate this, we deploy SGAN for semi-supervised classification on MNIST, SVHN and CIFAR-10, and compare the test errors of C to strong baselines in Table 2. To keep the comparisons fair, we adopt the same neural network architectures and hyper-parameter settings as [15], and report results averaged over 10 runs with randomly sampled labels (every class has an equal number of labels). We note that SGAN outperforms the current state-of-the-art methods across all datasets and settings. In particular, on MNIST when labeled instances are very scarce (n = 20), SGAN attains the best accuracy (4.0% test error) with significantly lower variance, benefiting from the mutual boosting effects explained above. This is critical for applications under low-shot or even one-shot settings, where the small set Xl might not be a good representative of the data distribution p(x, y).
Footnote 2: For CVAE-full, we use test images and ground truth labels together to infer z when calculating MP.
We are unable to compare to semi-supervised CVAE because, in CVAE, inferring z for test images requires image labels as input, which would be unfair to the other methods.

Table 2: Comparisons of semi-supervised classification errors (%) on MNIST, SVHN and CIFAR-10 test sets.

Method           | MNIST n = 20 | MNIST n = 50 | MNIST n = 100 | SVHN n = 1000 | CIFAR-10 n = 4000
Ladder [22]      | -            | -            | 0.89(±0.50)   | -             | 20.40(±0.47)
VAE [12]         | -            | -            | 3.33(±0.14)   | 36.02(±0.10)  | -
CatGAN [28]      | -            | -            | 1.39(±0.28)   | -             | 19.58(±0.58)
ALI [5]          | -            | -            | -             | 7.3           | 18.3
ImprovedGAN [27] | 16.77(±4.52) | 2.21(±1.36)  | 0.93(±0.07)   | 8.11(±1.3)    | 18.63(±2.32)
TripleGAN [15]   | 5.40(±6.53)  | 1.59(±0.69)  | 0.92(±0.58)   | 5.83(±0.20)   | 18.82(±0.32)
SGAN             | 4.0(±4.14)   | 1.29(±0.47)  | 0.89(±0.11)   | 5.73(±0.12)   | 17.26(±0.69)

4.3 Qualitative Results
In this section we present qualitative results produced by SGAN's generator under semi-supervised settings. Unless otherwise specified, we use 50, 1000 and 4000 labels on MNIST, SVHN and CIFAR-10 for these results. The results are randomly selected without cherry-picking, and more can be found in the supplementary materials.
Controllable generation. To figure out how each module in SGAN contributes to the final results, we conduct an ablation study in Fig. 3, where we plot images generated by SGAN with or without the terms Ry and Rz during training. As observed, our full model accurately disentangles y and z. When no collaborative game is involved, the generator easily collapses to a biased conditional distribution defined by the classifier C, which is trained only on a very small set of labeled data with insufficient supervision.
For example, the generator cannot clearly distinguish the following digits: 0, 2, 3, 5, 8. Incorporating Ry into training significantly alleviates this issue: an augmented C resolves G's confusion. However, the generator still makes mistakes on some confusable classes, such as 3 and 5. Ry and Rz connect the two adversarial games to form a mutual boosting cycle. The absence of either of them breaks this cycle; consequently, SGAN would be under-constrained and may collapse to some local minima, resulting in both a less accurate classifier C and a less controlled G.

Figure 3: Ablation study: conditional generation results by SGAN (a) without Ry, Rz, (b) without Rz, (c) full model. Each row has the same y while each column shares the same z.

Visual quality. Next, we investigate whether more disentangled y and z result in higher visual quality of the generated samples, as it stands to reason that the conditioned generator G is easier to learn when its inputs y and z carry more orthogonal information. We conduct this experiment on CIFAR-10, which consists of natural images with more uncertainty beyond the object categories. In Fig. 4 we compare several state-of-the-art generators to SGAN, without any advanced GAN training strategies (e.g. WGAN, gradient penalties) that are reported to possibly improve visual quality. We find that SGAN's conditional generator produces less blurred images with the main objects more salient, compared to TripleGAN and ImprovedGAN w/o minibatch discrimination (see supplementary). As a quantitative measure, we generate 50K images and compute the inception score [27], obtaining 6.91(±0.07), compared to 5.08(±0.09) for TripleGAN and 3.87(±0.03) for ImprovedGAN w/o minibatch discrimination, confirming the advantage of structured modeling of y and z.

Figure 4: Visual comparison of generated images on CIFAR-10: (a) CIFAR-10 data, (b) TripleGAN, (c) SGAN. For (b) and (c), each row shares the same y.

Figure 5: (a)-(c): image progression, (d)-(f): style transfer using SGAN.

Image progression. To demonstrate that SGAN generalizes well instead of just memorizing the data, we generate images with interpolated z in Fig. 5(a)-(c) [32]. Clearly, the images generated under progression are semantically consistent with y, and change smoothly from left to right. This verifies that SGAN correctly disentangles semantics, and learns accurate class-conditional distributions.
Style transfer. We apply SGAN to style transfer [7, 30]. Specifically, as y models the digit/object category on all three datasets, we expect z to encode any other information orthogonal to y (presumably style information). To see whether I behaves properly, we use SGAN to transfer the unstructured information in z in Fig. 5(d)-(f): given an image x (the leftmost image), we infer its unstructured code z, and then generate images conditioned on that z but with different y. Interestingly, z encodes various aspects of the images, such as shape, texture, orientation and background information, as expected. Moreover, G can correctly transfer this information to other classes.

5 Conclusion
We have presented SGAN for semi-supervised conditional generative modeling, which learns from a small set of labeled instances to disentangle the semantics of interest from other elements in the latent space. We show that SGAN has improved disentanglability and controllability compared to baseline frameworks.
SGAN's design benefits many downstream applications: it establishes new state-of-the-art results on semi-supervised classification, and outperforms strong baselines in terms of visual quality and inception score for controllable image generation.

Acknowledgements

Zhijie Deng and Jun Zhu are supported by NSF China (Nos. 61620106010, 61621136008, 61332007), the MIIT Grant of Int. Man. Comp. Stan (No. 2016ZXFB00001), the Tsinghua Tiangong Institute for Intelligent Computing and the NVIDIA NVAIL Program. Hao Zhang is supported by the AFRL/DARPA project FA872105C0003. Xiaodan Liang is supported by award FA870215D0002.

References
[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In USENIX Symposium on Operating Systems Design and Implementation, 2016.
[2] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: A CPU and GPU math compiler in Python. In Proceedings of the Python for Scientific Computing Conference (SciPy), pages 3–10, 2010.
[3] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172–2180, 2016.
[4] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
[5] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
[6] Tzu-Chien Fu, Yen-Cheng Liu, Wei-Chen Chiu, Sheng-De Wang, and Yu-Chiang Frank Wang.
Learning\ncross-domain disentangled deep representation with supervision from a single domain. arXiv preprint\narXiv:1705.01314, 2017.\n\n[7] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint\n\narXiv:1508.06576, 2015.\n\n[8] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron\nCourville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing\nSystems, pages 2672\u20132680, 2014.\n\n[9] Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. Controllable text\n\ngeneration. arXiv preprint arXiv:1703.00955, 2017.\n\n[10] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional\n\nadversarial networks. arXiv preprint arXiv:1611.07004, 2016.\n\n[11] Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised\nlearning with deep generative models. In Advances in Neural Information Processing Systems, pages\n3581\u20133589, 2014.\n\n[12] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114,\n\n2013.\n\n[13] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint\n\narXiv:1610.02242, 2016.\n\n[14] Yann LeCun, L\u00e9on Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to\n\ndocument recognition. Proceedings of the IEEE, 86(11):2278\u20132324, 1998.\n\n[15] Chongxuan Li, Kun Xu, Jun Zhu, and Bo Zhang. Triple generative adversarial nets. In Advances in Neural\n\nInformation Processing Systems, 2017.\n\n[16] Chongxuan Li, Jun Zhu, Tianlin Shi, and Bo Zhang. Max-margin deep generative models. In Advances in\n\nNeural Information Processing Systems, pages 1837\u20131845, 2015.\n\n[17] Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, and Eric P Xing. Recurrent topic-transition gan for\n\nvisual paragraph generation. 
arXiv preprint arXiv:1703.07022, 2017.
[18] Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. arXiv preprint arXiv:1602.05473, 2016.
[19] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[20] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[21] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[22] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pages 3546–3554, 2015.
[23] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. In International Conference on Machine Learning, pages 1060–1069, 2016.
[24] Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Victor Bapst, Matt Botvinick, and Nando de Freitas. Generating interpretable images with controllable structure. In International Conference on Learning Representations, 2017.
[25] Scott E Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, and Honglak Lee. Learning what and where to draw. In Advances in Neural Information Processing Systems, pages 217–225, 2016.
[26] Ruslan Salakhutdinov and Geoffrey Hinton. Deep Boltzmann machines. In Artificial Intelligence and Statistics, pages 448–455, 2009.
[27] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs.
In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.
[28] Jost Tobias Springenberg. Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390, 2015.
[29] Luan Tran, Xi Yin, and Xiaoming Liu. Disentangled representation learning GAN for pose-invariant face recognition. In Conference on Computer Vision and Pattern Recognition, 2017.
[30] Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, and Eric P Xing. ZM-Net: Real-time zero-shot image manipulation network. arXiv preprint arXiv:1703.07255, 2017.
[31] Xiaolong Wang and Abhinav Gupta. Generative image modeling using style and structure adversarial networks. In European Conference on Computer Vision, pages 318–335. Springer, 2016.
[32] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. Attribute2Image: Conditional image generation from visual attributes. In European Conference on Computer Vision, pages 776–791. Springer, 2016.
[33] Hao Zhang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Gunhee Kim, Qirong Ho, and Eric Xing. Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines. arXiv preprint arXiv:1512.06216, 2015.