{"title": "Adversarial Self-Defense for Cycle-Consistent GANs", "book": "Advances in Neural Information Processing Systems", "page_first": 637, "page_last": 647, "abstract": "The goal of unsupervised image-to-image translation is to map images from one domain to another without the ground truth correspondence between the two domains. State-of-art methods learn the correspondence using large numbers of unpaired examples from both domains and are based on generative adversarial networks. In order to preserve the semantics of the input image, the adversarial objective is usually combined with a cycle-consistency loss that penalizes incorrect reconstruction of the input image from the translated one. However, if the target mapping is many-to-one, e.g. aerial photos to maps, such a restriction forces the generator to hide information in low-amplitude structured noise that is undetectable by human eye or by the discriminator. In this paper, we show how such self-attacking behavior of unsupervised translation methods affects their performance and provide two defense techniques. We perform a quantitative evaluation of the proposed techniques and show that making the translation model more robust to the self-adversarial attack increases its generation quality and reconstruction reliability and makes the model less sensitive to low-amplitude perturbations. Our project page can be found at ai.bu.edu/selfadv.", "full_text": "Adversarial Self-Defense for Cycle-Consistent GANs\n\nDina Bashkirova 1, Ben Usman1, and Kate Saenko 1,2\n\n1Boston University\n\n2MIT-IBM Watson AI Lab\n{dbash,usmn,saenko}@bu.edu\n\nAbstract\n\nThe goal of unsupervised image-to-image translation is to map images from one\ndomain to another without the ground truth correspondence between the two\ndomains. State-of-art methods learn the correspondence using large numbers of\nunpaired examples from both domains and are based on generative adversarial\nnetworks. 
In order to preserve the semantics of the input image, the adversarial objective is usually combined with a cycle-consistency loss that penalizes incorrect reconstruction of the input image from the translated one. However, if the target mapping is many-to-one, e.g. aerial photos to maps, such a restriction forces the generator to hide information in low-amplitude structured noise that is undetectable by the human eye or by the discriminator. In this paper, we show how such self-attacking behavior of unsupervised translation methods affects their performance and provide two defense techniques. We perform a quantitative evaluation of the proposed techniques and show that making the translation model more robust to the self-adversarial attack increases its generation quality and reconstruction reliability and makes the model less sensitive to low-amplitude perturbations. Our project page can be found at ai.bu.edu/selfadv/.\n\n1 Introduction\n\nGenerative adversarial networks (GANs) [7] have enabled many recent breakthroughs in image generation, such as being able to change visual attributes like hair color or gender in an impressively realistic way, and even generate highly realistic-looking faces of people that do not exist [13, 31, 14]. Conditional GANs designed for unsupervised image-to-image translation can map images from one domain to another without pairwise correspondence and ground truth labels, and are widely used for solving such tasks as semantic segmentation, colorization, style transfer, and quality enhancement of images [34, 10, 19, 3, 11, 35, 4] and videos [2, 1]. These models learn the cross-domain mapping by ensuring that the translated image both looks like a true representative of the target domain and also preserves the semantics of the input image, e.g. the shape and position of objects, overall layout etc. Semantic preservation is usually achieved by enforcing cycle-consistency [34], i.e. 
a small error between the source image and its reverse reconstruction from the translated target image.\nDespite the success of cycle-consistent GANs, they have a major flaw. The reconstruction loss forces the generator network to hide the information necessary to faithfully reconstruct the input image inside tiny perturbations of the translated image [5]. The problem is particularly acute in many-to-one mappings, such as photos to semantic labels, where the model must reconstruct textures and colors lost during translation to the target domain. For example, Figure 1's top row shows that even when the car is mapped incorrectly to the semantic labels of building (gray) and tree (green), CycleGAN is still able to "cheat" and perfectly reconstruct the original car from hidden information. It also reconstructs road textures lost in the semantic map. This behavior is essentially an adversarial attack that the model is performing on itself, so we call it a self-adversarial attack.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nFigure 1: Results of translation of GTA [26] frames to semantic segmentation maps using CycleGAN, UNIT and CycleGAN with our two proposed defense methods, additive noise and guess loss. The last column shows the reconstruction of the input image when high-frequency noise (Gaussian noise with mean 0 and standard deviation 0.08, i.e. roughly 10 intensity levels out of 256) is added to the output map. Ideally, if the reconstruction is "honest" and relies solely on the visual features of the input, the reconstruction quality should not be greater than that of the translation. The results of all three translation methods (CycleGAN, UNIT and MUNIT) show that the reconstruction is almost perfect regardless of the translation accuracy. Furthermore, the reconstruction of the input image is highly sensitive to low-amplitude random noise added to the translation. 
Both of the proposed self-adversarial defense techniques (Section 4) make the CycleGAN model more robust to random noise and make it rely more on the translation result rather than on the adversarial structured noise used by the original CycleGAN and UNIT. More translation examples can be found in Section 3 of the supplementary material. Best viewed in color.\n\nIn this paper, we extend the analysis of self-adversarial attacks provided in [5] and show that the problem is present in recent state-of-the-art methods that incorporate cycle consistency. We provide two defense mechanisms against the attack that resemble the adversarial training technique widely used to increase the robustness of deep neural networks to adversarial attacks [9, 16, 32]. We also introduce quantitative evaluation metrics for translation quality and reconstruction "honesty" that help to detect self-adversarial attacks and provide a better understanding of the learned cross-domain mapping. We show that due to the presence of hidden embeddings, state-of-the-art translation methods are highly sensitive to high-frequency perturbations, as illustrated in Figure 1. In contrast, our defense methods substantially decrease the amount of self-adversarial structured noise and thus make the mapping more reliant on the input image, which results in more interpretable translation and reconstruction and increased translation quality. Importantly, robustifying the model against the self-adversarial attack also makes it less susceptible to high-frequency perturbations and less likely to converge to a non-optimal solution.\n\n2 Related Work\n\nUnsupervised image-to-image translation is one of the domain adaptation tasks that has received a lot of attention in recent years. 
Current state-of-the-art methods [34, 20, 11, 15, 4, 10] solve this task using generative adversarial networks [8], which usually consist of a pair of generator and discriminator networks that are trained in a min-max fashion to generate realistic images from the target domain and to correctly classify real and fake images, respectively.\nThe goal of image-to-image translation methods is to map an image from one domain to another in such a way that the output image both looks like a real representative of the target domain and contains the semantics of the input image. In the supervised setting, the semantic consistency is enforced by the ground truth labels or pairwise correspondence. When there is no supervision, however, there is no such ground truth guidance, so using a regular GAN often results in realistic-looking but unreliable translations. In order to overcome this problem, current state-of-the-art unsupervised translation methods incorporate the cycle-consistency loss, first introduced in [34], which forces the model to learn a mapping from which the input image can be reconstructed.\nRecently, various methods have been developed for unimodal (CycleGAN [34], UNIT [20], CoGAN [21] etc.) and multimodal (MUNIT [11], StarGAN [4], BicycleGAN [35]) image-to-image translation. In this paper, we explore the problem of self-adversarial attacks in three of them: CycleGAN, UNIT and MUNIT. CycleGAN is a unimodal translation method that consists of two domain discriminators and two generator networks; the generators are trained to produce realistic images from the corresponding domains, while the discriminators aim to distinguish in-domain real images from the generated ones. The generator-discriminator pairs are trained in a min-max fashion both to produce realistic images and to satisfy the cycle-consistency property. 
The main idea behind UNIT is that both domains share some common semantics, and thus can be encoded to a shared latent space. It consists of two encoder-decoder pairs that map images to the latent space and back; the cross-domain translation is then performed by encoding the image from the source domain to the latent space and decoding it with the decoder for the target domain. MUNIT is a multimodal extension of UNIT that performs disentanglement of domain-specific (style space) and domain-agnostic (content space) features. While the original MUNIT does not use an explicit cycle-consistency loss, we found that a cycle-consistency penalty significantly increases the quality of translation and helps the model to learn more reliable content disentanglement (see Figure 2). Thus, we used MUNIT with the cycle-consistency loss in our experiments.\nAs illustrated in Figure 2, adding the cycle-consistency loss indeed helps to disentangle domain-agnostic information and enhance the translation quality and reliability. However, such a pixelwise penalty was shown [5] to force the generator to hide the domain-specific information that cannot be explicitly reconstructed from the translated image (e.g., shadows or the color of the buildings in the maps-to-photos example) in such a way that it cannot be detected by the discriminator.\nDeep neural networks [17], while providing high accuracy on the majority of machine learning problems, are known to be highly susceptible to adversarial attacks [24, 29, 16, 23]. There exist multiple defense techniques that make neural networks more robust to adversarial examples, such as adding adversarial examples to the training set or adversarial training [24, 22], distillation [25], ensemble adversarial training [30], denoising [18] and many more. 
Moreover, [33] have shown that defending the discriminator in a GAN setting increases the generation quality and prevents the model from converging to a non-optimal solution. However, most adversarial defense techniques are developed for the classification task and are very hard to adapt to the generative setting.\n\n3 Self-Adversarial Attack in Cyclic Models\n\nSuppose we are given a number of samples from two image domains x ∼ pA and y ∼ pB. The goal is to learn two mappings G : x ∼ pA → y ∼ pB and F : y ∼ pB → x ∼ pA. In order to learn the distributions pA and pB, two discriminators DA and DB are trained to classify whether the input image is a true representative of the corresponding domain or generated by G or F, respectively. The cross-distribution mapping is learned using the cycle-consistency property in the form of a loss based on the pixelwise distance between the input image and its reconstruction. Usually, the cycle-consistency loss can be described as follows:\n\nLrec = ||F(G(x)) - x||_1   (1)\n\nHowever, when domain A is richer than domain B, the mapping G : x ∼ pA → y ∼ pB is many-to-one (i.e. for one image y ∼ pB there are multiple correct correspondences x ∼ pA), so the generator is still forced to perfectly reconstruct the input even though some of the information of the input image is lost after the translation to domain B. As shown in [5], such behavior of a CycleGAN can be described as an adversarial attack, and in fact, for any given image it is possible to generate structured noise that leads to reconstruction of a target image [5].\nIn practice, CycleGAN and other methods that utilize the cycle-consistency loss add a very low-amplitude signal to the translation ŷ that is invisible to the human eye. Addition of this signal is enough to reconstruct information about image x that should not be present in ŷ. 
This makes methods that incorporate the cycle-consistency loss sensitive to low-amplitude high-frequency noise, since that noise can destroy the hidden signal (shown in Figure 3). In addition, such behavior can force the model to converge to a non-optimal solution or even diverge, since by adding structured noise the model "cheats" to minimize the reconstruction loss instead of learning the correct mapping.\n\n4 Defense techniques\n\n4.1 Adversarial training with noise\n\nOne approach to defend the model from a self-adversarial attack is to train it to be resistant to perturbations similar in nature to those produced by the hidden embedding. Unfortunately, it is impossible to separate the pure structured noise from the translated image, so classic adversarial defense training cannot be used in this scenario. However, it is possible to prevent the model from learning to embed by adding perturbations to the translated image before reconstruction. The intuition behind this approach is that adding random noise of amplitude similar to the hidden signal disturbs the embedded message. This results in high reconstruction error, so the generator cannot rely on the embedding. The modified noisy cycle-consistency loss can be described as follows:\n\nLnoisy_rec = ||F(G(x) + Δ(θn)) - x||_1,   (2)\n\nwhere Δ(θn) is some high-frequency perturbation function with parameters θn. In our experiments we used low-amplitude Gaussian noise with mean equal to zero. Such a simplistic defense approach is very similar to the one proposed in [33], where the discriminator is defended from the generator attack by regularizing the discriminator objective using adversarial vectors. In our setting, however, the attack targets both the discriminator and the generator of the opposite domain, which makes it harder to find the exact adversarial vector. 
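To make the mechanics of Eqs. 1 and 2 concrete, the following minimal NumPy sketch contrasts the plain and noise-defended cycle-consistency losses. The toy generators G and F are hypothetical stand-ins (a detail-losing quantizing map and an identity), not the actual CycleGAN networks:

```python
import numpy as np

def cycle_loss(F, G, x):
    # Plain cycle-consistency loss (Eq. 1): L1 distance between x and F(G(x)).
    return np.abs(F(G(x)) - x).mean()

def noisy_cycle_loss(F, G, x, sigma=0.08, rng=None):
    # Noise-defended cycle loss (Eq. 2): zero-mean Gaussian noise is added to
    # the translation before reconstruction, so the generator cannot rely on
    # a low-amplitude hidden embedding to minimize the loss.
    rng = np.random.default_rng(rng)
    y = G(x)
    y_noisy = y + rng.normal(0.0, sigma, size=y.shape)
    return np.abs(F(y_noisy) - x).mean()

# Toy stand-ins for the generators: G loses detail (many-to-one), F is identity.
G = lambda x: np.round(x * 4) / 4
F = lambda x: x

x = np.linspace(0.0, 1.0, 100)
plain = cycle_loss(F, G, x)
noisy = noisy_cycle_loss(F, G, x, rng=0)
```

In the full model the same perturbation would be applied to the generator output inside the training loop, so the gradient of the reconstruction loss no longer rewards a low-amplitude embedded signal.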
This is why we regularize both the discriminator and the generator using random noise. Since adding noise to the input image is equivalent to penalizing large gradient magnitudes of the loss function, this also forces the model to learn smoother boundaries and prevents it from overfitting.\n\n4.2 Guess Discriminator\n\nIdeally, the self-adversarial attack should be detected by the discriminator, but this might be too hard for it since it never sees real and fake examples of the same content. In the supervised setting, this problem is naturally solved by conditioning the outputs on the ground truth labels. For example, a self-adversarial attack does not occur in Conditional GANs because the discriminator is conditioned on the ground truth class labels and is provided with real and fake examples of each class. In the unsupervised setting, however, there is no such information about the class labels, and the discriminator only receives unpaired real and fake examples from the domain. This task is significantly harder for the discriminator as it has to learn the distribution of the whole domain. One widely used defense strategy is adding adversarial examples to the training set. While it is possible to model the adversarial attack of the generator, doing so is very time- and memory-consuming as it requires training an additional network that generates such examples at each step of training the GAN. 
However, we can use the fact that the cycle-consistency loss forces the model to minimize the difference between the input and reconstructed images, so the reconstruction can serve as the fake example for the real input image, i.e. as an approximation of the adversarial example. Thus, the defense during training can be formulated in terms of an additional guess discriminator that is very similar to the original GAN discriminator, but receives as input two images, the input and its reconstruction, in a random order, and "guesses" which of the images is fake. As with the original discriminator, the guess discriminator Dguess is trained to minimize its error while the generator aims to produce images that maximize it. The guess discriminator loss, or guess loss, can be described as:\n\nLguess = GA_guess(X, F(G(X))) with probability 0.5, and Lguess = 1 - GA_guess(F(G(X)), X) with probability 0.5,   (3)\n\nwhere X ∼ PA and GA_guess(X, X̂) ∈ [0, 1]. This loss resembles the class label conditioning in the Conditional GAN in the sense that the guess discriminator receives real and fake examples that are presumably of the same content, therefore the embedding detection task is significantly simplified.\n\nFigure 2: Comparison of translation results produced by the original MUNIT method and MUNIT with an additional cycle-consistency loss. Columns 2 and 3 show the translation results with two different randomly generated style vectors. It can be observed that, while both methods incorrectly disentangled style and content information, the cycle-consistency loss forces the model to preserve the overall scene layout and produce more reliable translations in general. Column 5 shows the results of reconstruction of the input image from the maps with the first random style (column 2). More examples on Google Maps translation can be found in the supplementary material. 
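A minimal sketch of how the guess loss of Eq. 3 could be evaluated for a single example follows. The toy d_guess here (comparing high-frequency energy of the two images) is purely hypothetical and stands in for the trained guess discriminator network:

```python
import numpy as np

def guess_loss(d_guess, x, x_rec, rng=None):
    # Guess loss (Eq. 3): the (input, reconstruction) pair is presented to the
    # guess discriminator in a random order; d_guess(a, b) in [0, 1] is its
    # belief that the second image of the pair is the fake (reconstructed) one.
    rng = np.random.default_rng(rng)
    if rng.random() < 0.5:
        return d_guess(x, x_rec)
    return 1.0 - d_guess(x_rec, x)

# Hypothetical toy guess discriminator: compares high-frequency energy of the
# two images (a trained network plays this role in the actual model).
def d_guess(a, b):
    hf = lambda img: float(np.abs(np.diff(img)).mean())
    return 1.0 / (1.0 + np.exp(hf(a) - hf(b)))

x = np.linspace(0.0, 1.0, 64)
x_rec = x + 0.01 * np.sin(40 * np.pi * x)  # reconstruction with a hidden signal
loss = guess_loss(d_guess, x, x_rec, rng=0)
```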
Best viewed in color.\n\nIn addition to the defense approaches described above, it is beneficial to use the fact that the relationship between the domains is one-to-many. One naive way to add such prior knowledge is to assign a smaller weight to the reconstruction loss of the "richer" domain (e.g. photos in the maps-to-photos experiment). Results of our experiments show substantial improvement in the generation quality when such a domain relation prior is used.\n\n5 Experiments and results\n\nGiven the abundance of GAN-based methods for unsupervised image translation, we limited our analysis to three popular state-of-the-art models that cover both the unimodal and multimodal translation cases: CycleGAN [34], UNIT [20] and MUNIT [11]. The details on model architectures and the choice of hyperparameters used in our experiments can be found in the supplementary materials.\n\n5.1 Datasets\n\nTo provide empirical evidence of our claims, we performed a sequence of experiments on three publicly available image-to-image translation datasets. Despite the fact that all three datasets are paired and hence the ground truth correspondence is known, the models that we used are not capable of using the ground-truth alignment by design and thus were trained in an unsupervised manner.\nThe Google Aerial Photo to Maps dataset consists of 3292 pairs of aerial photos and corresponding maps. In our experiments, we resized the images from 600 × 600 pixels to 400 × 400 pixels for MUNIT and UNIT and to 289 × 289 pixels for CycleGAN. During training, the images were randomly cropped to 360 × 360 for UNIT and MUNIT and 256 × 256 for CycleGAN. The dataset is available at [6]. We used 1098 images for training and 1096 images for testing.\nThe Playing for Data (GTA) [26] dataset consists of 24966 pairs of image frames and their semantic segmentation maps. 
We used a subset of 10000 frames (7500 images for training, 2500 images for testing) with day-time lighting, resized to 192 × 192 pixels and randomly cropped with window size 128 × 128.\nThe SynAction [28] synthetic human action dataset consists of a set of 20 possible actions performed by 10 different human renders. For our experiments, we used two actors and all existing actions to perform the translation from one actor to another; all other conditions such as background, lighting, viewpoint etc. are chosen to be the same for both domains. We used this dataset to test whether the self-adversarial attack is present in the one-to-one setting. The original images were resized to 512 × 512 and cropped to 452 × 452. We split the data into 1561 images per domain for training and 357 images for testing.\n\nFigure 3: SynAction actor translation example with CycleGAN, CycleGAN with noise and CycleGAN with guess loss.\n\n5.2 Metrics\n\nTranslation quality. The choice of aligned datasets was dictated by the need to quantitatively evaluate the translation quality, which is impossible when the ground truth correspondence is unknown. However, even having the ground truth pairs does not solve the issue of quality evaluation in the one-to-many case, since for one input image there exists a large (possibly infinite) number of correct translations, so pixelwise comparison of the ground truth image and the output of the model does not provide a correct metric for the translation quality.\nIn order to overcome this issue, we adopted the idea behind the Inception Score [27] and trained the supervised Pix2pix [12] model to perform the many-to-one mapping as an intermediate step in the evaluation. 
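The mean class-wise IoU and accuracy used in this evaluation can be computed as in the following generic sketch over integer label maps (our own illustration, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    # Mean class-wise Intersection-over-Union between two integer label maps;
    # classes absent from both maps are skipped.
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

def mean_class_accuracy(pred, gt, num_classes):
    # Mean of per-class pixel accuracy over classes present in the ground truth.
    accs = []
    for c in range(num_classes):
        mask = (gt == c)
        if mask.sum() == 0:
            continue
        accs.append((pred[mask] == c).mean())
    return float(np.mean(accs))

gt   = np.array([[0, 0, 1], [1, 2, 2]])   # toy ground truth label map
pred = np.array([[0, 1, 1], [1, 2, 2]])   # toy predicted label map
miou = mean_iou(pred, gt, 3)
macc = mean_class_accuracy(pred, gt, 3)
```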
Considering the GTA dataset example, in order to evaluate the unsupervised mapping from segmentation maps to real frames (segmentation-to-real from here on), we train the Pix2pix model to translate from real to segmentation; then we feed it the output of the unsupervised model to perform an "honest" reconstruction of the input segmentation map, and compute the Intersection over Union (IoU) and mean class-wise accuracy of the output of Pix2pix when given a ground truth example and the output of the one-to-many translation model. For any ground truth pair (Ai, Bi), the one-to-many translation quality is computed as IoU(pix(GA(Bi)), pix(Ai)), where pix(·) is the translation with Pix2pix from A to B. The "honest reconstruction" is compared with the Pix2pix translation of the ground truth image Ai instead of the ground truth image itself in order to take into account the error produced by the Pix2pix translation.\nReconstruction honesty. Since it is impossible to acquire the structured noise produced as a result of a self-adversarial attack, there is no direct way to either detect the attack or measure the amount of information hidden in the embedding. In order to evaluate the presence of a self-adversarial attack, we developed a metric that we call quantized reconstruction honesty. The intuition behind this metric is that, ideally, the reconstruction error of an image from the richer domain should be the same as the one-to-many translation error given the same input image from the poorer domain. In order to measure whether the model is independent of the origin of the input image, we quantize the many-to-one translation results in such a way that they contain only the colors from the domain-specific palette. In our experiments, we approximate the quantized maps by replacing the color of each pixel with the closest one from the palette. 
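The palette quantization step can be sketched as follows; the three-color palette below is hypothetical and stands in for the actual segmentation color palette of the dataset:

```python
import numpy as np

def quantize_to_palette(img, palette):
    # Replace every pixel with the nearest palette color (L2 distance in RGB),
    # as used for the quantized reconstruction honesty metric.
    # img: (H, W, 3) float array, palette: (K, 3) float array.
    d = np.linalg.norm(img[:, :, None, :] - palette[None, None, :, :], axis=-1)
    return palette[d.argmin(axis=-1)]

# Hypothetical 3-color segmentation palette (e.g. road, vegetation, building).
palette = np.array([[0.5, 0.5, 0.5], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
noisy_map = np.array([[[0.45, 0.52, 0.48], [0.10, 0.90, 0.05]]])  # 1x2 image
q = quantize_to_palette(noisy_map, palette)
```

Because this snapping destroys any low-amplitude structured signal, feeding the quantized map back through the model yields the "honest" reconstruction error used below.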
We then feed those quantized images to the model to acquire the "honest" reconstruction error, and compare it with the reconstruction error without quantization. The honesty metric for a one-to-many reconstruction can be described as follows:\n\nRH = (1/N) Σ_{i=1}^{N} ( ||GA(⌊GB(Xi)⌋) - Yi||_2 - ||GA(GB(Xi)) - Yi||_2 ),   (4)\n\nwhere ⌊·⌋ is a quantization operation, GB is a many-to-one mapping, and (Xi, Yi) is a ground truth pair of examples from domains A and B.\n\nFigure 4: Illustration of the sensitivity (Eq. 5) of cycle-consistent translation methods to high-frequency perturbations in the one-to-many (left) and many-to-one (right) cases. Here the domains A and B are segmentation maps and GTA video frames respectively. If the method is robust to random perturbations, the reconstruction error should grow linearly with the amplitude of the added noise. For the cycle-consistent methods, we observe exponential growth of the reconstruction error in the one-to-many mapping that saturates at σ = 0.09, which means that these methods are highly sensitive to noise.\n\nFigure 5: Quantized reconstruction results of the original CycleGAN, CycleGAN with noise defense and CycleGAN with guess loss defense. After translating the input GTA frame to the semantic segmentation map, we performed quantization such that the resulting translation would only contain the colors present in the real segmentation maps. We then fed the quantized translation results to reconstruct the input image (column 5). The last column represents the translation from the corresponding ground truth semantic segmentation map to the real frame for comparison. As with random noise, quantization removes the structured self-adversarial noise needed to accurately reconstruct the input, therefore the quantized reconstruction with CycleGAN differs drastically from the non-quantized reconstruction. 
CycleGAN with the guess loss and noisy CycleGAN, on the other hand, rely more on the input segmentation map than the original CycleGAN, therefore the quantized reconstruction is similar to the original reconstruction. More quantized translation examples can be found in the supplementary material. Best viewed in color.\n\nMethod | acc. segm ↑ | IoU segm ↑ | IoU p2p ↑ | RH ↓ | SN ↓\nCycleGAN | 0.23 | 0.16 | 0.20 | 27.43 ± 6.1 | 446.9\nCycleGAN + noise* | 0.24 | 0.17 | 0.23 | 9.17 ± 7.4 | 94.2\nCycleGAN + guess* | 0.24 | 0.17 | 0.21 | 11.4 ± 7.0 | 212.6\nCycleGAN + guess + noise* | 0.236 | 0.17 | 0.24 | 6.1 ± 5.9 | 150.6\nUNIT | 0.08 | 0.04 | 0.06 | 6.4 ± 11.7 | 361.5\nMUNIT + cycle | 0.13 | 0.08 | 0.17 | 2.5 ± 8.9 | 244.9\npix2pix (supervised) | 0.4 | 0.34 | - | - | -\n\nTable 2: Results on the GTA V dataset. acc. segm and IoU segm represent mean class-wise segmentation accuracy and IoU, IoU p2p is the mean IoU of the pix2pix segmentation of the segmentation-to-frame mapping; RH (Eq. 4) and SN (Eq. 5) are the quantized reconstruction honesty and sensitivity to noise of the many-to-one mapping (B2A2B), respectively. * marks our proposed defense methods. Plots of the reconstruction error distributions can be found in the supplementary material (Section 2).\n\nMethod | acc. segm ↑ | IoU segm ↑ | IoU p2p ↑ | RH ↓ | SN ↓\nCycleGAN | 0.23 | 0.18 | 0.21 | 21.8 ± 5.2 | 251.2\nCycleGAN + noise* | 0.24 | 0.19 | 0.22 | 12.27 ± 4.42 | 222.2\nCycleGAN + guess* | 0.24 | 0.184 | 0.224 | 7.5 ± 2.4 | 235.4\nCycleGAN + guess + noise* | 0.25 | 0.19 | 0.22 | -0.45 ± 2.3 | 238.3\nUNIT | 0.21 | 0.15 | 0.12 | 19.6 ± 6.1 | 528.2\nMUNIT + cycle | 0.15 | 0.09 | 0.12 | 21.4 ± 7.9 | 687.3\npix2pix (supervised) | 0.3 | 0.23 | - | - | -\n\nTable 3: Results on the Google Maps dataset. The notation is the same as in Table 2.\n\nSensitivity to noise. Aside from the obvious consequences of the self-adversarial attack, such as convergence of the generator to a suboptimal solution, there is one more significant side effect of it: extreme sensitivity to perturbations. Figure 1 shows how the addition of low-amplitude Gaussian noise effectively destroys the hidden embedding, thus making a model that uses the cycle-consistency loss unable to correctly reconstruct the input image. In order to estimate the sensitivity of the model, we add zero-mean Gaussian noise to the translation result before reconstruction and compute the reconstruction error. The sensitivity to noise of amplitude σ for a set of images Xi ∼ pA is computed by the following formula:\n\nSN(σ) = (1/N) Σ_{i=1}^{N} ||GA(GB(Xi) + N(0, σ)) - GA(GB(Xi))||_2   (5)\n\nThe overall sensitivity of a method is then computed as the area under the curve SN(σ), AuC(SN) ≈ ∫_a^b SN(x) dx. In our experiments we chose a = 0, b = 0.2, and N = 500 for the Google Maps and GTA experiments and N = 100 for the SynAction experiment. In case there is no structured noise in the translation, the reconstruction error should be proportional to the amplitude of the added noise, which is what we observe for the one-to-many mapping using MUNIT and CycleGAN. 
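The sensitivity metric of Eq. 5 and its area-under-curve summary can be sketched as follows, with identity toy mappings standing in for the trained generators GA and GB:

```python
import numpy as np

def sensitivity(G_A, G_B, xs, sigma, rng=None):
    # SN(sigma) from Eq. 5: mean L2 distance between the reconstruction of the
    # clean translation and of the translation with zero-mean Gaussian noise.
    rng = np.random.default_rng(rng)
    errs = []
    for x in xs:
        y = G_B(x)
        y_noisy = y + rng.normal(0.0, sigma, size=y.shape)
        errs.append(np.linalg.norm(G_A(y_noisy) - G_A(y)))
    return float(np.mean(errs))

def sensitivity_auc(G_A, G_B, xs, a=0.0, b=0.2, steps=11, rng=0):
    # Area under the SN(sigma) curve on [a, b] via the trapezoidal rule.
    sigmas = np.linspace(a, b, steps)
    sn = np.array([sensitivity(G_A, G_B, xs, s, rng=rng) for s in sigmas])
    return float(np.sum((sn[1:] + sn[:-1]) / 2.0 * np.diff(sigmas)))

# Toy mappings: for a robust pipeline, SN grows roughly linearly with sigma.
G_B = lambda x: x   # stand-in "translation"
G_A = lambda y: y   # stand-in "reconstruction"
xs = [np.zeros(64) for _ in range(10)]
auc = sensitivity_auc(G_A, G_B, xs)
```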
Surprisingly, UNIT translation is highly sensitive to noise even in the one-to-many case.\n\nMethod | MSE ↓ | SN ↓\nCycleGAN | 32.55 | 6.5\nCycleGAN + noise* | 22.18 | 1.1\nCycleGAN + guess* | 23.57 | 2.4\nCycleGAN + guess + noise* | 23.13 | 1.35\n\nTable 1: Results on the SynAction dataset: mean square error of the translation and sensitivity to noise.\n\nThe many-to-one mapping result (Figure 4), in contrast, suggests that the structured noise is present, since the reconstruction error increases rapidly and quickly saturates at noise amplitude 0.08. The results of one-to-many and many-to-one noisy reconstruction show that both the noisy CycleGAN and the guess loss defense approaches make the CycleGAN model more robust to high-frequency perturbations compared to the original CycleGAN.\n\n5.3 Results\n\nThe results of our experiments show that the problem of self-adversarial attacks is present in all three cycle-consistent methods we examined. Surprisingly, the results on the SynAction dataset showed that the self-adversarial attack appears even if the learned mapping is one-to-one (Table 1). Both defense techniques proposed in Section 4 make CycleGAN more robust to random noise and increase its translation quality (see Tables 1, 2 and 3).\n\nFigure 6: Results of the GTA frames-to-segmentation translation with the original CycleGAN and our defense techniques. The frame reconstruction (b2a2b) with noisy CycleGAN is remarkably similar to the opposite translation (a2b). For example, the road marking in the reconstructed image is located at the same place as in the translation (a2b) rather than as in the input (b).\n\nThe noise-regularization defense helps the CycleGAN model to become more robust both to small perturbations and to the self-adversarial attack. 
The guess loss approach, on the other hand, while allowing the model to hide some small portion of information about the input image (for example, the road marking in the GTA experiment), produces more interpretable and reliable reconstructions. Furthermore, the combination of both proposed defense techniques beats each of them individually in terms of translation quality and reconstruction honesty (Figure 6).\nSince both defense techniques force the generators to rely more on the input image than on the structured noise, their results are more interpretable and provide a deeper understanding of the methods' "reasoning". For example, since the training set did not contain any examples of a truck colored white and green, at test time the guess-loss CycleGAN approximated the green part of the truck with the "vegetation" class color and the white part with the "building" class color (see Section 3 of the supplementary material); the reconstructed frame looked like a rough approximation of the truck despite the fact that the semantic segmentation map was wrong. This can give a hint about the limitations of the given training set.\n\n6 Conclusion\n\nIn this paper, we introduced the self-adversarial attack phenomenon of unsupervised image-to-image translation methods: the hidden embedding performed by the model itself in order to reconstruct the input image with high precision. We empirically showed that the self-adversarial attack appears in models when the cycle-consistency property is enforced and the target mapping is many-to-one. We provided evaluation metrics that help to indicate the presence of a self-adversarial attack, and a translation quality metric for one-to-many mappings. 
We also developed two adversarial defense techniques that significantly reduce the hidden embedding and force the model to produce more "honest" results, which, in turn, increases its translation quality.

7 Acknowledgements

This project was supported in part by NSF and DARPA.