NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 7048
Title: Cross-Domain Transferability of Adversarial Perturbations

Reviewer 1

- Clarity: The paper is very well written. The problem is clearly formulated, with appropriate distinctions from other methods in the abstract and introduction. Conceptual differences from prior work are well stated. I enjoyed reading it.
- Originality and significance: Despite the impressive empirical results, the main novelty of this work is arguably the application of the relativistic loss from [20] to adversarial perturbation. I do not underestimate the effort of applying a new technique to a different problem, but the empirical results should be accompanied by follow-up analyses that validate how and why the technique is applicable. For example, I am curious about what kinds of cross-domain features are learned. Figure 3 gives a small hint (the background texture becomes more painting-like), but it is not enough. Depending on the combination of source and target domains, when are the cross-domain features transferable and when are they not? The predictions on the adversarial examples were all "Jigsaw Puzzle"; does that indicate that certain features in the Paintings domain are biased toward that class? More broadly, how does style transfer relate to this cross-domain perturbation? Please provide your insights on these bigger, scientific questions.

Reviewer 2

Originality: The authors propose to apply the relativistic discriminator loss to the generation of adversarial examples. The novelty is limited.

Significance: The proposed method shows superior performance to several existing methods in the field.

Clarity: The presentation is clear overall, but there are several typos in notation that may cause confusion. For example, D(x, x′) in line 107 is misleading, as D takes only a single input.

Quality: The manuscript lacks sufficient theoretical argument for why the relativistic discriminator loss is better than the traditional cross-entropy loss for generating adversarial examples. On the other hand, sufficient experimental results are provided to validate this claim.

Pros: The paper provides sufficient experimental results validating the claim that the proposed relativistic discriminator loss helps train a transferable generator of adversarial examples. It also provides a very concrete introduction and related-work discussion in terms of different attack settings (or threat models).

Cons: The main problem with the paper is the lack of novelty. The proposed relativistic discriminator appears to be only a small modification of the existing RGAN. While the authors successfully apply this model to the generation of transferable adversarial examples, they fail to explain the fundamental reason why this loss is beneficial.
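[Editor's note] To make the distinction this review raises concrete, here is a minimal PyTorch-style sketch contrasting a standard discriminator cross-entropy with an RGAN-style relativistic loss trained on the difference of clean and adversarial scores. This is a generic illustration, not the authors' exact formulation; critic, x, and x_adv are hypothetical names for a network producing a single real/fake logit and a clean/adversarial image pair.

import torch
import torch.nn.functional as F

def standard_d_loss(critic, x, x_adv):
    # Standard GAN discriminator: score each input alone, pushing clean
    # inputs toward label 1 and adversarial inputs toward label 0.
    real_logits = critic(x)
    fake_logits = critic(x_adv)
    loss_real = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return loss_real + loss_fake

def relativistic_d_loss(critic, x, x_adv):
    # Relativistic variant: the discriminator only sees the score
    # *difference*, i.e. "is x more realistic than x_adv?", so both
    # inputs enter the loss jointly (which is presumably what the
    # D(x, x') notation in the paper abbreviates).
    diff = critic(x) - critic(x_adv)
    return F.binary_cross_entropy_with_logits(diff, torch.ones_like(diff))

In the RGAN formulation, the generator's loss would symmetrically use critic(x_adv) - critic(x), encouraging perturbed images whose scores overtake those of their clean counterparts.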

Reviewer 3

This submission proposes to use a GAN to generate adversarial examples. The key difference from prior work is that the discriminator uses a relativistic loss (and perturbations are projected to satisfy l_infinity norm <= 10, rather than being unbounded). This relativistic loss enables the generator to create successful cross-domain adversarial examples; in other words, it can generate adversarial examples for an image distribution (e.g., ImageNet) that it was not trained on. The use of a relativistic loss in training GANs is not novel, as the authors acknowledge, but this is the first time it has been applied to generating adversarial examples, and the result is quite impressive. It leads to state-of-the-art adversarial attacks on both naturally-trained and adversarially-trained ImageNet models, in *both* white-box and black-box settings.

Originality and Significance: This submission is low in originality but high in significance. Although the contribution has relatively low technical novelty, the fact that it leads to state-of-the-art adversarial attacks is highly significant. The community should be aware of this, so that it can take these attacks into consideration when designing defenses.

In terms of related work, the submission did a good job of acknowledging existing work on adversarial attacks that are image-agnostic or produced by generative models. One recent work (in the latter category) that should be included is [1]. When comparing with prior methods (those in Table 1), why not also compare to other methods that use GANs, for instance that work or [19]? I would also appreciate a more detailed comparison of how the proposed GAN framework differs from those two works.

Clarity: I found the submission to be clearly written and well-organized.

Quality: I'm curious about the use of "instance-agnostic" to describe adversarial examples produced by GANs. Is this common terminology? There is still a separate adversarial perturbation computed per image, rather than a single perturbation that is added to all images (as in UAP). It's true that only a single forward pass is necessary, instead of a backward pass as well, or multiple forward and backward passes, but it doesn't seem truly instance-agnostic to me.

[1] Song et al. Constructing Unrestricted Adversarial Examples with Generative Models. NeurIPS 2018. https://arxiv.org/abs/1805.07894
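[Editor's note] The l_infinity projection this review mentions amounts to elementwise clamping of the perturbed image into an eps-ball around the clean image. Below is a minimal sketch assuming pixel values in [0, 255] and eps = 10; project_linf is a hypothetical helper for illustration, not a function from the paper, and the paper's exact preprocessing may differ.

import torch

def project_linf(x_adv, x_clean, eps=10.0, lo=0.0, hi=255.0):
    # Keep each pixel of x_adv within eps of the corresponding clean pixel,
    # i.e. enforce |x_adv - x_clean|_inf <= eps, then clip back to the
    # valid pixel range.
    x_adv = torch.min(torch.max(x_adv, x_clean - eps), x_clean + eps)
    return torch.clamp(x_adv, lo, hi)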