Paper ID: | 3146 |
---|---|
Title: | Adversarial Training and Robustness for Multiple Perturbations |

The theoretical contribution of this paper (Section 2) is solid and neatly extends the work of Tsipras et al. (ICLR, 2019) to simultaneous robustness to multiple perturbations. The empirical comparison of different attacks, and of the simultaneous robustness of models to them, is also interesting and raises important questions (e.g., about multi-perturbation robust models for MNIST and CIFAR-10). The presentation is clear, and I personally liked the problem and the results.

Originality: the paper is mostly original in considering the problem of robustness to multiple perturbation types. The trade-off between adversarial robustness to different norms and the definition of "affine attacks" have also been investigated for linear classifiers in:

- A. Demontis, P. Russu, B. Biggio, G. Fumera, and F. Roli. On security and sparsity of linear classifiers for adversarial settings. In A. Robles-Kelly, M. Loog, B. Biggio, F. Escolano, and R. Wilson, editors, Joint IAPR Int’l Workshop on Structural, Syntactic, and Statistical Pattern Recognition, volume 10029 of LNCS, pages 322–332, Cham, 2016. Springer International Publishing.

That paper shows that while one can design an optimal classifier against one lp-norm attack, the same classifier will be vulnerable to the corresponding dual-norm attack (e.g., a classifier designed to be robust against l-inf attacks will be vulnerable to l1 attacks). This result builds on:

- H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support vector machines. Journal of Machine Learning Research, 10:1485–1510, July 2009.

where the authors established an equivalence between 'adversarial training' (i.e., solving a non-regularized, robust optimization problem) and a regularized linear SVM. In other words, a proper regularizer can be the optimal response to a specific lp-norm attack. In the submitted paper, this phenomenon is stated in terms of mutually exclusive perturbations (MEPs) and shown for a toy Gaussian dataset. I think it would be very interesting to explore connections to the notion of dual norms as explored in the aforementioned papers. This at least deserves to be discussed in the paper.

Quality: the submission is technically sound, even though I did not check all the derivations in detail.

Clarity: clarity could be improved. The paper makes several heterogeneous contributions, and they are not presented in a fully structured manner. It would have been useful to separate the main contributions and their corresponding experimental analyses more clearly. I understand, however, that this may be difficult, as space is quite limited and the contributions of this paper are heterogeneous in nature.

Significance: the results are interesting for the community working on adversarial examples. Even though the notion of a trade-off between different attacks has been around for a while, it has not been analyzed as clearly as in this paper.

COMMENTS AFTER READING THE AUTHORS' REBUTTAL
--------------

I read the authors' response and thank them for welcoming my suggestions. Another suggestion that they may find useful is to read a recent survey which clarifies that adversarial machine learning started much earlier than 2014, and that gradient-based adversarial examples were essentially re-discovered in 2014:

- B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.

In fact, similar gradient-based attacks had already been developed by Biggio et al., who were actually the first to formulate adversarial attacks as optimization problems and solve them via gradient-based optimization. In particular, see:

- B. Biggio et al., Evasion attacks against ML at test time, ECML PKDD 2013
- B. Biggio et al., Poisoning attacks against SVMs, ICML 2012

This is also mentioned by Ian Goodfellow in some of his talks (see, e.g., his talk at Stanford: https://www.youtube.com/watch?v=CIfsB_EYsVI) and acknowledged in the book:

- A. D. Joseph, B. Nelson, B. I. P. Rubinstein, and J. Tygar. Adversarial Machine Learning. Cambridge University Press, 2018.
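To make the dual-norm remark under Originality concrete, here is a minimal numeric sketch for a linear score w·x + b. It only illustrates the standard identity that the worst-case change in the score under an lp-bounded perturbation of size eps is eps times the dual norm of w, so an l-inf attack is governed by the l1 norm of the weights and an l1 attack by their l-inf norm. The helper names `worst_case_margin_drop` and `worst_case_perturbation` and the example weights are my own illustrative assumptions; nothing here is taken from the submission or from the cited papers.

```python
import numpy as np


def worst_case_margin_drop(w, eps, p):
    """Largest decrease of w.x + b over perturbations with ||delta||_p <= eps.

    By the dual-norm identity this equals eps * ||w||_q, with 1/p + 1/q = 1.
    Hypothetical helper, written only for this illustration (requires p >= 1).
    """
    w = np.asarray(w, dtype=float)
    if p == np.inf:
        return eps * np.linalg.norm(w, 1)       # dual of l-inf is l1
    if p == 1:
        return eps * np.linalg.norm(w, np.inf)  # dual of l1 is l-inf
    q = p / (p - 1.0)
    return eps * np.linalg.norm(w, q)


def worst_case_perturbation(w, eps, p):
    """A perturbation attaining the worst case, for p in {1, inf} only."""
    w = np.asarray(w, dtype=float)
    if p == np.inf:
        # Dense attack: move every coordinate by eps against the sign of w.
        return -eps * np.sign(w)
    if p == 1:
        # Sparse attack: spend the whole budget on the largest |w_i|.
        delta = np.zeros_like(w)
        i = int(np.argmax(np.abs(w)))
        delta[i] = -eps * np.sign(w[i])
        return delta
    raise ValueError("only p = 1 and p = inf are handled in this sketch")


# Arbitrary example weights (an assumption, not data from any paper).
w = np.array([0.5, -2.0, 1.0])
eps = 0.1

drop_linf = worst_case_margin_drop(w, eps, np.inf)  # eps * ||w||_1
drop_l1 = worst_case_margin_drop(w, eps, 1)         # eps * ||w||_inf

# The explicit perturbations achieve exactly these drops.
achieved_linf = -float(w @ worst_case_perturbation(w, eps, np.inf))
achieved_l1 = -float(w @ worst_case_perturbation(w, eps, 1))
```

Under this identity, penalizing the l1 norm of w limits the damage of l-inf attacks while leaving the l-inf norm of w, and hence the effect of l1 attacks, unconstrained. This is the sense in which a regularizer tuned to one attack need not help against its dual.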

The paper follows the work of Tsipras et al. and uses simple data distributions to theoretically study the robustness trade-offs. The hypothesis is that adversarially-trained models tend to focus on different features to achieve robustness to a particular attack. The paper shows, through analysis of the activations of the network, that robustness to l_\infty leads to gradient masking for other types of adversaries on MNIST. Looking at Table 5, this does not seem to be the case for CIFAR. Furthermore, the trade-off seems to be less significant on this dataset (e.g., comparing OPT(R^{avg}) and R^{avg}). This suggests that some of the results may be artifacts of MNIST and, more generally, that the hypothesis of Tsipras et al. for explaining the phenomenon of adversarial examples may not be sufficient.