Paper ID: 7741

Title: Adversarial Robustness through Local Linearization

Originality: Starting from the gamma (local linearity) measure, the paper demonstrates the importance of local linearity of the loss surface. Inspired by this, the authors propose the LLR regularizer. The story in this paper is complete and convincing.

Quality: The submission is technically sound. The claims are well supported by the experiments, although the theoretical analysis in Proposition 4.1 seems trivial to me, following from a simple Taylor expansion.

Clarity: The paper is well written and the presentation is clear to me.

Significance: Finding better methods to train neural network models with improved robustness is an important research question. The paper advances this line of work by proposing a new regularization technique, which improves both the performance (or achieves comparable performance on CIFAR-10) and the training speed over prior work.

I also have some questions on the presentation of the paper:
1. It is not very convincing to me why using the difference between logits as the loss function yields lower adversarial accuracy than the cross-entropy, given that the latter has been widely used in various papers.
2. The paper does not show how to choose the regularization parameter.
3. In Table 2, it seems that TRADES achieves the highest natural accuracy (suggesting its regularization parameter puts more weight on accuracy). I wonder how the authors tuned the regularization parameter for TRADES. By putting more weight on robustness, can TRADES outperform the proposed method?

==================

I have read the authors' rebuttal. The authors promise to clarify and include full sweep results for the various baseline methods in a later version. I look forward to it, as I find the results reported in Table 2 a little strange. In particular, the natural accuracy of a well-trained TRADES model in many papers is ~84-85%, while the result reported in this paper is ~87-88%.
So I suspect the authors did not tune the regularization parameter of TRADES for its highest robustness. (The authors could compare their method with the checkpoint provided in the official TRADES GitHub repository, as in footnote 8, but footnote 8 does not report LLR results for the Wide-ResNet-34-10 architecture.) Thus, I remain skeptical of the fairness of the comparisons in Table 2. Apart from this, it is a good paper, so I am willing to vote for acceptance.
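As an aside, the Taylor-expansion argument behind Proposition 4.1 mentioned above can be written out in two lines (a sketch in the paper's notation, with g(\delta; x) denoting the deviation of the loss from its tangent hyperplane):

```latex
% g(\delta; x) := \ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle
% By the triangle inequality,
\[
  |\ell(x+\delta) - \ell(x)|
  = \bigl|\langle \delta, \nabla_x \ell(x) \rangle + g(\delta; x)\bigr|
  \le \bigl|\langle \delta, \nabla_x \ell(x) \rangle\bigr| + \gamma(\epsilon; x),
\]
% where \gamma(\epsilon; x) = \max_{\delta \in B(\epsilon)} |g(\delta; x)|
% is the local linearity measure of Eq. (6).
```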

This paper provides a new regularizer for robust training. The empirical results show the efficiency of the proposed method. However, there are some points the authors should further clarify:
1. Previous work shows that gradient obfuscation is the mechanism behind many failed defenses, but no work verifies that preventing gradient obfuscation leads to better robustness.
2. In Eq. (7), the authors give an upper bound on the loss gap and minimize this upper bound in the training objective. I wonder why minimizing the upper bound would be better than directly minimizing the loss gap, as standard PGD training does.
3. The authors should report results on a more diverse set of attacks, such as DeepFool, which is better adapted to a locally linear loss surface.

The authors propose to minimize a local linearity measure of the loss function (defined in Eq. (6) as the deviation of the loss from its tangent hyperplane) along with the empirical loss in adversarial training. By doing so, one can avoid the so-called "gradient obfuscation" problem associated with using few iterations of gradient-based optimization for the inner maximization in adversarial training. This leads to a significant speedup of adversarial training, while achieving comparable or better robustness than PGA-based adversarial training. The main theoretical result is presented in Prop. 4.1, where the adversarial change in the loss is shown to be upper bounded by the sum of the defined local linearity measure and the absolute inner product of the perturbation with the loss gradient w.r.t. the input. The authors then suggest using these two terms as regularizers in adversarial training of the model (Eq. (8)). For the first term, since we seek to minimize <\delta, \nabla_x l(x)> for *all* perturbations \delta in the local neighborhood B_\epsilon, we should naturally aim to minimize ||\nabla_x l(x)||_2. However, the authors propose to minimize <\delta_LLR, \nabla_x l(x)> instead, where \delta_LLR is the perturbation that yields the highest deviation from the tangent hyperplane, so the logic behind this term is not clear to me. The second regularizer is the measure of deviation from linearity, which is computed in the same way as the PGA iterative approximation to the inner maximization of adversarial training, but with far fewer iterations. The empirical results on CIFAR-10 and ImageNet support the claims under a rich variety of attacks.
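The local linearity measure of Eq. (6) and the bound in Prop. 4.1 can be illustrated with a minimal, dependency-light sketch. Everything here is a stand-in: a toy logistic-style loss replaces the network loss, and the inner maximization is approximated by random search rather than the PGA-style steps used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the network loss: l(x) = log(1 + exp(w.x)).
w = rng.normal(size=5)

def loss(x):
    return np.log1p(np.exp(w @ x))

def grad_loss(x):
    # d/dx log(1 + exp(w.x)) = sigmoid(w.x) * w
    return w / (1.0 + np.exp(-(w @ x)))

def deviation(x, delta):
    """g(delta; x) = l(x + delta) - l(x) - <delta, grad_x l(x)>,
    the gap between the loss and its tangent hyperplane (Eq. (6))."""
    return loss(x + delta) - loss(x) - delta @ grad_loss(x)

def gamma_estimate(x, eps, n_samples=2000):
    """Crude Monte-Carlo lower bound on
    gamma(eps; x) = max_{||delta||_inf <= eps} |g(delta; x)|
    (random search instead of the paper's PGA-style maximization)."""
    deltas = rng.uniform(-eps, eps, size=(n_samples, x.size))
    return max(abs(deviation(x, d)) for d in deltas)

x = rng.normal(size=5)
eps = 0.1
g_est = gamma_estimate(x, eps)

# Prop. 4.1 for a single perturbation: the adversarial change in the loss
# is bounded by |<delta, grad l(x)>| plus the tangent-plane deviation.
delta = rng.uniform(-eps, eps, size=5)
lhs = abs(loss(x + delta) - loss(x))
rhs = abs(delta @ grad_loss(x)) + abs(deviation(x, delta))
print(f"gamma estimate: {g_est:.3e}, loss change {lhs:.3e} <= bound {rhs:.3e}")
```

Shrinking the ball B_\epsilon shrinks the measure (the deviation is a second-order Taylor remainder, so it scales roughly with eps^2), which is why a highly linear loss surface keeps the bound tight.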