Robustness Guarantees for Adversarially Trained Neural Networks

Mianjy, Poorya; Arora, Raman

doi:10.52202/075280-1725

Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

Abstract

We study robust adversarial training of two-layer neural networks as a bi-level optimization problem. In particular, for the inner loop that implements the adversarial attack during training using projected gradient descent (PGD), we propose maximizing a \emph{lower bound} on the $0/1$-loss by reflecting a surrogate loss about the origin. This allows us to give a convergence guarantee for the inner-loop PGD attack. Furthermore, assuming the data is linearly separable, we provide precise iteration complexity results for end-to-end adversarial training, which hold for any width and initialization. We provide empirical evidence to support our theoretical results.
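As a rough illustration of the inner-loop idea, the sketch below runs a PGD attack that ascends a reflected surrogate loss on a small two-layer ReLU network. Several details are assumptions not stated in the abstract: the surrogate is taken to be the logistic loss $\ell(z) = \log(1 + e^{-z})$, "reflecting about the origin" is read as the point reflection $-\ell(-z)$ (which is at most $0$ and therefore lower-bounds the $0/1$-loss), the perturbation set is an $\ell_2$ ball, and the names `two_layer`, `pgd_attack`, `eps`, and `step` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def reflected(z):
    # Assumed reflected surrogate: -l(-z) with l the logistic loss,
    # i.e. -log(1 + exp(z)). Always <= 0, so it lower-bounds 1[z <= 0].
    return -np.logaddexp(0.0, z)

def two_layer(x, W, a):
    # Two-layer ReLU network f(x) = a^T relu(W x).
    return a @ np.maximum(W @ x, 0.0)

def grad_x(x, W, a):
    # (Sub)gradient of f with respect to the input x.
    return W.T @ (a * (W @ x > 0))

def pgd_attack(x, y, W, a, eps=0.5, step=0.1, iters=20):
    # Projected gradient ascent on reflected(y * f(x + delta))
    # over the l2 ball of radius eps (an assumed threat model).
    delta = np.zeros_like(x)
    for _ in range(iters):
        z = y * two_layer(x + delta, W, a)
        # d/dz reflected(z) = -sigmoid(z), computed stably.
        dz = -np.exp(-np.logaddexp(0.0, -z))
        delta = delta + step * dz * y * grad_x(x + delta, W, a)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)  # project back onto the ball
    return x + delta

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))      # hidden-layer weights (width 8)
a = rng.choice([-1.0, 1.0], 8)   # fixed output-layer signs
x, y = rng.normal(size=3), 1.0
x_adv = pgd_attack(x, y, W, a)
```

In end-to-end adversarial training, the outer loop would then update the network weights on `x_adv` using the original (unreflected) surrogate loss; that step is omitted here.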
