NeurIPS 2020

Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations


Meta Review

This work investigates the phenomenon of adversarial examples for neural networks. It shows that, under certain conditions on the architecture (monotonically decreasing layer widths), for "most" ReLU networks (with respect to random generation of edge weights), all points admit small-distance adversarial perturbations (w.r.t. the Euclidean metric), and these perturbations can be found by gradient flow (or gradient descent with sufficiently small step sizes). The study uses techniques from random matrix theory and gradient flows to establish the result. No experimental validation of the phenomenon is provided, which, as some reviewers pointed out, would be useful to have, since the proven results involve potentially large constants (in fact, the work makes heavy use of big-O notation, which makes the statements and presentation nicely succinct, but can also hide potential issues with the relevance of the proven claims).

Most reviewers appreciated the results established in this work, which provide a sound theoretical explanation for the abundant phenomenon of adversarial examples for DNNs. This phenomenon is currently receiving a lot of research attention in the general ML community, and it is also of societal interest: as neural networks are employed in a growing range of applications, a better understanding of their performance and vulnerabilities is important for developing user trust. This paper brings a new set of techniques to bear on formally understanding this phenomenon.

Several reviewers have also pointed out weaknesses in the presentation, which the authors are encouraged to address when preparing their final version. In particular, given that the current manuscript has more than two pages of room left, the authors should consider:

- being more explicit w.r.t. asymptotics; the heavy use of big-O notation can be viewed as hiding limitations of the current results, and in some places it is also not cleanly used (e.g., Thm. 3.2);
- overall, adding more explanations, illustrations, and intuition to make the manuscript more accessible to less technically versed readers, fleshing out proof sketches, etc. (consider that NeurIPS is a venue where practitioners and theoreticians in the ML research community come together; personally, I appreciate the clean and succinct writing style of the submission, but the authors should make reasonable efforts to make their work accessible to newcomers and to the general audience at the venue they aim to publish at; perhaps both succinctness and accessibility can be achieved);
- adding a small set of experimental illustrations of the proven phenomena, e.g. along the lines of the sketch below.

The reviews also contain more concrete suggestions along these lines.
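As one concrete possibility for such an illustration, here is a minimal sketch (not taken from the paper) of the kind of experiment the reviews ask for: draw a random ReLU network with monotonically decreasing layer widths and i.i.d. Gaussian weights, then run gradient descent on the input to find a small $\ell^2$ perturbation that flips the sign of the scalar output. All specifics below (the widths, the He-style weight scaling, the step size and iteration budget) are illustrative assumptions and may differ from the paper's exact random-weight model.

```python
# Minimal sketch: gradient descent on the input of a random ReLU network
# with decreasing widths, searching for a small l2 adversarial perturbation.
import numpy as np

rng = np.random.default_rng(0)
widths = [1000, 500, 250, 1]  # monotonically decreasing, scalar output
# He-style Gaussian initialization (an assumption; the paper's exact
# weight distribution may differ).
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
           for m, n in zip(widths[:-1], widths[1:])]

def forward(x):
    """Return the scalar output and the pre-activations (for backprop)."""
    pre, h = [], x
    for W in weights[:-1]:
        z = W @ h
        pre.append(z)
        h = np.maximum(z, 0.0)  # ReLU
    return (weights[-1] @ h).item(), pre

def grad_input(x):
    """Gradient of the scalar output w.r.t. the input x."""
    _, pre = forward(x)
    g = weights[-1].copy()
    for W, z in zip(weights[-2::-1], pre[::-1]):
        g = (g * (z > 0.0)) @ W  # backprop through ReLU and linear layer
    return g.ravel()

x0 = rng.normal(size=widths[0])
x0 /= np.linalg.norm(x0)  # start from a random point on the unit sphere
y0, _ = forward(x0)

# Gradient descent on the input, pushing the output toward the opposite sign.
x, step = x0.copy(), 1e-3
for _ in range(10000):
    y, _ = forward(x)
    if np.sign(y) != np.sign(y0):  # sign flipped: adversarial example found
        break
    x -= step * np.sign(y0) * grad_input(x)

print(f"output flipped after ||x - x0||_2 = {np.linalg.norm(x - x0):.4f}")
```

Repeating this over many random networks and inputs, and reporting the distribution of the resulting perturbation norms, would give a direct empirical check on the constants hidden in the big-O statements.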