NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4082
Title: Lower Bounds on Adversarial Robustness from Optimal Transport

Reviewer 1

The setting of binary classification with class-conditional Gaussian data sharing an identical covariance matrix gives interesting results but is too simple to pose any novel technical challenges. Their setting ensures (which needs a short proof) that optimizing over only linear classifiers and translate-and-pair transportation plans, or couplings, suffices to lower bound the adversarial robustness. This improves the clarity of their main message but, in my opinion, brings the submission down a little on the originality and significance metrics.
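The Gaussian setting the reviewer describes admits a quick numerical sanity check. The sketch below is my own illustration, not code from the submission; the closed form $\Phi(-(\mu-\epsilon)/\sigma)$ for symmetric one-dimensional Gaussians $N(\pm\mu, \sigma^2)$ under perturbations of size $\epsilon < \mu$ is the standard one for this setting, and the parameter values are hypothetical:

```python
from math import erf, sqrt

def gaussian_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def optimal_standard_error(mu, sigma):
    """Bayes-optimal 0-1 error for N(+mu, sigma^2) vs N(-mu, sigma^2), equal priors.

    The optimal classifier is sign(x), so each class errs with probability
    Phi(-mu / sigma)."""
    return gaussian_cdf(-mu / sigma)

def optimal_robust_error(mu, sigma, eps):
    """Optimal robust 0-1 error when the adversary may shift each point by up to
    eps toward the decision boundary (valid for eps < mu): the adversary
    effectively moves the two class means closer by eps."""
    return gaussian_cdf(-(mu - eps) / sigma)
```

With $\mu = \sigma = 1$, the robust error rises smoothly from $\Phi(-1) \approx 0.159$ at $\epsilon = 0$ toward $0.5$ as $\epsilon \to \mu$, which is the degradation the lower bound captures.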

Reviewer 2

- It is worth mentioning that there has already been some theoretical work linking adversarial robustness to optimal transport. See "Generalized No Free Lunch Theorem for Adversarial Robustness", ICML 2019.
- To be more rigorous, there are a few constraints which must be satisfied by f, g, and c in Definition 2 for the ensuing arguments to be strictly valid. For example, f and g should be restricted to bounded continuous functions, etc.
- In the proof of Theorem 2, what is $\tilde{P}_{X_1}$? I checked over and over again, but this quantity is never defined in the manuscript.
- In the proof of Theorem 2, what do you mean by "translate and pair in place"? Please write out the form of such transportation plans explicitly. How do you get the upper bound in terms of TV distance? Please provide all relevant details.
- In inequality (1) of Appendix B.2, what is $\tilde{P}_{X_1}$? What is its relation to the minimization variable z? How do you go from (2) to (3)? The proof as it stands is at best unclear, and at worst incorrect: too many undefined quantities and unjustified transitions.
- line 76: ... gives tilde{x} to the learner and ...
- line 92: correctly, ==> correctly.
- lines 224--234: What is m? m ==> d?
- line 177: missing verb; "optimizing over plans us" ==> "optimizing over plans gives us"
- In the equation just after line 196, the inf after the first inequality doesn't make sense (the variable z is not used!)
- What are X_1 and X_{-1}?
- Theorem 1 and the constructions leading to it could be greatly simplified by considering only "symmetric" neighborhoods N, i.e. x' in N(x) iff x in N(x'). Considering only spaces for which N(x) subseteq X for all x (which leads to tilde{X} = X) also simplifies the whole thing, and is a reasonable restriction, since adversarial examples are only "genuine" if they come from the same space as clean ones.
- The Gaussian example considered in Section 4 was already proposed in Tsipras '18 (ref. [74]?).
I hand-checked that the lower bound on adversarial error (see Eq. 8 of the supplementary material) proposed by the present authors matches what was obtained in that paper. The authors should verify this, and perhaps add a paragraph to their paper making this point clear. These are all minor points, and should be easy to address.

Major Edit: There is a bug in the proof of Theorem 1.
======================================

The condition under which Theorem 1 has been proven, namely the "separability condition", is too restrictive: it implies that the joint distribution of $X$ and $Y$ is discrete! Indeed, I'll show that the set complement $\mathcal Z \setminus (a_n)_n$ has zero measure, where $\mathcal Z = \mathcal X \times \{-1, 1\}$. The set $\mathcal Z \setminus (a_n)_n$ (assumed measurable!) contains no $a_n$, so by the contrapositive of the separability condition we must have $P(\mathcal Z \setminus (a_n)_n) = 0$. Thus the support of $P$ is contained in the sequence $(a_n)_n$, and so the former must be a countable combination of atoms.

In case my measurability assumption above (in brackets) is troublesome, the argument may be remedied as follows. The point $\{a_n\}$ may indeed be non-measurable, but we may denote by $c_n$ the infimum of the measures of all measurable sets containing $a_n$. This infimum is realized (since the countable intersection of a minimizing sequence of sets is itself measurable), and the set $A_n$ which realizes it is an atom. In any case, the product space $\mathcal Z$ must be discrete! $\Box$

However (and fortunately), one can discard the troublesome separability condition and still prove that in Theorem 1, LHS >= RHS. This is in any case sufficient for obtaining lower bounds on universal error.
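Regarding the bullet above on the constraints needed in Definition 2: assuming Definition 2 is the usual Kantorovich dual of an optimal transport cost (my guess at the intended statement, not taken from the submission), the standard formulation for a lower semicontinuous cost $c \ge 0$ is

```latex
\inf_{\pi \in \Pi(P, Q)} \int c \, d\pi
\;=\;
\sup \left\{ \int f \, dP + \int g \, dQ \;:\;
f, g \in C_b(\mathcal X),\;
f(x) + g(y) \le c(x, y) \ \forall x, y \right\},
```

where $\Pi(P, Q)$ is the set of couplings of $P$ and $Q$ and $C_b(\mathcal X)$ denotes the bounded continuous functions on $\mathcal X$; without some such restriction on $f$ and $g$, the supremum on the right is not well controlled.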

Reviewer 3

General comment: This paper provides a nice, principled way to compute a theoretical lower bound on robust accuracy based on a distance metric between the two classes in binary classification. It makes rigorous the intuition that the more separated the two classes are (in the sense of the perturbation neighborhood the attacker can search in), the lower the optimal robust loss should be, bounded below by the optimal standard loss. Although I believe this type of result could also have been shown outside the optimal transport framework, with perhaps less notation, it is one justified way to look at the problem. The paper is written in a coherent manner, and the theory is clear, as are the experiments. I am not very aware of similar work; if this is indeed the first paper to prove lower bounds on adversarial robustness using a distributional distance metric, I would highly recommend it for publication at NeurIPS.

Here are some questions that remain, from high to low level:
- How could this framework be extended to multiclass classification? And how would the method scale with the number of classes?
- For the CIFAR plots, you should add robust classifier losses as well; even though I understand they might not look that great, that is what we have at the moment!
- For Figure 3, apparently there are training issues above a certain beta. What is the adversarial training accuracy in those cases? I suspect that it is below 100%. In that case, shouldn't choosing a sufficiently expressive model with the right parameters mitigate this issue?
- In the proof of Theorem 3: it should read Pr(V \in ...) on the RHS.
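The summary above, a lower bound derived from a distance between the two classes, can be illustrated on empirical samples. The sketch below is my own illustration, not the authors' code: under the 0-1 cost $c(x, y) = \mathbf{1}\{\|x - y\|_2 > 2\epsilon\}$ (a pair can be confused if some single point lies within $\epsilon$ of both), the empirical transport cost between two equal-size samples reduces to a min-cost bipartite matching; how this cost enters the paper's actual bound is governed by their Theorem 1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_confusion_cost(X_pos, X_neg, eps):
    """Empirical optimal-transport cost between two equal-size samples under the
    0-1 cost c(x, y) = 1{||x - y||_2 > 2*eps}: the fraction of optimally matched
    pairs that cannot be collapsed onto a common adversarial point."""
    # Pairwise Euclidean distances via broadcasting, thresholded at 2*eps.
    dists = np.linalg.norm(X_pos[:, None, :] - X_neg[None, :, :], axis=2)
    cost = (dists > 2.0 * eps).astype(float)
    # Min-cost perfect matching between the two samples.
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```

A larger perturbation budget eps shrinks this cost (the classes become "closer" to the attacker), which is the direction of the intuition the reviewer describes: less separation, higher optimal robust loss.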