Paper ID: 4658

Title: Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks

Originality: While the methods used in the paper exist already, I think the way in which they have been combined is sufficiently novel and interesting.

Quality: I think the idea is theoretically sound and is evaluated sufficiently. I agree with the intuition that encoding logits according to a Hadamard matrix provides sufficient separation between logits, which presumably makes it harder for an adversary to cause an input to be misclassified. I assume that when the authors say `white box attacks`, they reveal the Hadamard matrix and final layer to the adversary. The idea of using an ensemble of networks where each network predicts two bits is well motivated.

Clarity: I think the paper is very well written and could likely be reproduced by others.

Significance: For me the most significant part of the paper is Tables 1, 2 and 3. I think the reported robustness and the improvements over Madry et al. are a convincing result.

Clarifications/weaknesses:
1. When the authors say `white box attacks`, I assume this means that the adversary can see the full network with the final layers, every network in the ensemble, and every rotation used by networks in the ensemble. I would like them to confirm this is correct.
2. Did the authors study whether a larger number of bits in the code helps against a larger epsilon in the PGD attack? Intuition suggests that a 32-bit code should improve robustness against a more powerful adversary. This experiment isn't absolutely necessary, but it would strengthen the paper.
3. Did the authors study the same approach on CIFAR? It seems like this approach should be readily applicable there as well.

---Edit after rebuttal---
I am updating my score to 8. The improved experiments on CIFAR-10 make a convincing argument for the method.

Coding theory addresses both stochastic and adversarial noise. Minimum distance, one of the simplest and most familiar metrics of a code, measures the ability of a code to correct adversarial errors.

I don't understand the argument in Section 2.1 about the volume of the "uncertain region" in the space of logits. Why is the Euclidean volume relevant? Any continuous function with the same limiting behavior will exhibit this. The relevant way to measure the size of the uncertain region is via the distribution of the inputs to the softmax layer: if the variance of these inputs is too large, the outputs will be overconfident, and if it is too small, they will be underconfident. A persuasive argument that the shape of the softmax function is responsible for overconfident classification must be more nuanced than the one made here.

Section 2.2 describes a "correlation-based decoder". This is a logistic activation layer, followed by a linear layer defined by the code, followed by a ReLU layer, followed by normalization to a probability distribution. Why were these two choices of nonlinear activation made? Doesn't the ReLU throw away information that could be useful for calibrated confidence levels? Why is the alpha hyperparameter introduced? It is not used in the experiments, and the framework does not seem to depend on it.

In Section 2.3, the code described is equivalent to a repetition code. This is a trivial code; why should it offer any benefit? The Hamming distance between codewords is increased, but on the other hand the operator norm of the final layer is larger than that of an identity layer, which would seem to cancel out the benefit of the code. In general, the problem of adversarial examples exists even with only two classes, but there are no interesting codes in this case. Theorem 1 is a variation on the well-known Plotkin bound.
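As I understand the description of the correlation-based decoder, it amounts to roughly the following sketch. The codeword matrix `C` and the placement of `alpha` are my assumptions based on the review's summary, not necessarily the paper's exact formulation:

```python
import numpy as np

def correlation_decode(logits, C, alpha=1.0):
    """Correlation-based decoder as I read Section 2.2:
    sigmoid each logit, correlate the result with every codeword,
    clip negatives with a ReLU, then normalize to a distribution.

    logits: (n,) raw network outputs, one per code bit.
    C:      (M, n) codeword matrix with entries in {-1, +1},
            one row per class (e.g. rows of a Hadamard matrix).
    alpha:  the scaling hyperparameter mentioned above; its exact
            role in the paper is not reproduced here.
    """
    p = 1.0 / (1.0 + np.exp(-alpha * logits))   # per-bit probabilities
    scores = C @ (2.0 * p - 1.0)                # correlation with codewords
    scores = np.maximum(scores, 0.0)            # ReLU drops anti-correlated classes
    total = scores.sum()
    # Fall back to uniform if every correlation is non-positive.
    return scores / total if total > 0 else np.full(len(C), 1.0 / len(C))
```

Note that logits near zero give per-bit probabilities near 0.5, correlations near zero, and hence a near-uniform, high-uncertainty output, which is where my question about the ReLU discarding information comes from.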

Summary: The region of uncertainty (prediction probability close to 0.5) for the softmax of logits is extremely small, lying near an (M-1)-dimensional hyperplane in logit space. The reason is that changing the logit for one class affects the probability vector in all dimensions. The authors show that if each logit is first converted to an independent probability using the function 1/(1+exp(-x)), and the probability vector is then correlated with each codeword of an error-correcting code to decode in a soft way, the method has a large volume of uncertainty. The volume of uncertainty is larger when the minimum Hamming distance of the code is large, because multiple logits must be changed at the same time to cause a wrong decoding. The authors demonstrate this with nice plots.

The authors propose to use a subset of rows of a Hadamard matrix for two properties: they have almost the best minimum Hamming distance (half the dimension of the code), and they are orthogonal (which is important for unconfused decoding). They derive an upper bound on the distance for any code; I checked the proof of the upper bound and it seems correct.

The authors then demonstrate performance (using a length-16 Hadamard code) against PGD and Carlini-Wagner attacks. In the case of MNIST under PGD attacks, their method loses 7% accuracy compared to Madry et al.'s adversarially trained models; in all other combinations it seems to do better. In Table 2, additional results are provided for epsilon=0.4, 0.45, where it is much better than Madry et al.'s adversarially trained models. For Fashion-MNIST, results are more encouraging against an adversarially trained baseline. The supplement has very interesting insights as to how the codes help in various ways (among others, adversarial examples having high uncertainty, and actual example digits that are difficult even for humans and hence already on the border).
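The two properties attributed to the Hadamard rows above (minimum Hamming distance of half the code length, and mutual orthogonality) can be checked directly with the standard Sylvester construction. This sketch is my own illustration, not code from the paper:

```python
import numpy as np

def sylvester_hadamard(n):
    """Build an n x n Hadamard matrix (n a power of two) via the
    Sylvester construction: H_{2k} = [[H_k, H_k], [H_k, -H_k]]."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

H = sylvester_hadamard(16)

# Map +/-1 entries to bits; each row is a codeword of length 16.
codewords = (H > 0).astype(int)

# Pairwise Hamming distances between distinct rows are exactly n/2 = 8:
# any two distinct rows are orthogonal, so they disagree in half the positions.
dists = [(codewords[i] != codewords[j]).sum()
         for i in range(16) for j in range(i + 1, 16)]
assert set(dists) == {8}

# Orthogonality of the rows: H @ H.T = n * I.
assert np.array_equal(H @ H.T, 16 * np.eye(16, dtype=int))
```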
Significance and Originality: The use of error-correcting output codes to improve robustness, with empirical results, is very novel (although explored in machine learning in some other contexts before). The results seem very promising as well.

Clarity: I like the presentation of the paper, which explains the intuition behind using codes with large distance.

Quality: The submission seems technically sound.

Weaknesses:
a) This page from the "Madry Challenge" - https://github.com/MadryLab/mnist_challenge - lists state-of-the-art attacks. The attack reported in Table 1 for epsilon=0.3 seems to be PGD with 100 steps. What happens with other state-of-the-art attacks? At least the latest in the list could have been tried instead of just PGD.
b) The authors could have tried CIFAR-10; the leaderboard there has roughly 47% accuracy for the adversarially trained model. It would have been interesting to find out whether output codes can contribute independently or in combination with existing robustification methods.
c) Based just on the code, the Lipschitz constant, and the architecture, is it possible to offer a robustness certificate? Does using codes make it easier to certify?

---After the rebuttal---
I asked for more powerful attacks and also results for CIFAR-10, and the authors did exactly that. Under the DAA attack (which is in the top two for CIFAR-10 and MNIST), their method achieves 55% on CIFAR-10, which is 10% (!) more than the 44% of Madry's adversarially trained model. That really is a big improvement. The authors' argument about the higher volume of the uncertainty region is convincing and intuitive, as is the fact that multiple logits must be disturbed instead of just a single one, as in the softmax case. I thank the authors for their work during the rebuttal in clarifying my questions about the experiments. I am increasing my score.