NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:932
Title:ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

Reviewer 1

- Originality: This work is a novel combination of existing methods [2,18,22].
- Quality: The submission is technically sound and well supported by theoretical analysis and experimental results. The authors show some failure cases; however, they do not describe or analyze why these failures happen.
- Clarity: The paper is easy to read.
- Significance: The experimental results are important, as they advance the state of the art in this field. However, the method is inherently limited to ResNet-based network architectures.
----------After rebuttal-----------
The authors address some of my concerns in the rebuttal, e.g., the runtime overhead is negligible. I look forward to seeing the other results that the authors promised to report in the final version. I therefore stand by my initial decision to accept.

Reviewer 2

[Originality] Although the ideas of 1) using ensembles for adversarial robustness, 2) explaining ResNets using ODEs/PDEs, and 3) adding noise to architectures [1,2] are not new, to my knowledge this is the first paper that connects PDE-based theory with adversarial robustness.

[Quality] The theoretical part of this paper is sound and mostly self-contained. However, I believe there are some mismatches between the theory and the experiments. I have listed my concerns in the improvement/question section. The authors are probably not aware of some similar papers such as [1,2], which may overlap substantially with the algorithm presented by the authors.

[Clarity] The theoretical part of this paper is clear. The paper provides a nice visualization explaining how the introduced diffusion term works. Since the algorithm is rather simple, I believe an experienced reader should be able to implement it quite easily. However, I found what may be an inconsistency between the theory and the authors' implementation: the PDE in Theorem 1 clearly implies that all ResNets in a single ensemble share weights, yet the authors do not point this out directly. This may cause some confusion, and I will return to this point in the questions. There are also some other minor issues; for example, the authors do not specify architecture configurations such as channel size and batch norm position. Although ResNets are quite standard, I believe it is still important to specify these details, at least in the appendix. The experimental results are not easy to read. I personally believe that Section 3.3 is the worst part of the paper: the authors could have put all the numbers into a table (or into other existing ones) so that readers do not have to check back and forth.

[Significance] The PDE-based approach builds a connection between recent studies on PDEs/ODEs for ResNets and the study of adversarial robustness.
It provides some theoretical soundness for a rather simple algorithm. The algorithm itself does not seem too novel, as using ensembles and adding noise are presented separately in other papers. I believe the best point of this paper is that it potentially shows a new direction/framework for improving adversarial robustness. Finally, there are some issues with the empirical experiments. If these issues can be answered/solved, I believe the paper can be a contribution to NeurIPS.

[1] Liu, X., Cheng, M., Zhang, H. and Hsieh, C.J., 2018. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV).
[2] He, Z., Rakin, A.S. and Fan, D., 2019. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 588-597).
[3] Cohen, J.M., Rosenfeld, E. and Kolter, J.Z., 2019. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918.

--------Post Rebuttal----------
The authors addressed my concerns and I raised my score accordingly.

Reviewer 3

Update: I have read the authors' rebuttal and the other reviews. I am happy with my previous recommendation.
- Regarding experiments, the highest vanilla ResNet accuracy in the paper is close to 85%; however, many ResNet implementations achieve close to 95% on CIFAR10. Why are the baseline numbers reported in the paper not higher? The observations of the paper still hold, but I am concerned about the efficacy of the method for very high-performing or SOTA models.
- I believe the derivation of the solution in Eq. 10 is important and relevant for readers new to ODEs. The paper's accessibility/correctness would be improved if the derivation were included in the supplementary material.
- Line 126: I strongly suggest changing the symbol Resnet' used for ResNets with injected noise. Better symbols might be Resnet*, etc., because Line 130 uses "Resnet's", which causes confusion.
- How is a picked? What happens if you increase or decrease a?
Minor comments:
- Theorem 1, line 116: please add the meanings of the symbols taken from [22]; their absence interrupts readability. Even a very brief one-liner suffices.