__ Summary and Contributions__: The authors propose a first-order dual SDP algorithm for verifying properties of verification-agnostic networks. A generalization of the SDP formulation for adversarial robustness is presented, and an algorithm based on forward and backward operations on the layers of the neural network is proposed to solve it.

__ Strengths__: The theoretical analysis is sufficient, and the experimental results demonstrate its efficiency.

__ Weaknesses__: There are several other methods that provide certified upper bounds; it would be better to discuss and compare against them.

__ Correctness__: Yes, the claims and method are correct. The empirical methodology is correct.

__ Clarity__: Yes, the paper is well written.

__ Relation to Prior Work__: Yes, it is clearly discussed how this work differs from previous contributions.

__ Reproducibility__: Yes

__ Additional Feedback__: The authors' response is satisfactory.

__ Summary and Contributions__: This paper proposes a first-order variant of SDP-based robustness certification for neural networks. The proposed method is efficiently implementable on GPUs, scales to much larger networks than prior work, and outperforms LP-based methods.

__ Strengths__: - Strong experimental results.

__ Weaknesses__: - Novelty, review of relevant works
- Presentation

__ Correctness__: I tried my best to verify the mathematical claims and found no obvious gaps or mistakes.

__ Clarity__: The writing can be improved in a future version; see detailed comments below.

__ Relation to Prior Work__: The comparison to prior works is somewhat incomplete, especially on the first-order SDP solvers.

__ Reproducibility__: Yes

__ Additional Feedback__: Post-rebuttal update:
The authors' response is very nice and detailed, probably the best one I've seen in NeurIPS/ICML this year. It does address many of my previous concerns, including the ones regarding contribution/novelty of this paper. I am raising my score so we can reach a consensus.
-------------------
This paper proposes a first-order variant of SDP-based robustness certification for neural networks. The proposed method is efficiently implementable on GPUs, scales to much larger networks than prior work, and outperforms LP-based methods.
The main strength of this paper is its strong empirical performance: using the first-order SDP solver proposed in the paper, the authors were able to verify much larger networks and provided the first SDP-based certification for CIFAR-10. (Disclaimer: I am not well-versed enough in the state of the art of SDP-based certification, so it is possible that I missed some existing work here.)
However, my main reservation about this paper concerns its novelty, presentation, and review of prior work.
(1) Novelty and review of prior work: the main improvement of this paper over existing works [33], [15] is a first-order SDP solver in place of off-the-shelf ones, along with some heuristic acceleration tricks (and, if I understood correctly, that is the only difference from [33]). However, using first-order methods for semidefinite programming is not a novel idea at all. From my quick and incomplete literature review, this idea dates back at least to 2002, to R. Monteiro's paper "First- and second-order methods for semidefinite programming," published in Mathematical Programming. There are other papers on this topic, for example,
Primal-dual first-order methods with O(1/epsilon) iteration-complexity for cone programming, Guanghui Lan, Zhaosong Lu, Renato Monteiro, Mathematical Programming, 2011.
I would also refer to Zaiwen Wen's 2009 PhD thesis at Columbia University, "First Order Methods for Semidefinite Programming," and the bibliography therein. There are also other approaches to fast SDP solving, e.g., Sanjeev Arora and Satyen Kale's STOC 2007 paper "A combinatorial, primal-dual approach to semidefinite programs." Although I am not an expert in this topic (fast SDP solvers), there seems to be a whole research area dedicated to it, yet this paper does not cite any of these papers. In the experiment and algorithm-design sections, there is also no comparison with existing first-order SDP solvers. In my opinion, this part requires a major revision.
(2) Presentation: Section 4 is very difficult to understand for anyone who hasn't read [33] (and the presentation in [33] is much more accessible). My suggestion is to include a brief summary of [33] before presenting the Lagrangian formulation. It should also be highlighted that Proposition 1 is simply a Lagrangian reformulation of the SDP in [33].
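For context, the certification logic behind such a Lagrangian reformulation is just standard weak duality; sketched generically (notation mine, not the paper's):

```latex
% Weak duality, stated generically (notation mine, not the paper's):
% for the primal problem
\[
  p^{\ast} \;=\; \max_{x} \; f(x) \quad \text{s.t.} \quad g(x) \le 0,
\]
% every multiplier $\lambda \ge 0$ yields a valid upper bound, since
% $-\lambda^{\top} g(x) \ge 0$ for any feasible $x$:
\[
  L(\lambda) \;=\; \max_{x} \; \bigl[\, f(x) - \lambda^{\top} g(x) \,\bigr] \;\ge\; p^{\ast}.
\]
% Hence even an inexact first-order minimization of $L$ over
% $\lambda \ge 0$ returns a certified bound at every iterate.
```

This is why any dual iterate in such a scheme certifies robustness, even without solving the SDP to optimality.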
Regarding the experiments: I hope the authors can explain the discrepancy between the verified accuracy for MLP-SDP obtained with their method and with [33]. As the authors mention (line 174), both methods optimize the same objective, but a gap of more than 5% seems too large in this scenario. (I noticed the authors mention a probable reason, namely the use of a different subsample of the test set, but the gap still seems a bit too large to me.)
Minor comments:
- line 163: optimval -> optimal

__ Summary and Contributions__: This paper considers the semidefinite program for verifying adversarial robustness from [33] and proposes an optimization approach that approximates this SDP, reducing the computational complexity significantly while still yielding a certified upper bound.

__ Strengths__: The improvement in terms of computational complexity is significant. Typically, SDPs are quite inefficient to solve. The authors exploit the specific structure of the SDP to further relax it into an optimization problem they can approach with first-order methods, implemented using an automatic-differentiation tool. In particular, this approach reduces the memory dependence on the number of activations from quartic (as in [33], if an interior-point method is used) to linear.
The numerical section is comprehensive.
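As a reader's illustration of the general mechanism (why every dual iterate of such a first-order scheme already certifies a bound), here is a minimal sketch of generic Lagrangian dual ascent on a toy problem. This is my own toy example in plain Python, not the authors' algorithm or their SDP:

```python
# Toy primal: p* = max { x : x^2 <= 4 } = 2.
# For any lam >= 0, weak duality gives a certified upper bound:
#   L(lam) = max_x [ x - lam * (x^2 - 4) ] >= p*.
# We solve the inner max by gradient ascent and improve lam by a
# subgradient step; crucially, EVERY iterate's bound is already valid.

def inner_max(lam, steps=200, lr=0.1):
    """Maximize x - lam*(x^2 - 4) over x (concave in x for lam > 0)."""
    x = 0.0
    for _ in range(steps):
        x += lr * (1.0 - 2.0 * lam * x)   # gradient of the Lagrangian in x
    return x

def certified_bound(lam):
    """Certified upper bound on p* for a given dual multiplier."""
    x = inner_max(lam)
    return x - lam * (x**2 - 4.0)

lam = 1.0
for _ in range(50):
    x = inner_max(lam)
    # subgradient of L(lam) w.r.t. lam is -(x^2 - 4); keep lam > 0
    lam = max(1e-6, lam - 0.05 * (4.0 - x**2))

print(certified_bound(1.0))   # loose but valid bound: 4.25 >= p* = 2
print(certified_bound(lam))   # tightened bound, close to p* = 2
```

The same logic carries over to the SDP setting: the inner maximization becomes the (structured) Lagrangian of the SDP, the dual step becomes a first-order update over the multipliers, and automatic differentiation supplies the gradients, which is what keeps the memory footprint linear in the number of activations.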

__ Weaknesses__: I don't see major weaknesses in this paper.
One small weakness is that they don't show an explicit measurement of how this paper improves upon [33]. For instance reporting the running times in the experiments (or the appendix) could be a way to do so.

__ Correctness__: The methodology seems correct.

__ Clarity__: The paper is clear. An explicit, succinct explanation of how verification methods work could be useful: for instance, which points are chosen to evaluate them and which typical \phi functions are considered. Given the space constraints, though, I think the paper is quite self-contained.
There are some typos the authors should fix (see below).

__ Relation to Prior Work__: Relevant previous work is properly cited.

__ Reproducibility__: Yes

__ Additional Feedback__: The paper is very nice. The optimization ideas are straightforward and the application is relevant.
I found many typos. Some of them are listed below, but I am sure I didn't catch them all.
line 112: is->be
line 116: andb
line 142: desribe
line 145: relexation
line 163: optimval
line 215: is->are
Algorithm 1: reference to equation (9) should be (4)