Summary and Contributions: In a nutshell, this paper reads like the algorithm-focused companion of the paper "Strong mixed-integer programming formulations for trained neural networks". The paper describes a tight convex relaxation of the multivariate inputs to a single ReLU (as opposed to the now standard triangle-relaxation). Even thought the tight description has exponentially many facets, performing separation over this set is easy. This separation algorithm is then utilized to build two new algorithms for verification -- (1) a cutting plane approach over the LP relaxation, (2) an approach that iteratively tightens bounds by doing forward/backward passes over the network. Empirical results over small-ish networks demonstrate a significant improvement in performance over prior methods.
Strengths: This paper is a good example of using mathematical intuition + mathematical theory to bring about new _practical_ algorithms for a problem that has seen a lot of attention in recent years. The paper brings tools from discrete optimization to bear on the problem of performing robustness verification on feedforward neural networks with ReLU neurons. The high-level ideas are clean and easy to understand for those with a background from that community. There is a good amount of theory and explanations that could be of interest to the community. The experimental results are solid.
Weaknesses: This paper is very dense and can be hard to read for some. See my comments under "Clarity".
Correctness: To my knowledge, yes.
Clarity: It is well-written but dense. The presentation of the paper is similar to what you might see in Mathematical Programming but crammed into the NeurIPS format, with some additional explanations to make the work appeal to a wider ML audience. I am familiar with works in the MP/IPCO community and so I found it reasonable to follow, but sometimes I got lost in the notation. I can imagine that it could be harder for some others. If the other reviewers concur with this sentiment, I would suggest the following changes to better appeal to the NeurIPS audience: - Defer more of the math and propositions/corollaries to the supplementary material. I understand the desire to get all the statements in the main paper to showcase your theoretical contributions, but I think this obfuscates the practical algorithmic contributions of the paper. - Add a figure to illustrate the basic high-level ideas behind the algorithms in the main text. This figure would also be useful for the eventual poster/presentation of this work. It might also be the case that giving the main text more room to breathe (e.g. an arxiv version with more pages) might suffice.
Relation to Prior Work: I would like a little discussion of how this work builds on or differs from the aforementioned prior work. The authors can decide if it's appropriate to do so in the paper itself or just in the response.
Additional Feedback: As stated earlier, the main issue is the presentation. Another thing that would be nice is providing performance profiles for the experiments in the supplementary. Do the authors know how this work empirically compares with "Neural Network Branching for Neural Network Verification" by Jingyue Lu, M. Pawan Kumar? I would love to see the code released. ---------------------------------- Update after rebuttal: I would like to thank the authors for their detailed response. I have increased my score to an 8 and I look forward to reading a version of this paper with better presentation.
Summary and Contributions: ***Update after author feedback*** Authors have addressed some of my concerns in feedback. After some reflection, I tend to *conditionally* increase my score (from 6 to 7) because of the following reasons: 1. they have provided partial evidence of assessment of variance in the experiments, showing that results are significant. It is crucial that such variance estimates are added to the paper and mentioned. There are strong arguments from a majority in the community, that not assessing variance arising from randomness is an experimental flaw. This is the *conditional* part of my score increase 2. The fact that they provide a separation oracle for the *tightest* convex relaxation of ReLU, helps to close down the chapter on such type of methods based on convex relaxations of activations. I find this novelty significant. *** Original Review *** The paper introduces a tighter convex relaxation, than the triangle-relaxation, for the problem of certifying ReLU networks i.e., certifying that all elements in a certain set are assigned the same label. The triangle relaxation has been the basis of many previous certification methods. The set defined in the new convex relaxation is shown to admit an efficient separation oracle, and thus the optimization problem is computationally tractable (polynomial time). Empirically there is some improvements over the baselines, but with larger computational budget.
Strengths: THe results are strongly rooted in theory, in particular, proving the existence of an efficient separation oracle shows that the optimization problem is computationally tractable. The empirical evaluation is also done on what appears to be a standard benchmark (ERAN dataset) and there is some improvement over the baselines.
Weaknesses: The work lacks novelty, as the certification problem has been extensively studied. In my opinion, it is not clear how much more can it be improved, and how relevant is in applications. Also, the dataset proposed (ERAN) does not have multiple trained versions of the same network. Given that such networks are trained using stochastic methods like SGD, there might potentially be a high variance in the numerical results. So, it is hard to convince the reader that the empirical improvement is significant. I would suggest to run the certification procedure on multiple versions of the same network (trained with different seed), to obtain some measure of variance in the numerical results.
Correctness: The supplementary seems complete and supporting of the main claims. I have not checked thoroughly, as it is quite long. The empirical evaluation is done on a standard dataset. However such dataset lacks repeated runs of the same network (with different seeds), and thus there is no measure of variance in the numerical results.
Clarity: The paper is clear and it is not hard to follow.
Relation to Prior Work: the work compares to multiple baselines on a standard dataset. The main differences are discussed and clear from the main text.
Summary and Contributions: The authors proposed a explicit linear inqeuality description for the tightest possible convex relaxation of a single neuron. They show that there's an efficient separation routine and present a linear time algorithm to evaluate point.
Strengths: clear presentation for the algorithm and proofs.
Weaknesses: similar performance compared to Kpoly
Relation to Prior Work: yes