__ Summary and Contributions__: As promised by the title, the paper presents an algorithm for the exact verification of BNNs. The BNN and the robustness specification are encoded as a set of Boolean clauses. A SAT solver is used; the standard SAT solver is modified to handle reified cardinality constraints. Empirical results show that this leads to considerable decrease in computation time (2 to 3 orders of magnitude). A second contribution is the training method and the enforcement of sparsity.

__ Strengths__: The problem setting itself is not new. The key contribution is the modification to the SAT solver which results in significant decrease in training time. The paper hypothesizes that imbalance in the layer-wise sparsity slows down the SAT solver; some empirical evidence is provided, and the paper then proposes a novel design method to enforce it. A variation of gradient canceling is proposed to induce robustness.

__ Weaknesses__: All the empirical results are for small examples (CIFAR-10) raising the question of scalability.
Added post-discussion: The authors' acknowledge this issue in the feedback, but provide no potential approaches to lowering complexity / enabling scalability.

__ Correctness__: No significant issues.

__ Clarity__: Yes

__ Relation to Prior Work__: Seems to be well-positioned wrt SoA

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The manuscript studies the problem of verifying BNNs using SAT solvers. The manuscript introduces a methodology that is based on i) modifying an existing SAT solver (i.e., MiniSAT) to natively handle reification of cardinality constraints, and train solver friendly BNNs for verification by the means of lowering their bias. Experimentally, the introduced method (i.e., EEV) outperforms SAT solvers with sequential counters by Sinz et al. 2005, and a SMT solver (i.e., z3) over MNIST and CIFAR10 datasets.

__ Strengths__: The main strengths of the work are i) the potential efficiency of adapting existing SAT solvers to handle BNN verification problems more efficiently compared to naive cardinality encodings, and ii) the training methodology for balanced sparsity in BNNs.

__ Weaknesses__: The main weaknesses of the work include i) missing experimental baseline comparisons and ii) missing discussion of related works in BNN and SAT literatures on similar problems. For these reasons which will be explained in detail below, the manuscript is not ready for a publication at NeurIPS in its current form.

__ Correctness__: Comment 1: The main contribution of the paper is the extension of an existing SAT solver (i.e., MiniSAT) to a SAT solver that can handle (reified) cardinality constraints. There exists successful solvers already in the pseudo-Boolean (some of which DIRECTLY extend **MiniSAT**) and constraint programming research that can handle cardinality constraints without the use of auxiliary variables/constraints. The manuscripts unfortunately completely skips areas of research that are highly relevant. Unfortunately the manuscript neither discusses why they have not been used for comparison, or use them as meaningful baselines.

__ Clarity__: The paper is well-written aside from important missing related work and meaningful baseline comparisons.

__ Relation to Prior Work__: Comment 2: In the SAT research, there are more compact and efficient encodings for encoding cardinality constraints compared to sequential counters by Sinz et al. 2005 (e.g., cardinality networks by Asin et al. 2011). In the planning with BNNs research by Say et al 2020, cardinality networks have been extended to reified cardinality constraints, and have been shown to perform faster compared to sequential counter-based encodings.
Comment 3: There also exists a (preliminary) work on training BNNs with constraint programming that uses low amounts of data by Icarte et al. 2019 - which you can use as a reference for training sparse BNNs.

__ Reproducibility__: No

__ Additional Feedback__: Additional References:
Asín, Roberto and Nieuwenhuis, Robert and Oliveras, Albert and Rodríguez-Carbonell, Enric. Cardinality Networks and Their Applications. Theory and Applications of Satisfiability Testing - SAT 2009, pp. 167-180, 2009.
Icarte, Rodrigo Toro and Illanes, Leon and Castro, Margarita P. and Cire, Andre A. and McIlraith, Sheila and Beck, Christopher J. Training Binarized Neural Networks using MIP and CP. Principles and Practice of Constraint Programming. pp. 40-417, 2019.
Say, Buser and Sanner, Scott. Compact and Efficient Encodings for Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models. Artificial Intelligence, 285, 103291, 2020.
**ATFER REBUTTAL**
Thank you for your responses and the additional set of experiments. Below, please find my recommendations for your paper.
- Please consider including the results presented during rebuttal into the final version of the paper (i.e., this makes the paper stronger).
- Please remove the claim that the paper is the first to handle reified constraints of the form [\sum_i x_i >= b] <-> y natively.
- Please note there already exists modelling languages and solvers (i.e., ILP, CP, pseudo Boolean optimization etc.) that achieves a native encoding for reified constraints in the size of the BNN.
- Please consider releasing your code.

__ Summary and Contributions__: The paper addresses the problem of verifying binary neural networks (BNN) and of learning verifiable BNNs. To this end, it makes the following contributions:
- A extension of MiniSat that handles reified cardinality constraints that arise in BNN verification.
- Balanced sparsification and cardinality bound decay for training verifiable BNNs.
- Improved PGD attacks for adversarial training of BNNs.

__ Strengths__: + The speed of verifying BNNs over real-valued neural networks is significantly better.
+ Each of the contributions made in the paper seems to help further speed-up verification.
+ The paper provides a good high-level overview of the approach.

__ Weaknesses__: - Some aspects of the proposed approach require more explanation (please see the "Clarity" section for details).
- The accuracy of BNNs is worse than the real-valued networks (please see the "Correctness" section for details).

__ Correctness__: - In section 4.1, how is the proposed encoding different from [42]? How does the method of [42] compare with the proposed approach in terms of speed?
- In section 3.2, how much does the accuracy suffer due to the changes made in the first and the last layer in order to help with the SAT formulation of verification?
- In Table 2, PGD accuracy has no noticeable pattern wrt $\eta$. Any thoughts on why that might be the case?
- The accuracy of BNNs is lower (sometimes significantly lower) than the real-valued networks. Given the faster inference and verification time of BNNs, it would be interesting to see if larger BNNs can be used that still offer a speed-up but are much closer in terms of accuracy.
- In the "reject option" paragraph of the experiments section, I'm not sure I understand why the encoding would not be straightforward in real-valued networks. See, for example, the supplementary of [9], which talks about CNF type formulas for combining linear specifications.

__ Clarity__: - While the paper provides a good high-level overview of the proposed approach, it would be difficult to reproduce the results. The paper does state that the code will be open sourced, which would be essential for future research based on this submission.
- Why does sparsity help speed-up verification in the case of BNNs? Is it because the number of variables that the SAT solver needs to handle reduces? Or that propagation is easier?
- In section 5.1, why does the reformulation in equation (5) address the two drawbacks of ternary quantization mentioned at the start of the section?

__ Relation to Prior Work__: To my knowledge, relevant prior work has been cited. The relationship between the proposed encoding and [42] is not clear. A direct comparison of the proposed method and [42] is also missing.

__ Reproducibility__: No

__ Additional Feedback__: Post-rebuttal comment: I have read the rebuttal and the other reviews. While I agree with reviewer 2 that there exist other alternatives to the proposed extension, I still believe there is merit in this work as it applies the extended SAT formulation to binary neural network verification. I believe the claims of the paper can be restated in light of reviewer 2's comments in the final version of the paper, and hence, still support acceptance.

__ Summary and Contributions__: The paper aims to scale up formal verification of binarized neural networks. The authors push on two fronts: 1) tweaking a SAT solver to be more specialized in solving CNF encodings of BNNs, and 2) tweaking the learning of BNN to reduce cardinality bounds (which authors claim are the biggest bottleneck in current approaches) using new masking and regularization strategies.

__ Strengths__: Well written, it was easy/clear to read. The speedup in the experiments are quite impressive. The bookkeeping for CDCL seems very natural/intuitive, so it is nice to see a simple solution that works. Since most of the clauses are derived from cardinality constraints, the authors propose to do some bookkeeping to detect when a cardinality constraint only has a unique solution, and then directly fill in that unique solution.
The authors also propose to limit the size of the cardinality constraints by regularizing/penalizing high cardinalities directly during training.
Also appreciated that the authors conducted an ablation study in the appendix to study the individual effects of each method.
Overall the problem of verifying BNNs is quite relevant/important, and the authors propose a variety of improvements that all seem quite intuitive, well motivated and are backed up by the experiments.

__ Weaknesses__: Section 6 seems a bit out of place in a paper titled "Efficient Exact Verification of Binarized Neural Networks", considering its focus on training more robust BNNs. The section deviates from the overall theme of improving verification efficiency.

__ Correctness__: Yes, to the best of my judgment.

__ Clarity__: Yes, clear and well written.

__ Relation to Prior Work__: Yes, the authors present three techniques (bookkeeping cardinality constraints in CDCL, binmask, and regularizing cardinality bounds that are all novel to the best of my judgment.

__ Reproducibility__: Yes

__ Additional Feedback__: The parameter tau=5 in the regularization seems like it would be important to the verification efficiency and model accuracy tradeoff. Did you try other values for tau?
For the experiments in Section 4, do the other baselines use a variable ordering that considers variables derived from the same cardinality constraint in order? If not, could this simple tweak be added to also get close to the same benefits as the bookkeeping technique proposed in MiniSatCS?
Minor complaint -- since the code will only be provided after reviewing, it is perhaps a little unfair to check off training/evaluation code under "ML Reproducibility -- Code".
I have read the rebuttal, my opinion has not changed.