NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:2025
Title:Exact inference in structured prediction

Reviewer 1

Detailed remarks Concerning theorem 1, the bound you prove is related to theorem 4.3 of (Mohar, B. (1989). Isoperimetric numbers of graphs. Journal of combinatorial theory) where a tighter upper bound is proved for laplacian matrix. It would be great to discuss whether your bound is tight in the general case of signed Laplacian matrices knowing that this one is (up to what I read) is tight for the Laplacian one. In particular it would be great to recall what you refer to as the 'Cheeger inequality' (which I believe concerns the first smallest eigenvalue) and the known result for the 2nd eigenvalue of the Laplacian in the paper previously cited Concerning the derivation of theorem 2, the result of theorem 1 appears naturally as the ingredient to bound the 3rd term which is the main originality of the proof. As a consequence, I think it would improve the readability of the paper to announce the use of Th1 earlier in the paper since I understood the logic of your sections only at my second read of the paper. Finally it would have been great to provide a numerical illustration of your result in theorem 2 by simulating graphs and showing whether the bound of theorem 2 indicating when the graph is correctly recovered by the SDP. This could have improved the paper even if it is not mandatory for the sake of the presentation. Minor remarks The notation change from V to \textit{V} (footnote (3)) is a bit weird since it is very hard to distinguish.

Reviewer 2

Overview: - This paper studies the conditions for exact recovery of ground-truth labels in structured prediction under some data generation assumptions. In particular, the analysis generalizes the one in Globerson et al. (2015) from grid graphs to general connected graphs, providing high-probability guarantees for exact label recovery which depend on structural properties of the graph. - On the one hand, extending the results of Globerson et al. (2015) to general graphs seems like a valid contribution. On the other hand, the assumed generative process (lines 89-101, proposed in Globerson et al., 2015) is somewhat toyish which might make the results less interesting. Therefore, I am inclined towards acceptance but not strongly. - I went over most of the proofs in detail and did not find obvious errors (see more minor comments below though). Comments: - I feel like the presentation can be greatly improved by including an overview of the main result at the beginning of Section 3. In particular, you can state the main result, which is actually given in Remark 2 (!), and then provide some high-level intuition on the path to prove it. This may mean that some derivations will have to be deferred to the appendix, which I think is fine. - Lack of empirical evaluation is another obvious shortcoming of this paper. - Perhaps you can come up with a title that is more specific to this paper (e.g., “Exact Recovery of Labels in Structured Prediction with General Graphs”)? Additional related work: - Very recent paper (published after NeurIPS submission): Approximate Inference in Structured Instances with Noisy Categorical Observations, Heidari et al., to appear in UAI 2019. Generalizes Globerson et al. (2015) to multiclass variables. I am wondering if you can use their result to generalize your approach from binary to multiclass variables. - Result on tractable models (line 41): Tightness of LP relaxations for almost balanced models, Weller et al., AISTATS (2016). - Result on approximate inference (line 47): Train and test tightness of LP relaxations in structured prediction, Meshi et al., ICML (2016). - Very recent result on label recovery (published after NeurIPS submission): Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation, Jain et al., COLT (2019). The generative process seems very different and the focus is Belief Propagation, so this may be only remotely related. Other minor comments: - Line 142: Let M be a signed Laplacian **with eigenvector y** as in Definition 5. Otherwise it’s not clear what y is in the statement of Lemma 1. - Line 163: missing superscript before \top - Line 167: did you mean “\beta w_i^+” instead of just “\beta w_i”? - Line 175: I am probably missing something, but why is the probability equal to |(w_i^+)^2 - (w_j^+)^2|? Don’t you also have to require that (w_i^+)^2 \geq (w_j^+)^2? - Line 197: consider adding n to the statement of Thm 2 as it appears in the expression for epsilon. This is actually done in Thm 3, so makes sense here as well. Also, maybe mention in Thm 3 that q is the node noise (as done in Thm 2 for p)? - Line 274: \mathcal{R} is overloaded, used earlier for Reals. - Lines 282-286: what does “bad expansion” mean?

Reviewer 3

This paper is a, quite technical, theoretical paper that addresses a variant of a tough and old combinatoric problem : the (exact) recovery of a ground state from a noisy planted spin-glass model recast here as a structured prediction inference problem. It comes as a follow-up from two papers : "How hard is inference for structured prediction" by (Globerson et al. 2015), and "Inference in Sparse Graphs with Pairwise Measurements and Side Information" by (Foster et al. 2018). The key contribution of this paper is to answer an open problem stated in (Globerson et al. 2015) by providing a characterisation of the graphs where the exact recovery is tractable. The authors define the hardness of the exact recovery in term of a probability bound : the probability of recovering the exact ground state in polynomial time is higher when the graph is a good expander (i.e. when its Cheeger constant is high) and lower when its nodes have high degrees. They provide examples of graph families like the "d-regular expanders" that fulfil this requirement. As a by-product they provide an extension of Cheeger inequality to signed Laplacians. I found the paper reasonably clear but not as well written as (Globerson et al. 2015) for instance. It lacks a proper justification, a few explaining figures and concrete examples to guide the reader before she/he dives into the (painful) technical aspects of the proofs. The state of the art is really minimalistic: see "Related Work" section in (Globerson et al. 2015). It's not mandatory but it would make sense to move at last one of the proofs in the appendix (keeping only a sketch in the main paper) and use the freed space to fix these issues. I only deeply checked the proof of Theorem 1, but the rest of the maths (mainly the proof of Theorem 2) seems solid. Minor remarks/questions: - The edges noise model as defined on line 105 is a noisy observation of edges from correct spins, would it be easier for instance if we considered a generative model where all the edges would be derived from the same noisy observation of the ground node spins ? How would it impact the error bounds ? - It's only a factor 2 but please make it clear what kind of node degree you use in the Laplacian (oriented or non-oriented) POST REBUTTAL: If the degree of the D matrix is the non-oriented degree i.e. the number of edges connected, then the Laplacian should be L=2D-A otherwise x^T L x is not as stated in Definition 5.