Paper ID: | 6710 |
---|---|

Title: | Universality in Learning from Linear Measurements |

The paper should provide a more thorough comparison with existing results on the characterizations of non-Gaussian random matrices (e.g., binary, Rademacher, etc.). In particular, whether the analytical results in this paper meet or exceed the performance guarantees already available in the literature for several non-Gaussian matrix designs. (This has been addressed in the rebuttal) Clarity: * The statement in lines 51-55 does not discuss any dependence of the number of measurements on the signal observed x_0. * The corollaries (1,2,...) should be linked to the Theorems from CGMT to which they are derived from. * The statements in lines 233-234 ("resembles the definition of the Gaussian width") should be more precise. Is (5) the Gaussian width? Or are there differences, and what are they? (This has been addressed in the rebuttal) * In line 247, should the matrix M be normalized so that its Gramian could match the covariance matrix? * In line 307, "Corollary 3 indicates that 3rn measurements...": the result in the corollary is asymptotic, while this statement appears to describe a finite value of n; so the two statements do not seem to be equivalent? (This has been addressed in the rebuttal) * The supplemental material has many broken references (??) Minor items and typos: Line 39 define acronym iid Line 59 vec(X) should be vec(X_0) Line 64 should non-negativity be positive (semi)definiteness? Line 67 et al -> et al. Line 185 funtion -> function Lines 251, 253 i.i.d -> i.i.d. Line 272 There is no need to define the assumption within the assumption. Line 301 the later -> the latter

Overall a very nice paper. I don't have many comments, as I found the results quite interesting and the paper well-written. It is quite relevant to the community as it enables a wider range of matrices to be used. === I've seen and taken into account the author's response, and it does not change my score.

The results are of a high quality and certainly deserving of publication. The results are original, but not particularly surprising or timely. There has been an intense study of convex relaxation for recovery of structured signals, with the initial optimal order theoretical results starting in approximately 2005 and 2006. The initial results were primarily for gaussian or randomised Fourier measurements. There has been a number of results showing that the results hold beyond these cases; some of which has been well cited, but others such as "Universality in polytope phase transitions and message passing algorithms" by Bayati. Lelarge, and Montanari is missing. Extending these minimal sampling bounds to a broader class of linear measurements is important, but not especially surprising and as a result I wonder if NeurIPS is the most suitable venue. While very pleased with the results, I expect that even for subject area experts in sparsity attending NeurIPS, that they would likely prefer to attend talk on more unexpected or timely topics. Outside this issue of innovation, the writing is clear and generally scholarly. That said there are a few areas that would benefit from minor improvements. Title: The title is not sufficiently informative as the results are for convex objectives, while the title does not convey this important restriction. The title should be adapted to reflect this restriction. Non-convex alternatives: There are now a number of computationally efficient non-convex algorithms with recovery guarantees under the RIP conditions. While theoretical these methods are not well understood, they do have good performance, especially for the low rank approximation problem where for small sampling rates they appear to circumvent the necessary factor of 3 in the phase transition for convex algorithms. Historical context: The results focus on deriving the asymptotic minimal sampling complexity. The thresholds presented were first derived in a series of papers by Donoho and Donoho and Tanner. Unfortunately these initial results are not cited, instead later results extending these original results are stated. It would be appropriate to cite these foundational works, such as the paper "Counting faces of randomly-projected polytopes when the projection radically lowers dimension" by Donoho and Tanner which originally proved the results 2k\log(n/k) stated on the top of page 2 and attributed to [12]. Moreover, if interested in notions of universality the authors might be interested in the paper "Counting the faces of randomly-projected hypercubes and orthants, with applications" by Donoho and Tanner where they prove similar sampling complexity that is truly universal in that it holds even for deterministic matrices.