Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper studies the problem of robust sparse Mean Estimation, and robust sparse PCA in the strong-adversary model, i.e. an adversary inspects your samples, and can replace a constant fraction of them. In this setting, the authors get near optimal bounds, along with near-optimal sample complexities, in particular, the sample complexity scales only logarithmically with the dimension p. The main benefit of this proposed algorithm is that it escapes doing sparse-eigenvector computations, which was a bottleneck in previous works. The authors achieve this by carefully exploiting the problem structure, namely, using the knowledge of true covariance matrix, to construct a simple test to identify outliers. The proposed algorithm only requires eigenvector computations, and hence is a practical algorithm. Moreover, the authors do a good job at testing out their proposed algorithm by conducting extensive simulations. The paper is clearly written, and the authors have done a good job at giving an intuitive explanation of their algorithm.
This paper revisits the setting of robust statistics, specifically the mean estimation and PCA problems for Gaussian distributions with adversarial errors. It proposes algorithms for estimation with improved performance in time and sample complexity when the solution is sparse. For the mean estimation problem, the sample complexity matches the information theoretic optimum, and the error rate matches that of the best efficient algorithms, which is conjectured to be optimal for efficient algorithms. For PCA, the sample complexity is roughly quadratically worse than the optimum. Both methods also feature a significantly reduced running time compared to a previous (ellipsoid based) algorithm, though this is still quadratic in the ambient dimension and number of points. For mean estimation, the algorithm of Cheng-Diakonikolas-Ge from this year's SODA seems to obtain a different tradeoff that could be better under some circumstances, i.e., when Nd is large compared to epsilon. The question is important and the theoretical guarantees are nice. I note that other works on robust mean estimation in particular have considered much more general distributions than the Gaussian setting considered here though. I also have some serious concerns about the presentation. In the first place, the algorithm itself contains a number of undefined terms and notation. In particular, what is the h_k function? Also, I can guess what makes a set of examples "good" but not the particular, parameterized notion you use. Second, I could not understand the sketch of the main argument for mean estimation, specifically around lines 167-170. I realize the NeurIPS page limit is harsh, but if I can't get an appreciation for the key new theoretical ideas from the paper, that is a problem. This would be more useful than the nop Conclusion section. The current presentation simply does not stand alone. The empirical evaluation of the sample complexity and error rates seems pretty thorough, and this is good. One thing I would really have liked to see that is missing is the running time for the new method; the improved sample complexity is perhaps the headline, but one of the real problems with the prior, ellipsoid-based algorithms is the running time, and so it would have been ideal to report this improvement. I am pretty confident that it should be possible to beat an ellipsoid-based algorithm in practice, but why not report those numbers? A comparison with Cheng-Diakonikolas-Ge would also be very informative. ============== The running time numbers are helpful and I do encourage including them; the comparison to the running time of the first SDP of the ellipsoid-based method for the small instance gives a good point of reference. Also, thank you for the comparison to Cheng et al. Regarding the presentation, the description in the rebuttal (and the response to reviewer #3) were helpful. As a general principle I would strongly *discourage* using notation in the body of your paper that is only defined in an Appendix (in this case, the supplementary material). If the paper is accepted, please clean this up.
This paper is applying the existing filtering idea to a new problem, so the bar should be raised higher. I think it has good ideas, but there are a few issues the authors should address in order to increase the competitiveness of this work. 1. Theorem 1.2 talks about sparse mean estimation for Gaussian with identity cov. However, if k = d, which means that we have no sparsity assumption, the sample complexity the authors get is d^2 log d, which is way worse than the optimal d. Why do the authors claim it is optimal? 2. In general, the filtering idea and the ellipsoid idea are very similar: the ellipsoid idea relies on an unknown parameter, and the filtering idea is trying new parameters again and again. Hence, it would be very important to explain how the authors' test statistic compare with the ellipsoid algorithm approach. Are they the same or similar? If so, the value of this paper is watered down significantly. I read the rebuttal, and decided to increase my score since the authors have clearly demonstrated what their concrete contributions are.