Review for NeurIPS paper: Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time

NeurIPS 2020

Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time

Review 1

Summary and Contributions: The paper studies the problem of robust covariance estimation, i.e, given N samples from N(0,\Sigma), of which \epsilon fraction are contaminated, the goal is to design an estimator \hat{\Sigma} which recovers the true covariance \Sigma in the Mahalanobis norm. For this problem, the authors provide the first polynomial time algorithm which gets the (near-)optimal statistical error, and runs in near-matrix multiplication time, i.e. O(T(n,D) \log(\kappa)), where \kappa is the condition number and T(n,D) is time taken to multiply d \times N with its transpose. Note that previous methods obtaining similar statistical guarantees have an additional poly(1/\epsilon) dependence, and removing that is the main contribution of the paper.

Strengths: The paper is a strong theoretical work. Technically, the paper hinges on combining ideas from the quantum entropy scoring algorithm of [DHL19] with the existing covariance estimation algorithm in [CDGW19]. The problem itself is very relevant to the community and I believe that the idea should be useful in other problems as well.

Weaknesses: Dependence on condition number. Is there any intuition on why this condition number dependence appears, and how can one possibly remove it, if possible? It would be good to do actual runtime comparisons and experiments between CDWG19 and this work to demonstrate the improvement in run time. ### After Rebuttal #### I thank the authors for their thoughtful response. After looking at it and reading the other reviews, my score remains the same.

Correctness: Yes.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: Minor Point: Note that the main submission file also includes the supplementary material, and in particular, violates the submission policy.

Review 2

Summary and Contributions: This paper extends recent lines of work for achieving nearly linear time algorithms for robustly estimating core statistical quantities (e.g., mean and covariance of Gaussians). The main result is a nearly linear time algorithm for robust covariance estimation of a gaussian that has no dependence on the error parameter \epsilon.

Strengths: This is a very nice addition to the literature. The problem is fundamental and the end result is impressive. The paper seems correct and well written.

Weaknesses: The paper seems to import most of its machinery from a prior work [DHL] on quantum entropy scoring. In particular, it seems as though some very technical details need to be adjusted to suitably modify work due to CDGW19.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: I think the paper should be accepted, but it seems less exciting than a similar prior work of DHL that pioneered most of the novelty (applied in that case only to robust mean estimation).

Review 3

Summary and Contributions: The main contribution of the paper is to introduce an algorithm that improves time complexity while maintains the same error as reference [CDGW19]. The authors achieve this by replacing packing SDP phase in [CDGW19] With an approximated quantum entropy score filtering.

Strengths: This is a relevant work to the community and of significant interest. The work also has theoretical grounding and shows that the time complexity of the corrupted samples is almost the same as clean samples, thus it is robust to corruption with the same time complexity.

Weaknesses: The reviewer thinks it would be more interesting if the authors could also provide simulations to support their results.

Correctness: I have not gone through the details of the proof but the general claims and method sounds correct to me.

Clarity: The paper is generally written acceptably but can be improved.

Relation to Prior Work: The reviewer thinks it is not explained enough that how the authors are getting the better time complexity.

Reproducibility: Yes

Additional Feedback: In line 38, it will be better to provide a reference for the information theoretic bound on error for covariance estimation. In line 39, “However, …”, please provide a reference or clarify why the previous algorithms achieve O(\epsilon) error that its run time complexity is exponential in 'd'. The reviewer thinks although the total variation distance is well known but $d_{TV}$ in line 110 needs clear definition. Also the reviewer thinks it is nice if 'd' be defined after the abstract in the introduction before the first appearance as well. Please provide an intuition for the regularity condition in Definition 3.2. There is a typo in the probability of Lemma 4.3. In Assumption 4.2, please clarify what are S_b and S_r.

Review 4

Summary and Contributions: The paper considers the problem of robust covariance estimation. In the usual covariance estimation problem, given samples from d-dimensional Gaussian N(0, Sigma), the goal is to use as few samples as possible to recover the covariance matrix Sigma. In the robust version, an eps-fraction of the samples is arbitrarily corrupted and the goal is still to recover the covariance matrix Sigma. The error of an estimate Sigma’ is given by |Sigma^{-1/2} * Sigma’ * Sigma^{-1/2} - I|_F. The best possible error is O(eps) with O(d^2/eps^2) samples---which is tight---but the algorithm is not polynomial time. The previous best polynomial-time algorithm achieves O(eps log(1/eps)) with O(d^2/eps^2), with a bad eps-dependence of 1/eps^8 in the runtime. This paper removes the dependence on eps in the runtime while maintaining the O(eps log(1/eps)) error.

Strengths: Removal of 1/eps^8 in runtime while maintaining the best error achieved by polynomial-time algorithms.

Weaknesses: Techniques were developed in earlier works. This paper plugs the new technique (based on the QUE score) for mean estimation in [DHL19] into the previous result on covariance estimation [CDGW19]. [Previously I asked why it only worked for Gaussians with mean zero because I thought it should work for an arbitrary Gaussian. The author has confirmed this in their rebuttal.]

Correctness: The results seem correct to me though I didn't check all the details.

Clarity: The paper is largely well-written. I think the authors can do more highlighting on the difference from [CDGW19]. It seems that the sketch of Algorithm 1 is completely the same as in [CDGW19], even the guarantee of each step is the same. This should be made clearer. [I agree with the authors' rebuttal that the difference lies in the fine-tuning. I still think, in respect of writing, the authors are not doing enough to mention how much this paper inherits from [CDGW19]. ] Discussion on the running time of the approximate score oracle (Theorem D.1) should also be included in the main body. This is important and should not be swept under the carpet. In Line 11 of Algorithm 1, shouldn’t it \hat\Sigma_{T_2+1}?

Relation to Prior Work: The difference is the result is clearly discussed but the difference in approach was not thoroughly discussed. I think more highlight on the difference is needed.

Reproducibility: Yes

Additional Feedback: