Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Ilias Diakonikolas, Daniel Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas
We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $\mu$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates containing a vector $\widehat{\mu}$ such that $\|\widehat{\mu} - \mu\|_2$ is small. Prior work had studied the problem of list-decodable mean estimation in the dense setting. In this work, we develop a novel, conceptually simpler technique for list-decodable mean estimation. As the main application of our approach, we provide the first sample- and computationally efficient algorithm for list-decodable sparse mean estimation. In particular, for distributions with ``certifiably bounded'' $t$-th moments in $k$-sparse directions and sufficiently light tails, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$ with quasi-polynomial sample and computational complexity. We complement our upper bounds with nearly-matching statistical query and low-degree polynomial testing lower bounds.
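To make the problem setup concrete, the following is a minimal Python sketch of the data-generation model and the list-decoding success criterion described above. It assumes Gaussian inliers (the special case highlighted in the abstract); the function names, the choice of sparse mean, and the particular outlier placement are illustrative assumptions, not part of the paper's algorithm.

```python
import numpy as np

def generate_corrupted_sample(m, n, k, alpha, rng):
    """Draw one dataset from the list-decodable sparse model:
    floor(alpha * m) inliers are i.i.d. N(mu, I) with a k-sparse mean mu;
    the remaining points are unconstrained (an arbitrary choice is made here)."""
    mu = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    mu[support] = rng.standard_normal(k)  # unknown k-sparse mean

    num_inliers = int(np.floor(alpha * m))
    inliers = mu + rng.standard_normal((num_inliers, n))
    # The model places no assumptions on outliers; this placement is arbitrary.
    outliers = 10.0 * rng.standard_normal((m - num_inliers, n))

    X = np.vstack([inliers, outliers])
    rng.shuffle(X)  # shuffle rows so inliers are not identifiable by index
    return X, mu

def list_decoding_succeeds(candidates, mu, err):
    """Success criterion: some hypothesis in the (small) returned list
    is within l2 distance err of the true mean."""
    return any(np.linalg.norm(h - mu) <= err for h in candidates)

rng = np.random.default_rng(0)
X, mu = generate_corrupted_sample(m=2000, n=100, k=5, alpha=0.1, rng=rng)
```

Note that since the outliers form the majority of the dataset, no single estimate can be correct in the worst case, which is why the output is a list of candidates rather than a single vector.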