Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

*Gleb Novikov, David Steurer, Stefan Tiegel*

We study the problem of robustly estimating the mean or location parameter without moment assumptions.Known computationally efficient algorithms rely on strong distributional assumptions, such as sub-Gaussianity, or (certifiably) bounded moments.Moreover, the guarantees that they achieve in the heavy-tailed setting are weaker than those for sub-Gaussian distributions with known covariance.In this work, we show that such a tradeoff, between error guarantees and heavy-tails, is not necessary for symmetric distributions.We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently.The distributions we study include products of arbitrary symmetric one-dimensional distributions, such as product Cauchy distributions, as well as elliptical distributions, a vast generalization of the Gaussian distribution.For product distributions and elliptical distributions with known scatter (covariance) matrix, we show that given an $\varepsilon$-corrupted sample, we can with probability at least $1-\delta$ estimate its location up to error $O(\varepsilon \sqrt{\log(1/\varepsilon)})$ using $\tfrac{d\log(d) + \log(1/\delta)}{\varepsilon^2 \log(1/\varepsilon)}$ samples.This result matches the best-known guarantees for the Gaussian distribution and known SQ lower bounds (up to the $\log(d)$ factor).For elliptical distributions with unknown scatter (covariance) matrix, we propose a sequence of efficient algorithms that approaches this optimal error.Specifically, for every $k \in \mathbb{N}$, we design an estimator using time and samples $\tilde{O}({d^k})$ achieving error $O(\varepsilon^{1-\frac{1}{2k}})$.This matches the error and running time guarantees when assuming certifiably bounded moments of order up to $k$.For unknown covariance, such error bounds of $o(\sqrt{\varepsilon})$ are not even known for (general) sub-Gaussian distributions.Our algorithms are based on a generalization of the well-known filtering technique [DK22].More specifically, we show how this machinery can be combined with Huber-loss-based techniques to work with projections of the noise that behave more nicely than the initial noise.Moreover, we show how sum-of-squares proofs can be used to obtain algorithmic guarantees even for distributions without a first moment.We believe that this approach may find other applications in future works.

Do not remove: This comment is monitored to verify that the site is working properly