Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)
Xiaohan Wei, Stanislav Minsker
We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well understood, much less is known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write,
``data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal'' and ``..what are the other possible strategies to deal with heavy tailed distributions warrant further studies.'' We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the ``intrinsic dimension'' associated to the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations.