Part of Advances in Neural Information Processing Systems 27 (NIPS 2014)
Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, Ashkan Jafarpour
Many important distributions are high dimensional, and can often be modeled as Gaussian mixtures. We derive the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. Based on intuitive spectral reasoning, it approximates mixtures of $k$ spherical Gaussians in $d$ dimensions to within $\ell_1$ distance $\epsilon$ using $\mathcal{O}({dk^9(\log^2 d)}/{\epsilon^4})$ samples and $\mathcal{O}_{k,\epsilon}(d^3\log^5 d)$ computation time. Conversely, we show that any estimator requires $\Omega\bigl({dk}/{\epsilon^2}\bigr)$ samples, hence the algorithm's sample complexity is nearly optimal in the dimension. The implied time-complexity factor $\mathcal{O}_{k,\epsilon}$ is exponential in $k$, but much smaller than previously known. We also construct a simple estimator for one-dimensional Gaussian mixtures that uses $\tilde{\mathcal{O}}(k/\epsilon^2)$ samples and $\tilde{\mathcal{O}}((k/\epsilon)^{3k+1})$ computation time.
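To make the "intuitive spectral reasoning" concrete, the sketch below illustrates the generic spectral idea for spherical mixtures: the $k$ mixture means lie near the span of the data's top principal directions, so projecting onto that low-dimensional subspace before clustering removes most of the $d$-dimensional noise. This is only an illustration of that intuition, not the paper's estimator; all names and parameter values (n, d, k, sigma) are hypothetical choices for the demo.

```python
# Illustrative sketch of the spectral intuition for spherical Gaussian mixtures.
# Not the paper's algorithm: it simply projects onto the top-k principal
# directions and runs plain Lloyd iterations in the projected space.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, sigma = 20_000, 100, 3, 1.0  # hypothetical demo parameters

# Sample from a mixture of k spherical Gaussians with random means.
means = rng.normal(scale=5.0, size=(k, d))
labels = rng.integers(k, size=n)
X = means[labels] + sigma * rng.normal(size=(n, d))

# Spectral step: project the centered data onto its top-k principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k]            # k x d projection onto the top principal subspace
Y = Xc @ P.T          # n x k projected samples

# Cluster in the low-dimensional space with simple Lloyd iterations,
# then lift the recovered centers back to R^d to estimate the mixture means.
centers = Y[rng.choice(n, size=k, replace=False)]
for _ in range(50):
    assign = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(assign == j):           # guard against empty clusters
            centers[j] = Y[assign == j].mean(axis=0)

est_means = centers @ P + X.mean(axis=0)
print("estimated means (rows), compare to true means up to permutation:")
print(np.round(est_means, 2))
print(np.round(means, 2))
```

In this toy setting the projection step is what keeps the clustering tractable: distances between points from different components are preserved in the $k$-dimensional subspace, while the noise contributed by the remaining $d - k$ directions is discarded.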