Part of Advances in Neural Information Processing Systems 26 (NIPS 2013)
Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman
We investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$. We study a $k$-nearest neighbor based algorithm recently proposed by Chaudhuri and Dasgupta. Under mild assumptions on $f$ and $M$, we obtain rates of convergence that depend on $d$ only but not on the ambient dimension $D$. We also provide a sample complexity lower bound for a natural class of clustering algorithms that use $D$-dimensional neighborhoods.