Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)
Hariharan Narayanan, Mikhail Belkin, Partha Niyogi
One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary.
keywords: Clustering, Semi-Supervised Learning