On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts

Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)

Bibtex Metadata Paper


Hariharan Narayanan, Mikhail Belkin, Partha Niyogi


One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary.

keywords: Clustering, Semi-Supervised Learning