Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)



Boaz Nadler, Nathan Srebro, Xueyuan Zhou


We study the behavior of the popular Laplacian Regularization method for Semi-Supervised Learning in the regime of a fixed number of labeled points but a large number of unlabeled points. We show that in $\mathbb{R}^d$, $d \geq 2$, the method is actually not well-posed, and as the number of unlabeled points increases the solution degenerates to a noninformative function. We also contrast the method with the Laplacian Eigenvector method, and discuss the ``smoothness'' assumptions associated with this alternate method.
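
For readers unfamiliar with the method, the following is a minimal sketch of graph-based Laplacian regularization in its common textbook form; it is not taken from the paper. It assumes a dense Gaussian-kernel graph and solves for the function that matches the given labels and minimises $\sum_{i,j} w_{ij}(f_i - f_j)^2$. The function names, the bandwidth `sigma`, the sweep count, and the toy data are all illustrative assumptions.

```python
import math
import random


def gaussian_weights(points, sigma):
    """Dense Gaussian-kernel weight matrix with a zero diagonal (assumed graph construction)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def laplacian_regularization(points, labels, sigma=0.5, sweeps=500):
    """Minimise sum_ij w_ij (f_i - f_j)^2 subject to f_i = y_i on labeled points.

    `labels` maps a point index to its label value. The minimiser is harmonic
    on the unlabeled points (each value equals the weighted average of the
    others), so Gauss-Seidel sweeps of weighted averaging approach it.
    """
    w = gaussian_weights(points, sigma)
    n = len(points)
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i in labels:
                continue  # labeled values stay clamped
            total = sum(w[i])
            f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
    return f


# Toy data: two labeled points in the unit square plus 40 unlabeled ones.
random.seed(0)
points = [(0.0, 0.0), (1.0, 1.0)]
points += [(random.random(), random.random()) for _ in range(40)]
labels = {0: 0.0, 1: 1.0}
f = laplacian_regularization(points, labels)
```

Here `f` holds one value per point, with the two labeled entries clamped to their labels. The fixed bandwidth and small sample are for illustration only; the asymptotic behavior described in the abstract concerns the limit of many unlabeled points and is not reproduced by this toy run.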