Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Mikhail Belkin, Partha Niyogi
We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high-dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator, one produces a basis for a Hilbert space of square-integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. Practical applications to image and text classification are considered.
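The pipeline described above (adjacency graph, graph Laplacian, eigenfunction basis, then training on the labeled points) can be illustrated with a minimal sketch. Several details here are assumptions not stated in the abstract: a symmetrized k-nearest-neighbor graph with binary weights, the unnormalized Laplacian L = D - W, a least-squares fit of the labels in the span of the leading eigenvectors, and two classes coded as +1 and -1. The function names `laplacian_eigenbasis` and `classify` and their parameters are hypothetical.

```python
import numpy as np

def laplacian_eigenbasis(X, n_neighbors=5, n_eigs=2):
    """Return the n_eigs smoothest eigenvectors of a k-NN graph Laplacian.

    Uses all points, labeled or not. Assumed choices: binary edge
    weights and the unnormalized Laplacian L = D - W.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances in the ambient space.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest neighbors of each point (column 0 is the point itself).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order, so the first columns
    # are the eigenvectors that vary least across graph edges.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :n_eigs]

def classify(X, labeled_idx, labels, n_neighbors=5, n_eigs=2):
    """Fit +1/-1 labels in the eigenbasis and predict a label for every point."""
    E = laplacian_eigenbasis(X, n_neighbors, n_eigs)
    # Assumed training step: least squares restricted to the labeled rows.
    coef, *_ = np.linalg.lstsq(E[labeled_idx], labels, rcond=None)
    return np.sign(E @ coef)
```

In this sketch the basis is computed from every point without reference to labels, and the labels enter only through the small least-squares problem. The number of eigenvectors `n_eigs` should not exceed the number of labeled points, otherwise the fit is underdetermined.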