{"title": "Using Manifold Stucture for Partially Labeled Classification", "book": "Advances in Neural Information Processing Systems", "page_first": 953, "page_last": 960, "abstract": null, "full_text": "Using Manifold Structure for Partially \n\nLabelled Classification \n\nMikhail B elkin \n\nUniversity of Chicago \n\nDepartment of Mathematics \nmisha@math .uchicago .edu \n\nPartha Niyogi \n\nUniversity of Chicago \n\nDepts of Computer Science and Statistics \n\nniyogi@cs.uchicago .edu \n\nAbstract \n\nWe consider the general problem of utilizing both labeled and un(cid:173)\nlabeled data to improve classification accuracy. Under t he assump(cid:173)\ntion that the data lie on a submanifold in a high dimensional space, \nwe develop an algorithmic framework to classify a partially labeled \ndata set in a principled manner . The central idea of our approach is \nthat classification functions are naturally defined only on t he sub(cid:173)\nmanifold in question rather than the total ambient space. Using the \nLaplace Beltrami operator one produces a basis for a Hilbert space \nof square integrable functions on the submanifold. To recover such \na basis , only unlab eled examples are required. Once a basis is ob(cid:173)\ntained , training can be performed using the labeled data set. Our \nalgorithm models the manifold using the adjacency graph for the \ndata and approximates the Laplace Beltrami operator by the graph \nLaplacian. Practical applications to image and text classification \nare considered. \n\n1 \n\nIntroduction \n\nIn many practical applications of data classification and data mining , one finds a \nwealth of easily available unlabeled examples , while collecting labeled examples can \nbe costly and time-consuming . Standard examples include object recognition in im(cid:173)\nages, speech recognition, classifying news articles by topic. In recent times , genetics \nhas also provided enormous amounts of readily accessible data. 
However, classification of this data involves experimentation and can be very resource intensive. Consequently, it is of interest to develop algorithms that are able to utilize both labeled and unlabeled data for classification and other purposes. Although the area of partially labeled classification is fairly new, a considerable amount of work has been done in the field since the early 90's; see [2, 4, 7]. In this paper we address the problem of classifying a partially labeled set by developing the ideas proposed in [1] for data representation. In particular, we exploit the intrinsic structure of the data to improve classification with unlabeled examples, under the assumption that the data resides on a low-dimensional manifold within a high-dimensional representation space. In some cases it seems to be a reasonable assumption that the data lies on or close to a manifold. For example, a handwritten digit 0 can be fairly accurately represented as an ellipse, which is completely determined by the coordinates of its foci and the sum of the distances from the foci to any point on it. Thus the space of ellipses is a five-dimensional manifold. An actual handwritten 0 would require more parameters, but perhaps not more than 15 or 20. On the other hand, the dimensionality of the ambient representation space is the number of pixels, which is typically far higher. For other types of data the question of manifold structure seems significantly more involved. While there has been recent work on using manifold structure for data representation ([6, 8]), the only other application to classification problems that we are aware of was in [7], where the authors use a random walk on the data adjacency graph for partially labeled classification.
2 Why Manifold Structure is Useful for Partially Supervised Learning

To provide a motivation for using manifold structure, consider the simple synthetic example shown in Figure 1. The two classes consist of two parts of the curve shown in the first panel (row 1). We are given a few labeled points and 500 unlabeled points, shown in panels 2 and 3 respectively. The goal is to establish the identity of the point labeled with a question mark. By observing the picture in panel 2 (row 1) we see that we cannot confidently classify "?" by using the labeled examples alone. On the other hand, the problem seems much more feasible given the unlabeled data shown in panel 3. Since there is an underlying manifold, it seems clear at the outset that the (geodesic) distances along the curve are more meaningful than Euclidean distances in the plane. Therefore, rather than building classifiers defined on the plane (ℝ²), it seems preferable to have classifiers defined on the curve itself. Even though the data has an underlying manifold, the problem is still not quite trivial, since the two different parts of the curve come confusingly close to each other. There are many potential representations of the manifold, and the one provided by the curve itself is unsatisfactory. Ideally, we would like to have a representation of the data which captures the fact that it is a closed curve. More specifically, we would like an embedding of the curve where the coordinates vary as slowly as possible when one traverses the curve. Such an ideal representation is shown in panel 4 (the first panel of the second row). Note that both represent the same underlying manifold structure, but with different coordinate functions. It turns out (panel 6) that by taking a two-dimensional representation of the data with Laplacian Eigenmaps [1], we get very close to the desired embedding.
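The embedding step described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the 0/1 edge weights, the function name, and the parameters `n_neighbors` and `n_components` are our choices for the sketch.

```python
# Minimal sketch of a Laplacian Eigenmaps embedding [1]: build a nearest-
# neighbor adjacency graph over the data and embed each point with the
# low-order eigenvectors of the graph Laplacian. Simple 0/1 weights are
# used here purely for illustration.
import numpy as np

def laplacian_eigenmap(X, n_neighbors=10, n_components=2):
    """Embed the rows of X (N x d) using graph-Laplacian eigenvectors."""
    N = X.shape[0]
    # pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        # connect each point to its n_neighbors nearest neighbors
        idx = np.argsort(dist[i])[1:n_neighbors + 1]  # entry 0 is the point itself
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)            # symmetrize: the graph is undirected
    L = np.diag(W.sum(axis=1)) - W    # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    # skip the constant eigenvector (eigenvalue ~0); keep the next ones
    return vecs[:, 1:n_components + 1]
```

For points sampled from a closed curve such as the one in Figure 1, the two returned coordinates should vary slowly along the curve, which is exactly the property the ideal representation in panel 4 is meant to capture.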
Panel 5 shows the locations of the labeled points in the new representation space. We see that "?" now falls squarely in the middle of the "+" signs and can easily be identified as a "+".

This artificial example illustrates that recovering the manifold and developing classifiers on the manifold itself might give us an advantage in classification problems. To recover the manifold, all we need is unlabeled data. The labeled data is then used to develop a classifier defined on this manifold. However, we need a model for the manifold to utilize this structure. The model used here is that of a weighted graph whose vertices are data points. Two data points are connected with an edge if the points are sufficiently close to each other, e.g., if each is among the n nearest neighbors of the other.

[Figure 1: Synthetic example. Row 1: the two-class curve, the labeled points together with the unknown point "?", and 500 unlabeled points. Row 2: the ideal slowly varying embedding, the labeled points in the new representation space, and the two-dimensional Laplacian Eigenmaps embedding.]

If i is an unlabeled point we put

    c_i = 1   if Σ_{j=1}^p a_j e_j(i) ≥ 0,
    c_i = −1  otherwise.

This, of course, is just applying the linear classifier constructed in Step 3. If there are several classes, one-against-all classifiers compete, using Σ_{j=1}^p a_j e_j(i) as a confidence measure.

4 Theoretical Interpretation

Let M ⊂ ℝ^k be an n-dimensional compact Riemannian manifold isometrically embedded in ℝ^k for some k. Intuitively, M can be thought of as an n-dimensional "surface" in ℝ^k. The Riemannian structure on M induces a volume form that allows us to integrate functions defined on M. The square-integrable functions form a Hilbert space L²(M). The Laplace-Beltrami operator Δ_M (or just Δ) acts on twice-differentiable functions on M. There are three important points that are relevant to our discussion here.

The Laplacian provides a basis for L²(M): It can be shown (e.g., [5]) that Δ is a self-adjoint positive semidefinite operator and that its eigenfunctions form a basis for the Hilbert space L²(M).
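The fitting and labeling steps around the classification rule above can be sketched as follows. This is a minimal illustration under the assumption that the first p graph-Laplacian eigenvectors, evaluated at all N data points, are stacked as the columns of a matrix E; the function and variable names are ours.

```python
# Minimal sketch of the classifier: fit coefficients a_1..a_p by least
# squares on the labeled points only, using the Laplacian eigenvectors as
# a basis, then label every unlabeled point by the sign of the fitted
# function. E is (N, p); labels are +1/-1 for the rows in labeled_idx.
import numpy as np

def classify_with_eigenbasis(E, labeled_idx, labels):
    """Return (predicted labels in {+1,-1}, confidence scores) for all points."""
    # least-squares fit restricted to the labeled rows
    a, *_ = np.linalg.lstsq(E[labeled_idx], labels, rcond=None)
    scores = E @ a                       # sum_j a_j e_j(i) for every point i
    # c_i = 1 if the score is nonnegative, otherwise -1
    return np.where(scores >= 0, 1, -1), scores
```

In the multiclass case, the `scores` value returned here is exactly the kind of per-class confidence measure that one-against-all classifiers would compete with.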
The spectrum of Δ is discrete (provided M is compact), with the smallest eigenvalue 0 corresponding to the constant eigenfunction. Therefore any f ∈ L²(M) can be written as f(x) = Σ_{i=0}^∞ a_i e_i(x), where the e_i are eigenfunctions, Δ e_i = λ_i e_i.

The simplest nontrivial example is the circle S¹, where Δ_{S¹} f(φ) = −d²f(φ)/dφ².
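For concreteness, the eigenvalue problem on S¹ can be solved directly; the short derivation below is a standard computation, written out here to connect the circle example with the statements about the spectrum above.

```latex
% Eigenvalue problem for the Laplacian on the circle S^1,
% with the 2*pi-periodicity constraint from the geometry:
-\frac{d^2 e(\phi)}{d\phi^2} = \lambda\, e(\phi),
\qquad e(\phi + 2\pi) = e(\phi).
% Periodicity forces integer frequencies, giving
e_0(\phi) = 1 \ (\lambda_0 = 0), \qquad
e_k(\phi) \in \{\sin k\phi,\ \cos k\phi\}, \quad \lambda_k = k^2,\ k = 1, 2, \dots
```

Thus the smallest eigenvalue is 0 with the constant eigenfunction, as stated above, and the resulting basis {1, sin kφ, cos kφ} of L²(S¹) is just the classical Fourier basis.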