{"title": "MLLE: Modified Locally Linear Embedding Using Multiple Weights", "book": "Advances in Neural Information Processing Systems", "page_first": 1593, "page_last": 1600, "abstract": null, "full_text": "MLLE: Modi\ufb01ed Locally Linear Embedding Using\n\nMultiple Weights\n\nZhenyue Zhang\n\nDepartment of Mathematics\n\nZhejiang University, Yuquan Campus,\n\nHangzhou, 310027, P. R. China\n\nzyzhang@zju.edu.cn\n\nCollege of Information Science and Engineering\n\nJing Wang\n\nHuaqiao University\n\nQuanzhou, 362021, P. R. China\n\nDep. of Mathematics, Zhejiang University\n\nwroaring@yahoo.com.cn\n\nAbstract\n\nThe locally linear embedding (LLE) is improved by introducing multiple linearly\nindependent local weight vectors for each neighborhood. We characterize the\nreconstruction weights and show the existence of the linearly independent weight\nvectors at each neighborhood. The modi\ufb01ed locally linear embedding (MLLE)\nproposed in this paper is much stable. It can retrieve the ideal embedding if MLLE\nis applied on data points sampled from an isometric manifold. MLLE is also\ncompared with the local tangent space alignment (LTSA). Numerical examples\nare given that show the improvement and ef\ufb01ciency of MLLE.\n\n1\n\nIntroduction\n\nThe problem of nonlinear dimensionality reduction is to \ufb01nd the meaningful low-dimensional struc-\nture hidden in high dimensional data. Recently, there have been advances in developing effective\nand ef\ufb01cient algorithms to perform nonlinear dimension reduction which include isometric mapping\nIsomap [7], locally linear embedding (LLE) [5] and its variations, manifold charting [2], Hessian\nLLE [1] and local tangent space alignment (LTSA) [9]. All these algorithms cover two common\nsteps: learn the local geometry around each data point and nonlinearly map the high dimensional\ndata points into a lower dimensional space using the learned local information [3]. 
The performances of these algorithms, however, differ both in learning local information and in constructing the global embedding, though each of them eventually solves an eigenvalue problem. The effectiveness of the retrieved local geometry determines the efficiency of the methods.

This paper focuses on the reconstruction weights that characterize the intrinsic geometric properties of each neighborhood in LLE [5]. LLE has many applications, such as image classification, image recognition, spectra reconstruction, and data visualization, because of its simple geometric intuition, straightforward implementation, and global optimization [6, 11]. It has also been reported, however, that LLE may be unstable and may produce distorted embeddings when the manifold dimension is larger than one. One cause of LLE's failure is that the local geometry exploited by the reconstruction weights is not well determined, since the constrained least squares (LS) problem for the local weights may be ill-conditioned. Tikhonov regularization is generally used for the ill-conditioned LS problem. However, a regularized solution may be a poor approximation to the exact solution if the regularization parameter is not suitably selected.

The purpose of this paper is to improve LLE by making use of multiple local weight vectors. We show the existence of linearly independent weight vectors that are approximately optimal. The local geometric structure determined by multiple weight vectors is much more stable and hence can be used to improve the standard LLE. The modified LLE, named MLLE, uses multiple weight vectors for each point in the reconstruction of the lower-dimensional embedding.
It can stably retrieve the ideal isometric embedding approximately for an isometric manifold.

[Figure 1: Examples of $\|w(\gamma) - w^*\|$ (solid line) and $\|w(\gamma) - u\|$ (dotted line) as functions of $\gamma$ for swiss-roll data; the three panels correspond to $\|y_0\| = 2.6706\times10^{-5}$, $\|y_0\| = 8.5272\times10^{-4}$, and $\|y_0\| = 1.6107$.]

MLLE has properties similar to LTSA both in measuring the linear dependence of neighborhoods and in constructing the (sparse) matrix whose smallest eigenvectors form the desired lower-dimensional embedding. It exploits the tight relations between LLE/MLLE and LTSA. Numerical examples given in this paper show the improvement and efficiency of MLLE.

2 The Local Combination Weights

Let $\{x_1,\dots,x_N\}$ be a given data set of $N$ points in $\mathbb{R}^m$. LLE constructs locally linear structures at each point $x_i$ by representing $x_i$ using its selected neighbor set $N_i = \{x_j,\ j \in J_i\}$. The optimal combination weights are determined by solving the constrained least squares problem

$$\min \Big\| x_i - \sum_{j\in J_i} w_{ji} x_j \Big\|, \quad \text{s.t.}\ \sum_{j\in J_i} w_{ji} = 1. \tag{2.1}$$

Once all the reconstruction weights $\{w_{ji},\ j\in J_i\}$, $i=1,\dots,N$, are computed, LLE maps the set $\{x_1,\dots,x_N\}$ to $\{t_1,\dots,t_N\}$ in a lower-dimensional space $\mathbb{R}^d$ ($d<m$) that preserves the local combination properties,

$$\min_{T=[t_1,\dots,t_N]} \sum_i \Big\| t_i - \sum_{j\in J_i} w_{ji} t_j \Big\|^2, \quad \text{s.t.}\ TT^T = I.$$

The low-dimensional embedding $T$ constructed by LLE depends tightly on the local weights.
To formulate the weight vector $w_i$ consisting of the local weights $w_{ji}$, $j\in J_i$, let us denote the matrix $G_i = [\dots, x_j - x_i, \dots]_{j\in J_i}$. Using the constraint $\sum_{j\in J_i} w_{ji} = 1$, we can write the combination error as $x_i - \sum_{j\in J_i} w_{ji} x_j = -G_i w_i$, and hence (2.1) reads

$$\min \|G_i w\|, \quad \text{s.t.}\ w^T 1_{k_i} = 1,$$

where $1_{k_i}$ denotes the $k_i$-dimensional vector of all 1's. Theoretically, a null vector of $G_i$ that is not orthogonal to $1_{k_i}$ can be normalized to be a weight vector as required. Otherwise, a weight vector is given by $w_i = y_i/(1_{k_i}^T y_i)$ with $y_i$ a solution to the linear system $G_i^T G_i y = 1_{k_i}$ [6]. Indeed, one can formulate the solution using the singular value decomposition (SVD) of $G_i$.

Theorem 2.1 Let $G$ be a given matrix of $k$ column vectors. Denote by $y_0$ the orthogonal projection of $1_k$ onto the null space of $G$ and $y_1 = (G^T G)^+ 1_k$.¹ Then the vector

$$y^* = \begin{cases} y_0, & y_0 \neq 0, \\ y_1, & y_0 = 0, \end{cases} \qquad w^* = \frac{y^*}{1_k^T y^*} \tag{2.2}$$

is an optimal solution to $\min_{1_k^T w = 1} \|Gw\|$.

¹ $(\cdot)^+$ denotes the Moore-Penrose generalized inverse of a matrix.

The problem of solving $\min_{1^T w = 1} \|Gw\|$ is not stable if $G^T G$ is singular (has zero eigenvalues) or nearly singular (has relatively small eigenvalues).
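The construction of $w^*$ in Theorem 2.1 can be sketched in a few lines of numpy; the function name and the rank tolerance are our own choices, not the paper's.

```python
import numpy as np

def optimal_weight(G, tol=1e-10):
    """Optimal weight w* for min ||G w|| s.t. 1^T w = 1 (cf. Theorem 2.1).

    Uses the SVD of G: y0 is the projection of 1_k onto null(G);
    if y0 = 0, fall back to y1 = (G^T G)^+ 1_k.
    """
    k = G.shape[1]
    ones = np.ones(k)
    _, s, Vt = np.linalg.svd(G, full_matrices=True)
    # Null-space basis of G: right singular vectors with (near-)zero singular value.
    scale = s[0] if s.size and s[0] > 0 else 1.0
    rank = int(np.sum(s > tol * scale))
    N = Vt[rank:].T                      # columns span null(G)
    y0 = N @ (N.T @ ones)                # projection of 1_k onto null(G)
    if np.linalg.norm(y0) > tol:
        y = y0
    else:
        y = np.linalg.pinv(G.T @ G) @ ones   # y1 = (G^T G)^+ 1_k
    return y / (ones @ y)
```

For $G = [1, -1]$ the null space is spanned by $(1,1)^T$, so the optimal weight is $(0.5, 0.5)^T$; for a nonsingular $G$ the pseudoinverse branch applies.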
To regularize the problem, it is suggested in [5] to solve instead the regularized linear system

$$(G^T G + \gamma \|G\|_F^2 I)\, y = 1_k, \qquad w = y/(1_k^T y) \tag{2.3}$$

with a small positive $\gamma$.

[Figure 2: A 2D data set (circle points) and computed coordinates (dot points) by LLE using different sets of optimal weight vectors (left two panels, $\|X - Y^{(1)}\| = 1.277$ and $\|X - Y^{(2)}\| = 0.24936$) or regularization weight vectors (right panel, $\|X - Y^{(3)}\| = 0.39941$).]

Let $y(\gamma)$ be the unique solution to the regularized linear system. One can prove that $w(\gamma) = y(\gamma)/(1_k^T y(\gamma))$ converges to $w^*$ as $\gamma \to 0$. However, the convergence behavior of $w(\gamma)$ is quite uncertain for small $\gamma > 0$. In fact, if $y_0 \neq 0$ is small, then $w(\gamma)$ tends to $u = y_1/(1^T y_1)$ at first and only then turns to the limit value $w^* = y_0/(1^T y_0)$. Note that $u$ and $w^*$ are orthogonal to each other. In Figure 1, we plot three examples of the error curves $\|w(\gamma) - w^*\|$ (solid line) and $\|w(\gamma) - u\|$ (dotted line) with different values of $\|y_0\|$ for the swiss-roll data. The left two panels, where $\|y_0\| \approx 0$, show this metaphase phenomenon clearly. Therefore, $w^*$ cannot be well approximated by $w(\gamma)$ if $\gamma$ is not small enough. This partially explains the instability of LLE.

Another factor that results in the instability of LLE is that the linear structure learned with a single weight vector at each point is brittle. LLE may give a wrong embedding even if every weight vector is approximated to high accuracy.
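The regularized system (2.3) translates directly into code; a minimal sketch (the function name is ours):

```python
import numpy as np

def regularized_weight(G, gamma=1e-3):
    """Regularized LLE weight w(gamma) from (2.3):
    (G^T G + gamma * ||G||_F^2 * I) y = 1_k,  w = y / (1^T y)."""
    k = G.shape[1]
    C = G.T @ G + gamma * np.linalg.norm(G, 'fro')**2 * np.eye(k)
    y = np.linalg.solve(C, np.ones(k))
    return y / y.sum()
```

For a nonsingular $G$ and tiny $\gamma$, $w(\gamma)$ is close to the exact optimum; for a rank-deficient $G$ the regularization makes the solve well posed while keeping the sum-to-one constraint.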
This is conceivable when $G_i$ is rank deficient, since multiple optimal weight vectors exist in that case. Figure 2 shows a small example of $N = 20$ two-dimensional points for which LLE fails even if exact optimal weight vectors are used. We plot three sets of computed 2D embeddings $T^{(j)}$ (within an optimal affine transformation to the ideal $X$) by LLE with $k = 4$, using two sets of exact optimal weight vectors and one set of weight vectors that solve the regularized equations, respectively. The errors $\|X - Y^{(j)}\| = \min_{c,L}\|X - (c1^T + LT^{(j)})\|$ between the ideal set $X$ and the computed sets within an optimal affine transformation are large in this example.

The uncertainty of $w(\gamma)$ with small $\gamma$ occurs because of the existence of small singular values of $G$. Fortunately, this also implies the existence of multiple almost-optimal weight vectors. Indeed, if $G$ has $s \le k$ small singular values, then there are $s$ approximately optimal weight vectors that are linearly independent of one another. The following theorem characterizes the construction of the approximately optimal weight vectors $w^{(\ell)}$ using the matrix $V$ of right singular vectors corresponding to the $s$ smallest singular values, and bounds the combination errors $\|Gw^{(\ell)}\|$ in terms of the minimum of $\|Gw\|$ and the largest of the $s$ smallest singular values.

Theorem 2.2 Let $G \in \mathbb{R}^{m\times k}$ and $\sigma_1(G) \ge \dots \ge \sigma_k(G)$ be the singular values of $G$. Denote

$$w^{(\ell)} = (1-\alpha)\, w^* + V H(:,\ell), \qquad \ell = 1,\dots,s,$$

where $V$ is the matrix of right singular vectors of $G$ corresponding to the $s$ smallest singular values, $\alpha = \frac{1}{\sqrt{s}}\|V^T 1_k\|$, and $H$ is a Householder matrix that satisfies $H V^T 1_k = \alpha 1_s$. Then

$$\|Gw^{(\ell)}\| \le \|Gw^*\| + \sigma_{k-s+1}(G). \tag{2.4}$$

The Householder matrix is symmetric and orthogonal.
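The construction in Theorem 2.2 (with the regularized $w(\gamma)$ substituted for $w^*$, as discussed next) can be sketched as follows; function and variable names are ours, and the default $\gamma$ is an arbitrary choice.

```python
import numpy as np

def multi_weights(G, s, gamma=1e-3):
    """s approximately optimal weight vectors
    w^(l) = (1 - alpha) * w(gamma) + V @ H[:, l]  (cf. Theorem 2.2),
    with V spanning the s smallest right singular directions of G and
    H a Householder matrix mapping V^T 1_k to alpha * 1_s."""
    k = G.shape[1]
    ones = np.ones(k)
    # Regularized single weight vector, as in (2.3).
    C = G.T @ G + gamma * np.linalg.norm(G, 'fro')**2 * np.eye(k)
    y = np.linalg.solve(C, ones)
    w = y / y.sum()
    _, _, Vt = np.linalg.svd(G, full_matrices=True)
    V = Vt[k - s:].T                      # k x s
    v = V.T @ ones                        # V^T 1_k
    alpha = np.linalg.norm(v) / np.sqrt(s)
    h0 = alpha * np.ones(s) - v
    if np.linalg.norm(h0) < 1e-14:
        H = np.eye(s)
    else:
        h = h0 / np.linalg.norm(h0)
        H = np.eye(s) - 2.0 * np.outer(h, h)   # reflects v onto alpha * 1_s
    return (1 - alpha) * w[:, None] + V @ H    # k x s matrix W
```

Each column sums to one: $1_k^T w^{(\ell)} = (1-\alpha) + (Hv)_\ell = (1-\alpha) + \alpha = 1$, and the columns are linearly independent in the generic case.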
It is given by $H = I - 2hh^T$ with the vector $h \in \mathbb{R}^s$ defined as follows. Let $h_0 = \alpha 1_s - V^T 1_k$. If $h_0 = 0$, then $h = 0$; otherwise, $h = h_0/\|h_0\|$.

Note that $\|w^*\|$ can be very large when $G$ is approximately singular. In that case, $(1-\alpha)w^*$ dominates $w^{(\ell)}$, and hence $w^{(1)},\dots,w^{(s)}$ are almost the same and numerically linearly dependent. Equivalently, $W = [w^{(1)},\dots,w^{(s)}]$ has a large condition number $\mathrm{cond}(W) = \sigma_{\max}(W)/\sigma_{\min}(W)$. For numerical stability, we replace $w^*$ by a regularized weight vector $w(\gamma)$ as in LLE. This modification is quite practical in applications and, more importantly, it reinforces the numerical linear independence of the $\{w^{(\ell)}\}$. In our experiments, the construction of the $\{w^{(\ell)}\}$ is stable with respect to the choice of $\gamma$. We give an estimate of the condition number $\mathrm{cond}(W)$ for the modified $W$ below.

Theorem 2.3 Let $W = (1-\alpha)\, w(\gamma) 1_s^T + VH$. Then $\mathrm{cond}(W) \le (1 + \sqrt{k}\,(1-\alpha)\|w(\gamma)\|)^2$.

3 MLLE: Modified locally linear embedding

It is justifiable to learn the local structure by multiple optimal weight vectors at each point, rather than a single one. Though the exact optimal weight vector may be unique, multiple approximately optimal weight vectors exist by Theorem 2.2. We will use these weight vectors to determine an improved and more stable embedding. Below we give the details of the modified locally linear embedding using multiple local weight vectors.

Consider the neighbor set of $x_i$ with $k_i$ neighbors. Assume that the first $r_i$ singular values of $G_i$ are large compared with the remaining $s_i = k_i - r_i$ singular values. (We will discuss how to choose $s_i$ later.) Let $w_i^{(1)},\dots,w_i^{(s_i)}$ be $s_i \le k_i$ linearly independent weight vectors,

$$w_i^{(\ell)} = (1-\alpha_i)\, w_i(\gamma) + V_i H_i(:,\ell), \qquad \ell = 1,\dots,s_i.$$

Here $w_i(\gamma)$ is the regularized solution defined in (2.3) with $G = G_i$, $V_i$ is the matrix of right singular vectors of $G_i$ corresponding to the $s_i$ smallest singular values, $\alpha_i = \frac{1}{\sqrt{s_i}}\|v_i\|$ with $v_i = V_i^T 1_{k_i}$, and $H_i$ is a Householder matrix that satisfies $H_i V_i^T 1_{k_i} = \alpha_i 1_{s_i}$.

We look for a $d$-dimensional embedding $\{t_1,\dots,t_N\}$ that minimizes the embedding cost function

$$E(T) = \sum_{i=1}^N \sum_{\ell=1}^{s_i} \Big\| t_i - \sum_{j\in J_i} w_{ji}^{(\ell)} t_j \Big\|^2 \tag{3.5}$$

with the constraint $TT^T = I$. Denote by $W_i = (1-\alpha_i)\, w_i(\gamma) 1_{s_i}^T + V_i H_i$ the local weight matrix, and let $\hat W_i \in \mathbb{R}^{N\times s_i}$ be the embedding of $W_i$ into the $N$-dimensional space such that

$$\hat W_i(J_i,:) = W_i, \qquad \hat W_i(i,:) = -1_{s_i}^T, \qquad \hat W_i(j,:) = 0, \quad j \notin I_i = J_i \cup \{i\}.$$

The cost function (3.5) can be rewritten as

$$E(T) = \sum_i \|T \hat W_i\|_F^2 = \mathrm{Tr}\Big(T \sum_i \hat W_i \hat W_i^T\, T^T\Big) = \mathrm{Tr}(T\Phi T^T), \tag{3.6}$$

where $\Phi = \sum_i \hat W_i \hat W_i^T$. The minimizer of $E(T)$ is given by the matrix $T = [u_2,\dots,u_{d+1}]^T$ of the $d$ eigenvectors of $\Phi$ corresponding to the 2nd to $(d+1)$st smallest eigenvalues.

3.1 Determination of the number $s_i$ of approximately optimal weight vectors

Obviously, $s_i$ should be selected such that $\sigma_{k_i-s_i+1}(G_i)$ is relatively small. In general, if the data points are sampled from a $d$-dimensional manifold and the neighbor set is well selected, then $\sigma_d(G_i) \gg \sigma_{d+1}(G_i)$. So $s_i$ can be any integer satisfying $s_i \le k_i - d$, and $s_i = k_i - d$ is the best choice.
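The assembly of $\Phi$ from the embedded weight matrices $\hat W_i$ and the final spectral step (3.6) can be sketched as follows; a sketch with our own names, assuming the local weight matrices $W_i$ have already been computed.

```python
import numpy as np

def assemble_phi(N, neighborhoods, local_W):
    """Alignment matrix Phi = sum_i \\hat W_i \\hat W_i^T from (3.6).

    neighborhoods[i] is the index list J_i; local_W[i] is the
    k_i x s_i weight matrix W_i for point i."""
    Phi = np.zeros((N, N))
    for i, (Ji, Wi) in enumerate(zip(neighborhoods, local_W)):
        si = Wi.shape[1]
        Wh = np.zeros((N, si))
        Wh[Ji, :] = Wi              # embed W_i into R^N
        Wh[i, :] = -np.ones(si)     # row i is -1_{s_i}^T
        Phi += Wh @ Wh.T
    return Phi

def embed(Phi, d):
    """Embedding T = [u_2, ..., u_{d+1}]^T from the eigenvectors of Phi
    for its 2nd through (d+1)st smallest eigenvalues."""
    vals, vecs = np.linalg.eigh(Phi)    # ascending eigenvalues
    return vecs[:, 1:d + 1].T
```

Since each column of $\hat W_i$ sums to zero, $\Phi 1 = 0$, which is why the constant eigenvector $u_1$ is discarded.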
However, because of noise and possibly ill-selected neighborhoods, $\sigma_{d+1}(G_i)$ may not be relatively small. It makes sense to choose $s_i$ as large as possible while the ratio $\big(\lambda^{(i)}_{k_i-s_i+1} + \dots + \lambda^{(i)}_{k_i}\big)\big/\big(\lambda^{(i)}_1 + \dots + \lambda^{(i)}_{k_i-s_i}\big)$ remains small, where the $\lambda^{(i)}_j = \sigma_j^2(G_i)$ are the eigenvalues of $G_i^T G_i$. There is a trade-off between the number of weight vectors and the approximation to $\|G_i w_i^*\|$. We suggest

$$s_i = \max\Big\{ \ell \le k_i - d :\ \frac{\sum_{j=k_i-\ell+1}^{k_i} \lambda^{(i)}_j}{\sum_{j=1}^{k_i-\ell} \lambda^{(i)}_j} < \eta \Big\} \tag{3.7}$$

for a given threshold $\eta < 1$. Here $d$ can be overestimated as some $d' > d$.

Obviously, $s_i$ depends on the parameter $\eta$ monotonically: the smaller $\eta$ is, the smaller $s_i$ is, and of course, the smaller the combination errors of the weight vectors used are. We use an adaptive strategy to set $\eta$ as follows. Let $\rho_i = \sum_{j=d+1}^{k_i}\lambda^{(i)}_j \big/ \sum_{j=1}^{d}\lambda^{(i)}_j$, $i = 1,\dots,N$, and reorder the $\{\rho_i\}$ as $\rho_{\pi_1} \le \dots \le \rho_{\pi_N}$. Then we set $\eta$ to be the middle term of the $\{\rho_i\}$, $\eta = \rho_{\pi_{\lceil N/2\rceil}}$, where $\lceil N/2\rceil$ is the nearest integer to $N/2$ towards infinity. In general, if the manifold near $x_i$ is flat or has small curvatures and the neighbors are well selected, $\rho_i$ is smaller than $\eta$ and $s_i = k_i - d$. For neighbor sets with large local curvatures, $\rho_i > \eta$ and $s_i < k_i - d$, so fewer weight vectors are used in constructing the local linear structures and the combination errors decrease.

We summarize the Modified Locally Linear Embedding (MLLE) algorithm as follows.

Algorithm MLLE (Modified Locally Linear Embedding).

1. For each $i = 1,\dots,N$,
   1.1 Determine a neighbor set $N_i = \{x_j,\ j\in J_i\}$ of $x_i$, $i \notin J_i$.
   1.2 Compute the regularized solution $w_i(\gamma)$ by (2.3) with a small $\gamma > 0$.
   1.3 Compute the eigenvalues $\lambda^{(i)}_1,\dots,\lambda^{(i)}_{k_i}$ and eigenvectors $v^{(i)}_1,\dots,v^{(i)}_{k_i}$ of $G_i^T G_i$. Set $\rho_i = \sum_{j=d+1}^{k_i}\lambda^{(i)}_j \big/ \sum_{j=1}^{d}\lambda^{(i)}_j$.
2. Sort the $\{\rho_i\}$ into increasing order $\{\rho_{\pi_i}\}$ and set $\eta = \rho_{\pi_{\lceil N/2\rceil}}$.
3. For each $i = 1,\dots,N$,
   3.1 Set $s_i$ by (3.7) and set $V_i = [v^{(i)}_{k_i-s_i+1},\dots,v^{(i)}_{k_i}]$, $\alpha_i = \frac{1}{\sqrt{s_i}}\|1_{k_i}^T V_i\|$.
   3.2 Construct $\Phi$ using $W_i = (1-\alpha_i)\, w_i(\gamma) 1_{s_i}^T + V_i H_i$.
4. Compute the $d+1$ smallest eigenvectors of $\Phi$, pick the eigenvectors corresponding to the 2nd to $(d+1)$st smallest eigenvalues, and set $T = [u_2,\dots,u_{d+1}]^T$.

The computational cost of MLLE is almost the same as that of LLE. The additional flops of MLLE for computing the eigendecomposition of $G_i^T G_i$ are $O(k_i^3)$, and $O(k^3 N)$ in total with $k = \max_i k_i$. Note that the most computationally expensive steps in both LLE and MLLE are the neighborhood selection and the computation of the $d+1$ eigenvectors of the alignment matrix $\Phi$ corresponding to small eigenvalues. They cost $O(mN^2)$ and $O(dN^2)$, respectively. Because $k \ll N$, the additional cost of MLLE is negligible.

4 An analysis of MLLE for isometric manifolds

Consider the application of MLLE to an isometric manifold $M = f(\Omega)$ with an open set $\Omega \subset \mathbb{R}^d$ and a smooth function $f$. Assume that the $\{x_i\}$ are sampled from $M$: $x_i = f(\tau_i)$, $i = 1,\dots,N$.
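As a practical aside, the modified-LLE variant implemented in scikit-learn follows this multiple-weights scheme; a minimal usage sketch on swiss-roll data (the sample size and parameter values are our own choices):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Map 3D swiss-roll samples to 2D with modified LLE (multiple weights).
X, _ = make_swiss_roll(n_samples=300, random_state=0)
mlle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                              method='modified')
T = mlle.fit_transform(X)          # rows are the embedded points t_i
```

Note that `method='modified'` requires `n_neighbors > n_components`, matching the condition $k_i > d$ above.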
We have

$$\Big\| x_i - \sum_{j\in J_i} w_{ji} x_j \Big\| = \Big\| \tau_i - \sum_{j\in J_i} w_{ji} \tau_j \Big\| + O(\epsilon_i^2) \tag{4.8}$$

due to the isometry of $f$. If $k_i > d$, then the optimal reconstruction error of $\tau_i$ should be zero. So we have that $\|x_i - \sum_{j\in J_i} w^*_{ji} x_j\| = O(\epsilon_i^2)$. For the approximately optimal weight vectors $w^{(\ell)}_i$, we have $\|x_i - \sum_{j\in J_i} w^{(\ell)}_{ji} x_j\| \approx \sigma_{k_i-s_i+1}(G_i) + O(\epsilon_i^2)$. Conversely, it follows from (4.8) that $\|\tau_i - \sum_{j\in J_i} w^{(\ell)}_{ji} \tau_j\| \approx \sigma_{k_i-s_i+1}(G_i) + O(\epsilon_i^2)$. Therefore, denoting $T^* = [\tau_1,\dots,\tau_N]$, we have

$$E(T^*) = \sum_{i=1}^N \sum_{\ell=1}^{s_i} \Big\| \tau_i - \sum_{j\in J_i} w^{(\ell)}_{ji} \tau_j \Big\|^2 \le \sum_{i=1}^N s_i\, \sigma^2_{k_i-s_i+1}(G_i) + O(\max_i \epsilon_i^2).$$

For the orthogonalization $U$ of $T^*$, i.e., $T^* = LU$ with $UU^T = I$, since $L = T^* U^T \in \mathbb{R}^{d\times d}$, we have $\sigma_d(L) = \sigma_d(T^*)$ and $E(U) \le E(T^*)/\sigma_d^2(T^*)$. Note that $\sigma^2_{k_i-s_i+1}(G_i)$ is generally very small. So $E(U)$ is always small and approximately achieves the minimum. Roughly speaking, MLLE can retrieve the isometric embedding.

5 Comparison to LTSA

MLLE has properties similar to those of LTSA. In this section, we compare MLLE and LTSA with respect to the linear dependence of neighbors and the alignment matrices.
For simplicity, we assume that $r_i = d$, i.e., $k_i - d$ weight vectors are used in MLLE for each neighbor set.

5.1 Linear dependence of neighbors.

The total combination error

$$\epsilon_{MLLE}(N_i) = \sum_{\ell=1}^{k_i-d} \Big\| \sum_{j\in J_i} w^{(\ell)}_{ji} x_j - x_i \Big\|^2 = \|G_i W_i\|_F^2$$

of $x_i$ can serve as a measure of the linear dependence of the neighborhood $N_i$. To compare it with the measure of linear dependence defined by LTSA, we denote by $\bar x_i = \frac{1}{|I_i|}\sum_{j\in I_i} x_j$ the mean of the whole neighborhood of $x_i$ including $x_i$ itself, and $\bar X_i = [\dots, x_j - \bar x_i, \dots]_{j\in I_i}$. It can be verified that $G_i W_i = \bar X_i \tilde W_i$ with $\tilde W_i = \hat W_i(I_i,:)$. So $\epsilon_{MLLE}(N_i) = \|\bar X_i \tilde W_i\|_F^2$.

In LTSA, the linear dependence of $N_i$ is measured by the total error

$$\epsilon_{LTSA}(N_i) = \sum_{j\in I_i} \|x_j - \bar x_i - Q_i \theta^{(i)}_j\|^2 = \|\bar X_i - Q_i \Theta_i\|_F^2 = \|\bar X_i \tilde V_i\|_F^2,$$

where $\tilde V_i$ is the matrix consisting of the right singular vectors of $\bar X_i$ corresponding to the $k_i - d$ smallest singular values. The MLLE measure $\epsilon_{MLLE}$ and the LTSA measure $\epsilon_{LTSA}$ of neighborhood linear dependence are thus similar:

$$\epsilon_{MLLE}(N_i) = \|\bar X_i \tilde W_i\|_F^2, \quad \text{where } \|\bar X_i \tilde w^{(\ell)}_i\| \approx \min,\ \ell \le k_i - d; \qquad \epsilon_{LTSA}(N_i) = \|\bar X_i \tilde V_i\|_F^2 = \min_{Z^T Z = I} \|\bar X_i Z\|_F^2.$$

5.2 Alignment matrices.

Both MLLE and LTSA minimize a trace function of an alignment matrix $\Phi$ to obtain an embedding, $\min_{TT^T = I} \mathrm{trace}(T\Phi T^T)$.
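The two neighborhood measures above are easy to evaluate for a given centered neighborhood matrix $\bar X_i$; a sketch with our own function names.

```python
import numpy as np

def ltsa_measure(Xbar, d):
    """epsilon_LTSA(N_i) = ||Xbar_i Vtilde_i||_F: the energy of Xbar_i
    outside its d leading right singular directions."""
    s = np.linalg.svd(Xbar, compute_uv=False)
    return np.sqrt((s[d:] ** 2).sum())

def mlle_measure(Xbar, Wtilde):
    """epsilon_MLLE(N_i) = ||Xbar_i Wtilde_i||_F for given weights."""
    return np.linalg.norm(Xbar @ Wtilde, 'fro')
```

For an exactly $d$-dimensional neighborhood both measures vanish: the LTSA measure because only $d$ singular values are nonzero, and the MLLE measure whenever the weight columns lie in the null space of $\bar X_i$.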
The alignment matrix can be written in the same form

$$\Phi = \sum_{i=1}^N S_i \Phi_i S_i^T,$$

where $S_i$ is a selection matrix consisting of the columns $j \in I_i$ of the identity matrix of order $N$. In LTSA, the local matrix $\Phi_i$ is given by the orthogonal projection, i.e., $\Phi_i^{LTSA} = \tilde V_i \tilde V_i^T$; see [10]. For MLLE, $\Phi_i^{MLLE} = \tilde W_i \tilde W_i^T$. It is interesting that the range space $\mathrm{span}(\tilde W_i)$ of $\tilde W_i$ and the range space $\mathrm{span}(\tilde V_i)$ of $\tilde V_i$ are close to each other if the reconstruction error of $x_i$ is small. The following theorem gives an upper bound on this closeness using the distance $\mathrm{dist}(\tilde W_i, \tilde V_i)$ between $\mathrm{span}(\tilde W_i)$ and $\mathrm{span}(\tilde V_i)$, defined as the largest angle between the two subspaces. (See [4] for a discussion of distances between subspaces.)

Theorem 5.1 Let $G_i = [\dots, x_j - x_i, \dots]_{j\in J_i}$. Then $\mathrm{dist}(\tilde W_i, \tilde V_i) \le \dfrac{\|G_i W_i\|}{\sigma_d(\tilde W_i)\,\sigma_d(\bar X_i)}$.

6 Experimental Results.

In this section, we present several numerical examples to illustrate the performance of the MLLE algorithm. The test data sets include simulated data sets and real-world examples.

First, we compare Isomap, LLE, LTSA, and MLLE on the Swiss roll with a hole. The data points are generated from a rectangle with a rectangular strip punched out of the center, so the resulting Swiss roll is not convex. We run these four algorithms with $k = 10$. In the top middle of Figure 3, we plot the coordinates computed by Isomap; there is a dilation of the missing region and a warp on the rest of the embedding. As seen in the top right of Figure 3, there is a strong distortion in the coordinates computed by LLE. As shown in the bottom of Figure 3, LTSA and MLLE perform well.

We now compare MLLE and LTSA for a 2D manifold with 3 peaks embedded in 3D space.
We generate $N = 1225$ 3D points $x_i = [t_i, s_i, h(t_i, s_i)]^T$, where $t_i$ and $s_i$ are uniformly distributed in the interval $[-1.5, 1.5]$ and $h(t,s)$ is defined by

$$h(t,s) = e^{-10[(t-0.5)^2+(s-0.5)^2]} - e^{-10[(1+t)^2+s^2]} - e^{-10[t^2+(s+1)^2]}.$$

[Figure 3: Left column: Swiss-roll data and generating coordinates with a missing rectangle. Middle column: computed results by Isomap and LTSA. Right column: results of LLE and MLLE.]

[Figure 4: Left column: plots of the 3-peak data and the generating coordinates. Right column: results of LTSA and MLLE.]

See the left of Figure 4 for the data points and the generating parameters. It is easy to show that the manifold parameterized by $f(t,s) = [t, s, h(t,s)]^T$ is approximately isometric, since the Jacobian $J_f(t,s)$ is approximately orthonormal. In the right of Figure 4, we plot the coordinates computed by LTSA and MLLE with $k = 12$.
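The 3-peak data set can be generated in a few lines; a sketch in which the pairing of the three Gaussian centers with the positive and negative terms is our reading of the formula, and the random seed is arbitrary.

```python
import numpy as np

def h(t, s):
    # One positive and two negative Gaussian bumps (the "3 peaks").
    return (np.exp(-10 * ((t - 0.5)**2 + (s - 0.5)**2))
            - np.exp(-10 * ((1 + t)**2 + s**2))
            - np.exp(-10 * (t**2 + (s + 1)**2)))

rng = np.random.default_rng(0)
t = rng.uniform(-1.5, 1.5, 1225)
s = rng.uniform(-1.5, 1.5, 1225)
X = np.column_stack([t, s, h(t, s)])   # N = 1225 points in R^3
```

The height component stays small relative to the $(t, s)$ extent, which is why the parameterization is close to isometric.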
The deformations of the coordinates computed by LTSA near the peaks are prominent because the curvature of the 3-peak manifold varies greatly. This bias can be reduced by the modified curvature model of LTSA proposed in [8]. MLLE recovers the generating parameters almost perfectly, up to an affine transformation.

Next, we consider a data set containing $N = 4400$ handwritten digits ('2'-'5') with 1100 examples of each class. The gray-scale images of handwritten numerals are at $16\times16$ resolution and converted to $m = 256$-dimensional vectors.² The data points are mapped into a 2-dimensional space using LLE and MLLE, respectively. The results are shown in Figure 5. It is clear that MLLE performs much better than LLE: most of the digit classes are well clustered in the resulting embedding of MLLE.

Finally, we consider the application of MLLE and LLE to the real data set of 698 face images with variations of two pose parameters (left-right and up-down) and one lighting parameter. The image size is 64-by-64 pixels, and each image is converted to an $m = 4096$-dimensional vector. We apply MLLE with $k = 14$ and $d = 3$ to the data set. The first two coordinates of MLLE are plotted in the middle of Figure 6. We also extract four paths along the boundaries of the set of the first two coordinates and display the corresponding images along each path. These components appear to capture the pose and lighting variations well in a continuous way.

References

[1] D. Donoho and C. Grimes. Hessian Eigenmaps: new tools for nonlinear dimensionality reduction.
Proceedings of the National Academy of Sciences, 5591-5596, 2003.

²The data set can be downloaded at http://www.cs.toronto.edu/ roweis/data.html.

[Figure 5: Embedding results of N = 4400 handwritten digits by LLE (left) and MLLE (right).]

[Figure 6: Images of faces mapped into the embedding described by the first two coordinates of MLLE, using the parameters k = 14 and d = 3.]

[2] M. Brand. Charting a manifold. Advances in Neural Information Processing Systems 15, MIT Press, 2003.

[3] J. Ham, D. D. Lee, S. Mika, and B. Schölkopf. A kernel view of the dimensionality reduction of manifolds. International Conference on Machine Learning 21, 2004.

[4] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[5] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.

[6] L. Saul and S. Roweis. Think globally, fit locally: unsupervised learning of nonlinear manifolds. Journal of Machine Learning Research, 4:119-155, 2003.

[7] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimension reduction. Science, 290:2319-2323, 2000.

[8] J. Wang, Z. Zhang, and H. Zha. Adaptive Manifold Learning. Advances in Neural Information Processing Systems 17, edited by L. K. Saul, Y. Weiss, and L. Bottou, MIT Press, Cambridge, MA, pp. 1473-1480, 2005.

[9] Z. Zhang and H. Zha. Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment. SIAM J. Scientific Computing, 26(1):313-338, 2004.

[10] H. Zha and Z. Zhang. Spectral Analysis of Alignment in Manifold Learning. Submitted, 2006.

[11] M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas. Non-Linear Dimensionality Reduction Techniques for Classification and Visualization. Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, July 2002.
", "award": [], "sourceid": 3132, "authors": [{"given_name": "Zhenyue", "family_name": "Zhang", "institution": null}, {"given_name": "Jing", "family_name": "Wang", "institution": null}]}