{"title": "Maximum Likelihood Estimation of Intrinsic Dimension", "book": "Advances in Neural Information Processing Systems", "page_first": 777, "page_last": 784, "abstract": null, "full_text": "Maximum Likelihood Estimation of Intrinsic Dimension\n\nElizaveta Levina\nDepartment of Statistics\nUniversity of Michigan\nAnn Arbor MI 48109-1092\nelevina@umich.edu\n\nPeter J. Bickel\nDepartment of Statistics\nUniversity of California\nBerkeley CA 94720-3860\nbickel@stat.berkeley.edu\n\nAbstract\n\nWe propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators.\n\n1 Introduction\n\nThere is a consensus in the high-dimensional data analysis community that the only reason any methods work in very high dimensions is that, in fact, the data are not truly high-dimensional. Rather, they are embedded in a high-dimensional space, but can be efficiently summarized in a space of a much lower dimension, such as a nonlinear manifold. Then one can reduce dimension without losing much information for many types of real-life high-dimensional data, such as images, and avoid many of the \"curses of dimensionality\". 
Learning these data manifolds can improve performance in classification and other applications, but if the data structure is complex and nonlinear, dimensionality reduction can be a hard problem.\n\nTraditional methods for dimensionality reduction include principal component analysis (PCA), which only deals with linear projections of the data, and multidimensional scaling (MDS), which aims at preserving pairwise distances and traditionally is used for visualizing data. Recently, there has been a surge of interest in manifold projection methods (Locally Linear Embedding (LLE) [1], Isomap [2], Laplacian and Hessian Eigenmaps [3, 4], and others), which focus on finding a nonlinear low-dimensional embedding of high-dimensional data. So far, these methods have mostly been used for exploratory tasks such as visualization, but they have also been successfully applied to classification problems [5, 6].\n\nThe dimension of the embedding is a key parameter for manifold projection methods: if the dimension is too small, important data features are \"collapsed\" onto the same dimension, and if the dimension is too large, the projections become noisy and, in some cases, unstable. There is no consensus, however, on how this dimension should be determined. LLE [1] and its variants assume the manifold dimension is provided by the user. Isomap [2] provides error curves that can be \"eyeballed\" to estimate dimension. The charting algorithm, a recent LLE variant [7], uses a heuristic estimate of dimension which is essentially equivalent to the regression estimator of [8] discussed below. 
Constructing a reliable estimator of intrinsic dimension and understanding its statistical properties will clearly facilitate further applications of manifold projection methods and improve their performance.\n\nWe note that for applications such as classification, cross-validation is in principle the simplest solution: just pick the dimension which gives the lowest classification error. However, in practice the computational cost of cross-validating for the dimension is prohibitive, and an estimate of the intrinsic dimension will still be helpful, either to be used directly or to narrow down the range for cross-validation.\n\nIn this paper, we present a new estimator of intrinsic dimension, study its statistical properties, and compare it to other estimators on both simulated and real datasets. Section 2 reviews previous work on intrinsic dimension. In Section 3 we derive the estimator and give its approximate asymptotic bias and variance. Section 4 presents results on datasets and compares our estimator to two other estimators of intrinsic dimension. Section 5 concludes with discussion.\n\n2 Previous Work on Intrinsic Dimension Estimation\n\nThe existing approaches to estimating the intrinsic dimension can be roughly divided into two groups: eigenvalue or projection methods, and geometric methods. Eigenvalue methods, from the early proposal of [9] to a recent variant [10], are based on a global or local PCA, with intrinsic dimension determined by the number of eigenvalues greater than a given threshold. Global PCA methods fail on nonlinear manifolds, and local methods depend heavily on the precise choice of local regions and thresholds [11]. 
The eigenvalue methods may be a good tool for exploratory data analysis, where one might plot the eigenvalues and look for a clear-cut boundary, but not for providing reliable estimates of intrinsic dimension.\n\nThe geometric methods exploit the intrinsic geometry of the dataset and are most often based on fractal dimensions or nearest neighbor (NN) distances. Perhaps the most popular fractal dimension is the correlation dimension [12, 13]: given a set S_n = {x_1, ..., x_n} in a metric space, define\n\nC_n(r) = (2 / (n(n − 1))) Σ_{i=1}^{n} Σ_{j=i+1}^{n} 1{ ||x_i − x_j|| < r }.   (1)\n\nThe correlation dimension is then estimated by plotting log C_n(r) against log r and estimating the slope of the linear part [12]. A recent variant [13] proposed plotting this estimate against the true dimension for some simulated data and then using this calibrating curve to estimate the dimension of a new dataset. This requires a different curve for each n, and the choice of calibration data may affect performance. The capacity dimension and packing numbers have also been used [14]. While the fractal methods successfully exploit certain geometric aspects of the data, the statistical properties of these methods have not been studied.\n\nThe correlation dimension (1) implicitly uses NN distances, and there are methods that focus on them explicitly. The use of NN distances relies on the following fact: if X_1, ..., X_n are an independent identically distributed (i.i.d.) sample from a density f(x) in R^m, and T_k(x) is the Euclidean distance from a fixed point x to its k-th NN in the sample, then\n\nk/n ≈ f(x) V(m) [T_k(x)]^m,   (2)\n\nwhere V(m) = π^{m/2} [Γ(m/2 + 1)]^{−1} is the volume of the unit sphere in R^m. 
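The correlation-dimension recipe in (1) can be sketched in a few lines of Python; the brute-force pair counting, the choice of radii grid, and the uniform-square test data below are illustrative assumptions rather than choices from the paper:

```python
import itertools
import math
import random

def correlation_integral(points, r):
    """C_n(r) of Eq. (1): fraction of point pairs closer than r."""
    n = len(points)
    count = sum(
        1 for p, q in itertools.combinations(points, 2) if math.dist(p, q) < r
    )
    return 2.0 * count / (n * (n - 1))

def correlation_dimension(points, radii):
    """Slope of log C_n(r) vs log r over a (assumed roughly linear) range."""
    xs, ys = [], []
    for r in radii:
        c = correlation_integral(points, r)
        if c > 0:  # log is undefined when no pairs fall within r
            xs.append(math.log(r))
            ys.append(math.log(c))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den  # ordinary least-squares slope

random.seed(0)
# 400 points uniform on a 2-d square embedded in R^3: intrinsic dimension 2
pts = [(random.random(), random.random(), 0.0) for _ in range(400)]
radii = [0.02 * i for i in range(1, 11)]  # small-r range, away from saturation
d_hat = correlation_dimension(pts, radii)
print(round(d_hat, 2))
```

As the text notes, the result depends on picking a genuinely linear part of the log-log curve; boundary effects pull the slope slightly below the true dimension.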
That is, the proportion of sample points falling into a ball around x is roughly f(x) times the volume of the ball.\n\nThe relationship (2) can be used to estimate the dimension by regressing log T̄_k on log k over a suitable range of k, where T̄_k = n^{−1} Σ_{i=1}^{n} T_k(X_i) is the average of distances from each point to its k-th NN [8, 11]. A comparison of this method to a local eigenvalue method [11] found that the NN method suffered more from underestimating dimension for high-dimensional datasets, but the eigenvalue method was sensitive to noise and parameter settings. A more sophisticated NN approach was recently proposed in [15], where the dimension is estimated from the length of the minimal spanning tree on the geodesic NN distances computed by Isomap.\n\nWhile there are certainly existing methods available for estimating intrinsic dimension, there are some issues that have not been adequately addressed. The behavior of the estimators as a function of sample size and dimension is not well understood or studied beyond the obvious \"curse of dimensionality\"; the statistical properties of the estimators, such as bias and variance, have not been looked at (with the exception of [15]); and comparisons between methods are not always presented.\n\n3 A Maximum Likelihood Estimator of Intrinsic Dimension\n\nHere we derive the maximum likelihood estimator (MLE) of the dimension m from i.i.d. observations X_1, ..., X_n in R^p. The observations represent an embedding of a lower-dimensional sample, i.e., X_i = g(Y_i), where Y_i are sampled from an unknown smooth density f on R^m, with unknown m ≤ p, and g is a continuous and sufficiently smooth (but not necessarily globally isometric) mapping. This assumption ensures that close neighbors in R^m are mapped to close neighbors in the embedding.\n\nThe basic idea is to fix a point x, assume f(x) ≈ const in a small sphere S_x(R) of radius R around x, and treat the observations as a homogeneous Poisson process in S_x(R). 
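The NN regression idea (regress log T̄_k on log k and invert the slope, since (2) gives log T_k ≈ (1/m) log k + const) can be sketched as below; this is only a simplified reading of the estimator of [8], and the brute-force neighbor search and uniform test data are illustrative assumptions:

```python
import math
import random

def knn_distances(points, k_max):
    """For each point, the sorted distances to its k_max nearest neighbors."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[:k_max])
    return out

def regression_dimension(points, k1, k2):
    """Regress log(mean T_k) on log k over k = k1..k2; dimension = 1/slope."""
    dists = knn_distances(points, k2)
    xs, ys = [], []
    for k in range(k1, k2 + 1):
        t_bar = sum(d[k - 1] for d in dists) / len(dists)  # average k-NN distance
        xs.append(math.log(k))
        ys.append(math.log(t_bar))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return 1.0 / slope

random.seed(0)
# 500 points uniform on the unit square: intrinsic dimension 2
pts = [(random.random(), random.random()) for _ in range(500)]
m_hat = regression_dimension(pts, 10, 20)
print(round(m_hat, 1))
```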
Consider the inhomogeneous process {N(t, x), 0 ≤ t ≤ R},\n\nN(t, x) = Σ_{i=1}^{n} 1{X_i ∈ S_x(t)},   (3)\n\nwhich counts observations within distance t from x. Approximating this binomial (fixed n) process by a Poisson process and suppressing the dependence on x for now, we can write the rate λ(t) of the process N(t) as\n\nλ(t) = f(x) V(m) m t^{m−1}.   (4)\n\nThis follows immediately from the Poisson process properties since V(m) m t^{m−1} = (d/dt)[V(m) t^m] is the surface area of the sphere S_x(t). Letting θ = log f(x), we can write the log-likelihood of the observed process N(t) as (see e.g., [16])\n\nL(m, θ) = ∫_0^R log λ(t) dN(t) − ∫_0^R λ(t) dt.\n\nThis is an exponential family for which MLEs exist with probability → 1 as n → ∞ and are unique. The MLEs must satisfy the likelihood equations\n\n∂L/∂θ = ∫_0^R dN(t) − ∫_0^R λ(t) dt = N(R) − e^θ V(m) R^m = 0,   (5)\n\n∂L/∂m = (1/m + V′(m)/V(m)) N(R) + ∫_0^R log t dN(t) − e^θ V(m) R^m (log R + V′(m)/V(m)) = 0.   (6)\n\nSubstituting (5) into (6) gives the MLE for m:\n\nm̂_R(x) = [ (1/N(R, x)) Σ_{j=1}^{N(R,x)} log(R / T_j(x)) ]^{−1}.   (7)\n\nIn practice, it may be more convenient to fix the number of neighbors k rather than the radius of the sphere R. Then the estimate in (7) becomes\n\nm̂_k(x) = [ (1/(k − 1)) Σ_{j=1}^{k−1} log(T_k(x) / T_j(x)) ]^{−1}.   (8)\n\nNote that we omit the last (zero) term in the sum in (8). One could divide by k − 2 rather than k − 1 to make the estimator asymptotically unbiased, as we show below. 
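As a concrete illustration (not part of the paper), the fixed-k estimator (8), averaged over observations, can be sketched in Python; the brute-force neighbor search, the choice k = 15, and the 5-d Gaussian test sample are illustrative assumptions:

```python
import math
import random

def mle_dim_point(dists_sorted, k):
    """Local MLE of Eq. (8): inverse of the mean of log(T_k / T_j), j < k."""
    t_k = dists_sorted[k - 1]
    s = sum(math.log(t_k / dists_sorted[j]) for j in range(k - 1))
    return (k - 1) / s  # use (k - 2) / s for the asymptotically unbiased version

def mle_dimension(points, k):
    """Average the local estimates over all observations."""
    ests = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        ests.append(mle_dim_point(dists, k))
    return sum(ests) / len(ests)

random.seed(0)
# 500 points from a 5-d standard normal: true intrinsic dimension is 5
pts = [tuple(random.gauss(0, 1) for _ in range(5)) for _ in range(500)]
m_hat = mle_dimension(pts, k=15)
print(round(m_hat, 1))
```

The paper's final estimate additionally averages this quantity over a range of k values; with a k-d tree or similar index the neighbor search drops from O(n^2) to roughly O(n log n).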
Also note that the MLE of θ can be used to obtain an instant estimate of the entropy of f, which was also provided by the method used in [15].\n\nFor some applications, one may want to evaluate local dimension estimates at every data point, or average estimated dimensions within data clusters. We will, however, assume that all the data points come from the same \"manifold\", and therefore average over all observations.\n\nThe choice of k clearly affects the estimate. It can be the case that a dataset has different intrinsic dimensions at different scales, e.g., a line with noise added to it can be viewed as either 1-d or 2-d (this is discussed in detail in [14]). In such a case, it is informative to have different estimates at different scales. In general, for our estimator to work well the sphere should be small and contain sufficiently many points, and we have work in progress on choosing such a k automatically. For this paper, though, we simply average over a range of small to moderate values k = k_1, ..., k_2 to get the final estimates\n\nm̂_k = (1/n) Σ_{i=1}^{n} m̂_k(X_i),   m̂ = (1/(k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} m̂_k.   (9)\n\nThe choice of k_1 and k_2 and the behavior of m̂_k as a function of k are discussed further in Section 4. The only parameters to set for this method are k_1 and k_2, and the computational cost is essentially the cost of finding the k_2 nearest neighbors for every point, which has to be done for most manifold projection methods anyway.\n\n3.1 Asymptotic behavior of the estimator for m fixed, n → ∞.\n\nHere we give a sketchy discussion of the asymptotic bias and variance of our estimator, to be elaborated elsewhere. The computations here are under the assumption that m is fixed, n → ∞, k → ∞, and k/n → 0.\n\nAs we remarked, for a given x, if n → ∞ and R → 
0, the inhomogeneous binomial process N(t, x) in (3) converges weakly to the inhomogeneous Poisson process with rate λ(t) given by (4). If we condition on the distance T_k(x) and assume the Poisson approximation is exact, then {m^{−1} log(T_k/T_j) : 1 ≤ j ≤ k − 1} are distributed as the order statistics of a sample of size k − 1 from a standard exponential distribution. Hence U = m^{−1} Σ_{j=1}^{k−1} log(T_k/T_j) has a Gamma(k − 1, 1) distribution, and E(U^{−1}) = 1/(k − 2). If we use k − 2 to normalize, then under these assumptions, to a first order approximation\n\nE(m̂_k(x)) = m,   Var(m̂_k(x)) = m^2/(k − 3).   (10)\n\nAs this analysis is asymptotic in both k and n, the factor (k − 1)/(k − 2) makes no difference. There are, of course, higher order terms since N(t, x) is in fact a binomial process with E N(t, x) = λ(t)(1 + O(t^2)), where O(t^2) depends on m.\n\nWith approximations (10), we have E m̂ = E m̂_k = m, but the computation of Var(m̂) is complicated by the dependence among the m̂_k(X_i). We have a heuristic argument (omitted for lack of space) that, by dividing the m̂_k(X_i) into n/k roughly independent groups of size k each, the variance can be shown to be of order n^{−1}, as it would be if the estimators were independent. Our simulations confirm that this approximation is reasonable: for instance, for m-d Gaussians the ratio of the theoretical SD = C(k_1, k_2) m/√n (where C(k_1, k_2) is calculated as if all the terms in (9) were independent) to the actual SD of m̂ was between 0.7 and 1.3 for the range of values of m and n considered in Section 4. 
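The unbiasing claim above is easy to check numerically. A seeded (hence deterministic) Monte Carlo sketch, not from the paper, verifies that U ~ Gamma(k − 1, 1) gives E(U^{−1}) = 1/(k − 2), which is why normalizing by k − 2 rather than k − 1 removes the first-order bias:

```python
import random

random.seed(0)
k = 10
n_sims = 200_000
# Draw U ~ Gamma(k-1, scale 1) and average 1/U; the exact expectation
# of the inverse of a Gamma(a, 1) variable is 1/(a - 1) = 1/(k - 2).
mean_inv_u = sum(
    1.0 / random.gammavariate(k - 1, 1.0) for _ in range(n_sims)
) / n_sims
print(round(mean_inv_u, 3))  # should be close to 1/(k - 2) = 0.125
```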
The bias, however, behaves worse than the asymptotics predict, as we discuss further in Section 5.\n\n4 Numerical Results\n\nFigure 1: The estimator m̂_k as a function of k. (a) 5-dimensional normal for several sample sizes (n = 200, 500, 1000, 2000). (b) Various m-dimensional normals (m = 2, 5, 10, 20) with sample size n = 1000.\n\nWe first investigate the properties of our estimator in detail by simulations, and then apply it to real datasets. The first issue is the behavior of m̂_k as a function of k. The results shown in Fig. 1 are for m-d Gaussians N_m(0, I), and a similar pattern holds for observations in a unit cube, on a hypersphere, and on the popular \"Swiss roll\" manifold. Fig. 1(a) shows m̂_k for a 5-d Gaussian as a function of k for several sample sizes n. For very small k the approximation does not work yet and m̂_k is unreasonably high, but for k as small as 10, the estimate is near the true value m = 5. The estimate shows some negative bias for large k, which decreases with growing sample size n, and, as Fig. 1(b) shows, increases with dimension. Note, however, that it is the intrinsic dimension m rather than the embedding dimension p ≥ m that matters; and as our examples below and many examples elsewhere show, the intrinsic dimension for real data is frequently low.\n\nThe plots in Fig. 1 show that the \"ideal\" range k_1, ..., k_2 is different for every combination of m and n, but the estimator is fairly stable as a function of k, apart from the first few values. 
While fine-tuning the range k_1, ..., k_2 for different n is possible and would reduce the bias, for simplicity and reproducibility of our results we fix k_1 = 10, k_2 = 20 throughout this paper. In this range, the estimates are not affected much by sample size or the positive bias for very small k, at least for the range of m and n under consideration.\n\nNext, we investigate an important and often overlooked issue of what happens when the data are near a manifold as opposed to exactly on a manifold. Fig. 2(a) shows simulation results for a 5-d correlated Gaussian with mean 0 and covariance matrix [σ_ij] = [ρ + (1 − ρ)δ_ij], with δ_ij = 1{i = j}. As ρ changes from 0 to 1, the dimension changes from 5 (full spherical Gaussian) to 1 (a line in R^5), with intermediate values of ρ providing noisy versions.\n\nFigure 2: (a) Data near a manifold: estimated dimension for correlated 5-d normal as a function of 1 − ρ, for n = 100, 500, 1000, 2000. (b) The MLE, regression, and correlation dimension for uniform distributions on spheres with n = 1000. The three lines for each method show the mean ± 2 SD (95% confidence intervals) over 1000 replications.\n\nThe plots in Fig. 2(a) show that the MLE of dimension does not drop unless ρ is very close to 1, so the estimate is not affected by whether the data cloud is spherical or elongated. 
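For reproducibility, the correlated-Gaussian design above can be simulated with a standard one-factor construction (a sketch, not the authors' code; the sample size, ρ = 0.8, and the seed are illustrative assumptions):

```python
import random

def correlated_gaussian(n, dim, rho, rng):
    """n draws from N(0, Sigma) with Sigma_ij = rho + (1 - rho) * delta_ij,
    via the one-factor form X_i = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_i."""
    pts = []
    for _ in range(n):
        z0 = rng.gauss(0, 1)  # factor shared by all coordinates
        pts.append(tuple(
            rho ** 0.5 * z0 + (1 - rho) ** 0.5 * rng.gauss(0, 1)
            for _ in range(dim)
        ))
    return pts

rng = random.Random(0)
pts = correlated_gaussian(5000, 5, 0.8, rng)
# Empirical check of the target covariance: Var(X_1) = 1, Cov(X_1, X_2) = rho
var1 = sum(p[0] ** 2 for p in pts) / len(pts)
cov12 = sum(p[0] * p[1] for p in pts) / len(pts)
print(round(var1, 2), round(cov12, 2))
```

Feeding such samples, for a grid of ρ values, to any of the dimension estimators reproduces the qualitative behavior shown in Fig. 2(a).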
For ρ close to 1, when the dimension really drops, the estimate depends significantly on the sample size, which is to be expected: n = 100 highly correlated points look like a line, but n = 2000 points fill out the space around the line. This highlights the fundamental dependence of intrinsic dimension on the neighborhood scale, particularly when the data may be observed with noise. The MLE of dimension, while reflecting this dependence, behaves reasonably and robustly as a function of both ρ and n.\n\nA comparison of the MLE, the regression estimator (regressing log T̄_k on log k), and the correlation dimension is shown in Fig. 2(b). The comparison is shown on uniformly distributed points on the surface of an m-dimensional sphere, but a similar pattern held in all our simulations. The regression range was held at k = 10, ..., 20 (the same as the MLE) for fair comparison, and the regression for correlation dimension was based on the first 10, ..., 100 distinct values of log C_n(r), to reflect the fact that there are many more points for the log C_n(r) regression than for the log T̄_k regression. We found in general that the correlation dimension graph can have more than one linear part, and is more sensitive to the choice of range than either the MLE or the regression estimator, but we tried to set the parameters for all methods in a way that does not give an unfair advantage to any and is easily reproducible.\n\nThe comparison shows that, while all methods suffer from negative bias for higher dimensions, the correlation dimension has the smallest bias, with the MLE coming in close second. However, the variance of the correlation dimension is much higher than that of the MLE (the SD is at least 10 times higher for all dimensions). The regression estimator, on the other hand, has relatively low variance (though always higher than the MLE) but the largest negative bias. 
On the balance of bias and variance, the MLE is clearly the best choice.\n\nFigure 3: Two image datasets: hand rotation and Isomap faces (example images).\n\nTable 1: Estimated dimensions for popular manifold datasets. For the Swiss roll, the table gives mean(SD) over 1000 uniform samples.\n\nDataset      Data dim.   Sample size   MLE         Regression   Corr. dim.\nSwiss roll   3           1000          2.1(0.02)   1.8(0.03)    2.0(0.24)\nFaces        64 × 64     698           4.3         4.0          3.5\nHands        480 × 512   481           3.1         2.5          3.9^1\n\nFinally, we compare the estimators on three popular manifold datasets (Table 1): the Swiss roll, and two image datasets shown in Fig. 3: the Isomap face database^2, and the hand rotation sequence^3 used in [14]. For the Swiss roll, the MLE again provides the best combination of bias and variance.\n\nThe face database consists of images of an artificial face under three changing conditions: illumination, and vertical and horizontal orientation. Hence the intrinsic dimension of the dataset should be 3, but only if we had the full 3-d images of the face. All we have, however, are 2-d projections of the face, and it is clear that one needs more than one \"basis\" image to represent different poses (from casual inspection, front view and profile seem sufficient). The estimated dimension of about 4 is therefore very reasonable.\n\nThe hand image data is a real video sequence of a hand rotating along a 1-d curve in space, but again several basis 2-d images are needed to represent different poses (in this case, front, back, and profile seem sufficient). The estimated dimension around 3 therefore seems reasonable. 
We note that the correlation dimension provides two completely different answers for this dataset, depending on which linear part of the curve is used; this is further evidence of its high variance, which makes it a less reliable estimate than the MLE.\n\n^1 This estimate is obtained from the range 500...1000. For this dataset, the correlation dimension curve has two distinct linear parts, with the first part over the range we would normally use, 10...100, producing dimension 19.7, which is clearly unreasonable.\n^2 http://isomap.stanford.edu/datasets.html\n^3 http://vasc.ri.cmu.edu//idb/html/motion/hand/index.html\n\n5 Discussion\n\nIn this paper, we have derived a maximum likelihood estimator of intrinsic dimension and some asymptotic approximations to its bias and variance. We have shown that the MLE produces good results on a range of simulated and real datasets and outperforms two other dimension estimators. It does, however, suffer from a negative bias for high dimensions, which is a problem shared by all dimension estimators. One reason for this is that our approximation is based on sufficiently many observations falling into a small sphere, and that requires very large sample sizes in high dimensions (we shall elaborate and quantify this further elsewhere). For some datasets, such as points in a unit cube, there is also the issue of edge effects, which generally become more severe in high dimensions. One can potentially reduce the negative bias by removing the edge points by some criterion, but we found that the edge effects are small compared to the sample size problem, and we have been unable to achieve significant improvement in this manner. Another option, used by [13], is calibration on simulated datasets with known dimension, but since the bias depends on the sampling distribution, and a different curve would be needed for every sample size, calibration does not solve the problem either. 
One should keep in mind, however, that for most interesting applications intrinsic dimension will not be very high; otherwise there is not much benefit in dimensionality reduction. Hence in practice the MLE will provide a good estimate of dimension most of the time.\n\nReferences\n\n[1] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.\n\n[2] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.\n\n[3] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in NIPS, volume 14. MIT Press, 2002.\n\n[4] D. L. Donoho and C. Grimes. Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data. Technical Report TR 2003-08, Department of Statistics, Stanford University, 2003.\n\n[5] M. Belkin and P. Niyogi. Using manifold structure for partially labelled classification. In Advances in NIPS, volume 15. MIT Press, 2003.\n\n[6] M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas. Non-linear dimensionality reduction techniques for classification and visualization. In Proceedings of 8th SIGKDD, pages 645-651. Edmonton, Canada, 2002.\n\n[7] M. Brand. Charting a manifold. In Advances in NIPS, volume 14. MIT Press, 2002.\n\n[8] K. W. Pettis, T. A. Bailey, A. K. Jain, and R. C. Dubes. An intrinsic dimensionality estimator from near-neighbor information. IEEE Trans. on PAMI, 1:25-37, 1979.\n\n[9] K. Fukunaga and D. R. Olsen. An algorithm for finding intrinsic dimensionality of data. IEEE Trans. on Computers, C-20:176-183, 1971.\n\n[10] J. Bruske and G. Sommer. Intrinsic dimensionality estimation with optimally topology preserving maps. IEEE Trans. on PAMI, 20(5):572-575, 1998.\n\n[11] P. Verveer and R. Duin. An evaluation of intrinsic dimensionality estimators. IEEE Trans. 
on PAMI, 17(1):81-86, 1995.\n\n[12] P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. Physica, D9:189-208, 1983.\n\n[13] F. Camastra and A. Vinciarelli. Estimating the intrinsic dimension of data with a fractal-based approach. IEEE Trans. on PAMI, 24(10):1404-1407, 2002.\n\n[14] B. Kegl. Intrinsic dimension estimation using packing numbers. In Advances in NIPS, volume 14. MIT Press, 2002.\n\n[15] J. Costa and A. O. Hero. Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Trans. on Signal Processing, 2004. To appear.\n\n[16] D. L. Snyder. Random Point Processes. Wiley, New York, 1975.\n", "award": [], "sourceid": 2577, "authors": [{"given_name": "Elizaveta", "family_name": "Levina", "institution": null}, {"given_name": "Peter", "family_name": "Bickel", "institution": null}]}