Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Matthew Brand
We construct a nonlinear mapping from a high-dimensional sample space to a low-dimensional vector space, effectively recovering a Cartesian coordinate system for the manifold from which the data is sampled. The mapping preserves local geometric relations in the manifold and is pseudo-invertible. We show how to estimate the intrinsic dimensionality of the manifold from samples, decompose the sample data into locally linear low-dimensional patches, merge these patches into a single low-dimensional coordinate system, and compute forward and reverse mappings between the sample and coordinate spaces. The objective functions are convex and their solutions are given in closed form.
1 Nonlinear dimensionality reduction (NLDR) by charting
Charting is the problem of assigning a low-dimensional coordinate system to data points in a high-dimensional sample space. It is presumed that the data lies on or near a low-dimensional manifold embedded in the sample space, and that there exists a 1-to-1 smooth nonlinear transform between the manifold and a low-dimensional vector space. The data-modeler's goal is to estimate smooth continuous mappings between the sample and coordinate spaces. Often this analysis will shed light on the intrinsic variables of the data-generating phenomenon, for example, revealing perceptual or configuration spaces.
Our goal is to find a mapping—expressed as a kernel-based mixture of linear projections—that minimizes information loss about the density and relative locations of sample points. This constraint is expressed in a posterior that combines a standard gaussian mixture model (GMM) likelihood function with a prior that penalizes uncertainty due to inconsistent projections in the mixture. Section 3 develops a special case where this posterior is unimodal and maximizable in closed form, yielding a GMM whose covariances reveal a patchwork of overlapping locally linear subspaces that cover the manifold. Section 4 shows that for this (or any) GMM and a choice of reduced dimension d, there is a unique, closed-form solution for a minimally distorting merger of the subspaces into a d-dimensional coordinate space, as well as a reverse mapping defining the surface of the manifold in the sample space. The intrinsic dimensionality d of the data manifold can be estimated from the growth process of point-to-point distances. In analogy to differential geometry, we call the subspaces "charts" and their merger the "connection." Section 5 considers example problems where these methods are used to untie knots, unroll and untwist sheets, and visualize video data.
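To make the notion of a locally linear patchwork concrete before the formal development, the sketch below fits one d-dimensional chart per anchor point by plain local PCA over an r-ball. This is only an illustrative stand-in (the anchor selection, radius r, and function names are assumptions made here), not the posterior-maximizing GMM fit developed in Section 3.

```python
import numpy as np

def local_charts(Y, anchors, r, d):
    """Fit one d-dimensional linear chart (mean + orthonormal tangent basis)
    to the points inside an r-ball around each anchor point.

    Plain local PCA is used as a stand-in for the GMM-based fit of Section 3;
    Y is an (N, D) array of samples, anchors are row indices into Y."""
    charts = []
    for a in anchors:
        nbrs = Y[np.sum((Y - Y[a]) ** 2, axis=1) <= r ** 2]
        mu = nbrs.mean(axis=0)
        # rows of Vt are the principal directions of the neighborhood
        _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        charts.append((mu, Vt[:d]))  # local coordinates: x = Vt[:d] @ (y - mu)
    return charts
```

Merging such charts into a single consistent d-dimensional coordinate system is the "connection" step, solved in closed form in Section 4.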
1.1 Background
Topology-neutral NLDR algorithms can be divided into those that compute mappings, and those that directly compute low-dimensional embeddings. The field has its roots in mapping algorithms: DeMers and Cottrell [3] proposed using auto-encoding neural networks with a hidden layer "bottleneck," effectively casting dimensionality reduction as a compression problem. Hastie defined principal curves [5] as nonparametric 1D curves that pass through the center of "nearby" data points. A rich literature has grown up around properly regularizing this approach and extending it to surfaces. Smola and colleagues [10] analyzed the NLDR problem in the broader framework of regularized quantization methods.
More recent advances aim for embeddings: Gomes and Mojsilovic [4] treat manifold completion as an anisotropic diffusion problem, iteratively expanding points until they connect to their neighbors. The ISOMAP algorithm [12] represents remote distances as sums of a trusted set of distances between immediate neighbors, then uses multidimensional scaling to compute a low-dimensional embedding that minimally distorts all distances. The locally linear embedding algorithm (LLE) [9] represents each point as a weighted combination of a trusted set of nearest neighbors, then computes a minimally distorting low-dimensional barycentric embedding. They have complementary strengths: ISOMAP handles holes well but can fail if the data hull is nonconvex [12]; and vice versa for LLE [9]. Both offer embeddings without mappings. It has been noted that trusted-set methods are vulnerable to noise because they consider the subset of point-to-point relationships that has the lowest signal-to-noise ratio; small changes to the trusted set can induce large changes in the set of constraints on the embedding, making solutions unstable [1].
In a return to mapping, Roweis and colleagues [8] proposed global coordination—learning a mixture of locally linear projections from sample to coordinate space. They constructed a posterior that penalizes distortions in the mapping, and gave an expectation-maximization (EM) training rule. Innovative use of variational methods highlighted the difficulty of even hill-climbing their multimodal posterior. Like [2, 7, 6, 8], the method we develop below is a decomposition of the manifold into locally linear neighborhoods. It bears closest relation to global coordination [8], although by a different construction of the problem, we avoid hill-climbing a spiky posterior and instead develop a closed-form solution.
2 Estimating locally linear scale and intrinsic dimensionality
We begin with a matrix of sample points $Y \doteq [\mathbf{y}_1, \dots, \mathbf{y}_N]$, $\mathbf{y}_n \in \mathbb{R}^D$, populating a $D$-dimensional sample space, and a conjecture that these points are samples from a manifold $M$ of intrinsic dimensionality $d < D$. We seek a mapping onto a vector space $G(Y) \to X \doteq [\mathbf{x}_1, \dots, \mathbf{x}_N]$, $\mathbf{x}_n \in \mathbb{R}^d$, and a 1-to-1 reverse mapping $G^{-1}(X) \to Y$ such that local relations between nearby points are preserved (this will be formalized below). The map $G$ should be non-catastrophic, that is, without folds: parallel lines on the manifold in $\mathbb{R}^D$ should map to continuous smooth non-intersecting curves in $\mathbb{R}^d$. This guarantees that linear operations on $X$ such as interpolation will have reasonable analogues on $Y$. Smoothness means that at some scale $r$ the mapping from a neighborhood on $M$ to $\mathbb{R}^d$ is effectively linear. Consider a ball of radius $r$ centered on a data point and containing $n(r)$ data points. The count $n(r)$ grows as $r^d$, but only at the locally linear scale; the growth rate is inflated by isotropic noise at smaller scales and by embedding curvature at larger scales. To estimate $r$, we look at how the $r$-ball grows as points are added to it, tracking $c(r) \doteq d\log r / d\log n(r)$. At noise scales, $c(r) \approx 1/D < 1/d$, because noise has distributed points in all directions with equal probability. At the scale at which curvature becomes significant, $c(r) < 1/d$, because the manifold is no longer perpendicular to the surface of the ball, so the ball does not have to grow as fast to accommodate new points. At the locally linear scale, the process peaks at $c(r) = 1/d$, because points are distributed only in the directions of the manifold's local tangent space. The maximum of $c(r)$ therefore gives an estimate of both the scale and the local dimensionality of the manifold (see figure 1), provided that the ball hasn't expanded to a manifold boundary—boundaries have lower dimension than $d$.
[Figure 1: scale behavior of a 1D manifold in 2-space (samples; noise, locally linear, and curvature scales), and the point-count growth process on a 2D manifold in 3-space (radial growth process against the 1D, 2D, and 3D hypotheses).]
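A minimal sketch of this growth-process estimate, under the reading that $c(r)$ is the slope of $\log r$ against $\log n(r)$: neighbor distances are sorted around a few sample centers, the slope is approximated by finite differences, and its peak is read off as the locally linear scale, with dimension $d \approx 1/c$ at the peak. The function names and the averaging over random centers are choices made here for illustration, not specified by the paper.

```python
import numpy as np

def growth_curve(Y, center, max_pts=None):
    """Radii of the smallest balls around Y[center] containing 1, 2, ... neighbors.
    Assumes distinct sample points (no zero distances besides the center itself)."""
    d2 = np.sum((Y - Y[center]) ** 2, axis=1)
    r = np.sqrt(np.sort(d2))[1:]            # drop the zero self-distance
    return r if max_pts is None else r[:max_pts]

def estimate_scale_and_dim(Y, n_centers=50, seed=0):
    """Estimate the locally linear scale r* and intrinsic dimension d from the
    peak of c(r) = d log r / d log n(r), averaged over randomly chosen centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(len(Y), size=min(n_centers, len(Y)), replace=False)
    curves = [growth_curve(Y, i) for i in centers]
    m = min(len(c) for c in curves)
    log_r = np.log(np.stack([c[:m] for c in curves]))    # (centers, m)
    log_n = np.log(np.arange(1, m + 1))                  # ball contains 1..m points
    c = np.diff(log_r, axis=1) / np.diff(log_n)          # finite-difference slopes
    c_mean = c.mean(axis=0)
    k = int(np.argmax(c_mean))                           # peak of the growth exponent
    d_hat = max(1, int(round(1.0 / c_mean[k])))          # peak value is ~ 1/d
    r_star = float(np.exp(log_r[:, k + 1].mean()))       # radius at the peak
    return r_star, d_hat
```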