{"title": "Mapping a Manifold of Perceptual Observations", "book": "Advances in Neural Information Processing Systems", "page_first": 682, "page_last": 688, "abstract": "", "full_text": "Mapping a manifold of perceptual observations \n\nJoshua B. Tenenbaum \n\nDepartment of Brain and Cognitive Sciences \n\nMassachusetts Institute of Technology, Cambridge, MA 02139 \n\njbt@psyche.mit.edu \n\nAbstract \n\nNonlinear dimensionality reduction is formulated here as the problem of trying to \nfind a Euclidean feature-space embedding of a set of observations that preserves \nas closely as possible their intrinsic metric structure - the distances between points \non the observation manifold as measured along geodesic paths. Our isometric \nfeature mapping procedure, or isomap, is able to reliably recover low-dimensional \nnonlinear structure in realistic perceptual data sets, such as a manifold of face \nimages, where conventional global mapping methods find only local minima. \nThe recovered map provides a canonical set of globally meaningful features, \nwhich allows perceptual transformations such as interpolation, extrapolation, and \nanalogy - highly nonlinear transformations in the original observation space - to \nbe computed with simple linear operations in feature space. \n\n1 Introduction \n\nIn psychological or computational research on perceptual categorization, it is generally taken \nfor granted that the perceiver has a priori access to a representation of stimuli in terms of \nsome perceptually meaningful features that can support the relevant classification. However, \nthese features will be related to the raw sensory input (e.g. values of retinal activity or image \npixels) only through a very complex transformation, which must somehow be acquired \nthrough a combination of evolution, development, and learning. Fig. 1 illustrates the feature(cid:173)\ndiscovery problem with an example from visual perception. 
The set of views of a face from all possible viewpoints is an extremely high-dimensional data set when represented as image arrays in a computer or on a retina; for example, 32 x 32 pixel grey-scale images can be thought of as points in a 1,024-dimensional observation space. The perceptually meaningful structure of these images, however, is of much lower dimensionality; all of the images in Fig. 1 lie on a two-dimensional manifold parameterized by viewing angle. A perceptual system that discovers this manifold structure has learned a model of the appearance of this face that will support a wide range of recognition, classification, and imagery tasks (some demonstrated in Fig. 1), despite the absence of any prior physical knowledge about three-dimensional object geometry, surface texture, or illumination conditions. \n\nFigure 1: Isomap recovers a global topographic map of face images varying in two viewing angle parameters, azimuth and elevation. Image interpolation (A), extrapolation (B), and analogy (C) can then be carried out by linear operations in this feature space. \n\nLearning a manifold of perceptual observations is difficult because these observations usually exhibit significant nonlinear structure. Fig. 2A provides a simplified version of this problem. A flat two-dimensional manifold has been nonlinearly embedded in a three-dimensional observation space,1 and must be \"unfolded\" by the learner. For linearly embedded manifolds, 
principal component analysis (PCA) is guaranteed to discover the dimensionality of the manifold and produce a compact representation in the form of an orthonormal basis. However, PCA is completely insensitive to the higher-order, nonlinear structure that characterizes the points in Fig. 2A or the images in Fig. 1. \n\nNonlinear dimensionality reduction - the search for intrinsically low-dimensional structures embedded nonlinearly in high-dimensional observations - has long been a goal of computational learning research. The most familiar nonlinear techniques, such as the self-organizing map (SOM; Kohonen, 1988), the generative topographic mapping (GTM; Bishop, Svensen, & Williams, 1998), or autoencoder neural networks (DeMers & Cottrell, 1993), try to generalize PCA by discovering a single global low-dimensional nonlinear model of the observations. In contrast, local methods (Bregler & Omohundro, 1995; Hinton, Revow, & Dayan, 1995) seek a set of low-dimensional models, usually linear and hence valid only for a limited range of data. When appropriate, a single global model is more revealing and useful than a set of local models. However, local linear methods are in general far more computationally efficient and reliable than global methods. \n\n1 Given by x1 = z1 cos(z1), x2 = z1 sin(z1), x3 = z2, for z1 \u2208 [3\u03c0/2, 9\u03c0/2], z2 \u2208 [0, 15]. \n\nFigure 2: A nonlinearly embedded manifold may create severe local minima for \"top-down\" mapping algorithms. (A) Raw data. (B) Best SOM fit. (C) Best GTM fit. \n\nFor example, despite the visually obvious structure in Fig. 2A, this manifold was not successfully modeled by either of two popular global mapping algorithms, SOM (Fig. 2B) and GTM (Fig. 2C), under a wide range of parameter settings. 
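For concreteness, the embedded manifold defined in footnote 1 can be generated in a few lines. This is a sketch in NumPy (the variable names and the random seed are choices of this illustration; the sample size n = 10^4 follows the experiments reported below):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # sample size used in the paper's experiments

# Latent coordinates of the flat 2-D manifold (see footnote 1).
z1 = rng.uniform(3 * np.pi / 2, 9 * np.pi / 2, size=n)
z2 = rng.uniform(0.0, 15.0, size=n)

# Nonlinear embedding into 3-D observation space (a "swiss roll").
X = np.column_stack([z1 * np.cos(z1), z1 * np.sin(z1), z2])
```

Plotting the three columns of X reproduces the curled sheet of Fig. 2A; the task of the learner is to recover (z1, z2) up to rescaling from X alone.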
Both of these algorithms try to fit a grid of predefined (usually two-dimensional) topology to the data, using greedy optimization techniques that first fit the large-scale (linear) structure of the data, before making small-scale (nonlinear) refinements. The coarse structure of such \"folded\" data sets as Fig. 2A hides their nonlinear structure from greedy optimizers, virtually ensuring that top-down mapping algorithms will become trapped in highly suboptimal solutions. \n\nRather than trying to force a predefined map onto the data manifold, this paper shows how a perceptual system may map a set of observations in a \"bottom-up\" fashion, by first learning the topological structure of the manifold (as in Fig. 3A) and only then learning a metric map of the data (as in Fig. 3C) that respects this topology. The next section describes the goals and steps of the mapping procedure, and subsequent sections demonstrate applications to two challenging learning tasks: recovering a five-dimensional manifold embedded nonlinearly in 50 dimensions, and recovering the manifold of face images depicted in Fig. 1. \n\n2 Isometric feature mapping \n\nWe assume our data lie on an unknown manifold M embedded in a high-dimensional observation space X. Let x(i) denote the coordinates of the ith observation. We seek a mapping f : X \u2192 Y from the observation space X to a low-dimensional Euclidean feature space Y that preserves as well as possible the intrinsic metric structure of the observations, i.e. the distances between observations as measured along geodesic (locally shortest) paths of M. The isometric feature mapping, or isomap, procedure presented below generates an implicit description of the mapping f, in terms of the corresponding feature points y(i) = f(x(i)) for sufficiently many observations x(i). 
Explicit parametric descriptions of f or f^-1 can be found with standard techniques of function approximation (Poggio & Girosi, 1990) that interpolate smoothly between the known corresponding pairs {x(i), y(i)}. \n\nA Euclidean map of the data's intrinsic geometry has several important properties. First, intrinsically similar observations should map to nearby points in feature space, supporting efficient similarity-based classification and informative visualization. Moreover, the geodesic paths of the manifold, which are highly nonlinear in the original observation space, should map onto straight lines in feature space. Then perceptually natural transformations along these paths, such as the interpolation, extrapolation and analogy demonstrated in Figs. 1A-C, may be computed by trivial linear operations in feature space. \n\nFigure 3: The results of the three-step isomap procedure. (A) Discrete representation of manifold in Fig. 2A. (B) Correlation between measured graph distances and true manifold distances. (C) Correspondence of recovered two-dimensional feature points {y1, y2} (circles) with original generating vectors {z1, z2} (line ends). \n\nThe isomap procedure consists of three main steps, each of which might be carried out by more or less sophisticated techniques. The crux of isomap is finding an efficient way to compute the true geodesic distance between observations, given only their Euclidean distances in the high-dimensional observation space. 
Isomap assumes that distance between points in observation space is an accurate measure of manifold distance only locally and must be integrated over paths on the manifold to obtain global distances. As preparation for computing manifold distances, we first construct a discrete representation of the manifold in the form of a topology-preserving network (Fig. 3A). Given this network representation, we then compute the shortest-path distance between any two points in the network using dynamic programming. This polynomial-time computation provides a good approximation to the actual manifold distances (Fig. 3B) without having to search over all possible paths in the network (let alone the infinitely many paths on the unknown manifold!). Finally, from these manifold distances, we construct a global geometry-preserving map of the observations in a low-dimensional Euclidean space, using multidimensional scaling (Fig. 3C). The implementation of this procedure is detailed below. \n\nStep 1: Discrete representation of manifold (Fig. 3A). From the input data of n observations {x(1), ..., x(n)}, we randomly select a subset of r points to serve as the nodes {g(1), ..., g(r)} of the topology-preserving network. We then construct a graph G over these nodes by connecting g(i) and g(j) if and only if there exists at least one x(k) whose two closest nodes (in observation space) are g(i) and g(j) (Martinetz & Schulten, 1994). The resulting graph for the data in Fig. 2A is shown in Fig. 3A (with n = 10^4, r = 10^3). This graph clearly respects the topology of the manifold far better than the best fits with SOM (Fig. 2B) or GTM (Fig. 2C). In the limit of infinite data, the graph thus produced converges to the Delaunay triangulation of the nodes, restricted to the data manifold (Martinetz & Schulten, 1994). In practice, n = 10^4 data points have proven sufficient for all examples we have tried. 
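The Step 1 construction just described can be sketched as follows (a NumPy sketch with a hypothetical function name; a full implementation would draw the nodes as a random subset of the data, as described above):

```python
import numpy as np

def build_topology_graph(X, nodes):
    """Connect nodes g(i), g(j) whenever some data point x(k) has them as
    its two closest nodes in observation space (Martinetz & Schulten, 1994)."""
    r = len(nodes)
    adj = np.zeros((r, r), dtype=bool)
    # Squared Euclidean distance from every data point to every node.
    d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=-1)
    for row in d2:
        i, j = np.argsort(row)[:2]  # the two closest nodes to this point
        adj[i, j] = adj[j, i] = True
    return adj

# Example: nodes drawn as a random subset of the data, r = n/10 as in the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
nodes = X[rng.choice(len(X), size=20, replace=False)]
adj = build_topology_graph(X, nodes)
```

Note that each data point contributes at most one link, which is why r that is too large relative to n leaves the graph missing appropriate links.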
This number may be reduced significantly if we know the dimensionality d of the manifold, but here we assume no a priori information about dimensionality. The choice of r, the number of nodes in G, is the only free parameter in isomap. If r is too small, the shortest-path distances between nodes in G will give a poor approximation to their true manifold distance. If r is too big (relative to n), G will be missing many appropriate links (because each data point x(i) contributes at most one link). In practice, choosing a satisfactory r is not difficult - all three examples presented in this paper use r = n/10, the first value tried. I am currently exploring criteria for selecting the optimal value of r based on statistical arguments and dimensionality considerations. \n\nFigure 4: Given a 5-dimensional manifold embedded nonlinearly in a 50-dimensional space, isomap identifies the intrinsic dimensionality (A), while PCA and MDS alone do not (B). \n\nStep 2: Manifold distance measure (Fig. 3B). We first assign a weight to each link in the graph G, equal to d^X_ij = ||x(i) - x(j)||, the Euclidean distance between nodes i and j in the observation space X. The length of a path in G is defined to be the sum of link weights along that path. We then compute the geodesic distance d^G_ij (i.e. shortest path length) between all pairs of nodes i and j in G, using Floyd's O(r^3) algorithm (Foster, 1995). Initialize d^G_ij = d^X_ij if nodes i and j are connected and \u221e otherwise. Then for each node k, set each d^G_ij = min(d^G_ij, d^G_ik + d^G_kj). Fig. 
3B plots the distances d^G_ij computed between nodes i and j in the graph of Fig. 3A versus their actual manifold distances d^M_ij. Note that the correlation is almost perfect (R > .99), but d^G_ij tends to overestimate d^M_ij by a constant factor due to the discretization introduced by the graph. As the density of observations increases, so does the possible graph resolution. Thus, in the limit of infinite data, the graph-based approximation to manifold distance may be made arbitrarily accurate. \n\nStep 3: Isometric Euclidean embedding (Fig. 3C). We use ordinal multidimensional scaling (MDS; Cox & Cox, 1994; code provided by Brian Ripley), also called \"nonmetric\" MDS, to find a k-dimensional Euclidean embedding that preserves as closely as possible the graph distances d^G_ij. In contrast to classical \"metric\" MDS, which explicitly tries to preserve distances, ordinal MDS tries to preserve only the rank ordering of distances. MDS finds a configuration of k-dimensional feature vectors {y(1), ..., y(r)}, corresponding to the high-dimensional observations {x(1), ..., x(r)}, that minimizes the stress function \n\nS = \\min_{\\hat d_{ij}} \\sqrt{ \\sum_{i<j} (d^Y_{ij} - \\hat d_{ij})^2 / \\sum_{i<j} (d^Y_{ij})^2 }. \\quad (1) \n\nHere d^Y_ij = ||y(i) - y(j)||, the Euclidean distance between feature vectors i and j, and the \\hat d_{ij} are some monotonic transformation of the graph distances d^G_ij. We use ordinal MDS because it is less sensitive to noisy estimates of manifold distance. Moreover, when the number of points scaled is large enough (as it is in all our examples), ordinal constraints alone are sufficient to reconstruct a precise metric map. Fig. 3C shows the projections of 100 random points on the manifold in Fig. 2A onto a two-dimensional feature space computed by MDS from the graph distances output by step 2 above. 
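The Floyd-style shortest-path computation of Step 2 can be sketched as follows (a NumPy sketch; `adj` is the boolean adjacency matrix of G and `W` holds the Euclidean link lengths d^X_ij):

```python
import numpy as np

def graph_distances(adj, W):
    """Floyd's O(r^3) all-pairs shortest paths: initialize d^G_ij = d^X_ij
    for connected pairs (infinity otherwise), then relax through each node k."""
    d = np.where(adj, W, np.inf)
    np.fill_diagonal(d, 0.0)
    for k in range(len(d)):
        # d_ij <- min(d_ij, d_ik + d_kj), vectorized over all pairs (i, j).
        d = np.minimum(d, d[:, k:k+1] + d[k:k+1, :])
    return d

# Example: a 3-node chain 0 - 1 - 2 with link lengths 1 and 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=bool)
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]])
dG = graph_distances(adj, W)  # dG[0, 2] is the path length through node 1
```

In the paper's setting W would be filled with ||x(i) - x(j)|| for the r graph nodes, and the resulting matrix dG is what gets passed to ordinal MDS in Step 3.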
These points are in close correspondence (after rescaling) with the original two-dimensional vectors used to generate the manifold (see note 1), indicating that isomap has successfully unfolded the manifold onto a 2-dimensional Euclidean plane. \n\n3 Example 1: Five-dimensional manifold \n\nThis section demonstrates isomap's ability to discover and model a noisy five-dimensional manifold embedded within a 50-dimensional space. As the dimension of the manifold increases beyond two, SOM, GTM, and other constrained clustering approaches become impractical due to the exponential proliferation of cluster centers. Isomap, however, is quite practical for manifolds of moderate dimensionality, because the estimates of manifold distance for a fixed graph size degrade gracefully as dimensionality increases. Moreover, isomap is able to automatically discover the intrinsic dimensionality of the data, while conventional methods must be initialized with a fixed dimensionality. \n\nWe consider a 5-dimensional manifold parameterized by {z1, ..., z5} \u2208 [0, 4]^5. The first 10 of 50 observation dimensions were determined by nonlinear functions of these parameters.2 Low-amplitude gaussian noise (4-5% of variance) was added to each of these dimensions, and the remaining 40 dimensions were set to pure noise of similar variance. The isomap procedure applied to this data (n = 10^4, r = 10^3) correctly recognized its intrinsic five-dimensionality, as indicated by the sharp decrease of stress (see Eq. 1) for embedding dimensions up to 5 and only gradual decrease thereafter (Fig. 4A). \n\n2 x1 = cos(\u03c0 z1), x2 = sin(\u03c0 z1), x3 = cos(2\u03c0 z1), x4 = sin(2\u03c0 z1), x5 = cos(3\u03c0 z1), x6 = sin(3\u03c0 z1), x7 = z2 cos^2(\u03c0 z1 / 2) + z3 sin^2(\u03c0 z1 / 2), x8 = z2 sin^2(\u03c0 z1 / 2) + z3 cos^2(\u03c0 z1 / 2), x9 = z4 cos^2(\u03c0 z1 / 2) + z5 sin^2(\u03c0 z1 / 2), x10 = z4 sin^2(\u03c0 z1 / 2) + z5 cos^2(\u03c0 z1 / 2). 
In contrast, both PCA and raw MDS (using distances in observation space rather than manifold distances) identify the 10-dimensional linear subspace containing the data, but show no sensitivity to the underlying five-dimensional manifold (Fig. 4B). \n\n4 Example 2: Two-dimensional manifold of face images \n\nThis section illustrates the performance of isomap on the two-dimensional manifold of face images shown in Fig. 1. To generate this map, 32 x 32-pixel images of a face were first rendered in MATLAB in many different poses (azimuth \u2208 [-90\u00b0, 90\u00b0], elevation \u2208 [-10\u00b0, 10\u00b0]), using a 3-D range image of an actual head and a combination of Lambertian and specular reflectance models. To save computation, the data (n = 10^4 images) were first reduced to 60 principal components and then submitted to isomap (r = 10^3). The plot of stress S vs. dimension indicated a dimensionality of two (even more clearly than Fig. 4A). Fig. 1 shows the two-dimensional feature space that results from applying MDS to the computed graph distances, with 25 face images placed at their corresponding points in feature space. Note the clear topographic representation of similar views at nearby feature points. The principal axes of the feature space can be identified as the underlying viewing angle parameters used to generate the data. The correlations of the two isomap dimensions with the two pose angles are R = .99 and R = .95 respectively. No other global mapping procedure tried (PCA, MDS, SOM, GTM) produced interpretable results for these data. \n\nThe human visual system's implicit knowledge of an object's appearance is not limited to a representation of view similarity, and neither is isomap's. As mentioned in Section 2, an isometric feature map also supports analysis and manipulation of data, as a consequence of mapping geodesics of the observation manifold to straight lines in feature space. 
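Because geodesics map to straight lines, these manipulations reduce to vector arithmetic on feature points. A sketch (hypothetical helper names, NumPy assumed):

```python
import numpy as np

def interpolate(y1, y2, steps):
    """Feature points along the straight line from y1 to y2 (cf. Fig. 1A).
    Extrapolation (cf. Fig. 1B) simply continues the line past t = 1."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * y1 + t * y2

def analogy(y1, y2, y3):
    """Apply the transformation (y2 - y1) to a third point y3 (cf. Fig. 1C)."""
    return y3 + (y2 - y1)

# Each synthesized feature point would then be rendered as an image by a
# learned inverse mapping from feature space back to observation space.
y1, y2, y3 = np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([5.0, 5.0])
path = interpolate(y1, y2, 5)
y4 = analogy(y1, y2, y3)
```

The nonlinearity of the corresponding image transformations lives entirely in the learned mappings between observation and feature space; the feature-space operations themselves are linear.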
Having found a number of corresponding pairs {x(i), y(i)} of images x(i) and feature vectors y(i), it is easy to learn an explicit inverse mapping f^-1 : Y \u2192 X from low-dimensional feature space to high-dimensional observation space, using generic smooth interpolation techniques such as generalized radial basis function (GRBF) networks (Poggio & Girosi, 1990). All images in Fig. 1 have been synthesized from such a mapping.3 \n\n3 The map from feature vectors to images was learned by fitting a GRBF net to 1000 corresponding points in both spaces. Each point corresponds to a node in the graph G used to measure manifold distance, so the feature-space distances required to fit the GRBF net are given (approximately) by the graph distances d^G_ij computed in step 2 of isomap. A subset C of m = 300 points were randomly chosen as RBF centers, and the standard deviation of the RBFs was set equal to max_{i,j \u2208 C} d^G_ij / sqrt(2m) (as prescribed by Haykin, 1994). \n\nFigs. 1A-C show how learning this inverse mapping allows interpolation, extrapolation, and analogy to be carried out using only linear operations. We can interpolate between two images x(1) and x(2) by synthesizing a sequence of images along their connecting line in feature space (Fig. 1A). We can extrapolate the transformation (y(2) - y(1)) from one image to another and far beyond, by following the line to the edge of the manifold (Fig. 1B). We can map the transformation between two images x(1) and x(2) onto an analogous transformation of another image x(3), by adding the transformation vector (y(2) - y(1)) to y(3) and synthesizing a new image at the resulting feature coordinates (Fig. 1C). \n\nA number of authors (Bregler & Omohundro, 1995; Saul & Jordan, 1997; Beymer & Poggio, 1995) have previously shown how learning from examples allows sophisticated image manipulations to be carried out efficiently. 
However, these approaches do not support as broad a range of transformations as isomap does, because of their use of only locally valid models and/or the need to compute special-purpose image features such as optical flow. See Tenenbaum (1997) for further discussion, as well as examples of isomap applied to more complex manifolds of visual observations. \n\n5 Conclusions \n\nThe essence of the isomap approach to nonlinear dimensionality reduction lies in the novel problem formulation: to seek a low-dimensional Euclidean embedding of a set of observations that captures their intrinsic similarities, as measured along geodesic paths of the observation manifold. Here I have presented an efficient algorithm for solving this problem and shown that it can discover meaningful feature-space models of manifolds for which conventional \"top-down\" approaches fail. As a direct consequence of mapping geodesics to straight lines in feature space, isomap learns a representation of perceptual observations in which it is easy to perform interpolation and other complex transformations. A negative consequence of this strong problem formulation is that isomap will not be applicable to every data manifold. However, as with the classic technique of PCA, we can state clearly the general class of data for which isomap is appropriate - manifolds with no \"holes\" and no intrinsic curvature - with a guarantee that isomap will succeed on data sets from this class, given enough samples from the manifold. Future work will focus on generalizing this domain of applicability to allow for manifolds with more complex topologies and significant curvature, as would be necessary to model certain perceptual manifolds such as the complete view space of an object. \n\nAcknowledgements \n\nThanks to M. Bernstein, W. Freeman, S. Gilbert, W. Richards, and Y. Weiss for helpful discussions. The author is a Howard Hughes Medical Institute Predoctoral Fellow. 
\n\nReferences \n\nBeymer, D. & Poggio, T. (1995). Representations for visual learning. Science 272, 1905. \n\nBishop, C., Svensen, M., & Williams, C. (1998). GTM: The generative topographic mapping. Neural Computation 10(1). \n\nBregler, C. & Omohundro, S. (1995). Nonlinear image interpolation using manifold learning. NIPS 7. MIT Press. \n\nCox, T. & Cox, M. (1994). Multidimensional scaling. Chapman & Hall. \n\nDeMers, D. & Cottrell, G. (1993). Nonlinear dimensionality reduction. NIPS 5. Morgan Kaufmann. \n\nFoster, I. (1995). Designing and building parallel programs. Addison-Wesley. \n\nHaykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan. \n\nHinton, G., Revow, M., & Dayan, P. (1995). Recognizing handwritten digits using mixtures of linear models. NIPS 7. MIT Press. \n\nKohonen, T. (1988). Self-Organization and Associative Memory. Berlin: Springer. \n\nMartinetz, T. & Schulten, K. (1994). Topology representing networks. Neural Networks 7, 507. \n\nPoggio, T. & Girosi, F. (1990). Networks for approximation and learning. Proc. IEEE 78, 1481. \n\nSaul, L. & Jordan, M. (1997). A variational principle for model-based morphing. NIPS 9. MIT Press. \n\nTenenbaum, J. (1997). Unsupervised learning of appearance manifolds. Manuscript submitted. \n", "award": [], "sourceid": 1332, "authors": [{"given_name": "Joshua", "family_name": "Tenenbaum", "institution": null}]}