{"title": "Artefactual Structure from Least-Squares Multidimensional Scaling", "book": "Advances in Neural Information Processing Systems", "page_first": 937, "page_last": 944, "abstract": null, "full_text": "Artefactual Structure from Least Squares\n\nMultidimensional Scaling\n\nDepartment of Engineering Science\n\nNeural Computing Research Group\n\nNicholas P. Hughes\n\nUniversity of Oxford\nOxford, 0X1 3PJ, UK\nnph@robots.ox.ac.uk\n\nDavid Lowe\n\nAston University\n\nBirmingham, B4 7ET, UK\n\nd.lowe@aston.ac.uk\n\nAbstract\n\nWe consider the problem of illusory or artefactual structure from the vi-\nsualisation of high-dimensional structureless data. In particular we ex-\namine the role of the distance metric in the use of topographic mappings\nbased on the statistical \ufb01eld of multidimensional scaling. We show that\nthe use of a squared Euclidean metric (i.e. the SS TRESS measure) gives\nrise to an annular structure when the input data is drawn from a high-\ndimensional isotropic distribution, and we provide a theoretical justi\ufb01ca-\ntion for this observation.\n\n1 Introduction\n\nThe discovery of meaningful patterns and relationships from large amounts of multivariate\ndata is a signi\ufb01cant and challenging problem with close ties to the \ufb01elds of pattern recog-\nnition and machine learning, and important applications in the areas of data mining and\nknowledge discovery in databases (KDD).\n\nFor many real-world high-dimensional data sets (such as collections of images, or multi-\nchannel recordings of biomedical signals) there will generally be strong correlations be-\ntween neighbouring observations, and thus we expect that the data will lie on a lower\ndimensional (possibly nonlinear) manifold embedded in the original data space. One ap-\nproach to the aforementioned problem then is to \ufb01nd a faithful1 representation of the data in\na lower dimensional space. 
Typically this space is chosen to be two- or three-dimensional, thus facilitating the visualisation and exploratory analysis of the intrinsic low-dimensional structure in the data (which would otherwise be masked by the dimensionality of the data space).\n\nIn this context then, an effective dimensionality reduction algorithm should seek to extract the underlying relationships in the data with minimum loss of information. Conversely, any interesting patterns which are present in the visualisation space should be representative of similar patterns in the original data space, and not artefacts of the dimensionality reduction process.\n\n1By \u201cfaithful\u201d we mean that the underlying geometric structure in the data space, which characterises the informative relationships in the data, is preserved in the visualisation space.\n\nAlthough much effort has been focused on the former problem of optimal structure elucidation (see [7, 10] for recent approaches to dimensionality reduction), comparatively little work has been undertaken on the latter (and equally important) problem of artefactual structure. This shortcoming was recently highlighted in a controversial example of the application of visualisation techniques to neuroanatomical connectivity data derived from the primate visual cortex [12, 9, 13, 3].\n\nIn this paper we attempt to redress the balance by considering the visualisation of high-dimensional structureless data through the use of topographic mappings based on the statistical field of multidimensional scaling (MDS). This is an important class of mappings which have recently been brought into the neural network domain [5], and have significant connections to modern kernel-based algorithms such as kernel PCA [11].\n\nThe organisation of the remainder of this paper is as follows: In section 2 we introduce the technique of multidimensional scaling and relate this to the field of topographic mappings. 
In section 3 we show how under certain conditions such mappings can give rise to artefactual structure. A theoretical analysis of this effect is then presented in section 4.\n\n2 Multidimensional Scaling and Topographic Mappings\n\nThe visualisation of experimental data which is characterised by pairwise proximity values is a common problem in areas such as psychology, molecular biology and linguistics. Multidimensional scaling (MDS) is a statistical technique which can be used to construct a spatial configuration of points in a (typically) two- or three-dimensional space given a matrix of pairwise proximity values between N objects. The proximity matrix provides a measure of the similarity or dissimilarity between the objects, and the geometric layout of the resulting MDS configuration reflects the relationships between the objects as defined by this matrix. In this way the information contained within the proximity matrix can be captured by a more succinct spatial model which aids visualisation of the data and improves understanding of the processes that generated it.\n\nIn many situations, the raw dissimilarities will not be representative of actual inter-point distances between the objects, and thus will not be suitable for embedding in a low-dimensional space. In this case the dissimilarities delta_ij can be transformed into a set of values more suitable for embedding through the use of an appropriate transformation:\n\n    dhat_ij = f(delta_ij)\n\nwhere f represents the transformation function and the dhat_ij are the resulting transformed dissimilarities (which are termed \u201cdisparities\u201d). 
The aim of metric MDS then is that the transformed dissimilarities dhat_ij should correspond as closely as possible to the inter-point distances d_ij in the resulting configuration2.\n\nMetric MDS can be formulated as a continuous optimisation problem through the definition of an appropriate error function. In particular, least squares scaling algorithms directly seek to minimise the sum-of-squares error between the disparities and the inter-point distances. This error, or STRESS3 measure, is given by:\n\n    STRESS = (1 / C) sum_{i<j} w_ij (dhat_ij - d_ij)^2    (1)\n\nwhere the term C = sum_{i<j} dhat_ij^2 is a normalising constant which reduces the sensitivity of the measure to the number of points and the scaling of the disparities, and the w_ij are the weighting factors. \n\n2This is in contrast to nonmetric MDS which requires that only the ordering of the disparities corresponds to the ordering of the inter-point distances (and thus that the disparities are some arbitrary monotonically increasing function of the distances).\n\n3STRESS is an acronym for STandard REsidual Sum of Squares.\n\n
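As a concrete illustration, the STRESS measure of equation (1) can be evaluated directly for a candidate configuration. The sketch below is our illustration rather than code from the paper; it assumes unit weights w_ij = 1 and takes the normalising constant to be the sum of squared disparities:

```python
import numpy as np

def dist(Z):
    # Euclidean distance matrix between the rows of Z
    sq = np.sum(Z ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0))

def stress(dhat, Y):
    # Equation (1) with w_ij = 1 and normalising constant
    # C = sum of squared disparities over the pairs i < j.
    d = dist(Y)
    i, j = np.triu_indices(len(Y), k=1)
    return np.sum((dhat[i, j] - d[i, j]) ** 2) / np.sum(dhat[i, j] ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 10))   # high-dimensional data
dhat = dist(X)                   # disparities taken as the raw distances
Y = rng.normal(size=(50, 2))     # a random 2-D map configuration
print(stress(dhat, Y))           # positive for an imperfect embedding
print(stress(dist(Y), Y))        # exactly 0 when the distances are matched
```

A configuration that reproduces the disparities exactly attains zero error, which is the sense in which least squares scaling preserves geometric structure.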
It is straightforward to differentiate this STRESS measure with respect to the configuration points y_i and minimise the error through the use of standard nonlinear optimisation techniques.\n\nAn alternative and commonly used error function, which is referred to as SSTRESS, is given by:\n\n    SSTRESS = (1 / C) sum_{i<j} w_ij (dhat_ij^2 - d_ij^2)^2    (2)\n\nwhich represents the sum-of-squares error between squared disparities and squared distances. The primary advantage of the SSTRESS measure is that it can be efficiently minimised through the use of an alternating least squares procedure4 [1].\n\nClosely related to the field of metric MDS is Sammon\u2019s mapping [8], which takes as its input a set of high-dimensional vectors and seeks to produce a set of lower dimensional vectors such that the following error measure is minimised:\n\n    E_SAMMON = (1 / sum_{i<j} dstar_ij) sum_{i<j} (dstar_ij - d_ij)^2 / dstar_ij    (3)\n\nwhere the dstar_ij are the inter-point Euclidean distances in the data space: dstar_ij = ||x_i - x_j||, and the d_ij are the corresponding inter-point Euclidean distances in the feature or map space: d_ij = ||y_i - y_j||. Ignoring the normalising constant, Sammon\u2019s mapping is thus equivalent to least squares metric MDS with the disparities taken to be the raw inter-point distances in the data space and the weighting factors given by w_ij = 1 / dstar_ij. 
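The stated equivalence between Sammon mapping and weighted least squares metric MDS can be checked numerically. The following sketch is ours (the variable names are our own); both forms use the Sammon normalising constant, the sum of the data-space distances:

```python
import numpy as np

def dist(Z):
    # Euclidean distance matrix between the rows of Z
    sq = np.sum(Z ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0))

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 10))   # data-space vectors
Y = rng.normal(size=(60, 2))     # map-space vectors
i, j = np.triu_indices(60, k=1)
dstar, d = dist(X)[i, j], dist(Y)[i, j]

# Equation (3): the Sammon error computed directly.
sammon = np.sum((dstar - d) ** 2 / dstar) / np.sum(dstar)

# Weighted least squares metric MDS with disparities dhat = dstar
# and weighting factors w_ij = 1 / dstar_ij.
dhat, w = dstar, 1.0 / dstar
mds = np.sum(w * (dhat - d) ** 2) / np.sum(dstar)

print(abs(sammon - mds))         # agreement up to round-off
```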
Lowe (1993) termed such a mapping based on the minimisation of an error measure of the form sum_{i<j} (dstar_ij - d_ij)^2 a topographic mapping, since this constraint \u201coptimally preserves the geometric structure in the data\u201d [5].\n\nInterestingly the choice of the STRESS or SSTRESS measure in MDS has a more natural interpretation when viewed within the framework of Sammon\u2019s mapping. In particular, STRESS corresponds to the use of the standard Euclidean distance metric whereas SSTRESS corresponds to the use of the squared Euclidean distance metric. In the next section we show that this choice of metric can lead to markedly different results when the input data is sampled from a high-dimensional isotropic distribution.\n\n3 Emergence of Artefactual Structure\n\nIn order to investigate the problem of artefactual structure we consider the visualisation of high-dimensional structureless data (where we use the term \u201cstructureless\u201d to indicate that the data density is equal in all directions from the mean and varies only gradually in any direction). Such data can be generated by sampling from an isotropic distribution (such as a spherical Gaussian), which is characterised by a covariance matrix that is proportional to the identity matrix, and a skewness of zero.\n\nWe created four structureless data sets by randomly sampling 1000 i.i.d. points from unit hypercubes of dimensions D = 5, 10, 30 and 100. 
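One of these data sets can be generated and its structurelessness verified directly. The short sketch below is ours; it checks the two properties quoted above, a sample covariance close to (1/12) times the identity and per-dimension skewness close to zero:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 30
X = rng.uniform(size=(1000, D))     # one of the four structureless data sets

C = np.cov(X, rowvar=False)         # sample covariance, expected ~ (1/12) I
off_diag = C - np.diag(np.diag(C))
print(np.diag(C).mean())            # per-dimension variance, near 1/12 ~ 0.0833
print(np.abs(off_diag).max())       # cross-covariances, near zero

Xc = X - X.mean(axis=0)
skew = (Xc ** 3).mean(axis=0) / (Xc ** 2).mean(axis=0) ** 1.5
print(np.abs(skew).max())           # per-dimension skewness, near zero
```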
For each data set, we generated a pair of 2-D configurations by minimising5 STRESS and SSTRESS error measures of the form sum_{i<j} (dstar_ij - d_ij)^2 and sum_{i<j} (dstar_ij^2 - d_ij^2)^2 respectively. \n\n4The SSTRESS measure now forms the basis of the ALSCAL implementation of MDS, which is included as part of the SPSS software package for statistical data analysis.\n\nFigure 1: Final map configurations produced by STRESS mappings of data uniformly randomly distributed in unit hypercubes of dimension D. Panels: (a) D = 5, (b) D = 10, (c) D = 30, (d) D = 100.\n\n
The process was repeated fifty times (for each individual error function and data set) using different initial configurations of the map points, and the configuration with the lowest final error was retained.\n\nAs previously noted, the choice of the STRESS or SSTRESS error measure is best viewed as a choice of distance metric, where STRESS corresponds to the standard Euclidean metric and SSTRESS corresponds to the squared Euclidean metric. Figure 1 shows the resulting configurations from the STRESS mappings. It is clear that each configuration has captured the isotropic nature of the associated data set, and there are no spurious patterns or clusters evident in the final visualisation plots.\n\nFigure 2: Final map configurations produced by SSTRESS mappings of data uniformly randomly distributed in unit hypercubes of dimension D. Panels: (a) D = 5, (b) D = 10, (c) D = 30, (d) D = 100.\n\nFigure 2 shows the resulting configurations from the SSTRESS mappings. The configurations exhibit significant artefactual structure, which is characterised by a tendency for the map points to cluster in a circular fashion. 
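This clustering is straightforward to reproduce numerically. The sketch below is our illustration rather than the original experiment (which used 1000 points and a conjugate gradients optimiser); it minimises the SSTRESS sum for a smaller sample with L-BFGS, using the analytic gradient, and then measures how tightly the map-point radii concentrate:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, D = 100, 100
X = rng.uniform(size=(N, D))                # structureless data in the unit hypercube
sq = np.sum(X ** 2, axis=1)
Dstar2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared data-space distances

def sstress_and_grad(y_flat):
    # Raw SSTRESS sum over all ordered pairs (each pair counted
    # twice, hence the factor 0.5), with its analytic gradient.
    Y = y_flat.reshape(N, 2)
    s = np.sum(Y ** 2, axis=1)
    D2 = s[:, None] + s[None, :] - 2.0 * Y @ Y.T     # squared map-space distances
    diff = D2 - Dstar2
    err = 0.5 * np.sum(diff ** 2)
    grad = 4.0 * (diff.sum(axis=1)[:, None] * Y - diff @ Y)
    return err, grad.ravel()

y0 = rng.normal(scale=0.1, size=N * 2)
res = minimize(sstress_and_grad, y0, jac=True, method='L-BFGS-B',
               options={'maxiter': 500})
Y = res.x.reshape(N, 2) - res.x.reshape(N, 2).mean(axis=0)
r = np.sqrt(np.sum(Y ** 2, axis=1))          # radii of the map points
print(r.std() / r.mean())                    # a small value indicates a ring
```

The spread of the radii relative to their mean is far smaller than for, say, a uniform disc, which is the annular signature visible in the SSTRESS plots.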
Furthermore, the degree of clustering increases with increasing dimensionality of the data space D (and is clearly evident for D as low as 10).\n\nAlthough the tendency for SSTRESS configurations to cluster in a circular fashion has been noted in the MDS literature [2], the connection between artefactual structure and the choice of distance metric has not been made. Indeed, in the next section we show analytically that the use of the squared Euclidean metric leads to a globally optimal solution corresponding to an annular structure.\n\nTo date, the most significant work on this problem is that of Klock and Buhmann [4], who proposed a novel transformation of the dissimilarities (i.e. the squared inter-point distances in the data space) such that \u201cthe final disparities are more suitable for Euclidean embedding\u201d. However this transformation assumes that the input data are drawn from a spherical Gaussian distribution6, which is inappropriate for most real-world data sets of interest.\n\n5We used a conjugate gradients optimisation algorithm.\n\n4 Theoretical Analysis of Artefactual Structure\n\nIn this section we present a theoretical analysis of the artefactual structure problem. A q-dimensional map configuration is considered to be the result of an SSTRESS mapping of a data set of N i.i.d. 
points drawn from a D-dimensional isotropic distribution (where D >> q). The set of data points is given by the N x D matrix X = (x_1, ..., x_N)^T and similarly the set of map points is given by the N x q matrix Y = (y_1, ..., y_N)^T.\n\n6In this case the squared inter-point distances will follow a chi-squared distribution.\n\nWe begin by defining the derivative of the SSTRESS error measure E = sum_{i<j} (dstar_ij^2 - d_ij^2)^2 with respect to a particular map vector y_i:\n\n    dE/dy_i = 4 sum_j (d_ij^2 - dstar_ij^2) (y_i - y_j)    (4)\n\nThe inter-point distances d_ij and dstar_ij are given by:\n\n    d_ij^2 = (y_i - y_j)^T (y_i - y_j) = y_i^T y_i - 2 y_i^T y_j + y_j^T y_j\n    dstar_ij^2 = (x_i - x_j)^T (x_i - x_j) = x_i^T x_i - 2 x_i^T x_j + x_j^T x_j\n\nEquation (4) can therefore be expanded to:\n\n    dE/dy_i = 4 sum_j [ y_i^T y_i - 2 y_i^T y_j + y_j^T y_j - x_i^T x_i + 2 x_i^T x_j - x_j^T x_j ] (y_i - y_j)\n\nThus at a stationary point of the error (i.e. dE/dy_i = 0), we have:\n\n    sum_j [ y_i^T y_i - 2 y_i^T y_j + y_j^T y_j - x_i^T x_i + 2 x_i^T x_j - x_j^T x_j ] (y_i - y_j) = 0    (5)\n\nSince the error E is a function of the inter-point distances only, we can centre both the data points and the map points on the origin without loss of generality. For large N we have:\n\n    (1/N) sum_j y_j = 0,  (1/N) sum_j y_j y_j^T = Sigma_y,  (1/N) sum_j y_j^T y_j = tr(Sigma_y),\n    (1/N) sum_j x_j x_j^T = Sigma_x,  (1/N) sum_j y_j x_j^T = C\n\nwhere Sigma_y is the covariance matrix of the map vectors, Sigma_x is the covariance matrix of the data vectors, C is the cross-covariance matrix of the map vectors and the data vectors, and tr(.) is the matrix trace operator. The remaining averages, (1/N) sum_j (y_j^T y_j) y_j and (1/N) sum_j (x_j^T x_j) y_j, are third order moments, which are zero for an isotropic distribution [6]. Thus equation (5) reduces to:\n\n    [ y_i^T y_i + tr(Sigma_y) - x_i^T x_i - tr(Sigma_x) ] y_i + 2 Sigma_y y_i - 2 C x_i = 0    (6)\n\nThis represents a general expression for the value of the map vector y_i at a stationary point of the SSTRESS error, regardless of the nature of the input data distribution. However we are interested in the case where the input data is drawn from a high-dimensional isotropic distribution.\n\nIf the data space is isotropic then a stationary point of the error will correspond to a similarly isotropic map space7, so that Sigma_y = sigma_y^2 I and tr(Sigma_y) = q sigma_y^2, where I is the q x q identity matrix and sigma_y^2 and sigma_x^2 are the (per-dimension) variances in the map space and the data space respectively. Finally, consider the cross-covariance term C x_i: an isotropic data distribution defines no preferred directions with which the map vectors can align, and so this term vanishes for large N. For high-dimensional data (i.e. large D) the squared norm x_i^T x_i concentrates about its expectation, and the corresponding term can be simplified to:\n\n    x_i^T x_i ~ E[x^T x] = tr(Sigma_x) = D sigma_x^2    (7)\n\n7This is true regardless of the initial distribution of the map points, although a highly non-uniform initial configuration would take significantly longer to reach a local minimum of the error function.\n\nThus the equation governing the stationary points of the SSTRESS error is given by:\n\n    [ y_i^T y_i + (q + 2) sigma_y^2 - 2 D sigma_x^2 ] y_i = 0\n\nAt the minimum error configuration, we have:\n\n    y_i^T y_i = 2 D sigma_x^2 - (q + 2) sigma_y^2\n\nwhich takes the same value for every map point. Summing over all points i, and noting that (1/N) sum_i y_i^T y_i = tr(Sigma_y) = q sigma_y^2, gives:\n\n    sigma_y^2 = D sigma_x^2 / (q + 1)    (8)\n\nThus, for large D, the variance of the map points in a two-dimensional map space (q = 2) is related to the variance of the data points by a factor of D/3. Table 1 shows the values of the observed and predicted map variances for 1000 data points sampled randomly from uniform distributions in the interval [0, 1] (i.e. sigma_x^2 = 1/12) of dimensions D = 5, 10, 30, and 100. Clearly as the dimension of the data space D increases, so too does the accuracy of the approximation given by equation (7), and therefore the accuracy of equation (8).\n\n    Number of points N | Dimension D | sigma_y^2 observed | sigma_y^2 predicted | Percentage error\n    1000 |   5 | 0.166 | 0.139 | 16.4%\n    1000 |  10 | 0.303 | 0.278 |  8.1%\n    1000 |  30 | 0.864 | 0.835 |  3.4%\n    1000 | 100 | 2.823 | 2.783 |  1.4%\n\nTable 1: A comparison of the predicted and observed map variances.\n\nWe can show that this mismatch in variances in the two spaces results in the map points clustering in a circular fashion by considering the expected squared distance of the map points from the origin (i.e. the expected squared radius r^2 of the annulus):\n\n    E[r^2] = E[y^T y] = tr(Sigma_y) = q sigma_y^2    (9)\n\nIn addition we can derive an analytic expression for the variance of r^2. For simplicity, consider a two-dimensional map space y = (y_1, y_2)^T. Then we have:\n\n    E[r^4] = E[(y_1^2 + y_2^2)^2] = E[y_1^4] + 2 E[y_1^2] E[y_2^2] + E[y_2^4]\n\nwhere the expectation over y_1^2 y_2^2 separates since y_1 and y_2 will be uncorrelated due to the isotropic nature of y. In general for a q-dimensional map space we have that E[r^4] = q E[y_1^4] + q (q - 1) sigma_y^4. Thus the variance of r^2 is given by:\n\n    var(r^2) = E[r^4] - (E[r^2])^2 = q (E[y_1^4] - sigma_y^4)    (10)\n\nSince the minimum error condition above forces y_i^T y_i to take a common value for every map point, this variance is driven towards zero as D increases. Hence for large D the optimal configuration will be an annulus or ring shape, as observed in figure 2.\n\n
5 Conclusions\n\nWe have investigated the problem of artefactual or illusory structure from topographic mappings based upon least squares scaling algorithms from multidimensional scaling. In particular we have shown that the use of a squared Euclidean distance metric (i.e. the SSTRESS measure) gives rise to an annular structure when the input data is drawn from a high-dimensional isotropic distribution. A theoretical analysis of this problem was presented and a simple relationship between the variance of the map and the data points was derived. Finally we showed that this relationship results in an optimal configuration which is characterised by the map points clustering in a circular fashion.\n\nAcknowledgments\n\nWe thank Miguel Carreira-Perpi\u00f1\u00e1n for useful comments on this work.\n\nReferences\n\n[1] T. F. Cox and M. A. A. Cox. Multidimensional scaling. Chapman and Hall, London, 1994.\n\n[2] J. de Leeuw and B. Bettonvil. An upper bound for SSTRESS. Psychometrika, 51:149\u2013153, 1986.\n\n[3] G. 
J. Goodhill, M. W. Simmen, and D. J. Willshaw. An evaluation of the use of multidimensional scaling for understanding brain connectivity. Philosophical Transactions of the Royal Society, Series B, 348:256\u2013280, 1995.\n\n[4] H. Klock and J. M. Buhmann. Multidimensional scaling by deterministic annealing. In M. Pelillo and E. R. Hancock, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, Proc. Int. Workshop EMMCVPR \u201997, Venice, Italy, pages 246\u2013260. Springer Lecture Notes in Computer Science, 1997.\n\n[5] D. Lowe and M. E. Tipping. Neuroscale: Novel topographic feature extraction with radial basis function networks. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, 1997.\n\n[6] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate analysis. Academic Press, 1997.\n\n[7] S. T. Roweis, L. K. Saul, and G. E. Hinton. Global coordination of local linear models. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2002.\n\n[8] J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401\u2013409, 1969.\n\n[9] M. W. Simmen, G. J. Goodhill, and D. J. Willshaw. Scaling and brain connectivity. Nature, 369:448\u2013450, 1994.\n\n[10] J. B. Tenenbaum. Mapping a manifold of perceptual observations. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems 10. Cambridge, MA: MIT Press, 1998.\n\n[11] C. K. Williams. On a connection between kernel PCA and metric multidimensional scaling. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001.\n\n[12] M. P. Young. Objective analysis of the topological organization of the primate cortical visual system. 
Nature, 358:152\u2013155, 1992.\n\n[13] M. P. Young, J. W. Scannell, M. A. O\u2019Neill, C. C. Hilgetag, G. Burns, and C. Blakemore. Non-metric multidimensional scaling in the analysis of neuroanatomical connection data and the organization of the primate cortical visual system. Philosophical Transactions of the Royal Society, Series B, 348:281\u2013308, 1995.", "award": [], "sourceid": 2239, "authors": [{"given_name": "Nicholas", "family_name": "Hughes", "institution": null}, {"given_name": "David", "family_name": "Lowe", "institution": null}]}