{"title": "A Polygonal Line Algorithm for Constructing Principal Curves", "book": "Advances in Neural Information Processing Systems", "page_first": 501, "page_last": 507, "abstract": null, "full_text": "A Polygonal Line Algorithm for Constructing \n\nPrincipal Curves \n\nBalazs Kegl, Adam Krzyzak \nDept. of Computer Science \n\nConcordia University \n\n1450 de Maisonneuve Blvd. W. \nMontreal, Canada H3G IM8 \n\nkegl@cs.concordia.ca \n\nkrzyzak@cs.concordia.ca \n\nTamas Linder \n\nDept. of Mathematics \n\nand Statistics \n\nQueen's University \nKingston, Ontario \nCanada K7L 3N6 \n\nlinder@mast.queensu.ca \n\nKenneth Zeger \n\nDept. of Electrical and \nComputer Engineering \nUniversity of California \n\nSan Diego, La Jolla \n\nCA 92093-0407 \nzeger@ucsd.edu \n\nAbstract \n\nPrincipal curves have been defined as \"self consistent\" smooth curves \nwhich pass through the \"middle\" of a d-dimensional probability distri(cid:173)\nbution or data cloud. Recently, we [1] have offered a new approach by \ndefining principal curves as continuous curves of a given length which \nminimize the expected squared distance between the curve and points of \nthe space randomly chosen according to a given distribution. The new \ndefinition made it possible to carry out a theoretical analysis of learning \nprincipal curves from training data. In this paper we propose a practical \nconstruction based on the new definition. Simulation results demonstrate \nthat the new algorithm compares favorably with previous methods both \nin terms of performance and computational complexity. \n\n1 Introduction \n\nHastie [2] and Hastie and Stuetzle [3] (hereafter HS) generalized the self consistency prop(cid:173)\nerty of principal components and introduced the notion of principal curves. Consider a \nd-dimensional random vector X = (X(I), ... ,X(d)) with finite second moments, and let \nf{t) = (II (t), ... ,!d(t)) be a smooth curve in 1{,d parameterized by t E 1{,. 
For any x ∈ R^d, let t_f(x) denote the parameter value t for which the distance between x and f(t) is minimized. By the HS definition, f(t) is a principal curve if it does not intersect itself and is self-consistent, that is, f(t) = E(X | t_f(X) = t). Intuitively speaking, self-consistency means that each point of f is the average (under the distribution of X) of the points that project there. Based on this defining property, HS developed an algorithm for constructing principal curves for distributions or data sets, and described an application in the Stanford Linear Collider Project [3].

Principal curves have been applied by Banfield and Raftery [4] to identify the outlines of ice floes in satellite images. Their method of clustering about principal curves led to a fully automatic method for identifying ice floes and their outlines. On the theoretical side, Tibshirani [5] introduced a semiparametric model for principal curves and proposed a method for estimating principal curves using the EM algorithm. Recently, Delicado [6] proposed yet another definition based on a property of the first principal components of multivariate normal distributions. Close connections between principal curves and Kohonen's self-organizing maps were pointed out by Mulier and Cherkassky [7]. Self-organizing maps were also used by Der et al. [8] for constructing principal curves.

There is an unsatisfactory aspect of the definition of principal curves in the original HS paper as well as in subsequent works. Although principal curves have been defined to be nonparametric, their existence for a given distribution or probability density is an open question, except for very special cases such as elliptical distributions. This also makes it difficult to theoretically analyze any learning scheme for principal curves.
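To make the HS definition concrete, the projection index t_f(x) can be approximated numerically. The sketch below (ours, purely illustrative; the function name, the grid size, and the half-circle curve are our own choices) searches a dense sampling of a parameterized curve for the closest point:

```python
import numpy as np

# Illustrative approximation of the projection index t_f(x): the parameter t
# at which ||x - f(t)|| is minimal, searched over a dense sampling of the
# curve.  Not part of the paper's algorithm.
def projection_index(x, curve_t, curve_pts):
    d2 = np.sum((curve_pts - x) ** 2, axis=1)  # squared distances to samples
    return curve_t[int(np.argmin(d2))]

# Example curve f(t) = (cos t, sin t) on [0, pi]: a point directly above the
# arc projects to t = pi/2.
t = np.linspace(0.0, np.pi, 2001)
f = np.stack([np.cos(t), np.sin(t)], axis=1)
t_proj = projection_index(np.array([0.0, 2.0]), t, f)
```

Self-consistency can then be probed empirically by averaging the sample points whose projection index falls near each t.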
Recently, we [1] have proposed a new definition of principal curves which resolves this problem. In the new definition, a curve f* is called a principal curve of length L for X if f* minimizes Δ(f) = E[inf_t ||X − f(t)||^2] = E||X − f(t_f(X))||^2, the expected squared distance between X and the curve, over all curves of length less than or equal to L. It was proved in [1] that for any X with finite second moments there always exists a principal curve in the new sense.

A theoretical algorithm has also been developed to estimate principal curves based on a common model in statistical learning theory (e.g., see [9]). Suppose that the distribution of X is concentrated on a closed and bounded convex set K ⊂ R^d, and we are given n training points X_1, ..., X_n drawn independently from the distribution of X. Let S denote the family of curves taking values in K and having length not greater than L. For k ≥ 1 let S_k be the set of polygonal (piecewise linear) curves in K which have k segments and whose lengths do not exceed L. Let

Δ(x, f) = min_t ||x − f(t)||^2    (1)

denote the squared distance between x and f. For any f ∈ S the empirical squared error of f on the training data is the sample average Δ_n(f) = (1/n) Σ_{i=1}^n Δ(X_i, f). Let the theoretical algorithm choose an f_{k,n} ∈ S_k which minimizes the empirical error, i.e., let f_{k,n} = arg min_{f∈S_k} Δ_n(f). It was shown in [1] that if k is chosen to be proportional to n^{1/3}, then the expected squared loss of the empirically optimal polygonal curve with k segments and length at most L converges, as n → ∞, to the squared loss of the principal curve of length L at a rate Δ(f_{k,n}) − Δ(f*) = O(n^{−1/3}).

Although amenable to theoretical analysis, the algorithm in [1] is computationally burdensome to implement. In this paper we develop a suboptimal algorithm for learning principal curves.
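For a polygonal curve, the squared distance (1) and the empirical error Δ_n(f) reduce to point-to-segment computations. A minimal sketch under that reading (function names are ours; the paper only defines the quantities):

```python
import numpy as np

# Squared distance from x to the closed segment [a, b].
def dist2_to_segment(x, a, b):
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.sum((x - (a + t * ab)) ** 2))

# Delta(x, f) = min_t ||x - f(t)||^2 over a polygonal curve given by vertices.
def delta(x, vertices):
    return min(dist2_to_segment(x, vertices[i], vertices[i + 1])
               for i in range(len(vertices) - 1))

# Delta_n(f): the sample average of Delta(x_i, f).
def delta_n(points, vertices):
    return sum(delta(x, vertices) for x in points) / len(points)

verts = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
pts = [np.array([0.5, 0.5]), np.array([2.0, 0.0])]
err = delta_n(pts, verts)   # (0.25 + 1.0) / 2
```

This brute-force scan over all segments is what makes each projection pass O(nk), a count that reappears in the complexity analysis below.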
This practical algorithm produces polygonal curve approximations to the principal curve just as the theoretical method does, but global optimization is replaced by a less complex iterative descent method. We give simulation results and compare our algorithm with previous work. In general, on examples considered by HS the performance of the new algorithm is comparable with the HS algorithm, while it proves to be more robust to changes in the data generating model.

2 A Polygonal Line Algorithm

Given a set of data points X_n = {x_1, ..., x_n} ⊂ R^d, the task of finding the polygonal curve with k segments and length L which minimizes (1/n) Σ_{i=1}^n Δ(x_i, f) is computationally difficult. We propose a suboptimal method with reasonable complexity. The basic idea is to start with a straight line segment f_{1,n} (k = 1) and in each iteration of the algorithm to increase the number of segments by one by adding a new vertex to the polygonal curve f_{k,n} produced by the previous iteration. After adding a new vertex, the positions of all vertices are updated in an inner loop.

Figure 1: The curves f_{k,n} produced by the polygonal line algorithm for n = 100 data points. The data was generated by adding independent Gaussian errors to both coordinates of a point chosen randomly on a half circle. (a) f_{1,n}, (b) f_{2,n}, (c) f_{4,n}, (d) the final f_{k,n} (the output of the algorithm).

Figure 2: The flow chart of the polygonal line algorithm.

The inner loop consists of a projection step and an optimization step.
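The algorithm's starting segment f_{1,n} is fit to the data along its first principal component (the initialization step described in Section 2 below). One way to sketch that fit, with an SVD-based implementation of our own choosing:

```python
import numpy as np

# Sketch of fitting the initial segment f_{1,n}: the shortest segment of the
# first principal-component line that contains all projected data points.
# The SVD route and the function name are ours.
def initial_segment(X):
    mu = X.mean(axis=0)
    u = np.linalg.svd(X - mu, full_matrices=False)[2][0]  # first PC direction
    proj = (X - mu) @ u                                    # 1-D projections
    return np.array([mu + proj.min() * u, mu + proj.max() * u])

X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
seg = initial_segment(X)   # endpoints (0, 0) and (2, 4), in either order
```

The sign of the principal direction returned by the SVD is arbitrary, so the two endpoints may come out in either order; the segment itself is the same.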
In the projection step the data points are partitioned into \"Voronoi regions\" according to which segment or vertex they project onto. In the optimization step the new position of each vertex is determined by minimizing an average squared distance criterion penalized by a measure of the local curvature. These two steps are iterated until convergence is achieved and f_{k,n} is produced. Then a new vertex is added.

The algorithm stops when k exceeds a threshold c(n, Δ_n). This stopping criterion is based on a heuristic complexity measure, determined by the number of segments k, the number of data points n, and the average squared distance Δ_n(f_{k,n}).

THE INITIALIZATION STEP. To obtain f_{1,n}, take the shortest segment of the first principal component line which contains all of the projected data points.

THE PROJECTION STEP. Let f denote a polygonal curve with vertices v_1, ..., v_{k+1} and closed line segments s_1, ..., s_k, such that s_i connects vertices v_i and v_{i+1}. In this step the data set X_n is partitioned into (at most) 2k + 1 disjoint sets V_1, ..., V_{k+1} and S_1, ..., S_k, the Voronoi regions of the vertices and segments of f, in the following manner. For any x ∈ R^d let Δ(x, s_i) be the squared distance from x to s_i (see definition (1)), and let Δ(x, v_i) = ||x − v_i||^2. Then let

V_i = {x ∈ X_n : Δ(x, v_i) = Δ(x, f), Δ(x, v_i) < Δ(x, v_m), m = 1, ..., i − 1}.

Upon setting V = ∪_{i=1}^{k+1} V_i, the S_i sets are defined by

S_i = {x ∈ X_n : x ∉ V, Δ(x, s_i) = Δ(x, f), Δ(x, s_i) < Δ(x, s_m), m = 1, ..., i − 1}.

The resulting partition is illustrated in Figure 3.

Figure 3: The Voronoi partition induced by the vertices and segments of f.

THE VERTEX OPTIMIZATION STEP. In this step we iterate over the vertices, and relocate each vertex while all the others are kept fixed.
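The projection step's partition can be sketched directly from its defining sets: a point belongs to the region of its nearest vertex when that vertex attains Δ(x, f) (ties between equally close vertices or segments resolve toward the lower index, mirroring the strict inequalities over m < i). The brute-force scan below is our simplification:

```python
import numpy as np

def dist2_seg(x, a, b):
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.sum((x - (a + t * ab)) ** 2))

# Partition the data into vertex regions V_i and segment regions S_i.
def voronoi_partition(X, v):
    k = len(v) - 1
    V = [[] for _ in range(k + 1)]
    S = [[] for _ in range(k)]
    for x in X:
        dv = [float(np.sum((x - vi) ** 2)) for vi in v]
        ds = [dist2_seg(x, v[i], v[i + 1]) for i in range(k)]
        best = min(dv + ds)
        if min(dv) <= best:                 # ties go to the nearest vertex
            V[int(np.argmin(dv))].append(x)  # argmin breaks ties low, as in V_i
        else:
            S[int(np.argmin(ds))].append(x)
    return V, S

v = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([2.0, 0.0])]
X = [np.array([-1.0, 0.0]), np.array([0.5, 1.0])]
V, S = voronoi_partition(X, v)
```

Here (-1, 0) is equally close to v_1 and to segment s_1, so it lands in V_1; (0.5, 1) projects to the interior of s_1 and lands in S_1.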
For each vertex, we minimize Δ_n(v_i) + λ_p P(v_i), a local average squared distance criterion penalized by a measure of the local curvature, using a gradient (steepest descent) method.

The local measure of the average squared distance is calculated from the data points which project to v_i or to the line segment(s) starting at v_i (see the Projection Step). Accordingly, let O^+(v_i) = Σ_{x∈S_i} Δ(x, s_i), O^−(v_i) = Σ_{x∈S_{i−1}} Δ(x, s_{i−1}), and V(v_i) = Σ_{x∈V_i} Δ(x, v_i). Now define the local average squared distance as a function of v_i by

Δ_n(v_i) =
  (V(v_i) + O^+(v_i)) / (|V_i| + |S_i|)                           if i = 1
  (O^−(v_i) + V(v_i) + O^+(v_i)) / (|S_{i−1}| + |V_i| + |S_i|)    if 1 < i < k + 1    (2)
  (O^−(v_i) + V(v_i)) / (|S_{i−1}| + |V_i|)                       if i = k + 1.

In the theoretical algorithm the average squared distance Δ_n(f) is minimized subject to the constraint that f is a polygonal curve with k segments and length not exceeding L. One could use a Lagrangian formulation and attempt to find a new position for v_i (while all other vertices are fixed) such that the penalized squared error Δ_n(f) + λ l(f)^2 is minimized. However, we have observed that this approach is very sensitive to the choice of λ, and reproduces the estimation bias of the HS algorithm, which flattens the curve at areas of high curvature. So, instead of directly penalizing the lengths of the line segments, we chose to penalize sharp angles to obtain a smooth curve solution. Nonetheless, note that if only one vertex is moved at a time, penalizing sharp angles will indirectly penalize long line segments. At inner vertices v_i, 3 ≤ i ≤ k − 1, we penalize the sum of the cosines of the three angles at vertices v_{i−1}, v_i, and v_{i+1}. The cosine function was picked because of its regular behavior around π, which makes it especially suitable for the steepest descent algorithm. To make the algorithm invariant under scaling, we multiply the cosines by the squared radius of the data, that is, r = (1/2) max_{x,y∈X_n} ||x − y||.
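The behavior of the cosine-based angle term can be seen in a few lines; the sketch below (the function name is ours) evaluates r^2(1 + cos γ_i), which vanishes when the three vertices are collinear (γ_i = π) and grows as the angle sharpens:

```python
import numpy as np

# Angle penalty term r^2 (1 + cos gamma_i), where gamma_i is the angle at v.
# Zero for a straight continuation, maximal (2 r^2) for a full fold-back.
def angle_penalty(v_prev, v, v_next, r):
    a, b = v_prev - v, v_next - v
    cos_g = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return r ** 2 * (1.0 + cos_g)

flat = angle_penalty(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                     np.array([2.0, 0.0]), r=1.0)   # collinear: no penalty
bent = angle_penalty(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                     np.array([1.0, 1.0]), r=1.0)   # right angle: r^2
```

The r^2 factor is what makes the term commensurate with the squared-distance criterion under rescaling of the data.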
At the endpoints and at their immediate neighbors (v_i, i = 1, 2, k, k + 1), where penalizing sharp angles does not translate to penalizing long line segments, the penalty on a nonexistent angle is replaced by a direct penalty on the squared length of the first (or last) segment. Formally, let γ_i denote the angle at vertex v_i, let π(v_i) = r^2(1 + cos γ_i), let μ^+(v_i) = ||v_i − v_{i+1}||^2, and let μ^−(v_i) = ||v_i − v_{i−1}||^2. Then the penalty at vertex v_i is

P(v_i) =
  2μ^+(v_i) + π(v_{i+1})                   if i = 1
  μ^−(v_i) + π(v_i) + π(v_{i+1})           if i = 2
  π(v_{i−1}) + π(v_i) + π(v_{i+1})         if 3 ≤ i ≤ k − 1
  π(v_{i−1}) + π(v_i) + μ^+(v_i)           if i = k
  π(v_{i−1}) + 2μ^−(v_i)                   if i = k + 1.

One important issue is the amount of smoothing required for a given data set. In the HS algorithm one needs to set the penalty coefficient of the spline smoother, or the span of the scatterplot smoother. In our algorithm, the corresponding parameter is the curvature penalty factor λ_p. If some a priori knowledge about the distribution is available, one can use it to determine the smoothing parameter. However, in the absence of such knowledge, the coefficient should be data-dependent. Intuitively, λ_p should increase with the number of segments and the size of the average squared error, and it should decrease with the data size. Based on heuristic considerations and after carrying out practical experiments, we set λ_p = λ'_p n^{−1/3} Δ_n(f_{k,n})^{1/2} r^{−1}, where λ'_p is a parameter of the algorithm, and can be kept fixed for substantially different data sets.

ADDING A NEW VERTEX. We start with the optimized f_{k,n} and choose the segment that has the largest number of data points projecting to it. If more than one such segment exists, we choose the longest one. The midpoint of this segment is selected as the new vertex. Formally, let I = {i : |S_i| ≥ |S_j|, j = 1, ..., k}, and let ℓ = arg max_{i∈I} ||v_i − v_{i+1}||.
Then the new vertex is v_new = (v_ℓ + v_{ℓ+1})/2.

STOPPING CONDITION. According to the theoretical results of [1], the number of segments k should be proportional to n^{1/3} to achieve the O(n^{−1/3}) convergence rate for the expected squared distance. Although the theoretical bounds are not tight enough to determine the optimal number of segments for a given data size, we found that k ~ n^{1/3} also works in practice. To achieve robustness we need to make k sensitive to the average squared distance. The stopping condition blends these two considerations. The algorithm stops when k exceeds c(n, Δ_n(f_{k,n})) = λ_k n^{1/3} Δ_n(f_{k,n})^{−1/2} r.

COMPUTATIONAL COMPLEXITY. The complexity of the inner loop is dominated by the complexity of the projection step, which is O(nk). Increasing the number of segments by one at a time (as described in Section 2), and using the stopping condition of Section 2, the computational complexity of the algorithm becomes O(n^{5/3}). This is slightly better than the O(n^2) complexity of the HS algorithm. The complexity can be dramatically decreased if, instead of adding only one vertex, a new vertex is placed at the midpoint of every segment, giving O(n^{4/3} log n), or if k is set to be a constant, giving O(n). These simplifications work well in certain situations, but the original algorithm is more robust.

3 Experimental Results

We have extensively tested our algorithm on two-dimensional data sets. In most experiments the data was generated by a commonly used (see, e.g., [3], [5], [7]) additive model X = Y + e, where Y is uniformly distributed on a smooth planar curve (hereafter called the generating curve) and e is bivariate additive noise which is independent of Y.
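The additive model X = Y + e is easy to instantiate; the sketch below draws from it for a half-circle generating curve with per-coordinate noise variance sigma^2 (the function name and rng handling are ours, and sigma = 0.2 corresponds to the variance 0.04 used in the experiments):

```python
import numpy as np

# Sample n points from X = Y + e: Y uniform on the unit half circle,
# e zero-mean bivariate Gaussian with standard deviation sigma per coordinate.
def half_circle_sample(n, sigma, rng):
    t = rng.uniform(0.0, np.pi, n)                 # uniform arc length
    Y = np.stack([np.cos(t), np.sin(t)], axis=1)   # points on the curve
    return Y + rng.normal(0.0, sigma, (n, 2))      # independent noise

X = half_circle_sample(1000, 0.2, np.random.default_rng(0))
```

Because the circle has constant speed parameterization, uniform t gives Y uniform along the curve, as the model requires.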
Since the \"true\" principal curve is not known (note that the generating curve in the model X = Y + e is in general not a principal curve either in the HS sense or in our definition), it is hard to give an objective measure of performance. For this reason, in what follows, the performance is judged subjectively, mainly on the basis of how closely the resulting curve follows the shape of the generating curve.

In general, in simulation examples considered by HS the performance of the new algorithm is comparable with the HS algorithm. Due to the data-dependence of the curvature penalty factor and the stopping condition, our algorithm turns out to be more robust to alterations in the data generating model, as well as to changes in the parameters of the particular model.

We use varying generating shapes, noise parameters, and data sizes to demonstrate the robustness of the polygonal line algorithm. All of the plots in Figure 4 show the generating curve (Generator Curve), the curve produced by our polygonal line algorithm (Principal Curve), and the curve produced by the HS algorithm with spline smoothing (HS Principal Curve), which we have found to perform better than the HS algorithm using scatterplot smoothing. For closed generating curves we also include the curve produced by the Banfield and Raftery (BR) algorithm [4], which extends the HS algorithm to closed curves (BR Principal Curve). The two coefficients of the polygonal line algorithm are set in all experiments to the constant values λ_k = 0.3 and λ'_p = 0.1. All plots have been normalized to fit in a 2 x 2 square. The parameters given below refer to values before this normalization.
[Figure 4: six scatter plots, each showing the sample points, the generating curve, and the principal curves produced by the algorithms.]

Figure 4: (a) The Circle Example: the BR and the polygonal line algorithm show less bias than the HS algorithm.
(b) The Half Circle Example: the HS and the polygonal line algorithms produce similar curves. (c) and (d) Transformed Data Sets: the polygonal line algorithm still follows fairly closely the \"distorted\" shapes. (e) Small Noise Variance and (f) Large Sample Size: the curves produced by the polygonal line algorithm are nearly indistinguishable from the generating curves.

In Figure 4(a) the generating curve is a circle of radius r = 1, and e = (e_1, e_2) is a zero-mean bivariate uncorrelated Gaussian with variance E(e_i^2) = 0.04, i = 1, 2. The performance of the three algorithms (HS, BR, and the polygonal line algorithm) is comparable, although the HS algorithm exhibits more bias than the other two. Note that the BR algorithm [4] has been tailored to fit closed curves and to reduce the estimation bias. In Figure 4(b), only half of the circle is used as a generating curve and the other parameters remain the same. Here, too, both the HS and our algorithm behave similarly.

When we depart from these usual settings the polygonal line algorithm exhibits better behavior than the HS algorithm. In Figure 4(c) the data set of Figure 4(b) was linearly transformed. In Figure 4(d) a different linear transformation was applied to a data set generated by an S-shaped generating curve, consisting of two half circles of unit radii, to which the same Gaussian noise was added as in Figure 4(b). In both cases the polygonal line algorithm produces curves that fit the generator curve more closely. This is especially noticeable in Figure 4(c), where the HS principal curve fails to follow the shape of the distorted half circle.

There are two situations in which we expect our algorithm to perform particularly well.
If the distribution is concentrated on a curve, then according to both the HS and our definitions the principal curve is the generating curve itself. Thus, if the noise variance is small, we expect both algorithms to approximate the generating curve very closely. The data in Figure 4(e) was generated using the same additive Gaussian model as in Figure 4(a), but the noise variance was reduced to E(e_i^2) = 0.001 for i = 1, 2. In this case we found that the polygonal line algorithm outperformed both the HS and the BR algorithms.

The second case is when the sample size is large. Although the generating curve is not necessarily the principal curve of the distribution, it is natural to expect the algorithm to approximate the generating curve well as the sample size grows. Such a case is shown in Figure 4(f), where n = 10000 data points were generated (but only a small subset of these was actually plotted). Here the polygonal line algorithm approximates the generating curve with much better accuracy than the HS algorithm.

The Java implementation of the algorithm is available at the WWW site http://www.cs.concordia.ca/-grad/kegl/pcurvedemo.html

4 Conclusion

We offered a new definition of principal curves and presented a practical algorithm for constructing principal curves for data sets. One significant difference between our method and previous principal curve algorithms ([3], [4], and [8]) is that, motivated by the new definition, our algorithm minimizes a distance criterion (2) between the data points and the polygonal curve rather than a distance criterion between the data points and the vertices of the polygonal curve. This, together with the introduction of the data-dependent smoothing factor λ_p, made our algorithm more robust to variations in the data distribution, while keeping the computational complexity low.
Acknowledgments

This work was supported in part by NSERC grant OGPOOO270, Canadian National Networks of Centers of Excellence grant 293, and the National Science Foundation.

References

[1] B. Kegl, A. Krzyzak, T. Linder, and K. Zeger, \"Principal curves: learning and convergence,\" in Proceedings of the IEEE Int. Symp. on Information Theory, p. 387, 1998.
[2] T. Hastie, Principal Curves and Surfaces. PhD thesis, Stanford University, 1984.
[3] T. Hastie and W. Stuetzle, \"Principal curves,\" Journal of the American Statistical Association, vol. 84, no. 406, pp. 502-516, 1989.
[4] J. D. Banfield and A. E. Raftery, \"Ice floe identification in satellite images using mathematical morphology and clustering about principal curves,\" Journal of the American Statistical Association, vol. 87, no. 417, pp. 7-16, 1992.
[5] R. Tibshirani, \"Principal curves revisited,\" Statistics and Computing, vol. 2, pp. 183-190, 1992.
[6] P. Delicado, \"Principal curves and principal oriented points,\" Tech. Rep. 309, Departament d'Economia i Empresa, Universitat Pompeu Fabra, 1998. http://www.econ.upf.es/deehome/what/wpapers/postscripts/309.pdf.
[7] F. Mulier and V. Cherkassky, \"Self-organization as an iterative kernel smoothing process,\" Neural Computation, vol. 7, pp. 1165-1177, 1995.
[8] R. Der, U. Steinmetz, and G. Balzuweit, \"Nonlinear principal component analysis,\" tech. rep., Institut für Informatik, Universität Leipzig, 1998. http://www.informatik.uni-leipzig.de/~der/Veroeff/npcafin.ps.
[9] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
\n\n\f", "award": [], "sourceid": 1627, "authors": [{"given_name": "Bal\u00e1zs", "family_name": "K\u00e9gl", "institution": null}, {"given_name": "Adam", "family_name": "Krzyzak", "institution": null}, {"given_name": "Tam\u00e1s", "family_name": "Linder", "institution": null}, {"given_name": "Kenneth", "family_name": "Zeger", "institution": null}]}