{"title": "Efficient Out-of-Sample Extension of Dominant-Set Clusters", "book": "Advances in Neural Information Processing Systems", "page_first": 1057, "page_last": 1064, "abstract": null, "full_text": " Efficient Out-of-Sample Extension of\n Dominant-Set Clusters\n\n\n\n Massimiliano Pavan and Marcello Pelillo\n Dipartimento di Informatica, Universit`a Ca' Foscari di Venezia\n Via Torino 155, 30172 Venezia Mestre, Italy\n {pavan,pelillo}@dsi.unive.it\n\n Abstract\n\n Dominant sets are a new graph-theoretic concept that has proven to\n be relevant in pairwise data clustering problems, such as image seg-\n mentation. They generalize the notion of a maximal clique to edge-\n weighted graphs and have intriguing, non-trivial connections to continu-\n ous quadratic optimization and spectral-based grouping. We address the\n problem of grouping out-of-sample examples after the clustering process\n has taken place. This may serve either to drastically reduce the compu-\n tational burden associated to the processing of very large data sets, or\n to efficiently deal with dynamic situations whereby data sets need to be\n updated continually. We show that the very notion of a dominant set of-\n fers a simple and efficient way of doing this. Numerical experiments on\n various grouping problems show the effectiveness of the approach.\n\n\n1 Introduction\n\nProximity-based, or pairwise, data clustering techniques are gaining increasing popular-\nity over traditional central grouping techniques, which are centered around the notion of\n\"feature\" (see, e.g., [3, 12, 13, 11]). In many application domains, in fact, the objects to\nbe clustered are not naturally representable in terms of a vector of features. On the other\nhand, quite often it is possible to obtain a measure of the similarity/dissimilarity between\nobjects. 
Hence, it is natural to map (possibly implicitly) the data to be clustered to the nodes of a weighted graph, with edge weights representing similarity or dissimilarity relations. Although such a representation lacks geometric notions such as scatter and centroid, it is attractive as no feature selection is required, and it keeps the algorithm generic and independent of the actual data representation. Further, it allows one to use non-metric similarities, and it is applicable to problems that do not have a natural embedding in a uniform feature space, such as the grouping of structural or graph-based representations.

We have recently developed a new framework for pairwise data clustering based on a novel graph-theoretic concept, that of a dominant set, which generalizes the notion of a maximal clique to edge-weighted graphs [7, 9]. An intriguing connection between dominant sets and the solutions of a (continuous) quadratic optimization problem makes them related in a non-trivial way to spectral-based cluster notions, and allows one to use straightforward dynamics from evolutionary game theory to determine them [14]. A nice feature of this framework is that it naturally provides a principled measure of a cluster's cohesiveness as well as a measure of a vertex's participation in its assigned group. It also allows one to obtain "soft" partitions of the input data, by allowing a point to belong to more than one cluster. The approach has proven to be a powerful one when applied to problems such as intensity, color, and texture segmentation, or visual database organization, and is competitive with spectral approaches such as normalized cut [7, 8, 9].

However, a typical problem associated with pairwise grouping algorithms in general, and hence with the dominant-set framework in particular, is the scaling behavior with the number of data items. 
On a dataset containing N examples, the number of potential comparisons scales as O(N^2), thereby hindering the applicability of such algorithms to problems involving very large data sets, such as high-resolution imagery and spatio-temporal data. Moreover, in applications such as document classification or visual database organization, one is confronted with a dynamic environment which continually supplies the algorithm with newly produced data that have to be grouped. In such situations, the trivial approach of recomputing the complete cluster structure upon the arrival of any new item is clearly infeasible.

Motivated by the previous arguments, in this paper we address the problem of efficiently assigning out-of-sample, unseen data to one or more previously determined clusters. This may serve either to substantially reduce the computational burden associated with the processing of very large (though static) data sets, by extrapolating the complete grouping solution from a small number of samples, or to deal with dynamic situations whereby data sets need to be updated continually. There is no straightforward way of accomplishing this within the pairwise grouping paradigm, short of recomputing the complete cluster structure. Recent sophisticated attempts to deal with this problem use optimal embeddings [11] and the Nyström method [1, 2]. By contrast, we shall see that the very notion of a dominant set, thanks to its clear combinatorial properties, offers a simple and efficient solution to this problem. The basic idea consists of computing, for any new example, a quantity which measures its degree of cluster membership, and we provide simple approximations which allow us to do this in linear time and space with respect to the cluster size. 
Our classification schema inherits the main features of the dominant-set formulation, i.e., the ability to yield a soft classification of the input data and to provide principled measures of cluster membership and cohesiveness.

Numerical experiments show that the strategy of first grouping a small number of data items and then classifying the out-of-sample instances using our prediction rule is clearly successful, as we are able to obtain essentially the same results as on the dense problem in much less time. We also present results on high-resolution image segmentation problems, a task where the dominant-set framework would otherwise be computationally impractical.

2 Dominant Sets and Their Continuous Characterization

We represent the data to be clustered as an undirected edge-weighted (similarity) graph with no self-loops G = (V, E, w), where V = {1, ..., n} is the vertex set, E \subseteq V \times V is the edge set, and w : E \to \mathbb{R}^+ is the (positive) weight function. Vertices in G correspond to data points, edges represent neighborhood relationships, and edge weights reflect similarity between pairs of linked vertices. As customary, we represent the graph G with the corresponding weighted adjacency (or similarity) matrix, which is the n \times n nonnegative, symmetric matrix A = (a_{ij}) defined as:

    a_{ij} = \begin{cases} w(i, j), & \text{if } (i, j) \in E \\ 0, & \text{otherwise.} \end{cases}

Let S \subseteq V be a non-empty subset of vertices and i \in V. The (average) weighted degree of i w.r.t. S is defined as:

    \mathrm{awdeg}_S(i) = \frac{1}{|S|} \sum_{j \in S} a_{ij}    (1)

where |S| denotes the cardinality of S. Moreover, if j \notin S we define \phi_S(i, j) = a_{ij} - \mathrm{awdeg}_S(i), which is a measure of the similarity between nodes j and i with respect to the average similarity between node i and its neighbors in S.

Figure 1: An example edge-weighted graph. Note that w_{\{1,2,3,4\}}(1) < 0, and this reflects the fact that vertex 1 is loosely coupled to vertices 2, 3 and 4. 
Conversely, w_{\{5,6,7,8\}}(5) > 0, and this reflects the fact that vertex 5 is tightly coupled with vertices 6, 7, and 8.

Let S \subseteq V be a non-empty subset of vertices and i \in S. The weight of i w.r.t. S is

    w_S(i) = \begin{cases} 1, & \text{if } |S| = 1 \\ \sum_{j \in S \setminus \{i\}} \phi_{S \setminus \{i\}}(j, i) \, w_{S \setminus \{i\}}(j), & \text{otherwise} \end{cases}    (2)

while the total weight of S is defined as:

    W(S) = \sum_{i \in S} w_S(i).    (3)

Intuitively, w_S(i) gives us a measure of the overall similarity between vertex i and the vertices of S \setminus \{i\} with respect to the overall similarity among the vertices in S \setminus \{i\}, with positive values indicating high internal coherency (see Fig. 1).

A non-empty subset of vertices S \subseteq V such that W(T) > 0 for any non-empty T \subseteq S is said to be dominant if:

1. w_S(i) > 0, for all i \in S;
2. w_{S \cup \{i\}}(i) < 0, for all i \notin S.

The two previous conditions correspond to the two main properties of a cluster: the first regards internal homogeneity, whereas the second regards external inhomogeneity. The above definition represents our formalization of the concept of a cluster in an edge-weighted graph.

Now, consider the following quadratic program, which is a generalization of the so-called Motzkin-Straus program [5] (here and in the sequel a dot denotes the standard scalar product between vectors):

    maximize f(x) = x \cdot Ax
    subject to x \in \Delta_n    (4)

where

    \Delta_n = \{x \in \mathbb{R}^n : x_i \geq 0 \text{ for all } i \in V \text{ and } e \cdot x = 1\}

is the standard simplex of \mathbb{R}^n, and e is a vector of appropriate length consisting of unit entries (hence e \cdot x = \sum_i x_i). The support of a vector x \in \Delta_n is defined as the set of indices corresponding to its positive components, that is \sigma(x) = \{i \in V : x_i > 0\}. 
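To make the definitions above concrete, here is a minimal Python sketch of Eqs. (1)-(3) and of the two dominant-set conditions. The toy affinity matrix and all function names are our own illustrative choices; note that the direct recursion below takes exponential time in |S| and is written only to mirror the definitions (Section 3 is precisely about avoiding it).

```python
import numpy as np

def awdeg(A, S, i):
    # (Average) weighted degree of i w.r.t. S, Eq. (1).
    return sum(A[i, j] for j in S) / len(S)

def phi(A, S, i, j):
    # phi_S(i, j) = a_ij - awdeg_S(i), defined for j outside S.
    return A[i, j] - awdeg(A, S, i)

def w(A, S, i):
    # Weight of i w.r.t. S, Eq. (2), by direct (exponential-time) recursion.
    if len(S) == 1:
        return 1.0
    R = S - {i}
    return sum(phi(A, R, j, i) * w(A, R, j) for j in R)

def W(A, S):
    # Total weight of S, Eq. (3).
    return sum(w(A, S, i) for i in S)

def is_dominant(A, S, V):
    # Internal homogeneity and external inhomogeneity conditions.
    return all(w(A, S, i) > 0 for i in S) and \
           all(w(A, S | {i}, i) < 0 for i in V - S)

# Tight triangle {0, 1, 2} plus a loosely attached vertex 3.
A = np.array([[0.0, 1.0, 1.0, 0.1],
              [1.0, 0.0, 1.0, 0.1],
              [1.0, 1.0, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.0]])
print(is_dominant(A, {0, 1, 2}, {0, 1, 2, 3}))  # the triangle is dominant
print(w(A, {0, 1, 2, 3}, 3))                    # negative: vertex 3 is loosely coupled
```

On this toy graph the triangle passes both conditions, while w_{\{0,1,2,3\}}(3) < 0, mirroring the situation sketched in Fig. 1.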
The following theorem, proved in [7], establishes an intriguing connection between dominant sets and local solutions of program (4).

Theorem 1 If S is a dominant subset of vertices, then its (weighted) characteristic vector x^S, which is the vector of \Delta_n defined as

    x^S_i = \begin{cases} \frac{w_S(i)}{W(S)}, & \text{if } i \in S \\ 0, & \text{otherwise} \end{cases}    (5)

is a strict local solution of program (4). Conversely, if x is a strict local solution of program (4), then its support S = \sigma(x) is a dominant set, provided that w_{S \cup \{i\}}(i) \neq 0 for all i \notin S.

The condition that w_{S \cup \{i\}}(i) \neq 0 for all i \notin S = \sigma(x) is a technicality due to the presence of "spurious" solutions in (4), which is, at any rate, a non-generic situation.

By virtue of this result, we can find a dominant set by finding a local solution of program (4) with an appropriate continuous optimization technique, such as replicator dynamics from evolutionary game theory [14], and then picking up its support. Note that the components of the weighted characteristic vector give us a natural measure of the participation of the corresponding vertices in the cluster, whereas the value of the objective function measures the cohesiveness of the class. In order to get a partition of the input data into coherent groups, a simple approach is to iteratively find a dominant set and then remove it from the graph, until all vertices have been grouped (see [9] for a hierarchical extension of this framework). On the other hand, by finding all dominant sets, i.e., local solutions of (4), of the original graph, one can obtain a "soft" partition of the dataset, whereby clusters are allowed to overlap. Finally, note that spectral clustering approaches such as, e.g., [10, 12, 13] lead to similar, though intrinsically different, optimization problems.

3 Predicting Cluster Membership for Out-of-Sample Data

Suppose we are given a set V of n unlabeled items and let G = (V, E, w) denote the corresponding similarity graph. 
After determining the dominant sets (i.e., the clusters) for these original data, we are next supplied with a set V' of k new data items, together with all kn pairwise affinities between the old and the new data, and are asked to assign each of them to one or possibly more previously determined clusters. We shall denote by \hat{G} = (\hat{V}, \hat{E}, \hat{w}), with \hat{V} = V \cup V', the similarity graph built upon all the n + k data. Note that in our approach we do not need the k^2 affinities between the new points, which is a nice feature as in most applications k is typically very large. Technically, \hat{G} is a supergraph of G, namely a graph having V \subseteq \hat{V}, E \subseteq \hat{E} and w(i, j) = \hat{w}(i, j) for all (i, j) \in E.

Let S \subseteq V be a subset of vertices which is dominant in the original graph G and let i \in \hat{V} \setminus V be a new data point. As pointed out in the previous section, the sign of w_{S \cup \{i\}}(i) provides an indication as to whether i is tightly or loosely coupled with the vertices in S (the condition w_{S \cup \{i\}}(i) = 0 corresponds to a non-generic boundary situation that does not arise in practice and will therefore be ignored).¹ Accordingly, it is natural to propose the following rule for predicting cluster membership of unseen data:

    if w_{S \cup \{i\}}(i) > 0, then assign vertex i to cluster S.    (6)

Note that, according to this rule, the same point can be assigned to more than one class, thereby yielding a soft partition of the input data. To get a hard partition one can use the cluster membership approximation measures we shall discuss below. Note that it may also happen for some instance i that no cluster S satisfies rule (6), in which case the point remains unclassified (or is assigned to an "outlier" group). This should be interpreted as an indication that either the point is too noisy or the cluster formation process was inaccurate. In our experience, however, this situation arises rarely.

A potential problem with the previous rule is its computational complexity. 
In fact, a direct application of formula (2) to compute w_{S \cup \{i\}}(i) is clearly infeasible due to its recursive nature. On the other hand, using a characterization given in [7, Lemma 1] would also be expensive since it would involve the computation of a determinant. The next result allows us to compute the sign of w_{S \cup \{i\}}(i) in linear time and space, with respect to the size of S.

¹ Observe that w_S(i) depends only on the weights on the edges of the subgraph induced by S. Hence, no ambiguity arises as to whether w_S(i) is computed on G or on \hat{G}.

Proposition 1 Let G = (V, E, w) be an edge-weighted (similarity) graph, A = (a_{ij}) its weighted adjacency matrix, and S \subseteq V a dominant set of G with characteristic vector x^S. Let \hat{G} = (\hat{V}, \hat{E}, \hat{w}) be a supergraph of G with weighted adjacency matrix \hat{A} = (\hat{a}_{ij}). Then, for all i \in \hat{V} \setminus V, we have:

    w_{S \cup \{i\}}(i) > 0 \iff \sum_{h \in S} \hat{a}_{hi} x^S_h > f(x^S).    (7)

Proof. From Theorem 1, x^S is a strict local solution of program (4) and hence it satisfies the Karush-Kuhn-Tucker (KKT) equality conditions, i.e., the first-order necessary equality conditions for local optimality [4]. Now, let \hat{n} = |\hat{V}| be the cardinality of \hat{V} and let \hat{x}^S be the (\hat{n}-dimensional) characteristic vector of S in \hat{G}, which is obtained by padding x^S with zeros. It is immediate to see that \hat{x}^S satisfies the KKT equality conditions for the problem of maximizing \hat{f}(\hat{x}) = \hat{x} \cdot \hat{A}\hat{x}, subject to \hat{x} \in \Delta_{\hat{n}}. Hence, from Lemma 2 of [7] we have for all i \in \hat{V} \setminus V:

    \frac{w_{S \cup \{i\}}(i)}{W(S)} = \sum_{h \in S} (\hat{a}_{hi} - a_{hj}) x^S_h    (8)

for any j \in S. Now, recall that the KKT equality conditions for program (4) imply \sum_{h \in S} a_{hj} x^S_h = x^S \cdot Ax^S = f(x^S) for any j \in S [7]. 
Hence, the proposition follows from the fact that, S being dominant, W(S) is positive.

Given an out-of-sample vertex i and a class S such that rule (6) holds, we now provide an approximation of the degree of participation of i in S \cup \{i\} which, as pointed out in the previous section, is given by the ratio between w_{S \cup \{i\}}(i) and W(S \cup \{i\}). This can be used, for example, to get a hard partition of the input data when an instance happens to be assigned to more than one class. By equation (8), we have:

    \frac{w_{S \cup \{i\}}(i)}{W(S \cup \{i\})} = \frac{W(S)}{W(S \cup \{i\})} \sum_{h \in S} (\hat{a}_{hi} - a_{hj}) x^S_h

for any j \in S. Since computing the exact value of the ratio W(S)/W(S \cup \{i\}) would be computationally expensive, we now provide simple approximation formulas. Since S is dominant, it is reasonable to assume that all weights within it are close to each other. Hence, we approximate S with a clique having constant weight a, and impose that it have the same cohesiveness value f(x^S) = x^S \cdot Ax^S as the original dominant set. After some algebra, we get

    a = \frac{|S|}{|S| - 1} f(x^S)

which yields W(S) \approx |S| \, a^{|S|-1}. Approximating W(S \cup \{i\}) with (|S| + 1) \, a^{|S|} in a similar way, we get:

    \frac{W(S)}{W(S \cup \{i\})} \approx \frac{|S| \, a^{|S|-1}}{(|S| + 1) \, a^{|S|}} = \frac{|S| - 1}{(|S| + 1) f(x^S)}

which finally yields:

    \frac{w_{S \cup \{i\}}(i)}{W(S \cup \{i\})} \approx \frac{|S| - 1}{|S| + 1} \left( \frac{\sum_{h \in S} \hat{a}_{hi} x^S_h}{f(x^S)} - 1 \right).    (9)

Using the above formula one can easily get, by normalization, an approximation of the characteristic vector x^{\hat{S}} \in \Delta_{n+k} of \hat{S}, the extension of cluster S obtained by applying rule (6):

    \hat{S} = S \cup \{i \in \hat{V} \setminus V : w_{S \cup \{i\}}(i) > 0\}.

With an approximation of x^{\hat{S}} at hand, it is also easy to compute an approximation of the cohesiveness of the new cluster \hat{S}, i.e., x^{\hat{S}} \cdot \hat{A}x^{\hat{S}}. Indeed, assuming that \hat{S} is dominant in \hat{G}, and recalling the KKT equality conditions for program (4) [7], we get (\hat{A}x^{\hat{S}})_i = x^{\hat{S}} \cdot \hat{A}x^{\hat{S}} for all i \in \hat{S}. 
It is therefore natural to approximate the cohesiveness of \hat{S} as a weighted average of the (\hat{A}x^{\hat{S}})_i's.

Figure 2: Evaluating the quality of our approximations on a 150-point cluster. Average distance between approximated and actual cluster membership (left) and cohesiveness (middle) as a function of sampling rate. Right: average CPU time as a function of sampling rate.

4 Experimental Results

In an attempt to evaluate how the approximations given at the end of the previous section actually compare to the solutions obtained on the dense problem, we conducted the following preliminary experiment. We generated 150 points on the plane so as to form a dominant set (we used a standard Gaussian kernel to obtain similarities), and extracted random samples with increasing sampling rate, ranging from 1/15 to 1. For each sampling rate 100 trials were made, for each of which we computed the Euclidean distance between the approximated and the actual characteristic vector (i.e., cluster membership), as well as the distance between the approximated and the actual cluster cohesiveness (that is, the value of the objective function f). Fig. 2 shows the average results obtained. As can be seen, our approximations work remarkably well: with a sampling rate of less than 10% the distance between the characteristic vectors is around 0.02, and this distance decreases linearly towards zero. 
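The overall procedure, replicator dynamics on the sampled graph followed by the linear-time test (7) of rule (6) and the membership approximation (9), can be sketched as follows. This is our own illustrative code on a toy one-dimensional Gaussian-kernel similarity; it does not reproduce the paper's experiments, and it assumes the support of the converged vector is the whole sample.

```python
import numpy as np

def replicator(A, iters=2000, tol=1e-12):
    # Discrete replicator dynamics for program (4): x_i <- x_i (Ax)_i / (x . Ax).
    # At convergence, the support of x is (generically) a dominant set.
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

def accept_out_of_sample(A_hat, x_S, f_xS):
    # Rule (6) via test (7): column i of A_hat holds the affinities a^_hi
    # between the sample points h in S and new point i.
    return x_S @ A_hat > f_xS

def membership_approx(A_hat, x_S, f_xS):
    # Approximation (9) of w_{S u {i}}(i) / W(S u {i}) for each new point i
    # (assumes the support of x_S is the whole sample, so |S| = len(x_S)).
    s = len(x_S)
    return (s - 1) / (s + 1) * ((x_S @ A_hat) / f_xS - 1.0)

# Sampled points forming one cluster on the line; Gaussian-kernel affinities.
P = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
A = np.exp(-np.subtract.outer(P, P) ** 2 / 4.0)
np.fill_diagonal(A, 0.0)
x = replicator(A)                 # characteristic vector of the sample cluster
f = float(x @ A @ x)              # cluster cohesiveness f(x^S)
new = np.array([0.0, 10.0])       # one inlier, one far-away outlier
A_hat = np.exp(-np.subtract.outer(P, new) ** 2 / 4.0)
print(accept_out_of_sample(A_hat, x, f))   # inlier accepted, outlier rejected
```

Note that, as in the paper, only the |S| x k affinities between old and new points are needed: each new point costs one dot product against x^S.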
As for the objective function, the results are even more impressive: at less than a 10% sampling rate the distance from the exact value (i.e., 0.989) is already as small as 0.00025, and it rapidly goes to zero. Also, note how the CPU time increases linearly as the sampling rate approaches 100%.

Next, we tested our algorithm on the Johns Hopkins University ionosphere database,² which contains 351 labeled instances from two different classes. As in the previous experiment, similarities were computed using a Gaussian kernel. Our goal was to test how the solutions obtained on the sampled graph compare with those of the original, dense problem and to study how the performance of the algorithm scales w.r.t. the sampling rate. As before, we used sampling rates from 1/15 to 1, and for each such value 100 random samples were extracted. After the grouping process, the out-of-sample instances were assigned to one of the two classes found using rule (6). Then, for each example in the dataset a "success" was recorded whenever the actual class label of the instance coincided with the majority label of its assigned class. Fig. 3 shows the average results obtained. At around a 40% rate the algorithm was already able to obtain a classification accuracy of about 73.4%, which is even slightly higher than the one obtained on the dense (100% rate) problem, namely 72.7%. Note that, as in the previous experiment, the algorithm appears to be robust with respect to the choice of the sample data. For the sake of comparison we also ran normalized cut on the whole dataset, and it yielded a classification rate of 72.4%.

Finally, we applied our algorithm to the segmentation of brightness images. The image to be segmented is represented as a graph where vertices correspond to pixels and edge weights reflect the "similarity" between vertex pairs. As customary, we defined a similarity measure between pixels based on brightness proximity. 
Specifically, following [7], similarity between pixels i and j was measured by w(i, j) = \exp(-(I(i) - I(j))^2 / \sigma^2), where \sigma is a positive real number which affects the decreasing rate of w, and I(i) is defined as the (normalized) intensity value at node i. After drawing a set of pixels at random with sampling rate p = 0.005, we iteratively found a dominant set in the sampled graph using replicator dynamics [7, 14], removed it from the graph, and then employed rule (6) to extend it with out-of-sample pixels.

² http://www.ics.uci.edu/mlearn/MLSummary.html

Figure 3: Results on the ionosphere database. Average classification rate (left) and CPU time (right) as a function of sampling rate.

Figure 4: Segmentation results on a 115 × 97 weather radar image. From left to right: original image, the two regions found on the sampled image (sampling rate = 0.5%), and the two regions obtained on the whole image (sampling rate = 100%).

Figure 4 shows the results obtained on a 115 × 97 weather radar image, used in [13, 7] as an instance whereby edge-detection-based segmentation would perform poorly. Here, and in the following experiment, the major components of the segmentations are drawn on a blue background. The leftmost cluster is the one obtained after the first iteration of the algorithm, and successive clusters are shown left to right. Note how the segmentation obtained over the sparse image, sampled at a 0.5% rate, is almost identical to that obtained over the whole image. In both cases, the algorithm correctly discovered a background and a foreground region. 
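The first two steps of this pipeline, random pixel sampling and the brightness kernel, can be sketched as follows. The synthetic two-region image, the value of \sigma, and the helper names are our own illustrative choices; the dominant-set extraction and the out-of-sample extension themselves are omitted here.

```python
import numpy as np

def sample_pixels(image, rate, rng):
    # Draw a random subset of pixels at the given sampling rate.
    flat = image.reshape(-1)
    k = max(1, int(round(rate * flat.size)))
    idx = rng.choice(flat.size, size=k, replace=False)
    return idx, flat[idx]

def brightness_affinity(intensities, sigma):
    # w(i, j) = exp(-(I(i) - I(j))^2 / sigma^2), zero diagonal (no self-loops).
    d = np.subtract.outer(intensities, intensities)
    A = np.exp(-d ** 2 / sigma ** 2)
    np.fill_diagonal(A, 0.0)
    return A

# Synthetic two-region brightness image, sampled at rate p = 0.005.
rng = np.random.default_rng(0)
img = np.zeros((100, 100))
img[:, 50:] = 1.0
idx, vals = sample_pixels(img, 0.005, rng)
A = brightness_affinity(vals, sigma=0.5)   # similarity graph on the sample only
```

Only the sampled pixels enter the quadratic program; the remaining ~99.5% of the pixels are handled by the out-of-sample rule, which is what makes the high-resolution experiments below feasible.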
The approximation algorithm took a couple of seconds to return the segmentation, i.e., 15 times faster than the same algorithm run over the entire image. Note that our results are better than those obtained with normalized cut, as the latter provides an over-segmented solution (see [13]).

Fig. 5 shows results on two 481 × 321 images taken from the Berkeley database.³ On these images the sampling process produced a sample with no more than 1000 pixels, and our current MATLAB implementation took only a few seconds to return a solution. Running the grouping algorithm on the whole images (which contain more than 150,000 pixels) would simply be infeasible. In both cases, our approximation algorithm partitioned the images into meaningful and clean components. We also ran normalized cut on these images (using the same sampling rate of 0.5%) and the results, obtained after a long tuning process, confirm its well-known inherent tendency to over-segment the data (see Fig. 5).

5 Conclusions

We have provided a simple and efficient extension of the dominant-set clustering framework to deal with the grouping of out-of-sample data. This makes the approach applicable to very large grouping problems, such as high-resolution image segmentation, where it would otherwise be impractical. Experiments show that the solutions extrapolated from the sparse data are comparable with those of the dense problem, which in turn compare favorably with spectral solutions such as normalized cut's, and are obtained in much less time.

³ http://www.cs.berkeley.edu/projects/vision/grouping/segbench

Figure 5: Segmentation results on two 481 × 321 images. Left columns: original images. For each image, the first line shows the major regions obtained with our approximation algorithm, while the second line shows the results obtained with normalized cut.

References

[1] Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, and M. Ouimet. Out-of-sample
Out-of-sample\n extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In: S. Thrun, L. Saul,\n and B.Scholkopf (Eds.), Advances in Neural Information Processing Systems 16, MIT Press,\n Cambridge, MA, 2004.\n [2] C. Fowlkes, S. Belongie, F. Chun, and J. Malik. Spectral grouping using the Nystrom method.\n IEEE Trans. Pattern Anal. Machine Intell. 26:214225, 2004.\n [3] T. Hofmann and J. M. Buhmann. Pairwise data clustering by deterministic annealing. IEEE\n Trans. Pattern Anal. Machine Intell. 19:114, 1997.\n [4] D. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, 1984.\n [5] T. S. Motzkin and E. G. Straus. Maxima for graphs and a new proof of a theorem of Turan.\n Canad. J. Math. 17:533540, 1965.\n [6] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In: T.\n G. Dietterich, S. Becker, and Z. Ghahramani (Eds.), Advances in Neural Information Process-\n ing Systems 14, MIT Press, Cambridge, MA, pp. 849856, 2002.\n [7] M. Pavan and M. Pelillo. A new graph-theoretic approach to clustering and segmentation. In\n Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 145152, 2003.\n [8] M. Pavan, M. Pelillo. Unsupervised texture segmentation by dominant sets and game dynamics.\n In Proc. 12th Int. Conf. on Image Analysis and Processing, pp. 302307, 2003.\n [9] M. Pavan and M. Pelillo. Dominant sets and hierarchical clustering. In Proc. 9th Int. Conf. on\n Computer Vision, pp. 362369, 2003.\n[10] P. Perona and W. Freeman. A factorization approach to grouping. In: H. Burkhardt and B. Neu-\n mann (Eds.), Computer Vision--ECCV'98, pp. 655670. Springer, Berlin, 1998.\n[11] V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of\n nonmetric proximity data. IEEE Trans. Pattern Anal. Machine Intell. 25:15401551, 2003.\n[12] S. Sarkar and K. Boyer. Quantitative measures of change based on feature organization: Eigen-\n values and eigenvectors. 
Computer Vision and Image Understanding 71:110-136, 1998.
[13] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 22:888-905, 2000.
[14] J. W. Weibull. Evolutionary Game Theory. MIT Press, Cambridge, MA, 1995.
[15] Y. Weiss. Segmentation using eigenvectors: A unifying view. In Proc. 7th Int. Conf. on Computer Vision, pp. 975-982, 1999.
", "award": [], "sourceid": 2626, "authors": [{"given_name": "Massimiliano", "family_name": "Pavan", "institution": null}, {"given_name": "Marcello", "family_name": "Pelillo", "institution": null}]}