{"title": "A Game-Theoretic Approach to Hypergraph Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 1571, "page_last": 1579, "abstract": "Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player ``clustering game, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. 
Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.", "full_text": "A Game-Theoretic Approach to\n\nHypergraph Clustering\n\nSamuel Rota Bulò\n\nMarcello Pelillo\n\nUniversity of Venice, Italy\n\n{srotabul,pelillo}@dsi.unive.it\n\nAbstract\n\nHypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player “clustering game”, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization. Experiments are presented which show the superiority of our approach over state-of-the-art hypergraph clustering techniques.\n\n1 Introduction\nClustering is the problem of organizing a set of objects into groups, or clusters, in a way as to have similar objects grouped together and dissimilar ones assigned to different groups, according to some similarity measure. 
Unfortunately, there is no universally accepted formal de\ufb01nition of the notion\nof a cluster, but it is generally agreed that, informally, a cluster should correspond to a set of objects\nsatisfying two conditions: an internal coherency condition, which asks that the objects belonging to\nthe cluster have high mutual similarities, and an external incoherency condition, which states that\nthe overall cluster internal coherency decreases by adding to it any external object.\n\nObjects similarities are typically expressed as pairwise relations, but in some applications higher-\norder relations are more appropriate, and approximating them in terms of pairwise interactions can\nlead to substantial loss of information. Consider for instance the problem of clustering a given set of\nd-dimensional Euclidean points into lines. As every pair of data points trivially de\ufb01nes a line, there\ndoes not exist a meaningful pairwise measure of similarity for this problem. However, it makes\nperfect sense to de\ufb01ne similarity measures over triplets of points that indicate how close they are\nto being collinear. Clearly, this example can be generalized to any problem of model-based point\npattern clustering, where the deviation of a set of points from the model provides a measure of their\ndissimilarity. The problem of clustering objects using high-order similarities is usually referred to\nas the hypergraph clustering problem.\n\nIn the machine learning community, there has been increasing interest around this problem. Zien\nand co-authors [24] propose two approaches called \u201cclique expansion\u201d and \u201cstar expansion\u201d, respec-\ntively. Both approaches transform the similarity hypergraph into an edge-weighted graph, whose\nedge-weights are a function of the hypergraph\u2019s original weights. This way they are able to tackle\n\n1\n\n\fthe problem with standard pairwise clustering algorithms. 
Bolla [6] de\ufb01nes a Laplacian matrix for\nan unweighted hypergraph and establishes a link between the spectral properties of this matrix and\nthe hypergraph\u2019s minimum cut. Rodr`\u0131guez [16] achieves similar results by transforming the hyper-\ngraph into a graph according to \u201cclique expansion\u201d and shows a relationship between the spectral\nproperties of a Laplacian of the resulting matrix and the cost of minimum partitions of the hy-\npergraph. Zhou and co-authors [23] generalize their earlier work on regularization on graphs and\nde\ufb01ne a hypergraph normalized cut criterion for a k-partition of the vertices, which can be achieved\nby \ufb01nding the second smallest eigenvector of a normalized Laplacian. This approach generalizes\nthe well-known \u201cNormalized cut\u201d pairwise clustering algorithm [19]. Finally, in [2] we \ufb01nd another\nwork based on the idea of applying a spectral graph partitioning algorithm on an edge-weighted\ngraph, which approximates the original (edge-weighted) hypergraph. It is worth noting that the ap-\nproaches mentioned above are devised for dealing with higher-order relations, but can all be reduced\nto standard pairwise clustering approaches [1]. A different formulation is introduced in [18], where\nthe clustering problem with higher-order (super-symmetric) similarities is cast into a nonnegative\nfactorization of the closest hyper-stochastic version of the input af\ufb01nity tensor.\n\nAll the afore-mentioned approaches to hypergraph clustering are partition-based. Indeed, clusters\nare not modeled and sought directly, but they are obtained as a by-product of the partition of the input\ndata into a \ufb01xed number of classes. This renders these approaches vulnerable to applications where\nthe number of classes is not known in advance, or where data is affected by clutter elements which\ndo not belong to any cluster (as in \ufb01gure/ground separation problems). 
Additionally, by partitioning,\nclusters are necessarily disjoint sets, although it is in many cases natural to have overlapping clusters,\ne.g., two intersecting lines have the point in the intersection belonging to both lines.\n\nIn this paper, following [14, 20] we offer a radically different perspective to the hypergraph cluster-\ning problem. Instead of insisting on the idea of determining a partition of the input data, and hence\nobtaining the clusters as a by-product of the partitioning process, we reverse the terms of the prob-\nlem and attempt instead to derive a rigorous formulation of the very notion of a cluster. This allows\none, in principle, to deal with more general problems where clusters may overlap and/or outliers\nmay get unassigned. We found that game theory offers a very elegant and general mathematical\nframework that serves well our purposes. The basic idea behind our approach is that the hypergraph\nclustering problem can be considered as a multi-player non-cooperative \u201cclustering game\u201d. Within\nthis context, the notion of a cluster turns out to be equivalent to a classical equilibrium concept from\n(evolutionary) game theory, as the latter re\ufb02ects both the internal and external cluster conditions\nalluded to before. We also show that there exists a correspondence between these equilibria and\nthe local solutions of a polynomial, linearly-constrained, optimization problem, and provide an al-\ngorithm for \ufb01nding them. Experiments on two standard hypergraph clustering problems show the\nsuperiority of the proposed approach over state-of-the-art hypergraph clustering techniques.\n\n2 Basic notions from evolutionary game theory\n\nEvolutionary game theory studies models of strategic interactions (called games) among large\nnumbers of anonymous agents. A game can be formalized as a triplet \u0393 = (P, S, \u03c0), where\nP = {1, . . . , k} is the set of players involved in the game, S = {1, . . . 
, n} is the set of pure strategies (in the terminology of game theory) available to each player and π : S^k → R is the payoff function, which assigns a payoff to each strategy profile, i.e., the (ordered) set of pure strategies played by the individuals. The payoff function π is assumed to be invariant to permutations of the strategy profile. It is worth noting that in general games, each player may have its own set of strategies and own payoff function. For a comprehensive introduction to evolutionary game theory we refer to [22].\n\nBy undertaking an evolutionary setting we assume to have a large population of non-rational agents, which are randomly matched to play a game Γ = (P, S, π). Agents are considered non-rational, because each of them initially chooses a strategy from S, which will be always played when selected for the game. An agent, who selected strategy i ∈ S, is called i-strategist. Evolution in the population takes place, because we assume that there exists a selection mechanism, which, by analogy with a Darwinian process, spreads the fittest strategies in the population to the detriment of the weakest ones, which will in turn be driven to extinction. We will see later in this work a formalization of such a selection mechanism.\n\nThe state of the population at a given time t can be represented as an n-dimensional vector x(t), where x_i(t) represents the fraction of i-strategists in the population at time t. The set of all possible states describing a population is given by\n\nΔ = {x ∈ R^n : Σ_{i∈S} x_i = 1 and x_i ≥ 0 for all i ∈ S} ,\n\nwhich is called the standard simplex. In the sequel we will drop the time reference from the population state, where not necessary. Moreover, we denote with σ(x) the support of x ∈ Δ, i.e., the set of strategies still alive in population x ∈ Δ: σ(x) = {i ∈ S : x_i > 0}.\n\nIf y^(i) ∈ Δ is the probability distribution identifying which strategy the ith player will adopt if drawn to play the game Γ, then the average payoff obtained by the agents can be computed as\n\nu(y^(1), . . . , y^(k)) = Σ_{(s_1,...,s_k)∈S^k} π(s_1, . . . , s_k) Π_{j=1}^k y^(j)_{s_j} .  (1)\n\nNote that (1) is invariant to any permutation of the input probability vectors.\n\nAssuming that the agents are randomly and independently drawn from a population x ∈ Δ to play the game Γ, the population average payoff is given by u(x^k), where x^k is a shortcut for x, . . . , x repeated k times. Furthermore, the average payoff that an i-strategist obtains in a population x ∈ Δ is given by u(e^i, x^{k−1}), where e^i ∈ Δ is the vector with ith component 1 and zero elsewhere.\n\nAn important notion in game theory is that of equilibrium [22]. A population x ∈ Δ is in equilibrium when the distribution of strategies will not change anymore, which intuitively happens when every individual in the population obtains the same average payoff and no strategy can thus prevail on the other ones. Formally, x ∈ Δ is a Nash equilibrium if\n\nu(e^i, x^{k−1}) ≤ u(x^k) for all i ∈ S .  (2)\n\nIn other words, every agent in the population performs at most as well as the population average payoff. Due to the multi-linearity of u, a consequence of (2) is that\n\nu(e^i, x^{k−1}) = u(x^k) for all i ∈ σ(x) ,  (3)\n\ni.e., all the agents that survived the evolution obtain the same average payoff, which coincides with the population average payoff.\n\nA key concept pertaining to evolutionary game theory is that of an evolutionary stable strategy [7, 22]. 
Such a strategy is robust to evolutionary pressure in an exact sense. Assume that in a population x ∈ Δ, a small share ε of mutant agents appears, whose distribution of strategies is y ∈ Δ. The resulting postentry population is given by w_ε = (1 − ε)x + εy. Biological intuition suggests that evolutionary forces select against mutant individuals if and only if the average payoff of a mutant agent in the postentry population is lower than that of an individual from the original population, i.e.,\n\nu(y, w_ε^{k−1}) < u(x, w_ε^{k−1}) .  (4)\n\nA population x ∈ Δ is evolutionary stable (or an ESS) if inequality (4) holds for any distribution of mutant agents y ∈ Δ \ {x}, granted the population share of mutants ε is sufficiently small (see [22] for pairwise contests and [7] for n-wise contests).\n\nAn alternative, but equivalent, characterization of ESSs involves a leveled notion of evolutionary stable strategies [7]. We say that x ∈ Δ is an ESS of level j against y ∈ Δ, if there exists j ∈ {0, . . . , k − 1} such that both conditions\n\nu(y^{j+1}, x^{k−j−1}) < u(y^j, x^{k−j}) ,  (5)\nu(y^{i+1}, x^{k−i−1}) = u(y^i, x^{k−i}) for all 0 ≤ i < j ,  (6)\n\nare satisfied. Clearly, x ∈ Δ is an ESS if it satisfies a condition of this form for every y ∈ Δ \ {x}. It is straightforward to see that any ESS is a Nash equilibrium [22, 7]. An ESS which satisfies conditions (5)-(6) with j never more than J will be called an ESS of level J. Note that for the generic case most of the preceding conditions will be superfluous, i.e., only ESSs of level 0 or 1 are required [7]. Hence, in the sequel, we will consider only ESSs of level 1. It is not difficult to verify that any ESS (of level 1) x ∈ Δ satisfies\n\nu(w_ε^k) < u(x^k) ,  (7)\n\nfor all y ∈ Δ \ {x} and small enough values of ε.\n\n3 The hypergraph clustering game\nThe hypergraph clustering problem can be described by an edge-weighted hypergraph. Formally, an edge-weighted hypergraph is a triplet H = (V, E, s), where V = {1, . . . , n} is a finite set of vertices, E ⊆ P(V) \ {∅} is the set of (hyper-)edges (here, P(V) is the power set of V) and s : E → R is a weight function which associates a real value with each edge. Note that negative weights are allowed too. Although hypergraphs may have edges of varying cardinality, we will focus on a particular class of hypergraphs, called k-graphs, whose edges all have fixed cardinality k ≥ 2.\n\nIn this paper, we cast the hypergraph clustering problem into a game, called the (hypergraph) clustering game, which will be played in an evolutionary setting. Clusters are then derived from the analysis of the ESSs of the clustering game. Specifically, given a k-graph H = (V, E, s) modeling a hypergraph clustering problem, where V = {1, . . . , n} is the set of objects to cluster and s is the similarity function over the set of objects in E, we can build a game involving k players, each of them having the same set of (pure) strategies, namely the set of objects to cluster V. Under this setting, a population x ∈ Δ of agents playing a clustering game represents in fact a cluster, where x_i is the probability for object i to be part of it. Indeed, any cluster can be modeled as a probability distribution over the set of objects to cluster. The payoff function of the clustering game is defined in a way as to favour the evolution of agents supporting highly coherent objects. 
Intuitively, this is accomplished by rewarding the k players in proportion to the similarity that the k played objects have. Hence, assuming (v_1, . . . , v_k) ∈ V^k to be the tuple of objects selected by the k players, the payoff function can be simply defined as\n\nπ(v_1, . . . , v_k) = (1/k!) s({v_1, . . . , v_k}) if {v_1, . . . , v_k} ∈ E, and 0 otherwise ,  (8)\n\nwhere the term 1/k! has been introduced for technical reasons.\n\nGiven a population x ∈ Δ playing the clustering game, we have that the average population payoff u(x^k) measures the cluster's internal coherency as the average similarity of the objects forming the cluster, whereas the average payoff u(e^i, x^{k−1}) of an agent supporting object i ∈ V in population x measures the average similarity of object i with respect to the cluster.\n\nAn ESS of a clustering game incorporates the properties of internal coherency and external incoherency of a cluster:\n\ninternal coherency: since ESSs are Nash equilibria, from (3), it follows that every object contributing to the cluster, i.e., every object in σ(x), has the same average similarity with respect to the cluster, which in turn corresponds to the cluster's overall average similarity. Hence, the cluster is internally coherent;\n\nexternal incoherency: from (2), every object external to the cluster, i.e., every object in V \ σ(x), has an average similarity which does not exceed the cluster's overall average similarity. There may still be cases where the average similarity of an external object is the same as that of an internal object, undermining the cluster's external incoherency. However, since x is an ESS, from (7) we see that whenever we try to extend a cluster with small shares of external objects, the cluster's overall average similarity drops. 
This guarantees the external incoherency property of a cluster to be also satisfied.\n\nFinally, it is worth noting that this theory generalizes the dominant-sets clustering framework which has recently been introduced in [14]. Indeed, ESSs of pairwise clustering games, i.e., clustering games defined on graphs, correspond to the dominant-set clusters [20, 17]. This is additional evidence that ESSs are meaningful notions of cluster.\n\n4 Evolution towards a cluster\nIn this section we will show that the ESSs of a clustering game are in one-to-one correspondence with the (strict) local solutions of a non-linear optimization program. In order to find ESSs, we will also provide a dynamics due to Baum and Eagon, which generalizes the replicator dynamics [22].\n\nLet H = (V, E, s) be a hypergraph clustering problem and Γ = (P, V, π) be the corresponding clustering game. Consider the following non-linear optimization problem:\n\nmaximize f(x) = Σ_{e∈E} s(e) Π_{i∈e} x_i , subject to x ∈ Δ .  (9)\n\nIt is simple to see that any first-order Karush-Kuhn-Tucker (KKT) point x ∈ Δ of program (9) [13] is a Nash equilibrium of Γ. Indeed, by the KKT conditions there exist μ_i ≥ 0, i ∈ S, and λ ∈ R such that for all i ∈ S,\n\n∇f(x)_i + μ_i − λ = 0 and μ_i x_i = 0 ,\n\nwhere ∇ is the gradient operator. Since u(e^i, x^{k−1}) = (1/k) ∇f(x)_i, it follows straightforwardly that u(e^i, x^{k−1}) ≤ u(x^k) for all i ∈ S. Moreover, it turns out that any strict local maximizer x ∈ Δ of (9) is an ESS of Γ. Indeed, by definition, a strict local maximizer of this program satisfies u(z^k) = f(z) < f(x) = u(x^k) for any z ∈ Δ \ {x} close enough to x, which is in turn equivalent to (7) for sufficiently small values of ε.\n\nThe problem of extracting the ESSs of our hypergraph clustering game can thus be cast into the problem of finding strict local solutions of (9). We will address this optimization task using a result due to Baum and Eagon [3], who introduced a class of nonlinear transformations in probability domain.\n\nTheorem 1 (Baum-Eagon). Let P(x) be a homogeneous polynomial in the variables x_i with nonnegative coefficients, and let x ∈ Δ. Define the mapping z = M(x) as follows:\n\nz_i = x_i ∂_i P(x) / Σ_{j=1}^n x_j ∂_j P(x) ,  i = 1, . . . , n .  (10)\n\nThen P(M(x)) > P(x), unless M(x) = x. In other words, M is a growth transformation for the polynomial P.\n\nThe Baum-Eagon inequality provides an effective iterative means for maximizing polynomial functions in probability domains, and in fact it has served as the basis for various statistical estimation techniques developed within the theory of probabilistic functions of Markov chains [4]. It was also employed for the solution of relaxation labelling processes [15].\n\nSince f(x) is a homogeneous polynomial in the variables x_i, we can use the transformation of Theorem 1 in order to find a local solution x ∈ Δ of (9), which in turn provides us with an ESS of the hypergraph clustering game. By taking the support of x, we have a cluster under our framework. The complexity of finding a cluster is thus O(ρ|E|), where |E| is the number of edges of the hypergraph describing the clustering problem and ρ is the average number of iterations needed to converge. 
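As a concrete illustration, the cluster-extraction scheme just described (iterate the growth transformation of Theorem 1 on f until a fixed point, then take the support of x) can be sketched in a few lines. This is a minimal sketch, not the authors' code: the k-graph is assumed to be given as a list of k-edges with nonnegative weights (as Theorem 1 requires), and all function and variable names are ours.

```python
import numpy as np

def grad_f(edges, weights, x):
    """Gradient of f(x) = sum_e s(e) * prod_{i in e} x_i:
    grad_i = sum over edges containing i of s(e) * prod of the other x_j."""
    g = np.zeros(len(x))
    for e, s in zip(edges, weights):
        for i in e:
            g[i] += s * np.prod([x[j] for j in e if j != i])
    return g

def baum_eagon_cluster(edges, weights, n, x0=None, max_iter=1000, tol=1e-9):
    """Iterate the Baum-Eagon growth transformation
    z_i = x_i grad_i f(x) / sum_j x_j grad_j f(x),
    which keeps x on the simplex and increases f until a fixed point."""
    x = np.full(n, 1.0 / n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(edges, weights, x)
        denom = x @ g
        if denom <= 0:                     # degenerate population; stop
            break
        z = x * g / denom                  # stays on the simplex by construction
        if np.abs(z - x).sum() < tol:      # fixed point reached
            return z
        x = z
    return x

# Toy 3-graph with two coherent triples; a start biased towards {0, 1, 2}
# evolves to a cluster supported exactly on those three objects.
edges, weights = [(0, 1, 2), (3, 4, 5)], [1.0, 1.0]
x = baum_eagon_cluster(edges, weights, n=6,
                       x0=[0.3, 0.3, 0.3, 0.033, 0.033, 0.034])
cluster = [i for i in range(6) if x[i] > 1e-6]   # the support sigma(x)
```

Note that from the uniform barycenter the two symmetric triples in this toy example would balance each other exactly, so in practice the iteration is started from a slightly perturbed point (or restarted on the remaining objects after a cluster has been peeled off).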
Note that ρ never exceeded 100 in our experiments.\n\nIn order to obtain the clustering, in principle, we have to find the ESSs of the clustering game. This is a non-trivial, although still feasible, task [21], which we leave as a future extension of this work. For now, we adopt a naive peeling-off strategy for our cluster extraction procedure. Namely, we iteratively find a cluster and remove it from the set of objects, and we repeat this process on the remaining objects until a desired number of clusters has been extracted. The set of ESSs extracted with this procedure does not technically correspond to the ESSs of the original game, but to ESSs of sub-games of it. The cost of this approximation is that we unfortunately lose (for now) the possibility of having overlapping clusters.\n\n5 Experiments\n\nIn this section we present two types of experiments. The first one addresses the problem of line clustering, while the second one addresses the problem of illuminant-invariant face clustering. We tested our approach against the Clique Averaging algorithm (CAVERAGE), since it was the best performing approach in [2] on the same type of experiments. Specifically, CAVERAGE outperformed Clique Expansion [10] combined with Normalized cuts, Gibson's Algorithm under sum and product model [9], kHMeTiS [11] and Cascading RANSAC [2]. We also compare against Super-symmetric Non-negative Tensor Factorization (SNTF) [18], because it is the only approach, other than ours, which does not approximate the hypergraph with a graph.\n\nSince both CAVERAGE and SNTF, as opposed to our method, require the number of classes K to be specified, we run them with values of K ∈ {K*−1, K*, K*+1}, among which the optimal one (K*) is present. 
This allows us to verify the robustness of the approaches under wrong values of K, which may occur in general as the optimal number of clusters is not known in advance.\n\n[Figure 1: Results on clustering 3, 4 and 5 lines perturbed with increasing levels of Gaussian noise (σ = 0, 0.01, 0.02, 0.04, 0.08). Panels: (a) example of three lines with σ = 0.04; (b) three lines; (c) four lines; (d) five lines. Each plot reports the F-measure against σ for HoCluGame and for CAVERAGE and SNTF with K = K*−1, K*, K*+1.]\n\nWe executed the experiments on an AMD Sempron 3GHz computer with 1GB RAM. Moreover, we evaluated the quality of a clustering by computing the average F-measure of each cluster in the ground-truth with the most compatible one in the obtained solution (according to a one-to-one correspondence).\n\n5.1 Line clustering\nWe consider the problem of clustering lines in spaces of dimension greater than two, i.e., given a set of points in R^d, the task is to find sets of collinear points. Pairwise measures of similarity are useless and at least three points are needed. The dissimilarity measure on triplets of points is given by their mean distance to the best fitting line. 
If d(i, j, k) is the dissimilarity of points {i, j, k}, the\nsimilarity function is given by s({i, j, k}) = exp(\u2212d(i, j, k)2/\u03c32), where \u03c3 is a scaling parameter,\nwhich has been optimally selected for all the approaches according to a small test set.\n\nWe conducted two experiments, in order to assess the robustness of the approaches to both local\nand global noise. Local noise refers to a Gaussian perturbation applied to the points of a line, while\nglobal noise consists of random outlier points.\n\nA \ufb01rst experiment consists in clustering 3, 4 and 5 lines generated in the 5-dimensional space\n[\u22122, 2]5. Each line consists of 20 points, which have been perturbed according to 5 increasing\nlevels of Gaussian noise, namely \u03c3 = 0, 0.01, 0.02, 0.04, 0.08. With this setting there are no outliers\nand every point should be assigned to a line (e.g., see Figure 1(a)). Figure 1(b) shows the results\nobtained with three lines. We reported, for each noise level, the mean and the standard deviation\nof the average F-measures obtained by the algorithms on 30 randomly generated instances. Note\nthat, if the optimal K is used, CAVERAGE and SNTF perform well and the in\ufb02uence of local noise\nis minimal. This behavior intuitively makes sense under moderate perturbations, because if the ap-\nproaches correctly partitioned the data without noise, it is unlikely that the result will change by\nslightly perturbing them. Our approach however achieves good performances as well, although we\ncan notice that with the highest noise level, the performance slightly drops. This is due to the fact\nthat points deviating too much from the overall cluster average collinearity will be excluded as they\nundermine the cluster\u2019s internal coherency. Hence, some perturbed points will be considered out-\nliers. 
Nevertheless, it is worth noting that by underestimating the optimal number of classes both CAVERAGE and SNTF exhibit a drastic performance drop, whereas the influence of overestimations has a lower impact on the two partition-based algorithms. By increasing the number of lines involved in the experiment from three to four (Figure 1(c)) and to five (Figure 1(d)) the scenario remains almost the same for our approach and SNTF, while we can notice a slight decrease of CAVERAGE's performance.\n\n[Figure 2: Results on clustering 2, 3 and 4 lines with an increasing number of outliers (0, 10, 20, 40). Panels: (a) example of two lines with 40 outliers; (b) two lines; (c) three lines; (d) four lines. Each plot reports the F-measure against the number of outliers for HoCluGame and for CAVERAGE and SNTF with K = K*−1, K*, K*+1.]\n\nThe second experiment consists in clustering 2, 3 and 4 slightly perturbed lines (with fixed local noise σ = 0.01) generated in the 5-dimensional space [−2, 2]^5. Again, each line consists of 20 points. This time however we also added global noise, i.e., 0, 10, 20 and 40 random points as outliers (e.g., see Figure 2(a)). Figure 2(b) shows the results obtained with two lines. Here, the supremacy of our approach over partition-based ones is clear. 
Indeed, our method is not influenced by outliers and therefore it performs almost perfectly, whereas CAVERAGE and SNTF perform well only without outliers and with the optimal K. It is interesting to notice that, as outliers are introduced, CAVERAGE and SNTF perform better with K > 2. Indeed, the only way to get rid of outliers is to group them in additional clusters. However, since outliers are not mutually similar and intuitively they do not form a cluster, the performance of CAVERAGE and SNTF drops drastically as the number of outliers increases. Finally, by increasing the number of lines from two to three (Figure 2(c)) and to four (Figure 2(d)), the performance of CAVERAGE and SNTF gets worse, while our approach still achieves good results.\n\n5.2 Illuminant-invariant face clustering\nIn [5] it has been shown that images of a Lambertian object illuminated by a point light source lie in a three-dimensional subspace. According to this result, if we assume that four images of a face form the columns of a matrix, then d = s_4^2/(s_1^2 + · · · + s_4^2) provides us with a measure of dissimilarity, where s_i is the ith singular value of this matrix [2]. We use this dissimilarity measure for the face clustering problem and we consider as dataset the Yale Face Database B and its extended version [8, 12]. In total we have faces of 38 individuals, each under 64 different illumination conditions. We compared our approach against CAVERAGE and SNTF on subsets of this face dataset. Specifically, we considered cases where we have faces from 4 and 5 random individuals (10 faces per individual), and with and without outliers. The case with outliers consists in 10 additional faces each from a different individual. For each of those combinations, we created 10 random subsets. 
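The dissimilarity d = s_4^2/(s_1^2 + · · · + s_4^2) used for face quadruples is immediate to compute from a singular value decomposition. A minimal sketch (the function name and image handling are our own choices, not the authors' code):

```python
import numpy as np

def face_dissimilarity(images):
    """d = s_4^2 / (s_1^2 + ... + s_4^2), where s_1 >= ... >= s_4 are the
    singular values of the matrix whose columns are the four vectorized
    face images; d is close to zero when the images lie near a
    3-dimensional subspace, as predicted for a Lambertian face."""
    A = np.column_stack([np.ravel(im).astype(float) for im in images])
    if A.shape[1] != 4:
        raise ValueError("exactly four images are expected")
    s = np.linalg.svd(A, compute_uv=False)   # sorted s[0] >= ... >= s[3]
    return s[3] ** 2 / np.sum(s ** 2)
```

Four images spanning a 3-dimensional subspace give d ≈ 0, while four mutually orthogonal images give the maximal value d = 1/4.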
Similarly to the case of line clustering, we run CAVERAGE and SNTF with values of K ∈ {K*−1, K*, K*+1}, where K* is the optimal one.\n\nn. of classes:    4          4          5          5\nn. of outliers:   0          10         0          10\nCAVERAGE K=3      0.63±0.11  0.59±0.07  -          -\nCAVERAGE K=4      0.96±0.06  0.84±0.07  0.56±0.14  0.58±0.07\nCAVERAGE K=5      0.91±0.06  0.79±0.05  0.85±0.12  0.83±0.06\nCAVERAGE K=6      -          -          0.84±0.09  0.82±0.06\nSNTF K=3          0.62±0.12  0.58±0.10  -          -\nSNTF K=4          0.87±0.07  0.81±0.08  0.61±0.13  0.59±0.09\nSNTF K=5          0.82±0.09  0.76±0.09  0.86±0.12  0.80±0.07\nSNTF K=6          -          -          0.85±0.08  0.79±0.11\nHoCluGame         0.95±0.03  0.94±0.02  0.95±0.05  0.94±0.02\n\nTable 1: Experiments on illuminant-invariant face clustering (average F-measure, mean ± standard deviation).\n\nIn Table 1 we report the average F-measures (mean and standard deviation) obtained by the three approaches. The results are consistent with those obtained in the case of line clustering, with the exception of SNTF, which performs worse than the other approaches on this real-world application. CAVERAGE and our algorithm perform comparably well when clustering 4 individuals without outliers. However, our approach turns out to be more robust in every other tested case, i.e., when the number of classes increases and when outliers are introduced. Indeed, CAVERAGE's performance decreases, while our approach yields the same good results.\n\nIn both the experiments of line and face clustering the execution times of our approach were higher than those of CAVERAGE, but considerably lower than those of SNTF. The main reason why CAVERAGE runs faster is that our approach and SNTF work directly on the hypergraph without resorting to pairwise relations, which is indeed what CAVERAGE does. 
Further, we mention that our code was not optimized for speed, and all the approaches were run without any sampling policy.

6 Discussion
In this paper, we offered a game-theoretic perspective on the hypergraph clustering problem. Within our framework the clustering problem is viewed as a multi-player non-cooperative game, and classical equilibrium notions from evolutionary game theory turn out to provide a natural formalization of the notion of a cluster. We showed that the problem of finding these equilibria (clusters) is equivalent to solving a polynomial optimization problem with linear constraints, which we solve using an algorithm based on the Baum-Eagon inequality. An advantage of our approach over traditional techniques is its independence from the number of clusters, which is indeed an intrinsic characteristic of the input data, and its robustness against outliers, which is especially useful when solving figure-ground-like grouping problems. We also mention, as a potential positive feature of the proposed approach, the possibility of finding overlapping clusters (e.g., along the lines presented in [21]), although in this paper we have not explicitly dealt with this problem. The experimental results show the superiority of our approach with respect to the state of the art in terms of quality of solution. We are currently studying alternatives to the plain Baum-Eagon dynamics in order to improve efficiency.

Acknowledgments. We acknowledge financial support from the FET programme within EU FP7, under the SIMBAD project (contract 213250). We also thank Sameer Agarwal and Ron Zass for providing us with the code of their algorithms.

References

[1] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In Int. Conf. on Mach. Learning, volume 148, pages 17-24, 2006.

[2] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie.
Beyond pairwise clustering. In IEEE Conf. Computer Vision and Patt. Recogn., volume 2, pages 838-845, 2005.

[3] L. E. Baum and J. A. Eagon. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bull. Amer. Math. Soc., 73:360-363, 1967.

[4] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statistics, 41:164-171, 1970.

[5] P. Belhumeur and D. Kriegman. What is the set of images of an object under all possible lighting conditions? Int. J. Comput. Vision, 28(3):245-260, 1998.

[6] M. Bolla. Spectral, Euclidean representations and clusterings of hypergraphs. Discr. Math., 117:19-39, 1993.

[7] M. Broom, C. Cannings, and G. T. Vickers. Multi-player matrix games. Bull. Math. Biology, 59(5):931-952, 1997.

[8] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Machine Intell., 23(6):643-660, 2001.

[9] D. Gibson, J. M. Kleinberg, and P. Raghavan. Clustering categorical data: an approach based on dynamical systems. In VLDB, pages 311-322. Morgan Kaufmann Publishers Inc., 1998.

[10] T. Hu and K. Moerder. Multiterminal flows in hypergraphs. In T. Hu and E. S. Kuh, editors, VLSI circuit layout: theory and design, pages 87-93. 1985.

[11] G. Karypis and V. Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 11(3):285-300, 2000.

[12] K. C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Machine Intell., 27(5):684-698, 2005.

[13] D. G. Luenberger. Linear and nonlinear programming. Addison Wesley, 1984.

[14] M. Pavan and M. Pelillo.
Dominant sets and pairwise clustering. IEEE Trans. Pattern Anal. Machine Intell., 29(1):167-172, 2007.

[15] M. Pelillo. The dynamics of nonlinear relaxation labeling processes. J. Math. Imag. and Vision, 7(4):309-323, 1997.

[16] J. Rodríguez. On the Laplacian spectrum and walk-regular hypergraphs. Linear and Multilinear Algebra, 51:285-297, 2003.

[17] S. Rota Bulò. A game-theoretic framework for similarity-based data clustering. PhD thesis, University of Venice, 2009.

[18] A. Shashua, R. Zass, and T. Hazan. Multi-way clustering using super-symmetric non-negative tensor factorization. In Europ. Conf. on Comp. Vision, volume 3954, pages 595-608, 2006.

[19] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell., 22:888-905, 2000.

[20] A. Torsello, S. Rota Bulò, and M. Pelillo. Grouping with asymmetric affinities: a game-theoretic perspective. In IEEE Conf. Computer Vision and Patt. Recogn., pages 292-299, 2006.

[21] A. Torsello, S. Rota Bulò, and M. Pelillo. Beyond partitions: allowing overlapping groups in pairwise clustering. In Int. Conf. Patt. Recogn., 2008.

[22] J. W. Weibull. Evolutionary game theory. Cambridge University Press, 1995.

[23] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: clustering, classification, embedding. In Adv. in Neur. Inf. Processing Systems, volume 19, pages 1601-1608, 2006.

[24] J. Y. Zien, M. D. F. Schlag, and P. K. Chan. Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Trans. on Comp.-Aided Design of Integr. Circ. and Systems, 18:1389-1399, 1999.