{"title": "Consistency of Spectral Partitioning of Uniform Hypergraphs under Planted Partition Model", "book": "Advances in Neural Information Processing Systems", "page_first": 397, "page_last": 405, "abstract": "Spectral graph partitioning methods have received significant attention from both practitioners and theorists in computer science. Some notable studies have been carried out regarding the behavior of these methods for infinitely large sample size (von Luxburg et al., 2008; Rohe et al., 2011), which provide sufficient confidence to practitioners about the effectiveness of these methods. On the other hand, recent developments in computer vision have led to a plethora of applications, where the model deals with multi-way affinity relations and can be posed as uniform hyper-graphs. In this paper, we view these models as random m-uniform hypergraphs and establish the consistency of spectral algorithm in this general setting. We develop a planted partition model or stochastic blockmodel for such problems using higher order tensors, present a spectral technique suited for the purpose and study its large sample behavior. The analysis reveals that the algorithm is consistent for m-uniform hypergraphs for larger values of m, and also the rate of convergence improves for increasing m. Our result provides the first theoretical evidence that establishes the importance of m-way affinities.", "full_text": "Consistency of Spectral Partitioning of Uniform\n\nHypergraphs under Planted Partition Model\n\nDebarghya Ghoshdastidar\n\nAmbedkar Dukkipati\n\nDepartment of Computer Science & Automation\n\nIndian Institute of Science\nBangalore \u2013 560012, India\n\n{debarghya.g,ad}@csa.iisc.ernet.in\n\nAbstract\n\nSpectral graph partitioning methods have received signi\ufb01cant attention from both\npractitioners and theorists in computer science. 
Some notable studies have been\ncarried out regarding the behavior of these methods for in\ufb01nitely large sample size\n(von Luxburg et al., 2008; Rohe et al., 2011), which provide suf\ufb01cient con\ufb01dence\nto practitioners about the effectiveness of these methods. On the other hand, recent\ndevelopments in computer vision have led to a plethora of applications, where the\nmodel deals with multi-way af\ufb01nity relations and can be posed as uniform hyper-\ngraphs. In this paper, we view these models as random m-uniform hypergraphs\nand establish the consistency of spectral algorithm in this general setting. We de-\nvelop a planted partition model or stochastic blockmodel for such problems using\nhigher order tensors, present a spectral technique suited for the purpose and study\nits large sample behavior. The analysis reveals that the algorithm is consistent for\nm-uniform hypergraphs for larger values of m, and also the rate of convergence\nimproves for increasing m. Our result provides the \ufb01rst theoretical evidence that\nestablishes the importance of m-way af\ufb01nities.\n\n1\n\nIntroduction\n\nThe central theme in approaches like kernel machines [1] and spectral clustering [2, 3] is the use\nof symmetric matrices that encode certain similarity relations between pairs of data instances. This\nallows one to use the tools of matrix theory to design ef\ufb01cient algorithms and provide theoretical\nanalysis for the same. Spectral graph theory [4] provides classic examples of this methodology,\nwhere various hard combinatorial problems pertaining to graphs are relaxed to problems of matrix\ntheory. In this work, we focus on spectral partitioning, where the aim is to group the nodes of a graph\ninto disjoint sets using the eigenvectors of the adjacency matrix or the Laplacian operator. 
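Before moving to hypergraphs, the pairwise pipeline just described can be fixed in code. The following is a minimal sketch for a graph with two planted groups; the block probabilities (0.7 within, 0.1 across) and group sizes are made-up illustrative values, not quantities from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two planted groups of 20 nodes: edge probability 0.7 within a group,
# 0.1 across groups (illustrative values only).
n, half = 40, 20
P = np.full((n, n), 0.1)
P[:half, :half] = 0.7
P[half:, half:] = 0.7
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # symmetric adjacency, no self-loops

# Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
d = A.sum(axis=1)
L = np.eye(n) - A / np.sqrt(np.outer(d, d))

# The sign pattern of the eigenvector for the second-smallest eigenvalue
# (the Fiedler vector) recovers the two groups.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
```

With well-separated blocks, the sign pattern of the Fiedler vector matches the planted groups up to a relabeling.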
A statis-\ntical framework for this partitioning problem is the planted partition or stochastic blockmodel [5].\nHere, one assumes the existence of an unknown map that partitions the nodes of a random graph,\nand the probability of occurrence of any edge follows the partition rule. In a recent work, Rohe et\nal. [6] studied normalized spectral clustering under the stochastic blockmodel and proved that, for\nthis method, the fractional number of misclustered nodes goes to zero as the sample size grows.\nHowever, recent developments in signal processing, computer vision and statistical modeling have\nposed numerous problems, where one is interested in computing multi-way similarity functions that\ncompute similarity among more than two data points. A few applications are listed below.\nExample 1. In geometric grouping, one is required to cluster points sampled from a number of\ngeometric objects or manifolds [7]. Usually, these objects are highly overlapping, and one cannot\nuse standard distance based pairwise af\ufb01nities to retrieve the desired clusters. Hence, one needs to\nconstruct multi-point similarities based on the geometric structure. A special case is the subspace\nclustering problem encountered in motion segmentation [7], face clustering [8] etc.\n\n1\n\n\fExample 2. The problem of point-set matching [9] underlies several problems in computer vision\nincluding image registration, object recognition, feature tracking etc. The problem is often formu-\nlated as \ufb01nding a strongly connected component in a uniform hypergraph [9, 10], where the strongly\nconnected component represents the correct matching. This formulation has the \ufb02avor of the stan-\ndard problem of detecting cliques in random graphs.\nBoth of the above problems are variants of the classic hypergraph partitioning problem, that arose in\nthe VLSI community [11] in 1980s, and has been an active area of research till date [12]. 
Spectral approaches for hypergraph partitioning also exist in the literature [13, 14, 15], and various definitions of the hypergraph Laplacian matrix have been proposed based on different criteria. Recent studies [16] suggest an alternative representation of uniform hypergraphs in terms of the “affinity tensor”. Tensors have been popular in machine learning and signal processing for a considerable time (see [17]), and have even found use in graph partitioning and detecting planted partitions [17, 18]. But their role in hypergraph partitioning has been mostly overlooked in the literature. Recently, techniques have emerged in computer vision that use such affinity tensors in hypergraph partitioning [8, 9].\nThis paper provides the first consistency result on uniform hypergraph partitioning by analyzing the spectral decomposition of the affinity tensor. The main contributions of this work are the following.\n(1) We propose a planted partition model for random uniform hypergraphs similar to that of graphs [5]. We show that the above examples are special cases of the proposed partition model.\n(2) We present a spectral technique to extract the underlying partitions of the model. This method relies on a spectral decomposition of tensors [19] that can be computed in polynomial time, and hence, it is computationally more efficient than the tensorial approaches in [10, 8].\n(3) We analyze the proposed approach and provide almost sure bounds on the number of misclustered nodes. Our analysis reveals that the presented method is almost surely consistent in the grouping problem and for detection of a strongly connected component, whenever one uses m-way affinities for any m ≥ 3 and m ≥ 4, respectively. 
The derived rate of convergence also shows that the use of higher order affinities leads to a faster decay in the number of misclustered nodes.\n(4) We numerically demonstrate the performance of the approach on benchmark datasets.\n\n2 Planted partitions in random uniform hypergraphs\n\nWe describe the planted partition model for an undirected unweighted graph. Let ψ : {1, . . . , n} → {1, . . . , k} be an (unknown) partition of n nodes into k disjoint groups, i.e., ψi = ψ(i) denotes the partition to which node-i belongs. We also define an assignment matrix Zn ∈ {0, 1}^{n×k} such that (Zn)_{ij} = 1 if j = ψi, and 0 otherwise. For some unknown symmetric matrix B ∈ [0, 1]^{k×k}, the random graph on the n nodes contains the edge (i, j) with probability B_{ψi ψj}. Let the symmetric matrix An ∈ {0, 1}^{n×n} be a realization of the affinity matrix of the random graph on n nodes. The aim is to identify Zn given the matrix An. In some cases, one also needs to estimate the entries in B. One can hope to achieve this goal for the following reason: if Ān ∈ R^{n×n} contains the expected values of the entries in An conditioned on B and ψ, then one can write Ān as Ān = Zn B Zn^T [6]. Thus, if one can find Ān, then this relation can be used to find Zn.\nWe generalize the partition model to uniform hypergraphs. A hypergraph is a structure on n nodes with multi-way connections or hyperedges. Formally, each hyperedge in an undirected unweighted hypergraph is a collection of an arbitrary number of vertices. A special case is that of an m-uniform hypergraph, where each hyperedge contains exactly m nodes. One can note that a graph is a 2-uniform hypergraph. An often cited example of a uniform hypergraph is as follows [10]. Let the nodes be representative of points in a Euclidean space, where a hyperedge exists if the points are collinear. 
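This collinearity example can be made concrete with a short sketch; the ten planar points below (five on each of two lines) and the zero-area test are our own illustrative choices.

```python
import numpy as np
from itertools import combinations, permutations

# Ten points in the plane lying on two lines (the two planted groups);
# the coordinates are made up for illustration.
xs = np.linspace(0.0, 1.0, 5)
pts = np.vstack([np.c_[xs, 2.0 * xs],        # five points on y = 2x
                 np.c_[xs, 1.0 - xs]])       # five points on y = 1 - x
n = len(pts)

# 3rd-order affinity tensor: hyperedge on {i, j, k} iff the triangle
# spanned by the three points has (near) zero area, i.e. they are collinear.
A = np.zeros((n, n, n))
for i, j, k in combinations(range(n), 3):
    v1, v2 = pts[j] - pts[i], pts[k] - pts[i]
    area = 0.5 * abs(v1[0] * v2[1] - v1[1] * v2[0])
    if area < 1e-8:
        for perm in permutations((i, j, k)):
            A[perm] = 1.0                    # keep the tensor symmetric
```

With these points every hyperedge lies entirely within one of the two lines, so the resulting 3-uniform hypergraph separates the two planted groups.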
For m = 2, we obtain a complete graph that does not convey enough information about the nodes. However, for m = 3, the constructed hypergraph is a union of several connected components, each component representing a set of collinear points. The affinity relations of an m-uniform hypergraph can be represented in the form of an mth-order tensor An ∈ {0, 1}^{n×n×...×n}, which we call an affinity tensor. The entry (An)_{i1...im} = 1 if there exists a hyperedge on nodes i1, . . . , im. One can observe that the tensor is symmetric, i.e., invariant under any permutation of indices. In some works [16], the tensor is scaled by a factor of 1/(m − 1)! for certain reasons.\nLet ψ and Zn be as defined above, and B ∈ [0, 1]^{k×...×k} be an mth-order k-dimensional symmetric tensor. The random m-uniform hypergraph on the n nodes is constructed such that a hyperedge occurs on nodes i1, . . . , im with probability B_{ψi1...ψim}. If An is a random affinity tensor of the hypergraph, our aim is to find Zn or ψ from An. Notice that if Ān ∈ R^{n×...×n} contains the expected values of the entries in An, then one can write the entries in Ān as\n\n(Ān)_{i1...im} = B_{ψi1...ψim} = ∑_{j1,...,jm=1}^{k} B_{j1...jm} (Zn)_{i1 j1} · · · (Zn)_{im jm} .    (1)\n\nThe subscript n in the above terms emphasizes their dependence on the number of nodes. We now describe how two standard applications in computer vision can be formulated as the problem of detecting planted partitions in uniform hypergraphs.\n\n2.1 Subspace clustering problem\n\nIn motion segmentation [7, 20] or illumination invariant face clustering [8], the data belong to a high dimensional space. However, the instances belonging to each cluster approximately span a low-dimensional subspace (usually, of dimension 3 or 4). 
Here, one needs to check whether m points approximate such a subspace, where this information is useful only when m is larger than the dimension of the underlying subspace of interest. The model can be represented as an m-uniform hypergraph, where a hyperedge occurs on m nodes whenever they approximately span a subspace.\nThe partition model for this problem is similar to the standard four parameter blockmodel [6]. The number of partitions is k, and each partition contains s nodes, i.e., n = ks. There exist probabilities p ∈ (0, 1] and q ∈ [0, p) such that any set of m vectors spans a subspace with probability p if all m vectors belong to the same group, and with probability q if they come from different groups. Thus, the tensor B has the form B_{i...i} = p for all i = 1, . . . , k, and B_{i1...im} = q for all the other entries.\n\n2.2 Point set matching problem\n\nWe consider a simplified version of the matching problem [10], where one is given two sets of points of interest, each of size s. In practice, these points may come from two different images of the same object or scene, and the goal is to match the corresponding points. One can see that there are s^2 candidate matches. However, if one considers m correct matches then certain properties are preserved. For instance, let i1, . . . , im be some points from the first image, and i′1, . . . , i′m be the corresponding points in the second image; then the angles or ratios of areas of triangles formed among these points are more or less preserved [9]. Thus, the set of matches (i1, i′1), . . . , (im, i′m) has a certain connection, which is usually not present if the matches are not exact.\nThe above model is an m-uniform hypergraph on n = s^2 nodes, each node representing a candidate match, and a hyperedge is formed if properties (like preservation of angles) are satisfied by m candidate matches. Here, one can see that there are only s = √n correct matches, which have a large number of hyperedges among them, whereas very few hyperedges may be present for other combinations. Thus, the partition model has two groups of size √n and (n − √n), respectively. For p, q ∈ [0, 1], p ≫ q, p denotes the probability of a hyperedge among m correct matches, and for any other m candidates, there is a hyperedge with probability q. Thus, if the first partition is the strongly connected component, then we have B ∈ [0, 1]^{2×...×2} with B_{1...1} = p and B_{i1...im} = q otherwise.\n\n3 Spectral partitioning algorithm and its consistency\n\nBefore presenting the algorithm, we provide some background on spectral decomposition of tensors. In the related literature, one can find a number of significantly different characterizations of the spectral properties of tensors. While the work in [16] builds on a variational characterization, De Lathauwer et al. [19] provide an explicit decomposition of a tensor in the spirit of the singular value decomposition of matrices. The second line of study is more appropriate for our work since our analysis significantly relies on the use of the Davis-Kahan perturbation theorem [21], which uses an explicit decomposition and has been often used to analyze spectral clustering [2, 6].\nThe work in [19] provides a way of expressing any mth-order n-dimensional symmetric tensor, An, as a mode-k product [19] of a certain core tensor with m orthonormal matrices, where each orthonormal matrix is formed from the orthonormal left singular vectors of Ân ∈ {0, 1}^{n×n^{m−1}}, whose entries, for all i = 1, . . . , n and j = 1, . . . , n^{m−1}, are defined as\n\n(Ân)_{ij} = (An)_{i1 i2 ... im} ,   if i = i1 and j = 1 + ∑_{l=2}^{m} (il − 1) n^{l−2} .    (2)\n\nThe above matrix Ân, often called the mode-1 flattened matrix, forms a key component of the partitioning algorithm. Later, we show that the leading k left singular vectors of Ân contain information about the true partitions in the hypergraph. It is easier to work with the symmetric matrix Wn = Ân Ân^T ∈ R^{n×n}, whose eigenvectors correspond to the left singular vectors of Ân. The spectral partitioning algorithm is presented in Algorithm 1, which is quite similar to normalized spectral clustering [2]. Such a tensor based approach was first studied in [7] for geometric grouping. Subsequent improvements of the algorithm were proposed in [22, 20]. However, we deviate from these methods as we do not normalize the rows of the eigenvector matrix. The method in [9] also uses the largest eigenvector of the flattened matrix for the point set matching problem. This is computed via tensor power iterations. To keep the analysis simple, we do not use such iterations. The complexity of Algorithm 1 is O(n^{m+1}), which can be significantly improved using sampling techniques as in [7, 9, 20]. The matrix Dn is used for normalization as in spectral clustering.\n\nAlgorithm 1 Spectral partitioning of m-uniform hypergraph\n1. From the mth-order affinity tensor An, construct Ân using (2).\n2. Let Wn = Ân Ân^T, and Dn ∈ R^{n×n} be diagonal with (Dn)_{ii} = ∑_{j=1}^{n} (Wn)_{ij}.\n3. Set Ln = Dn^{−1/2} Wn Dn^{−1/2}.\n4. Compute leading k orthonormal eigenvectors of Ln, denoted by matrix Xn ∈ R^{n×k}.\n5. Cluster the rows of Xn into k clusters using k-means clustering.\n6. 
Assign node-i of the hypergraph to the jth partition if the ith row of Xn is grouped in the jth cluster.\n\nAn alternative technique of using eigenvectors of a Laplacian matrix is often preferred in graph partitioning [3], and has been extended to hypergraphs [13, 15]. Unlike the flattened matrix Ân in Algorithm 1, such Laplacians do not preserve the spectral properties of a higher-order structure such as the affinity tensor that accurately represents the affinities of the hypergraph. Hence, we avoid the use of a hypergraph Laplacian.\n\n3.1 Consistency of the above algorithm\n\nWe now comment on the error incurred by Algorithm 1. For this, let Mn be the set of nodes that are incorrectly clustered by Algorithm 1. It is tricky to formalize the definition of Mn in clustering problems. We follow the definition of Mn given in [6] that requires some details of the analysis and hence, a formal definition is postponed till Section 4. In addition, we need the following terms. The analysis depends on the tensor B ∈ [0, 1]^{k×...×k} of the underlying random model. Let B̂ ∈ [0, 1]^{k×k^{m−1}} be the flattening of the tensor B using (2). We also define a matrix Cn ∈ R^{k×k} as\n\nCn = (Zn^T Zn)^{1/2} B̂ (Zn^T Zn)^{⊗(m−1)} B̂^T (Zn^T Zn)^{1/2} ,    (3)\n\nwhere (Zn^T Zn)^{⊗(m−1)} is the (m − 1)-times Kronecker product of Zn^T Zn with itself. The use of such Kronecker products is quite common in tensor decompositions (see [19]). Observe that the positive semi-definite matrix Cn contains information regarding the connectivity of clusters (stored in B) and the cluster sizes (diagonal entries of Zn^T Zn). Let λk(Cn) be the smallest eigenvalue of Cn, which is non-negative. In addition, define 𝒟n ∈ R^{n×n} as the expectation of the diagonal matrix Dn. One can see that (𝒟n)_{ii} ≤ n^m for all i = 1, . . . , n. Let D̲n and D̄n be the smallest and largest values in 𝒟n. 
Also, let S̲n and S̄n be the sizes of the smallest and largest partitions, respectively. We have the following bound on the number of misclustered nodes.\n\nTheorem 1. If there exists N such that for all n > N,\n\nδn := λk(Cn)/D̄n − 2 n^{m−1}/D̲n > 0   and   D̲n ≥ n^m (m − 1)! √(2/log n) ,\n\nand if (log n)^{3/2} = o( δn n^{(m−1)/2} ), then the number of misclustered nodes satisfies\n\n|Mn| = O( S̄n (log n)^2 n^{m+1} / (δn^2 D̲n^2) )   almost surely.\n\nThe above result is too general to provide conclusive remarks about consistency of the algorithm. Hence, we focus on two examples, precisely the ones described in Sections 2.1 and 2.2. However, without loss of generality, we assume here that q > 0 since otherwise, the problem of detecting the partitions is trivial (at least for reasonably large n) as we can construct the partitions only based on the presence of hyperedges. The following results are proved in the appendix. The proofs mainly depend on computation of λk(Cn), which can be derived for the first example, while for the second, it is enough to work with a lower bound on λk(Cn). Further, in the first example, we make the result more general by allowing the number of clusters, k, to grow with n under certain conditions.\n\nCorollary 2. Consider the setting of subspace clustering described in Section 2.1. If the number of clusters satisfies k = O( n^{1/(2m)} (log n)^{−1} ), then the conditions in Theorem 1 are satisfied and\n\n|Mn| = O( k^{2m−1} (log n)^2 / n^{m−2} ) = O( (log n)^{3−2m} / n^{m−3+1/(2m)} )\n\nalmost surely. Hence, for m > 2, |Mn| → 0 a.s. as n → ∞, i.e., the algorithm is consistent. For m = 2, we can only conclude |Mn|/n → 0 a.s.\n\nFrom the above result, it is evident that the rate of convergence improves as m increases, indicating that, ignoring practical considerations, one should prefer the use of higher order affinities. However, the condition on the number of clusters becomes more strict in such cases. We note here that our result and conditions are quite similar to those given in [6] for the case of the four-parameter blockmodel. Thus, Algorithm 1 is comparable to spectral clustering [6]. Next, we consider the setting of Section 2.2.\n\nCorollary 3. For the problem of point set matching described in Section 2.2, the conditions in Theorem 1 are satisfied for m ≥ 3 and |Mn| = O( (log n)^2 / n^{m−3} ) a.s. Hence, for m > 3, |Mn| → 0 a.s. as n → ∞, i.e., the algorithm is consistent. For m = 3, we can only conclude |Mn|/n → 0 a.s.\n\nThe above result shows, theoretically, why higher order matching provides high accuracy in practice [9]. It also suggests that an increase in the order of the tensor will lead to a better convergence rate. We note that the above result does not hold for graphs (m = 2). In Corollary 3, we used the fact that the smaller partition is of size s = √n. The result can be made more general in terms of s, i.e., for m > 4, if s ≥ 3p/√(q^3) eventually, then Algorithm 1 is consistent.\nBefore providing the detailed analysis (proof of Theorem 1), we briefly comment on the model considered here. In Section 2, we have followed the lines of [6] to define the model with Ān = Zn B Zn^T. However, this would mean that the diagonal entries in Ān are non-negative, and hence, there is a non-zero probability of formation of self loops that is not common in practice. 
The same issue exists for hypergraphs. To avoid this, one can add a correction term to An so that the entries with repeated indices become zero. Under this correction, conditions in Theorem 1 should not change significantly. This is easy to verify for graphs, but it is not straightforward for hypergraphs.\n\n4 Analysis of partitioning algorithm\n\nIn this section, we prove Theorem 1. The result follows from a series of lemmas. The proof requires defining certain terms. Let Â̄n be the flattening of the tensor Ān defined in (1). Then we can write Â̄n = Zn B̂ (Zn^T)^{⊗(m−1)}, where (Zn^T)^{⊗(m−1)} is the (m − 1)-times Kronecker product of Zn^T with itself. Along with the definitions in Section 3, let W̄n ∈ R^{n×n} be the expectation of Wn, and L̄n = 𝒟n^{−1/2} W̄n 𝒟n^{−1/2}, where 𝒟n is the expectation of Dn as in Section 3. One can see that W̄n can be written as W̄n = Â̄n Â̄n^T + Pn, where Pn is a diagonal matrix defined in terms of the entries in Â̄n. The proof contains the following steps:\n(1) For any fixed n, we show that if δn > 0 (as stated in Theorem 1), the matrix of leading k orthonormal eigenvectors of L̄n has k distinct rows, where each row is a representative of a partition.\n(2) Since L̄n is not the expectation of Ln, we derive a bound on the Frobenius norm of their difference. The bound holds almost surely for all n if eventually D̲n ≥ n^m (m − 1)! √(2/log n).\n(3) We use a version of the Davis-Kahan sin-Θ theorem given in [6] that almost surely bounds the difference in the leading eigenvectors of Ln and L̄n if (log n)^{3/2} = o( δn n^{(m−1)/2} ).\n(4) Finally, we rely on [6, Lemma 3.2], which holds in our case, to define the set of misclustered nodes Mn, and its size is bounded almost surely using the previously derived bounds.\nWe now present the statements for the above results. The proofs can be found in the appendix.\n\nLemma 4. Fix n and let δn be as defined in Theorem 1. If δn > 0, then there exists μn ∈ R^{k×k} such that the columns of Zn μn are the leading k orthonormal eigenvectors of L̄n. Moreover, for nodes i and j, ψi = ψj if and only if the ith and jth rows of Zn μn are identical.\n\nThus, clustering the rows of Zn μn into k clusters will provide the true partitions, and the cluster centers will precisely be these rows. The condition δn > 0 is required to ensure that the eigenvalues corresponding to the columns of Zn μn are strictly greater than the other eigenvalues. The requirement of a positive eigen-gap is essential for the analysis of any spectral partitioning method [2, 23]. Next, we focus on deriving the upper bound for ‖Ln − L̄n‖F.\n\nLemma 5. If there exists N such that D̲n ≥ n^m (m − 1)! √(2/log n) for all n > N, then\n\n‖Ln − L̄n‖F ≤ 4 n^{(m+1)/2} (log n) / D̲n   almost surely.    (4)\n\nThe condition in the above result implies that each vertex is reasonably connected to other vertices of the hypergraph, i.e., there are no outliers. It is easy to satisfy this condition in the stated examples as D̲n ≥ q^2 n^m and hence, it holds for all q > 0. Under the condition, one can also see that the bound in (4) is O( (log n)^{3/2} / n^{(m−1)/2} ) and hence goes to zero as n increases. Note that in Lemma 4, δn > 0 need not hold for all n, but if it holds eventually, then we can choose N such that the conditions in Lemmas 4 and 5 both hold for all n > N. In such a case, we use the Davis-Kahan perturbation theorem [21] as stated in [6, Theorem 2.1] to claim the following.\n\nLemma 6. Let Xn ∈ R^{n×k} contain the leading k orthonormal eigenvectors of Ln. If (log n)^{3/2} = o( δn n^{(m−1)/2} ) and there exists N such that δn > 0 and D̲n ≥ n^m (m − 1)! √(2/log n) for all n > N, then there exists an orthonormal (rotation) matrix On ∈ R^{k×k} such that\n\n‖Xn − Zn μn On‖F ≤ 16 n^{(m+1)/2} (log n) / (δn D̲n)   almost surely.    (5)\n\nThe condition (log n)^{3/2} = o( δn n^{(m−1)/2} ) is crucial as it ensures that the difference in eigenvalues of Ln and L̄n decays much faster than the eigen-gap in L̄n. This condition requires the eigen-gap (lower bounded by δn) to decay at a relatively slow rate, and is necessary for using [6, Theorem 2.1]. The bound (5) only says that the rows of Xn converge to some rotation of the rows of Zn μn. However, this is not an issue since the k-means algorithm is expected to perform well as long as the rows of Xn corresponding to each partition are tightly clustered, and the k clusters are well-separated. Now, let z1, . . . , zn be the rows of Zn, and let ci be the center of the cluster in which the ith row of Xn is grouped, for each i ∈ {1, . . . , n}. We use a key result from [6] that is applicable in our setting.\n\nLemma 7. [6, Lemma 3.2] For the matrix On from Lemma 6, if ‖ci − zi μn On‖2 < 1/√(2 S̄n), then ‖ci − zi μn On‖2 < ‖ci − zj μn On‖2 for all zj ≠ zi.\n\nThis result hints that one may use the definition of correct clustering as follows. Node-i is correctly clustered if its center ci is closer to zi μn On than the rows corresponding to other partitions. A sufficient condition to satisfy this definition is ‖ci − zi μn On‖2 < 1/√(2 S̄n). Hence, the set of misclustered nodes is defined as [6]\n\nMn = { i ∈ {1, . . . , n} : ‖ci − zi μn On‖2 ≥ 1/√(2 S̄n) } .    (6)\n\nIt is easy to see that if Mn is empty, i.e., all nodes satisfy the condition ‖ci − zi μn On‖2 < 1/√(2 S̄n), then the clustering leads to the true partitions, and does not incur any error. Hence, for statements where |Mn| is small (at least compared to n), one can always use such a definition for misclustered nodes. The next result provides a simple bound on |Mn| that immediately leads to Theorem 1.\n\nLemma 8. If the k-means algorithm achieves its global optimum, then the set Mn satisfies\n\n|Mn| ≤ 8 S̄n ‖Xn − Zn μn On‖F^2 .    (7)\n\nIn practice, the k-means algorithm tries to find a local minimum, and hence, one should run this step with multiple initializations to achieve a global minimum. However, empirically we found that good performance is achieved even if we use a single run of k-means. From the above lemma, it is straightforward to arrive at Theorem 1 by using the bound in Lemma 6.\n\n5 Experiments\n\n5.1 Validation of Corollaries 2 and 3\n\nWe demonstrate the claims of Corollaries 2 and 3, where we stated that for higher order tensors, the number of misclustered nodes decays to zero at a faster rate. We run Algorithm 1 on both the models of subspace clustering and point-set matching, varying the number of nodes n, the results for each n being averaged over 10 trials. For the clustering model (Section 2.1), we choose p = 0.6, q = 0.4, and consider two cases of k = 2 and 3 cluster problems. Figure 1 (top row) shows that in this model, the number of errors eventually decreases for all m, even m = 2. This observation is similar to the one in [6]. However, the decrease is much faster for m = 3, where accurate partitioning is often observed for n ≥ 100. We also observe that the error rises for larger k, thus validating the dependence of the bound on k. 
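In the spirit of this experiment, the full pipeline — sample a planted 3-uniform hypergraph and run the steps of Algorithm 1 — can be sketched as follows. The parameters (k = 2 groups of s = 10 nodes, hyperedge probabilities 0.7 and 0.1) and the naive k-means loop standing in for step 5 are our own illustrative choices, not the settings used in the paper.

```python
import numpy as np
from itertools import combinations, permutations

rng = np.random.default_rng(0)

# Planted 3-uniform hypergraph (Section 2.1): k = 2 groups of s = 10 nodes,
# hyperedge probability 0.7 within a group and 0.1 otherwise (made-up values).
k, s, m = 2, 10, 3
n = k * s
psi = np.repeat(np.arange(k), s)                 # planted partition map
A = np.zeros((n,) * m)
for trip in combinations(range(n), m):
    prob = 0.7 if len(set(psi[list(trip)])) == 1 else 0.1
    if rng.random() < prob:
        for perm in permutations(trip):
            A[perm] = 1.0                        # symmetric affinity tensor

# Algorithm 1: flatten, normalize, take leading eigenvectors, cluster.
A_flat = A.reshape(n, n ** (m - 1))   # mode-1 flattening; the column order
                                      # differs from eq. (2) only by a
                                      # permutation, which leaves W unchanged
W = A_flat @ A_flat.T
d = W.sum(axis=1)
L = W / np.sqrt(np.outer(d, d))                  # D^{-1/2} W D^{-1/2}
X = np.linalg.eigh(L)[1][:, -k:]                 # leading k eigenvectors

# A naive 2-means on the rows of X, standing in for step 5.
centers = X[[0, n - 1]]
for _ in range(20):
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        if np.any(labels == c):
            centers[c] = X[labels == c].mean(axis=0)
```

With this level of separation, the recovered `labels` agree with the planted map `psi` up to a relabeling of the two groups.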
A similar inference can be drawn from Figure 1 (second row) for the matching problem (Section 2.2), where we use p = 0.9, q = 0.1 and the number of correct matches as √n.\n\n5.2 Motion Segmentation on Hopkins 155 dataset\n\nWe now turn to practical applications, and test the performance of Algorithm 1 in motion segmentation. We perform the experiments on the Hopkins 155 dataset [24], which contains 120 videos with 2 independent affine motions. Figure 1 (third row) shows two cases, where Algorithm 1 correctly clusters the trajectories into their true groups. We used 4th-order tensors in the approach, where the large dimensionality of Ân is tackled by using only 500 uniformly sampled columns of Ân for computing Wn. We also compare the performance of Algorithm 1, averaged over 20 runs, with some standard approaches. The results for the other methods have been taken from [20]. We observe that Algorithm 1 performs reasonably well, while the best performance is obtained using Sparse Grassmann Clustering (SGC) [20], which is expected as SGC is an iterative improvement of Algorithm 1.\n\n5.3 Matching point sets from the Mpeg-7 shape database\n\nWe now consider a matching problem using points sampled from images in the Mpeg-7 database [25]. This problem has been considered in [10]. We use 70 random images, one from each shape class. Ten points were sampled from the boundary of each shape, which formed one point set. The other set of points was generated by adding Gaussian noise of variance σ^2 to the original points and then using a random affine transformation on the points. In Figure 1 (last row), we compare the performance of Algorithm 1 with the methods in [9, 10], which have been shown to outperform other methods. We use 4-way similarities based on the ratio of areas of two triangles. We show the variation in the number of correctly detected matches and the F1-score for all methods as σ increases from 0 to 0.2. 
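A hedged sketch of a 4-way affinity built from ratios of triangle areas is given below. The helper names `tri_area` and `four_way_affinity`, the particular pair of triangles, and the exp(−|·|) similarity scale are our own illustrative assumptions, not the exact construction used in [9, 10].

```python
import numpy as np

def tri_area(p, q, r):
    """Area of the triangle spanned by three 2-D points."""
    v1, v2 = q - p, r - p
    return 0.5 * abs(v1[0] * v2[1] - v1[1] * v2[0])

def four_way_affinity(src, dst, eps=1e-12):
    """Affinity of 4 candidate matches src[i] -> dst[i].

    Compares the ratio of areas of two triangles formed among the four
    points in each image; the ratio is preserved under affine maps, so
    it is (nearly) equal when all four matches are correct.
    """
    r_src = tri_area(src[0], src[1], src[2]) / (tri_area(src[0], src[1], src[3]) + eps)
    r_dst = tri_area(dst[0], dst[1], dst[2]) / (tri_area(dst[0], dst[1], dst[3]) + eps)
    return float(np.exp(-abs(r_src - r_dst)))

# Four correct matches under an affine map x -> Mx + t score close to 1.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 2.0]])
M, t = np.array([[2.0, 1.0], [0.5, 1.5]]), np.array([3.0, -1.0])
dst = src @ M.T + t
```

Since an affine map scales every area by |det M|, the ratio of two areas is preserved exactly for correct 4-tuples of matches, which is why they receive affinity close to 1 while perturbed tuples score lower.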
The results show that Algorithm 1 is quite robust compared to [10] in detecting true matches. However, Algorithm 1 does not use additional post-processing as in [9], and hence allows a high number of false positives, which reduces its F1-score, whereas [9, 10] show similar trends in both plots.

Figure 1: First row: number (left) and fraction (right) of misclustered nodes as n increases, for k = 2 and k = 3 cluster problems on 2- and 3-uniform hypergraphs (black: m = 2; red: m = 3; solid: k = 2; dashed: k = 3).
Second row: number (left) and fraction (right) of incorrect matches as n increases in the matching problem on 2- and 3-uniform hypergraphs (black: m = 2; red: m = 3).
Third row: grouping two affine motions with Algorithm 1 (left), and performance comparison of Algorithm 1 with other methods (right), summarized as percentage error in clustering:

LSA 4.23 %, SCC 2.89 %, LRR-H 2.13 %, LRSC 3.69 %, SSC 1.52 %, SGC 1.03 %, Algorithm 1 1.83 %.

Fourth row: variation in the number of correct matches detected (left) and in the F1-score (middle) as the noise level σ increases; (right) a pair of images where Algorithm 1 correctly matches all sampled points.

6 Concluding remarks

In this paper, we presented a planted partition model for unweighted undirected uniform hypergraphs. We devised a spectral approach (Algorithm 1) for detecting the partitions from the affinity tensor of the corresponding random hypergraph. The above model is appropriate for a number of problems in computer vision, including motion segmentation, illumination-invariant face clustering, point-set matching, feature tracking, etc.
We analyzed the approach to provide an almost sure upper bound on the number of misclustered nodes (cf. Theorem 1). Using this bound, we conclude that for the problems of subspace clustering and point-set matching, Algorithm 1 is consistent for m ≥ 3 and m ≥ 4, respectively. To the best of our knowledge, this is the first theoretical study of the above problems in a probabilistic setting, and also the first theoretical evidence that shows the importance of m-way affinities.

Acknowledgement

D. Ghoshdastidar is supported by a Google Ph.D. Fellowship in Statistical Learning Theory.

References

[1] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.

[2] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2002.

[3] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

[4] F. R. K. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.

[5] F. McSherry. Spectral partitioning of random graphs. In IEEE Symposium on Foundations of Computer Science, pages 529–537, 2001.

[6] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39(4):1878–1915, 2011.

[7] V. M. Govindu. A tensor decomposition for geometric grouping and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1150–1157, 2005.

[8] S. Rota Bulo and M. Pelillo. A game-theoretic approach to hypergraph clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6):1312–1327, 2013.

[9] M. Chertok and Y. Keller. Efficient high order matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2205–2215, 2010.

[10] H. Liu, L. J. Latecki, and S. Yan.
Robust clustering as ensembles of affinity relations. In Advances in Neural Information Processing Systems, pages 1414–1422, 2010.

[11] G. Schweikert and B. W. Kernighan. A proper model for the partitioning of electrical circuits. In Proceedings of the 9th Design Automation Workshop, pages 57–62, Dallas, 1979.

[12] N. Selvakkumaran and G. Karypis. Multi-objective hypergraph partitioning algorithms for cut and maximum subdomain degree minimization. IEEE Transactions on CAD, 25(3):504–517, 2006.

[13] M. Bolla. Spectra, Euclidean representations and clusterings of hypergraphs. Discrete Mathematics, 117(1):19–39, 1993.

[14] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In Proceedings of the International Conference on Machine Learning, pages 17–24, 2006.

[15] J. A. Rodriguez. Laplacian eigenvalues and partition problems in hypergraphs. Applied Mathematics Letters, 22(6):916–921, 2009.

[16] J. Cooper and A. Dutle. Spectra of uniform hypergraphs. Linear Algebra and its Applications, 436(9):3268–3292, 2012.

[17] A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade. A tensor spectral approach to learning mixed membership community models. In Conference on Learning Theory (expanded version at arXiv:1210.7559v3), 2013.

[18] A. Frieze and R. Kannan. A new approach to the planted clique problem. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, volume 2, pages 187–198, 2008.

[19] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.

[20] S. Jain and V. M. Govindu. Efficient higher-order clustering on the Grassmann manifold. In IEEE International Conference on Computer Vision, 2013.

[21] G. W. Stewart and J. Sun. Matrix Perturbation Theory. Academic Press, 1990.

[22] G. Chen and G. Lerman.
Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Foundations of Computational Mathematics, 9:517–558, 2009.

[23] U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Annals of Statistics, 36(2):555–586, 2008.

[24] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In IEEE Conference on Computer Vision and Pattern Recognition, 2007.

[25] L. J. Latecki, R. Lakamper, and T. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 424–429, 2000.