{"title": "Inhomogeneous Hypergraph Clustering with Applications", "book": "Advances in Neural Information Processing Systems", "page_first": 2308, "page_last": 2318, "abstract": "Hypergraph partitioning is an important problem in machine learning, computer vision and network analytics. A widely used method for hypergraph partitioning relies on minimizing a normalized sum of the costs of partitioning hyperedges across clusters. Algorithmic solutions based on this approach assume that different partitions of a hyperedge incur the same cost. However, this assumption fails to leverage the fact that different subsets of vertices within the same hyperedge may have different structural importance. We hence propose a new hypergraph clustering technique, termed inhomogeneous hypergraph partitioning, which assigns different costs to different hyperedge cuts. We prove that inhomogeneous partitioning produces a quadratic approximation to the optimal solution if the inhomogeneous costs satisfy submodularity constraints. Moreover, we demonstrate that inhomogeneous partitioning offers significant performance improvements in applications such as structure learning of rankings, subspace segmentation and motif clustering.", "full_text": "Inhomogeneous Hypergraph Clustering with Applications

Pan Li
Department ECE, UIUC
panli2@illinois.edu

Olgica Milenkovic
Department ECE, UIUC
milenkov@illinois.edu

Abstract

Hypergraph partitioning is an important problem in machine learning, computer vision and network analytics. A widely used method for hypergraph partitioning relies on minimizing a normalized sum of the costs of partitioning hyperedges across clusters. Algorithmic solutions based on this approach assume that different partitions of a hyperedge incur the same cost. However, this assumption fails to leverage the fact that different subsets of vertices within the same hyperedge may have different structural importance. 
We hence propose a new hypergraph clustering technique, termed inhomogeneous hypergraph partitioning, which assigns different costs to different hyperedge cuts. We prove that inhomogeneous partitioning produces a quadratic approximation to the optimal solution if the inhomogeneous costs satisfy submodularity constraints. Moreover, we demonstrate that inhomogeneous partitioning offers significant performance improvements in applications such as structure learning of rankings, subspace segmentation and motif clustering.

1 Introduction

Graph partitioning or clustering is a ubiquitous learning task that has found many applications in statistics, data mining, social science and signal processing [1, 2]. In most settings, clustering is formally cast as an optimization problem that involves entities with different pairwise similarities and aims to maximize the total "similarity" of elements within clusters [3, 4, 5], or to simultaneously maximize the total similarity within clusters and the dissimilarity between clusters [6, 7, 8]. Graph partitioning may be performed in an agnostic setting, where part of the optimization problem is to automatically learn the number of clusters [6, 7].

Although similarity among entities in a class may be captured via pairwise relations, in many real-world problems it is necessary to capture joint, higher-order relations between subsets of objects. From a graph-theoretic point of view, these higher-order relations may be described via hypergraphs, where objects correspond to vertices and higher-order relations among objects correspond to hyperedges. The vertex clustering problem aims to minimize the similarity across clusters and is referred to as hypergraph partitioning. Hypergraph clustering has found a wide range of applications in network motif clustering, semi-supervised learning, subspace clustering and image segmentation. 
[8, 9, 10,\n11, 12, 13, 14, 15].\nClassical hypergraph partitioning approaches share the same setup: A nonnegative weight is assigned\nto every hyperedge and if the vertices in the hyperedge are placed across clusters, a cost proportional\nto the weight is charged to the objective function [9, 11]. We refer to this clustering procedure\nas homogenous hyperedge clustering and refer to the corresponding partition as a homogeneous\npartition (H-partition). Clearly, this type of approach prohibits the use of information regarding\nhow different vertices or subsets of vertices belonging to a hyperedge contribute to the higher-order\nrelation. A more appropriate formulation entails charging different costs to different cuts of the\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fFigure 1: Clusters obtained using homogenous and inhomogeneous hypergraph partitioning and\ngraph partitioning (based on pairwise relations). Left: Each reaction is represented by a hyperedge.\nThree different cuts of a hyperedge are denoted by c(M3), c(M1), and c(M2), based on which vertex\nis \u201cisolated\u201d by the cut. The graph partition only takes into account pairwise relations between\nreactants, corresponding to w(c(M3)) = 0. The homogenous partition enforces the three cuts to\nhave the same weight, w(c(M3)) = w(c(M1)) = w(c(M2)), while an inhomogenous partition is\nnot required to satisfy this constraint. Right: Three different clustering results based on optimally\nnormalized cuts for a graph partition, a homogenous partition (H-partition) and an inhomogenous\npartition (InH-partition) with 0.01 w(c(M1)) \u2264 w(c(M3)) \u2264 0.44 w(c(M1)).\nhyperedges, thereby endowing hyperedges with vector weights capturing these costs. To illustrate\nthe point, consider the example of metabolic networks [16]. In these networks, vertices describe\nmetabolites while edges describe transformative, catalytic or binding relations. 
Metabolic reactions\nare usually described via equations that involve more than two metabolites, such as M1 + M2 \u2192 M3.\nHere, both metabolites M1 and M2 need to be present in order to complete the reaction that leads\nto the creation of the product M3. The three metabolites play different roles: M1, M2 are reactants,\nwhile M3 is the product metabolite. A synthetic metabolic network involving reactions with three\nreagents as described above is depicted in Figure 1, along with three different partitions induced by a\nhomogeneous, inhomogeneous and classical graph cut. As may be seen, the hypergraph cuts differ in\nterms of how they split or group pairs of reagents. The inhomogeneous clustering preserves all but\none pairing, while the homogenous clustering splits two pairings. The graph partition captures only\npairwise relations between reactants and hence, the optimal normalized cut over the graph splits six\nreaction triples. The differences between inhomogenous, homogenous, and pairwise-relation based\ncuts are even more evident for large graphs and they may lead to signi\ufb01cantly different partitioning\nperformance in a number of important partitioning applications.\nThe problem of inhomogeneous hypergraph clustering has not been previously studied in the literature.\nThe main results of the paper are ef\ufb01cient algorithms for inhomogenous hypergraph partitioning\nwith theoretical performance guarantees and extensive testing of inhomogeneous partitioning in\napplications such as hierarchical biological network studies, structure learning of rankings and\nsubspace clustering1 (All proofs and discussions of some applications are relegated to the Supple-\nmentary Material). The algorithmic methods are based on transforming hypergraphs into graphs\nand subsequently performing spectral clustering based on the normalized Laplacian of the derived\ngraph. A similar approach for homogenous clustering has been used under the name of Clique\nExpansion [14]. 
However, the projection procedure, which is the key step of Clique Expansion, differs significantly from the projection procedure used in our work, as the inhomogeneous clustering algorithm allows non-uniform expansion of one hyperedge while Clique Expansion only allows for uniform expansions. A straightforward analysis reveals that the normalized hypergraph cut problem [11] and the normalized Laplacian homogeneous hypergraph clustering algorithms [9, 11] are special cases of our proposed algorithm, in which the costs assigned to the hyperedges take a very special form. Furthermore, we show that when the costs of the proposed inhomogeneous hyperedge clustering are submodular, the projection procedure is guaranteed to find a constant-approximation solution for several graph-cut related entities. Hence, the inhomogeneous clustering procedure has the same quadratic approximation properties as spectral graph clustering [17].

2 Preliminaries and Problem Formulation

A hypergraph H = (V, E) is described in terms of a vertex set V = {v1, v2, ..., vn} and a set of hyperedges E. A hyperedge e ∈ E is a subset of vertices in V. For an arbitrary set S, we let |S| stand for the cardinality of the set, and use δ(e) = |e| to denote the size of a hyperedge. If δ(e) equals a constant Δ for all e ∈ E, the hypergraph is called a Δ-uniform hypergraph.

1The code for experiments can be found at https://github.com/lipan00123/InHclustering.

Let 2^e denote the power set of e. An inhomogeneous hyperedge (InH-hyperedge) is a hyperedge with an associated weight function we: 2^e → R≥0. The weight we(S) indicates the cost of cutting/partitioning the hyperedge e into the two subsets S and e\S. A consistent weight we(S) satisfies the following properties: we(∅) = 0 and we(S) = we(e\S). The definition also allows we(·) to be enforced only for a subset of 2^e; however, for singleton sets S = {v}, v ∈ e, the value we({v}) has to be specified. The degree of a vertex v is defined as dv = Σ_{e: v∈e} we({v}), while the volume of a subset of vertices S ⊆ V is defined as

volH(S) = Σ_{v∈S} dv.    (1)

Let (S, S̄) be a partition of the vertices V. Define the hyperedge boundary of S as ∂S = {e ∈ E | e ∩ S ≠ ∅, e ∩ S̄ ≠ ∅} and the corresponding set volume as

volH(∂S) = Σ_{e∈∂S} we(e ∩ S) = Σ_{e∈E} we(e ∩ S),    (2)

where the second equality holds since we(∅) = we(e) = 0. The task of interest is to minimize the normalized cut NCut of the hypergraph with InH-hyperedges, i.e., to solve the following optimization problem:

arg min_S NCutH(S) = arg min_S volH(∂S) (1/volH(S) + 1/volH(S̄)).    (3)

One may also extend the notion of InH hypergraph partitioning to a k-way InH-partition. For this purpose, we let (S1, S2, ..., Sk) be a k-way partition of the vertices V, and define the k-way normalized cut for an InH-partition according to

NCutH(S1, S2, ..., Sk) = Σ_{i=1}^{k} volH(∂Si)/volH(Si).    (4)

Similarly, the goal of a k-way InH-partition is to minimize NCutH(S1, S2, ..., Sk). 
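To make definitions (1)-(3) concrete, the following sketch evaluates the inhomogeneous normalized cut of a small toy hypergraph by exhaustive search. The dict-of-frozensets encoding and all cost values are our own illustrative choices, not from the paper.

```python
from itertools import combinations

# Toy inhomogeneous hypergraph on vertices {0,1,2,3}: each hyperedge stores
# its cut costs w_e(S) on frozensets (singleton costs suffice here).
edges = [
    {"verts": frozenset({0, 1, 2}),
     "w": {frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({2}): 0.5}},
    {"verts": frozenset({1, 2, 3}),
     "w": {frozenset({1}): 0.5, frozenset({2}): 0.5, frozenset({3}): 1.0}},
]

def cut_cost(edge, S):
    """w_e(e ∩ S); uses the symmetry w_e(A) = w_e(e \\ A), w_e(∅) = w_e(e) = 0."""
    A = edge["verts"] & S
    if not A or A == edge["verts"]:
        return 0.0
    return edge["w"].get(A, edge["w"].get(edge["verts"] - A, 0.0))

def degree(v):
    # d_v = sum of w_e({v}) over hyperedges containing v, as in the text
    return sum(edge["w"][frozenset({v})] for edge in edges if v in edge["verts"])

def vol(S):
    return sum(degree(v) for v in S)

def ncut(S, V):
    boundary = sum(cut_cost(edge, S) for edge in edges)  # vol_H(∂S), eq. (2)
    return boundary * (1.0 / vol(S) + 1.0 / vol(V - S))  # eq. (3)

V = frozenset(range(4))
best = min((frozenset(c) for r in range(1, len(V))
            for c in combinations(sorted(V), r)),
           key=lambda S: ncut(S, V))
```

On this toy instance the minimizer groups {0, 1} against {2, 3}, the split that cuts each hyperedge at its cheapest specified cost.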
Note that if δ(e) = 2 for all e ∈ E, the above definitions are consistent with those used for graphs [18].

3 Inhomogeneous Hypergraph Clustering Algorithms

Motivated by the homogeneous clustering approach of [14], we propose an inhomogeneous clustering algorithm that uses three steps: 1) projecting each InH-hyperedge onto a subgraph; 2) merging the subgraphs into a graph; 3) performing classical spectral clustering based on the normalized Laplacian (described in the Supplementary Material, along with the complexity of all algorithmic steps). The novelty of our approach lies in introducing the inhomogeneous clustering constraints via the projection step, and in stating an optimization problem that provides the provably best weight splitting for projections. All our theoretical results are stated for the NCut problem, but the proposed methods may be used as heuristics for k-way NCuts.

Suppose that we are given a hypergraph with inhomogeneous hyperedge weights, H = (V, E, w). For each InH-hyperedge (e, we), we aim to find a complete subgraph Ge = (V(e), E(e), w(e)) that "best" represents this InH-hyperedge; here, V(e) = e, E(e) = {{v, ṽ} | v, ṽ ∈ e, v ≠ ṽ}, and w(e): E(e) → R denotes the edge weight vector. The goal is to find the graph edge weights that provide the best approximation to the split hyperedge weights according to:

min_{w(e), β(e)} β(e)  s.t.  we(S) ≤ Σ_{v∈S, ṽ∈e\S} w(e)_{vṽ} ≤ β(e) we(S), for all S ∈ 2^e s.t. we(S) is defined. 
(5)

Upon solving for the weights w(e), we construct a graph G = (V, Eo, w), where V are the vertices of the hypergraph, Eo is the complete set of edges, and the weights w_{vṽ} are computed via

w_{vṽ} ≜ Σ_{e∈E} w(e)_{vṽ}, for all {v, ṽ} ∈ Eo.    (6)

This step represents the projection weight merging procedure, which simply reduces to summing the weights of all hyperedge projections on a pair of vertices. Due to the linearity of the volumes (1) and boundaries (2) of sets S of vertices, for any S ⊂ V we have

volH(∂S) ≤ volG(∂S) ≤ β* volH(∂S),  volH(S) ≤ volG(S) ≤ β* volH(S),    (7)

where β* = max_{e∈E} β(e). Applying spectral clustering to G = (V, Eo, w) produces the desired partition (S*, S̄*). The next result is a consequence of combining the bounds of (7) with the approximation guarantees of spectral graph clustering (Theorem 1 [17]).

Theorem 3.1. If the optimization problem (5) is feasible for all InH-hyperedges and the weights w_{vṽ} obtained from (6) are nonnegative for all {v, ṽ} ∈ Eo, then α* = NCutH(S*) satisfies

(β*)³ αH ≥ (α*)²/8 ≥ (αH)²/8,    (8)

where αH is the optimal value of the normalized cut of the hypergraph H.

There are no guarantees that the weights w_{vṽ} will be nonnegative: the optimization problem (5) may produce solutions w(e) with negative components. The performance of spectral methods in the presence of negative edge weights is not well understood [19, 20]; hence, it would be desirable for the weights w_{vṽ} generated by (6) to be nonnegative. Unfortunately, imposing nonnegativity constraints in the optimization problem may render it infeasible. In practice, one may use (w_{vṽ})+ = max{w_{vṽ}, 0} to remove negative weights (other choices, such as (w_{vṽ})+ = Σ_{e} (w(e)_{vṽ})+, do not appear to perform well). This change invalidates the theoretical result of Theorem 3.1, but provides solutions with very good empirical performance. The issues discussed are illustrated by the next example.

Example 3.1. Let e = {1, 2, 3} and (we({1}), we({2}), we({3})) = (0, 0, 1). The solution to the weight optimization problem is (β(e), w(e)_{12}, w(e)_{13}, w(e)_{23}) = (1, −1/2, 1/2, 1/2). If all components of w(e) are constrained to be nonnegative, the optimization problem is infeasible. Nevertheless, the above choice of weights is very unlikely to be encountered in practice, as we({1}) = we({2}) = 0 indicates that vertices 1 and 2 have no relevant connections within the given hyperedge e, while we({3}) = 1 indicates that vertex 3 is strongly connected to 1 and 2, which is a contradiction. Let us assume next that the negative weight is set to zero. The adjusted weights ((w(e)_{12})+, w(e)_{13}, w(e)_{23}) = (0, 1/2, 1/2) produce the clusterings ((1,3),(2)) or ((2,3),(1)); both have zero cost based on we.

Another problem is that arbitrary choices of we may render the optimization problem (5) infeasible even if negative weights w(e) are allowed, as illustrated by the following example.

Example 3.2. Let e = {1, 2, 3, 4}, with we({1, 4}) = we({2, 3}) = 1 and we(S) = 0 for all other choices of sets S. To match the zero-cost cuts, we require w(e)_{vṽ} = 0 for all pairs vṽ, which fails to work for we({1, 4}) and we({2, 3}). For a hyperedge e, the number of degrees of freedom of we equals 2^{δ(e)−1} − 1, as two values of we are fixed, while the remaining values are paired up by symmetry. When δ(e) > 3, we have C(δ(e), 2) < 2^{δ(e)−1} − 1, which indicates that the problem is overdetermined and may be infeasible.

In what follows, we provide sufficient conditions for the optimization problem to have a feasible solution with nonnegative values of the weights w(e). We also provide conditions on the weights we that result in a small constant β*, and hence allow for quadratic approximations of the optimal solution. Our results depend on the availability of information about the weights we: in practice, the weights have to be inferred from observable data, which may not suffice to determine more than the weights of singletons or pairs of elements.

Only the values of we({v}) are known. In this setting, we are only given information about how much each node contributes to a higher-order relation, i.e., we are only given the values of we({v}), v ∈ e. Hence, we have δ(e) costs (equations) and C(δ(e), 2) ≥ δ(e) variables (for δ(e) ≥ 3), which makes the problem underdetermined and easy to solve. The optimal β(e) = 1 is attained by setting, for all edges {v, ṽ},

w(e)_{vṽ} = [we({v}) + we({ṽ})]/(δ(e) − 2) − (1/((δ(e) − 1)(δ(e) − 2))) Σ_{v'∈e} we({v'}).    (9)

The components of we(·) with positive coefficients in (9) are precisely those associated with the endpoints of the edge vṽ. Using simple algebraic manipulations, one can derive the conditions under which the values w(e)_{vṽ} are nonnegative; these are presented in the Supplementary Material.

The solution (9) produces a perfect projection with β(e) = 1. Unfortunately, one cannot guarantee that the solution is nonnegative. Hence, the question of interest is to determine for which types of cuts one can deviate from a perfect projection while ensuring that the weights are nonnegative. The proposed approach is to set the unspecified values of we(·) so that the weight function becomes submodular, which guarantees nonnegative weights w(e)_{vṽ} that approximate we(·) up to a constant, although with a larger approximation constant β.

Submodular weights we(S). As previously discussed, when δ(e) > 3, the optimization problem (5) may not have any feasible solutions for arbitrary choices of weights. However, we show next that if the weights we are submodular, then (5) always has a nonnegative solution. We start by recalling the definition of a submodular function.

Definition 3.2. A function we: 2^e → R≥0 that satisfies

we(S1) + we(S2) ≥ we(S1 ∩ S2) + we(S1 ∪ S2) for all S1, S2 ∈ 2^e

is termed submodular.

Theorem 3.3. If we is submodular, then

w*(e)_{vṽ} = Σ_{S∈2^e\{∅,e}} [ we(S)/(2|S|(δ(e)−|S|)) · 1_{|{v,ṽ}∩S|=1} − we(S)/(2(|S|+1)(δ(e)−|S|−1)) · 1_{|{v,ṽ}∩S|=0} − we(S)/(2(|S|−1)(δ(e)−|S|+1)) · 1_{|{v,ṽ}∩S|=2} ]    (10)

is nonnegative. For 2 ≤ δ(e) ≤ 7, the function above is a feasible solution for the optimization problem (5), with the parameters β(e) listed in Table 1.

Table 1: Feasible values of β(e) for δ(e)
δ(e):  2    3    4    5    6    7
β:     1    1    3/2  2    4    6

Theorem 3.3 also holds when some weights in the set we are not specified, but may be completed so as to satisfy the submodularity constraints (see Example 3.3).

Example 3.3. 
Let e = {1, 2, 3, 4} and (we({1}), we({2}), we({3}), we({4})) = (1/3, 1/3, 1, 1). Solving (9) yields w(e)_{12} = −1/9 and β(e) = 1. Completing the missing components of we as (we({1, 2}), we({1, 3}), we({1, 4})) = (2/3, 1, 1) leads to submodular weights (observe that completions are not necessarily unique). Then, the solution of (10) gives w(e)_{12} = 0 and β(e) ∈ (1, 3/2], which is clearly larger than one.

Remark 3.1. It is worth pointing out that β = 1 when δ(e) = 3, which asserts that homogeneous triangle clustering may be performed via spectral methods on graphs without any weight-projection distortion [9]. The above results extend this finding to the inhomogeneous case whenever the weights are submodular. In addition, triangle clustering based on random walks [21] may be extended to the inhomogeneous case.

Moreover, (10) leads to an optimal approximation ratio β(e) if we restrict w(e) to be a linear mapping of we, as formally stated next.

Theorem 3.4. Suppose that for all pairs {v, ṽ} ∈ Eo, w(e)_{vṽ} is a linear function of we, denoted by w(e)_{vṽ} = f_{vṽ}(we), where {f_{vṽ}}_{vṽ∈E(e)} depends on δ(e) but not on we. Then, when δ(e) ≤ 7, the optimal values of β for the following optimization problem depend only on δ(e), and are equal to those listed in Table 1:

min_{{f_{vṽ}}_{{v,ṽ}∈Eo}, β} β  s.t.  we(S) ≤ Σ_{v∈S, ṽ∈e\S} f_{vṽ}(we) ≤ β we(S), for all S ∈ 2^e and all submodular we.    (11)

Remark 3.2. Although we were able to prove feasibility (Theorem 3.3) and optimality of linear solutions (Theorem 3.4) only for small values of δ(e), we conjecture that the results hold for all δ(e).

The following theorem shows that if the weights we of the hyperedges in a hypergraph are generated from graph cuts of a latent weighted graph, then the projected weights of the hyperedges are proportional to the corresponding weights in the latent graph.

Theorem 3.5. Suppose that Ge = (V(e), E(e), w(e)) is a latent graph that generates hyperedge weights we according to the following procedure: for any S ⊆ e, we(S) = Σ_{v∈S, ṽ∈e\S} w(e)_{vṽ}. Then, equation (10) establishes that w*(e)_{vṽ} = β(e) w(e)_{vṽ} for all vṽ ∈ E(e), with β(e) = (2^{δ(e)} − 2)/(δ(e)(δ(e) − 1)).

Theorem 3.5 establishes the consistency of the linear map (10), and also shows that the min-max optimal approximation ratio for linear functions equals Ω(2^{δ(e)}/δ(e)²). An independent line of work [22], based on (non-linear) Gomory-Hu trees, established that submodular functions admit nonnegative solutions of the optimization problem (5) with β(e) = δ(e) − 1. Therefore, an unrestricted solution of the optimization problem (5) ensures that β(e) ≤ δ(e) − 1. As practical applications almost exclusively involve hypergraphs with small, constant δ(e), the Gomory-Hu tree approach is in this case suboptimal in approximation ratio compared to (10). The expression (10) can be rewritten as w*(e) = M we, where M is a matrix that depends only on δ(e). Hence, the projected weights can be computed in a very efficient and simple manner, as opposed to constructing the Gomory-Hu tree or solving (5) directly. 
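Since the map (10) is linear in the costs we (the matrix M mentioned above), it can be evaluated directly. The sketch below implements (10) and checks the δ(e) = 3 case of Theorem 3.5: hyperedge costs generated from a latent triangle are recovered exactly (β(e) = 1 there). The encoding of we as a dict over frozensets and the helper names are our own illustrative choices.

```python
from itertools import combinations

def project_submodular(e, w_e):
    """Evaluate the linear projection (10): map inhomogeneous cut costs
    w_e(S), S in 2^e \\ {empty, e}, to pairwise graph weights w*_{uv}."""
    d = len(e)
    w_star = {}
    for u, v in combinations(sorted(e), 2):
        total = 0.0
        for k in range(1, d):
            for subset in combinations(sorted(e), k):
                S = frozenset(subset)
                m = len(S & {u, v})
                if m == 1:
                    total += w_e[S] / (2 * k * (d - k))
                elif m == 0:  # m == 0 forces k <= d - 2, denominator > 0
                    total -= w_e[S] / (2 * (k + 1) * (d - k - 1))
                else:         # m == 2 forces k >= 2, denominator > 0
                    total -= w_e[S] / (2 * (k - 1) * (d - k + 1))
        w_star[(u, v)] = total
    return w_star

# Theorem 3.5 setup for delta(e) = 3: w_e(S) is the cut weight of S in a
# latent triangle with (made-up) weights below.
e = [1, 2, 3]
latent = {(1, 2): 0.7, (1, 3): 0.2, (2, 3): 0.4}
w_e = {frozenset(S): sum(w for pair, w in latent.items()
                         if len(frozenset(S) & set(pair)) == 1)
       for k in (1, 2) for S in combinations(e, k)}
w_star = project_submodular(e, w_e)  # recovers `latent` exactly here
```

For δ(e) = 4 the same map is only guaranteed to approximate submodular costs within the factor β(e) = 3/2 of Table 1.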
In the rare case that one has to deal with hyperedges for which δ(e) is large, the Gomory-Hu tree approach and a direct solution of (5) may be preferred.

4 Related Work and Discussion

One contribution of our work is to introduce the notion of an inhomogeneous partition of hyperedges and a new hypergraph projection method that accompanies the procedure. Subsequent edge weight merging and spectral clustering are standardly used in hypergraph clustering algorithms, in particular in Zhou's normalized hypergraph cut approach [11], Clique Expansion, Star Expansion and Clique Averaging [14]. The formulation closest to ours is Zhou's method [11]. In the aforementioned hypergraph clustering method for H-hyperedges, each hyperedge e is assigned a scalar weight wH_e. For the projection step, Zhou used wH_e/δ(e) as the weight of each pair of endpoints of e. If we view the H-hyperedge as an InH-hyperedge with weight function we(S) = wH_e |S|(δ(e) − |S|)/δ(e) for all S ∈ 2^e, then our definition of the volume/cost of the boundary (2) is identical to Zhou's. With this choice of we, the optimization problem (5) outputs w(e)_{vṽ} = wH_e/δ(e) with β(e) = 1, which are the same values as those obtained via Zhou's projection. The degree of a vertex in [11] is defined as dv = Σ_{e∈E} h(e, v) wH_e = Σ_{e: v∈e} (δ(e)/(δ(e) − 1)) we({v}), a weighted sum of the we({v}) that takes a slightly different form than our definition. As a matter of fact, for uniform hypergraphs the two forms are the same. Some other hypergraph clustering algorithms, such as Clique Expansion and Star Expansion, as shown by Agarwal et al. [23], also represent special cases of our method for uniform hypergraphs. The Clique Averaging method differs substantially from all the aforedescribed methods. 
Instead of projecting each hyperedge onto a subgraph and then combining the subgraphs into a graph, the algorithm performs a one-shot projection of the whole hypergraph onto a graph. The projection is based on an ℓ2-minimization rule, which may not allow for constant-approximation solutions. It is unknown whether the result of the procedure can provide a quadratic approximation of the optimal solution. Clique Averaging also has practical implementation problems and high computational complexity, as it requires solving a linear regression with n² variables and n^{δ(e)} observations.

In recent work on network motif clustering [9], the hyperedges are deduced from a graph, where they represent so-called motifs. Benson et al. [9] proved that if the motifs have three vertices, resulting in a three-uniform hypergraph, their proposed algorithm satisfies the Cheeger inequality for motifs². In the described formulation, when cutting an H-hyperedge with weight wH_e, one is required to pay wH_e. Hence, recasting this model within our setting, we arrive at inhomogeneous weights we(S) = wH_e for all S ∈ 2^e\{∅, e}, for which (5) yields w(e)_{vṽ} = wH_e/(δ(e) − 1) and β(e) = ⌊δ²(e)/4⌋/(δ(e) − 1), identical to the solution of [9]. Furthermore, given the result of our Theorem 3.1, one can prove that the algorithm of [9] offers a quadratic-factor approximation for motifs involving more than three vertices, a fact that was not established in the original work [9].

²The Cheeger inequality [17] arises in the context of minimizing the conductance of a graph, which is related to the normalized cut.

All the aforementioned algorithms essentially learn the spectrum of Laplacian matrices obtained through hypergraph projection. The ultimate goal of projections is to avoid solving the NP-hard problem of learning the spectrum of certain hypergraph Laplacians [24]. 
Methods that do not rely on\nhypergraph projection, including optimization with the total variance of hypergraphs [12, 13], tensor\nspectral methods [25] and nonlinear Laplacian spectral methods [26], have also been reported in the\nliterature. These techniques were exclusively applied in homogeneous settings, and they typically\nhave higher complexity and smaller spectral gaps than the projection-based methods. A future line\nof work is to investigate whether these methods can be extended to the inhomogeneous case. Yet\nanother relevant line of work pertains to the statistical analysis of hypergraph partitioning methods\nfor generalized stochastic block models [27, 28].\n\n5 Applications\nNetwork motif clustering. Real-world networks exhibit rich higher-order connectivity patterns\nfrequently referred to as network motifs [29]. Motifs are special subgraphs of the graph and may be\nviewed as hyperedges of a hypergraph over the same set of vertices. Recent work has shown that\nhypergraph clustering based on motifs may be used to learn hidden high-order organization patterns\nin networks [9, 8, 21]. However, this approach treats all vertices and edges within the motifs in the\nsame manner, and hence ignores the fact that each structural unit within the motif may have a different\nrelevance or different role. As a result, the vertices of the motifs are partitioned with a uniform\ncost. However, this assumption is hardly realistic as in many real networks, only some vertices of\nhigher-order structures may need to be clustered together. Hence, inhomogenous hyperedges are\nexpected to elucidate more subtle high-order organizations of network. We illustrate the utility of\nInH-partition on the Florida Bay foodweb [30] and compare our \ufb01ndings to those of [9].\nThe Florida Bay foodweb comprises 128 vertices corresponding to different species or organisms\nthat live in the Bay, and 2106 directed edges indicating carbon exchange between two species. 
The Foodweb essentially represents a layered flow network, as carbon flows from so-called producer organisms to high-level predators. Each layer of the network consists of "similar" species that play the same role in the food chain. Clustering of the species may be performed by leveraging the layered structure of the interactions. As a network motif, we use a subgraph of four species, and correspondingly, four vertices denoted by vi, for i = 1, 2, 3, 4. The motif captures, among others, relations between two producers and two consumers: the producers v1 and v2 both transmit carbon to v3 and v4, and all types of carbon flow between v1 and v2, and between v3 and v4, are allowed (see Figure 2, left). Such a motif is the smallest structural unit capturing the fact that carbon exchange occurs in one direction between layers, while it is allowed freely within layers. The inhomogeneous hyperedge costs are assigned according to the following heuristics: as v1 and v2 share two common carbon recipients (predators) while v3 and v4 share two common carbon sources (preys), we set we({vi}) = 1 for i = 1, 2, 3, 4, and we({v1, v2}) = 0, we({v1, v3}) = 2, and we({v1, v4}) = 2. Based on the solution of the optimization problem (5), one can construct a weighted subgraph whose cut costs match the inhomogeneous costs, with β(e) = 1. The graph is depicted in Figure 2 (left).

Our approach is to perform hierarchical clustering via iterative application of the InH-partition method. In each iteration, we construct a hypergraph by replacing the chosen motif subnetwork by a hyperedge. The result is shown in Figure 2. At the first level, we partitioned the species into three clusters corresponding to producers, primary consumers and secondary consumers. The producer cluster is homogeneous insofar as it contains only producers, a total of nine of them. 
At the second\nlevel, we partitioned the obtained primary-consumer cluster into two clusters, one of which almost\nexclusively comprises invertebrates (28 out of 35), while the other almost exclusively comprises\nforage \ufb01shes. The secondary-consumer cluster is partitioned into two clusters, one of which comprises\ntop-level predators, while the other cluster mostly consists of predatory \ufb01shes and birds. Overall,\nwe recovered \ufb01ve clusters that \ufb01t \ufb01ve layers ranging from producers to top-level consumers. It is\neasy to check that the producer, invertebrate and top-level predator clusters exhibit high functional\nsimilarity of species (> 80%). An exact functional classi\ufb01cation of forage and predatory \ufb01shes is not\nknown, but our layered network appears to capture an overwhelmingly large number of prey-predator\nrelations among these species. Among the 1714 edges, obtained after removing isolated vertices and\ndetritus species vertices, only \ufb01ve edges point in the opposite direction from a higher to a lower-level\n\n7\n\n\fFigure 2: Motif clustering in the Florida Bay food web. Left: InHomogenous case. Left-top: Hy-\nperedge (network motif) & the weighted induced subgraph; Left-bottom: Hierarchical clustering\nstructure and \ufb01ve clusters via InH-partition. The vertices belonging to different clusters are distin-\nguished by the colors of vertices. Edges with a uni-direction (right to left) are colored black while\nother edges are kept blue. Right: Homogenous partitioning [9] with four clusters. Grey vertices are\nnot connected by motifs and thus unclassi\ufb01ed.\ncluster, two of which go from predatory \ufb01shes to forage \ufb01shes. Detailed information about the species\nand clusters is provided in the Supplementary Material.\nIn comparison, the related work of Benson et al. [9] which used homogenous hypergraph clustering\nand triangular motifs reported a very different clustering structure. 
The corresponding clusters covered less than half of the species (62 out of 128), as many vertices were not connected by the triangle motif; in contrast, 127 out of 128 vertices were covered by our choice of motif. We attribute the difference between our results and those of [9] to the choice of network motif. The triangle motif used in [9] leaves a large number of vertices unclustered and fails to enforce a hierarchical network structure. On the other hand, our fan motif with homogeneous weights produces a giant cluster, as it ties all the vertices together; the hierarchical decomposition is only revealed when the fan motif is used with inhomogeneous weights. In order to identify hierarchical network structures, instead of hypergraph clustering one may use topological sorting to rank species based on their carbon flows [31]. Unfortunately, topological sorting cannot use biological side information and hence fails to automatically determine the boundaries of the clusters.
Learning the Riffled Independence Structure of Ranking Data. Learning probabilistic models for ranking data has attracted significant interest in social and political sciences as well as in machine learning [32, 33]. Recently, a probabilistic model, termed the riffled-independence model, was shown to accurately describe many benchmark ranked datasets [34]. In the riffled independence model, one first generates two rankings over two disjoint sets of elements independently, and then riffle shuffles the rankings to arrive at an interleaved order. The structure learning problem in this setting reduces to distinguishing the two categories of elements based on limited ranking data. More precisely, let Q be the set of candidates to be ranked, with |Q| = n. A full ranking is a bijection σ : Q → [n], and for a ∈ Q, σ(a) denotes the position of candidate a in the ranking σ.
We use σ(a) < (>) σ(b) to indicate that a is ranked higher (lower) than b in σ. If S ⊆ Q, we use σ_S : S → [|S|] to denote the ranking σ projected onto the set S. We also use S(σ) ≜ {σ(a) | a ∈ S} to denote the subset of positions of elements in S. Let P(E) denote the probability of the event E. Riffled independence asserts that there exists a riffled-independent set S ⊂ Q, such that for a fixed ranking σ′ over [n],

$$\mathbb{P}(\sigma = \sigma') = \mathbb{P}(\sigma_S = \sigma'_S)\,\mathbb{P}(\sigma_{Q/S} = \sigma'_{Q/S})\,\mathbb{P}(S(\sigma) = S(\sigma')).$$

Suppose that we are given a set of rankings Σ = {σ^(1), σ^(2), ..., σ^(m)} drawn independently according to some probability distribution P. If P has a riffled-independent set S*, the structure learning problem is to find S*. In [34], the described problem was cast as an optimization problem over all possible subsets of Q, with the objective of minimizing the Kullback-Leibler divergence between the ranking distribution with riffled independence and the empirical distribution of Σ [34]. A simplified version of the optimization problem reads as

$$\arg\min_{S \subset Q} F(S) \triangleq \sum_{(i,j,k) \in \Omega^{\mathrm{cross}}_{S,\bar{S}}} I_{i;j,k} + \sum_{(i,j,k) \in \Omega^{\mathrm{cross}}_{\bar{S},S}} I_{i;j,k}, \qquad (12)$$

where Ω^cross_{A,B} ≜ {(i, j, k) | i ∈ A, j, k ∈ B}, and where I_{i;j,k} denotes the estimated mutual information between the position of candidate i and two "comparison candidates" j, k.
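The combinatorial part of (12) is easy to compute once the scores I_{i;j,k} are given. A minimal sketch, under the convention that I_{i;j,k} is symmetric in j and k (so each unordered comparison pair is counted once) and with the score table `I` as a hypothetical precomputed input:

```python
from itertools import combinations

def F(I, Q, S):
    """Objective (12): sum of I[(i, j, k)] over triples with candidate i on one
    side of the split and the comparison pair {j, k} (j < k) on the other."""
    S = set(S)
    Sbar = set(Q) - S
    total = 0.0
    for A, B in ((S, Sbar), (Sbar, S)):
        for i in A:
            for j, k in combinations(sorted(B), 2):
                total += I.get((i, j, k), 0.0)
    return total

def best_split(I, Q):
    """Exhaustive minimizer over proper subsets (feasible only for small Q)."""
    Q = sorted(Q)
    candidates = [set(c) for r in range(1, len(Q)) for c in combinations(Q, r)]
    return min(candidates, key=lambda S: F(I, Q, S))
```

For example, if the scores vanish exactly on triples crossing the split {1, 2} | {3, 4}, the exhaustive search recovers {1, 2} (or its complement) as the riffled-independent set.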
If $1_{\sigma(j)<\sigma(k)}$ denotes the indicator function of the underlying event, we may write

$$I_{i;j,k} \triangleq \hat{I}(\sigma(i); 1_{\sigma(j)<\sigma(k)}) = \sum_{\sigma(i)} \sum_{1_{\sigma(j)<\sigma(k)}} \hat{P}(\sigma(i), 1_{\sigma(j)<\sigma(k)}) \log \frac{\hat{P}(\sigma(i), 1_{\sigma(j)<\sigma(k)})}{\hat{P}(\sigma(i))\,\hat{P}(1_{\sigma(j)<\sigma(k)})}, \qquad (13)$$

where P̂ denotes an estimate of the underlying probability. If i and j, k are in different riffled-independent sets, the estimated mutual information Î(σ(i); 1_{σ(j)<σ(k)}) converges to zero as the number of samples increases. When the number of samples is small, one may use the mutual information estimators described in [35, 36, 37].

Party | Candidates
Fianna Fáil | 1, 4, 13
Fine Gael | 2, 5, 6
Independent | 3, 7, 8, 9
Others | 10, 11, 12, 14

Figure 3: Election dataset. Left-top: parties and candidates; Left-bottom: hierarchical partitioning structure of the Irish election detected by InH-Par; Middle: success rate vs. sample complexity; Right: success rate vs. triple-sampling rate.

One may recast the above problem as an InH-partition problem over a hypergraph in which each candidate represents a vertex, and I_{i;j,k} represents the inhomogeneous cost w_e({i}) for the hyperedge e = {i, j, k}. Note that as the mutual information Î(σ(i); 1_{σ(j)<σ(k)}) is in general asymmetric, one would not have been able to use H-partitions. The optimization problem reduces to min_S vol_H(∂S). The two optimization tasks are different, and we illustrate next that InH-partition outperforms the original optimization approach, AnchorsPartition (APar) [34], on both synthetic and real data.
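The plug-in estimate (13) is straightforward to compute from sample rankings. A minimal sketch, with rankings represented as dicts mapping each candidate to its position (a representation chosen here for convenience); the resulting value can serve as the hyperedge cost w_e({i}) for e = {i, j, k}:

```python
from collections import Counter
from math import log

def mi_triple(rankings, i, j, k):
    """Plug-in estimate of I(sigma(i); 1[sigma(j) < sigma(k)]), Eq. (13), in nats."""
    m = len(rankings)
    joint = Counter((r[i], r[j] < r[k]) for r in rankings)  # P^(sigma(i), 1[...])
    pos = Counter(r[i] for r in rankings)                   # P^(sigma(i))
    cmp_ = Counter(r[j] < r[k] for r in rankings)           # P^(1[...])
    return sum((c / m) * log((c / m) / ((pos[p] / m) * (cmp_[b] / m)))
               for (p, b), c in joint.items())
```

Over the six uniform rankings of three candidates the estimate is exactly zero (position and comparison are independent), while for samples in which candidate i's position fully determines the comparison outcome it equals log 2.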
Due to space limitations, synthetic data and a subset of the real dataset results are presented in the Supplementary Material.
Here, we analyzed the Irish House of Parliament election dataset (2002) [38]. The dataset consists of 2490 ballots fully ranking 14 candidates. The candidates came from a number of parties, of which Fianna Fáil (F.F.) and Fine Gael (F.G.) are the two largest (and rival) Irish political parties. Using InH-partition (InH-Par), one can split the candidates iteratively into two sets (see Figure 3), which yields meaningful clusters that correspond to the large parties: {1, 4, 13} (F.F.), {2, 5, 6} (F.G.), and {7, 8, 9} (Ind.). We compared InH-partition with APar based on their performance in detecting these three clusters using a small training set: we independently sampled m rankings 100 times and executed both algorithms to partition the set of candidates iteratively. During the partitioning procedure, "party success" was declared if one exactly detected one of the three party clusters ("F.F.", "F.G." & "Ind."). "All" was used to designate that all three party clusters were detected completely correctly. InH-partition outperforms APar in recovering the cluster Ind. and achieves comparable performance for cluster F.F., although it performs a little worse than APar for cluster F.G.; InH-partition also offers superior overall performance compared to APar. We also compared InH-partition with APar in the large-sample regime (m = 2490), using only a subset of triple comparisons (hyperedges) sampled independently with probability r (this strategy significantly reduces the complexity of both algorithms). The average is computed over 100 independent runs. The results are shown in Figure 3, highlighting the robustness of InH-partition with respect to missing triples.
Additional tests on ranking data are described in the Supplementary Material, along with new results on subspace clustering, motion segmentation and others.

6 Acknowledgement

The authors gratefully acknowledge many useful suggestions by the reviewers. They are also indebted to the reviewers for providing many additional and relevant references. This work was supported in part by NSF grant CCF 1527636.

References

[1] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[2] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems (NIPS), 2002, pp. 849–856.

[3] S. R. Bulò and M. Pelillo, "A game-theoretic approach to hypergraph clustering," in Advances in Neural Information Processing Systems (NIPS), 2009, pp. 1571–1579.

[4] M. Leordeanu and C. Sminchisescu, "Efficient hypergraph clustering," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2012, pp. 676–684.

[5] H. Liu, L. J. Latecki, and S. Yan, "Robust clustering as ensembles of affinity relations," in Advances in Neural Information Processing Systems (NIPS), 2010, pp. 1414–1422.

[6] N. Bansal, A. Blum, and S. Chawla, "Correlation clustering," in The 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2002, pp. 238–247.

[7] N. Ailon, M.
Charikar, and A. Newman, "Aggregating inconsistent information: ranking and clustering," Journal of the ACM (JACM), vol. 55, no. 5, p. 23, 2008.

[8] P. Li, H. Dau, G. Puleo, and O. Milenkovic, "Motif clustering and overlapping clustering for social network analysis," in IEEE Conference on Computer Communications (INFOCOM), 2017, pp. 109–117.

[9] A. R. Benson, D. F. Gleich, and J. Leskovec, "Higher-order organization of complex networks," Science, vol. 353, no. 6295, pp. 163–166, 2016.

[10] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich, "Local higher-order graph clustering," in Proceedings of the 23rd ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2017, pp. 555–564.

[11] D. Zhou, J. Huang, and B. Schölkopf, "Learning with hypergraphs: clustering, classification, and embedding," in Advances in Neural Information Processing Systems (NIPS), 2007, pp. 1601–1608.

[12] M. Hein, S. Setzer, L. Jost, and S. S. Rangapuram, "The total variation on hypergraphs: learning on hypergraphs revisited," in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2427–2435.

[13] C. Zhang, S. Hu, Z. G. Tang, and T. H. Chan, "Re-revisiting learning on hypergraphs: confidence interval and subgradient method," in International Conference on Machine Learning (ICML), 2017, pp. 4026–4034.

[14] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie, "Beyond pairwise clustering," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 838–845.

[15] S. Kim, S. Nowozin, P. Kohli, and C. D. Yoo, "Higher-order correlation clustering for image segmentation," in Advances in Neural Information Processing Systems (NIPS), 2011, pp. 1530–1538.

[16] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L.
Barabási, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.

[17] F. R. Chung, "Four proofs for the Cheeger inequality and graph partition algorithms," in Proceedings of ICCM, vol. 2, 2007, p. 378.

[18] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.

[19] J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner, E. W. De Luca, and S. Albayrak, "Spectral analysis of signed graphs for clustering, prediction and visualization," in SIAM International Conference on Data Mining, 2010, pp. 559–570.

[20] A. V. Knyazev, "Signed Laplacian for spectral clustering revisited," arXiv preprint arXiv:1701.01394, 2017.

[21] C. Tsourakakis, J. Pachocki, and M. Mitzenmacher, "Scalable motif-aware graph clustering," arXiv preprint arXiv:1606.06235, 2016.

[22] N. R. Devanur, S. Dughmi, R. Schwartz, A. Sharma, and M. Singh, "On the approximation of submodular functions," arXiv preprint arXiv:1304.4948, 2013.

[23] S. Agarwal, K. Branson, and S. Belongie, "Higher order learning with graphs," in International Conference on Machine Learning (ICML). ACM, 2006, pp. 17–24.

[24] G. Li, L. Qi, and G. Yu, "The Z-eigenvalues of a symmetric tensor and its application to spectral hypergraph theory," Numerical Linear Algebra with Applications, vol. 20, no. 6, pp. 1001–1029, 2013.

[25] A. R. Benson, D. F. Gleich, and J. Leskovec, "Tensor spectral clustering for partitioning higher-order network structures," in Proceedings of the 2015 SIAM International Conference on Data Mining, 2015, pp. 118–126.

[26] A.
Louis, "Hypergraph Markov operators, eigenvalues and approximation algorithms," in Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing (STOC), 2015, pp. 713–722.

[27] D. Ghoshdastidar and A. Dukkipati, "Consistency of spectral partitioning of uniform hypergraphs under planted partition model," in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 397–405.

[28] ——, "Consistency of spectral hypergraph partitioning under planted partition model," arXiv preprint arXiv:1505.01582, 2015.

[29] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.

[30] "Florida Bay trophic exchange matrix," http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/Florida.paj.

[31] S. Allesina, A. Bodini, and C. Bondavalli, "Ecological subsystems via graph theory: the role of strongly connected components," Oikos, vol. 110, no. 1, pp. 164–176, 2005.

[32] P. Awasthi, A. Blum, O. Sheffet, and A. Vijayaraghavan, "Learning mixtures of ranking models," in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2609–2617.

[33] C. Meek and M. Meila, "Recursive inversion models for permutations," in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 631–639.

[34] J. Huang, C. Guestrin et al., "Uncovering the riffled independence structure of ranked data," Electronic Journal of Statistics, vol. 6, pp. 199–230, 2012.

[35] J. Jiao, K. Venkat, Y. Han, and T. Weissman, "Maximum likelihood estimation of functionals of discrete distributions," IEEE Transactions on Information Theory, vol. 63, no. 10, pp. 6774–6798, 2017.

[36] Y. Bu, S. Zou, Y. Liang, and V. V.
Veeravalli, "Estimation of KL divergence: optimal minimax rate," arXiv preprint arXiv:1607.02653, 2016.

[37] W. Gao, S. Oh, and P. Viswanath, "Demystifying fixed k-nearest neighbor information estimators," in IEEE International Symposium on Information Theory (ISIT), 2017, pp. 1267–1271.

[38] I. C. Gormley and T. B. Murphy, "A latent space model for rank data," in Statistical Network Analysis: Models, Issues, and New Directions. Springer, 2007, pp. 90–102.