{"title": "Generalized Method-of-Moments for Rank Aggregation", "book": "Advances in Neural Information Processing Systems", "page_first": 2706, "page_last": 2714, "abstract": "In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency.", "full_text": "Generalized Method-of-Moments for Rank Aggregation

Hossein Azari Soufiani, SEAS, Harvard University, azari@fas.harvard.edu
William Z. Chen, Statistics Department, Harvard University, wchen@college.harvard.edu
David C. Parkes, SEAS, Harvard University, parkes@eecs.harvard.edu
Lirong Xia, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180, USA, xial@cs.rpi.edu

Abstract

In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. 
We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency.

1 Introduction

In many applications, we need to aggregate the preferences of agents over a set of alternatives to produce a joint ranking. For example, in systems for ranking the quality of products, restaurants, or other services, we can generate an aggregate rank through feedback from individual users. This idea of rank aggregation also plays an important role in multiagent systems, meta-search engines [4], belief merging [5], crowdsourcing [15], and many other e-commerce applications.

A standard approach towards rank aggregation is to treat input rankings as data generated from a probabilistic model, and then learn the MLE of the input data. This idea has been explored in both the machine learning community and the (computational) social choice community. The most popular statistical models are the Bradley-Terry-Luce model (BTL for short) [2, 13], the Plackett-Luce model (PL for short) [17, 13], the random utility model [18], and the Mallows (Condorcet) model [14, 3]. In machine learning, researchers have focused on designing efficient algorithms to estimate parameters for popular models; e.g. [8, 12, 1]. This line of research is sometimes referred to as learning to rank [11].

Recently, Negahban et al. [16] proposed a rank aggregation algorithm, called Rank Centrality (RC), based on computing the stationary distribution of a Markov chain whose transition matrix is defined according to the data (pairwise comparisons among alternatives). The authors describe the approach as being model independent, and prove that for data generated according to BTL, the output of RC converges to the ground truth, and the performance of RC is almost identical to the performance of MLE for BTL. 
Moreover, they characterized the convergence rate and showed experimental comparisons.

Our Contributions. In this paper, we take a generalized method-of-moments (GMM) point of view towards rank aggregation. We first reveal a new and natural connection between the RC algorithm [16] and the BTL model by showing that the RC algorithm can be interpreted as a GMM estimator applied to the BTL model.

The main technical contribution of this paper is a class of GMMs for parameter estimation under the PL model, which generalizes BTL; here the input consists of full rankings instead of the pairwise comparisons used by BTL and the RC algorithm.

Our algorithms first break full rankings into pairwise comparisons, and then solve the generalized moment conditions to find the parameters. Each of our GMMs is characterized by a way of breaking full rankings. We characterize conditions for the output of the algorithm to be unique, and we also obtain some general characterizations that help us determine which methods of breaking lead to a consistent GMM. Specifically, full breaking (which uses all pairwise comparisons in the ranking) is consistent, but adjacent breaking (which only uses pairwise comparisons in adjacent positions) is inconsistent.

We characterize the computational complexity of our GMMs, and show that the asymptotic complexity is better than for the classical Minorize-Maximization (MM) algorithm for PL [8]. We also compare the statistical efficiency and running time of these methods experimentally, using both synthetic and real-world data, showing that all GMMs run much faster than the MM algorithm.

For the synthetic data, we observe that many consistent GMMs converge as fast as the MM algorithm, while there exists a clear tradeoff between computational complexity and statistical efficiency among consistent GMMs.

Technically our approach is related to the random walk approach [16]. 
However, we note that our algorithms aggregate full rankings under PL, while the RC algorithm aggregates pairwise comparisons. Therefore, it is hard to directly compare our GMMs and RC fairly, since they are designed for different types of data. Moreover, by taking a GMM point of view, we prove the consistency of our algorithms on top of the theory for GMMs, while Negahban et al. proved the consistency of RC directly.

2 Preliminaries

Let $\mathcal{C} = \{c_1, \ldots, c_m\}$ denote the set of $m$ alternatives. Let $D = \{d_1, \ldots, d_n\}$ denote the data, where each $d_j$ is a full ranking over $\mathcal{C}$. The PL model is a parametric model where each alternative $c_i$ is parameterized by $\gamma_i \in (0, 1)$, such that $\sum_{i=1}^m \gamma_i = 1$. Let $\vec\gamma = (\gamma_1, \ldots, \gamma_m)$ and let $\Omega$ denote the parameter space. Let $\bar\Omega$ denote the closure of $\Omega$; that is, $\bar\Omega = \{\vec\gamma : \forall i,\ \gamma_i \ge 0 \text{ and } \sum_{i=1}^m \gamma_i = 1\}$.

Given $\vec\gamma^* \in \Omega$, the probability of a ranking $d = [c_{i_1} \succ c_{i_2} \succ \cdots \succ c_{i_m}]$ is defined as follows:
$$\Pr\nolimits_{PL}(d \mid \vec\gamma) = \frac{\gamma_{i_1}}{\sum_{l=1}^m \gamma_{i_l}} \times \frac{\gamma_{i_2}}{\sum_{l=2}^m \gamma_{i_l}} \times \cdots \times \frac{\gamma_{i_{m-1}}}{\gamma_{i_{m-1}} + \gamma_{i_m}}.$$

In the BTL model, the data is composed of pairwise comparisons instead of rankings, and the model is parameterized in the same way as PL, such that $\Pr_{BTL}(c_{i_1} \succ c_{i_2} \mid \vec\gamma) = \frac{\gamma_{i_1}}{\gamma_{i_1} + \gamma_{i_2}}$. BTL can be thought of as a special case of PL via marginalization, since $\Pr_{BTL}(c_{i_1} \succ c_{i_2} \mid \vec\gamma) = \sum_{d : c_{i_1} \succ c_{i_2}} \Pr_{PL}(d \mid \vec\gamma)$. In the rest of the paper, we denote $\Pr = \Pr_{PL}$.

Generalized Method-of-Moments (GMM) provides a wide class of algorithms for parameter estimation. 
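As a concrete illustration of the two models above, the following sketch (plain Python written for this note; the helper names are ours, not from any package) evaluates the PL probability of a full ranking and checks the marginalization property that relates BTL to PL.

```python
import itertools

def pl_prob(ranking, gamma):
    # Probability of a full ranking [i1 > i2 > ...] under Plackett-Luce:
    # at each step, pick the next alternative proportionally to its weight
    # among the alternatives that remain.
    p = 1.0
    remaining = list(ranking)
    for i in ranking:
        p *= gamma[i] / sum(gamma[j] for j in remaining)
        remaining.remove(i)
    return p

gamma = {0: 0.5, 1: 0.3, 2: 0.2}

# PL is a proper distribution over full rankings: probabilities sum to 1.
total = sum(pl_prob(r, gamma) for r in itertools.permutations(gamma))

# BTL as a marginal of PL: Pr(c0 > c1) = sum of PL probabilities of all
# rankings that place c0 above c1, which equals gamma0 / (gamma0 + gamma1).
p01 = sum(pl_prob(r, gamma)
          for r in itertools.permutations(gamma)
          if r.index(0) < r.index(1))  # = 0.5 / (0.5 + 0.3) = 0.625
```

The marginal check is exact for PL, not an approximation: summing the sequential-choice probabilities over all orderings with $c_0$ above $c_1$ recovers the BTL pairwise probability.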
In GMM, we are given a parametric model whose parameter space is $\Omega \subseteq \mathbb{R}^m$, an infinite series of $q \times q$ matrices $\mathcal{W} = \{W_t : t \ge 1\}$, and a column-vector-valued function $g(d, \vec\gamma) \in \mathbb{R}^q$. For any vector $\vec a \in \mathbb{R}^q$ and any $q \times q$ matrix $W$, we let $\|\vec a\|_W = (\vec a)^T W \vec a$. For any data $D$, let $g(D, \vec\gamma) = \frac{1}{n} \sum_{d \in D} g(d, \vec\gamma)$; the GMM method computes parameters $\vec\gamma' \in \Omega$ that minimize $\|g(D, \vec\gamma')\|_{W_n}$, formally defined as follows:
$$\mathrm{GMM}_g(D, \mathcal{W}) = \{\vec\gamma' \in \Omega : \|g(D, \vec\gamma')\|_{W_n} = \inf_{\vec\gamma \in \Omega} \|g(D, \vec\gamma)\|_{W_n}\} \qquad (1)$$

Since $\Omega$ may not be compact (as is the case for PL), the set of parameters $\mathrm{GMM}_g(D, \mathcal{W})$ can be empty. A GMM is consistent if and only if for any $\vec\gamma^* \in \Omega$, $\mathrm{GMM}_g(D, \mathcal{W})$ converges in probability to $\vec\gamma^*$ as $n \to \infty$ when the data is drawn i.i.d. given $\vec\gamma^*$. Consistency is a desirable property for GMMs. It is well known that $\mathrm{GMM}_g(D, \mathcal{W})$ is consistent if it satisfies some regularity conditions plus the following condition [7]:

Condition 1. $E_{d|\vec\gamma^*}[g(d, \vec\gamma)] = 0$ if and only if $\vec\gamma = \vec\gamma^*$.

Example 1 (MLE as a consistent GMM). Suppose the likelihood function is twice differentiable; then the MLE is a consistent GMM where $g(d, \vec\gamma) = \nabla_{\vec\gamma} \log \Pr(d \mid \vec\gamma)$ and $W_n = I$.

Example 2. Negahban et al. [16] proposed the Rank Centrality (RC) algorithm, which aggregates pairwise comparisons $D_P = \{Y_1, \ldots, Y_n\}$.^1 Let $a_{ij}$ denote the number of $c_i \succ c_j$ in $D_P$; it is assumed that for any $i \neq j$, $a_{ij} + a_{ji} = k$. Let $d_{\max}$ denote the maximum number of pairwise defeats for an alternative. 
RC first computes the following $m \times m$ column stochastic matrix:
$$P_{RC}(D_P)_{ij} = \begin{cases} a_{ij}/(k d_{\max}) & \text{if } i \neq j \\ 1 - \sum_{l \neq i} a_{li}/(k d_{\max}) & \text{if } i = j \end{cases}$$
Then, RC computes the stationary distribution $\vec\gamma$ of $(P_{RC}(D_P))^T$ as the output. Let
$$X^{c_i \succ c_j}(Y) = \begin{cases} 1 & \text{if } Y = [c_i \succ c_j] \\ 0 & \text{otherwise} \end{cases} \qquad P^*_{RC}(Y)_{ij} = \begin{cases} X^{c_i \succ c_j}(Y) & \text{if } i \neq j \\ -\sum_{l \neq i} X^{c_l \succ c_i}(Y) & \text{if } i = j. \end{cases}$$
Let $g_{RC}(d, \vec\gamma) = P^*_{RC}(d) \cdot \vec\gamma$. It is not hard to check that the output of RC is the output of $\mathrm{GMM}_{g_{RC}}$. Moreover, $\mathrm{GMM}_{g_{RC}}$ satisfies Condition 1 under the BTL model, and as we will show later in Corollary 4, $\mathrm{GMM}_{g_{RC}}$ is consistent for BTL.

3 Generalized Method-of-Moments for the Plackett-Luce model

In this section we introduce our GMMs for rank aggregation under PL. In our methods, $q = m$, $W_n = I$, and $g$ is linear in $\vec\gamma$. We start with a simple special case to illustrate the idea.

Example 3. For any full ranking $d$ over $\mathcal{C}$, we let
• $X^{c_i \succ c_j}(d) = \begin{cases} 1 & c_i \succ_d c_j \\ 0 & \text{otherwise} \end{cases}$
• $P(d)$ be an $m \times m$ matrix where $P(d)_{ij} = \begin{cases} X^{c_i \succ c_j}(d) & \text{if } i \neq j \\ -\sum_{l \neq i} X^{c_l \succ c_i}(d) & \text{if } i = j \end{cases}$
• $g_F(d, \vec\gamma) = P(d) \cdot \vec\gamma$ and $P(D) = \frac{1}{n} \sum_{d \in D} P(d)$.
For example, let $m = 3$ and $D = \{[c_1 \succ c_2 \succ c_3], [c_2 \succ c_3 \succ c_1]\}$. Then
$$P(D) = \begin{bmatrix} -1 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1 \\ 1/2 & 0 & -3/2 \end{bmatrix}.$$ 
The corresponding GMM seeks to minimize $\|P(D) \cdot \vec\gamma\|_2^2$ for $\vec\gamma \in \Omega$.

It is not hard to verify that
$$(E_{d|\vec\gamma^*}[P(d)])_{ij} = \begin{cases} \dfrac{\gamma^*_i}{\gamma^*_i + \gamma^*_j} & \text{if } i \neq j \\ -\sum_{l \neq i} \dfrac{\gamma^*_l}{\gamma^*_i + \gamma^*_l} & \text{if } i = j \end{cases}$$
which means that $E_{d|\vec\gamma^*}[g_F(d, \vec\gamma^*)] = E_{d|\vec\gamma^*}[P(d)] \cdot \vec\gamma^* = 0$. It is also not hard to verify that $\vec\gamma^*$ is the only solution to $E_{d|\vec\gamma^*}[g_F(d, \vec\gamma)] = 0$. Therefore, $\mathrm{GMM}_{g_F}$ satisfies Condition 1. Moreover, we will show in Corollary 3 that $\mathrm{GMM}_{g_F}$ is consistent for PL.

In the above example, we count all pairwise comparisons in a full ranking $d$ to build $P(d)$, and define $g = P(D) \cdot \vec\gamma$ to be linear in $\vec\gamma$. In general, we may consider some subset of the pairwise comparisons. This leads to the definition of our class of GMMs based on the notion of breakings. Intuitively, a breaking is an undirected graph over the $m$ positions in a ranking, such that for any full ranking $d$, the pairwise comparison between the alternatives in the $i$th and $j$th positions is counted in constructing $P_G(d)$ if and only if $\{i, j\} \in G$.

Definition 1. A breaking is a non-empty undirected graph $G$ whose vertices are $\{1, \ldots, m\}$. Given any breaking $G$, any full ranking $d$ over $\mathcal{C}$, and any $c_i, c_j \in \mathcal{C}$, we let

^1 The BTL model in [16] is slightly different from that in this paper. 
Therefore, in this example we adopt an equivalent description of the RC algorithm.

• $X^{c_i \succ c_j}_G(d) = \begin{cases} 1 & \{\mathrm{Pos}(c_i, d), \mathrm{Pos}(c_j, d)\} \in G \text{ and } c_i \succ_d c_j \\ 0 & \text{otherwise} \end{cases}$, where $\mathrm{Pos}(c_i, d)$ is the position of $c_i$ in $d$.
• $P_G(d)$ be an $m \times m$ matrix where $P_G(d)_{ij} = \begin{cases} X^{c_i \succ c_j}_G(d) & \text{if } i \neq j \\ -\sum_{l \neq i} X^{c_l \succ c_i}_G(d) & \text{if } i = j \end{cases}$
• $g_G(d, \vec\gamma) = P_G(d) \cdot \vec\gamma$
• $\mathrm{GMM}_G(D)$ be the GMM method that solves Equation (1) for $g_G$ and $W_n = I$.^2

In this paper, we focus on the following breakings, illustrated in Figure 1.
• Full breaking: $G_F$ is the complete graph. Example 3 is the GMM with full breaking.
• Top-$k$ breaking: for any $k \le m$, $G^k_T = \{\{i, j\} : i \le k, j \neq i\}$.
• Bottom-$k$ breaking: for any $k \ge 2$, $G^k_B = \{\{i, j\} : i, j \ge m + 1 - k, j \neq i\}$.^3
• Adjacent breaking: $G_A = \{\{1, 2\}, \{2, 3\}, \ldots, \{m - 1, m\}\}$.
• Position-$k$ breaking: for any $k \ge 2$, $G^k_P = \{\{k, i\} : i \neq k\}$.

Figure 1: Example breakings for $m = 6$: (a) full breaking; (b) top-3 breaking; (c) bottom-3 breaking; (d) adjacent breaking; (e) position-2 breaking.

Intuitively, the full breaking contains all the pairwise comparisons that can be extracted from each agent's full ranking; the top-$k$ breaking contains all pairwise comparisons that can be extracted when an agent only reveals her top $k$ alternatives and the ranking among them; the bottom-$k$ breaking can be computed when an agent only reveals her bottom $k$ alternatives and the ranking among them; and the position-$k$ breaking can be computed when an agent only reveals the alternative ranked at the $k$th position and the set of alternatives ranked in lower positions.

We note that $G^m_T = G^m_B = G_F$, $G^1_T = G^1_P$, $G^k_T = \bigcup_{l=1}^k G^l_P$, and for any $k \le m - 1$, $G^k_T \cup G^{m-k}_B = G_F$.

We are now ready to present our GMM algorithm (Algorithm 1), parameterized by a breaking $G$.

^2 To simplify notation, we use $\mathrm{GMM}_G$ instead of $\mathrm{GMM}_{g_G}$.
^3 We need $k \ge 2$ since $G^1_B$ is empty.

Algorithm 1: $\mathrm{GMM}_G(D)$
Input: A breaking $G$ and data $D = \{d_1, \ldots, d_n\}$ composed of full rankings.
Output: Estimation $\mathrm{GMM}_G(D)$ of the parameters under PL.
1 Compute $P_G(D) = \frac{1}{n} \sum_{d \in D} P_G(d)$ as in Definition 1.
2 Compute $\mathrm{GMM}_G(D)$ according to (1).
3 return $\mathrm{GMM}_G(D)$.

Step 2 can be further simplified according to the following theorem. Due to space constraints, most proofs are relegated to the supplementary materials.

Theorem 1. For any breaking $G$ and any data $D$, there exists $\vec\gamma \in \bar\Omega$ such that $P_G(D) \cdot \vec\gamma = 0$.

Theorem 1 implies that in Equation (1), $\inf_{\vec\gamma \in \Omega} g(D, \vec\gamma)^T W_n g(D, \vec\gamma) = 0$. Therefore, Step 2 can be replaced by: 2* Let $\mathrm{GMM}_G(D) = \{\vec\gamma \in \Omega : P_G(D) \cdot \vec\gamma = 0\}$.

3.1 Uniqueness of Solution

It is possible that for some data $D$, $\mathrm{GMM}_G(D)$ is empty or non-unique. Our next theorem characterizes conditions for $|\mathrm{GMM}_G(D)| = 1$ and $\mathrm{GMM}_G(D) \neq \emptyset$. A Markov chain (row stochastic matrix) $M$ is irreducible if any state can be reached from any other state; that is, $M$ has only one communicating class.

Theorem 2. Among the following three conditions, 1 and 2 are equivalent for any breaking $G$ and any data $D$. Moreover, conditions 1 and 2 are equivalent to condition 3 if and only if $G$ is connected.
1. $(I + P_G(D)/m)^T$ is irreducible.
2. $|\mathrm{GMM}_G(D)| = 1$.
3. 
$\mathrm{GMM}_G(D) \neq \emptyset$.

Corollary 1. For the full breaking, the adjacent breaking, and any top-$k$ breaking, the three statements in Theorem 2 are equivalent for any data $D$. For any position-$k$ breaking (with $k \ge 2$) and any bottom-$k$ breaking (with $k \le m - 1$), 1 and 2 are not equivalent to 3 for some data $D$.

Ford, Jr. [6] identified a necessary and sufficient condition on the data $D$ for the MLE under PL to be unique, which is equivalent to condition 1 in Theorem 2. Therefore, we have the following corollary.

Corollary 2. For the full breaking $G_F$, $|\mathrm{GMM}_{G_F}(D)| = 1$ if and only if $|\mathrm{MLE}_{PL}(D)| = 1$.

3.2 Consistency

We say a breaking $G$ is consistent (for PL) if $\mathrm{GMM}_G$ is consistent (for PL). Below, we show that some breakings defined in the last subsection are consistent. We start with general results.

Theorem 3. A breaking $G$ is consistent if and only if $E_{d|\vec\gamma^*}[g(d, \vec\gamma^*)] = 0$, which is equivalent to the following equalities:
$$\frac{\gamma^*_i}{\gamma^*_j} = \frac{\Pr(c_i \succ c_j \mid \{\mathrm{Pos}(c_i, d), \mathrm{Pos}(c_j, d)\} \in G)}{\Pr(c_j \succ c_i \mid \{\mathrm{Pos}(c_i, d), \mathrm{Pos}(c_j, d)\} \in G)} \quad \text{for all } i \neq j. \qquad (2)$$

Theorem 4. Let $G_1, G_2$ be a pair of consistent breakings.
1. If $G_1 \cap G_2 = \emptyset$, then $G_1 \cup G_2$ is also consistent.
2. If $G_1 \subsetneq G_2$ and $(G_2 \setminus G_1) \neq \emptyset$, then $(G_2 \setminus G_1)$ is also consistent.

Continuing, we show that position-$k$ breakings are consistent, then use this and Theorem 4 as building blocks to prove additional consistency results.

Proposition 1. For any $k \ge 1$, the position-$k$ breaking $G^k_P$ is consistent.

We recall that $G^k_T = \bigcup_{l=1}^k G^l_P$, $G_F = G^m_T$, and $G^k_B = G_F \setminus G^{m-k}_T$. Therefore, we have the following corollary.

Corollary 3. The full breaking $G_F$ is consistent; for any $k$, $G^k_T$ is consistent; and for any $k \ge 2$, $G^k_B$ is consistent.

Theorem 5. The adjacent breaking $G_A$ is consistent if and only if all components of $\vec\gamma^*$ are the same.

Lastly, the technique developed in this section also provides an independent proof that the RC algorithm is consistent for BTL, which is implied by the main theorem in [16]:

Corollary 4. [16] The RC algorithm is consistent for BTL.

RC is equivalent to $\mathrm{GMM}_{g_{RC}}$, which satisfies Condition 1. By checking conditions similar to those in the proof of Theorem 3, we can prove that $\mathrm{GMM}_{g_{RC}}$ is consistent for BTL.

The results in this section suggest that if we want to learn the parameters of PL, we should use consistent breakings, including the full breaking, top-$k$ breakings, bottom-$k$ breakings, and position-$k$ breakings. The adjacent breaking seems quite natural, but it is not consistent and thus will not provide a good estimate of the parameters of PL. This is also verified by the experimental results in Section 4.

3.3 Complexity

We first characterize the computational complexity of our GMMs.

Proposition 2. The computational complexity of the MM algorithm for PL [8] and of our GMMs is listed below.
• MM: $O(m^3 n)$ per iteration.
• GMM (Algorithm 1) with full breaking: $O(m^2 n + m^{2.376})$, with $O(m^2 n)$ for breaking and $O(m^{2.376})$ for computing Step 2* in Algorithm 1 (matrix inversion).
• GMM with adjacent breaking: $O(mn + m^{2.376})$, with $O(mn)$ for breaking and $O(m^{2.376})$ for computing Step 2* in Algorithm 1.
• GMM with top-$k$ breaking: $O((m + k)kn + m^{2.376})$, with $O((m + k)kn)$ for breaking and $O(m^{2.376})$ for computing Step 2* in Algorithm 1.

It follows that the asymptotic complexity of the GMM algorithms is better than that of the classical MM algorithm. In particular, the GMM with adjacent breaking and with top-$k$ breaking for constant $k$ are the fastest. 
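To make Step 2* concrete, here is a small sketch (our own Python with NumPy, not the authors' R code) of the GMM with full breaking: it builds $P_G(D)$ from Example 3 and, following the construction in Theorem 2, recovers $\vec\gamma$ as the eigenvector of the column-stochastic matrix $I + P_G(D)/m$ for eigenvalue 1, normalized to the simplex.

```python
import itertools
import numpy as np

def full_breaking_matrix(rankings, m):
    # P(D) from Example 3: off-diagonal entry (i, j) is the fraction of
    # rankings placing c_i above c_j; diagonal entry i is minus the average
    # number of defeats of c_i. Columns of P(D) therefore sum to zero.
    P = np.zeros((m, m))
    for d in rankings:                                # d: indices, best first
        for a, b in itertools.combinations(d, 2):     # a ranked above b
            P[a, b] += 1.0
            P[b, b] -= 1.0
    return P / len(rankings)

def gmm_full_breaking(rankings, m):
    # Step 2*: solve P(D) @ gamma = 0 on the simplex. Since columns of P(D)
    # sum to zero, M = I + P(D)/m is column stochastic, so gamma is its
    # eigenvector for eigenvalue 1 (cf. Theorem 2), rescaled to sum to 1.
    P = full_breaking_matrix(rankings, m)
    M = np.eye(m) + P / m
    vals, vecs = np.linalg.eig(M)
    gamma = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return gamma / gamma.sum()

# The worked example from Section 3: D = {[c1 > c2 > c3], [c2 > c3 > c1]}
D = [[0, 1, 2], [1, 2, 0]]
gamma = gmm_full_breaking(D, 3)
```

On the two-ranking example above, `full_breaking_matrix` reproduces the matrix $P(D)$ displayed in Example 3, and the estimator returns $\vec\gamma = (1/3,\ 5/9,\ 1/9)$, which indeed satisfies $P(D) \cdot \vec\gamma = 0$.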
However, we recall that the GMM with adjacent breaking is not consistent, while the other algorithms are consistent. We would expect that as the data size grows, the GMM with adjacent breaking will provide a relatively poor estimate of $\vec\gamma^*$ compared to the other methods.

Moreover, in the statistical setting, in order to gain consistency we need regimes where $m = o(n)$, and large $n$ leads to major computational bottlenecks. All of the above algorithms (MM and the different GMMs) have complexity linear in $n$; hence, the coefficient of $n$ is essential in determining the tradeoffs between these methods. As can be seen above, the coefficient of $n$ is linear in $m$ for top-$k$ breaking and quadratic in $m$ for full breaking, while it is cubic in $m$ for the MM algorithm. This difference is illustrated through experiments in Figure 5.

Among GMMs with top-$k$ breakings, the larger $k$ is, the more information we use from a single ranking, which comes at a higher computational cost. Therefore, it is natural to conjecture that for the same data, $\mathrm{GMM}_{G^k_T}$ with small $k$ runs faster, while $\mathrm{GMM}_{G^k_T}$ with large $k$ converges faster. In other words, we expect to see the following time-efficiency tradeoff among $\mathrm{GMM}_{G^k_T}$ for different $k$'s, which is verified by the experimental results in the next section.

Conjecture 1 (time-efficiency tradeoff). For any $k_1 < k_2$, $\mathrm{GMM}_{G^{k_1}_T}$ runs faster, while $\mathrm{GMM}_{G^{k_2}_T}$ provides a better estimate of the ground truth.

4 Experiments

The running time and statistical efficiency of MM and our GMMs are examined on both synthetic data and a real-world sushi dataset [9]. The synthetic datasets are generated as follows.
• Generating the ground truth: for $m \le 300$, the ground truth $\vec\gamma^*$ is generated from the Dirichlet distribution $\mathrm{Dir}(\vec 1)$.
• Generating data: given a ground truth $\vec\gamma^*$, we generate up to 1000 full rankings from PL.
We implemented MM [8] for 1, 3, and 10 iterations, as well as GMMs with full breaking, adjacent breaking, and top-$k$ breaking for all $k \le m - 1$.

We focus on the following representative criteria. Let $\vec\gamma$ denote the output of the algorithm.
• Mean Squared Error: $\mathrm{MSE} = E(\|\vec\gamma - \vec\gamma^*\|_2^2)$.
• Kendall Rank Correlation Coefficient: let $K(\vec\gamma, \vec\gamma^*)$ denote the Kendall tau distance between the ranking over components of $\vec\gamma$ and the ranking over components of $\vec\gamma^*$. The Kendall correlation is $1 - 2\,\frac{K(\vec\gamma, \vec\gamma^*)}{m(m-1)/2}$.

All experiments are run on a 1.86 GHz Intel Core 2 Duo MacBook Air. The multiple repetitions for the statistical efficiency experiments in Figure 3 and the experiments for sushi data in Figure 5 were done on the Odyssey cluster. All code is written in R and is available as part of the package "StatRank".

4.1 Synthetic Data

In this subsection we focus on comparisons among MM, GMM-F (full breaking), and GMM-A (adjacent breaking). The running time is presented in Figure 2. We observe that GMM-A (adjacent breaking) is the fastest and MM is the slowest, even for one iteration.

The statistical efficiency is shown in Figure 3. 
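The two evaluation criteria above are straightforward to compute; the following sketch (plain Python, our own helper names) implements both for an estimated parameter vector and a ground truth.

```python
import itertools

def mse(gamma_hat, gamma_star):
    # Squared error between the estimated and true parameter vectors
    # (the reported MSE averages this quantity over repeated trials).
    return sum((a - b) ** 2 for a, b in zip(gamma_hat, gamma_star))

def kendall_correlation(gamma_hat, gamma_star):
    # 1 - 2 K / (m(m-1)/2), where K is the Kendall tau distance (number of
    # discordant pairs) between the rankings induced by the two vectors.
    m = len(gamma_star)
    K = sum(1 for i, j in itertools.combinations(range(m), 2)
            if (gamma_hat[i] - gamma_hat[j]) * (gamma_star[i] - gamma_star[j]) < 0)
    return 1 - 2 * K / (m * (m - 1) / 2)
```

Identical rankings give a correlation of 1 and fully reversed rankings give -1, matching the normalization in the definition above.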
We observe that with regard to the MSE criterion, GMM-F (full breaking) performs as well as MM for 10 iterations (which converges), and that both are better than GMM-A (adjacent breaking). For the Kendall correlation criterion, GMM-F (full breaking) has the best performance and GMM-A (adjacent breaking) has the worst performance. Statistics are calculated over 1840 trials. In all cases except one, GMM-F (full breaking) outperforms MM, which outperforms GMM-A (adjacent breaking), with statistical significance at 95% confidence. The only exception is between GMM-F (full breaking) and MM for Kendall correlation at n = 1000.

Figure 2: The running time of MM (one iteration), GMM-F (full breaking), and GMM-A (adjacent breaking), plotted in log scale. On the left, m is fixed at 10; on the right, n is fixed at 10. 95% confidence intervals are too small to be seen. Times are calculated over 20 datasets.

Figure 3: The MSE and Kendall correlation of MM (10 iterations), GMM-F (full breaking), and GMM-A (adjacent breaking). Error bars are 95% confidence intervals.

4.2 Time-Efficiency Tradeoff among Top-k Breakings

Results on the running time and statistical efficiency for top-k breakings are shown in Figure 4. We recall that top-1 is equivalent to position-1, and top-(m - 1) is equivalent to the full breaking.

For n = 100, MSE comparisons between successive top-k breakings are statistically significant at the 95% level from (top-1, top-2) to (top-6, top-7). The comparisons in running time are all significant at the 95% confidence level. 
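The top-k breakings compared here differ only in which pairwise comparisons they extract from each ranking. As a sketch (our own Python; `topk_pairs` is a hypothetical helper name, not from the paper's code), the comparisons kept by $G^k_T$ are exactly those involving at least one of the top $k$ positions:

```python
import itertools

def topk_pairs(ranking, k):
    # Comparisons kept by G_T^k: a position pair {i, j} (1-based, i < j)
    # is an edge of the breaking iff i <= k; record (winner, loser).
    pairs = []
    for i, j in itertools.combinations(range(1, len(ranking) + 1), 2):
        if i <= k:
            pairs.append((ranking[i - 1], ranking[j - 1]))
    return pairs

d = ['a', 'b', 'c', 'd', 'e', 'f']   # a full ranking, best first, m = 6
full = topk_pairs(d, 5)              # top-(m-1) = full breaking: 15 pairs
top2 = topk_pairs(d, 2)              # only pairs touching positions 1-2: 9 pairs
```

Per ranking, the top-k breaking yields $km - k(k+1)/2$ comparisons, consistent with the $O((m+k)k)$ per-ranking cost in Proposition 2 and with the tradeoff seen in Figure 4.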
On average, we observe that top-k breakings with smaller k run faster, while top-k breakings with larger k have higher statistical efficiency in both MSE and Kendall correlation. This justifies Conjecture 1.

4.3 Experiments for Real Data

In the sushi dataset [9], there are 10 kinds of sushi (m = 10), and the amount of data n is varied by randomly sampling with replacement. We set the ground truth to be the output of MM applied to all 5000 data points. For the running time, we observe the same as for the synthetic data: GMM (adjacent breaking) runs faster than GMM (full breaking), which runs faster than MM (the results on running time can be found in supplementary material B).

Comparisons for MSE and Kendall correlation are shown in Figure 5. In both figures, 95% confidence intervals are plotted but are too small to be seen. Statistics are calculated over 1970 trials.

Figure 4: Comparison of GMM with top-k breakings as k is varied. The x-axis represents k in the top-k breaking. Error bars are 95% confidence intervals; m = 10, n = 100.

Figure 5: The MSE and Kendall correlation criteria and computation time for MM (10 iterations), GMM-F (full breaking), and GMM-A (adjacent breaking) on sushi data.

For MSE and Kendall correlation, we observe that MM converges fastest, followed by GMM (full breaking), which outperforms GMM (adjacent breaking), which does not converge. Differences between performances are all statistically significant at 95% confidence (with the exception of Kendall correlation between the two GMM methods at n = 200, where p = 0.07). This is different from the comparisons for synthetic data (Figure 3). 
We believe that the main reason is that PL does not fit the sushi data well, a fact recently observed by Azari et al. [1]. Therefore, we cannot expect GMM to converge to the output of MM on the sushi dataset, since the consistency results (Corollary 3) assume that the data is generated under PL.

5 Future Work

We plan to work on the connection between consistent breakings and preference elicitation. For example, even though the theory in this paper is developed for full rankings, the notions of top-k and bottom-k breaking implicitly allow some partial-order settings. More specifically, the top-k breaking can be computed from partial orders that include full rankings of the top k alternatives.

Acknowledgments

This work is supported in part by NSF Grants No. CCF-0915016 and No. AF-1301976. Lirong Xia acknowledges NSF under Grant No. 1136996 to the Computing Research Association for the CIFellows project and an RPI startup fund. We thank Joseph K. Blitzstein, Edoardo M. Airoldi, Ryan P. Adams, Devavrat Shah, Yiling Chen, Gábor Cárdi, and members of the Harvard EconCS group for their comments on different aspects of this work. We thank the anonymous NIPS-13 reviewers for helpful comments and suggestions.

References

[1] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 126-134, Lake Tahoe, NV, USA, 2012.

[2] Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. 
The method of paired comparisons. Biometrika, 39(3/4):324-345, 1952.

[3] Marquis de Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: L'Imprimerie Royale, 1785.

[4] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th World Wide Web Conference, pages 613-622, 2001.

[5] Patricia Everaere, Sébastien Konieczny, and Pierre Marquis. The strategy-proofness landscape of merging. Journal of Artificial Intelligence Research, 28:49-105, 2007.

[6] Lester R. Ford, Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28-33, 1957.

[7] Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029-1054, 1982.

[8] David R. Hunter. MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32:384-406, 2004.

[9] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining (KDD), pages 583-588, Washington, DC, USA, 2003.

[10] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.

[11] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.

[12] Tyler Lu and Craig Boutilier. Learning Mallows models with pairwise preferences. In Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML 2011), pages 145-152, Bellevue, WA, USA, 2011.

[13] Robert Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.

[14] Colin L. Mallows. Non-null ranking model. Biometrika, 44(1/2):114-130, 1957.

[15] Andrew Mao, Ariel D. 
Procaccia, and Yiling Chen. Better human computation through principled voting. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Bellevue, WA, USA, 2013.

[16] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Iterative ranking from pair-wise comparisons. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 2483-2491, Lake Tahoe, NV, USA, 2012.

[17] Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society, Series C (Applied Statistics), 24(2):193-202, 1975.

[18] Louis Leon Thurstone. A law of comparative judgement. Psychological Review, 34(4):273-286, 1927.
", "award": [], "sourceid": 1262, "authors": [{"given_name": "Hossein", "family_name": "Azari Soufiani", "institution": "Harvard University"}, {"given_name": "William", "family_name": "Chen", "institution": "Harvard University"}, {"given_name": "David", "family_name": "Parkes", "institution": "Harvard University"}, {"given_name": "Lirong", "family_name": "Xia", "institution": "Harvard University"}]}