{"title": "Optimal Sample Complexity of M-wise Data for Top-K Ranking", "book": "Advances in Neural Information Processing Systems", "page_first": 1686, "page_last": 1696, "abstract": "We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight $\\ell_\\infty$ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in $\\ell_\\infty$ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. 
Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.", "full_text": "Optimal Sample Complexity of M-wise Data for Top-K Ranking

Minje Jang*
School of Electrical Engineering, KAIST
jmj427@kaist.ac.kr

Sunghyun Kim*
Electronics and Telecommunications Research Institute, Daejeon, Korea
koishkim@etri.re.kr

Changho Suh
School of Electrical Engineering, KAIST
chsuh@kaist.ac.kr

Sewoong Oh
Industrial and Enterprise Systems Engineering Department, UIUC
swoh@illinois.edu

Abstract

We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on the top-K ranked items, based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model, where for each sample M items are ranked according to their perceived utilities, modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ∞ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ∞ error and achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model.
We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.

1 Introduction

Rank aggregation has been explored in a variety of contexts such as social choice [15, 6], web search and information retrieval [20], recommendation systems [7], and crowdsourcing [16], to name a few. It aims to bring a consistent ordering to a collection of items, given partial preference information.

*Equal contribution.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Preference information can take various forms depending on the context. One such form, which we examine in this paper, is ordinal; preferences for alternatives are represented as an ordering. Consider crowd-sourced data collected by annotators asked to rank a few given alternatives based on their preference. The aggregated data can be used to identify the most preferred items. One example is a review process for conference papers (e.g., NIPS), where reviewers are asked not only to review papers but also to order them by how much they enjoyed them. The collected data could be used to highlight papers that may interest a large audience. Alternatively, consider sports (races or the like) and online games where a number of players compete. One may wish to rank them according to skill.

Its broad range of applications has led to a large volume of work. Of the numerous schemes developed, arguably the most dominant paradigms are spectral algorithms [14, 20, 37, 41, 47, 45] and maximum likelihood estimation (MLE) [22, 28]. Postulating the existence of underlying real-valued preferences of items, they aim to produce preference estimates that are consistent in a global sense, e.g., as measured by low squared loss.
But such estimates do not necessarily guarantee optimal ranking accuracy. Accurate ranking has more to do with how well the ordering of the estimates matches that of the true preferences, and less to do with how close the estimates are to the true preferences in terms of overall error. Moreover, in practice, what we expect from accurate ranking is an ordering that precisely separates only a few items ranked highest from the rest, not an ordering that respects all of the items.

Main contributions. In light of this, we explore top-K ranking, which aims to recover the correct set of top-ranked items only. We examine the Plackett-Luce (PL) model, which has been extensively explored [24, 18, 5, 25, 38, 43, 33, 4]. It is a special case of random utility models [46] in which true utilities of items are presumed and a user's revealed preference is a partial ordering according to noisy manifestations of those utilities. It satisfies the 'independence of irrelevant alternatives' property in social choice theory [34, 35] and is the most popular model for studying human choice behavior given multiple alternatives (see Section 2). It is well known that it subsumes as a special case the Bradley-Terry-Luce (BTL) model [12, 32], which concerns two items. We consider an M-wise comparison model where comparisons are given as a preference ordering of M items. In this setting, we characterize the minimax limit on the sample size (i.e., sample complexity) needed to reliably identify the set of top-K ranked items, which turns out to be inversely proportional to M. To the best of our knowledge, this is the first result that characterizes the limit under an M-wise comparison model.

In achieving the limit, we propose an algorithm that consists of sample breaking and Rank Centrality [37], one spectral method we choose among other variants [10, 9, 37, 33]. First, it converts M-wise
First, it converts M-wise\n\nsamples into many more pairwise ones, and in doing so, it carefully chooses only M out of all(cid:0)M\n\npairwise samples obtainable from each M-wise sample. This sample breaking (see Section 3.1)\nextracts only the essential information needed to achieve the limit from given M-wise data. Next,\nusing the re\ufb01ned pairwise data, the algorithm runs a spectral method to identify top-ranked items.\nA novel technique we develop to attain tight (cid:96)\u221e estimation error bounds has been instrumental to\nour progress. Analyzing (cid:96)\u221e error bounds is a critical step to characterizing the minimax sample\ncomplexity for top-K ranking as presented in [17], but has been technically challenging. Even after\ndecades of research since the introduction of spectral methods and MLE, two dominant approaches in\nthe \ufb01eld, we lack notable results for tight (cid:96)\u221e error bounds. This is largely because techniques proven\nuseful to obtain good (cid:96)2 error bounds do not translate into attaining good (cid:96)\u221e error bounds. In this\nregard, our result contributes to progress on (cid:96)\u221e error analysis (see Section 3.2 and the supplementary).\nWe can compare our result to that of [17] by considering M = 2. Although the two optimal sample\ncomplexities match, the conditions under which they do differ; our result turns out to be valid under a\nslightly restricted condition (see Section 3.3). In terms of achievability, the algorithm in [17] merges\nan additional MLE stage with a spectral method, whereas we employ only a spectral method. From\nnumerical experiments, we speculate that the condition under which the result of [17] holds may not\nbe suf\ufb01cient for spectral methods alone to achieve optimality (see Section 4.1).\nWe conduct numerical experiments to support our result. Using synthetic data, we show that the\nminimax optimal sample size indeed decreases at the rate of 1/M. 
We run our algorithm on real-world data collected from a popular online game (League of Legends) and find that its applicability extends to settings that may not necessarily match the PL model. From the collected data, we extract M-wise comparisons and rank the top users in terms of skill. We examine its robustness against partial data and also evaluate its ranking against the official rank that League of Legends provides. In both cases, we compare it with a counting-based algorithm [42, 11] and demonstrate its advantages.

Related work. To the best of our knowledge, [17] investigated top-K identification under the random comparison model of interest for the first time. A key distinction here is that we examine the random listwise comparison model based on the PL model. Rank Centrality was developed in [37], based on which we devise our ranking scheme tailored to listwise comparison data.

In the PL model, some viewed ranking as parameter estimation. Maystre and Grossglauser [33] developed an algorithm that shares the spirit of spectral ranking and showed that its performance is the same as MLE for estimating underlying preference scores. Hajek et al. [25] derived minimax lower bounds on parameter estimation error, and examined gaps with upper bounds for MLE as well as for MLE with a rank-breaking scheme that decomposes partial rankings into pairwise comparisons.

Some works examined sample breaking methods that convert listwise data into pairwise data in the PL model. Azari Soufiani et al. [5] considered various methods to see whether they sustain certain statistical properties in parameter estimation. They examined full breaking, which converts an M-wise sample into $\binom{M}{2}$ pairwise ones, and adjacent breaking, which converts an ordinal M-wise sample into M − 1 pairwise ones whose associated items are adjacent in the sample.
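As a concrete illustration of the two breaking schemes just described (a hypothetical sketch; the function names are ours, and the statistical weighting questions studied in [5] are not modeled here):

```python
from itertools import combinations

def full_breaking(sample):
    """All C(M, 2) pairwise outcomes implied by one M-wise ranking
    (most-preferred item first): (a, b) means a is preferred over b."""
    return [(a, b) for a, b in combinations(sample, 2)]

def adjacent_breaking(sample):
    """The M - 1 pairs of items that are adjacent in the observed ordering."""
    return [(sample[m], sample[m + 1]) for m in range(len(sample) - 1)]

# Example: a 4-wise sample ranking item 3 first, then 1, 4, 2.
sample = [3, 1, 4, 2]
print(full_breaking(sample))      # 6 pairwise outcomes, including (3, 2)
print(adjacent_breaking(sample))  # [(3, 1), (1, 4), (4, 2)]
```

Full breaking yields C(M, 2) pairs per sample, adjacent breaking only M − 1; the circular-permutation breaking used in this paper (Section 3.1) sits in between, keeping exactly M pairs.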
Khetan and Oh [4] considered a method that converts an M-wise sample into multiple pairwise ones and assigns different importance weights to each, and examined the method on several types of comparison graphs.

A number of works have explored ranking problems under different models and with different interests. Some works [43, 2] adopted PAC (probably approximately correct) [44] or regret [21, 8, 23] as their metric to allow some margin of error, in contrast to our work, where 0/1 loss (the most stringent criterion) is considered in order to investigate the worst-case scenario (see Section 2). Rajkumar and Agarwal [40] put forth statistical assumptions that ensure the convergence of rank aggregation methods, including Rank Centrality and MLE, to an optimal ranking. Active ranking, where samples are obtained adaptively, has received attention as well. Jamieson and Nowak [29] considered perfect total ranking and characterized the query complexity gain of adaptive sampling in the noise-free case, and the works of [29, 1] explored the query complexity in the presence of noise, aiming at approximate total rankings. Recently, Braverman et al. [13] considered three noisy models, examining whether their algorithm can achieve reliable top-K ranking. Heckel et al. [27] considered a model where noisy pairwise observations are given, with the goal of partitioning the items into sets of pre-specified sizes based on their scores, which includes top-K ranking as a special case. Mohajer et al. [36] considered a fairly general noisy model which subsumes various models as special cases. They derived upper bounds on the sample size required for reliable top-K sorting as well as top-K partitioning, and showed that active ranking can provide significant gains over passive ranking.

2 Problem Formulation

Notation. We denote by [n] the set {1, 2, . . .
, n}, by $G = ([n], E^{(M)})$ an M-wise comparison graph with n vertices in which a hyper-edge connects a set of M vertices if there is a comparison among them, and by $d_i$ the out-degree of vertex i.

Comparison model and assumptions. Suppose we perform a number of evaluations on n items. We assume the comparison outcomes are generated based on the PL model [39]. We consider M-wise models where the comparison outcomes are obtained in the form of a preference ordering of M items.

Preference scores. The PL model assumes the existence of underlying preferences $w := \{w_1, w_2, \ldots, w_n\}$, where $w_i$ represents the preference score of item i. The outcome of each comparison depends solely on the latent scores of the items being compared. Without loss of generality, we assume that $w_1 \ge w_2 \ge \cdots \ge w_n > 0$. We assume the range of scores to be fixed irrespective of n: for some positive constants $w_{\min}$ and $w_{\max}$, $w_i \in [w_{\min}, w_{\max}]$ for $1 \le i \le n$. We note that the case where the range $w_{\max}/w_{\min}$ grows with n can be translated into the above fixed-range regime by separating out the items with vanishing scores (e.g., via a voting method like Borda count [11, 3]).

Comparison model. We denote by $G = ([n], E^{(M)})$ a comparison graph where a set of M items $I = \{i_1, i_2, \ldots, i_M\}$ is compared if and only if I belongs to the hyper-edge set $E^{(M)}$. We examine random graphs constructed in a manner analogous to the Erdős–Rényi random graph model: each set of M vertices is connected by a hyper-edge independently with probability p. Notice that when M = 2, the random graphs we consider follow precisely the Erdős–Rényi model.

M-wise comparisons. We observe L samples for each $I = \{i_1, i_2, \ldots, i_M\} \in E^{(M)}$. Each sample is an ordering of the M items in order of preference.
The outcome of the ℓth sample, denoted by $s_I^{(\ell)}$, is generated according to the PL model: $s_I^{(\ell)} = (i_1, i_2, \ldots, i_M)$ with probability
$$\prod_{m=1}^{M} \frac{w_{i_m}}{\sum_{r=m}^{M} w_{i_r}},$$
where item $i_a$ is preferred over item $i_b$ in I if $i_a$ appears to the left of $i_b$, which we also denote by $i_a \succ i_b$. We assume that, conditional on G, the $s_I^{(\ell)}$'s are jointly independent over I and ℓ. We denote the collection of all samples by $s := \{s_I : I \in E^{(M)}\}$, where $s_I = \{s_I^{(1)}, s_I^{(2)}, \ldots, s_I^{(L)}\}$.

Performance metric and goal. Given comparison data, one wishes to know whether or not the top-K ranked items are identifiable. We consider the probability of error $P_e$ in identifying the correct set of the top-K ranked items: $P_e(\psi) := \mathbb{P}\{\psi(s) \neq [K]\}$, where ψ is any ranking scheme that returns a set of K indices and [K] is the set of the first K indices. Our goal in this work is to characterize the admissible region $R_w$ of (p, L) in which top-K ranking is feasible for a given PL parameter w, in other words, in which $P_e$ can be made vanishingly small as n grows. The admissible region is defined as $R_w := \{(p, L) : \lim_{n\to\infty} P_e(\psi(s)) = 0\}$. In particular, we are interested in the minimax sample complexity of an estimator, defined as
$$S_\delta := \inf_{p \in [0,1],\, L \in \mathbb{Z}^+} \sup_{v \in \Omega_\delta} \Big\{ \tbinom{n}{M} pL : (p, L) \in R_v \Big\},$$
where $\Omega_\delta = \{v \in \mathbb{R}^n : (v_K - v_{K+1})/v_{\max} \ge \delta\}$. Note that this definition shows that we conservatively examine minimax scenarios in which nature behaves adversarially with the worst-case w.

3 Main Results

Separating the two items near the decision boundary (i.e., the Kth and (K + 1)th ranked items) is key in top-K ranking. Unless the gap between them is large enough, noise in the observations leads to erroneous estimates that no ranking scheme can overcome.
We pinpoint a separation measure, $\Delta_K := (w_K - w_{K+1})/w_{\max}$, which turns out to be crucial in establishing the fundamental limit.

As noted in [22], if a comparison graph G is not connected, it is impossible to determine the relative preferences between two disconnected components. Thus, we assume all comparison graphs to be connected. To guarantee this, for a hyper-random graph with edge size M, we assume $p > \log n / \binom{n-1}{M-1}$.²

Now, let us formally state our main results. First, for comparison graphs under M-wise observations, we establish a necessary condition for top-K ranking.

Theorem 1. Fix $\epsilon \in (0, \frac{1}{2})$. Given an M-wise comparison graph $G = ([n], E^{(M)})$, if
$$\binom{n}{M} pL \le c_0 (1-\epsilon) \frac{1}{M} \frac{n \log n}{\Delta_K^2} \qquad (1)$$
for some numerical constant $c_0$, then for any ranking scheme ψ there exists a preference score vector w with separation measure $\Delta_K$ such that $P_e(\psi) \ge \epsilon$.

The proof is a generalization of Theorem 2 in [17], and we provide it in the supplementary material. Next, for comparison graphs under M-wise observations, we establish a sufficient condition for top-K ranking.

Theorem 2. Given an M-wise comparison graph $G = ([n], E^{(M)})$ with $p \ge c_1 (M-1) \sqrt{\log n / \binom{n-1}{M-1}}$, if
$$\binom{n}{M} pL \ge c_2 \frac{1}{M} \frac{n \log n}{\Delta_K^2} \qquad (2)$$
for some numerical constants $c_1$ and $c_2$, then Rank Centrality correctly identifies the top-K ranked items with probability at least $1 - 2n^{-1/15}$.

We provide the proof of Theorem 2 in the supplementary material.
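To get a feel for the scaling in Theorems 1 and 2, the thresholds can be evaluated numerically. The following is an illustrative sketch only: the constants $c_1$ and $c_2$ are unspecified numerical constants in the theorems, so we set them to 1 here, meaning the absolute values are meaningless and only the order-wise behavior matters.

```python
from math import comb, log

def sufficient_scaling(n, M, delta_K, c1=1.0, c2=1.0):
    """Order-wise thresholds from Theorem 2 with illustrative constants:
    the dense-regime condition on p, and the required total sample size
    binom(n, M) * p * L."""
    p_min = c1 * (M - 1) * (log(n) / comb(n - 1, M - 1)) ** 0.5
    samples_min = c2 * n * log(n) / (M * delta_K ** 2)
    return p_min, samples_min

# The required sample size scales as 1/M: doubling M halves it order-wise.
_, s4 = sufficient_scaling(n=500, M=4, delta_K=0.1)
_, s8 = sufficient_scaling(n=500, M=8, delta_K=0.1)
print(s4 / s8)  # → 2.0
```

This is exactly the 1/M reduction that the synthetic experiments in Section 4.1 corroborate empirically.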
In what follows, we describe the algorithm we use, sample breaking together with Rank Centrality [37], and then give an outline of the proof.

Note that Theorem 1 shows that the sample complexity must satisfy $S_{\Delta_K} \gtrsim n \log n/(M \Delta_K^2)$, while Theorem 2 shows $S_{\Delta_K} \lesssim n \log n/(M \Delta_K^2)$, and the two match. That is, we establish the minimax optimality of Rank Centrality: $S_{\Delta_K} \asymp n \log n/(M \Delta_K^2)$.

²The threshold $p > \log n/\binom{n}{M-1}$ is derived in [19] as a sharp threshold for connectivity of hyper-graphs. We assume a slightly stricter condition for ease of analysis. This does not make a big difference in our result, as the two conditions are almost identical order-wise given M < n/2, a reasonable condition in regimes where n is large.

3.1 Algorithm description

Algorithm 1 Rank Centrality [37]
Input: the collection of statistics $s = \{s_I : I \in E^{(M)}\}$.
Convert the M-wise sample for each hyper-edge I into M pairwise samples:
1. Choose a circular permutation of the items in I uniformly at random,
2. Break it into the M pairs of adjacent items, and denote the set of pairs by φ(I),
3. Use the (pairwise) data of the pairs in φ(I).
Compute the transition matrix $\hat{P} = [\hat{P}_{ij}]_{1\le i,j\le n}$:
$$\hat{P}_{ij} = \begin{cases} \frac{1}{2d_{\max}}\, y_{ij} & \text{if } i \neq j; \\ 1 - \sum_{k:k\neq i} \hat{P}_{ki} & \text{if } i = j; \\ 0 & \text{otherwise}, \end{cases}$$
where $d_{\max}$ is the maximum out-degree of vertices in $E^{(M)}$.
Output: the stationary distribution of the matrix $\hat{P}$.

Rank Centrality estimates rankings from pairwise comparison data. Thus, to make use of M-wise comparison data for Rank Centrality, we apply a sample breaking method that converts M-wise data into pairwise data. To be more specific, for a hyper-edge I = {1, 2, . . . , M}, we choose a circular permutation of the items in I uniformly at random.
Suppose we pick the circular permutation (1, 2, . . . , M − 1, M, 1). Then we break it into M pairs of adjacent items in the order specified by the permutation: {1, 2}, {2, 3}, . . . , {M − 1, M}, {M, 1} (see Section 3.3 for a remark on why we do not lose optimality with this sample breaking method). Let us denote by φ(I) this set of pairs. We use the converted pairwise comparison data associated with the pairs in φ(I):³
$$y_{ij} := \frac{1}{L} \sum_{I:\{i,j\}\in\phi(I)} \sum_{\ell=1}^{L} y^{(\ell)}_{ij,I}, \qquad y^{(\ell)}_{ij,I} = \begin{cases} 1 & \text{if } \{i,j\}\in\phi(I) \text{ and } i \succ j \text{ in } s_I^{(\ell)}; \\ 0 & \text{otherwise}. \end{cases} \qquad (3)$$

In an ideal scenario where we obtain an infinite number of samples per M-wise comparison, i.e., $L \to \infty$, the sufficient statistics $\frac{1}{L}\sum_{\ell=1}^{L} y^{(\ell)}_{ij,I}$ converge to $w_i/(w_i + w_j)$. Then the constructed matrix $\hat{P}$ defined in Algorithm 1 becomes a matrix P whose entries $[P_{ij}]_{1\le i,j\le n}$ are defined as
$$P_{ij} = \begin{cases} \frac{1}{2d_{\max}} \sum_{I:\{i,j\}\in\phi(I)} \frac{w_i}{w_i + w_j} & \text{for } \{i,j\}\in\phi(I),\ I \in E^{(M)}; \\ 1 - \sum_{k:k\neq i} P_{ki} & \text{if } i = j; \\ 0 & \text{otherwise}. \end{cases} \qquad (4)$$

The entries for observed item pairs represent the relative likelihood of item i being preferred over item j. Intuitively, random walks under P visit some states more often in the long run if those states have been preferred over other frequently visited states and/or over many other states. The random walks are reversible, as $w_i P_{ji} = w_j P_{ij}$ holds, and irreducible under the connectivity assumption. The unique stationary distribution is then equal to $w = \{w_1, \ldots, w_n\}$ up to constant scaling. It follows that random walks under $\hat{P}$, a noisy version of P, give us an approximation of w.

3.2 Proof outline

We outline the proof of Theorem 2 by introducing Theorem 3, which we show leads to Theorem 2.

Theorem 3.
When Rank Centrality is employed, with high probability, the ℓ∞ estimation error is upper-bounded by
$$\frac{\|\hat{w} - w\|_\infty}{\|w\|_\infty} \lesssim \sqrt{\frac{n \log n}{\binom{n}{M} pL}} \sqrt{\frac{1}{M}}, \qquad (5)$$
where $p \ge c_1 (M-1)\sqrt{\log n / \binom{n-1}{M-1}}$ and $c_1$ is some numerical constant.

³In comparison, the adjacent breaking method [5] directly follows the ordering evaluated in each sample; if it is $1 \succ 2 \succ \cdots \succ M-1 \succ M$, it is broken into pairs of adjacent items: $1 \succ 2$ up to $M-1 \succ M$. Our method turns out to be consistent, i.e., $\Pr[y_{ij} = 1]/\Pr[y_{ji} = 1] = w_i/w_j$ (see (4)), whereas adjacent breaking is not [5].

Let $\|w\|_\infty = w_{\max} = 1$ for ease of presentation. Suppose $\Delta_K = w_K - w_{K+1} \gtrsim \sqrt{n\log n / \binom{n}{M}pL}\,\sqrt{1/M}$. Then, for all $1 \le i \le K$ and $j \ge K+1$,
$$\hat{w}_i - \hat{w}_j \ge w_i - w_j - |\hat{w}_i - w_i| - |\hat{w}_j - w_j| \ge w_K - w_{K+1} - 2\|\hat{w} - w\|_\infty > 0.$$
That is, the top-K items are identified as desired. Hence, as long as $\Delta_K \gtrsim \sqrt{n\log n/\binom{n}{M}pL}\,\sqrt{1/M}$, i.e., $\binom{n}{M}pL \gtrsim n\log n/(M\Delta_K^2)$, reliable top-K ranking is achieved with a sample size of order $n \log n/(M \Delta_K^2)$.

Now, let us prove Theorem 3.
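The reduction above turns top-K recovery into an ℓ∞ bound: once every estimate is within $\Delta_K/2$ of its true score, no item outside the top K can overtake one inside it. A toy numerical check of this implication (the scores below are made up purely for illustration):

```python
def top_k(scores, K):
    """Indices of the K largest entries (0-indexed)."""
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:K])

w     = [1.0, 0.9, 0.8, 0.5, 0.4]       # true scores; for K = 2, Delta_K = 0.9 - 0.8 = 0.1
w_hat = [0.97, 0.91, 0.84, 0.52, 0.38]  # estimates with l_inf error 0.04 < Delta_K / 2

# The l_inf error is below Delta_K / 2, so the top-K set is recovered exactly.
assert max(abs(a - b) for a, b in zip(w, w_hat)) < 0.1 / 2
assert top_k(w_hat, K=2) == top_k(w, K=2) == {0, 1}
```

Note that the estimates themselves are not close to the true scores in any fine-grained sense; only the ℓ∞ separation relative to $\Delta_K$ matters for identifying the top-K set.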
To find an ℓ∞ error bound, we first derive an upper bound on the point-wise error between the score estimate of item i and its true score, which consists of three terms:
$$|\hat{w}_i - w_i| \le |\hat{w}_i - w_i|\,\hat{P}_{ii} + \sum_{j:j\neq i} |\hat{w}_j - w_j|\,\hat{P}_{ij} + \Big| \sum_{j:j\neq i} (w_i + w_j)\big(\hat{P}_{ji} - P_{ji}\big) \Big|. \qquad (6)$$
We can obtain (6) from $\hat{w} = \hat{P}\hat{w}$ and $w = Pw$. We then obtain upper bounds on the three terms:
$$\hat{P}_{ii} < 1, \quad \Big|\sum_{j:j\neq i} (w_i + w_j)\big(\hat{P}_{ji} - P_{ji}\big)\Big| \lesssim \sqrt{\frac{n\log n}{\binom{n}{M}pL}}\sqrt{\frac{1}{M}}, \quad \sum_{j:j\neq i} |\hat{w}_j - w_j|\,\hat{P}_{ij} \lesssim \sqrt{\frac{n\log n}{\binom{n}{M}pL}}\sqrt{\frac{1}{M}}, \qquad (7)$$
with high probability (Lemmas 1, 2, and 3 in the supplementary material). (7) completes the proof. We obtain the first two bounds from Hoeffding's inequality. The last is key; this is where we sharply link an ℓ2 error bound of $\sqrt{n\log n/\binom{n}{M}pL}\,\sqrt{1/M}$ (Theorem 4 in the supplementary material) to the desired ℓ∞ error bound (5).

On the left-hand side of the third inequality, the point-wise error of an item j, which affects that of item i as expressed in (6), may not be captured for some j, since there may be no hyper-edge that includes both items i and j. This makes it hard to draw a link from the obtained ℓ2 error bound to the inequality, since ℓ2 errors can be seen as the sum of all point-wise errors.
To include them all, we recursively apply (6) to $|\hat{w}_j - w_j|$ in the third inequality and then apply the other two bounds appropriately (for the detailed derivation, see the beginning of the proof of Lemma 3 in the supplementary material). Then we get
$$\sum_{j:j\neq i} |\hat{w}_j - w_j|\,\hat{P}_{ij} \lesssim \sum_{k=1}^{n} |\hat{w}_k - w_k| \sum_{j:j\notin\{i,k\}} \hat{P}_{jk}\hat{P}_{ij} + \sqrt{\frac{n\log n}{\binom{n}{M}pL}}\sqrt{\frac{1}{M}}. \qquad (8)$$
Manipulating the first term on the right-hand side (for the derivation, see the proof of Lemma 3), we get
$$\sum_{k=1}^{n} |\hat{w}_k - w_k| \sum_{j:j\notin\{i,k\}} \hat{P}_{jk}\hat{P}_{ij} \le \|\hat{w} - w\|_2 \sqrt{\sum_{k=1}^{n} \Big( \sum_{j:j\notin\{i,k\}} \hat{P}_{jk}\hat{P}_{ij} \Big)^2}. \qquad (9)$$
We show that $\sum_{j:j\notin\{i,k\}} \hat{P}_{jk}\hat{P}_{ij}$ concentrates on the order of $1/n$ for all k in the proof of Lemma 3. Since $\|w\|_2 \le \sqrt{n}\,\|w\|_\infty = \sqrt{n}$, we get $\|\hat{w} - w\|_2/\sqrt{n} \le \|\hat{w} - w\|_2/\|w\|_2$. We derive this ℓ2 error bound to be $\sqrt{n\log n/\binom{n}{M}pL}\,\sqrt{1/M}$ (Theorem 4 in the supplementary material), matching (5).

To describe the concentration of $\sum_{j:j\notin\{i,k\}} \hat{P}_{jk}\hat{P}_{ij}$, we need to consider the dependencies within it. To see them, we upper-bound it as follows (for details, see the proof of Lemma 3 in the supplementary material):
$$\sum_{j:j\notin\{i,k\}} \hat{P}_{ij}\hat{P}_{jk} \le \frac{1}{4d_{\max}^2} \sum_{j:j\notin\{i,k\}} \ \sum_{I_1:\, i,j\in I_1,\ I_2:\, j,k\in I_2} X_{I_1 I_2}, \qquad (10)$$
where $X_{I_1 I_2} := \mathbb{I}[\{i,j\}\in\phi(I_1)]\,\mathbb{I}[\{j,k\}\in\phi(I_2)]$. For $M > 2$, there can exist $j_a$ and $j_b$ such that $\{i, j_a, j_b\} \subseteq I_1$, $j_a \in I_2$, and $j_b \notin I_2$.
Then, when summing over j, $X_{I_1 I_2}$ and $X_{I_1 I_3}$, where $I_3$ is another hyper-edge that includes $j_b$ and k, are dependent through the shared hyper-edge $I_1$. To handle this, we use Janson's inequality [30], one of the concentration inequalities that accommodate such dependencies.

To derive a necessary condition matching our sufficient condition, we use a generalized version of Fano's inequality [26], as in the proof of Theorem 3 in [17], and complete the combinatorial calculations.

3.3 Discussion

Optimality versus M (intuition behind our sample breaking method): For each M-wise sample, we form a circular permutation uniformly at random and extract M pairwise samples, each of which concerns two adjacent items in it. Suppose we have an M-wise sample $1 \succ 2 \succ \cdots \succ M$, and for simplicity we happen to form the circular permutation $(1, 2, \ldots, M-1, M, 1)$; we extract M pairwise samples as $1 \succ 2$, $2 \succ 3$, . . . , $(M-1) \succ M$, $1 \succ M$. Let us provide the intuition behind why this leads to the optimal sample complexity. For the case of M = 2, Rank Centrality achieves the optimal order-wise sample complexity of $n\log n/\Delta_K^2$, as characterized in [17]. In addition, one M-wise sample in the PL model can be broken into M − 1 independent pairwise ones, since pairwise data of two arbitrary items among the M items depend on the true scores of those two items only. In our example, one can convert the M-wise sample into M − 1 independent pairwise ones: $1 \succ 2$, $2 \succ 3$, . . . , $(M-1) \succ M$. From these, it is intuitive to see that we can achieve reliable top-K ranking with an order-wise sample complexity of $n\log n/((M-1)\Delta_K^2)$ by converting each M-wise sample into M − 1 independent pairwise ones.
Notice the small gap between this and the optimal sample complexity in Section 3.

Tight ℓ∞ error bounds: As shown in Section 3.2, deriving a tight ℓ∞ error bound is critical to analyzing the performance of a top-K ranking algorithm. Recent work [17] relied on combining an additional stage of local refinement in series with Rank Centrality to derive it, and characterized the optimal sample complexity for the pairwise model. In contrast, although our result is valid in a slightly restricted regime (see the next remark), we employ only Rank Centrality and still achieve optimality for the M-wise model, which includes the pairwise model. Since deriving tight ℓ∞ error bounds is crucial, this result is hard to attain without a fine analytical technique; developing one is our main theoretical contribution. For details, see the proof of Lemma 3 in the supplementary material, which sharply links an ℓ∞ error bound (Theorem 3 therein) to an ℓ2 error bound (Theorem 4 therein). Rank Centrality has been shown to achieve performance nearly as good as MLE in terms of ℓ2 error, but little has been known in terms of ℓ∞ error, until now. Our result makes clear progress on this front.

Analytical technique: Our analysis is not limited to Rank Centrality. Whenever one wishes to bound the difference between the leading eigenvector of a matrix and that of its noisy version, one can obtain (6), (8), and (9). Thus, the technique can be adopted to link ℓ2 and ℓ∞ error bounds for any spectral method.

Dense regimes: Our main result concerns a slightly denser regime, indicated by the condition $p \gtrsim (M-1)\sqrt{\log n/\binom{n-1}{M-1}}$, in which many distinct groups of items are likely to be compared. One can see that this dense-regime condition is not necessary for top-K ranking; for the pairwise case M = 2, the requirement is $p \gtrsim \log n/n$, as shown in [17]. However, it is not yet clear whether the dense-regime condition is required under our approach, which employs only a spectral method.
Our speculation from numerical experiments is that the sparse-regime condition, $\log n/\binom{n-1}{M-1} \lesssim p \lesssim (M-1)\sqrt{\log n/\binom{n-1}{M-1}}$, may not be sufficient for spectral methods to achieve reliable top-K ranking (see Section 4).

4 Experimental Results

4.1 Synthetic data simulation

Figure 1: Dense regime ($p_{\text{dense}} = 0.25$, first two figures): empirical ℓ∞ estimation error vs. L (left); empirical success rate vs. L (right). Sparse regime ($p_{\text{sparse}} = 0.025$, last two figures): empirical ℓ∞ estimation error vs. L (left); empirical success rate vs. L (right).

First, we conduct a synthetic data experiment for M = 2, the pairwise comparison model, to compare our result in Theorem 2 to that of recent work [17]. We consider both the dense ($p \gtrsim \sqrt{\log n/n}$) and sparse ($\log n/n \lesssim p \lesssim \sqrt{\log n/n}$) regimes. We set the constant $c_1 = 2$, and set $p_{\text{dense}} = 0.25$ and $p_{\text{sparse}} = 0.025$ so that each falls in its proper range.
We use n = 500, K = 10, and $\Delta_K = 0.1$. Each result in all numerical simulations is obtained by averaging over 10000 Monte Carlo trials.

In Figure 1, the first two figures show the experiments in the dense regime. We see that as L increases, meaning as we obtain pairwise samples beyond the minimal sample complexity, (1) the $\ell_\infty$ error of Rank Centrality decreases and meets that of Spectral MLE (left); and (2) the success rate of Rank Centrality increases and soon hits 100% along with Spectral MLE (right). The curves support our result: in the dense regime $p \gtrsim \sqrt{\log n/n}$, Rank Centrality alone can achieve reliable top-K ranking.

The last two figures show the experiments in the sparse regime. We see that as L increases, (1) the $\ell_\infty$ error of Rank Centrality decreases but does not meet that of Spectral MLE (left); and (2) the success rate of Rank Centrality increases but does not reach that of Spectral MLE, which hits nearly 100% (right). The curves lead us to speculate that the sparse regime condition $\log n/n \lesssim p \lesssim \sqrt{\log n/n}$ may not be sufficient for spectral methods to achieve reliable top-K ranking.

Figure 2: Empirical minimal sample complexity vs. M (first), $\Delta_K$ (second), and $n \log n$ (third).

Next, we corroborate our optimal sample complexity result in Theorem 2.
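Empirical success rates such as those in Figure 1 can be estimated with a harness along the following lines. This is a deliberately small-scale sketch of our own (complete comparison graph, made-up parameters), not the exact experiment code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, L, trials = 20, 3, 64, 20
# Planted scores with a clear gap at rank K (illustrative values).
w = np.concatenate([np.ones(n - K), 3.0 * np.ones(K)])

def simulate_y(L):
    """Average L Bernoulli pairwise outcomes per pair into win fractions."""
    y = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            frac_i = rng.binomial(L, w[i] / (w[i] + w[j])) / L
            y[i, j] = frac_i          # fraction of the L games that i won
            y[j, i] = 1.0 - frac_i
    return y

def topk_rank_centrality(y):
    P = y.T / n                        # walk from i to j as often as j beat i
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    pi = np.full(n, 1.0 / n)
    for _ in range(300):               # power iteration
        pi = pi @ P
    return set(np.argsort(pi)[-K:])

truth = set(range(n - K, n))           # indices of the planted top-K items
rate = np.mean([topk_rank_centrality(simulate_y(L)) == truth
                for _ in range(trials)])
```

Sweeping L (and p, via sparser comparison graphs) in such a harness produces curves like those in Figures 1 and 2.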
We examine whether the empirical minimal sample complexity decreases at the rates of $1/M$ and $1/\Delta_K^2$, and increases at the rate of $n \log n$. To verify the reduction at the rate of $1/M$, we run experiments for M ranging from 3 to 15. For each M, we increase the number of samples by increasing p until the success rate reaches 95%; the number of samples needed to achieve this is taken as the empirical minimal sample complexity for that M. We set the other parameters as n = 100, L = 20, K = 5 and $\Delta_K = 0.3$. The result for each M in all simulations is obtained by averaging over 1000 Monte Carlo trials. To verify the other two relations, we follow similar procedures. For $1/\Delta_K^2$, we set n = 200, M = 2, L = 20 and K = 5. For $n \log n$, we set M = 2, L = 4, K = 5 and $\Delta_K = 0.4$.

The first figure in Figure 2 shows the reduction in empirical minimal sample complexity as a blue solid curve; the red dashed curve is obtained by curve fitting. We can see that the empirical minimal sample complexity drops in inverse proportion to M. The second and third figures show that it also behaves as Theorem 2 predicts in terms of $\Delta_K$ and $n \log n$.

Figure 3: (First) Empirical success rates of four algorithms: our algorithm (blue circle), heuristic Spectral MLE (red cross), least square (green plus), and counting (purple triangle); (Second) Top-5 ranked users: normalized overlap vs. fraction of samples used; (Third) Percentile of the top-5 users (sorted by average League of Legends points earned per match) in the ranks produced by our algorithm, heuristic Spectral MLE, least square, and counting.
For instance, the user who earns the most points per match (first entry) is at around the 80th percentile according to our algorithm and heuristic Spectral MLE, the 60th percentile according to least square, and the 10th percentile according to counting.

Last, we evaluate the success rates of various algorithms on M-wise comparison data. We consider our proposed algorithm, Spectral MLE, least square (HodgeRank [31]), and counting. Since Spectral MLE was developed for pairwise data, we extend it heuristically: we apply our sample breaking method to obtain the pairwise data it needs, and for any parameters required to run Spectral MLE, we heuristically find those that give rise to the highest success rate. For the other two algorithms, we first apply our sample breaking method as well. Then, for least square, we find a score vector $\hat{w}$ that minimizes the squared error $\sum_{(i,j)\in E} (\log(\hat{w}_i/\hat{w}_j) - \log(y_{ij}/y_{ji}))^2$, where E is the edge set of the converted pairwise data. For counting, we count each item's number of wins over all involved pairwise data. We use n = 100, M = 4, $p = 0.0025 \cdot (M-1)\sqrt{\log n / \binom{n-1}{M-1}}$, K = 5 and $\Delta_K = 0.3$.
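The least-square step described above reduces to ordinary linear least squares in the log scores $s = \log \hat{w}$: each edge contributes one equation $s_i - s_j \approx \log(y_{ij}/y_{ji})$. A minimal sketch on noiseless toy data of our own (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
w = rng.uniform(1.0, 3.0, size=n)       # hypothetical true scores

# One row per edge of the (here complete) converted pairwise graph.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
b = []
for i, j in edges:
    y_ij = w[i] / (w[i] + w[j])         # noiseless win fraction of i over j
    b.append(np.log(y_ij / (1.0 - y_ij)))  # log(y_ij / y_ji)
b = np.array(b)

A = np.zeros((len(edges), n))           # incidence matrix: +1 on i, -1 on j
for row, (i, j) in enumerate(edges):
    A[row, i], A[row, j] = 1.0, -1.0

s, *_ = np.linalg.lstsq(A, b, rcond=None)
w_hat = np.exp(s - s.mean())            # scores, recovered up to global scale
```

With noisy $y_{ij}$ the residual is nonzero, and the minimizer is the HodgeRank-style gradient estimate on the comparison graph.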
Each result in all simulations is obtained by averaging over 5000 Monte Carlo trials. The first figure in Figure 3 shows that our algorithm and heuristic Spectral MLE perform best (the latter being marginally better), achieving near-100% success rates for large L. It also shows that they outperform the other two algorithms, which do not achieve near-100% success rates even for large L.

4.2 Real-world data simulation

One natural setting in which M-wise comparison data arises is an online game: users randomly get together and play, and the results depend on their skills. We find League of Legends to be a proper fit4. In extracting M-wise data, we adopt a measure widely accepted in the user community as a factor that rates users' skill5. We incorporate this measure into our model as follows. For each match (M-wise sample), we have 10 users, each associated with its measure. In breaking M-wise samples, for each user pair (i, j), we compare their measures and declare user i the winner if its measure is larger than user j's; this corresponds to $y^{(\ell)}_{ij}$ in our model, to which we assign 1 if user i wins and 0 otherwise. A pair of users may play together in multiple, say $L_{ij}$, matches, so we compute $y_{ij} := (\sum_{\ell=1}^{L_{ij}} y^{(\ell)}_{ij})/L_{ij}$ for use in Rank Centrality. As the M-wise data is extracted from team competitions, League of Legends does not perfectly fit our model. Yet one main reason to run this experiment is precisely to see whether our algorithm works well in settings that do not necessarily fit the PL model, i.e., whether it is broadly applicable.

We first investigate the robustness aspect by evaluating performance against partial information. To this end, we use all collected data and obtain a ranking result for each algorithm, which we consider as its baseline.
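The sample breaking just described — splitting each M-wise sample into its $\binom{M}{2}$ pairwise outcomes and averaging repeated meetings into $y_{ij}$ — can be sketched as follows, on made-up toy samples (in the real-data experiment, each sample would be one 10-user match ordered by the skill measure):

```python
import numpy as np
from itertools import combinations

n = 5
# Each M-wise sample is an ordering of M items, best first (toy data).
samples = [
    [2, 0, 4],   # item 2 beat item 0 beat item 4
    [0, 2, 1],
    [4, 1, 3],
    [2, 4, 1],
]

wins = np.zeros((n, n))    # wins[i, j]: times i was ranked above j
meets = np.zeros((n, n))   # L_ij: times i and j appeared in the same sample
for ranking in samples:
    for a, b in combinations(ranking, 2):   # a precedes b => a beat b
        wins[a, b] += 1
        meets[a, b] += 1
        meets[b, a] += 1

# y[i, j] = (sum of pairwise outcomes) / L_ij on pairs that ever met.
y = np.where(meets > 0, wins / np.maximum(meets, 1), 0.0)
```

The resulting matrix y then feeds directly into Rank Centrality.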
Then, for each algorithm, we reduce the sample size by discarding some of the data, and compare the result to the baseline to see how robust the algorithm is against partial information. We conduct this experiment for four algorithms: our proposed algorithm, the heuristic extension of Spectral MLE, least square, and counting.

We choose as our metric the normalized overlap $|S_{\mathrm{comp}} \cap S_{\mathrm{part}}|/K$, where K = 5, $S_{\mathrm{comp}}$ is the set of top-K users identified using the complete dataset, and $S_{\mathrm{part}}$ is that identified using a partial dataset. In choosing partial data, we set $f \in (0.5, 1)$ and retain each match result independently with probability f, so that f is the expected fraction of samples used. We compute the metric for each f by averaging over 1000 Monte Carlo trials.

The second figure of Figure 3 shows that over the range of f where overlaps above 60% are retained, our algorithm, along with some others, demonstrates good robustness against partial information.

In addition, we compare the ranks estimated by the four algorithms to the rank provided by League of Legends. By computing the average points earned per match for each user, we infer the users' rank as determined by official standards. In the third figure of Figure 3, the x-axis indicates the top-5 users identified by computing the average League of Legends points earned per match and sorting in descending order. The y-axis indicates the percentile of these top-5 users according to the ranks produced by the algorithms of interest. Notice that the top-5 users by League of Legends standards are also placed at high ranks by our algorithm and heuristic Spectral MLE: they are all placed at the 80th percentile or above.
On the other hand, most of them (4 of the top-5 users) are placed at noticeably lower ranks by least square and counting.

5 Conclusion

We characterized the minimax (order-wise) optimal sample complexity for top-K rank aggregation in the M-wise comparison model that builds on the PL model. We corroborated our result with synthetic data experiments and verified the applicability of our algorithm on real-world data.

4 Two teams of 5 users compete. Each user kills opponents, assists teammates in kills, and dies from attacks. At the end, one team wins, and different points are awarded to the users. We use users' kill/assist/death data (non-negative integers), which can be considered noisy measurements of their skill, and rank them by skill.

5 We define the measure as {(# of kills + # of assists)/(1 + # of deaths)} × weight. We adopt this measure since it is similar to the one officially provided (called the KDA statistic). We assign winning users a weight of 1.1 and losing users a weight of 1.0, to give extra credit (10%) to users who lead their team to victory.

Acknowledgments

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-00694, Coding for High-Speed Distributed Networks).

References

[1] Ailon, N. (2012). Active learning ranking from pairwise preferences with almost optimal query complexity. Journal of Machine Learning Research, 13, 137–164.

[2] Ailon, N. and Mohri, M. (2007). An efficient reduction of ranking to classification. arXiv preprint arXiv:0710.2889.

[3] Ammar, A. and Shah, D. (2011). Ranking: Compare, don't score. In Allerton Conference, pages 776–783. IEEE.

[4] Khetan, A. and Oh, S. (2016). Data-driven rank breaking for efficient rank aggregation.
Journal of Machine Learning Research, 17, 1–54.

[5] Azari Soufiani, H., Chen, W., Parkes, D. C., and Xia, L. (2013). Generalized method-of-moments for rank aggregation. In Neural Information Processing Systems, pages 2706–2714.

[6] Azari Soufiani, H., Parkes, D. C., and Xia, L. (2014). A statistical decision-theoretic framework for social choice. In Neural Information Processing Systems, pages 3185–3193.

[7] Baltrunas, L., Makcinskas, T., and Ricci, F. (2010). Group recommendations with rank aggregation and collaborative filtering. In ACM Conference on Recommender Systems, pages 119–126. ACM.

[8] Bell, D. E. (1982). Regret in decision making under uncertainty. Operations Research, 30(5), 961–981.

[9] Bergstrom, C. T., West, J. D., and Wiseman, M. A. (2008). The Eigenfactor metrics. Journal of Neuroscience, 28(45), 11433–11434.

[10] Bonacich, P. and Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 23(3), 191–201.

[11] Borda, J. C. (1781). Mémoire sur les élections au scrutin.

[12] Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3-4), 324–345.

[13] Braverman, M., Mao, J., and Weinberg, S. M. (2016). Parallel algorithms for select and partition with noisy comparisons. In ACM Symposium on Theory of Computing, pages 851–862.

[14] Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1), 107–117.

[15] Caplin, A. and Nalebuff, B. (1991). Aggregation and social choice: A mean voter theorem. Econometrica, pages 1–23.

[16] Chen, X., Bennett, P. N., Collins-Thompson, K., and Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting.
In ACM Conference on Web Search and Data Mining, pages 193–202. ACM.

[17] Chen, Y. and Suh, C. (2015). Spectral MLE: Top-K rank aggregation from pairwise comparisons. In International Conference on Machine Learning, pages 371–380.

[18] Cheng, W., Hüllermeier, E., and Dembczynski, K. J. (2010). Label ranking methods based on the Plackett-Luce model. In International Conference on Machine Learning, pages 215–222.

[19] Cooley, O., Kang, M., and Koch, C. (2016). Threshold and hitting time for high-order connectedness in random hypergraphs. The Electronic Journal of Combinatorics, pages 2–48.

[20] Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. (2001). Rank aggregation methods for the web. In International Conference on World Wide Web, pages 613–622. ACM.

[21] Fishburn, P. (1982). Nontransitive measurable utility. Journal of Mathematical Psychology, 26(1), 31–67.

[22] Ford, L. R. (1957). Solution of a ranking problem from binary comparisons. American Mathematical Monthly, pages 28–33.

[23] Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal, 92(368), 805–824.

[24] Guiver, J. and Snelson, E. (2009). Bayesian inference for Plackett-Luce ranking models. In International Conference on Machine Learning, pages 377–384.

[25] Hajek, B., Oh, S., and Xu, J. (2014). Minimax-optimal inference from partial rankings. In Neural Information Processing Systems, pages 1475–1483.

[26] Han, T. and Verdú, S. (1994). Generalizing the Fano inequality. IEEE Transactions on Information Theory, 40, 1247–1251.

[27] Heckel, R., Shah, N., Ramchandran, K., and Wainwright, M. (2016). Active ranking from pairwise comparisons and when parametric assumptions don't help. arXiv preprint arXiv:1606.08842.

[28] Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models.
Annals of Statistics, pages 384–406.

[29] Jamieson, K. G. and Nowak, R. (2011). Active ranking using pairwise comparisons. In Neural Information Processing Systems, pages 2240–2248.

[30] Janson, S. (2004). Large deviations for sums of partly dependent random variables. Random Structures & Algorithms, pages 234–248.

[31] Jiang, X., Lim, L. H., Yao, Y., and Ye, Y. (2011). Statistical ranking and combinatorial Hodge theory. Mathematical Programming, 127, 203–244.

[32] Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. Wiley.

[33] Maystre, L. and Grossglauser, M. (2015). Fast and accurate inference of Plackett-Luce models. In Neural Information Processing Systems, pages 172–180.

[34] McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics, pages 105–142.

[35] McFadden, D. (1980). Econometric models for probabilistic choice among products. Journal of Business, 53(3), S13–S29.

[36] Mohajer, S., Suh, C., and Elmahdy, A. (2017). Active learning for top-K rank aggregation from noisy comparisons. In International Conference on Machine Learning, pages 2488–2497.

[37] Negahban, S., Oh, S., and Shah, D. (2016). Rank centrality: Ranking from pair-wise comparisons. Operations Research, 65, 266–287.

[38] Oh, S., Thekumparampil, K. K., and Xu, J. (2015). Collaboratively learning preferences from ordinal data. In Neural Information Processing Systems, pages 1909–1917.

[39] Plackett, R. L. (1975). The analysis of permutations. Applied Statistics, pages 193–202.

[40] Rajkumar, A. and Agarwal, S. (2014). A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In International Conference on Machine Learning, pages 118–126.

[41] Seeley, J. R. (1949). The net of reciprocal influence.
Canadian Journal of Psychology, 3(4), 234–240.

[42] Shah, N. B. and Wainwright, M. J. (2015). Simple, robust and optimal ranking from pairwise comparisons. arXiv preprint arXiv:1512.08949.

[43] Szörényi, B., Busa-Fekete, R., Paul, A., and Hüllermeier, E. (2015). Online rank elicitation for Plackett-Luce: A dueling bandits approach. In Neural Information Processing Systems, pages 604–612.

[44] Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.

[45] Vigna, S. (2016). Spectral ranking. Network Science, 4(4), 433–445.

[46] Walker, J. and Ben-Akiva, M. (2002). Generalized random utility model. Mathematical Social Sciences, 43(3), 303–343.

[47] Wei, T. H. (1952). The algebraic foundations of ranking theory. Ph.D. thesis, University of Cambridge.