{"title": "Learning Mixtures of Plackett-Luce Models from Structured Partial Orders", "book": "Advances in Neural Information Processing Systems", "page_first": 10143, "page_last": 10153, "abstract": "Mixtures of ranking models have been widely used for heterogeneous preferences. However, learning a mixture model is highly nontrivial, especially when the dataset consists of partial orders. In such cases, the parameter of the model may not even be identifiable. In this paper, we focus on three popular structures of partial orders: ranked top-$l_1$, $l_2$-way, and choice data over a subset of alternatives. We prove that when the dataset consists of combinations of ranked top-$l_1$ and $l_2$-way orders (or choice data over up to $l_2$ alternatives), a mixture of $k$ Plackett-Luce models is not identifiable when $l_1+l_2\\le 2k-1$ ($l_2$ is set to $1$ when there are no $l_2$-way orders). We also prove that under some combinations, including ranked top-$3$, ranked top-$2$ plus $2$-way, and choice data over up to $4$ alternatives, mixtures of two Plackett-Luce models are identifiable. Guided by our theoretical results, we propose efficient generalized method of moments (GMM) algorithms to learn mixtures of two Plackett-Luce models, which are proven consistent. Our experiments demonstrate the efficacy of our algorithms. 
Moreover, we show that when full rankings are available, learning from different marginal events (partial orders) provides tradeoffs between statistical efficiency and computational efficiency.", "full_text": "Learning Mixtures of Plackett-Luce Models from Structured Partial Orders

Zhibing Zhao
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180
zhaoz6@rpi.edu

Lirong Xia
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180
xial@cs.rpi.edu

Abstract

Mixtures of ranking models have been widely used for heterogeneous preferences. However, learning a mixture model is highly nontrivial, especially when the dataset consists of partial orders. In such cases, the parameter of the model may not even be identifiable. In this paper, we focus on three popular structures of partial orders: ranked top-l1, l2-way, and choice data over a subset of alternatives. We prove that when the dataset consists of combinations of ranked top-l1 and l2-way orders (or choice data over up to l2 alternatives), a mixture of k Plackett-Luce models is not identifiable when l1 + l2 ≤ 2k − 1 (l2 is set to 1 when there are no l2-way orders). We also prove that under some combinations, including ranked top-3, ranked top-2 plus 2-way, and choice data over up to 4 alternatives, mixtures of two Plackett-Luce models are identifiable. Guided by our theoretical results, we propose efficient generalized method of moments (GMM) algorithms to learn mixtures of two Plackett-Luce models, which are proven consistent. Our experiments demonstrate the efficacy of our algorithms. 
Moreover, we show that when full rankings are available, learning from different marginal events (partial orders) provides tradeoffs between statistical efficiency and computational efficiency.

1 Introduction

Suppose a group of four friends want to choose one of the four restaurants {a1, a2, a3, a4} for dinner. The first person ranks all four restaurants as a2 ≻ a3 ≻ a4 ≻ a1, where a2 ≻ a3 means that "a2 is strictly preferred to a3". The second person says "a4 and a3 are my top two choices, among which I prefer a4 to a3". The third person ranks a3 ≻ a4 ≻ a1 but has no idea about a2. The fourth person has no idea about a4, and would choose a3 among {a1, a2, a3}. How should they aggregate their preferences to choose the best restaurant?

Similar rank aggregation problems exist in social choice, crowdsourcing [20, 6], recommender systems [5, 3, 14, 24], information retrieval [1, 17], etc. Rank aggregation can be cast as the following statistical parameter estimation problem: given a statistical model for rank data and the agents' preferences, the parameter of the model is estimated to make decisions. Among the most widely-applied statistical models for rank aggregation are the Plackett-Luce model [19, 28] and its mixtures [8, 9, 17, 23, 30]. In a Plackett-Luce model over a set of alternatives A, each alternative is parameterized by a strictly positive number that represents its probability of being ranked higher than other alternatives. A mixture of k Plackett-Luce models, denoted by k-PL, combines k component Plackett-Luce models via the mixing coefficients α = (α1, . . . , αk) ∈ R^k_{≥0} with α · 1 = 1, such that for any r ≤ k, with probability αr, a data point is generated from the r-th Plackett-Luce component.

One critical limitation of the Plackett-Luce model and its mixtures is that their sample space consists of linear orders over A. In other words, each data point must be a full ranking of all alternatives in A.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

However, this is rarely the case in practice, because agents are often not able to rank all alternatives due to lack of information [27], as illustrated in the example at the beginning of the Introduction.

In general, each rank datum is a partial order, which can be seen as a collection of pairwise comparisons among alternatives that satisfies transitivity. However, handling partial orders is more challenging than it appears. In particular, the pairwise comparisons of the same agent cannot be seen as independently generated, due to transitivity.

Consequently, most previous works focused on structured partial orders, where agents' preferences share some common structures. For example, given l ∈ N, in ranked-top-l preferences [23, 10], agents submit a linear order over their top l choices; in l-way preferences [21, 11, 22], agents submit a linear order over a set of l alternatives, which are not necessarily their top l alternatives; in choice-l preferences (a.k.a. choice sets) [31], agents only specify their top choice among a set of l alternatives. In particular, pairwise comparisons can be seen as 2-way preferences or choice-2 preferences. However, as far as we know, most previous works assumed that the rank data share the same structure for their algorithms and theoretical guarantees to apply. 
It is unclear how rank aggregation can be done effectively and efficiently from structured partial orders of different kinds, as in the example at the beginning of the Introduction. This is the key question we address in this paper:

How can we effectively and efficiently learn Plackett-Luce models and their mixtures from structured partial orders of different kinds?

Successfully addressing this question faces two challenges. First, to address the effectiveness concern, we need a statistical model that combines various structured partial orders and admits provably desirable statistical properties, and we are unaware of an existing one. Second, to address the efficiency concern, we need to design new algorithms, as either previous algorithms cannot be directly applied, or it is unclear whether theoretical guarantees such as consistency are retained.

1.1 Our Contributions

Our contributions in addressing the key question are three-fold.

Modeling Contributions. We propose a class of statistical models for the co-existence of the following three types of structured partial orders mentioned in the Introduction: ranked-top-l, l-way, and choice-l, by leveraging mixtures of Plackett-Luce models. Our models can be easily generalized to include other types of structured partial orders.

Theoretical Contributions. Our main theoretical results characterize the identifiability of the proposed models. Identifiability is fundamental in parameter estimation: it states that different parameters of the model should give different distributions over data. Clearly, if a model is non-identifiable, then no parameter estimation algorithm can be consistent.

We prove that when only ranked top-l1 and l2-way orders are available (l2 is set to 1 if there are no l2-way orders), the mixture of k Plackett-Luce models is not identifiable if k ≥ (l1 + l2 + 1)/2 (Theorem 1). 
We also prove that the mixture of two Plackett-Luce models is identifiable under the following combinations of structures: ranked top-3 (Theorem 2 (a), extended from [33]), ranked top-2 plus 2-way (Theorem 2 (b)), choice-2, 3, 4 (Theorem 2 (c)), and 4-way (Theorem 2 (d)). For the case of mixtures of k Plackett-Luce models over m alternatives, we prove that if there exists m′ ≤ m such that the mixture of k Plackett-Luce models over m′ alternatives is identifiable, we can learn the parameter using ranked top-l1 and l2-way orders where l1 + l2 ≥ m′ (Theorem 3). This theorem, combined with Theorem 3 in [33], which provides a condition for mixtures of k Plackett-Luce models to be generically identifiable, can guide algorithm design for mixtures of an arbitrary number k of Plackett-Luce models.

Algorithmic Contributions. We propose efficient generalized-method-of-moments (GMM) algorithms for parameter estimation of the proposed model based on 2-PL. Our algorithms run much faster while providing better statistical efficiency than the EM algorithm proposed by Liu et al. [16] on datasets with large numbers of structured partial orders; see Section 6 for more details. Our algorithms are compared with the GMM algorithm by Zhao et al. [33] under two different settings. When full rankings are available, our algorithms outperform the GMM algorithm by Zhao et al. [33] in terms of MSE. When only structured partial orders are available, the GMM algorithm by Zhao et al. [33] is the best. We believe this difference is caused by the intrinsic information in the data.

1.2 Related Work and Discussions

Modeling. We are not aware of a previous model targeting rank data that consist of different types of structured partial orders. 
We believe that modeling the coexistence of different types of structured partial orders is highly important and practical, as it is more convenient, efficient, and accurate for an agent to report her preferences as a structured partial order of her choice. For example, some voting websites allow users to use different UIs to submit structured partial orders [4].

There are two major lines of research in rank aggregation from partial orders: learning from structured partial orders, and EM algorithms for general partial orders. Popular structured partial orders investigated in the literature are pairwise comparisons [13, 12], top-l [23, 10], l-way [21, 11, 22], and choice-l [31]. Khetan and Oh [15] focused on partial orders with "separators", which form a broader class of partial orders than ranked top-l. But still, [15] assumes the same structure for every agent. Our model is more general, as it allows the coexistence of different types of structured partial orders in the dataset. EM algorithms have been designed for learning mixtures of Mallows' models [18] and mixtures of random utility models, including the Plackett-Luce model [16], from general partial orders. Our model is less general, but since EM algorithms are often slow and it is unclear whether they are consistent, our model allows for theoretically and practically more efficient algorithms. We believe that our approach provides a principled balance between the flexibility of modeling and the efficiency of algorithms.

Theoretical results. Several previous works provided theoretical guarantees such as identifiability and sample complexity for mixtures of Plackett-Luce models and their extensions to structured partial orders. For linear orders, Zhao et al. [33] proved that the mixture of k Plackett-Luce models over m alternatives is not identifiable when m ≤ 2k − 1, and this bound is tight for k = 2. 
We extend their results to the case of structured partial orders of various types. Ammar et al. [2, Theorem 1] proved that when m = 2k, where k = 2^l is a nonnegative integer power of 2, there exist two different parameters of mixtures of k Plackett-Luce models that induce the same distribution over (2^l + 1)-way orders. Our Theorem 1 significantly extends this result in the following aspects: (i) our results include all possible values of k rather than powers of 2; (ii) we show that the model is not identifiable even under (2^{l+1} − 1)-way (in contrast to (2^l + 1)-way) orders; (iii) we allow for combinations of ranked top-l1 and l2-way structures. Oh and Shah [26] showed that mixtures of Plackett-Luce models are in general not identifiable given partial orders, but under some conditions on the data, the parameter can be learned using pairwise comparisons. We consider many more structures than pairwise comparisons.

Recently, Chierichetti et al. [7] proved that at least O(m^2) random marginal probabilities of partial orders are required to identify the parameter of a uniform mixture of two Plackett-Luce models. We show that a carefully chosen set of O(m) marginal probabilities can be sufficient to identify the parameter of nonuniform mixtures of Plackett-Luce models, which is a significant improvement. Further, our proposed algorithm can be easily modified to handle the case of uniform mixtures. Zhao et al. [35] characterized the conditions under which mixtures of random utility models are generically identifiable. We focus on strict identifiability, which is stronger.

Algorithms. Several learning algorithms for mixtures of Plackett-Luce models have been proposed, including a tensor-decomposition-based algorithm [26], a polynomial-system-solving algorithm [7], a GMM algorithm [33], and EM-based algorithms [8, 30, 23, 16]. In particular, Liu et al. [16] proposed an EM-based algorithm to learn from general partial orders. 
However, it is unclear whether their algorithm is consistent (as for most EM algorithms), and their algorithm is significantly slower than ours. Our algorithms for linear orders are similar to the one proposed by Zhao et al. [33], but we consider different sets of marginal probabilities, and our algorithms significantly outperform the one by Zhao et al. [33] w.r.t. MSE while taking similar running time.

2 Preliminaries

Let A = {a1, a2, . . . , am} denote a set of m alternatives and L(A) denote the set of all linear orders (full rankings) over A, which are antisymmetric, transitive, and total binary relations. A linear order R ∈ L(A) is denoted as a_{i1} ≻ a_{i2} ≻ . . . ≻ a_{im}, where a_{i1} is the most preferred alternative and a_{im} is the least preferred alternative. A partial order O is an antisymmetric and transitive binary relation. In this paper, we consider three types of strict partial orders: ranked-top-l (top-l for short), l-way, and choice-l, where l ≤ m. A top-l order is denoted by O^{top-l} = [a_{i1} ≻ . . . ≻ a_{il} ≻ others]; an l-way order is denoted by O^{l-way} = [a_{i1} ≻ . . . ≻ a_{il}], which means that the agent does not have preferences over unranked alternatives; and a choice-l order is denoted by O^{choice-l} = (A′, a), where A′ ⊆ A, |A′| = l, and a ∈ A′, which means that the agent chooses a from A′. We note that the three types of partial orders are not mutually exclusive. For example, a pairwise comparison is a 2-way order as well as a choice-2 order. Let P(A) denote the set of all partial orders of the three structures: ranked top-l, l-way, and choice-l (l ≤ m) over A. It is worth noting that L(A) ⊆ P(A). Let P = (O1, O2, . . . , On) ∈ P(A)^n denote the data, also called a preference profile. Let O^s_{A′} denote a partial order over a subset A′ whose structure is s. 
When s is top-l, A′ is set to be A. Let [d] denote the set {1, 2, . . . , d}.

Definition 1 (Plackett-Luce model). The parameter space is Θ = {θ = (θ1, . . . , θm) | for all 1 ≤ i ≤ m, 0 < θi < 1, and Σ_{i=1}^{m} θi = 1}. The sample space is L(A)^n. Given a parameter θ ∈ Θ, the probability of any linear order R = [a_{i1} ≻ a_{i2} ≻ . . . ≻ a_{im}] is

Pr_PL(R | θ) = ∏_{p=1}^{m−1} θ_{ip} / (Σ_{q=p}^{m} θ_{iq}).

Under the Plackett-Luce model, a partial order O can be viewed as a marginal event which consists of all linear orders that extend O; that is, for any extension R, a ≻_O b implies a ≻_R b. The probabilities of the aforementioned three types of partial orders are as follows [32].

• Top-l. For any top-l order O^{top-l} = [a_{i1} ≻ . . . ≻ a_{il} ≻ others], we have

Pr_PL(O^{top-l} | θ) = ∏_{p=1}^{l} θ_{ip} / (Σ_{q=p}^{m} θ_{iq}).

• l-way. For any l-way order O^{l-way}_{A′} = [a_{i1} ≻ . . . ≻ a_{il}], where A′ = {a_{i1}, . . . , a_{il}}, we have

Pr_PL(O^{l-way}_{A′} | θ) = ∏_{p=1}^{l−1} θ_{ip} / (Σ_{q=p}^{l} θ_{iq}).

• Choice-l. For any choice order O = (A′, ai), we have

Pr_PL(O | θ) = θi / (Σ_{aj ∈ A′} θj).

In this paper, we assume that data points are generated i.i.d. from the model.

Definition 2 (Mixtures of k Plackett-Luce models for linear orders (k-PL)). Given m ≥ 2 and k ∈ N+, the sample space of k-PL is L(A)^n. The parameter space is Θ = {θ = (α, θ^(1), . . . , θ^(k))}, where α = (α1, . . . , αk) is the vector of mixing coefficients. For all r ≤ k, αr ≥ 0 and Σ_{r=1}^{k} αr = 1. 
For all 1 ≤ r ≤ k, θ^(r) is the parameter of the r-th Plackett-Luce component. The probability of a linear order R is

Pr_{k-PL}(R | θ) = Σ_{r=1}^{k} αr Pr_PL(R | θ^(r)).

We now recall the definition of identifiability of statistical models.

Definition 3 (Identifiability). Let M = {Pr(· | θ) : θ ∈ Θ} be a statistical model, where Θ is the parameter space and Pr(· | θ) is the distribution over the sample space associated with θ ∈ Θ. M is identifiable if for all θ, γ ∈ Θ, we have

Pr(· | θ) = Pr(· | γ) ⟹ θ = γ.

A mixture model is generally not identifiable due to the label switching problem [29], which means that labeling the components differently leads to the same distribution over data. In this paper, we consider identifiability of mixture models modulo label switching. That is, in Definition 3, we further require that θ and γ cannot be obtained from each other by label switching.

Figure 1: The mixture model for structured partial preferences.

3 Mixtures of Plackett-Luce Models for Partial Orders

We propose the class of mixtures of Plackett-Luce models for the aforementioned structures of partial orders. To this end, each such model is described by the collection of allowable types of structured partial orders, denoted by Φ. More precisely, Φ is a set of u structures Φ = {(s1, A1), . . . , (su, Au)}, where for any t ∈ [u], (st, At) means structure st over At. For the case of top-l, At is set to be A. 
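As a concrete illustration of Definitions 1 and 2, the sequential probabilities can be sketched in a few lines of Python; this is a minimal sketch with our own function names and a 0-based encoding of a1, . . . , am, not code from the paper:

```python
def pl_prob(theta, order, pool):
    """Sequential Plackett-Luce probability of observing `order` among the
    candidates in `pool`: at each step the next alternative is drawn with
    probability proportional to its parameter among those still available.
    With pool = all alternatives and len(order) = l this is the top-l marginal;
    with pool = order it is an l-way marginal; a length-1 order drawn from a
    subset is a choice-l marginal."""
    p, remaining = 1.0, list(pool)
    for a in order:
        p *= theta[a] / sum(theta[b] for b in remaining)
        remaining.remove(a)
    return p

def k_pl_prob(alpha, thetas, order, pool):
    """Mixture probability under k-PL: alpha-weighted sum over components."""
    return sum(a * pl_prob(t, order, pool) for a, t in zip(alpha, thetas))
```

For example, with θ = (0.1, 0.2, 0.3, 0.4), the full ranking a2 ≻ a3 ≻ a4 ≻ a1 has probability (0.2/1.0) × (0.3/0.8) × (0.4/0.5) = 0.06.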
Since the three structures considered in this paper are not mutually exclusive, we require that Φ does not include any pair of overlapping structures simultaneously for the model to be identifiable. There are two types of pairs of overlapping structures: (1) (top-(m − 1), A) and (m-way, A); and (2) for any subset A′ of two alternatives, (2-way, A′) and (choice-2, A′). Each structure corresponds to a number φ_{st,At} > 0, and we require Σ_{t=1}^{u} φ_{st,At} = 1. A partial order is generated in two stages, as illustrated in Figure 1: (i) a linear order R is generated by k-PL given α, θ^(1), . . . , θ^(k); (ii) with probability φ_{st,At}, R is projected onto the randomly-generated partial order structure (st, At) to obtain a partial order O. Formally, the model is defined as follows.

Definition 4 (Mixtures of k Plackett-Luce models for partial orders by Φ (k-PL-Φ)). Given m ≥ 2, k ∈ N+, and the set of structures Φ = {(s1, A1), . . . , (su, Au)}, the sample space is the set of all structured partial orders defined by Φ. Given l1 ∈ [m − 1] and l2, l3 ∈ [m], the parameter space is Θ = {θ = (φ, α, θ^(1), . . . , θ^(k))}. The first part is a vector φ = (φ_{s1,A1}, . . . , φ_{su,Au}), whose entries are all positive and satisfy Σ_{t=1}^{u} φ_{st,At} = 1. The second part is α = (α1, . . . , αk), where for all r ≤ k, αr > 0 and Σ_{r=1}^{k} αr = 1. The remaining part is (θ^(1), . . . , θ^(k)), where θ^(r) is the parameter of the r-th Plackett-Luce component. The probability of any partial order O whose structure is defined by (s, A′) is

Pr_{k-PL-Φ}(O | θ) = φ_{s,A′} Σ_{r=1}^{k} αr Pr_PL(O^s_{A′} | θ^(r)).

For any partial order O whose structure is (s, A′), we can also write

Pr_{k-PL-Φ}(O | θ) = φ_{s,A′} Pr_{k-PL}(O | θ),   (1)

where Pr_{k-PL}(O | θ) is the marginal probability of O under k-PL. This is a class of models because the sample space is different when Φ is different.

Example 1. Let the set of alternatives be {a1, a2, a3, a4}. Consider the 2-PL-Φ model M where Φ = {(top-3, A), (top-2, A), (3-way, {a1, a3, a4}), (choice-3, {a1, a2, a3})}, with φ^{top-3}_A = 0.2, φ^{top-2}_A = 0.1, φ^{3-way}_{{a1,a3,a4}} = 0.3, φ^{choice-3}_{{a1,a2,a3}} = 0.4, α = (α1, α2) = (0.2, 0.8), θ^(1) = (0.1, 0.2, 0.3, 0.4), and θ^(2) = (0.2, 0.2, 0.3, 0.3). Now we compute the probabilities of the following partial orders given the model: O1 = a2 ≻ a3 ≻ a4 ≻ a1 (top-3), O2 = a4 ≻ a3 ≻ {a1, a2} (top-2), O3 = a3 ≻ a4 ≻ a1 (3-way), and O4 = ({a1, a2, a3}, a3) (choice-3 over {a1, a2, a3}). 
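The four probabilities asked for can also be computed mechanically; a short verification script (a sketch of our own, with 0-based indices so that a1 is index 0, and pl_prob the sequential Plackett-Luce product):

```python
def pl_prob(theta, order, pool):
    """Sequential PL probability of observing `order` among candidates `pool`;
    with pool = all alternatives this gives top-l marginals, with pool = the
    subset it gives l-way marginals, and a length-1 order gives choice-l."""
    p, remaining = 1.0, list(pool)
    for a in order:
        p *= theta[a] / sum(theta[b] for b in remaining)
        remaining.remove(a)
    return p

alpha = [0.2, 0.8]
thetas = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.2, 0.3, 0.3]]

def mix(order, pool):
    """alpha-weighted mixture of the two PL components."""
    return sum(a * pl_prob(t, order, pool) for a, t in zip(alpha, thetas))

p1 = 0.2 * mix([1, 2, 3], [0, 1, 2, 3])   # O1: top-3  a2 > a3 > a4 (> a1)
p2 = 0.1 * mix([3, 2], [0, 1, 2, 3])      # O2: top-2  a4 > a3
p3 = 0.3 * mix([2, 3, 0], [0, 2, 3])      # O3: 3-way  a3 > a4 > a1
p4 = 0.4 * mix([2], [0, 1, 2])            # O4: choice-3 of a3 from {a1,a2,a3}
```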
We first compute Pr_PL(Oj | θ^(r)) for all combinations of j and r, shown in Table 1.

     | r = 1                                                          | r = 2
O1   | 0.2/(0.1+0.2+0.3+0.4) × 0.3/(0.1+0.3+0.4) × 0.4/(0.1+0.4) = 0.06 | 0.2/(0.2+0.2+0.3+0.3) × 0.3/(0.2+0.3+0.3) × 0.3/(0.2+0.3) = 0.045
O2   | 0.4/(0.1+0.2+0.3+0.4) × 0.3/(0.1+0.2+0.3) = 0.2                  | 0.3/(0.2+0.2+0.3+0.3) × 0.3/(0.2+0.2+0.3) ≈ 0.13
O3   | 0.3/(0.1+0.3+0.4) × 0.4/(0.1+0.4) = 0.3                          | 0.3/(0.2+0.3+0.3) × 0.3/(0.2+0.3) = 0.225
O4   | 0.3/(0.1+0.2+0.3) = 0.5                                          | 0.3/(0.2+0.2+0.3) ≈ 0.43

Table 1: Pr_PL(Oj | θ^(r)) for all j = 1, 2, 3, 4 and r = 1, 2.

Let Pr_M(Oj) denote the probability of Oj under model M. We have

Pr_M(O1) = φ^{top-3}_A Σ_{r=1}^{2} αr Pr(O1 | θ^(r)) = 0.2 × (0.2 × 0.06 + 0.8 × 0.045) = 0.0096,
Pr_M(O2) = φ^{top-2}_A Σ_{r=1}^{2} αr Pr(O2 | θ^(r)) = 0.1 × (0.2 × 0.2 + 0.8 × 0.13) ≈ 0.014,
Pr_M(O3) = φ^{3-way}_{{a1,a3,a4}} Σ_{r=1}^{2} αr Pr(O3 | θ^(r)) = 0.3 × (0.2 × 0.3 + 0.8 × 0.225) = 0.072,
Pr_M(O4) = φ^{choice-3}_{{a1,a2,a3}} Σ_{r=1}^{2} αr Pr(O4 | θ^(r)) = 0.4 × (0.2 × 0.5 + 0.8 × 0.43) ≈ 0.18.

4 (Non-)identifiability of k-PL-Φ

Let Φ^{l-way} = {(l-way, Al) | Al ⊆ A, |Al| = l} and Φ^{choice-l} = {(choice-l, Al) | Al ⊆ A, |Al| = l}. The following theorem shows that under some conditions on Φ, k, and m, k-PL-Φ is not identifiable.

Theorem 1. Let A be a set of m alternatives, and let 0 ≤ l1 ≤ m − 1 and 1 ≤ l2 ≤ m. Let Φ* = {(top-1, A), . . . , (top-l1, A)} ∪ Φ^{1-way} ∪ . . . ∪ Φ^{l2-way}. Given any Φ ⊂ Φ*, and for any k ≥ (l1 + l2 + 1)/2, k-PL-Φ is not identifiable.

We prove that the theorem holds when Φ = Φ*. See the full proof in the appendix. 
Considering that any l-way order implies a choice-l order, we have the following corollary.

Corollary 1. Let A be a set of m alternatives, and let 0 ≤ l1 ≤ m − 1 and 1 ≤ l3 ≤ m. Let Φ* = {(top-1, A), . . . , (top-l1, A)} ∪ Φ^{choice-1} ∪ . . . ∪ Φ^{choice-l3}. Given any Φ ⊂ Φ*, and for any k ≥ (l1 + l3 + 1)/2, k-PL-Φ is not identifiable.

Given any k, these results show what structures of data we cannot use if we want to interpret the learned parameter. Next, we characterize conditions for 2-PL-Φ's to be identifiable.

Theorem 2. Let Φ* be one of the four combinations of structures below. For any Φ ⊃ Φ*, 2-PL-Φ over m ≥ 4 alternatives is identifiable:
(a) Φ* = {(top-3, A)}, (b) Φ* = {(top-2, A)} ∪ Φ^{2-way}, (c) Φ* = ∪_{t=2}^{4} Φ^{choice-t}, or (d) Φ* = Φ^{4-way}.

We first show that for any φ1 ≠ φ2, the distributions over the sample space must be different. Then, given φ, we prove that for any (α, θ^(1), . . . , θ^(k)), there does not exist another parameter leading to the same distribution over the sample space. See the full proof in the appendix.

Identifiability for k ≥ 3 is still an open question; Zhao et al. [33] proved that when k ≤ ⌊(m−2)/2⌋!, generic identifiability holds for k-PL, which means that the Lebesgue measure of the set of non-identifiable parameters is zero. We have the following theorem, which can guide algorithm design for k-PL-Φ. The full proof of Theorem 3 can be found in the appendix.

Theorem 3. 
Let l1 ∈ [m − 1], l2 ∈ [m], and Φ* = {(top-l1, A)} ∪ {(l2-way, A′) | A′ ⊆ A, |A′| = l2}. Given any Φ ⊃ Φ*, if k-PL over m′ alternatives is (generically) identifiable, then k-PL-Φ over m ≥ m′ alternatives is (generically) identifiable when l1 + l2 ≥ m′.

5 Consistent Algorithms for Learning 2-PL-Φ

We propose a two-stage estimation algorithm. In the first stage, we make one pass over the dataset to determine Φ and estimate φ. In the second stage, we estimate the parameter θ. We note that the two stages together require only one pass over the data.

In the first stage, we check the existence of each structure in the dataset and estimate φ^{top-l}_A, φ^{l-way}_{A′}, and φ^{choice-l}_{A″} for any l, A′, and A″ by dividing the number of occurrences of each structure by the size of the dataset. Formally, for any structure (s, As),

φ_{s,As} = (# of orders with structure (s, As)) / n.   (2)

In the second stage, we estimate θ using the generalized-method-of-moments (GMM) approach. In a GMM algorithm, a set of q marginal events (partial orders in the case of rank data), denoted by E = {E1, . . . , Eq}, is selected. Then q moment conditions g(O, θ) ∈ R^q, which are functions of a data point O and the parameter θ, are designed. The expectation of any moment condition is zero at the ground truth parameter θ*, i.e., E[g(O, θ*)] = 0. For a dataset P with n rankings, we let g(P, θ) = (1/n) Σ_{O∈P} g(O, θ); the estimate is then θ̂ = arg min_θ ||g(P, θ)||²₂.

Now we define the moment conditions g(O, θ). For any t ≤ q, the t-th moment condition gt(O, θ) corresponds to the event Et. Let (st, At) denote the structure of Et. 
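Both stages are straightforward to sketch: Equation (2) is one counting pass, and the second stage minimizes a sum of squared residuals, one per preselected event. A minimal sketch, with our own dict-based bookkeeping and illustrative labels:

```python
from collections import Counter

def estimate_phi(structure_labels):
    """First stage, Equation (2): the fraction of orders carrying each
    structure (s, A_s). `structure_labels` lists one hashable label per
    data point in the profile."""
    counts = Counter(structure_labels)
    n = len(structure_labels)
    return {s: c / n for s, c in counts.items()}

def gmm_objective(model_prob, counts, phi, n):
    """Second-stage objective: for each preselected event E_t, the residual
    between the model probability of E_t and its empirical frequency, scaled
    by 1/phi of E_t's structure; returns the sum of squared residuals.
    `model_prob` and `phi` map each event to Pr(E_t | theta) and to the phi
    value of its structure, respectively."""
    total = 0.0
    for e, p in model_prob.items():
        r = (p - counts.get(e, 0) / n) / phi[e]
        total += r * r
    return total
```

At the minimizer, every rescaled model probability matches its rescaled empirical frequency, so the objective approaches zero as the empirical frequencies converge.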
If O = Et, we define gt(O, θ) = (1/φ_{st,At}) (Pr_{k-PL-Φ}(Et | θ) − 1); otherwise gt(O, θ) = (1/φ_{st,At}) Pr_{k-PL-Φ}(Et | θ). Under this definition, we have

θ′ = arg min_θ Σ_{t=1}^{q} ( Pr_{k-PL-Φ}(Et | θ)/φ_{st,At} − (# of Et)/(n φ_{st,At}) )².   (3)

We consider two ways of selecting E for 2-PL-Φ, guided by our Theorem 2 (b) and (c), respectively.

Ranked top-2 and 2-way (Φ = {(top-2, A)} ∪ {(2-way, A′) | A′ ⊆ A, |A′| = 2}). The selected partial orders are: ranked top-2 orders for each pair (m(m − 1) − 1 moment conditions) and 2-way orders (m(m − 1)/2 moment conditions). We remove one of the ranked top-2 orders because its moment condition is linearly dependent on the other ranked top-2 moment conditions. For the same reason, we only choose one direction for each 2-way comparison, resulting in m(m − 1)/2 moment conditions. For example, in the case of A = {a1, a2, a3, a4}, we can choose E = {a1 ≻ a2 ≻ others, a1 ≻ a3 ≻ others, a1 ≻ a4 ≻ others, a2 ≻ a1 ≻ others, a2 ≻ a3 ≻ others, a2 ≻ a4 ≻ others, a3 ≻ a1 ≻ others, a3 ≻ a2 ≻ others, a3 ≻ a4 ≻ others, a4 ≻ a1 ≻ others, a4 ≻ a2 ≻ others, a1 ≻ a2, a1 ≻ a3, a1 ≻ a4, a2 ≻ a3, a2 ≻ a4, a3 ≻ a4}.

Choice-4. We first group A into subsets of four alternatives such that a1 is included in all subsets; a small number of groups is desirable for computational reasons. One possible grouping is G1 = {a1, a2, a3, a4}, G2 = {a1, a5, a6, a7}, etc. The last group can be {a1, a_{m−2}, a_{m−1}, a_m}. 
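This grouping step can be sketched as follows (0-based, with alternative 0 playing the role of a1; the rule for padding a short last group is one valid choice, not the paper's prescription):

```python
def choice4_groups(m):
    """Split {0, ..., m-1} (m >= 4) into ceil((m-1)/3) groups of four that all
    contain alternative 0. A short last chunk reuses the last three non-zero
    alternatives, since overlap across groups is allowed."""
    rest = list(range(1, m))
    groups = []
    for i in range(0, len(rest), 3):
        chunk = rest[i:i + 3]
        if len(chunk) < 3:
            chunk = rest[-3:]
        groups.append([0] + chunk)
    return groups
```

For m = 10 this yields {a1,a2,a3,a4}, {a1,a5,a6,a7}, {a1,a8,a9,a10}, matching the scheme described above.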
Having more than one overlapping alternative across groups is fine. In this way we have ⌈(m−1)/3⌉ groups. We define Φ_G and E_G for any group G = {a_{i1}, a_{i2}, a_{i3}, a_{i4}}. Then Φ = ∪_{t=1}^{⌈(m−1)/3⌉} Φ_{Gt} and E = ∪_{t=1}^{⌈(m−1)/3⌉} E_{Gt}. For any G = {a_{i1}, a_{i2}, a_{i3}, a_{i4}}, Φ_G = {(choice-4, G), (choice-3, G′), (choice-2, G″) | G′, G″ ⊆ G, |G′| = 3, |G″| = 2}. E_G includes 17 choice-2, 3, 4 orders: E_G = {(G, a_{i1}), (G, a_{i2}), (G, a_{i3}), ({a_{i1}, a_{i2}, a_{i3}}, a_{i1}), ({a_{i1}, a_{i2}, a_{i3}}, a_{i2}), ({a_{i1}, a_{i2}, a_{i4}}, a_{i1}), ({a_{i1}, a_{i2}, a_{i4}}, a_{i2}), ({a_{i1}, a_{i3}, a_{i4}}, a_{i1}), ({a_{i1}, a_{i3}, a_{i4}}, a_{i3}), ({a_{i2}, a_{i3}, a_{i4}}, a_{i2}), ({a_{i2}, a_{i3}, a_{i4}}, a_{i3}), ({a_{i1}, a_{i2}}, a_{i1}), ({a_{i1}, a_{i3}}, a_{i1}), ({a_{i1}, a_{i4}}, a_{i1}), ({a_{i2}, a_{i3}}, a_{i2}), ({a_{i2}, a_{i4}}, a_{i2}), ({a_{i3}, a_{i4}}, a_{i3})}.

Formally, our algorithms are collectively represented as Algorithm 1. We note that only one pass over the data is required for estimating φ and computing the frequency of each partial order. The following theorem shows that Algorithm 1 is consistent when E is chosen for "ranked top-2 and 2-way" or "choice-4".

Algorithm 1 Algorithms for 2-PL-Φ.
Input: Preference profile P with n partial orders; a set of preselected partial orders E.
Output: Estimated parameter θ′.
Estimate φ using (2).
For each E ∈ E, compute the frequency of E.
Compute the output using (3).

Theorem 4. Given m ≥ 4. If there exists ε > 0 such that for all r = 1, 2 and i = 1, . . . , m, θ^(r)_i ∈ [ε, 1], and E is selected following either of "ranked top-2 and 2-way" and "choice-4", then Algorithm 1 is consistent.

Proof. We first prove that the estimate of φ is consistent. 
Let X_t denote a random variable with X_t = 1 if a structure (s_t, A_t) is observed and X_t = 0 otherwise. The dataset of n partial orders is considered as n trials; let the j-th observation of X_t be x_j. Then E[(1/n) Σ_{j=1}^n x_j] = φ_{s_tA_t}, which means that as n → ∞, (1/n) Σ_{j=1}^n x_j converges to φ_{s_tA_t} with probability approaching one.

Now we prove that the estimation of α⃗, θ⃗^(1), θ⃗^(2) is also consistent. We write the moment conditions g⃗(P, θ⃗) as g⃗_n(θ⃗) and define

\[
\vec{g}_0(\vec{\theta}) = \mathbb{E}[\vec{g}_n(\vec{\theta})].
\]

Let θ⃗* denote the ground-truth parameter. By definition, we have

\[
\vec{g}_0(\vec{\theta}^*) = \mathbb{E}\left[\frac{\Pr_{k\text{-PL-}\Phi}(E_t \mid \vec{\theta}^*)}{\phi_{s_t A_t}} - \frac{\#\text{ of } E_t}{n\,\phi_{s_t A_t}}\right] = \frac{1}{\phi_{s_t A_t}}\left(\Pr\nolimits_{k\text{-PL-}\Phi}(E_t \mid \vec{\theta}^*) - \mathbb{E}\left[\frac{\#\text{ of } E_t}{n}\right]\right) = \vec{0}.
\]

Let Q_n(θ⃗) = ||g(P, θ⃗)||²₂, which is minimized at θ⃗' (the estimate), and define Q_0(θ⃗) = E[Q_n(θ⃗)], which is minimized at θ⃗*. We first prove the following lemma:

Lemma 1. sup_{θ⃗∈Θ} |Q_n(θ⃗) − Q_0(θ⃗)| →p 0.

Proof. Recall that any moment condition g(O_j, θ⃗) (corresponding to partial order E_t, where 1 ≤ t ≤ q) has the form Pr_{k-PL-Φ}(E_t | θ⃗) − X_{t,j}, where X_{t,j} = 1 if E_t is observed from O_j and X_{t,j} = 0 otherwise.
From g⃗_n(θ⃗) = g⃗(P, θ⃗) = (1/n) Σ_{j=1}^n g⃗(O_j, θ⃗), for any moment condition we have

\[
|g_n(\vec{\theta}) - g_0(\vec{\theta})| = \left|\frac{1}{n}\sum_{j=1}^{n} X_{t,j} - \mathbb{E}[X_t]\right| \xrightarrow{p} 0.
\]

Therefore, we obtain sup_{θ⃗∈Θ} ||g⃗_n(θ⃗) − g⃗_0(θ⃗)|| →p 0. Then we have (omitting the independent variable θ⃗)

\[
|Q_n - Q_0| = |\vec{g}_n^{\top}\vec{g}_n - \vec{g}_0^{\top}\vec{g}_0| \le |(\vec{g}_n - \vec{g}_0)^{\top}(\vec{g}_n - \vec{g}_0)| + 2\,|\vec{g}_0^{\top}(\vec{g}_n - \vec{g}_0)|.
\]

Since all moment conditions fall in [−1, 1] for any θ⃗ ∈ Θ, we have sup_{θ⃗∈Θ} |Q_n(θ⃗) − Q_0(θ⃗)| →p 0.

Now we are ready to prove consistency. By our Theorem 2, the model is identifiable, which means g_0(θ⃗) is uniquely minimized at θ⃗*. Since Q_0(θ⃗) is continuous and Θ is compact (θ_i^(r) ∈ [ε, 1] for all r = 1, 2 and i = 1, ..., m), by Lemma 1 and Theorem 2.1 of Newey and McFadden [25], we have θ⃗' →p θ⃗*.

6 Experiments

Setup. We conducted experiments on synthetic data to demonstrate the effectiveness of our algorithms. The data are generated as follows: (i) generate α, θ⃗^(1), and θ⃗^(2) uniformly at random and normalize so that Σ_{i=1}^m θ_i^(r) = 1 for r = 1, 2; (ii) generate linear orders using k-PL-linear; (iii) choose φ_{top-l, A}, φ_{l-way, A'}, and φ_{choice-l, A'}, and sample partial orders from the generated linear orders.
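Steps (i) and (ii) of this data-generation procedure can be sketched as follows (a minimal illustration, not the authors' MATLAB code; function names are ours):

```python
import random

def sample_pl_linear(theta):
    """Sample one linear order from a single Plackett-Luce model:
    repeatedly pick the next alternative with probability proportional
    to its weight among the remaining alternatives."""
    remaining = list(range(len(theta)))
    order = []
    while remaining:
        pick = random.choices(remaining,
                              weights=[theta[i] for i in remaining])[0]
        order.append(pick)
        remaining.remove(pick)
    return order

def sample_2pl(alpha, theta1, theta2, n):
    """Sample n linear orders from a 2-component mixture: with
    probability alpha use component 1, otherwise component 2."""
    return [sample_pl_linear(theta1 if random.random() < alpha else theta2)
            for _ in range(n)]

# Step (i): random normalized parameters for m = 10 alternatives.
m = 10
theta1 = [random.random() for _ in range(m)]
theta2 = [random.random() for _ in range(m)]
s1, s2 = sum(theta1), sum(theta2)
theta1 = [t / s1 for t in theta1]
theta2 = [t / s2 for t in theta2]

# Step (ii): draw linear orders from the mixture.
orders = sample_2pl(0.5, theta1, theta2, 1000)
```

Step (iii) would then replace each sampled linear order by its marginal partial order (ranked top-2, 2-way, or choice data) according to the chosen φ⃗.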
The partial orders are generated from the following two models:
• Ranked top-2 and 2-way: φ_{top-2, A} = 1/2 and φ_{2-way, A'} = 1/(m(m−1)) for all A' ⊂ A with |A'| = 2.
• Choice-2, 3, 4: first group the alternatives as described in the previous section, and let C = ⌈(m−1)/3⌉ be the number of groups. We first sample a group uniformly at random; let A^(4) be the sampled group (of four alternatives). Then φ_{choice-4, A^(4)} = 4/(28C); for each subset A^(3) ⊂ A^(4) of three alternatives (four such subsets within A^(4)), φ_{choice-3, A^(3)} = 3/(28C); and for each subset A^(2) ⊂ A^(4) of two alternatives (six such subsets within A^(4)), φ_{choice-2, A^(2)} = 2/(28C), so that the probabilities of the 11 choice sets within each group sum to 1/C.

Besides, we tested our algorithms on linear orders. In this case, all partial orders are marginal events of linear orders and there is no φ⃗ estimation; our algorithms reduce to the standard generalized-method-of-moments algorithms.

The baseline algorithms are the GMM algorithm by [33] and the ELSR-Gibbs algorithm by [16]. The GMM algorithm by [33] is designed for linear orders but utilizes only ranked top-3 orders, so it can be viewed as both a linear-order algorithm and a partial-order algorithm. We apply the ELSR-Gibbs algorithm [16] to the "choice-2, 3, 4" datasets because the algorithm is expected to run faster there than on the "ranked top-2 and 2-way" datasets.

All algorithms were implemented in MATLAB¹ and run on an Ubuntu Linux server with Intel Xeon E5 v3 CPUs clocked at 3.50 GHz. We use Mean Squared Error (MSE), defined as E[||θ⃗' − θ⃗*||²₂], and runtime to compare the performance of the algorithms.
For fair comparisons with previous works, we ignore the φ⃗ parameter when computing MSE.

Figure 2: MSE and runtime with 95% confidence intervals for 2-PL over 10 alternatives as n varies. "Choice" denotes the setting of "choice-2, 3, 4". For ELSR-Gibbs [16], we used the partial orders generated by "choice-2, 3, 4"; one linear extension was generated from each partial order and three EM iterations were run. All values were averaged over 2000 trials.

Results and Discussions. The algorithms are compared as the number of rankings varies (Figure 2). We have the following observations.

• When learning from partial orders only: ELSR-Gibbs [16] is much slower than the other algorithms on large datasets. The MSEs of all other algorithms converge towards zero as n increases. "Top-2 and 2-way, partial" and "choice, partial" converge more slowly than "top-3". Ranked top-l orders are generally more informative for parameter estimation than other partial orders. However, as reported in [34], it is much more time-consuming for humans to pick their top-ranked alternative(s) from a large set of alternatives than to fully rank a small set, which means ranked top-l data are harder or more costly to collect.
• When learning from linear orders: our "ranked top-2 and 2-way, linear" and "choice-2, 3, 4, linear" outperform "top-3 [33]" in terms of MSE (Figure 2, left) while being only slightly slower than "top-3 [33]" (Figure 2, right).

7 Conclusions and Future Work

We extend mixtures of Plackett-Luce models to a class of models that sample structured partial orders and theoretically characterize the (non-)identifiability of this class of models. We propose consistent and efficient algorithms to learn mixtures of two Plackett-Luce models from linear orders or structured partial orders.
For future work, we will explore more statistically and computationally efficient algorithms for mixtures of an arbitrary number of Plackett-Luce models, or for the more general random utility models.

¹Code available at https://github.com/zhaozb08/MixPL-SPO

Acknowledgments

We thank all anonymous reviewers for helpful comments and suggestions. This work is supported by NSF #1453542 and ONR #N00014-17-1-2621.

References

[1] Alon Altman and Moshe Tennenholtz. Ranking systems: The PageRank axioms. In Proceedings of the ACM Conference on Electronic Commerce (EC), Vancouver, BC, Canada, 2005.

[2] Ammar Ammar, Sewoong Oh, Devavrat Shah, and L. Voloch. What's your choice? Learning the mixed multinomial logit model. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, 2014.

[3] Linas Baltrunas, Tadas Makcinskas, and Francesco Ricci. Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 119–126. ACM, 2010.

[4] Felix Brandt, Guillaume Chabin, and Christian Geist. Pnyx: A powerful and user-friendly tool for preference aggregation. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 1915–1916, 2015.

[5] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009.

[6] Xi Chen, Paul N. Bennett, Kevyn Collins-Thompson, and Eric Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 193–202. ACM, 2013.

[7] Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins. Learning a mixture of two multinomial logits.
In Proceedings of the 35th International Conference on Machine Learning (ICML-18), 2018.

[8] Isobel Claire Gormley and Thomas Brendan Murphy. Exploring voting blocs within the Irish electorate: A mixture modeling approach. Journal of the American Statistical Association, 103(483):1014–1027, 2008.

[9] Isobel Claire Gormley and Thomas Brendan Murphy. A grade of membership model for rank data. Bayesian Analysis, 4(2):265–296, 2009.

[10] Jonathan Huang, Ashish Kapoor, and Carlos Guestrin. Efficient probabilistic inference with partial ranking queries. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 355–362. AUAI Press, 2011.

[11] David R. Hunter. MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32:384–406, 2004.

[12] Kevin G. Jamieson and Robert Nowak. Active ranking using pairwise comparisons. In Advances in Neural Information Processing Systems, pages 2240–2248, 2011.

[13] Minje Jang, Sunghyun Kim, Changho Suh, and Sewoong Oh. Top-k ranking from pairwise comparisons: When spectral ranking is optimal. arXiv preprint arXiv:1603.04153, 2016.

[14] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11(Jul):2057–2078, 2010.

[15] Ashish Khetan and Sewoong Oh. Data-driven rank breaking for efficient rank aggregation. Journal of Machine Learning Research, 17(193):1–54, 2016.

[16] Ao Liu, Zhibing Zhao, Chao Liao, Pinyan Lu, and Lirong Xia. Learning Plackett-Luce mixtures from partial preferences. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019.

[17] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.

[18] Tyler Lu and Craig Boutilier.
Effective sampling and learning for Mallows models with pairwise-preference data. The Journal of Machine Learning Research, 15(1):3783–3829, 2014.

[19] Robert Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.

[20] Andrew Mao, Ariel D. Procaccia, and Yiling Chen. Better human computation through principled voting. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Bellevue, WA, USA, 2013.

[21] John I. Marden. Analyzing and Modeling Rank Data. Chapman & Hall, 1995.

[22] Lucas Maystre and Matthias Grossglauser. Fast and accurate inference of Plackett-Luce models. In Advances in Neural Information Processing Systems, pages 172–180, 2015.

[23] Cristina Mollica and Luca Tardella. Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82(2):442–458, 2017.

[24] Sahand Negahban and Martin J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13(May):1665–1697, 2012.

[25] Whitney K. Newey and Daniel McFadden. Large sample estimation and hypothesis testing. Handbook of Econometrics, 4:2111–2245, 1994.

[26] Sewoong Oh and Devavrat Shah. Learning mixed multinomial logit model from ordinal data. In Advances in Neural Information Processing Systems, pages 595–603, 2014.

[27] Maria Silvia Pini, Francesca Rossi, Kristen Brent Venable, and Toby Walsh. Incompleteness and incomparability in preference aggregation: Complexity results. Artificial Intelligence, 175(7–8):1272–1289, 2011.

[28] Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193–202, 1975.

[29] Richard A. Redner and Homer F. Walker.
Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239, 1984.

[30] Maksim Tkachenko and Hady W. Lauw. Plackett-Luce regression mixture model for heterogeneous rankings. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 237–246. ACM, 2016.

[31] Kenneth E. Train. Discrete Choice Methods with Simulation. Cambridge University Press, 2nd edition, 2009.

[32] Lirong Xia. Learning and Decision-Making from Rank Data. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2019.

[33] Zhibing Zhao, Peter Piech, and Lirong Xia. Learning mixtures of Plackett-Luce models. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16), 2016.

[34] Zhibing Zhao, Haoming Li, Junming Wang, Jeffrey Kephart, Nicholas Mattei, Hui Su, and Lirong Xia. A cost-effective framework for preference elicitation and aggregation. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI-2018), 2018.

[35] Zhibing Zhao, Tristan Villamil, and Lirong Xia. Learning mixtures of random utility models. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.