{"title": "Random Utility Theory for Social Choice", "book": "Advances in Neural Information Processing Systems", "page_first": 126, "page_last": 134, "abstract": null, "full_text": "Random Utility Theory for Social Choice\n\nHossein Azari Souani\nSEAS, Harvard University\nazari@fas.harvard.edu\n\nDavid C. Parkes\nSEAS, Harvard University\nparkes@eecs.harvard.edu\n\nLirong Xia\nSEAS, Harvard University\nlxia@seas.harvard.edu\n\nAbstract\nRandom utility theory models an agents preferences on alternatives by drawing\na real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A\nspecial case that has received signicant attention is the Plackett-Luce model, for\nwhich fast inference methods for maximum likelihood estimators are available.\nThis paper develops conditions on general random utility models that enable fast\ninference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. 
Results on both real-world and simulated data provide support for the scalability of the approach and its capability for model selection among general random utility models, including Plackett-Luce.

1 Introduction

Problems of learning with rank-based error metrics [16] and the adoption of learning for the purpose of rank aggregation in social choice [7, 8, 23, 25, 29, 30] have gained prominence in recent years. In part, this is due to the explosion of socio-economic platforms where opinions of users need to be aggregated; e.g., judges in crowd-sourcing contests, or rankings of movies or user-generated content. In the problem of social choice, users submit ordinal preferences consisting of partial or total ranks on the alternatives, and a single rank order must be selected to be representative of the reports.

Since Condorcet [6], one approach to this problem has been to formulate social choice as the problem of estimating a true underlying world state (e.g., a true quality ranking of the alternatives), where the individual reports are viewed as noisy data in regard to the true state. In this way, social choice can be framed as a problem of inference. In particular, Condorcet assumed the existence of a true ranking over alternatives, with a voter's preference between any pair of alternatives a, b generated to agree with the true ranking with probability p > 1/2 and disagree otherwise. Condorcet proposed to choose as the outcome of social choice the ranking that maximizes the likelihood of observing the voters' preferences. Later, Kemeny's rule was shown to provide the maximum likelihood estimator (MLE) for this model [32].

But Condorcet's probabilistic model assumes identical and independent distributions on pairwise comparisons. This ignores the strength of agents' preferences (the same probability p is adopted for all pairwise comparisons), and allows for cyclic preferences. In addition, computing the winner through the Kemeny rule is Θ^p_2-complete [13]. 
To overcome the first criticism, a more recent literature adopts the random utility model (RUM) from economics [26]. Consider alternatives C = {c_1, ..., c_m}. In a RUM, there is a ground truth utility (or score) associated with each alternative. These are real-valued parameters, denoted by θ = (θ_1, ..., θ_m). Given this, an agent independently samples a random utility X_j for each alternative c_j with conditional distribution μ_j(·|θ_j). Usually θ_j is the mean of μ_j(·|θ_j).¹ Let π denote a permutation of {1, ..., m}, which naturally corresponds to a linear order: [c_{π(1)} ≻ c_{π(2)} ≻ ... ≻ c_{π(m)}]. Slightly abusing notation, we also use π to denote this linear order. The random utilities (X_1, ..., X_m) generate a distribution on preference orders, as

Pr(π | θ) = Pr(X_{π(1)} > X_{π(2)} > ... > X_{π(m)})    (1)

The generative process is illustrated in Figure 1.

Figure 1: The generative process for RUMs.

Adopting RUMs rules out cyclic preferences, because each agent's outcome corresponds to an order on real numbers; it also captures the strength of preference, and thus overcomes the second criticism, by assigning a different parameter θ_j to each alternative.

A popular RUM is Plackett-Luce (P-L) [18, 21], where the random utility terms are generated according to Gumbel distributions with fixed shape parameter [2, 31]. For P-L, the likelihood function has a simple analytical solution, making MLE inference tractable. P-L has been extensively applied in econometrics [1, 19], and more recently in machine learning and information retrieval (see [16] for an overview). Efficient methods of EM inference [5, 14], and more recently expectation propagation [12], have been developed for P-L and its variants. In application to social choice, the P-L model has been used to analyze political elections [10]. 

¹ μ_j(·|θ_j) might be parameterized by other parameters, for example a variance.
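As an aside, the generative process of Figure 1 is easy to simulate; the sketch below (our own illustration, not code from the paper) draws utilities from a normal RUM and checks that the most frequent sampled ranking agrees with the order of the means. All names here are our own.

```python
import random
from collections import Counter

def sample_ranking(theta, rng, sigma=1.0):
    """One draw from a normal RUM: X_j ~ N(theta_j, sigma^2),
    then rank alternatives by decreasing sampled utility."""
    x = [rng.gauss(t, sigma) for t in theta]
    return tuple(sorted(range(len(theta)), key=lambda j: -x[j]))

rng = random.Random(0)
theta = [3.0, 2.0, 1.0]  # hypothetical ground-truth means
counts = Counter(sample_ranking(theta, rng) for _ in range(2000))
modal = counts.most_common(1)[0][0]  # most frequent sampled ranking
```

With well-separated means, the modal ranking recovers the order of the means, here (0, 1, 2), even though every one of the m! rankings has positive probability.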
The EM algorithm has also been used to learn the Mallows model, which is closely related to Condorcet's probabilistic model [17].

Although P-L overcomes the two difficulties of the Condorcet-Kemeny approach, it is still quite restricted: it assumes that the random utility terms are distributed as Gumbel, with each alternative characterized by one parameter, which is the mean of its corresponding distribution. In fact, little is known about inference in RUMs beyond P-L. Specifically, we are not aware of either an analytical solution or an efficient algorithm for MLE inference for one of the most natural models, proposed by Thurstone [26], in which each X_j is normally distributed.

1.1 Our Contributions

In this paper we focus on RUMs in which the random utilities are independently generated with respect to distributions in the exponential family (EF) [20]. This extends the P-L model, since the Gumbel distribution with fixed shape parameter belongs to the EF. Our main theoretical contributions are Theorem 1 and Theorem 2, which propose conditions under which the log-likelihood function is concave and the set of global maxima solutions is bounded for the location family, i.e., RUMs where the shape of each distribution μ_j is fixed and the only latent variables are the locations, i.e., the means of the μ_j's. These results hold for existing special cases, such as the P-L model, and many other RUMs, for example those where each μ_j is chosen from the Normal, Gumbel, Laplace, and Cauchy families.

We also propose a novel application of MC-EM. We treat the random utilities X as latent variables, and adopt the Expectation Maximization (EM) method to estimate the parameters θ. The E-step for this problem is not analytically tractable, and for this we adopt a Monte Carlo approximation. We establish through experiments that the Monte Carlo error in the E-step is controllable and does not affect inference, as long as numerical parameterizations are chosen carefully. 
In addition, for the E-step we suggest a parallelization over the agents and alternatives, and a Rao-Blackwellized method, which further increases the scalability of our approach. We generally assume that the data provides total orders on alternatives from voters, but comment on how to extend the method and theory to the case where the input preferences are partial orders.

We evaluate our approach on synthetic data as well as two real-world datasets: a public election dataset and one involving rank preferences on sushi. The experimental results suggest that the approach is scalable despite providing significantly improved modeling flexibility over existing approaches. For the two real-world datasets we have studied, we compare RUMs with normal distributions and P-L in terms of four criteria: log-likelihood, predictive log-likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC). We observe that when the amount of data is not too small, RUMs with normal distributions fit better than P-L. Specifically, for the log-likelihood, predictive log-likelihood, and AIC criteria, RUMs with normal distributions outperform P-L with 95% confidence on both datasets.

2 RUMs and Exponential Families

In social choice, each agent i ∈ {1, ..., n} has a strict preference order on the alternatives. This provides the data for an inferential approach to social choice. In particular, let L(C) denote the set of all linear orders on C. Then a preference profile, D, is a set of n preference orders, one from each agent, so that D ∈ L(C)^n. A voting rule r is a mapping that assigns to each preference profile a set of winning rankings, r : L(C)^n → (2^{L(C)} \ ∅). In particular, in the case of ties the set of winning rankings may include more than a singleton ranking.

In the maximum likelihood (MLE) approach to social choice, the preference profile is viewed as data, D = {π^1, ..., π^n}. 
Given this, the probability (likelihood) of the data given ground truth θ is Pr(D | θ) = ∏_{i=1}^{n} Pr(π^i | θ), where, for a particular π,

Pr(π|θ) = ∫_{x_{π(m)}=-∞}^{∞} ∫_{x_{π(m-1)}=x_{π(m)}}^{∞} ... ∫_{x_{π(1)}=x_{π(2)}}^{∞} μ_{π(m)}(x_{π(m)}) ... μ_{π(1)}(x_{π(1)}) dx_{π(1)} dx_{π(2)} ... dx_{π(m)}    (2)

The MLE approach to social choice selects as the winning ranking the one corresponding to the θ that maximizes Pr(D | θ). If multiple parameters maximize the likelihood, the MLE approach returns a set of rankings, one ranking corresponding to each parameterization.

In this paper, we focus on probabilistic models in which each μ_j belongs to the exponential family (EF). The density function for each μ_θ in the EF has the following form:

Pr(X = x) = μ_θ(x) = e^{η(θ)T(x) - A(θ) + B(x)},    (3)

where η(θ) and A(θ) are functions of θ, B(x) is a function of x, and T(x) denotes the sufficient statistics for x, which could be multidimensional.

Example 1 (Plackett-Luce as an RUM [2]) In the RUM, let the μ_j's be Gumbel distributions. That is, for alternative j ∈ {1, ..., m} we have μ_j(x_j|θ_j) = e^{-(x_j-θ_j)} e^{-e^{-(x_j-θ_j)}}. Then we have:

Pr(π | θ) = Pr(x_{π(1)} > x_{π(2)} > ... > x_{π(m)}) = ∏_{j=1}^{m} γ(θ_{π(j)}) / Σ_{j'=j}^{m} γ(θ_{π(j')}),

where η(θ_j) = γ_j = e^{θ_j}, T(x_j) = -e^{-x_j}, B(x_j) = -x_j, and A(θ_j) = -θ_j. This gives us the Plackett-Luce model.

3 Global Optimality and Log-Concavity

In this section, we provide a condition on the distributions that guarantees that the likelihood function (2) is log-concave in the parameters θ. We also provide a condition under which the set of MLE solutions is bounded when any one latent parameter is fixed. Together, this guarantees the convergence of our MC-EM approach to a global mode, given an accurate enough E-step.

We focus on the location family, a subset of RUMs where the shapes of all μ_j's are fixed, and the only parameters are the means of the distributions. 
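The closed-form P-L likelihood of Example 1 can be checked numerically; the helper below is our own illustration (not the paper's code), with γ_j = e^{θ_j} as in the example.

```python
import math
from itertools import permutations

def plackett_luce_prob(pi, theta):
    """Pr(pi | theta) under Plackett-Luce, where pi lists alternatives
    best-first and gamma_j = exp(theta_j): at each stage the next item
    is chosen with probability proportional to its gamma."""
    gamma = [math.exp(t) for t in theta]
    prob = 1.0
    for j in range(len(pi)):
        prob *= gamma[pi[j]] / sum(gamma[pi[k]] for k in range(j, len(pi)))
    return prob

theta = [1.0, 0.0, -1.0]
# Sanity checks: probabilities over all 3! rankings sum to one, and the
# ranking agreeing with the means is more likely than its reverse.
total = sum(plackett_luce_prob(list(p), theta) for p in permutations(range(3)))
best = plackett_luce_prob([0, 1, 2], theta)
worst = plackett_luce_prob([2, 1, 0], theta)
```

This stagewise product is exactly why MLE inference for P-L is tractable: no integral like (2) needs to be evaluated.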
For the location family, we can write X_j = θ_j + ε_j, where X_j ~ μ_j(·|θ_j) and ε_j = X_j - θ_j is a random variable with mean 0 that models an agent's subjective noise. The random variables ε_j need not be identically distributed across alternatives j; e.g., they can be normal with different fixed variances. We focus on computing solutions θ that maximize the log-likelihood function,

l(θ; D) = Σ_{i=1}^{n} log Pr(π^i | θ)    (4)

Theorem 1 For the location family, if for every j ≤ m the probability density function for ε_j is log-concave, then l(θ; D) is concave.

Proof sketch: The theorem is proved by applying the following lemma, which is Theorem 9 in [22].

Lemma 1 Suppose g_1(θ, ε), ..., g_R(θ, ε) are concave functions in R^{2m}, where θ is the vector of m parameters and ε is a vector of m real numbers generated according to a distribution whose pdf is log-concave in R^m. Then the following function is log-concave in θ ∈ R^m:

L_i(θ, G) = Pr(g_1(θ, ε) ≥ 0, ..., g_R(θ, ε) ≥ 0), θ ∈ R^m    (5)

To apply Lemma 1, we define a set G^i of functions g^i that is equivalent to an order π^i, in the sense that the inequalities implied by the RUM for π^i coincide with those in G^i (so that the joint probability in (5) for G^i is the same as the probability of π^i in the RUM with parameters θ). Suppose g^i_r(θ, ε) = θ_{π^i(r)} + ε^i_{π^i(r)} - θ_{π^i(r+1)} - ε^i_{π^i(r+1)} for r = 1, ..., m-1. Then, considering that the length of the order is R + 1, we have:

L_i(θ, π^i) = L_i(θ, G^i) = Pr(g^i_1(θ, ε) ≥ 0, ..., g^i_R(θ, ε) ≥ 0), θ ∈ R^m    (6)

This is because g^i_r(θ, ε) ≥ 0 is equivalent to the statement that, in π^i, alternative π^i(r) is preferred to alternative π^i(r+1) in the RUM sense.

To see how this extends to the case where preferences are specified as partial orders, we consider in particular an interpretation where an agent's report ranking m_i alternatives implies that all other alternatives are worse for the agent, in some undefined order. 
Given this, define g^i_r(θ, ε) = θ_{π^i(r)} + ε^i_{π^i(r)} - θ_{π^i(r+1)} - ε^i_{π^i(r+1)} for r = 1, ..., m_i - 1, and g^i_r(θ, ε) = θ_{π^i(m_i)} + ε^i_{π^i(m_i)} - θ_{π^i(r+1)} - ε^i_{π^i(r+1)} for r = m_i, ..., m - 1. Considering that the g^i_r's are linear (hence, concave) and using log-concavity of the distributions of the ε^i = (ε^i_1, ε^i_2, ..., ε^i_m), we can apply Lemma 1 and prove log-concavity of the likelihood function.

It is not hard to verify that the pdfs for the normal and Gumbel distributions are log-concave under reasonable conditions on their parameters, made explicit in the following corollary.

Corollary 1 For the location family where each ε_j is a normal distribution with mean zero and fixed variance, or a Gumbel distribution with mean zero and fixed shape parameter, l(θ; D) is concave. Specifically, the log-likelihood function for P-L is concave.

The concavity of the log-likelihood of P-L has been proved before [9], using a different technique.

Using Fact 3.5 in [24], the set of global maxima solutions to the likelihood function, denoted by S_D, is convex, since the likelihood function is log-concave. However, we also need S_D to be bounded, and would further like it to provide one unique order as the estimate of the ground truth.

For P-L, Ford, Jr. [9] proposed the following necessary and sufficient condition for the set of global maxima solutions to be bounded (more precisely, unique) when Σ_{j=1}^{m} e^{θ_j} = 1.

Condition 1 Given the data D, in every partition of the alternatives C into two nonempty subsets C_1 ∪ C_2, there exist c_1 ∈ C_1 and c_2 ∈ C_2 such that there is at least one ranking in D where c_1 ≻ c_2.

We next show that Condition 1 is also a necessary and sufficient condition for the set of global maxima solutions S_D to be bounded in location families, when we set one of the values θ_j to 0 (w.l.o.g., let θ_1 = 0). If we do not fix any parameter, then S_D is unbounded, because for any θ, any D, and any number s ∈ R, l(θ; D) = l(θ + s; D).

Theorem 2 Suppose we fix θ_1 = 0. 
Then the set S_D of global maxima solutions to l(θ; D) is bounded if and only if the data D satisfies Condition 1.

Proof sketch: If Condition 1 does not hold, then S_D is unbounded, because the parameters for all alternatives in C_1 can be increased simultaneously to improve the log-likelihood. For sufficiency, we first present the following lemma, whose proof is omitted due to space constraints.

Lemma 2 If alternative j is preferred to alternative j' in at least one ranking, then the difference of their mean parameters is bounded from above (there exists Q such that θ_{j'} - θ_j < Q) for all θ that maximize the likelihood function.

Now consider a directed graph G_D, where the nodes are the alternatives and there is an edge from c_j to c_{j'} if in at least one ranking c_j ≻ c_{j'}. By Condition 1, for any pair j ≠ j', there is a path from c_j to c_{j'} (and conversely, a path from c_{j'} to c_j). To see this, consider building a path from j to j' by starting from the partition with C_1 = {j} and following an edge from j to some j_1 in C_2, which must exist by Condition 1. Then consider the partition with C_1 = {j, j_1}, and repeat until an edge can be followed to vertex j' ∈ C_2. It follows from Lemma 2 that for any θ ∈ S_D we have |θ_j - θ_{j'}| < Qm, using the telescoping sum of the bounded differences of mean parameters along the edges of the path (whose length is no more than m, tracing the path from j to j' and from j' to j), meaning that S_D is bounded.

Now that we have log-concavity and boundedness, we need to state conditions under which the bounded convex space of estimated parameters corresponds to a unique order. The next theorem provides a necessary and sufficient condition for all global maxima to correspond to the same order on alternatives. 
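Note that the path-building argument in the proof sketch amounts to saying that Condition 1 is equivalent to strong connectivity of the graph G_D, which makes the condition easy to test on data. A minimal check (our own sketch, with our own function names):

```python
def condition1_holds(rankings, m):
    """Test Condition 1 via strong connectivity of G_D: edge j -> j'
    whenever some ranking places j above j'. Rankings list the m
    alternatives (indices 0..m-1) best-first."""
    adj = [set() for _ in range(m)]
    for pi in rankings:
        for a in range(len(pi)):
            for b in range(a + 1, len(pi)):
                adj[pi[a]].add(pi[b])

    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    # Strongly connected <=> node 0 reaches every node and is reached by every node.
    back = {u for u in range(m) if 0 in reachable(u)}
    return len(reachable(0)) == m and len(back) == m

# Two opposite rankings give edges in both directions between all pairs...
ok = condition1_holds([[0, 1, 2], [2, 1, 0]], 3)
# ...while unanimous profiles fail: the partition C_1 = {1, 2}, C_2 = {0}
# has no ranking placing an element of C_1 above alternative 0.
bad = condition1_holds([[0, 1, 2], [0, 2, 1]], 3)
```

The equivalence used here is standard: a digraph is strongly connected exactly when every nontrivial partition of its vertices has an edge crossing from the first part to the second.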
Suppose that we order the alternatives based on the estimated θ's (meaning that c_j is ranked higher than c_{j'} iff θ_j > θ_{j'}).

Theorem 3 The order over the parameters is strict and is the same across all θ ∈ S_D if, for all θ ∈ S_D and all alternatives j ≠ j', θ_j ≠ θ_{j'}.

Proof: Suppose for the sake of contradiction that there exist two maxima θ, θ' ∈ S_D and a pair of alternatives j ≠ j' such that θ_j > θ_{j'} and θ'_{j'} > θ'_j. Then there exists an α < 1 such that the jth and j'th components of αθ + (1-α)θ' are equal; since S_D is convex, αθ + (1-α)θ' ∈ S_D, which contradicts the assumption.

Hence, if there is never a tie in the scores in any θ ∈ S_D, then any vector in S_D will reveal the unique order.

4 Monte Carlo EM for Parameter Estimation

In this section, we propose an MC-EM algorithm for MLE inference for RUMs where every μ_j belongs to the EF.²

The EM algorithm determines the MLE parameters iteratively, and proceeds as follows. In each iteration t+1, given the parameters θ^t from the previous iteration, the algorithm performs an E-step and an M-step. In the E-step, for any given θ = (θ_1, ..., θ_m), we compute the conditional expectation of the complete-data log-likelihood (latent variables x and data D), where the latent variables x are distributed according to the data D and the parameters θ^t from the last iteration. In the M-step, we optimize θ to maximize the expected log-likelihood computed in the E-step, and use it as the input θ^{t+1} for the next iteration:

E-step: Q(θ, θ^t) = E_X { Σ_{i=1}^{n} log Pr(x^i, π^i | θ) | D, θ^t }

M-step: θ^{t+1} ∈ arg max_θ Q(θ, θ^t)

4.1 Monte Carlo E-step by Gibbs sampler

The E-step can be simplified using (3) as follows:

E_X { log ∏_{i=1}^{n} Pr(x^i, π^i | θ) | D, θ^t } = E_X { log ∏_{i=1}^{n} Pr(x^i | θ) Pr(π^i | x^i) | D, θ^t }
= Σ_{i=1}^{n} Σ_{j=1}^{m} E_{X_j} { log μ_j(x^i_j | θ_j) | π^i, θ^t } = Σ_{i=1}^{n} Σ_{j=1}^{m} ( η(θ_j) E_{X_j}{T(x^i_j) | π^i, θ^t} - A(θ_j) ) + W,

² Our algorithm can be naturally extended to compute a maximum a posteriori probability (MAP) estimate, when we have a prior over the parameters θ. 
Still, it seems hard to motivate the imposition of a prior on the parameters in many social choice domains.

where W = Σ_{i=1}^{n} Σ_{j=1}^{m} E_{X_j}{B(x^i_j) | π^i, θ^t} depends only on θ^t and D (not on θ), which means that it can be treated as a constant in the M-step.

Hence, in the E-step we only need to compute S^{i,t+1}_j = E_{X_j}{T(x^i_j) | π^i, θ^t}, where T(x^i_j) is the sufficient statistic for the parameter θ_j in the model. We are not aware of an analytical solution for E_{X_j}{T(x^i_j) | π^i, θ^t}. However, we can use a Monte Carlo approximation, which involves sampling x^i from the distribution Pr(x^i | π^i, θ^t) using a Gibbs sampler, and then approximating S^{i,t+1}_j by (1/N) Σ_{k=1}^{N} T(x^{i,k}_j), where N is the number of samples in the Gibbs sampler.

In each step of our Gibbs sampler for voter i, we randomly choose a position j in π^i and sample x^i_{π^i(j)} according to a truncated EF distribution Pr(· | x^i_{-π^i(j)}, θ^t, π^i), where x^i_{-π^i(j)} = (x^i_{π^i(1)}, ..., x^i_{π^i(j-1)}, x^i_{π^i(j+1)}, ..., x^i_{π^i(m)}). The truncated EF distribution is obtained by truncating the tails of μ_{π^i(j)}(·|θ^t_{π^i(j)}) at x^i_{π^i(j-1)} and x^i_{π^i(j+1)}, respectively. For example, a truncated normal distribution is illustrated in Figure 2.

Figure 2: A truncated normal distribution.

Rao-Blackwellization: To further improve the Gibbs sampler, we use a Rao-Blackwellized [4] estimate, using E{T(x^{i,k}_j) | x^{i,k}_{-j}, π^i, θ^t} instead of the sample x^{i,k}_j, where x^{i,k}_{-j} is all of x^{i,k} except for x^{i,k}_j. Finally, we estimate E{T(x^{i,k}_j) | x^{i,k}_{-j}, π^i, θ^t} in each step of the Gibbs sampler using M samples, as S^{i,t+1}_j ≈ (1/N) Σ_{k=1}^{N} E{T(x^{i,k}_j) | x^{i,k}_{-j}, π^i, θ^t} ≈ (1/(NM)) Σ_{k=1}^{N} Σ_{l=1}^{M} T(x^{i_l,k}_j), where x^{i_l,k}_j ~ Pr(x^{i,k}_j | x^{i,k}_{-j}, π^i, θ). Rao-Blackwellization reduces the variance of the estimator because of the conditioning and expectation in E{T(x^{i,k}_j) | x^{i,k}_{-j}, π^i, θ^t}.

4.2 M-step

In the E-step we have (approximately) computed S^{i,t+1}_j. 
In the M-step we compute θ^{t+1} to maximize Σ_{i=1}^{n} Σ_{j=1}^{m} ( η(θ_j) E_{X_j}{T(x^i_j) | π^i, θ^t} - A(θ_j) + E_{X_j}{B(x^i_j) | π^i, θ^t} ). Equivalently, we compute θ^{t+1}_j for each j ≤ m separately to maximize Σ_{i=1}^{n} { η(θ_j) E_{X_j}{T(x^i_j) | π^i, θ^t} - A(θ_j) } = η(θ_j) Σ_{i=1}^{n} S^{i,t+1}_j - nA(θ_j). For the case of the normal distribution with fixed variance, where η(θ_j) = θ_j and A(θ_j) = (1/2)(θ_j)², we have θ^{t+1}_j = (1/n) Σ_{i=1}^{n} S^{i,t+1}_j. The algorithm is illustrated in Figure 3.

Figure 3: The MC-EM algorithm for the normal distribution.

Theorem 1 and Theorem 2 guarantee the convergence of MC-EM for an exact E-step. To control the approximation error in the MC-E-step, we can increase the number of samples with the iterations, which decreases the error in the Monte Carlo step [28]. Details are omitted due to space constraints and can be found in an extended version online.

5 Experimental Results

We evaluate the proposed MC-EM algorithm on synthetic data as well as two real-world datasets, namely an election dataset and a dataset representing preference orders on sushi. For simulated data we use the Kendall correlation [11] between two rank orders (typically between the true order and the method's output) as a measure of performance.

5.1 Experiments on Synthetic Data

We first generate data from normal models for the random utility terms, with means θ_j = j and equal variance for all terms, for different choices of variance (Var = 2, 4). We evaluate the performance of the method as the number of agents n varies. The results show that a limited number of iterations of the EM algorithm (at most 3) and M·N = 4000 samples (M = 5, N = 800) are sufficient for inferring the order in most cases. The performance in terms of Kendall correlation for recovering the ground truth improves with a larger number of agents, which corresponds to more data. 
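To make the normal-model loop concrete, the sketch below is our own minimal illustration of MC-EM for the unit-variance location family (without the paper's Rao-Blackwellization or parallelization): a Gibbs sweep samples each latent utility from a normal truncated by its ranking neighbors, and the M-step averages the samples, since T(x) = x. All function names and parameter values here are our own assumptions.

```python
import random
from statistics import NormalDist

def trunc_normal(mean, lo, hi, rng):
    """Inverse-CDF draw from N(mean, 1) truncated to (lo, hi)."""
    nd = NormalDist(mean, 1.0)
    a, b = nd.cdf(lo), nd.cdf(hi)
    u = min(max(a + (b - a) * rng.random(), 1e-12), 1.0 - 1e-12)
    return nd.inv_cdf(u)

def mc_em_normal(rankings, m, iters=10, N=200, seed=0):
    """MC-EM sketch for a unit-variance normal RUM.
    E-step: Gibbs-sample latent utilities consistent with each ranking.
    M-step: theta_j = mean of the retained samples (T(x) = x)."""
    rng = random.Random(seed)
    theta = [0.0] * m
    INF = 1e9
    for _ in range(iters):
        sums, count = [0.0] * m, 0
        for pi in rankings:  # pi lists alternatives best-first
            # start the chain at utilities consistent with the ranking
            x = {c: float(len(pi) - r) for r, c in enumerate(pi)}
            for k in range(N):
                pos = rng.randrange(len(pi))
                hi = x[pi[pos - 1]] if pos > 0 else INF
                lo = x[pi[pos + 1]] if pos + 1 < len(pi) else -INF
                x[pi[pos]] = trunc_normal(theta[pi[pos]], lo, hi, rng)
                if k >= N // 2:  # discard the first half as burn-in
                    for c in pi:
                        sums[c] += x[c]
            count += N - N // 2
        theta = [s / count for s in sums]
        theta = [t - theta[0] for t in theta]  # fix theta_1 = 0 (cf. Theorem 2)
    return theta

# 8 voters rank 0 > 1 > 2 and two rank the reverse; the fitted means
# should recover the order 0 > 1 > 2.
data = [[0, 1, 2]] * 8 + [[2, 1, 0]] * 2
theta_hat = mc_em_normal(data, 3)
```

Note that this toy profile satisfies Condition 1 (each alternative beats each other in some ranking), so by Theorems 1 and 2 the normalized MLE is bounded and the concave objective has a well-defined mode for EM to approach.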
See Figure 4, which shows the asymptotic behavior of the maximum likelihood estimator in recovering the true parameters. The left and middle panels of Figure 4 show that the larger the dataset, the better the performance of the method. Moreover, for larger variances in the data generation, due to the increased noise in the data, performance improves at a slower rate than for smaller variances. Notice that the scales on the y-axis differ between the left and middle panels.

Figure 4: Left and middle panels: performance for different numbers of agents n on synthetic data for m = 5, 10 and Var = 2, 4, with specifications M·N = 4000 and 3 EM iterations. Right panel: performance given access to sub-samples of the data in the public election dataset; x-axis: size of sub-sample, y-axis: Kendall correlation with the order obtained from the full dataset. Dashed lines are the 95% confidence intervals.

5.2 Experiments for Model Robustness

We apply our method to a public election dataset collected by Nicolaus Tideman [27], in which the voters provided partial orders on the candidates. A partial order includes comparisons among a subset of the alternatives, and the alternatives not mentioned in a partial order are considered to be ranked lower than the lowest-ranked alternative among those mentioned.

The total number of votes is n = 280 and the number of alternatives is m = 15. For the purpose of our experiments, we adopt the order on alternatives obtained by applying our method to the entire dataset as an assumed ground truth, since no ground truth is given as part of the data. After finding this ground truth using all 280 votes (and adopting a normal model), we compare the performance of our approach as we vary the amount of data available. We evaluate the performance for sub-samples consisting of 10, 20, ..., 280 votes randomly chosen from the full dataset. 
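The Kendall correlation used as the performance measure throughout these experiments can be computed pairwise; the helper below is our own minimal version, not the paper's code.

```python
from itertools import combinations

def kendall_correlation(order_a, order_b):
    """Kendall tau between two rankings of the same items, each listed
    best-first: (concordant - discordant) / total pairs, in [-1, 1]."""
    pos_a = {c: i for i, c in enumerate(order_a)}
    pos_b = {c: i for i, c in enumerate(order_b)}
    concordant = discordant = 0
    for c1, c2 in combinations(order_a, 2):
        # Same sign of position difference in both rankings => concordant.
        if (pos_a[c1] - pos_a[c2]) * (pos_b[c1] - pos_b[c2]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

same = kendall_correlation(['a', 'b', 'c'], ['a', 'b', 'c'])       # 1.0
opposite = kendall_correlation(['a', 'b', 'c'], ['c', 'b', 'a'])   # -1.0
one_swap = kendall_correlation(['a', 'b', 'c'], ['a', 'c', 'b'])   # 1/3
```

Identical rankings score 1, reversed rankings score -1, and a single adjacent swap among three items flips one of the three pairs, giving (2 - 1)/3 = 1/3.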
For each sub-sample size, the experiment is repeated 200 times, and we report the average performance and the variance. See the right panel in Figure 4. This experiment shows the robustness of the method, in the sense that inference on a subset of the dataset behaves consistently with inference on the full dataset. For example, the ranking obtained using half of the data still provides a fair estimate of the full-data result, with an average Kendall correlation greater than 0.4.

5.3 Experiments for Model Fitness

In addition to the public election dataset, we have tested our algorithm on a sushi dataset, where 5000 users give rankings over 10 different kinds of sushi [15]. For each experiment we randomly choose n ∈ {10, 20, 30, 40, 50} rankings and apply our MC-EM algorithm for RUMs with normal distributions in which the variances are also parameters.

In the former experiments, for both the synthetic data generation and the model for the election data, the variances were fixed to 1, and hence we had the theoretical guarantees of convergence to globally optimal solutions by Theorem 1 and Theorem 2. When we let the variances be part of the parameterization, we lose the theoretical guarantees. However, the EM algorithm can still be applied, and since the variances are now parameters (rather than being fixed to 1), the model fits better in terms of log-likelihood.

For this reason, we adopt RUMs with normal distributions in which the variance is a parameter that is fit by EM along with the mean. We call this model the normal model. We compute the difference between the normal model and P-L in terms of four criteria: log-likelihood (LL), predictive log-likelihood (predictive LL), AIC, and BIC. For (predictive) log-likelihood, a positive value means that the normal model fits better than P-L, whereas for AIC and BIC, a negative value means that the normal model fits better than P-L. 
Predictive likelihood differs from likelihood in that we compute the likelihood of the estimated parameters on a part of the data that is not used for parameter estimation.³ In particular, we compute predictive likelihood for a randomly chosen subset of 100 votes. The results and standard deviations for n = 10, 50 are summarized in Table 1.

          |                   n = 10                         |                   n = 50
Dataset   | LL          Pred. LL       AIC          BIC      | LL           Pred. LL     AIC           BIC
Sushi     | 8.8 (4.2)   -56.1 (89.5)   -7.6 (8.4)   5.4 (8.4)  | 22.6 (6.3)   40.1 (5.1)   -35.2 (12.6)  -6.1 (12.6)
Election  | 9.4 (10.6)  91.3 (103.8)   -8.8 (21.2)  4.2 (21.2) | 44.8 (15.8)  87.4 (30.5)  -79.6 (31.6)  -50.5 (31.6)

Table 1: Model selection for the sushi dataset and the election dataset. Cases where the normal model fits better than P-L with 95% statistical confidence are in bold.

When n is small (n = 10), the variance is high and we are unable to obtain statistically significant results in comparing fitness. When n is not too small (n = 50), RUMs with normal distributions fit better than P-L. Specifically, for log-likelihood, predictive log-likelihood, and AIC, RUMs with normal distributions outperform P-L with 95% confidence on both datasets.

5.4 Implementation and Run Time

The running time of our MC-EM algorithm scales linearly with the number of agents on real-world data (the election data), with a slope of 13.3 seconds per agent on an Intel i5 2.70GHz PC. This is for 100 iterations of the EM algorithm, with the number of Gibbs samples increasing with the iterations as 2000 + 300 × iteration.

Acknowledgments

This work is supported in part by NSF Grant No. CCF-0915016. Lirong Xia is supported by NSF under Grant #1136996 to the Computing Research Association for the CIFellows Project. We thank Craig Boutilier, Jonathan Huang, Tyler Lu, Nicolaus Tideman, Paolo Viappiani, and the anonymous NIPS-12 reviewers for helpful comments and suggestions, or for help with the datasets.

References
[1] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. 
Econometrica, 63(4):841-890, 1995.
[2] Henry David Block and Jacob Marschak. Random orderings and stochastic theories of responses. In Contributions to Probability and Statistics, pages 97-132, 1960.

³ The use of predictive likelihood allows us to evaluate the performance of the estimated parameters on the rest of the data, and is similar in this sense to cross-validation for supervised learning.

[3] James G. Booth and James P. Hobert. Maximizing Generalized Linear Mixed Model Likelihoods with an Automated Monte Carlo EM Algorithm. JRSS, Series B, 61(1):265-285, 1999.
[4] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, editors. Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, 2011.
[5] Francois Caron and Arnaud Doucet. Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1):174-196, 2012.
[6] Marquis de Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: L'Imprimerie Royale, 1785.
[7] Vincent Conitzer, Matthew Rognlie, and Lirong Xia. Preference functions that score rankings and maximum likelihood estimation. In Proc. IJCAI, pages 109-115, 2009.
[8] Vincent Conitzer and Tuomas Sandholm. Common voting rules as maximum likelihood estimators. In Proc. UAI, pages 145-152, 2005.
[9] Lester R. Ford, Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28-33, 1957.
[10] Isobel Claire Gormley and Thomas Brendan Murphy. A grade of membership model for rank data. Bayesian Analysis, 4(2):265-296, 2009.
[11] Przemyslaw Grzegorzewski. Kendall's correlation coefficient for vague preferences. Soft Computing, 13(11):1055-1061, 2009.
[12] John Guiver and Edward Snelson. Bayesian inference for Plackett-Luce ranking models. In Proc. ICML, pages 377-384, 2009.
[13] Edith Hemaspaandra, Holger Spakowski, and Jörg Vogel. The complexity of Kemeny elections. 
Theoretical Computer Science, 349(3):382-391, December 2005.
[14] David R. Hunter. MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32(1):384-406, 2004.
[15] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proc. KDD, pages 583-588, 2003.
[16] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.
[17] Tyler Lu and Craig Boutilier. Learning Mallows models with pairwise preferences. In Proc. ICML, pages 145-152, 2011.
[18] R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
[19] Daniel McFadden. Conditional logit analysis of qualitative choice behavior. In Frontiers of Econometrics, pages 105-142, New York, NY, 1974. Academic Press.
[20] Carl N. Morris. Natural Exponential Families with Quadratic Variance Functions. Annals of Statistics, 10(1):65-80, 1982.
[21] R. L. Plackett. The analysis of permutations. JRSS, Series C, 24(2):193-202, 1975.
[22] András Prékopa. Logarithmic concave measures and related topics. In Stochastic Programming, pages 63-82. Academic Press, 1980.
[23] Ariel D. Procaccia, Sashank J. Reddi, and Nisarg Shah. A maximum likelihood approach for selecting sets of alternatives. In Proc. UAI, 2012.
[24] Frank Proschan and Yung L. Tong. Chapter 29: Log-concavity property of probability measures. FSU technical report Number M-805, pages 57-68, 1989.
[25] Magnus Roos, Jörg Rothe, and Björn Scheuermann. How to calibrate the scores of biased reviewers by quadratic programming. In Proc. AAAI, pages 255-260, 2011.
[26] Louis Leon Thurstone. A law of comparative judgement. Psychological Review, 34(4):273-286, 1927.
[27] Nicolaus Tideman. Collective Decisions and Voting: The Potential for Public Choice. Ashgate Publishing, 2006.
[28] Greg C. G. Wei and Martin A. Tanner. A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms. 
JASA, 85(411):699-704, 1990.
[29] Lirong Xia and Vincent Conitzer. A maximum likelihood approach towards aggregating partial orders. In Proc. IJCAI, pages 446-451, 2011.
[30] Lirong Xia, Vincent Conitzer, and Jérôme Lang. Aggregating preferences in multi-issue domains by using maximum likelihood estimators. In Proc. AAMAS, pages 399-406, 2010.
[31] John I. Yellott, Jr. The relationship between Luce's Choice Axiom, Thurstone's Theory of Comparative Judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15(2):109-144, 1977.
[32] H. Peyton Young. Optimal voting rules. Journal of Economic Perspectives, 9(1):51-64, 1995.
", "award": [], "sourceid": 4735, "authors": [{"given_name": "Hossein", "family_name": "Azari", "institution": null}, {"given_name": "David", "family_name": "Parkes", "institution": null}, {"given_name": "Lirong", "family_name": "Xia", "institution": null}]}