{"title": "Generalized Random Utility Models with Multiple Types", "book": "Advances in Neural Information Processing Systems", "page_first": 73, "page_last": 81, "abstract": "We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents' types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents' types using agent-level data. We focus on applications involving data on agents' ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach.", "full_text": "Generalized Random Utility Models with Multiple\n\nTypes\n\nHossein Azari Sou\ufb01ani\n\nHansheng Diao\n\nZhenyu Lai\n\nDavid C. Parkes\n\nSEAS\n\nMathematics Department Economics Department\n\nSEAS\n\nHarvard University\nazari@fas.harvard.edu\n\nHarvard University\ndiao@fas.harvard.edu\n\nHarvard University\nzlai@fas.harvard.edu\n\nHarvard University\n\nparkes@eecs.harvard.edu\n\nAbstract\n\nWe propose a model for demand estimation in multi-agent, differentiated prod-\nuct settings and present an estimation algorithm that uses reversible jump MCMC\ntechniques to classify agents\u2019 types. Our model extends the popular setup in Berry,\nLevinsohn and Pakes (1995) to allow for the data-driven classi\ufb01cation of agents\u2019\ntypes using agent-level data. We focus on applications involving data on agents\u2019\nranking over alternatives, and present theoretical conditions that establish the iden-\nti\ufb01ability of the model and uni-modality of the likelihood/posterior. 
Results on both real and simulated data provide support for the scalability of our approach.

1 Introduction

Random utility models (RUMs), which presume agent utility to be composed of a deterministic component and a stochastic unobserved error component, are frequently used to model choices by individuals over alternatives. In this paper, we focus on applications where the data consists of rankings by individuals over alternatives. Examples from economics include the popular random coefficients logit model [7], where the data may involve a (partial) consumer ranking of products [9]. In a RUM, each agent receives an intrinsic utility that is common across all agents for a given choice of alternative, a pairwise-specific utility that varies with the interaction between agent characteristics and the characteristics of the agent's chosen alternative, as well as an agent-specific taste shock (noise) for his chosen alternative. These ingredients are used to construct a posterior/likelihood function of specific data moments, such as the fraction of agents of each type that choose each alternative.

To estimate preferences across heterogeneous agents, one approach allowed by prior work [20, 24] is to assume a mixture of agents with a finite number of types. We build upon this work by developing an algorithm to endogenously learn the classification of agent types within this mixture. Empirical researchers are increasingly being presented with rich data on the choices made by individuals, and asked to classify these agents into different types [28, 29] and to estimate the preferences of each type [10, 23]. 
Examples of individual-level data used in economics include household purchases from supermarket-scanner data [1, 21], and patients' hospital or treatment choices from healthcare data [22].

The partitioning of agents into latent, discrete sets (or "types") allows for the study of the underlying distribution of preferences across a population of heterogeneous agents. For example, preferences may be correlated with an agent characteristic, such as income, and the true classification of each agent's type, such as his income bracket, may be unobserved. By using a model of demand to estimate the elasticity in behavioral response of each type of agent and by aggregating these responses over the different types of agents, it is possible to simulate the impact of a social or public policy [8], or simulate the counterfactual outcome of changing the options available to agents [19].

1.1 Our Contributions

This paper focuses on estimating generalized random utility models (GRUM1) when the observed data is partial orders of agents' rankings over alternatives and when latent types are present. We build on recent work [3, 4] on estimating GRUMs by allowing for an interaction between agent characteristics and the characteristics of the agent's chosen alternative. The interaction term helps us to avoid unrealistic substitution patterns due to the independence of irrelevant alternatives [26] by allowing agent utilities to be correlated across alternatives with similar characteristics. For example, this prevents a situation where removing the top choices of both a rich household and a poor household leads them to become equally likely to substitute to the same alternative choice. 
Our model also allows the marginal utilities associated with the characteristics of alternatives to vary across agent types.

To classify agents' types and estimate the parameters associated with each type, we propose an algorithm involving a novel application of reversible jump Markov Chain Monte Carlo (RJMCMC) techniques. RJMCMC can be used for model selection and for learning a posterior on the number of types in a mixture model [31]. Here, we use RJMCMC to cluster agents into different types, where each type exhibits demand for alternatives based on different preferences; i.e., different interaction terms between agent and alternative characteristics.

We apply the approach to a real-world dataset involving consumers' preference rankings, and also conduct experiments on synthetic data to perform coverage analysis of RJMCMC. The results show that our method is scalable, and that the clustering of types provides a better fit to real-world data. The proposed learning algorithm is based on Bayesian methods to find posteriors on the parameters. This differentiates us from previous estimation approaches in econometrics that rely on techniques based on the generalized method of moments.2

The main theoretical contribution establishes identifiability of mixture models over data consisting of partial orders. Previous theoretical results have established identifiability for data consisting of vectors of real numbers [2, 18], but not for data consisting of partial orders. We establish conditions under which the GRUM likelihood function is uni-modal for the case of observable types. 
We do not provide results on the log-concavity of the general likelihood problem with unknown types, and leave it for future studies.

Figure 1: A GRUM with multiple types of agents

1Defined in [4] as a RUM with a generalized linear model for the regression of the mean parameters on the interaction of characteristics data, as in Figure 1.

2There are alternative methods to RJMCMC, such as the saturation method [13]. However, the memory required to keep track of former sampled memberships in the saturation method quickly becomes infeasible given the combinatorial nature of our problem.

1.2 Related work

Prior work in econometrics has focused on developing models that use data aggregated across types of agents, such as at the level of a geographic market, and that allow heterogeneity by using random coefficients on either agents' preference parameters [7, 9] or on a set of dummy variables that define types of agents [6, 27], or by imposing additional structure on the covariance matrix of idiosyncratic taste shocks [16]. In practice, this approach typically relies on restrictive functional assumptions about the distribution of consumer taste shocks that enter the RUM in order to reduce computational burden. For example, the logit model [26] assumes i.i.d. draws from a Type I extreme value distribution. 
This may lead to biased estimates, in particular when the number of alternatives grows large [5].

Previous work on clustering ranking data for variations of the Plackett-Luce (PL) model [28, 29] has been restricted to settings without agent and alternative characteristics. Moreover, Gormley et al. [28] and Chu et al. [14] performed clustering for RUMs with normal distributions, but this was limited to pairwise comparisons. Inference of GRUMs for partial ranks involves the computational hardness addressed in [3]. In mixture models, assuming an arbitrary number of types can lead to biased results, and reduces the statistical efficiency of the estimators [15].

To the best of our knowledge, we are the first to study the identifiability and inference of GRUMs with multiple types. Inference for GRUMs has been generalized in [4]; however, Azari et al. [4] do not consider the existence of multiple types. Our method applies to data involving individual-level observations, and partial orders with more than two alternatives. The inference method establishes a posterior on the number of types, resolving the common issue of how the researcher should select the number of types.

2 Model

Suppose we have N agents and M alternatives {c1, .., cM}, and there are S types (subgroups) of agents; s(n) denotes agent n's type. Agent characteristics are observed and defined as an N × K matrix X, and alternative characteristics are observed and defined as an L × M matrix Z, where K and L are the number of agent and alternative characteristics, respectively. Let u_nm be agent n's perceived utility for alternative m, and let W^{s(n)} be a K × L real matrix that models the linear relation between the attributes of alternatives and the attributes of agents. We have

u_nm = δ_m + x_n W^{s(n)} (z_m)^T + ε_nm,   (1)

where x_n is the nth row of the matrix X and z_m is the mth column of the matrix Z. In words, agent n's utility for alternative m consists of the following three parts:

1. δ_m: the intrinsic utility of alternative m, which is the same across all agents;
2. x_n W^{s(n)} (z_m)^T: the agent-specific utility, which is unique to all agents of type s(n), and where W^{s(n)} has at least one nonzero element;
3. ε_nm: the random noise (agent-specific taste shock), which is generated independently across agents and alternatives.

The number of parameters for each type is P = KL + M. See Figure 2 for an illustration of the model. In order to write the model as a linear regression, we define an M × P matrix A(n), such that A(n)_{m, KL+m'} = 1 if m = m' and 0 otherwise, and A(n)_{m, (k−1)L+l} = x_n(k) z_m(l) for 1 ≤ l ≤ L and 1 ≤ k ≤ K. We also shuffle the parameters for all types into a P × S matrix Ψ, such that Ψ_{KL+m, s} = δ_m and Ψ_{(k−1)L+l, s} = W^s_{kl} for 1 ≤ k ≤ K and 1 ≤ l ≤ L. We adopt an S × 1 matrix B(n) to indicate the type of agent n, with B(n)_{s(n),1} = 1 and B(n)_{s,1} = 0 for all s ≠ s(n). We also define an M × 1 matrix U(n), as U(n)_{m,1} = u_nm. We can now rewrite (1) as:

U(n) = A(n) Ψ B(n) + ε   (2)

Suppose that an agent has type s with probability γ_s. Given this, the random utility model can be written as Pr(U(n) | X(n), Z, Ψ, Γ) = Σ_{s=1}^S γ_s Pr(U(n) | X(n), Z, Ψ_s), where Ψ_s is the sth column of the matrix Ψ. An agent ranks the alternatives according to her perceived utilities for the alternatives. Define rank order π_n as a permutation (π_n(1), . . . , π_n(M)) of {1, . . . , M}. 
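As a concrete illustration, the generative process in (1) can be sketched in code. This is a hypothetical NumPy sketch: all dimensions and parameter values are invented for the example, and the taste shocks are taken to be standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K, L, S = 200, 10, 4, 3, 3      # agents, alternatives, features, types
X = rng.normal(size=(N, K))           # agent characteristics (N x K)
Z = rng.normal(size=(L, M))           # alternative characteristics (L x M)
delta = rng.normal(size=M)            # intrinsic utilities delta_m
W = rng.normal(size=(S, K, L))        # one K x L interaction matrix per type
gamma = np.array([0.5, 0.3, 0.2])     # type probabilities gamma_s

s = rng.choice(S, size=N, p=gamma)    # latent type s(n) of each agent
# u[n, m] = delta[m] + x_n W^{s(n)} z_m + eps[n, m]
u = delta + np.einsum('nk,nkl,lm->nm', X, W[s], Z) + rng.normal(size=(N, M))

pi = np.argsort(-u, axis=1)           # pi[n] lists alternatives best-to-worst
```

Each row of `pi` is a full ranking implied by the perceived utilities; a partial order would correspond to observing only a prefix of such a row.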
π_n represents the full ranking [c_{π_n(1)} ≻_n c_{π_n(2)} ≻_n · · · ≻_n c_{π_n(M)}] of the alternatives {c1, .., cM}. That is, for agent n, c_{m1} ≻_n c_{m2} if and only if u_{nm1} > u_{nm2} (in this model, situations with tied perceived utilities have zero probability measure).

The model for the observed data π(n) can be written as:

Pr(π(n) | X(n), Z, Γ, Ψ) = ∫_{π(n) = order(U(n))} Pr(U(n) | X(n), Z, Ψ, Γ) = Σ_{s=1}^S γ_s Pr(π(n) | X(n), Z, Ψ_s)

Note that X(n) and Z are observed characteristics, while Γ and Ψ are unknown parameters. π = order(U) is the ranking implied by U, and π(i) is the alternative with the ith largest utility in U. D = {π_1, .., π_N} denotes the collection of all data for different agents. We have that

Pr(D | X, Z, Ψ, Γ) = Π_{n=1}^N Pr(π(n) | X(n), Z, Ψ, Γ)

3 Strict Log-concavity and Identifiability

In this section, we establish conditions for identifiability of the types and parameters of the model. Identifiability is a necessary property in order for researchers to be able to infer economically-relevant parameters from an econometric model. Establishing identifiability in a model with multiple types and ranking data requires a different approach from classical identifiability results for mixture models [e.g., 2, 18].

Moreover, we establish conditions for uni-modality of the likelihood for the parameters Γ and Ψ when the types are observed. Although our main focus is on data with unobservable types, establishing the conditions for uni-modality conditioned on known types remains an essential step because of the sampling and optimization aspects of RJMCMC. 
We sample from the parameters conditional on the algorithm's specification of types. The uni-modality result establishes that the sampling approach is exploring a uni-modal distribution conditional on its specified types. Despite adopting a Bayesian point of view in presenting the model, we adopt a uniform prior on the parameter set, and only impose nontrivial priors on the number of types in order to obtain some regularization. Given this, we present the theory with regard to the likelihood function from the data rather than the posterior on parameters.

Figure 2: Graphical representation of the multiple type GRUM generative process.

3.1 Strict Log-concavity of the Likelihood Function

For agent n, we define a set G_n of functions g^n whose positivity is equivalent to giving an order π_n. More precisely, we define g^n_m(ψ, ε) = [μ_{n,π_n(m)} + ε_{n,π_n(m)}] − [μ_{n,π_n(m+1)} + ε_{n,π_n(m+1)}] for m = 1, .., M−1, where μ_nj = δ_j + Σ_{k,l} x_n(k) W^{s(n)}_{kl} z_j(l) for 1 ≤ j ≤ M. Here, ψ is a vector of KL + M variables consisting of all the δ_j's and W_kl's. 
We have L(ψ, π_n) = L(ψ, G_n) = Pr(g^n_1(ψ, ε) ≥ 0, ..., g^n_{M−1}(ψ, ε) ≥ 0). This is because g^n_m(ψ, ε) ≥ 0 is equivalent to saying that alternative π_n(m) is preferred to alternative π_n(m+1) in the RUM sense.

Then, using the results in [3] and [30], L(ψ) = L(ψ, π) is logarithmically concave in the sense that L(λψ + (1−λ)ψ′) ≥ L(ψ)^λ L(ψ′)^{1−λ} for any 0 < λ < 1 and any two vectors ψ, ψ′ ∈ R^{KL+M}. The detailed statement and proof of this result are contained in the Appendix. Let us consider all N agents together. We study the function l(Ψ, D) = Σ_{n=1}^N log Pr(π_n | ψ^{s(n)}). By log-concavity of L(ψ, π) and using the fact that a sum of concave functions is concave, we know that l(Ψ, D) is concave in Ψ, viewed as a vector in R^{SKL+M}. To show uni-modality, we need to prove that this concave function has a unique maximum. Namely, we need to establish the conditions under which equality holds. If our data is subject to a mild condition, which implies boundedness of the parameter set that maximizes l(Ψ, D), Theorem 1 below tells us when equality holds. This condition is explained in [3] as condition (1).

Before stating the main result, we define the following auxiliary (M−1)N′ × (SKL + M − 1) matrix Ã = Ã^{N′}. (Here, N′ ≤ N is a positive number that we will specify later.) 
Its entries are as follows: Ã_{(M−1)(n−1)+m, (s−1)KL+(k−1)L+l} is equal to x_n(k)(z_m(l) − z_M(l)) if s = s(n), and is equal to 0 if s ≠ s(n), for all 1 ≤ n ≤ N′, 1 ≤ m ≤ M−1, 1 ≤ s ≤ S, 1 ≤ k ≤ K, and 1 ≤ l ≤ L. Also, Ã_{(M−1)(n−1)+m, SKL+m′} is equal to 1 if m = m′ and is equal to 0 if m ≠ m′, for all 1 ≤ m, m′ ≤ M−1 and 1 ≤ n ≤ N′.

Theorem 1. Suppose there is an N′ ≤ N such that rank(Ã^{N′}) = SKL + M − 1. Then l(Ψ) = l(Ψ, D) is strictly concave up to δ-shift, in the sense that

l(λΨ + (1−λ)Ψ′) ≥ λ l(Ψ) + (1−λ) l(Ψ′),   (3)

for any 0 < λ < 1 and any Ψ, Ψ′ ∈ R^{SKL+M}, and equality holds if and only if there exists c ∈ R such that:

δ_m = δ′_m + c for all 1 ≤ m ≤ M, and W^s_{kl} = W′^s_{kl} for all s, k, l.

The proof of this theorem is in the appendix.

Remark 1. We remark that the strictness "up to δ-shift" is natural. A δ-shift results in a shift in the intrinsic utilities of all the products, which does not change the utility difference between products. So such a shift does not affect our outcome. In practice, we may set one of the δ's to be 0, and then our algorithm will converge to a single maximum.

Remark 2. It is easy to see that N′ must be larger than or equal to 1 + SKL/(M−1). The reason we introduce N′ is to avoid cumbersome calculations involving N.

3.2 Identifiability of the Model

In this section, we show that, for the case of unobserved types, our model is identifiable for a certain class of cdfs for the noise in random utility models. 
Let us first specify this class of "nice" cdfs:

Definition 1. Let φ(x) be a smooth pdf defined on R or [0, ∞), and let Φ(x) be the associated cdf. For each i ≥ 1, we write φ^(i)(x) for the ith derivative of φ(x). Let g_i(x) = φ^(i+1)(x) / φ^(i)(x). The function Φ is called nice if it satisfies one of the following two mutually exclusive conditions:

(a) φ(x) is defined on R. For any x1, x2 ∈ R, the sequence g_i(x1)/g_i(x2) converges to some value in R (as i → ∞) only if either x1 = x2, or x1 = −x2 and g_i(x1)/g_i(x2) → −1 as i → ∞.

(b) φ(x) is defined on [0, ∞). For any x1, x2 ≥ 0, the ratio φ^(i)(x1)/φ^(i)(x2) is independent of i for i sufficiently large. Moreover, we require that φ(x1) = φ(x2) if and only if x1 = x2.

This class of nice functions contains normal distributions and exponential distributions. A proof of this fact is included in the appendix.

Identifiability is formalized as follows. Let C = { {γ_s}_{s=1}^S | S ∈ Z_{>0}, γ_s ∈ R_{>0}, Σ_{s=1}^S γ_s = 1 }. Suppose, for two sequences {γ_s}_{s=1}^S and {γ′_s}_{s=1}^{S′}, we have:

Σ_{s=1}^S γ_s Pr(π | X(n), Z, Ψ) = Σ_{s=1}^{S′} γ′_s Pr(π | X(n), Z, Ψ′)   (4)

for all possible orders π of the M products, and for all agents n. Then we must have S = S′ and (up to a permutation of the indices {1, · · · , S}) γ_s = γ′_s and Ψ = Ψ′ (up to δ-shift).

For now, let us fix the number of agent characteristics, K. One observation is that the number x_n(k), for any characteristic k, reflects certain characteristics of agent n. 
Varying the agent n, this quantity x_n(k) ranges over a bounded interval in R. Suppose the collection of data D is sufficiently large. Based on this, assuming that N can be arbitrarily large, we can assume that the x_n(k)'s form a dense subset of a closed interval I_k ⊂ R. Hence, (4) should hold for any X ∈ I_k, leading to the following theorem:

Theorem 2. Define an (M−1) × L matrix Z̃ by setting Z̃_{m,l} = z_m(l) − z_M(l). Suppose the matrix Z̃ has rank L, and suppose

Σ_{s=1}^S γ_s Pr(π | X, Z, Ψ) = Σ_{s=1}^{S′} γ′_s Pr(π | X, Z, Ψ′),   (5)

for all x(k) ∈ I_k and all possible orders π of the M products. Here, the probability measure is associated with a nice cdf. Then we must have S = S′ and (up to a permutation of the indices {1, · · · , S}) γ_s = γ′_s and Ψ = Ψ′ (up to δ-shift).

The proof of this theorem is provided in the appendix. Here, we illustrate the idea for the simple case with two alternatives (M = 2) and no agent or alternative characteristics (K = L = 1). Equation (5) is merely a single identity. Unwrapping the definition, we obtain:

Σ_{s=1}^S γ_s Pr(ε_1 − ε_2 > δ_1 − δ_2 + xW^s(z_1 − z_2)) = Σ_{s=1}^{S′} γ′_s Pr(ε_1 − ε_2 > δ′_1 − δ′_2 + xW′^s(z_1 − z_2)).   (6)

Without loss of generality, we may assume z_1 = 1, z_2 = 0, and δ_2 = 0. We may further assume that the interval I = I_1 contains 0. (Otherwise, we just need to shift I and δ accordingly.) Given this, the problem reduces to the following lemma:

Lemma 1. Let Φ(x) be a nice cdf. Suppose

Σ_{s=1}^S γ_s Φ(δ + xW^s) = Σ_{s=1}^{S′} γ′_s Φ(δ′ + xW′^s),   (7)

for all x in a closed interval I containing 0. 
Then we must have S = S′, δ = δ′ and (up to a permutation of {1, · · · , S}) γ_s = γ′_s and W^s = W′^s.

The proof of this lemma is in the appendix. By applying it to (6), we can show identifiability for the simple case of M = 2 and K = L = 1. Theorem 2 guarantees identifiability in the limit case that we observe agents with characteristics that are dense in an interval. Beyond the theoretical guarantee, we would in practice expect (6) to have a unique solution given enough agents with different characteristics. Lemma 1 itself is a new identifiability result for scalar observations from a set of truncated distributions.

4 RJMCMC for Parameter Estimation

We use a uniform prior for the parameter space and regularize the number of types with a geometric prior. We use a Gibbs sampler, as detailed in the appendix (supplementary material, Algorithm 1), to sample from the posterior. In each of T iterations, we sample utilities u_n for each agent, a matrix ψ_s for each type, and the assignment S(n) of each agent to a type. The utility of each agent for each alternative, conditioned on the data and other parameters, is sampled from a truncated exponential-family (e.g., normal) distribution. In order to sample agent i's utility for alternative j (u_ij), we set thresholds for lower and upper truncation based on agent i's former samples of utility for the two alternatives that are ranked one below and one above alternative j, respectively.

We use reversible-jump MCMC [17] for sampling from the conditional distributions of the assignment function (see Algorithm 1). We consider three possible moves for sampling from the assignment function S(n): 
(1) Increase the number of types by one, through moving a random agent to a new type of its own. The acceptance ratio for this move is

Pr_split = min{1, [Pr(S+1) Pr(M(t+1) | D)] / [Pr(S) Pr(M(t) | D)] · (p_{−1}/p_{+1}) · [(1/(S+1)) / (1/S)] · (1/p(α)) · J_{(t)→(t+1)}},

where M(t) = {u, ψ, B, S, π}(t), J_{(t)→(t+1)} = 2^P is the Jacobian of the transformation from the previous state to the proposed state, and Pr(S) is the prior (regularizer) on the number of types.

(2) Decrease the number of types by one, through merging two random types. The acceptance ratio for the merge move is

Pr_merge = min{1, [Pr(S−1) Pr(M(t+1) | D)] / [Pr(S) Pr(M(t) | D)] · (p_{+1}/p_{−1}) · [(1/(S−1)) / (1/S)] · J_{(t)→(t+1)}}.

(3) Keep the number of types unchanged, and consider moving one random agent from one type to another. This case reduces to standard Metropolis-Hastings where, because of the symmetric normal proposal distribution, the proposal is accepted with probability Pr_mh = min{1, Pr(M(t+1) | D) / Pr(M(t) | D)}.

5 Experimental Study

We evaluate the performance of the algorithm on synthetic data, and on a real-world data set in which we observe agents' characteristics and their orderings over alternatives. For the synthetic data, we generate data with different numbers of types and perform RJMCMC in order to estimate the parameters and number of types. 
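In code, the accept/reject computation for the split move of Section 4 can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: the log-posterior values are taken as given, the arrangement of the proposal factors follows the ratio written above, and all names and numbers are invented.

```python
import math

P = 3  # parameters per type (KL + M in the model; toy value here)

def split_accept_prob(logp_old, logp_new, S, log_prior_ratio,
                      p_split, p_merge, log_q_alpha):
    """Acceptance probability for a split move (S -> S + 1 types).

    Combines the posterior ratio Pr(M(t+1)|D)/Pr(M(t)|D), the prior
    ratio Pr(S+1)/Pr(S), the move and type-selection proposal ratios,
    the density p(alpha) of the auxiliary draw, and the Jacobian 2^P
    of the map psi_s -> (psi_s - alpha, psi_s + alpha).
    """
    log_ratio = (logp_new - logp_old
                 + log_prior_ratio
                 + math.log(p_merge / p_split)            # reverse / forward move prob.
                 + math.log((1.0 / (S + 1)) / (1.0 / S))  # type-selection probabilities
                 - log_q_alpha                            # the 1/p(alpha) factor
                 + P * math.log(2.0))                     # Jacobian 2^P
    return min(1.0, math.exp(log_ratio))

a = split_accept_prob(logp_old=-100.0, logp_new=-98.5, S=3,
                      log_prior_ratio=-0.5, p_split=0.3, p_merge=0.3,
                      log_q_alpha=-2.7)
```

Working in log space, as here, avoids overflow when the posterior ratio spans many orders of magnitude.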
Algorithm 1: RJMCMC to update S(t+1)(n) from S(t)(n)

  Set p_{−1}, p_0, p_{+1}; find S, the number of distinct types in S(t)(n).
  Propose a move ν from {−1, 0, +1} with probabilities p_{−1}, p_0, p_{+1}, respectively.
  case ν = +1:
    Select a random type M_s and an agent n ∈ M_s uniformly. Assign n to a new type M_{s1} and the remainder of M_s to M_{s2}. Draw a vector α ∼ N(0, 1), propose ψ_{s1} = ψ_s − α and ψ_{s2} = ψ_s + α, and compute the proposal {u_n, π_n}(t+1). Accept the split assignments S(t+1)(M_{s1}) and S(t+1)(M_{s2}) with probability Pr_split; on acceptance, update S = S + 1.
  case ν = −1:
    Select two random types M_{s1} and M_{s2}, merge them into one type M_s, propose ψ_s = (ψ_{s1} + ψ_{s2})/2, and compute the proposal {u_n, π_n}(t+1). Accept S(t+1)(n) = s_1 for all n such that S(t)(n) = s_2, with probability Pr_merge; on acceptance, update S = S − 1.
  case ν = 0:
    Select two random types M_{s1} and M_{s2} and move a random agent n from M_{s1} to M_{s2}; compute the proposal {u(n), π(n)}(t+1). Accept S(t+1)(n) = s_2 with probability Pr_mh.
  end switch

The algorithm is implemented in MATLAB and scales linearly in the number of samples and agents. It takes on average 60 ± 5 seconds to generate 50 samples for N = 200, M = 10, K = 4 and L = 3 on a 2.70GHz Intel i5.

Coverage Analysis for the number of types S for Synthetic Data: In this experiment, the data is generated from a randomly chosen number of clusters S for N = 200, K = 3, L = 3 and M = 10, and the posterior on S is estimated using RJMCMC. The prior is chosen to be Pr(S) ∝ exp(−3SKL). We consider a noisy regime by generating data at noise level σ = 1, where all the characteristics (X, Z) are generated from N(0, 1). We repeat the experiment 100 times. Given this, we estimate 60%, 75%, 90% and 95% confidence intervals for the number of types from the posterior samples. We also estimate the coverage percentage, which is defined to be the percentage of runs for which the interval includes the true number of types. 
The simulations show coverage of 61%, 73%, 88% and 93% for the nominal 60%, 75%, 90% and 95% intervals, respectively, which indicates that the method provides reliable intervals for the number of types.

Performance for Synthetic Data: We generate data randomly from a model with between 1 and 4 types. N is set to 200 and M to 10, with K = 4 and L = 3. We draw 10,000 samples from the stationary posterior distribution. The prior for S is chosen to be exp(−αSKL), where α is drawn uniformly in (0, 10). We repeat the experiment 5 times. Table 1 shows that the algorithm successfully attains a larger log posterior when the number of fitted types matches the number of true types.

Clustering Performance for Real World Data: We have tested our algorithm on a sushi dataset, where 5,000 users provide rankings over M = 10 different kinds of sushi [25]. We fit the multi-type GRUM for different numbers of types on 100 randomly chosen subsets of the sushi data of size N = 200, using the same prior as in the synthetic case, and report the performance on the sushi data in Table 1. GRUM with 3 types has significantly better performance in terms of log posterior (with the prior that we chose, the log posterior can be seen as a log likelihood penalized for the number of parameters) than GRUM with one, two or four types. We use K = 4 non-categorical features for agents (age, time for filling the questionnaire, region ID, prefecture ID) and L = 3 features for sushi (price, heaviness, sales volume).

6 Conclusions

Figure 3: Left Panel: 10000 samples for S in Synthetic data, where the true S is 5. 
Right Panel: Histogram of the samples for S, with its maximum at 5 and mean at 4.56.

In this paper, we have proposed an extension of GRUMs in which we allow agents to adopt heterogeneous types. We develop a theory establishing the identifiability of the mixture model when we observe ranking data. Our theoretical results for identifiability show that the number of types and the parameters associated with them can be identified. Moreover, we prove uni-modality of the likelihood (or posterior) function when types are observable. We propose a scalable algorithm for inference, which can be parallelized for use on very large data sets. Our experimental results show that models with multiple types provide a significantly better fit on real-world data. By clustering agents into multiple types, our estimation algorithm allows choices to be correlated across agents of the same type, without making any a priori assumptions on how types of agents are to be partitioned. This use of machine learning techniques complements various approaches in economics [11, 7, 8] by allowing the researcher additional flexibility in dealing with missing data or unobserved agent characteristics. We expect the development of these techniques to grow in importance as large, individual-level datasets become increasingly available. In future research we intend to pursue applications of this method to problems of economic interest.

              Synthetic true types
              One     Two     Three   Four    Sushi
one type      -2069   -2631   -2780   -2907   -2880
two types     -2755   -2522   -2545   -2692   -2849
three types   -2796   -2642   -2582   -2790   -2819
four types    -2778   -2807   -2803   -2593   -2850

Table 1: Performance of the method for different numbers of true types and numbers of types assumed by the algorithm, in terms of log posterior. All standard deviations are between 15 and 20. 
Bold numbers indicate the best performance in their column, with statistical significance at the 95% level.

Acknowledgments

This work is supported in part by NSF Grants No. CCF-0915016 and No. AF-1301976. We thank Elham Azizi for helping with the design and implementation of the RJMCMC algorithm. We thank Simon Lunagomez for helpful discussions on RJMCMC. We thank Lirong Xia, Gregory Lewis, Edoardo Airoldi, Ryan Adams, and Nikhil Agarwal for comments on the modeling and algorithmic aspects of this paper. We thank the anonymous NIPS-13 reviewers for helpful comments and suggestions.

References

[1] Daniel A. Ackerberg. Advertising, learning, and consumer choice in experience goods: An empirical examination. International Economic Review, 44(3):1007–1040, 2003.

[2] N. Atienza, J. Garcia-Heras, and J.M. Muñoz-Pichardo. A new condition for identifiability of finite mixture distributions. Metrika, 63(2):215–221, 2006.

[3] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 126–134, Lake Tahoe, NV, USA, 2012.

[4] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Preference elicitation for generalized random utility models. In Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence (UAI), Bellevue, Washington, USA, 2013.

[5] Patrick Bajari and C. Lanier Benkard. Discrete choice models as structural models of demand: Some economic implications of common approaches. Technical report, Working Paper, 2003.

[6] James Berkovec and John Rust.
A nested logit model of automobile holdings for one vehicle households. Transportation Research Part B: Methodological, 19(4):275–285, 1985.

[7] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, 63(4):841–890, 1995.

[8] Steven Berry, James Levinsohn, and Ariel Pakes. Voluntary export restraints on automobiles: Evaluating a trade policy. The American Economic Review, 89(3):400–430, 1999.

[9] Steven Berry, James Levinsohn, and Ariel Pakes. Differentiated products demand systems from a combination of micro and macro data: The new car market. Journal of Political Economy, 112(1):68–105, 2004.

[10] Steven Berry and Ariel Pakes. Some applications and limitations of recent advances in empirical industrial organization: Merger analysis. The American Economic Review, 83(2):247–252, 1993.

[11] Steven Berry. Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, pages 242–262, 1994.

[12] Edwin Bonilla, Shengbo Guo, and Scott Sanner. Gaussian process preference elicitation. In Advances in Neural Information Processing Systems 23, pages 262–270, 2010.

[13] Stephen P. Brooks, Paulo Giudici, and Gareth O. Roberts. Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1):3–39, 2003.

[14] Wei Chu and Zoubin Ghahramani. Gaussian processes for ordinal regression. Journal of Machine Learning Research, pages 1019–1041, 2005.

[15] Chris Fraley and Adrian E. Raftery. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8):578–588, 1998.

[16] John Geweke, Michael Keane, and David Runkle. Alternative computational approaches to inference in the multinomial probit model.
Review of Economics and Statistics, pages 609–632, 1994.

[17] P.J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.

[18] Bettina Grün and Friedrich Leisch. Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25(2):225–247, 2008.

[19] Jerry A. Hausman. Valuation of new goods under perfect and imperfect competition. In The Economics of New Goods, pages 207–248. University of Chicago Press, 1996.

[20] James J. Heckman and Burton Singer. Econometric duration analysis. Journal of Econometrics, 24(1-2):63–132, 1984.

[21] Igal Hendel and Aviv Nevo. Measuring the implications of sales and consumer inventory behavior. Econometrica, 74(6):1637–1673, 2006.

[22] Katherine Ho. The welfare effects of restricted hospital choice in the US medical care market. Journal of Applied Econometrics, 21(7):1039–1079, 2006.

[23] Neil Houlsby, Jose Miguel Hernandez-Lobato, Ferenc Huszar, and Zoubin Ghahramani. Collaborative Gaussian processes for preference learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 2105–2113, Lake Tahoe, NV, USA, 2012.

[24] Kamel Jedidi, Harsharanjeet S. Jagpal, and Wayne S. DeSarbo. Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16(1):39–59, 1997.

[25] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining (KDD), pages 583–588, Washington, DC, USA, 2003.

[26] Daniel McFadden. The measurement of urban travel demand. Journal of Public Economics, 3(4):303–328, 1974.

[27] Daniel McFadden. Modelling the choice of residential location. In Daniel McFadden, A. Karlqvist, L. Lundqvist, F. Snickars, and J. Weibull, editors, Spatial Interaction Theory and Planning Models, pages 75–96.
New York: Academic Press, 1978.

[28] Damien McParland and Isobel Claire Gormley. Clustering ordinal data via latent variable models. IFCS 2013 Conference of the International Federation of Classification Societies, Tilburg University, The Netherlands, 2013.

[29] Marina Meila and Harr Chen. Dirichlet process mixtures of generalized Mallows models. arXiv preprint arXiv:1203.3496, 2012.

[30] András Prékopa. Logarithmic concave measures and related topics. In Stochastic Programming, pages 63–82. Academic Press, 1980.

[31] Mahlet G. Tadesse, Naijun Sha, and Marina Vannucci. Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association, 100(470):602–617, 2005.