{"title": "A Data-Driven Approach to Modeling Choice", "book": "Advances in Neural Information Processing Systems", "page_first": 504, "page_last": 512, "abstract": "We visit the following fundamental problem: For a ‘generic’ model of consumer choice (namely, distributions over preference lists) and a limited amount of data on how consumers actually make decisions (such as marginal preference information), how may one predict revenues from offering a particular assortment of choices? This problem is central to areas within operations research, marketing and econometrics. We present a framework to answer such questions and design a number of tractable algorithms (from a data and computational standpoint) for the same.", "full_text": "A Data-Driven Approach to Modeling Choice\n\nVivek F. Farias\n\nSrikanth Jagabathula\n\nDevavrat Shah∗\n\nAbstract\n\nWe visit the following fundamental problem: For a ‘generic’ model of consumer choice (namely, distributions over preference lists) and a limited amount of data on how consumers actually make decisions (such as marginal preference information), how may one predict revenues from offering a particular assortment of choices? This problem is central to areas within operations research, marketing and econometrics. We present a framework to answer such questions and design a number of tractable algorithms (from a data and computational standpoint) for the same.\n\n1 Introduction\n\nConsider a seller who must pick from a universe of N products, N, a subset M of products to offer to his customers. The ith product has price p_i. 
Given a probabilistic model of how customers make choices, P(·|·), where P(i|M) is the probability that a potential customer purchases product i when faced with options M, the seller may solve\n\n(1)  max_{M ⊂ N} ∑_{i ∈ M} p_i P(i|M).\n\nIn addition to being a potentially non-trivial optimization problem, one faces a far more fundamental obstacle here: specifying the ‘choice’ model P(·|·) is a difficult task, and it is unlikely that a seller will have sufficient data to estimate a generic such model. Thus, simply predicting expected revenues, R(M) = ∑_{i ∈ M} p_i P(i|M), for a given offer set, M, is a difficult task. This problem, and variants thereof, are central in the fields of marketing, operations research and econometrics. With a few exceptions, the typical approach to dealing with the challenge of specifying a choice model with limited data has been to make parametric assumptions on the choice model that allow for its estimation from a limited amount of data. This approach has a natural deficiency: the implicit assumptions made in specifying a parametric model of choice may not hold. Indeed, for one of the most commonly used parametric models in modern practice (the multinomial logit), it is a simple task to come up with a list of deficiencies, ranging from serious economic fallacies presumed by the model ([5]) to a lack of statistical fit to observed data for real-world problems ([1, 8]). These issues have led to a proliferation of increasingly arcane parametric choice models.\n\nThe present work considers the following question: given a limited amount of data on customer preferences, and assuming only a ‘generic’ model of customer choice, what can one predict about expected revenues from a given set of products? We take as our ‘generic’ model of customer choice the set of distributions over all possible customer preference lists (i.e. 
all possible permutations of N). We will subsequently see that essentially all extant models of customer choice can be viewed as a special case of this generic model. We view ‘data’ as some linear transformation of the distribution specifying the choice model, yielding marginal information. Again, we will see that this view is consistent with reality.\n\n∗VF and DS are affiliated with the ORC; VF with the Sloan School of Management; SJ and DS with LIDS and the Department of EECS at MIT. Emails: vfarias, jskanth, devavrat@mit.edu. The work was supported in part by NSF CAREER CNS 0546590.\n\nGiven these views, we first consider finding the ‘simplest’ choice model consistent with the observed marginal data on customer preferences. Here we take as our notion of simple a distribution over permutations of N with the sparsest support. We present two simple abstract properties that, if satisfied by the ‘true’ choice model, allow us to solve the sparsest-fit problem exactly via a simple combinatorial procedure (Theorem 2). In fact, the sparsest fit in this case coincides with the true model (Theorem 1). We present a generative family of choice models that illustrates when the two properties we posit may be expected to hold (see Theorem 3). More generally, when we may not anticipate the above abstract properties, we seek to find a ‘worst-case’ distribution consistent with the observed data, in the sense that this distribution yields minimum expected revenues for a given offer set M while remaining consistent with the observed marginal data. This entails solving mathematical programs with as many variables as there are permutations (N!). In spite of this, we present a simple efficient procedure to solve this problem that is exact for certain interesting types of data and produces approximations (and computable error bounds) in general. 
Finally, we present a computational study illustrating the efficacy of our approach relative to a parametric technique on a real-world data set.\n\nOur main contribution is thus a novel approach to modeling customer choice given limited data. The approach we propose is complemented with efficient, implementable algorithms. These algorithms yield subroutines that make non-parametric revenue predictions for any given offer set (i.e. predict R(M) for any M) given limited data. Such subroutines could then be used in conjunction with generic set-function optimization heuristics to solve (1).\n\nRelevant Literature: There is a vast body of literature on the parametric modeling of customer choice; a seminal paper in this regard is [10]. See also [14] and references therein for an overview of the area with an emphasis on applications. There is a stream of research (e.g. [6]) on estimating and optimizing (parametric) choice models when products possess measurable attributes that are the sole influencers of choice; we do not assume the availability of such attributes and thus do not consider this situation here. A non-parametric approach to choice modeling is considered by [12]; that work studies a somewhat distinct pricing problem, and assumes the availability of a specific type of rich observable data. Fitting a sparsest model to observable data has recently become of great interest in the area of compressive sensing in signal processing [3, 7], and in the design of sketches for streaming algorithms [2, 4]. This work focuses on deriving precise conditions on the support size of the true model which, when satisfied, guarantee that the sparsest solution is indeed the true solution. However, these prior methods do not apply in the present context (see [9]); therefore, we take a distinct approach to the problem in this paper.\n\n2 The Choice Model and Problem Formulations\n\nWe consider a universe of N products, N = {0, 1, 2, . . . 
, N − 1}. We assume that the 0th product in N corresponds to the ‘outside’ or ‘no-purchase’ option. A consumer is associated with a permutation σ of the elements of N; the customer prefers product i to product j iff σ(i) < σ(j). Given that the customer is faced with a set of alternatives M ⊂ N, she chooses to purchase her single most preferred product among those in M. In particular, she purchases argmin_{i ∈ M} σ(i).\n\nChoice Model: We take as our model of customer choice a distribution, λ : S_N → [0, 1], over all possible permutations (i.e. the set of all permutations S_N). Define the set\n\nS_j(M) = {σ ∈ S_N : σ(j) < σ(i), ∀ i ∈ M, i ≠ j}\n\nas the set of all customer types that would result in a purchase of j when the offer set is M. Our choice model is thus\n\nP(j|M) = ∑_{σ ∈ S_j(M)} λ(σ) =: λ_j(M).\n\nThis model subsumes a vast body of extant parametric choice models.\n\nRevenues: We associate every product in N with a retail price p_j. Of course, p_0 = 0. The expected revenue to a retailer from offering a set of products M to his customers under our choice model is thus given by R(M) = ∑_{j ∈ M} p_j λ_j(M).\n\nData: A seller will have limited data with which to estimate λ. We simply assume that the data observed by the seller is given by an m-dimensional ‘partial information’ vector y = Aλ, where A ∈ {0, 1}^{m × N!} makes precise the relationship between the observed data and the underlying choice model. For the purposes of illustration, we consider the following concrete examples of data vectors y:\n\nRanking Data: This data represents the fraction of customers that rank a given product i as their rth choice. Here the partial information vector y is indexed by (i, r) with 0 ≤ i, r ≤ N. For each i, r, y_{ri} denotes the probability that product i is ranked at position r. 
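To make the choice model and revenue function above concrete, here is a minimal sketch (a toy illustration with hypothetical prices, not code from the paper) that evaluates λ_j(M) and R(M) by direct summation over the support of λ:

```python
def choice_probs(lam, M):
    """lambda_j(M): total mass of permutations that rank j highest in the
    offer set M. lam maps a permutation sigma (a tuple with sigma[i] = rank
    of product i; lower rank = more preferred) to its probability."""
    probs = {j: 0.0 for j in M}
    for sigma, w in lam.items():
        purchased = min(M, key=lambda i: sigma[i])  # argmin_{i in M} sigma(i)
        probs[purchased] += w
    return probs

def revenue(lam, M, prices):
    """R(M) = sum_{j in M} p_j * lambda_j(M); product 0 is the no-purchase
    option, so prices[0] = 0."""
    probs = choice_probs(lam, M)
    return sum(prices[j] * probs[j] for j in M)
```

For instance, with lam = {(1, 0, 2): 0.6, (0, 1, 2): 0.4} and prices = [0, 5, 3], offering M = {0, 1, 2} gives expected revenue 5 · 0.6 = 3.0, while M = {0, 2} drives every customer to the no-purchase option.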
The matrix A is thus in {0, 1}^{N² × N!}. For a column of A corresponding to the permutation σ, A(σ), we will thus have A(σ)_{ri} = 1 iff σ(i) = r.\n\nComparison Data: This data represents the fraction of customers that prefer a given product i to a product j. The partial information vector y is indexed by (i, j) with 0 ≤ i, j ≤ N, i ≠ j. For each i, j, y_{ij} denotes the probability that product i is preferred to product j. The matrix A is thus in {0, 1}^{N(N−1) × N!}. A column of A, A(σ), will thus have A(σ)_{ij} = 1 if and only if σ(i) < σ(j).\n\nTop Set Data: This data refers to a concatenation of the “Comparison Data” and information on the fraction of customers who have a given product i as their topmost choice, for each i. Thus A′ = [A_1′ A_2′], where A_1 is simply the A matrix for comparison data, and A_2 ∈ {0, 1}^{N × N!} has A_2(σ)_i = 1 iff σ(i) = 1.\n\nMany other types of data vectors consistent with the above view are possible; all we anticipate is that the dimension of the observed data, m, is substantially smaller than N!. We are now in a position to formulate the questions broached in the previous section precisely:\n\n“Simplest” Model: In finding the simplest choice model consistent with the observed data, we attempt to solve:\n\n(2)  minimize ||λ||_0 subject to Aλ = y, 1′λ = 1, λ ≥ 0.\n\nRobust Approach: For a given offer set M and data vector y, what are the minimal expected revenues we might expect from M consistent with the observed data? To answer this question, we attempt to solve:\n\n(3)  minimize_λ ∑_{j ∈ M} p_j λ_j(M) subject to Aλ = y, 1′λ = 1, λ ≥ 0.\n\n3 Estimating Sparse Choice Models\n\nHere we consider finding the sparsest model consistent with the observed data (i.e. problem (2)). 
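For small N, the observation matrix A and the data vector y = Aλ for the first two data types can be materialized directly; a hedged sketch (helper names are ours, not the paper's):

```python
def ranking_column(sigma):
    """A(sigma) for ranking data: entry (r, i) is 1 iff sigma(i) = r,
    flattened row-major to length N^2."""
    N = len(sigma)
    col = [0.0] * (N * N)
    for i, r in enumerate(sigma):
        col[r * N + i] = 1.0
    return col

def comparison_column(sigma):
    """A(sigma) for comparison data: entry (i, j), i != j, is 1 iff
    sigma(i) < sigma(j), i.e. product i is preferred to product j."""
    N = len(sigma)
    return [1.0 if sigma[i] < sigma[j] else 0.0
            for i in range(N) for j in range(N) if i != j]

def data_vector(lam, column_fn):
    """y = A lambda, computed as the lambda-weighted sum of columns A(sigma)."""
    cols = [(column_fn(sigma), w) for sigma, w in lam.items()]
    m = len(cols[0][0])
    return [sum(w * col[d] for col, w in cols) for d in range(m)]
```

Each ranking-data column contains exactly N ones (one per rank), so the entries of y sum to N for any distribution λ; this is a quick consistency check on real data.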
We face two questions: (a) Why is sparsity an interesting criterion? (b) Is there an efficient procedure to solve the program in (2)? We begin by identifying two simple conditions that define a class of choice models (i.e. a class of distributions λ). Assuming that the ‘true’ underlying model λ belongs to this class, we prove that the sparsest model (i.e. the solution to (2)) is in fact this true model. This answers the first question. We then propose a simple procedure inspired by [9] that correctly solves the program in (2) assuming these conditions. It is difficult to expect the program in (2) to recover the true solution in general (see [9] for a justification). Nonetheless, we show that the conditions we impose are not overly restrictive: we prove that a “sufficiently” sparse model generated uniformly at random from the set of all possible choice models satisfies the two conditions with high probability.\n\nBefore we describe the conditions we impose on the true underlying distribution, we introduce some notation. Let λ denote the true underlying distribution, and let K denote the support size, ||λ||_0. Let σ_1, σ_2, . . . , σ_K denote the permutations in the support, i.e., λ(σ_i) ≠ 0 for 1 ≤ i ≤ K, and λ(σ) = 0 for all σ ≠ σ_i, 1 ≤ i ≤ K. Recall that y is of dimension m and we index its elements by d. The two conditions we impose are as follows:\n\nSignature Condition: For every permutation σ_i in the support, there exists a d(i) ∈ {1, 2, . . . 
, m} such that A(σ_i)_{d(i)} = 1 and A(σ_j)_{d(i)} = 0 for every j ≠ i, 1 ≤ i, j ≤ K. In other words, for each permutation σ_i in the support, y_{d(i)} serves as its ‘signature’.\n\nLinear Independence Condition: ∑_{i=1}^{K} c_i λ(σ_i) ≠ 0 for any integers c_i, not all zero, with |c_i| ≤ C, where C is a sufficiently large number ≥ K. This condition is satisfied with probability 1 if [λ_1 λ_2 . . . λ_K]′ is drawn uniformly from the K-dimensional simplex.\n\nWhen the two conditions are satisfied, the sparsest solution is indeed the true solution, as stated in the following theorem:\n\nTheorem 1. Suppose we are given y = Aλ and λ satisfies the “Signature” condition and the “Linear Independence” condition. Then, λ is the unique solution to the program in (2).\n\nThe proof of Theorem 1 is given in the appendix. Next we describe the algorithm we propose for recovery. The algorithm takes y and A as the input and outputs λ_i (which denotes λ(σ_i)) and A(σ_i) for every permutation σ_i in the support. The algorithm assumes the observed values y_d are sorted; therefore, without loss of generality, assume that y_1 < y_2 < . . . < y_m. Then, the algorithm is as follows:\n\nAlgorithm:\nInitialization: λ_0 = 0, k(0) = 0, and A(σ_i)_d = 0 for 1 ≤ i ≤ K, 1 ≤ d ≤ m.\nfor d = 1 to m\n    if y_d = ∑_{i ∈ T} λ_i for some T ⊆ {1, . . . , k(d − 1)}\n        k(d) = k(d − 1), A(σ_i)_d = 1 for all i ∈ T\n    else\n        k(d) = k(d − 1) + 1, λ_{k(d)} = y_d, A(σ_{k(d)})_d = 1\n    end if\nend for\nOutput K = k(m) and (λ_i, A(σ_i)), 1 ≤ i ≤ K.\n\nNow, we have the following theorem:\n\nTheorem 2. Suppose we are given y = Aλ and λ satisfies the “signature” and the “linear independence” conditions. 
Then, the above described algorithm recovers λ.\n\nTheorem 2 is proved in the appendix. The algorithm we have described either succeeds in finding a valid λ or else determines that the two properties are not satisfied. We now show that the conditions we have imposed do not restrict the class of plausible models severely. For this, we show that models drawn from the following generative model satisfy the conditions with high probability.\n\nGenerative Model. Given K and an interval [a, b] on the positive real line, we generate a choice model λ as follows: choose K permutations, σ_1, σ_2, . . . , σ_K, uniformly at random with replacement; choose K numbers uniformly at random from the interval [a, b]; normalize the numbers so that they sum to 1, and assign them to the permutations σ_i, 1 ≤ i ≤ K. For all other permutations σ ≠ σ_i, λ(σ) = 0. Note that, since we are choosing permutations in the support with replacement, there could be repetitions. However, for large N and K ≪ N!, this happens with vanishing probability.\n\nDepending on the observed data, we characterize values of the sparsity K for which distributions generated by the above generative model can be recovered with high probability. Specifically, we have the following theorem for the three forms of observed data mentioned in Section 2. The proof may be found in the appendix.\n\nTheorem 3. Suppose λ is a choice model of support size K drawn from the generative model. Then, λ satisfies the “signature” and “linear independence” conditions with probability 1 − o(1) as N → ∞ provided K = O(N) for ranking data, K = o(log N) for comparison data, and K = o(√N) for the top set data.\n\nOf course, in general, the underlying choice model may not satisfy the two conditions we have posited or be exactly recoverable from the observed data. 
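The recovery algorithm of Section 3 admits a direct transcription; the sketch below (our naive implementation, with an exponential subset-sum check that is fine only for small support sizes) scans the sorted data values and either explains each y_d as a subset sum of previously recovered λ_i, or declares it the signature of a new support permutation:

```python
from itertools import combinations

def recover_sparse(y, tol=1e-9):
    """Recover (lambda_i, signature rows) from the data values y, assuming
    the signature and linear independence conditions hold. Returns the list
    of recovered lambda_i and, for each i, the set of rows d with
    A(sigma_i)_d = 1."""
    lams, rows = [], []
    for d, yd in enumerate(sorted(y)):
        explained = None
        # Does yd equal a subset sum of the lambdas found so far?
        for size in range(1, len(lams) + 1):
            for T in combinations(range(len(lams)), size):
                if abs(yd - sum(lams[i] for i in T)) < tol:
                    explained = T
                    break
            if explained is not None:
                break
        if explained is not None:
            for i in explained:          # yd is explained by existing support
                rows[i].add(d)
        else:                            # yd is the signature of a new sigma
            lams.append(yd)
            rows.append({d})
    return lams, rows
```

On y = [0.2, 0.3, 0.5] this recovers two support permutations with weights 0.2 and 0.3; the third observation is explained as their sum.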
In order to deal with this more general scenario, we next propose an approach that implicitly identifies a ‘worst-case’ distribution consistent with the observed data.\n\n4 Robust Revenue Estimates Consistent with Data\n\nIn this section, we propose a general algorithm for the solution of program (3). This LP has N! variables and is clearly not amenable to direct solution; hence we consider its dual. In preparation for taking the dual, let A_j(M) := {A(σ) : σ ∈ S_j(M)}, where, recall that, S_j(M) denotes the set of all permutations that result in the purchase of j ∈ M when offered the assortment M. Since S_N = ∪_{j ∈ M} S_j(M), we have implicitly specified a partition of the columns of the matrix A. Armed with this notation, the dual of (3) is:\n\n(4)  maximize_{α, ν} α′y + ν subject to max_{x_j ∈ A_j(M)} (α′x_j + ν) ≤ p_j, for each j ∈ M.\n\nOur solution procedure will rely on an effective representation of the sets A_j(M).\n\n4.1 A Canonical Representation of A_j(M) and its Application\n\nWe assume that every set S_j(M) can be expressed as a disjoint union of D_j sets. We denote the dth such set by S_{jd}(M) and let A_{jd}(M) be the corresponding set of columns. Consider the convex hull of the set A_{jd}(M), conv{A_{jd}(M)} =: ¯A_{jd}(M). ¯A_{jd}(M) is by definition a polytope contained in the m-dimensional unit cube, [0, 1]^m. In other words,\n\n(5)  ¯A_{jd}(M) = {x_{jd} : A^{jd}_1 x_{jd} = b^{jd}_1, A^{jd}_2 x_{jd} ≥ b^{jd}_2, A^{jd}_3 x_{jd} ≤ b^{jd}_3},\n\nfor appropriately defined A^{jd}_·, b^{jd}_·. By a canonical representation of A_j(M), we will thus understand a partition of S_j(M) and a polyhedral representation of the columns corresponding to every set in the partition as given by (5). 
Ignoring the problem of actually obtaining this representation for now, we assume access to a canonical representation and present a simple program, whose size is polynomial in the size of this representation, that is equivalent to (3), (4). For simplicity of notation, we assume that each of the polytopes ¯A_{jd}(M) is in standard form, i.e. ¯A_{jd}(M) = {x_{jd} : A^{jd} x_{jd} = b^{jd}, x_{jd} ≥ 0}. Now, since an affine function is always optimized at the vertices of a polytope, we know:\n\nmax_{x_j ∈ A_j(M)} (α′x_j + ν) = max_d max_{x_{jd} ∈ ¯A_{jd}(M)} (α′x_{jd} + ν).\n\nWe have thus reduced (4) to a ‘robust’ LP. By strong duality we have:\n\n(6)  max_{x_{jd} ∈ ¯A_{jd}(M)} (α′x_{jd} + ν) = [maximize_{x_{jd}} α′x_{jd} + ν subject to A^{jd} x_{jd} = b^{jd}, x_{jd} ≥ 0] = [minimize_{γ_{jd}} b^{jd}′γ_{jd} + ν subject to γ_{jd}′A^{jd} ≥ α′].\n\nWe have thus established the following useful equality:\n\n{α, ν : max_{x_j ∈ ¯A_j(M)} (α′x_j + ν) ≤ p_j} = {α, ν : there exist γ_{jd} with b^{jd}′γ_{jd} + ν ≤ p_j and γ_{jd}′A^{jd} ≥ α′, d = 1, 2, . . . , D_j}.\n\nIt follows that solving (3) is equivalent to the following LP, whose complexity is polynomial in the description of our canonical representation:\n\n(7)  maximize_{α, ν, γ} α′y + ν subject to b^{jd}′γ_{jd} + ν ≤ p_j and γ_{jd}′A^{jd} ≥ α′, for all j ∈ M, d = 1, 2, . . . , D_j.\n\nOur ability to solve (7) relies on our ability to produce an efficient canonical representation of S_j(M). In what follows, we first consider an example where such a representation is readily available, and then consider the general case.\n\nCanonical Representation for Ranking Data: Recall the definition of ranking data from Section 2. Consider partitioning S_j(M) into N sets, wherein the dth set is given by S_{jd}(M) = {σ ∈ S_j(M) : σ(j) = d}. It is not difficult to show that the set A_{jd}(M) is equal to the set of all vectors x^{jd} in {0, 1}^{N²} satisfying:\n\n(8)  ∑_{r=0}^{N−1} x^{jd}_{ri} = 1 for 0 ≤ i ≤ N − 1;  ∑_{i=0}^{N−1} x^{jd}_{ri} = 1 for 0 ≤ r ≤ N − 1;  x^{jd}_{ri} ∈ {0, 1} for 0 ≤ i, r ≤ N − 1;  x^{jd}_{dj} = 1;  x^{jd}_{d′i} = 0 for all i ∈ M, i ≠ j, and 0 ≤ d′ < d.\n\nOur goal is, of course, to find a description for ¯A_{jd}(M) of the type (5). Now consider replacing the third (integrality) constraint in (8) with simply the non-negativity constraint x^{jd}_{ri} ≥ 0. It is clear that the resulting polytope contains ¯A_{jd}(M). In addition, one may show that the resulting polytope has integral vertices, since it is simply a matching polytope with some variables forced to be integers, so that in fact the polytope is precisely ¯A_{jd}(M), and we have our canonical representation. Further, notice that this representation yields an efficient algorithm to solve (3) via (7)!\n\n4.2 Computing a Canonical Representation: Comparison Data\n\nRecall the definition of comparison data from Section 2. We use this data as an example to illustrate a general procedure for computing a canonical representation. 
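For tiny N, one can sanity-check the ranking-data representation by brute force: over 0/1 permutation matrices, the constraints in (8) select exactly the columns corresponding to S_jd(M). A sketch (function names are ours; the integrality of the relaxation is not tested here):

```python
from itertools import permutations

def S_jd(N, M, j, d):
    """All sigma in S_j(M) with sigma(j) = d: product j has rank d and beats
    every other offered product (sigma[i] is the rank of product i)."""
    return [s for s in permutations(range(N))
            if s[j] == d and all(s[j] < s[i] for i in M if i != j)]

def to_matrix(sigma):
    """The ranking-data column A(sigma) as a matrix: x[r][i] = 1 iff sigma(i) = r."""
    N = len(sigma)
    x = [[0] * N for _ in range(N)]
    for i, r in enumerate(sigma):
        x[r][i] = 1
    return x

def satisfies_8(x, M, j, d):
    """Check a 0/1 matrix x against the constraints in (8)."""
    N = len(x)
    ranks_ok = all(sum(x[r]) == 1 for r in range(N))                       # each rank used once
    prods_ok = all(sum(x[r][i] for r in range(N)) == 1 for i in range(N))  # each product ranked once
    pinned = x[d][j] == 1                                                  # sigma(j) = d
    blocked = all(x[dp][i] == 0                                            # no offered rival above d
                  for i in M if i != j for dp in range(d))
    return ranks_ok and prods_ok and pinned and blocked
```

Enumerating all permutations for a small instance and comparing membership in S_jd(M) against `satisfies_8` confirms the two descriptions agree on integral points.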
Consider S_j(M). It is not difficult to see that the corresponding set of columns A_j(M) is equal to the set of vectors x^j in {0, 1}^{(N−1)N} satisfying the following constraints:\n\n(9)  x^j_{il} ≥ x^j_{ik} + x^j_{kl} − 1 for all i, k, l ∈ N, i ≠ k ≠ l;  x^j_{ik} + x^j_{ki} = 1 for all i, k ∈ N, i ≠ k;  x^j_{ji} = 1 for all i ∈ M, i ≠ j;  x^j_{ik} ∈ {0, 1} for all i, k ∈ N, i ≠ k.\n\nBriefly, the second constraint follows since for any i, k, i ≠ k, either σ(i) > σ(k) or else σ(i) < σ(k). The first constraint enforces transitivity: σ(i) < σ(k) and σ(k) < σ(l) together imply σ(i) < σ(l). The third constraint enforces that all σ ∈ S_j(M) must satisfy σ(j) < σ(i) for all i ∈ M. Now consider the polytope obtained by relaxing the fourth (integrality) constraint to simply x^j_{ik} ≥ 0. Call this polytope ¯Ao_j(M). Of course, we must have ¯Ao_j(M) ⊇ ¯A_j(M). Unlike the case of ranking data, however, ¯Ao_j(M) can in fact be shown to be non-integral, so that ¯Ao_j(M) ≠ ¯A_j(M) in general. In this case we resort to the following procedure.\n\n[1.] Solve (7) using the representation of ¯Ao_j(M) in place of ¯A_j(M). This yields a lower bound on (3) since ¯Ao_j(M) ⊃ ¯A_j(M). Call the corresponding solution α(1), ν(1).\n\n[2.] Solve the optimization problem max α(1)′x_j subject to x_j ∈ ¯Ao_j(M) for each j. If the optimal solution x̂_j is integral for each j, then stop; the solution computed in the first step is in fact optimal.\n\n[3.] Let x̂^j_{ik} be a non-integral variable. Partition S_j(M) on this variable, i.e. define S_{j1}(M) = {σ ∈ S_j(M) : σ(i) < σ(k)} and S_{j2}(M) = {σ ∈ S_j(M) : σ(i) > σ(k)}. Define outer-approximations to ¯A_{j1}(M) and ¯A_{j2}(M) as the projection of ¯Ao_j(M) on x^j_{ik} = 1 and x^j_{ik} = 0 respectively. Go to step 1.\n\nThe above procedure is finite, but the size of the LP we solve at each iteration doubles. Nonetheless, each iteration produces a lower bound to (3) whose quality is easily measured (for instance, by solving the maximization version of (3) using the same procedure), and this quality improves with each iteration. In our computational experiments with a related type of data, it sufficed to stop after a single iteration.\n\n5 An Empirical Evaluation of the Approach\n\nWe have presented simple sub-routines to estimate the revenues R(M) from a particular offer set M, given marginal preference data y. These sub-routines are effectively ‘non-parametric’ and can form the basis of a procedure that solves the revenue optimization problem posed in the introduction. Here we seek to contrast this approach with a commonly used parametric approach. We consider two types of observable data: ranking data, and a ‘censored’ version of the comparison data which gives us, for every pair of products i, j ≠ 0, the fraction of customers that prefer i to j and in addition prefer i to 0 (i.e. not buying). The latter type of data is quite realistic.\n\nThe parametric recipe we consider is the following: one fits a Multinomial Logit (MNL) model to the observable data and picks an optimal offer set by evaluating R(M) = ∑_{j ∈ M} p_j P(j|M) assuming P(·|M) follows the estimated model. The MNL is a commonly used parametric model that associates with each product i in N a positive scalar w_i; w_0 = 1 by convention. The model assumes P(i|M) = w_i / ∑_{j ∈ M} w_j. 
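The parametric recipe is a few lines for small N; a sketch with hypothetical weights and prices (the brute-force search below stands in for the optimization procedure of [13], which we have not reproduced):

```python
from itertools import combinations

def mnl_probs(w, M):
    """MNL choice probabilities: P(i|M) = w_i / sum_{j in M} w_j, with the
    no-purchase option 0 always offered and w[0] = 1 by convention."""
    Z = sum(w[j] for j in M)
    return {i: w[i] / Z for i in M}

def mnl_revenue(w, prices, M):
    """Expected revenue under the fitted MNL: R(M) = sum_{j in M} p_j P(j|M)."""
    probs = mnl_probs(w, M)
    return sum(prices[j] * probs[j] for j in M)

def best_assortment(w, prices, capacity):
    """Brute-force version of (1) under the MNL for small N: try every subset
    of {1, ..., N-1} of size at most `capacity` (product 0 is always in M)."""
    N = len(w)
    best_rev, best_M = 0.0, frozenset({0})
    for k in range(1, capacity + 1):
        for S in combinations(range(1, N), k):
            M = frozenset(S) | {0}
            rev = mnl_revenue(w, prices, M)
            if rev > best_rev:
                best_rev, best_M = rev, M
    return best_rev, best_M
```

With w = [1, 1, 2] and prices = [0, 4, 3], offering everything yields revenue 4 · 1/4 + 3 · 1/2 = 2.5, which here is also the capacity-2 optimum.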
In place of making this parametric assumption, we could instead evaluate R(M) using the robust sub-routine developed in the previous section and pick M to maximize this conservative estimate. It is clear that if the MNL model is a poor fit to the true choice model, P, our robust approach is likely to outperform the parametric approach substantially. Instead, what we focus on here is what happens if the MNL model is a perfect fit to the true choice model. In this case, the parametric approach is the best possible. How sub-optimal is our non-parametric approach here?\n\nWe consider an MNL model on N = 25 products. The model and prices were specified using customer utilities for Amazon.com's highest selling DVDs (and their prices) during a 3-month period from 1 July 2005 to 30 September 2005, estimated by [13].¹ We generate synthetic observed data (of both the ranking type and the comparison type) according to this fitted MNL model. This represents a scenario where the fitted MNL is a perfect descriptor of reality. We conduct the following experiments:\n\nQuality of Revenue Predictions: For each type of observable data, we compute our estimate of the minimum value that R(M) can take on, consistent with that data, by solving (3). We compare this with the value of R(M) predicted under the MNL model (which, in this case, is exact). Figures 1(b) and 1(d) compare these two quantities for a set of randomly chosen subsets M of the 25 potential DVDs, assuming ranking data and the censored comparison data respectively. In both cases, our procedure produces excellent predictions of expected revenue without making the assumptions on P(·|·) inherent in the MNL model.\n\nQuality of Optimal Solutions to Revenue Maximization Problems: For each type of observable data, we compute optimal offer sets M of varying capacities assuming the fitted MNL model and an optimization procedure described in [13]. 
We then evaluate the revenue predictions for these optimal offer sets by solving (3). Figures 1(a) and 1(c) plot these estimates for the two types of observable data. The gap between the ‘MNL’ and the ‘MIN’ curves is thus an upper bound on the expected revenue loss if one used our non-parametric procedure to pick an optimal offer set M over the parametric procedure (which in this setting is optimal). Again, we see that the revenue loss is surprisingly small.\n\n6 Conclusion and Potential Future Directions\n\nWe have presented a general framework that allows us to answer questions related to how consumers choose among alternatives, using limited observable data and without making additional parametric assumptions. The approaches we have proposed are feasible from a data availability standpoint as well as a computational standpoint, and provide a much needed non-parametric ‘sub-routine’ for the revenue optimization problems described at the outset. 
This paper also opens up the potential for a stream of future work.\n\n¹The problem of optimizing over M is particularly relevant to Amazon.com given limited screen real-estate and cannibalization effects.\n\n[Figure 1: Expected Revenue (dollars) under the fitted MNL model (‘MNL Expected Revenue’) versus the robust lower bound from (3) (‘MIN Expected Revenue’). (a) Ranking Data: Optimal M (x-axis: optimal MNL assortment size); (b) Ranking Data: Random M (x-axis: assortment index); (c) Comparison Data: Optimal M; (d) Comparison Data: Random M.]\n\nReferences\n\n[1] K. Bartels, Y. Boztug, and M. M. Muller. Testing the multinomial logit model. Working Paper, 1999.\n\n[2] R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff, and M. J. Strauss. Combining geometry and combinatorics: A unified approach to sparse signal recovery. Preprint, 2008.\n\n[3] E. J. Candes, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 2006.\n\n[4] G. Cormode and S. Muthukrishnan. Combinatorial algorithms for compressed sensing. Lecture Notes in Computer Science, 4056:280, 2006.\n\n[5] G. Debreu. Review of R. D. Luce, ‘Individual Choice Behavior: A Theoretical Analysis’. American Economic Review, 50:186–188, 1960.\n\n[6] G. Dobson and S. Kalish. Positioning and pricing a product line. Marketing Science, 7(2):107–125, 1988.\n\n[7] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.\n\n[8] J. L. Horowitz. Semiparametric estimation of a work-trip mode choice model. Journal of Econometrics, 58:49–70, 1993.\n\n[9] S. Jagabathula and D. Shah. Inferring rankings under constrained sensing. In NIPS, 2008.\n\n[10] D. McFadden. Econometric models for probabilistic choice among products. The Journal of Business, 53(3):S13–S29, 1980.\n\n[11] E. H. McKinney. 
Generalized birthday problem. American Mathematical Monthly, pages 385–387, 1966.\n\n[12] P. Rusmevichientong, B. Van Roy, and P. Glynn. A nonparametric approach to multi-product pricing. Operations Research, 54(1), 2006.\n\n[13] P. Rusmevichientong, Z. J. Shen, and D. B. Shmoys. Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Technical report, Working Paper, 2008.\n\n[14] Kalyan T. Talluri and Garrett J. van Ryzin. The Theory and Practice of Revenue Management. Springer Science+Business Media, 2004.\n", "award": [], "sourceid": 460, "authors": [{"given_name": "Vivek", "family_name": "Farias", "institution": null}, {"given_name": "Srikanth", "family_name": "Jagabathula", "institution": null}, {"given_name": "Devavrat", "family_name": "Shah", "institution": null}]}