{"title": "Gaussian Process Preference Elicitation", "book": "Advances in Neural Information Processing Systems", "page_first": 262, "page_last": 270, "abstract": "Bayesian approaches to preference elicitation (PE) are particularly attractive due to their ability to explicitly model uncertainty in users' latent utility functions. However, previous approaches to Bayesian PE have ignored the important problem of generalizing from previous users to an unseen user in order to reduce the elicitation burden on new users. In this paper, we address this deficiency by introducing a Gaussian Process (GP) prior over users' latent utility functions on the joint space of user and item features. We learn the hyper-parameters of this GP on a set of preferences of previous users and use it to aid in the elicitation process for a new user. This approach provides a flexible model of a multi-user utility function, facilitates an efficient value of information (VOI) heuristic query selection strategy, and provides a principled way to incorporate the elicitations of multiple users back into the model. We show the effectiveness of our method in comparison to previous work on a real dataset of user preferences over sushi types.", "full_text": "Gaussian Process Preference Elicitation\n\nEdwin V. Bonilla, Shengbo Guo, Scott Sanner\n\nNICTA & ANU, Locked Bag 8001, Canberra ACT 2601, Australia\n\n{edwin.bonilla, shengbo.guo, scott.sanner}@nicta.com.au\n\nAbstract\n\nBayesian approaches to preference elicitation (PE) are particularly attractive due\nto their ability to explicitly model uncertainty in users\u2019 latent utility functions.\nHowever, previous approaches to Bayesian PE have ignored the important prob-\nlem of generalizing from previous users to an unseen user in order to reduce the\nelicitation burden on new users. 
In this paper, we address this de\ufb01ciency by in-\ntroducing a Gaussian Process (GP) prior over users\u2019 latent utility functions on the\njoint space of user and item features. We learn the hyper-parameters of this GP on\na set of preferences of previous users and use it to aid in the elicitation process for\na new user. This approach provides a \ufb02exible model of a multi-user utility func-\ntion, facilitates an ef\ufb01cient value of information (VOI) heuristic query selection\nstrategy, and provides a principled way to incorporate the elicitations of multiple\nusers back into the model. We show the effectiveness of our method in comparison\nto previous work on a real dataset of user preferences over sushi types.\n\n1\n\nIntroduction\n\nPreference elicitation (PE) is an important component of interactive decision support systems that\naim to make optimal recommendations to users by actively querying their preferences. A crucial\nrequirement for PE systems is that they should be able to make optimal or near optimal recommen-\ndations based only on a small number of queries. In order to achieve this, a PE system should (a)\nmaintain a \ufb02exible representation of the user\u2019s utility function; (b) handle uncertainty in a principled\nmanner; (c) select queries that allow the system to discriminate amongst the highest utility items;\nand (d) allow for the incorporation of prior knowledge from different sources.\nWhile previous Bayesian PE approaches have addressed (a), (b) and (c), they appear to ignore an\nimportant aspect of (d) concerning generalization from previous users to a new unseen user in order\nto reduce the elicitation burden on new users. In this paper we propose a Bayesian PE approach\nto address (a)\u2013(d), including generalization to new users, in an elegant and principled way. 
Our\napproach places a (correlated) Gaussian process (GP) prior over the latent utility functions on the\njoint space of user features (T , mnemonic for tasks) and item features (X ). User preferences over\nitems are then seen as drawn from the comparison of these utility function values.\nThe main advantages of our GP-based Bayesian PE approach are as follows. First, due to the\nnon-parametric Bayesian nature of GPs, we have a flexible model of the user\u2019s utility function that can\nhandle uncertainty and incorporate evidence straightforwardly. Second, by having a GP over the\njoint T \u00d7 X space, we can integrate prior knowledge on user similarity or item similarity, or simply\nhave more general-purpose covariances whose parameterization can be learned from observed\npreferences of previous users (i.e. achieving integration of multi-user information). Finally, our approach\ndraws from concepts in the Gaussian process optimization and decision-making literature [1, 2] to\npropose a Bayesian decision-theoretic PE approach. Here the required expected value of information\ncomputations can be derived in closed-form to facilitate the selection of informative queries and\ndetermine the highest utility item from the available item set as quickly as possible.\nIn this paper we focus on pairwise comparison queries for PE, which are known to have low cognitive\nload [3, 4]. In particular, we assume a likelihood model of pairwise preferences that factorizes over\nusers and preferences and a GP prior over the latent utility functions that correlates users and items.\n\n2 Problem Formulation\n\nLet x denote a specific item (or product) that is described by a set of features x and t denote a user\n(mnemonic for task) that can be characterized with features t. For a set of items $X = \\{x_1, \\ldots, x_N\\}$\nand users $T = \\{t_1, \\ldots, t_M\\}$ we are given a set of training preference pairs:\n\n$$\\mathcal{D} = \\left\\{ \\left(t^{(j)}, x^{(j)}_{k_1} \\succ x^{(j)}_{k_2}\\right) \\mid k = 1, \\ldots, K_j;\\; k_1, k_2 \\in \\{1, \\ldots, N\\};\\; j = 1, \\ldots, M \\right\\}, \\tag{1}$$\n\nwhere $x^{(j)}_{k_1} \\succ x^{(j)}_{k_2}$ denotes that we have observed that user j prefers item $k_1$ over item $k_2$ and $K_j$ is\nthe number of preference relations observed for user j.\nThe preference elicitation problem is that given a new user, described by a set of features $t_*$, we aim\nto determine (or elicit) what his/her preferences (or favourite items) are by asking a small number\nof queries of the form $q_{ij} \\stackrel{\\text{def}}{=} x_i \\succ x_j$, meaning that he/she will prefer item i over item j. Ideally, we\nwould like to obtain the best user preferences with the smallest number of possible queries.\nThe key idea of this paper is that of learning a Gaussian process (GP) model over users\u2019 latent utility\nfunctions and using this model in order to drive the elicitation process of a new user. Due to the\nnon-parametric Bayesian nature of the GPs, this allows us to have a powerful model of the user\u2019s\nutility function and to incorporate the evidence (i.e. the responses the user gives to our queries) in a\nprincipled manner. Our approach directly exploits: (a) user-relatedness, i.e. that users with similar\ncharacteristics may have similar preferences; (b) items\u2019 similarities and (c) the value of information\nof obtaining a response to a query in order to elicit the preferences of the user.\n\n3 Likelihood Model\n\nOur likelihood model considers that the users\u2019 preference relationships are conditionally\nindependent given the latent utility functions. 
In other words, the probability of a user t preferring item x\nover item $x'$ given their utility functions is:\n\n$$p(x_t \\succ x'_t \\mid f(t, x), f(t, x'), \\xi) = I[f(t, x) - f(t, x') \\geq \\xi] \\quad \\text{with} \\quad p(\\xi) = \\mathcal{N}(\\xi \\mid 0, \\sigma^2), \\tag{2}$$\n\nwhere $I[\\cdot]$ is an indicator function that is 1 if the condition $[\\cdot]$ is true and 0 otherwise; and $\\sigma^2$ is the\nvariance of the normally distributed variable $\\xi$ that dictates how different the latent functions should\nbe for the corresponding relation to hold. Hence:\n\n$$p(x_t \\succ x'_t \\mid f(t, x), f(t, x')) = \\int_{-\\infty}^{\\infty} I[f(t, x) - f(t, x') \\geq \\xi]\\, \\mathcal{N}(\\xi \\mid 0, \\sigma^2)\\, d\\xi \\tag{3}$$\n\n$$= \\Phi\\left(\\frac{f(t, x) - f(t, x')}{\\sigma}\\right), \\tag{4}$$\n\nwhere $\\Phi(\\cdot)$ is the Normal cumulative distribution function (cdf). The conditional data-likelihood is\nthen given by:\n\n$$p(\\mathcal{D} \\mid \\mathbf{f}) = \\prod_{j=1}^{M} \\prod_{k=1}^{K_j} \\Phi(z^j_k) \\quad \\text{with} \\quad z^j_k = \\frac{1}{\\sigma}\\left(f(t^{(j)}, x^{(j)}_{k_1}) - f(t^{(j)}, x^{(j)}_{k_2})\\right). \\tag{5}$$\n\n4 Modeling User Dependencies with a GP Prior\n\nAs mentioned above, we model user (and item) dependencies via the user latent utility functions,\nwhich are assumed to be drawn from a GP prior that accounts for user similarity and item similarity\ndirectly:\n\n$$f(t, x) \\sim \\mathcal{GP}\\left(0, \\kappa_t(t, t')\\, \\kappa_x(x, x')\\right), \\tag{6}$$\n\nwhere $\\kappa_t(\\cdot, \\cdot)$ is a covariance function on user-descriptors t and $\\kappa_x(\\cdot, \\cdot)$ is a covariance function\non item features x. We will denote the parameters of these covariance functions (so-called\nhyper-parameters) by $\\theta_t$ and $\\theta_x$. (These types of priors have been considered previously in the regression\nsetting, see e.g. [5].)\nAdditionally, let f be the utility function values for all training users at all training input locations\n(i.e. 
items) so that $\\mathbf{f} = [f(t^{(1)}, x^{(1)}), \\ldots, f(t^{(1)}, x^{(N)}), \\ldots, f(t^{(M)}, x^{(1)}), \\ldots, f(t^{(M)}, x^{(N)})]^T$ and\nF be the N \u00d7 M matrix for which the jth column corresponds to the latent values for the jth user\nat all input points such that $\\mathbf{f} = \\operatorname{vec} F$. Hence:\n\n$$\\mathbf{f} \\sim \\mathcal{N}(0, \\Sigma) \\quad \\text{with} \\quad \\Sigma = K_t \\otimes K_x, \\tag{7}$$\n\nwhere $K_t$ is the covariance between all the training users, $K_x$ is the covariance between all the\ntraining input locations, and $\\otimes$ denotes the Kronecker product. Note that dependencies between\nusers are not arbitrarily imposed but rather they will be learned from the available data by optimizing\nthe marginal likelihood. (We will describe the details of hyper-parameter learning in section 7.)\n\n5 Posterior and Predictive Distributions\n\nGiven the data in (1) and the prior over the latent utility functions in equation (6), we can obtain the\nposterior distribution:\n\n$$p(\\mathbf{f} \\mid \\mathcal{D}, \\theta) = \\frac{p(\\mathcal{D} \\mid \\mathbf{f}, \\theta)\\, p(\\mathbf{f} \\mid \\theta)}{p(\\mathcal{D} \\mid \\theta)}, \\tag{8}$$\n\nwhere we have emphasized the dependency on the hyper-parameters $\\theta$ that include $\\theta_t$, $\\theta_x$ and $\\sigma^2$,\nand where $p(\\mathcal{D} \\mid \\theta)$ is the marginal likelihood (or evidence) with $p(\\mathcal{D} \\mid \\theta) = \\int p(\\mathcal{D} \\mid \\mathbf{f}, \\theta)\\, p(\\mathbf{f} \\mid \\theta)\\, d\\mathbf{f}$.\nThe non-Gaussian nature of the conditional likelihood term (given in equation (5)) makes the above\nintegral analytically intractable and hence we will require approximations. In this paper we will focus\non analytical approximations and more specifically, we will approximate the posterior $p(\\mathbf{f} \\mid \\mathcal{D}, \\theta)$,\nand the evidence, using the Laplace approximation.\nThe Laplace method approximates the true posterior with a Gaussian: $p(\\mathbf{f} \\mid \\mathcal{D}, \\theta) \\approx \\mathcal{N}(\\mathbf{f} \\mid \\hat{\\mathbf{f}}, A^{-1})$,\nwhere $\\hat{\\mathbf{f}} = \\operatorname{argmax}_{\\mathbf{f}}\\, p(\\mathbf{f} \\mid \\mathcal{D}, \\theta) = \\operatorname{argmax}_{\\mathbf{f}}\\, p(\\mathcal{D} \\mid \\mathbf{f}, \\theta)\\, p(\\mathbf{f} \\mid \\theta)$ and A is the Hessian of the negative\nlog-posterior evaluated at $\\hat{\\mathbf{f}}$. 
Hence we consider the unnormalized expression $p(\\mathcal{D} \\mid \\mathbf{f}, \\theta)\\, p(\\mathbf{f} \\mid \\theta)$ and,\nomitting the terms that are independent of $\\mathbf{f}$, we focus on the maximization of the following\nexpression:\n\n$$\\psi(\\mathbf{f}) = \\sum_{j=1}^{M} \\sum_{k=1}^{K_j} \\log \\Phi(z^j_k) - \\frac{1}{2} \\mathbf{f}^T \\Sigma^{-1} \\mathbf{f}. \\tag{9}$$\n\nUsing Newton\u2019s method we obtain the following iterative update:\n\n$$\\mathbf{f}^{\\text{new}} = (W + \\Sigma^{-1})^{-1} \\left( \\frac{\\partial \\log p(\\mathcal{D} \\mid \\mathbf{f}, \\theta)}{\\partial \\mathbf{f}} + W \\mathbf{f} \\right) \\quad \\text{with} \\quad W_{pq} = -\\sum_{j=1}^{M} \\sum_{k=1}^{K_j} \\frac{\\partial^2 \\log \\Phi(z^j_k)}{\\partial f_p\\, \\partial f_q}. \\tag{10}$$\n\nOnce we have found the maximum posterior $\\hat{\\mathbf{f}}$ by using the above iteration we can show that:\n\n$$p(\\mathbf{f} \\mid \\mathcal{D}) \\approx \\mathcal{N}(\\mathbf{f} \\mid \\hat{\\mathbf{f}}, (W + \\Sigma^{-1})^{-1}). \\tag{11}$$\n\n5.1 Predictive Distribution\n\nIn order to set up our elicitation framework we will also need the predictive distribution for a fixed\ntest user $t_*$ at an unseen pair $x_{1*}, x_{2*}$. This is given by:\n\n$$p(\\mathbf{f}_* \\mid \\mathcal{D}) = \\int p(\\mathbf{f}_* \\mid \\mathbf{f})\\, p(\\mathbf{f} \\mid \\mathcal{D})\\, d\\mathbf{f} \\tag{12}$$\n\n$$= \\mathcal{N}(\\mathbf{f}_* \\mid \\mu_*, C_*), \\tag{13}$$\n\nwith:\n\n$$\\mu_* = k_*^T \\Sigma^{-1} \\hat{\\mathbf{f}} \\quad \\text{and} \\quad C_* = \\Sigma_* - k_*^T (\\Sigma + W^{-1})^{-1} k_*, \\tag{14}$$\n\nwhere $\\Sigma$ is defined as in equation (7) and:\n\n$$k_* = k_{t*} \\otimes k_{x*} \\tag{15}$$\n\n$$k_{t*} = \\left[ \\kappa_t(t_*, t^{(1)}), \\ldots, \\kappa_t(t_*, t^{(M)}) \\right]^T \\tag{16}$$\n\n$$k_{x*} = \\begin{bmatrix} \\kappa_x(x_{1*}, x^{(1)}) & \\cdots & \\kappa_x(x_{1*}, x^{(N)}) \\\\ \\kappa_x(x_{2*}, x^{(1)}) & \\cdots & \\kappa_x(x_{2*}, x^{(N)}) \\end{bmatrix}^T \\tag{17}$$\n\n$$\\Sigma_* = \\begin{bmatrix} \\kappa_t(t_*, t_*)\\, \\kappa_x(x_{1*}, x_{1*}) & \\kappa_t(t_*, t_*)\\, \\kappa_x(x_{1*}, x_{2*}) \\\\ \\kappa_t(t_*, t_*)\\, \\kappa_x(x_{2*}, x_{1*}) & \\kappa_t(t_*, t_*)\\, \\kappa_x(x_{2*}, x_{2*}) \\end{bmatrix}. \\tag{18}$$\n\n6 Gaussian Process Preference Elicitation Framework\n\nNow we have the main components to set up our preference elicitation framework for a test user\ncharacterized by features $t_*$. Our main objective is to use the previously seen data (and the\ncorresponding learned hyper-parameters) in order to drive the elicitation process and to incorporate the\ninformation obtained from the user\u2019s responses back into our model in a principled manner. Our\nmain requirement is a function that dictates the value of making a query $q_{ij}$. In other words, we aim\nat trading-off the expected actual utility of the items involved in the query and the information these\nitems will provide regarding the user\u2019s preferences. This is the exploration-exploitation dilemma,\nusually seen in optimization and reinforcement learning problems. We can address this issue by\ncomputing the expected value of information (EVOI, [2]) of making a query involving items i and\nj. Before defining the EVOI, we will make use of the concept of expected improvement, a measure\nthat is commonly used in optimization methods based on response surfaces (see e.g. [1]).\n\n6.1 Expected Improvement\n\nWe have seen in equation (13) that the predictive distribution for the utility function on a test user\n$t_*$ on item x follows a Gaussian distribution:\n\n$$f(t_*, x \\mid \\mathcal{D}, \\theta) \\sim \\mathcal{N}(\\mu_*(t_*, x), s_*^2(t_*, x)), \\tag{19}$$\n\nwhere $\\mu_*(t_*, x)$ and $s_*^2(t_*, x)$ can be obtained by using (the marginalized version of) equation\n(14). Let us assume that, at any point during the elicitation process we have an estimate of the\nutility of the best item and let us denote it by $f^{\\text{best}}$. 
If we define the predicted improvement at x as\n$I = f(t_*, x \\mid \\mathcal{D}, \\theta) - f^{\\text{best}}$ then the expected improvement (EI) of recommending item x (for a fixed\nuser $t_*$) instead of recommending the best item $x^{\\text{best}}$ is given by:\n\n$$\\text{EI}(x \\mid \\mathcal{D}) = \\int_0^{\\infty} I\\, p(I)\\, dI = s_*(t_*, x) \\left[ z' \\Phi(z') + \\phi(z') \\right], \\tag{20}$$\n\nwhere $z' = (\\mu_*(t_*, x) - f^{\\text{best}}) / s_*(t_*, x)$, $\\Phi(\\cdot)$ is the Normal cumulative distribution function (cdf)\nand $\\phi(\\cdot)$ is the Normal probability density function (pdf). Note that, for simplicity in the notation,\nwe have omitted the dependency of $\\text{EI}(x \\mid \\mathcal{D})$ on the user\u2019s features $t_*$. Hence the maximum expected\nimprovement (MEI) under the current observed data $\\mathcal{D}$ is:\n\n$$\\text{MEI}(\\mathcal{D}) = \\max_x \\text{EI}(x \\mid \\mathcal{D}). \\tag{21}$$\n\n6.2 Expected Value of Information\n\nNow we can define the expected value of information (EVOI) as the expected gain in improvement\nthat is obtained by adding a query involving a particular pairwise relation. Thus, the expected value\nof information of obtaining the response for the queries involving items $x_{*i}, x_{*j}$ with corresponding\nutility values $\\mathbf{f}_* = (f_*(t_*, x_{*i}), f_*(t_*, x_{*j}))^T$ is given by:\n\n$$\\text{EVOI}(\\mathcal{D}, i, j) = -\\text{MEI}(\\mathcal{D}) + \\left\\langle \\sum_{q_{ij}} p(q_{ij} \\mid \\mathbf{f}_*, \\mathcal{D})\\, \\text{MEI}(\\mathcal{D} \\cup q_{ij}) \\right\\rangle_{p(\\mathbf{f}_* \\mid \\mathcal{D})} \\tag{22}$$\n\n$$= -\\text{MEI}(\\mathcal{D}) + \\left\\langle p(x_{*i} \\succ x_{*j} \\mid \\mathbf{f}_*, \\mathcal{D}) \\right\\rangle_{p(\\mathbf{f}_* \\mid \\mathcal{D})} \\text{MEI}(\\mathcal{D} \\cup \\{x_{*i} \\succ x_{*j}\\}) + \\left\\langle p(x_{*j} \\succ x_{*i} \\mid \\mathbf{f}_*, \\mathcal{D}) \\right\\rangle_{p(\\mathbf{f}_* \\mid \\mathcal{D})} \\text{MEI}(\\mathcal{D} \\cup \\{x_{*j} \\succ x_{*i}\\}), \\tag{23}$$\n\nwhere\n\n$$\\left\\langle p(x_{*i} \\succ x_{*j} \\mid \\mathbf{f}_*, \\mathcal{D}) \\right\\rangle_{p(\\mathbf{f}_* \\mid \\mathcal{D})} = p(x_{*i} \\succ x_{*j} \\mid \\mathcal{D}) \\tag{24}$$\n\n$$= \\int_{\\mathbf{f}_*} p(x_{*i} \\succ x_{*j} \\mid \\mathbf{f}_*, \\mathcal{D})\\, p(\\mathbf{f}_* \\mid \\mathcal{D})\\, d\\mathbf{f}_* \\tag{25}$$\n\n$$= \\int_{\\mathbf{f}_*} \\int_{\\xi} I[f_{*i} - f_{*j} \\geq \\xi]\\, \\mathcal{N}(\\xi \\mid 0, \\sigma^2)\\, \\mathcal{N}(\\mathbf{f}_* \\mid \\mu_*, C_*)\\, d\\xi\\, d\\mathbf{f}_* \\tag{26}$$\n\n$$= \\Phi\\left( \\frac{\\mu_{*i} - \\mu_{*j}}{\\sqrt{C_{*i,i} + C_{*j,j} - 2 C_{*i,j} + \\sigma^2}} \\right), \\tag{27}$$\n\nand $\\mu_*$ and $C_*$ as defined in (14). Note that in our model $p(x_{*j} \\succ x_{*i} \\mid \\mathcal{D}) = 1 - p(x_{*i} \\succ x_{*j} \\mid \\mathcal{D})$.\nAs mentioned above, $f^{\\text{best}}$ can be thought of as an estimate of the utility of the best item as its true\nutility is unknown. In practice we maintain our beliefs over the utilities of the items $p(\\mathbf{f} \\mid \\mathcal{D}^+)$ for the\ntraining users and the test user, where $\\mathcal{D}^+$ denotes the data extended by the set of seen relationships\non the test user (which is initially empty). Hence, we can set $f^{\\text{best}} = \\max_i \\hat{F}^+_{i,M+1}$, where $\\hat{F}^+$\nis the matrix containing the mean estimates of the latent utility function distribution given by the\nLaplace approximation in equation (9). Alternatively, we can draw samples from such a distribution\nand apply the max operator.\nIn order to elicit preferences on a new user we simply select a query so that it maximizes the expected\nvalue of information EVOI as defined in equation (23). A summary of our approach is presented\nin Algorithm 1. We note that although, in principle, one could also update the hyper-parameters\nbased on the data provided by the new user, we avoid this in order to keep computations manageable\nat query time. The reasoning is that, implicitly, we have learned the utility functions over all\nusers and we represent the utility of the test user (explicitly) on demand, updating our beliefs to\nincorporate the information provided by the user\u2019s responses.\n\nAlgorithm 1 Gaussian Process Preference Elicitation\nRequire: hyper-parameters $\\theta_x$, $\\theta_t$, $\\theta_\\sigma$ {learned from M previous users} and corresponding $\\mathcal{D}$\nrepeat\n    for all candidate pairs (i, j) do\n        Compute $\\text{EVOI}(i, j, \\mathcal{D}, \\hat{\\mathbf{f}}, W)$ {equation (23)}\n    end for\n    $(i^*, j^*) \\leftarrow \\operatorname{argmax}_{i,j} \\text{EVOI}(i, j)$ {best pair}\n    Remove $(i^*, j^*)$ from candidate list\n    if $q_{i^*, j^*}$ is true then {ask user and set true preference}\n        $(i_{\\text{true}}, j_{\\text{true}}) \\leftarrow (i^*, j^*)$\n    else\n        $(i_{\\text{true}}, j_{\\text{true}}) \\leftarrow (j^*, i^*)$\n    end if\n    $\\mathcal{D} \\leftarrow \\mathcal{D} \\cup (t_{M+1}, x_{i_{\\text{true}}} \\succ x_{j_{\\text{true}}})$ {Expand $\\mathcal{D}$ and get $\\mathcal{D}^+$}\n    Update $\\hat{\\mathbf{f}}$, W {i.e. $p(\\mathbf{f} \\mid \\mathcal{D})$ as in equation (10)}\nuntil Satisfied\n\n7 Hyper-parameter Learning\n\nThroughout this paper we have assumed that we have learned a Gaussian process model for the\nutility functions over users and items based upon previously seen preference relations. 
We refer to\nthe hyper-parameters of our model as the hyper-parameters $\\theta_t$ and $\\theta_x$ of the covariance functions\n($\\kappa_t$ and $\\kappa_x$ respectively) and $\\theta_\\sigma = \\log \\sigma$, where $\\sigma^2$ is the \u201cnoise\u201d variance.\nAlthough it is entirely possible to use prior knowledge on what these covariance functions are (or\ntheir corresponding parameter settings) for the specific problem under consideration, in many\npractical applications such prior knowledge is not available and one needs to tune such\nparameterization based upon the available data. Fortunately, as in the standard GP regression framework, we can\nachieve this in a principled way through maximization of the marginal likelihood (or evidence).\nAs in the case of the posterior distribution, the marginal likelihood is analytically intractable and\napproximations are needed. The Laplace approximation to the marginal log-likelihood is given by:\n\n$$\\log p(\\mathcal{D} \\mid \\theta) \\approx -\\frac{1}{2} \\log |\\Sigma W + I| - \\frac{1}{2} \\hat{\\mathbf{f}}^T \\Sigma^{-1} \\hat{\\mathbf{f}} + \\sum_{j=1}^{M} \\sum_{k=1}^{K_j} \\log \\Phi(\\hat{z}^j_k), \\tag{28}$$\n\nwhere $\\hat{z}^j_k = z^j_k |_{\\hat{\\mathbf{f}}}$, $\\hat{\\mathbf{f}}$ and W are defined as in (10) and $\\Sigma$ is defined as in equation (7). Note that\ncomputations are not carried out at all the M \u00d7 N data-points but only at those locations that\n\u201csupport\u201d the seen relations and hence we should write e.g. $\\hat{\\mathbf{f}}_o$, $\\Sigma_o$ where the subindex $\\{\\}_o$ indicates\nthis fact. However, for simplicity, we have omitted this notation.\nGiven the equation above, gradient-based optimization can be used for learning the hyper-parameters\nin our model. 
As we shall see in the following section, for our experiments we do not have much\nprior information on suitable hyper-parameter settings and therefore we have carried out\nhyper-parameter learning by maximization of the marginal log-likelihood.\n\n8 Experiments & Results\n\nIn this section we describe the dataset used in our experiments, the evaluation setting and the results\nobtained with our model and other baseline methods.\n\n8.1 The Sushi Dataset\n\nWe evaluate our approach on the Sushi dataset [6]. Here we present a brief description of this dataset\nand the pre-processing we have carried out in order to apply our method. The reader is referred to [6]\nfor more details. The Sushi dataset contains full rankings given by 5000 Japanese users over N = 10\ndifferent types of sushi. Each sushi is associated with a set of features which include style, major\ngroup, minor group, heaviness, consumption frequency, normalized price and sell frequency. The\nfirst three features are categorical and therefore we have created the corresponding dummy variables\nto be used by our method. The resulting features are then represented by a 15-dimensional vector\n(x). Each user is also represented by a set of features which include gender, age and other features\nthat compile geographical/regional information. As with the item features, we have created dummy\nvariables for those categorical features, which resulted in an 85-dimensional feature vector (t) for\neach user. As pointed out in the documentation of the dataset, Japanese food preferences are strongly\ncorrelated with geographical and regional information. 
Therefore, modeling user similarities may\nprovide useful information during the elicitation process.\n\n8.2 Evaluation Methodology and Experimental Details\n\nWe evaluate our method via 10-fold cross-validation, where we have subsampled the training folds\nin order to (a) keep the computational burden as low as possible and (b) show that we can learn\nsensible parameterizations based upon relatively low requirements in terms of the preferences seen\non previous users. In particular, we have subsampled 50 training users and selected about 5 training\npairwise preferences drawn from each of the N = 10 available items.\nFor the GPs we have used the squared exponential (SE) covariance functions with automatic\nrelevance determination (ARD) for both $\\kappa_t$ and $\\kappa_x$ and have carried out hyper-parameter learning via\ngradient-based optimization of the marginal likelihood in equation (28). We have initialized the\nhyper-parameters of the models deterministically, setting the signal variance and the length-scales\nof the covariance function to the initial values of 1 and the $\\sigma^2$ parameter to 0.01.\nIn order to measure the quality of our preference elicitation approach we use the normalized\nloss as a function of the number of queries, where at each iteration the method provides a\nrecommendation based on the available information. The normalized loss function is defined as:\n$(u_{\\text{best}} - u_{\\text{pred}})/u_{\\text{best}}$, where $u_{\\text{best}}$ is the best utility for a specific item/user and $u_{\\text{pred}}$ is the utility\nachieved by the recommendation provided by the system.\n\nFigure 1: Normalized average loss as a function of the number of queries, with error bars of\n2 standard errors of the mean. (a) The performance of our model compared to the RVOI method\ndescribed in [7] and the B&L heuristic over the full set of 5000 test users. 
(b) The performance of our\nmodel when the hyper-parameters have been optimized via maximization of the marginal likelihood\n(GPPE-OPT) compared to the same GP elicitation framework when these hyper-parameters have\nbeen set to their default values (GPPE-PRIOR).\n\nWe compare our approach to two baseline methods. One is the restricted value of information\nalgorithm [7] and the other one is the best and largest heuristic, which we will refer to as the RVOI\nmethod and the B&L heuristic respectively. The RVOI approach is also a VOI-based method but it\ndoes not leverage information from other users and it considers diagonal Gaussians as prior models\nof the latent utility functions. The B&L heuristic selects the current best item and the one with the\nlargest uncertainty. Both baselines have been shown to be competitive methods for preference\nelicitation (see [7] for more details). Additionally, we compare our method when the hyper-parameters\nhave been learned on the set of previously seen users with the same GP elicitation approach when\nthe hyper-parameters have been set to the initial values described above. This allows us to show that,\nindeed, when prior information on user and item similarity is not available, our model does learn\nsensible settings of the hyper-parameters, which lead to better quality elicitation outcomes.\n\n8.3 Results\n\nFigure 1(a) shows the normalized average loss across all 5000 users as a function of the number\nof queries. As can be seen, on average, all competing methods reduce the expected loss as the\nnumber of queries increases. More importantly, our method (GPPE) clearly outperforms the other\nalgorithms even for a small number of queries. This demonstrates that our approach exploits the\ninter-relations between users and items effectively in order to enhance the elicitation process on a\nnew user. 
Although it may be surprising that the B&L heuristic outperforms the RVOI method, we\npoint out that the evaluation of these methods presented in [7] did not consider real datasets as we\ndo in our experiments.\nFigure 1(b) shows the normalized average loss across all 5000 users for our method when the\nhyper-parameters have been set to the initial values described in section 8.2 (labeled in the figure as\nGPPE-PRIOR) and when the hyper-parameters have been optimized by maximization of the marginal\nlikelihood on a set of previously seen users (labeled in the figure as GPPE-OPT). We can see that,\nindeed, the GPPE model that learns the hyper-parameters from previous users\u2019 data significantly\noutperforms the same method when these (hyper-)parameters are not optimized.\n\n9 Related Work\n\nPreference elicitation (PE) is an important component of recommender systems and market research.\nTraditional PE frameworks focus on modeling and eliciting a single user\u2019s preferences. We can\ncategorize different PE frameworks in terms of query types. In [8], the authors propose to model\nutilities as random variables, and refine utility uncertainty by using standard gamble queries. The\nsame query type is also used in [9], which differs from [8] in treating PE as a Partially Observable\nMarkov Decision Process (POMDP). However, standard gamble queries are difficult for users to\nrespond to, and naturally lead to noisy responses. 
Simpler query types have also been used for PE.\nFor example, [7] uses pairwise comparison queries, which are believed to have low cognitive load.\nOur work also adopts simple pairwise comparison queries, but it differs from [7] in that it makes use\nof users\u2019 preferences that have been seen before and does not assume additive independent utilities.\nIn the machine learning community preference learning has received substantial interest over the past\nfew years. For example, one of the most recent approaches to preference learning is presented in [10],\nwhere a multi-task learning approach to the problem of modeling human preferences is adopted\nby extending the model in [11] to deal with preference data. Their model follows a hierarchical\napproach based on finite Gaussian processes (GPs), where inter-user similarities are exploited by\nassuming that the subjects share a set of hyper-parameters. Their model is different to ours in\nthat they consider the dual representation of the GPs as they do not generalize over user features.\nFurthermore, they do not address the elicitation problem, which is the main concern of this paper.\nExtensions of the Gaussian process formalism to model ordinal data and user preferences are given\nin [12] and [13]. Both their prior and their likelihood models can be seen as single-user (task)\nspecifications of our model. In other words, unlike the work of [10], their model (as ours) considers\nthe function space view of the GPs but, unlike [10] and our approach, they do not address the\nmulti-task case or generalize across users. More importantly, an elicitation framework for actively\nquerying the user is not presented in such works.\n[14] proposes an active preference learning method for discrete choice data. Their approach is based\non the model in [13]. 
Unlike our approach they do not leverage information from seen preferences on\nprevious users and hence their active preference learning process on a new user starts from scratch.\nThis leads to the problem of either relying on good prior information on the covariance function\nor on hyper-parameter updating during the active learning process, which is computationally too\nexpensive to be used in practice. Additionally, as their concern is on a possibly in\ufb01nite set of\ndiscrete choices, their approach completely relies upon the expected improvement (EI) measure.\n\n10 Conclusions & Future Work\n\nIn this paper we have presented a Gaussian process approach to the problem of preference elicitation.\nOne of the crucial characteristics of our method is that it exploits user-similarity via a (correlated)\nGaussian process prior over the users\u2019 latent utility functions. These similarities are \u201clearned\u201d from\npreferences on previous users. Our method maintains a \ufb02exible representation of the user\u2019s latent\nutility function, handles uncertainty in a principled manner and allows the incorporation of prior\nknowledge from different sources. The required expected value of information computations can be\nderived in closed-form to facilitate the selection of informative queries and determine the highest\nutility item from the available item set as quickly as possible.\nWe have shown the bene\ufb01ts of our method on a real dataset of 5000 users with preferences over\n10 sushi types.\nIn future work we aim at investigating other elicitation problems such as those\ninvolving a Likert scale [15] where our approach may be effective. The main practical constraint\nis that in order to carry out the evaluation (but not the application) of our method on real data we\nrequire the full set of preferences of the users over a set of items.\nOur main motivation for the Laplace method is its computational ef\ufb01ciency. 
However, [10] has\nshown that this method is a good approximation to the posterior in the context of the preference\nlearning problem. We intend to investigate other approximation methods to the posterior and\nmarginal likelihood and their joint application with sparse approximation methods within our framework\n(see e.g. [16]), which will be required if the number of training users is large.\n\nAcknowledgments\n\nNICTA is funded by the Australian Government as represented by the Department of Broadband,\nCommunications and the Digital Economy and the Australian Research Council through the ICT\nCentre of Excellence program.\n\nReferences\n\n[1] Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345\u2013383, 2001.\n\n[2] R.A. Howard. Information value theory. IEEE Transactions on Systems Science and Cybernetics, 2(1):22\u201326, 1966.\n\n[3] Urszula Chajewska, Daphne Koller, and Ronald Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pages 363\u2013369. AAAI Press / The MIT Press, 2000.\n\n[4] Vincent Conitzer. Eliciting single-peaked preferences using comparison queries. Journal of Artificial Intelligence Research, 35:161\u2013191, 2009.\n\n[5] Edwin V. Bonilla, Kian Ming A. Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 153\u2013160. MIT Press, Cambridge, MA, 2008.\n\n[6] Toshihiro Kamishima. Nantonac collaborative filtering: recommendation based on order responses. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 583\u2013588, New York, NY, USA, 2003. 
ACM.\n\n[7] Shengbo Guo and Scott Sanner. Real-time multiattribute Bayesian preference elicitation with\npairwise comparison queries. In Proceedings of the Thirteenth International Conference on\nArtificial Intelligence and Statistics, 2010.\n\n[8] Urszula Chajewska and Daphne Koller. Utilities as random variables: Density estimation\nand structure discovery. In Proceedings of the 16th Conference on Uncertainty in Artificial\nIntelligence, pages 63\u201371. Morgan Kaufmann Publishers Inc., 2000.\n\n[9] Craig Boutilier. A POMDP formulation of preference elicitation problems. In Proceedings\nof the 18th National Conference on Artificial Intelligence, pages 239\u2013246, Menlo Park, CA,\nUSA, 2002. American Association for Artificial Intelligence.\n\n[10] Adriana Birlutiu, Perry Groot, and Tom Heskes. Multi-task preference learning with an\n\napplication to hearing aid personalization. Neurocomputing, 73(7-9):1177\u20131185, 2010.\n\n[11] Kai Yu, Volker Tresp, and Anton Schwaighofer. Learning Gaussian processes from multiple\ntasks. In Proceedings of the 22nd international conference on Machine learning, pages\n1012\u20131019, New York, NY, USA, 2005. ACM.\n\n[12] Wei Chu and Zoubin Ghahramani. Gaussian processes for ordinal regression. Journal of\n\nMachine Learning Research, 6:1019\u20131041, 2005.\n\n[13] Wei Chu and Zoubin Ghahramani. Preference learning with Gaussian processes. In\nProceedings of the 22nd international conference on Machine learning, pages 137\u2013144, New York,\nNY, USA, 2005. ACM.\n\n[14] Eric Brochu, Nando De Freitas, and Abhijeet Ghosh. Active preference learning with discrete\nchoice data. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural\nInformation Processing Systems 20, pages 409\u2013416. MIT Press, Cambridge, MA, 2008.\n\n[15] Rensis Likert. A technique for the measurement of attitudes. 
Archives of Psychology,\n\n22(140):1\u201355, 1932.\n\n[16] Joaquin Qui\u00f1onero Candela and Carl Edward Rasmussen. A unifying view of sparse\napproximate Gaussian process regression. Journal of Machine Learning Research, 6:1939\u20131959,\n2005.\n", "award": [], "sourceid": 582, "authors": [{"given_name": "Shengbo", "family_name": "Guo", "institution": null}, {"given_name": "Scott", "family_name": "Sanner", "institution": null}, {"given_name": "Edwin", "family_name": "Bonilla", "institution": null}]}