{"title": "Modeling User Rating Profiles For Collaborative Filtering", "book": "Advances in Neural Information Processing Systems", "page_first": 627, "page_last": 634, "abstract": "", "full_text": "Modeling User Rating Pro(cid:12)les For\n\nCollaborative Filtering\n\nBenjamin Marlin\n\nDepartment of Computer Science\n\nUniversity of Toronto\n\nToronto, ON, M5S 3H5, CANADA\n\nmarlin@cs.toronto.edu\n\nAbstract\n\nIn this paper we present a generative latent variable model for\nrating-based collaborative (cid:12)ltering called the User Rating Pro(cid:12)le\nmodel (URP). The generative process which underlies URP is de-\nsigned to produce complete user rating pro(cid:12)les, an assignment of\none rating to each item for each user. Our model represents each\nuser as a mixture of user attitudes, and the mixing proportions are\ndistributed according to a Dirichlet random variable. The rating for\neach item is generated by selecting a user attitude for the item, and\nthen selecting a rating according to the preference pattern associ-\nated with that attitude. URP is related to several models including\na multinomial mixture model, the aspect model [7], and LDA [1],\nbut has clear advantages over each.\n\n1\n\nIntroduction\n\nIn rating-based collaborative (cid:12)ltering, users express their preferences by explicitly\nassigning ratings to items that they have accessed, viewed, or purchased. We assume\na set of N users f1; :::; N g, a set of M items f1; :::; M g, and a set of V discrete rating\nvalues f1; :::; V g. In the natural case where each user has at most one rating ru\ny\nfor each item y, the ratings for each user form a vector with one component per\nitem. Of course, the values of some components are not known. We refer to user\nu\u2019s rating vector as their rating pro(cid:12)le denoted ru.\n\nRating prediction is the elementary task performed with rating-based data. 
Given a particular item and user, the goal is to predict the user's true rating for the item in question. Early work on rating prediction focused on neighborhood-based methods such as the GroupLens algorithm [9]. Personalized recommendations can be generated for any user by first predicting ratings for all items the user has not rated, and recommending the items with the highest predicted ratings. The capability to predict ratings has other interesting applications. Rating predictions can be incorporated with content-based scores to create a preference augmented search procedure [4]. Rating prediction also facilitates an active approach to collaborative filtering using expected value of information. In such a framework the predicted rating of each item is interpreted as its expected utility to the user [2].

In order to gain the maximum advantage from the expressive power of ratings, a probabilistic model must enable the calculation of the distribution over ratings, and thus the calculation of predicted ratings. A handful of such models exist, including the multinomial mixture model shown in figure 3, and the aspect model shown in figure 1 [7]. As latent variable models, both the aspect model and the multinomial mixture model have an intuitive appeal. They can be interpreted as decomposing user preference profiles into a set of typical preference patterns, and the degree to which each user participates in each preference pattern. The settings of the latent variable are casually referred to as user attitudes. The multinomial mixture model constrains all users to have the same prior distribution over user attitudes, while the aspect model allows each user to have a different prior distribution over user attitudes.
The added flexibility of the aspect model is quite attractive, but the interpretation of the distribution over user attitudes as parameters instead of random variables induces several problems.¹ First, the aspect model lacks a principled, maximum likelihood inference procedure for novel user profiles. Second, the number of parameters in the model grows linearly with the number of users in the data set.

Recent research has seen the proposal of several generative latent variable models for discrete data, including Latent Dirichlet Allocation [1], shown in figure 2, and multinomial PCA (a generalization of LDA to priors other than the Dirichlet) [3]. LDA and mPCA were both designed with co-occurrence data in mind (word-document pairs). They can only be applied to rating data if the data is first processed into user-item pairs using some type of thresholding operation on the rating values. These models can then be used to generate recommendations; however, they cannot be used to infer a distribution over ratings of items, or to predict the ratings of items.

The contribution of this paper is a new generative latent variable model that views rating-based data at the level of user rating profiles. The URP model incorporates proper generative semantics at the user level that are similar to those used in LDA and mPCA, while the inner workings of the model are designed specifically for rating profiles. Like the aspect model and the multinomial mixture model, the URP model can be interpreted in terms of decomposing rating profiles into typical preference patterns, and the degree to which each user participates in each pattern.
In this paper we describe the URP model, give model fitting and initialization procedures, and present empirical results for two data sets.

2 The User Rating Profile Model

The graphical representations of the aspect, LDA, multinomial mixture, and URP models are shown in figures 1 through 4. In all models U is a user index, Y is an item index, Z is a user attitude, Z_y is the user attitude responsible for item y, R is a rating value, R_y is a rating value for item Y, and β_{vyz} is a multinomial parameter giving P(R_y = v | Z_y = z). In the aspect model θ is a set of multinomial parameters where θ^u_z represents P(Z = z | U = u). The number of these parameters obviously grows as the number of training users is increased. In the mixture of multinomials model θ is a single distribution over user attitudes, where θ_z represents P(Z = z). This gives the multinomial mixture model correct, yet simplistic, generative semantics at the user level. In both LDA and URP θ is not a parameter, but a Dirichlet random variable with parameter α. A unique θ is sampled for each user, where θ_z gives P(Z = z) for that user. This gives URP much more powerful generative semantics at the user level than the multinomial mixture model.

¹Girolami and Kabán have recently shown that a co-occurrence version of the aspect model can be interpreted as a MAP/ML estimated LDA model under a uniform Dirichlet prior [5]. Essentially the same relationship holds between the aspect model for ratings shown in figure 1, and the URP model.

[Figure 1: Aspect model. Figure 2: LDA model. Figure 3: Multinomial mixture model. Figure 4: URP model.]
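The generative process just described (draw mixing proportions θ from a Dirichlet, then for each item draw an attitude and a rating) can be sketched as follows. This is a minimal illustration under our own assumptions, not the author's code; the toy dimensions and parameter values are invented for the example.

```python
import numpy as np

def sample_profile(alpha, beta, rng):
    """Sample one complete user rating profile under URP.

    alpha: (K,) Dirichlet parameter over user attitudes.
    beta:  (V, M, K) array with beta[v, y, z] = P(R_y = v | Z_y = z).
    Returns a length-M vector of rating indices in {0, ..., V-1}.
    """
    V, M, K = beta.shape
    theta = rng.dirichlet(alpha)          # per-user mixing proportions
    profile = np.empty(M, dtype=int)
    for y in range(M):
        z = rng.choice(K, p=theta)        # attitude responsible for item y
        profile[y] = rng.choice(V, p=beta[:, y, z])  # rating under that attitude
    return profile

# Illustrative toy model: K=2 attitudes, M=3 items, V=5 rating values.
rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0])
beta = rng.dirichlet(np.ones(5), size=(3, 2)).transpose(2, 0, 1)  # (V, M, K)
r = sample_profile(alpha, beta, rng)
```

Note that, unlike the multinomial mixture model, a fresh attitude is drawn for every item, so a single profile can blend several preference patterns.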
As with LDA, URP could be generalized to use any continuous distribution on the simplex, but in this case the Dirichlet leads to efficient prediction equations. Note that the bottom level of the LDA model consists of an item variable Y, and ratings do not come into LDA at any point.

The probability of observing a given user rating profile r^u under the URP model is shown in equation 1, where we define δ(r^u_y, v) to be equal to 1 if user u assigned rating v to item y, and 0 otherwise. Note that we assume unspecified ratings are missing at random. As in LDA, the Dirichlet prior renders the computation of the posterior distribution P(θ, z | r^u, α, β) = P(θ, z, r^u | α, β) / P(r^u | α, β) intractable.

P(r^u | α, β) = ∫_θ P(θ | α) ∏_{y=1}^{M} ∏_{v=1}^{V} ( Σ_{z=1}^{K} P(Z_y = z | θ) P(R_y = v | Z_y = z, β) )^{δ(r^u_y, v)} dθ    (1)

3 Parameter Estimation

The procedure we use for parameter estimation is a variational expectation maximization algorithm based on free energy maximization. As with LDA, other methods including expectation propagation could be applied. We choose to apply a fully factored variational q-distribution as shown in equation 2. We define q(θ | γ^u) to be a Dirichlet distribution with Dirichlet parameters γ^u_z, and q(Z_y | φ^u_y) to be a multinomial distribution with parameters φ^u_{zy}.

P(θ, z | α, β, r^u) ≈ q(θ, z | γ^u, φ^u) = q(θ | γ^u) ∏_{y=1}^{M} q(Z_y = z_y | φ^u_y)    (2)

A per-user free energy function F[γ^u, φ^u; α, β] provides a variational lower bound on the log likelihood log P(r^u | α, β) of a single user rating profile.
The sum of the per-user free energy functions F[γ^u, φ^u; α, β] yields the total free energy function F[γ, φ; α, β], which is a lower bound on the log likelihood of a complete data set of user rating profiles. The variational and model parameter updates are obtained by expanding F[γ, φ; α, β] using the previously described distributions, and maximizing the result with respect to γ^u, φ^u, α, and β. The variational parameter updates are shown in equations 3 and 4. Ψ denotes the first derivative of the log gamma function, also known as the digamma or psi function.

φ^u_{zy} ∝ ( ∏_{v=1}^{V} β_{vyz}^{δ(r^u_y, v)} ) exp( Ψ(γ^u_z) − Ψ(Σ_{j=1}^{K} γ^u_j) )    (3)

γ^u_z = α_z + Σ_{y=1}^{M} φ^u_{zy}    (4)

By iterating the variational updates with fixed α and β for a particular user, we are guaranteed to reach a local maximum of the per-user free energy F[γ^u, φ^u; α, β]. This iteration is a well defined approximate inference procedure for the URP model.

The model multinomial update has a closed form solution as shown in equation 5. This is not the case for the model Dirichlet α due to coupling of its parameters. However, Minka has proposed two iterative methods for estimating a Dirichlet distribution from probability vectors that can be used here. We give Minka's fixed-point iteration in equations 6 and 7, which yields very similar results compared to the alternative Newton iteration.
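As a concrete illustration, the per-user coordinate ascent of equations 3 and 4 can be sketched in a few lines of numpy. The sparse dict layout for the ratings, the initialization, and the fixed iteration count are our own assumptions for illustration, not details from the paper.

```python
import numpy as np
from scipy.special import digamma

def variational_update(ratings, alpha, beta, n_iter=20):
    """Iterate the per-user updates of equations 3 and 4.

    ratings: dict {item y: rating index v} for the items this user rated.
    alpha:   (K,) model Dirichlet parameter.
    beta:    (V, M, K) rating multinomials, beta[v, y, z] = P(R_y=v | Z_y=z).
    Returns (gamma, phi), the fitted variational parameters for the user.
    """
    K = alpha.shape[0]
    gamma = alpha + len(ratings) / K                 # any positive start works
    phi = {y: np.full(K, 1.0 / K) for y in ratings}
    for _ in range(n_iter):
        # Equation 3: phi_zy ∝ beta[v, y, z] * exp(Ψ(gamma_z) − Ψ(Σ_j gamma_j))
        e_log_theta = digamma(gamma) - digamma(gamma.sum())
        for y, v in ratings.items():
            p = beta[v, y, :] * np.exp(e_log_theta)
            phi[y] = p / p.sum()
        # Equation 4: gamma_z = alpha_z + Σ_y phi_zy
        gamma = alpha + sum(phi.values())
    return gamma, phi

# Toy example: K=2 attitudes, M=4 items, V=5 rating values (illustrative only).
rng = np.random.default_rng(1)
beta = rng.dirichlet(np.ones(5), size=(4, 2)).transpose(2, 0, 1)  # (V, M, K)
alpha = np.array([0.5, 0.5])
gamma, phi = variational_update({0: 4, 2: 1, 3: 3}, alpha, beta)
```

A useful sanity check: since each φ^u_y is normalized, equation 4 forces Σ_z γ^u_z = Σ_z α_z + (number of observed ratings).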
Details for both procedures, including the inversion of the digamma function, may be found in [8].

β_{vyz} ∝ Σ_{u=1}^{N} φ^u_{zy} δ(r^u_y, v)    (5)

Ψ(α_z) = Ψ( Σ_{j=1}^{K} α_j ) + (1/N) Σ_{u=1}^{N} ( Ψ(γ^u_z) − Ψ( Σ_{j=1}^{K} γ^u_j ) )    (6)

α_z = Ψ^{-1}( Ψ(α_z) )    (7)

4 Model Fitting and Initialization

In this section we give a variational expectation maximization procedure for model fitting, as well as an initialization method that has proved to be very effective for the URP model. Lastly, we discuss the stopping criteria used for the EM iterations.

4.1 Model Fitting

The variational inference procedure should be run to convergence to ensure a maximum likelihood solution. However, if we are satisfied with simply increasing the free energy at each step, other fitting procedures are possible. In general, the number of steps of variational inference can be determined by a user dependent heuristic function H(u). Buntine uses a single step of variational inference for each user to fit the mPCA model. At the other end of the spectrum, Blei et al. select a sufficient number of steps to achieve convergence when fitting the LDA model. Empirically, we have found that simple linear functions of the number of ratings in each user profile provide a good heuristic. The details of the fitting procedure are given below.

E-Step:

1. For all users u
2.   For h = 0 to H(u)
3.     φ^u_{zy} ∝ ( ∏_{v=1}^{V} β_{vyz}^{δ(r^u_y, v)} ) exp( Ψ(γ^u_z) − Ψ(Σ_{j=1}^{K} γ^u_j) )
4.     γ^u_z = α_z + Σ_{y=1}^{M} φ^u_{zy}

M-Step:

1. For each v, y, z set β_{vyz} ∝ Σ_{u=1}^{N} φ^u_{zy} δ(r^u_y, v).
2.
While not converged
3.   Ψ(α_z) = Ψ( Σ_{j=1}^{K} α_j ) + (1/N) Σ_{u=1}^{N} ( Ψ(γ^u_z) − Ψ( Σ_{j=1}^{K} γ^u_j ) )
4.   α_z = Ψ^{-1}( Ψ(α_z) )

4.2 Initialization and Early Stopping

Fitting the URP model can be quite difficult starting from randomly initialized parameters. The initialization method we have adopted is to partially fit a multinomial mixture model with the same number of user attitudes as the URP model. Fitting the multinomial mixture model for a small number of EM iterations yields a set of multinomial distributions encoded by β₀, as well as a single multinomial distribution over user attitudes encoded by θ₀. To initialize the URP model we set β = β₀ and α = κθ₀, where κ is a positive constant. Letting κ = 1 appears to give good results in practice.

Normally EM is run until the bound on the log likelihood converges, but this tends to lead to overfitting in some models, including the aspect model. To combat this problem Hofmann suggests using early stopping of the EM iteration [7]. We implemented early stopping for all models using a separate validation set to allow for a fair comparison.

5 Prediction

The primary task for any model applied to the rating-based collaborative filtering problem is to predict ratings for the items a user has not rated, based on the ratings the user has specified. Assume we have a user u with rating profile r^u, and we wish to predict the user's rating r^u_y for an unrated item y.
The distribution over ratings for the item y can be calculated using the model as follows:

P(R_y = v | r^u) = ∫_θ Σ_z P(R_y = v | Z_y = z) P(Z_y = z | θ) P(θ | r^u) dθ    (8)

This quantity may look quite difficult to compute, but by interchanging the sum and integral, and appealing to our variational approximation q(θ | γ^u) ≈ P(θ | r^u), we obtain an expression in terms of the model and variational parameters.

P(R_y = v | r^u) = Σ_{z=1}^{K} β_{vyz} γ^u_z / ( Σ_{j=1}^{K} γ^u_j )    (9)

To compute P(R_y = v | r^u) according to equation 9 given the model parameters α and β, it is necessary to apply our variational inference procedure to compute γ^u. However, this only needs to be done once for each user in order to predict all unknown ratings in the user's profile. Given the distribution P(R_y | r^u), various rules can be used to compute the predicted rating. One could predict the rating with maximal probability, predict the expected rating, or predict the median rating. Of course, each of these prediction rules minimizes a different prediction error measure. In particular, median prediction minimizes the mean absolute error and is the prediction rule we use in our experiments.

6 Experimentation

We consider two different experimental procedures that test the predictive ability of a rating-based collaborative filtering method. The first is a weak generalization all-but-1 experiment where one of each user's ratings is held out. The model is then trained on the remaining observed ratings and tested on the held out ratings. This experiment is designed to test the ability of a method to generalize to other items rated by the users it was trained on.

We introduce a second experimental protocol for testing a stronger form of generalization.
The model is first trained using all ratings from a set of training users. Once the model is trained, an all-but-1 experiment is performed using a separate set of test users. This experiment is designed to test the ability of the model to generalize to novel user profiles.

Two different base data sets were used in the experiments: the well known EachMovie data set, and the recently released million-rating MovieLens data set. Both data sets were filtered to contain users with at least 20 ratings. EachMovie was filtered to remove movies with fewer than 2 ratings, leaving 1621 movies. The MovieLens data was similarly filtered, leaving 3592 movies. The EachMovie training sets contained 30000 users while the test sets contained 5000 users. The MovieLens training sets contained 5000 users while the test sets contained 1000 users. The EachMovie rating scale is from 0 to 5, while the MovieLens rating scale is from 1 to 5.

Both types of experiment were performed for a range of numbers of user attitudes. For each model and number of user attitudes, each experiment was repeated on three different random partitions of each base data set into known ratings, held out ratings, validation ratings, training users and testing users. In the weak generalization experiments the aspect, multinomial mixture, and URP models were tested. In the strong generalization experiments only the multinomial mixture and URP models were tested, since a trained aspect model cannot be applied to new user profiles. Also recall that LDA and mPCA cannot be used for rating prediction, so they are not tested in these experiments.
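The median prediction rule of Section 5 (assemble the predictive distribution of equation 9, then take its median) can be sketched as follows. The function names and parameter values are our own, chosen only to illustrate the rule.

```python
import numpy as np

def predict_rating(gamma, beta_y, values):
    """Median rating under the URP predictive distribution (equation 9).

    gamma:  (K,) fitted variational Dirichlet parameters for the user.
    beta_y: (V, K) slice beta[:, y, :] for the target item y.
    values: (V,) the actual rating values, e.g. [0, 1, 2, 3, 4, 5].
    """
    p = beta_y @ (gamma / gamma.sum())            # P(R_y = v | r^u), equation 9
    cdf = np.cumsum(p)
    return values[int(np.searchsorted(cdf, 0.5))]  # smallest v with CDF >= 0.5

# Illustrative example with K=2 attitudes and six rating values.
gamma = np.array([3.0, 1.0])
beta_y = np.array([[0.05, 0.10], [0.10, 0.20], [0.50, 0.40],
                   [0.20, 0.20], [0.10, 0.05], [0.05, 0.05]])
pred = predict_rating(gamma, beta_y, np.arange(6))
```

The median (rather than the mode or mean) is used because it minimizes the mean absolute error reported in the experiments.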
We provide results obtained with a best-K-neighbors version of the GroupLens method for various values of K as a baseline method.

[Figures 5-8 (plots): normalized mean absolute error versus K. Figure 5: EachMovie weak generalization (Neighborhood, Aspect Model, Multinomial Mixture, URP). Figure 6: EachMovie strong generalization (Neighborhood, Multinomial Mixture, URP). Figure 7: MovieLens weak generalization (Neighborhood, Aspect Model, Multinomial Mixture, URP). Figure 8: MovieLens strong generalization (Neighborhood, Multinomial Mixture, URP).]

7 Results

Results are reported in figures 5 through 8 in terms of normalized mean absolute error (NMAE). We define our NMAE to be the standard MAE normalized by the expected value of the MAE assuming uniformly distributed rating values and rating predictions. For the EachMovie data set E[MAE] is 1.944, and for the MovieLens data set it is 1.6.
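Both normalizers can be checked directly. The following short sketch computes E[|X − Y|] for independent ratings and predictions drawn uniformly from the rating scale:

```python
from fractions import Fraction

def expected_mae(values):
    """E[|X - Y|] for X, Y independent and uniform over the rating values."""
    values = list(values)
    n = len(values)
    total = sum(abs(x - y) for x in values for y in values)
    return Fraction(total, n * n)

# EachMovie uses ratings 0..5; MovieLens uses 1..5.
eachmovie = expected_mae(range(0, 6))   # 35/18, about 1.944
movielens = expected_mae(range(1, 6))   # 8/5 = 1.6
```

Dividing the raw MAE by these constants means an NMAE of 1.0 corresponds to uniformly random prediction.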
Note that our definition of NMAE differs from that used by Goldberg et al. [6]. Goldberg et al. take the normalizer to be the difference between the minimum and maximum ratings, which means most of the error scale corresponds to performing much worse than random.

In both the weak and strong generalization experiments using the EachMovie data set, the URP model performs significantly better than the other methods, and obtains the lowest prediction error. The results obtained from the MovieLens data set do not show the same clean trends as the EachMovie data set for the weak generalization experiment. The smaller size of the MovieLens data set seems to cause URP to overfit for larger values of K, thus increasing its test error. Nevertheless, the lowest error attained by URP is not significantly different from that obtained by the aspect model. In the strong generalization experiment the URP model again outperforms the other methods.

8 Conclusions

In this paper we have presented the URP model for rating-based collaborative filtering. Our model combines the intuitive appeal of the multinomial mixture and aspect models with the strong high-level generative semantics of LDA and mPCA. As a result of being specially designed for collaborative filtering, our model also contains unique rating profile generative semantics not found in LDA or mPCA. This gives URP the capability to operate directly on ratings data, and to efficiently predict all missing ratings in a user profile. This means URP can be applied to recommendation, as well as many other tasks based on rating prediction.

We have empirically demonstrated on two different data sets that the weak generalization performance of URP is at least as good as that of the aspect and multinomial mixture models.
For online applications where it is impractical to refit the model each time a rating is supplied by a user, the result of interest is strong generalization performance. The aspect model cannot be applied in a principled manner in such a scenario, and we see that URP outperforms the other methods by a significant margin.

Acknowledgments

We thank the Compaq Computer Corporation for the use of the EachMovie data set, and the GroupLens Research Group at the University of Minnesota for use of the MovieLens data set. Many thanks go to Rich Zemel for helpful comments and numerous discussions about this work.

References

[1] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, Jan. 2003.
[2] C. Boutilier, R. S. Zemel, and B. Marlin. Active collaborative filtering. In Proceedings of the Nineteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 98-106, 2003.
[3] W. Buntine. Variational extensions to EM and multinomial PCA. In Proceedings of the European Conference on Machine Learning, 2002.
[4] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin. Combining content-based and collaborative filters in an online newspaper. In Proceedings of the ACM SIGIR Workshop on Recommender Systems, 1999.
[5] M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. In Proceedings of the ACM Conference on Research and Development in Information Retrieval, pages 433-434, 2003.
[6] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval Journal, 4(2):133-151, July 2001.
[7] T. Hofmann. Learning what people (don't) want. In Proceedings of the European Conference on Machine Learning, 2001.
[8] T. Minka. Estimating a Dirichlet distribution. Unpublished, 2003.
[9] P. Resnick, N. Iacovou, M. Suchak, P.
Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pages 175-186, Chapel Hill, North Carolina, 1994. ACM.", "award": [], "sourceid": 2377, "authors": [{"given_name": "Benjamin", "family_name": "Marlin", "institution": null}]}