{"title": "Mean Field Approach to a Probabilistic Model in Information Retrieval", "book": "Advances in Neural Information Processing Systems", "page_first": 513, "page_last": 520, "abstract": null, "full_text": "Mean-Field Approach to a Probabilistic Model\n\nin Information Retrieval\n\nBin Wu, K. Y. Michael Wong\n\nDepartment of Physics\n\nHong Kong University of Science and Technology\n\nClear Water Bay, Hong Kong\n\nphwbd@ust.hk\n\nphkywong@ust.hk\n\nDavid Bodoff\n\nDepartment of ISMT\n\nHong Kong University of Science and Technology\n\nClear Water Bay, Hong Kong\n\ndbodoff@ust.hk\n\nAbstract\n\nWe study an explicit parametric model of documents, queries, and rel-\nevancy assessment for Information Retrieval (IR). Mean-\ufb01eld methods\nare applied to analyze the model and derive ef\ufb01cient practical algorithms\nto estimate the parameters in the problem. The hyperparameters are es-\ntimated by a fast approximate leave-one-out cross-validation procedure\nbased on the cavity method. The algorithm is further evaluated on several\nbenchmark databases by comparing with standard algorithms in IR.\n\n1 Introduction\n\nThe area of information retrieval (IR) studies the representation, organization and access of\ninformation in an information repository. With the advent and boom of the Internet, espe-\ncially the World Wide Web (WWW), more and more information is available to be shared\nonline. Search on the Internet becomes increasingly popular. In this respect, probabilistic\nmodels have become very useful in empowering information searches [1, 2].\n\nIn fact, information searches themselves contain rich information, which can be recorded\nand fruitfully used to improve the performance of subsequent retrievals. This is an exten-\nsion of the process of relevance feedback [3], which incorporates the relevance assessments\nsupplied by the user to construct new representations for queries, during the procedure of\nthe users interactive document retrieval. 
In the process, the feedback information helps to refine the queries continuously, but the effects pertain only to the particular retrieval session. Our objective, on the other hand, is to refine the representations of documents and queries with the help of relevancy data, so that subsequent retrieval sessions can benefit.

Based on Fuhr and Buckley's meta-structure [4] relating documents, queries and relevancy assessments, one of us recently proposed a probabilistic model [5] in which these objects are described by explicit parametric distribution functions, facilitating the construction of a likelihood function, whose maximum can be used to characterize the documents and queries. Rather than relying on heuristics as in many previous works, the proposed model provides a unified formal framework for the following two tasks: (a) ad hoc information retrieval, in which a query is given and the goal is to return a list of documents ranked according to their similarities with the query; (b) document routing, in which a document is given and the goal is to categorize it using a list of queries ranked according to their similarities with the document. (Here we assume a model in which categories are represented by queries.)

In this paper, we report our recent progress in putting this new theoretical approach to empirical tests. Since documents and queries are represented by high dimensional vectors in a vector space model, a mean-field approach will be adopted. Mean-field methods were originally used to study magnetic systems in statistical physics, but thanks to their ability to deal with high dimensional systems, they are increasingly applied to many areas of information processing [6].
In the present context, a mean-field treatment implies that when a particular component of a document or query vector is analyzed, all other components of the same and other vectors can be considered as background fields satisfying appropriate average properties, and correlations of statistical fluctuations with the background vectors can be neglected.

After introducing the parametric model in Section 2, the mean-field approach will be used in two steps. First, in Section 3, the true representations of documents and queries will be estimated by maximizing the total probability of observation. This results in a set of mean-field equations, which can be solved by a fast iterative algorithm. The estimated true documents and queries will then be used for ad hoc information retrieval and document routing, respectively.

Second, the model depends on a few hyperparameters which are conventionally determined by the cross-validation method. Here, as described in Section 4, the mean-field approach can be used again to accelerate the otherwise tedious leave-one-out cross-validation procedure. For a given set of hyperparameter values, it enables us to carry out the systemwide iteration only once (rather than repeating it once for each left-out document or query), and the leave-one-out estimations of the document and query representations can be obtained by a version of mean-field theory called the cavity method [7].

In Section 5, we compare the model with the standard tf-idf [8] and latent semantic indexing (LSI) [9] on benchmark test collections. As we shall see, the validity of our model is well supported by its superior performance. The paper is concluded in Section 6.

2 A Unified Probabilistic Model

Our work is motivated by Fuhr and Buckley's conceptual model. Assume that a set of $N_D$ documents and $N_Q$ queries is available to us. In the vector space model, each document and query is represented by an $n$-dimensional vector. The true vectors are denoted by $D_i$ ($Q_j$), which are referred to as the true meaning of the document (query). Our model consists of the following three components:

(a) The document $D_i^0$ we really observe is distributed around the true document vector $D_i$ according to the probability distribution $f_D(D_i^0|D_i)$, the difference resulting from the documents containing terms that do not ideally represent the meaning of the document. In other words, the document $D_i^0$ is generated from its true meaning $D_i$.

(b) Similarly, the query $Q_j^0$ that the user actually submits is also distributed around the true query vector $Q_j$ according to the probability distribution $f_Q(Q_j^0|Q_j)$.

(c) There is some relation between the documents and queries, called relevancy assessment. We denote this relation with a binary variable $B_{ij}$ for each pair of document and query. If $B_{ij}=1$, we say the document is relevant to the query, that is, the document is what the user wants. Otherwise, $B_{ij}=0$ and the document is irrelevant to the query. Suppose we have some relevancy relations between documents and queries (through historical records, from experts, etc.). Then we hypothesize that the true documents and queries are distributed according to the distribution $f_B(\{D_i\},\{Q_j\}|B)$, that is, the true representations of documents and queries should satisfy their relevancy relations.

We summarize the idea through a probabilistic meta-structure shown in Figure 1.

Figure 1: Probabilistic meta-structure. The observed documents $D^0$, queries $Q^0$ and relevancy assessments $B$ are data; the true vectors $D$ and $Q$ are unknown parameters, connected to the data through $f_D(D^0|D)$, $f_Q(Q^0|Q)$ and $f_B(D,Q|B)$.

In order to complete the model, we need to hypothesize the form of the distribution functions. In this paper, we restrict the documents and queries to a hypersphere, since usually only the cosines of the angles between documents and queries are used to determine their similarity. Hence, we assume the following distribution functions:

(a) The distribution of each observed document $D_i^0$ given its true location $D_i$:
$$f_D(D_i^0|D_i) = \frac{1}{Z_D} \exp(\beta_D D_i^0 \cdot D_i)\, \delta(|D_i^0|^2 - 1). \quad (1)$$

(b) The distribution of each observed query $Q_j^0$ given its true location $Q_j$:
$$f_Q(Q_j^0|Q_j) = \frac{1}{Z_Q} \exp(\beta_Q Q_j^0 \cdot Q_j)\, \delta(|Q_j^0|^2 - 1). \quad (2)$$

(c) The prior distribution of the documents and queries, given the relevance relation between them:
$$f_B(\{D_i\},\{Q_j\}|B) = \frac{1}{Z_B} \exp\Big(\sum_{ij} B_{ij} D_i \cdot Q_j\Big) \prod_i \delta(|D_i|^2 - 1) \prod_j \delta(|Q_j|^2 - 1), \quad (3)$$

where $\delta$ is the Dirac $\delta$-function, and $Z_D$, $Z_Q$ and $Z_B$ are normalization constants of $f_D$, $f_Q$ and $f_B$ respectively, and are hence independent of $D_i$ and $Q_j$.

If we further assume that the observations of documents and queries are independent of each other, we can obtain the total probability of observing all documents and queries, given the relevancy relation between them:
$$P(\{D_i^0\},\{Q_j^0\}|B;\theta) = \frac{1}{Z} \int \prod_i dD_i \prod_j dQ_j\, e^{-E} \prod_i \delta(|D_i|^2-1) \prod_j \delta(|Q_j|^2-1), \quad (4)$$
where
$$E = -\beta_D \sum_i D_i^0 \cdot D_i - \beta_Q \sum_j Q_j^0 \cdot Q_j - \sum_{ij} B_{ij} D_i \cdot Q_j \quad (5)$$
is the energy function, $Z$ collects the normalization constants, and $\theta$ denotes all hyperparameters $\{\beta_D, \beta_Q\}$. There is now an appealing correspondence between the present model and spin models in statistical physics: $Z$ is just the familiar partition function and $E$ is the energy.

By maximizing the probability in Eq. (4), we can obtain an estimation of the true documents $\hat{D}_i$, which can be used in ad hoc retrieval: we define the similarity function between two vectors as the cosine of the angle between them, and rank the similarities of $\hat{D}_i$ (instead of $D_i^0$) with a new query to determine whether the documents should be retrieved or not.
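To fix ideas, the energy function above can be evaluated directly. The following is a minimal NumPy sketch (variable names are illustrative, and the relevance coupling is taken as unity, matching Eq. (3)):

```python
import numpy as np

def energy(D0, Q0, D, Q, B, beta_D, beta_Q):
    """Energy E of a configuration of true vectors (D, Q) given the
    observations (D0, Q0) and the relevance matrix B.

    D0, D: (N_D, n) arrays; Q0, Q: (N_Q, n) arrays; B: (N_D, N_Q).
    Lower energy means a better fit."""
    # Observation terms: each true vector should align with its observation.
    e = -beta_D * np.sum(D0 * D) - beta_Q * np.sum(Q0 * Q)
    # Prior term: relevant document-query pairs should align with each other.
    e -= np.sum(B * (D @ Q.T))
    return e
```

At the saddle-point level, maximizing the total probability in Eq. (4) amounts to minimizing this energy over unit-length rows of D and Q.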
As a byproduct, we can also obtain the estimation of the true queries $\hat{Q}_j$, which in turn can be used in document routing: new documents should be compared with $\hat{Q}_j$ to determine whether they belong to the corresponding category or not. So our model gives a unifying procedure for both ad hoc retrieval and routing.

3 Parameter Estimation

In this section, we derive a fast iterative algorithm for parameter estimation. First, we replace the $\delta$-functions by their Fourier transforms, introducing an auxiliary variable $x_i$ ($y_j$) for each document (query) constraint. Then $Z$ can be written as
$$Z = \int \prod_i dD_i\, dx_i \prod_j dQ_j\, dy_j\, e^{-F}, \quad (6)$$
where
$$F = E + \sum_i x_i (|D_i|^2 - 1) + \sum_j y_j (|Q_j|^2 - 1). \quad (7)$$
In writing this formula, we have changed the integration to the imaginary axis. Mean-field theory works in the limit of large $n$, when the integration can be well approximated by taking the saddle point of $F$. This is obtained by equating the partial derivatives of $F$ with respect to $D_i$, $Q_j$, $x_i$ and $y_j$ to zero, yielding
$$D_i = \frac{1}{2x_i}\Big(\beta_D D_i^0 + \sum_j B_{ij} Q_j\Big), \quad (8)$$
$$Q_j = \frac{1}{2y_j}\Big(\beta_Q Q_j^0 + \sum_i B_{ij} D_i\Big), \quad (9)$$
$$|D_i|^2 = 1, \quad (10)$$
$$|Q_j|^2 = 1. \quad (11)$$

This set of equations is referred to as the mean-field equations, since fluctuations around the mean values of the parameters have been neglected. Due to its simple form, it can be solved by an iterative scheme. Though we have not studied the theoretical convergence of the iterative scheme, its effectiveness can be seen from the following argument. If we replace $x_i$ in Eq. (8) and $y_j$ in Eq. (9) by their respective values at the saddle point, then the iteration process becomes a linear one. Now, Eqs. (8) and (9) differ from this linear iteration problem only by the scale factors $1/(2x_i)$ and $1/(2y_j)$ respectively. Hence, after using Eqs. (10) and (11), the problem is equivalent to rescaling the lengths of the iterated vectors back to the hypersphere defined by $|D_i| = 1$ and $|Q_j| = 1$. This alternating operation of linear iteration and rescaling back to the hypersphere makes it a very stable algorithm. The complexity of the algorithm is linear in the number of documents and queries. Empirically, it converges in just a few tens of steps. Alternatively, one may use the Augmented Lagrangian method to find the saddle point of $F$, whose convergence is guaranteed, but which is computationally more complex [10].

4 Hyperparameter Estimation

In our model, the parameters $\beta_D$ and $\beta_Q$ determine the shapes of the distributions $f_D$ and $f_Q$ relative to the prior $f_B$, and influence the parameter estimation described in Section 3. We refer to them as hyperparameters. They have to be chosen so that the model performs optimally when new queries are raised to retrieve documents, or when new documents are routed.

A standard method for hyperparameter estimation in machine learning is leave-one-out cross-validation [11]. Suppose we have $N$ examples for training the model. Each time we pick one example as the validation set and train the model with the rest of the $N-1$ examples. The hyperparameters are chosen as the ones that give the optimal performance averaged over the $N$ test examples.

The exact leave-one-out cross-validation is very tedious, especially for multiple hyperparameters, because of the need to train the model $N$ times for each combination of hyperparameters. For this model, we propose an approximate leave-one-out procedure based on the cavity method [7].
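The systemwide training referred to above, the mean-field iteration of Section 3 alternating a linear update with rescaling back to the hypersphere, can be sketched as follows (a minimal NumPy sketch; names and the fixed iteration count are illustrative):

```python
import numpy as np

def mean_field_iterate(D0, Q0, B, beta_D, beta_Q, n_steps=50):
    """Iteratively solve the mean-field equations by alternating a linear
    update with rescaling back to the unit hypersphere.

    D0: (N_D, n) observed documents; Q0: (N_Q, n) observed queries;
    B: (N_D, N_Q) binary relevance matrix."""
    # Initialize the true vectors at the normalized observations.
    D = D0 / np.linalg.norm(D0, axis=1, keepdims=True)
    Q = Q0 / np.linalg.norm(Q0, axis=1, keepdims=True)
    for _ in range(n_steps):
        # Linear step: each true vector is pulled toward its observation
        # and toward the vectors it is relevant to.
        D_new = beta_D * D0 + B @ Q
        Q_new = beta_Q * Q0 + B.T @ D
        # Rescaling step: project back onto the hypersphere, which fixes
        # the auxiliary variables to their saddle-point values.
        D = D_new / np.linalg.norm(D_new, axis=1, keepdims=True)
        Q = Q_new / np.linalg.norm(Q_new, axis=1, keepdims=True)
    return D, Q
```

Each sweep costs one pass over the nonzero relevance entries, which is consistent with the linear complexity noted above.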
Suppose we have trained the model with all data, and obtained the estimation $\{\hat{D}_i, \hat{Q}_j, x_i, y_j\}$, which satisfies the steady state equation
$$\hat{D}_i = \frac{1}{2x_i}\Big(\beta_D D_i^0 + \sum_k B_{ik} \hat{Q}_k\Big). \quad (12)$$
If the query $Q_j$ were left out from the training set of queries, the cavity estimation $\hat{D}_i^{\setminus j}$ should satisfy the equation
$$\hat{D}_i^{\setminus j} = \frac{1}{2x_i^{\setminus j}}\Big(\beta_D D_i^0 + \sum_{k \neq j} B_{ik} \hat{Q}_k^{\setminus j}\Big). \quad (13)$$
By subtracting (12) from (13), and assuming that $x_i^{\setminus j}$ is approximately the same as $x_i$, we can get the difference $\delta D_i = \hat{D}_i^{\setminus j} - \hat{D}_i$,
$$\delta D_i \approx \frac{1}{2x_i}\Big(\sum_{k \neq j} B_{ik}\, \delta Q_k - B_{ij} \hat{Q}_j\Big). \quad (14)$$
For ad hoc retrieval, we eliminate $\delta Q_k$ to obtain a set of linear equations for $\delta D_i$. The solution can be further simplified by using the mean-field argument that the changes induced on the documents by removing the query $Q_j$ can be decoupled. Hence we can neglect the off-diagonal terms, yielding
$$\delta D_i \approx -\frac{B_{ij} \hat{Q}_j}{2x_i}. \quad (15)$$
Note that $\{x_i, \hat{Q}_j\}$ are already known from the systemwide training. Then $\hat{D}_i^{\setminus j}$ can be estimated by $\hat{D}_i + \delta D_i$. The similarities between $Q_j^0$ and $\hat{D}_i^{\setminus j}$ are then used to predict the leave-one-out ad hoc retrieval performance of the model. Equations for document routing can be derived analogously.

Note that we need to train the model only once, and the leave-one-out estimation of documents and queries can be obtained in one step. So the algorithm is extremely fast. Remarkably, it also gives reasonable estimations of the hyperparameters, as shown in the following experiments.

We remark that the mean-field technique can be applied to distributions of documents, queries and relevance feedback other than those described by Eqs. (1-3). In the present case specified by Eqs. (1-3), our model is similar to the Gaussian model, if the spherical constraint on the $D_i$'s and $Q_j$'s is replaced by a spherical Gaussian prior. Though leave-one-out cross-validation can be done exactly in the Gaussian model, it involves the inversion of a large matrix.
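Once the systemwide solution is known, the one-step cavity estimate is cheap to compute. A minimal NumPy sketch (illustrative names; the factor $2x_i$ is read off as the length of the unnormalized mean-field update, consistent with the spherical constraint):

```python
import numpy as np

def cavity_leave_one_out(D, Q, B, D0, beta_D, j):
    """One-step cavity estimate of the document vectors when query j is
    left out: subtract query j's pull and rescale, without retraining.

    D, D0: (N_D, n); Q: (N_Q, n); B: (N_D, N_Q); j: index of the left-out query."""
    # 2*x_i equals the length of the unnormalized update, since |D_i| = 1.
    two_x = np.linalg.norm(beta_D * D0 + B @ Q, axis=1, keepdims=True)
    # Diagonal (decoupled) correction: remove query j's contribution only.
    D_cav = D - (B[:, [j]] * Q[j]) / two_x
    # Rescale the cavity vectors back to the hypersphere.
    return D_cav / np.linalg.norm(D_cav, axis=1, keepdims=True)
```

Ranking the cavity documents by cosine similarity with the held-out observed query, for each j in turn, gives the fast leave-one-out score used to select the hyperparameters.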
On the other hand, the mean-field estimation greatly simplifies the process by neglecting the off-diagonal elements.

5 Experimental Results

We have applied the proposed method to ad hoc retrieval and routing for the test collections of Cranfield and CISI. Because we treat both tasks identically, we use the same evaluation criteria: the recall-precision curve and the average retrieval precision. We have run two versions of our algorithm: (a) in the original dimension, where the observed documents $D_i^0$ and queries $Q_j^0$ are represented by the original tf-idf weights; (b) in the reduced dimension of 100, in which the original vectors are reduced by the singular value decomposition (SVD) used in LSI.

In Figs. 2 (a-b), we show the recall-precision curves at the optimal hyperparameters. The mean-field estimates are compared with the baseline results of LSI. It is clear that our method gives significant gains in retrieval precision. Comparisons using the original dimension or the Cranfield collection, not shown here due to space limitations, yield equally satisfactory results.

Figure 2: The recall-precision curves of the mean-field estimation (MF) and the baseline (LSI) for (a) ad hoc retrieval and (b) document routing for CISI in reduced dimension.

For hyperparameter estimation, we can compare the mean-field results with those of exact leave-one-out cross-validation in reduced dimension, since the computation of the exact ones is still feasible. In Fig. 3, we have plotted the average precision versus the two hyperparameters, as computed by the two methods.
They have very similar contours, although there is a uniform displacement between their values. This demonstrates the usefulness of the mean-field approximation in hyperparameter estimation.

Figure 3: Average retrieval precision versus hyperparameters for ad hoc retrieval in reduced dimension for CISI: (a) mean-field leave-one-out; (b) exact leave-one-out.

In Table 1, we obtain the values of the optimal hyperparameters from the mean-field leave-one-out method, and the average precisions of the exact leave-one-out are then computed using these optimal hyperparameters. These are compared with the results of the exact leave-one-out in Table 1. For the hyperparameter estimation in the original dimension, the exact leave-one-out is not available since it is too tedious. Instead, we compare the hyperparameters with the ones from the $k$-fold cross-validation. Whether we compare the mean-field with the exact leave-one-out or the $k$-fold cross-validation, the optimal hyperparameters are comparable in most cases, and when there are discrepancies, one can observe that the average precisions are essentially the same.

Table 1: The optimal hyperparameters and the average retrieval precision for leave-one-out cross-validation in reduced dimension: mean-field versus exact.

Ad hoc retrieval
                 CISI                          Cranfield
             hyperparams    Avg. prec.     hyperparams    Avg. prec.
LSI          --             0.079          --             0.178
Mean-Field   0.3, 28.9      0.142          12.0, 1.6      0.248
Exact        0.3, 23.0      0.142          10.1, 2.5      0.250

Document routing
                 CISI                          Cranfield
             hyperparams    Avg. prec.     hyperparams    Avg. prec.
LSI          --             0.104          --             0.240
Mean-Field   0.4, 2.5       0.192          1.1, 1.1       0.351
Exact        0.6, 0.9       0.193          1.5, 0.7       0.356

6 Conclusion

We have considered a probabilistic model of documents, queries and relevancy assessments.
Fast algorithms are derived for parameter and hyperparameter estimation. Significant improvement is achieved for both ad hoc retrieval and routing compared with tf-idf and LSI. In another paper [12], we have compared the model with other heuristic methods such as the Rocchio heuristics [3] and Bartell's multidimensional scaling [13], and the mean-field method still outperforms them. These successes illustrate the potential of the mean-field approach, which is especially suitable for systems with high dimensions and numerous mutually interacting components, such as those in IR. Hence we anticipate that mean-field methods will have increasing applications in many other probabilistic models in IR.

Acknowledgments

We thank R. Jin for interesting discussions. This work was supported by the grant HKUST6157/99P of the Research Grant Council of Hong Kong.

References

[1] Cohn, D. and T. Hofmann (2001). The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity. Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich and V. Tresp, eds., MIT Press, Cambridge, MA, 430-436.

[2] Jaakkola, T. and H. Siegelmann (2002). Active Information Retrieval. Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker and Z. Ghahramani, eds., MIT Press, Cambridge, MA, 777-784.

[3] Rocchio, J. J. (1971). Relevance Feedback in Information Retrieval. The SMART Retrieval System - Experiments in Automatic Document Processing, G. Salton, ed., Prentice-Hall, Englewood Cliffs, NJ, Chapter 14.

[4] Fuhr, N. and C. Buckley (1991). A Probabilistic Learning Approach for Document Indexing.
ACM Transactions on Information Systems 9(3): 223-248.

[5] Bodoff, D., D. Enache, A. Kambil, G. Simon and A. Yukhimets (2001). A Unified Maximum Likelihood Approach to Document Retrieval. Journal of the American Society for Information Science and Technology 52(10): 785-796.

[6] Opper, M. and D. Saad, eds. (2001). Advanced Mean Field Methods, MIT Press, Cambridge, MA.

[7] Wong, K. Y. M. and F. Li (2002). Fast Parameter Estimation Using Green's Functions. Advances in Neural Information Processing Systems 14: 535-542, T. G. Dietterich, S. Becker and Z. Ghahramani, eds., MIT Press, Cambridge, MA.

[8] Salton, G. and M. J. McGill (1983). Introduction to Modern Information Retrieval, McGraw-Hill, New York, 63-66.

[9] Deerwester, S., S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6): 391-407.

[10] Nocedal, J. and S. J. Wright (1999). Numerical Optimization, Springer, Berlin, Ch. 17.

[11] Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 372-375.

[12] Bodoff, D., B. Wu and K. Y. M. Wong (2002). Relevance Feedback Meets Maximum Likelihood, preprint.

[13] Bartell, B. T., G. W. Cottrell and R. K. Belew (1992). Latent Semantic Indexing Is an Optimal Special Case of Multidimensional Scaling. Proceedings of the 15th International ACM SIGIR Conference on Research and Development in Information Retrieval, 161-167.
", "award": [], "sourceid": 2250, "authors": [{"given_name": "Bin", "family_name": "Wu", "institution": null}, {"given_name": "K.", "family_name": "Wong", "institution": null}, {"given_name": "David", "family_name": "Bodoff", "institution": null}]}