{"title": "GenDeR: A Generic Diversified Ranking Algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 1142, "page_last": 1150, "abstract": "Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.", "full_text": "GenDeR: A Generic Diversi\ufb01ed Ranking Algorithm\n\nJingrui He\n\nIBM T.J. Watson Research\n\nYorktown Heights, NY 10598\njingruhe@us.ibm.com\n\nQiaozhu Mei\n\nUniversity of Michigan\nAnn Arbor, MI 48109\nqmei@umich.edu\n\nHanghang Tong\n\nIBM T.J. Watson Research\n\nYorktown Heights, NY 10598\n\nhtong@us.ibm.com\n\nBoleslaw K. Szymanski\n\nRensselaer Polytechnic Institute\n\nTroy, NY 12180\n\nszymab@rpi.edu\n\nAbstract\n\nDiversi\ufb01ed ranking is a fundamental task in machine learning. It is broadly appli-\ncable in many real world problems, e.g., information retrieval, team assembling,\nproduct search, etc. In this paper, we consider a generic setting where we aim\nto diversify the top-k ranking list based on an arbitrary relevance function and\nan arbitrary similarity function among all the examples. We formulate it as an\noptimization problem and show that in general it is NP-hard. 
Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find a (1 - 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

1 Introduction

Many real applications can be reduced to a ranking problem. While traditional ranking tasks mainly focus on relevance, it has been widely recognized that diversity is another highly desirable property. It is not only a key factor to address the uncertainty and ambiguity in an information need, but also an effective way to cover the different aspects of the information need [14]. Take team assembling as an example. Given a task which typically requires a set of skills, we want to form a team of experts to perform that task. On one hand, each team member should have some relevant skills. On the other hand, the whole team should somehow be diversified, so that we can cover all the required skills for the task and different team members can benefit from each other's diversified, complementary knowledge and social capital. More recent research discovers that diversity plays a positive role in improving employees' performance within big organizations as well as their job retention rate in the face of layoffs [21]; in improving human-centric sensing results [15, 17]; in the decision of joining a new social media site (e.g., Facebook) [18]; etc.

To date, many diversified ranking algorithms have been proposed. Early works mainly focus on text data [5, 23], where the goal is to improve the coverage of (sub-)topics in the retrieval result. In recent years, more attention has been paid to result diversification in web search [2, 20].
For example, if a query bears multiple meanings (such as the keyword 'jaguar', which could refer to either cars or cats), we would like to have each meaning (e.g., 'cars' and 'cats' in the example of 'jaguar') covered by a subset of the top ranked web pages. Another recent trend is to diversify PageRank-type algorithms for graph data [24, 11, 16]. It is worth pointing out that almost all the existing diversified ranking algorithms hinge on a specific choice of the relevance function and/or the similarity function. For example, in [2] and [20], both the relevance function and the similarity function implicitly depend on the categories/subtopics associated with the query and the documents; in [16], the relevance function is obtained via personalized PageRank [8], and the similarity is measured based on the so-called 'Google matrix'; etc.

In this paper, we shift the problem to a more generic setting and ask: given an arbitrary relevance function wrt an implicit or explicit query, and an arbitrary similarity function among all the available examples, how can we diversify the resulting top-k ranking list? We address this problem from the optimization viewpoint. First, we propose an objective function that admits any non-negative relevance function and any non-negative, symmetric similarity function. It naturally captures both the relevance with regard to the query and the diversity of the ranking list, with a regularization parameter that balances between them. Then, we show that while such an optimization problem is NP-hard in general, for a large volume of the parameter space, the objective function exhibits the diminishing returns property, including submodularity, monotonicity, etc. Finally, we propose a scalable, greedy algorithm to find a provably near-optimal solution.

The rest of the paper is organized as follows.
We present our optimization formulation for diversified ranking in Section 2, followed by the analysis of its hardness and properties. Section 3 presents our greedy algorithm for solving the optimization problem. The performance of the proposed algorithm is evaluated in Section 4. In Section 5, we briefly review the related work. Finally, we conclude the paper in Section 6.

2 The Optimization Formulation

In this section, we present the optimization formulation for diversified ranking. We start by introducing the notation, and then present the objective function, followed by the analysis regarding its hardness and properties.

2.1 Notation

In this paper, we use normal lower-case letters to denote scalars or functions, bold-face lower-case letters to denote vectors, bold-face upper-case letters to denote matrices, and calligraphic upper-case letters to denote sets. To be specific, for a set $\mathcal{X}$ of $n$ examples $\{x_1, x_2, \ldots, x_n\}$, let $S$ denote the $n \times n$ similarity matrix, which is both symmetric and non-negative. In other words, $S_{i,j} = S_{j,i}$ and $S_{i,j} \ge 0$, where $S_{i,j}$ is the element of $S$ in the $i$th row and the $j$th column ($i, j = 1, \ldots, n$). For any ranking function $r(\cdot)$, which returns the non-negative relevance score for each example in $\mathcal{X}$ with respect to an implicit or explicit query, our goal is to find a subset $\mathcal{T}$ of $k$ examples which are relevant to the query and diversified among themselves. Here the positive integer $k$ is the budget of the ranking list size, and the ranking function $r(\cdot)$ generates an $n \times 1$ vector $\mathbf{r}$, whose $i$th element is $r_i = r(x_i)$. When we describe the objective function as well as the proposed optimization algorithm, it is convenient to introduce the following $n \times 1$ reference vector $\mathbf{q} = S \cdot \mathbf{r}$. Intuitively, its $i$th element $q_i$ measures the importance of $x_i$.
To be specific, if $x_i$ is similar to many examples (i.e., $S_{i,j}$ is high for many $j$) that are themselves relevant to the query (high $r_j$), it is more important than the examples whose neighbors are not relevant. For example, if $x_i$ is close to the center of a big cluster relevant to the query, the value of $q_i$ is large.

2.2 Objective Function

With the above notation, our goal is to find a subset $\mathcal{T}$ of $k$ examples which are both relevant to the query and diversified among themselves. To this end, we propose the following optimization problem:

$$\arg\max_{|\mathcal{T}|=k} \; g(\mathcal{T}) = w \sum_{i \in \mathcal{T}} q_i r_i - \sum_{i,j \in \mathcal{T}} r_i S_{i,j} r_j \qquad (1)$$

where $w$ is a positive regularization parameter that defines the trade-off between the two terms, and $\mathcal{T}$ consists of the indices of the $k$ examples that will be returned in the ranking list.

Intuitively, in the goodness function $g(\mathcal{T})$, the first term measures the weighted overall relevance of $\mathcal{T}$ with respect to the query, where $q_i$ is the weight for $x_i$. It favors relevant examples from big clusters. In other words, if two examples are equally relevant to the query, one from a big cluster and the other isolated, by using the weighted relevance we prefer the former. The second term measures the similarity among the examples within $\mathcal{T}$. That is, it penalizes the selection of multiple relevant examples that are very similar to each other. By including this term in the objective function, we seek a set of examples which are relevant to the query, but also dissimilar to each other. For example, in human-centric sensing [15, 17], due to the homophily in social networks, the reports of two friends are likely correlated, so that they provide weaker corroboration of events than the reports of two socially unrelated witnesses.

2.3 The Hardness of Equation (1)

In the optimization problem in Equation (1), we want to find a subset $\mathcal{T}$ of $k$ examples that collectively maximize the goodness function $g(\mathcal{T})$.
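As a concrete illustration, $g(\mathcal{T})$ can be evaluated directly from $S$, $\mathbf{r}$, and $w$. The following NumPy sketch is our illustration (the toy similarity matrix and relevance vector are made-up values, not data from the paper):

```python
import numpy as np

def goodness(S, r, T, w):
    """Evaluate g(T) = w * sum_{i in T} q_i r_i - sum_{i,j in T} r_i S[i,j] r_j,
    with the reference vector q = S r as defined in Section 2.1."""
    S = np.asarray(S, dtype=float)
    r = np.asarray(r, dtype=float)
    q = S @ r                                           # q_i weights relevance by cluster importance
    idx = np.asarray(sorted(T), dtype=int)
    relevance = w * np.sum(q[idx] * r[idx])             # weighted overall relevance of T
    redundancy = r[idx] @ S[np.ix_(idx, idx)] @ r[idx]  # pairwise similarity penalty within T
    return relevance - redundancy

# Toy instance (illustrative numbers only): examples 0 and 1 are
# near-duplicates (similarity 0.9), while example 2 is isolated.
S = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.ones(3)
# Compare the redundant pair {0, 1} with the diversified pair {0, 2}.
print(goodness(S, r, {0, 1}, w=2.0))
print(goodness(S, r, {0, 2}, w=2.0))
```

Note that the redundant pair is penalized by the second term but also rewarded by the $q$-weighting of the first term; how the two effects balance depends on $w$.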
Unfortunately, by the following theorem, it is NP-hard to find the optimal solution.

Theorem 2.1. The optimization problem in Equation (1) is NP-hard.

Proof. We prove this by reduction from the Densest k-Subgraph (DkS) problem, which is known to be NP-hard [7].

To be specific, consider an undirected graph $G(\mathcal{V}, \mathcal{E})$ with the connectivity matrix $W$, where $\mathcal{V}$ is the set of vertices and $\mathcal{E}$ is the set of edges; $W$ is a $|\mathcal{V}| \times |\mathcal{V}|$ symmetric matrix with elements being 0 or 1. Let $|\mathcal{E}|$ be the total number of edges in the graph. The DkS problem is defined in Equation (2):

$$\mathcal{Q} = \arg\max_{|\mathcal{Q}|=k} \sum_{i,j \in \mathcal{Q}} W_{i,j} \qquad (2)$$

Define another $|\mathcal{V}| \times |\mathcal{V}|$ matrix $\bar{W}$ as $\bar{W}_{i,j} = 1 - W_{i,j}$. It is easy to see that $\sum_{i,j \in \mathcal{Q}} W_{i,j} = k^2 - \sum_{i,j \in \mathcal{Q}} \bar{W}_{i,j}$. Therefore, Equation (2) is equivalent to

$$\mathcal{Q} = \arg\min_{|\mathcal{Q}|=k} \sum_{i,j \in \mathcal{Q}} \bar{W}_{i,j} \qquad (3)$$

Furthermore, notice that $\sum_{i,j=1}^{|\mathcal{V}|} \bar{W}_{i,j} = |\mathcal{V}|^2 - 2|\mathcal{E}|$ is a constant. Let $\mathcal{T} = \mathcal{V} \setminus \mathcal{Q}$; then Equation (3) is equivalent to

$$\arg\max_{|\mathcal{Q}|=k} \Big( \sum_{i \in \mathcal{Q}, j \in \mathcal{T}} \bar{W}_{i,j} + \sum_{i \in \mathcal{T}, j \in \mathcal{Q}} \bar{W}_{i,j} + \sum_{i,j \in \mathcal{T}} \bar{W}_{i,j} \Big) = \arg\max_{|\mathcal{T}|=|\mathcal{V}|-k} \Big( 2 \sum_{i \in \mathcal{Q}, j \in \mathcal{T}} \bar{W}_{i,j} + \sum_{i,j \in \mathcal{T}} \bar{W}_{i,j} \Big) \qquad (4)$$

Next, we will show that Equation (4) can be viewed as an instance of the optimization problem in Equation (1) with the following setting: let the similarity function $S$ be $\bar{W}$, the ranking function $\mathbf{r}$ be $\mathbf{1}_{|\mathcal{V}| \times 1}$, the budget be $|\mathcal{V}| - k$, and the regularization parameter $w$ be 2.
Under such settings, the objective function in Equation (1) becomes

$$\begin{aligned} g(\mathcal{T}) &= 2 \sum_{i \in \mathcal{T}} q_i r_i - \sum_{i,j \in \mathcal{T}} r_i \bar{W}_{i,j} r_j & \\ &= 2 \sum_{i \in \mathcal{T}} r_i \sum_{j=1}^{|\mathcal{V}|} \bar{W}_{i,j} r_j - \sum_{i,j \in \mathcal{T}} r_i \bar{W}_{i,j} r_j & \text{(dfn. of } \mathbf{q}\text{)} \\ &= 2 \sum_{i \in \mathcal{Q}} \sum_{j \in \mathcal{T}} r_i \bar{W}_{i,j} r_j + \sum_{i,j \in \mathcal{T}} r_i \bar{W}_{i,j} r_j & \text{(symmetry of } \bar{W}\text{)} \\ &= 2 \sum_{i \in \mathcal{Q}} \sum_{j \in \mathcal{T}} \bar{W}_{i,j} + \sum_{i,j \in \mathcal{T}} \bar{W}_{i,j} & \text{(dfn. of } \mathbf{r}\text{)} \end{aligned} \qquad (5)$$

which is equivalent to the objective function in Equation (4). This completes the proof. □

2.4 Diminishing Returns Property of g(T)

Given that Equation (1) is NP-hard in general, we instead seek a provably near-optimal solution in the next section. Here, let us first answer the following question: under what condition (e.g., in which range of the regularization parameter w) is it possible to find such a near-optimal solution for Equation (1)?

To this end, we present the so-called diminishing returns property of the goodness function g(T) defined in Equation (1), which is summarized in the following theorem. By Theorem 2.2, if we add more examples into an existing top-k ranking list, the goodness of the overall ranking list is non-decreasing (P2). However, the marginal benefit of adding additional examples into the ranking list decreases wrt the size of the existing ranking list (P1).

Theorem 2.2. Diminishing Returns Property of g(T). The goodness function g(T) defined in Equation (1) has the following properties:

(P1) submodularity. For any w > 0, the objective function g(T) is submodular wrt T;

(P2) monotonicity. For any w ≥ 2, the objective function g(T) is monotonically non-decreasing wrt T.

Proof. We first prove (P1).
For any $\mathcal{T}_1 \subset \mathcal{T}_2$ and any given example $x \notin \mathcal{T}_2$, we have

$$\begin{aligned} g(\mathcal{T}_1 \cup x) - g(\mathcal{T}_1) &= \Big(w \sum_{i \in \mathcal{T}_1 \cup x} q_i r_i - \sum_{i,j \in \mathcal{T}_1 \cup x} r_i S_{i,j} r_j\Big) - \Big(w \sum_{i \in \mathcal{T}_1} q_i r_i - \sum_{i,j \in \mathcal{T}_1} r_i S_{i,j} r_j\Big) \\ &= w q_x r_x - \Big(\sum_{i \in \mathcal{T}_1} r_i S_{i,x} r_x + \sum_{j \in \mathcal{T}_1} r_x S_{x,j} r_j + r_x S_{x,x} r_x\Big) \\ &= w q_x r_x - S_{x,x} r_x^2 - 2 r_x \sum_{j \in \mathcal{T}_1} S_{x,j} r_j \end{aligned} \qquad (6)$$

Similarly, we have $g(\mathcal{T}_2 \cup x) - g(\mathcal{T}_2) = w q_x r_x - S_{x,x} r_x^2 - 2 r_x \sum_{j \in \mathcal{T}_2} S_{x,j} r_j$. Therefore, we have

$$\big(g(\mathcal{T}_1 \cup x) - g(\mathcal{T}_1)\big) - \big(g(\mathcal{T}_2 \cup x) - g(\mathcal{T}_2)\big) = 2 r_x \sum_{j \in \mathcal{T}_2} S_{x,j} r_j - 2 r_x \sum_{j \in \mathcal{T}_1} S_{x,j} r_j = 2 r_x \sum_{j \in \mathcal{T}_2 \setminus \mathcal{T}_1} S_{x,j} r_j \ge 0 \qquad (7)$$

which completes the proof of (P1).

Next, we prove (P2). Given any $\mathcal{T}_1$ and $\mathcal{T}_2$ with $\mathcal{T}_1 \cap \mathcal{T}_2 = \Phi$, where $\Phi$ is the empty set, with $w \ge 2$ we have

$$\begin{aligned} g(\mathcal{T}_2 \cup \mathcal{T}_1) - g(\mathcal{T}_2) &= w \sum_{i \in \mathcal{T}_1} q_i r_i - \Big(\sum_{i \in \mathcal{T}_1, j \in \mathcal{T}_2} r_i S_{i,j} r_j + \sum_{i \in \mathcal{T}_2, j \in \mathcal{T}_1} r_i S_{i,j} r_j + \sum_{i,j \in \mathcal{T}_1} r_i S_{i,j} r_j\Big) \\ &= w \sum_{i \in \mathcal{T}_1} r_i \sum_{j=1}^{n} S_{i,j} r_j - \Big(2 \sum_{i \in \mathcal{T}_1, j \in \mathcal{T}_2} r_i S_{i,j} r_j + \sum_{i,j \in \mathcal{T}_1} r_i S_{i,j} r_j\Big) \\ &\ge 2 \sum_{i \in \mathcal{T}_1} r_i \sum_{j=1}^{n} S_{i,j} r_j - 2 \Big(\sum_{i \in \mathcal{T}_1, j \in \mathcal{T}_2} r_i S_{i,j} r_j + \sum_{i,j \in \mathcal{T}_1} r_i S_{i,j} r_j\Big) \\ &= 2 \sum_{i \in \mathcal{T}_1} r_i \Big(\sum_{j=1}^{n} S_{i,j} r_j - \sum_{j \in \mathcal{T}_1 \cup \mathcal{T}_2} S_{i,j} r_j\Big) \\ &= 2 \sum_{i \in \mathcal{T}_1} r_i \sum_{j \notin \mathcal{T}_1 \cup \mathcal{T}_2} S_{i,j} r_j \ge 0 \end{aligned} \qquad (8)$$

which completes the proof of (P2). □

3 The Optimization Algorithm

In this section, we present our algorithm GenDeR for solving Equation (1), and analyze its performance with respect to its near-optimality and complexity.

3.1 Algorithm Description

Based on the diminishing returns property of the goodness function g(T), we propose the following greedy algorithm to find a diversified top-k ranking list. In Alg.
1, after we calculate the reference vector q (Step 1) and initialize the ranking list T (Step 2), we expand the ranking list T one example at a time (Steps 4-8). At each iteration, we add the example with the highest score $s_i$ to the current ranking list T (Steps 5-6). Each time we expand the current ranking list, we update the score vector s based on the newly added example i (Step 7). Notice that in Alg. 1, '⊗' denotes element-wise multiplication, and diag(S) returns an n × 1 vector whose elements are the diagonal elements of the similarity matrix S.

Algorithm 1 GenDeR
Input: The similarity matrix $S_{n \times n}$, the relevance vector $\mathbf{r}_{n \times 1}$, the weight $w \ge 2$, and the budget $k$;
Output: A subset T of k nodes.
1: Compute the reference vector q: q = S r;
2: Initialize T as an empty set;
3: Initialize the score vector s = w × (q ⊗ r) − diag(S) ⊗ r ⊗ r;
4: for iter = 1 : k do
5:   Find i = argmax_j (s_j | j = 1, ..., n; j ∉ T);
6:   Add i to T;
7:   Update the score vector s ← s − 2 r_i S_{:,i} ⊗ r;
8: end for
9: Return the subset T as the ranking list (earlier selected examples ranked higher).

3.2 Algorithm Analysis

The accuracy of the proposed GenDeR is summarized in Lemma 3.1, which says that for a large volume of the parameter space (i.e., w ≥ 2), GenDeR leads to a (1 − 1/e) near-optimal solution.

Lemma 3.1. Near-Optimality of GenDeR. Let $\mathcal{T}$ be the subset found by GenDeR, $|\mathcal{T}| = k$, and $\mathcal{T}^* = \arg\max_{|\mathcal{T}|=k} g(\mathcal{T})$. We have that $g(\mathcal{T}) \ge (1 - 1/e)\, g(\mathcal{T}^*)$, where e is the base of the natural logarithm.

Proof. The key of the proof is to verify that for any example $x_j \notin \mathcal{T}$, $s_j = g(\mathcal{T} \cup x_j) - g(\mathcal{T})$, where s is the score vector we calculate in Step 3 or update in Step 7, and T is the initial empty ranking list or the current ranking list in Step 6.
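To make the greedy procedure concrete, here is a small NumPy sketch of Alg. 1 (our illustration, not the authors' released code); on a small random instance with w = 2, it also compares the greedy value against the brute-force optimum, which by Lemma 3.1 it should approach within a (1 − 1/e) factor:

```python
import numpy as np
from itertools import combinations

def gender(S, r, w, k):
    """Greedy diversified ranking, following the steps of Alg. 1 (GenDeR)."""
    S = np.asarray(S, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    q = S @ r                                  # Step 1: reference vector q = S r
    T = []                                     # Step 2: empty ranking list
    s = w * q * r - np.diag(S) * r * r         # Step 3: initial score vector
    for _ in range(k):                         # Steps 4-8
        i = max((j for j in range(n) if j not in T), key=lambda j: s[j])
        T.append(i)
        s = s - 2.0 * r[i] * S[:, i] * r       # Step 7: discount examples similar to i
    return T                                   # earlier selections rank higher

def g(S, r, T, w):
    """The goodness function of Equation (1)."""
    idx = np.asarray(sorted(T), dtype=int)
    q = S @ r
    return w * np.sum(q[idx] * r[idx]) - r[idx] @ S[np.ix_(idx, idx)] @ r[idx]

# Random symmetric, non-negative instance in the w >= 2 regime.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
S, r, w, k = (A + A.T) / 2, rng.random(8), 2.0, 3

T = gender(S, r, w, k)
best = max(g(S, r, list(c), w) for c in combinations(range(8), k))
# Lemma 3.1: the greedy value is within a (1 - 1/e) factor of the optimum.
assert g(S, r, T, w) >= (1 - 1 / np.e) * best - 1e-9
print(T, g(S, r, T, w), best)
```

The score update in Step 7 is exactly the marginal-gain formula of Equation (6), which is why the greedy choice at each step maximizes the marginal gain.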
The remaining part of the proof directly follows from the diminishing returns property of the goodness function in Theorem 2.2, together with the fact that g(Φ) = 0 [12]. We omit the detailed proof for brevity. □

The complexity of the proposed GenDeR is summarized in Lemma 3.2. Notice that the quadratic term in the time complexity comes from the matrix-vector multiplication in Step 1 (i.e., q = S r), and the quadratic term in the space complexity is the cost of storing the similarity matrix S. If the similarity matrix S is sparse, say with m non-zero elements, we can reduce the time complexity to O(m + nk) and the space complexity to O(m + n + k).

Lemma 3.2. Complexity of GenDeR. The time complexity of GenDeR is O(n² + nk); the space complexity of GenDeR is O(n² + k).

Proof. Omitted for brevity. □

4 Experimental Results

We compare the proposed GenDeR with several of the most recent diversified ranking algorithms, including DivRank based on reinforced random walks [11] (referred to as 'DR'), GCD via resistive graph centers [6] (referred to as 'GCD'), and manifold ranking with stop points [25] (referred to as 'MF'). As all these methods aim to improve the diversity of PageRank-type algorithms, we also present the results of PageRank [13] itself as the baseline. We use two real data sets: an IMDB actor professional network and an academic citation data set. In [11, 6], the authors provide detailed experimental comparisons with some earlier methods (e.g., [24, 23, 5], etc.) on the same data sets. We omit the results of these methods for clarity.

4.1 Results on Actor Professional Network

The actor professional network is constructed from the Internet Movie Database (IMDB)1, where the nodes are the actors/actresses and the edges are weighted by the number of co-starred movies between two actors/actresses.
For the inputs of GenDeR, we use the adjacency matrix of the co-starred network as the similarity function S, and the ranking results of 'DR' as the relevance vector r. Given a top-k ranking list, we use the density of the subgraph of S induced by the k nodes as a reverse measure of diversity (lower density means higher diversity). We also measure the diversity of the ranking list by the so-called 'country coverage' as well as 'movie coverage' (higher coverage means higher diversity), which are defined in [24]. Notice that a good top-k diversified ranking list often requires a balance between diversity and relevance in order to fulfill the user's information need. Therefore, we also present the relevance score (measured by PageRank) captured by the entire top-k ranking list. In this application, such a relevance score measures the overall prestige of the actors/actresses in the ranking list. Overall, we have 3,452 actors/actresses, 23,460 edges, 1,027 movies and 47 countries.

The results are presented in Fig. 1. First, let us compare GenDeR with the baseline method 'PageRank'. From Fig. 1(d), we can see that our GenDeR is as good as 'PageRank' in terms of capturing the relevance of the entire top-k ranking list (notice that the two curves almost overlap with each other). On the other hand, GenDeR outperforms 'PageRank' in terms of diversity by all three measures (Fig. 1(a-c)). Since GenDeR uses the ranking results of 'DR' as its input, 'DR' can be viewed as another baseline method. The two methods perform similarly in terms of density (Fig. 1(c)). Regarding all the remaining measures, our GenDeR is always better than 'DR'. For example, when k ≥ 300, GenDeR returns both higher 'country coverage' (Fig. 1(a)) and higher 'movie coverage' (Fig. 1(b)).
In the entire range of the budget k (Fig. 1(d)), our GenDeR captures higher relevance scores than 'DR', indicating that the actors/actresses in our ranking list might be more prestigious than those returned by 'DR'. Based on these results, we conclude that our GenDeR indeed improves 'DR' in terms of both diversity and relevance. The most competitive method is 'MF'. We can see that GenDeR and 'MF' perform similarly in terms of both density (Fig. 1(c)) and 'movie coverage' (Fig. 1(b)). In terms of 'country coverage' (Fig. 1(a)), 'MF' performs slightly better than our GenDeR when 300 ≤ k ≤ 400; for the other values of k, the two methods mix with each other. However, in terms of relevance (Fig. 1(d)), our GenDeR is much better than 'MF'. Therefore, we conclude that 'MF' performs comparably with or slightly better than our GenDeR in terms of diversity, at the cost of sacrificing the relevance of the entire ranking list. As for 'GCD', although it leads to the lowest density, it performs poorly in terms of balancing between diversity and relevance (Fig. 1(d)), as well as in the coverage of countries/movies (Fig. 1(a-b)).

4.2 Results on Academic Citation Networks

This data set is from the ACL Anthology Network2. It consists of a paper citation network and a researcher citation network. Here, the nodes are papers or researchers, and the edges indicate citation relationships. Overall, we have 11,609 papers and 54,208 edges in the paper citation network, and 9,641 researchers and 229,719 edges in the researcher citation network. For the inputs of GenDeR, we use the symmetrized adjacency matrix as the similarity function S, and the ranking results of 'DR' as the relevance vector r.
We use the same measure as in [11] (referred to as 'coverage'), which is the total number of unique papers/researchers that cite the top-k papers/researchers in the ranking list. As pointed out in [11], the 'coverage' might provide a better measure of the overall quality of the top-k ranking list than traditional measures (e.g., the h-index), as the latter ignore the diversity of the ranking list.

1 http://www.imdb.com/
2 http://www.aclweb.org/anthology-new/

[Figure 1: The evaluations on actor professional network. (a) Country Coverage (higher is better); (b) Movie Coverage (higher is better); (c) Density (lower is better); (d) Relevance (higher is better). (a-c) are different diversity measures and (d) measures the relevance of the entire ranking list.]

The results are presented in Fig. 2. We can see that the proposed GenDeR performs better than all the alternative choices.
For example, with k = 50, GenDeR improves the 'coverage' of the next best method by 416 and 157 on the two citation networks, respectively.

[Figure 2: The evaluations on academic citation networks. Higher is better. (a) Paper Citation Network; (b) Researcher Citation Network.]

5 Related Work

Carbonell et al. [5] are among the first to study diversified ranking in the context of text retrieval and summarization. To this end, they propose the Maximal Marginal Relevance (MMR) criterion to reduce redundancy while maintaining query relevance, which is a linear combination of relevance and novelty. In [23], Zhai et al. address this problem from a different perspective by explicitly modeling the subtopics associated with a query, and proposing a framework to evaluate subtopic retrieval. Recently, researchers have leveraged external information sources to help with diversified ranking.
For example, in [2], Agrawal et al. maximize the probability that the average user finds at least one useful result within the top ranked results, with the help of a taxonomy available through the Open Directory Project (ODP); in [4], Capannini et al. mine the query log to find specializations of a given query, and use the search results of the specializations to help evaluate the set of top ranked documents; in [20], Welch et al. model the expected number of hits based on the number of relevant documents a user will visit, the user intent in terms of a probability distribution over subtopics, and document categorization, which are obtained from query logs, WordNet or Wikipedia.

With the prevalence of graph data, such as social networks, author/paper citation networks, actor professional networks, etc., researchers have started to study the problem of diversified ranking in the presence of relationships among the examples. For instance, in [24], Zhu et al. propose the GRASSHOPPER algorithm, which constructs random walks on the input graph and iteratively turns the ranked nodes into absorbing states. In [11], Mei et al. propose the DivRank algorithm based on a reinforced random walk defined on the input graph, which automatically balances the prestige and the diversity among the top ranked nodes due to the fact that adjacent nodes compete for their ranking scores. In [16], Tong et al. propose a scalable algorithm to find a near-optimal solution to diversify the top-k ranking list for PageRank. Due to the asymmetry in their formulation, it remains unclear if the optimization problem in [16] is NP-hard. On a higher level, the method in [16] can be roughly viewed as an instantiation of our proposed formulation with specific choices in the optimization problem (e.g., the relevance function, the similarity function, the regularization parameter, etc.).
In [25], Zhu et al. leverage the stop points in manifold ranking algorithms to diversify the results. All these works aim to diversify the results of one specific type of ranking function (i.e., PageRank and its variants).

Learning to rank [10, 1, 3] and metric learning [19, 22, 9] have been two very active areas in recent years. Most of these methods require some additional information (e.g., labels, partial orderings, etc.) for training. They are often tailored for other purposes (e.g., improving the F-score in the ranking task, improving the classification accuracy in metric learning, etc.) without considering diversity. Nonetheless, thanks to the generality of our formulation, the learned ranking functions and metric functions from most of these works can be naturally admitted into our optimization objective function. In other words, our formulation makes it possible to take advantage of these existing research results in the diversified ranking setting.

Remarks. While generality is one of the major contributions of this paper, we do not disregard the value of domain-specific knowledge. The generality of our method is orthogonal to domain-specific knowledge. For example, such knowledge can be reflected in the (learned) ranking function and/or the (learned) similarity function, which can in turn serve as the input of our method.

6 Conclusion

In this paper, we study the problem of diversified ranking. The key feature of our formulation lies in its generality: it admits any non-negative relevance function and any non-negative, symmetric similarity function as input, and outputs a top-k ranking list that enjoys both relevance and diversity. Furthermore, we identify the regularization parameter space where our problem can be solved near-optimally, and we analyze the hardness of the problem as well as the optimality and the complexity of the proposed algorithm.
Finally, we conduct experiments on several real data sets to demonstrate the effectiveness of this algorithm. Future work includes extending our formulation to the on-line, dynamic setting.

7 Acknowledgements

Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. This work was in part supported by the National Science Foundation under grant numbers IIS-1054199 and CCF-1048168, and by DARPA under SMISC Program Agreement No. W911NF-12-C-0028. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, the National Science Foundation, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

[1] A. Agarwal and S. Chakrabarti. Learning random walks to rank nodes in graphs. In ICML, pages 9–16, 2007.

[2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5–14, 2009.

[3] C. J. C. Burges, K. M. Svore, P. N. Bennett, A. Pastusiak, and Q. Wu. Learning to rank using an ensemble of lambda-gradient models. Journal of Machine Learning Research - Proceedings Track, 14:25–35, 2011.

[4] G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. Efficient diversification of search results using query logs. In WWW (Companion Volume), pages 17–18, 2011.

[5] J. G. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335–336, 1998.

[6] A. Dubey, S. Chakrabarti, and C. Bhattacharyya. Diversity in ranking via resistive graph centers. In KDD, pages 78–86, 2011.

[7] U. Feige, G. Kortsarz, and D. Peleg.
The dense k-subgraph problem. Algorithmica, 29, 1999.

[8] T. H. Haveliwala. Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng., 15(4):784–796, 2003.

[9] P. Jain, B. Kulis, and I. S. Dhillon. Inductive regularized learning of kernel functions. In NIPS, pages 946–954, 2010.

[10] T.-Y. Liu. Learning to rank for information retrieval. In SIGIR, page 904, 2010.

[11] Q. Mei, J. Guo, and D. R. Radev. DivRank: the interplay of prestige and diversity in information networks. In KDD, pages 1009–1018, 2010.

[12] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265–294, 1978.

[13] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998. Paper SIDL-WP-1999-0120 (version of 11/11/1999).

[14] F. Radlinski, P. N. Bennett, B. Carterette, and T. Joachims. Redundancy, diversity and interdependent document relevance. SIGIR Forum, 43(2):46–52, 2009.

[15] M. Srivastava, T. Abdelzaher, and B. Szymanski. Human-centric sensing. Phil. Trans. R. Soc. A, 370(1958):176–197, 2012.

[16] H. Tong, J. He, Z. Wen, R. Konuru, and C.-Y. Lin. Diversified ranking on large graphs: an optimization viewpoint. In KDD, pages 1028–1036, 2011.

[17] M. Y. S. Uddin, M. T. A. Amin, H. Le, T. Abdelzaher, B. Szymanski, and T. Nguyen. On diversifying source selection in social sensing. In INSS, 2012.

[18] J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. PNAS, 109(16):5962–5966, 2012.

[19] J. Wang, H. Do, A. Woznica, and A. Kalousis. Metric learning with multiple kernels. In NIPS, pages 1170–1178, 2011.

[20] M. J. Welch, J. Cho, and C. Olston.
Search result diversity for informational queries. In WWW, pages 237–246, 2011.

[21] L. Wu. Social network effects on performance and layoffs: Evidence from the adoption of a social networking tool. Job Market Paper, 2011.

[22] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In NIPS, pages 505–512, 2002.

[23] C. Zhai, W. W. Cohen, and J. D. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR, pages 10–17, 2003.

[24] X. Zhu, A. B. Goldberg, J. V. Gael, and D. Andrzejewski. Improving diversity in ranking using absorbing random walks. In HLT-NAACL, pages 97–104, 2007.

[25] X. Zhu, J. Guo, X. Cheng, P. Du, and H. Shen. A unified framework for recommending diverse and relevant queries. In WWW, pages 37–46, 2011.
", "award": [], "sourceid": 551, "authors": [{"given_name": "Jingrui", "family_name": "He", "institution": null}, {"given_name": "Hanghang", "family_name": "Tong", "institution": null}, {"given_name": "Qiaozhu", "family_name": "Mei", "institution": null}, {"given_name": "Boleslaw", "family_name": "Szymanski", "institution": null}]}