{"title": "Linear Relaxations for Finding Diverse Elements in Metric Spaces", "book": "Advances in Neural Information Processing Systems", "page_first": 4098, "page_last": 4106, "abstract": "Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximize diversity, while obeying a cardinality (or more generally, matroid) constraint.  The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as {\\em sum-min} diversity, which is known to be effective in many applications, and give the first constant factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the  so-called {\\em planted clique} assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective. 
Then, we compare the quality of the solutions produced by our method with other popular diversity maximization algorithms.", "full_text": "Linear Relaxations for Finding Diverse Elements in\n\nMetric Spaces\n\nAditya Bhaskara\nUniversity of Utah\n\nMehrdad Ghadiri\n\nSharif University of Technology\n\nVahab Mirrokni\nGoogle Research\n\nbhaskara@cs.utah.edu\n\nghadiri@ce.sharif.edu\n\nmirrokni@google.com\n\nOla Svensson\n\nEPFL\n\nola.svensson@epfl.ch\n\nAbstract\n\nChoosing a diverse subset of a large collection of points in a metric space is a fun-\ndamental problem, with applications in feature selection, recommender systems,\nweb search, data summarization, etc. Various notions of diversity have been pro-\nposed, tailored to different applications. The general algorithmic goal is to \ufb01nd\na subset of points that maximize diversity, while obeying a cardinality (or more\ngenerally, matroid) constraint. The goal of this paper is to develop a novel linear\nprogramming (LP) framework that allows us to design approximation algorithms\nfor such problems. We study an objective known as sum-min diversity, which\nis known to be effective in many applications, and give the \ufb01rst constant factor\napproximation algorithm. Our LP framework allows us to easily incorporate addi-\ntional constraints, as well as secondary objectives. We also prove a hardness result\nfor two natural diversity objectives, under the so-called planted clique assumption.\nFinally, we study the empirical performance of our algorithm on several standard\ndatasets. We \ufb01rst study the approximation quality of the algorithm by comparing\nwith the LP objective. Then, we compare the quality of the solutions produced by\nour method with other popular diversity maximization algorithms.\n\n1\n\nIntroduction\n\nComputing a concise, yet diverse and representative subset of a large collection of elements is a\ncentral problem in many areas. 
In machine learning, it has been used for feature selection [23],\nand in recommender systems [24]. There are also several data mining applications, such as web\nsearch [21, 20], news aggregation [2], etc. Diversity maximization has also found applications\nin drug discovery, where the goal is to choose a small and diverse subset of a large collection of\ncompounds to use for testing [16].\nA general way to formalize the problem is as follows: we are given a set of objects in a metric\nspace, and the goal is to find a subset of them of a prescribed size so as to maximize some measure\nof diversity (a function of the distances between the chosen points). One well-studied example of\na diversity measure is the minimum pairwise distance between the selected points \u2013 the larger it is,\nthe more \u201cmutually separated\u201d the chosen points are. This and other diversity measures have\nbeen studied in the literature [11, 10, 6, 23], including those based on mutual information and linear\nalgebraic notions of distance, and approximation algorithms have been proposed. This is similar in\nspirit to the rich and beautiful literature on clustering problems with various objectives (e.g., k-center,\nk-median, k-means). Similar to clustering, many of the variants of diversity maximization admit\nconstant factor approximation algorithms. Most of the known algorithms for diversity maximization\nare based on a natural greedy approach, or on local search.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nFigure 1: Preference for far clusters in sum-sum(\u00b7) maximization\n\nOur goal in this work is to develop novel linear programming formulations for diversity maximization and provide new approximation guarantees. Convex relaxation approaches are typically powerful in that they can incorporate additional constraints and additional objective functions, as we will\nillustrate. 
This is important in some applications, and indeed, diversity maximization has been studied under additional knapsack [3] and matroid [2] constraints. In applications such as web search,\nit is important to optimize diversity along with other objectives, such as total relevance or coverage\n(see [4]). Another contribution of this work is to explore approximation lower bounds for diversity\nmaximization. Given the simplicity of the best known algorithms for some objectives (e.g., greedy\naddition, single-swap local search), it is natural to ask if better algorithms are possible. Rather\nsurprisingly, we show that the answer is no, for the most common objectives.\n\nObjective functions. The many variants of diversity maximization differ in their choice of the\nobjective function, i.e., how they define the diversity of a set S of points. Our focus in this paper will be\ndistance-based objectives, which can be defined over arbitrary metric spaces via pairwise distances\nbetween the chosen points. Let d(u, v) be the distance between points u and v, and for a set of points\nT, let d(u, T) = min_{v\u2208T} d(u, v). The three most common objectives are:\n1. Min-min diversity, defined by min-min(S) = min_{u\u2208S} d(u, S \\ u).\n2. Sum-min diversity, defined by sum-min(S) = \u2211_{u\u2208S} d(u, S \\ u).\n3. Sum-sum diversity, defined by sum-sum(S) = \u2211_{u\u2208S} \u2211_{v\u2208S} d(u, v).\n\nAll three objectives have been used in applications [16]. Of these, min-min and sum-sum are also\nknown to admit constant factor approximation algorithms. In fact, a natural greedy algorithm gives\na factor 1/2 approximation for min-min, while local search gives a constant factor approximation\nfor sum-sum, even with matroid constraints [6, 2, 4]. However, for the sum-min objective, the best\nknown algorithm had an approximation factor of O(1/log n) [6], and no inapproximability results\nwere known. 
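For concreteness, the three objectives can be evaluated directly from their definitions. The following is a minimal Python sketch, not code from the paper: it assumes points are coordinate tuples with Euclidean distance, and the helper names (d_to_set, nearest_other, min_min, sum_min, sum_sum) are hypothetical. The last lines also show the non-monotonicity of sum-min: adding a point next to an existing one lowers the objective.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def d_to_set(u: Point, T: List[Point]) -> float:
    # d(u, T) = min over v in T of d(u, v), using Euclidean distance;
    # returns +inf when T is empty.
    return min((math.dist(u, v) for v in T), default=math.inf)

def nearest_other(S: List[Point]) -> List[float]:
    # d(u, S minus u) for each u in S; u is removed by position, so repeated
    # coordinates are treated as distinct elements at distance 0.
    return [d_to_set(S[i], S[:i] + S[i + 1:]) for i in range(len(S))]

def min_min(S: List[Point]) -> float:
    return min(nearest_other(S))

def sum_min(S: List[Point]) -> float:
    return sum(nearest_other(S))

def sum_sum(S: List[Point]) -> float:
    # Sum over ordered pairs (u, v) in S x S, so each unordered pair counts twice.
    return sum(math.dist(u, v) for u in S for v in S)

S = [(0.0,), (4.0,), (10.0,)]
print(min_min(S), sum_min(S), sum_sum(S))  # 4.0 14.0 40.0
# Adding a point next to an existing one lowers sum-min: 14.0 -> 9.0.
print(sum_min(S + [(10.5,)]))  # 9.0
```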
Combinatorial methods such as greedy and local search fail (see Lemma 1), and\nachieving a constant factor approximation has remained a challenge. Compared to the other objectives, the sets that maximize the sum-min objective have properties that are desirable in practice,\nas observed in [16] and demonstrated in our experiments. We now outline some theoretical\nreasons.\n\nDrawbacks of the min-min and sum-sum objectives. The main problem with min-min stems\nfrom the fact that it depends solely on the closest pair of chosen points, and it does not capture\nthe distance distribution between the chosen points well. Another concern is that it is highly non-monotone in |S| \u2013 in applications such as search, it is paradoxical for the diversity to\ntake a sharp drop once we add one extra element to the set of search results. The sum-sum objective\nis much more robust, and is hence much more popular in applications. However, as also noted\nin [16], it tends to promote picking too many \u201ccorner\u201d points. To illustrate, suppose we have a set\nof points that fall into k clusters (which is common in candidate search results). Suppose the points\nare distributed as a mixture of k equally spaced Gaussians on a line (see Figure 1). The intuitively\ndesired solution is to pick one point from each of the clusters. However, the optimizer for sum-sum\npicks all the points from the two farthest clusters (shown shaded in Figure 1).\nThe sum-min objective inherits the good properties of both \u2013 it is robust to a small number of\nadditions/removals, and it tries to ensure that each point is far from the others. However, it is\ntrickier to optimize, as we mentioned earlier. In fact, in the supplement, Section E, we show that:\nLemma 1. The natural Greedy and Local-Search algorithms for sum-min diversity have an approximation ratio of O(1/\u221ak).\n\nOur contributions. 
With these motivations, we study the problem of maximizing sum-min diversity subject to a cardinality constraint \u2013 max_{|S|\u2264k} sum-min(S). Our main algorithmic results\nare:\n\n\u2022 We give a factor 1/8 approximation for sum-min diversity maximization with a cardinality\nconstraint (the first constant factor approximation). Moreover, when k is a large enough constant, we give a (roughly) 1/(2e)-approximation. This is presented in Section 2 to illustrate\nour ideas (Theorem 1). The algorithm can also incorporate monotone concave functions of\ndistance, as well as explicit constraints to avoid duplicates (end of Section 2).\n\n\u2022 We show that the 1/8 approximation holds when we replace cardinality constraints with arbitrary matroid constraints. Such constraints arise in applications such as product search [3]\nor news aggregators [2], where it is desirable to report items from different brands or different news agencies. This can be modeled as a partition matroid.\n\n\u2022 Our formulation can be used to maximize the sum-min diversity along with total relevance\nor coverage objectives (Theorem 3). This is motivated by applications in recommender\nsystems in which we also want the set of results we output to cover a large range of topics [4, 2], or have a high total relevance to a query.\n\nNext, we show that for both the sum-sum and the sum-min variants of diversity maximization, obtaining an approximation factor better than 1/2 is hard under the planted clique assumption (Theorem 5). (We observe that such a result for min-min is easy, by a reduction from independent set.)\nThis implies that the simple local search algorithms developed for the sum-sum diversity maximization problem [6, 10, 11] are the best possible under the planted clique assumption.\nFinally, we study the empirical performance of our algorithm on several standard datasets. 
Our\ngoal here is two-fold: first, we make an experimental case for the sum-min objective by comparing\nthe quality of the solutions output by our algorithm (which aims to maximize sum-min) with other\npopular algorithms (that maximize sum-sum). This is measured by how well the solution covers\nvarious clusters in the data, as well as by measuring quality in a feature selection task. Second, we\nstudy the approximation quality of the algorithm on real datasets, and observe that it performs much\nbetter than the theoretical guarantee (factor 1/8).\n\n1.1 Notation and Preliminaries\nThroughout, (V, d) will denote the metric space we are working with, and we will write n = |V|.\nThe number of points we need to output will, unless specified otherwise, be denoted by k.\n\nApproximation factor. We say that an algorithm provides an \u03b1 factor approximation if, on every\ninstance, it outputs a solution whose objective value is at least \u03b1 \u00b7 opt, where opt is the optimum\nvalue of the objective. (Since we wish to maximize our diversity objectives, \u03b1 will be \u2264 1, and\nratios closer to 1 are better.)\nMonotonicity of sum-min. We observe that our main objective, sum-min(\u00b7), is not monotone, i.e.,\nsum-min(S \u222a u) can be smaller than sum-min(S) (for instance, if u is very close to one of the elements\nof S). This means that it could be better for an algorithm to output k\u2032 < k elements if the goal\nis to maximize sum-min(\u00b7). However, this non-monotonicity is not too serious a problem, as the\nfollowing lemma shows (proof in the supplement, Section A.1).\nLemma 2. Let (V, d) be a metric space, and n = |V|. Let 1 < k < n/3 be the target number\nof elements, and let S\u2032 be any subset of V of size \u2264 k. 
Then we can efficiently find an S \u2286 V of size exactly k such that sum-min(S) \u2265 1/4 \u00b7 sum-min(S\u2032).\nSince our aim is to design a constant factor approximation algorithm, in what follows we will allow\nour algorithms to output \u2264 k elements (we can then use the lemma above to output precisely k).\nMatroid constraints. Let D be a ground set of elements (in our case, D will be V or a subset of V). A matroid M is defined by I, a family of subsets of D, called the independent sets of the\nmatroid. I is required to be subset-closed and to have the basis exchange\nproperty (see Schrijver [22] for details). Some well-studied matroids that we consider are: (a) the\nuniform matroid of rank k, for which I := {X \u2286 D : |X| \u2264 k}, and (b) partition matroids,\nwhich are direct sums of uniform matroids.\n\nIn matroid-constrained diversity maximization, we are given a matroid M as above, and the goal is\nto output an element of I that maximizes diversity. Note that if M is the uniform matroid, this is\nequivalent to a cardinality constraint. The matroid polytope P(M), defined to be the convex hull\nof the indicator vectors of sets in I, plays a key role in optimization under matroid constraints. For\nmost matroids of practical interest, it turns out that optimization over P(M) can be done in polynomial\ntime.\n\n2 Basic Linear Programming Formulation\n\nWe will now illustrate the main ideas behind our LP framework. We do so by proving a slightly\nsimpler form of our result, where we assume that k is not too small. Specifically, we show that:\nTheorem 1. Let (V, d) be a metric space on n points, and let \u03b5, k be parameters that satisfy \u03b5 \u2208\n(0, 1) and k > 8 log(1/\u03b5)/\u03b5^2. 
There is a randomized polynomial time algorithm that outputs a set\nS \u2286 V of size \u2264 k with E[sum-min(S)] \u2265 (1 \u2212 2\u03b5)/(2e) \u00b7 opt, where opt is the largest possible sum-min(\u00b7)\nvalue for a subset of V of size \u2264 k.\nThe main challenge in formulating an LP for the sum-min objective is to capture the quantity d(u, S \\ u). The key trick is to introduce new variables to do so. To make things formal, for i \u2208 V, we denote\nby R_i = {d(i, j) : j \u2260 i} the set of candidate distances from i to its closest point in S. Next, let\nB(i, r) denote the \u201copen\u201d ball of radius r centered at i, i.e., B(i, r) = {j \u2208 V : d(i, j) < r}, and\nlet B\u2032(i, r) = B(i, r/2) denote the ball of half the radius.\nThe LP we consider is as follows: we have a variable x_{ir} for each i \u2208 V and r \u2208 R_i, which is\nintended to be 1 iff i \u2208 S and r = min_{j\u2208S\\{i}} d(i, j). Thus for every i, at most one x_{ir} is 1 and the\nrest are 0. Hence \u2211_{i, r\u2208R_i} x_{ir} \u2264 k for the intended solution. The other set of constraints we add is\nthe following: for each u \u2208 V,\n\u2211_{i\u2208V, r\u2208R_i : u\u2208B\u2032(i,r)} x_{ir} \u2264 1\n(see the figure in Section A.3 of the supplement).\n\nThese constraints are the crux of our LP formulation. They capture the fact that if we take any\nsolution S \u2286 V, the balls B(s, r/2), where s \u2208 S and r = d(s, S \\ {s}), are disjoint. This is\nbecause if u \u2208 B\u2032(i1, r1) \u2229 B\u2032(i2, r2), then assuming r1 \u2265 r2 (w.l.o.g.), the triangle inequality implies\nthat d(i1, i2) < r1/2 + r2/2 \u2264 r1 (the strict inequality is because we defined the balls to be \u2018open\u2019), which contradicts r1 = d(i1, S \\ {i1}) \u2264 d(i1, i2). Thus, in an\nintegral solution, at most one of x_{i1 r1} and x_{i2 r2} is set to 1. 
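The disjointness argument can also be checked numerically. The Python sketch below is an illustration only (the function names are hypothetical and the points are random planar points with Euclidean distance): for every subset S of size 2 to 4 it builds the intended integral solution, with x_ir = 1 exactly for the pairs (i, d(i, S minus {i})), and reports the largest left-hand side of the packing constraints over all u in V. By the argument above this value should be exactly 1, since each chosen point lies in its own half-ball and in no other.

```python
import itertools
import math
import random

def nearest_other_distance(points, S, i):
    # d(i, S minus {i}): distance from point i to its closest other point in S.
    return min(math.dist(points[i], points[j]) for j in S if j != i)

def intended_pairs(points, S):
    # Intended integral LP solution: x_{ir} = 1 exactly for the pairs
    # (i, d(i, S minus {i})) with i in S, and 0 for every other pair.
    return [(i, nearest_other_distance(points, S, i)) for i in S]

def packing_load(points, pairs, u):
    # Left-hand side of the LP constraint for u: the number of chosen pairs
    # (i, r) whose open half-radius ball B'(i, r) = B(i, r/2) contains u.
    return sum(1 for i, r in pairs if math.dist(points[i], points[u]) < r / 2)

def max_load_over_subsets(points, k_max):
    # Largest constraint value over every u in V and every S with 2 <= |S| <= k_max.
    worst = 0
    for k in range(2, k_max + 1):
        for S in itertools.combinations(range(len(points)), k):
            pairs = intended_pairs(points, S)
            worst = max(worst, max(packing_load(points, pairs, u)
                                   for u in range(len(points))))
    return worst

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(8)]
print(max_load_over_subsets(points, 4))  # 1: the half-balls never overlap
```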
The full LP can now be written as\nfollows:\n\nmaximize \u2211_{i\u2208V} \u2211_{r\u2208R_i} x_{ir} \u00b7 r\nsubject to \u2211_{i\u2208V, r\u2208R_i} x_{ir} \u2264 k,\n\u2211_{i\u2208V, r\u2208R_i : u\u2208B\u2032(i,r)} x_{ir} \u2264 1 for all u \u2208 V,\n0 \u2264 x_{ir} \u2264 1.    (1)\n\nThe algorithm then proceeds by solving this LP, and rounding via the procedure defined below.\nNote that after step 2, we may have pairs with the same first coordinate, since we round them\nindependently. But after step 3, this will not happen, as all but one of them will have been removed.\n\nprocedure round(x)    // x: LP solution\n1: Initialize S = \u2205.\n2: Add (i, r) to S with probability (1 \u2212 \u03b5)(1 \u2212 e^{\u2212x_{ir}}) (independently of the other point-radius pairs).\n3: If (i, r) \u2260 (j, r\u2032) \u2208 S are such that r \u2264 r\u2032 and i \u2208 B\u2032(j, r\u2032), remove (i, r) from S.\n4: If |S| > k, abort (i.e., return \u2205, which has value 0); else return the set of first coordinates of the pairs in S.\n\nRunning time. The LP as described contains n^2 variables, n for each vertex. This can easily be\nreduced to O(log n) per vertex, by only considering r in multiples of (1 + \u03b4), for some fixed \u03b4 > 0.\nFurther, we note that the LP is a packing LP. Thus it can be solved in time that is nearly linear in the\nsize (and can be solved in parallel) [19].\nAnalysis. Let us now show that round returns a solution with large expected value for the\nobjective (note that due to the last step, it always returns a feasible solution, i.e., of size \u2264 k). The idea\nis to write the expected diversity as a sum of appropriately defined random variables, and then use\nlinearity of expectation. For a (vertex, radius) pair (i, r), define \u03c7_{ir} to be an indicator random\nvariable that is 1 iff (a) the pair (i, r) is picked in step 2, (b) it is not removed in step 3, and (c)\n|S| \u2264 k after step 3. Then we have the following.\nLemma 3. 
Let S be the solution output by the algorithm, and define \u03c7_{ir} as above. Then we have\nsum-min(S) \u2265 \u2211_{i,r} (r/2) \u00b7 \u03c7_{ir}.\nProof. If the set S after step 3 is of size > k, each \u03c7_{ir} = 0, and so there is nothing to prove.\nOtherwise, consider the set S at the end of step 3 and consider two pairs (i, r), (j, r\u2032) \u2208 S. The fact\nthat both of them survived step 3 implies that d(i, j) \u2265 max(r, r\u2032)/2. Thus d(i, j) \u2265 r/2 for any\nj \u2260 i in the output, which implies that the contribution of i to the sum-min objective is \u2265 r/2. This\ncompletes the proof.\n\nNow, we will fix one pair (i, r) and show a lower bound on Pr[\u03c7_{ir} = 1].\nLemma 4. Consider the execution of the algorithm, and consider some pair (i, r). Define \u03c7_{ir} as\nabove. We have Pr[\u03c7_{ir} = 1] \u2265 (1 \u2212 2\u03b5)x_{ir}/e.\nProof. Let T be the set of all (point, radius) pairs (j, r\u2032) such that (j, r\u2032) \u2260 (i, r), i \u2208 B\u2032(j, r\u2032), and\nr\u2032 \u2265 r. Given condition (a), condition (b) in the definition of \u03c7_{ir} is equivalent to the condition that none of the\npairs in T is picked in step 2. Let us denote by \u03c7(a) (resp., \u03c7(b), \u03c7(c)) the indicator variable for\ncondition (a) (resp. (b), (c)) in the definition of \u03c7_{ir}. We need a lower bound on Pr[\u03c7(a) \u2227 \u03c7(b) \u2227 \u03c7(c)].\nTo this end, note that\n\nPr[\u03c7(a) \u2227 \u03c7(b) \u2227 \u03c7(c)] = Pr[\u03c7(a) \u2227 \u03c7(b)] \u2212 Pr[\u03c7(a) \u2227 \u03c7(b) \u2227 \u00ac\u03c7(c)]\n\u2265 Pr[\u03c7(a) \u2227 \u03c7(b)] \u2212 Pr[\u03c7(a) \u2227 \u00ac\u03c7(c)].    (2)\nHere \u00ac\u03c7(c) denotes the complement of \u03c7(c), i.e., the event |S| > k at the end of step 3. 
Now, since\nthe rounding selects pairs independently, we can lower bound the first term as\n\nPr[\u03c7(a) \u2227 \u03c7(b)] \u2265 (1 \u2212 \u03b5)(1 \u2212 e^{\u2212x_{ir}}) \u220f_{(j,r\u2032)\u2208T} (1 \u2212 (1 \u2212 \u03b5)(1 \u2212 e^{\u2212x_{jr\u2032}}))\n\u2265 (1 \u2212 \u03b5)(1 \u2212 e^{\u2212x_{ir}}) \u220f_{(j,r\u2032)\u2208T} e^{\u2212x_{jr\u2032}},    (3)\n\nwhere the second inequality uses 1 \u2212 (1 \u2212 \u03b5)(1 \u2212 e^{\u2212x}) \u2265 e^{\u2212x}. Now, we can upper bound \u2211_{(j,r\u2032)\u2208T} x_{jr\u2032} by noting that for all such pairs, B\u2032(j, r\u2032) contains i, as does B\u2032(i, r); thus the LP constraint for u = i implies that \u2211_{(j,r\u2032)\u2208T} x_{jr\u2032} \u2264 1 \u2212 x_{ir}. Plugging this into (3), we get\n\nPr[\u03c7(a) \u2227 \u03c7(b)] \u2265 (1 \u2212 \u03b5)(1 \u2212 e^{\u2212x_{ir}}) e^{\u2212(1\u2212x_{ir})} = (1 \u2212 \u03b5)(e^{x_{ir}} \u2212 1)/e \u2265 (1 \u2212 \u03b5)x_{ir}/e,\n\nusing e^{x} \u2212 1 \u2265 x in the last step. We then need to upper bound the second term of (2). This is done using a Chernoff bound, which\nthen implies the lemma (see the supplement, Section A.2, for details).\n\nProof of Theorem 1. The proof follows from Lemmas 3 and 4, together with linearity of expectation.\nFor details, see Section A.3 of the supplementary material.\n\nDirect Extensions. We mention two useful extensions that follow from our argument.\n(1) We can explicitly prevent the LP from picking points that are too close to each other (near\nduplicates). Suppose we are only looking for solutions in which every pair of points is at least a\ndistance \u03c4 apart. Then we can restrict the set of \u2018candidate\u2019 distances R_i for each vertex to\nthose \u2265 \u03c4. This way, in the final solution, all the chosen points are at least \u03c4/2 apart.\n(2) Our approximation guarantee also holds if the objective is \u2211_{u\u2208S} g(d(u, S \\ u)) for any monotone, concave, nonnegative function g(\u00b7). In the LP, we maximize \u2211_i \u2211_{r\u2208R_i} x_{ir} \u00b7 g(r), and monotone concavity\n(which, together with g(0) \u2265 0, implies g(r/2) \u2265 g(r)/2) ensures the same approximation ratio. 
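Both extensions only change how the LP is set up. Below is a minimal sketch, assuming the LP solution is stored as a Python dict from (i, r) pairs to values. The names candidate_radii, capped and lp_objective are hypothetical, and g(r) = min(r, c) is just one example of a monotone, concave, nonnegative function that caps a vertex contribution at c.

```python
def candidate_radii(dists_from_i, tau=0.0):
    # Extension (1): keep only candidate distances r >= tau in R_i, so every
    # pair of chosen points ends up at least tau/2 apart after rounding.
    return sorted({r for r in dists_from_i if r >= tau})

def capped(c):
    # Example g(r) = min(r, c): monotone, concave and nonnegative (g(0) = 0),
    # hence g(r/2) >= g(r)/2, the property extension (2) relies on.
    return lambda r: min(r, c)

def lp_objective(x, g=lambda r: r):
    # LP objective sum_i sum_{r in R_i} x_{ir} * g(r), with x given as a
    # dict mapping (i, r) -> x_{ir}; g defaults to the identity of LP (1).
    return sum(val * g(r) for (_, r), val in x.items())

g = capped(2.5)
print(candidate_radii([0.5, 2.0, 3.0, 2.0], tau=1.0))   # [2.0, 3.0]
print(lp_objective({(0, 2.0): 0.5, (1, 3.0): 1.0}, g))  # 3.5
print(all(g(r / 2) >= g(r) / 2 for r in [0.1 * t for t in range(100)]))  # True
```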
In some settings, having a\ncap on a vertex\u2019s contribution to the objective is useful (e.g., bounding the effect of outliers).\n\n3 General Matroid Constraints\n\nLet us now state our general result. It removes the restriction on k, and allows arbitrary matroid constraints, as opposed to the cardinality constraints in Section 2.\nTheorem 2. Let (V, d) be a metric space on n points, and let M = (V, I) be a matroid on V. Then\nthere is an efficient randomized algorithm\u00b9 to find an S \u2208 I whose expected sum-min(S) value is at\nleast opt/8, where opt = max_{I\u2208I} sum-min(I).\nThe algorithm proceeds by solving an LP relaxation as before. The key differences in the formulation are: (1) we introduce new opening variables y_i := \u2211_{r\u2208R_i} x_{ir} for each i \u2208 V, and (2) the\nconstraint \u2211_i y_i \u2264 k (which we had written in terms of the x variables) is now replaced with a general matroid constraint, which states that y \u2208 P(M). See Section B of the supplementary material\nfor the full LP.\nThis LP is now rounded using a different procedure, which we call generalized-round. Here, instead\nof independent rounding, we employ the randomized swap rounding algorithm (or the closely related\npipage rounding) of [7], followed by a randomized rounding step.\n\nprocedure generalized-round(y, x)    // (y, x): LP solution\n1: Initialize S = \u2205.\n2: Apply randomized swap rounding to the vector y/2 to obtain Y \u2208 {0, 1}^V \u2229 P(M).\n3: For each i with Y_i = 1, add i to S and sample a radius r_i according to the probability distribution that selects r \u2208 R_i with probability x_{ir}/y_i.\n4: If i \u2208 B\u2032(j, r_j) with i \u2260 j \u2208 S and r_j \u2265 r_i, remove i from S.\n5: Return S.\n\nNote that the rounding outputs S, along with an r_i value for each i \u2208 S. The idea behind the analysis\nis that this rounding has the same properties as randomized rounding, while ensuring that S is an\nindependent set of M. 
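Steps 3 and 4 of generalized-round are easy to state in code once the 0/1 vector Y is available. The Python sketch below is illustrative only. It does not implement randomized swap rounding and simply takes Y as given. The function names are hypothetical, the toy values of x are not an actual LP solution, and step 4 is applied simultaneously, comparing every chosen point against the radii sampled in step 3.

```python
import math
import random

def sample_radii(Y, x, y, rng):
    # Step 3: every i with Y_i = 1 joins S with a radius r_i drawn from R_i,
    # where r has probability x[i][r] / y[i] (x: i -> {r: x_ir}, y_i = sum_r x_ir).
    radii = {}
    for i, opened in Y.items():
        if opened:
            rs = list(x[i])
            weights = [x[i][r] / y[i] for r in rs]
            radii[i] = rng.choices(rs, weights=weights)[0]
    return radii

def prune(points, radii):
    # Step 4 (applied to all points simultaneously): drop i if it lies in the
    # open ball B'(j, r_j) = B(j, r_j / 2) of another chosen j with r_j >= r_i.
    kept = []
    for i, ri in radii.items():
        dominated = any(
            j != i and rj >= ri and math.dist(points[i], points[j]) < rj / 2
            for j, rj in radii.items()
        )
        if not dominated:
            kept.append(i)
    return kept

points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0)}
x = {0: {1.0: 0.5, 5.0: 0.25}, 1: {1.0: 0.5}, 2: {4.0: 0.75}}
y = {i: sum(x[i].values()) for i in x}
Y = {0: 1, 1: 1, 2: 1}  # stand-in for the 0/1 output of swap rounding
radii = sample_radii(Y, x, y, random.Random(1))
S = prune(points, radii)
print(S, radii)
```

Because each surviving point i is compared with every other chosen j, any two survivors satisfy d(i, j) >= max(r_i, r_j)/2. Every kept i is therefore at least r_i/2 away from the rest of S, which is the same property that Lemma 3 uses for round.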
The details and the proof of Theorem 2 are deferred to the supplementary\nmaterial (Section B).\n\n4 Additional Objectives and Hardness\n\nThe LP framework allows us to incorporate \u201csecondary objectives\u201d. As an example, consider the\nproblem of selecting search results, in which every candidate page has a relevance to the query,\nalong with the metric between pages. Here, we are interested in selecting a subset with a high total\nrelevance, in addition to a large value of sum-min(\u00b7). A generalization of relevance is coverage.\nSuppose every page u comes with a set C_u of topics it covers. Now consider the problem of picking\na set S of pages so as to simultaneously maximize sum-min(\u00b7) and the total coverage, i.e., the size\nof the union \u222a_{u\u2208S} C_u, subject to cardinality constraints. (Coverage generalizes relevance, because\nif the sets C_u are all disjoint, then |C_u| acts as the relevance of u.)\nBecause we have a simple formulation and rounding procedure, we can easily incorporate a coverage\n(and therefore relevance) objective into our LP, and obtain simultaneous guarantees. We prove the\nfollowing (a discussion of the theorem and its proof is deferred to Section C):\nTheorem 3. Let (V, d) be a metric space and let {C_u : u \u2208 V} be a collection of subsets of\na universe [m]. Suppose there exists a set S\u2217 \u2286 V of size \u2264 k with sum-min(S\u2217) = opt, and\n|\u222a_{u\u2208S\u2217} C_u| = C. Then there is an efficient randomized algorithm that outputs a set S satisfying:\n(1) E[|S|] \u2264 k, (2) E[sum-min(S)] \u2265 opt/8, and (3) E[|\u222a_{u\u2208S} C_u|] \u2265 C/16.\n\n4.1 Hardness Beyond Factor 1/2\n\nFor diversity maximization under both the sum-sum and the sum-min objectives, we show that obtaining approximation ratios better than 1/2 is unlikely, by a reduction from the so-called planted\nclique problem. Such a reduction for sum-sum was independently obtained by Borodin et al. 
[4].\nFor completeness, we provide the details and proof in the supplementary material (Section D).\n\u00b9Assuming optimization over P(M) can be done efficiently, which is true for all standard matroids.\n\n5 Experiments\n\nGoals and design. The goal of our experiments is to evaluate the sum-min objective as well as\nthe approximation quality of our algorithm on real datasets. For the first of the two, we consider\nthe k-element subsets obtained by maximizing the sum-min objective (using a slight variant of our\nalgorithm), and compare their quality (in terms of being representative of the data) with subsets obtained by maximizing the sum-sum objective, which is the most commonly used diversity objective.\nSince quality in this sense is not clearly defined, we use two measures, both on\ndatasets that have a known clustering:\n(1) First, we see how well the different clusters are represented in the chosen subset. This is important in web search applications, and we do this in two ways: (a) by measuring the number of distinct\nclusters present, and (b) by observing the \u201cnon-uniformity\u201d in the number of nodes picked from the\ndifferent clusters, measured as a deviation from the mean.\n(2) Second, we consider feature selection. Here, we consider data in which each object has several\nfeatures, and then we pick a subset of the features (treating each feature as a vector of size equal\nto the number of data points). Then, we restrict the data to just the chosen features, and see how well\n3-NN classification in the obtained space (which is much faster to perform than in the original space,\ndue to the reduced number of features) agrees with the ground-truth clustering.\nLet us go into the details of (1) above. We used two datasets with ground-truth clusterings. The\nfirst is COIL100, which contains images of 100 different objects [17]. It includes 72 images per\nobject. 
We convert them into 32 \u00d7 32 grayscale images and consider 6 pictures per object. We used\nEuclidean distance as the metric. The second dataset is CDK2 \u2013 a drug discovery dataset publicly\navailable in BindingDB.org [15, 1]. It contains 2253 compounds in 151 different clusters. Tanimoto\ndistance, which is widely used in the drug discovery literature (and is similar to Jaccard distance),\nwas used as the metric. Figure 2 (top) shows the number of distinct clusters picked by the algorithms\nfor the two objectives, and (bottom) shows the non-uniformity in the number of elements picked from the\ndifferent clusters (mean standard deviation). We note that throughout this section, augmented LP is the\nalgorithm that first does our LP rounding, and then adds nodes greedily so as to produce\na subset of size precisely k (since randomized rounding could produce smaller sets).\n\nFigure 2: Sum-min vs. sum-sum objectives \u2013 how chosen subsets represent clusters. Panels: (a) COIL100 coverage, (b) CDK2 coverage, (c) COIL100 non-uniformity, (d) CDK2 non-uniformity.\n\nNow consider (2) above \u2013 feature selection. We used two handwritten text datasets. Multiple Features is a dataset of handwritten digits (649 features, 2000 instances [14]). USPS is a dataset of\nhandwritten text (256 features, 9298 instances [12, 5]). We used the Euclidean distance as the metric\n(more sophisticated features could be used to compute distances, but even this simple choice produces\ngood results). Figure 3 shows the performance of the features selected by various algorithms.\n\nFigure 3: Comparing outputs of feature selection via 3-NN classification with 10-fold cross validation. Panels: (a) Multiple Features dataset, (b) USPS dataset.\n\nNext, we evaluate the practical performance of our LP algorithm, in terms of the proximity to the\noptimum objective value. 
Since we do not know the optimum, we compare it with the minimum of\ntwo upper bounds: the first is simply the value of the LP solution. The second is obtained as follows.\nFor every i, let t_i^j denote the jth smallest distance from i to the other points in the dataset. Since S \\ {i} contains k \u2212 1 points, every i in a set S of size k has d(i, S \\ {i}) \u2264 t_i^{k\u22121}; hence the sum of the\nk largest elements of {t_i^{k\u22121} | i = 1, . . . , n} is an upper bound on the sum-min objective, and\nit can sometimes be tighter than the LP optimum. Figure 4 shows the percentage of the minimum\nof the upper bounds that the augmented-LP algorithm achieves for two datasets [14, 18, 12, 8]. Note\nthat it is significantly better than the theoretical guarantee 1/8. In fact, by adding the so-called clique\nconstraints to the LP, we can obtain even better bounds on the approximation ratio. However,\nthis results in a quadratic number of constraints, making the LP approach slow. Figure 4 also\ndepicts the value of the simple LP algorithm (without augmenting to select k points).\nFinally, we point out that for many of the datasets we consider, there is no significant difference\nbetween the LP-based algorithm and the Local Search (and sometimes even the Greedy) heuristic in\nterms of the sum-min objective value. However, as we noted, the heuristics do not have worst-case\nguarantees. A comparison is shown in Figure 4 (c).\n\nFigure 4: (a) and (b) show the approximation factor of LP and augmented LP algorithms; (c) compares Augmented LP with Greedy and LocalSearch in terms of sum-min objective value. Panels: (a) Madelon dataset, (b) USPS dataset, (c) COIL100 dataset.\n\nConclusions. We have presented an approximation algorithm for diversity maximization under\nthe sum-min objective, by developing a new linear programming (LP) framework for the problem.\nSum-min diversity turns out to be very effective at picking representatives from clustered data \u2013 a\nfact that we have demonstrated experimentally. 
Simple algorithms such as Greedy and Local Search\ncan perform quite badly for sum-min diversity, which led us to the design of the LP approach.\nThe approximation factor turns out to be much better in practice (compared to 1/8, which is the\ntheoretical bound). Our LP approach is also quite general, and can easily incorporate additional\nobjectives (such as relevance), which often arise in applications.\n\nReferences\n[1] The binding database. http://www.bindingdb.org/. Accessed: 2016-05-01.\n\n[2] Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In KDD,\npages 32\u201340, 2013.\n\n[3] S. Bhattacharya, S. Gollapudi, and K. Munagala. Consideration set generation in commerce search. In\nS. Srinivasan, K. Ramamritham, A. Kumar, M. P. Ravindra, E. Bertino, and R. Kumar, editors, WWW,\npages 317\u2013326. ACM, 2011.\n\n[4] A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic\nupdates. In M. Benedikt, M. Kr\u00f6tzsch, and M. Lenzerini, editors, PODS, pages 155\u2013166. ACM, 2012.\n\n[5] D. Cai, X. He, J. Han, and T. S. Huang. Graph regularized nonnegative matrix factorization for data\nrepresentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(8):1548\u20131560,\n2011.\n\n[6] B. Chandra and M. M. Halld\u00f3rsson. Approximation algorithms for dispersion problems. Journal of\nAlgorithms, 38(2):438\u2013465, 2001.\n\n[7] C. Chekuri, J. Vondrak, and R. Zenklusen. Dependent randomized rounding via exchange properties of\ncombinatorial structures. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 575\u2013584, Oct 2010.\n\n[8] P. Duygulu, K. Barnard, J. F. G. de Freitas, and D. A. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. 
In Computer Vision - ECCV 2002, 7th European Conference on Computer Vision, Copenhagen, Denmark, May 28-31, 2002, Proceedings, Part IV, pages 97\u2013112, 2002.\n\n[9] V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao. Statistical algorithms and a lower bound for detecting planted cliques. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC \u201913, pages 655\u2013664, New York, NY, USA, 2013. ACM.\n\n[10] S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In J. Quemada, G. Le\u00f3n, Y. S. Maarek, and W. Nejdl, editors, WWW, pages 381\u2013390. ACM, 2009.\n\n[11] R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133\u2013137, 1997.\n\n[12] J. J. Hull. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell., 16(5):550\u2013554, 1994.\n\n[13] R. M. Karp. Probabilistic analysis of some combinatorial search problems. Algorithms and Complexity: New Directions and Recent Results, pages 1\u201319, 1976.\n\n[14] M. Lichman. UCI machine learning repository, 2013.\n\n[15] T. Liu, Y. Lin, X. Wen, R. N. Jorissen, and M. K. Gilson. Bindingdb: a web-accessible database of experimentally determined protein\u2013ligand binding affinities. Nucleic acids research, 35(suppl 1):D198\u2013D201, 2007.\n\n[16] T. Meinl, C. Ostermann, and M. R. Berthold. Maximum-score diversity selection for early drug discovery. Journal of chemical information and modeling, 51(2):237\u2013247, 2011.\n\n[17] S. Nayar, S. Nene, and H. Murase. Columbia object image library (coil 100). Department of Comp. Science, Columbia University, Tech. Rep. CUCS-006-96, 1996.\n\n[18] H. Peng, F. Long, and C. Ding. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(8):1226\u20131238, 2005.\n\n[19] S. A. Plotkin, D. B. Shmoys, and E. Tardos. Fast approximation algorithms for fractional packing and covering problems. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, SFCS \u201991, pages 495\u2013504, Washington, DC, USA, 1991. IEEE Computer Society.\n\n[20] L. Qin, J. X. Yu, and L. Chang. Diversifying top-k results. Proceedings of the VLDB Endowment, 5(11):1124\u20131135, 2012.\n\n[21] F. Radlinski and S. T. Dumais. Improving Personalized Web Search using Result Diversification. In SIGIR, 2006.\n\n[22] A. Schrijver. Combinatorial Optimization. Springer-Verlag, Berlin, 2003.\n\n[23] N. Vasconcelos. Feature selection by maximum marginal diversity: optimality and implications for visual recognition. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 1, pages I\u2013762\u2013I\u2013769 vol.1, June 2003.\n\n[24] M. R. Vieira, H. L. Razente, M. C. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina, and V. J. Tsotras. On query result diversification. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 1163\u20131174. IEEE, 2011.", "award": [], "sourceid": 2043, "authors": [{"given_name": "Aditya", "family_name": "Bhaskara", "institution": "University of Utah"}, {"given_name": "Mehrdad", "family_name": "Ghadiri", "institution": "Sharif University of Technology"}, {"given_name": "Vahab", "family_name": "Mirrokni", "institution": "Google"}, {"given_name": "Ola", "family_name": "Svensson", "institution": "EPFL"}]}