{"title": "Maximization of Approximately Submodular Functions", "book": "Advances in Neural Information Processing Systems", "page_first": 3045, "page_last": 3053, "abstract": "We study the problem of maximizing a function that is approximately submodular under a cardinality constraint. Approximate submodularity implicitly appears in a wide range of applications as in many cases errors in evaluation of a submodular function break submodularity. Say that $F$ is $\\eps$-approximately submodular if there exists a submodular function $f$ such that $(1-\\eps)f(S) \\leq F(S)\\leq (1+\\eps)f(S)$ for all subsets $S$. We are interested in characterizing the query-complexity of maximizing $F$ subject to a cardinality constraint $k$ as a function of the error level $\\eps > 0$. We provide both lower and upper bounds: for $\\eps > n^{-1/2}$ we show an exponential query-complexity lower bound. In contrast, when $\\eps < {1}/{k}$ or under a stronger bounded curvature assumption, we give constant approximation algorithms.", "full_text": "Maximization of\n\nApproximately Submodular Functions\n\nThibaut Horel\n\nHarvard University\n\nthorel@seas.harvard.edu\n\nYaron Singer\n\nHarvard University\n\nyaron@seas.harvard.edu\n\nAbstract\n\nWe study the problem of maximizing a function that is approximately submodular\nunder a cardinality constraint. Approximate submodularity implicitly appears in a\nwide range of applications as in many cases errors in evaluation of a submodular\nfunction break submodularity. Say that F is \u03b5-approximately submodular if there\nexists a submodular function f such that (1\u2212\u03b5)f (S) \u2264 F (S) \u2264 (1+\u03b5)f (S) for all\nsubsets S. We are interested in characterizing the query-complexity of maximizing\nF subject to a cardinality constraint k as a function of the error level \u03b5 > 0. We\nprovide both lower and upper bounds: for \u03b5 > n\u22121/2 we show an exponential\nquery-complexity lower bound. 
In contrast, when ε < 1/k or under a stronger bounded curvature assumption, we give constant approximation algorithms.

1 Introduction

In recent years, there has been a surge of interest in machine learning methods that involve discrete optimization. In this realm, the evolving theory of submodular optimization has been a catalyst for progress in extraordinarily varied application areas. Examples include active learning and experimental design [9, 12, 14, 19, 20], sparse reconstruction [1, 6, 7], graph inference [23, 24, 8], video analysis [29], clustering [10], document summarization [21], object detection [27], information retrieval [28], network inference [23, 24], and information diffusion in networks [17].

The power of submodularity as a modeling tool lies in its ability to capture interesting application domains while maintaining provable guarantees for optimization. The guarantees, however, apply to the case in which one has access to the exact function to optimize. In many applications, one does not have access to the exact version of the function, but rather to some approximate version of it. If the approximate version remains submodular then the theory of submodular optimization clearly applies and modest errors translate to modest loss in quality of approximation. But if the approximate version of the function ceases to be submodular, all bets are off.

Approximate submodularity. Recall that a function f : 2^N → R is submodular if for all S, T ⊆ N, f(S ∪ T) + f(S ∩ T) ≤ f(S) + f(T). We say that a function F : 2^N → R is ε-approximately submodular if there exists a submodular function f : 2^N → R s.t. for any S ⊆ N:

(1 − ε)f(S) ≤ F(S) ≤ (1 + ε)f(S).    (1)

Unless otherwise stated, all submodular functions f considered are normalized (f(∅) = 0) and monotone (f(S) ≤ f(T) for S ⊆ T).
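As a concrete illustration of definition (1), the snippet below (a toy sketch of ours, not from the paper) builds a small coverage function f, perturbs it multiplicatively into F, and verifies the sandwich condition on every subset; all names and values are illustrative.

```python
from itertools import chain, combinations

# Toy coverage function (a canonical submodular function): each ground-set
# element is mapped to a subset of a small universe.
universe_sets = {1: {0, 1}, 2: {1, 2}, 3: {2, 3}}

def f(S):
    """Coverage: number of universe items covered by the chosen elements."""
    return len(set().union(*(universe_sets[i] for i in S))) if S else 0

eps = 0.1

def F(S):
    """A worst-case (deterministic, non-stochastic) multiplicative perturbation of f."""
    factor = 1 + eps if len(S) % 2 == 0 else 1 - eps
    return factor * f(S)

def powerset(ground):
    items = list(ground)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

# Definition (1): F is eps-approximately submodular with representative f.
ok = all((1 - eps) * f(set(S)) <= F(set(S)) <= (1 + eps) * f(set(S))
         for S in powerset(universe_sets))
print(ok)  # True
```

Note that F itself is generally not submodular: the alternating perturbation breaks the diminishing-returns inequality while keeping F within the ε-sandwich around f.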
Approximate submodularity appears in various domains.

• Optimization with noisy oracles. In these scenarios, we wish to solve optimization problems where one does not have access to a submodular function but only to a noisy version of it. An example recently studied in [5] involves maximizing information gain in graphical models; this captures many Bayesian experimental design settings.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

• PMAC learning. In the active area of learning submodular functions initiated by Balcan and Harvey [3], the objective is to approximately learn submodular functions. Roughly speaking, the PMAC-learning framework guarantees that the learned function is a constant-factor approximation of the true submodular function with high probability. Therefore, after learning a submodular function, one obtains an approximately submodular function.

• Sketching. Since submodular functions have, in general, exponential-size representation, [2] studied the problem of sketching submodular functions: finding a function with polynomial-size representation approximating a given submodular function. The resulting sketch is an approximately submodular function.

Optimization of approximate submodularity. We focus on optimization problems of the form

max_{S : |S| ≤ k} F(S)    (2)

where F is an ε-approximately submodular function and k ∈ N is the cardinality constraint. We say that a set S ⊆ N is an α-approximation to the optimal solution of (2) if |S| ≤ k and F(S) ≥ α max_{|T| ≤ k} F(T). As is common in submodular optimization, we assume the value query model: optimization algorithms have access to the objective function F in a black-box manner, i.e. they make queries to an oracle which returns, for a queried set S, the value F(S).
The query-complexity of the algorithm is simply the number of queries made to the oracle. An algorithm is called an α-approximation algorithm if for any approximately submodular input F the solution returned by the algorithm is an α-approximately optimal solution. Note that if there exists an α-approximation algorithm for the problem of maximizing an ε-approximately submodular function F, then this algorithm is an α(1 − ε)/(1 + ε)-approximation algorithm for the original submodular function f.¹ Conversely, if no such algorithm exists, this implies an inapproximability for the original function. Clearly, if a function is 0-approximately submodular then it retains desirable provable guarantees,² and if it is arbitrarily far from being submodular it can be shown to be trivially inapproximable (e.g. maximize a function which takes value 1 for a single arbitrary set S ⊆ N and 0 elsewhere). The question is therefore:

How close should a function be to submodular to retain provable approximation guarantees?

In recent work, it was shown that for any constant ε > 0 there exists a class of ε-approximately submodular functions for which no algorithm using fewer than exponentially-many queries has a constant approximation ratio for the canonical problem of maximizing a monotone submodular function under a cardinality constraint [13]. Such an impossibility result suggests two natural relaxations: the first is to make additional assumptions about the structure of errors, such as a stochastic error model. This is the direction taken in [13], where the main result shows that when errors are drawn i.i.d. from a wide class of distributions, optimal guarantees are obtainable.
The second alternative is to assume the error is subconstant, which is the focus of this paper.

1.1 Overview of the results

Our main result is a spoiler: even for ε = 1/n^{1/2−β} for any constant β > 0 and n = |N|, no algorithm can obtain a constant-factor approximation guarantee. More specifically, we show that:

• For the general case of monotone submodular functions, for any β > 0, given access to a 1/n^{1/2−β}-approximately submodular function, no algorithm can obtain an approximation ratio better than O(1/n^β) using polynomially many queries (Theorem 3);

• For the case of coverage functions, we show that for any fixed β > 0, given access to a 1/n^{1/3−β}-approximately submodular function, no algorithm can obtain an approximation ratio strictly better than O(1/n^β) using polynomially many queries (Theorem 4).

¹ Observe that for an approximately submodular function F, there exist many submodular functions f of which it is an approximation. All such submodular functions f are called representatives of F. The conversion between an approximation guarantee for F and an approximation guarantee for a representative f of F holds for any choice of the representative.
² Specifically, [22] shows that it is possible to obtain a (1 − 1/e) approximation ratio for a cardinality constraint.

The above results imply that even in cases where the objective function is arbitrarily close to being submodular as the number n of elements in N grows, reasonable optimization guarantees are unachievable. The second result shows that this is the case even when we aim to optimize coverage functions. Coverage functions are an important class of submodular functions which are used in numerous applications [11, 21, 18].

Approximation guarantees.
The inapproximability results follow from two properties of the model: the structure of the function (submodularity), and the size of ε in the definition of approximate submodularity. A natural question is whether one can relax either condition to obtain positive approximation guarantees. We show that this is indeed the case:

• In the general case of monotone submodular functions we show that the greedy algorithm achieves a (1 − 1/e − O(δ)) approximation ratio when ε = δ/k (Theorem 5). Furthermore, this bound is tight: given a 1/k^{1−β}-approximately submodular function, the greedy algorithm no longer provides a constant factor approximation guarantee (Proposition 6).

• Since our query-complexity lower bound holds for coverage functions, which already contain a great deal of structure, we relax the structural assumption by considering functions with bounded curvature c; this is a common assumption in applications of submodularity to machine learning and has been used in prior work to obtain theoretical guarantees [15, 16]. Under this assumption, we give an algorithm which achieves an approximation ratio of (1 − c)((1 − ε)/(1 + ε))² (Proposition 8).

We state our positive results for the case of a cardinality constraint of k. Similar results hold for matroids of rank k; the proofs of those can be found in the Appendix. Note that cardinality constraints are a special case of matroid constraints, therefore our lower bounds also apply to matroid constraints.

1.2 Discussion and additional related work

Before transitioning to the technical results, we briefly survey error in applications of submodularity and the implications of our results to these applications.
First, notice that there is a coupling between approximate submodularity and erroneous evaluations of a submodular function: if one can evaluate a submodular function within (multiplicative) accuracy of 1 ± ε, then the evaluated function is an ε-approximately submodular function.

Additive vs multiplicative approximation. The definition of approximate submodularity in (1) uses relative (multiplicative) approximation. We could instead consider absolute (additive) approximation, i.e. require that f(S) − ε ≤ F(S) ≤ f(S) + ε for all sets S. This definition has been used in the related problem of optimizing approximately convex functions [4, 25], where functions are assumed to have normalized range. For un-normalized functions or functions whose range is unknown, a relative approximation is more informative. When the range is known, specifically if an upper bound B on f(S) is known, an ε/B-approximately submodular function is also an ε-additively approximate submodular function. This implies that our lower bounds and approximation results could equivalently be expressed for additive approximations of normalized functions.

Error vs noise. If we interpret Equation (1) in terms of error, we see that no assumption is made on the source of the error yielding the approximately submodular function. In particular, there is no stochastic assumption: the error is deterministic and worst-case. Previous work has considered submodular or combinatorial optimization under random noise. Two models naturally arise:

• consistent noise: the approximate function F is such that F(S) = ξ_S f(S) where ξ_S is drawn independently for each set S from a distribution D. The key aspect of consistent noise is that the random draws occur only once: querying the same set multiple times always returns the same value.
This definition is the one adopted in [13]; a similar notion is called persistent noise in [5].

• inconsistent noise: in this model F(S) is a random variable such that f(S) = E[F(S)]. The noisy oracle can be queried multiple times and each query corresponds to a new independent random draw from the distribution of F(S). This model was considered in [26] in the context of dataset summarization and is also implicitly present in [17] where the objective function is defined as an expectation and has to be estimated via sampling.

Formal guarantees for consistent noise have been obtained in [13]. A standard way to approach optimization with inconsistent noise is to estimate the value of each set used by the algorithm to an accuracy ε via independent randomized sampling, where ε is chosen small enough so as to obtain approximation guarantees. Specifically, assuming that the algorithm only makes polynomially many value queries and that the function f is such that F(S) ∈ [b, B] for any set S, then a classical application of the Chernoff bound combined with a union bound implies that if the value of each set is estimated by averaging the value of m samples with m = Ω(B log n / (b ε²)), then with high probability the estimated value F(S) of each set used by the algorithm is such that (1 − ε)f(S) ≤ F(S) ≤ (1 + ε)f(S). In other words, randomized sampling is used to construct a function which is ε-approximately submodular with high probability.

Implications of results in this paper. Given the above discussion, our results can be interpreted in the context of noise as providing guarantees on what is a tolerable noise level.
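The randomized-sampling construction above (averaging m independent oracle draws per queried set) can be sketched as follows; the noise distribution, sample size, and underlying function f are illustrative choices of ours, not the paper's.

```python
import random

random.seed(0)

def f(S):
    """True underlying monotone submodular function (here, simply |S|)."""
    return len(S)

def noisy_oracle(S):
    """Inconsistent noise: each query returns an independent unbiased draw."""
    return f(S) * random.uniform(0.5, 1.5)

def estimate(S, m):
    """Average m independent queries; by Chernoff-type concentration the
    relative error shrinks like 1/sqrt(m)."""
    return sum(noisy_oracle(S) for _ in range(m)) / m

eps = 0.05
m = 4000  # illustrative; the text's bound is m = Omega(B log n / (b eps^2))
S = {1, 2, 3, 4, 5}
F_hat = estimate(S, m)
within = (1 - eps) * f(S) <= F_hat <= (1 + eps) * f(S)
print(within)  # True under this seed; holds w.h.p. for any seed at this m
```

Running the estimator on every set an algorithm queries (with a union bound over polynomially many queries) yields a function that is ε-approximately submodular with high probability, which is exactly the regime the theorems below address.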
In particular, Theorem 5 implies that if a submodular function is estimated using m samples, with m = Ω(Bn² log n / b), then the Greedy algorithm is a constant approximation algorithm for the problem of maximizing a monotone submodular function under a cardinality constraint. Theorem 3 implies that if m = O(Bn log n / b), then the resulting estimation error is within the range where no algorithm can obtain a constant approximation ratio.

2 Query-complexity lower bounds

In this section we give query-complexity lower bounds for the problem of maximizing an ε-approximately submodular function subject to a cardinality constraint. In Section 2.1, we show an exponential query-complexity lower bound for the case of general submodular functions when ε ≥ n^{−1/2} (Theorem 3). The same lower bound is then shown to hold even when we restrict ourselves to the case of coverage functions for ε ≥ n^{−1/3} (Theorem 4).

A general overview of query-complexity lower bounds. At a high level, the lower bounds are constructed as follows. We define a class of monotone submodular functions F, and draw a function f uniformly at random from F. In addition we define a submodular function g : 2^N → R s.t. max_{|S|≤k} g(S) ≤ ρ(n) · max_{|S|≤k} f(S), where ρ(n) = o(1) for a particular choice of k < n. We then define the approximately submodular function F:

F(S) = g(S) if (1 − ε)f(S) ≤ g(S) ≤ (1 + ε)f(S), and F(S) = f(S) otherwise.

Note that by its definition, this function is an ε-approximately submodular function. To show the lower bound, we reduce the problem of proving inapproximability of optimizing an approximately submodular function to the problem of distinguishing between f and g using F. We show that for every algorithm, there exists a function f ∈ F s.t.
if f is unknown to the algorithm, it cannot distinguish between the case in which the underlying function is f and the case in which the underlying function is g using polynomially-many value queries to F, even when g is known to the algorithm. Since max_{|S|≤k} g(S) ≤ ρ(n) max_{|S|≤k} f(S), this implies that no algorithm can obtain an approximation better than ρ(n) using polynomially-many queries; otherwise such an algorithm could be used to distinguish between f and g.

2.1 Monotone submodular functions

Constructing a class of hard functions. A natural candidate for a class of functions F and a function g satisfying the properties described in the overview is:

f^H(S) = |S ∩ H|  and  g(S) = |S|h/n

for H ⊆ N of size h. The reason why g is hard to distinguish from f^H is that when H is drawn uniformly at random among sets of size h, f^H is close to g with high probability. This follows from an application of the Chernoff bound for negatively associated random variables. Formally, this is stated in Lemma 1 whose proof is given in the Appendix.

Lemma 1. Let H ⊆ N be a set drawn uniformly among sets of size h. Then for any S ⊆ N, writing μ = |S|h/n, for any ε such that ε²μ > 1:

P_H[(1 − ε)μ ≤ |S ∩ H| ≤ (1 + ε)μ] ≥ 1 − 2^{−Ω(ε²μ)}

Unfortunately this construction fails if the algorithm is allowed to evaluate the approximately submodular function at small sets: for those the concentration of Lemma 1 is not high enough. Our construction instead relies on designing F and g such that when S is "large", we can make use of the concentration result of Lemma 1, and when S is "small", the functions in F and g are deterministically close to each other.
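Both phenomena are easy to see numerically (this simulation is our own illustration, not part of the proof): for a large set S, |S ∩ H| concentrates tightly around its mean |S|h/n over random draws of H, while for a small set the relative deviation remains large.

```python
import random

random.seed(1)
n, h = 1000, 500  # illustrative sizes; h = |H|

def worst_relative_deviation(S, trials=200):
    """Largest relative gap between |S ∩ H| and its mean |S|h/n over random H."""
    mu = len(S) * h / n
    worst = 0.0
    for _ in range(trials):
        H = set(random.sample(range(n), h))
        worst = max(worst, abs(len(S & H) - mu) / mu)
    return worst

large = set(range(400))  # a "large" set: Lemma 1's concentration is strong
small = set(range(4))    # a "small" set: relative deviations stay large
print(worst_relative_deviation(large) < worst_relative_deviation(small))  # True
```

For the small set the mean |S|h/n is only 2, so relative deviations of 100% occur routinely, which is exactly why the refined construction below pins the two functions together deterministically on small sets.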
Specifically, we introduce for H ⊆ N of size h:

f^H(S) = |S ∩ H| + min(|S ∩ (N \ H)|, α(1 − h/n))
g(S) = min(|S|, |S|h/n + α(1 − h/n))    (3)

The value of the parameters α and h will be set later in the analysis. Observe that when S is small (|S ∩ H̄| ≤ α(1 − h/n) and |S| ≤ α) then f^H(S) = g(S) = |S|. When S is large, Lemma 1 implies that |S ∩ H| is close to |S|h/n and |S ∩ (N \ H)| is close to |S|(1 − h/n) with high probability.

First note that f^H and g are monotone submodular functions. f^H is the sum of a monotone additive function and a monotone budget-additive function. The function g can be written g(S) = G(|S|) where G(x) = min(x, xh/n + α(1 − h/n)). G is a non-decreasing concave function (minimum of two non-decreasing linear functions), hence g is monotone submodular.

Next, we observe that there is a gap between the maxima of the functions f^H and that of g. When |S| ≤ k, g(S) ≤ |S|h/n + α(1 − h/n). The maximum is clearly attained when |S| = k and is upper-bounded by kh/n + α. For f^H, the maximum is equal to k and is attained when S is a subset of H of size k. So for α ≤ k ≤ h, we obtain:

max_{|S|≤k} g(S) ≤ (α/k + h/n) max_{|S|≤k} f^H(S),  for H ⊆ N    (4)

Indistinguishability. The main challenge is now to prove that f^H is close to g with high probability. Formally, we have the following lemma.

Lemma 2. For h ≤ n/2, let H be drawn uniformly at random among sets of size h. Then for any S:

P_H[(1 − ε)f^H(S) ≤ g(S) ≤ (1 + ε)f^H(S)] ≥ 1 − 2^{−Ω(ε²αh/n)}    (5)

Proof.
For concision we define H̄ := N \ H, the complement of H in N. We consider four cases depending on the cardinality of S and S ∩ H̄.

Case 1: |S| ≤ α and |S ∩ H̄| ≤ α(1 − h/n). In this case f^H(S) = |S ∩ H| + |S ∩ H̄| = |S| and g(S) = |S|. The two functions are equal and the inequality is immediately satisfied.

Case 2: |S| ≤ α and |S ∩ H̄| ≥ α(1 − h/n). In this case g(S) = |S| = |S ∩ H| + |S ∩ H̄| and f^H(S) = |S ∩ H| + α(1 − h/n). By assumption on |S ∩ H̄|, we have:

(1 − ε)α(1 − h/n) ≤ |S ∩ H̄|

For the other side, by assumption on |S ∩ H̄|, we have that |S| ≥ α(1 − h/n) ≥ α/2 (since h ≤ n/2). We can then apply Lemma 1 and obtain:

P_H[|S ∩ H̄| ≤ (1 + ε)α(1 − h/n)] ≥ 1 − 2^{−Ω(ε²αh/n)}

Case 3: |S| ≥ α and |S ∩ H̄| ≥ α(1 − h/n). In this case f^H(S) = |S ∩ H| + α(1 − h/n) and g(S) = |S|h/n + α(1 − h/n). We need to show that:

P_H[(1 − ε)|S|h/n ≤ |S ∩ H| ≤ (1 + ε)|S|h/n] ≥ 1 − 2^{−Ω(ε²αh/n)}

This is a direct consequence of Lemma 1.

Case 4: |S| ≥ α and |S ∩ H̄| ≤ α(1 − h/n). In this case f^H(S) = |S ∩ H| + |S ∩ H̄| and g(S) = |S|h/n + α(1 − h/n).
As in the previous case, we have:

P_H[(1 − ε)|S|h/n ≤ |S ∩ H| ≤ (1 + ε)|S|h/n] ≥ 1 − 2^{−Ω(ε²αh/n)}

By the assumption on |S ∩ H̄|, we also have:

|S ∩ H̄| ≤ α(1 − h/n) ≤ (1 + ε)α(1 − h/n)

So we need to show that:

P_H[(1 − ε)α(1 − h/n) ≤ |S ∩ H̄|] ≥ 1 − 2^{−Ω(ε²αh/n)}

and then we will be able to conclude by union bound. This is again a consequence of Lemma 1.

Theorem 3. For any 0 < β < 1/2, ε ≥ 1/n^{1/2−β}, and any (possibly randomized) algorithm with query-complexity smaller than 2^{Ω(n^{β/2})}, there exists an ε-approximately submodular function F such that for the problem of maximizing F under a cardinality constraint, the algorithm achieves an approximation ratio upper-bounded by 2/n^{β/2} with probability at least 1 − 2^{−Ω(n^{β/2})}.

Proof. We set k = h = n^{1−β/2} and α = n^{1−β}. Let H be drawn uniformly at random among sets of size h and let f^H and g be as in (3). We first define the ε-approximately submodular function F^H:

F^H(S) = g(S) if (1 − ε)f^H(S) ≤ g(S) ≤ (1 + ε)f^H(S), and F^H(S) = f^H(S) otherwise.

It is clear from the definition that this is an ε-approximately submodular function. Consider a deterministic algorithm A and let us denote by S1, . . . , Sm the queries made by the algorithm when given as input the function g (g is 0-approximately submodular, hence it is a valid input to A). Without loss of generality, we can include the set returned by the algorithm in the queries, so Sm denotes the set returned by the algorithm.
By (5), for any i ∈ [m]:

P_H[(1 − ε)f^H(Si) ≤ g(Si) ≤ (1 + ε)f^H(Si)] ≥ 1 − 2^{−Ω(n^{β/2})}

When these events realize, we have F^H(Si) = g(Si). By union bound over i, when m < 2^{Ω(n^{β/2})}:

P_H[∀i, F^H(Si) = g(Si)] > 1 − m · 2^{−Ω(n^{β/2})} > 0

This implies the existence of H such that A follows the same query path when given g and F^H as inputs. For this H:

F^H(Sm) = g(Sm) ≤ max_{|S|≤k} g(S) ≤ (α/k + h/n) max_{|S|≤k} f^H(S)

where the second inequality comes from (4). For our choice of parameters, α/k + h/n = 2/n^{β/2}, hence:

F^H(Sm) ≤ (2/n^{β/2}) max_{|S|≤k} F^H(S)

Let us now consider the case where the algorithm A is randomized and let us denote by A^{H,R} the solution returned by the algorithm when given function F^H as input and random bits R. We have:

P_{H,R}[F^H(A^{H,R}) ≤ (2/n^{β/2}) max_{|S|≤k} F^H(S)]
= Σ_r P[R = r] · P_H[F^H(A^{H,r}) ≤ (2/n^{β/2}) max_{|S|≤k} F^H(S)]
≥ (1 − 2^{−Ω(n^{β/2})}) Σ_r P[R = r] = 1 − 2^{−Ω(n^{β/2})}

where the inequality comes from the analysis of the deterministic case (when the random bits are fixed, the algorithm is deterministic). This implies the existence of H such that:

P_R[F^H(A^{H,R}) ≤ (2/n^{β/2}) max_{|S|≤k} F^H(S)] ≥ 1 − 2^{−Ω(n^{β/2})}

and concludes the proof of the theorem.

2.2 Coverage functions

In this section, we show that an exponential query-complexity lower bound still holds even in the restricted case where the objective function approximates a coverage function. Recall that by definition of a coverage function, the elements of the ground set N are subsets of a set U called the universe. For a set S = {S1, . . . , Sm} of subsets of U, the value f(S) is given by f(S) = |S1 ∪ · · · ∪ Sm|.

Theorem 4. For any 0 < β < 1/3, ε ≥ 1/n^{1/3−β}, and any (possibly randomized) algorithm with query-complexity smaller than 2^{Ω(n^{β/2})}, there exists a function F which ε-approximates a coverage function, such that for the problem of maximizing F under a cardinality constraint, the algorithm achieves an approximation ratio upper-bounded by 2/n^{β/2} with probability at least 1 − 2^{−Ω(n^{β/2})}.

The proof of Theorem 4 has the same structure as the proof of Theorem 3. The main difference is a different choice of the class of functions F and of the function g. The details can be found in the appendix.

3 Approximation algorithms

The results from Section 2 can be seen as a strong impossibility result since an exponential query-complexity lower bound holds even in the specific case of coverage functions, which exhibit a lot of structure. Faced with such an impossibility, we analyze two ways to relax the assumptions in order to obtain positive results. One relaxation considers ε-approximate submodularity when ε ≤ 1/k; in this case we show that the Greedy algorithm achieves a constant approximation ratio (and that ε = 1/k is tight for the Greedy algorithm). The other relaxation considers functions with stronger structural properties, namely, functions with bounded curvature.
In this case, we show that a constant approximation ratio can be obtained for any constant ε.

3.1 Greedy algorithm

For the general class of monotone submodular functions, the result of [22] shows that a simple greedy algorithm achieves an approximation ratio of 1 − 1/e. Running the same algorithm on an ε-approximately submodular function results in a constant approximation ratio when ε ≤ 1/k. The detailed description of the algorithm can be found in the appendix.

Theorem 5. Let F be an ε-approximately submodular function. Then the set S returned by the greedy algorithm satisfies:

F(S) ≥ [1 / (1 + 4kε/(1 − ε)²)] · [1 − ((1 − ε)/(1 + ε))^{2k} (1 − 1/k)^k] · max_{S:|S|≤k} F(S)

In particular, for k ≥ 2, any constant 0 ≤ δ < 1 and ε = δ/k, this approximation ratio is constant and lower-bounded by (1 − 1/e − 16δ).

Proof. Let us denote by O an optimal solution to max_{S:|S|≤k} F(S) and by f a submodular representative of F. Let us write S = {e1, . . . , eℓ} the set returned by the greedy algorithm and define Si = {e1, . . .
, ei}, then:

f(O) ≤ f(Si) + Σ_{e∈O} [f(Si ∪ {e}) − f(Si)]
     ≤ f(Si) + Σ_{e∈O} [(1/(1 − ε)) F(Si ∪ {e}) − f(Si)]
     ≤ f(Si) + Σ_{e∈O} [(1/(1 − ε)) F(Si+1) − f(Si)]
     ≤ f(Si) + Σ_{e∈O} [((1 + ε)/(1 − ε)) f(Si+1) − f(Si)]
     ≤ f(Si) + k [((1 + ε)/(1 − ε)) f(Si+1) − f(Si)]

where the first inequality uses submodularity, the second uses the definition of approximate submodularity, the third uses the definition of the algorithm, the fourth uses approximate submodularity again and the last one uses that |O| ≤ k.

Reordering the terms, and expressing the inequality in terms of F (using the definition of approximate submodularity) gives:

F(Si+1) ≥ (1 − 1/k) ((1 − ε)/(1 + ε))² F(Si) + (1/k) ((1 − ε)/(1 + ε))² F(O)

This is an inductive inequality of the form u_{i+1} ≥ a·u_i + b with u_0 = 0, whose solution is u_i ≥ (b/(1 − a))(1 − a^i). For our specific a and b, we obtain:

F(S) ≥ [1 / (1 + 4kε/(1 − ε)²)] · [1 − (1 − 1/k)^k ((1 − ε)/(1 + ε))^{2k}] F(O)

The following proposition shows that ε = 1/k is tight for the greedy algorithm, and that this is the case even for additive functions. The proof can be found in the Appendix.

Proposition 6. For any β > 0, there exists an ε-approximately additive function with ε = Ω(1/k^{1−β}) for which the Greedy algorithm has a non-constant approximation ratio.

Matroid constraint.
Theorem 5 can be generalized to the case of matroid constraints. We are now looking at a problem of the form max_{S∈I} F(S), where I is the set of independent sets of a matroid.

Theorem 7. Let I be the set of independent sets of a matroid of rank k, and let F be an ε-approximately submodular function. Then if S is the set returned by the greedy algorithm:

F(S) ≥ (1/2) · [1 / (1 + kε/(1 − ε))] · ((1 − ε)/(1 + ε)) · max_{S∈I} f(S)

In particular, for k ≥ 2, any constant 0 ≤ δ < 1 and ε = δ/k, this approximation ratio is constant and lower-bounded by (1/2 − 2δ).

3.2 Bounded curvature

With an additional assumption on the curvature of the submodular function f, it is possible to obtain a constant approximation ratio for any ε-approximately submodular function with constant ε. Recall that the curvature c of a function f : 2^N → R is defined by c = 1 − min_{a∈N} f_{N\{a}}(a)/f(a), where f_S(a) denotes the marginal contribution f(S ∪ {a}) − f(S). A consequence of this definition when f is submodular is that for any S ⊆ N and a ∈ N \ S we have that f_S(a) ≥ (1 − c)f(a).

Proposition 8. For the problem max_{|S|≤k} F(S) where F is an ε-approximately submodular function which approximates a monotone submodular f with curvature c, there exists a polynomial time algorithm which achieves an approximation ratio of (1 − c)((1 − ε)/(1 + ε))².

References

[1] F. Bach. Structured sparsity-inducing norms through submodular functions. In NIPS, 2010.

[2] A. Badanidiyuru, S. Dobzinski, H. Fu, R. Kleinberg, N. Nisan, and T. Roughgarden. Sketching valuation functions. In SODA, pages 1025–1035. SIAM, 2012.

[3] M.-F. Balcan and N. J. Harvey. Learning submodular functions. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 793–802. ACM, 2011.

[4] A. Belloni, T. Liang, H. Narayanan, and A.
Rakhlin. Escaping the local minima via simulated annealing: Optimization of approximately convex functions. In COLT, pages 240–265, 2015.
[5] Y. Chen, S. H. Hassani, A. Karbasi, and A. Krause. Sequential information maximization: When is greedy near-optimal? In COLT, pages 338–363, 2015.
[6] A. Das, A. Dasgupta, and R. Kumar. Selecting diverse features via spectral relaxation. In NIPS, 2012.
[7] A. Das and D. Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, 2011.
[8] A. Defazio and T. Caetano. A convex formulation for learning scale-free networks via submodular relaxation. In NIPS, 2012.
[9] D. Golovin and A. Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. JAIR, 42:427–486, 2011.
[10] R. Gomes and A. Krause. Budgeted nonparametric learning from data streams. In ICML, 2010.
[11] C. Guestrin, A. Krause, and A. Singh. Near-optimal sensor placements in Gaussian processes. In ICML, August 2005.
[12] A. Guillory and J. Bilmes. Simultaneous learning and covering with adversarial noise. In ICML, 2011.
[13] A. Hassidim and Y. Singer. Submodular optimization under noise. CoRR, abs/1601.03095, 2016.
[14] S. Hoi, R. Jin, J. Zhu, and M. Lyu. Batch mode active learning and its application to medical image classification. In ICML, 2006.
[15] R. K. Iyer and J. A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In NIPS, pages 2436–2444, 2013.
[16] R. K. Iyer, S. Jegelka, and J. A. Bilmes. Curvature and optimal algorithms for learning and minimizing submodular functions. In NIPS, pages 2742–2750, 2013.
[17] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
[18] A. Krause and C.
Guestrin. Near-optimal observation selection using submodular functions. In AAAI, Nectar track, July 2007.
[19] A. Krause and C. Guestrin. Nonmyopic active learning of Gaussian processes: an exploration–exploitation approach. In ICML, 2007.
[20] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Trans. on Int. Systems and Technology, 2(4), 2011.
[21] H. Lin and J. Bilmes. A class of submodular functions for document summarization. In ACL/HLT, 2011.
[22] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
[23] M. G. Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. ACM TKDD, 5(4), 2011.
[24] M. G. Rodriguez and B. Schölkopf. Submodular inference of diffusion networks from multiple trees. In ICML, 2012.
[25] Y. Singer and J. Vondrák. Information-theoretic lower bounds for convex optimization with erroneous oracles. In NIPS, pages 3186–3194, 2015.
[26] A. Singla, S. Tschiatschek, and A. Krause. Noisy submodular maximization via adaptive sampling with applications to crowdsourced image collection summarization. arXiv preprint arXiv:1511.07211, 2015.
[27] H. Song, R. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui, and T. Darrell. On learning to localize objects with minimal supervision. In ICML, 2014.
[28] S. Tschiatschek, R. Iyer, H. Wei, and J. Bilmes. Learning mixtures of submodular functions for image collection summarization. In NIPS, 2014.
[29] J. Zheng, Z. Jiang, R. Chellappa, and J. Phillips. Submodular attribute selection for action recognition in video.
In NIPS, 2014.