{"title": "Non-monotone Submodular Maximization in Exponentially Fewer Iterations", "book": "Advances in Neural Information Processing Systems", "page_first": 2353, "page_last": 2364, "abstract": "In this paper we consider parallelization for applications whose objective can be expressed as maximizing a non-monotone submodular function under a cardinality constraint. Our main result is an algorithm whose approximation is arbitrarily close to 1/2e in O(log^2 n) adaptive rounds, where n is the size of the ground set. This is an exponential speedup in parallel running time over any previously studied algorithm for constrained non-monotone submodular maximization. Beyond its provable guarantees, the algorithm performs well in practice. Specifically, experiments on traffic monitoring and personalized data summarization applications show that the algorithm finds solutions whose values are competitive with state-of-the-art algorithms while running in exponentially fewer parallel iterations.", "full_text": "Non-monotone Submodular Maximization in Exponentially Fewer Iterations\n\nEric Balkanski\nHarvard University\nericbalkanski@g.harvard.edu\n\nAdam Breuer\nHarvard University\nbreuer@g.harvard.edu\n\nYaron Singer\nHarvard University\nyaron@seas.harvard.edu\n\nAbstract\n\nIn this paper we consider parallelization for applications whose objective can be expressed as maximizing a non-monotone submodular function under a cardinality constraint. Our main result is an algorithm whose approximation is arbitrarily close to 1/2e in O(log^2 n) adaptive rounds, where n is the size of the ground set. This is an exponential speedup in parallel running time over any previously studied algorithm for constrained non-monotone submodular maximization. Beyond its provable guarantees, the algorithm performs well in practice. 
Specifically, experiments on traffic monitoring and personalized data summarization applications show that the algorithm finds solutions whose values are competitive with state-of-the-art algorithms while running in exponentially fewer parallel iterations.\n\n1 Introduction\n\nIn machine learning, many fundamental quantities we care to optimize, such as entropy, graph cuts, diversity, coverage, diffusion, and clustering, are submodular functions. Although there has been a great deal of work in machine learning on applications that require constrained monotone submodular maximization, many interesting submodular objectives are non-monotone. Constrained non-monotone submodular maximization is used in large-scale personalized data summarization applications such as image summarization, movie recommendation, and revenue maximization in social networks [MBK16]. In addition, many data mining applications on networks require solving constrained max-cut problems (see Section 4).\n\nNon-monotone submodular maximization is well-studied [FMV11, LMNS09, GRST10, FNS11, GV11, BFNS14, CJV15, MBK16, EN16], particularly under a cardinality constraint [LMNS09, GRST10, GV11, BFNS14, MBK16]. For maximizing a non-monotone submodular function under a cardinality constraint k, a simple randomized greedy algorithm that iteratively includes a random element from the set of k elements with largest marginal contribution at every iteration achieves a 1/e approximation to the optimal set of size k [BFNS14]. For more general constraints, Mirzasoleiman et al. develop an algorithm with strong approximation guarantees that works well in practice [MBK16].\n\nWhile the algorithms for constrained non-monotone submodular maximization achieve strong approximation guarantees, their parallel runtime is linear in the size of the data due to their high adaptivity. 
Informally, the adaptivity of an algorithm is the number of sequential rounds it requires when polynomially-many function evaluations can be executed in parallel in each round. The adaptivity of the randomized greedy algorithm is k, since it sequentially adds elements in k rounds. The algorithm in Mirzasoleiman et al. is also k-adaptive, as is every known constant approximation algorithm for constrained non-monotone submodular maximization. In general, k may be Ω(n), and hence the adaptivity, as well as the parallel runtime, of all known constant approximation algorithms for constrained submodular maximization is at least linear in the size of the data.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nFor large-scale applications we seek algorithms with low adaptivity. Low adaptivity is what enables algorithms to be efficiently parallelized (see Appendix A for further discussion). For this reason, adaptivity is studied across a wide variety of areas including online learning [NSYD17], ranking [Val75, Col88, BMW16], multi-armed bandits [AAAK17], sparse recovery [HNC09, IPW11, HBCN09], learning theory [CG17, BGSMdW12, CST+17], and communication complexity [PS84, DGS84, NW91]. For submodular maximization, somewhat surprisingly, until very recently Ω(n) was the best known adaptivity (and hence best parallel running time) required for a constant factor approximation to monotone submodular maximization under a cardinality constraint.\n\nAlthough there has been a great deal of work on distributed submodular optimization (e.g. in the Map-Reduce model), the algorithms for distributed submodular optimization address the challenges associated with processing data that exceeds memory capacity. These algorithms partition the ground set across multiple machines and run sequential greedy algorithms on each machine separately, and are therefore Ω(n)-adaptive in the worst case (e.g. 
[CKT10, KMVV15, MKSK13, MZ15, BENW16]).\n\nA recent line of work introduces new techniques for maximizing monotone submodular functions under a cardinality constraint that produce algorithms that are O(log n)-adaptive and achieve both strong constant factor approximation guarantees [BS18a, BS18b] and even optimal approximation guarantees [BRS18, EN18]. This is tight in the sense that no algorithm can achieve a constant factor approximation in õ(log n) rounds [BS18a]. Unfortunately, these techniques are only applicable to monotone submodular maximization and can be arbitrarily bad in the non-monotone case.\n\nIs it possible to design fast parallel algorithms for non-monotone submodular maximization?\n\nFor unconstrained non-monotone submodular maximization, one can trivially obtain an approximation of 1/4 in 0 rounds by simply selecting a set uniformly at random [FMV11]. We therefore focus on the problem of maximizing a non-monotone submodular function under a cardinality constraint.\n\nMain result. Our main result is the BLITS algorithm, which obtains an approximation ratio arbitrarily close to 1/2e for maximizing a non-monotone (or monotone) submodular function under a cardinality constraint in O(log^2 n) adaptive rounds (and O(log^3 n) parallel runtime; see Appendix A), where n is the size of the ground set. Although its approximation ratio is about half of the best known approximation for this problem [BFNS14], it achieves its guarantee in exponentially fewer rounds. Furthermore, we observe across a variety of experiments that despite this slightly weaker worst-case approximation guarantee, BLITS consistently returns solutions that are competitive with the state-of-the-art, and does so exponentially faster.\n\nTechnical overview. 
Non-monotone submodular functions are notoriously challenging to optimize. Unlike in the monotone case, standard algorithms for submodular maximization such as the greedy algorithm perform arbitrarily poorly on non-monotone functions, and the best achievable approximation remains unknown.¹ Since the marginal contribution of an element to a set is not guaranteed to be non-negative, an algorithm's local decisions in the early stages of optimization may contribute negatively to the value of its final solution. At a high level, we overcome this problem with an algorithmic approach that iteratively adds to the solution blocks of elements obtained after aggressively discarding other elements. Showing the guarantees for this algorithm on non-monotone functions requires multiple subtle components. Specifically, we require that at every iteration, any element is added to the solution with low probability. This requirement imposes a significant additional challenge beyond just finding a block of high contribution at every iteration, but it is needed to show that in future iterations there will exist a block with large contribution to the solution. Second, we introduce a pre-processing step that discards elements with negative expected marginal contribution to a random set drawn from some distribution. This pre-processing step is needed for two different arguments: the first is that a large number of elements are discarded at every iteration, and the second is that a random block has high value when there are k surviving elements.\n\n¹To date, the best upper and lower bounds are [BFNS14] and [GV11], respectively, for non-monotone submodular maximization under a cardinality constraint.\n\nPaper organization. Following a few preliminaries, we present the algorithm and its analysis in Sections 2 and 3. We present the experiments in Section 4.\n\nPreliminaries. A function f : 2^N → 
R+ is submodular if the marginal contributions f_S(a) := f(S ∪ {a}) − f(S) of an element a ∈ N to a set S ⊆ N are diminishing, i.e., f_S(a) ≥ f_T(a) for all a ∈ N \ T and S ⊆ T. It is monotone if f(S) ≤ f(T) for all S ⊆ T. We assume that f is non-negative, i.e., f(S) ≥ 0 for all S ⊆ N, which is standard. We denote the optimal solution by O, i.e., O := argmax_{|S| ≤ k} f(S), and its value by OPT := f(O). We use the following lemma from [FMV11], which is useful for non-monotone functions:\n\nLemma 1 ([FMV11]). Let g : 2^N → R be a non-negative submodular function. Denote by A(p) a random subset of A where each element appears with probability at most p (not necessarily independently). Then, E[g(A(p))] ≥ (1 − p)·g(∅).\n\nAdaptivity. Informally, the adaptivity of an algorithm is the number of sequential rounds it requires when polynomially-many function evaluations can be executed in parallel in each round. Formally, given a function f, an algorithm is r-adaptive if every query f(S) for the value of a set S occurs at a round i ∈ [r] such that S is independent of the values f(S′) of all other queries at round i.\n\n2 The BLITS Algorithm\n\nIn this section, we describe the BLock ITeration Submodular maximization algorithm (henceforth BLITS), which obtains an approximation arbitrarily close to 1/2e in O(log^2 n) adaptive rounds. BLITS iteratively identifies a block of at most k/r elements using a SIEVE subroutine, treated as a black box in this section, and adds this block to the current solution S, for r iterations.\n\nAlgorithm 1 BLITS: the BLock ITeration Submodular maximization algorithm\nInput: constraint k, bound on number of iterations r\n  S ← ∅\n  for r iterations i = 1 to r do\n    S ← S ∪ SIEVE(S, k, i, r)\n  return S\n\nThe main challenge is to find, in logarithmically many rounds, a block of size at most k/r to add to the current solution S. 
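To make the control flow concrete, the outer loop of Algorithm 1 can be sketched in a few lines of Python. The `toy_sieve` stand-in below is purely illustrative (a greedy block picker we made up for the example); it is not the low-adaptivity SIEVE subroutine of Section 3.

```python
def blits(k, r, sieve):
    """Outer loop of BLITS (Algorithm 1): add r blocks, each of at most
    k // r elements, produced by the given SIEVE subroutine."""
    S = set()
    for i in range(1, r + 1):
        S = S | sieve(S, k, i, r)
    return S

def toy_sieve(f, N):
    """Illustrative stand-in for SIEVE: return the k // r remaining elements
    with the largest marginal contribution f_S(a) = f(S + a) - f(S)."""
    def sieve(S, k, i, r):
        rest = sorted(N - S, key=lambda a: f(S | {a}) - f(S), reverse=True)
        return set(rest[: k // r])
    return sieve
```

With a modular f (e.g. f(S) = sum of element weights), `blits(k, r, toy_sieve(f, N))` simply collects the k largest elements over r rounds; the real SIEVE replaces the greedy picker with the threshold-and-discard procedure analyzed next.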
Before describing and analyzing the SIEVE subroutine, in the following lemma we reduce the problem of showing that BLITS obtains a solution of value αv⋆/e to showing that SIEVE finds a block with marginal contribution at least (α/r)·((1 − 1/r)^{i−1}·v⋆ − f(S_{i−1})) to S at every iteration i, where we wish to obtain v⋆ close to OPT. The proof generalizes an argument in [BFNS14] and is deferred to Appendix B.\n\nLemma 2. For any α ∈ [0, 1], assume that at iteration i with current solution S_{i−1}, SIEVE returns a random set T_i s.t. E[f_{S_{i−1}}(T_i)] ≥ (α/r)·((1 − 1/r)^{i−1}·v⋆ − f(S_{i−1})). Then, E[f(S_r)] ≥ (α/e)·v⋆.\n\nThe advantage of BLITS is that it terminates after O(d · log n) adaptive rounds when using r = O(log n) and a SIEVE subroutine that is d-adaptive. In the next section we describe SIEVE and prove that it respects the conditions of Lemma 2 in d = O(log n) rounds.\n\n3 The SIEVE Subroutine\n\nIn this section, we describe and analyze the SIEVE subroutine. We show that for any constant ε > 0, this algorithm finds in O(log n) rounds a block of at most k/r elements with marginal contribution to S that is at least t/r, with t := ((1 − ε/2)/2)·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S_{i−1})), when called at iteration i of BLITS. By Lemma 2 with α = (1 − ε/2)/2 and v⋆ = (1 − ε/2)·OPT, this implies that BLITS obtains an approximation arbitrarily close to 1/2e in O(log^2 n) rounds.\n\nThe SIEVE algorithm, described formally below, iteratively discards elements from a set X initialized to the ground set N. 
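As a numerical sanity check on this reduction, the recursion behind Lemma 2 can be iterated directly: the sketch below (our own, with v⋆ normalized to 1) applies the per-iteration guarantee with equality and confirms the (α/e)·v⋆ conclusion.

```python
import math

def worst_case_value(alpha, r, v_star=1.0):
    """Iterate Lemma 2's recursion with the per-iteration guarantee met with
    equality: f(S_i) = f(S_{i-1}) + (alpha/r)*((1 - 1/r)**(i-1)*v_star - f(S_{i-1}))."""
    v = 0.0
    for i in range(1, r + 1):
        v += (alpha / r) * ((1 - 1 / r) ** (i - 1) * v_star - v)
    return v

# The conclusion E[f(S_r)] >= (alpha/e) * v_star should hold for each alpha and r.
for alpha in (1.0, 0.5, 0.25):
    for r in (10, 100, 1000):
        assert worst_case_value(alpha, r) >= (alpha / math.e) * 1.0
```

For α = 1 the recursion solves in closed form to f(S_r) = (1 − 1/r)^{r−1}, which is at least 1/e for every r ≥ 1, matching the lemma.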
We denote by U(X) the uniform distribution over all subsets of X of size exactly k/r, and by ∆(a, S, X) the expected marginal contribution of an element a to the union of the current solution S and a random set R ∼ U(X), i.e.,\n\n∆(a, S, X) := E_{R∼U(X)}[f_{S∪(R\{a})}(a)].\n\nAt every iteration, SIEVE first pre-processes the surviving elements X to obtain X⁺, which is the set of elements a ∈ X with non-negative expected marginal contribution ∆(a, S, X). After this pre-processing step, SIEVE evaluates the marginal contribution E_{R∼U(X)}[f_S(R ∩ X⁺)] of a random set R ∼ U(X) without its elements not in X⁺ (i.e., R excluding its elements with negative expected marginal contribution). If the marginal contribution of R ∩ X⁺ is at least t/r, then R ∩ X⁺ is returned. Otherwise, the algorithm discards from X the elements a with expected marginal contribution ∆(a, S, X) less than (1 + ε/4)·t/k. The algorithm iterates until either E[f_S(R ∩ X⁺)] ≥ t/r or there are at most k surviving elements, in which case SIEVE returns a random set R ∩ X⁺ with R ∼ U(X) and with dummy elements added to X so that |X| = k. A dummy element a is an element with f_S(a) = 0 for all S.\n\nAlgorithm 2 SIEVE(S, k, i, r)\nInput: current solution S at outer-iteration i ≤ r\n  X ← N, t ← ((1 − ε/2)/2)·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S))\n  while |X| > k do\n    X⁺ ← {a ∈ X : ∆(a, S, X) ≥ 0}\n    if E_{R∼U(X)}[f_S(R ∩ X⁺)] ≥ t/r then return R ∩ X⁺, where R ∼ U(X)\n    X ← {a ∈ X : ∆(a, S, X) ≥ (1 + ε/4)·t/k}\n  X ← X ∪ {k − |X| dummy elements}\n  X⁺ ← {a ∈ X : ∆(a, S, X) ≥ 0}\n  return R ∩ X⁺, where R ∼ U(X)\n\nThe above description is an idealized version of the algorithm. In practice, we do not know OPT and we cannot compute expectations exactly. 
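The expectations above can only be estimated in practice. A minimal Python sketch of such sampling-based estimators is given below; the function and variable names are our own, not the authors'.

```python
import random

def sample_block(X, size, rng):
    """Draw R ~ U(X): a uniformly random subset of X of the given size."""
    return set(rng.sample(sorted(X), size))

def est_delta(f, a, S, X, size, m=30, rng=random):
    """Monte Carlo estimate of Delta(a, S, X) = E_R[ f_{S u (R \ {a})}(a) ]."""
    total = 0.0
    for _ in range(m):
        base = S | (sample_block(X, size, rng) - {a})
        total += f(base | {a}) - f(base)
    return total / m

def est_block_value(f, S, X, X_plus, size, m=30, rng=random):
    """Monte Carlo estimate of E_R[ f_S(R intersect X+) ] for R ~ U(X)."""
    total = 0.0
    for _ in range(m):
        R = sample_block(X, size, rng)
        total += f(S | (R & X_plus)) - f(S)
    return total / m
```

All m samples in a call can be queried simultaneously, which is what keeps each estimation step a single adaptive round.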
Fortunately, we can apply multiple guesses for OPT non-adaptively and obtain arbitrarily good estimates of the expectations in one round by sampling. The sampling process for the estimates first samples m sets from U(X), then queries the desired sets to obtain random realizations of f_S(R ∩ X⁺) and f_{S∪(R\{a})}(a), and finally averages the m random realizations of these values. By standard concentration bounds, m = O((OPT/ε)² log(1/δ)) samples are sufficient to obtain, with probability 1 − δ, an estimate with an ε error. For ease of presentation and notation, we analyze the idealized version of the algorithm, which easily extends to the algorithm with estimates and guesses as in [BS18a, BS18b, BRS18]. Due to lack of space, we only include proof sketches for some lemmas and defer full proofs to Appendix D.\n\n3.1 The approximation\n\nOur goal is to show that SIEVE returns a random block whose expected marginal contribution to S is at least t/r. By Lemma 2 this implies that BLITS obtains a (1 − ε)/2e approximation.\n\nLemma 3. Assume r ≥ 20ρε⁻¹ and that, after at most ρ − 1 iterations, SIEVE returns a set R at iteration i of BLITS. Then E[f_S(R)] ≥ t/r = ((1 − ε/2)/(2r))·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)).\n\nThe remainder of the analysis of the approximation is devoted to the proof of Lemma 3. First note that if SIEVE returns R ∩ X⁺ in the middle of an iteration, then the desired bound on E[f_S(R)] follows immediately from the condition to return that block. Otherwise SIEVE returns R due to |X| ≤ k, and then the proof consists of two parts. First, in Section 3.1.1 we argue that when SIEVE terminates, there exists a subset T of X for which f_S(T) ≥ t. Then, in Section 3.1.2 we prove that such a subset T of X for which f_S(T) ≥ t not only exists, but is also returned by SIEVE. We do this by proving a new general lemma for non-monotone submodular functions that may be of independent interest. 
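Putting the pieces together, the following simplified, sequential Python rendering of SIEVE illustrates its structure. It is a sketch under several stated assumptions (sampled estimates with our own helper names, a caller-supplied guess `opt_guess` standing in for the non-adaptive guesses of OPT, dummy elements omitted, an arbitrary default ε, and an added iteration guard); it is not the authors' implementation and is not parallel.

```python
import random

def sieve_sketch(f, N, S, k, i, r, opt_guess, eps=0.2, m=30, seed=0):
    """Simplified rendering of SIEVE (Algorithm 2): repeatedly discard
    low-value elements until a random block of size k // r clears t/r."""
    rng = random.Random(seed)
    size = max(k // r, 1)
    t = ((1 - eps / 2) / 2) * ((1 - 1 / r) ** (i - 1) * (1 - eps / 2) * opt_guess - f(S))

    def draw(X):  # R ~ U(X), truncated if X has fewer than `size` elements
        return set(rng.sample(sorted(X), min(size, len(X))))

    def delta(a, X):  # sampled estimate of Delta(a, S, X)
        tot = 0.0
        for _ in range(m):
            base = S | (draw(X) - {a})
            tot += f(base | {a}) - f(base)
        return tot / m

    X = set(N) - S
    for _ in range(100):  # guard; Lemma 8 bounds the iterations by O(log n)
        if len(X) <= k:
            break
        X_plus = {a for a in X if delta(a, X) >= 0}
        est = sum(f(S | (draw(X) & X_plus)) - f(S) for _ in range(m)) / m
        if est >= t / r:
            return draw(X) & X_plus
        X = {a for a in X if delta(a, X) >= (1 + eps / 4) * t / k}
    X_plus = {a for a in X if delta(a, X) >= 0}
    return draw(X) & X_plus
```

On a modular objective such as f(S) = sum of element weights, each call returns a block of k // r surviving elements, which the BLITS outer loop then adds to S.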
This lemma shows that a random subset of X of size s well approximates the optimal subset of size s in X.\n\n3.1.1 Existence of a surviving block with high contribution to S\n\nThe main result in this section is Lemma 6, which shows that when SIEVE terminates there exists a subset T of X s.t. f_S(T) ≥ t. To prove this, we first prove Lemma 4, which argues that f(O ∪ S) ≥ (1 − 1/r)^{i−1}·OPT. This bound explains the (1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S_{i−1}) term in t. For monotone functions, this is trivial since f(O ∪ S) ≥ f(O) = OPT by definition of monotonicity. For non-monotone functions, this inequality does not hold. Instead, the approach used to bound f(O ∪ S) is to argue that any element a ∈ N is added to S by SIEVE with probability at most 1/r at every iteration. The key to that argument is that in both cases where SIEVE terminates we have |X| ≥ k (with X possibly containing dummy elements), which implies that every element a is in R ∼ U(X) with probability at most 1/r.\n\nLemma 4. Let S be the set obtained after i − 1 iterations of BLITS calling the SIEVE subroutine. Then E[f(O ∪ S)] ≥ (1 − 1/r)^{i−1}·OPT.\n\nProof. In both cases where SIEVE terminates, |X| ≥ k. Thus Pr[a ∈ R ∼ U(X)] = (k/r)/|X| ≤ 1/r. This implies that at iteration i of BLITS, Pr[a ∈ S] ≤ 1 − (1 − 1/r)^{i−1}. Next, we define g(T) := f(O ∪ T), which is also submodular. By Lemma 1 from the preliminaries, we get\n\nE[f(S ∪ O)] = E[g(S)] ≥ (1 − 1/r)^{i−1}·g(∅) = (1 − 1/r)^{i−1}·OPT.\n\nLet ρ, X_j, and R_j denote the number of iterations of SIEVE(S, k, i, r), the set X at iteration j ≤ ρ of SIEVE, and the set R ∼ U(X_j), respectively. We show that the expected marginal contribution of O to S ∪ (∪_{j=1}^ρ R_j) approximates (1 − 1/r)^{i−1}·OPT − f(S) well. This crucial fact allows us to argue about the value of optimal elements that survive iterations of SIEVE. We defer the proof to Appendix C.\n\nLemma 5. 
For all r, ρ, ε > 0 s.t. r ≥ 20ρε⁻¹, if SIEVE(S, k, i, r) has not terminated after ρ iterations, then E_{R_1,…,R_ρ}[f_{S∪(∪_{j=1}^ρ R_j)}(O)] ≥ (1 − ε/10)·(1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S).\n\nWe are now ready to show that when SIEVE terminates after ρ iterations, there exists a subset T of X_ρ s.t. f_S(T) ≥ t. At a high level, the proof defines T to be a set of meaningful optimal elements, then uses Lemma 5 to show that these elements survive ρ iterations of SIEVE and respect f_S(T) ≥ t.\n\nLemma 6. For all r, ρ, ε > 0, if r ≥ 20ρε⁻¹, then there exists T ⊆ X_ρ that survives ρ iterations of SIEVE(S, k, i, r) and that satisfies f_S(T) ≥ ((1 − ε/10)/2)·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)).\n\nProof Sketch; full proof in Appendix C. Let O = {o_1, …, o_k} be the optimal elements in some arbitrary order and O_ℓ = {o_1, …, o_ℓ}. We define ∆_ℓ := E_{R_1,…,R_ρ}[f_{S∪O_{ℓ−1}∪(∪_{j=1}^ρ R_j \ {o_ℓ})}(o_ℓ)] and T to be the set of optimal elements o_ℓ such that ∆_ℓ ≥ (1/(2k))·E_{R_1,…,R_ρ}[f_{S∪(∪_{j=1}^ρ R_j)}(O)]. By Lemma 5 and submodularity, we then argue that for all o_ℓ ∈ T and at every iteration j ≤ ρ, E[f_{S∪(R_j\{o_ℓ})}(o_ℓ)] ≥ (1 + ε/4)·t/k. Thus, o_ℓ ∈ X_ρ. Finally, by submodularity and the definition of T, we show that f_S(T) ≥ ((1 − ε/10)/2)·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)).\n\n3.1.2 A random subset approximates the best surviving block\n\nIn the previous part of the analysis, we showed the existence of a surviving set T with contribution to S at least ((1 − ε/10)/2)·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)). In this part, we show that the random set R ∩ X⁺, with R ∼ U(X), is a 1/r approximation to any surviving set T ⊆ X⁺ when |X| = k. A key component of the algorithm for this argument to hold for non-monotone functions is the final pre-processing step that restricts X to X⁺ after adding dummy elements. We use this restriction to argue that every element a ∈ R ∩ X⁺ must contribute a non-negative expected value to the set returned.\n\nLemma 7. 
Assume SIEVE returns R ∩ X⁺ with R ∼ U(X) and |X| = k. For any T ⊆ X⁺, we have E_{R∼U(X)}[f_S(R ∩ X⁺)] ≥ f_S(T)/r.\n\nProof Sketch; full proof in Appendix D. Since T ⊆ X⁺, we get E[f_S(R ∩ X⁺)] = E[f_S(R ∩ T)] + E[f_{S∪(R∩T)}((R ∩ X⁺) \ T)]. We then use submodularity to argue that E[f_S(R ∩ T)] ≥ (1/r)·f_S(T). By using submodularity and the definition of X⁺, we then show that E[f_{S∪(R∩T)}((R ∩ X⁺) \ T)] ≥ 0. Combining these three pieces, we get the desired inequality.\n\nThere is a tradeoff between the contribution f_S(T) of the best surviving set T and the contribution of a random set R ∩ X⁺ returned in the middle of an iteration, due to the thresholds (1 + ε/4)·t/k and t/r, which is controlled by t. The optimization of this tradeoff explains the (1 − ε/2)/2 term in t.\n\n3.1.3 Proof of main lemma\n\nProof of Lemma 3. There are two cases. If SIEVE returns R ∩ X⁺ in the middle of an iteration, then by the condition to return that set, E_{R∼U(X)}[f_S(R ∩ X⁺)] ≥ t/r = ((1 − ε/2)/(2r))·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)). Otherwise, SIEVE returns R ∩ X⁺ with |X| = k. By Lemma 6, there exists T ⊆ X_ρ that survives ρ iterations of SIEVE s.t. f_S(T) ≥ ((1 − ε/10)/2)·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)). Since there are at most ρ − 1 iterations of SIEVE, T survives each iteration and the final pre-processing. This implies that T ⊆ X⁺ when the algorithm terminates. By Lemma 7, we then conclude that E_{R∼U(X)}[f_S(R ∩ X⁺)] ≥ f_S(T)/r ≥ ((1 − ε/10)/(2r))·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)) ≥ t/r.\n\n3.2 The adaptivity of SIEVE is O(log n)\n\nWe now observe that the number of iterations of SIEVE is O(log n). This logarithmic adaptivity is due to the fact that SIEVE either returns a random set or discards a constant fraction of the surviving elements at every iteration. 
Similarly to Section 3.1.2, the pre-processing step to obtain X⁺ is crucial: it lets us argue that since a random subset R ∩ X⁺ has contribution below the t/r threshold, and since all elements in X⁺ have non-negative marginal contributions, there exists a large set of elements in X⁺ with expected marginal contribution to S ∪ R that is below the (1 + ε/4)·t/k threshold. We defer the proof to Appendix E.\n\nLemma 8. Let X_j and X_{j+1} be the surviving elements X at the start and end of iteration j of SIEVE(S, k, i, r). For all S ⊆ N and r, j, ε > 0, if SIEVE(S, k, i, r) does not terminate at iteration j, then |X_{j+1}| < |X_j|/(1 + ε/4).\n\n3.3 Main result for BLITS\n\nTheorem 1. For any constant ε > 0, BLITS initialized with r = 20ε⁻¹·log_{1+ε/4}(n) is O(log^2 n)-adaptive and obtains a (1 − ε)/2e approximation.\n\nProof. By Lemma 3, we have E[f_S(R)] ≥ ((1 − ε/2)/(2r))·((1 − 1/r)^{i−1}·(1 − ε/2)·OPT − f(S)). Thus, by Lemma 2 with α = (1 − ε/2)/2 and v⋆ = (1 − ε/2)·OPT, BLITS returns S that satisfies E[f(S)] ≥ ((1 − ε/2)/(2e))·(1 − ε/2)·OPT ≥ ((1 − ε)/(2e))·OPT. For adaptivity, note that each iteration of SIEVE has two adaptive rounds: one for ∆(a, S, X) for all a ∈ N and one for E_{R∼U(X)}[f_S(R ∩ X⁺)]. Since |X| decreases by a 1 + ε/4 factor at every iteration of SIEVE, every call to SIEVE has at most log_{1+ε/4}(n) iterations. Finally, as there are r = 20ε⁻¹·log_{1+ε/4}(n) iterations of BLITS, the adaptivity is O(log^2 n).\n\n4 Experiments\n\nOur goal in this section is to show that beyond its provable guarantees, BLITS performs well in practice across a variety of application domains. 
Specifically, we are interested in showing that despite the fact that the parallel running time of our algorithm is smaller by several orders of magnitude than that of any known algorithm for maximizing non-monotone submodular functions under a cardinality constraint, the quality of its solutions is consistently competitive with or superior to that of state-of-the-art algorithms for this problem. To do so, we conduct two sets of experiments where the goal is to solve the problem of max_{S:|S|≤k} f(S) given a function f that is submodular and non-monotone. In the first set of experiments, we test our algorithm on the classic max-cut objective evaluated on graphs generated by various random graph models. In the second set of experiments, we apply our algorithm to a max-cut objective on a new road network dataset, and we also benchmark it on the three objective functions and datasets used in [MBK16]. In each set of experiments, we compare the quality of solutions found by our algorithm to those found by several alternative algorithms.\n\n4.1 Experiment set I: cardinality constrained max-cut on synthetic graphs\n\nGiven an undirected graph G = (N, E), recall that the cut induced by a set of nodes S ⊆ N, denoted C(S), is the set of edges that have one end point in S and another in N \ S. 
[Figure 1: Experiments Set 1: Random Graphs. Performance of BLITS (red) and BLITS+ (blue) versus RANDOMGREEDY (yellow), P-FANTOM (green), GREEDY (dark blue), and RANDOM (purple).]\n\nThe cut function f(S) = |C(S)| is a quintessential example of a non-monotone submodular function. To study the performance of our algorithm on different cut functions, we use four well-studied random graph models that yield cut functions with different properties. For each of these graphs, we run the algorithms from Section 4.3 to solve max_{S:|S|≤k} |C(S)| for different k:\n\n• Erdős Rényi. We construct a G(n, p) graph with n = 1000 nodes and p = 1/2. We set k = 700. Since each node's degree is drawn from a Binomial distribution, many nodes will have a similar marginal contribution to the cut function, and a random set S may perform well.\n\n• Stochastic block model. We construct an SBM graph with 7 disconnected clusters of 30 to 120 nodes and a high (p = 0.8) probability of an edge within each cluster. We set k = 360. Unlike for G(n, p), here we expect a set S to achieve high value only by covering all of the clusters.\n\n• Barabási-Albert. We create a graph with n = 500 and m = 100 edges added per iteration. We set k = 333. 
We expect that a relatively small number of nodes will have high degree in this model, so a set S consisting of these nodes will have much greater value than a random set.\n\n• Configuration model. We generate a configuration model graph with n = 500 and a power law degree distribution with exponent 2. We set k = 333. Although configuration model graphs are similar to Barabási-Albert graphs, their high degree nodes are not connected to each other, and thus greedily adding these high degree nodes to S is a good heuristic.\n\n4.2 Experiment set II: performance benchmarks on real data\n\nTo measure the performance of BLITS on real data, we use it to optimize four different objective functions, each on a different dataset. Specifically, we consider a traffic monitoring application as well as three additional applications introduced and experimented with in [MBK16]: image summarization, movie recommendation, and revenue maximization. We note that while these applications are sometimes modeled with monotone objectives, there are many advantages to using non-monotone objectives (see [MBK16]). We briefly describe these objective functions and data here and provide additional details in Appendix F.\n\n• Traffic monitoring. Consider an application where a government has a budget to build a fixed set of monitoring locations to monitor the traffic that enters or exits a region via its transportation network. Here, the goal is not to monitor traffic circulating within the network, but rather to choose a set of locations (or nodes) such that the volume of traffic entering or exiting via this set is maximal. To accomplish this, we optimize a cut function defined on the weighted transportation network. More precisely, we seek to solve max_{S:|S|≤k} f(S), where f(S) is the sum of weighted edges (e.g. traffic counts between two points) that have one end point in S and another in N \ S. 
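For illustration, a weighted cut objective of this form takes only a few lines of Python. The dict-of-dicts adjacency representation below is a hypothetical choice for this sketch (any graph library would do), not the representation used for the PeMS network described next.

```python
def weighted_cut(adj):
    """Return f(S) = total weight of edges with exactly one endpoint in S,
    for a graph given as adj[u][v] = weight (edges stored in both directions)."""
    def f(S):
        return sum(w for u in S for v, w in adj.get(u, {}).items() if v not in S)
    return f

# Tiny example: a weighted triangle a-b (2.0), b-c (4.0), a-c (1.0).
adj = {
    "a": {"b": 2.0, "c": 1.0},
    "b": {"a": 2.0, "c": 4.0},
    "c": {"a": 1.0, "b": 4.0},
}
f = weighted_cut(adj)
# Non-monotone: f({"a","b"}) = 5.0 but f({"a","b","c"}) = 0.0
```

Note how adding the last node drops the objective to zero, which is exactly the non-monotonicity that rules out plain greedy guarantees.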
To conduct an experiment for this application, we reconstruct California's highway transportation network using data from the CalTrans PeMS system [Cal], which provides real-time traffic counts at over 40,000 locations on California's highways. Appendix F.1 details this network reconstruction. The result is a directed network in which nodes are locations along each direction of travel on each highway and edges are the total count of vehicles that passed between adjacent locations in April, 2018. We set k = 300 for this experiment.\n\n[Figure 2: Experiments Set 2: Real Data. Performance of BLITS (red) and BLITS+ (blue) versus RANDOMGREEDY (yellow), P-FANTOM (green), GREEDY (dark blue), and RANDOM (purple).]\n\n• Image summarization. Here we must select a subset to represent a large, diverse set of images. This experiment uses 500 randomly chosen images from the 10K Tiny Images dataset [KH09] with k = 80. We measure how well an image represents another by their cosine similarity.\n\n• Movie recommendation. Here our goal is to recommend a diverse short list S of movies for a user based on her ratings of movies she has already seen. 
We conduct this experiment on a randomly selected subset of 500 movies from the MovieLens dataset [HK15] of 1 million ratings by 6000 users on 4000 movies, with k = 200. Following [MBK16], we define the similarity of one movie to another as the inner product of their raw movie ratings vectors.\n\n• Revenue maximization. Here we choose a subset of k = 100 users in a social network to receive a product for free in exchange for advertising it to their network neighbors, and the goal is to choose users in a manner that maximizes revenue. We conduct this experiment on 25 randomly selected communities (≈1000 nodes) from the 5000 largest communities in the YouTube social network [FHK15], and we randomly assign edge weights from U(0, 1).\n\n4.3 Algorithms\n\nWe implement a version of BLITS exactly as described in this paper, as well as a slightly modified heuristic, BLITS+. The only difference is that whenever a round of samples has marginal value exceeding the threshold, BLITS+ adds the highest marginal value sample to its solution instead of a randomly chosen sample. BLITS+ does not have any approximation guarantees but slightly outperforms BLITS in practice. We compare these algorithms to several benchmarks:\n\n• RandomGreedy. This algorithm adds an element chosen u.a.r. from the k elements with the greatest marginal contribution to f(S) at each round. It is a 1/e approximation for non-monotone objectives and terminates in k adaptive rounds [BFNS14].\n\n• P-Fantom. 
P-FANTOM is a parallelized version of the FANTOM algorithm in [MBK16]. FANTOM is the current state-of-the-art algorithm for non-monotone submodular objectives, and its main advantage is that it can maximize a non-monotone submodular function subject to a variety of intersecting constraints that are far more general than cardinality constraints. The parallel version, P-FANTOM, requires O(k) rounds and gives a 1/6 − ε approximation.

We also compare our algorithm to two reasonable heuristics:

• Greedy. GREEDY iteratively adds the element with the greatest marginal contribution at each round. It is k-adaptive and may perform arbitrarily poorly for non-monotone functions.

• Random. This algorithm merely returns a randomly chosen set of k elements. It performs arbitrarily poorly in the worst case but requires 0 adaptive rounds.

4.4 Experimental results

For each experiment, we analyze the value of the algorithms' solutions over successive rounds (Fig. 1 and 2). The results support four conclusions. First, BLITS and/or BLITS+ nearly always found solutions whose value matched or exceeded those of FANTOM and RANDOMGREEDY, the two alternatives we consider that offer approximation guarantees for non-monotone objectives. This also implies that BLITS found solutions with value far exceeding its own approximation guarantee, which is less than that of RANDOMGREEDY. Second, our algorithms also performed well against the top-performing algorithm, GREEDY. Note that GREEDY's solutions decrease in value after some number of rounds, as GREEDY continues to add the element with the highest marginal contribution each round even when only negative elements remain. While BLITS's solutions were slightly eclipsed by the maximum value found by GREEDY in five of the eight experiments, our algorithms matched GREEDY on Erdős-Rényi graphs, image summarization, and movie recommendation.
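For concreteness, the RANDOMGREEDY benchmark described in Section 4.3 can be sketched in a few lines. The sketch below is a minimal Python version for a generic set function f, following only the one-sentence description above ([BFNS14]); the function names, the toy cut objective, and the tie-breaking are illustrative, not the paper's actual implementation.

```python
import random

def random_greedy(f, ground_set, k, seed=0):
    """RANDOMGREEDY sketch: for k rounds, add an element chosen uniformly
    at random from the k elements with the largest marginal contribution
    f(S + {x}) - f(S). Achieves 1/e in expectation for non-monotone
    submodular f under a cardinality constraint [BFNS14]."""
    rng = random.Random(seed)
    S = set()
    for _ in range(k):
        remaining = [x for x in ground_set if x not in S]
        if not remaining:
            break
        # Rank the remaining elements by marginal contribution to the current S.
        remaining.sort(key=lambda x: f(S | {x}) - f(S), reverse=True)
        # Add one of the top-k elements uniformly at random.
        S.add(rng.choice(remaining[:k]))
    return S

# Toy non-monotone submodular objective: an undirected cut function,
# f(S) = number of edges crossing (S, V \ S) on a 4-node graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
cut = lambda S: sum(1 for (u, v) in edges if (u in S) != (v in S))
solution = random_greedy(cut, range(4), k=2)
```

Each of the k rounds is adaptive (the marginals depend on the current S), which is exactly why this benchmark needs k sequential rounds rather than the O(log² n) rounds of BLITS.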
Third, our algorithms achieved these high values despite the fact that their solutions S contained roughly 10-15% fewer than k elements, as they removed negative elements before adding blocks to S at each round. This means that they could have achieved even higher values in each experiment had we allowed them to run until |S| = k. Finally, we note that BLITS achieved this performance in many fewer adaptive rounds than the alternative algorithms. It is also worth noting that for all experiments, we initialized BLITS to use only 30 samples of size k/r per round, far fewer than the theoretical requirement necessary to fulfill its approximation guarantee. We therefore conclude that in practice, BLITS's superior adaptivity does not come at a high price in terms of sample complexity.

Acknowledgments

This research was supported by a Google PhD Fellowship, NSF grant CAREER CCF 1452961, NSF CCF 1301976, BSF grant 2014389, NSF USICCS proposal 1540428, a Google Research award, and a Facebook research award.

References

[AAAK17] Arpit Agarwal, Shivani Agarwal, Sepehr Assadi, and Sanjeev Khanna. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In COLT, pages 39–75, 2017.

[BENW16] Rafael da Ponte Barbosa, Alina Ene, Huy L Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In FOCS, pages 645–654. IEEE, 2016.

[BFNS14] Niv Buchbinder, Moran Feldman, Joseph Seffi Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1433–1452. Society for Industrial and Applied Mathematics, 2014.

[BGSMdW12] Harry Buhrman, David García-Soriano, Arie Matsliah, and Ronald de Wolf. The non-adaptive query complexity of testing k-parities.
arXiv preprint arXiv:1209.3849, 2012.

[Ble96] Guy E Blelloch. Programming parallel algorithms. Communications of the ACM, 39(3):85–97, 1996.

[BMW16] Mark Braverman, Jieming Mao, and S Matthew Weinberg. Parallel algorithms for select and partition with noisy comparisons. In STOC, pages 851–862, 2016.

[BPT11] Guy E Blelloch, Richard Peng, and Kanat Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, pages 23–32, 2011.

[BRM98] Guy E Blelloch and Margaret Reid-Miller. Fast set operations using treaps. In SPAA, pages 16–26, 1998.

[BRS89] Bonnie Berger, John Rompel, and Peter W Shor. Efficient NC algorithms for set cover with applications to learning and geometry. In FOCS, pages 54–59. IEEE, 1989.

[BRS18] Eric Balkanski, Aviad Rubinstein, and Yaron Singer. An exponential speedup in parallel running time for submodular maximization without loss in approximation. arXiv preprint arXiv:1804.06355, 2018.

[BS18a] Eric Balkanski and Yaron Singer. The adaptive complexity of maximizing a submodular function. In STOC, 2018.

[BS18b] Eric Balkanski and Yaron Singer. Approximation guarantees for adaptive sampling. In ICML, 2018.

[BST12] Guy E Blelloch, Harsha Vardhan Simhadri, and Kanat Tangwongsan. Parallel and I/O efficient set covering algorithms. In SPAA, pages 82–90. ACM, 2012.

[Cal] CalTrans. PeMS: California performance measuring system. http://pems.dot.ca.gov/ [accessed: May 1, 2018].

[CG17] Clement Canonne and Tom Gur. An adaptivity hierarchy theorem for property testing. arXiv preprint arXiv:1702.05678, 2017.

[CJV15] Chandra Chekuri, TS Jayram, and Jan Vondrák. On multiplicative weight updates for concave and submodular function maximization. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 201–210. ACM, 2015.

[CKT10] Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins.
Max-cover in map-reduce. In WWW, pages 231–240, 2010.

[Col88] Richard Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770–785, 1988.

[CST+17] Xi Chen, Rocco A Servedio, Li-Yang Tan, Erik Waingarten, and Jinyu Xie. Settling the query complexity of non-adaptive junta testing. arXiv preprint arXiv:1704.06314, 2017.

[DGS84] Pavol Duris, Zvi Galil, and Georg Schnitger. Lower bounds on communication complexity. In STOC, pages 81–91, 1984.

[EN16] Alina Ene and Huy L Nguyen. Constrained submodular maximization: Beyond 1/e. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 248–257. IEEE, 2016.

[EN18] Alina Ene and Huy L Nguyen. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. arXiv preprint arXiv:1804.05379, 2018.

[FHK15] Moran Feldman, Christopher Harshaw, and Amin Karbasi. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1), 2015.

[FMV11] Uriel Feige, Vahab S Mirrokni, and Jan Vondrak. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011.

[FNS11] Moran Feldman, Joseph Naor, and Roy Schwartz. A unified continuous greedy algorithm for submodular maximization. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 570–579. IEEE, 2011.

[GRST10] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In International Workshop on Internet and Network Economics, pages 246–257. Springer, 2010.

[GV11] Shayan Oveis Gharan and Jan Vondrák. Submodular maximization by simulated annealing. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms, pages 1098–1116.
Society for Industrial and Applied Mathematics, 2011.

[HBCN09] Jarvis D Haupt, Richard G Baraniuk, Rui M Castro, and Robert D Nowak. Compressive distilled sensing: Sparse recovery using adaptivity in compressive measurements. In Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on, pages 1551–1555. IEEE, 2009.

[HK15] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), Article 19, December 2015.

[HNC09] Jarvis Haupt, Robert Nowak, and Rui Castro. Adaptive sensing for sparse signal recovery. In Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop, pages 702–707. IEEE, 2009.

[IPW11] Piotr Indyk, Eric Price, and David P Woodruff. On the power of adaptivity in sparse recovery. In FOCS, pages 285–294. IEEE, 2011.

[KH09] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images, 2009.

[KMVV15] Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. ACM Transactions on Parallel Computing, 2(3):14, 2015.

[LMNS09] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 323–332. ACM, 2009.

[MBK16] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In ICML, pages 1358–1367, 2016.

[MKSK13] Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In NIPS, pages 2049–2057, 2013.

[MZ15] Vahab Mirrokni and Morteza Zadimoghaddam.
Randomized composable core-sets for distributed submodular maximization. In STOC, pages 153–162, 2015.

[NSYD17] Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, and John C Duchi. Adaptive sampling probabilities for non-smooth optimization. In ICML, pages 2574–2583, 2017.

[NW91] Noam Nisan and Avi Wigderson. Rounds in communication complexity revisited. In STOC, pages 419–429, 1991.

[PS84] Christos H Papadimitriou and Michael Sipser. Communication complexity. Journal of Computer and System Sciences, 28(2):260–269, 1984.

[RV98] Sridhar Rajagopalan and Vijay V Vazirani. Primal-dual RNC approximation algorithms for set cover and covering integer programs. SIAM Journal on Computing, 28(2):525–540, 1998.

[Val75] Leslie G Valiant. Parallelism in comparison problems. SIAM Journal on Computing, 4(3):348–355, 1975.