{"title": "An Online Algorithm for Maximizing Submodular Functions", "book": "Advances in Neural Information Processing Systems", "page_first": 1577, "page_last": 1584, "abstract": "We present an algorithm for solving a broad class of online resource allocation problems. Our online algorithm can be applied in environments where abstract jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v,t), where t is the time invested in activity v. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T, for some fixed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition.", "full_text": "An Online Algorithm for Maximizing\n\nSubmodular Functions\n\nMatthew Streeter\n\nGoogle, Inc.\n\nPittsburgh, PA 15213\n\nmstreeter@google.com\n\nDaniel Golovin\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\ndgolovin@cs.cmu.edu\n\nAbstract\n\nWe present an algorithm for solving a broad class of online resource allocation\nproblems. Our online algorithm can be applied in environments where abstract\njobs arrive one at a time, and one can complete the jobs by investing time in a\nnumber of abstract activities, according to some schedule. We assume that the\nfraction of jobs completed by a schedule is a monotone, submodular function of\na set of pairs (v, \u03c4), where \u03c4 is the time invested in activity v. 
Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T, for some \ufb01xed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition.\n\n1 Introduction\n\nThis paper presents an algorithm for solving the following class of online resource allocation problems. We are given as input a \ufb01nite set V of activities. A pair (v, \u03c4) \u2208 V \u00d7 R>0 is called an action, and represents spending time \u03c4 performing activity v. A schedule is a sequence of actions. We use S to denote the set of all schedules. A job is a function f : S \u2192 [0, 1], where for any schedule S \u2208 S, f(S) represents the proportion of some task that is accomplished by performing the sequence of actions S. We require that a job f have the following properties (here \u2295 is the concatenation operator):\n\n1. (monotonicity) for any schedules S1, S2 \u2208 S, we have f(S1) \u2264 f(S1 \u2295 S2) and f(S2) \u2264 f(S1 \u2295 S2)\n2. (submodularity) for any schedules S1, S2 \u2208 S and any action a \u2208 V \u00d7 R>0, fa(S1 \u2295 S2) \u2264 fa(S1), where we de\ufb01ne fa(S) \u2261 f(S \u2295 \u27e8a\u27e9) \u2212 f(S).\n\nWe will evaluate schedules in terms of two objectives. The \ufb01rst objective, which we call bene\ufb01t-maximization, is to maximize f(S) subject to the constraint \u2113(S) \u2264 T, for some \ufb01xed T > 0, where \u2113(S) equals the sum of the durations of the actions in S. For example if S = \u27e8(v1, 3), (v2, 3)\u27e9, then \u2113(S) = 6. 
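The two schedule operations used throughout (the length l(S) and the truncation S<t>) can be sketched in a few lines. This code is our own illustration, not part of the paper; schedules are represented as Python lists of (activity, duration) pairs.

```python
# Sketch (our own names): a schedule is a list of (activity, duration) pairs.

def length(schedule):
    """l(S): total time invested by the schedule."""
    return sum(tau for _, tau in schedule)

def truncate(schedule, t):
    """S<t>: execute S but stop at time t, cutting the current action short."""
    out, elapsed = [], 0.0
    for v, tau in schedule:
        if elapsed + tau <= t:
            out.append((v, tau))
            elapsed += tau
        else:
            if t > elapsed:
                out.append((v, t - elapsed))
            break
    return out

S = [("v1", 3), ("v2", 3)]
assert length(S) == 6                              # the example from the text
assert truncate(S, 5) == [("v1", 3), ("v2", 2)]    # S<5> from the text
```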
The second objective is to minimize the cost of a schedule, which we de\ufb01ne as\n\nc(f, S) = \u222b_{t=0}^{\u221e} 1 \u2212 f(S\u27e8t\u27e9) dt\n\nwhere S\u27e8t\u27e9 is the schedule that results from truncating schedule S at time t. For example if S = \u27e8(v1, 3), (v2, 3)\u27e9 then S\u27e85\u27e9 = \u27e8(v1, 3), (v2, 2)\u27e9.1\n\n1More formally, if S = \u27e8a1, a2, . . .\u27e9, where ai = (vi, \u03c4i), then S\u27e8t\u27e9 = \u27e8a1, a2, . . . , ak, (vk+1, \u03c4')\u27e9, where k is the largest integer such that \u2211_{i=1}^{k} \u03c4i < t and \u03c4' = t \u2212 \u2211_{i=1}^{k} \u03c4i.\n\nOne way to interpret this objective is to imagine that f(S) is the probability that some desired event occurs as a result of performing the actions in S. For any non-negative random variable X, we have E[X] = \u222b_{t=0}^{\u221e} P[X > t] dt. Thus c(f, S) is the expected time we must wait before the desired event occurs if we execute actions according to the schedule S. The following example illustrates these de\ufb01nitions.\n\nExample 1. Let each activity v represent a randomized algorithm for solving some decision problem, and let the action (v, \u03c4) represent running the algorithm (with a fresh random seed) for time \u03c4. Fix some particular instance of the decision problem, and for any schedule S, let f(S) be the probability that one (or more) of the runs in the sequence S yields a solution to that instance. So f(S\u27e8T\u27e9) is (by de\ufb01nition) the probability that performing the runs in schedule S yields a solution to the problem instance in time \u2264 T, while c(f, S) is the expected time that elapses before a solution is obtained. It is clear that f(S) is monotone, because adding runs to the sequence S can only increase the probability that one of the runs is successful. The fact that f is submodular can be seen as follows. 
For any schedule S and action a, fa(S) equals the probability that action a succeeds after every action in S has failed, which can also be written as (1 \u2212 f(S)) \u00b7 f(\u27e8a\u27e9). This, together with the monotonicity of f, implies that for any schedules S1, S2 and any action a, we have fa(S1 \u2295 S2) = (1 \u2212 f(S1 \u2295 S2)) \u00b7 f(\u27e8a\u27e9) \u2264 (1 \u2212 f(S1)) \u00b7 f(\u27e8a\u27e9) = fa(S1).\n\nIn the online setting, an arbitrary sequence \u27e8f(1), f(2), . . . , f(n)\u27e9 of jobs arrives one at a time, and we must \ufb01nish each job (via some schedule) before moving on to the next job. When selecting a schedule S(i) to use to \ufb01nish job f(i), we have knowledge of the previous jobs f(1), f(2), . . . , f(i\u22121) but we have no knowledge of f(i) itself or of any subsequent jobs. In this setting we aim to minimize regret, which measures the difference between the average cost (or average bene\ufb01t) of the schedules produced by our online algorithm and that of the best single schedule (in hindsight) for the given sequence of jobs.\n\n1.1 Problems that \ufb01t into this framework\n\nA number of previously-studied problems can be cast as the task of computing a schedule S that minimizes c(f, S), where f is of the form\n\nf(S) = (1/n) \u2211_{i=1}^{n} ( 1 \u2212 \u220f_{(v,\u03c4) \u2208 S} (1 \u2212 pi(v, \u03c4)) ).\n\nThis expression can be interpreted as follows: the job f consists of n subtasks, and pi(v, \u03c4) is the probability that investing time \u03c4 in activity v completes the ith subtask. Thus, f(S) is the expected fraction of subtasks that are \ufb01nished after performing the sequence of actions S. Assuming pi(v, \u03c4) is a non-decreasing function of \u03c4 for all i and v, it can be shown that any function f of this form is monotone and submodular. 
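The product-form family of jobs above can be sketched directly. This is our own illustration (the helper name make_job and the toy instance are ours); it also instantiates the PIPELINED SET COVER special case described next, where p_i(v, tau) = 1 iff tau >= tau_v and activity v covers subtask i.

```python
# Sketch (our names): f(S) = (1/n) * sum_i ( 1 - prod_{(v,tau) in S} (1 - p(i, v, tau)) ),
# the expected fraction of n subtasks finished by schedule S.

def make_job(n, p):
    def f(schedule):
        total = 0.0
        for i in range(n):
            fail = 1.0                      # probability subtask i survives every action
            for v, tau in schedule:
                fail *= 1.0 - p(i, v, tau)
            total += 1.0 - fail
        return total / n
    return f

# PIPELINED SET COVER as the special case p_i(v, tau) = 1 iff tau >= tau_v (toy instance):
tau_v = {"a": 2, "b": 1}
covers = {0: {"a"}, 1: {"a", "b"}, 2: {"b"}}   # subtask i is finished by these activities
p = lambda i, v, tau: 1.0 if (v in covers[i] and tau >= tau_v[v]) else 0.0
f = make_job(3, p)
assert abs(f([("a", 2)]) - 2/3) < 1e-9         # subtasks 0 and 1 finished
assert f([("a", 2), ("b", 1)]) == 1.0          # all three finished
```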
PIPELINED SET COVER [11, 15] can be de\ufb01ned as the special case in which for each activity v there is an associated time \u03c4v, and pi(v, \u03c4) = 1 if \u03c4 \u2265 \u03c4v and pi(v, \u03c4) = 0 otherwise. MIN-SUM SET COVER [7] is the special case in which, additionally, \u03c4v = 1 or \u03c4v = \u221e for all v \u2208 V. The problem of constructing ef\ufb01cient sequences of trials [5] corresponds to the case in which we are given a matrix q, and pi(v, \u03c4) = qv,i if \u03c4 \u2265 1 and pi(v, \u03c4) = 0 otherwise.\n\nThe problem of maximizing f(S\u27e8T\u27e9) is a slight generalization of the problem of maximizing a monotone submodular set function subject to a knapsack constraint [14, 20] (which in turn generalizes BUDGETED MAXIMUM COVERAGE [12], which generalizes MAX k-COVERAGE [16]). The only difference between the two problems is that, in the latter problem, f(S) may only depend on the set of actions in the sequence S, and not on the order in which the actions appear.\n\n1.2 Applications\n\nWe now discuss three applications, the \ufb01rst of which is the focus of our experiments in \u00a75.\n\n1. Online algorithm portfolio design. An algorithm portfolio [9] is a schedule for interleaving the execution of multiple (randomized) algorithms and periodically restarting them with a fresh random seed. Previous work has shown that combining multiple heuristics for NP-hard problems into a portfolio can dramatically reduce average-case running time [8, 9, 19]. In particular, algorithms based on chronological backtracking often exhibit heavy-tailed run length distributions, and periodically restarting them with a fresh random seed can reduce the mean running time by orders of magnitude [8]. As illustrated in Example 1, our algorithms can be used to learn an effective algorithm portfolio online, in the course of solving a sequence of problem instances.\n\n2. Database query processing. 
In database query processing, one must extract all the records in a database that satisfy every predicate in a list of one or more predicates (the conjunction of predicates comprises the query). To process the query, each record is evaluated against the predicates one at a time until the record either fails to satisfy some predicate (in which case it does not match the query) or all predicates have been examined. The order in which the predicates are examined affects the time required to process the query. Munagala et al. [15] introduced and studied a problem called PIPELINED SET COVER (discussed in \u00a71.1), which entails \ufb01nding an evaluation order for the predicates that minimizes the average time required to process a record. Our work addresses the online version of this problem, which arises naturally in practice.\n\n3. Sensor placement. Sensor placement is the task of assigning locations to a set of sensors so as to maximize the value of the information obtained (e.g., to maximize the number of intrusions that are detected by the sensors). Many sensor placement problems can be optimally solved by maximizing a monotone submodular set function subject to a knapsack constraint [13], a special case of our bene\ufb01t-maximization problem (see \u00a71.1). Our online algorithms could be used to select sensor placements when the same set of sensors is repeatedly deployed in an unknown or adversarial environment.\n\n1.3 Summary of results\n\nWe \ufb01rst consider the of\ufb02ine variant of our problem. As an immediate consequence of existing results [6, 7], we \ufb01nd that, for any \u03b5 > 0, (i) achieving an approximation ratio of 4 \u2212 \u03b5 for the cost-minimization problem is NP-hard and (ii) achieving an approximation ratio of 1 \u2212 1/e + \u03b5 for the bene\ufb01t-maximization problem is NP-hard. 
We then present a greedy approximation algorithm that simultaneously achieves the optimal approximation ratios (of 4 and 1 \u2212 1/e) for these two problems, building on and generalizing previous work on special cases of these two problems [7, 20].\n\nIn the online setting we provide an online algorithm whose worst-case performance guarantees approach those of the of\ufb02ine greedy approximation algorithm asymptotically (as the number of jobs approaches in\ufb01nity). We then show how to modify our online algorithm for use in several different \u201cbandit\u201d feedback settings. Finally, we prove information-theoretic lower bounds on regret. We conclude with an experimental evaluation.\n\n2 Related Work\n\nAs discussed in \u00a71.1, the of\ufb02ine cost-minimization problem considered here generalizes MIN-SUM SET COVER [7], PIPELINED SET COVER [11, 15], and the problem of constructing ef\ufb01cient sequences of trials [5]. Several of these problems have been considered in the online setting. Munagala et al. [15] gave an online algorithm for PIPELINED SET COVER that is asymptotically O(log |V|)-competitive. Babu et al. [3] and Kaplan et al. [11] gave online algorithms for PIPELINED SET COVER that are asymptotically 4-competitive, but only in the special case where the jobs are drawn independently at random from a \ufb01xed probability distribution (whereas our online algorithm is asymptotically 4-competitive on an arbitrary sequence of jobs).\n\nOur of\ufb02ine bene\ufb01t-maximization problem generalizes the problem of maximizing a monotone submodular set function subject to a knapsack constraint. Previous work gave of\ufb02ine greedy approximation algorithms for this problem [14, 20], which generalized earlier algorithms for BUDGETED MAXIMUM COVERAGE [12] and MAX k-COVERAGE [16]. To our knowledge, none of these problems have previously been studied in an online setting. 
Note that our problem is quite different from online set covering problems (e.g., [1]) that require one to construct a single collection of sets that covers each element in a sequence of elements that arrive online.\n\nIn this paper we convert a speci\ufb01c greedy approximation algorithm into an online algorithm. Recently, Kakade et al. [10] gave a generic procedure for converting an \u03b1-approximation algorithm into an online algorithm that is asymptotically \u03b1-competitive. Their algorithm applies to linear optimization problems, but not to the non-linear problems we consider here.\n\nIndependently of us, Radlinski et al. [17] developed a no-regret algorithm for the online version of MAX k-COVERAGE, and applied it to online ranking. As it turns out, their algorithm is a special case of the algorithm OGunit that we present in \u00a74.1.\n\n3 Of\ufb02ine Greedy Algorithm\n\nIn the of\ufb02ine setting, we are given as input a job f : S \u2192 [0, 1]. Our goal is to compute a schedule S that achieves one of two objectives: either minimizing the cost c(f, S), or maximizing f(S) subject to the constraint \u2113(S) \u2264 T. As already mentioned, this of\ufb02ine problem generalizes MIN-SUM SET COVER under the former objective and generalizes MAX k-COVERAGE under the latter objective, which implies the following computational complexity result [6, 7].\n\nTheorem 1. For any \u03b5 > 0, achieving a 4 \u2212 \u03b5 (resp. 1 \u2212 1/e + \u03b5) approximation ratio for the cost-minimization (resp. bene\ufb01t-maximization) problem is NP-hard.\n\nWe now consider an arbitrary schedule G, whose jth action is gj = (vj, \u03c4j). Let sj = f_{gj}(Gj)/\u03c4j, where Gj = \u27e8g1, g2, . . . , gj\u22121\u27e9, and let \u03b5j = max_{(v,\u03c4) \u2208 V \u00d7 R>0} { f_{(v,\u03c4)}(Gj)/\u03c4 } \u2212 sj. We will prove bounds on the performance of G in terms of the \u03b5j values. Note that we can ensure \u03b5j = 0 for all j by greedily choosing gj = argmax_{(v,\u03c4) \u2208 V \u00d7 R>0} { f_{(v,\u03c4)}(Gj)/\u03c4 } (i.e., greedily appending actions to the schedule so as to maximize the resulting increase in f per unit time). A key property is stated in the following lemma, which follows from the submodularity assumption (for the proof, see [18]).\n\nLemma 1. For any schedule S, any positive integer j, and any t > 0, f(S\u27e8t\u27e9) \u2264 f(Gj) + t \u00b7 (sj + \u03b5j).\n\nUsing Lemma 1, together with a geometric proof technique developed in [7], we now show that the greedy algorithm achieves the optimal approximation ratio for the cost-minimization problem.\n\nTheorem 2. Let S* = argmin_{S \u2208 S} c(f, S). If \u03b5j = 0 for all j, then c(f, G) \u2264 4 \u00b7 c(f, S*). More generally, let L be a positive integer, and let T = \u2211_{j=1}^{L} \u03c4j. For any schedule S, de\ufb01ne cT(f, S) \u2261 \u222b_{t=0}^{T} 1 \u2212 f(S\u27e8t\u27e9) dt. Then cT(f, G) \u2264 4 \u00b7 c(f, S*) + \u2211_{j=1}^{L} Ej \u03c4j, where Ej = \u2211_{l=1}^{j} \u03b5l.\n\nAn analogous guarantee holds for the bene\ufb01t-maximization objective.\n\nTheorem 3. Let S* = argmax_{S \u2208 S} f(S\u27e8T\u27e9), where T = \u2211_{j=1}^{L} \u03c4j. Then f(G\u27e8T\u27e9) \u2265 (1 \u2212 1/e) \u00b7 f(S*\u27e8T\u27e9) \u2212 \u2211_{j=1}^{L} \u03b5j \u03c4j.\n\n4 Online Greedy Algorithm\n\nIn the online setting we are fed, one at a time, a sequence \u27e8f(1), f(2), . . . , f(n)\u27e9 of jobs. Prior to receiving job f(i), we must specify a schedule S(i). We then receive complete access to the function f(i).\n\nWe measure performance using two different notions of regret. 
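The of\ufb02ine greedy rule of \u00a73 (repeatedly append the action maximizing the marginal increase in f per unit of time) can be sketched as follows. This is our own illustrative code, including the function names and the toy job used to exercise it; it is not the paper's implementation.

```python
# Sketch (our names): greedy schedule construction, choosing at each step the
# action (v, tau) with the largest marginal benefit per unit time,
# (f(G + [(v, tau)]) - f(G)) / tau, until the time budget is spent.

def greedy_schedule(f, actions, budget):
    G, spent = [], 0.0
    while spent < budget:
        best, best_rate = None, 0.0
        for v, tau in actions:
            if spent + tau > budget:
                continue
            rate = (f(G + [(v, tau)]) - f(G)) / tau
            if rate > best_rate:
                best, best_rate = (v, tau), rate
        if best is None:          # no affordable action increases f
            break
        G.append(best)
        spent += best[1]
    return G

# Toy job: three subtasks, each finished by one unit-time activity.
covers = {0: "a", 1: "a", 2: "b"}
def f(S):
    done = {i for i in covers if any(v == covers[i] and tau >= 1 for v, tau in S)}
    return len(done) / 3

actions = [("a", 1), ("b", 1)]
assert greedy_schedule(f, actions, 2) == [("a", 1), ("b", 1)]  # "a" first: 2/3 per unit time
```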
For the cost-minimization objective, we de\ufb01ne\n\nRcost = (1/n) \u2211_{i=1}^{n} cT(S(i), f(i)) \u2212 4 \u00b7 min_{S \u2208 S} { (1/n) \u2211_{i=1}^{n} c(S, f(i)) }\n\nfor some \ufb01xed T > 0. Here for any schedule S and job f, we de\ufb01ne cT(S, f) = \u222b_{t=0}^{T} 1 \u2212 f(S\u27e8t\u27e9) dt to be the value of c(S, f) when the integral is truncated at time T. Some form of truncation is necessary because c(S(i), f(i)) could be in\ufb01nite, and without bounding it we could not prove any \ufb01nite bound on regret (our regret bounds will be stated as a function of T). For the bene\ufb01t-maximization objective, we de\ufb01ne\n\nRbene\ufb01t = (1 \u2212 1/e) \u00b7 max_{S \u2208 S} { (1/n) \u2211_{i=1}^{n} f(i)(S\u27e8T\u27e9) } \u2212 (1/n) \u2211_{i=1}^{n} f(i)(S(i)).\n\nHere we require that for each i, E[\u2113(S(i))] = T, where the expectation is over the online algorithm's random bits. That is, we allow the online algorithm to treat T as a budget in expectation, rather than a hard budget.\n\nOur goal is to bound the worst-case expected values of Rcost and Rbene\ufb01t. For simplicity, we consider the oblivious adversary model, in which the sequence of jobs is \ufb01xed in advance and does not change in response to the decisions made by our online algorithm. We con\ufb01ne our attention to schedules that consist of actions that come from some \ufb01nite set A, and assume that the actions in A have integer durations (i.e. A \u2286 V \u00d7 Z>0).\n\n4.1 Unit-cost actions\n\nIn the special case in which each action takes unit time (i.e., A \u2286 V \u00d7 {1}), our online algorithm OGunit is very simple. OGunit runs T action-selection algorithms E1, E2, . . .
, ET, where T is the number of time steps for which our schedule is de\ufb01ned. The intent is that each action-selection algorithm is a no-regret algorithm such as randomized weighted majority (WMR) [4], which selects actions so as to maximize payoffs associated with the actions. Just before job f(i) arrives, each action-selection algorithm Et selects an action a(i)_t. The schedule used by OGunit on job f(i) is S(i) = \u27e8a(i)_1, a(i)_2, . . . , a(i)_T\u27e9. The payoff that Et associates with action a is f(i)_a(S(i)\u27e8t\u22121\u27e9).\n\nTheorem 4. Algorithm OGunit has E[Rbene\ufb01t] = O(\u221a((T/n) ln |A|)) and E[Rcost] = O(T \u00b7 \u221a((T/n) ln |A|)) in the worst case, when WMR [4] is the subroutine action-selection algorithm.\n\nProof. We will view OGunit as producing an approximate version of the of\ufb02ine greedy schedule for the job f = (1/n) \u2211_{i=1}^{n} f(i). First, view the sequence of actions selected by Et as a single meta-action \u02dcat, and extend the domain of each f(i) to include the meta-actions by de\ufb01ning f(i)(S \u2295 \u27e8\u02dcat\u27e9) = f(i)(S \u2295 \u27e8a(i)_t\u27e9) for all S \u2208 S (note each f(i) remains monotone and submodular). Thus, the online algorithm produces a single schedule \u02dcS = \u27e8\u02dca1, \u02dca2, . . . , \u02dcaT\u27e9 for all i. Let rt be the regret experienced by action-selection algorithm Et. By construction, rt = max_{a \u2208 A} { fa(\u02dcS\u27e8t\u22121\u27e9) } \u2212 f_{\u02dcat}(\u02dcS\u27e8t\u22121\u27e9). Thus OGunit behaves exactly like the greedy schedule G for the function f, with \u03b5t = rt. Thus, Theorem 3 implies that Rbene\ufb01t \u2264 \u2211_{t=1}^{T} rt \u2261 R. Similarly, Theorem 2 implies that Rcost \u2264 T \u00b7 R.\n\nTo complete the analysis, it remains to bound E[R]. WMR has worst-case expected regret O((1/n) \u221a(Gmax ln |A|)), where Gmax is the maximum sum of payoffs for any single action.3 Because each payoff is at most 1 and there are n rounds, Gmax \u2264 n, so a trivial bound is E[R] = O(T \u00b7 \u221a((1/n) ln |A|)). In fact, the worst case is when Gmax = \u0398(n/T) for all T action-selection algorithms, leading to an improved bound of E[R] = O(\u221a((T/n) ln |A|)) (for details see [18]), which completes the proof.\n\n3This bound requires Gmax to be known in advance; however, the same guarantee can be achieved by guessing a value of Gmax and doubling the guess whenever it is proven wrong.\n\n4.2 From unit-cost actions to arbitrary actions\n\nIn this section we generalize the online greedy algorithm presented in the previous section to accommodate actions with arbitrary durations. Like OGunit, our generalized algorithm OG makes use of a series of action-selection algorithms E1, E2, . . . , EL (for L to be determined). On each round i, OG constructs a schedule S(i) as follows: for t = 1, 2, . . . , L, it uses Et to choose an action a(i)_t = (v, \u03c4) \u2208 A, and appends this action to S(i) with probability 1/\u03c4. Let S(i)_t denote the schedule that results from the \ufb01rst t steps of this process (so S(i)_t contains between 0 and t actions). The payoff that Et associates with an action a = (v, \u03c4) equals (1/\u03c4) fa(S(i)_{t\u22121}) (i.e., the increase in f per unit time that would have resulted from appending a to the schedule-under-construction).\n\nAs in the previous section, we view each action-selection algorithm Et as selecting a single meta-action \u02dcat. We extend the domain of each f(i) to include the meta-actions by de\ufb01ning f(i)(S \u2295 \u27e8\u02dcat\u27e9) = f(i)(S \u2295 \u27e8a(i)_t\u27e9) if a(i)_t was appended to S(i), and f(i)(S \u2295 \u27e8\u02dcat\u27e9) = f(i)(S) otherwise. 
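One round of OG in the full-information setting can be sketched as follows. This is our own simpli\ufb01ed illustration: the class name WMR, the exponential-update rule, and the learning rate eta are our choices, not the exact algorithm of [4], and the payoff computation assumes complete access to the job f.

```python
import math
import random

# Sketch (our names, our learning rate): a generic multiplicative-weights action
# selector standing in for randomized weighted majority.
class WMR:
    def __init__(self, n_actions, eta=0.5):
        self.w = [1.0] * n_actions
        self.eta = eta

    def select(self):
        """Sample an action index with probability proportional to its weight."""
        total = sum(self.w)
        r, acc = random.random() * total, 0.0
        for i, wi in enumerate(self.w):
            acc += wi
            if r <= acc:
                return i
        return len(self.w) - 1

    def update(self, payoffs):
        """Full-information update: one payoff in [0, 1] per action."""
        self.w = [wi * math.exp(self.eta * g) for wi, g in zip(self.w, payoffs)]

def og_round(experts, A, f):
    """Build one schedule S(i) for job f, feeding payoffs back to each expert."""
    S = []
    for E in experts:
        prefix = list(S)                       # S(i)_{t-1}
        v, tau = A[E.select()]
        if random.random() < 1.0 / tau:        # append with probability 1/tau
            S.append((v, tau))
        # payoff of a = (v, tau): marginal increase in f per unit time
        E.update([(f(prefix + [a]) - f(prefix)) / a[1] for a in A])
    return S
```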
Thus, the online algorithm produces a single schedule \u02dcS = \u27e8\u02dca1, \u02dca2, . . . , \u02dcaL\u27e9 for all i. Note that each f(i) remains monotone and submodular.\n\nFor the purposes of analysis, we will imagine that each meta-action \u02dcat always takes unit time (whereas in fact, \u02dcat takes unit time per job in expectation). We show later that this assumption does not invalidate any of our arguments.\n\nLet f = (1/n) \u2211_{i=1}^{n} f(i), and let \u02dcSt = \u27e8\u02dca1, \u02dca2, . . . , \u02dcat\u27e9. Thus \u02dcS can be viewed as a version of the greedy schedule from \u00a73, with \u03b5t = max_{(v,\u03c4) \u2208 A} { (1/\u03c4) f_{(v,\u03c4)}(\u02dcS_{t\u22121}) } \u2212 f_{\u02dcat}(\u02dcS_{t\u22121}), where we are using the assumption that \u02dcat takes unit time. Let rt be the regret experienced by Et. Although rt \u2260 \u03b5t in general, the two quantities are equal in expectation (proof omitted).\n\nLemma 2. E[\u03b5t] = E[rt].\n\nWe now prove a bound on E[Rbene\ufb01t]. Because each f(i) is monotone and submodular, f is monotone and submodular as well, so the greedy schedule's approximation guarantees apply to f. In particular, by Theorem 3, we have Rbene\ufb01t \u2264 \u2211_{t=1}^{T} \u03b5t. Thus by Lemma 2, E[Rbene\ufb01t] \u2264 E[R], where R = \u2211_{t=1}^{T} rt.\n\nTo bound E[Rbene\ufb01t], it remains to justify the assumption that each meta-action \u02dcat always takes unit time. First, note that the value of the objective function f(\u02dcS) is independent of how long each meta-action \u02dcat takes. Thus, the only potential danger is that in making this assumption we have overlooked a constraint violation of the form E[\u2113(S(i))] \u2260 T. But by construction, E[\u2113(S(i))] = L for each i, regardless of what actions are chosen by each action-selection algorithm. Thus if we set L = T there is no constraint violation. Combining the bound on E[R] stated in the proof of Theorem 4 with the fact that E[Rbene\ufb01t] \u2264 E[R] yields the following theorem.\n\nTheorem 5. Algorithm OG, run with input L = T, has E[Rbene\ufb01t] \u2264 E[R]. If WMR [4] is used as the subroutine action-selection algorithm, then E[R] = O(\u221a((T/n) ln |A|)).\n\nThe argument bounding E[Rcost] is similar, although somewhat more involved (for details, see [18]). One additional complication is that \u2113(S(i)) is now a random variable, whereas in the de\ufb01nition of Rcost the cost of a schedule is always calculated up to time T. This can be addressed by making the probability that \u2113(S(i)) < T suf\ufb01ciently small, which can be done by setting L \u226b T and applying concentration of measure inequalities. However, E[R] grows as a function of L, so we do not want to make L too large. The (approximately) best bound is obtained by setting L = T ln n.\n\nTheorem 6. Algorithm OG, run with input L = T ln n, has E[Rcost] = O(T ln n \u00b7 E[R] + T/\u221an). In particular, E[Rcost] = O((ln n)^{3/2} \u00b7 T \u00b7 \u221a((T/n) ln |A|)) if WMR [4] is used as the subroutine action-selection algorithm.\n\n4.3 Dealing with limited feedback\n\nThus far we have assumed that, after specifying a schedule S(i), the online algorithm receives complete access to the job f(i). We now consider three more limited feedback settings that may arise in practice. In the priced feedback model, to receive access to f(i) we must pay a price C, which is added to our regret. In the partially transparent feedback model, we only observe f(i)(S(i)\u27e8t\u27e9) for each t > 0. 
In the opaque feedback model, we only observe f(i)(S(i)).\n\nThe priced and partially transparent feedback models arise naturally in the case where action (v, \u03c4) represents running a deterministic algorithm v for \u03c4 time units, and f(S) = 1 if some action in S yields a solution to some particular problem instance, and f(S) = 0 otherwise. If we execute a schedule S and halt as soon as some action yields a solution, we obtain exactly the information that is revealed in the partially transparent model. Alternatively, running each algorithm v until it returns a solution would completely reveal the function f(i), but incurs a computational cost, as re\ufb02ected in the priced feedback model.\n\nAlgorithm OG can be adapted to work in each of these three feedback settings; see [18] for the speci\ufb01c bounds. In all cases, the high-level idea is to replace the unknown quantities used by OG with (unbiased) estimates of those quantities. This technique has been used in a number of online algorithms (e.g., see [2]).\n\n4.4 Lower bounds on regret\n\nWe now state lower bounds on regret; for the proofs see the full paper [18]. Our proofs have the same high-level structure as that of the lower bound given in [4], in that we de\ufb01ne a distribution over jobs that allows any online algorithm's expected performance to be easily bounded, and then prove a bound on the expected performance of the best schedule in hindsight. The upper bounds in Theorem 4 match the lower bounds in Theorem 7 up to logarithmic factors, although the latter apply to standard regret as opposed to Rbene\ufb01t and Rcost (which include factors of 1 \u2212 1/e and 4).\n\nTheorem 7. Let X = \u221a((T/n) ln |V|). Then any online algorithm has worst-case expected regret \u03a9(X) (resp. \u03a9(T \u00b7 X)) for the online bene\ufb01t-maximization (resp. cost-minimization) problem.\n\n5 Experimental Evaluation on SAT 2007 Competition Data\n\nThe annual SAT solver competition (www.satcompetition.org) is designed to encourage the development of ef\ufb01cient Boolean satis\ufb01ability solvers, which are used as subroutines in state-of-the-art model checkers, theorem provers, and planners. The competition consists of running each submitted solver on a number of benchmark instances, with a per-instance time limit. Solvers are ranked according to the instances they solve within each of three instance categories: industrial, random, and hand-crafted.\n\nWe evaluated the online algorithm OG by using it to combine solvers from the 2007 SAT solver competition. To do so, we used data available on the competition web site to construct a matrix X, where Xi,j is the time that the jth solver required on the ith benchmark instance. We used this data to determine whether or not a given schedule would solve an instance within the time limit T (schedule S solves instance i if and only if, for some j, S\u27e8T\u27e9 contains an action (hj, \u03c4) with \u03c4 \u2265 Xi,j). As illustrated in Example 1, the task of maximizing the number of instances solved within the time limit, in an online setting in which a sequence of instances must be solved one at a time, is an instance of our online problem (under the bene\ufb01t-maximization objective).\n\nWithin each instance category, we compared OG to the of\ufb02ine greedy schedule, to the individual solver that solved the most instances within the time limit, and to a schedule that ran each solver in parallel at equal strength. For these experiments, we ran OG in the full-information feedback model, after \ufb01nding that the number of benchmark instances was too small for OG to be effective in the limited feedback models. Table 1 summarizes the results. 
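The solvability test used in the evaluation above (S solves instance i iff the truncated schedule S<T> contains an action (j, tau) with tau >= X[i][j]) can be sketched directly. The matrix name X follows the text; the function names and the toy runtimes are our own illustration.

```python
# Sketch (our names): evaluate a solver schedule against a runtime matrix X,
# where X[i][j] is the time solver j needs on instance i.

def solves(schedule, X_row, T):
    """Does the schedule, truncated at time T, solve the instance with runtimes X_row?"""
    elapsed = 0.0
    for j, tau in schedule:
        run = min(tau, T - elapsed)   # duration of this action within S<T>
        if run >= X_row[j]:
            return True
        elapsed += tau
        if elapsed >= T:
            break
    return False

def num_solved(schedule, X, T):
    return sum(solves(schedule, row, T) for row in X)

X = [[3.0, 10.0], [8.0, 2.0], [20.0, 20.0]]   # hypothetical runtimes (instances x solvers)
S = [(0, 4), (1, 3)]                          # run solver 0 for 4s, then solver 1 for 3s
assert num_solved(S, X, 7) == 2               # instances 0 and 1, but not 2
```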
In each category, the of\ufb02ine greedy schedule and the online greedy algorithm outperform all solvers entered in the competition as well as the na\u00efve parallel schedule.\n\nTable 1: Number of benchmark instances solved within the time limit.\n\nCategory        Of\ufb02ine greedy   Online greedy   Parallel schedule   Top solver\nIndustrial      147              149             132                 139\nRandom          350              347             302                 257\nHand-crafted    114              107             95                  98\n\nReferences\n\n[1] Noga Alon, Baruch Awerbuch, and Yossi Azar. The online set cover problem. In Proceedings of the 35th STOC, pages 100\u2013105, 2003.\n[2] Peter Auer, Nicol\u00f2 Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48\u201377, 2002.\n[3] Shivnath Babu, Rajeev Motwani, Kamesh Munagala, Itaru Nishizawa, and Jennifer Widom. Adaptive ordering of pipelined stream \ufb01lters. In Proc. Intl. Conf. on Management of Data, pages 407\u2013418, 2004.\n[4] Nicol\u00f2 Cesa-Bianchi, Yoav Freund, David Haussler, David Helmbold, Robert Schapire, and Manfred Warmuth. How to use expert advice. Journal of the ACM, 44(3):427\u2013485, 1997.\n[5] Edith Cohen, Amos Fiat, and Haim Kaplan. Ef\ufb01cient sequences of trials. In Proceedings of the 14th SODA, pages 737\u2013746, 2003.\n[6] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634\u2013652, 1998.\n[7] Uriel Feige, L\u00e1szl\u00f3 Lov\u00e1sz, and Prasad Tetali. Approximating min sum set cover. Algorithmica, 40(4):219\u2013234, 2004.\n[8] Carla P. Gomes and Bart Selman. Algorithm portfolios. Arti\ufb01cial Intelligence, 126:43\u201362, 2001.\n[9] Bernardo A. Huberman, Rajan M. Lukose, and Tad Hogg. An economics approach to hard computational problems. Science, 275:51\u201354, 1997.\n[10] Sham Kakade, Adam Kalai, and Katrina Ligett. 
Playing games with approximation algorithms. In Proceedings of the 39th STOC, pages 546\u2013555, 2007.\n[11] Haim Kaplan, Eyal Kushilevitz, and Yishay Mansour. Learning with attribute costs. In Proceedings of the 37th STOC, pages 356\u2013365, 2005.\n[12] Samir Khuller, Anna Moss, and Joseph (Sef\ufb01) Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39\u201345, 1999.\n[13] Andreas Krause and Carlos Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proceedings of the 21st UAI, pages 324\u2013331, 2005.\n[14] Andreas Krause and Carlos Guestrin. A note on the budgeted maximization of submodular functions. Technical Report CMU-CALD-05-103, Carnegie Mellon University, 2005.\n[15] Kamesh Munagala, Shivnath Babu, Rajeev Motwani, Jennifer Widom, and Thomas Eiter. The pipelined set cover problem. In Proc. Intl. Conf. on Database Theory, pages 83\u201398, 2005.\n[16] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265\u2013294, 1978.\n[17] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th ICML, pages 784\u2013791, 2008.\n[18] Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. Technical Report CMU-CS-07-171, Carnegie Mellon University, 2007.\n[19] Matthew Streeter, Daniel Golovin, and Stephen F. Smith. Combining multiple heuristics online. In Proceedings of the 22nd AAAI, pages 1197\u20131203, 2007.\n[20] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. 
Operations Research Letters, 32:41\u201343, 2004.\n", "award": [], "sourceid": 729, "authors": [{"given_name": "Matthew", "family_name": "Streeter", "institution": null}, {"given_name": "Daniel", "family_name": "Golovin", "institution": null}]}