{"title": "Beyond Convexity: Online Submodular Minimization", "book": "Advances in Neural Information Processing Systems", "page_first": 700, "page_last": 708, "abstract": "We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings.", "full_text": "Online Submodular Minimization\n\nElad Hazan\n\nIBM Almaden Research Center\n\n650 Harry Rd, San Jose, CA 95120\n\nhazan@us.ibm.com\n\nSatyen Kale\n\nYahoo! Research\n\n4301 Great America Parkway, Santa Clara, CA 95054\n\nskale@yahoo-inc.com\n\nAbstract\n\nWe consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings.\n\n1 Introduction\n\nOnline decision-making is a learning problem in which one needs to choose a decision repeatedly from a given set of decisions, in an effort to minimize costs over the long run, even in the face of complete uncertainty about future outcomes. The performance of an online learning algorithm is measured in terms of its regret, which is the difference between the total cost of the decisions it chooses and the cost of the optimal decision chosen in hindsight. A Hannan-consistent algorithm is one that achieves sublinear regret (as a function of the number of decision-making rounds). Hannan-consistency implies that the average per-round cost of the algorithm converges to that of the optimal decision in hindsight.\n\nIn the past few decades, a variety of Hannan-consistent algorithms have been devised for a wide range of decision spaces and cost functions, including well-known settings such as prediction from expert advice [10], online convex optimization [15], etc. 
Most of these algorithms are based on an online version of convex optimization algorithms. Despite this success, many online decision-making problems still remain open, especially when the decision space is discrete and large (say, of size exponential in the problem parameters) and the cost functions are non-linear.\n\nIn this paper, we consider just such a scenario. Our decision space is now the set of all subsets of a ground set of n elements, and the cost functions are assumed to be submodular. This property is widely seen as the discrete analogue of convexity, and has proven to be a ubiquitous property in various machine learning tasks (see [4] for references). A crucial component in these latter results are the celebrated polynomial-time algorithms for submodular function minimization [7].\n\nTo motivate the online decision-making problem with submodular cost functions, here is an example from [11]. Consider a factory capable of producing any subset from a given set of n products E. Let f : 2^E ↦ R be the cost function for producing any such subset (here, 2^E stands for the set of all subsets of E). Economics tells us that this cost function should satisfy the law of diminishing returns: i.e., the additional cost of producing an additional item is lower the more we produce. Mathematically stated, for all sets S, T ⊆ E such that T ⊆ S, and for all elements i ∈ E, we have\n\nf(T ∪ {i}) − f(T) ≥ f(S ∪ {i}) − f(S).\n\nSuch cost functions are called submodular, and frequently arise in real-world economic and other scenarios. Now, for every item i, let pi be the market price of the item, which is only determined in the future based on supply and demand. Thus, the profit from producing a subset S of the items is P(S) = ∑_{i∈S} pi − f(S). Maximizing profit is equivalent to minimizing the function −P, which is easily seen to be submodular as well.\n\nThe online decision problem which arises is now to decide which set of products to produce, to maximize profits in the long run, without knowing in advance the cost function or the market prices. A more difficult version of this problem, perhaps more realistic, is when the only information obtained is the actual profit of the chosen subset of items, and no information on the profit possible for other subsets.\n\nIn general, the Online Submodular Minimization problem is the following. In each iteration, we choose a subset of a ground set of n elements, and then observe a submodular cost function which gives the cost of the subset we chose. The goal is to minimize the regret, which is the difference between the total cost of the subsets we chose and the cost of the best subset in hindsight. Depending on the feedback obtained, we distinguish between two settings, full-information and bandit. In the full-information setting, we can query each cost function at as many points as we like. In the bandit setting, we only get to observe the cost of the subset we chose, and no other information is revealed.\n\nObviously, if we ignore the special structure of these problems, standard algorithms for learning with expert advice and/or with bandit feedback can be applied to this setting. However, the computational complexity of these algorithms would be proportional to the number of subsets, which is 2^n. In addition, for the submodular bandits problem, even the regret bounds have an exponential dependence on n. It is hence of interest to design efficient algorithms for these problems. For the bandit version an even more basic question arises: does there exist an algorithm with regret which depends only polynomially on n?\n\nIn this paper, we answer these questions in the affirmative. 
We give efficient algorithms for both problems, with regret which is bounded by a polynomial in n – the underlying dimension – and sublinearly in the number of iterations. For the full information setting, we give two different randomized algorithms with expected regret O(n√T). One of these algorithms is based on the follow-the-perturbed-leader approach [5, 9]. We give a new way of analyzing such an algorithm. This analysis technique should have applications for other problems with large decision spaces as well. This algorithm is combinatorial, strongly polynomial, and can be easily generalized to arbitrary distributive lattices, rather than just all subsets of a given set. The second algorithm is based on convex analysis. We make crucial use of a continuous extension of a submodular function known as the Lovász extension. We obtain our regret bounds by running a (sub)gradient descent algorithm in the style of Zinkevich [15].\n\nFor the bandit setting, we give a randomized algorithm with expected regret O(nT^{2/3}). This algorithm also makes use of the Lovász extension and gradient descent. The algorithm folds exploration and exploitation steps into a single sample and obtains the stated regret bound. We also show that these regret bounds hold with high probability. Note that the technique of Flaxman, Kalai and McMahan [1], when applied to the Lovász extension, gives a worse regret bound of O(nT^{3/4}).\n\n2 Preliminaries and Problem Statement\n\nSubmodular functions. The decision space is the set of all subsets of a universe of n elements, [n] = {1, 2, . . . , n}. The set of all subsets of [n] is denoted 2^[n]. For a set S ⊆ [n], denote by χS its characteristic vector in {0, 1}^n, i.e. χS(i) = 1 if i ∈ S, and 0 otherwise.\n\nA function f : 2^[n] → R is called submodular if for all sets S, T ⊆ [n] such that T ⊆ S, and for all elements i ∈ [n], we have\n\nf(T + i) − f(T) ≥ f(S + i) − f(S).\n\nHere, we use the shorthand notation S + i to indicate S ∪ {i}. An explicit description of f would take exponential space. We assume therefore that the only way to access f is via a value oracle, i.e. an oracle that returns the value of f at any given set S ⊆ [n].\n\nGiven access to a value oracle for a submodular function, it is possible to minimize it in polynomial time [3], and indeed, even in strongly polynomial time [3, 7, 13, 6, 12, 8]. The current fastest strongly polynomial algorithms are those of Orlin [12] and Iwata-Orlin [8], which take time O(n^5 EO + n^6), where EO is the time taken to run the value oracle. The fastest weakly polynomial algorithms are those of Iwata [6] and Iwata-Orlin [8], which run in time Õ(n^4 EO + n^5).\n\nOnline Submodular Minimization. In the Online Submodular Minimization problem, over a sequence of iterations t = 1, 2, . . ., an online decision maker has to repeatedly choose a subset St ⊆ [n]. In each iteration, after choosing the set St, the cost of the decision is specified by a submodular function ft : 2^[n] → [−1, 1]. The decision maker incurs cost ft(St). The regret of the decision maker is defined to be\n\nRegretT := ∑_{t=1}^T ft(St) − min_{S⊆[n]} ∑_{t=1}^T ft(S).\n\nIf the sets St are chosen by a randomized algorithm, then we consider the expected regret over the randomness in the algorithm.\n\nAn online algorithm to choose the sets St will be said to be Hannan-consistent if it ensures that RegretT = o(T). The algorithm will be called efficient if it computes each decision St in poly(n, t) time. 
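As a sanity check on the regret definition, for small n the benchmark term can be computed by brute force over all 2^n subsets (a toy sketch; the two cost functions below are made-up modular, hence submodular, costs with values in [−1, 1]):

```python
from itertools import combinations

def regret(chosen_sets, cost_fns, n):
    """RegretT = sum_t f_t(S_t) - min over S ⊆ [n] of sum_t f_t(S),
    with the minimum taken by brute force over all 2^n subsets."""
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(range(n), r)]
    alg_cost = sum(f(S) for f, S in zip(cost_fns, chosen_sets))
    best_cost = min(sum(f(S) for f in cost_fns) for S in subsets)
    return alg_cost - best_cost

# Two rounds over n = 2 elements.
fs = [lambda S: len(S & {0}) - len(S & {1}),  # round 1: rewards picking {1}
      lambda S: len(S) / 2 - 0.75]            # round 2: rewards picking nothing
print(regret([frozenset(), frozenset({1})], fs, 2))  # → 1.0
```

Here the fixed set {1} is optimal in hindsight with total cost −1.25, while the played sequence costs −0.25, so the regret is 1.0.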
Depending on the kind of feedback the decision maker receives, we distinguish between two settings of the problem:\n\n• Full information setting. In this case, in each round t, the decision maker has unlimited access to the value oracles of the previously seen cost functions f1, f2, . . . , ft−1.\n\n• Bandit setting. In this case, in each round t, the decision maker only observes the cost of her decision St, viz. ft(St), and receives no other information.\n\nMain Results. In the setup of Online Submodular Minimization, we have the following results:\n\nTheorem 1. In the full information setting of Online Submodular Minimization, there is an efficient randomized algorithm that attains the following regret bound:\n\nE[RegretT] ≤ O(n√T).\n\nFurthermore, RegretT ≤ O((n + √(log(1/ε)))√T) with probability at least 1 − ε.\n\nTheorem 2. In the bandit setting of Online Submodular Minimization, there is an efficient randomized algorithm that attains the following regret bound:\n\nE[RegretT] ≤ O(nT^{2/3}).\n\nFurthermore, RegretT ≤ O(nT^{2/3}√(log(1/ε))) with probability at least 1 − ε.\n\nBoth of the theorems above hold against both oblivious as well as adaptive adversaries.\n\nThe Lovász Extension. A major technical construction we need for the algorithms is the Lovász extension ˆf of the submodular function f. This is defined on the Boolean hypercube K = [0, 1]^n and takes real values. Before defining the Lovász extension, we need the concept of a chain of subsets of [n]:\n\nDefinition 1. A chain of subsets of [n] is a collection of sets A0, A1, . . . , Ap such that\n\nA0 ⊂ A1 ⊂ A2 ⊂ ··· ⊂ Ap.\n\nA maximal chain is one where p = n. 
For a maximal chain, we have A0 = ∅, An = [n], and there is a unique associated permutation π : [n] → [n] such that for all i ∈ [n], we have Aπ(i) = Aπ(i)−1 + i.\n\nNow let x ∈ K. There is a unique chain A0 ⊂ A1 ⊂ ··· ⊂ Ap such that x can be expressed as a convex combination x = ∑_{i=0}^p µiχAi where µi > 0 and ∑_{i=0}^p µi = 1. A nice way to construct this combination is the following random process: choose a threshold τ ∈ [0, 1] uniformly at random, and consider the level set Sτ = {i : xi > τ}. The sets in the required chain are exactly the level sets which are obtained with positive probability, and for any such set Ai, µi = Pr[Sτ = Ai]. In other words, we have x = Eτ[χSτ]. This follows immediately by noting that for any i, we have Prτ[i ∈ Sτ] = xi. Of course, the chain and the weights µi can also be constructed deterministically simply by sorting the coordinates of x.\n\nNow, we are ready to define¹ the Lovász extension ˆf:\n\n¹Note that this is not the standard definition of the Lovász extension, but an equivalent characterization.\n\nDefinition 2. Let x ∈ K. Let A0 ⊂ A1 ⊂ ··· ⊂ Ap be the unique chain such that x can be expressed as a convex combination x = ∑_{i=0}^p µiχAi where µi > 0 and ∑_{i=0}^p µi = 1. Then the value of the Lovász extension ˆf at x is defined to be\n\nˆf(x) := ∑_{i=0}^p µif(Ai).\n\nThe preceding discussion gives an equivalent way of defining the Lovász extension: choose a threshold τ ∈ [0, 1] uniformly at random, and consider the level set Sτ = {i : xi > τ}. Then we have\n\nˆf(x) = Eτ[f(Sτ)].\n\nNote that the definition immediately implies that for all sets S ⊆ [n], we have ˆf(χS) = f(S).\n\nWe will also need the notion of a maximal chain associated to a point x ∈ K in order to define subgradients of the Lovász extension:\n\nDefinition 3. Let x ∈ K, and let A0 ⊂ A1 ⊂ ··· ⊂ Ap be the unique chain such that x = ∑_{i=0}^p µiχAi where µi > 0 and ∑_{i=0}^p µi = 1. A maximal chain associated with x is any maximal completion of the Ai chain, i.e. a maximal chain ∅ = B0 ⊂ B1 ⊂ ··· ⊂ Bn = [n] such that all the sets Ai appear in the Bj chain.\n\nWe have the following key properties of the Lovász extension. For proofs, refer to Fujishige [2], chapter IV.\n\nProposition 3. The following properties of the Lovász extension ˆf : K → R hold:\n\n1. ˆf is convex.\n\n2. Let x ∈ K. Let ∅ = B0 ⊂ B1 ⊂ ··· ⊂ Bn = [n] be an arbitrary maximal chain associated with x, and let π : [n] → [n] be the corresponding permutation. Then a subgradient g of ˆf at x is given as follows:\n\ngi = f(Bπ(i)) − f(Bπ(i)−1).\n\n3 The Full Information Setting\n\nIn this section we give two algorithms for regret minimization in the full information setting, both of which attain the same regret bound of O(n√T). The first is a randomized combinatorial algorithm, based on the \u201cfollow the leader\u201d approach of Hannan [5] and Kalai-Vempala [9], and the second is an analytical algorithm based on (sub)gradient descent on the Lovász extension.\n\nBoth algorithms have pros and cons: while the second algorithm is much simpler and more efficient, we do not know how to extend it to distributive lattices, for which the first algorithm readily applies.\n\n3.1 A Combinatorial Algorithm\n\nIn this section we analyze a combinatorial, strongly polynomial algorithm for minimizing regret in the full information Online Submodular Minimization setting:\n\nAlgorithm 1 Submodular Follow-The-Perturbed-Leader\n1: Input: parameter η > 0.\n2: Initialization: For every i ∈ [n], choose a random number ri ∈ [−1/η, 1/η] uniformly at random. Define R : 2^[n] → R as R(S) = ∑_{i∈S} ri.\n3: for t = 1 to T do\n4: Use the set St = arg min_{S⊆[n]} ∑_{τ=1}^{t−1} fτ(S) + R(S), and obtain cost ft(St).\n5: end for\n\nDefine Φt : 2^[n] → R as Φt(S) = ∑_{τ=1}^{t−1} fτ(S) + R(S). Note that R is a submodular function, and Φt, being the sum of submodular functions, is itself submodular. Furthermore, it is easy to construct a value oracle for Φt simply by using the value oracles for the fτ. Thus, the optimization in step 4 is poly-time solvable given oracle access to Φt.\n\nWhile the algorithm itself is a simple extension of Hannan\u2019s [5] follow-the-perturbed-leader algorithm, previous analyses (such as Kalai and Vempala [9]), which rely on linearity of the cost functions, cannot be made to work here. 
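Algorithm 1 can be prototyped directly for small n by replacing the polynomial-time submodular minimizer in step 4 with brute-force enumeration (a sketch under that simplification; the function and variable names are ours):

```python
import random
from itertools import combinations

def ftpl(cost_fns, n, eta, seed=0):
    """Submodular Follow-The-Perturbed-Leader (Algorithm 1), with the
    arg-min over all subsets of [n] done by brute force rather than by
    a polynomial-time submodular minimizer (fine for small n)."""
    rng = random.Random(seed)
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(range(n), r)]
    # Perturbation R(S) = sum_{i in S} r_i with r_i uniform in [-1/eta, 1/eta].
    r = [rng.uniform(-1 / eta, 1 / eta) for _ in range(n)]
    R = lambda S: sum(r[i] for i in S)
    chosen, past = [], []
    for f_t in cost_fns:
        # Play the minimizer of sum_{tau < t} f_tau(S) + R(S).
        S_t = min(subsets, key=lambda S: sum(f(S) for f in past) + R(S))
        chosen.append(S_t)
        past.append(f_t)
    return chosen

# Toy run: repeat one submodular cost (a coverage term minus a modular term,
# with values in [-1, 1]); the perturbed leader settles on the minimizer {2}.
f = lambda S: float(len(S & {0, 1}) > 0) - float(2 in S)
print(ftpl([f] * 20, 3, 0.5)[-1])  # → frozenset({2})
```

Once the accumulated cost dominates the bounded perturbation, the chosen set stops moving, which is exactly the stability property the analysis below quantifies.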
Instead, we introduce a new analysis technique: we divide the decision space using n different cuts so that any two decisions are separated by at least one cut, and then we give an upper bound on the probability that the chosen decision switches sides over each such cut. This new technique may have applications to other problems as well. We now prove the regret bound of Theorem 1:\n\nTheorem 4. Algorithm 1 run with parameter η = 1/√T achieves the following regret bound:\n\nE[RegretT] ≤ 6n√T.\n\nProof. We note that the algorithm is essentially running a \u201cfollow-the-leader\u201d algorithm on the cost functions f0, f1, . . . , ft−1, where f0 = R is a fictitious \u201cperiod 0\u201d cost function used for regularization. The first step to analyzing this algorithm is to use a stability lemma, essentially proved in Theorem 1.1 of [9], which bounds the regret as follows:\n\nRegretT ≤ ∑_{t=1}^T ft(St) − ft(St+1) + R(S∗) − R(S1).\n\nHere, S∗ = arg min_{S⊆[n]} ∑_{t=1}^T ft(S).\n\nTo bound the expected regret, by linearity of expectation, it suffices to bound E[ft(St) − ft(St+1)], where for the purpose of analysis, we assume that we re-randomize in every round (i.e. choose a fresh random function R : 2^[n] → R). Naturally, the expectation E[ft(St) − ft(St+1)] is the same regardless of when R is chosen.\n\nTo bound this, we need the following lemma:\n\nLemma 5.\n\nPr[St ≠ St+1] ≤ 2nη.\n\nProof. First, we note the following simple union bound:\n\nPr[St ≠ St+1] ≤ ∑_{i∈[n]} Pr[i ∈ St and i ∉ St+1] + Pr[i ∉ St and i ∈ St+1].    (1)\n\nNow, fix any i, and we aim to bound Pr[i ∈ St and i ∉ St+1]. For this, we condition on the randomness in choosing rj for all j ≠ i. Define R′ : 2^[n] → R as R′(S) = ∑_{j∈S, j≠i} rj, and Φ′t : 2^[n] → R as Φ′t(S) = ∑_{τ=1}^{t−1} fτ(S) + R′(S). Note that if i ∉ S, then R′(S) = R(S) and Φ′t(S) = Φt(S). Let\n\nA = arg min_{S⊆[n]: i∈S} Φ′t(S)  and  B = arg min_{S⊆[n]: i∉S} Φ′t(S).\n\nNow, we note that the event i ∈ St happens only if Φ′t(A) + ri < Φ′t(B), and St = A. But if Φ′t(A) + ri < Φ′t(B) − 2, then we must have i ∈ St+1, since for any C such that i ∉ C,\n\nΦt+1(A) = Φ′t(A) + ri + ft(A) < Φ′t(B) − 1 < Φ′t(C) + ft(C) = Φt+1(C).\n\nThe inequalities above use the fact that ft(S) ∈ [−1, 1] for all S ⊆ [n]. Thus, if v := Φ′t(B) − Φ′t(A), we have\n\nPr[i ∈ St and i ∉ St+1 | rj, j ≠ i] ≤ Pr[ri ∈ [v − 2, v] | rj, j ≠ i] ≤ η,\n\nsince ri is chosen uniformly from [−1/η, 1/η]. We can now remove the conditioning on rj for j ≠ i, and conclude that\n\nPr[i ∈ St and i ∉ St+1] ≤ η.\n\nSimilarly, we can bound Pr[i ∉ St and i ∈ St+1] ≤ η. Finally, the union bound (1) over all choices of i yields the required bound on Pr[St ≠ St+1].\n\nContinuing the proof, we have\n\nE[ft(St) − ft(St+1)] = E[ft(St) − ft(St+1) | St ≠ St+1] · Pr[St ≠ St+1] ≤ 0 + 2 · Pr[St ≠ St+1] ≤ 4nη.\n\nThe last inequality follows from Lemma 5. 
Now, we have R(S∗) − R(S1) ≤ 2n/η, and so\n\nE[RegretT] ≤ ∑_{t=1}^T E[ft(St) − ft(St+1)] + E[R(S∗) − R(S1)] ≤ 4nηT + 2n/η ≤ 6n√T,\n\nsince η = 1/√T.\n\n3.2 An Analytical Algorithm\n\nIn this section, we give a different algorithm based on the Online Gradient Descent method of Zinkevich [15]. We apply this technique to the Lovász extension of the cost function, coupled with a simple randomized construction of the subgradient, as given in Definition 2. This algorithm requires the concept of a Euclidean projection of a point in R^n onto the set K, which is a function ΠK : R^n → K defined by\n\nΠK(y) := arg min_{x∈K} ‖x − y‖.\n\nSince K = [0, 1]^n, it is easy to implement this projection: indeed, for a point y ∈ R^n, the projection x = ΠK(y) is given coordinate-wise by xi = yi if yi ∈ [0, 1], xi = 0 if yi < 0, and xi = 1 if yi > 1.\n\nAlgorithm 2 Submodular Subgradient Descent\n1: Input: parameter η > 0. Let x1 ∈ K be an arbitrary initial point.\n2: for t = 1 to T do\n3: Choose a threshold τ ∈ [0, 1] uniformly at random, use the set St = {i : xt(i) > τ}, and obtain cost ft(St).\n4: Find a maximal chain associated with xt, ∅ = B0 ⊂ B1 ⊂ ··· ⊂ Bn = [n], and use ft(B0), ft(B1), . . . , ft(Bn) to compute a subgradient gt of ˆft at xt as in part 2 of Proposition 3.\n5: Update: set xt+1 = ΠK(xt − ηgt).\n6: end for\n\nIn the analysis of the algorithm, we need the following regret bound. It is a simple extension of Zinkevich\u2019s analysis of Online Gradient Descent to vector-valued random variables whose expectation is the subgradient of the cost function (the generality to random variables is not required for this section, but it will be useful in the next section):\n\nLemma 6. Let ˆf1, ˆf2, . . . , ˆfT : K → [−1, 1] be a sequence of convex cost functions over the cube K. Let x1, x2, . . . , xT ∈ K be defined by x1 = 0 and xt+1 = ΠK(xt − ηˆgt), where ˆg1, ˆg2, . . . , ˆgT are vector-valued random variables such that E[ˆgt | xt] = gt, where gt is a subgradient of ˆft at xt. Then the expected regret of playing x1, x2, . . . , xT is bounded by\n\n∑_{t=1}^T E[ˆft(xt)] − min_{x∈K} ∑_{t=1}^T ˆft(x) ≤ n/(2η) + (η/2) ∑_{t=1}^T E[‖ˆgt‖^2].\n\nSince this lemma follows rather easily from [15], we omit the proof in this extended abstract. We can now prove the following regret bound:\n\nTheorem 7. Algorithm 2, run with parameter η = 1/√T, achieves the following regret bound:\n\nE[RegretT] ≤ 3n√T.\n\nFurthermore, with probability at least 1 − ε, RegretT ≤ (3n + √(2 log(1/ε)))√T.\n\nProof. Note that by Definition 2, we have that E[ft(St)] = ˆft(xt). Since the algorithm runs Online Gradient Descent (from Lemma 6) with ˆgt = gt (i.e. no randomness), we get the following bound on the regret, using the bound ‖ˆgt‖^2 = ‖gt‖^2 ≤ 4n:\n\nE[RegretT] = ∑_{t=1}^T E[ft(St)] − min_{S⊆[n]} ∑_{t=1}^T ft(S) = ∑_{t=1}^T ˆft(xt) − min_{x∈K} ∑_{t=1}^T ˆft(x) ≤ n/(2η) + 2ηnT.\n\nSince η = 1/√T, we get the required regret bound. Furthermore, by a simple Hoeffding bound, we also get that with probability at least 1 − ε,\n\n∑_{t=1}^T ft(St) ≤ ∑_{t=1}^T E[ft(St)] + √(2T log(1/ε)),\n\nwhich implies the high probability regret bound.\n\n4 The Bandit Setting\n\nWe now present an algorithm for the Bandit Online Submodular Minimization problem. The algorithm is based on the Online Gradient Descent algorithm of Zinkevich [15]. The main idea is to use just one sample for both exploration (to construct an unbiased estimator for the subgradient) and exploitation (to construct an unbiased estimator for the point chosen by the Online Gradient Descent algorithm).\n\nAlgorithm 3 Bandit Submodular Subgradient Descent\n1: Input: parameters η, δ > 0. Let x1 ∈ K be arbitrary.\n2: for t = 1 to T do\n3: Find a maximal chain associated with xt, ∅ = B0 ⊂ B1 ⊂ ··· ⊂ Bn = [n], and let π be the associated permutation as in part 2 of Proposition 3. Then xt can be written as xt = ∑_{i=0}^n µiχBi, where µi = 0 for the extra sets Bi that were added to complete the maximal chain for xt.\n4: Choose the set St as follows: St = Bi with probability ρi = (1 − δ)µi + δ/(n + 1).\n5: Use the set St and obtain cost ft(St). If St = B0, then set ˆgt = −(1/ρ0)ft(St)eπ(1), and if St = Bn, then set ˆgt = (1/ρn)ft(St)eπ(n). Otherwise, St = Bi for some i ∈ [1, n − 1]. 
Choose εt ∈ {+1, −1} uniformly at random, and set:\n\nˆgt = (2/ρi) ft(St) eπ(i) if εt = 1, and ˆgt = −(2/ρi) ft(St) eπ(i+1) if εt = −1.\n\n6: Update: set xt+1 = ΠK(xt − ηˆgt).\n7: end for\n\nBefore launching into the analysis, we first define some convenient notation. For a random variable Xt defined in round t of the algorithm, define Et[Xt] (resp. VARt[Xt]) to be the expectation (resp. variance) of Xt conditioned on all the randomness chosen by the algorithm until round t.\n\nA first observation is that, in expectation, the regret of the algorithm above is almost the same as if it had played xt all along and the loss functions were replaced by the Lovász extensions of the actual loss functions.\n\nLemma 8. For all t, we have E[ft(St)] ≤ E[ˆft(xt)] + 2δ.\n\nProof. From Definition 2 we have that ˆft(xt) = ∑_i µift(Bi). On the other hand, Et[ft(St)] = ∑_i ρift(Bi), and hence:\n\nEt[ft(St)] − ˆft(xt) = ∑_{i=0}^n (ρi − µi)ft(Bi) ≤ δ ∑_{i=0}^n [1/(n + 1) + µi] |ft(Bi)| ≤ 2δ.\n\nThe lemma now follows by taking the expectation of both sides of this inequality with respect to the randomness chosen in the first t − 1 rounds.\n\nNext, by Proposition 3, the subgradient of the Lovász extension of ft at the point xt corresponding to the maximal chain B0 ⊂ B1 ⊂ ··· ⊂ Bn is given by gt(i) = ft(Bπ(i)) − ft(Bπ(i)−1). 
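The chain construction and subgradient formula that Algorithms 2 and 3 both rely on can be sketched in a few lines, sorting the coordinates of x in decreasing order to obtain a maximal chain (a toy sketch; the helper name and the example function are ours, not the paper's):

```python
def lovasz_and_subgradient(f, x):
    """Evaluate the Lovász extension of f at x in [0,1]^n together with a
    subgradient, via the maximal chain obtained by sorting the coordinates
    of x in decreasing order (Proposition 3): g_i = f(B_pi(i)) - f(B_pi(i)-1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])   # elements by decreasing x_i
    chain = [frozenset()]                           # B_0 ⊂ B_1 ⊂ ... ⊂ B_n
    for i in order:
        chain.append(chain[-1] | {i})
    g = [0.0] * n
    for k, i in enumerate(order, start=1):          # element i enters at step k
        g[i] = f(chain[k]) - f(chain[k - 1])
    # fhat(x) = E_tau[f(S_tau)]: tau falls in (x_(k+1), x_(k)] with probability
    # x_(k) - x_(k+1), in which case the level set S_tau equals B_k.
    vals = [x[i] for i in order] + [0.0]
    fhat = (1.0 - vals[0]) * f(chain[0]) + sum(
        (vals[k - 1] - vals[k]) * f(chain[k]) for k in range(1, n + 1))
    return fhat, g

# On a vertex x = chi_S the extension agrees with f; a budget-coverage
# function min(|S|, 2) serves as a toy submodular example.
f = lambda S: min(len(S), 2)
fhat, g = lovasz_and_subgradient(f, [1.0, 0.0, 1.0])
print(fhat)   # equals f({0, 2})
```

Note that the subgradient coordinates telescope: their sum is always f([n]) − f(∅), a quick consistency check on any implementation.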
Using this fact, it is easy to check that the random vector ˆgt is constructed in such a way that E[ˆgt | xt] = gt. Furthermore, we can bound the norm of this estimator as follows:\n\nEt[‖ˆgt‖^2] ≤ ∑_{i=0}^n (4/ρi^2) ft(Bi)^2 · ρi ≤ 4(n + 1)^2/δ ≤ 16n^2/δ.    (2)\n\nWe can now remove the conditioning, and conclude that E[‖ˆgt‖^2] ≤ 16n^2/δ.\n\nTheorem 9. Algorithm 3, run with parameters δ = n/T^{1/3} and η = 1/T^{2/3}, achieves the following regret bound:\n\nE[RegretT] ≤ 12nT^{2/3}.\n\nProof. We bound the expected regret as follows:\n\n∑_{t=1}^T E[ft(St)] − min_{S⊆[n]} ∑_{t=1}^T ft(S) ≤ 2δT + ∑_{t=1}^T E[ˆft(xt)] − min_{x∈K} ∑_{t=1}^T ˆft(x)    (By Lemma 8)\n≤ 2δT + n/(2η) + (η/2) ∑_{t=1}^T E[‖ˆgt‖^2]    (By Lemma 6)\n≤ 2δT + n/(2η) + 8n^2ηT/δ.    (By (2))\n\nThe bound is now obtained for δ = n/T^{1/3} and η = 1/T^{2/3}.\n\n4.1 High probability bounds on the regret\n\nThe theorem of the previous section gave a bound on the expected regret. However, a much stronger claim can be made: essentially the same regret bound holds with very high probability (exponential tail). In addition, the previous theorem (which only bounds expected regret) holds against an oblivious adversary, but not necessarily against a more powerful adaptive adversary. The following gives high probability bounds against an adaptive adversary.\n\nTheorem 10. With probability at least 1 − 4ε, Algorithm 3, run with parameters δ = n/T^{1/3} and η = 1/T^{2/3}, achieves the following regret bound:\n\nRegretT ≤ O(nT^{2/3}√(log(1/ε))).\n\nThe proof of this theorem is deferred to the full version of this paper.\n\n5 Conclusions and Open Questions\n\nWe have described efficient regret minimization algorithms for submodular cost functions, in both the bandit and full information settings. This parallels the work of Streeter and Golovin [14], who study two specific instances of online submodular maximization (for which the offline problem is NP-hard) and give (approximate) regret minimizing algorithms. An open question is whether it is possible to attain O(√T) regret bounds for online submodular minimization in the bandit setting.\n\nReferences\n\n[1] A. D. Flaxman, A. T. Kalai, and H. B. McMahan, Online convex optimization in the bandit setting: gradient descent without a gradient, SODA, 2005, pp. 385–394.\n[2] Satoru Fujishige, Submodular functions and optimization, Elsevier, 2005.\n[3] M. Grötschel, L. Lovász, and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer Verlag, 1988.\n[4] Carlos Guestrin and Andreas Krause, Beyond convexity – submodularity in machine learning, Tutorial given at the 25th International Conference on Machine Learning (ICML), 2008.\n[5] J. Hannan, Approximation to Bayes risk in repeated play, in M. Dresher, A. W. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games, volume III (1957), 97–139.\n[6] Satoru Iwata, A faster scaling algorithm for minimizing submodular functions, SIAM J. Comput. 32 (2003), no. 4, 833–840.\n[7] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige, A combinatorial strongly polynomial algorithm for minimizing submodular functions, J. ACM 48 (2001), 761–777.\n[8] Satoru Iwata and James B. Orlin, A simple combinatorial algorithm for submodular function minimization, SODA \u201909: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (Philadelphia, PA, USA), Society for Industrial and Applied Mathematics, 2009, pp. 1230–1237.\n[9] Adam Kalai and Santosh Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences 71(3) (2005), 291–307.\n[10] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Proceedings of the 30th Annual Symposium on the Foundations of Computer Science, 1989, pp. 256–261.\n[11] S. T. McCormick, Submodular function minimization, Chapter 7 in the Handbook on Discrete Optimization (K. Aardal, G. Nemhauser, and R. Weismantel, eds.), Elsevier, 2006, pp. 321–391.\n[12] James B. Orlin, A faster strongly polynomial time algorithm for submodular function minimization, Math. Program. 118 (2009), no. 2, 237–251.\n[13] Alexander Schrijver, A combinatorial algorithm minimizing submodular functions in strongly polynomial time, 1999.\n[14] Matthew J. Streeter and Daniel Golovin, An online algorithm for maximizing submodular functions, NIPS, 2008, pp. 1577–1584.\n[15] Martin Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003, pp. 928–936.", "award": [], "sourceid": 482, "authors": [{"given_name": "Elad", "family_name": "Hazan", "institution": null}, {"given_name": "Satyen", "family_name": "Kale", "institution": null}]}