{"title": "Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback", "book": "Advances in Neural Information Processing Systems", "page_first": 9210, "page_last": 9221, "abstract": "In this paper, we propose three online algorithms for submodular maximization. The first one, Mono-Frank-Wolfe, reduces the number of per-function gradient evaluations from $T^{1/2}$ [Chen2018Online] and $T^{3/2}$ [chen2018projection] to 1, and achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. The second one, Bandit-Frank-Wolfe, is the first bandit algorithm for continuous DR-submodular maximization, which achieves a $(1-1/e)$-regret bound of $O(T^{8/9})$. Finally, we extend Bandit-Frank-Wolfe to a bandit algorithm for discrete submodular maximization, Responsive-Frank-Wolfe, which attains a $(1-1/e)$-regret bound of $O(T^{8/9})$ in the responsive bandit setting.", "full_text": "Online Continuous Submodular Maximization: From\n\nFull-Information to Bandit Feedback\n\nMingrui Zhang\u2020 Lin Chen\u2021 Hamed Hassani(cid:93) Amin Karbasi\u2021,(cid:92)\n\n\u2020 Department of Statistics and Data Science, Yale University\n\n\u2021 Department of Electrical Engineering, Yale University\n\n(cid:93) Department of Electrical and Systems Engineering, University of Pennsylvania\n\n(cid:92) Department of Computer Science, Yale University\n\n{mingrui.zhang, lin.chen, amin.karbasi}@yale.edu\n\nhassani@seas.upenn.edu\n\nAbstract\n\nIn this paper, we propose three online algorithms for submodular maximization.\nThe \ufb01rst one, Mono-Frank-Wolfe, reduces the number of per-function gradient\nevaluations from T 1/2 [18] and T 3/2 [17] to 1, and achieves a (1 \u2212 1/e)-regret\nbound of O(T 4/5). The second one, Bandit-Frank-Wolfe, is the \ufb01rst bandit al-\ngorithm for continuous DR-submodular maximization, which achieves a (1\u2212 1/e)-\nregret bound of O(T 8/9). 
Finally, we extend Bandit-Frank-Wolfe to a bandit algorithm for discrete submodular maximization, Responsive-Frank-Wolfe, which attains a (1 − 1/e)-regret bound of O(T^{8/9}) in the responsive bandit setting.

1 Introduction

Submodularity naturally arises in a variety of disciplines, and has numerous applications in machine learning, including data summarization [45], active and semi-supervised learning [26, 47], compressed sensing and structured sparsity [7], fairness in machine learning [8], mean-field inference in probabilistic models [10], and MAP inference in determinantal point processes (DPPs) [36].

We say that a set function f : 2^Ω → R_{≥0} defined on a finite ground set Ω is submodular if for every A ⊆ B ⊆ Ω and x ∈ Ω \ B, we have f(x|A) ≥ f(x|B), where f(x|A) ≜ f(A ∪ {x}) − f(A) is a discrete derivative [39]. Continuous DR-submodular functions are the continuous analogue. Let F : X → R_{≥0} be a differentiable function defined on a box X ≜ ∏_{i=1}^d X_i, where each X_i is a closed interval of R_{≥0}. We say that F is continuous DR-submodular if for every x, y ∈ X that satisfy x ≤ y and every i ∈ [d] ≜ {1, . . . , d}, we have ∂F/∂x_i (x) ≥ ∂F/∂x_i (y), where x ≤ y means x_i ≤ y_i, ∀i ∈ [d] [9].

In this paper, we focus on online and bandit maximization of submodular set functions and continuous DR-submodular functions. In contrast to offline optimization, where the objective function is completely known beforehand, online optimization can be viewed as a two-player game between the player and the adversary in a sequential manner [50, 42, 28]. Let F be a family of real-valued functions. The player wants to maximize a sequence of functions F_1, . . . , F_T ∈ F subject to a constraint set K. The player has no a priori knowledge of the functions, while the constraint set is known and assumed to be a closed convex set in R^d. The natural number T is termed the horizon of the online optimization problem. At the t-th iteration, without the knowledge of F_t, the player has to select a point x_t ∈ K. After the player commits to this choice, the adversary selects a function F_t ∈ F. The player receives a reward F_t(x_t), observes the function F_t determined by the adversary, and proceeds to the next iteration. In the more challenging bandit setting, even the function F_t is unavailable to the player and the only observable information is the reward that the player receives [23, 3, 11].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

The performance of the algorithm that the player uses to determine her choices x_1, . . . , x_T is quantified by the regret, which is the gap between her accumulated reward and the reward of the best single choice in hindsight. To be precise, the regret is defined by max_{x∈K} Σ_{t=1}^T F_t(x) − Σ_{t=1}^T F_t(x_t). However, even in the offline scenario, it is shown that the maximization problem of a continuous DR-submodular function cannot be approximated within a factor of (1 − 1/e + ε) for any ε > 0 in polynomial time, unless RP = NP [9]. Therefore, we consider the (1 − 1/e)-regret [44, 34, 18]

R_{1−1/e,T} ≜ (1 − 1/e) max_{x∈K} Σ_{t=1}^T F_t(x) − Σ_{t=1}^T F_t(x_t).

For ease of notation, we write R_T for R_{1−1/e,T} throughout this paper. In this paper, we study the following three problems:

• OCSM: the Online Continuous DR-Submodular Maximization problem,
• BCSM: the Bandit Continuous DR-Submodular Maximization problem, and
• RBSM: the Responsive Bandit Submodular Maximization problem.

We note that although special cases of the bandit submodular maximization problem (BSM) were studied in [44, 27], the vanilla BSM problem is still open for general monotone submodular functions under a matroid constraint. In BSM, the objective functions f_1, . . . , f_T are submodular set functions defined on a common finite ground set Ω and subject to a common constraint I. For each function f_i, the player has to select a subset X_i ∈ I. Only after playing the subset X_i is the reward f_i(X_i) received and thereby observed.

If the value of the corresponding multilinear extension¹ F can be estimated by the submodular set function f, we may expect to solve the vanilla BSM by invoking algorithms for continuous DR-submodular maximization. In this paper, however, we show a hardness result: subject to some constraint I, it is impossible to construct a one-point unbiased estimator of the multilinear extension F based on the value of f, without knowing the information of f in advance. This result motivates the study of a slightly relaxed setting termed the Responsive Bandit Submodular Maximization problem (RBSM).
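For concreteness, the (1 − 1/e)-regret above is straightforward to evaluate once the per-round rewards are known. The sketch below is an illustrative simplification (all names are hypothetical, and the maximum over x ∈ K is replaced by a maximum over a small finite set of candidate points, which only makes sense for a toy example):

```python
import math

def one_minus_inv_e_regret(candidate_rewards, played_rewards):
    """(1 - 1/e) times the best fixed choice in hindsight, minus the accumulated reward."""
    best_fixed = max(sum(seq) for seq in candidate_rewards.values())
    return (1 - 1 / math.e) * best_fixed - sum(played_rewards)

# toy run: F_t evaluated at two candidate points over a horizon of T = 3 rounds
candidate_rewards = {"x_a": [1.0, 0.5, 0.8], "x_b": [0.6, 0.9, 0.7]}
regret = one_minus_inv_e_regret(candidate_rewards, played_rewards=[0.5, 0.9, 0.7])
```

Note that, unlike the standard regret, the (1 − 1/e)-regret can be negative whenever the player's accumulated reward exceeds the discounted benchmark, as in this toy run.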
In RBSM, at round i, if X_i ∉ I, the player is still allowed to play X_i and observe the function value f_i(X_i), but gets zero reward out of it.

OCSM was studied in [18, 17], where T^{1/2} exact gradient evaluations or T^{3/2} stochastic gradient evaluations are required per iteration (T is the horizon). Therefore, these methods cannot be extended to the bandit setting (BCSM and RBSM), where one single function evaluation per iteration is permitted. As a result, no known bandit algorithm attains a sublinear (1 − 1/e)-regret.

In this paper, we first propose Mono-Frank-Wolfe for OCSM, which requires one stochastic gradient per function and still attains a (1 − 1/e)-regret bound of O(T^{4/5}). This is significant as it reduces the number of per-function gradient evaluations from T^{3/2} to 1. Furthermore, it provides a feasible avenue to solving BCSM and RBSM. We then propose Bandit-Frank-Wolfe and Responsive-Frank-Wolfe, which attain a (1 − 1/e)-regret bound of O(T^{8/9}) for BCSM and RBSM, respectively. To the best of our knowledge, Bandit-Frank-Wolfe and Responsive-Frank-Wolfe are the first algorithms that attain a sublinear (1 − 1/e)-regret bound for BCSM and RBSM, respectively. The performance of prior approaches and our proposed algorithms is summarized in Table 1. We also list further related works in Appendix A.

2 Preliminaries

Monotonicity, Smoothness, and Directional Concavity Property. A submodular set function f : 2^Ω → R is called monotone if for any two sets A ⊆ B ⊆ Ω we have f(A) ≤ f(B). For two vectors x and y, we write x ≤ y if x_i ≤ y_i holds for every i. Let F be a continuous DR-submodular function defined on X. We say that F is monotone if F(x) ≤ F(y) for every x, y ∈ X obeying x ≤ y.
Additionally, F is called L-smooth if for every x, y ∈ X it holds that ‖∇F(x) − ∇F(y)‖ ≤ L‖x − y‖. Throughout the paper, we use the notation ‖·‖ for the Euclidean norm. An important implication of continuous DR-submodularity is concavity along the non-negative directions [16, 9], i.e., for all x ≤ y, we have F(y) ≤ F(x) + ⟨∇F(x), y − x⟩.

Multilinear Extension. Given a submodular set function f : 2^Ω → R_{≥0} defined on a finite ground set Ω, its multilinear extension is a continuous DR-submodular function F : [0, 1]^{|Ω|} → R_{≥0} defined by F(x) = Σ_{S⊆Ω} f(S) ∏_{i∈S} x_i ∏_{j∉S} (1 − x_j), where x_i is the i-th coordinate of x. Equivalently, for any vector x ∈ [0, 1]^{|Ω|} we have F(x) = E_{S∼x}[f(S)], where S ∼ x means that S is a random subset of Ω such that every element i ∈ Ω is contained in S independently with probability x_i.

Geometric Notations. The d-dimensional unit ball is denoted by B^d, and the (d − 1)-dimensional unit sphere is denoted by S^{d−1}. Let K be a bounded set. We define its diameter D = sup_{x,y∈K} ‖x − y‖ and radius R = sup_{x∈K} ‖x‖.

¹We formally define the multilinear extension of a submodular set function in Section 2.

Table 1: Comparison of previous and our proposed algorithms.

Setting | Algorithm                 | Stochastic gradient | # of grad. evaluations | (1 − 1/e)-regret
OCSM    | Meta-FW [18]              | No                  | T^{1/2}                | O(√T)
OCSM    | VR-FW [17]                | Yes                 | T^{3/2}                | O(√T)
OCSM    | Mono-FW (this work)       | Yes                 | 1                      | O(T^{4/5})
BCSM    | Bandit-FW (this work)     | -                   | -                      | O(T^{8/9})
RBSM    | Responsive-FW (this work) | -                   | -                      | O(T^{8/9})
We say a set K has lower bound u if u ∈ K and ∀x ∈ K, x ≥ u.

3 One-shot Online Continuous DR-Submodular Maximization

In this section, we propose Mono-Frank-Wolfe, an online continuous DR-submodular maximization algorithm which only needs one gradient evaluation per function. This algorithm is the basis of the methods presented in the next section for the bandit setting. We also note that throughout this paper, ∇F denotes the exact gradient of F, while ∇̃F denotes a stochastic gradient.

We begin by reviewing the Frank-Wolfe (FW) [24, 33] method for maximizing monotone continuous DR-submodular functions in the offline setting [9], where we have one single objective function F. Assuming that we have access to the exact gradient ∇F, the FW method is an iterative procedure that starts from the initial point x^(1) = 0 and, at the k-th iteration, solves the linear optimization problem

v^(k) ← arg max_{v∈K} ⟨v, ∇F(x^(k))⟩,    (1)

which is used to update x^(k+1) ← x^(k) + η_k v^(k), where η_k is the step size.

We aim to extend the FW method to the online setting. Inspired by the FW update above, to get high rewards for each objective function F_t, we start from x_t^(1) = 0, update x_t^(k+1) = x_t^(k) + η_k v_t^(k) for multiple iterations (let K denote the number of iterations), then play the last iterate x_t^(K+1) for F_t. To obtain the point x_t^(K+1) which we play, we need to solve the linear program Eq. (1) and thus get v_t^(k), where we have to know the gradient in advance. However, in the online setting, we can only observe the stochastic gradient ∇̃F_t after we play some point for F_t. So the key issue is to obtain a vector v_t^(k) which at least approximately maximizes ⟨·, ∇F_t(x_t^(k))⟩ before we play some point for F_t.

To do so, we use K no-regret online linear maximization oracles {E^(k)}, k ∈ [K], and let v_t^(k) be the output vector of E^(k) at round t. Once we update x_t^(k+1) = x_t^(k) + η_k v_t^(k) for all k ∈ [K] and play x_t^(K+1) for F_t, we can observe ∇̃F_t(x_t^(k)) for all k ∈ [K], and iteratively construct d_t^(k) by d_t^(k) = (1 − ρ_k) d_t^(k−1) + ρ_k ∇̃F_t(x_t^(k)), an estimation of ∇F_t(x_t^(k)) [37, 38] with a lower variance than ∇̃F_t(x_t^(k)). Then we set ⟨·, d_t^(k)⟩ as the objective function for oracle E^(k) at round t. Thanks to the no-regret property of E^(k), v_t^(k), which is obtained before we play some point for F_t and observe the gradient, approximately maximizes ⟨·, d_t^(k)⟩, and thus also approximately maximizes ⟨·, ∇F_t(x_t^(k))⟩.

This approach was first proposed in [18, 17], where stochastic gradients at K = T^{3/2} points (i.e., {x_t^(k)}, k ∈ [K]) are required for each function F_t. To carry this general idea into the one-shot setting, where we can only access one gradient per function, we need the following blocking procedure. We divide the upcoming objective functions F_1, . . . , F_T into Q equisized blocks of size K (so T = QK). For the q-th block, we first set x_q^(1) = 0, update x_q^(k+1) = x_q^(k) + η_k v_q^(k), and play the same point x_q = x_q^(K+1) for all the functions F_{(q−1)K+1}, . . . , F_{qK}. The reason why we play the same point x_q will be explained later. We also define the average function in the q-th block as F̄_q ≜ (1/K) Σ_{k=1}^K F_{(q−1)K+k}. In order to reduce the required number of gradients per function, the key idea is to view the average functions F̄_1, . . . , F̄_Q as virtual objective functions.

Precisely, in the q-th block, let (t_{q,1}, . . . , t_{q,K}) be a random permutation of the indices {(q − 1)K + 1, . . . , qK}.
After we update all the x_q^(k), for each F_t we play x_q and find the corresponding k′ such that t = t_{q,k′}, then observe ∇̃F_t (i.e., ∇̃F_{t_{q,k′}}) at x_q^(k′). Thus we can obtain ∇̃F_{t_{q,k}}(x_q^(k)) for all k ∈ [K]. Since t_{q,k} is a random variable such that E[F_{t_{q,k}}] = F̄_q, ∇̃F_{t_{q,k}}(x_q^(k)) is also an estimation of ∇F̄_q(x_q^(k)), which holds for all k ∈ [K]. As a result, with only one gradient evaluation per function F_{t_{q,k}}, we can obtain stochastic gradients of the virtual objective function F̄_q at K points. In this way, the required number of per-function gradient evaluations is successfully reduced from K to 1.

Note that since we play y_t = x_q for each F_t in the q-th block, the regret w.r.t. the original objective functions and that w.r.t. the average functions satisfy

(1 − 1/e) max_{x∈K} Σ_{t=1}^T F_t(x) − Σ_{t=1}^T F_t(y_t) = K [ (1 − 1/e) max_{x∈K} Σ_{q=1}^Q F̄_q(x) − Σ_{q=1}^Q F̄_q(x_q) ],

which makes it possible to view the functions F̄_q as virtual objective functions in the regret analysis. Moreover, we iteratively construct d_q^(k) = (1 − ρ_k) d_q^(k−1) + ρ_k ∇̃F_{t_{q,k}}(x_q^(k)) as an estimation of ∇F_{t_{q,k}}(x_q^(k)), and thus also an estimation of ∇F̄_q(x_q^(k)). So v_q^(k), the output of E^(k), approximately maximizes ⟨·, ∇F̄_q(x_q^(k))⟩. Inspired by the offline FW method, playing x_q = x_q^(K+1), the last iterate in the FW procedure, may obtain high rewards for F̄_q. As a result, we play the same point x_q in the q-th block.

We also note that once t_{q,1}, . . .
, t_{q,k} are revealed, conditioned on this knowledge the expectation of F_{t_{q,k+1}} is no longer the average function F̄_q but the residual average function F̄_{q,k}(x) = (1/(K − k)) Σ_{i=k+1}^K F_{t_{q,i}}(x). As more indices t_{q,k} are revealed, F̄_{q,k} becomes increasingly different from F̄_q, which makes the observed gradient ∇̃F_{t_{q,k+1}}(x_q^(k+1)) no longer a good estimation of ∇F̄_q(x_q^(k+1)). As a result, although we use the averaging technique (the update of d_q^(k)) as in [37, 38] for variance reduction, a completely different gradient error analysis is required. In Lemma 6 (Appendix B), we establish that the squared error of d_q^(k) exhibits an inverted bell-shaped tendency; i.e., the squared error is large at the initial and final stages and is small at the intermediate stage.

We present our proposed Mono-Frank-Wolfe algorithm in Algorithm 1. We will show that Mono-Frank-Wolfe achieves a (1 − 1/e)-regret bound of O(T^{4/5}). In order to prove this result, we first make the following assumptions on the constraint set K, the objective functions F_t, the stochastic gradient ∇̃F_t, and the online linear maximization oracles.

Assumption 1. The constraint set K is a convex and compact set that contains 0.

Assumption 2. Every objective function F_t is monotone, continuous DR-submodular, L1-Lipschitz, and L2-smooth.

Assumption 3. The stochastic gradient ∇̃F_t(x) is unbiased, i.e., E[∇̃F_t(x)] = ∇F_t(x). Additionally, it has a uniformly bounded norm ‖∇̃F_t(x)‖ ≤ M0 and a uniformly bounded variance E[‖∇F_t(x) − ∇̃F_t(x)‖²] ≤ σ0² for every x ∈ K and every objective function F_t.

Algorithm 1 Mono-Frank-Wolfe
Input: constraint set K, horizon T, block size K, online linear maximization oracles on K: E^(1), · · · , E^(K), step sizes ρ_k ∈ (0, 1), η_k ∈ (0, 1), number of blocks Q = T/K
Output: y_1, y_2, . . .
1: for q = 1, 2, . . . , Q do
2:   d_q^(0) ← 0, x_q^(1) ← 0
3:   For k = 1, 2, . . . , K, let v_q^(k) ∈ K be the output of E^(k) in round q, x_q^(k+1) ← x_q^(k) + η_k v_q^(k). Set x_q ← x_q^(K+1).
4:   Let (t_{q,1}, . . . , t_{q,K}) be a random permutation of {(q − 1)K + 1, . . . , qK}
5:   For t = (q − 1)K + 1, . . . , qK, play y_t = x_q and obtain the reward F_t(y_t); find the corresponding k′ ∈ [K] such that t = t_{q,k′}, observe ∇̃F_t(x_q^(k′)), i.e., ∇̃F_{t_{q,k′}}(x_q^(k′))
6:   For k = 1, 2, . . . , K, d_q^(k) ← (1 − ρ_k) d_q^(k−1) + ρ_k ∇̃F_{t_{q,k}}(x_q^(k)), compute ⟨v_q^(k), d_q^(k)⟩ as reward for E^(k), and feed back d_q^(k) to E^(k)
7: end for

Assumption 4. For the online linear maximization oracles, the regret at horizon t (denoted by R_t^{E^(i)}) satisfies R_t^{E^(i)} ≤ C√t, ∀i ∈ [K], where C > 0 is a constant.

Note that there exist online linear maximization oracles E^(i) with regret R_t^{E^(i)} ≤ C√t, ∀i ∈ [K], for any horizon t (for example, online gradient descent [50]). Therefore, Assumption 4 can be fulfilled.

Theorem 1 (Proof in Appendix B).
Under Assumptions 1 to 4, if we set K = T^{3/5}, η_k = 1/K, ρ_k = 2/(k + 3)^{2/3} when 1 ≤ k ≤ K/2 + 1, and ρ_k = 1.5/(K − k + 2)^{2/3} when K/2 + 2 ≤ k ≤ K, where we assume that K is even for simplicity, then y_t ∈ K, ∀t, and the expected (1 − 1/e)-regret of Algorithm 1 is at most

E[R_T] ≤ (N + C + D²) T^{4/5} + (L2 D²/2) T^{2/5},

where N = max{5^{2/3}(L1 + M0)², 4(L1² + σ0²) + 32G, 2.25(L1² + σ0²) + 7G/3} and G = (L2 R + 2L1)².

4 Bandit Continuous DR-Submodular Maximization

In this section, we present the first bandit algorithm for continuous DR-submodular maximization, Bandit-Frank-Wolfe, which attains a (1 − 1/e)-regret bound of O(T^{8/9}). We begin by explaining the one-point gradient estimator [23], which is crucial to the proposed bandit algorithm. The proposed algorithm and main results are presented in Section 4.2.

4.1 One-Point Gradient Estimator

Given a function F, we define its δ-smoothed version F̂_δ(x) ≜ E_{v∼B^d}[F(x + δv)], where v ∼ B^d denotes that v is drawn uniformly at random from the unit ball B^d. Thus the function F is averaged over a ball of radius δ. It can be easily verified that if F is monotone, continuous DR-submodular, L1-Lipschitz, and L2-smooth, then so is F̂_δ, and for all x we have |F̂_δ(x) − F(x)| ≤ L1 δ (Lemma 7 in Appendix C). So the δ-smoothed version F̂_δ is indeed an approximation of F. A maximizer of F̂_δ also maximizes F approximately.

More importantly, the gradient of the smoothed function F̂_δ admits a one-point unbiased estimator [23, 32]: ∇F̂_δ(x) = E_{u∼S^{d−1}}[(d/δ) F(x + δu) u], where u ∼ S^{d−1} denotes that u is drawn uniformly at random from the unit sphere S^{d−1}. Thus the player can estimate the gradient of the smoothed function at point x by playing the random point x + δu for the original function F.
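The one-point estimator is straightforward to implement. The sketch below uses hypothetical helper names and a linear toy objective, for which the gradient of the smoothed function is known in closed form (averaging a linear function over a ball leaves its gradient unchanged), so unbiasedness can be checked empirically:

```python
import numpy as np

def one_point_gradient(F, x, delta, rng):
    # play the single random point x + delta*u, with u uniform on the unit sphere,
    # and return the estimator (d/delta) * F(x + delta*u) * u
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / delta) * F(x + delta * u) * u

c = np.array([0.5, -0.2, 0.3])
F = lambda z: c @ z                      # linear toy objective: grad of the smoothed F is c
x = np.array([0.2, 0.2, 0.2])
rng = np.random.default_rng(0)
est = np.mean([one_point_gradient(F, x, 0.1, rng) for _ in range(100_000)], axis=0)
```

Note the large variance of a single estimate, of order (d/δ)|F(x)|, which is exactly why the averaging step d_q^(k) ← (1 − ρ_k) d_q^(k−1) + ρ_k g_{q,k} is needed downstream.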
So usually, we can extend a one-shot online algorithm to the bandit setting by replacing the observed stochastic gradients with one-point gradient estimations. In our setting, however, we cannot use the one-point gradient estimator directly. When the point x is close to the boundary of the constraint set K, the point x + δu may fall outside of K. To address this issue, we introduce the notion of δ-interior. A set is said to be a δ-interior of K if it is a subset of

int_δ(K) = {x ∈ K | inf_{s∈∂K} d(x, s) ≥ δ},

where d(·, ·) denotes the Euclidean distance. In other words, K′ is a δ-interior of K if it holds for every x ∈ K′ that B(x, δ) ⊆ K (Fig. 1a in Appendix D). We note that there can be infinitely many δ-interiors of K. In the sequel, K′ will denote the δ-interior that we consider. We also define the discrepancy between K and K′ by

d(K, K′) = sup_{x∈K} d(x, K′),

which is the supremum of the distances between points in K and the set K′. The distance d(x, K′) is given by inf_{y∈K′} d(x, y). By definition, every point x ∈ K′ satisfies x + δu ∈ K, which enables us to use the one-point gradient estimator on K′.
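As an illustration (an assumption not taken from the paper: we take K to be the unit box [0, 1]^d, for which the distance to the boundary has a closed form), membership in int_δ(K) can be checked and validated against the ball-containment characterization B(x, δ) ⊆ K:

```python
import numpy as np

def dist_to_box_boundary(x):
    # distance from an interior point x to the boundary of the unit box [0, 1]^d
    return float(min(np.min(x), np.min(1.0 - x)))

def in_delta_interior(x, delta):
    # x lies in int_delta(K) iff the Euclidean ball B(x, delta) is contained in K
    return dist_to_box_boundary(x) >= delta

x = np.array([0.5, 0.4, 0.6])
inside = in_delta_interior(x, 0.3)                               # every x + 0.3*u stays in the box
near_edge = in_delta_interior(np.array([0.05, 0.5, 0.5]), 0.3)   # too close to the boundary
```

For this box the check is exact; for a general convex K the boundary distance has no closed form, which is why the paper instead constructs an explicit δ-interior K′ below.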
Moreover, if every F_t is Lipschitz and d(K, K′) is small, we can approximate the optimal total reward on K (max_{x∈K} Σ_{t=1}^T F_t(x)) by that on K′ (max_{x∈K′} Σ_{t=1}^T F_t(x)), and thereby obtain the regret bound subject to the original constraint set K, by running bandit algorithms on K′. We also note that if the constraint set K satisfies Assumption 1 and is down-closed (e.g., a matroid polytope), then for sufficiently small δ we can construct K′, a down-closed δ-interior of K, with d(K, K′) sufficiently small (actually a linear function of δ). Recall that a set P is down-closed if it has a lower bound u such that (1) ∀y ∈ P, u ≤ y; and (2) ∀y ∈ P, x ∈ R^d, u ≤ x ≤ y ⟹ x ∈ P [9]. We first define B^d_{≥0} = B^d ∩ R^d_{≥0} and make the following assumption²:

Assumption 5. There exists a positive number r such that r B^d_{≥0} ⊆ K.

To construct K′, for sufficiently small δ such that δ < r/(√d + 1), we first set α = (√d + 1)δ/r < 1 and shrink K by a factor of (1 − α) to obtain K_α = (1 − α)K. Then we translate the shrunk set K_α by δ1 (Fig. 1b in Appendix D). In other words, the set that we finally obtain is

K′ = K_α + δ1 = (1 − α)K + δ1.

In Lemma 1, we establish that K′ is indeed a δ-interior of K and deduce a linear bound for d(K, K′).

Lemma 1 (Proof in Appendix D). We assume Assumptions 1 and 5 and also assume that K is down-closed and that δ is sufficiently small such that α = (√d + 1)δ/r < 1. The set K′ = (1 − α)K + δ1 is convex and compact. Moreover, K′ is a down-closed δ-interior of K and satisfies d(K, K′) ≤ [√d (R/r + 1) + R/r] δ.

4.2 No-(1 − 1/e)-Regret Biphasic Bandit Algorithm

Our proposed bandit algorithm is based on the online algorithm Mono-Frank-Wolfe of Section 3. Precisely, we want to replace the stochastic gradients in Algorithm 1 with one-point gradient estimators, and run the modified algorithm on K′, a proper δ-interior of the constraint set K. Note that the one-point estimator requires that the point at which we estimate the gradient (i.e., x) be identical to the point that we play (i.e., x + δu), if we ignore the random δu. In Algorithm 1, however, we play the point x_q but obtain estimated gradients at other points x_q^(k′) (Line 5). This suggests that Algorithm 1 cannot be extended to the bandit setting via the one-point gradient estimator directly.

To circumvent this limitation, we propose a biphasic approach that categorizes the plays into exploration and exploitation phases. To motivate this biphasic method, recall that in Algorithm 1 we need to play x_q to gain high rewards (exploitation), whilst we observe ∇̃F_t(x_q^(k′)) to obtain gradient information (exploration). So in our biphasic approach, we expend a large portion of plays on exploitation (play x_q, so we can still get high rewards) and a small portion of plays on exploring the gradient (play x_q^(k′) to get one-point gradient estimators, so we can still obtain sufficient information). To be precise, we divide the T objective functions into Q equisized blocks of size L, where L = T/Q. Each block is subdivided into two phases.
As shown in Algorithm 2, we randomly choose K ≪ L functions for exploration (Line 6) and use the remaining (L − K) functions for exploitation (Line 7).

We describe our algorithm formally in Algorithm 2. We also note that for a general constraint set K with a proper δ-interior K′ such that d(K, K′) ≤ c1 δ^γ, Theorem 4 (Appendix E.1) shows a (1 − 1/e)-regret bound of O(T^{(3+5 min{1,γ})/(3+6 min{1,γ})}). Moreover, with Lemma 1, this result can be extended to down-closed constraint sets K, as shown in Theorem 2.

Algorithm 2 Bandit-Frank-Wolfe
Input: smoothing radius δ, δ-interior K′ with lower bound u, horizon T, block size L, the number of exploration steps per block K, online linear maximization oracles on K′: E^(1), · · · , E^(K), step sizes ρ_k ∈ (0, 1), η_k ∈ (0, 1), the number of blocks Q = T/L
Output: y_1, y_2, . . .
1: for q = 1, 2, . . . , Q do
2:   d_q^(0) ← 0, x_q^(1) ← u
3:   For k = 1, 2, . . . , K, let v_q^(k) ∈ K′ be the output of E^(k) in round q, x_q^(k+1) ← x_q^(k) + η_k (v_q^(k) − u). Set x_q ← x_q^(K+1).
4:   Let (t_{q,1}, . . . , t_{q,L}) be a random permutation of {(q − 1)L + 1, · · · , qL}
5:   for t = (q − 1)L + 1, · · · , qL do
6:     If t ∈ {t_{q,1}, · · · , t_{q,K}}, find the corresponding k′ ∈ [K] such that t = t_{q,k′}, play y_t = y_{t_{q,k′}} = x_q^(k′) + δ u_{q,k′} for F_t (i.e., F_{t_{q,k′}}), where u_{q,k′} ∼ S^{d−1}   ▷ Exploration
7:     If t ∈ {(q − 1)L + 1, · · · , qL} \ {t_{q,1}, · · · , t_{q,K}}, play y_t = x_q for F_t   ▷ Exploitation
8:   end for
9:   For k = 1, 2, . . . , K, g_{q,k} ← (d/δ) F_{t_{q,k}}(y_{t_{q,k}}) u_{q,k}, d_q^(k) ← (1 − ρ_k) d_q^(k−1) + ρ_k g_{q,k}, compute ⟨v_q^(k), d_q^(k)⟩ as reward for E^(k), and feed back d_q^(k) to E^(k)
10: end for

²This assumption is an analogue of the assumption r B^d ⊆ K ⊆ R B^d in [23].

Assumption 6. Every objective function F_t satisfies sup_{x∈K} |F_t(x)| ≤ M1.

Theorem 2 (Proof in Appendix E.2). We assume Assumptions 1, 2 and 4 to 6, and also assume that K is down-closed. If we generate K′ as in Lemma 1, and set δ = (r/(√d + 2)) T^{−1/9}, K = T^{2/3}, L = T^{7/9}, η_k = 1/K, ρ_k = 2/(k + 2)^{2/3}, then y_t ∈ K, ∀t, and the expected (1 − 1/e)-regret of Algorithm 2 is at most

E[R_T] ≤ N T^{8/9} + (3r [2L1² + (3L2 R + 2L1)²] / (4^{1/3} (√d + 2))) T^{2/3} + (L2 D²/2) T^{1/3},

where N = ((1 − 1/e) r/(√d + 2)) [√d (R/r + 1) + R/r] L1 + ((2 − 1/e) r/(√d + 2)) L1 + 2M1 + 3 · 4^{1/6} (√d + 2) d² M1²/r + 3(√d + 2) D²/(4r) + C.

5 Bandit Submodular Set Maximization

In this section, we aim to solve the problem of bandit submodular set maximization by lifting it to the continuous domain.
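To make this lifting concrete, the sketch below is an illustrative toy rather than the paper's algorithm: a small coverage function on a 3-element ground set, its exact multilinear extension computed by enumeration, and offline Frank-Wolfe steps in the style of Eq. (1) over the rank-2 uniform-matroid polytope {x ∈ [0, 1]^3 : Σ_i x_i ≤ 2} (all names are hypothetical):

```python
import itertools
import numpy as np

cover = {0: {1, 2}, 1: {2, 3}, 2: {4}}
def f(S):
    # monotone submodular coverage function: number of covered targets
    return len(set().union(*(cover[i] for i in S))) if S else 0

def F(x):
    # exact multilinear extension, enumerating all 2^d subsets
    d = len(x)
    total = 0.0
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            p = 1.0
            for i in range(d):
                p *= x[i] if i in S else 1 - x[i]
            total += p * f(S)
    return total

def grad_F(x):
    # F is linear in each coordinate, so dF/dx_i = F(x with x_i=1) - F(x with x_i=0)
    g = np.empty(len(x))
    for i in range(len(x)):
        hi, lo = x.copy(), x.copy()
        hi[i], lo[i] = 1.0, 0.0
        g[i] = F(hi) - F(lo)
    return g

def frank_wolfe(d=3, rank=2, K=50):
    x = np.zeros(d)
    for _ in range(K):
        g = grad_F(x)
        v = np.zeros(d)
        v[np.argsort(g)[-rank:]] = 1.0   # LMO: best vertex of the rank-k matroid polytope
        x = x + v / K                    # FW step with eta_k = 1/K keeps x in the polytope
    return x

x = frank_wolfe()
```

On this instance the best feasible set of size 2 covers 3 targets, and the final fractional point attains at least the (1 − 1/e) fraction of that optimum, consistent with the offline guarantee quoted in Section 1.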
Let objective functions f_1, · · · , f_T : 2^Ω → R_{≥0} be a sequence of monotone submodular set functions defined on a common ground set Ω = {1, . . . , d}. We also let I denote the matroid constraint, and K be the matroid polytope of I, i.e., K = conv{1_I : I ∈ I} ⊆ [0, 1]^d [16], where conv denotes the convex hull.

5.1 An Impossibility Result

A natural idea is that at each round t, we apply Bandit-Frank-Wolfe, the continuous algorithm of Section 4.2, to F_t subject to K, where F_t is the multilinear extension of the discrete objective function f_t. Then we get a fractional solution y_t ∈ K, round it to a set Y_t ∈ I, and play Y_t for f_t. For the exploitation phase, we will use a lossless rounding scheme such that f_t(Y_t) ≥ F_t(y_t), so we will not get lower rewards after the rounding. Instances of such a lossless rounding scheme include pipage rounding [4, 16] and the contention resolution scheme [46].

In the exploration phase, we need to use the reward f_t(Y_t) to obtain an unbiased gradient estimator of the smoothed version of F_t. As the one-point estimator (d/δ) F(x + δu) u in Algorithm 2 is unbiased, we require the (random) rounding scheme round_I : [0, 1]^d → I to satisfy the following unbiasedness condition

E[f(round_I(x))] = F(x),  ∀x ∈ [0, 1]^d    (2)

for any submodular set function f on the ground set Ω and its multilinear extension F. Since we have no a priori knowledge of the objective function f_t before playing a subset for it, such a rounding scheme round_I should not depend on the function choice f. In other words, we need to find an independent round_I such that Eq. (2) holds for any submodular function f defined on Ω.

We first review the random rounding scheme RandRound : [0, 1]^d → 2^Ω:

i ∈ RandRound(x) with probability x_i;
i ∉ RandRound(x) with probability 1 − x_i.    (3)

In other words, each element i ∈ Ω is included with an independent probability x_i, where x_i is the i-th coordinate of x. RandRound satisfies the unbiasedness requirement Eq. (2). However, its range is 2^Ω in general, so the rounded set may fall outside of I. In fact, as shown in Lemma 2, there exists a matroid I for which we cannot find a proper unbiased rounding scheme whose range is contained in I.

Lemma 2 (Proof in Appendix F). There exists a matroid I for which there is no rounding scheme round : [0, 1]^d → I whose construction does not depend on the function f and which satisfies Eq. (2) for any submodular set function f.

5.2 Responsive Bandit Algorithm

The impossibility result Lemma 2 shows that the one-point estimator may be incapable of solving the general BSM problem. As a result, we study a slightly relaxed setting termed the responsive bandit submodular maximization problem (RBSM). Let X_t be the subset that we play at the t-th round. The only difference between the responsive bandit setting and the vanilla bandit setting is that in the responsive setting, if X_t ∉ I, we can still observe the function value f_t(X_t) as feedback, while the received reward at round t is 0 (since the subset that we play violates the constraint I). In other words, the environment is always responsive to the player's decisions, no matter whether X_t is in I or not.

We note that the RBSM problem has broad applications in both theory and practice. In theory, RBSM can be regarded as a relaxation of BSM, which helps us to better understand the nature of BSM.
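The unbiasedness of RandRound (the requirement in Eq. (2)) is easy to verify numerically. In the sketch below, the specific function f and all helper names are illustrative assumptions; the multilinear extension is computed exactly by enumeration and compared against a Monte Carlo average of f over rounded sets:

```python
import itertools
import random

def rand_round(x, rng):
    # include each element i independently with probability x_i, as in Eq. (3)
    return frozenset(i for i, p in enumerate(x) if rng.random() < p)

def multilinear(f, x):
    # F(x) = E_{S~x}[f(S)], computed exactly by enumerating all subsets
    d = len(x)
    total = 0.0
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            p = 1.0
            for i in range(d):
                p *= x[i] if i in S else 1 - x[i]
            total += p * f(frozenset(S))
    return total

# a toy monotone submodular function: cardinality plus a coverage-style bonus
f = lambda S: len(S) + (2.0 if {0, 1} & S else 0.0)
x = [0.3, 0.6, 0.8]
rng = random.Random(0)
est = sum(f(rand_round(x, rng)) for _ in range(200_000)) / 200_000
```

The Monte Carlo average matches F(x) up to sampling error, while nothing constrains the sampled sets to lie in a matroid I, which is exactly the difficulty that Lemma 2 formalizes.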
In\npractice, the responsive model (not only for submodular maximization or bandit) has potentially many\napplications when a decision cannot be committed, while we can still get the potential outcome of the\ndecision as feedback. For example, suppose that we have a replenishable inventory of items where\ncustomers arrive (in an online fashion) with a utility function unknown to us. We need to allocate a\ncollection of items to each customer, and the goal is to maximize the total utility (reward) of all the\ncustomers. We may use a partition matroid to model diversity (in terms of category, time, etc). In the\nRBSM model, we cannot allocate the collection of items which violates the constraint to the customer,\nbut we can use it as a questionnaire, and the customer will tell us the potential utility if she received\nthose items. The feedback will help us to make better decisions in the future. Similar examples\ninclude portfolio selection when the investment choice is too risky, i.e., violates the recommended\nconstraint set, we may stop trading and thus get no reward on that trading period, but at the same\ntime observe the potential reward if we invested in that way.\nNow, we turn to propose our algorithm. As discussed in Section 5.1, we want to solve the problem of\nbandit submodular set maximization by applying Algorithm 2 on the multilinear extensions Ft with\ndifferent rounding schemes. Precisely, in the responsive setting, we use the RandRound Eq. (3) in\nthe exploration phase to guarantee that we can always obtain unbiased gradient estimators, and use\na lossless rounding scheme LosslessRound in the exploitation phase to receive high rewards. We\npresent Responsive-Frank-Wolfe in Algorithm 3, and show that it achieves a (1 \u2212 1/e)-regret\nbound of O(T 8/9).\nAssumption 7. Every objective function ft is monotone submodular with supX\u2286\u2126|ft(X)|\u2264 M1.\nTheorem 3 (Proof in Appendix G). 
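Before stating the guarantee, note that the unbiasedness property E[f(RandRound(x))] = F(x) of Eq. (2) is easy to verify numerically on a small instance. The sketch below is our own illustration, not code from the paper: it uses a toy monotone submodular coverage function on Ω = {0, 1, 2} (the coverage instance and helper names are hypothetical), evaluates the multilinear extension F exactly by enumerating subsets, and compares it against a Monte Carlo average of RandRound.

```python
import itertools
import random

# Toy ground set Ω = {0, 1, 2}; each element covers a set of universe items.
# f(S) = |union of sets covered by S| is monotone submodular (a coverage function).
COVER = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d", "e"}}

def f(S):
    covered = set()
    for i in S:
        covered |= COVER[i]
    return len(covered)

def multilinear_extension(x):
    # F(x) = E[f(S)] when each i is included independently w.p. x_i,
    # computed exactly by enumerating all 2^d subsets of Ω.
    d = len(x)
    total = 0.0
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            p = 1.0
            for i in range(d):
                p *= x[i] if i in S else (1.0 - x[i])
            total += p * f(S)
    return total

def rand_round(x, rng):
    # RandRound of Eq. (3): include coordinate i independently w.p. x_i.
    return tuple(i for i in range(len(x)) if rng.random() < x[i])

rng = random.Random(0)
x = [0.5, 0.3, 0.8]
exact = multilinear_extension(x)        # = 3.61 for this instance
estimate = sum(f(rand_round(x, rng)) for _ in range(200_000)) / 200_000
print(exact, estimate)                  # Monte Carlo average approaches F(x)
```

Because each coordinate is rounded independently, E[f(RandRound(x))] is exactly the defining expectation of the multilinear extension, so RandRound satisfies Eq. (2) for every set function f; the obstacle identified by Lemma 2 is only that its output may leave the matroid I.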
Under Assumptions 4, 5 and 7, if we generate K′ as in Lemma 1, and set δ = (r/√(d+2)) T^{−1/9}, K = T^{2/3}, L = T^{7/9}, η_k = 1/K, and ρ_k = 2/(k+2)^{2/3}, then in the responsive setting,

Algorithm 3 Responsive-Frank-Wolfe
Input: matroid constraint I, matroid polytope K, smoothing radius δ, δ-interior K′ with lower bound u, horizon T, block size L, the number of exploration steps per block K, online linear maximization oracles on K′: E^(1), ..., E^(K), step sizes ρ_k ∈ (0, 1), η_k ∈ (0, 1), the number of blocks Q = T/L
Output: Y_1, Y_2, ...
1: for q = 1, 2, ..., Q do
2:   d_q^(0) ← 0, x_q^(1) ← u
3:   For k = 1, 2, ..., K, let v_q^(k) ∈ K′ be the output of E^(k) in round q, x_q^(k+1) ← x_q^(k) + η_k(v_q^(k) − u). Set x_q ← x_q^(K+1)
4:   Let (t_{q,1}, ..., t_{q,L}) be a random permutation of {(q−1)L+1, ..., qL}
5:   for t = (q−1)L+1, ..., qL do
6:     If t ∈ {t_{q,1}, ..., t_{q,K}}, find the corresponding k′ ∈ [K] such that t = t_{q,k′}, play Y_t = RandRound(y_{t_{q,k′}}) for f_t (i.e., f_{t_{q,k′}}), where y_{t_{q,k′}} = x_q^(k′) + δ u_{q,k′}, u_{q,k′} ∼ S^{d−1}. If Y_t ∈ I, get reward f_t(Y_t); otherwise, get reward 0.  ▷ Exploration
7:     If t ∈ {(q−1)L+1, ..., qL} \ {t_{q,1}, ..., t_{q,K}}, play Y_t = LosslessRound(y_t) for f_t, where y_t = x_q  ▷ Exploitation
8:   end for
9:   For k = 1, 2, ..., K, g_{q,k} ← (d/δ) f_{t_{q,k}}(Y_{t_{q,k}}) u_{q,k}, d_q^(k) ← (1 − ρ_k) d_q^(k−1) + ρ_k g_{q,k}, compute ⟨v_q^(k), d_q^(k)⟩ as reward for E^(k), and feed back d_q^(k) to E^(k)
10: end for

the expected (1 − 1/e)-regret of Algorithm 3 is at most

E[R_T] ≤ N T^{8/9} + N_1 T^{2/3} + N_2 T^{1/3},

where N, N_1, and N_2 are explicit constants depending only on d, r, and M_1, in particular through the Lipschitz and smoothness constants L_1 = 2M_1 √d and L_2 = 4M_1 √(d(d−1)).

6 Conclusion

In this paper, by proposing a series of novel techniques, including the blocking procedure and the permutation method, we developed Mono-Frank-Wolfe for the OCSM problem, which requires only one stochastic gradient evaluation per function and still achieves a (1 − 1/e)-regret bound of O(T^{4/5}). We then introduced the biphasic method and the notion of δ-interior to extend Mono-Frank-Wolfe to Bandit-Frank-Wolfe for the BCSM problem. Finally, we introduced the responsive model and the corresponding Responsive-Frank-Wolfe algorithm for the RBSM problem. We proved that both Bandit-Frank-Wolfe and Responsive-Frank-Wolfe attain a (1 − 1/e)-regret bound of O(T^{8/9}).

Acknowledgments

This work is partially supported by the Google PhD Fellowship, NSF (IIS-1845032), ONR (N00014-19-1-2406) and AFOSR (FA9550-18-1-0160). We would like to thank Marko Mitrovic for his valuable comments and Zheng Wei for help preparing some of the illustrations.

References

[1] Jacob D Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pages 263–274, 2008.

[2] Alekh Agarwal, Ofer Dekel, and Lin Xiao.
Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, pages 28–40. Citeseer, 2010.

[3] Alekh Agarwal, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Alexander Rakhlin. Stochastic convex optimization with bandit feedback. In NIPS, pages 1035–1043, 2011.

[4] Alexander A Ageev and Maxim I Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. Journal of Combinatorial Optimization, 8(3):307–328, 2004.

[5] Baruch Awerbuch and Robert Kleinberg. Online linear optimization and adaptive routing. Journal of Computer and System Sciences, 74(1):97–114, 2008.

[6] Francis Bach. Submodular functions: from discrete to continuous domains. arXiv preprint arXiv:1511.00394, 2015.

[7] Francis Bach, Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski, et al. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1–106, 2012.

[8] Eric Balkanski and Yaron Singer. Mechanisms for fair attribution. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, pages 529–546. ACM, 2015.

[9] An Bian, Baharan Mirzasoleiman, Joachim M. Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. In AISTATS, 2017.

[10] An Bian, Joachim M Buhmann, and Andreas Krause. Optimal DR-submodular maximization and applications to provable mean field inference. arXiv preprint arXiv:1805.07482, 2018.

[11] Sébastien Bubeck and Ronen Eldan. Multi-scale exploration of convex functions and bandit convex optimization. In COLT, pages 583–589, 2016.

[12] Sébastien Bubeck, Nicolo Cesa-Bianchi, and Sham Kakade. Towards minimax policies for online linear optimization with bandit feedback. In COLT, volume 23, pages 41.1–41.14, 2012.

[13] Sébastien Bubeck, Nicolo Cesa-Bianchi, et al.
Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.

[14] Sébastien Bubeck, Ofer Dekel, Tomer Koren, and Yuval Peres. Bandit convex optimization: √T regret in one dimension. In COLT, pages 266–278, 2015.

[15] Sébastien Bubeck, Yin Tat Lee, and Ronen Eldan. Kernel-based methods for bandit convex optimization. In STOC, pages 72–85. ACM, 2017.

[16] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.

[17] Lin Chen, Christopher Harshaw, Hamed Hassani, and Amin Karbasi. Projection-free online optimization with stochastic gradient: From convexity to submodularity. In ICML, 2018.

[18] Lin Chen, Hamed Hassani, and Amin Karbasi. Online continuous submodular maximization. In AISTATS, pages 1896–1905, 2018.

[19] Lin Chen, Mingrui Zhang, Hamed Hassani, and Amin Karbasi. Black box submodular maximization: Discrete and continuous settings. arXiv preprint arXiv:1901.09515, 2019.

[20] Lin Chen, Mingrui Zhang, and Amin Karbasi. Projection-free bandit convex optimization. In AISTATS, pages 2047–2056, 2019.

[21] Varsha Dani, Sham M Kakade, and Thomas P Hayes. The price of bandit information for online optimization. In NIPS, pages 345–352, 2008.

[22] Ofer Dekel, Ronen Eldan, and Tomer Koren. Bandit smooth convex optimization: Improving the bias-variance tradeoff. In NIPS, pages 2926–2934, 2015.

[23] Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, pages 385–394, 2005.

[24] Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming.
Naval Research Logistics (NRL), 3(1-2):95–110, 1956.

[25] Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, and S Muthukrishnan. Adaptive submodular maximization in bandit setting. In NIPS, pages 2697–2705, 2013.

[26] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. JAIR, 42:427–486, 2011.

[27] Daniel Golovin, Andreas Krause, and Matthew Streeter. Online submodular maximization under a matroid constraint with application to learning assignments. Technical report, arXiv, 2014.

[28] Elad Hazan and Satyen Kale. Projection-free online learning. In ICML, pages 1843–1850, 2012.

[29] Elad Hazan and Kfir Levy. Bandit convex optimization: Towards tight bounds. In NIPS, pages 784–792, 2014.

[30] Elad Hazan and Yuanzhi Li. An optimal algorithm for bandit convex optimization. arXiv preprint arXiv:1603.04350, 2016.

[31] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2):169–192, 2007.

[32] Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.

[33] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, pages 427–435, 2013.

[34] Sham M Kakade, Adam Tauman Kalai, and Katrina Ligett. Playing games with approximation algorithms. SIAM Journal on Computing, 39(3):1088–1106, 2009.

[35] Robert D Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In NIPS, pages 697–704, 2005.

[36] Alex Kulesza, Ben Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123–286, 2012.

[37] Aryan Mokhtari, Hamed Hassani, and Amin Karbasi.
Conditional gradient method for stochastic submodular maximization: Closing the gap. In AISTATS, pages 1886–1895, 2018.

[38] Aryan Mokhtari, Hamed Hassani, and Amin Karbasi. Stochastic conditional gradient methods: From convex minimization to submodular maximization. arXiv preprint arXiv:1804.09554, 2018.

[39] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.

[40] Ankan Saha and Ambuj Tewari. Improved regret guarantees for online smooth convex optimization with bandit feedback. In AISTATS, pages 636–642, 2011.

[41] Shai Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, The Hebrew University of Jerusalem, 2007.

[42] Shai Shalev-Shwartz and Yoram Singer. A primal-dual perspective of online learning algorithms. Machine Learning, 69(2-3):115–142, 2007.

[43] Ohad Shamir. On the complexity of bandit and derivative-free stochastic convex optimization. In COLT, pages 3–24, 2013.

[44] Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. In NIPS, pages 1577–1584, 2009.

[45] Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, and Jeff A Bilmes. Learning mixtures of submodular functions for image collection summarization. In NIPS, pages 1413–1421, 2014.

[46] Jan Vondrák, Chandra Chekuri, and Rico Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. In STOC, pages 783–792. ACM, 2011.

[47] Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In ICML, pages 1954–1963, 2015.

[48] Baosheng Yu, Meng Fang, and Dacheng Tao.
Linear submodular bandits with a knapsack constraint. In AAAI, 2016.

[49] Yisong Yue and Carlos Guestrin. Linear submodular bandits and their application to diversified retrieval. In NIPS, pages 2483–2491, 2011.

[50] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928–936, 2003.