{"title": "Online Decision-Making in General Combinatorial Spaces", "book": "Advances in Neural Information Processing Systems", "page_first": 3482, "page_last": 3490, "abstract": "We study online combinatorial decision problems, where one must make sequential decisions in some combinatorial space without knowing in advance the cost of decisions on each trial; the goal is to minimize the total regret over some sequence of trials relative to the best fixed decision in hindsight. Such problems have been studied mostly in settings where decisions are represented by Boolean vectors and costs are linear in this representation. Here we study a general setting where costs may be linear in any suitable low-dimensional vector representation of elements of the decision space. We give a general algorithm for such problems that we call low-dimensional online mirror descent (LDOMD); the algorithm generalizes both the Component Hedge algorithm of Koolen et al. (2010), and a recent algorithm of Suehiro et al. (2012). Our study offers a unification and generalization of previous work, and emphasizes the role of the convex polytope arising from the vector representation of the decision space; while Boolean representations lead to 0-1 polytopes, more general vector representations lead to more general polytopes. We study several examples of both types of polytopes. 
Finally, we demonstrate the benefit of having a general framework for such problems via an application to an online transportation problem; the associated transportation polytopes generalize the Birkhoff polytope of doubly stochastic matrices, and the resulting algorithm generalizes the PermELearn algorithm of Helmbold and Warmuth (2009).", "full_text": "Online Decision-Making in General Combinatorial Spaces\n\nArun Rajkumar\n\nShivani Agarwal\n\nDepartment of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India\n\n{arun r,shivani}@csa.iisc.ernet.in\n\nAbstract\n\nWe study online combinatorial decision problems, where one must make sequential decisions in some combinatorial space without knowing in advance the cost of decisions on each trial; the goal is to minimize the total regret over some sequence of trials relative to the best fixed decision in hindsight. Such problems have been studied mostly in settings where decisions are represented by Boolean vectors and costs are linear in this representation. Here we study a general setting where costs may be linear in any suitable low-dimensional vector representation of elements of the decision space. We give a general algorithm for such problems that we call low-dimensional online mirror descent (LDOMD); the algorithm generalizes both the Component Hedge algorithm of Koolen et al. (2010), and a recent algorithm of Suehiro et al. (2012). Our study offers a unification and generalization of previous work, and emphasizes the role of the convex polytope arising from the vector representation of the decision space; while Boolean representations lead to 0-1 polytopes, more general vector representations lead to more general polytopes. We study several examples of both types of polytopes. 
Finally, we demonstrate the benefit of having a general framework for such problems via an application to an online transportation problem; the associated transportation polytopes generalize the Birkhoff polytope of doubly stochastic matrices, and the resulting algorithm generalizes the PermELearn algorithm of Helmbold and Warmuth (2009).\n\n1 Introduction\n\nIn an online combinatorial decision problem, the decision space is a set of combinatorial structures, such as subsets, trees, paths, permutations, etc. On each trial, one selects a combinatorial structure from the decision space, and incurs a loss; the goal is to minimize the regret over some sequence of trials relative to the best fixed structure in hindsight. Such problems have been studied extensively in the last several years, primarily in the setting where the combinatorial structures are represented by Boolean vectors, and costs are linear in this representation; this includes online learning of paths, permutations, and various other specific combinatorial structures [16, 17, 12], as well as the Component Hedge algorithm of Koolen et al. [14] which generalizes many of these previous studies. More recently, Suehiro et al. [15] considered a setting where the combinatorial structures of interest are represented by the vertices of the base polytope of a submodular function, and costs are linear in this representation; this includes as special cases several of the Boolean examples considered earlier, as well as new settings such as learning permutations with certain position-based losses (see also [2]).\n\nIn this work, we consider a general form of the online combinatorial decision problem, where costs can be linear in any suitable low-dimensional vector representation of the combinatorial structures of interest. This encompasses representations as Boolean vectors and vertices of submodular base polytopes as special cases, but also includes many other settings. 
We give a general algorithm for such problems that we call low-dimensional online mirror descent (LDOMD); the algorithm generalizes both the Component Hedge algorithm of Koolen et al. for Boolean representations [14], and the algorithm of Suehiro et al. for submodular polytope vertex representations [15].1 As we show, in many settings of interest, the regret bounds for LDOMD are better than what can be obtained with other algorithms for online decision problems, such as the Hedge algorithm of Freund and Schapire [10] and the Follow the Perturbed Leader algorithm of Kalai and Vempala [13].\n\nWe start with some preliminaries and background in Section 2, and describe the LDOMD algorithm and its analysis in Section 3. Our study emphasizes the role of the convex polytope arising from the vector representation of the decision space; we study several examples of such polytopes, including matroid polytopes, polytopes associated with submodular functions, and permutation polytopes in Sections 4–6, respectively. Section 7 applies our framework to an online transportation problem.\n\n2 Preliminaries and Background\n\nNotation. For n ∈ Z+, we will denote [n] = {1, . . . , n}. For a vector z ∈ Rd, we will denote by ‖z‖1, ‖z‖2, and ‖z‖∞ the standard L1, L2, and L∞ norms of z, respectively. For a set Z ⊆ Rd, we will denote by conv(Z) the convex hull of Z, and by int(Z) the interior of Z. For a closed convex set K ⊆ Rd and Legendre function F : K→R,2 we will denote by BF : K × int(K)→R+ the Bregman divergence associated with F, defined as BF(x, x′) = F(x) − F(x′) − ∇F(x′) · (x − x′), and by F∗ : ∇F(int(K))→R the Fenchel conjugate of F, defined as F∗(u) = sup_{x∈K}(x · u − F(x)).\n\nProblem Setup. Let C be a (finite but large) set of combinatorial structures. 
Let φ : C→Rd be some injective mapping that maps each c ∈ C to a unique vector φ(c) ∈ Rd (so that |φ(C)| = |C|). We will generally assume d ≪ |C| (e.g. d = poly log(|C|)). The online combinatorial decision-making problem we consider can be described as follows: On each trial t, one makes a decision in C by selecting a structure ct ∈ C, and receives a loss vector ℓt ∈ [0, 1]d; the loss incurred is given by φ(ct) · ℓt (see Figure 1). The goal is to minimize the regret relative to the single best structure in C in hindsight; specifically, the regret of an algorithm A that selects ct ∈ C on trial t over T trials is defined as\n\nRT[A] = Σ_{t=1}^T φ(ct) · ℓt − min_{c∈C} Σ_{t=1}^T φ(c) · ℓt .\n\nOnline Combinatorial Decision-Making\nInputs:\n  Finite set of combinatorial structures C\n  Mapping φ : C→Rd\nFor t = 1 . . . T:\n  – Predict ct ∈ C\n  – Receive loss vector ℓt ∈ [0, 1]d\n  – Incur loss φ(ct) · ℓt\n\nFigure 1: Online decision-making in a general combinatorial space.\n\nIn particular, we would like to design algorithms whose worst-case regret (over all possible loss sequences) is sublinear in T (and also has as good a dependence as possible on other relevant problem parameters). From standard results, it follows that for any deterministic algorithm, there is always a loss sequence that forces the regret to be linear in T; as is common in the online learning literature, we will therefore consider randomized algorithms that maintain a probability distribution pt over C from which ct is randomly drawn, and consider bounding the expected regret of such algorithms.\n\nOnline Mirror Descent (OMD). 
Recall that online mirror descent (OMD) is a general algorithmic framework for online convex optimization problems, where on each trial t, one selects a point xt in some convex set Ω ⊆ Rn, receives a convex cost function ft : Ω→R, and incurs a loss ft(xt); the goal is to minimize the regret relative to the best single point in Ω in hindsight. The OMD algorithm makes use of a Legendre function F : K→R defined on a closed convex set K ⊇ Ω, and effectively performs a form of projected gradient descent in the dual space of int(K) under F, the projections being in terms of the Bregman divergence BF associated with F. See Appendix A.1 for an outline of OMD and its regret bound for the special case of online linear optimization, where costs ft are linear (so that ft(x) = ℓt · x for some ℓt ∈ Rn), which will be relevant to our study.\n\n1We note that the recent online stochastic mirror descent (OSMD) algorithm of Audibert et al. [3] also generalizes the Component Hedge algorithm, but in a different direction: OSMD (as described in [3]) applies to only Boolean representations, but allows also for partial information (bandit) settings; here we consider only full information settings, but allow for more general vector representations.\n\n2Recall that for a closed convex set K ⊆ Rd, a function F : K→R is Legendre if it is strictly convex, differentiable on int(K), and (for any norm ‖ · ‖ on Rd) ‖∇F(xn)‖→+∞ whenever {xn} converges to a point in the boundary of K.\n\nHedge/Naïve OMD. The Hedge algorithm proposed by Freund and Schapire [10] is widely used for online decision problems in general. 
The algorithm maintains a probability distribution over the decision space, and can be viewed as an instantiation of the OMD framework, with Ω (and K) the probability simplex over the decision space, linear costs ft (since one works with expected losses), and F the negative entropy. When applied to online combinatorial decision problems in a naïve manner, the Hedge algorithm requires maintaining a probability distribution over the combinatorial decision space C, which in many cases can be computationally prohibitive (see Appendix A.2 for an outline of the algorithm, which we also refer to as Naïve OMD). The following bound on the expected regret of the Hedge/Naïve OMD algorithm is well known:\n\nTheorem 1 (Regret bound for Hedge/Naïve OMD). Let φ(c) · ℓt ∈ [a, b] ∀c ∈ C, t ∈ [T]. Then setting η∗ = (2/(b−a)) √(2 ln|C| / T) gives\n\nE[ RT[ Hedge(η∗) ] ] ≤ (b − a) √( T ln|C| / 2 ) .\n\nFollow the Perturbed Leader (FPL). Another widely used algorithm for online decision problems is the Follow the Perturbed Leader (FPL) algorithm proposed by Kalai and Vempala [13] (see Appendix A.3 for an outline of the algorithm). Note that in the combinatorial setting, FPL requires the solution to a combinatorial optimization problem on each trial, which may or may not be efficiently solvable depending on the form of the mapping φ. The following bound on the expected regret of the FPL algorithm is well known:\n\nTheorem 2 (Regret bound for FPL). Let ‖φ(c) − φ(c′)‖1 ≤ D1, ‖ℓt‖1 ≤ G1, and |φ(c) · ℓt| ≤ B ∀c, c′ ∈ C, t ∈ [T]. 
Then setting η∗ = √( D1 / (B G1 T) ) gives\n\nE[ RT[ FPL(η∗) ] ] ≤ 2 √( D1 B G1 T ) .\n\nPolytopes. Recall that a set S ⊂ Rd is a polytope if there exist a finite number of points x1, . . . , xn ∈ Rd such that S = conv({x1, . . . , xn}). Any polytope S ⊂ Rd has a unique minimal set of points x′1, . . . , x′m ∈ Rd such that S = conv({x′1, . . . , x′m}); these points are called the vertices of S. A polytope S ⊂ Rd is said to be a 0-1 polytope if all its vertices lie in the Boolean hypercube {0, 1}d. As we shall see, in our study of online combinatorial decision problems as above, the polytope conv(φ(C)) ⊂ Rd will play a central role. Clearly, if φ(C) ⊆ {0, 1}d, then conv(φ(C)) is a 0-1 polytope; in general, however, conv(φ(C)) can be any polytope in Rd.\n\n3 Low-Dimensional Online Mirror Descent (LDOMD)\n\nWe describe the Low-Dimensional OMD (LDOMD) algorithm in Figure 2. The algorithm maintains a point xt in the polytope conv(φ(C)). It makes use of a Legendre function F : K→R defined on a closed convex set K ⊇ conv(φ(C)), and effectively performs OMD in a d-dimensional space rather than in a |C|-dimensional space as in the case of Hedge/Naïve OMD. 
Note that an efficient implementation of LDOMD requires two operations to be performed efficiently: (a) given a point xt ∈ conv(φ(C)), one needs to be able to efficiently find a 'decomposition' of xt into a convex combination of a small number of points in φ(C) (this yields a distribution pt ∈ ΔC that satisfies Ec∼pt[φ(c)] = xt and also has small support, allowing efficient sampling); and (b) given a point x̃t+1 ∈ K, one needs to be able to efficiently find a 'projection' of x̃t+1 onto conv(φ(C)) in terms of the Bregman divergence BF. The following regret bound for LDOMD follows directly from the standard OMD regret bound (see Theorem 4 in Appendix A.1):\n\nTheorem 3 (Regret bound for LDOMD). Let BF(φ(c), x1) ≤ D² ∀c ∈ C. Let ‖ · ‖ be any norm in Rd such that ‖ℓt‖ ≤ G ∀t ∈ [T], and such that the restriction of F to conv(φ(C)) is α-strongly convex w.r.t. ‖ · ‖∗, the dual norm of ‖ · ‖. Then setting η∗ = (D/G) √(2α / T) gives\n\nE[ RT[ LDOMD(η∗) ] ] ≤ DG √( 2T / α ) .\n\nAs we shall see below, the LDOMD algorithm generalizes both the Component Hedge algorithm of Koolen et al. [14], which applies to settings where φ(C) ⊆ {0, 1}d (Section 3.1), and the recent algorithm of Suehiro et al. [15], which applies to settings where conv(φ(C)) is the base polytope associated with a submodular function (Section 5).\n\nAlgorithm Low-Dimensional OMD (LDOMD) for Online Combinatorial Decision-Making\nInputs:\n  Finite set of combinatorial structures C\n  Mapping φ : C→Rd\nParameters:\n  η > 0\n  Closed convex set K ⊇ conv(φ(C)), Legendre function F : K→R\nInitialize:\n  x1 = argmin_{x∈conv(φ(C))} F(x) (or x1 = any other point in conv(φ(C)))\nFor t = 1 . . . T:\n  – Let pt be any distribution over C such that Ec∼pt[φ(c)] = xt   [Decomposition step]\n  – Randomly draw ct ∼ pt\n  – Receive loss vector ℓt ∈ [0, 1]d\n  – Incur loss φ(ct) · ℓt\n  – Update: x̃t+1 ← ∇F∗(∇F(xt) − ηℓt)\n            xt+1 ← argmin_{x∈conv(φ(C))} BF(x, x̃t+1)   [Bregman projection step]\n\nFigure 2: The LDOMD algorithm.\n\n3.1 LDOMD with 0-1 Polytopes\n\nConsider first a setting where each c ∈ C is represented as a Boolean vector, so that φ(C) ⊆ {0, 1}d. In this case conv(φ(C)) is a 0-1 polytope. This is the setting commonly studied under the term 'online combinatorial learning' [14, 8, 3]. In analyzing this setting, one generally introduces an additional problem parameter, namely an upper bound m on the 'size' of each Boolean vector φ(c). Specifically, let us assume ‖φ(c)‖1 ≤ m ∀c ∈ C for some m ∈ [d].\n\nUnder the above assumption, it is easy to verify that applying Theorems 1 and 2 gives\n\nE[ RT[ Hedge(η∗) ] ] = O( m √( T m ln(d/m) ) ) ;   E[ RT[ FPL(η∗) ] ] = O( m √(T d) ) .\n\nFor the LDOMD algorithm, since conv(φ(C)) ⊆ [0, 1]d ⊂ Rd+, it is common to take K = Rd+ and to let F : K→R be the unnormalized negative entropy, defined as F(x) = Σ_{i=1}^d xi ln xi − Σ_{i=1}^d xi, which leads to a multiplicative update algorithm; the resulting algorithm was termed Component Hedge in [14]. For the above choice of F, it is easy to see that BF(φ(c), x1) ≤ m ln(d/m) ∀c ∈ C; moreover, ‖ℓt‖∞ ≤ 1 ∀t, and the restriction of F on conv(φ(C)) is (1/m)-strongly convex w.r.t. 
‖ · ‖1. Therefore, applying Theorem 3 with appropriate η∗, one gets\n\nE[ RT[ LDOMD(η∗) ] ] = O( m √( T ln(d/m) ) ) .\n\nThus, when φ(C) ⊆ {0, 1}d, the LDOMD algorithm with the above choice of F gives a better regret bound than both Hedge/Naïve OMD and FPL; in fact the performance of LDOMD in this setting is essentially optimal, as one can easily show a matching lower bound [3].\n\nBelow we will see how several online combinatorial decision problems studied in the literature can be recovered under the above framework (e.g. see [16, 17, 12, 14, 8]); in many of these cases, both decomposition and unnormalized relative entropy projection steps in LDOMD can be performed efficiently (in poly(d) time) (e.g. see [14]). As a warm-up, consider the following simple example:\n\nExample 1 (m-sets with element-based losses). Here C contains all size-m subsets of a ground set of d elements: C = {S ⊆ [d] | |S| = m}. On each trial t, one selects a subset St ∈ C and receives a loss vector ℓt ∈ [0, 1]d, with ℓt_i specifying the loss for including element i ∈ [d]; the loss for the subset St is given by Σ_{i∈St} ℓt_i. Here it is natural to define a mapping φ : C→{0, 1}d that maps each S ∈ C to its characteristic vector, defined as φi(S) = 1(i ∈ S) ∀i ∈ [d]; the loss incurred on predicting St ∈ C is then simply φ(St) · ℓt. Thus φ(C) = {x ∈ {0, 1}d | ‖x‖1 = m}, and conv(φ(C)) = {x ∈ [0, 1]d | ‖x‖1 = m}. 
LDOMD with unnormalized negative entropy as above has a regret bound of O( m √( T ln(d/m) ) ). It can be shown that both decomposition and unnormalized relative entropy projection steps take O(d²) time [17, 14].\n\n3.2 LDOMD with General Polytopes\n\nNow consider a general setting where φ : C→Rd, and conv(φ(C)) ⊂ Rd is an arbitrary polytope. Let us assume again ‖φ(c)‖1 ≤ m ∀c ∈ C for some m > 0.\n\nAgain, it is easy to verify that applying Theorems 1 and 2 gives\n\nE[ RT[ Hedge(η∗) ] ] = O( m √( T ln|C| ) ) ;   E[ RT[ FPL(η∗) ] ] = O( m √(T d) ) .\n\nFor the LDOMD algorithm, we consider two cases:\n\nCase 1: φ(C) ⊂ Rd+. Here one can again take K = Rd+ and let F : K→R be the unnormalized negative entropy. In this case, one gets BF(φ(c), x1) ≤ m ln(d) + m ∀c ∈ C if m < d, and BF(φ(c), x1) ≤ m ln(m) + d ∀c ∈ C if m ≥ d. As before, ‖ℓt‖∞ ≤ 1 ∀t, and the restriction of F on conv(φ(C)) is (1/m)-strongly convex w.r.t. ‖ · ‖1, so applying Theorem 3 for appropriate η∗ gives\n\nE[ RT[ LDOMD(η∗) ] ] = O( m √( T ln(d) ) ) if m < d ;   O( m √( T ln(m) ) ) if m ≥ d.\n\nThus, when φ(C) ⊂ Rd+, if ln|C| = ω(max(ln(m), ln(d))) and d = ω(ln(m)), then the LDOMD algorithm with unnormalized negative entropy again gives a better regret bound than both Hedge/Naïve OMD and FPL.\n\nCase 2: φ(C) ⊄ Rd+. If one can efficiently compute bi = min_{c∈C} φi(c) (or more generally, a lower bound bi ≤ min_{c∈C} φi(c)) for each i ∈ [d], then defining φ′i(c) = φi(c) − bi ∀i gives a non-negative vector representation φ′(c) ∈ Rd+ ∀c ∈ C, and one can apply the LDOMD algorithm with unnormalized negative entropy as above to the polytope conv(φ′(C)) ⊂ Rd+; clearly, the resulting regret bound for φ′ also applies to φ. Another possibility is to use LDOMD with the squared L2-norm, where one takes K = Rd and lets F : K→R be defined as F(x) = (1/2)‖x‖2², which leads to an additive update algorithm. In this case, one gets BF(φ(c), x1) = (1/2)‖φ(c) − x1‖2² ≤ 2m² ∀c ∈ C; moreover, ‖ℓt‖2 ≤ √d ∀t, and F is 1-strongly convex w.r.t. ‖ · ‖2. Applying Theorem 3 for appropriate η∗ then gives\n\nE[ RT[ LDOMD(η∗) ] ] = O( m √(T d) ) .\n\nThus LDOMD with squared L2-norm has a similar regret bound as that of Hedge/Naïve OMD and FPL. 
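The two choices of F discussed above translate into simple dual-space update rules for x̃t+1. A minimal NumPy sketch of just this update (our own helper names; the polytope-specific decomposition and Bregman projection steps are omitted, since they depend on conv(φ(C))):

```python
import numpy as np

def entropic_mirror_step(x, loss, eta):
    """Unconstrained LDOMD update for the unnormalized negative entropy
    F(x) = sum_i x_i ln x_i - sum_i x_i.

    Here grad F(x) = ln(x) and grad F*(u) = exp(u), so the dual step
    x_tilde = grad F*(grad F(x) - eta * loss) is a multiplicative update.
    """
    return x * np.exp(-eta * loss)

def euclidean_mirror_step(x, loss, eta):
    """Unconstrained LDOMD update for F(x) = (1/2)||x||_2^2.

    Here grad F and grad F* are both the identity, so the dual step
    reduces to an ordinary additive gradient step.
    """
    return x - eta * loss
```

A full iteration would follow either step with the Bregman projection onto conv(φ(C)) (relative entropy projection in the first case, Euclidean projection in the second).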
In certain cases, it may also be possible to implement LDOMD with other choices of K and F that lead to better regret bounds.\n\nIn the following sections we will consider several examples of applications of LDOMD to online combinatorial decision problems involving both 0-1 polytopes and general polytopes in Rd.\n\n4 Matroid Polytopes\n\nConsider an online decision problem in which the decision space C contains (not necessarily all) independent sets in a matroid M = (E, I). Specifically, on each trial t, one selects an independent set It ∈ C, and receives a loss vector ℓt ∈ [0, 1]|E|, with ℓt_e specifying the loss for including element e ∈ E; the loss for the independent set It is given by Σ_{e∈It} ℓt_e. Here it is natural to define a mapping φ : C→{0, 1}|E| that maps each independent set I ∈ C to its characteristic vector, defined as φe(I) = 1(e ∈ I); the loss on selecting It ∈ C is then φ(It) · ℓt. Thus here d = |E|, and φ(C) ⊆ {0, 1}|E|. A particularly interesting case is obtained by taking C to contain all the maximal independent sets (bases) in I; in this case, the polytope conv(φ(C)) is known as the matroid base polytope of M. This polytope, often denoted as B(M), is also given by\n\nB(M) = { x ∈ R|E| | Σ_{e∈S} xe ≤ rankM(S) ∀S ⊂ E, and Σ_{e∈E} xe = rankM(E) } ,\n\nwhere rankM : 2E→R is the matroid rank function of M defined as\n\nrankM(S) = max{ |I| | I ∈ I, I ⊆ S } ∀S ⊆ E .\n\nWe will see below (Section 5) that both decomposition and unnormalized relative entropy projection steps in this case can be performed efficiently assuming an appropriate oracle.\n\nWe note that Example 1 (m-subsets of a ground set of d elements) can be viewed as a special case of the above setting for the matroid Msub = (E, I) defined by E = [d] and I = {S ⊆ E | |S| ≤ m}; the set C of m-subsets of [d] is then simply the set of bases in I, and conv(φ(C)) = B(Msub). The following is another well-studied example:\n\nExample 2 (Spanning trees with edge-based losses). Here one is given a connected, undirected graph G = ([n], E), and the decision space C is the set of all spanning trees in G. On each trial t, one selects a spanning tree Tt ∈ C and receives a loss vector ℓt ∈ [0, 1]|E|, with ℓt_e specifying the loss for using edge e; the loss for the tree Tt is given by Σ_{e∈Tt} ℓt_e. It is well known that the set of all spanning trees in G is the set of bases in the graphic matroid MG = (E, I), where I contains edge sets of all acyclic subgraphs of G. Therefore here d = |E|, φ(C) is the set of incidence vectors of all spanning trees in G, and conv(φ(C)) = B(MG), also known as the spanning tree polytope. Here LDOMD with unnormalized negative entropy has a regret bound of O( n √( T ln(|E|/(n−1)) ) ).\n\n5 Polytopes Associated with Submodular Functions\n\nNext we consider settings where the decision space C is in one-to-one correspondence with the set of vertices of the base polytope associated with a submodular function, and losses are linear in the corresponding vertex representations of elements in C. This setting was considered recently in [15], and as we shall see, encompasses both of the examples we saw earlier, as well as many others. Let f : 2[n]→R be a submodular function with f(∅) = 0. The base polytope of f is defined as\n\nB(f) = { x ∈ Rn | Σ_{i∈S} xi ≤ f(S) ∀S ⊂ [n], and Σ_{i=1}^n xi = f([n]) } .\n\nLet φ : C→Rn be a bijective mapping from C to the vertices of B(f); thus conv(φ(C)) = B(f).\n\n5.1 Monotone Submodular Functions\n\nIt is known that when f is a monotone submodular function (which means U ⊆ V =⇒ f(U) ≤ f(V)), then B(f) ⊆ Rn+ [4]. Therefore in this case one can take K = Rn+ and F : K→R to be the unnormalized negative entropy. Both decomposition and unnormalized relative entropy projection steps can be performed in time O(n⁶ + n⁵Q), where Q is the time taken by an oracle that given S returns f(S); for cardinality-based submodular functions, for which f(S) = g(|S|) for some g : [n]→R, these steps can be performed in just O(n²) time [15].\n\nRemark on matroid base polytopes and spanning trees. 
We note that the matroid rank function of any matroid M is a monotone submodular function, and that the matroid base polytope B(M) is the same as B(rankM). Therefore Examples 1 and 2 can also be viewed as special cases of the above setting. For the spanning trees of Example 2, the decomposition step of [14] makes use of a linear programming formulation whose exact time complexity is unclear. Instead, one could use the decomposition step associated with the submodular function rankMG, which takes O(|E|⁶) time.\n\nMatroid polytopes are 0-1 polytopes; the example below illustrates a more general polytope:\n\nExample 3 (Permutations with a certain position-based loss). Let C = Sn, the set of all permutations of n objects: C = {σ : [n]→[n] | σ is bijective}. On each trial t, one selects a permutation σt ∈ C and receives a loss vector ℓt ∈ [0, 1]n; the loss of the permutation is given by Σ_{i=1}^n ℓt_i (n − σt(i) + 1). This type of loss arises in scheduling applications, where ℓt_i denotes the time taken to complete the i-th job, and the loss of a job schedule (permutation of jobs) is the total waiting time of all jobs (the waiting time of a job is its own completion time plus the sum of completion times of all jobs scheduled before it) [15]. Here it is natural to define a mapping φ : C→Rn+ that maps σ ∈ C to φ(σ) = (n − σ(1) + 1, . . . , n − σ(n) + 1); the loss on selecting σt ∈ C is then φ(σt) · ℓt. Thus here we have d = n, and φ(C) = {(σ(1), . . . , σ(n)) | σ ∈ Sn}. It is known that the n! vectors in φ(C) are exactly the vertices of the base polytope corresponding to the monotone (cardinality-based) submodular function fperm : 2[n]→R defined as fperm(S) = Σ_{i=1}^{|S|} (n − i + 1). Thus conv(φ(C)) = B(fperm); this is a well-known polytope called the permutahedron [21], and has recently been studied in the context of online learning applications in [18, 15, 1]. Here ‖φ(σ)‖1 = n(n+1)/2 ∀σ ∈ C, and therefore LDOMD with unnormalized negative entropy has a regret bound of O( n² √( T ln(n) ) ). As noted above, decomposition and unnormalized relative entropy projection steps take O(n²) time.\n\n5.2 General Submodular Functions\n\nIn general, when f is non-monotone, B(f) ⊂ Rn can contain vectors with negative entries. Here one can use LDOMD with the squared L2-norm. The Euclidean projection step can again be performed in time O(n⁶ + n⁵Q) in general, where Q is the time taken by an oracle that given S returns f(S), and in O(n²) time for cardinality-based submodular functions [15].\n\n6 Permutation Polytopes\n\nThere has been increasing interest in recent years in online decision problems involving rankings or permutations, largely due to their role in applications such as information retrieval, recommender systems, rank aggregation, etc [12, 18, 19, 15, 1, 2]. Here the decision space is C = Sn, the set of all permutations of n objects: C = {σ : [n]→[n] | σ is bijective}. On each trial t, one predicts a permutation σt ∈ C and receives some type of loss. We saw one special type of loss in Example 3; we now consider any loss that can be represented as a linear function of some vector representation of the permutations in C. Specifically, let d ∈ Z+, and let φ : C→Rd be any injective mapping such that on predicting σt, one receives a loss vector ℓt ∈ [0, 1]d and incurs loss φ(σt) · ℓt. 
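As a concrete instance of such a linear loss, the permutahedron representation φ(σ) = (n − σ(1) + 1, . . . , n − σ(n) + 1) from Example 3 can be evaluated directly; a small illustrative sketch (function names are ours; permutations are given in one-line notation with values 1..n):

```python
import numpy as np

def phi_permutahedron(sigma):
    """Map a permutation sigma of [n] to the vector
    (n - sigma(1) + 1, ..., n - sigma(n) + 1), as in Example 3."""
    n = len(sigma)
    return n - np.asarray(sigma) + 1

def linear_loss(sigma, loss_vec):
    """The induced linear loss phi(sigma) . l = sum_i l_i (n - sigma(i) + 1)."""
    return float(phi_permutahedron(sigma) @ np.asarray(loss_vec, dtype=float))
```

For example, for σ = (2, 3, 1) one gets φ(σ) = (2, 1, 3), so a loss vector ℓ = (1, 0, 1) incurs loss 5.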
For any such mapping \u03c6, the polytope conv(\u03c6(C)) is called a permutation polytope\n[5].3 The permutahedron we saw in Example 3 is one example of a permutation polytope; here\nwe consider various other examples. For any such polytope, if one can perform the decomposition\nand suitable Bregman projection steps ef\ufb01ciently, then one can use the LDOMD algorithm to obtain\ngood regret guarantees with respect to the associated loss.\nExample 4 (Permutations with assignment-based losses). Here on each trial t, one selects a per-\nmutation \u03c3t \u2208 C and receives a loss matrix (cid:96)t \u2208 [0, 1]n\u00d7n, with (cid:96)t\nij specifying the loss for assigning\ni,\u03c3t(i). Here it is natural\nto de\ufb01ne a mapping \u03c6 : C\u2192{0, 1}n\u00d7n that maps each \u03c3 \u2208 C to its associated permutation matrix\nij = 1(\u03c3(i) = j) \u2200i, j \u2208 [n]; the loss incurred on predicting \u03c3t \u2208 C is\nP \u03c3 \u2208 {0, 1}n\u00d7n, de\ufb01ned as P \u03c3\nij. Thus we have here that d = n2, \u03c6(C) = {P \u03c3 \u2208 {0, 1}n\u00d7n | \u03c3 \u2208 Sn},\nand conv(\u03c6(C)) is the well-known Birkhoff polytope containing all doubly stochastic matrices in\n[0, 1]n\u00d7n (also known as the assignment polytope or the perfect matching polytope of the complete\nbipartite graph Kn,n). Here LDOMD with unnormalized negative entropy has a regret bound of\n\nelement i to position j; the loss for the permutation \u03c3t is given by(cid:80)n\nthen(cid:80)n\nO(cid:0)n(cid:112)T ln(n)(cid:1). This recovers exactly the PermELearn algorithm used in [12]; see [12] for ef\ufb01-\ni \u03b3(\u03c3t(i)); the total loss of the permutation \u03c3t is given by(cid:80)n\n\ncient implementations of the decomposition and unnormalized relative entropy projection steps.\nExample 5 (Permutations with general position-based losses). Here on each trial t, one selects\na permutation \u03c3t \u2208 C and receives a loss vector (cid:96)t \u2208 [0, 1]n. 
There is a weight function γ : [n] → R_+ that weights the loss incurred at each position, such that the loss contributed by element i is ℓ^t_i γ(σ^t(i)); the total loss of the permutation σ^t is given by Σ_{i=1}^n ℓ^t_i γ(σ^t(i)). Note that the particular loss considered in Example 3 (and in [15]) is a special case of such a position-based loss, with weight function γ(i) = (n - i + 1). Several other position-dependent losses are used in practice; for example, the discounted cumulative gain (DCG) based loss, which is widely used in information retrieval applications, effectively uses γ(i) = 1 - 1/(log_2(i) + 1) [9]. For a general position-based loss with weight function γ, one can define φ : C → R^n_+ as φ(σ) = (γ(σ(1)), ..., γ(σ(n))). This yields a permutation polytope conv(φ(C)) = conv({(γ(σ(1)), ..., γ(σ(n))) | σ ∈ S_n}) ⊂ R^n_+. Provided one can implement the decomposition and suitable Bregman projection steps efficiently, one can use the LDOMD algorithm to get sublinear regret.

7 Application to an Online Transportation Problem

Consider now the following transportation problem: there are m supply locations for a particular commodity and n demand locations, with a supply vector a ∈ Z^m_+ and demand vector b ∈ Z^n_+ specifying the (integer) quantities of the commodity supplied/demanded by the various locations. Assume Σ_{i=1}^m a_i = Σ_{j=1}^n b_j ≜ q.
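As an aside (our illustration, not part of the paper): when supplies and demands balance in this way, a feasible integer transportation matrix always exists, and the classical northwest-corner rule constructs one greedily:

```python
# A standard construction (not from the paper): given supplies a and demands b
# with sum(a) == sum(b), the northwest-corner rule builds an integer matrix Q
# with row sums a and column sums b, i.e. one feasible transportation matrix.

def northwest_corner(a, b):
    """Return Q with row sums a and column sums b; entries are non-negative integers."""
    a, b = list(a), list(b)          # copies; we consume supply/demand as we go
    m, n = len(a), len(b)
    Q = [[0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        x = min(a[i], b[j])          # ship as much as possible on cell (i, j)
        Q[i][j] = x
        a[i] -= x
        b[j] -= x
        if a[i] == 0:
            i += 1                   # supply i exhausted: move to next row
        else:
            j += 1                   # demand j satisfied: move to next column
    return Q

Q = northwest_corner([3, 2], [1, 2, 2])
# Q has row sums [3, 2] and column sums [1, 2, 2]
```

Each iteration exhausts a row or a column, so the rule fills at most m + n - 1 cells and runs in O(m + n) time.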
In the offline setting, there is a cost matrix ℓ ∈ [0, 1]^{m×n}, with ℓ_{ij} specifying the cost of transporting one unit of the commodity from supply location i to demand location j, and the goal is to decide on a transportation matrix Q ∈ Z^{m×n}_+ that specifies suitable (integer) quantities of the commodity to be transported between the various supply and demand locations so as to minimize the total transportation cost, Σ_{i=1}^m Σ_{j=1}^n Q_{ij} ℓ_{ij}.

Here we consider an online variant of this problem where the supply vector a and demand vector b are viewed as remaining constant over some period of time, while the costs of transporting the commodity between the various supply and demand locations change over time. Specifically, the decision space here is the set of all valid (integer) transportation matrices satisfying the constraints given by a, b:

    C = {Q ∈ Z^{m×n}_+ | Σ_{j=1}^n Q_{ij} = a_i ∀i ∈ [m], Σ_{i=1}^m Q_{ij} = b_j ∀j ∈ [n]}.

On each trial t, one selects a transportation matrix Q^t ∈ C, and receives a cost matrix ℓ^t ∈ [0, 1]^{m×n}; the loss incurred is Σ_{i=1}^m Σ_{j=1}^n Q^t_{ij} ℓ^t_{ij}. A natural mapping here is simply the identity: φ : C → Z^{m×n}_+ with φ(Q) = Q ∀Q ∈ C. Thus we have here d = mn, φ(C) = C, and conv(φ(C)) is the well-known transportation polytope T(a, b) (e.g. see [6]):

    T(a, b) = {X ∈ R^{m×n}_+ | Σ_{j=1}^n X_{ij} = a_i ∀i ∈ [m], Σ_{i=1}^m X_{ij} = b_j ∀j ∈ [n]}.

Transportation polytopes generalize the Birkhoff polytope of doubly stochastic matrices, which can be seen to arise as a special case when m = n and a_i = b_i = 1 ∀i ∈ [n] (see Example 4). While the Birkhoff polytope is a 0-1 polytope, a general transportation polytope clearly includes non-Boolean vertices. Nevertheless, we do have T(a, b) ⊂ R^{m×n}_+, which suggests we can use the LDOMD algorithm with unnormalized negative entropy.

For the decomposition step in LDOMD, one can use an algorithm broadly similar to that used for the Birkhoff polytope in [12]. Specifically, given a matrix X ∈ conv(φ(C)) = T(a, b), one successively subtracts off multiples of extreme points Q^k ∈ C from X until one is left with a zero matrix (see Figure 3).

Algorithm: Decomposition Step for Transportation Polytopes
Input: X ∈ T(a, b)   (where a ∈ Z^m_+, b ∈ Z^n_+)
Initialize: A^1 ← X; k ← 0
Repeat:
  - k ← k + 1
  - Find an extreme point Q^k ∈ T(a, b) such that A^k_{ij} = 0 ⟹ Q^k_{ij} = 0 (see Appendix B)
  - α^k ← min_{(i,j): Q^k_{ij} > 0} (A^k_{ij} / Q^k_{ij})
  - A^{k+1} ← A^k - α^k Q^k
Until all entries of A^{k+1} are zero
Output: Decomposition of X as a convex combination of extreme points Q^1, ..., Q^k:
    X = Σ_{r=1}^k α^r Q^r   (it can be verified that α^r ∈ (0, 1] ∀r and Σ_{r=1}^k α^r = 1)

Figure 3: Decomposition step in applying LDOMD to transportation polytopes.

^3 The term 'permutation polytope' is sometimes used to refer to various polytopes obtained through specific mappings φ : S_n → R^d; here we use the term in a broad sense for any such polytope, following the terminology of Bowman [5]. (Note that the description Bowman [5] gives of a particular 0-1 permutation polytope in R^{n(n-1)}, known as the binary choice polytope or the linear ordering polytope [20], is actually incorrect; e.g. see [11].)
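For intuition, the loop in Figure 3 can be sketched in code for the special case of the Birkhoff polytope (m = n, a_i = b_j = 1), where the extreme points are permutation matrices and the support condition is met by finding a perfect matching on the support graph of A^k, as in [12]. This is our illustrative sketch; the general transportation case would replace the matching step with the spanning-forest construction of Appendix B.

```python
# Sketch (ours, for intuition) of the decomposition loop of Figure 3,
# specialized to the Birkhoff polytope (m = n, a_i = b_j = 1): the extreme
# points are permutation matrices, and the support condition
# "A^k_ij = 0 => Q^k_ij = 0" is met by a perfect matching in the support
# graph of A^k, as in [12].

def perfect_matching_on_support(A, tol=1e-12):
    """Kuhn's augmenting-path algorithm; returns sigma with sigma[i] = column matched to row i."""
    n = len(A)
    match_col = [-1] * n                    # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if A[i][j] > tol and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            raise ValueError("support graph has no perfect matching")
    sigma = [-1] * n
    for j, i in enumerate(match_col):
        sigma[i] = j
    return sigma

def decompose_doubly_stochastic(X, tol=1e-9):
    """Return [(alpha_r, sigma_r)] with X = sum_r alpha_r * P_{sigma_r}."""
    A = [row[:] for row in X]                                # A^1 <- X
    parts = []
    while max(max(row) for row in A) > tol:
        sigma = perfect_matching_on_support(A)               # extreme point Q^k
        alpha = min(A[i][sigma[i]] for i in range(len(A)))   # alpha^k
        parts.append((alpha, sigma))
        for i in range(len(A)):
            A[i][sigma[i]] -= alpha                          # A^{k+1} <- A^k - alpha^k Q^k
    return parts

X = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
parts = decompose_doubly_stochastic(X)
```

On this 3×3 example the loop terminates after two matchings, recovering X as an equal mixture of two permutation matrices; in general, each subtraction zeroes at least one positive entry of A^k, so the loop terminates after finitely many steps.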
However, a key step of this algorithm is to find a suitable extreme point to subtract off on each iteration. In the case of the Birkhoff polytope, this involved finding a suitable permutation matrix, and was achieved by finding a perfect matching in a suitable bipartite graph. For general transportation polytopes, we make use of a characterization of extreme points in terms of spanning forests in a suitable bipartite graph (see Appendix B for details). The overall decomposition results in a convex combination of at most mn extreme points in C, and takes O(m^3 n^3) time.

The unnormalized relative entropy projection step can be performed efficiently by using a procedure similar to the Sinkhorn balancing used for the Birkhoff polytope in [12]. Specifically, given a non-negative matrix X̃ ∈ R^{m×n}_+, one alternately scales the rows and columns to match the desired row and column sums until some convergence criterion is met. As with Sinkhorn balancing, this results in an approximate projection step, but does not hurt the overall regret analysis (other than a constant additive term), yielding a regret bound of O(q √(T ln(max(mn, q)))).

8 Conclusion

We have considered a general form of online combinatorial decision problems, where costs can be linear in any suitable low-dimensional vector representation of elements of the decision space, and have given a general algorithm termed low-dimensional online mirror descent (LDOMD) for such problems. Our study emphasizes the role of the convex polytope arising from the vector representation of the decision space; this both yields a unification and generalization of previous algorithms, and gives a general framework that can be used to design new algorithms for specific applications.

Acknowledgments. Thanks to the anonymous reviewers for helpful comments, and to Chandrashekar Lakshminarayanan for helpful discussions.
AR is supported by a Microsoft Research India PhD Fellowship. SA thanks DST and the Indo-US Science & Technology Forum for their support.

References

[1] Nir Ailon. Bandit online optimization over the permutahedron. CoRR, abs/1312.1530, 2013.
[2] Nir Ailon. Online ranking: Discrete choice, Spearman correlation and other feedback. CoRR, abs/1308.6797, 2013.
[3] Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31-45, 2014.
[4] Francis Bach. Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145-373, 2013.
[5] V. J. Bowman. Permutation polyhedra. SIAM Journal on Applied Mathematics, 22(4):580-589, 1972.
[6] Richard A. Brualdi. Combinatorial Matrix Classes. Cambridge University Press, 2006.
[7] Sébastien Bubeck. Introduction to online optimization. Lecture Notes, Princeton University, 2011.
[8] Nicolò Cesa-Bianchi and Gábor Lugosi. Combinatorial bandits. Journal of Computer and System Sciences, 78(5):1404-1422, 2012.
[9] David Cossock and Tong Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140-5154, 2008.
[10] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
[11] M. Grötschel, M. Jünger, and G. Reinelt. Facets of the linear ordering polytope. Mathematical Programming, 33:43-60, 1985.
[12] David P. Helmbold and Manfred K. Warmuth. Learning permutations with exponential weights. Journal of Machine Learning Research, 10:1705-1736, 2009.
[13] Adam Tauman Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291-307, 2005.
[14] Wouter M. Koolen, Manfred K. Warmuth, and Jyrki Kivinen. Hedging structured concepts. In COLT, 2010.
[15] Daiki Suehiro, Kohei Hatano, Shuji Kijima, Eiji Takimoto, and Kiyohito Nagano. Online prediction under submodular constraints. In ALT, 2012.
[16] Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. Journal of Machine Learning Research, 4:773-818, 2003.
[17] Manfred K. Warmuth and Dima Kuzmin. Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research, 9:2287-2320, 2008.
[18] Shota Yasutake, Kohei Hatano, Shuji Kijima, Eiji Takimoto, and Masayuki Takeda. Online linear optimization over permutations. In ISAAC, pages 534-543, 2011.
[19] Shota Yasutake, Kohei Hatano, Eiji Takimoto, and Masayuki Takeda. Online rank aggregation. In ACML, 2012.
[20] Jun Zhang. Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron. Journal of Mathematical Psychology, 48:107-134, 2004.
[21] Günter M. Ziegler. Lectures on Polytopes. Springer, 1995.