{"title": "Online Learning with a Hint", "book": "Advances in Neural Information Processing Systems", "page_first": 5299, "page_last": 5308, "abstract": "We study a variant of online linear optimization where the player receives a hint about the loss function at the beginning of each round. The hint is given in the form of a vector that is weakly correlated with the loss vector on that round. We show that the player can benefit from such a hint if the set of feasible actions is sufficiently round. Specifically, if the set is strongly convex, the hint can be used to guarantee a regret of O(log(T)), and if the set is q-uniformly convex for q\\in(2,3), the hint can be used to guarantee a regret of o(sqrt{T}). In contrast, we establish Omega(sqrt{T}) lower bounds on regret when the set of feasible actions is a polyhedron.", "full_text": "Online Learning with a Hint\n\nOfer Dekel\n\nMicrosoft Research\n\noferd@microsoft.com\n\nNika Haghtalab\n\nComputer Science Department\nCarnegie Mellon University\n\nnika@cmu.edu\n\nArthur Flajolet\n\nOperations Research Center\n\nMassachusetts Institute of Technology\n\nflajolet@mit.edu\n\nPatrick Jaillet\n\nEECS, LIDS, ORC\n\nMassachusetts Institute of Technology\n\njaillet@mit.edu\n\nAbstract\n\nWe study a variant of online linear optimization where the player receives a hint\nabout the loss function at the beginning of each round. The hint is given in the\nform of a vector that is weakly correlated with the loss vector on that round. We\nshow that the player can bene\ufb01t from such a hint if the set of feasible actions is\nsuf\ufb01ciently round. Speci\ufb01cally, if the set is strongly convex, the hint can be used to\nguarantee a regret of O(log(T )), and if the set is q-uniformly convex for q \u2208 (2, 3),\nthe hint can be used to guarantee a regret of o(\u221aT ). 
In contrast, we establish\n\u2126(\u221aT ) lower bounds on regret when the set of feasible actions is a polyhedron.\n\n1\n\nIntroduction\n\nOnline linear optimization is a canonical problem in online learning. In this setting, a player attempts\nto minimize an online adversarial sequence of loss functions while incurring a small regret, compared\nto the best of\ufb02ine solution. Many online algorithms exist that are designed to have a regret of O(\u221aT )\nin the worst case and it has been known that one cannot avoid a regret of \u2126(\u221aT ) in the worst case.\nWhile this worst-case perspective on online linear optimization has lead to elegant algorithms and\ndeep connections to other \ufb01elds, such as boosting [9, 10] and game theory [4, 2], it can be overly\npessimistic. In particular, it does not account for the fact that the player may have side-information\nthat allows him to anticipate the upcoming loss functions and evade the \u2126(\u221aT ) regret. In this\nwork, we go beyond this worst case analysis and consider online linear optimization when additional\ninformation in the form of a function that is correlated with the loss is presented to the player.\nMore formally, online convex optimization [24, 11] is a T -round repeated game between a player and\nan adversary. On each round, the player chooses an action xt from a convex set of feasible actions\nK \u2286 Rd and the adversary chooses a convex bounded loss function ft. Both choices are revealed and\nthe player incurs a loss of ft(xt). The player then uses its knowledge of ft to adjust its strategy for\nthe subsequent rounds. The player\u2019s goal is to accumulate a small loss compared to the best \ufb01xed\naction in hindsight. This value is called regret and is a measure of success of the player\u2019s algorithm.\nWhen the adversary is restricted to Lipschitz loss functions, several algorithms are known to guarantee\nO(\u221aT ) regret [24, 16, 11]. 
If we further restrict the adversary to strongly convex loss functions, the regret bound improves to O(log(T)) [14]. However, when the loss functions are linear, no online algorithm can have a regret of o(√T) [5]. In this sense, linear loss functions are the most difficult convex loss functions to handle [24].

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

In this paper, we focus on the case where the adversary is restricted to linear Lipschitz loss functions. More specifically, we assume that the loss function f_t(x) takes the form c_t^T x, where c_t is a bounded loss vector in C ⊆ R^d. We further assume that the player receives a hint before choosing the action on each round. The hint in our setting is a vector that is guaranteed to be weakly correlated with the loss vector. Namely, at the beginning of round t, the player observes a unit-length vector v_t ∈ R^d such that v_t^T c_t ≥ α‖c_t‖₂, where α is a small positive constant. So long as this requirement is met, the hint may be chosen maliciously, possibly by an adversary who knows how the player's algorithm uses the hint. Our goal is to develop a player strategy that takes these hints into account, and to understand when hints of this type make the problem provably easier and lead to smaller regret.

We show that the player's ability to benefit from the hints depends on the geometry of the player's action set K. Specifically, we characterize the roundness of the set K using the notion of (C, q)-uniform convexity for convex sets. In Section 3, we show that if K is a (C, 2)-uniformly convex set (in other words, if K is a C-strongly convex set), then we can use the hint to design a player strategy that improves the regret guarantee to O((Cα)^{-1} log(T)), where our O(·) notation hides a polynomial dependence on the dimension d and other constants. Furthermore, as we show in Section 4, if K is a (C, q)-uniformly convex set for q ∈ (2, 3), we can use the hint to improve the regret to O((Cα)^{1/(1-q)} T^{(2-q)/(1-q)}), when the hint belongs to a small set of possible hints at every step.

In Section 5, we prove lower bounds on the regret of any online algorithm in this model. We first show that when K is a polyhedron, such as an L1 ball, even a stronger form of hint cannot guarantee a regret of o(√T). Next, we prove a lower bound of Ω(log(T)) regret when K is strongly convex.

1.1 Comparison with Other Notions of Hints

The notion of hint that we introduce in this work generalizes several notions of predictability in online learning. Hazan and Megiddo [13] considered, as an example, a setting where the player knows the first coordinate of the loss vector at all rounds, and showed that when |c_{t,1}| ≥ α and the set of feasible actions is the Euclidean ball, one can achieve a regret of O(1/α · log(T)). Our work directly improves over this result: in our setting a hint v_t = ±e_1 also achieves O(1/α · log(T)) regret, but we can handle hints in different directions at different rounds and we allow for general uniformly convex action sets. Rakhlin and Sridharan [20] considered online learning with predictable sequences, with a notion of predictability that concerns the gradient of the convex loss functions.
They show that if the player receives a hint M_t at round t, then the regret of the algorithm is at most O(\sqrt{\sum_{t=1}^T \|\nabla f_t(x_t) - M_t\|_*^2}). In the case of linear loss functions, this implies that having an estimate vector c'_t of the loss vector within distance σ of the true loss vector c_t results in an improved regret bound of O(σ√T). In contrast, we consider a notion of hint that pertains to the direction of the loss vector rather than its location. Our work shows that merely knowing whether the loss vector positively or negatively correlates with another vector is sufficient to achieve an improved regret bound, when the set is uniformly convex. That is, rather than having access to an approximate value of c_t, we only need access to a halfspace that classifies c_t correctly with a margin.

[Figure 1: Comparison between notions of hint.]

This notion of hint is weaker than the notion of hint in the work of Rakhlin and Sridharan [20] when the approximation error satisfies ‖c_t − c'_t‖₂ ≤ σ·‖c_t‖₂ for σ ∈ [0, 1). In this case one can use c'_t/‖c'_t‖₂ as the direction of the hint in our setting and achieve a regret of O(1/(1−σ) · log T) when the set is strongly convex. This shows that when the set of feasible actions is strongly convex, a directional hint can improve the regret bound beyond what was known to be achievable with an approximation hint. However, we note that our results require the hints to always be valid, whereas the algorithm of Rakhlin and Sridharan [19] can adapt to the quality of the hints.

We discuss these works and other related works, such as [15], in more detail in Appendix A.

2 Preliminaries

We begin with a more formal definition of online linear optimization (without hints). Let A denote the player's algorithm for choosing its actions.
On round t, the player uses A and all of the information it has observed so far to choose an action x_t in a convex compact set K ⊆ R^d. Subsequently, the adversary chooses a loss vector c_t in a compact set C ⊆ R^d. The player and the adversary reveal their actions and the player incurs the loss c_t^T x_t. The player's regret is defined as

R(A, c_{1:T}) = \sum_{t=1}^T c_t^T x_t − \min_{x ∈ K} \sum_{t=1}^T c_t^T x.

In online linear optimization with hints, the player observes v_t ∈ R^d with ‖v_t‖₂ = 1 before choosing x_t, with the guarantee that v_t^T c_t ≥ α‖c_t‖₂ for some α > 0.

We use uniform convexity to characterize the degree of convexity of the player's action set K. Informally, uniform convexity requires that the convex combination of any two points x and y on the boundary of K be sufficiently far from the boundary. A formal definition is given below.

Definition 2.1 (Pisier [18]). Let K be a convex set that is symmetric around the origin. K and the Banach space defined by K are said to be uniformly convex if for any 0 < ε < 2 there exists a δ > 0 such that for any pair of points x, y ∈ K with ‖x‖_K ≤ 1, ‖y‖_K ≤ 1, ‖x − y‖_K ≥ ε, we have ‖(x + y)/2‖_K ≤ 1 − δ. The modulus of uniform convexity δ_K(ε) is the best possible δ for that ε, i.e.,

δ_K(ε) = inf { 1 − ‖(x + y)/2‖_K : ‖x‖_K ≤ 1, ‖y‖_K ≤ 1, ‖x − y‖_K ≥ ε }.

For brevity, we say that K is (C, q)-uniformly convex when δ_K(ε) = Cε^q, and we omit C when it is clear from the context.

Examples of uniformly convex sets include Lp balls for any 1 < p < ∞, with modulus of convexity δ_{Lp}(ε) = C_p ε^p for p ≥ 2 (for a constant C_p) and δ_{Lp}(ε) = (p − 1)ε² for 1 < p ≤ 2. On the other hand, the L1 and L∞ unit balls are not uniformly convex. When δ_K(ε) ∈ Θ(ε²), we say that K is strongly convex.

Another notion of convexity we use in this work is exp-concavity. A function f : K → R is exp-concave with parameter β > 0 if exp(−βf(x)) is a concave function of x ∈ K. This is a weaker requirement than strong convexity when the gradient of f is uniformly bounded [14]. The next proposition shows that one can obtain regret bounds of order Θ(log(T)) in online convex optimization when the loss functions are exp-concave with parameter β.

Proposition 2.2 ([14]). Consider online convex optimization on a sequence of loss functions f_1, ..., f_T over a feasible set K ⊆ R^d such that, for all t, f_t : K → R is exp-concave with parameter β > 0. There is an algorithm with runtime polynomial in d, which we call AEXP, whose regret is at most (d/β)(1 + log(T + 1)).

Throughout this work, we draw intuition from basic orthogonal geometry.
Given any vector x and a hint v, we define x^∥v = (x^T v)v and x^⊥v = x − (x^T v)v as the parallel and orthogonal components of x with respect to v. When the hint v is clear from the context, we simply use x^∥ and x^⊥ to denote these vectors.

Naturally, our regret bounds involve a number of geometric parameters. Since the set of actions of the adversary C is compact, we can find G ≥ 0 such that ‖c‖₂ ≤ G for all c ∈ C. When K is uniformly convex, we denote K = {w ∈ R^d | ‖w‖_K ≤ 1}. In this case, since all norms are equivalent in finite dimension, there exist R > 0 and r > 0 such that B_r ⊆ K ⊆ B_R, where B_r (resp. B_R) denotes the L2 ball centered at 0 with radius r (resp. R). This implies that (1/R)‖·‖₂ ≤ ‖·‖_K ≤ (1/r)‖·‖₂.

3 Improved Regret Bounds for Strongly Convex K

At first sight, it is not immediately clear how one should use the hint. Since v_t is guaranteed to satisfy c_t^T v_t ≥ α‖c_t‖₂, moving the action x in the direction −v_t always decreases the loss. One could hope to get the most benefit out of the hint by choosing x_t to be the extremal point of K in the direction −v_t. However, this naive strategy can lead to linear regret in the worst case. For example, say that c_t = (1, 1/2) and v_t = (0, 1) for all t, and let K be the Euclidean unit ball. Choosing x_t = −v_t would incur a loss of −T/2, while the best fixed action in hindsight, the point (−2/√5, −1/√5), would incur a loss of −(√5/2)T. The player's regret would therefore be ((√5 − 1)/2)T.

Intuitively, the flaw of this naive strategy is that the hint does not give the player any information about the (d − 1)-dimensional subspace orthogonal to v_t.
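The two-dimensional example above can be checked numerically. The following small sketch (our own illustration; all variable names are ours) computes the per-round regret of the naive strategy against the best fixed action in hindsight:

```python
import math

# The paper's example: K is the Euclidean unit ball in R^2,
# c_t = (1, 1/2) and hint v_t = (0, 1) on every round. The naive
# strategy plays x_t = -v_t; the best fixed action in hindsight is
# -c/||c||_2 (the extremal point of the ball in direction -c).
c = (1.0, 0.5)
v = (0.0, 1.0)

naive_loss_per_round = -(c[0] * v[0] + c[1] * v[1])            # c^T(-v) = -1/2
c_norm = math.sqrt(c[0] ** 2 + c[1] ** 2)                      # sqrt(5)/2
best_fixed = (-c[0] / c_norm, -c[1] / c_norm)                  # (-2/sqrt(5), -1/sqrt(5))
best_loss_per_round = c[0] * best_fixed[0] + c[1] * best_fixed[1]  # -sqrt(5)/2

# Per-round regret of the naive strategy: (sqrt(5) - 1)/2 ~ 0.618,
# so its total regret grows linearly in T.
regret_per_round = naive_loss_per_round - best_loss_per_round
```
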
Our solution is to use standard online learning machinery to learn how to act in this orthogonal subspace. Specifically, on round t, we use v_t to define the following virtual loss function:

ĉ_t(x) = min_{w ∈ K} c_t^T w   s.t.   w^⊥v_t = x^⊥v_t.

In words, we consider the 1-dimensional subspace spanned by v_t and its (d − 1)-dimensional orthogonal subspace separately. For any action x ∈ K, we find another point w ∈ K that equals x in the (d − 1)-dimensional orthogonal subspace but otherwise incurs the optimal loss. The value of the virtual loss ĉ_t(x) is defined to be the value of the original loss function c_t at w. The virtual loss simulates the process of moving x as far as possible in the direction −v_t without changing its value in any other direction (see Figure 2). This can be seen more formally from the following equation:

argmin_{w ∈ K : w^⊥ = x̂^⊥} c_t^T w = argmin_{w ∈ K : w^⊥ = x̂^⊥} ((c_t^⊥)^T x̂^⊥ + (c_t^∥)^T w^∥) = argmin_{w ∈ K : w^⊥ = x̂^⊥} v_t^T w,   (1)

where the last transition holds because c_t^∥ = (c_t^T v_t)v_t is a positive multiple of v_t, since the hint is valid.

[Figure 2: Virtual function ĉ(·).]

This provides an intuitive understanding of a measure of convexity of our virtual loss functions. When K is uniformly convex, the function ĉ_t(·) exhibits convexity in the subspace orthogonal to v_t. To see this, note that for any x and y that lie in the subspace orthogonal to v_t, their midpoint (x + y)/2 transforms to a point that is farther away in the direction −v_t than the midpoint of the transformations of x and y. As shown in Figure 3, the modulus of uniform convexity of K affects the degree of convexity of ĉ_t(·). We note, however, that ĉ_t(·) is not strongly convex in all directions. In fact, ĉ_t(·) is constant in the direction of v_t. Nevertheless, the properties shown here allude to the fact that ĉ_t(·) exhibits some notion of convexity. As we show in the next lemma, this notion is indeed exp-concavity.

[Figure 3: Uniform convexity of the feasible set affects the convexity of the virtual loss function.]

Lemma 3.1. If K is (C, 2)-uniformly convex, then ĉ_t(·) is (8αCr/(GR²))-exp-concave.

Proof. Let γ = 8αCr/(GR²). Without loss of generality, we assume that c_t ≠ 0; otherwise ĉ_t(·) = 0 is a constant function and the proof follows immediately. Based on the above discussion, it is not hard to see that ĉ_t(·) is continuous (we prove this in more detail in Appendix D.1). So, to prove that ĉ_t(·) is exp-concave, it is sufficient to show that

exp(−γ · ĉ_t((x + y)/2)) ≥ (1/2) exp(−γ · ĉ_t(x)) + (1/2) exp(−γ · ĉ_t(y))   ∀(x, y) ∈ K.

Consider (x, y) ∈ K and choose corresponding (x̂, ŷ) ∈ K such that ĉ_t(x) = c_t^T x̂ and ĉ_t(y) = c_t^T ŷ. Without loss of generality, we have ‖x̂‖_K = ‖ŷ‖_K = 1, as we can always choose corresponding x̂, ŷ that are extreme points of K. Since exp(−γĉ_t(·)) is decreasing in ĉ_t(·), we have

exp(−γ · ĉ_t((x + y)/2)) = max_{‖w‖_K ≤ 1, w^⊥v_t = ((x+y)/2)^⊥v_t} exp(−γ · c_t^T w).   (2)

Note that w = (x̂ + ŷ)/2 − δ_K(‖x̂ − ŷ‖_K) · v_t/‖v_t‖_K satisfies ‖w‖_K ≤ 1, since ‖w‖_K ≤ ‖(x̂ + ŷ)/2‖_K + δ_K(‖x̂ − ŷ‖_K) ≤ 1 (see also Figure 3). Moreover, w^⊥v_t = ((x + y)/2)^⊥v_t. So, by using this w in Equation (2), we have

exp(−γ · ĉ_t((x + y)/2)) ≥ exp(−(γ/2)(c_t^T x̂ + c_t^T ŷ) + γ · (c_t^T v_t/‖v_t‖_K) · δ_K(‖x̂ − ŷ‖_K)).   (3)

On the other hand, since ‖v_t‖_K ≤ (1/r)‖v_t‖₂ = 1/r and ‖x̂ − ŷ‖_K ≥ (1/R)‖x̂ − ŷ‖₂, we have

exp(γ · (c_t^T v_t/‖v_t‖_K) · δ_K(‖x̂ − ŷ‖_K)) ≥ exp(γ · r · α · ‖c_t‖₂ · C · (1/R²) · ‖x̂ − ŷ‖₂²)
  ≥ exp(γ · (αCr/R²) · (c_t^T x̂ − c_t^T ŷ)²/‖c_t‖₂)
  ≥ exp((γ/2)² · (c_t^T x̂ − c_t^T ŷ)²/2)
  ≥ (1/2) exp((γ/2)(c_t^T x̂ − c_t^T ŷ)) + (1/2) exp((γ/2)(c_t^T ŷ − c_t^T x̂)),

where the penultimate inequality follows from the definition of γ together with ‖c_t‖₂ ≤ G, and the last inequality is a consequence of the inequality exp(z²/2) ≥ (1/2)exp(z) + (1/2)exp(−z) for all z ∈ R. Plugging the last inequality into (3) yields

exp(−γ · ĉ_t((x + y)/2)) ≥ (1/2) exp(−(γ/2)(c_t^T x̂ + c_t^T ŷ)) · {exp((γ/2)(c_t^T x̂ − c_t^T ŷ)) + exp((γ/2)(c_t^T ŷ − c_t^T x̂))}
  = (1/2) exp(−γ · c_t^T ŷ) + (1/2) exp(−γ · c_t^T x̂)
  = (1/2) exp(−γ · ĉ_t(y)) + (1/2) exp(−γ · ĉ_t(x)),

which concludes the proof.

Now, we use the sequence of virtual loss functions to reduce our problem to a standard online convex optimization problem (without hints). Namely, the player applies AEXP (from Proposition 2.2), an online convex optimization algorithm known to have O(log(T)) regret with respect to exp-concave functions, to the sequence of virtual loss functions. Our algorithm then takes the action x̂_t ∈ K that is prescribed by AEXP and moves it as far as possible in the direction −v_t. This process is formalized in Algorithm 1.

Algorithm 1 (A_hint, for strongly convex K). For t = 1, ..., T:
1. Use algorithm AEXP with the history ĉ_τ(·) for τ < t, and let x̂_t be the chosen action.
2. Let x_t = argmin_{w ∈ K} v_t^T w s.t. w^⊥v_t = x̂_t^⊥v_t. Play x_t and receive c_t as feedback.

Next, we show that the regret of algorithm AEXP on the sequence of virtual loss functions is an upper bound on the regret of Algorithm 1.

Lemma 3.2. For any sequence of loss functions c_1, ..., c_T, let R(A_hint, c_{1:T}) be the regret of algorithm A_hint on the sequence c_1, ..., c_T, and R(AEXP, ĉ_{1:T}) be the regret of algorithm AEXP on the sequence of virtual loss functions ĉ_1, ..., ĉ_T. Then, R(A_hint, c_{1:T}) ≤ R(AEXP, ĉ_{1:T}).

Proof. Equation (1) provides an equivalent definition x_t = argmin_{w ∈ K} c_t^T w s.t. w^⊥v_t = x̂_t^⊥v_t. Using this, we show that the loss of algorithm A_hint on the sequence c_{1:T} is the same as the loss of algorithm AEXP on the sequence ĉ_{1:T}:

\sum_{t=1}^T ĉ_t(x̂_t) = \sum_{t=1}^T min_{w ∈ K : w^⊥ = x̂_t^⊥} c_t^T w = \sum_{t=1}^T c_t^T (argmin_{w ∈ K : w^⊥ = x̂_t^⊥} c_t^T w) = \sum_{t=1}^T c_t^T x_t.

Next, we show that the offline optimum on the sequence ĉ_{1:T} is more competitive than the offline optimum on the sequence c_{1:T}. First note that for any x and t, ĉ_t(x) = min_{w ∈ K : w^⊥ = x^⊥} c_t^T w ≤ c_t^T x. Therefore, min_{x ∈ K} \sum_{t=1}^T ĉ_t(x) ≤ min_{x ∈ K} \sum_{t=1}^T c_t^T x. The proof concludes by

R(A_hint, c_{1:T}) = \sum_{t=1}^T c_t^T x_t − min_{x ∈ K} \sum_{t=1}^T c_t^T x ≤ \sum_{t=1}^T ĉ_t(x̂_t) − min_{x ∈ K} \sum_{t=1}^T ĉ_t(x) = R(AEXP, ĉ_{1:T}).

Our main result follows from the application of Lemmas 3.1 and 3.2.

Theorem 3.3. Suppose that K ⊆ R^d is a (C, 2)-uniformly convex set that is symmetric around the origin, with B_r ⊆ K ⊆ B_R for some r and R. Consider online linear optimization with hints where the cost vector at round t satisfies ‖c_t‖₂ ≤ G and the hint v_t is such that c_t^T v_t ≥ α‖c_t‖₂, with ‖v_t‖₂ = 1. Algorithm 1 in combination with AEXP has a worst-case regret of

R(A_hint, c_{1:T}) ≤ (d·G·R²)/(8α·C·r) · (1 + log(T + 1)).

Since AEXP requires the coefficient of exp-concavity to be given as an input, α needs to be known a priori in order to use Algorithm 1.
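Step 2 of Algorithm 1 has a closed form when K is the Euclidean unit ball: keeping the component orthogonal to the hint fixed, the remaining budget of the norm constraint is spent entirely in the direction −v_t. The sketch below (our own illustration; it omits the AEXP subroutine, and all names are ours) computes that step and the induced virtual loss:

```python
import math

def project_along_hint(x_hat, v):
    """Step 2 of Algorithm 1 specialized to the Euclidean unit ball:
    keep the component of x_hat orthogonal to the unit hint v, and move
    as far as possible in the direction -v while staying in the ball."""
    dot = sum(a * b for a, b in zip(x_hat, v))
    x_perp = [a - dot * b for a, b in zip(x_hat, v)]      # x_hat minus its v-component
    slack = math.sqrt(max(0.0, 1.0 - sum(a * a for a in x_perp)))
    return [a - slack * b for a, b in zip(x_perp, v)]     # x_perp - sqrt(1-||x_perp||^2) v

def virtual_loss(c, x, v):
    """hat{c}_t(x) = c^T w for the w above; by construction it never
    exceeds the true loss c^T x."""
    w = project_along_hint(x, v)
    return sum(a * b for a, b in zip(c, w))

x_hat = [0.6, 0.0]
v = [0.0, 1.0]
c = [1.0, 0.5]
w = project_along_hint(x_hat, v)   # the orthogonal part (0.6) is kept,
                                   # the rest of the budget goes to -v
```
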
However, we can use a standard doubling trick to relax this requirement and derive the same asymptotic regret bound. We defer the presentation of this argument to Appendix B.

4 Improved Regret Bounds for (C, q)-Uniformly Convex K

In this section, we consider any feasible set K that is (C, q)-uniformly convex for q ≥ 2. Our results differ from those of the previous section in two respects. First, our algorithm can be used with (C, q)-uniformly convex feasible sets for any q ≥ 2, whereas the results of the previous section only hold for strongly convex sets (q = 2). On the other hand, the approach in this section requires the hints to be restricted to a finite set of vectors V. We show that when K is (C, q)-uniformly convex for q > 2, our regret is O(T^{(2−q)/(1−q)}). If q ∈ (2, 3), this is an improvement over the worst-case regret of O(√T) guaranteed in the absence of hints.

We first consider the scenario where the hint always points in the same direction, i.e., v_t = v for some v and all t ∈ [T]. In this case, we show how one can use a simple algorithm that picks the best-performing action so far (a.k.a. the Follow-The-Leader algorithm) to obtain improved regret bounds. We then consider the case where the hint belongs to a finite set V. In this case, we instantiate one copy of the Follow-The-Leader algorithm for each v ∈ V and combine their outcomes to obtain improved regret bounds that depend on the cardinality of V, which we denote by |V|.

Lemma 4.1. Suppose that v_t = v for all t = 1, ..., T and that K is (C, q)-uniformly convex, symmetric around the origin, with B_r ⊆ K ⊆ B_R for some r and R. Consider the algorithm, called Follow-The-Leader (FTL), that at every round t plays x_t ∈ argmin_{x ∈ K} \sum_{τ=1}^{t−1} c_τ^T x. If \sum_{τ=1}^t c_τ^T v ≥ 0 for all t = 1, ..., T, then the regret is bounded as follows:

R(A_FTL, c_{1:T}) ≤ \sum_{t=1}^T ( (‖v‖_K · R^q)/(2C) · ‖c_t‖₂^q / \sum_{τ=1}^t c_τ^T v )^{1/(q−1)}.

Furthermore, when v is a valid hint with margin α, i.e., c_t^T v ≥ α·‖c_t‖₂ for all t = 1, ..., T, the right-hand side can be further simplified to obtain the regret bounds:

R(A_FTL, c_{1:T}) ≤ (‖v‖_K · R²)/(2C·α) · G · (ln(T) + 1)   if q = 2,
R(A_FTL, c_{1:T}) ≤ ((‖v‖_K · R^q)/(2C·α))^{1/(q−1)} · G · (q−1)/(q−2) · T^{(q−2)/(q−1)}   if q > 2.

Proof. We use a well-known inequality, known as the FT(R)L Lemma (see e.g., [12, 17]), on the regret incurred by the FTL algorithm:

R(A_FTL, c_{1:T}) ≤ \sum_{t=1}^T c_t^T (x_t − x_{t+1}).

Without loss of generality, we can assume that ‖x_t‖_K = ‖x_{t+1}‖_K = 1, since the optimum of a linear function is attained at a boundary point. Since K is (C, q)-uniformly convex, we have

‖(x_t + x_{t+1})/2‖_K ≤ 1 − δ_K(‖x_t − x_{t+1}‖_K).

This implies that the point (x_t + x_{t+1})/2 − δ_K(‖x_t − x_{t+1}‖_K) · v/‖v‖_K belongs to K. Moreover, x_{t+1} ∈ argmin_{x ∈ K} x^T \sum_{τ=1}^t c_τ. So, we have

(\sum_{τ=1}^t c_τ)^T ((x_t + x_{t+1})/2) − δ_K(‖x_t − x_{t+1}‖_K) · (\sum_{τ=1}^t v^T c_τ)/‖v‖_K ≥ inf_{x ∈ K} x^T \sum_{τ=1}^t c_τ = x_{t+1}^T \sum_{τ=1}^t c_τ.

Rearranging this last inequality and using the fact that \sum_{τ=1}^t v^T c_τ ≥ 0 together with δ_K(ε) = Cε^q and ‖·‖_K ≥ (1/R)‖·‖₂, we obtain:

(\sum_{τ=1}^t c_τ)^T ((x_t − x_{t+1})/2) ≥ δ_K(‖x_t − x_{t+1}‖_K) · (\sum_{τ=1}^t v^T c_τ)/‖v‖_K ≥ (C · ‖x_t − x_{t+1}‖₂^q)/(‖v‖_K · R^q) · \sum_{τ=1}^t v^T c_τ.

By definition of FTL, we have x_t ∈ argmin_{x ∈ K} x^T \sum_{τ=1}^{t−1} c_τ, which implies:

(\sum_{τ=1}^{t−1} c_τ)^T (x_{t+1} − x_t) ≥ 0.

Summing the last two inequalities and setting γ = C·α/(‖v‖_K · R^q), we derive:

c_t^T ((x_t − x_{t+1})/2) ≥ (γ/α) · ‖x_t − x_{t+1}‖₂^q · \sum_{τ=1}^t v^T c_τ.

Using ‖x_t − x_{t+1}‖₂ ≥ c_t^T (x_t − x_{t+1})/‖c_t‖₂ and rearranging, we obtain:

|c_t^T (x_t − x_{t+1})| ≤ 1/(2γ/α)^{1/(q−1)} · ( ‖c_t‖₂^q / \sum_{τ=1}^t v^T c_τ )^{1/(q−1)}.   (4)

Summing (4) over all t completes the proof of the first claim.
The regret bounds for the case where v^T c_t ≥ α·‖c_t‖₂ for all t = 1, ..., T follow from the first regret bound. We defer this part of the proof to Appendix D.2.

Note that the regret bounds become O(T) as q → ∞. This is expected: Lq balls are q-uniformly convex for q ≥ 2 and converge to L∞ balls as q → ∞, and it is well known that Follow-The-Leader incurs Θ(T) regret in online linear optimization when K is an L∞ ball.

Using the above lemma, we introduce an algorithm for online linear optimization with hints that belong to a set V. In this algorithm, we instantiate one copy of the FTL algorithm for each possible direction of the hint. On round t, we invoke the copy of the algorithm that corresponds to the direction of the hint v_t, using the history of the game for rounds with hints in that direction. We show that the overall regret of this algorithm is no larger than the sum of the regrets of the individual copies.

Algorithm 2 (A_set, set of hints). For all v ∈ V, let T_v = ∅. For t = 1, ..., T:
1. Play x_t ∈ argmin_{x ∈ K} \sum_{τ ∈ T_{v_t}} c_τ^T x and receive c_t as feedback.
2. Update T_{v_t} ← T_{v_t} ∪ {t}.

Theorem 4.2. Suppose that K ⊆ R^d is a (C, q)-uniformly convex set that is symmetric around the origin, with B_r ⊆ K ⊆ B_R for some r and R. Consider online linear optimization with hints where the cost vector at round t satisfies ‖c_t‖₂ ≤ G and the hint v_t comes from a finite set V and is such that c_t^T v_t ≥ α‖c_t‖₂, with ‖v_t‖₂ = 1. Algorithm 2 has a worst-case regret of

R(A_set, c_{1:T}) ≤ |V| · R²/(2C·α·r) · G · (ln(T) + 1)   if q = 2, and
R(A_set, c_{1:T}) ≤ |V| · (R^q/(2C·α·r))^{1/(q−1)} · G · (q−1)/(q−2) · T^{(q−2)/(q−1)}   if q > 2.

Proof. We decompose the regret as follows:

R(A_set, c_{1:T}) = \sum_{t=1}^T c_t^T x_t − inf_{x ∈ K} \sum_{t=1}^T c_t^T x ≤ \sum_{v ∈ V} { \sum_{t ∈ T_v} c_t^T x_t − inf_{x ∈ K} \sum_{t ∈ T_v} c_t^T x } ≤ |V| · max_{v ∈ V} R(A_FTL, c_{T_v}).

The proof follows by applying Lemma 4.1 and using ‖v_t‖_K ≤ (1/r)·‖v_t‖₂ = 1/r.

Note that A_set does not require α or V to be known a priori, as it can compile the set of hint directions as it encounters new ones. Moreover, if the hints are not limited to a finite set V a priori, the algorithm can first discretize the L2 unit ball with an α/2-net and approximate any given hint with one of the hints in the discretized set. Using this discretization technique, Theorem 4.2 can be extended to the setting where the hints are not constrained to a finite set, at the cost of a regret that is linear in the size of the α/2-net (exponential in the dimension d). Extensions of Theorem 4.2 are discussed in more detail in Appendix C.

5 Lower Bounds

The regret bounds derived in Sections 3 and 4 suggest that the curvature of K can make up for the lack of curvature of the loss function and yield rates faster than O(√T) in online convex optimization, provided we receive additional information about the next move of the adversary in the form of a hint.
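Before turning to the lower bounds, Algorithm 2's bookkeeping can be made concrete. The sketch below (our own minimal instantiation, not the authors' code) runs one FTL copy per hint direction on the Euclidean unit ball, where the per-copy FTL step x_t ∈ argmin_{x ∈ K} \sum_{τ ∈ T_v} c_τ^T x has the closed form −C_v/‖C_v‖₂ for the cumulative loss vector C_v of that direction:

```python
import math

def ftl_step(cum):
    """FTL over the Euclidean unit ball: the cumulative loss is linear,
    so its minimizer over the ball is -cum/||cum||_2 (origin if cum = 0)."""
    n = math.hypot(cum[0], cum[1])
    return (0.0, 0.0) if n == 0.0 else (-cum[0] / n, -cum[1] / n)

def run_aset(rounds):
    """Algorithm 2 sketch: keep one cumulative-loss vector per hint
    direction, and on each round play the FTL action of the copy that
    matches the current hint."""
    cum_by_hint = {}                # hint v -> sum of loss vectors seen with v
    total_loss = 0.0
    for v, c in rounds:             # (hint, loss vector) pairs
        cum = cum_by_hint.setdefault(v, [0.0, 0.0])
        x = ftl_step(cum)
        total_loss += c[0] * x[0] + c[1] * x[1]
        cum[0] += c[0]
        cum[1] += c[1]
    return total_loss, cum_by_hint

# Two hint directions, each valid with some margin (c^T v >= alpha ||c||_2).
rounds = [((0.0, 1.0), (0.3, 1.0)) if t % 2 == 0 else ((1.0, 0.0), (1.0, -0.3))
          for t in range(1000)]
loss, cums = run_aset(rounds)       # one FTL history per hint direction
```
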
In this section, we show that the curvature of the player's decision set K is necessary to get rates better than O(√T), even in the presence of a hint.

As an example, consider the unit cube, i.e. K = {x | ‖x‖_∞ ≤ 1}. Note that this set is not uniformly convex. Since the i-th coordinate of points in this set, namely x_i, has no effect on the range of acceptable values for the other coordinates, revealing one coordinate does not give us any information about the other coordinates x_j for j ≠ i. For example, suppose that c_t has each of its first two coordinates set to +1 or −1 with equal probability and all other coordinates set to 1. In this case, even after observing the last d − 2 coordinates of the loss vector, the problem is reduced to a standard online linear optimization problem in the 2-dimensional unit cube. This choice of c_t is known to incur a regret of Ω(√T) [1]. Therefore, online linear optimization with the set K = {x | ‖x‖_∞ ≤ 1}, even in the presence of hints, has a worst-case regret of Ω(√T). As it turns out, this result holds for any polyhedral set of actions. We prove this by means of a reduction to the lower bounds established in [8], which apply to the online convex optimization framework (without hints). We defer the proof to Appendix D.4.

Theorem 5.1. If the set of feasible actions is a polyhedron then, depending on the set C, either there exists a trivial algorithm that achieves zero regret or every online algorithm has worst-case regret Ω(√T). This is true even if the adversary is restricted to pick a fixed hint v_t = v for all t = 1, ···, T.

At first sight, this result may come as a surprise.
After all, since any L_p ball with 1 < p ≤ 2 is strongly convex, one can hope to use an L_{1+ν} unit ball K′ to approximate K when K is an L_1 ball (which is a polyhedron) and apply the results of Section 3 to achieve better regret bounds. The problem with this approach is that the constant in the modulus of convexity of K′ deteriorates as p → 1, since δ_{L_p}(ε) = (p − 1) · ε², see [3]. As a result, the regret bound established in Theorem 3.3 becomes O((1/(p − 1)) · log T). Since the best approximation of an L_1 unit ball using an L_p ball is of the form {x ∈ R^d | d^{1−1/p} · ‖x‖_p ≤ 1}, the difference between the offline benchmarks in the definition of regret when using K′ instead of K can be as large as (1 − d^{1/p−1}) · T, which translates into an additive term of order (1 − d^{1/p−1}) · T in the regret bound when using K′ as a proxy for K. Due to the inverse dependence of the regret bound obtained in Theorem 3.3 on p − 1, the optimal choice of p = 1 + Õ(1/√T) leads to a regret of order Õ(√T).

Finally, we conclude with a result that suggests that O(log(T)) is, in fact, the optimal achievable regret when K is strongly convex in online linear optimization with a hint. We defer the proof to Appendix D.4.

Theorem 5.2. If K is an L_2 ball then, depending on the set C, either there exists a trivial algorithm that achieves zero regret or every online algorithm has worst-case regret Ω(log(T)). This is true even if the adversary is restricted to pick a fixed hint v_t = v for all t = 1, ···, T.

6 Directions for Future Research

We conjecture that the dependence of our regret bounds with respect to T is suboptimal when K is (C, q)-uniformly convex for q > 2.
We expect the optimal rate to converge to √T when q → ∞, as L_q balls converge to L_∞ balls and it is well known that the minimax regret scales as √T in online linear optimization without hints when the decision set is an L_∞ ball. However, this calls for the development of an algorithm that is not based on a reduction to the Follow-The-Leader algorithm, as discussed after Lemma 4.1.

We also conjecture that it is possible to relax the assumption that there are finitely many hints when K is (C, q)-uniformly convex with q > 2 without incurring an exponential dependence of the regret bounds (and the runtime) on the dimension d, see Appendix C. Again, this calls for the development of an algorithm that is not based on a reduction to the Follow-The-Leader algorithm.

A solution that would alleviate the two aforementioned shortcomings would likely be derived through a reduction to online convex optimization with convex functions that are (C, q)-uniformly convex, for q ≥ 2, in all but one direction and constant in the other, in a similar fashion as is done in Section 3 when q = 2. There has been progress in this direction in the literature, but, to the best of our knowledge, no conclusive result yet. For instance, Vovk [23] studies a related problem but restricts the study to the squared loss function. It is not clear whether the setting studied in this paper can be reduced to the setting of squared loss functions. Another example is given by [21], where the authors consider online convex optimization with general (C, q)-uniformly convex functions in Banach spaces (with no hints), achieving a regret of order O(T^{(q−2)/(q−1)}). Note that this rate matches the one derived in Theorem 4.2. However, as noted above, our setting cannot be reduced to theirs because our virtual loss functions are not uniformly convex in every direction.

Acknowledgments

Haghtalab was partially funded by an IBM Ph.D.
fellowship and a Microsoft Ph.D. fellowship. Jaillet acknowledges the research support of the Office of Naval Research (ONR) grant N00014-15-1-2083. This work was partially done when Haghtalab was an intern at Microsoft Research, Redmond, WA.

References

[1] Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. Optimal strategies and minimax lower bounds for online convex games. In Proceedings of the 21st Conference on Learning Theory (COLT), pages 415–424, 2008.

[2] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.

[3] Keith Ball, Eric A. Carlen, and Elliott H. Lieb. Sharp uniform convexity and smoothness inequalities for trace norms. Inventiones Mathematicae, 115(1):463–482, 1994.

[4] Avrim Blum and Yishay Mansour. Learning, regret minimization, and equilibria. In Algorithmic Game Theory, pages 79–102. 2007.

[5] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[6] Chao-Kai Chiang and Chi-Jen Lu. Online learning with queries. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 616–629, 2010.

[7] Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Conference on Learning Theory (COLT), 2012.

[8] Arthur Flajolet and Patrick Jaillet. No-regret learnability for piecewise linear losses. arXiv preprint arXiv:1411.5649, 2014.

[9] Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.

[10] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting.
Journal of Computer and System Sciences, 55(1):119–139, 1997.

[11] Elad Hazan. The convex optimization approach to regret minimization. In Optimization for Machine Learning, pages 287–303. 2012.

[12] Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In Proceedings of the 21st Conference on Learning Theory (COLT), 2008.

[13] Elad Hazan and Nimrod Megiddo. Online learning with prior knowledge. In Proceedings of the 20th Conference on Learning Theory (COLT), pages 499–513, 2007.

[14] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.

[15] Ruitong Huang, Tor Lattimore, András György, and Csaba Szepesvári. Following the leader and fast rates in linear prediction: Curved constraint sets and other regularities. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), pages 4970–4978, 2016.

[16] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.

[17] H. Brendan McMahan. A survey of algorithms and analysis for adaptive online learning. Journal of Machine Learning Research, 18:1–50, 2017.

[18] Gilles Pisier. Martingales in Banach spaces (in connection with type and cotype). Manuscript, Course IHP, 2011.

[19] Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Proceedings of the 26th Conference on Learning Theory (COLT), pages 993–1019, 2013.

[20] Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS), pages 3066–3074, 2013.

[21] Karthik Sridharan and Ambuj Tewari.
Convex games in Banach spaces. In Proceedings of the 23rd Conference on Learning Theory (COLT), pages 1–13, 2010.

[22] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing: Theory and Applications, pages 210–268. Cambridge University Press, 2012.

[23] Vladimir Vovk. Competing with wild prediction rules. Machine Learning, 69(2-3):193–212, 2007.

[24] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 928–936, 2003.