{"title": "Dying Experts: Efficient Algorithms with Optimal Regret Bounds", "book": "Advances in Neural Information Processing Systems", "page_first": 9983, "page_last": 9992, "abstract": "We study a variant of decision-theoretic online learning in which the set of experts that are available to Learner can shrink over time. This is a restricted version of the well-studied sleeping experts problem, itself a generalization of the fundamental game of prediction with expert advice. Similar to many works in this direction, our benchmark is the ranking regret. Various results suggest that achieving optimal regret in the fully adversarial sleeping experts problem is computationally hard. This motivates our relaxation where any expert that goes to sleep will never again wake up. We call this setting \"dying experts\" and study it in two different cases: the case where the learner knows the order in which the experts will die and the case where the learner does not. In both cases, we provide matching upper and lower bounds on the ranking regret in the fully adversarial setting. Furthermore, we present new, computationally efficient algorithms that obtain our optimal upper bounds.", "full_text": "Dying Experts: Ef\ufb01cient Algorithms\n\nwith Optimal Regret Bounds\n\nHamid Shayestehmanesh\u2217\n\nDepartment of Computer Science\n\nUniversity of Victoria\n\nSajjad Azami\u2217\n\nDepartment of Computer Science\n\nUniversity of Victoria\n\nNishant A. Mehta\n\nDepartment of Computer Science\n\nUniversity of Victoria\n\n{hamidshayestehmanesh, sajjadazami, nmehta}@uvic.ca\n\nAbstract\n\nWe study a variant of decision-theoretic online learning in which the set of experts\nthat are available to Learner can shrink over time. This is a restricted version of the\nwell-studied sleeping experts problem, itself a generalization of the fundamental\ngame of prediction with expert advice. Similar to many works in this direction, our\nbenchmark is the ranking regret. 
Various results suggest that achieving optimal regret in the fully adversarial sleeping experts problem is computationally hard. This motivates our relaxation where any expert that goes to sleep will never again wake up. We call this setting "dying experts" and study it in two different cases: the case where the learner knows the order in which the experts will die and the case where the learner does not. In both cases, we provide matching upper and lower bounds on the ranking regret in the fully adversarial setting. Furthermore, we present new, computationally efficient algorithms that obtain our optimal upper bounds.

1 Introduction

Decision-theoretic online learning (DTOL) [13, 20, 21, 6] is a sequential game between a learning agent (hereafter called Learner) and Nature. In each round, Learner plays a probability distribution over a fixed set of experts and suffers loss accordingly. However, in a wide range of applications, this "fixed" set of actions shrinks as the game goes on. One way this can happen is that experts either get disqualified or expire over time; a key scenario of contemporary relevance arises in contexts where experts that discriminate are prohibited from being used due to existing (or emerging) anti-discrimination laws. Two prime examples are college admissions and deciding whether incarcerated individuals should be granted parole; here the agent may rely on predictions from a set of experts in order to make decisions, and naturally experts detected to be discriminating against certain groups should not be played anymore.
However, the standard DTOL setting does not directly adapt to this case, i.e., for a given round it does not make sense, nor may it even be possible, to compare Learner's performance to an expert or action that is no longer available.

Motivated by cases where the set of experts can change, a reasonable benchmark is the ranking regret [12, 9], for which Learner competes with the best ordering of the actions (see (1) in Section 2 for a formal definition). The situation where the set of available experts can change in each round is known as the sleeping experts setting, and unfortunately, it appears to be computationally hard to obtain a no-regret algorithm in the case of adversarial payoffs (losses in our setting) and adversarial availability of experts [10]. This motivates the question of whether the optimal regret bounds can be achieved efficiently for the case where the set of experts can only shrink, which we will refer to as the "dying experts" setting. Applying the results of [12] to the dying experts problem only gives O(√(TK log K)) regret, for K experts and T rounds, and their strategy is computationally inefficient. In more detail, the strategy in [12] is to define a permutation expert (our terminology) that is identified by an ordering of experts, where a permutation expert's strategy is to play the first awake expert in the ordering. They then run Hedge [6] on the set of all possible permutation experts over K experts. Although this strategy competes with the best ordering, the per-round computation of running Hedge on K! experts is O(K^K) if naïvely implemented, and the results of [10] suggest that no efficient algorithm — one that uses computation poly(K) per round — can obtain regret that simultaneously is o(T) and poly(K).

*equal contribution

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
However, in the dying experts setting, we show that many of these K! orderings are redundant and only O(2^K) of them are "effective". The notion of effective experts (formally defined in Section 3) refers to a minimal set of orderings such that each ordering in the set behaves uniquely in hindsight. The behavior of an ordering is defined as how it uses the initial experts in its predictions over the T rounds. Interestingly, it turns out that this structure also allows for an efficient implementation of Hedge which, as we show, obtains optimal regret in the dying experts setting. The key idea that enables an efficient implementation is as follows. Our algorithms group orderings with identical behavior into one group, and there can be at most K groups in each round. When an expert dies, the orderings in one of the groups are forced to predict differently and therefore have to redistribute to the other groups. This splitting and rejoining behavior occurs in a fixed pattern, which enables us to efficiently keep track of the weight associated with each group.

In certain scenarios, Learner might be aware of the order in which the experts will become unavailable. For example, in online advertising, an ad broker has contracts with its providers, and these contracts may expire in an order known to Learner. Therefore, we study the problem in two different settings: when Learner is aware of this order and when it is not.

Contributions. Our first main result is an upper bound on the number of effective experts (Theorem 3.1); this result will be used for our regret upper bound in the known order case. Also, in preparation for our lower bound results, we prove a fully non-asymptotic lower bound on the minimax regret for DTOL (Theorem 4.1). Our main lower bound contributions are minimax lower bounds for both the unknown and known order of dying cases (Theorems 4.2 and 4.4).
In addition, we provide strategies to achieve optimal upper bounds for unknown and known order of dying (Theorems 4.3 and 4.5 respectively), along with efficient algorithms for each case. This is particularly interesting since, in the framework of sleeping experts, the results of [10] suggest that no-regret learning is computationally hard, but we show that it is efficiently achievable in the restricted problem. Finally, in Section 5.3, we show how to generalize our algorithms to other algorithms with adaptive learning rates, either adapting to unknown T or achieving far greater forms of adaptivity like in AdaHedge and FlipFlop [5].

All formal proofs not found in the main text can be found in the appendix.

2 Background and related work

The DTOL setting [6] is a variant of prediction with expert advice [13, 20, 21] in which Learner receives an example x_t in round t and plays a probability distribution p_t over K actions. Nature then reveals a loss vector ℓ_t that indicates the loss for each expert. Finally, Learner suffers a loss ℓ̂_t := p_t · ℓ_t = ∑_{i=1}^K p_{i,t} ℓ_{i,t}.

In the dying experts problem, we assume that the set of experts can only shrink. More formally, for the set of experts E = {e_1, e_2, . . . , e_K}, at each round t, Nature chooses a non-empty set of experts E^t_a ⊆ E to be available, such that E^{t+1}_a ⊆ E^t_a for all t ∈ {1, . . . , T − 1}. In other words, in some rounds Nature sets some experts to sleep, and they will never be available again. Similar to [12, 11, 10], we adopt the ranking regret as our notion of regret. Before proceeding to the definition of ranking regret, let us define π to be an ordering over the set of initial experts E. We use the notions of orderings and permutation experts interchangeably. Learner can now predict using π ∈ Π, where Π is the set of all the orderings.
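The basic DTOL round just described (play p_t, suffer the mixture loss p_t · ℓ_t) and the exponential-weights (Hedge) update can be sketched in a few lines. This is our own minimal illustration, assuming losses in [0, 1]; the function names are ours, not the paper's.

```python
import math

def hedge_weights(cum_losses, eta):
    """Non-normalized exponential weights w_i = exp(-eta * L_i)."""
    return [math.exp(-eta * L) for L in cum_losses]

def hedge_play(cum_losses, eta):
    """Probability distribution p_t over experts from cumulative losses."""
    w = hedge_weights(cum_losses, eta)
    total = sum(w)
    return [wi / total for wi in w]

def dtol_round(cum_losses, losses, eta):
    """One DTOL round: play p_t, suffer the mixture loss p_t . l_t, update."""
    p = hedge_play(cum_losses, eta)
    mix_loss = sum(pi * li for pi, li in zip(p, losses))
    new_cum = [L + li for L, li in zip(cum_losses, losses)]
    return mix_loss, new_cum
```

With zero cumulative losses the play is uniform; after observing losses, mass shifts toward the lower-loss experts.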
Also, denote by σ_t(π) the first alive expert of ordering π in round t; expert σ_t(π) is the action that will be played by π. The cumulative loss of an ordering π with respect to the available experts E^t_a is the sum of the losses of σ_t(π) over the rounds. We can now define the ranking regret:

R_Π(1, T) = ∑_{t=1}^T ℓ̂_t − min_{π∈Π} ∑_{t=1}^T ℓ_{σ_t(π),t} .   (1)

Since we will use the notion of classical regret in our proofs, we also provide its formal definition:

R_E(1, T) = ∑_{t=1}^T ℓ̂_t − min_{i∈[K]} ∑_{t=1}^T ℓ_{i,t} .   (2)

We use the convention that the subscript of a regret notion R represents the set of experts against which we compare Learner's performance, while the argument in parentheses represents the set of rounds in the game. For example, R_Π(1, T) represents the regret over rounds 1 to T with the comparator set being all permutation experts Π. Also, we assume that ℓ_{i,t} ∈ [0, 1] for all i ∈ [K], t ∈ [T].

Similar to the definition of E^t_a, let E^t_d := E \ E^t_a be the set of dead experts at the start of round t. We refer to a round as a "night" if any expert becomes unavailable on the next round. A "day" is defined as a contiguous subset of rounds that starts with the round after a night and ends with a night. As an example, if any expert becomes unavailable at the beginning of round t, we refer to round t − 1 as a night (and we say the expert dies on that night) and to the set of rounds {t, t + 1, . . . , t′} as a day, where t′ is the next night. We denote by m the number of nights throughout a game of T rounds.

Related work.
The papers [7] and [1] initiated the line of work on the sleeping experts setting. These works were followed by [2], which considered a different notion of regret and a variety of different assumptions. In [7], the comparator set is the set of all probability vectors over K experts, while we compare Learner's performance to the performance of the best ordering. In particular, the problem considered in [7] aims to compare Learner's performance to the best mixture of actions, which also includes our comparator set (orderings). However, in order to recover an ordering as we define it, one needs to assign very small probabilities to all experts except for one (the first alive action), which makes the bound in [7] trivial. As already mentioned, we assume the set E^t_a is chosen adversarially (subject to the restrictions of the dying setting), while in [11] and [15] the focus is on the (full) sleeping experts setting with adversarial losses but stochastic generation of E^t_a.

For the case of adversarial selection of available actions (which is more relevant to the present paper), [12] studies the problem in the cases of stochastic and adversarial rewards with both full-information and bandit feedback. Among the four settings, the adversarial full-information setting is most related to our work. They prove a lower bound of Ω(√(TK log K)) in this case and a matching upper bound by creating K! experts and running Hedge on them, which, as mentioned before, requires computation of order O(K^K) per round. For the bandit setting, they prove an upper bound of O(K√(T log K)), which is optimal within a log factor, using a similar transformation of experts. A similar framework in the bandit setting, introduced in [4], is called "mortal bandits"; we do not discuss this work further, as its results are not applicable to our case, given that they do not consider adversarial rewards.
There is also another line of work, which considers the contrary direction of the dying experts game. That setting is usually referred to as "branching" experts, in which the set of experts can only expand. In particular, part of the inspiration for our algorithms came from [8, 14].

The hardness of the sleeping experts setting is well studied [10, 9, 12]. First, [12] showed for a restricted class of algorithms that there is no efficient no-regret algorithm for the sleeping experts setting unless RP = NP. Following this, [10] proved that the existence of an efficient no-regret algorithm for the sleeping experts setting implies the existence of an efficient algorithm for the problem of PAC learning DNFs, a long-standing open problem. For the similar but more general case of online sleeping combinatorial optimization (OSCO) problems, [9] showed that an efficient and optimal algorithm for "per-action" regret in OSCO problems implies the existence of an efficient algorithm for PAC learning DNFs. Per-action regret is another natural benchmark for partial availability of actions, in which the regret with respect to an action is only counted over rounds in which that action was available.

3 Number of effective experts in the dying experts setting

In this section, we consider the number of effective permutation experts among the set of all possible orderings of initial experts. The idea behind this is that, given the structure in dying experts, not all the orderings will behave uniquely in hindsight. Formally, the behavior of π is its sequence of predictions (σ_1(π), σ_2(π), . . . , σ_T(π)). This means that the behaviors of two permutation experts π and π′ are the same if they use the same initial experts in every round.
We define the set of effective orderings E ⊆ Π to be a set such that, for each unique behavior of orderings, there exists exactly one ordering in E.

To clarify the definition of unique behavior, suppose initial expert e_1 is always awake. Then the two orderings π_1 = (e_1, e_2, . . .) and π_2 = (e_1, e_3, . . .) will behave the same over all the rounds, making one of them redundant. Let us clarify that behavior is not defined based on losses: e.g., if π_1 = (e_i, . . .) and π_2 = (e_j, . . .) with i ≠ j both suffer identical losses over all the rounds (i.e., their performances are equal) while using different original experts, then they are not considered redundant, and hence both of them are said to be effective.

Let d_i be the number of experts dying on the i-th night. Denote by A the number of experts that will always be awake, so that A = K − ∑_{i=1}^m d_i. We are now ready to find the cardinality of the set E.

Theorem 3.1. In the dying experts setting, for K initial experts and m nights, the number of effective orderings in Π is f({d_1, d_2, . . . , d_m}, A) = A · ∏_{s=1}^m (d_s + 1).

In the special case where no expert dies (m = 0), we use the convention that the (empty) product evaluates to 1 and hence f({}, A) = A. We mainly care about |E|, as we use it to derive our upper bounds; hence, we should find the maximum value of f. We can consider the maximum value of f in three regimes.

1. In the case of a fixed number of nights m and fixed A, the function f is maximized by spreading the dying experts equally across the nights. As the number of dying experts might not be divisible by the number of nights, some of the nights will get one more expert than the others. Formally, the maximum value is ⌈D/m⌉^(D mod m) · ⌊D/m⌋^(m − (D mod m)) · A, where D = K − A + m and K − A ≥ m.
2. In the case of a fixed number of dying experts (fixed A), the maximum value of f is 2^(K−A) · A, which occurs when one expert dies on each night. The following is a brief explanation of how to get this result. Denote by B = (d_1, d_2, . . . , d_b) a sequence of numbers of dying experts in which more than one expert dies on some night and which maximizes f (for fixed A), so that F = f({d_1, d_2, . . . , d_b}, A). Without loss of generality, assume that d_1 > 1. Split the first night into d_1 days where one expert dies at the end of each day (and consequently each of those days becomes a night). Now F′ = f({1, 1, . . . , 1, d_2, . . . , d_b}, A), where 1 is repeated d_1 times. If d_1 > 1 then F′ = F · 2^{d_1}/(d_1 + 1) > F. We see that by splitting the nights we can achieve a larger effective set.

3. In the case of a fixed number of nights m, similar to the previous cases, the maximum value is obtained when each night has equal impact on the value of f, i.e., when A = d_1 + 1 = d_2 + 1 = · · · = d_m + 1; however, it might not be possible to distribute the experts this way, in which case we should make the allocation {A, d_1 + 1, d_2 + 1, . . . , d_m + 1} as uniform as possible.

By looking at cases 2 and 3, we see that by increasing m and the number of dying experts, we can increase f; thus, the maximum value of f with no restriction is 2^(K−1), achieved when m = K − 1 and A = 1.

4 Regret bounds for known and unknown order of dying

In this section, we provide lower and upper bounds for the cases of unknown and known order of dying. In order to prove the lower bounds, we need a non-asymptotic minimax lower bound for the DTOL framework, i.e., one which holds for a finite number of experts K and finite T. During the preparation of the final version of this work, we were made aware of a result of Orabona and Pál (see Theorem 8 of [16]) that does give such a bound.
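The count f({d_1, . . . , d_m}, A) from Theorem 3.1 above, and the equal-spreading maximization of case 1, are easy to check numerically. The following is our own illustrative sketch (function names are ours), with a brute-force search over death schedules for small parameters.

```python
from math import prod

def num_effective_orderings(deaths, always_awake):
    """f({d_1,...,d_m}, A) = A * prod_s (d_s + 1), as in Theorem 3.1."""
    return always_awake * prod(d + 1 for d in deaths)

def max_f_fixed_m_A(K, A, m):
    """Brute-force max of f over all ways to split K - A deaths into m
    nights (at least one death per night).  Small-parameter check only."""
    total = K - A
    best = 0

    def rec(remaining, nights_left, ds):
        nonlocal best
        if nights_left == 0:
            if remaining == 0:
                best = max(best, num_effective_orderings(ds, A))
            return
        # leave at least one death for each remaining night
        for d in range(1, remaining - nights_left + 2):
            rec(remaining - d, nights_left - 1, ds + [d])

    rec(total, m, [])
    return best
```

For instance, with K = 6, A = 2, m = 2, the best schedule splits the four deaths as (2, 2), giving 2 · 3 · 3 = 18, matching the ⌈D/m⌉, ⌊D/m⌋ formula of case 1.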
However, for completeness, we present a different fully non-asymptotic result that we independently developed; this result is stated in a simpler form and admits a short proof (though we admit that it builds upon heavy machinery). We then prove matching upper bounds for both cases of unknown and known order of dying.

4.1 Fully non-asymptotic minimax lower bound for DTOL

We analyze lower bounds on the minimax regret in the DTOL game with K experts and T rounds. We assume that all losses are in the interval [0, 1]. Let Δ_K := Δ([K]) denote the simplex over K outcomes. The minimax regret is defined as

inf_{p_1∈Δ_K} sup_{ℓ_1∈[0,1]^K} · · · inf_{p_T∈Δ_K} sup_{ℓ_T∈[0,1]^K} { ∑_{t=1}^T p_t · ℓ_t − min_{j∈[K]} ∑_{t=1}^T ℓ_{j,t} } .   (3)

Theorem 4.1. For a universal constant L, the minimax regret (3) is lower bounded by (1/L) · min{ √((T/2) log K), T }.

The proof (in the appendix) begins similarly to the proof of the often-cited Theorem 3.7 of [3], but it departs at the stage of lower bounding the Rademacher sum; we accomplish this lower bound by invoking Talagrand's Sudakov minoration for Bernoulli processes [17, 18].

4.2 Unknown order of dying

For the case where Learner is not aware of the order in which the experts die, we prove a lower bound of Ω(√(mT log K)). Given that we have E^{t+1}_a ⊆ E^t_a, the construction for the lower bound of [12] cannot be applied to our case. In other words, our adversary is much weaker than the one in [12], but, surprisingly, we show that the previous lower bound still holds (by setting m = K) even with the weaker adversary.
We then analyze a simple strategy to achieve a matching upper bound.

In this section, we further assume that √((T/2) log K) < T for every T and K, so that there is hope to achieve regret that is sublinear with respect to T. We now present our lower bound on the regret for the case of unknown order of dying.

Theorem 4.2. When the order of dying is unknown, the minimax regret is Ω(√(mT log K)).

Proof Sketch. We construct a scenario where each day is a game decoupled from the previous ones. This means that the algorithm is forced to have no prior information about the experts at the beginning of each day. First, partition the T rounds into m + 1 days of equal length. The days are split into two halves. On the first half, each expert suffers losses drawn i.i.d. from a Bernoulli distribution with p = 1/2. At the end of the first half of the day, we choose the expert with the lowest cumulative loss up to that round, and that expert suffers no loss on the second half. For any other expert e_i, we use the loss ℓ^{(1)}_{i,t} of e_i on the t-th round of the first half to define the loss ℓ^{(2)}_{i,t} of e_i on the t-th round of the second half; specifically, we choose the setting ℓ^{(2)}_{i,t} := 1 − ℓ^{(1)}_{i,t}. We show that the ranking regret of the set of orderings over T rounds is obtained by summing the classical regrets of each day over the set of days.

A natural strategy in the case of unknown dying order is to run Hedge over the set of initial experts E and, after each night, reset the algorithm. We will refer to this strategy as "Resetting-Hedge". Theorem 4.3 gives an upper bound on the regret of Resetting-Hedge.

Theorem 4.3. Resetting-Hedge enjoys a regret of R_Π(1, T) = O(√(mT log K)).

Proof. Let τ_s be the set of round indices of day s; hence, we have ∑_{s=1}^{m+1} |τ_s| = T.
The overall ranking regret can be upper bounded by the sum of the classical regrets over the intervals. Hence, the analysis is as follows:

R_Π(1, T) ≤ ∑_{s=1}^{m+1} √(|τ_s| log(K − s)) ≤ √(log K) · ∑_{s=1}^{m+1} √|τ_s| ≤ √((m + 1) T log K);   (4)

the last inequality is essentially the Cauchy-Schwarz inequality (see Lemma B.2).

Although the basic Resetting-Hedge strategy adapts to m, it has many downsides. For example, resetting can be wasteful in practice. Another natural strategy, simply running Hedge on the set of all K! permutation experts, is non-adaptive (obtaining regret O(√(TK log K))) and computationally inefficient if implemented naïvely. However, as we show in Section 5.1, this algorithm can be implemented efficiently (with runtime linear in K rather than K!), and also, as we show in Section 5.3, by running Hedge on top of several copies of Hedge (one per specially chosen learning rate), we can obtain a guarantee that is far better than Theorem 4.3. Moreover, our efficient implementation of Hedge can be extended to adaptive algorithms like AdaHedge and FlipFlop [5].

4.3 Known order of dying

A natural question is whether Learner can leverage information about the order in which experts are going to die to achieve a better regret. We show that the answer is positive: the bound can be improved by a logarithmic factor. We also give a matching lower bound for this case (so both bounds are tight). Similar to the unknown setting, we provide a construction to prove a lower bound on the ranking regret in this case. We still assume that √((T/2) log K) < T.

Theorem 4.4. When Learner knows the order of dying, the minimax regret is Ω(√(mT)).

Proof Sketch.
Our construction involves first partitioning all the rounds into m/2 days of equal length. On day s, all experts suffer loss 1 on all the rounds, except for experts e_{2s−1} and e_{2s}, who suffer losses drawn i.i.d. from a Bernoulli distribution with success probability p = 1/2. Experts e_{2s−1} and e_{2s} die at the end of day s, and therefore each "day game" effectively has 2 experts; our lower bound holds even when Learner knows this fact. Furthermore, Learner is aware that these two experts (e_{2s−1} and e_{2s}) will die at the end of day s. Similar to the proof of Theorem 4.2, the minimax regret is lower bounded by the sum of the minimax regrets over the day games.

Although the proof is relatively simple, it is at least a little surprising that knowing such rich information as the order of dying only improves the regret by a logarithmic factor.

To achieve an optimal upper bound, using the results of Theorem 3.1, the strategy is to create 2^m (K − m) experts (those that are effective) and run Hedge on this set.

Theorem 4.5. For the case of known order of dying, the strategy described above achieves a regret of O(√(T (m + log K))).

Proof. Hedge has regret O(√(T log N)) for N experts. Therefore, running Hedge on N = 2^m (K − m) experts yields the desired bound, since log(2^m (K − m)) = O(m + log K).

Though the order of computation in the above strategy is better than O(K^K), it is still exponential in K. In the next section, we introduce algorithms that simulate these strategies in a computationally efficient way.

5 Efficient algorithms for dying experts

The results of [10] imply computational hardness of achieving no-regret algorithms in sleeping experts; yet, we are able to provide efficient algorithms for dying experts in the cases of unknown and known order of dying. For the sake of simplicity, we initially assume that only one expert dies each night.
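Before moving on to the efficient algorithms, the Resetting-Hedge baseline from Theorem 4.3 can be sketched in a few lines. This is our own illustrative implementation, with a single fixed learning rate rather than the tuned, per-day rate an optimal analysis would use.

```python
import math

class ResettingHedge:
    """Hedge over the currently alive experts; resets after each night.

    Illustrative sketch: eta is held fixed here, whereas a tuned version
    would set it from the day lengths and the number of alive experts.
    """

    def __init__(self, experts, eta):
        self.eta = eta
        self.cum = {e: 0.0 for e in experts}  # losses since last reset

    def play(self):
        w = {e: math.exp(-self.eta * L) for e, L in self.cum.items()}
        total = sum(w.values())
        return {e: wi / total for e, wi in w.items()}

    def update(self, losses):
        for e in self.cum:
            self.cum[e] += losses[e]

    def night(self, dying):
        # dying experts disappear; restart Hedge on the survivors
        self.cum = {e: 0.0 for e in self.cum if e not in dying}
```

After a night, the play is uniform again over the survivors, which is exactly the "wasteful" restart the text criticizes.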
Later, in Section 5.3, we show how to extend the algorithms to the general case where multiple experts can die each night. We then show how to extend these algorithms to adaptive algorithms such as AdaHedge [5]. The algorithms for both cases are given in Algorithms 1 and 2.

Algorithm 1: Hedge-Perm-Unknown (HPU)
  ∀i ∈ [K]: c_{i,1} := 1, h_{i,1} := (K − 1)!
  E_a := {e_1, e_2, . . . , e_K}
  for t = 1, 2, . . . , T do
    play p_t = [ 1[e_i ∈ E_a] · h_{i,t} · c_{i,t} / ∑_{j : e_j ∈ E_a} h_{j,t} · c_{j,t} ]_{i∈[K]}
    receive (ℓ_{1,t}, . . . , ℓ_{K,t})
    for e_i ∈ E_a do
      c_{i,t+1} := c_{i,t} · e^{−η ℓ_{i,t}}
      h_{i,t+1} := h_{i,t}
    if expert j dies then
      E_a := E_a \ {e_j}
      for e_i ∈ E_a do
        h_{i,t+1} := h_{i,t+1} · c_{i,t+1} + (h_{j,t+1} · c_{j,t+1}) / |E_a|
        c_{i,t+1} := 1

Algorithm 2: Hedge-Perm-Known (HPK)
  ∀i ∈ [K]: c_{i,1} := 1, h_{i,1} := ⌈2^{K−i−1}⌉
  E_a := {e_1, e_2, . . . , e_K}
  for t = 1, 2, . . . , T do
    play p_t = [ 1[e_i ∈ E_a] · h_{i,t} · c_{i,t} / ∑_{j : e_j ∈ E_a} h_{j,t} · c_{j,t} ]_{i∈[K]}
    receive (ℓ_{1,t}, . . . , ℓ_{K,t})
    for e_i ∈ E_a do
      c_{i,t+1} := c_{i,t} · e^{−η ℓ_{i,t}}
      h_{i,t+1} := h_{i,t}
    if expert j dies then
      E_a := E_a \ {e_j}
      for i = j + 1 to K do
        h_{i,t+1} := h_{i,t+1} · c_{i,t+1} + (h_{j,t+1} · c_{j,t+1}) · (⌈2^{i−2}⌉ / 2^{K−1−j})
        c_{i,t+1} := 1

5.1 Unknown order of dying

We now show how to efficiently implement Hedge over the set of all the orderings. Even though Resetting-Hedge is already efficient and achieves optimal regret, it has its own disadvantages. The issue arises when one needs to extend Resetting-Hedge to adaptive algorithms. This is particularly important in real-world scenarios, where Learner wants to adapt to the environment (such as stochastic or adversarial losses). We show that Algorithm 1, Hedge-Perm-Unknown (HPU), can be adapted to AdaHedge [19] and, therefore, we can simulate FlipFlop [5]. Next, we give the main idea of how the algorithm works, after which we prove that Algorithm 1 efficiently simulates running Hedge over Π. Before proceeding further, let us recall how Hedge makes predictions in round t. First, it updates the weights using w_{i,t} = w_{i,t−1} e^{−η ℓ_{i,t}}, and it then assigns a probability to expert i as follows:

p_{i,t} = w_{i,t−1} / ∑_{j=1}^K w_{j,t−1} .

Recall that e_1, e_2, . . . , e_K denote the original experts, while π_1, π_2, . . . , π_{K!} denote the orderings. Denote by w^t_π the weight that Hedge assigns to π in round t. Define Π^t_i ⊆ Π to be the set of orderings predicting as expert e_i in round t. The main ideas behind the algorithm are as follows:

1. When π and π′ have the same prediction e in round t (i.e. σ_t(π) = σ_t(π′) = e), then we do not need to know w^t_π and w^t_{π′}; we use w^t_π + w^t_{π′} instead for the weight of e.

2. The algorithm maintains ∑_{π∈Π^t_j} e^{−η L^{t−1}_π}, where η is the learning rate and L^t_π is the cumulative loss of ordering π up until round t, i.e., L^t_π = ∑_{s=1}^t ℓ_{σ_s(π),s}.

We will discuss how to tune η later. Let J = {j_1, . . . , j_m} represent the rounds on which any expert will die. Denote by j_t the last night observed so far at the end of round t, formally defined as j_t = max{j ∈ J : j ≤ t}.
We maintain a tuple (h_{i,t}, c_{i,t}) for each original expert e_i in round t, where h_{i,t} is the sum of the non-normalized weights of the orderings in Π^t_i as of round j_t. We similarly maintain c_{i,t}, except that it only accounts for the loss suffered from round j_t + 1 to round t − 1 by the orderings in Π^t_i (all of which play e_i over those rounds). Formally:

h_{i,t} = ∑_{π∈Π^t_i} e^{−η ∑_{s=1}^{j_t} ℓ_{σ_s(π),s}} ,    c_{i,t} = e^{−η ∑_{s=j_t+1}^{t−1} ℓ_{σ_s(π),s}} for any π ∈ Π^t_i .

It is easy to verify that h_{j,t} · c_{j,t} = ∑_{π∈Π^t_j} e^{−η L^{t−1}_π}.

The computational cost of the algorithm at each round is O(K). We claim that HPU behaves the same as executing Hedge on Π. We use induction on rounds to show that the weights are the same in both algorithms. By "simulating" we mean that the weights over the original experts are maintained identically to how Hedge maintains them.

Theorem 5.1. At every round, HPU simulates running Hedge on the set of experts Π.

Proof Sketch. The main idea is to group the permutation experts with the same prediction (the first alive expert in the permutation) into one group. Hence, initially there are K groups. Then, if expert e_j dies, every ordering in the group associated with e_j is moved to another group and the empty group is deleted. We prove that the orderings redistribute to the other groups symmetrically after a night. Using this fact, we show that we do not need to know the elements of a group; we only maintain the sum of the weights given to all the orderings in each group.

5.2 Known order of dying

For the case of known order of dying, we propose Algorithm 2, Hedge-Perm-Known (HPK), which is slightly different from HPU.
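To make the grouping idea behind Theorem 5.1 concrete, here is our own illustrative Python sketch of HPU's per-group bookkeeping, checked against brute-force Hedge over all K! orderings. It folds the (h, c) pair into a single running group weight W and assumes at most one death per night; it is a sketch of the mechanism, not the paper's exact pseudocode.

```python
import math
from itertools import permutations

def hpu_run(K, eta, losses, deaths):
    """O(K)-per-round simulation of Hedge over all K! orderings.

    losses[t] is the loss vector for round t; deaths[t] (or None) is the
    expert dying at the end of round t.  W[i] aggregates the Hedge weight
    of every ordering currently predicting with expert i (the group of i).
    Returns the list of played distributions.
    """
    alive = set(range(K))
    W = [float(math.factorial(K - 1))] * K  # each group starts with (K-1)! orderings
    plays = []
    for t, loss in enumerate(losses):
        total = sum(W[i] for i in alive)
        plays.append([W[i] / total if i in alive else 0.0 for i in range(K)])
        for i in alive:
            W[i] *= math.exp(-eta * loss[i])
        j = deaths[t]
        if j is not None:
            alive.discard(j)
            share = W[j] / len(alive)  # dying group splits evenly (Theorem 5.1)
            for i in alive:
                W[i] += share
            W[j] = 0.0
    return plays

def brute_force_run(K, eta, losses, deaths):
    """Hedge over all K! permutation experts, for checking hpu_run."""
    alive = set(range(K))
    w = {p: 1.0 for p in permutations(range(K))}
    plays = []
    for t, loss in enumerate(losses):
        pred = {p: next(i for i in p if i in alive) for p in w}
        total = sum(w.values())
        dist = [0.0] * K
        for p, wp in w.items():
            dist[pred[p]] += wp / total
        plays.append(dist)
        for p in w:
            w[p] *= math.exp(-eta * loss[pred[p]])
        if deaths[t] is not None:
            alive.discard(deaths[t])
    return plays
```

The even split in `hpu_run` is exactly the symmetric redistribution claimed in the proof sketch; the brute-force comparison makes that claim checkable on small K.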
In particular, the weight redistribution (when an expert dies) and the initialization of the coefficients h_{i,1} are different. In the proof of Theorem 5.1, we showed that when the set of experts includes all the orderings, the weight of a dying expert e_j is distributed equally among the initial experts. But when the set of experts contains only the effective orderings, this no longer holds. In this section, we assume without loss of generality that the experts die in the order e_1, e_2, . . . , and recall that E denotes the set of effective orderings. Based on Theorem 3.1, the number of experts in E starting with e_i is ⌈2^{K−i−1}⌉; we denote the set of such experts by E_{e_i}.

Theorem 5.2. At every round, HPK simulates running Hedge on the set of experts E.

Remarks on tuning the learning rates. For both algorithms, we assume T is known beforehand. Thus, the learning rate for HPU is η = √(2 log(K!)/T) and for HPK it is η = √(2 log(2^m (K − m))/T). One can use a time-varying learning rate to adapt to T in case it is not known.

5.3 Extensions for algorithms

As we mentioned at the beginning of Section 5, for the sake of simplicity we initially assumed that only one expert dies each night. First, we discuss how to handle a night with more than one death. Afterwards, we explain how to extend/modify HPU and HPK to implement the Follow The Leader (FTL) strategy. We then introduce a new algorithm which simulates FTL efficiently and also maintains L*_t, where L*_t is the cumulative loss of the best permutation expert through the end of round t. Finally, using L*_t, we explain how to simulate AdaHedge and FlipFlop [5] by slightly extending HPU and HPK.

More than one expert dying in a night.
We handle nights with more than one death as follows. We let one of the experts die on that night, and, for each expert j among the other experts that should have died that night, we create a "dummy round": we give all alive experts (including expert j) a loss of zero, keep the learning rate the same as in the previous round, and have expert j die at the end of the dummy round (which hence becomes a "dummy night"). Even though the number of rounds increases with this trick, it is easy to see that the regret is unchanged, since in dummy rounds all experts have the same loss (and also the learning rate after the sequence of dummy rounds is the same as it would have been had there been no dummy rounds). Moreover, since now one expert dies on each night (some of which may be dummy nights), we may use Theorem 5.1 or 5.2 to conclude that our algorithm correctly distributes any dying expert's weight among the alive experts.

Beyond adaptivity to m. Consider the case of unknown order and let the number of nights m be unknown. As promised, we show that we can improve on the simple Resetting-Hedge strategy.

Theorem 5.3. Consider running Hedge on top of K copies of HPU where, for r ∈ {0, 1, . . . , K − 1}, we set ε_r = ∏_{l=0}^{r−1} 1/(K − l) and the r-th copy of HPU uses learning rate η^{ε_r}_t := √(8 log(1/ε_r)/t). Let π* be a best permutation expert in hindsight and suppose that the sequence (σ_1(π*), . . . , σ_T(π*)) changes experts at most l times. Then the regret of this algorithm is O(√(T (l + 1) log K)).

Note that this theorem does better than adapt to m: with m nights we always have l ≤ m, but l can in fact be much smaller than m in practice. Hence, Theorem 5.3 recovers, and can improve upon, the regret of Resetting-Hedge; moreover, wasteful resetting is avoided.
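The dummy-round construction above is a purely mechanical schedule transformation, which can be sketched as follows. The `(losses, dying)` representation of a round and the helper name `expand_schedule` are our own.

```python
def expand_schedule(rounds, num_experts):
    """Expand nights with several deaths into a schedule where at most
    one expert dies per night, as described in the text.

    `rounds` is a list of (losses, dying) pairs, where `losses` is a
    per-expert loss vector and `dying` lists the experts that die at the
    end of that round.  The first death of a night stays on the real
    round; every further death gets its own zero-loss dummy round, so
    the regret is unchanged (all experts share the same loss there).
    """
    expanded = []
    for losses, dying in rounds:
        # Real round: keep the losses, keep at most the first death.
        expanded.append((losses, dying[:1]))
        # Dummy rounds: zero loss for everyone, one remaining death each.
        for j in dying[1:]:
            expanded.append(([0.0] * num_experts, [j]))
    return expanded
```

After this expansion, every entry of the schedule has at most one death, so Theorems 5.1 and 5.2 apply round by round.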
Also, while the computation increases by a factor of K, it is easy to see that one can instead use an exponentially spaced grid of size log₂(K) to achieve regret of the same order.

Follow the Leader. FTL might be the most natural algorithm proposed in online learning. In round t, the algorithm plays the expert with the lowest cumulative loss through round t − 1, namely L*_{t−1}. By setting η = ∞ in Hedge and, similarly, in HPU and HPK, we recover FTL; hence, our algorithms can simulate FTL. The motivation for FTL is that it achieves constant regret (with respect to T) when the losses are i.i.d. stochastic and there is a gap in mean loss between the best and second-best (permutation) experts. Our algorithms do not maintain L*_t, but we need L*_t to implement AdaHedge (which we discuss in the next extension). Here, we propose a simple algorithm to perform FTL on the set of orderings. The algorithm works as follows:

1. Perform FTL on the alive initial experts and keep track of their cumulative losses (L^t_1, L^t_2, . . . , L^t_K), while ignoring the dead experts;

2. If expert j dies in round t′, then for every alive expert i with L^{t′}_i > L^{t′}_j do: L^{t′}_i := L^{t′}_j.

This not only performs the same as FTL but also explicitly keeps track of L*_t. We will use this implementation to simulate AdaHedge.

AdaHedge. The following change to the learning rate in HPU/HPK recovers AdaHedge. Let L̂_t = ∑_{r=1}^{t} ℓ̂_r and ∆_t = L̂_t − M_t, where M_t = ∑_{r=1}^{t} m_r and m_r = −(1/η_r) ln(w_r · e^{−η_r ℓ_r}). For round t, AdaHedge on N experts sets the learning rate to η_t = (ln N)/∆_{t−1}; here, m_r can easily be computed using the weights from HPU/HPK. As we have the loss of the algorithm at each round, we can calculate M_t.
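Steps 1 and 2 of the FTL implementation above are easy to realize incrementally. The following Python sketch uses our own interface (0-indexed experts, a mutable `alive` set); per the discussion above, the clamping in step 2 keeps the minimum cumulative loss over alive experts equal to L*_t after each death.

```python
def ftl_step(cum_loss, alive, losses, dying=None):
    """One round of the FTL implementation described in the text.

    cum_loss[i] holds expert i's cumulative loss; dead experts are simply
    dropped from `alive`.  Returns the expert FTL plays this round (lowest
    cumulative loss so far, ties broken toward the smallest index).
    """
    # Step 1: play the alive expert with the smallest cumulative loss.
    leader = min(alive, key=lambda i: (cum_loss[i], i))
    # Accumulate this round's losses for the alive experts.
    for i in alive:
        cum_loss[i] += losses[i]
    if dying is not None:
        alive.discard(dying)
        # Step 2: clamp every alive expert's cumulative loss at the dead
        # expert's level, so min over alive experts tracks L*_t.
        for i in alive:
            if cum_loss[i] > cum_loss[dying]:
                cum_loss[i] = cum_loss[dying]
    return leader
```

The same clamped cumulative losses supply the L*_t values needed for the AdaHedge extension below.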
Also, using the implementation of FTL described above, we can maintain L*_t. Finally, we can compute ∆_t and the regret of HPU/HPK.

FlipFlop. By combining AdaHedge and FTL, [5] proposes FlipFlop, which can do as well as either AdaHedge (minimax guarantees and more) or FTL (in the stochastic i.i.d. case). We can adapt HPK and HPU to FlipFlop by implementing AdaHedge and FTL as described above and switching between the two based on ∆^{ah}_t and ∆^{ftl}_t, where ∆^{ftl}_t is defined similarly to ∆^{ah}_t but the learning rate associated with m_t for FTL is η^{ftl} = ∞ while for AdaHedge it is η^{ah}_t = (ln K)/∆_{t−1}.

Corollary 5.1. By combining FTL and AdaHedge as described above, HPU and HPK simulate FlipFlop over the set of experts A (where A = Π for HPU and A = E for HPK) and achieve regret

    R_A(1, T) < min{ C_0 R^{ftl}_A(1, T) + C_1 ,  C_2 √( (L*_T (T − L*_T) / T) ln|A| ) + C_3 ln|A| } ,

where C_0, C_1, C_2, C_3 are constants.

The interest in FlipFlop is that in the real world we may not know whether the losses are stochastic or adversarial. This motivates using algorithms that detect and adapt to easier situations.

6 Conclusion

In this work, we introduced the dying experts setting. We presented matching upper and lower bounds on the ranking regret for both the cases of known and unknown order of dying. In the case of known order, we saw that the reduction in the number of effective orderings allows our bounds to be reduced by a √(log K) factor. While it appears to be computationally hard to obtain sublinear regret in the general sleeping experts problem, in the restricted dying experts setting we provided efficient algorithms with optimal regret bounds for both cases.
Furthermore, we proposed an efficient implementation of FTL for dying experts which, combined with efficiently maintaining mix losses, enabled us to extend our algorithms to simulate AdaHedge and FlipFlop. It would be interesting to see if the notion of effective experts can be extended to other settings such as multi-armed bandits. Furthermore, it might be interesting to study the problem in regimes in between known and unknown order.

Acknowledgments

This work was supported by the NSERC Discovery Grant RGPIN-2018-03942.

References

[1] Avrim Blum. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26(1):5–23, 1997.

[2] Avrim Blum and Yishay Mansour. From external to internal regret. Journal of Machine Learning Research, 8(Jun):1307–1324, 2007.

[3] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[4] Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, and Eli Upfal. Mortal multi-armed bandits. In Advances in Neural Information Processing Systems, pages 273–280, 2009.

[5] Steven de Rooij, Tim van Erven, Peter D. Grünwald, and Wouter M. Koolen. Follow the leader if you can, hedge if you must. Journal of Machine Learning Research, 15(1):1281–1316, 2014.

[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[7] Yoav Freund, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, 1997.

[8] Eyal Gofer, Nicolò Cesa-Bianchi, Claudio Gentile, and Yishay Mansour. Regret minimization for branching experts.
In Conference on Learning Theory, pages 618–638, 2013.

[9] Satyen Kale, Chansoo Lee, and Dávid Pál. Hardness of online sleeping combinatorial optimization problems. In Advances in Neural Information Processing Systems, pages 2181–2189, 2016.

[10] Varun Kanade and Thomas Steinke. Learning hurdles for sleeping experts. ACM Transactions on Computation Theory (TOCT), 6(3):11, 2014.

[11] Varun Kanade, H. Brendan McMahan, and Brent Bryan. Sleeping experts and bandits with stochastic action availability and adversarial rewards. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR Workshop and Conference Proceedings Volume 5, 2009.

[12] Robert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma. Regret bounds for sleeping experts and bandits. Machine Learning, 80(2-3):245–272, 2010.

[13] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.

[14] Jaouad Mourtada and Odalric-Ambrym Maillard. Efficient tracking of a growing number of experts. In International Conference on Algorithmic Learning Theory, pages 517–539, 2017.

[15] Gergely Neu and Michal Valko. Online combinatorial optimization with stochastic decision sets and adversarial losses. In Advances in Neural Information Processing Systems, pages 2780–2788, 2014.

[16] Francesco Orabona and Dávid Pál. Optimal non-asymptotic lower bound on the minimax regret of learning with expert advice. arXiv preprint arXiv:1511.02176, 2015.

[17] Michel Talagrand. Regularity of infinitely divisible processes. The Annals of Probability, 21(1):362–432, 1993.

[18] Michel Talagrand. The Generic Chaining, volume 154. Springer, 2005.

[19] Tim van Erven, Wouter M. Koolen, Steven de Rooij, and Peter Grünwald. Adaptive Hedge.
In Advances in Neural Information Processing Systems, pages 1656–1664, 2011.

[20] Vladimir Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 371–386. Morgan Kaufmann Publishers Inc., 1990.

[21] Vladimir Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2):153–173, 1998.