{"title": "Improving Online Algorithms via ML Predictions", "book": "Advances in Neural Information Processing Systems", "page_first": 9661, "page_last": 9670, "abstract": "In this work we study the problem of using machine-learned predictions to improve performance of online algorithms. We consider two classical problems, ski rental and non-clairvoyant job scheduling, and obtain new online algorithms that use predictions to make their decisions. These algorithms are oblivious to the performance of the predictor, improve with better predictions, but do not degrade much if the predictions are poor.", "full_text": "Improving Online Algorithms via ML Predictions\n\nRavi Kumar\n\nGoogle\n\nMountain View, CA\n\nravi.k53@gmail.com\n\nManish Purohit\n\nGoogle\n\nMountain View, CA\n\nmpurohit@google.com\n\nZoya Svitkina\n\nGoogle\n\nMountain View, CA\n\nzoya@cs.cornell.edu\n\nAbstract\n\nIn this work we study the problem of using machine-learned predictions to improve\nthe performance of online algorithms. We consider two classical problems, ski\nrental and non-clairvoyant job scheduling, and obtain new online algorithms that\nuse predictions to make their decisions. These algorithms are oblivious to the\nperformance of the predictor, improve with better predictions, but do not degrade\nmuch if the predictions are poor.\n\n1\n\nIntroduction\n\nDealing with uncertainty is one of the most challenging issues that real-world computational tasks,\nbesides humans, face. Ranging from \u201cwill it snow next week?\u201d to \u201cshould I rent an apartment or\nbuy a house?\u201d, there are questions that cannot be answered reliably without some knowledge of the\nfuture. 
Similarly, the question of "which job should I run next?" is hard for a CPU scheduler that does not know how long this job will run and what other jobs might arrive in the future.
There are two interesting and well-studied computational paradigms aimed at tackling uncertainty. The first is in the field of machine learning, where uncertainty is addressed by making predictions about the future. This is typically achieved by examining the past and building robust models based on the data. These models are then used to make predictions about the future. Humans and real-world applications can use these predictions to adapt their behavior: knowing that it is likely to snow next week can be used to plan a ski trip. The second is in the field of algorithm design. Here, the effort has been to develop a notion of competitive ratio1 for the goodness of an algorithm in the presence of an unknown future, and to develop online algorithms that make decisions heedless of the future but are provably good in the worst case, i.e., even in the most pessimistic future scenario. Such online algorithms are popular and successful in real-world systems and have been used to model problems including paging, caching, job scheduling, and more (see the book by Borodin and El-Yaniv [5]).
Recently, there has been some interest in using machine-learned predictions to improve the quality of online algorithms [20, 18]. The main motivation for this line of research is two-fold. The first is to design new online algorithms that can avoid assuming a worst-case scenario and hence have better performance guarantees both in theory and practice. The second is to leverage the vast amount of modeling work in machine learning, which precisely deals with how to make predictions. Furthermore, as machine-learning models are often retrained on new data, these algorithms can naturally adapt to evolving data characteristics. 
When using the predictions, it is important that the online algorithm is unaware of the performance of the predictor and makes no assumptions on the types of prediction errors. Additionally, we desire two key properties of the algorithm: (i) if the predictor is good, then the online algorithm should perform close to the best offline algorithm (consistency), and (ii) if the predictor is bad, then the online algorithm should gracefully degrade, i.e., its performance should be close to that of the online algorithm without predictions (robustness).

1Informally, the competitive ratio compares the worst-case performance of an online algorithm to the best offline algorithm that knows the future.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Our problems. We consider two basic problems in online algorithms and show how to use machine-learned predictions to improve their performance in a provable manner. The first is ski rental, in which a skier is going to ski for an unknown number of days and on each day can either rent skis at unit price or buy them for a higher price b and ski for free from then on. The uncertainty is in the number of skiing days, which a predictor can estimate. Such a prediction can be made reasonably well, for example, by building models based on weather forecasts and past behavior of other skiers. The ski rental problem is the canonical example of a large class of online rent-or-buy problems, which arise whenever one needs to decide between a cheap short-term solution ("renting") and an expensive long-term one ("buying"). Several extensions and generalizations of the ski rental problem have been studied, leading to numerous applications such as dynamic TCP acknowledgement [11], buying parking permits [21], renting cloud servers [14], snoopy caching [13], and others. 
The best known deterministic algorithm for ski rental is the break-even algorithm: rent for the first b − 1 days and buy on day b. It is easy to observe that the break-even algorithm has a competitive ratio of 2 and that no deterministic algorithm can do better. On the other hand, Karlin et al. [12] designed a randomized algorithm that yields a competitive ratio of e/(e − 1) ≈ 1.58, which is also optimal.
The second problem we consider is non-clairvoyant job scheduling. In this problem a set of jobs, all of which are available immediately, have to be scheduled on one machine; any job can be preempted and resumed later. The objective is to minimize the sum of completion times of the jobs. The uncertainty in this problem is that the scheduler does not know the running time of a job until it actually finishes. Note that a predictor in this case can predict the running time of a job, once again, by building a model based on the characteristics of the job, its resource requirements, and its past behavior. Non-clairvoyant job scheduling, introduced by Motwani et al. [23], is a basic problem in online algorithms with a rich history and, in addition to its obvious applications to real-world systems, many variants and extensions of it have been studied extensively in the literature [9, 3, 1, 10]. Motwani et al. [23] showed that the round-robin algorithm has a competitive ratio of 2, which is optimal.

Main results. Before we present our main results we need a few formal notions. In online algorithms, the competitive ratio of an algorithm is defined as the worst-case ratio of the algorithm cost to the offline optimum. In our setting, this is a function c(η) of the error η of the predictor2. We say that an algorithm is γ-robust if c(η) ≤ γ for all η, and that it is β-consistent if c(0) = β. 
So consistency is a measure of how well the algorithm does in the best case of perfect predictions, and robustness is a measure of how well it does in the worst case of terrible predictions.
Let λ ∈ (0, 1) be a hyperparameter. For the ski rental problem with a predictor, we first obtain a deterministic online algorithm that is (1 + 1/λ)-robust and (1 + λ)-consistent (Section 2.2). We next improve these bounds by obtaining a randomized algorithm that is (1/(1 − e^{−(λ−1/b)}))-robust and (λ/(1 − e^{−λ}))-consistent, where b is the cost of buying (Section 2.3). For the non-clairvoyant scheduling problem, we obtain a randomized algorithm that is (2/(1 − λ))-robust and (1/λ)-consistent. Note that the consistency bounds for all these algorithms circumvent the lower bounds, which is possible only because of the predictions.
It turns out that for these problems, one has to be careful how the predictions are used. We illustrate through an example that if the predictions are used naively, one cannot ensure robustness (Section 2.1). Our algorithms proceed by opening up the classical online algorithms for these problems and using the predictions in a judicious manner. We also conduct experiments to show that the algorithms we develop are practical and achieve good performance compared to ones that do not use any prediction.

Related work. The work closest to ours is that of Medina and Vassilvitskii [20] and Lykouris and Vassilvitskii [18]. The former used a prediction oracle to improve reserve price optimization, relating the gap between the expected bid and revenue to the average predictor loss. In a sense, this paper initiated the study of online algorithms equipped with machine-learned predictions. The latter developed this framework further, introduced the concepts of robustness and consistency, and considered the online caching problem with predictions. 
It modified the well-known Marker algorithm to use the predictions, ensuring both robustness and consistency. While we operate in the same framework, none of their techniques are applicable to our setting. Another recent work is that of Kraska et al. [17], which empirically shows that better indexes can be built using machine-learned models; it does not provide any provable guarantees for its methods.

2The definition of the prediction error η is problem-specific. In both the problems considered in this paper, η is defined to be the L1 norm of the error.

There are other computational models that try to tackle uncertainty. The field of robust optimization [16] considers uncertain inputs and aims to design algorithms that yield good performance guarantees for any potential realization of the inputs. There has been some work on analyzing algorithms when the inputs are stochastic or come from a known distribution [19, 22, 6]. In the optimization community, the whole field of online stochastic optimization concerns online decision making under uncertainty by assuming a distribution on future inputs; see the book by Russell Bent and Pascal Van Hentenryck [4]. Our work differs from these in that we do not assume anything about the input; in fact, we do not assume anything about the predictor either!

2 Ski rental with prediction

In the ski rental problem, let rentals cost one unit per day, b be the cost to buy, x be the actual number of skiing days, which is unknown to the algorithm, and y be the predicted number of days. Then η = |y − x| is the prediction error. 
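To fix this setup in executable form, the quantities above can be written as follows (a small sketch; the helper names are our own, not the paper's):

```python
def opt_cost(b, x):
    # offline optimum: buy up front iff there are at least b skiing days
    return min(b, x)

def strategy_cost(b, x, d):
    # online cost of the rule "rent through day d - 1, buy on day d":
    # if skiing stops before day d, we only ever rent
    return x if x < d else (d - 1) + b

def prediction_error(y, x):
    # eta = |y - x|, the L1 prediction error
    return abs(y - x)
```

For example, the break-even rule d = b costs at most 2b − 1 on every instance, i.e., strictly less than twice OPT.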
Note that we do not make any assumptions about its distribution. The optimum cost is OPT = min{b, x}.

2.1 Warmup: A simple consistent, non-robust algorithm

We first show that an algorithm that naively uses the predicted number of days to decide whether or not to buy is 1-consistent, i.e., its competitive ratio is 1 when η = 0. However, this algorithm is not robust, as its competitive ratio can be arbitrarily large in case of incorrect predictions.

Algorithm 1: A simple 1-consistent algorithm

if y ≥ b then
    Buy on the first day.
else
    Keep renting for all skiing days.
end

Lemma 2.1. Let ALG denote the cost of the solution obtained by Algorithm 1 and let OPT denote the optimal solution cost on the same instance. Then ALG ≤ OPT + η.

Proof. We consider different cases based on the relative values of the prediction y and the actual number of days x of the instance. Recall that Algorithm 1 incurs a cost of b whenever the prediction is at least b and incurs a cost of x otherwise.

• y ≥ b, x ≥ b ⟹ ALG = b = OPT.
• y < b, x < b ⟹ ALG = x = OPT.
• y ≥ b, x < b ⟹ ALG = b ≤ x + (y − x) = x + η = OPT + η.
• y < b, x ≥ b ⟹ ALG = x < b + (x − y) = b + η = OPT + η.

A major drawback of Algorithm 1 is its lack of robustness. In particular, its competitive ratio can be unbounded if the prediction y is small but x ≫ b. Our goal next is to obtain an algorithm that is both consistent and robust.

2.2 A deterministic robust and consistent algorithm

In this section, we show that a small modification to Algorithm 1 yields an algorithm that is both consistent and robust. Let λ ∈ (0, 1) be a hyperparameter. As we see later, varying λ gives us a smooth trade-off between the robustness and consistency of the algorithm.
Theorem 2.2. 
With a parameter λ ∈ (0, 1), Algorithm 2 has a competitive ratio of at most min{ (1 + λ)/λ, (1 + λ) + η/((1 − λ)OPT) }. In particular, Algorithm 2 is (1 + 1/λ)-robust and (1 + λ)-consistent.

Algorithm 2: A deterministic robust and consistent algorithm.

if y ≥ b then
    Buy at the start of day ⌈λb⌉.
else
    Buy at the start of day ⌈b/λ⌉.
end

Proof. We begin with the first bound. Suppose y ≥ b, so the algorithm buys the skis at the start of day ⌈λb⌉. Since the algorithm incurs a cost of b + ⌈λb⌉ − 1 whenever x ≥ ⌈λb⌉, the worst competitive ratio is obtained when x = ⌈λb⌉, for which OPT = ⌈λb⌉. In this case, we have ALG = b + ⌈λb⌉ − 1 ≤ b + λb = ((1 + λ)/λ) · λb ≤ ((1 + λ)/λ) OPT. On the other hand, when y < b, the algorithm buys skis at the start of day ⌈b/λ⌉ and rents until then. In this case, the worst competitive ratio is attained whenever x ≥ ⌈b/λ⌉, as we have OPT = b and ALG = b + ⌈b/λ⌉ − 1 ≤ b + b/λ = ((1 + λ)/λ) OPT.

To prove the second bound, we need to consider the following two cases. Suppose y ≥ b. Then, for all x < ⌈λb⌉, we have ALG = OPT = x. On the other hand, for x ≥ ⌈λb⌉, we have ALG = b + ⌈λb⌉ − 1 ≤ (1 + λ)b ≤ (1 + λ)(OPT + η). The second inequality follows since either OPT = b (if x ≥ b) or b ≤ y ≤ OPT + η (if x < b). Suppose y < b. Then, for all x ≤ b, we have ALG = OPT = x. Similarly, for all x ∈ (b, ⌈b/λ⌉), we have ALG = x ≤ y + η < b + η = OPT + η. Finally, for all x ≥ ⌈b/λ⌉, noting that η = x − y > b/λ − b = (1 − λ)b/λ, we have ALG = b + ⌈b/λ⌉ − 1 ≤ b + b/λ = OPT + b/λ < OPT + η/(1 − λ). Thus, in all cases, ALG ≤ (1 + λ)OPT + η/(1 − λ), completing the proof.

Thus, Algorithm 2 gives an option to trade off consistency and robustness. In particular, greater trust in the predictor suggests setting λ close to zero, as this leads to a better competitive ratio when η is small. On the other hand, setting λ close to one is conservative and yields a more robust algorithm.

2.3 A randomized robust and consistent algorithm

In this section we consider a family of randomized algorithms and compare their performance against an oblivious adversary. In particular, we design robust and consistent algorithms that yield a better trade-off than the above deterministic algorithms. Let λ ∈ (1/b, 1) be a hyperparameter. For a given λ, Algorithm 3 samples the day when skis are bought based on two different probability distributions, depending on the prediction received, and rents until that day.

Algorithm 3: A randomized robust and consistent algorithm

if y ≥ b then
    Let k ← ⌊λb⌋;
    Define qi ← ((b − 1)/b)^{k−i} · 1/(b(1 − (1 − 1/b)^k)) for all 1 ≤ i ≤ k;
    Choose j ∈ {1, . . . , k} randomly from the distribution defined by qi;
    Buy at the start of day j.
else
    Let ℓ ← ⌈b/λ⌉;
    Define ri ← ((b − 1)/b)^{ℓ−i} · 1/(b(1 − (1 − 1/b)^ℓ)) for all 1 ≤ i ≤ ℓ;
    Choose j ∈ {1, . . . , ℓ} randomly from the distribution defined by ri;
    Buy at the start of day j.
end

Theorem 2.3. Algorithm 3 yields a competitive ratio of at most min{ 1/(1 − e^{−(λ−1/b)}), (λ/(1 − e^{−λ}))(1 + η/OPT) }. In particular, Algorithm 3 is (1/(1 − e^{−(λ−1/b)}))-robust and (λ/(1 − e^{−λ}))-consistent.

Proof. We consider different cases depending on the relative values of y and x.

(i) y ≥ b, x ≥ k. Here, we have OPT = min{b, x} ≥ k. Since the algorithm incurs a cost of (b + i − 1) when it buys at the beginning of day i, we have

E[ALG] = Σ_{i=1}^{k} (b + i − 1) qi = (1/(b(1 − (1 − 1/b)^k))) Σ_{i=1}^{k} (b + i − 1)((b − 1)/b)^{k−i} = k/(1 − (1 − 1/b)^k) ≤ k/(1 − e^{−k/b}).

Since k ≤ OPT and k/b ≥ λ − 1/b, the last expression is at most (1/(1 − e^{−(λ−1/b)})) OPT. Moreover, since b ≤ y ≤ OPT + η, k/b ≤ λ, and z/(1 − e^{−z}) is increasing in z,

k/(1 − e^{−k/b}) ≤ ((k/b)/(1 − e^{−k/b}))(OPT + η) ≤ (λ/(1 − e^{−λ}))(OPT + η).

(ii) y ≥ b, x < k. Here, we have OPT = x. The algorithm incurs a cost of (b + i − 1) only if it buys at the beginning of some day i ≤ x, and a cost of x otherwise. In particular, we have

E[ALG] = Σ_{i=1}^{x} (b + i − 1) qi + Σ_{i=x+1}^{k} x qi = (1/(b(1 − (1 − 1/b)^k))) [ Σ_{i=1}^{x} (b + i − 1)((b − 1)/b)^{k−i} + Σ_{i=x+1}^{k} x ((b − 1)/b)^{k−i} ] = x/(1 − (1 − 1/b)^k) ≤ OPT/(1 − e^{−k/b}) ≤ (1/(1 − e^{−(λ−1/b)})) OPT,

which establishes robustness. In order to prove consistency, we can rewrite the RHS as follows:

OPT/(1 − e^{−k/b}) = ((k/b) OPT + ((b − k)/b) OPT)/(1 − e^{−k/b}) ≤ ((k/b) OPT + (k/b) η)/(1 − e^{−k/b}) ≤ (λ/(1 − e^{−λ}))(OPT + η),

since OPT = x < k and b − k ≤ η together imply ((b − k)/b) OPT ≤ (k/b) η.

(iii) y < b, x < ℓ. Here, we have OPT = min{b, x} and x ≤ OPT + η. The expected cost of the algorithm can be computed as in (ii):

E[ALG] = Σ_{i=1}^{x} (b + i − 1) ri + Σ_{i=x+1}^{ℓ} x ri = x/(1 − (1 − 1/b)^ℓ) ≤ x/(1 − e^{−ℓ/b}) ≤ (1/(1 − e^{−1/λ}))(OPT + η) ≤ (λ/(1 − e^{−λ}))(OPT + η),

where the second-to-last inequality uses ℓ/b ≥ 1/λ.

(iv) y < b, x ≥ ℓ. Here, we have OPT = b. The expected cost incurred by the algorithm is computed as in (i):

E[ALG] = Σ_{i=1}^{ℓ} (b + i − 1) ri = ℓ/(1 − (1 − 1/b)^ℓ) ≤ ⌈b/λ⌉/(1 − e^{−ℓ/b}) ≤ ((1/λ + 1/b)/(1 − e^{−1/λ})) OPT ≤ (1/(1 − e^{−(λ−1/b)})) OPT,

which proves robustness. To prove consistency, we rewrite the RHS as follows:

E[ALG] ≤ ℓ/(1 − e^{−ℓ/b}) ≤ ℓ/(1 − e^{−1/λ}) = (1/(1 − e^{−1/λ}))(b + (ℓ − b)) ≤ (1/(1 − e^{−1/λ}))(OPT + η) ≤ (λ/(1 − e^{−λ}))(OPT + η),

since x ≥ ℓ and y < b imply η = x − y > ℓ − b.

Algorithms 2 and 3 both yield a smooth trade-off between the robustness and consistency guarantees for the ski rental problem. As shown in Figure 1, the randomized algorithm offers a much better trade-off by always guaranteeing smaller consistency for a given robustness guarantee. We remark that setting λ = 1 in Algorithms 2 and 3 allows us to recover the best deterministic and randomized algorithms for the classical ski rental problem without using predictions.

2.4 Extensions

Consider a generalization of the ski rental problem where we have a varying demand xi for computing resources on each day i. Such a situation models the problem faced while designing small enterprise data centers. System designers have the choice of buying machines at a high setup cost or renting machines from a cloud service provider to handle the computing needs of the enterprise. One can satisfy the demand in two ways: either pay 1 to rent one machine and satisfy one unit of demand for one day, or pay b to buy a machine and use it to satisfy one unit of demand for all future days. 
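One way to make this generalization concrete is to split a varying-demand instance into unit levels, each a rent-or-buy instance of its own (a sketch under our own naming; the paper only states the decomposition):

```python
def unit_instances(demands):
    # view a varying-demand instance as max-demand many classical
    # ski-rental instances: unit level j is "active" on day i
    # exactly when the demand on day i exceeds j
    k = max(demands)
    return [[int(x > j) for x in demands] for j in range(k)]
```

Each level can then be handled by a classical rent-or-buy rule independently, which is the viewpoint used to extend the algorithms of this section.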
It is easy to cast the classical ski rental problem in this framework by setting xi = 1 for the first x days and to 0 afterwards. Kodialam [15] considers this generalization and gives a deterministic algorithm with a competitive ratio of 2 as well as a randomized algorithm with a competitive ratio of e/(e − 1).
Now suppose we have predictions yi for the demand on day i. We define η = Σ_i |xi − yi| to be the total L1 error of the predictions. Both Algorithms 2 and 3 extend naturally to this setting to yield the same robustness and consistency guarantees as in Theorems 2.2 and 2.3. Our results follow from viewing an instance of the ski rental with varying demand problem as k disjoint instances of the classical ski rental problem, where k is an upper bound on the maximum demand on any day. The proofs are similar to those in Sections 2.2 and 2.3; we omit them for brevity.

[Figure 1: Ski rental: Robustness vs. consistency.]

3 Non-clairvoyant job scheduling with prediction

We consider the simplest variant of non-clairvoyant job scheduling, i.e., scheduling n jobs on a single machine with no release dates. The processing requirement xj of a job j is unknown to the algorithm and only becomes known once the job has finished processing. Any job can be preempted at any time and resumed at a later time without any cost. The objective function is to minimize the sum of completion times of the jobs. Note that no algorithm can yield any non-trivial guarantees if preemptions are not allowed.
Let x1, . . . , xn denote the actual processing times of the n jobs, which are unknown to the non-clairvoyant algorithm. In the clairvoyant case, when processing times are known up front, the optimal algorithm is to simply schedule the jobs in non-decreasing order of job lengths, i.e., shortest job first. 
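The clairvoyant optimum has a direct closed form (a sketch; the function name is our own): after sorting, the job in sorted position j contributes to n − j + 1 completion times.

```python
def sjf_total_completion_time(x):
    # clairvoyant optimum: schedule shortest job first; the job in
    # 0-indexed sorted position j is counted in n - j completion times
    x = sorted(x)
    n = len(x)
    return sum((n - j) * xj for j, xj in enumerate(x))
```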
A deterministic non-clairvoyant algorithm called round-robin (RR) yields a competitive ratio of 2 [23], which is known to be best possible.
Now, suppose that instead of being truly non-clairvoyant, the algorithm has an oracle that predicts the processing time of each job. Let y1, . . . , yn be the predicted processing times of the n jobs. Then ηj = |xj − yj| is the prediction error for job j, and η = Σ_{j=1}^n ηj is the total error. We assume that there are no zero-length jobs and that units are normalized such that the actual processing time of the shortest job is at least one. Our goal in this section is to design algorithms that are both robust and consistent, i.e., that can use good predictions to beat the lower bound of 2 while at the same time guaranteeing a worst-case constant competitive ratio.

3.1 A preferential round-robin algorithm

In scheduling problems with preemption, we can simplify exposition by talking about several jobs running concurrently on the machine, with rates that sum to at most 1. For example, in the round-robin algorithm, at any point of time, all k unfinished jobs run on the machine at equal rates of 1/k. This is just a shorthand terminology for saying that in any infinitesimal time interval, a 1/k fraction of that interval is dedicated to running each of the jobs.
We call a non-clairvoyant scheduling algorithm monotonic if it has the following property: given two instances with identical inputs and actual job processing times (x1, . . . , xn) and (x′1, . . . , x′n) such that xj ≤ x′j for all j, the objective function value found by the algorithm for the first instance is no higher than that for the second. 
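As a concrete aside (our own sketch, not from the paper), the round-robin objective can be computed exactly by simulating its equal-rate phases, and increasing any processing time can only increase the result, which is exactly the monotonicity property just defined:

```python
def rr_total_completion_time(x):
    # simulate round-robin exactly: while k jobs remain, each runs at
    # rate 1/k, so the job with the smallest residual finishes next,
    # after k * (its remaining size) units of wall-clock time
    x = sorted(x)
    n = len(x)
    t = prev = total = 0.0
    for i, xi in enumerate(x):
        t += (n - i) * (xi - prev)  # time at which job i finishes
        total += t
        prev = xi
    return total
```

On x = (1, 2, 3) this returns 14, while the clairvoyant optimum is 10, within the factor-2 guarantee of [23].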
It is easy to see that the round-robin algorithm is monotonic.
We consider the Shortest Predicted Job First (SPJF) algorithm, which sorts the jobs in increasing order of their predicted processing times yj and executes them to completion in that order. Note that SPJF is monotonic, because if the processing times xj became smaller (with the predictions yj staying the same), all jobs would finish only sooner, thus decreasing the total completion time objective. SPJF produces the optimal schedule in the case that the predictions are perfect, but for bad predictions its worst-case performance is not bounded by a constant. To get the best of both worlds, i.e., good performance for good predictions as well as a constant-factor approximation in the worst case, we combine SPJF with RR as follows, calling the resulting algorithm Preferential Round-Robin (PRR).

Lemma 3.1. Given two monotonic algorithms with competitive ratios α and β for the minimum total completion time problem with preemptions, and a parameter λ ∈ (0, 1), one can obtain an algorithm with competitive ratio min{ α/λ, β/(1 − λ) }.

Proof. The combined algorithm runs the two given algorithms in parallel. The α-approximation (call it A) is run at a rate of λ, and the β-approximation (B) at a rate of 1 − λ. Compared to running at rate 1, if algorithm A runs at a slower rate of λ, all completion times increase by a factor of 1/λ, so it becomes an (α/λ)-approximation. Now, the fact that some of the jobs are concurrently being executed by algorithm B only decreases their processing times from the point of view of A, so by monotonicity this does not make the objective of A any worse. Similarly, when algorithm B runs at a lower rate of 1 − λ, it becomes a (β/(1 − λ))-approximation, and by monotonicity it can only get better from concurrency with A. 
Thus, both bounds hold simultaneously, and the overall guarantee is their minimum.

We next analyze the performance of SPJF.

Lemma 3.2. The SPJF algorithm has competitive ratio at most (1 + 2η/n).

Proof. Assume w.l.o.g. that jobs are numbered in non-decreasing order of their actual processing times, i.e., x1 ≤ . . . ≤ xn. For any pair of jobs (i, j), define d(i, j) as the amount of job i that has been executed before the completion time of job j. In other words, d(i, j) is the amount of time by which i delays j. Let ALG denote the cost of the SPJF schedule. Then

ALG = Σ_{j=1}^{n} xj + Σ_{(i,j): i<j} (d(i, j) + d(j, i)).

Consider a pair i < j (so xi ≤ xj). If yi ≤ yj, then SPJF runs job i to completion before job j, so d(i, j) + d(j, i) = xi. If yi > yj, the pair is inverted and d(i, j) + d(j, i) = xj ≤ xi + ηi + ηj, since xj − xi ≤ (yj + ηj) − (yi − ηi) ≤ ηi + ηj. As the clairvoyant optimum is OPT = Σ_j xj + Σ_{(i,j): i<j} xi, and each ηj appears in at most n − 1 pairs, we get ALG ≤ OPT + (n − 1)η. Finally, since every job has processing time at least 1, OPT ≥ n(n + 1)/2, and hence ALG ≤ (1 + 2η/n) OPT.

Combining Lemma 3.1 (applied to SPJF and RR, both of which are monotonic) with Lemma 3.2 yields our main scheduling result.

Theorem 3.3. The preferential round-robin algorithm with parameter λ ∈ (0, 1) has competitive ratio at most min{ (1/λ)(1 + 2η/n), 2/(1 − λ) }. In particular, it is (2/(1 − λ))-robust and (1/λ)-consistent.

Choosing λ > 0.5 gives an algorithm that beats the round-robin ratio of 2 in the case of sufficiently good predictions. For the special case of zero prediction errors (or, more generally, if the order of jobs sorted by yj is the same as that sorted by xj), we can obtain an improved competitive ratio of (1 + λ)/(2λ) via a more sophisticated analysis.

Theorem 3.4. The preferential round-robin algorithm with parameter λ ∈ (0, 1) has competitive ratio at most ((1 + λ)/(2λ)) when η = 0.

Proof. Suppose w.l.o.g. that the jobs are sorted in non-decreasing order of job lengths (both actual and predicted), i.e., x1 ≤ · · · ≤ xn and y1 ≤ · · · ≤ yn. Since the optimal solution schedules the jobs sequentially, we have

OPT = Σ_{j=1}^{n} (n − j + 1) xj = Σ_{j=1}^{n} xj + Σ_{(i,j): i<j} xi.   (1)
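The rate-sharing construction of Lemma 3.1 can also be checked numerically with a small-timestep simulation (entirely our own sketch; prr_total_completion_time and the discretization are not from the paper):

```python
def prr_total_completion_time(x, y, lam, dt=1e-3):
    # preferential round-robin (sketch): SPJF gets rate lam on the
    # unfinished job with the smallest prediction, while RR splits
    # the remaining rate 1 - lam equally among all unfinished jobs
    rem = [float(xi) for xi in x]
    order = sorted(range(len(x)), key=lambda j: y[j])  # SPJF priority
    done = [False] * len(x)
    t = total = 0.0
    while not all(done):
        alive = [j for j in range(len(x)) if not done[j]]
        front = next(j for j in order if not done[j])  # SPJF's current pick
        for j in alive:
            rate = (1 - lam) / len(alive) + (lam if j == front else 0.0)
            rem[j] -= rate * dt
        t += dt
        for j in alive:
            if rem[j] <= 0.0:
                done[j] = True
                total += t  # record (approximate) completion time
    return total
```

With perfect predictions, λ = 0.5, and jobs (1, 2, 3), the simulated objective is roughly 11.3, between the optimum 10 and the 2/(1 − λ) · OPT = 20 robustness bound.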