{"title": "Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling", "book": "Advances in Neural Information Processing Systems", "page_first": 581, "page_last": 591, "abstract": "From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.", "full_text": "Staying up to Date with Online Content Changes
Using Reinforcement Learning for Scheduling

Andrey Kolobov
Microsoft Research
Redmond, WA-98052
akolobov@microsoft.com

Yuval Peres
yperes@gmail.com

Cheng Lu
Microsoft Bing
Bellevue, WA-98004
Cheng.Lu@microsoft.com

Eric Horvitz
Microsoft Research
Redmond, WA-98052
horvitz@microsoft.com

Abstract

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. 
Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.

1 Introduction

As the Web becomes more and more dynamic, services that rely on web data face the increasingly challenging problem of keeping up with online content changes. Whether it be a continuous-query system [26], a virtual assistant like Cortana or Google Now, or an Internet search engine, such a service tracks many remote sources of information – web pages or data streams [27]. Users expect these services, which we call trackers, to surface the latest information from the sources. This is easy if sources push content updates to the tracker, but few sources do. Instead, major trackers such as search engines must continually decide when to re-pull (crawl) data from sources to pick up the changes. A policy that makes these decisions well solves the freshness crawl scheduling problem.
Freshness crawl scheduling has several challenging aspects. For most sources, the tracker finds out whether they have changed only when it crawls them. To guess when the changes happen, and hence when content should be downloaded, the tracker needs a predictive model whose parameters are initially unknown. Thus, the tracker needs to learn these models and optimize a freshness-related objective when scheduling crawls. For some web pages, however, sitemap polling and other means can provide trustworthy near-instantaneous signals that the page has changed in a meaningful way, though not what the change is exactly. But even with these remote change observations and known change model parameters, freshness crawl scheduling remains highly nontrivial because the tracker cannot react to every individual predicted or actual change. 
The tracker’s infrastructure imposes a bandwidth constraint on the average daily number of crawls, usually just a fraction of the change event volume. Last but not least, Google and Bing track many billions of pages [32] with vastly different importance and change frequency characteristics. The sheer size of this constrained learning and optimization problem makes low-polynomial algorithms for it a must, despite the availability of big-data platforms.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

This paper presents a holistic approach to freshness crawl scheduling that handles all of the above aspects in a computationally efficient manner with optimality guarantees using a type of reinforcement learning (RL) [29]. This problem has been studied extensively from different angles, as described in the Related Work section. The scheduling aspect per se, under various objectives and assuming known model parameters, has been the focus of many papers, e.g., [2, 10, 13, 25, 35]. In distinction from these works, our approach has all of the following properties: (i) optimality; (ii) computational efficiency; (iii) a guarantee that every source changing at a non-zero rate will be occasionally crawled; (iv) the ability to take advantage of remote change observations, if available. No other work has (iv), and only [35] has (i)-(iii). Moreover, learning change models previously received attention [11] purely as a preprocessing step. Our RL approach integrates it with scheduling, with convergence guarantees.
Specifically, our contributions are: (1) A natural freshness optimization objective based on harmonic numbers, and analysis showing how its mathematical properties enable efficient optimal scheduling. (2) Efficient optimization procedures for this bandwidth-constrained objective under complete, mixed, and lacking remote change observability. 
(3) A reinforcement learning algorithm that integrates these approaches with the model estimation of [11] and converges to the optimal policy, lifting the known-parameter assumption. (4) An approximate crawl scheduling algorithm that requires learning far fewer parameters, and identification of a condition under which its solution is optimal.

2 Problem formalization

In the settings we consider, a service we call the tracker monitors a set W of information sources. A source w ∈ W can be a web page, a data stream, a file, etc., whose content occasionally changes. To pick up changes from a source, the tracker needs to crawl it, i.e., download its content. When source w has changes the tracker hasn’t picked up, the tracker is stale w.r.t. w; otherwise, it is fresh w.r.t. w. We assume near-instantaneous crawl operations and a fixed set of sources W. Growing W to improve information completeness [27] is also an important but distinct problem; we do not consider it here.
Discrete page changes. We define a content change at a source as an alteration at least minimally important to the tracker. In practice, trackers compute a source’s content digest using data extractors, shingles [4], or similarity hashes [7], and consider content changed when its digest changes.
Models of change process and importance. We model each source w ∈ W’s changes as a Poisson process with change rate ∆w. Many prior works adopted it for web pages [2, 8, 9, 10, 11, 12, 35] as a good balance between fidelity and computational convenience. We also associate an importance score µw with each source, and denote these parameters jointly as µ⃗. 
Importance score µw can be thought of as characterizing the time-homogeneous Poisson rate at which the page is served in response to the query stream, although in general it can be any positive weight measuring source significance [2]. While scores µw are defined by, and known to, the tracker, change rates ∆w need to be learned.
Change observability. For most sources, the tracker can find out whether the source has changed only by crawling it. In this case, even crawling doesn’t tell the tracker how many times the source has changed since the last crawl. We denote the set of these sources as W− and say that the tracker receives incomplete change observations about them. However, for other sources, which we denote as Wo, the tracker may receive a near-instant notification whenever they change, i.e., get complete remote change observations. E.g., for web pages these signals may be available from browser telemetry or sitemaps. Thus the tracker’s set of sources can be represented as W = Wo ∪ W−, with Wo ∩ W− = ∅.
Bandwidth constraints. Even if the tracker receives complete change observations, it generally cannot afford to do a crawl upon each of them. The tracker’s network infrastructure and considerations of respect for other Internet users limit its crawl rate (the average number of requests per day); the total change rate of tracked sources may be much higher. We call this limit the bandwidth constraint R.
Optimizing freshness. The tracker operates in continuous time and starts fresh w.r.t. all sources. Our scheduling problem’s solution is a policy π – a rule that at every instant t chooses (potentially stochastically) a source to crawl or decides that none should be crawled. Executing π produces a crawl sequence of time-source pairs CrSeq = (t1, w1), (t2, w2), . . ., denoted CrSeqw = (t1, w), (t2, w), . . . for a specific source w. 
Similarly, the (Poisson) change process at the sources generates a change sequence ChSeq = (t′1, w′1), (t′2, w′2), . . ., where t′i is a change time of source w′i; its restriction to source w is ChSeqw. We denote the joint process governing changes at all sources as P(∆⃗).

3 Minimizing harmonic staleness penalty

We view maximizing freshness as minimizing costs the tracker incurs for the lack thereof, and associate the time-averaged expected staleness penalty Jπ with every scheduling policy π:

$$J^\pi = \lim_{T\to\infty} \mathbb{E}_{CrSeq\sim\pi,\; ChSeq\sim P(\vec{\Delta})}\left[\frac{1}{T}\int_0^T \sum_{w\in W} \mu_w C(N_w(t))\, dt\right] \quad (1)$$

Here, T is a planning horizon, Nw(t) is the number of uncrawled changes source w has accumulated by time t, and C : Z+ → R+ is a penalty function, to be chosen later, that assigns a cost to every possible number of uncrawled changes. Note that Nw(t) implicitly depends on the most recent time w was crawled as well as on the change sequence ChSeq, so the expectation is both over possible change sequences and over possible crawl sequences CrSeq generatable by π. Minimizing staleness means finding π∗ = argminπ∈Π Jπ under the bandwidth constraint, where Π is a suitably chosen policy class.
Choosing a C(n) that is efficient to optimize and induces "well-behaving" policies is of utmost importance. E.g., C(n) = 1n>0, which imposes a fixed penalty if a source has any changes since the last crawl [2, 10], can be optimized efficiently in O(|W| log(|W|)) time over the class Π of policies that crawl each source w according to a Poisson process with a source-specific rate ρw. However, for many sources, the optimal ρ∗w is 0 under this C(n) [2]. 
This is unacceptable in practice, as it leaves the tracker stale w.r.t. some sources forever, raising a question: why monitor these sources at all?
In this paper, we propose and analyze the following penalty:

$$C(n) = H(n) = \sum_{i=1}^{n} \frac{1}{i} \text{ if } n > 0, \text{ and } 0 \text{ if } n = 0 \quad (2)$$

H(n) for n > 0 is the n-th harmonic number and has several desirable properties as a staleness penalty:
It is strictly monotonically increasing. Thus, it penalizes the tracker for every change that happened at a source since the previous crawl, not just the first one as in [10].
It is discrete-concave, providing diminishing penalties: intuitively, while all undownloaded changes at a source matter, the first one matters most, as it marks the transition from freshness to staleness.
"Good" policies w.r.t. this objective don’t starve any source as long as that source changes. This is because, as it turns out, policies that ignore changing sources incur Jπ = ∞ if C(n) is as in Eq. 2 (see Prop. 1 in Section 4). In fact, this paper’s optimality results and high-level approaches are valid for any concave C(n) ≥ 0 s.t. limn→∞ C(n) = ∞, though possibly at a higher computational cost.
It allows for efficiently finding optimal policies under practical policy classes. Indeed, C(n) = H(n) isn’t the only penalty function satisfying the above properties. For instance, C(n) = n^d for 0 < d < 1 and C(n) = log^d(1 + n) for d > 1 behave similarly, but result in much more computationally expensive optimization problems, as do other alternatives we have considered.

4 Optimization under known change process

We now derive procedures for optimizing Eq. 1 with C(n) = H(n) (Eq. 2) under the bandwidth constraint for sources with incomplete and complete change observations, assuming that we know the change process parameters ∆⃗ exactly. In Section 5 we will lift the known-parameters assumption. 
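As a concrete grounding for the derivations that follow, the harmonic penalty of Eq. 2 and its diminishing-increment property can be sketched in a few lines of Python (the function names are ours, not the paper's):

```python
def harmonic_penalty(n: int) -> float:
    """Harmonic staleness penalty C(n) = H(n) = sum_{i=1}^n 1/i (Eq. 2)."""
    return sum(1.0 / i for i in range(1, n + 1))  # empty sum gives 0 for n == 0

def binary_penalty(n: int) -> float:
    """Binary penalty C(n) = 1 if n > 0 else 0, as in prior work [2, 10]."""
    return 1.0 if n > 0 else 0.0

# Unlike the binary penalty, H(n) keeps growing with every uncrawled change,
# but with diminishing increments: 1, 1/2, 1/3, 1/4, ... The first change
# matters most, yet no amount of staleness is ever "free".
increments = [harmonic_penalty(n) - harmonic_penalty(n - 1) for n in range(1, 5)]
```
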
We assume µ⃗, ∆⃗ > 0, because sources that are unimportant or never change don’t need to be crawled.

4.1 Case of incomplete change observations

When the tracker can find out about changes at a source only by crawling it, we consider randomized policies that sample crawl times for each source w from a Poisson process with rate ρw:

$$\Pi^- = \{CrSeq_w \sim Poisson(\rho_w)\ \forall w \in W^- \mid \vec{\rho} \ge 0\} \quad (3)$$

This policy class reflects the intuition that, since each source changes according to a Poisson process, i.e., roughly periodically, it should also be crawled roughly periodically. In fact, as Azar et al. [2] show, any π ∈ Π− can be de-randomized into a deterministic policy that is approximately periodic for each w. Since every π ∈ Π− is fully determined by the corresponding vector ρ⃗, we can easily express a bandwidth constraint on π ∈ Π− as Σw∈W− ρw = R.
To optimize over Π−, we first express the policy cost (Eq. 1) in terms of Π−’s policy parameters ρ⃗ ≥ 0:
Proposition 1. For π ∈ Π−, Jπ from Eq. 1 is equivalent to

$$J^\pi = -\sum_{w\in W^-} \mu_w \ln\left(\frac{\rho_w}{\Delta_w + \rho_w}\right) \quad (4)$$

Proof. See the Supplement. Note that Jπ = ∞ if ρw = 0 for any w ∈ W−. The proof relies on properties of Poisson processes, particularly memorylessness. □
Thus, finding π∗ ∈ Π− can be formalized as follows:
Problem 1. 
[Finding π∗ ∈ Π−]
INPUT: bandwidth R > 0; positive importance and change rate vectors µ⃗, ∆⃗ > 0.
OUTPUT: Crawl rates ρ⃗ = (ρw)w∈W− maximizing $\bar{J}^\pi = -J^\pi = \sum_{w\in W^-} \mu_w \ln\left(\frac{\rho_w}{\Delta_w+\rho_w}\right)$ subject to Σw∈W− ρw = R, ρw ≥ 0 for all w ∈ W−.
The next result readily identifies the optimal solution to this problem:
Proposition 2. For µ⃗, ∆⃗ > 0, the policy π∗ ∈ Π− parameterized by ρ⃗∗ > 0 that satisfies the following equation system is unique, minimizes the harmonic penalty Jπ in Eq. 2, and is therefore optimal in Π−:

$$\rho_w = \frac{-\Delta_w + \sqrt{\Delta_w^2 + 4\mu_w\Delta_w/\lambda}}{2} \text{ for all } w \in W^-, \qquad \sum_{w\in W^-} \rho_w = R \quad (5)$$

Proof. See the Supplement. The main insight is that for any µ⃗, ∆⃗ > 0 the Lagrange multiplier method, which gives rise to Eq. system 5, identifies the only maximizer of $\bar{J}^\pi = -J^\pi$ (Eq. 4) with ρ⃗ > 0, which thus must correspond to π∗ ∈ Π−. Crucially, that solution always has λ > 0. □
Eq. system 5 is non-linear, but the r.h.s. of the equations involving λ monotonically decreases in λ > 0, so, e.g., bisection search [6] on λ > 0 can find ρ⃗∗ as in Algorithm 1.

Algorithm 1: LAMBDACRAWL-INCOMLOBS: finding the optimal crawl scheduling policy π∗ ∈ Π− under incomplete change observations (Problem 1)
Input: R ≥ 0 – bandwidth; µ⃗ > 0, ∆⃗ > 0 – importance and change rates; ε > 0 – desired precision on λ
Output: ρ⃗ – vector of crawl rates for each source.
1: λlower ← |W−|² minw∈W−{µw} minw∈W−{∆w} / (|W−| maxw∈W−{∆w} R + R²)
2: λupper ← |W−|² maxw∈W−{µw} maxw∈W−{∆w} / (|W−| minw∈W−{∆w} R + R²)
3: λ ← BisectionSearch(λlower, λupper, ε)
4: // see, e.g., Burden & Faires [6]
5: foreach w ∈ W− do ρw ← (−∆w + √(∆w² + 4µw∆w/λ)) / 2
6: Return ρ⃗

Proposition 3. LAMBDACRAWL-INCOMLOBS (Algorithm 1) finds an ε-approximation to Problem 1’s optimal solution in time $O(\log_2(\frac{\lambda_{upper}-\lambda_{lower}}{\epsilon})|W^-|)$.
Proof. See the Supplement. The key step is showing that the solution λ is in [λlower, λupper]. □
Note that a convex problem like this could also be handled using interior-point methods, but the most suitable ones have higher, cubic per-iteration complexity [5].

4.2 Case of complete change observations

If the tracker receives a notification every time a source changes, the policy class Π− in Eq. 3 is clearly suboptimal, because it ignores these observations. At the same time, crawling every source on every change signal is unviable, because the total change rate of all sources Σw∈Wo ∆w can easily exceed bandwidth R. These extremes suggest a policy class whose members trigger crawls for only a fraction of the observations, dictated by a source-specific probability pw:

Πo = {for all w ∈ Wo, on each observation ow crawl w with probability pw | 0 ≤ p⃗ ≤ 1}   (6)

As with Π−, to find π∗ ∈ Πo we first express Jπ from Eq. 1 in terms of Πo’s policy parameters p⃗:
Proposition 4. For π ∈ Πo, Jπ from Eq. 1 is equivalent to $J^\pi = -\sum_{w\in W^o} \mu_w \ln(p_w)$ if p⃗ > 0, and Jπ = ∞ if pw = 0 for any w ∈ Wo.
Proof. See the Supplement. The key insight is that under any π ∈ Πo, the number of w’s uncrawled changes at time t is geometrically distributed with parameter pw. □
Under any π ∈ Πo, the crawl rate ρw of any source is related to its change rate ∆w: every time w changes we get an observation and crawl w with probability pw. Thus, ρw = pw∆w. Also, bandwidth R > Σw∈Wo ∆w isn’t sensible, because with complete change observations the tracker doesn’t benefit from more crawls than there are changes. Thus, we frame finding π∗ ∈ Πo as follows:
Problem 2. [Finding π∗ ∈ Πo]
INPUT: bandwidth R s.t. 0 < R ≤ Σw∈Wo ∆w; importance and change rate vectors µ⃗, ∆⃗ > 0.
OUTPUT: Crawl probabilities p⃗ = (pw)w∈Wo subject to Σw∈Wo pw∆w = R and 0 ≤ pw ≤ 1 for all w ∈ Wo, maximizing $\bar{J}^\pi = -J^\pi = \sum_{w\in W^o} \mu_w \ln(p_w)$.
Non-linear optimization under inequality constraints could generally take time exponential in the number of constraints. Our main result in this subsection is a polynomial optimal algorithm for Problem 2. First, consider a relaxation of Problem 2 that ignores the inequality constraints:
Proposition 5. The optimal solution $\vec{\hat{p}}^*$ to the relaxation of Problem 2 that ignores the inequality constraints is unique and assigns $\hat{p}^*_w = \frac{R\mu_w}{\Delta_w \sum_{w'\in W^o} \mu_{w'}}$ for all w ∈ Wo.
Proof. See the Supplement. The proof applies Lagrange multipliers. □
Our LAMBDACRAWL-COMPLOBS (Algorithm 2)’s high-level approach is to iteratively (lines 4-14) solve Problem 2’s relaxations as in Prop. 5 (lines 5-6), each time detecting sources that activate (either meet or exceed) the pw ≤ 1 constraints (line 9). (Note that the relaxed solution never has p̂∗w ≤ 0.) Our key insight, which we prove in the Supplement, is that any such source has p∗w = 1. Therefore, we set p∗w = 1 for each of them, adjust the overall bandwidth constraint for the remaining sources to Rrem = R − p∗w∆w = R − ∆w, and remove w from further consideration (lines 10-12). Eventually, we arrive at a (possibly empty) set of sources for which Prop. 5’s solution obeys all constraints under the remaining bandwidth (lines 15-16). Since Prop. 5’s solution is optimal in this base case, the overall algorithm is optimal too.

Algorithm 2: LAMBDACRAWL-COMPLOBS: finding the optimal crawl scheduling policy π∗ ∈ Πo under complete change observations (Problem 2)
Input: µ⃗, ∆⃗ – importance and change rate vectors; R s.t. 0 ≤ R ≤ Σw∈W ∆w – bandwidth
Output: p⃗∗ – vector specifying optimal per-page crawl probabilities upon receiving a change observation.
3: Worem ← Wo   // remaining sources to consider
4: while Worem ≠ ∅ do
5:   foreach w ∈ Worem do
6:     p̂∗w ← Rµw / (∆w Σw′∈Worem µw′)
7:   ViolationDetected ← False
8:   foreach w ∈ Worem do
9:     if p̂∗w ≥ 1 then
10:      p∗w ← 1
11:      R ← R − ∆w   // reduce remaining bandwidth
12:      Worem ← Worem \ {w}   // ignore w onwards
13:      ViolationDetected ← True
14:  if ViolationDetected == False then break
15: foreach w ∈ Worem do
16:   p∗w ← p̂∗w
17: Return p⃗∗ = (p∗w)w∈Wo

Proposition 6. LAMBDACRAWL-COMPLOBS is optimal for Problem 2 and runs in time O(|Wo|²).
Proof. See the Supplement. The proof critically relies on the concavity of $\bar{J}^\pi$. □
The O(|Wo|²) bound is loose. Each iteration usually discovers several active constraints at once, and for many sources the constraint is never activated, so the actual running time is close to O(|Wo|).

4.3 Crawl scheduling under mixed observability

In practice, trackers have to simultaneously handle sources with and without complete change data under a common bandwidth budget R. Consider a policy class that combines Π− and Πo:

Π⊙ = { for all w ∈ W−: {CrSeqw ∼ Poisson(ρw) | ρ⃗}; for all w ∈ Wo: {on each change observation ow, crawl w with probability pw | p⃗} }   (7)

For π ∈ Π⊙, Props. 1 and 4 imply that Jπ from Eq. 1 is equivalent to

$$J^\pi = -\sum_{w\in W^-} \mu_w \ln\left(\frac{\rho_w}{\Delta_w + \rho_w}\right) - \sum_{w\in W^o} \mu_w \ln(p_w) \quad (8)$$

Optimization over π ∈ Π⊙ can be stated as follows:
Problem 3. [Finding π∗ ∈ Π⊙]
INPUT: bandwidth R > 0; importance and change rate vectors µ⃗, ∆⃗ > 0.
OUTPUT: Crawl rates ρ⃗ = (ρw)w∈W− and crawl probabilities p⃗ = (pw)w∈Wo maximizing

$$\bar{J}^\pi = -J^\pi = \sum_{w\in W^-} \mu_w \ln\left(\frac{\rho_w}{\Delta_w + \rho_w}\right) + \sum_{w\in W^o} \mu_w \ln(p_w) \quad (9)$$

subject to Σw∈W− ρw + Σw∈Wo pw∆w = R, ρw > 0 for all w ∈ W−, 0 < pw ≤ 1 for all w ∈ Wo.
The optimization objective (Eq. 9) is strictly concave as a sum of concave functions over the constrained region, and therefore has a unique maximizer. Finding it amounts to deciding how to split the total bandwidth R into Ro for sources with complete change observations and R− = R − Ro for the rest. For any candidate split, LAMBDACRAWL-COMPLOBS and LAMBDACRAWL-INCOMLOBS give us the reward-maximizing policy parameters p⃗∗(Ro) and ρ⃗∗(R−), respectively, and Eq. 9 then tells us the overall value J̄∗(Ro, R−) of that split. We also know that for the optimal split, Ro∗ ∈ [0, min{R, Σw∈Wo ∆w}], as discussed immediately before Problem 2. Thus, we can find Problem 3’s maximizer to any desired precision using a method such as Golden-section search [20] on Ro. LAMBDACRAWL (Algorithm 3) implements this idea, where SPLIT-EVAL-J∗ (line 7) evaluates J̄∗(Ro, R−) and OptMaxSearch denotes an optimal search method.

Algorithm 3: LAMBDACRAWL: finding the optimal mixed-observability policy π∗ ∈ Π⊙ (Problem 3)
Input: R > 0 – bandwidth; µ⃗ > 0, ∆⃗ > 0 – importance and change rates; εno-obs, ε > 0 – desired precisions
Output: ρ⃗∗, p⃗∗ – crawl rates and probabilities for sources without and with complete change observations.
1: Romin ← 0
2: Romax ← min{R, Σw∈Wo ∆w}
3: ρ⃗∗, p⃗∗ ← OptMaxSearch(SPLIT-EVAL-J∗, Romin, Romax, ε)
4: // E.g., Golden section search [20]
5: Return ρ⃗∗, p⃗∗
6:
7: SPLIT-EVAL-J∗:
Input: Ro – bandwidth for sources with complete change observations, R, µ⃗, ∆⃗, εno-obs
Output: J̄∗ (Eq. 9) for the given split
8: ρ⃗ ← LAMBDACRAWL-INCOMLOBS(R − Ro, µ⃗W−, ∆⃗W−, εno-obs)
9: p⃗ ← LAMBDACRAWL-COMPLOBS(Ro, µ⃗Wo, ∆⃗Wo)
10: Return Σw∈W− µw ln(ρw/(∆w + ρw)) + Σw∈Wo µw ln(pw)

Proposition 7. LAMBDACRAWL (Algorithm 3) finds an ε-approximation to Problem 3’s optimal solution using O(log(R/ε)) calls to LAMBDACRAWL-INCOMLOBS and LAMBDACRAWL-COMPLOBS.
Proof. This follows directly from the optimality of LAMBDACRAWL-INCOMLOBS and LAMBDACRAWL-COMPLOBS (Props. 2 and 6), as well as of OptMaxSearch such as Golden section, which makes O(log(R/ε)) iterations. □

5 Reinforcement learning for scheduling

All our algorithms so far assume known change rates, but in reality change rates are usually unavailable and vary with time, requiring constant re-learning. 
In this section we modify LAMBDACRAWL into a model-based reinforcement learning (RL) algorithm that learns change rates on the fly.
For a source w, suppose the tracker observes binary change indicators {zj}, j = 1, . . . , U, where {tj}, j = 0, . . . , U, are observation times and zj = 1 iff w changed since tj−1 at least once. Consider two cases:
Incomplete change observations for w. Here, the tracker generates the sequence {zj} for each source w by crawling it. If zj = 1, the tracker still doesn’t know exactly how many times the source changed since time tj−1. Denoting aj = tj − tj−1, j ≥ 1, the ∆̂ that solves

$$\sum_{j:\, z_j=1} \frac{a_j}{e^{a_j\Delta} - 1} - \sum_{j:\, z_j=0} a_j = 0 \quad (10)$$

is an MLE of ∆ for the given source [11]. The l.h.s. of the equation is monotonically decreasing in ∆, so ∆̂ can be efficiently found numerically. This estimator is consistent under mild conditions [11], e.g., if the sequence {aj} doesn’t converge to 0, i.e., if the observations are spaced apart.
Complete change observations for w. In this case, for all j, zj = 1: an observation indicating exactly one change arrives on every change. 
Here a consistent MLE of ∆ is the observation rate [30]:

$$\hat{\Delta} = (U + 1)/t_U \quad (11)$$

LAMBDALEARNANDCRAWL, a model-based RL version of LAMBDACRAWL that uses these estimators to learn model parameters simultaneously with scheduling, is presented in Algorithm 4. It operates in epochs of length Tepoch time units each (lines 3-13). At the start of each epoch n, it calls LAMBDACRAWL (Algorithm 3) on the available ∆⃗̂n−1 change rate estimates to produce a policy (ρ⃗∗n, p⃗∗n) optimal with respect to them (line 4). Executing this policy during the current epoch, for the time period of Tepoch, and recording the observations extends the observation history (lines 7-8). (Note though that for sources w ∈ Wo, the observations don’t depend on the policy.) It then re-estimates change rates using a suffix of the augmented observation history (lines 10-13).

Algorithm 4: LAMBDALEARNANDCRAWL: finding the optimal crawl scheduling policy π∗ ∈ Π⊙ (Problem 3) under initially unknown change model
Input: R > 0 – bandwidth; µ⃗ > 0, ∆⃗̂0 > 0 – importance and initial change rate guesses; εno-obs, ε > 0 – desired precisions; Tepoch > 0 – duration of an epoch; Nepochs > 0 – number of epochs; S(n) – for each epoch n, observation history suffix length for learning ∆⃗ in that epoch
1: // obs_hist[S(n)] is the S(n)-length observation history suffix
2: obs_hist ← ()
3: foreach 1 ≤ n ≤ Nepochs do
4:   ρ⃗∗n, p⃗∗n ← LAMBDACRAWL(R, µ⃗, ∆⃗̂n−1, εno-obs, ε)
5:   // Z⃗new holds observations for all sources from start to
6:   // end of epoch n. Execute policy (ρ⃗∗n, p⃗∗n) to get it
7:   Z⃗new ← ExecuteAndObserve(ρ⃗∗n, p⃗∗n, Tepoch)
8:   Append(obs_hist, Z⃗new)
9:   // Learn new ∆⃗ estimates using smoothed Eqs. 10 and 11
10:  foreach w ∈ W− do
11:    ∆̂nw ← Solve(Σj:zjw=1 aj/(e^{aj∆} − 1) + 0.5/(e^{0.5∆} − 1) − Σj:zjw=0 aj − 0.5 = 0, obs_hist[S(n)])
12:  foreach w ∈ Wo do
13:    ∆̂nw ← (US(n) + 0.5)/(tS(n) + 0.5)

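Eq. 10 has no closed form, but since its left-hand side decreases monotonically in ∆, a simple bisection suffices. Below is a minimal sketch (the function name and bracket bounds are our choices, not the paper's, and the 0.5 smoothing terms are omitted for clarity):

```python
import math

def estimate_change_rate(a, z, lo=1e-9, hi=1e4, iters=200):
    """Bisection solve of the MLE condition (Eq. 10) for a Poisson change rate:
        sum_{j: z_j=1} a_j/(exp(a_j*D) - 1) - sum_{j: z_j=0} a_j = 0,
    where a[j] is the j-th inter-observation interval and z[j] == 1 iff at
    least one change was detected in that interval."""
    def lhs(rate):
        # Terms with a_j*rate >= 700 would overflow exp() and are ~0 anyway.
        changed = sum(aj / math.expm1(aj * rate)
                      for aj, zj in zip(a, z) if zj and aj * rate < 700.0)
        unchanged = sum(aj for aj, zj in zip(a, z) if not zj)
        return changed - unchanged  # monotonically decreasing in rate
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0.0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return 0.5 * (lo + hi)

# Example: with unit intervals and changes detected in exactly half of them,
# the MLE condition reduces to 1/(e^D - 1) = 1, i.e., D = ln 2.
```

If every crawl detects a change, the left-hand side never reaches zero and the estimate runs off to the upper bracket, which is exactly the singularity the smoothing terms in Algorithm 4 are there to prevent.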
Under mild assumptions, LAMB-\nDALEARNANDCRAWL converges to the\noptimal policy:\nProposition 8. LAMBDALEARNAND-\nCRAWL (Algorithm 4) converges in\nprobability to the optimal policy un-\nder\ni.e.,\nlimNepochs\u2192\u221e((cid:126)\u03c1Nepochs, (cid:126)pNepochs ) = ((cid:126)\u03c1\u2217, (cid:126)p\u2217), if (cid:126)\u2206 is stationary and S(n), the length of the his-\ntory\u2019s training suf\ufb01x, satis\ufb01es S(Nepoch) = length(obs_hist).\nProof. See the Supplement. It follows from the consistency and positivity of the change rate estimates,\n(cid:4)\nas well as LAMBDACRAWL\u2019s optimality\n\nn \u2190(cid:91) LAMBDACRAWL(R, (cid:126)\u00b5,\n(cid:126)Znew \u2190(cid:91) ExecuteAndObserve((cid:126)\u03c1\u2217\n\u02c6\u2206nw \u2190(cid:91) Solve( (cid:80)\n(cid:80)\n\u02c6\u2206nw \u2190(cid:91) Solve(\n\nAppend(obs_hist, (cid:126)Znew)\n// Learn new (cid:126)\u2206 estimates using Eqs. 10 and 11\nforeach w \u2208 W \u2212 do\n\n+ 0.5\naj \u2212 0.5 = 0, obs_hist[S(n)])\n\neaj \u2206\u22121\n\nj:zjw =1\n\nn, (cid:126)p\u2217\nn, (cid:126)p\u2217\n\nn) to get it\nn, Tepoch)\n\nthe true change rates (cid:126)\u2206,\n\nj:zjw =0\n\nforeach w \u2208 W o do\n\nUS(n)+0.5\nS(n)+0.5 , obs_hist[S(n)])\n\ne0.5\u2206\u22121 \u2212\n\n4\n\n5\n\n6\n\n7\n\n8\n\n9\n10\n11\n\n12\n\n13\n\naj\n\nLAMBDALEARNANDCRAWL in practice requires attention to several aspects:\nStationarity of (cid:126)\u2206. Source change rates may vary with time, so the length of history suf\ufb01x for\nestimating (cid:126)\u2206 should be shorter than the entire available history.\nSingularities of (cid:126)\u02c6\u2206 estimators. The MLE in Eq. 10 yields \u02c6\u2206w = \u221e if all crawls detect a change (the\nr.h.s. is 0). Similarly, Eq. 11 produces \u02c6\u2206w = 0 if no observations about w arrive in a given period. To\navoid these singularities without affecting consistency, we smooth the estimates by adding imaginary\nobservation intervals of length 0.5 to Eq. 
10 and an imaginary 0.5 observation to Eq. 11 (lines 11, 13).
Number of parameters. Learning a change rate separately for each source can be slow. Instead, we can generalize change rates across sources (e.g., [12]). Alternatively, we can sometimes avoid learning for most pages altogether:

Proposition 9. Suppose the tracker's set of sources W⁻ is such that for some constant c > 0, μ_w/∆_w = c for all w ∈ W⁻. Then minimizing the harmonic penalty under incomplete change observations (Problem 1) has ρ*_w = μ_w R⁻ / Σ_{w'∈W⁻} μ_{w'}.

Proof. See the Supplement. The proof proceeds by plugging ∆_w = (1/c) μ_w into Eq. system 5. □

Thus, if the importance-to-change-rate ratio is roughly equal across all sources, their crawl rates depend neither on the change rates nor on the ratio constant itself. Hence, we don't need to learn them for sources w ∈ W⁻ and can hope for faster convergence, albeit at some quality loss (see Section 7).

6 Related work
Scheduling for Posting, Polling, and Maintenance. Besides monitoring information sources, mathematically related settings arise in smart broadcasting in social networks [19, 31, 33, 36], personalized teaching [31], database synchronization [14], and job and maintenance service scheduling [1, 3, 15, 16]. In the web crawling context (see Olston & Najork [22] for a survey), the closest works are [10], [35], [25], and [2]. Like [10] and [2], we use Lagrange multipliers for optimization, and adopt the Poisson change model of [10] and many works since. Our contributions differ from prior art in several ways: (1) optimization objectives (see below) and guarantees; (2) special crawl scheduling under complete change observations; (3) reinforcement learning of model parameters during crawling.
Optimization objectives.
Our objective falls in the class of convex separable resource allocation problems [17]. So do most other related objectives: binary freshness/staleness [2, 10], age [8], and embarrassment [35]. The latter is implemented via specially constructed importance scores [35], so our algorithms can be used for it too. Other separable objectives include information longevity [23]. In contrast, Pandey & Olston [25] focus on an objective that depends on user behavior and cannot be separated into contributions from individual sources. While intuitively appealing, their measure can be optimized only via many approximations [25], and the algorithm for it is ultimately heuristic.
Acquiring model parameters. Importance can be defined and quickly determined from information readily available to search engines, e.g., page relevance to queries [35], query-independent popularity such as PageRank [24], and other features [25, 28]. Learning change rates is more delicate. The change rate estimators we use are due to [11]; our contribution in this regard is integrating them into crawl scheduling while providing theoretical guarantees, as well as identifying conditions under which estimation can be side-stepped using an approximation (Prop. 9). While many works adopted the homogeneous Poisson change process [2, 8, 9, 10, 11, 12, 35], its non-homogeneous variant [14], as well as quasi-deterministic [35] and general marked temporal point process [31] change models, were also considered. Change models can also be inferred via generalization using source co-location [12] or similarity [28].
RL. Our setting could be viewed as a restless multi-armed bandit (MAB) [34], a MAB type that allows an arm to change its reward/cost distribution without being pulled. However, no known restless MAB class allows arms to incur a cost/reward without being pulled, as in our setting. This distinction makes existing MAB analysis such as [18] inapplicable to our model.
RL with events and policies obeying general marked temporal point processes was studied in [31]. However, it relies on DNNs, and as a result provides neither guarantees of convergence, optimality, or other policy properties, nor a mechanism for imposing strict constraints on bandwidth, and it is far more expensive computationally.
7 Empirical evaluation
Our experimental evaluation assesses the relative performance of LAMBDACRAWL, LAMBDACRAWLAPPROX, and existing alternatives, evaluates the benefit of using complete change observations, and shows empirical convergence properties of RL for crawl scheduling (Sec. 5). Please refer to the Supplement, Sec. 9, for details of the experiment setup. All the data and code we used are available at https://github.com/microsoft/Optimal-Freshness-Crawl-Scheduling.
Metrics. We assessed the algorithms in terms of two criteria. One is the harmonic policy cost J^π_h, defined as in Eq. 1 with C(n) as in Eq. 2, which LAMBDACRAWL optimizes directly. The other is the binary policy cost J^π_b, also defined as in Eq. 1 but with C(n) = 1_{n>0}. It was used widely in previous works, e.g., [2, 10, 35], and is optimized directly by BinaryLambdaCrawl [2]. LAMBDACRAWL doesn't claim optimality for it, but we can still use it to evaluate LAMBDACRAWL's policy.
Data and baselines. The experiments used web page change and importance data collected by crawling 18,532,314 URLs daily for 14 weeks. We compared LAMBDACRAWL (labeled LC in the figures), LAMBDACRAWLAPPROX (LCA, LC with Prop. 9's approximation), and their RL variants LLC (Alg. 4) and LLCA to BinaryLambdaCrawl (BLC) [2], the state-of-the-art optimal algorithm for the binary cost J^π_b. Since BLC may crawl-starve sources and hence get J^π_h = ∞ (see Fig. 1), we also used our own variant, BLC_ε, with the non-starvation guarantee, and its RL flavor BLLC_ε. Finally, we used the ChangeRateCrawl (CC) [10, 35] and UniformCrawl (UC) [9, 23] heuristics.
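The two cost criteria can be contrasted with a small sketch of the per-source penalties as functions of n, the number of changes a source has undergone since its last crawl. The harmonic form C(n) = Σ_{k=1}^n 1/k is our reading of Eq. 2, which is not reproduced in this excerpt, so treat it as an assumption; the binary penalty C(n) = 1_{n>0} is as stated above.

```python
# Contrast of the two staleness penalties as a function of n, the number
# of changes missed since the last crawl.
# Harmonic penalty (assumed form of Eq. 2): C(n) = sum_{k=1}^n 1/k.
# Binary penalty: C(n) = 1 if n > 0 else 0.

def harmonic_penalty(n):
    return sum(1.0 / k for k in range(1, n + 1))

def binary_penalty(n):
    return 1.0 if n > 0 else 0.0

for n in [0, 1, 2, 10]:
    print(n, round(harmonic_penalty(n), 3), binary_penalty(n))
```

The binary penalty saturates at 1 after the first missed change, while the harmonic penalty keeps growing (slowly) with further misses, which is consistent with the observation below that optimizing the harmonic cost also keeps the binary cost low, but not conversely.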
In each run of an experiment, the bandwidth R was 20% of the total number of URLs used in that run.
Results. We conducted three experiments, whose results support the following claims:
(1) LAMBDACRAWL's harmonic staleness cost J^π_h is a more robust objective than the binary cost J^π_b widely studied previously: optimizing the former yields policies that are also near-optimal w.r.t. the latter, while the converse is not true. In this experiment, whose results are shown in Fig. 1, we

Figure 2: Benefit of using complete change observations. Here we use only the URLs that provide them (4% of our dataset), via sitemaps and other signals. On this URL subset, LAMBDACRAWL reduces to LC-ComplObs (LC-CO, Alg. 2), which heeds these signals, while LC-IncomplObs (LC-IO, Alg. 1), BLC, and BLC_ε ignore them. As a result, LC-CO's policy cost w.r.t. both J^π_h and J^π_b is at least 50% lower (!) than the other algorithms'.

Figure 3: Convergence of the RL-based LLC, LLCA, and BLLC_ε initialized with uniform change rate estimates of 1/day. Dashed lines show asymptotic policy costs (J^π_h of LC, LCA, and BLC_ε from Fig. 1); plots have confidence intervals. LLC converges notably faster than BLLC_ε, and LLCA even more so, as it learns fewer parameters.
LLCA's asymptotic policy is worse than LLC's but better than BLLC_ε's, especially w.r.t. the binary cost J^π_b (Fig. 1).

Figure 1: Performance w.r.t. harmonic (J^π_h) and binary (J^π_b) policy costs. Lower bars = better policies. LC is robust w.r.t. both, but BLC_ε and BLC [2] aren't: LC (J^π_h-optimal) beats BLC (J^π_h = ∞) and BLC_ε by 35% w.r.t. J^π_h, but BLC (J^π_b-optimal for incomplete-change-observation URLs) and BLC_ε don't beat LC/LCA w.r.t. J^π_b. CC (J^π_h = 2144, J^π_b = 963) and UC (J^π_h = 1268, J^π_b = 628) did poorly and were omitted from the plot.

assumed known change rates. To obtain them, we applied the change rate estimators in Eqs. 10 and 11 to all of the 14-week crawl data for 18.5M URLs, and used their output as ground truth. Policies were evaluated using the equations in Props. 1 and 4 to get J^π_h, and Eqs. 12 and 13 in the Supplement to get J^π_b.
(2) Utilizing complete change observations, as LAMBDACRAWL does when they are available, makes a very big difference in policy cost. Per Fig. 1, LC outperforms BLC even in terms of the binary cost J^π_b, w.r.t. which BLC gives an optimality guarantee as long as all URLs have only incomplete change observations. This raises the question: can LC's and LCA's specialized handling of the complete-observation URLs, a mere 4% of our dataset, explain their overall performance advantage? The experiment results in Fig. 2 suggest that this is the case. Here we used only the aforementioned URLs with complete change observations. On this URL set, LC reduces to LC-CO (Alg. 2) and yields a 2× reduction in harmonic cost J^π_h compared to treating these URLs conventionally, as LC-IO (Alg. 1), BLC, and BLC_ε do.
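For reference, the change rate estimation used to produce these ground-truth rates (Eqs. 10 and 11 with the 0.5-smoothing of Sec. 5) can be sketched as follows. The (a_j, z_j) interval encoding and the bisection solver are our illustrative assumptions rather than the paper's code.

```python
# Sketch of the smoothed change-rate estimators of Eqs. 10 and 11.
# The (a_j, z_j) interval encoding and the bisection solver are illustrative
# assumptions, not the paper's implementation.
import math

def _ratio(a, delta):
    # a / (e^{a*delta} - 1), with a guard against overflow for large exponents.
    x = a * delta
    return 0.0 if x > 700.0 else a / math.expm1(x)

def rate_complete_obs(num_changes, horizon):
    # Smoothed observation-rate estimate (Eq. 11): the imaginary 0.5
    # observation keeps the estimate positive when nothing was observed.
    return (num_changes + 0.5) / (horizon + 0.5)

def rate_incomplete_obs(intervals, lo=1e-6, hi=1e2, iters=100):
    # intervals: (a_j, z_j) pairs, where a_j is the time since the previous
    # crawl and z_j = 1 iff the crawl ending the interval detected a change.
    # Solves the smoothed MLE equation (Eq. 10) by bisection; the imaginary
    # changed/unchanged intervals of length 0.5 keep the root finite and
    # positive even if every crawl (or no crawl) detected a change.
    def f(delta):
        s = _ratio(0.5, delta) - 0.5
        for a, z in intervals:
            s += _ratio(a, delta) if z else -a
        return s
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:  # f is decreasing in delta, so the root is above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection suffices here because the left-hand side of the smoothed MLE equation is strictly decreasing in the change rate, going from +∞ near zero to a negative value for large rates, so the root is unique.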
On the full 18.5M set of URLs, LC crawls its complete-observability subset even more effectively by allocating to it a disproportionately large fraction of the overall bandwidth. Although its handling of complete-observation URLs gives LC an edge over the alternatives, note that even in the hypothetical situation where LC treats these URLs conventionally, as reflected in LC-IO's plot in Fig. 2, it is still at par with BLC and BLC_ε w.r.t. J^π_b, and markedly outperforms them w.r.t. J^π_h.
(3) When source change rates are initially unknown, the approximate LLCA converges faster w.r.t. J^π_h than the optimal LLC, but at the price of a higher asymptotic policy cost. Interestingly, LLCA's approximation (Prop. 9) only weakly affects its asymptotic performance w.r.t. the binary cost J^π_b (Fig. 1). These factors and algorithmic simplicity make this approximation a useful tradeoff in practice.
This experiment, whose analysis is presented in Fig. 3, compared LLC, LLCA, and BLLC_ε in settings where URL change rates have to be learned on the fly. We chose 20 subsamples of 100,000 URLs each from our 18.5M-URL dataset randomly with replacement, and used them to simulate 20 21-day runs of each algorithm, starting with a change rate estimate of 1 change/day for each URL. We used the "ground truth" change rates to generate change times for each URL. Every simulated day (epoch; see Alg. 4), each algorithm re-estimated change rates from observations of the simulated URL changes, sampled according to the algorithm's current policy. For the next day, it reoptimized its policy for the new rate estimates, and this policy was evaluated with the equations in Props. 1 and 4 under the ground-truth rates. Each algorithm's policy costs across the 20 episodes were averaged for each day.
8 Conclusion
We have introduced a new optimization objective and a suite of efficient algorithms for it to address the freshness crawl scheduling problem faced by services from search engines to databases.
In particular, we have presented LAMBDALEARNANDCRAWL, which integrates model parameter learning with scheduling optimization. To provide a theoretical convergence rate analysis in the future, we intend to frame this problem as a restless multi-armed bandit setting [18, 34].
Acknowledgements. We would like to thank Lin Xiao (Microsoft Research) and Junaid Ahmed (Microsoft Bing) for their comments and suggestions regarding this work.

References
[1] Anily, S., Glass, C., and Hassin, R. The scheduling of maintenance service. Discrete Applied Mathematics, pp. 27–42, 1998.
[2] Azar, Y., Horvitz, E., Lubetzky, E., Peres, Y., and Shahaf, D. Tractable near-optimal policies for crawling. Proceedings of the National Academy of Sciences (PNAS), 2018.
[3] Bar-Noy, A., Bhatia, R., Naor, J., and Schieber, B. Minimizing service and operation costs of periodic scheduling. In SODA, pp. 11–20, 1998.
[4] Broder, A., Glassman, S., Manasse, M., and Zweig, G. Syntactic clustering of the web. In WWW, pp. 1157–1166, 1997.
[5] Bubeck, S. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
[6] Burden, R. L. and Faires, J. D. Numerical Analysis. PWS Publishers, 3rd edition, 1985.
[7] Charikar, M. Similarity estimation techniques from rounding algorithms. In STOC, pp. 380–388, 2002.
[8] Cho, J. and Garcia-Molina, H. Synchronizing a database to improve freshness. In ACM SIGMOD International Conference on Management of Data, 2000.
[9] Cho, J. and Garcia-Molina, H. The evolution of the web and implications for an incremental crawler. In VLDB, 2000.
[10] Cho, J. and Garcia-Molina, H. Effective page refresh policies for web crawlers. ACM Transactions on Database Systems, 28(4):390–426, 2003.
[11] Cho, J. and Garcia-Molina, H. Estimating frequency of change.
ACM Transactions on Internet Technology, 3(3):256–290, 2003.
[12] Cho, J. and Ntoulas, A. Effective change detection using sampling. In VLDB, 2002.
[13] Coffman, E. G., Liu, Z., and Weber, R. R. Optimal robot scheduling for web search engines. Journal of Scheduling, 1(1), 1998.
[14] Gal, A. and Eckstein, J. Managing periodically updated data in relational databases. Journal of the ACM, 48:1141–1183, 2001.
[15] Glazebrook, K. D. and Mitchell, H. M. An index policy for a stochastic scheduling model with improving/deteriorating jobs. Naval Research Logistics, 49:706–721, 2002.
[16] Glazebrook, K. D., Mitchell, H. M., and Ansell, P. S. Index policies for the maintenance of a collection of machines by a set of repairmen. European Journal of Operational Research, 165(1):267–284, 2005.
[17] Ibaraki, T. and Katoh, N. Resource Allocation Problems: Algorithmic Approaches. MIT Press, 1988.
[18] Immorlica, N. and Kleinberg, R. Recharging bandits. In FOCS, 2018.
[19] Karimi, M. R., Tavakoli, E., Farajtabar, M., Song, L., and Gomez-Rodriguez, M. Smart broadcasting: Do you want to be seen? In ACM KDD, 2016.
[20] Kiefer, J. Sequential minimax search for a maximum. Proceedings of the American Mathematical Society, 4(3):502–506, 1953.
[21] Kolobov, A., Peres, Y., Lubetzky, E., and Horvitz, E. Optimal freshness crawl under politeness constraints. In SIGIR, 2019.
[22] Olston, C. and Najork, M. Web crawling. Foundations and Trends in Information Retrieval, 3(1):175–246, 2010.
[23] Olston, C. and Pandey, S. Recrawl scheduling based on information longevity. In WWW, pp. 437–446, 2008.
[24] Page, L., Brin, S., Motwani, R., and Winograd, T. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, CA, USA, 1998.
[25] Pandey, S. and Olston, C. User-centric web crawling.
In WWW, 2005.
[26] Pandey, S., Ramamritham, K., and Chakrabarti, S. Monitoring the dynamic web to respond to continuous queries. In WWW, 2003.
[27] Pandey, S., Dhamdhere, K., and Olston, C. WIC: A general-purpose algorithm for monitoring web information sources. In VLDB, 2004.
[28] Radinsky, K. and Bennett, P. N. Predicting content change on the web. In WSDM, pp. 415–424, 2013.
[29] Sutton, R. and Barto, A. G. Introduction to Reinforcement Learning. MIT Press, 1st edition, 1998.
[30] Taylor, H. and Karlin, S. An Introduction to Stochastic Modeling. Academic Press, 3rd edition, 1998.
[31] Upadhyay, U., De, A., and Gomez-Rodriguez, M. Deep reinforcement learning of marked temporal point processes. In NeurIPS, 2018.
[32] van den Bosch, A., Bogers, T., and de Kunder, M. A longitudinal analysis of search engine index size. In ISSI, 2015.
[33] Wang, Y., Williams, G., and Theodorou, E. Variational policy for guiding point processes. In ICML, 2017.
[34] Whittle, P. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25(A):287–298, 1988.
[35] Wolf, J. L., Squillante, M. S., Yu, P. S., Sethuraman, J., and Ozsen, L. Optimal crawling strategies for web search engines. In WWW, 2002.
[36] Zarezade, A., Upadhyay, U., Rabiee, H. R., and Gomez-Rodriguez, M. RedQueen: An online algorithm for smart broadcasting in social networks. In ACM KDD, 2017.