{"title": "Computing the Stationary Distribution Locally", "book": "Advances in Neural Information Processing Systems", "page_first": 1376, "page_last": 1384, "abstract": "Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks to sample states from the stationary distribution, as in Markov Chain Monte Carlo (MCMC). However these methods are computationally costly; either they involve operations at every state or they scale (in computation time) at least linearly in the size of the state space. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some $\\Delta \\in (0,1)$. If so, it estimates the stationary probability. Our algorithm uses information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. We provide correctness and convergence guarantees that depend on the algorithm parameters and mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates.", "full_text": "Computing the Stationary Distribution, Locally\n\nChristina E. Lee\n\nLIDS, Department of EECS\n\nMassachusetts Institute of Technology\n\ncelee@mit.edu\n\nAsuman Ozdaglar\n\nLIDS, Department of EECS\n\nMassachusetts Institute of Technology\n\nasuman@mit.edu\n\nDevavrat Shah\n\nDepartment of EECS\n\nMassachusetts Institute of Technology\n\ndevavrat@mit.edu\n\nAbstract\n\nComputing the stationary distribution of a large \ufb01nite or countably in\ufb01nite state\nspace Markov Chain (MC) has become central in many problems such as statisti-\ncal inference and network analysis. 
Standard methods involve large matrix multi-\nplications as in power iteration, or simulations of long random walks, as in Markov\nChain Monte Carlo (MCMC). Power iteration is costly, as it involves computation\nat every state. For MCMC, it is dif\ufb01cult to determine whether the random walks\nare long enough to guarantee convergence. In this paper, we provide a novel al-\ngorithm that answers whether a chosen state in a MC has stationary probability\nlarger than some \u2206 \u2208 (0, 1), and outputs an estimate of the stationary probability.\nOur algorithm is constant time, using information from a local neighborhood of\nthe state on the graph induced by the MC, which has constant size relative to the\nstate space. The multiplicative error of the estimate is upper bounded by a func-\ntion of the mixing properties of the MC. Simulation results show MCs for which\nthis method gives tight estimates.\n\n1\n\nIntroduction\n\nComputing the stationary distribution of a Markov chain (MC) with a very large state space (\ufb01nite,\nor countably in\ufb01nite) has become central to statistical inference. The ability to tractably simulate\nMCs along with the generic applicability has made Markov Chain Monte Carlo (MCMC) a method\nof choice and arguably the top algorithm of the twentieth century [1]. However, MCMC and its vari-\nations suffer from limitations in large state spaces, motivating the development of super-computation\ncapabilities \u2013 be it nuclear physics [2, Chapter 8], Google\u2019s computation of PageRank [3], or stochas-\ntic simulation at-large [4]. MCMC methods involve sampling states from a long random walk over\nthe entire state space [5, 6]. It is dif\ufb01cult to determine when the algorithm has walked \u201clong enough\u201d\nto produce reasonable approximations for the stationary distribution.\nPower iteration is another method commonly used for computing leading eigenvectors and stationary\ndistributions of MCs. 
The method involves iterative multiplication of the transition matrix of the MC\n[7]. However, there is no clearly de\ufb01ned stopping condition in general settings, and computations\nmust be performed at every state of the MC.\nIn this paper, we provide a novel algorithm that addresses these limitations. Our algorithm answers\nthe following question: for a given node i of a countable state space MC, is the stationary probability\nof i larger than a given threshold \u2206 \u2208 (0, 1), and can we approximate it? For chosen parameters\n\u2206, \u0001, and \u03b1, our algorithm guarantees that for nodes such that the estimate \u02c6\u03c0i < \u2206/(1 + \u0001), the true\n\n1\n\n\f(cid:16) ln( 1\n\n(cid:17)\n\nvalue \u03c0i is also less than \u2206 with probability at least 1 \u2212 \u03b1. In addition, if \u02c6\u03c0i \u2265 \u2206/(1 + \u0001), with\nprobability at least 1 \u2212 \u03b1, the estimate is within an \u0001 times Zmax(i) multiplicative factor away from\nthe true \u03c0i, where Zmax(i) is effectively a \u201clocal mixing time\u201d for i derived from the fundamental\nmatrix of the transition probability matrix P .\n\n\u03b1 )\n\u00013\u2206\n\nThe running time of the algorithm is upper bounded by \u02dcO\n, which is constant with respect\nto the MC. Our algorithm uses only a\u201clocal\u201d neighborhood of the state i, de\ufb01ned with respect to the\nMarkov graph. Stopping conditions are easy to verify and have provable performance guarantees.\nIts correctness relies on a basic property: the stationary probability of each node is inversely pro-\nportional to the mean of its \u201creturn time.\u201d Therefore, we sample return times to the node and use\nthe empirical average as an estimate. Since return times can be arbitrarily long, we truncate sample\nreturn times at a chosen threshold. 
Hence, our algorithm is a truncated Monte Carlo method.\nWe utilize the exponential concentration of return times in Markov chains to establish theoretical\nguarantees for the algorithm. For \ufb01nite state Markov chains, we use results from Aldous and Fill\n[8]. For countably in\ufb01nite state space Markov chains, we build upon a result by Hajek [9] on the\nconcentration of certain types of hitting times to derive concentration of return times to a given node.\nWe use these concentration results to upper bound the estimation error and the algorithm runtime\nas a function of the truncation threshold and the mixing properties of the graph. For graphs that\nmix quickly, the distribution over return times concentrates more sharply around its mean, resulting\nin tighter performance guarantees. We illustrate the wide applicability of our local algorithm for\ncomputing network centralities and stationary distributions of queuing models.\n\nRelated Work. MCMC was originally proposed in [5], and a tractable way to design a random\nwalk for a target distribution was proposed by Hastings [6]. Given a distribution \u03c0(x), the method\ndesigns a Markov chain such that the stationary distribution of the Markov chain is equal to the target\ndistribution. Without using the full transition matrix of the Markov chain, Monte Carlo sampling\ntechniques estimate the distribution by sampling random walks via the transition probabilities at each\nnode. As the length of the random walk approaches in\ufb01nity, the distribution over possible states of\nthe random walk approaches stationary distribution. Articles by Diaconis and Saloff-Coste [10] and\nDiaconis [11] provide a summary of major developments from a probability theoretic perspective.\nThe majority of work following the initial introduction of the algorithm involves analyzing the con-\nvergence rates and mixing times of this random walk [8, 12]. Techniques involve spectral analysis or\ncoupling arguments. 
Graph properties such as conductance help characterize the graph spectrum for reversible Markov chains. For general non-reversible countably infinite state space Markov chains, little is known about the mixing time. Thus, it is difficult to verify whether the random walk has sufficiently converged to the stationary distribution, and before that point there is no guarantee whether the estimate obtained from the random walk is larger or smaller than the true stationary probability.

Power iteration is an equally old and well-established method for computing leading eigenvectors of matrices [7]. Given a matrix A and a seed vector x0, power iteration recursively computes xt+1 = Axt/‖Axt‖. The convergence rate of xt to the leading eigenvector is governed by the spectral gap. As mentioned above, techniques for analyzing the spectrum are not well developed for general non-reversible MCs, so it is difficult to know how many iterations are sufficient. Although power iteration can be implemented in a distributed manner, each iteration requires computation to be performed by every state in the MC, which is expensive for large state space MCs. For countably infinite state space MCs, there is no clear analog to matrix multiplication.

In the specialized setting of PageRank, the goal is to compute the stationary distribution of a specific Markov chain described by a transition matrix P = (1 − β)Q + β 1·r^T, where Q is a stochastic transition probability matrix, and β is a scalar in (0, 1). This can be interpreted as a random walk in which every step either follows Q with probability 1 − β, or with probability β jumps to a node according to the distribution specified by the vector r. By exploiting this special structure, numerous recent results have provided local algorithms for computing PageRank efficiently. This includes work by Jeh and Widom [13], Fogaras et al. 
[14], Avrachenkov et al. [15], Bahmani et al. [16], and most recently, Borgs et al. [17], which outputs a set of "important" nodes: with probability 1 − o(1), it includes all nodes with PageRank greater than a given threshold ∆, and does not include nodes with PageRank less than ∆/c for a given c > 1. The algorithm runs in time O((1/∆) polylog(n)). Unfortunately, these approaches are specific to PageRank and do not extend to general MCs.

2 Setup, problem statement & algorithm

Consider a discrete time, irreducible, positive recurrent MC {Xt}t≥0 on a countable state space Σ having transition probability matrix P. Let P(n)ij be the (i, j)-coordinate of P^n, such that

P(n)ij ≜ P(Xn = j | X0 = i).

Throughout the paper, we will use the notation Ei[·] = E[· | X0 = i] and Pi(·) = P(· | X0 = i). Let Ti be the return time to a node i, and let Hi be the maximal hitting time to a node i, such that

Ti = inf{t ≥ 1 | Xt = i}  and  Hi = max_{j∈Σ} Ej[Ti].    (1)

The stationary distribution is a function π : Σ → [0, 1] such that Σ_{i∈Σ} πi = 1 and πi = Σ_{j∈Σ} πj Pji for all i ∈ Σ. An irreducible positive recurrent Markov chain has a unique stationary distribution satisfying [18, 8]:

πi = Ei[Σ_{t=1}^{Ti} 1{Xt = i}] / Ei[Ti] = 1/Ei[Ti]  for all i ∈ Σ.    (2)

The Markov chain can be visualized as a random walk over a weighted directed graph G = (Σ, E, P), where Σ is the set of nodes, E = {(i, j) ∈ Σ × Σ : Pij > 0} is the set of edges, and P describes the weights of the edges.¹ The local neighborhood of size r around node i ∈ Σ is defined as {j ∈ Σ | dG(i, j) ≤ r}, where dG(i, j) is the length of the shortest directed path (in terms of number of edges) from i to j in G. An algorithm is local if it only uses information within a local neighborhood of size r around i, where r is constant with respect to the size of the state space.

The fundamental matrix Z of a finite state space Markov chain is

Z ≜ Σ_{t=0}^{∞} (P(t) − 1π^T) = (I − P + 1π^T)^{−1},  such that  Zjk ≜ Σ_{t=0}^{∞} (P(t)jk − πk).

Since P(t)jk denotes the probability that a random walk beginning at node j is at node k after t steps, Zjk represents how quickly the probability mass at node k from a random walk beginning at node j converges to πk. We will use this to provide bounds for the performance of our algorithm.

2.1 Problem Statement

Consider a discrete time, irreducible, aperiodic, positive recurrent MC {Xt}t≥0 on a countable state space Σ with transition probability matrix P : Σ × Σ → [0, 1]. Given node i and threshold ∆, is πi > ∆? If so, what is πi? 
We answer this with a local algorithm, which uses only edges within a local neighborhood around i of constant size with respect to the state space.

We first illustrate the limitations of any local algorithm for answering this question. Consider the Clique-Cycle Markov chain shown in Figure 1(a) with n nodes, composed of a size-k clique connected to a size-(n − k + 1) cycle. For a node j in the clique excluding i, with probability 1/2 the random walk stays at node j, and with probability 1/2 it moves to a clique neighbor chosen uniformly at random. For a node j in the cycle, with probability 1/2 the random walk stays at node j, and with probability 1/2 it travels counterclockwise to the subsequent node in the cycle. For node i, with probability ε the random walk enters the cycle, with probability 1/2 it moves to a uniformly chosen neighbor in the clique, and with probability 1/2 − ε it stays at node i. We can show that the expected return time to node i is (1 − 2ε)k + 2εn.

Therefore, Ei[Ti] scales linearly in n and k. Suppose we observe only the local neighborhood of constant size r around node i. All Clique-Cycle Markov chains with more than k + 2r nodes have identical local neighborhoods. Therefore, for any ∆ ∈ (0, 1), there exist two Clique-Cycle Markov chains which have the same ε and k but two different values of n, such that even though their local neighborhoods are identical, πi > ∆ in the MC with the smaller n, while πi < ∆ in the MC with the larger n. 
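The closed form for the expected return time above can be checked exactly against Eq. (2). The sketch below builds the Clique-Cycle chain as described in the text (the state indexing and the parameter values n = 40, k = 8, ε = 0.05 are our own illustrative choices, not from the paper), computes π by solving πP = π, and compares 1/πi with (1 − 2ε)k + 2εn:

```python
import numpy as np

# Clique-Cycle chain: state 0 is node i, states 1..k-1 the rest of the
# k-clique, states k..n-1 the cycle (the cycle has n-k+1 nodes counting i).
def clique_cycle(n, k, eps):
    P = np.zeros((n, n))
    P[0, 0] = 0.5 - eps                  # node i stays
    P[0, k] = eps                        # node i enters the cycle
    P[0, 1:k] = 0.5 / (k - 1)            # node i picks a clique neighbor
    for j in range(1, k):                # other clique nodes: lazy uniform walk
        P[j, j] = 0.5
        for u in range(k):
            if u != j:
                P[j, u] = 0.5 / (k - 1)
    for j in range(k, n):                # cycle nodes: lazy one-way walk
        P[j, j] = 0.5
        P[j, j + 1 if j < n - 1 else 0] = 0.5
    return P

n, k, eps = 40, 8, 0.05
P = clique_cycle(n, k, eps)

# Stationary distribution: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Eq. (2) gives E_i[T_i] = 1/pi_i; compare with the closed form,
# here (1 - 0.1)*8 + 0.1*40 = 11.2.
assert np.isclose(1.0 / pi[0], (1 - 2 * eps) * k + 2 * eps * n)
```

Growing n while keeping k, ε, and the observation radius fixed leaves the neighborhood of i unchanged but shrinks πi, which is exactly the obstruction discussed next.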
Therefore, by restricting ourselves to a local neighborhood around i of constant size, we will not be able to correctly determine whether πi > ∆ for every node i in an arbitrary MC.

¹Throughout the paper, Markov chain and random walk on a network are used interchangeably; similarly, nodes and states are used interchangeably.

(a) Clique-Cycle Markov chain (b) MM1 Queue

Figure 1: Examples of Markov Chains

2.2 Algorithm

Given a threshold ∆ ∈ (0, 1) and a node i ∈ Σ, the algorithm obtains an estimate π̂i of πi, and uses π̂i to determine whether to output 0 (πi ≤ ∆) or 1 (πi > ∆). The algorithm relies on the characterization of πi given in Eq. (2): πi = 1/Ei[Ti]. It takes many independent samples of a truncated random walk that begins at node i and stops either when the random walk returns to node i, or when the length exceeds a predetermined maximum denoted by θ. Each sample is generated by simulating the random walk using "crawl" operations over the MC. The expected length of each random walk sample is Ei[min(Ti, θ)], which is close to Ei[Ti] when θ is large.

As the number of samples and θ go to infinity, the estimate converges almost surely to πi, by the strong law of large numbers and positive recurrence of the MC. We use Chernoff's bound to choose a sufficiently large number of samples as a function of θ to guarantee that with probability 1 − α, the average length of the sample random walks lies within (1 ± ε) of Ei[min(Ti, θ)]. We also need to choose a suitable value for θ that balances between accuracy and computation cost. The algorithm searches for an appropriate size for the local neighborhood by beginning small and increasing the size geometrically. 
In our analysis, we will show that the total computation summed over all iterations is only a constant factor more than the computation in the final iteration.

Input: Anchor node i ∈ Σ and parameters ∆ = threshold for importance, ε = closeness of the estimate, and α = probability of failure.

Initialize: Set t = 1, θ(1) = 2, N(1) = ⌈6(1 + ε) ln(8/α)/ε²⌉.

Step 1 (Gather Samples): For each k in {1, 2, 3, …, N(t)}, generate independent samples sk ~ min(Ti, θ(t)) by simulating paths of the MC beginning at node i, and setting sk to be the length of the kth sample path. Let p̂(t) = fraction of samples truncated at θ(t),

T̂i(t) = (1/N(t)) Σ_{k=1}^{N(t)} sk,  π̂i(t) = 1/T̂i(t),  and  π̃i(t) = (1 − p̂(t))/T̂i(t).

Step 2 (Termination Conditions):
• If (a) π̂i(t) < ∆/(1 + ε), then stop and return 0, and estimates π̂i(t) and π̃i(t).
• Else if (b) p̂(t) · π̂i(t) < ε∆, then stop and return 1, and estimates π̂i(t) and π̃i(t).
• Else continue.

Step 3 (Update Rules): Set

θ(t+1) ← 2 · θ(t),  N(t+1) ← ⌈3(1 + ε) θ(t+1) ln(4θ(t+1)/α) / (T̂i(t) ε²)⌉,  and  t ← t + 1.

Return to Step 1.

Output: 0 or 1 indicating whether πi > ∆, and estimates π̂i(t) and π̃i(t).

This algorithm outputs two estimates for the anchor node i: π̂i, which relies on the second expression in Eq. (2), and π̃i, which relies on the first expression in Eq. (2). 
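The pseudocode above can be sketched in Python as follows; this is a minimal sketch, where `step` is an assumed neighbor-query ("crawl") oracle returning one random transition, and the two-state chain in the usage lines is purely illustrative (its true stationary distribution is (16/17, 1/17) ≈ (0.941, 0.059)):

```python
import math
import random

def truncated_return_time(step, i, theta, rng):
    """Sample min(T_i, theta): walk from i until it returns or theta steps pass."""
    x = i
    for t in range(1, theta + 1):
        x = step(x, rng)
        if x == i:
            return t, False
    return theta, True                       # sample was truncated

def local_stationary(step, i, Delta, eps, alpha, seed=0):
    """Sketch of the algorithm above; returns (bit, pi_hat, pi_tilde)."""
    rng = random.Random(seed)
    theta = 2
    N = math.ceil(6 * (1 + eps) * math.log(8 / alpha) / eps ** 2)
    while True:
        samples = [truncated_return_time(step, i, theta, rng) for _ in range(N)]
        T_hat = sum(s for s, _ in samples) / N
        p_hat = sum(trunc for _, trunc in samples) / N
        pi_hat = 1.0 / T_hat
        pi_tilde = (1.0 - p_hat) / T_hat
        if pi_hat < Delta / (1 + eps):       # termination condition (a)
            return 0, pi_hat, pi_tilde
        if p_hat * pi_hat < eps * Delta:     # termination condition (b)
            return 1, pi_hat, pi_tilde
        theta *= 2                           # Step 3: double the truncation
        N = math.ceil(3 * (1 + eps) * theta * math.log(4 * theta / alpha)
                      / (T_hat * eps ** 2))

# Illustrative two-state chain with pi = (16/17, 1/17).
P = {0: [(0, 0.95), (1, 0.05)], 1: [(0, 0.8), (1, 0.2)]}

def step(x, rng):
    r = rng.random()
    for y, q in P[x]:
        if r < q:
            return y
        r -= q
    return P[x][-1][0]

bit0, pi_hat0, _ = local_stationary(step, 0, Delta=0.5, eps=0.2, alpha=0.1, seed=1)
bit1, pi_hat1, _ = local_stationary(step, 1, Delta=0.5, eps=0.2, alpha=0.1, seed=2)
# bit0 = 1 (pi_0 ≈ 0.941 > Delta); bit1 = 0 (pi_1 ≈ 0.059 < Delta).
```

Note that the sketch never materializes the transition matrix: it only asks `step` for random transitions out of visited states, which is what makes the method local.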
We refer to the total number of iterations used in the algorithm as the value of t at the time of termination, denoted by tmax. The total number of random walk steps taken within the first t iterations is Σ_{k=1}^{t} N(k) · T̂i(k).

The algorithm will always terminate within ln(1/(ε∆)) iterations. Since θ(t) governs the radius of the local neighborhood that the algorithm utilizes, this implies that our algorithm is local: the maximum distance is strictly upper bounded by 1/(ε∆), which is constant with respect to the MC.

With high probability, the estimate π̂i(t) is larger than πi/(1 + ε), due to the truncation. Thus when the algorithm terminates at stopping condition (a), πi < ∆ with high probability. When the algorithm terminates at condition (b), the fraction of samples truncated is small, which will imply that the percentage error of the estimate π̂i(t) is upper bounded as a function of ε and properties of the MC.

3 Theoretical guarantees

The following theorems give correctness and convergence guarantees for the algorithm. The proofs have been omitted and can be found in the extended version of this paper [19].

Theorem 3.1. For an aperiodic, irreducible, positive recurrent, countable state space Markov chain, and for any i ∈ Σ, with probability greater than 1 − α:

1. Correctness. For all iterations t, π̂i(t) ≥ πi/(1 + ε). Therefore, if the algorithm terminates at condition (a) and outputs 0, then πi < ∆.

2. Convergence. 
The number of iterations tmax and the total number of steps (or neighbor queries) used by the algorithm are bounded above by²,³

tmax ≤ ln(1/(ε∆)),  and  Σ_{k=1}^{tmax} N(k) · T̂i(k) ≤ Õ(ln(1/α)/(ε³∆)).

Part 1 is proved by using Chernoff's bound to show that N(t) is large enough to guarantee that with probability greater than 1 − α, for all iterations t, T̂i(t) concentrates around its mean. Part 2 asserts that the algorithm terminates in finite time as a function of the parameters of the algorithm, independent of the size of the MC state space. This implies that our algorithm is local. This theorem holds for all aperiodic, irreducible, positive recurrent MCs. It is proved by observing that T̂i(t) ≥ p̂(t) θ(t); therefore, when θ(t) > 1/(ε∆), termination condition (b) must be satisfied.

3.1 Finite-state space Markov Chain

We can obtain characterizations for the approximation error and the running time as functions of specific properties of the MC. The analysis depends on how sharply the distribution over return times concentrates around the mean.

Theorem 3.2. 
For an irreducible Markov chain {Xt} with finite state space Σ and transition probability matrix P, for any i ∈ Σ, with probability greater than 1 − α, for all iterations t,

|π̂i(t) − πi| / π̂i(t) ≤ 2(1 − ε) Pi(Ti > θ(t)) Zmax(i) + ε ≤ 4(1 − ε) 2^(−θ(t)/2Hi) Zmax(i) + ε,

where Hi is defined in Eq. (1), and Zmax(i) = max_j |Zji|.

Therefore, with probability greater than 1 − α, if the algorithm terminates at condition (b), then

|π̂i(t) − πi| / π̂i(t) ≤ ε (3Zmax(i) + 1).

²We use the notation Õ(f(a)g(b)) to mean Õ(f(a))Õ(g(b)) = Õ(f(a) polylog f(a)) Õ(g(b) polylog g(b)).
³The bound for tmax is always true (stronger than with high probability).

Theorem 3.2 shows that the percentage error in the estimate π̂i(t) decays exponentially in θ(t), which doubles in each iteration. The proof relies on the fact that the distribution of the return time Ti has an exponentially decaying tail [8], ensuring that the return time Ti concentrates around its mean Ei[Ti]. When the algorithm terminates at stopping condition (b), P(Ti > θ) ≤ ε(4/3 + ε) with high probability, thus the percentage error is bounded by O(ε Zmax(i)).

Similarly, we can analyze the error between the second estimate π̃i(t) and πi, in the case when θ(t) is large enough that P(Ti > θ(t)) < 1/2. This is required to guarantee that (1 − p̂(t)) lies within an ε multiplicative interval around its mean with high probability. Observe that 2Zmax(i) is replaced by max(2Zmax(i) − 1, 1). 
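The exponential decay in θ(t) can be made concrete by computing the truncation bias of π̂ exactly on a toy chain; the 3-state matrix, anchor node, and horizons below are our own illustrative choices, not from the paper:

```python
import numpy as np

# Illustrative 3-state chain; we study the truncation bias of
# pi_hat = 1/E[min(T_i, theta)] at node i = 0.
P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.3, 0.0, 0.7]])
n, i = 3, 0

# Stationary distribution: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Fundamental matrix and Zmax(i) = max_j |Z_ji|, as used in Theorem 3.2.
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
Zmax_i = np.abs(Z[:, i]).max()

# Tail probabilities P(T_i > t): zero out column i so the walk is
# killed upon returning to i, then propagate the surviving mass.
Q = P.copy()
Q[:, i] = 0.0
r, tails = np.eye(n)[i], [1.0]
for _ in range(64):
    r = r @ Q
    tails.append(r.sum())

# Relative bias (pi_hat - pi_i)/pi_hat = 1 - pi_i * E[min(T_i, theta)],
# using E[min(T_i, theta)] = sum_{t < theta} P(T_i > t).
biases = []
for theta in (2, 4, 8, 16, 32, 64):
    E_trunc = sum(tails[:theta])
    biases.append(1.0 - pi[i] * E_trunc)

assert all(bi >= 0 for bi in biases)                        # truncation overestimates
assert all(b1 > b2 for b1, b2 in zip(biases, biases[1:]))   # bias decays as theta doubles
```

This computes the deterministic part of the error only; the sampling term of order ε in the theorem is not modeled here.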
Thus for some values of Zmax(i), the error bound for π̃i is smaller than the equivalent bound for π̂i. We will show simulations of computing PageRank, in which π̃i estimates πi more closely than π̂i.

Theorem 3.3. For an irreducible Markov chain {Xt} with finite state space Σ and transition probability matrix P, for any i ∈ Σ, with probability greater than 1 − α, for all iterations t such that P(Ti > θ(t)) < 1/2,

|π̃i(t) − πi| / π̃i(t) ≤ ((1 + ε)/(1 − ε)) (Pi(Ti > θ(t)) / (1 − Pi(Ti > θ(t)))) max(2Zmax(i) − 1, 1) + 2ε/(1 − ε).

Theorem 3.4 also uses the property of an exponentially decaying tail as a function of Hi to show that for large θ(t), with high probability, Pi(Ti > θ(t)) will be small and π̂i(t) will be close to πi, and thus the algorithm will terminate at one of the stopping conditions. The bound is a function of how sharply the distribution over return times concentrates around the mean. Theorem 3.4(a) states that for low probability nodes, the algorithm will terminate at stopping condition (a) for large enough iterations. Theorem 3.4(b) states that for all nodes, the algorithm will terminate at stopping condition (b) for large enough iterations.

Theorem 3.4. 
For an irreducible Markov chain {Xt} with finite state space Σ,

(a) For any node i ∈ Σ such that πi < (1 − ε)∆/(1 + ε), with probability greater than 1 − α, the total number of steps used by the algorithm is bounded above by

Σ_{k=1}^{tmax} N(k) · T̂i(k) ≤ Õ( (ln(1/α)/ε²) Hi ln( (1/(1 − 2^(−1/2Hi))) (1/πi − (1 + ε)/((1 − ε)∆))^(−1) ) ).

(b) For all nodes i ∈ Σ, with probability greater than 1 − α, the total number of steps used by the algorithm is bounded above by

Σ_{k=1}^{tmax} N(k) · T̂i(k) ≤ Õ( (ln(1/α)/ε²) (Hi/α) ( ln(1/πi) + 1/((1 − 2^(−1/2Hi)) ε∆) ) ).

3.2 Countable-state space Markov Chain

The proofs of Theorems 3.2 and 3.4 require the state space of the MC to be finite, so that we can upper bound the tail of the distribution of Ti using the maximal hitting time Hi. In fact, these results can be extended to many countably infinite state space Markov chains as well. We prove that the tail of the distribution of Ti decays exponentially for any node i in any countable state space Markov chain that satisfies Assumption 3.5.

Assumption 3.5. The Markov chain {Xt} is aperiodic and irreducible. There exists a Lyapunov function V : Σ → R+ and constants νmax, γ > 0, and b ≥ 0, that satisfy the following conditions:

1. The set B = {x ∈ Σ : V(x) ≤ b} is finite,

2. For all x, y ∈ Σ such that P(Xt+1 = y | Xt = x) > 0, |V(y) − V(x)| ≤ νmax,

3. 
For all x ∈ Σ such that V(x) > b, E[V(Xt+1) − V(Xt) | Xt = x] < −γ.

At first glance, this assumption may seem very restrictive. In fact, it is quite reasonable: by the Foster-Lyapunov criteria [20], a countable state space Markov chain is positive recurrent if and only if there exists a Lyapunov function V : Σ → R+ that satisfies conditions (1) and (3), as well as (2'): E[V(Xt+1) | Xt = x] < ∞ for all x ∈ Σ. Assumption 3.5 has (2), which is a restriction of condition (2'). The existence of the Lyapunov function allows us to decompose the state space into sets B and B^c such that for all nodes x ∈ B^c, there is an expected decrease in the Lyapunov function in the next step or transition. Therefore, for all nodes in B^c, there is a negative drift towards set B. In addition, in any single step, the random walk cannot escape "too far."

Using the concentration bounds for the countable state space setting, we can prove the following theorems, which parallel the theorems stated for the finite state space setting. The formal statements are restricted to nodes in B = {i ∈ Σ : V(i) ≤ b}. This is not actually restrictive: for any i such that V(i) > b, we can define a new Lyapunov function with V′(i) = b and V′(j) = V(j) for all j ≠ i. Then B′ = B ∪ {i}, and V′ still satisfies Assumption 3.5 for new values of νmax, γ, and b.

Theorem 3.6. 
For a Markov chain satisfying Assumption 3.5, for any i ∈ B, with probability greater than 1 − α, for all iterations t,

|π̂i(t) − πi| / π̂i(t) ≤ 4(1 − ε) (2^(−θ(t)/Ri) / (1 − 2^(−1/Ri))) πi + ε,

where Ri is defined such that

Ri = O( HiB e^(2ηνmax) / ((1 − ρ)(e^(ηνmax) − ρ)) ),

and HiB is the maximal hitting time over the Markov chain with its state space restricted to the subset B. The scalars η and ρ are functions of γ and νmax (defined in [9]).

Theorem 3.7. For a Markov chain satisfying Assumption 3.5,

(a) For any node i ∈ B such that πi < (1 − ε)∆/(1 + ε), with probability greater than 1 − α, the total number of steps used by the algorithm is bounded above by

Σ_{k=1}^{tmax} N(k) · T̂i(k) ≤ Õ( (ln(1/α)/ε²) Ri ln( (1/(1 − 2^(−1/Ri))) (1/πi − (1 + ε)/((1 − ε)∆))^(−1) ) ).

(b) For all nodes i ∈ B, with probability greater than 1 − α, the total number of steps used by the algorithm is bounded above by

Σ_{k=1}^{tmax} N(k) · T̂i(k) ≤ Õ( (ln(1/α)/ε²) (Ri/α) ( ln(1/πi) + 1/((1 − 2^(−1/Ri)) ε∆) ) ).

In order to prove these theorems, we build upon results of [9], and establish that return times have exponentially decaying tails for countable state space MCs that satisfy Assumption 3.5.

4 Example applications: PageRank and MM1 Queue

PageRank is 
frequently used to compute the importance of web pages in the web graph. Given a scalar parameter β and a stochastic transition matrix P, let {Xt} be the Markov chain with transition matrix (β/n) 1·1^T + (1 − β)P. In every step, there is a probability β of jumping uniformly at random to any other node in the network. PageRank is defined as the stationary distribution of this Markov chain. We apply our algorithm to compute PageRank on a random graph generated according to the configuration model with a power law degree distribution, for β = 0.15.

In queuing theory, Markov chains are used to model the queue length at a server, which evolves over time as requests arrive and are processed. We use the basic MM1 queue, equivalent to a random walk on Z+. Assume we have a single server where requests arrive according to a Poisson process, and the processing time for a single request is distributed exponentially. The queue length is modeled with the Markov chain shown in Figure 1(b), where p is the probability that a new request arrives before the current request is fully processed.

Figures 2(a) and 2(b) plot π̂i(tmax) and π̃i(tmax) for each node in the PageRank and MM1 queue MCs, respectively. For both examples, we choose algorithm parameters ∆ = 0.02, ε = 0.15, and α = 0.2.

(a) PageRank Estimates (b) MM1 Estimates (c) PageRank: Total Steps vs. ∆ (d) MM1 Queue: Total Steps vs. ∆

Figure 2: Simulations showing results of our algorithm applied to PageRank and MM1 Queue setting

Observe that the algorithm indeed obtains close estimates for nodes such that πi > ∆, and for nodes such that πi ≤ ∆, the algorithm successfully outputs 0 (i.e., πi ≤ ∆). We observe that the method for bias correction makes significant improvements for estimating PageRank. 
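For the MM1 queue, the return-time estimate at node 0 can be checked against a closed form. Assuming the birth-death convention for Figure 1(b) that a walk at state j moves up with probability p and otherwise moves down (staying at 0 when already empty), this chain has πj = (1 − ρ)ρ^j with ρ = p/(1 − p) for p < 1/2; the truncated Monte Carlo estimator below (our own sketch, with illustrative p = 0.3, θ = 64) recovers π0 = 1 − ρ:

```python
import random

# Truncated Monte Carlo estimate of pi_0 = 1/E[min-free return time] for
# the MM1 chain: from j, step up w.p. p, otherwise down (floor at 0).
def estimate_pi0(p, theta, num_samples, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        x, t = 0, 0
        while t < theta:                 # sample min(T_0, theta)
            x = x + 1 if rng.random() < p else max(x - 1, 0)
            t += 1
            if x == 0:
                break
        total += t
    return num_samples / total           # 1 / (average truncated return time)

p = 0.3
rho = p / (1 - p)
est = estimate_pi0(p, theta=64, num_samples=20000)
# est should be close to pi_0 = 1 - rho = 4/7 ≈ 0.571.
```

Because ρ < 1, the return-time tail decays geometrically, so the truncation at θ = 64 contributes negligible bias here; for p approaching 1/2 the tail fattens and much larger θ would be needed.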
We computed the fundamental matrix for the PageRank MC and observed that Zmax(i) ≈ 1 for all i.

Figures 2(c) and 2(d) show the computation time, i.e., the total number of random walk steps taken by our algorithm, as a function of Δ. Each figure shows the results from three different nodes, chosen to illustrate the behavior on nodes with varying π_i. The figures are shown on a log-log scale. The results confirm that the computation time of the algorithm is upper bounded by O(1/Δ), which is linear when plotted on a log-log scale. When Δ > π_i, the computation time behaves as 1/Δ. When Δ < π_i, the computation time grows slower than O(1/Δ), and is close to constant with respect to Δ.

5 Summary

We proposed a local algorithm for estimating the stationary probability of a node in a MC. The algorithm is a truncated Monte Carlo method, sampling return paths to the node of interest. The algorithm has many practical benefits. First, it can be implemented easily in a distributed and parallelized fashion, as it only involves sampling random walks using neighbor queries. Second, it only uses a constant size neighborhood around the node of interest, upper bounded by 1/(εΔ). Third, it only performs computation at the node of interest. The computation involves only counting and taking an average, thus it is simple and memory efficient. We guarantee that the estimate π̂_i^(t) is an upper bound for π_i with high probability. For MCs that mix well, the estimate will be tight with high probability for nodes such that π_i > Δ. The computation time of the algorithm is upper bounded by parameters of the algorithm, and is constant with respect to the size of the state space.
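The neighbor-query primitive that makes the distributed implementation possible can be sketched for the PageRank chain of Section 4; the adjacency-list representation and function name here are illustrative, and we assume P is the uniform walk over out-edges, as in the configuration-model experiment.

```python
import random

def pagerank_step(node, out_neighbors, n, rng, beta=0.15):
    """One step of the PageRank chain: with probability beta jump to a
    uniformly random node, otherwise follow a uniformly random out-edge.
    Only the current node's neighbor list is queried, so no global view
    of the graph is needed."""
    if rng.random() < beta or not out_neighbors[node]:
        return rng.randrange(n)
    return rng.choice(out_neighbors[node])

# Tiny 3-node cycle: 0 -> 1, 1 -> 2, 2 -> 0.
rng = random.Random(1)
adj = {0: [1], 1: [2], 2: [0]}
walk = [0]
for _ in range(5):
    walk.append(pagerank_step(walk[-1], adj, 3, rng))
print(walk)
```

Each sampled return path is just a sequence of such calls, which is why the work stays local to the anchor node and its sampled neighborhood.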
Therefore, this algorithm is suitable for MCs with large state spaces.

Acknowledgements: This work is supported in parts by ARO under MURI awards 58153-MA-MUR and W911NF-11-1-0036, and grant 56549-NS, and by NSF under grant CIF 1217043 and a Graduate Fellowship.

References

[1] B. Cipra. The best of the 20th century: Editors name top 10 algorithms. SIAM News, 33(4):1, May 2000.

[2] T.M. Semkow, S. Pomm, S. Jerome, and D.J. Strom, editors. Applied Modeling and Computations in Nuclear Science. American Chemical Society, Washington, DC, 2006.

[3] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, November 1999.

[4] S. Asmussen and P. Glynn. Stochastic Simulation: Algorithms and Analysis (Stochastic Modelling and Applied Probability). Springer, 2010.

[5] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21:1087, 1953.

[6] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.

[7] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, 1996.

[8] D. Aldous and J. Fill. Reversible Markov chains and random walks on graphs: Chapter 2 (General Markov chains). Book in preparation.
URL: http://www.stat.berkeley.edu/~aldous/RWG/Chap2.pdf, pages 7, 19–20, 1999.

[9] B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, pages 502–525, 1982.

[10] P. Diaconis and L. Saloff-Coste. What do we know about the Metropolis algorithm? Journal of Computer and System Sciences, 57(1):20–36, 1998.

[11] P. Diaconis. The Markov chain Monte Carlo revolution. Bulletin of the American Mathematical Society, 46(2):179–205, 2009.

[12] D.A. Levin, Y. Peres, and E.L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2009.

[13] G. Jeh and J. Widom. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web, pages 271–279, New York, NY, USA, 2003.

[14] D. Fogaras, B. Racz, K. Csalogany, and T. Sarlos. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333–358, 2005.

[15] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 45(2):890–904, 2007.

[16] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized PageRank. Proc. VLDB Endow., 4(3):173–184, December 2010.

[17] C. Borgs, M. Brautbar, J. Chayes, and S.-H. Teng. Sublinear time algorithm for PageRank computations and related applications. CoRR, abs/1202.2771, 2012.

[18] S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, 1993.

[19] C.E. Lee, A. Ozdaglar, and D. Shah. Computing the stationary distribution locally. MIT LIDS Report 2914, Nov 2013. URL: http://www.mit.edu/~celee/LocalStationaryDistribution.pdf.

[20] F.G. Foster. On the stochastic matrices associated with certain queuing processes.
The Annals of Mathematical Statistics, 24(3):355–360, 1953.