{"title": "From random walks to distances on unweighted graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 3429, "page_last": 3437, "abstract": "Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives.", "full_text": "From random walks to distances on unweighted graphs\n\nTatsunori B. Hashimoto\n\nMIT EECS\n\nthashim@mit.edu\n\nTommi S. Jaakkola\n\nMIT EECS\n\ntommi@mit.edu\n\nYi Sun\n\nMIT Mathematics\nyisun@mit.edu\n\nAbstract\n\nLarge unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. 
Despite\nthe signi\ufb01cance of this problem, statistical characterization of the proposed met-\nrics has been limited.\nWe introduce and develop a class of techniques for analyzing random walks on\ngraphs using stochastic calculus. Using these techniques we generalize results on\nthe degeneracy of hitting times and analyze a metric based on the Laplace trans-\nformed hitting time (LTHT). The metric serves as a natural, provably well-behaved\nalternative to the expected hitting time. We establish a general correspondence\nbetween hitting times of the Brownian motion and analogous hitting times on the\ngraph. We show that the LTHT is consistent with respect to the underlying metric\nof a geometric graph, preserves clustering tendency, and remains robust against\nrandom addition of non-geometric edges. Tests on simulated and real-world data\nshow that the LTHT matches theoretical predictions and outperforms alternatives.\n\n1\n\nIntroduction\n\nMany network metrics have been introduced to measure the similarity between any two vertices.\nSuch metrics can be used for a variety of purposes, including uncovering missing edges or pruning\nspurious ones. Since the metrics tacitly assume that vertices lie in a latent (metric) space, one could\nexpect that they also recover the underlying metric in some well-de\ufb01ned limit. Surprisingly, there\nare nearly no known results on this type of consistency. Indeed, it was recently shown [19] that the\nexpected hitting time degenerates and does not measure any notion of distance.\nWe analyze an improved hitting-time metric \u2013 Laplace transformed hitting time (LTHT) \u2013 and rigor-\nously evaluate its consistency, cluster-preservation, and robustness under a general network model\nwhich encapsulates the latent space assumption. 
This network model, speci\ufb01ed in Section 2, posits\nthat vertices lie in a latent metric space, and edges are drawn between nearby vertices in that space.\nTo analyze the LTHT, we develop two key technical tools. We establish a correspondence between\nfunctionals of hitting time for random walks on graphs, on the one hand, and limiting It\u00f4 processes\n(Corollary 4.4) on the other. Moreover, we construct a weighted random walk on the graph whose\nlimit is a Brownian motion (Corollary 4.1). We apply these tools to obtain three main results.\nFirst, our Theorem 3.5 recapitulates and generalizes the result of [19] pertaining to degeneration\nof expected hitting time in the limit. Our proof is direct and demonstrates the broader applicabil-\nity of the techniques to general random walk based algorithms. Second, we analyze the Laplace\ntransformed hitting time as a one-parameter family of improved distance estimators based on ran-\ndom walks on the graph. We prove that there exists a scaling limit for the parameter \u03b2 such that\nthe LTHT can become the shortest path distance (Theorem S5.2) or a consistent metric estimator\naveraging over many paths (Theorem 4.5). Finally, we prove that the LTHT captures the advantages\n\n1\n\n\fof random-walk based metrics by respecting the cluster structure (Theorem 4.6) and robustly recov-\nering similarity queries when the majority of edges carry no geometric information (Theorem 4.9).\nWe now discuss the relation of our work to prior work on similarity estimation.\nQuasi-walk metrics: There is a growing literature on graph metrics that attempts to correct the\ndegeneracy of expected hitting time [19] by interpolating between expected hitting time and shortest\npath distance. The work closest to ours is the analysis of the phase transition of the p-resistance\nmetric in [1] which proves that p-resistances are nondegenerate for some p; however, their work did\nnot address consistency or bias of p-resistances. 
Other approaches to quasi-walk metrics such as\nlogarithmic-forest [3], distributed routing distances [16], truncated hitting times [12], and random-\nized shortest paths [8, 21] exist but their statistical properties are unknown. Our paper is the \ufb01rst to\nprove consistency properties of a quasi-walk metric.\nNonparametric statistics: In the nonparametric statistics literature, the behavior of k-nearest neigh-\nbor and \u03b5-ball graphs has been the focus of extensive study. For undirected graphs, Laplacian-based\ntechniques have yielded consistency for clusters [18] and shortest paths [2] as well as the degener-\nacy of expected hitting time [19]. Algorithms for exactly embedding k-nearest neighbor graphs are\nsimilar and generate metric estimates, but require knowledge of the graph construction method, and\ntheir consistency properties are unknown [13]. Stochastic differential equation techniques similar\nto ours were applied to prove Laplacian convergence results in [17], while the process-level con-\nvergence was exploited in [6]. Our work advances the techniques of [6] by extracting more robust\nestimators from process-level information.\nNetwork analysis: The task of predicting missing links in a graph, known as link prediction, is one\nof the most popular uses of similarity estimation. The survey [9] compares several common link\nprediction methods on synthetic benchmarks. The consistency of some local similarity metrics such\nas the number of shared neighbors was analyzed under a single generative model for graphs in [11].\nOur results extend this analysis to a global, walk-based metric under weaker model assumptions.\n\n2 Continuum limits of random walks on networks\n\n2.1 De\ufb01nition of a spatial graph\n\nWe take a generative approach to de\ufb01ning similarity between vertices. 
We suppose that each vertex i of a graph is associated with a latent coordinate x_i \u2208 R^d and that the probability of finding an edge between two vertices depends solely on their latent coordinates. In this model, given only the unweighted edge connectivity of a graph, we define natural distances between vertices as the distances between the latent coordinates x_i. Formally, let X = {x_1, x_2, ...} \u2282 R^d be an infinite sequence of points drawn i.i.d. from a differentiable density p(x) with bounded log gradient and compact support D. A spatial graph is defined by the following:\n\nDefinition 2.1 (Spatial graph). Let \u03b5_n : X_n \u2192 R_{>0} be a local scale function and h : R_{\u22650} \u2192 [0, 1] a piecewise continuous function with h(x) = 0 for x > 1, h(1) > 0, and h left-continuous at 1. The spatial graph G_n corresponding to \u03b5_n and h is the random graph with vertex set X_n and a directed edge from x_i to x_j with probability p_{ij} = h(|x_i \u2212 x_j| \u03b5_n(x_i)^{\u22121}).\n\nThis graph was proposed in [6] as the generalization of k-nearest neighbors to isotropic kernels. To make inference tractable, we focus on the large-graph, small-neighborhood limit as n \u2192 \u221e and \u03b5_n(x) \u2192 0. In particular, we will suppose that there exist scaling constants g_n and a deterministic continuous function \u03b5 : D \u2192 R_{>0} so that\n\ng_n \u2192 0,    g_n n^{1/(d+2)} log(n)^{\u22121/(d+2)} \u2192 \u221e,    \u03b5_n(x) g_n^{\u22121} \u2192 \u03b5(x) for x \u2208 X_n,\n\nwhere the final convergence is uniform in x and a.s. in the draw of X. The scaling constant g_n represents a bound on the asymptotic sparsity of the graph.\n\nWe give a few concrete examples to make the quantities h, g_n, and \u03b5_n clear.\n\n1. 
The directed k-nearest neighbor graph is defined by setting h(x) = 1_{x\u2208[0,1]}, the indicator function of the unit interval, \u03b5_n(x) the distance to the kth nearest neighbor, and g_n = (k/n)^{1/d} the rate at which \u03b5_n(x) approaches zero.\n\n2. A Gaussian kernel graph is approximated by setting h(x) = exp(\u2212x^2/\u03c3^2) 1_{x\u2208[0,1]}. The truncation of the Gaussian tails at \u03c3 is an analytic convenience rather than a fundamental limitation, and the bandwidth can be varied by rescaling \u03b5_n(x).\n\n2.2 Continuum limit of the random walk\n\nOur techniques rely on analysis of the limiting behavior of the simple random walk X^n_t on a spatial graph G_n, viewed as a discrete-time Markov process with domain D. The increment at step t of X^n_t is a jump to a random point in X_n which lies within the ball of radius \u03b5_n(X^n_t) around X^n_t. We observe three effects: (A) the random walk jumps more frequently towards regions of high density; (B) the random walk moves more quickly whenever \u03b5_n(X^n_t) is large; (C) for \u03b5_n small and a large step count t, the random variable X^n_t \u2212 X^n_0 is the sum of many small independent (but not necessarily identically distributed) increments. In the n \u2192 \u221e limit, we may identify X^n_t with a continuous-time stochastic process satisfying (A), (B), and (C) via the following result, which is a slight strengthening of [6, Theorem 3.4] obtained by applying [15, Theorem 11.2.3] in place of the original result of Stroock-Varadhan.\n\nTheorem 2.2. The simple random walk X^n_t converges uniformly in Skorokhod space D([0,\u221e), D) after a time scaling t\u0302 = t g_n^2 to the It\u00f4 process Y_t\u0302 valued in the space of continuous functions C([0,\u221e), D) with reflecting boundary conditions on D defined by\n\ndY_t\u0302 = \u2207 log(p(Y_t\u0302)) \u03b5(Y_t\u0302)^2/3 dt\u0302 + \u03b5(Y_t\u0302)/\u221a3 dW_t\u0302.    (1)\n\nEffects (A), (B), and (C) may be seen in the stochastic differential equation (1) as follows. The direction of the drift is controlled by \u2207 log(p(Y_t\u0302)), the rate of drift is controlled by \u03b5(Y_t\u0302)^2, and the noise is driven by a Brownian motion W_t\u0302 with location-dependent scaling \u03b5(Y_t\u0302)/\u221a3.\n\nWe view Theorem 2.2 as a method to understand the simple random walk X^n_t through the continuous walk Y_t\u0302. Attributes of stochastic processes such as stationary distribution or hitting time may be defined for both Y_t\u0302 and X^n_t, and in many cases Theorem 2.2 implies that an appropriately-rescaled version of the discrete attribute will converge to the continuous one. Because attributes of the continuous process Y_t\u0302 can reveal information about proximity between points, this provides a general framework for inference in spatial graphs. 
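
As a concrete illustration of the objects discussed so far, the following is a minimal sketch (not the authors' code) of the directed k-nearest neighbor graph from Section 2.1 and of the simple random walk on it. The function names knn_spatial_graph and simple_random_walk, the uniform density on the unit square, and the values n = 200 and k = 8 are assumptions made only for this example.

```python
import math
import random

def knn_spatial_graph(points, k):
    # Directed k-nearest neighbor graph: h is the indicator of [0, 1] and
    # eps[i] is the distance from points[i] to its k-th nearest neighbor.
    out_neighbors, eps = [], []
    for i, x in enumerate(points):
        ranked = sorted((math.dist(x, y), j) for j, y in enumerate(points) if j != i)
        out_neighbors.append([j for _, j in ranked[:k]])
        eps.append(ranked[k - 1][0])
    return out_neighbors, eps

def simple_random_walk(out_neighbors, start, steps, rng):
    # At each step, jump to a uniformly chosen out-neighbor of the current vertex.
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(out_neighbors[path[-1]]))
    return path

rng = random.Random(0)
# Assumed latent coordinates: 200 points drawn uniformly from the unit square.
points = [(rng.random(), rng.random()) for _ in range(200)]
out_neighbors, eps = knn_spatial_graph(points, k=8)
path = simple_random_walk(out_neighbors, start=0, steps=50, rng=rng)
```

Every jump of this walk stays within distance eps[i] of the current point, which is the discrete counterpart of the local scale appearing in Theorem 2.2.
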
We use hitting times of the continuous process to a domain E \u2282 D to prove properties of the hitting time of a simple random walk on a graph via the limit arguments of Theorem 2.2.\n\n3 Degeneracy of expected hitting times in networks\n\nThe hitting time, commute time, and resistance distance are popular measures of distance based upon the random walk which are believed to be robust and capture the cluster structure of the network. However, it was shown in a surprising result in [19] that on undirected geometric graphs the scaled expected hitting time from x_i to x_j converges to the inverse of the degree of x_j.\nIn Theorem 3.5, we give an intuitive explanation and generalization of this result by showing that if the random walk on a graph converges to any limiting It\u00f4 process in dimension d \u2265 2, the scaled expected hitting time to any point converges to the inverse of the stationary distribution. This answers the open problem in [19] on the degeneracy of hitting times for directed graphs and graphs with general degree distributions such as directed k-nearest neighbor graphs, lattices, and power-law graphs with convergent random walks. Our proof can be understood as first extending the transience or neighborhood recurrence of Brownian motion for d \u2265 2 to more general It\u00f4 processes and then connecting hitting times on graphs to their It\u00f4 process equivalents.\n\n3.1 Typical hitting times are large\n\nWe will prove the following lemma that hitting a given vertex quickly is unlikely. Let T^{x_i}_{x_j,n} be the hitting time to x_j of X^n_t started at x_i and T^{x_i}_E be the continuous equivalent for Y_t\u0302 to hit E \u2282 D.\n\n1 Both the variance \u0398(\u03b5_n(x)^2) and expected value \u0398(\u2207 log(p(x)) \u03b5_n(x)^2) of a single step in the simple random walk are \u0398(g_n^2). The time scaling t\u0302 = t g_n^2 in Theorem 2.2 was chosen so that as n \u2192 \u221e there are g_n^{\u22122} discrete steps taken per unit time, meaning the total drift and variance per unit time tend to a non-trivial limit.\n\nLemma 3.1 (Typical hitting times are large). For any d \u2265 2, c > 0, and \u03b4 > 0, for large enough n we have P(T^{x_i}_{x_j,n} > c g_n^{\u22122}) > 1 \u2212 \u03b4.\n\nTo prove Lemma 3.1, we require the following tail bound following from the Feynman-Kac theorem.\n\nTheorem 3.2 ([10, Exercise 9.12] Feynman-Kac for the Laplace transform). The Laplace transform of the hitting time (LTHT) u(x) = E[exp(\u2212\u03b2 T^x_E)] is the solution to the boundary value problem with boundary condition u|_{\u2202E} = 1:\n\n(1/2) Tr[\u03c3^T H(u) \u03c3] + \u03bc(x) \u00b7 \u2207u \u2212 \u03b2u = 0,\n\nwhere \u03bc and \u03c3 denote the drift and diffusion coefficients of the It\u00f4 process and H(u) is the Hessian of u.\n\nThis will allow us to bound the hitting time to the ball B(x_j, s) of radius s centered at x_j.\n\nLemma 3.3. For x, y \u2208 D, d \u2265 2, and any \u03b4 > 0, there exists s > 0 such that E[e^{\u2212T^x_{B(y,s)}}] < \u03b4.\n\nProof. We compare the Laplace transformed hitting time of the general It\u00f4 process to that of Brownian motion via Feynman-Kac and handle the latter case directly. Details are in Section S2.1.\n\nWe now use Lemma 3.3 to prove Lemma 3.1.\n\nProof of Lemma 3.1. Our proof proceeds in two steps. First, we have T^{x_i}_{x_j,n} \u2265 T^{x_i}_{B(x_j,s),n} a.s. for any s > 0 because x_j \u2208 B(x_j, s), so by Theorem 2.2, we have\n\nlim_{n\u2192\u221e} E[e^{\u2212T^{x_i}_{x_j,n} g_n^2}] \u2264 lim_{n\u2192\u221e} E[e^{\u2212T^{x_i}_{B(x_j,s),n} g_n^2}] = E[e^{\u2212T^{x_i}_{B(x_j,s)}}].    (2)\n\nApplying Lemma 3.3, we have E[e^{\u2212T^{x_i}_{B(x_j,s)}}] < (1/2) \u03b4 e^{\u2212c} for some s > 0. For large enough n, this combined with (2) implies P(T^{x_i}_{x_j,n} \u2264 c g_n^{\u22122}) e^{\u2212c} < \u03b4 e^{\u2212c} and hence P(T^{x_i}_{x_j,n} \u2264 c g_n^{\u22122}) < \u03b4.\n\n3.2 Expected hitting times degenerate to the stationary distribution\n\nTo translate results from It\u00f4 processes to directed graphs, we require a regularity condition. Let q_t(x_j, x_i) denote the probability that X^n_t = x_j conditioned on X^n_0 = x_i. We make the following technical conjecture which we assume holds for all spatial graphs.\n\n(\u22c6) For t = \u0398(g_n^{\u22122}), the rescaled marginal n q_t(x, x_i) is a.s. eventually uniformly equicontinuous.\u00b2\n\nLet \u03c0_{X^n}(x) denote the stationary distribution of X^n_t. The following was shown in [6, Theorem 2.1] under conditions implied by our condition (\u22c6) (Corollary S2.6).\n\nTheorem 3.4. Assuming (\u22c6), for a^{\u22121} = \u222b p(x)^2 \u03b5(x)^{\u22122} dx, we have the a.s. limit\n\n\u03c0\u0302(x) := lim_{n\u2192\u221e} n \u03c0_{X^n}(x) = a p(x)/\u03b5(x)^2.\n\nWe may now express the limit of expected hitting time in terms of this result.\n\nTheorem 3.5. For d \u2265 2 and any i, j, we have\n\nE[T^{x_i}_{x_j,n}]/n \u2192 1/\u03c0\u0302(x_j) a.s.\n\nProof. We give a sketch. By Lemma 3.1, the random walk started at x_i does not hit x_j within c g_n^{\u22122} steps with high probability. By Theorem S2.5, the simple random walk X^n_t mixes at exponential rate, implying in Lemma S2.8 that the probability of first hitting at step t > c g_n^{\u22122} is approximately the stationary distribution at x_j. Expected hitting time is then shown to approximate the expectation of a geometric random variable. 
See Section S2 for a full proof.\n\nTheorem 3.5 is illustrated in Figures 1A and 1B, which show that with only 3000 points, expected hitting times on a k-nearest neighbor graph degenerate to the stationary distribution.\u00b3\n\n2 Assumption (\u22c6) is related to smoothing properties of the graph Laplacian and is known to hold for undirected graphs [4]. No directed analogue is known, and [6] conjectured a weaker property for all spatial graphs. See Section S1 for further details.\n\n3 Surprisingly, [19] proved that 1-D hitting times diverge despite convergence of the continuous equivalent. This occurs because the discrete walk can jump past the target point. In Section S2.4, we consider 1-D hitting times to small out neighbors which corrects this problem and derive closed form solutions (Theorem S2.12). This hitting time is non-degenerate but highly biased due to boundary terms (Corollary S2.14).\n\nFigure 1: Estimated distance from orange starting point on a k-nearest neighbor graph constructed on two clusters. A and B show degeneracy of hitting times (Theorem 3.5). C, D, and E show that log-LTHT interpolates between hitting time and shortest path.\n\n4 The Laplace transformed hitting time (LTHT)\n\nIn Theorem 3.5 we showed that expected hitting time is degenerate because a simple random walk mixes before hitting its target. To correct this we penalize longer paths. More precisely, consider for \u03b2\u0302 > 0 and \u03b2_n = \u03b2\u0302 g_n^2 the Laplace transforms E[e^{\u2212\u03b2\u0302 T^x_E}] and E[e^{\u2212\u03b2_n T^x_{E,n}}] of T^x_E and T^x_{E,n}.\n\nThese Laplace transformed hitting times (LTHTs) have three advantages. First, while the expected hitting time of a Brownian motion to a domain is dominated by long paths, the LTHT is dominated by direct paths. Second, the LTHT for the It\u00f4 process can be derived in closed form via the Feynman-Kac theorem, allowing us to make use of techniques from continuous stochastic processes to control the continuum LTHT. Lastly, the LTHT can be computed both by sampling and in closed form as a matrix inversion (Section S3). Now define the scaled log-LTHT as\n\n\u2212 log(E[e^{\u2212\u03b2_n T^{x_i}_{x_j,n}}]) / \u221a(2\u03b2_n) g_n.\n\nTaking different scalings for \u03b2_n with n interpolates between expected hitting time (\u03b2_n \u2192 0 on a fixed graph) and shortest path distance (\u03b2_n \u2192 \u221e) (Figures 1C, D, and E). In Theorem 4.5, we show that the intermediate scaling \u03b2_n = \u0398(\u03b2\u0302 g_n^2) yields a consistent distance measure retaining the unique properties of hitting times. Most of our results on the LTHT are novel for any quasi-walk metric.\nWhile considering the Laplace transform of the hitting time is novel to our work, this metric has been used in the literature in an ad-hoc manner in various forms as a similarity metric for collaboration networks [20], hidden subgraph detection [14], and robust shortest path distance [21]. However, these papers only considered the elementary properties of the limits \u03b2_n \u2192 0 and \u03b2_n \u2192 \u221e. Our consistency proof demonstrates the advantage of the stochastic process approach.\n\n4.1 Consistency\n\nIt was shown previously that for n fixed and \u03b2_n \u2192 \u221e, \u2212 log(E[exp(\u2212\u03b2_n T^{x_i}_{x_j,n})])/\u03b2_n g_n converges to shortest path distance from x_i to x_j. We investigate more precise behavior in terms of the scaling of \u03b2_n. There are two regimes: if \u03b2_n = \u03c9(log(g_n^d n)), then the shortest path dominates and the LTHT converges to shortest path distance (see Theorem S5.2). If \u03b2_n = \u0398(\u03b2\u0302 g_n^2), the graph log-LTHT converges to its continuous equivalent, which for large \u03b2\u0302 averages over random walks concentrated around the geodesic. To show consistency for \u03b2_n = \u0398(\u03b2\u0302 g_n^2), we proceed in four steps: (1) we reweight the random walk on the graph so the limiting process is Brownian motion; (2) we show that log-LTHT for Brownian motion recovers latent distance; (3) we show that log-LTHT for the reweighted walk converges to its continuous limit; (4) we conclude that log-LTHT of the reweighted walk recovers latent distance.\n(1) Reweighting the random walk to converge to Brownian motion: We define weights using the estimators p\u0302 and \u03b5\u0302 for p(x) and \u03b5(x) from [6].\n\nTheorem 4.1. Let p\u0302 and \u03b5\u0302 be consistent estimators of the density and local scale and A be the adjacency matrix. Then the random walk X\u0302^n_t defined below converges to a Brownian motion.\n\nP(X\u0302^n_{t+1} = x_j | X\u0302^n_t = x_i) = [A_{i,j} p\u0302(x_j)^{\u22121} / \u2211_k A_{i,k} p\u0302(x_k)^{\u22121}] \u03b5\u0302(x_i)^{\u22122} if i \u2260 j, and 1 \u2212 \u03b5\u0302(x_i)^{\u22122} if i = j.\n\nProof. Reweighting by p\u0302 and \u03b5\u0302 is designed to cancel the drift and diffusion terms in Theorem 2.2 by ensuring that as n grows large, jumps have means approaching 0 and variances which are asymptotically equal (but decaying with n). See Theorem S4.1.\u2074\n\n(2) Log-LTHT for a Brownian motion: Let W_t be a Brownian motion with W_0 = x_i, and let T^{x_i}_{B(x_j,s)} be the hitting time of W_t to B(x_j, s). We show that log-LTHT converges to distance.\n\nLemma 4.2. For any \u03b1 < 0, if \u03b2\u0302 = s^\u03b1, as s \u2192 0 we have\n\n\u2212 log(E[exp(\u2212\u03b2\u0302 T^{x_i}_{B(x_j,s)})]) / \u221a(2\u03b2\u0302) \u2192 |x_i \u2212 x_j|.\n\nProof. 
We consider hitting time of Brownian motion started at distance |x_i \u2212 x_j| from the origin to distance s of the origin, which is controlled by a Bessel process. See Subsection S6.1 for details.\n\n(3) Convergence of LTHT for \u03b2_n = \u0398(\u03b2\u0302 g_n^2): To compare continuous and discrete log-LTHTs, we will first define the s-neighborhood of a vertex x_i on G_n as the graph equivalent of the ball B(x_i, s).\n\nDefinition 4.3 (s-neighborhood). Let \u03b5\u0302(x) be the consistent estimate of the local scale from [6] so that \u03b5\u0302(x) \u2192 \u03b5(x) uniformly a.s. as n \u2192 \u221e. The \u03b5\u0302-weight of a path x_{i_1} \u2192 \u00b7\u00b7\u00b7 \u2192 x_{i_l} is the sum \u2211_{m=1}^{l\u22121} \u03b5\u0302(x_{i_m}) of vertex weights \u03b5\u0302(x_i). For s > 0 and x \u2208 G_n, the s-neighborhood of x is\n\nNB^s_n(x) := {y | there is a path x \u2192 y of \u03b5\u0302-weight \u2264 g_n^{\u22121} s}.\n\nFor x_i, x_j \u2208 G_n, let T\u0302^{x_i}_{NB^s_n(x_j),n} be the hitting time of the transformed walk on G_n from x_i to NB^s_n(x_j).\n\nWe now verify that hitting times to the s-neighborhood on graphs and the s-radius ball coincide.\n\nCorollary 4.4. For s > 0, we have g_n^2 T\u0302^{x_i}_{NB^s_n(x_j),n} \u2192 T^{x_i}_{B(x_j,s)} in distribution.\n\nProof. We verify that the ball and the neighborhood have nearly identical sets of points and apply Theorem 2.2. See Subsection S6.2 for details.\n\n(4) Proving consistency of log-LTHT: Properly accounting for boundary effects, we obtain a consistency result for the log-LTHT for small neighborhood hitting times.\n\nTheorem 4.5. Let x_i, x_j \u2208 G_n be connected by a geodesic not intersecting \u2202D. For any \u03b4 > 0, there exists a choice of \u03b2\u0302 and s > 0 so that if \u03b2_n = \u03b2\u0302 g_n^2, for large n we have with high probability\n\n| \u2212 log(E[exp(\u2212\u03b2_n T\u0302^{x_i}_{NB^s_n(x_j),n})]) / \u221a(2\u03b2\u0302) \u2212 |x_i \u2212 x_j| | < \u03b4.\n\nProof of Theorem 4.5. The proof has three steps. First, we convert to the continuous setting via Corollary 4.4. Second, we show the contribution of the boundary is negligible. The conclusion follows from the explicit computation of Lemma S6.1. Full details are in Section S6.\n\nThe stochastic process limit based proof of Theorem 4.5 implies that the log-LTHT is consistent and robust to small perturbations to the graph which preserve the same limit (Supp. Section S8).\n\n4 This is a special case of a more general theorem for transforming limits of graph random walks (Theorem S4.1). Figure S1 shows that this modification is highly effective in practice.\n\n4.2 Bias\n\nRandom walk based metrics are often motivated as recovering a cluster preserving metric. We now show that the log-LTHT of the un-weighted simple random walk preserves the underlying cluster structure. In the 1-D case, we provide a complete characterization.\n\nTheorem 4.6. Suppose the spatial graph has d = 1 and h(x) = 1_{x\u2208[0,1]}. Let T^{x_i}_{NB^{\u03b5\u0302(x_j) g_n}_n(x_j),n} be the hitting time of a simple random walk from x_i to the out-neighborhood of x_j. Then\n\n\u2212 log(E[exp(\u2212\u03b2 T^{x_i}_{NB^{\u03b5\u0302(x_j) g_n}_n(x_j),n})]) / \u221a(8\u03b2) \u2192 \u222b_{x_i}^{x_j} \u221a(m(x)) dx + o(log(1 + e^{\u2212\u221a(2\u03b2)}) / \u221a(2\u03b2)),\n\nwhere m(x) = 2/\u03b5(x)^2 + (1/\u03b2) \u2202^2 log(p(x))/\u2202x^2 + (1/\u03b2) (\u2202 log(p(x))/\u2202x)^2 defines a density-sensitive metric.\n\nProof. 
Apply the WKBJ approximation for Schr\u00f6dinger equations to the Feynman-Kac PDE from Theorem 3.2. See Corollary S7.2 and Corollary S2.13 for a full proof.\n\nThe leading order terms of the density-sensitive metric appropriately penalize crossing regions of large changes to the log density; this is not the case for the expected hitting time (Theorem S2.12).\n\n4.3 Robustness\n\nWhile shortest path distance is a consistent measure of the underlying metric, it breaks down catastrophically with the addition of a single non-geometric edge and does not meaningfully rank vertices that share an edge. In contrast, we show that LTHT breaks ties between vertices via the resource allocation (RA) index, a robust local similarity metric under Erd\u0151s-R\u00e9nyi-type noise.\u2075\n\nDefinition 4.7. The noisy spatial graph G_n over X_n with noise terms q_1(n), ..., q_n(n) is constructed by drawing an edge from x_i to x_j with probability\n\np_{ij} = h(|x_i \u2212 x_j| \u03b5_n(x_i)^{\u22121})(1 \u2212 q_j(n)) + q_j(n).\n\nDefine the directed RA index in terms of the out-neighborhood set NB_n(x_i) and the in-neighborhood set NB^in_n(x_i) as\n\nR_{ij} := \u2211_{x_k \u2208 NB_n(x_i) \u2229 NB^in_n(x_j)} |NB_n(x_k)|^{\u22121}\n\nand the two-step log-LTHT by\n\nM^ts_{ij} := \u2212 log(E[exp(\u2212\u03b2 T^{x_i}_{x_j,n}) | T^{x_i}_{x_j,n} > 1]).\u2076\n\nWe show that the two-step log-LTHT and the RA index give equivalent methods for testing if vertices are within distance \u03b5_n(x).\n\nTheorem 4.8. If \u03b2 = \u03c9(log(g_n^d n)) and x_i and x_j have at least one common neighbor, then\n\nM^ts_{ij} \u2212 2\u03b2 \u2192 \u2212 log(R_{ij}) + log(|NB_n(x_i)|).\n\nProof. Let P_{ij}(t) be the probability of going from x_i to x_j in t steps, and H_{ij}(t) the probability of not hitting before time t. Factoring the two-step hitting time yields\n\nM^ts_{ij} = 2\u03b2 \u2212 log(P_{ij}(2)) \u2212 log(1 + \u2211_{t=3}^{\u221e} (P_{ij}(t)/P_{ij}(2)) H_{ij}(t) e^{\u2212\u03b2(t\u22122)}).\n\nLet k_max be the maximal out-degree in G_n. The contribution of paths of length greater than 2 vanishes because H_{ij}(t) \u2264 1 and P_{ij}(t)/P_{ij}(2) \u2264 k_max^2, which is dominated by e^{\u2212\u03b2} for \u03b2 = \u03c9(log(g_n^d n)). Noting that P_{ij}(2) = R_{ij}/|NB_n(x_i)| concludes. For full details see Theorem S9.1.\n\nFor edge identification within distance \u03b5_n(x), the RA index is robust even at noise level q = o(g_n^{d/2}).\n\nTheorem 4.9. If q_i = q = o(g_n^{d/2}) for all i, for any \u03b4 > 0 there are c_1, c_2 and h_n so that for any i, j, with probability at least 1 \u2212 \u03b4 we have\n\n\u2022 |x_i \u2212 x_j| < min{\u03b5_n(x_i), \u03b5_n(x_j)} if R_{ij} h_n < c_1;\n\u2022 |x_i \u2212 x_j| > 2 max{\u03b5_n(x_i), \u03b5_n(x_j)} if R_{ij} h_n > c_2.\n\nProof. The minimal RA index follows from standard concentration arguments (see S9.2).\n\n5 Modifying the graph by changing fewer than g_n^2/n edges does not affect the continuum limit of the random graph, and therefore preserves the LTHT with parameter \u03b2 = \u0398(g_n^2). While this weak bound allows on average o(1) noise edges per vertex, it does show that the LTHT is substantially more robust than shortest paths without modification. See Section S8 for proofs.\n\n6 The conditioning T^{x_i}_{x_j,n} > 1 is natural in link-prediction tasks where only pairs of disconnected vertices are queried. Empirically, we observe it is critical to performance (Figure 3).\n\nFigure 2: The LTHT recovered deleted edges most consistently on a citation network.\n\nFigure 3: The two-step LTHT (defined above Theorem 4.8) outperforms others at word similarity estimation including the basic log-LTHT.\n\n5 Link prediction tasks\n\nWe compare the LTHT against other baseline measures of vertex similarity: shortest path distance, expected hitting time, number of common neighbors, and the RA index. A comprehensive evaluation of these quasi-walk metrics was performed in [8], which showed that a metric equivalent to the LTHT performed best. 
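
Two of the quantities compared here, the LTHT and the directed RA index, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: instead of the matrix inversion of Section S3, it solves the same linear system (u = 1 at the target, and u_i = exp(-beta) times the average of u over the out-neighbors of i elsewhere) by fixed-point iteration. The function names ltht and ra_index, the sweep count, and the four-vertex toy graph are hypothetical choices made for this example.

```python
import math

def ltht(out_neighbors, target, beta, sweeps=500):
    # u[i] approximates E[exp(-beta * T)], where T is the hitting time of the
    # simple random walk from vertex i to the target vertex.
    # Fixed-point iteration on u[target] = 1 and
    # u[i] = exp(-beta) * mean(u[k] for k in out_neighbors[i]);
    # the map is a contraction with factor exp(-beta) < 1.
    u = [0.0] * len(out_neighbors)
    u[target] = 1.0
    for _ in range(sweeps):
        u = [1.0 if i == target
             else math.exp(-beta) * sum(u[k] for k in nbrs) / len(nbrs)
             for i, nbrs in enumerate(out_neighbors)]
    return u

def ra_index(out_neighbors, i, j):
    # Directed RA index R_ij: sum of 1 / outdegree(k) over all vertices k
    # with edges i -> k and k -> j.
    return sum(1.0 / len(out_neighbors[k])
               for k in out_neighbors[i] if j in out_neighbors[k])

# Hypothetical toy graph: directed cycle 0 -> 1 -> 2 -> 3 -> 0 plus a shortcut 0 -> 2.
out_neighbors = [[1, 2], [2], [3], [0]]
beta = 0.2
u = ltht(out_neighbors, target=3, beta=beta)
log_ltht = [-math.log(v) for v in u]
```

On this toy graph the two-step transition probability from vertex 0 to vertex 3 equals ra_index(out_neighbors, 0, 3) / len(out_neighbors[0]), matching the identity P_ij(2) = R_ij / |NB_n(x_i)| used in the proof of Theorem 4.8.
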
We consider two separate link prediction tasks on the largest connected component of vertices of degree at least five, fixing \u03b2 = 0.2.\u2077 The degree constraint is to ensure that local methods using number of common neighbors such as the resource allocation index do not have an excessive number of ties. Code to generate the figures in this paper is contained in the supplement.\nCitation network: The KDD 2003 challenge dataset [5] includes a directed, unweighted network of e-print arXiv citations whose dense connected component has 11,042 vertices and 222,027 edges. We use the same benchmark method as [9] where we delete a single edge and compare the similarity of the deleted edge against the set of control pairs of vertices i, j which do not share an edge. We count the fraction of pairs on which each method ranks the deleted edge higher than all other methods. We find that LTHT is consistently best at this task (Figure 2).\u2078\nAssociative Thesaurus network: The Edinburgh associative thesaurus [7] is a network with a dense connected component of 7754 vertices and 246,609 edges in which subjects were shown a set of ten words and for each word were asked to respond with the first word to occur to them. Each vertex represents a word and each edge is a weighted, directed edge where the weight from x_i to x_j is the number of subjects who responded with word x_j given word x_i.\nWe measure performance by whether strong associations with more than ten responses can be distinguished from weak ones with only one response. We find that the LTHT performs best and that preventing one-step jumps is critical to performance as predicted by Theorem 4.8 (Figure 3).\n\n6 Conclusion\n\nOur work has developed an asymptotic equivalence between hitting times for random walks on graphs and those for diffusion processes. 
Using this, we have provided a short extension of the\nproof for the divergence of expected hitting times, and derived a new consistent graph metric that\nis theoretically principled, computationally tractable, and empirically successful at well-established\nlink prediction benchmarks. These results open the way for the development of other principled\nquasi-walk metrics that can provably recover underlying latent similarities for spatial graphs.\n\n7Results are qualitatively identical when varying \u03b2 from 0.1 to 1; see supplement for details.\n8The two-step LTHT is not shown since it is equivalent to the LTHT in missing link prediction.\n\nReferences\n[1] M. Alamgir and U. von Luxburg. Phase transition in the family of p-resistances. In Advances in Neural\nInformation Processing Systems, pages 379\u2013387, 2011.\n\n[2] M. Alamgir and U. von Luxburg. Shortest path distance in random k-nearest neighbor graphs. In Proceedings\nof the 29th International Conference on Machine Learning (ICML-12), pages 1031\u20131038, 2012.\n\n[3] P. Chebotarev. A class of graph-geodetic distances generalizing the shortest-path and the resistance distances.\nDiscrete Applied Mathematics, 159(5):295\u2013302, 2011.\n\n[4] D. A. Croydon and B. M. Hambly. Local limit theorems for sequences of simple random walks on graphs.\nPotential Analysis, 29(4):351\u2013389, 2008.\n\n[5] J. Gehrke, P. Ginsparg, and J. Kleinberg. Overview of the 2003 KDD Cup. ACM SIGKDD Explorations\nNewsletter, 5(2):149\u2013151, 2003.\n\n[6] T. B. Hashimoto, Y. Sun, and T. S. Jaakkola. Metric recovery from directed unweighted graphs. In\nProceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pages\n342\u2013350, 2015.\n\n[7] G. R. Kiss, C. Armstrong, R. Milroy, and J. Piper. An associative thesaurus of English and its computer\nanalysis. The Computer and Literary Studies, pages 153\u2013165, 1973.\n\n[8] I. Kivim\u00e4ki, M. Shimbo, and M. 
Saerens. Developments in the theory of randomized shortest paths with a\ncomparison of graph node distances. Physica A: Statistical Mechanics and its Applications, 393:600\u2013616,\n2014.\n\n[9] L. L\u00fc and T. Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics\nand its Applications, 390(6):1150\u20131170, 2011.\n\n[10] B. \u00d8ksendal. Stochastic differential equations: An introduction with applications. Universitext. Springer-Verlag,\nBerlin, sixth edition, 2003.\n\n[11] P. Sarkar, D. Chakrabarti, and A. W. Moore. Theoretical justification of popular link prediction heuristics.\nIn IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 2722,\n2011.\n\n[12] P. Sarkar and A. W. Moore. A tractable approach to finding closest truncated-commute-time neighbors in\nlarge graphs. In Proc. UAI, 2007.\n\n[13] B. Shaw and T. Jebara. Structure preserving embedding. In Proceedings of the 26th Annual International\nConference on Machine Learning, pages 937\u2013944. ACM, 2009.\n\n[14] S. T. Smith, E. K. Kao, K. D. Senne, G. Bernstein, and S. Philips. Bayesian discovery of threat networks.\nIEEE Transactions on Signal Processing, 62:5324\u20135338, 2014.\n\n[15] D. W. Stroock and S. S. Varadhan. Multidimensional diffusion processes, volume 233. Springer Science\n& Business Media, 1979.\n\n[16] A. Tahbaz-Salehi and A. Jadbabaie. A one-parameter family of distributed consensus algorithms with\nboundary: From shortest paths to mean hitting times. In Decision and Control, 2006 45th IEEE Conference\non, pages 4664\u20134669. IEEE, 2006.\n\n[17] D. Ting, L. Huang, and M. I. Jordan. An analysis of the convergence of graph Laplacians. In Proceedings\nof the 27th International Conference on Machine Learning (ICML-10), pages 1079\u20131086, 2010.\n\n[18] U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. 
The Annals of Statistics,\npages 555\u2013586, 2008.\n\n[19] U. von Luxburg, A. Radl, and M. Hein. Hitting and commute times in large random neighborhood graphs.\nJournal of Machine Learning Research, 15:1751\u20131798, 2014.\n\n[20] M. Yazdani. Similarity Learning Over Large Collaborative Networks. PhD thesis, \u00c9cole Polytechnique\nF\u00e9d\u00e9rale de Lausanne, 2013.\n\n[21] L. Yen, M. Saerens, A. Mantrach, and M. Shimbo. A family of dissimilarity measures between nodes\ngeneralizing both the shortest-path and the commute-time distances. In Proceedings of the 14th ACM\nSIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785\u2013793. ACM,\n2008.\n", "award": [], "sourceid": 1889, "authors": [{"given_name": "Tatsunori", "family_name": "Hashimoto", "institution": "MIT CSAIL"}, {"given_name": "Yi", "family_name": "Sun", "institution": "MIT Mathematics"}, {"given_name": "Tommi", "family_name": "Jaakkola", "institution": "MIT"}]}