{"title": "Nonbacktracking Bounds on the Influence in Independent Cascade Models", "book": "Advances in Neural Information Processing Systems", "page_first": 1407, "page_last": 1416, "abstract": "This paper develops upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model. In particular, our bounds exploit nonbacktracking walks, Fortuin-Kasteleyn-Ginibre type inequalities, and are computed by message passing algorithms. Nonbacktracking walks have recently allowed for headways in community detection, and this paper shows that their use can also impact the influence computation. Further, we provide parameterized versions of the bounds that control the trade-off between the efficiency and the accuracy. Finally, the tightness of the bounds is illustrated with simulations on various network models.", "full_text": "Nonbacktracking Bounds on the Influence in Independent Cascade Models

Emmanuel Abbe1,2  Sanjeev Kulkarni2  Eun Jee Lee1

1Program in Applied and Computational Mathematics, 2Department of Electrical Engineering

Princeton University

{eabbe, kulkarni, ejlee}@princeton.edu

Abstract

This paper develops upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model. In particular, our bounds exploit nonbacktracking walks, Fortuin-Kasteleyn-Ginibre type inequalities, and are computed by message passing algorithms. Nonbacktracking walks have recently allowed for headways in community detection, and this paper shows that their use can also impact the influence computation. Further, we provide parameterized versions of the bounds that control the trade-off between the efficiency and the accuracy. 
Finally, the tightness of the bounds is illustrated with simulations on various network models.

1 Introduction

Influence propagation is concerned with the diffusion of information from initially influenced nodes, called seeds, in a network. Understanding how information propagates in networks has become a central problem in a broad range of fields, such as viral marketing [18], sociology [9, 20, 24], communication [13], epidemiology [21], and social network analysis [25].

One of the most fundamental questions on influence propagation is to estimate the influence, i.e., the expected number of influenced nodes at the end of the propagation, given a set of seeds. Estimating the influence is central to diverse research problems related to influence propagation, such as the widely known influence maximization problem of finding a set of k nodes that maximizes the influence.

Recent studies on influence propagation have proposed various algorithms [12, 19, 4, 8, 23, 22] for the influence maximization problem while using Monte Carlo (MC) simulations to approximate the influence. The submodularity argument and the probabilistic error bound on MC give a probabilistic lower bound on the influence obtainable by the algorithms in terms of the true maximum influence. 
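As a baseline, such an MC estimate of the influence can be obtained by sampling live-arc realizations of the cascade (the live-arc view of the IC model is recalled in Section 2). The following is a minimal Python sketch; the graph encoding and function name are our own illustration, not code from any of the cited works:

```python
import random

def mc_influence(edges, probs, seeds, num_samples=10000):
    """Estimate the influence sigma(S0) by sampling live-arc realizations:
    each edge (u, v) is open with probability probs[(u, v)], and a node is
    influenced iff it is reachable from a seed through open edges."""
    total = 0
    for _ in range(num_samples):
        influenced = set(seeds)
        stack = list(seeds)
        while stack:
            u = stack.pop()
            # Each directed edge is examined at most once, so sampling its
            # state lazily here yields the correct live-arc distribution.
            for v in edges.get(u, []):
                if v not in influenced and random.random() < probs[(u, v)]:
                    influenced.add(v)
                    stack.append(v)
        total += len(influenced)
    return total / num_samples
```

On a directed path 0 -> 1 -> 2 with transmission probability 0.5 and seed {0}, the estimate concentrates around the exact influence 1 + 0.5 + 0.25 = 1.75, but only after many samples; this sampling cost is precisely what motivates the bounds developed below.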
Despite its benefits for the influence maximization problem, approximating the influence via MC simulations is far from ideal for large networks; in particular, MC may require a large number of computations in order to stabilize the approximation.

To overcome the limitations of Monte Carlo simulations, many researchers have taken both algorithmic and theoretical approaches to approximating the influence of given seeds in a network. Chen and Teng [3] provided a probabilistic guarantee on estimating the influence of a single seed with a relative error bound with expected running time O(ℓ(|V| + |E|)|V| log |V|/ε²), such that with probability 1 − 1/n^ℓ, for every node v, the computed influence of v has relative error at most ε. Draief et al. [6] introduced an upper bound on the influence using the spectral radius of the adjacency matrix. Tighter upper bounds were later suggested in [17], which relate the ratio of influenced nodes in a network to the spectral radius of the so-called Hazard matrix. Further, improved upper bounds which account for sensitive edges were introduced in [16].

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

In contrast, there has been little work on finding a tight lower bound on the influence. An exception is the work by Khim et al. [14], where a lower bound is obtained by considering only the influence through the maximal-weighted paths.

In this paper, we propose both upper and lower bounds on the influence using nonbacktracking walks and Fortuin-Kasteleyn-Ginibre (FKG) type inequalities. The bounds can be efficiently obtained by a message passing implementation. 
This shows that nonbacktracking walks can also impact influence propagation, making another case for the use of nonbacktracking walks in graphical model problems, as in [15, 10, 2, 1], discussed later in the paper. Further, we provide a parametrized version of the bounds that can adjust the trade-off between the efficiency and the accuracy of the bounds.

2 Background

We introduce here the independent cascade model and provide background for the main results.

Definition 1 (Independent Cascade Model). Consider a directed graph G = (V, E) where |V| = n, a transmission probability matrix P ∈ [0, 1]^{n×n}, and a seed set S0 ⊆ V. For all u ∈ V, let N+(u) be the set of out-neighbors of node u. The independent cascade model IC(G, P, S0) sequentially generates the influenced set S_t ⊆ V for each discrete time t ≥ 1 as follows. At time t, S_t is initialized to be an empty set. Then, each node u ∈ S_{t−1} attempts to influence v ∈ N+(u) \ ∪_{i=0}^{t−1} S_i with probability P_uv, i.e., node u influences its uninfluenced out-neighbor v with probability P_uv. If v is influenced at time t, v is added to S_t. The process stops at T if S_T = ∅ at the end of the step t = T. The set of influenced nodes at the end of the propagation is defined as S = ∪_{t=0}^{T−1} S_t.

We often refer to an edge (u, v) as open if node u influences node v. The IC model is equivalent to the live-arc graph model, where the influence happens at once, rather than sequentially. The live-arc graph model first decides the state of every edge with a Bernoulli trial, i.e., edge (u, v) is open independently with probability P_uv and closed otherwise. Then, the set of influenced nodes is defined as the nodes that are reachable from at least one of the seeds by the open edges.

Definition 2 (Influence). The expected number of nodes that are influenced at the end of the propagation process is called the influence (rather than the expected influence, with a slight abuse of terminology) of IC(G, P, S0), and is defined as

σ(S0) = ∑_{v∈V} P(v is influenced).  (1)

It is shown in [5] that computing the influence σ(S0) in the independent cascade model IC(G, P, S0) is #P-hard, even with a single seed, i.e., |S0| = 1.

Next, we define nonbacktracking (NB) walks on a directed graph. Nonbacktracking walks have already been used for studying the characteristics of networks. To the best of our knowledge, the use of NB walks in the context of epidemics was first introduced in the paper of Karrer et al. [11] and later applied to percolation in [10]. In particular, Karrer et al. reformulate the spread of influence as a message passing process and demonstrate how the resulting equations can be used to calculate an upper bound on the number of nodes that are susceptible at a given time. As we shall see, we take a different approach to the use of NB walks, which focuses on the effective contribution of a node in influencing another node and accumulates such contributions to obtain upper and lower bounds. More recently, nonbacktracking walks have been used for community detection [15, 2, 1].

Definition 3 (Nonbacktracking Walk). Let G = (V, E) be a directed graph. A nonbacktracking walk of length k is defined as w(k) = (v_0, v_1, . . . , v_k), where v_i ∈ V and (v_{i−1}, v_i) ∈ E for all i ∈ [k], and v_{i−1} ≠ v_{i+1} for all i ∈ [k − 1].

We next recall a key inequality introduced by Fortuin et al. [7].

Theorem 1 (FKG Inequality). Let (Γ, ⪯) be a distributive lattice, where Γ is a finite partially ordered set, ordered by ⪯, and let μ be a positive measure on Γ satisfying the following condition: for all x, y ∈ Γ,

μ(x ∧ y) μ(x ∨ y) ≥ μ(x) μ(y),

where x ∧ y = max{z ∈ Γ : z ⪯ x, z ⪯ y} and x ∨ y = min{z ∈ Γ : x ⪯ z, y ⪯ z}. Let f and g be both increasing (or both decreasing) functions on Γ. Then,

(∑_{x∈Γ} μ(x)) (∑_{x∈Γ} f(x)g(x)μ(x)) ≥ (∑_{x∈Γ} f(x)μ(x)) (∑_{x∈Γ} g(x)μ(x)).  (2)

The FKG inequality is instrumental in studying influence propagation, since the probability that a node is influenced is nondecreasing with respect to the partial order of the random variables describing the states, open or closed, of the edges.

3 Nonbacktracking bounds on the influence

In this section, we present upper and lower bounds on the influence in the independent cascade model and explain the motivations and intuitions behind the bounds. The bounds utilize nonbacktracking walks and FKG inequalities and are computed efficiently by message passing algorithms. In particular, the upper bound on a network based on a graph G(V, E) runs in O(|V|² + |V||E|) and the lower bound runs in O(|V| + |E|), whereas Monte Carlo simulation would require O(|V|³ + |V|²|E|) computations without knowing the variance of the influence, which is harder to estimate than the influence itself. The reason for the large computational complexity of MC is that, in order to ensure that the standard error of the estimate does not grow with respect to |V|, MC requires O(|V|²) simulations, each costing up to O(|V| + |E|) computations. 
Hence, for large networks, where MC may not be feasible, our algorithms can still provide bounds on the influence.

Furthermore, from the proposed upper bound σ+ and lower bound σ−, we can compute an upper bound on the variance given by (σ+ − σ−)²/4. This could be used to estimate the number of computations needed by MC. Computing this upper bound on the variance with the proposed bounds can be done in O(|V|² + |V||E|), whereas computing the variance with MC simulation requires O(|V|⁵ + |V|⁴|E|).

3.1 Nonbacktracking upper bounds (NB-UB)

We start by defining the following terms for the independent cascade model IC(G, P, S0), where G = (V, E) and |V| = n.

Definition 4. For any v ∈ V, we define the set of in-neighbors N−(v) = {u ∈ V : (u, v) ∈ E} and the set of out-neighbors N+(v) = {u ∈ V : (v, u) ∈ E}.

Definition 5. For any v ∈ V and l ∈ [n − 1], the set P_l(S0→v) is defined as the set of all paths of length l from any seed s ∈ S0 to v. We call a path P open iff every edge in P is open. For l = 0, we define P_0(S0→v) as the set (of size one) containing the zero-length path consisting of node v, and we say the path P ∈ P_0(S0→v) is open iff v ∈ S0.

Definition 6. For any v ∈ V and l ∈ {0, . . . , n − 1}, we define

p(v) = P(v is influenced)  (3)
p_l(v) = P(∪_{P∈P_l(S0→v)} {P is open})  (4)
p_l(u→v) = P(∪_{P∈P_l(S0→u), P∌v} {P is open and edge (u, v) is open})  (5)

In other words, p_l(v) is the probability that node v is influenced by open paths of length l, i.e., there exists an open path of length l from a seed to v, and p_l(u→v) is the probability that v is influenced by node u with open paths of length l + 1, i.e., there exists an open path of length l + 1 from a seed to v that ends with edge (u, v).

Lemma 1. 
For any v ∈ V,

p(v) ≤ 1 − ∏_{l=0}^{n−1} (1 − p_l(v)).  (6)

For any v ∈ V and l ∈ [n − 1],

p_l(v) ≤ 1 − ∏_{u∈N−(v)} (1 − p_{l−1}(u→v)).  (7)

Lemma 1, which can be proved by FKG inequalities, suggests that given p_{l−1}(u→v), we may compute an upper bound on the influence. Ideally, p_{l−1}(u→v) could be computed by considering all paths of length l that end with (u, v). However, this results in exponential complexity O(n^l), as l goes up to n − 1. Thus, we present an efficient way to compute an upper bound UB_{l−1}(u→v) on p_{l−1}(u→v), which in turn gives an upper bound UB_l(v) on p_l(v), with the following recursion formula.

Definition 7. For all l ∈ {0, . . . , n−1} and u, v ∈ V such that (u, v) ∈ E, UB_l(u) ∈ [0, 1] and UB_l(u→v) ∈ [0, 1] are defined recursively as follows.
Initial condition: For every s ∈ S0, s+ ∈ N+(s), u ∈ V \ S0, and v ∈ N+(u),

UB_0(s) = 1, UB_0(s→s+) = P_{ss+}  (8)
UB_0(u) = 0, UB_0(u→v) = 0.  (9)

Recursion: For every l ∈ [n−1], s ∈ S0, s+ ∈ N+(s), s− ∈ N−(s), u ∈ V \ S0, and v ∈ N+(u) \ S0,

UB_l(s) = 0, UB_l(s→s+) = 0, UB_l(s−→s) = 0  (10)

UB_l(u) = 1 − ∏_{w∈N−(u)} (1 − UB_{l−1}(w→u))  (11)

UB_l(u→v) = P_uv (1 − (1 − UB_l(u)) / (1 − UB_{l−1}(v→u))), if v ∈ N−(u); P_uv UB_l(u), otherwise.  (12)

Equation (10) follows from the fact that, for any seed node s ∈ S0 and for all l > 0, the probabilities p_l(s) = 0, p_l(s→s+) = 0, and p_l(s−→s) = 0. 
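For concreteness, the recursion of Definition 7 and the resulting bound can be sketched directly as follows. This is a plain Python illustration with our own naming and graph encoding, assuming all transmission probabilities are strictly below 1 (so no division by zero occurs); the paper's actual message-passing algorithm, NB-UB, is given below:

```python
def nb_ub(nodes, edges, P, S0):
    """Sketch of the recursion of Definition 7 and the bound of Theorem 2.
    edges: dict u -> list of out-neighbors; P: dict (u, v) -> P_uv.
    Assumes every P_uv < 1."""
    n = len(nodes)
    in_nb = {v: [u for u in nodes if v in edges.get(u, [])] for v in nodes}
    UB = [{u: 0.0 for u in nodes} for _ in range(n)]    # UB_l(u)
    UBe = [{e: 0.0 for e in P} for _ in range(n)]       # UB_l(u -> v)
    for s in S0:                                        # eq. (8)
        UB[0][s] = 1.0
        for v in edges.get(s, []):
            UBe[0][(s, v)] = P[(s, v)]
    for l in range(1, n):
        for u in nodes:
            if u in S0:
                continue  # eq. (10): seed quantities stay 0 for l >= 1
            prod = 1.0
            for w in in_nb[u]:                          # eq. (11)
                prod *= 1.0 - UBe[l - 1][(w, u)]
            UB[l][u] = 1.0 - prod
            for v in edges.get(u, []):
                if v in S0:
                    continue
                if v in in_nb[u]:                       # eq. (12), NB case
                    UBe[l][(u, v)] = P[(u, v)] * (
                        1.0 - (1.0 - UB[l][u]) / (1.0 - UBe[l - 1][(v, u)]))
                else:
                    UBe[l][(u, v)] = P[(u, v)] * UB[l][u]
    sigma_plus = 0.0                                    # eq. (13)
    for v in nodes:
        prod = 1.0
        for l in range(n):
            prod *= 1.0 - UB[l][v]
        sigma_plus += 1.0 - prod
    return sigma_plus
```

On a directed path 0 -> 1 -> 2 with transmission probability 0.5 and seed {0}, the sketch returns 1.75, which is the exact influence, as expected on a network without backtracking walks.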
A naive way to compute UB_l(u→v) is UB_l(u→v) = P_uv UB_{l−1}(u), but this results in an extremely loose bound due to backtracking. For a tighter bound, we use nonbacktracking in Equation (12): when computing UB_l(u→v), we ignore the contribution of UB_{l−1}(v→u).

Theorem 2. For any independent cascade model IC(G, P, S0),

σ(S0) ≤ ∑_{v∈V} (1 − ∏_{l=0}^{n−1} (1 − UB_l(v))) =: σ+(S0),  (13)

where UB_l(v) is obtained recursively as in Definition 7.

Next, we present the Nonbacktracking Upper Bound (NB-UB) algorithm, which computes UB_l(v) and UB_l(u→v) by message passing. At the l-th iteration, the variables in NB-UB are as follows.
· S_l is the set of nodes that are processed at the l-th iteration.
· Mcurr(v) = {(u, UB_{l−1}(u→v)) : u is an in-neighbor of v, and u ∈ S_{l−1}} is the set of pairs (previously processed in-neighbor u of v, incoming message from u to v).
· MSrc(v) = {u : u is an in-neighbor of v, and u ∈ S_{l−1}} is the set of in-neighbors of v that were processed at the previous step.
· Mcurr(v)[u] = UB_{l−1}(u→v) is the incoming message from u to v.
· Mnext(v) = {(u, UB_l(u→v)) : u is an in-neighbor of v, and u ∈ S_l} is the set of pairs (currently processed in-neighbor u, next iteration's incoming message from u to v).

Algorithm 1 Nonbacktracking Upper Bound (NB-UB)

Initialize: UB_l(v) = 0 for all 0 ≤ l ≤ n − 1 and v ∈ V
Initialize: Insert (s, 1) to Mnext(s) for all s ∈ S0
for l = 0 to n − 1 do
  for u ∈ S_l do
    Mcurr(u) = Mnext(u) and Clear Mnext(u)
  for u ∈ S_l do
    UB_l(u) = ProcessIncomingMsgUB(Mcurr(u))
    for v ∈ N+(u) \ S0 do
      S_{l+1}.insert(v)
      if v ∈ MSrc(u) then
        UB_l(u→v) = GenerateOutgoingMsgUB(Mcurr(u)[v], UB_l(u), P_uv)
        Mnext(v).insert((u, UB_l(u→v)))
      else
        UB_l(u→v) = GenerateOutgoingMsgUB(0, UB_l(u), P_uv)
        Mnext(v).insert((u, UB_l(u→v)))
Output: UB_l(u) for all l, u

At the beginning, every seed node s ∈ S0 is initialized such that Mcurr(s) = {(s, 1)} in order to satisfy the initial condition UB_0(s) = 1. At each l-th iteration, every node u in S_l is processed as follows. First, ProcessIncomingMsgUB(Mcurr(u)) computes UB_l(u) as in Equation (11). Second, u passes a message to each neighbor v ∈ N+(u) \ S0 along the edge (u, v), and v stores (inserts) the message in Mnext(v) for the next iteration. The message contains 1) the source of the message, u, and 2) UB_l(u→v), which is computed as in Equation (12) by the function GenerateOutgoingMsgUB. Finally, the algorithm outputs UB_l(u) for all u ∈ V and l ∈ {0, . . . , n−1}, and the upper bound σ+(S0) is computed by Equation (13). A description of how the algorithm runs on a small network can be found in the supplementary material.

Computational complexity: Notice that for each iteration l ∈ {0, . . . , n − 1}, the algorithm accesses at most n nodes, and for each node v, the functions ProcessIncomingMsgUB and GenerateOutgoingMsgUB are computed in O(deg(v)) and O(1), respectively. Therefore, the worst-case computational complexity is O(|V|² + |V||E|).

3.2 Nonbacktracking lower bounds (NB-LB)

A naive way to compute a lower bound on the influence in a network IC(G, P, S0) is to reduce the network to a (spanning) tree network by removing edges. Then, since there is a unique path from a node to another, we can compute the influence of the tree network, which is a lower bound on the influence in the original network, in O(|V|). 
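This naive tree bound is easy to make concrete: on a directed tree rooted at a single seed, each node is reached only through the unique path from the root, so its influence probability is the product of the transmission probabilities along that path. A minimal sketch, with our own naming (not code from the paper):

```python
def tree_influence(children, P, root):
    """Exact influence of a directed tree rooted at a single seed.
    children: dict u -> list of children of u; P: dict (u, v) -> P_uv.
    p(v) is the product of transmission probabilities on the root-to-v path."""
    total = 0.0
    stack = [(root, 1.0)]  # (node, probability that the node is influenced)
    while stack:
        u, p_u = stack.pop()
        total += p_u
        for v in children.get(u, []):
            stack.append((v, p_u * P[(u, v)]))
    return total
```

For the path 0 -> 1 -> 2 with transmission probability 0.5, this returns 1 + 0.5 + 0.25 = 1.75, and applying it to a spanning tree of a general network gives the naive lower bound discussed above.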
We take this approach of generating a subnetwork from the original network, yet we avoid the significant gap between the bound and the influence by considering the following directed acyclic subnetwork, in which there is no backtracking walk.

Definition 8 (Min-distance Directed Acyclic Subnetwork). Consider an independent cascade model IC(G, P, S0) with G = (V, E) and |V| = n. Let d(S0, v) := min_{s∈S0} d(s, v), i.e., the minimum distance from a seed in S0 to v. A minimum-distance directed acyclic subnetwork (MDAS), IC(G′, P′, S0), where G′ = (V′, E′), is obtained as follows.
· V′ = {v_1, . . . , v_n} is an ordered set of nodes such that d(S0, v_i) ≤ d(S0, v_j) for every i < j.
· E′ = {(v_i, v_j) ∈ E : i < j}, i.e., E′ is obtained from E by removing every edge whose source node comes later in the order than its destination node.
· P′_{v_i v_j} = P_{v_i v_j} if (v_i, v_j) ∈ E′, and P′_{v_i v_j} = 0 otherwise.

If there are multiple ordered sets of vertices satisfying the condition, we may choose one arbitrarily.

For any k ∈ [n], let p(v_k) be the probability that v_k ∈ V′ is influenced in the MDAS IC(G′, P′, S0). Since p(v_k) is the probability of the union of the events that an in-neighbor u_i ∈ N−(v_k) influences v_k, p(v_k) can be computed by the principle of inclusion and exclusion. Thus, we could compute a lower bound on p(v_k) using Bonferroni inequalities if we knew the probability that in-neighbors u and v both influence v_k, for every pair u, v ∈ N−(v_k). However, computing such probabilities can take O(k^k). Hence, we present LB(v_k), which efficiently computes a lower bound on p(v_k) by the following recursion.

Definition 9. For all v_k ∈ V′, LB(v_k) ∈ [0, 1] is defined by the recursion on k as follows.
Initial condition: For every v_s ∈ S0,

LB(v_s) = 1.  (14)

Recursion: For every v_k ∈ V′ \ S0,

LB(v_k) = ∑_{i=1}^{m*} P′_{u_i v_k} LB(u_i) (1 − ∑_{j=1}^{i−1} P′_{u_j v_k}),  (15)

where N−(v_k) = {u_1, . . . , u_m} is the ordered set of in-neighbors of v_k in IC(G′, P′, S0) and

m* = max{m′ ≤ m : ∑_{j=1}^{m′−1} P′_{u_j v_k} ≤ 1}.  (16)

Remark. Since the i-th summand in Equation (15) can reuse ∑_{j=1}^{i−2} P′_{u_j v_k}, which is already computed for the (i−1)-th summand, to obtain ∑_{j=1}^{i−1} P′_{u_j v_k}, the summation takes at most O(deg(v_k)).

Theorem 3. For any independent cascade model IC(G, P, S0) and its MDAS IC(G′, P′, S0),

σ(S0) ≥ ∑_{v_k∈V′} LB(v_k) =: σ−(S0),

where LB(v_k) is obtained recursively as in Definition 9.

Next, we present the Nonbacktracking Lower Bound (NB-LB) algorithm, which efficiently computes LB(v_k). 
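The recursion of Definition 9 can be sketched directly as follows. This is a plain Python illustration with our own naming and graph encoding; the paper's NB-LB algorithm below computes the same quantity by message passing:

```python
def nb_lb(order, in_nb, P, S0):
    """Sketch of the lower bound of Definition 9 / Theorem 3 on an MDAS.
    order: nodes of V' sorted by distance from the seed set, so every
    in-neighbor of a node precedes it; in_nb: dict v -> ordered list of
    in-neighbors in the MDAS; P: dict (u, v) -> P'_uv."""
    LB = {}
    for vk in order:
        if vk in S0:
            LB[vk] = 1.0  # eq. (14)
            continue
        total = 0.0
        prefix = 0.0      # running sum of P'_{u_j vk} over earlier j
        for ui in in_nb.get(vk, []):
            if prefix > 1.0:   # the cap m* of eq. (16)
                break
            total += P[(ui, vk)] * LB[ui] * (1.0 - prefix)  # eq. (15)
            prefix += P[(ui, vk)]
        LB[vk] = total
    return sum(LB.values())   # Theorem 3: a lower bound on sigma(S0)
```

For example, on the diamond 0 -> {1, 2} -> 3 with all transmission probabilities 0.5 and seed {0}, the sketch returns 1 + 0.5 + 0.5 + 0.375 = 2.375, below the true influence 2.4375, consistent with Theorem 3.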
At the k-th iteration, the key variable in NB-LB has the following meaning.
· M(v_k) = {(LB(v_j), P′_{v_j v_k}) : v_j is an in-neighbor of v_k} is the set of pairs (incoming message from an in-neighbor v_j to v_k, the transmission probability of edge (v_j, v_k)).

Algorithm 2 Nonbacktracking Lower Bound (NB-LB)

Input: directed acyclic network IC(G′, P′, S0)
Initialize: σ− = 0
Initialize: Insert (1, 1) to M(v_i) for all v_i ∈ S0
for k = 1 to n do
  LB(v_k) = ProcessIncomingMsgLB(M(v_k))
  σ− += LB(v_k)
  for v_l ∈ N+(v_k) \ S0 do
    M(v_l).insert((LB(v_k), P′_{v_k v_l}))
Output: σ−

At the beginning, every seed node s ∈ S0 is initialized such that M(s) = {(1, 1)} in order to satisfy the initial condition LB(s) = 1. At each k-th iteration, node v_k is processed as follows. First, LB(v_k) is computed as in Equation (15) by the function ProcessIncomingMsgLB and added to σ−. Second, v_k passes the message (LB(v_k), P′_{v_k v_l}) to each out-neighbor v_l ∈ N+(v_k) \ S0, and v_l stores (inserts) it in M(v_l). Finally, the algorithm outputs σ−, the lower bound on the influence. A description of how the algorithm runs on a small network can be found in the supplementary material.

Computational complexity: Obtaining an arbitrary directed acyclic subnetwork from the original network takes O(|V| + |E|). Next, the algorithm iterates through the nodes V′ = {v_1, . . . , v_n}. For each node v_k, ProcessIncomingMsgLB takes O(deg(v_k)), and v_k sends messages to its out-neighbors in O(deg(v_k)). Hence, the worst-case computational complexity is O(|V| + |E|).

3.3 Tunable bounds

In this section, we briefly introduce the parametrized versions of NB-UB and NB-LB, which provide control to adjust the trade-off between the efficiency and the accuracy of the bounds.

Upper bounds (tNB-UB): Given a non-negative integer t ≤ n − 1, for every node u ∈ V, we compute the probability p_{≤t}(u) that node u is influenced by open paths whose length is less than or equal to t, and for each v ∈ N+(u), we compute the probability p_t(u→v). Then, we start NB-UB from l = t + 1 with the new initial conditions UB_t(u→v) = p_t(u→v) and UB_t(u) = p_{≤t}(u), and compute the upper bound as ∑_{v∈V} (1 − ∏_{l=t}^{n−1} (1 − UB_l(v))).

For higher values of t, the algorithm results in tighter upper bounds, while the computational complexity may increase exponentially for dense networks. Thus, this method is most applicable to sparse networks, where the degree of each node is bounded.

Lower bounds (tNB-LB): We first order the set of nodes {v_1, . . . , v_n} such that d(S0, v_i) ≤ d(S0, v_j) for every i < j. Given a non-negative integer t ≤ n, we obtain a subnetwork IC(G[V_t], P[V_t], S0 ∩ V_t) of size t, where G[V_t] is the subgraph induced by the set of nodes V_t = {v_1, . . . , v_t}, and P[V_t] is the corresponding transmission probability matrix. For each v_i ∈ V_t, we compute the exact probability p_t(v_i) that node v_i is influenced in the subnetwork IC(G[V_t], P[V_t], S0 ∩ V_t). Then, we start NB-LB from i = t + 1 with the new initial conditions LB(v_k) = p_t(v_k) for all k ≤ t.

For larger t, the algorithm results in tighter lower bounds. However, the computational complexity may increase exponentially with respect to t, the size of the subnetwork. This algorithm can
This algorithm can\nadopt Monte Carlo simulations on the subnetwork to avoid the large computational complexity.\nHowever, this modi\ufb01cation results in probabilistic lower bounds, rather than theoretically guaranteed\nlower bounds. Nonetheless, this can still give a signi\ufb01cant improvement, because the Monte Carlo\nsimulations on a smaller size of network require less computation to stabilize the estimation.\n\n6\n\n\f4 Experimental Results\n\nIn this section, we evaluate the NB-UB and NB-LB in independent cascade models on a variety of\nclassical synthetic networks.\nNetwork Generation. We consider 4 classical random graph models with the parameters shown\nas follows: Erdos Renyi random graphs with ER(n = 1000, p = 0.003), scale-free networks\nSF (n = 1000, \u03b1 = 2.5), random regular graphs Reg(n = 1000, d = 3), and random tree graphs\nwith power-law degree distributions T (n = 1000, \u03b1 = 3). For each graph model, we generate 100\nnetworks IC(G, pA,{s}) as follows. The graph G is the largest connected component of a graph\ndrawn from the graph model, the seed node s is a randomly selected vertex, and A is the adjacency\nmatrix of G. The corresponding IC model has the same transmission probability p for every edge.\nEvaluation of Bounds. For each network generated, we compute the following quantities for\neach p \u2208 {0.1, 0.2, . . . 
, 0.9}.
· σmc: the estimate of the influence from 10⁶ Monte Carlo simulations.
· σ+: the upper bound obtained by NB-UB.
· σ+_spec: the spectral upper bound from [17].
· σ−: the lower bound obtained by NB-LB.
· σ−_prob: the probabilistic lower bound obtained from 10 Monte Carlo simulations.

Figure 1: This figure compares the average relative gap of the bounds: NB-UB, the spectral upper bound in [17], NB-LB, and the probabilistic lower bound computed by MC simulations, for various types of networks.

The probabilistic lower bound is chosen for the experiments since no tight deterministic lower bound has been available. The sample size of 10 is chosen to match (indeed exceed) the computational complexity of the NB-LB algorithm. In Figure 1, we compare the average relative gap of the bounds for every network model and for each transmission probability, where the true value is assumed to be σmc. For example, the average relative gap of NB-UB for 100 Erdos Renyi networks {N_i}_{i=1}^{100} with transmission probability p is computed by (1/100) ∑_{i∈[100]} (σ+[N_i] − σmc[N_i]) / σmc[N_i], where σ+[N_i] and σmc[N_i] denote the NB-UB bound and the MC estimate, respectively, for the network N_i.

Results. Figure 1 shows that NB-UB outperforms the upper bound in [17] for the Erdos-Renyi and random 3-regular networks, and performs comparably for the scale-free networks. Also, NB-LB gives tighter bounds than the MC bounds on the Erdos-Renyi, scale-free, and random regular networks when the transmission probability is small, p < 0.4. Both NB-UB and NB-LB compute the exact influence for the tree networks, since both algorithms avoid backtracking walks.

Next, we show the bounds on exemplary networks.

4.1 Upper Bounds

Selection of Networks. In order to illustrate a typical behavior of the bounds, we have chosen the network in Figure 2a as follows. 
First, we generate 100 random 3-regular graphs G with 1000 nodes and assign a random seed s. Then, the corresponding IC model is defined as IC(G, P = pA, S0 = {s}). For each network, we compute NB-UB and the MC estimate. Then, we compute a score for each network, defined as the sum of the squared differences between the upper bounds and the MC estimates over the transmission probabilities p ∈ {0.1, 0.2, . . . , 0.9}. Finally, a graph whose score is the median of all 100 scores is chosen for Figure 2a.

Figure 2: (a) The figure compares various upper bounds on the influence in the 3-regular network in Section 4.1. The MC upper bounds are computed with various simulation sizes and shown with the data points indicated with MC(N), where N is the number of simulations. The spectral upper bound in [17] is shown in red, and NB-UB is shown in green. (b) The figure shows lower bounds on the influence of a scale-free network in Section 4.2. The probabilistic lower bounds shown with points are obtained from Monte Carlo simulations with various simulation sizes, and the data points indicated with MC(N) are obtained by N simulations. NB-LB is shown in green.

Results. In Figure 2a, we compare 1) the upper bound introduced in [17] and 2) the probabilistic upper bounds obtained by Monte Carlo simulations with 99% confidence level, to NB-UB. The MC upper bounds are computed with sample sizes N ∈ {5, 10, 30, 300, 3000}. It is evident from the figure that a larger sample size provides a tighter probabilistic upper bound. NB-UB outperforms the bound from [17] and the probabilistic MC bound when the transmission probability is relatively small. Further, it shows a similar trend as the MC simulations with a large sample size.

4.2 Lower Bounds

Selection of Networks. 
We adopt a selection process similar to that for the upper bounds, but with scale-free networks with 3000 nodes and α = 2.5.

Results. We compare probabilistic lower bounds obtained by MC with 99% confidence level to NB-LB. The lower bounds from Monte Carlo simulations are computed with sample sizes N ∈ {5, 12, 30, 300, 3000}, which account for a constant, log(|V|), 0.01|V|, 0.1|V|, and |V|, respectively. NB-LB outperforms the probabilistic bounds by MC with small sample sizes. Recall that the computational complexity of the lower bound in Algorithm 2 is O(|V| + |E|), which is the computational complexity of a constant number of Monte Carlo simulations. Figure 2b shows that NB-LB is tighter than the probabilistic lower bounds with the same computational complexity, and that it also agrees with the behavior of the MC simulations.

5 Conclusion

In this paper, we propose both upper and lower bounds on the influence in independent cascade models and provide algorithms to efficiently compute the bounds. We extend the results by proposing tunable bounds which can adjust the trade-off between the efficiency and the accuracy. Finally, the tightness and the performance of the bounds are shown with experimental results. One can further improve the bounds by considering r-nonbacktracking walks, i.e., avoiding cycles of length r rather than just backtracks, and we leave this for future study.

Acknowledgement. 
The authors thank Colin Sandon for helpful discussions. This research was partly supported by the NSF CAREER Award CCF-1552131 and the ARO grant W911NF-16-1-0051.

References

[1] E. Abbe and C. Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic bp, and the information-computation gap. arXiv preprint arXiv:1512.09080, 2015.

[2] C. Bordenave, M. Lelarge, and L. Massoulié. Non-backtracking spectrum of random graphs: community detection and non-regular ramanujan graphs. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 1347-1357. IEEE, 2015.

[3] W. Chen and S.-H. Teng. Interplay between social influence and network centrality: A comparative study on shapley centrality and single-node-influence centrality. In Proceedings of the 26th International Conference on World Wide Web, pages 967-976. International World Wide Web Conferences Steering Committee, 2017.

[4] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 199-208. ACM, 2009.

[5] W. Chen, Y. Yuan, and L. Zhang. Scalable influence maximization in social networks under the linear threshold model. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 88-97. IEEE, 2010.

[6] M. Draief, A. Ganesh, and L. Massoulié. Thresholds for virus spread on networks. 
In Proceedings of the 1st international conference on Performance evaluation methodologies and tools, page 51. ACM, 2006.

[7] C. M. Fortuin, P. W. Kasteleyn, and J. Ginibre. Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22(2):89–103, 1971.

[8] A. Goyal, W. Lu, and L. V. Lakshmanan. CELF++: optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th international conference companion on World wide web, pages 47–48. ACM, 2011.

[9] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology, pages 1420–1443, 1978.

[10] B. Karrer, M. Newman, and L. Zdeborová. Percolation on sparse networks. Physical Review Letters, 113(20):208702, 2014.

[11] B. Karrer and M. E. Newman. Message passing approach for general epidemic models. Physical Review E, 82(1):016101, 2010.

[12] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM, 2003.

[13] A. Khelil, C. Becker, J. Tian, and K. Rothermel. An epidemic model for information diffusion in MANETs. In Proceedings of the 5th ACM international workshop on Modeling analysis and simulation of wireless and mobile systems, pages 54–60. ACM, 2002.

[14] J. T. Khim, V. Jog, and P.-L. Loh. Computing and maximizing influence in linear threshold and triggering models. In Advances in Neural Information Processing Systems, pages 4538–4546, 2016.

[15] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.

[16] E. J. Lee, S. Kamath, E. Abbe, and S. R. Kulkarni.
Spectral bounds for independent cascade model with sensitive edges. In 2016 Annual Conference on Information Science and Systems (CISS), pages 649–653, March 2016.

[17] R. Lemonnier, K. Scaman, and N. Vayatis. Tight bounds for influence in diffusion networks and application to bond percolation and epidemiology. In Advances in Neural Information Processing Systems, pages 846–854, 2014.

[18] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.

[19] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 420–429. ACM, 2007.

[20] D. Lopez-Pintado and D. J. Watts. Social influence, binary decisions and collective dynamics. Rationality and Society, 20(4):399–443, 2008.

[21] B. Shulgin, L. Stone, and Z. Agur. Pulse vaccination strategy in the SIR epidemic model. Bulletin of Mathematical Biology, 60(6):1123–1148, 1998.

[22] Y. Tang, X. Xiao, and Y. Shi. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 75–86. ACM, 2014.

[23] C. Wang, W. Chen, and Y. Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545–576, 2012.

[24] D. J. Watts. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002.

[25] J. Yang and S. Counts.
Predicting the speed, scale, and range of information diffusion in Twitter. 2010.
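As a concrete illustration of the Monte Carlo baseline compared against NB-LB in the Results paragraph, the following sketch simulates the independent cascade model and derives a probabilistic lower bound on the influence from N samples via Hoeffding's inequality. This is our own minimal sketch, not the paper's implementation: the function names, the adjacency-dictionary graph representation, and the choice of a Hoeffding-style concentration bound are assumptions for illustration.

```python
import math
import random

def simulate_ic(adj, p, seeds, rng):
    """One independent-cascade run: each newly influenced node gets a single
    chance to influence each uninfluenced neighbor, succeeding w.p. p."""
    influenced = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in influenced and rng.random() < p:
                    influenced.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(influenced)

def mc_influence_lower_bound(adj, p, seeds, n_nodes, num_samples,
                             delta=0.01, seed=0):
    """Monte Carlo estimate of the influence, together with a lower bound
    that holds with probability at least 1 - delta (e.g. delta = 0.01 for
    a 99% confidence level)."""
    rng = random.Random(seed)
    samples = [simulate_ic(adj, p, seeds, rng) for _ in range(num_samples)]
    mean = sum(samples) / num_samples
    # Each sample lies in [|seeds|, n_nodes], so Hoeffding's inequality
    # bounds the deviation by (n_nodes - |seeds|) * sqrt(log(1/delta) / (2N)).
    spread = n_nodes - len(seeds)
    slack = spread * math.sqrt(math.log(1.0 / delta) / (2.0 * num_samples))
    return mean, max(float(len(seeds)), mean - slack)
```

For instance, on the 3-node path {0: [1], 1: [0, 2], 2: [1]} with transmission probability p = 1 and seed set {0}, every run influences all three nodes, so the estimate is exactly 3 and the 99% lower bound sits below it by the Hoeffding slack. Each cascade simulation costs O(|V| + |E|), consistent with the complexity comparison above: with a constant number of samples, MC matches the cost of NB-LB but yields a much looser probabilistic bound.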