{"title": "Tight Bounds for Influence in Diffusion Networks and Application to Bond Percolation and Epidemiology", "book": "Advances in Neural Information Processing Systems", "page_first": 846, "page_last": 854, "abstract": "In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM). We relate these bounds to the spectral radius of a particular matrix and show that the behavior is sub-critical when this spectral radius is lower than 1. More specifically, we point out that, in general networks, the sub-critical regime behaves in O(sqrt(n)) where n is the size of the network, and that this upper bound is met for star-shaped networks. We apply our results to epidemiology and percolation on arbitrary networks, and derive a bound for the critical value beyond which a giant connected component arises. Finally, we show empirically the tightness of our bounds for a large family of networks.", "full_text": "Tight Bounds for Influence in Diffusion Networks and\nApplication to Bond Percolation and Epidemiology\n\nR\u00e9mi Lemonnier1,2\nKevin Scaman1\nNicolas Vayatis1\n1 CMLA \u2013 ENS Cachan, CNRS, France, 2 1000mercis, Paris, France\n{lemonnier, scaman, vayatis}@cmla.ens-cachan.fr\n\nAbstract\n\nIn this paper, we derive theoretical bounds for the long-term influence of a node\nin an Independent Cascade Model (ICM). We relate these bounds to the spectral\nradius of a particular matrix and show that the behavior is sub-critical when this\nspectral radius is lower than 1. More specifically, we point out that, in general\nnetworks, the sub-critical regime behaves in O(\u221an) where n is the size of the\nnetwork, and that this upper bound is met for star-shaped networks. We apply our\nresults to epidemiology and percolation on arbitrary networks, and derive a bound\nfor the critical value beyond which a giant connected component arises. 
Finally,\nwe show empirically the tightness of our bounds for a large family of networks.\n\n1 Introduction\n\nThe emergence of social graphs of the World Wide Web has had a considerable effect on the propagation\nof ideas or information. For advertisers, these new diffusion networks have become a favored\nvector for viral marketing operations, which consist of advertisements that people are likely to share\nby themselves with their social circle, thus creating propagation dynamics somewhat similar to\nthe spreading of a virus in epidemiology ([1]). Of particular interest is the problem of influence\nmaximization, which consists of selecting the top-k nodes of the network to infect at time t = 0\nin order to maximize in expectation the final number of infected nodes at the end of the epidemic.\nThis problem was first formulated by Domingos and Richardson in [2] and later expressed in [3]\nas an NP-hard discrete optimization problem under the Independent Cascade (IC) framework, a\nwidely-used probabilistic model for information propagation.\nFrom an algorithmic point of view, influence maximization has been fairly well studied. Assuming\nthe transmission probabilities of all edges are known, Kempe, Kleinberg and Tardos ([3]) derived\na greedy algorithm based on Monte-Carlo simulations that was shown to approximate the optimal\nsolution up to a factor 1 \u2212 1/e, building on classical results of optimization theory. Since then, various\ntechniques were proposed in order to significantly improve the scalability of this algorithm ([4, 5, 6,\n7]), and also to provide an estimate of the transmission probabilities from real data ([8, 9]). Recently,\na series of papers ([10, 11, 12]) introduced continuous-time diffusion networks in which infection\nspreads during a time period T at varying rates across the different edges. 
While these models\nprovide a more accurate representation of real-world networks for finite T, they are equivalent to the\nIC model when T \u2192 \u221e. In this paper, we will focus on such long-term behavior of the contagion.\nFrom a theoretical point of view, little is known about the influence maximization problem under the\nIC model framework. The most celebrated result, established by Newman ([13]), proves the equivalence\nbetween bond percolation and the Susceptible-Infected-Removed (SIR) model in epidemiology\n([14]), which can be identified with a special case of the IC model where transmission probabilities are equal\namongst all infectious edges.\nIn this paper, we propose new bounds on the influence of any set of nodes. Moreover, we prove the\nexistence of an epidemic threshold for a key quantity defined by the spectral radius of a given hazard\nmatrix. Under this threshold, the influence of any given set of nodes in a network of size n will be\nO(\u221an), while the influence of a randomly chosen set of nodes will be O(1). We provide empirical\nevidence that these bounds are sharp for a family of graphs and sets of initial influencers and can\ntherefore be used as what is to our knowledge the first closed-form formulas for influence estimation.\nWe show that these results generalize bounds obtained on the SIR model by Draief, Ganesh and\nMassouli\u00e9 ([15]) and are closely related to recent results on percolation on finite inhomogeneous\nrandom graphs ([16]).\nThe rest of the paper is organized as follows. In Sec. 2, we recall the definition of the Information\nCascades Model and introduce useful notations. In Sec. 3, we derive theoretical bounds for the\ninfluence. In Sec. 4, we show that our results also apply to the fields of percolation and epidemiology\nand generalize existing results in these fields. In Sec. 
5, we illustrate our results by applying them\nto simple networks and retrieving well-known results. In Sec. 6, we perform experiments in order\nto show that our bounds are sharp for a family of graphs and sets of initial nodes.\n\n2 Information Cascades Model\n\n2.1 Influence in random networks and infection dynamics\nLet G = (V,E) be a directed network of n nodes and A \u2282 V be a set of n0 nodes that are initially\ncontagious (e.g. aware of a piece of information, infected by a disease or adopting a product). In\nthe sequel, we will refer to A as the influencers. The behavior of the cascade is modeled using a\nprobabilistic framework. The influencer nodes spread the contagion through the network by means\nof transmission through the edges of the network. More specifically, each contagious node can infect\nits neighbors with a certain probability. The influence of A, denoted as \u03c3(A), is the expected number\nof nodes reached by the contagion originating from A, i.e.\n\n\u03c3(A) = \u2211_{v \u2208 V} P(v is infected by the contagion | A).    (1)\n\nWe consider three infection dynamics that we will show in the next section to be equivalent regarding\nthe total number of infected nodes at the end of the epidemic.\n\nDiscrete-Time Information Cascades [DT IC(P)] At time t = 0, only the influencers are infected.\nGiven a matrix P = (pij)ij \u2208 [0, 1]^(n\u00d7n), each node i that receives the contagion at time t\nmay transmit it at time t + 1 along its outgoing edge (i, j) \u2208 E with probability pij. Node i cannot\nmake any attempt to infect its neighbors in subsequent rounds. The process terminates when no\nmore infections are possible.\n\nContinuous-Time Information Cascades [CT IC(F, T )] At time t = 0, only the influencers\nare infected. 
Given a matrix F = (fij)ij of non-negative integrable functions, each node i that\nreceives the contagion at time t may transmit it at time s > t along its outgoing edge (i, j) \u2208 E\nwith stochastic rate of occurrence fij(s \u2212 t). The process terminates at a given deterministic time\nT > 0. This model is much richer than Discrete-time IC, but we will focus here on its behavior\nwhen T = \u221e.\n\nRandom Networks [RN (P)] Given a matrix P = (pij)ij \u2208 [0, 1]^(n\u00d7n), each edge (i, j) \u2208 E is\nremoved independently of the others with probability 1 \u2212 pij. A node i \u2208 V is said to be infected if\ni is linked to at least one element of A in the spanning subgraph G' = (V,E') where E' \u2282 E is the\nset of non-removed edges.\nFor any v \u2208 V, we will designate by influence of v the influence of the set containing only v,\ni.e. \u03c3({v}). We will show in Section 4.2 that, if P is symmetric and G undirected, these three\ninfection processes are equivalent to bond percolation and the influence of a node v is also equal\nto the expected size of the connected component containing v in G'. This will make our results\napplicable to percolation in arbitrary networks. Following the percolation literature, we will denote\nas sub-critical a cascade whose influence is not proportional to the size of the network n.\n\n2.2 The hazard matrix\n\nIn order to linearize the influence problem and derive upper bounds, we introduce the concept of\nhazard matrix, which describes the behavior of the information cascade. As we will see in the\nfollowing, in the case of Continuous-time Information Cascades, this matrix gives, for each edge of\nthe network, the integral of the instantaneous rate of transmission (known as hazard function). The\nspectral radius of this matrix will play a key role in the influence of the cascade.\nDefinition. 
For a given graph G = (V,E) and edge transmission probabilities pij, let H be the\nn \u00d7 n matrix, denoted as the hazard matrix, whose coefficients are\n\nHij = \u2212 ln(1 \u2212 pij) if (i, j) \u2208 E, and Hij = 0 otherwise.    (2)\n\nThe next lemma shows the equivalence between the three definitions of the previous section.\nLemma 1. For a given graph G = (V,E), set of influencers A, and transmission probabilities\nmatrix P, the distribution of the set of infected nodes is equal under the infection dynamics\nDT IC(P), CT IC(F,\u221e) and RN (P), provided that for any (i, j) \u2208 E, \u222b_0^\u221e fij(t) dt = Hij.\n\nDefinition. For a given set of influencers A \u2282 V, we will denote as H(A) the hazard matrix except\nfor zeros along the columns whose indices are in A:\n\nH(A)ij = 1{j \u2209 A} Hij.    (3)\n\nWe recall that for any square matrix M, its spectral radius \u03c1(M) is defined by \u03c1(M) = maxi(|\u03bbi|)\nwhere \u03bb1, ..., \u03bbn are the (possibly repeated) eigenvalues of matrix M. We will also use that, when\nM is a real square matrix with positive entries, \u03c1((M + M\u22a4)/2) = supX (X\u22a4MX)/(X\u22a4X).\nRemark. When the pij are small, the hazard matrix is very close to the transmission matrix P.\nThis implies that, for low pij values, the spectral radius of H will be very close to that of P. More\nspecifically, a simple calculation yields\n\n\u03c1(P) \u2264 \u03c1(H) \u2264 (\u2212 ln(1 \u2212 \u2016P\u2016\u221e) / \u2016P\u2016\u221e) \u03c1(P),    (4)\n\nwhere \u2016P\u2016\u221e = maxi,j pij. 
The relatively slow increase of \u2212 ln(1 \u2212 x)/x\nfor x \u2192 1\u2212 implies that the\nbehavior of \u03c1(P) and \u03c1(H) will be of the same order of magnitude even for high (but lower than 1)\nvalues of \u2016P\u2016\u221e.\n\n3 Upper bounds for the influence of a set of nodes\nGiven A \u2282 V the set of influencer nodes and |A| = n0 < n, we derive here two upper bounds\nfor the influence of A. The first bound (Proposition 1) applies to any set of influencers A such\nthat |A| = n0. Intuitively, this result corresponds to a best-case scenario (or a worst-case scenario,\ndepending on the viewpoint), since we can target any set of nodes so as to maximize the resulting\ncontagion.\nProposition 1. Define \u03c1c(A) = \u03c1((H(A) + H(A)\u22a4)/2). Then, for any A such that |A| = n0 < n, denoting\nby \u03c3(A) the expected number of nodes reached by the cascade starting from A:\n\n\u03c3(A) \u2264 n0 + \u03b31(n \u2212 n0),    (5)\n\nwhere \u03b31 is the smallest solution in [0, 1] of the following equation:\n\n\u03b31 \u2212 1 + exp(\u2212\u03c1c(A)\u03b31 \u2212 \u03c1c(A)n0 / (\u03b31(n \u2212 n0))) = 0.    (6)\n\nCorollary 1. Under the same assumptions:\n\n\u2022 if \u03c1c(A) < 1,\n\n\u03c3(A) \u2264 n0 + \u221a(\u03c1c(A) / (1 \u2212 \u03c1c(A))) \u221a(n0(n \u2212 n0)),\n\n\u2022 if \u03c1c(A) \u2265 1,\n\n\u03c3(A) \u2264 n \u2212 (n \u2212 n0) exp(\u2212\u03c1c(A) \u2212 2\u03c1c(A) / (\u221a(4n/n0 \u2212 3) \u2212 1)).\n\nIn particular, when \u03c1c(A) < 1, \u03c3(A) = O(\u221an) and the regime is sub-critical.\n\nThe second result (Proposition 2) applies in the case where A is drawn from a uniform distribution\nover the ensemble of sets of n0 nodes chosen amongst n (denoted as Pn0(V)). 
This result corresponds\nto the average-case scenario in a setting where the initial influencer nodes are not known and\ndrawn independently of the transmissions over each edge.\nProposition 2. Define \u03c1c = \u03c1((H + H\u22a4)/2). Assume the set of influencers A is drawn from a uniform\ndistribution over Pn0(V). Then, denoting by \u03c3uniform the expected number of nodes reached by the\ncascade starting from A:\n\n\u03c3uniform \u2264 n0 + \u03b32(n \u2212 n0),    (7)\n\nwhere \u03b32 is the unique solution in [0, 1] of the following equation:\n\n\u03b32 \u2212 1 + exp(\u2212\u03c1c\u03b32 \u2212 \u03c1cn0 / (n \u2212 n0)) = 0.    (8)\n\nCorollary 2. Under the same assumptions:\n\n\u2022 if \u03c1c < 1,\n\n\u03c3uniform \u2264 n0 / (1 \u2212 \u03c1c),\n\n\u2022 if \u03c1c \u2265 1,\n\n\u03c3uniform \u2264 n \u2212 (n \u2212 n0) exp(\u2212\u03c1c / (1 \u2212 n0/n)).\n\nIn particular, when \u03c1c < 1, \u03c3uniform = O(1) and the regime is sub-critical.\n\nThe difference in the sub-critical regime between O(\u221an) and O(1) for the worst and average case\ninfluence is an important feature of our results, and is verified in our experiments (see Sec. 6). Intuitively,\nwhen the network is inhomogeneous and contains highly central nodes (e.g. scale-free networks),\nthere will be a significant difference between specifically targeting the most central nodes\nand random targeting (which will most probably target a peripheral node).\n\n4 Application to epidemiology and percolation\n\nBuilding on the celebrated equivalences between the fields of percolation, epidemiology and influence\nmaximization, we show that our results generalize existing results in these fields.\n\n4.1 Susceptible-Infected-Removed (SIR) model in epidemiology\n\nWe show here that Proposition 1 further improves results on the SIR model in epidemiology. 
This\nwidely used model was introduced by Kermack and McKendrick ([14]) in order to model the propagation\nof a disease in a given population. In this setting, nodes represent individuals, who can be\nin one of three possible states, susceptible (S), infected (I) or removed (R). At t = 0, a subset A of\nn0 nodes is infected and the epidemic spreads according to the following evolution. Each infected\nnode transmits the infection along its outgoing edge (i, j) \u2208 E at stochastic rate of occurrence \u03b2 and\nis removed from the graph at stochastic rate of occurrence \u03b4. The process ends for a given T > 0.\nIt is straightforward that, if the removed events are not observed, this infection process is equivalent\nto CT IC(F, T ) where for any (i, j) \u2208 E, fij(t) = \u03b2 exp(\u2212\u03b4t). The hazard matrix H is therefore\nequal to (\u03b2/\u03b4) A where A = (1{(i,j) \u2208 E})ij is the adjacency matrix of the underlying network. Note\nthat, by Lemma 1, our results can be used in order to model the total number of infected nodes in a\nsetting where infection and recovery rates of a given node exhibit a non-exponential behavior. For\ninstance, incubation periods for different individuals generally follow a log-normal distribution [17],\nwhich indicates that continuous-time IC with a log-normal rate of removal might be well-suited to\nmodel some kinds of infections.\nIt was recently shown by Draief, Ganesh and Massouli\u00e9 ([15]) that, in the case of undirected networks,\nand if \u03b2\u03c1(A) < \u03b4,\n\n\u03c3(A) \u2264 \u221a(n n0) / (1 \u2212 (\u03b2/\u03b4) \u03c1(A)).    (9)\n\nThis result shows that, when \u03c1(H) = (\u03b2/\u03b4) \u03c1(A) < 1, the influence of set of nodes A is O(\u221an).\nWe show in the next lemma that this result is a direct consequence of Corollary 1: the condition\n\u03c1c(A) < 1 is weaker than \u03c1(H) < 1 and, under these conditions, the bound of Corollary 1 is tighter.\nLemma 2. For any symmetric adjacency matrix A, initial set of influencers A such that |A| = n0 <\nn, \u03b4 > 0 and \u03b2 < \u03b4/\u03c1(A), we have simultaneously \u03c1c(A) \u2264 (\u03b2/\u03b4) \u03c1(A) and\n\nn0 + \u221a(\u03c1c(A) / (1 \u2212 \u03c1c(A))) \u221a(n0(n \u2212 n0)) \u2264 \u221a(n n0) / (1 \u2212 (\u03b2/\u03b4) \u03c1(A)),    (10)\n\nwhere the condition \u03b2 < \u03b4/\u03c1(A) imposes that the regime is sub-critical.\n\nMoreover, these new bounds capture with more accuracy the behavior of the influence in extreme\ncases. In the limit \u03b2 \u2192 0, the difference between the two bounds is significant, because Proposition\n1 yields \u03c3(A) \u2192 n0 whereas (9) only ensures \u03c3(A) \u2264 \u221a(n n0). When n = n0, Proposition 1 also\nensures that \u03c3(A) = n0 whereas (9) yields \u03c3(A) \u2264 n0 / (1 \u2212 (\u03b2/\u03b4) \u03c1(A)). Secondly, Proposition 1 also gives\nbounds in the case \u03b2\u03c1(A) \u2265 \u03b4. Finally, Proposition 1 applies to more general cases than the classical\nhomogeneous SIR model, and allows infection and recovery rates to vary across individuals.\n\n4.2 Bond percolation\nGiven a finite undirected graph G = (V,E), bond percolation theory describes the behavior of\nconnected clusters of the spanning subgraph of G obtained by retaining a subset E' \u2282 E of edges\nof G according to a given distribution. When these removals occur independently along each edge\nwith same probability 1 \u2212 p, this process is called homogeneous percolation and is fairly well known\n(see e.g. [18]). The inhomogeneous case, where the independent edge removal probabilities 1 \u2212 pij\nvary across the edges, is more intricate and has been the subject of recent studies. In particular,\nresults on critical probabilities and size of the giant component have been obtained by Bollob\u00e1s,\nJanson and Riordan in [16]. 
However, these bounds hold for a particular class of asymptotic graphs\n(inhomogeneous random graphs) when n \u2192 \u221e. In the next lemma, we show that our results can be\nused in order to obtain bounds that hold in expectation for any fixed graph.\nLemma 3. Let P = (pij)ij \u2208 [0, 1]^(n\u00d7n) be a symmetric matrix. Let G' = (V,E') be the undirected\nsubgraph of G such that each edge {i, j} \u2208 E is removed independently with probability 1 \u2212 pij. Let\nGd = (V,Ed) be the directed graph such that (i, j) \u2208 Ed \u21d0\u21d2 {i, j} \u2208 E. Then, for any v \u2208 V,\nthe expected size of the connected component containing v in G' is equal to the influence of v in Gd\nunder the infection process DT IC(P).\nWe now derive an upper bound for C1(G'), the size of the largest connected component of the\nspanning subgraph G' = (V,E'). In the following, we will denote by E[C1(G')] the expected value\nof this random variable, given P = (pij)ij.\nProposition 3. Let G = (V,E) be an undirected network where each edge {i, j} \u2208 E has an independent\nprobability 1 \u2212 pij of being removed. The expected size of the largest connected component\nof the resulting subgraph G' is upper bounded by:\n\nE[C1(G')] \u2264 n \u221a(\u03b33),    (11)\n\nwhere \u03b33 is the unique solution in [0, 1] of the following equation:\n\n\u03b33 \u2212 1 + ((n \u2212 1)/n) exp(\u2212(n/(n \u2212 1)) \u03c1(H)\u03b33) = 0.    (12)\n\nMoreover, the resulting network has a probability of being connected upper bounded by:\n\nP(G' is connected) \u2264 \u03b33.    (13)\n\nIn the case \u03c1(H) < 1, we can further simplify our bounds in the same way as for Propositions 1\nand 2.\n\nCorollary 3. 
In the case \u03c1(H) < 1, E[C1(G')] \u2264 \u221a(n / (1 \u2212 \u03c1(H))).\n\nWhereas our results hold for any n \u2208 N, classical results in percolation theory study the asymptotic\nbehavior of sequences of graphs when n \u2192 \u221e. In order to further compare our results, we therefore\nconsider sequences of spanning subgraphs (G'n)n\u2208N, obtained by removing each edge of graphs\nof n nodes (Gn)n\u2208N with probability 1 \u2212 p^n_ij. A previous result ([16], Corollary 3.2 of section\n5) states that, for particular sequences known as inhomogeneous random graphs and under a given\nsub-criticality condition, C1(G'n) = o(n) asymptotically almost surely (a.a.s.), i.e. with probability\ngoing to 1 as n \u2192 \u221e. Using Proposition 3, we get for our part the following result:\nCorollary 4. Assume the sequence (Hn = (\u2212 ln(1 \u2212 p^n_ij))ij)n\u2208N is such that\n\nlim sup n\u2192\u221e \u03c1(Hn) < 1.    (14)\n\nThen, for any \u03b5 > 0, we have asymptotically almost surely when n \u2192 \u221e,\n\nC1(G'n) = o(n^(1/2+\u03b5)).    (15)\n\nThis result is to our knowledge the first to bound the expected size of the largest connected component\nin general arbitrary networks.\n\n5 Application to particular networks\n\nIn order to illustrate our theoretical results, we now apply our bounds to three specific networks and\ncompare them to existing results, showing that our bounds are always of the same order as these\nspecific results. We consider three particular networks: 1) star-shaped networks, 2) Erd\u0151s-R\u00e9nyi\nnetworks and 3) random graphs with an expected degree distribution. In order to simplify these\nproblems and exploit existing theorems, we will consider in this section that pij = p is fixed for\neach edge {i, j} \u2208 E. 
Infection dynamics thus only depend on p, the set of influencers A, and the\nstructure of the underlying network.\n\n5.1 Star-shaped networks\nFor a star-shaped network centered around a given node v1, and A = {v1}, the exact influence is\ncomputable and writes \u03c3({v1}) = 1 + p(n \u2212 1). As H(A)ij = \u2212 ln(1 \u2212 p)1{i=1, j\u22601}, the spectral\nradius is given by\n\n\u03c1((H(A) + H(A)\u22a4)/2) = (\u2212 ln(1 \u2212 p) / 2) \u221a(n \u2212 1).    (16)\n\nTherefore, Proposition 1 states that \u03c3({v1}) \u2264 1 + (n \u2212 1)\u03b31 where \u03b31 is the solution of equation\n\n1 \u2212 \u03b31 = exp((\u03b31 \u221a(n \u2212 1) + 1 / (\u03b31 \u221a(n \u2212 1))) ln(1 \u2212 p) / 2).    (17)\n\nIt is worth mentioning that, when p = 1/\u221a(n \u2212 1), \u03b31 = 1/\u221a(n \u2212 1) is solution of (17) and therefore the\nbound is \u03c3({v1}) \u2264 1 + \u221a(n \u2212 1) which is tight. Note that, in the case of star-shaped networks, the\ninfluence does not present a critical behavior and is always linear with respect to the total number of\nnodes n.\n\n5.2 Erd\u0151s-R\u00e9nyi networks\nFor Erd\u0151s-R\u00e9nyi networks G(n, p) (i.e. an undirected network with n nodes where each pair of\nnodes (i, j) \u2208 V^2 belongs to E independently of the others with probability p), the exact influence\nof a set of nodes is not known. However, percolation theory characterizes the limit behavior of the\ngiant connected component when n \u2192 \u221e. In the simplest case of Erd\u0151s-R\u00e9nyi networks G(n, c/n)\nthe following result holds:\nLemma 4. (taken from [16]) For a given sequence of Erd\u0151s-R\u00e9nyi networks G(n, c/n), we have:\n\n\u2022 if c < 1, C1(G(n, c/n)) \u2264 (3 / (1 \u2212 c)^2) log(n) a.a.s.\n\u2022 if c > 1, C1(G(n, c/n)) = (1 + o(1))\u03b2n a.a.s. 
where \u03b2 \u2212 1 + exp(\u2212\u03b2c) = 0.\n\nAs previously stated, our results hold for any given graph, and not only asymptotically. However,\nwe get an asymptotic behavior consistent with the aforementioned result. Indeed, using notations of\nsection 4.2, Hn_ij = \u2212 ln(1 \u2212 c/n)1{i\u2260j} and \u03c1(Hn) = \u2212(n \u2212 1) ln(1 \u2212 c/n). Using Proposition 3, and\nnoting that \u03b33 = (1 + o(1))\u03b2, we get that, for any \u03b5 > 0:\n\n\u2022 if c < 1, C1(G(n, c/n)) = o(n^(1/2+\u03b5)) a.a.s.\n\u2022 if c > 1, C1(G(n, c/n)) \u2264 (1 + o(1))\u03b2n^(1+\u03b5) a.a.s., where \u03b2 \u2212 1 + exp(\u2212\u03b2c) = 0.\n\n5.3 Random graphs with given expected degree distribution\n\nIn this section, we apply our bounds to random graphs whose expected degree distribution is fixed\n(see e.g. [19], section 13.2.2). More specifically, let w = (wi)i\u2208{1,...,n} be the expected degree of\neach node of the network. For a fixed w, let G(w) be a random graph whose edges are selected\nindependently and randomly with probability\n\nqij = 1{i\u2260j} wi wj / \u2211k wk.    (18)\n\nFor these graphs, results on the volume of connected components (i.e. the expected sum of degrees\nof the nodes in these components) were derived in [20] but our work gives to our knowledge the first\nresult on the size of the giant component. Note that Erd\u0151s-R\u00e9nyi G(n, p) networks are a special case\nof (18) where wi = np for any i \u2208 V.\nIn order to further compare our results, we note that these graphs are also very similar to the widely\nused configuration model where node degrees are fixed to a sequence w, the main difference being\nthat the occupation probabilities pij are in this case not independent anymore. For configuration\nmodels, a giant component exists if and only if \u2211i wi^2 > 2 \u2211i wi ([21, 22]). In the case of graphs\nwith given expected degree distribution, we retrieve the key role played by the ratio \u2211i wi^2 / \u2211i wi\nin our criterion of non-existence of the giant component given by \u03c1((H + H\u22a4)/2) < 1 where\n\n\u03c1((H + H\u22a4)/2) \u2248 \u03c1((qij)ij) \u2264 \u2211i wi^2 / \u2211i wi.    (19)\n\nThe left-hand approximation is particularly good when the qij are small. This is for instance the case\nas soon as there exists \u03b1 < 1 such that, for any i \u2208 V, wi = o(n^\u03b1). The right-hand side is based on\nthe fact that the spectral radius of the matrix (qij + 1{i=j} wi^2 / \u2211k wk)ij is given by \u2211i wi^2 / \u2211i wi.\n\n6 Experimental results\n\nIn this section, we show that the bounds given in Sec. 3 are tight (i.e. very close to empirical results in\nparticular graphs), and are good approximations of the influence on a large set of random networks.\nFig. 1a compares experimental simulations of the influence to the bound derived in proposition 1.\nThe considered networks have n = 1000 nodes and are of 6 types (see e.g. [19] for further details on\nthese different networks): 1) Erd\u0151s-R\u00e9nyi networks, 2) Preferential attachment networks, 3) Small-world\nnetworks, 4) Geometric random networks ([23]), 5) 2D regular grids and 6) totally connected\nnetworks with fixed weight b \u2208 [0, 1] except for the ingoing and outgoing edges of the influencer\nnode A = {v1} having weight a \u2208 [0, 1]. Except for totally connected networks, edge probabilities\nare set to the same value p for each edge (this parameter was used to tune the spectral radius \u03c1c(A)).\nAll points of the plots are averages over 100 simulations. The results show that the bound in proposition\n1 is tight (see totally connected networks in Fig. 1a) and close to the real influence for a large\nclass of random networks. 
In particular, the tightness of the bound around \u03c1c(A) = 1 validates the\nbehavior in \u221an of the worst-case influence in the sub-critical regime. Similarly, Fig. 1b compares\nexperimental simulations of the influence to the bound derived in proposition 2 in the case of random\ninitial influencers. While this bound is not as tight as the previous one, the behavior of the bound\nagrees with experimental simulations, and proves a relatively good approximation of the influence\nunder a random set of initial influencers. It is worth mentioning that the bound is tight for the sub-critical\nregime and shows that corollary 2 is a good approximation of \u03c3uniform when \u03c1c < 1. In\norder to verify the criticality of \u03c1c(A) = 1, we compared the behavior of \u03c3(A) w.r.t. the size of the\nnetwork n. When \u03c1c(A) < 1 (see Fig. 2a in which \u03c1c(A) = 0.5), \u03c3(A) = O(\u221an), and the bound\nis tight. On the contrary, when \u03c1c(A) > 1 (see Fig. 2b in which \u03c1c(A) = 1.5), \u03c3(A) = O(n), and\n\u03c3(A) is linear w.r.t. n for most random networks.\n\n(a) Fixed set of influencers\n\n(b) Uniformly distributed set of influencers\n\nFigure 1: Empirical influence on random networks of various types. The solid lines are the upper\nbounds in propositions 1 (for Fig. 1a) and 2 (for Fig. 1b).\n\n(a) Sub-critical regime: \u03c1c(A) = 0.5\n\n(b) Super-critical regime: \u03c1c(A) = 1.5\n\nFigure 2: Influence w.r.t. the size of the network in the sub-critical and super-critical regime. The\nsolid line is the upper bound in proposition 1. Note the square-root versus linear behavior.\n\n7 Conclusion\n\nIn this paper, we derived the first upper bounds for the influence of a given set of nodes in any\nfinite graph under the Independent Cascade Model (ICM) framework, and related them to the spectral\nradius of a given hazard matrix. 
We show that these bounds can also be used to generalize previous\nresults in the fields of epidemiology and percolation. Finally, we provide empirical evidence that\nthese bounds are close to the best possible for general graphs.\n\nAcknowledgments\n\nThis research is part of the SODATECH project funded by the French Government within the program\nof \u201cInvestments for the Future \u2013 Big Data\u201d.\n\nFigure 1 axes: influence (\u03c3(A) in panel a, \u03c3uniform in panel b) versus spectral radius of the hazard matrix (\u03c1c(A) in panel a, \u03c1c in panel b). Figure 2 axes: influence (\u03c3(A)) versus size of the network (n). Legend in all panels: totally connected, Erd\u0151s-R\u00e9nyi, preferential attachment, small world, geometric random, 2D grid, upper bound.\n\nReferences\n[1] Justin Kirby and Paul Marsden. Connected marketing: the viral, buzz and word of mouth revolution.\nElsevier, 2006.\n\n[2] Pedro Domingos and Matt Richardson. Mining the network value of customers. In Proceedings of the\nseventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 57\u201366.\nACM, 2001.\n\n[3] David Kempe, Jon Kleinberg, and \u00c9va Tardos. Maximizing the spread of influence through a social\nnetwork. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery\nand Data Mining, KDD \u201903, pages 137\u2013146, New York, NY, USA, 2003. ACM.\n\n[4] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. 
In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 199\u2013208. ACM, 2009.\n\n[5] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1029\u20131038. ACM, 2010.\n\n[6] Amit Goyal, Wei Lu, and Laks VS Lakshmanan. Celf++: optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th international conference companion on World wide web, pages 47\u201348. ACM, 2011.\n\n[7] Kouzou Ohara, Kazumi Saito, Masahiro Kimura, and Hiroshi Motoda. Predictive simulation framework of stochastic diffusion model for identifying top-k influential nodes. In Asian Conference on Machine Learning, pages 149\u2013164, 2013.\n\n[8] Manuel Gomez Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1019\u20131028. ACM, 2010.\n\n[9] Seth A. Myers and Jure Leskovec. On the convexity of latent social network inference. In NIPS, pages 1741\u20131749, 2010.\n\n[10] Manuel Gomez-Rodriguez, David Balduzzi, and Bernhard Sch\u00f6lkopf. Uncovering the temporal dynamics of diffusion networks. In ICML, pages 561\u2013568, 2011.\n\n[11] Manuel G Rodriguez and Bernhard Sch\u00f6lkopf. Influence maximization in continuous time diffusion networks. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 313\u2013320, 2012.\n\n[12] Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In NIPS, pages 3147\u20133155, 2013.\n\n[13] Mark EJ Newman. Spread of epidemic disease on networks. 
Physical review E, 66(1):016128, 2002.\n\n[14] William O Kermack and Anderson G McKendrick. Contributions to the mathematical theory of epidemics. ii. the problem of endemicity. Proceedings of the Royal society of London. Series A, 138(834):55\u201383, 1932.\n\n[15] Moez Draief, Ayalvadi Ganesh, and Laurent Massouli\u00e9. Thresholds for virus spread on networks. In Proceedings of the 1st international conference on Performance evaluation methodolgies and tools, page 51. ACM, 2006.\n\n[16] B\u00e9la Bollob\u00e1s, Svante Janson, and Oliver Riordan. The phase transition in inhomogeneous random graphs. Random Structures & Algorithms, 31(1):3\u2013122, 2007.\n\n[17] Kenrad E Nelson. Epidemiology of infectious disease: general principles. Infectious Disease Epidemiology Theory and Practice. Gaithersburg, MD: Aspen Publishers, pages 17\u201348, 2007.\n\n[18] Svante Janson, Tomasz Luczak, and Andrzej Rucinski. Random graphs, volume 45. John Wiley & Sons, 2011.\n\n[19] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.\n\n[20] Fan Chung and Linyuan Lu. Connected components in random graphs with given expected degree sequences. Annals of combinatorics, 6(2):125\u2013145, 2002.\n\n[21] Michael Molloy and Bruce Reed. A critical point for random graphs with a given degree sequence. Random structures & algorithms, 6(2-3):161\u2013180, 1995.\n\n[22] Michael Molloy and Bruce Reed. The size of the giant component of a random graph with a given degree sequence. Combinatorics probability and computing, 7(3):295\u2013305, 1998.\n\n[23] Mathew Penrose. Random geometric graphs, volume 5. 
Oxford University Press Oxford, 2003.\n", "award": [], "sourceid": 558, "authors": [{"given_name": "Remi", "family_name": "Lemonnier", "institution": "ENS Cachan"}, {"given_name": "Kevin", "family_name": "Scaman", "institution": "ENS Cachan"}, {"given_name": "Nicolas", "family_name": "Vayatis", "institution": "Ecole Normale Sup\u00e9rieure de Cachan"}]}