{"title": "Computing and maximizing influence in linear threshold and triggering models", "book": "Advances in Neural Information Processing Systems", "page_first": 4538, "page_last": 4546, "abstract": "We establish upper and lower bounds for the influence of a set of nodes in certain types of contagion models. We derive two sets of bounds, the first designed for linear threshold models, and the second more broadly applicable to a general class of triggering models, which subsumes the popular independent cascade models, as well. We quantify the gap between our upper and lower bounds in the case of the linear threshold model and illustrate the gains of our upper bounds for independent cascade models in relation to existing results. Importantly, our lower bounds are monotonic and submodular, implying that a greedy algorithm for influence maximization is guaranteed to produce a maximizer within a (1 - 1/e)-factor of the truth. Although the problem of exact influence computation is NP-hard in general, our bounds may be evaluated efficiently. This leads to an attractive, highly scalable algorithm for influence maximization with rigorous theoretical guarantees.", "full_text": "Computing and maximizing influence in linear threshold and triggering models\n\nJustin Khim\nDepartment of Statistics, The Wharton School, University of Pennsylvania\nPhiladelphia, PA 19104\njkhim@wharton.upenn.edu\n\nVarun Jog\nElectrical & Computer Engineering Department, University of Wisconsin - Madison\nMadison, WI 53706\nvjog@wisc.edu\n\nPo-Ling Loh\nElectrical & Computer Engineering Department, University of Wisconsin - Madison\nMadison, WI 53706\nloh@ece.wisc.edu\n\nAbstract\n\nWe establish upper and lower bounds for the influence of a set of nodes in certain types of contagion models. 
We derive two sets of bounds, the first designed for linear threshold models, and the second more broadly applicable to a general class of triggering models, which subsumes the popular independent cascade models, as well. We quantify the gap between our upper and lower bounds in the case of the linear threshold model and illustrate the gains of our upper bounds for independent cascade models in relation to existing results. Importantly, our lower bounds are monotonic and submodular, implying that a greedy algorithm for influence maximization is guaranteed to produce a maximizer within a $(1 - 1/e)$-factor of the truth. Although the problem of exact influence computation is NP-hard in general, our bounds may be evaluated efficiently. This leads to an attractive, highly scalable algorithm for influence maximization with rigorous theoretical guarantees.\n\n1 Introduction\n\nMany datasets in contemporary scientific applications possess some form of network structure [20]. Popular examples include data collected from social media websites such as Facebook and Twitter [1], or electrical recordings gathered from a physical network of firing neurons [22]. In settings involving biological data, a common goal is to construct an abstract network representing interactions between genes, proteins, or other biomolecules [8].\n\nOver the last century, a vast body of work has been developed in the epidemiology literature to model the spread of disease [10]. The most popular models include SI (susceptible, infected), SIS (susceptible, infected, susceptible), and SIR (susceptible, infected, recovered), in which nodes may infect adjacent neighbors according to a certain stochastic process. These models have recently been applied to social network and viral marketing settings by computer scientists [6, 14]. 
In particular, the notion of influence, which refers to the expected number of infected individuals in a network at the conclusion of an epidemic spread, was studied by Kempe et al. [9]. However, determining an influence-maximizing seed set of a given cardinality was shown to be NP-hard; in fact, even computing the influence exactly in certain simple models is #P-hard [3, 5]. Recent work in theoretical computer science has therefore focused on maximizing influence up to constant factors [9, 2].\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nA series of recent papers [12, 21, 13] establishes computable upper bounds on the influence when information propagates in a stochastic manner according to an independent cascade model. In such a model, the infection spreads in rounds, and each newly infected node may infect any of its neighbors in the succeeding round. Central to the bounds is a matrix known as the hazard matrix, which encodes the transmission probabilities across edges in the graph. A recent paper by Lee et al. [11] leverages \u201csensitive\u201d edges in the network to obtain tighter bounds via a conditioning argument. Such bounds could be maximized to obtain a surrogate for the influence-maximizing set in the network; however, the tightness of the proposed bounds remains unknown. The independent cascade model may be viewed as a special case of a more general triggering model, in which the infection status of each node in the network is determined by a random subset of neighbors [9]. 
The class of triggering models also includes another popular stochastic infection model known as the linear threshold model, and bounds for the influence function in linear threshold models have been explored in an independent line of work [5, 23, 4].\n\nNaturally, one might wonder whether influence bounds can be derived for stochastic infection models in the broader class of triggering models, unifying and extending the aforementioned results. We answer this question affirmatively by establishing upper and lower bounds for the influence in general triggering models. Our derived bounds are attractive for two reasons. First, we are able to quantify the gap between our upper and lower bounds in the case of linear threshold models, expressed in terms of properties of the graph topology and the edge probabilities governing the likelihood of infection. Second, maximizing a lower bound on the influence is guaranteed to yield a lower bound on the true maximum influence in the graph. Furthermore, as shown via the theory of submodular functions, the lower bounds in our paper may be maximized efficiently up to a constant-factor approximation via a greedy algorithm, leading to a highly scalable algorithm with provable guarantees. To the best of our knowledge, the only previously established bounds for influence maximization are those mentioned above for the special cases of independent cascade and linear threshold models, and no theoretical or computational guarantees were known.\n\nThe remainder of our paper is organized as follows: In Section 2, we fix notation to be used in the paper and describe the aforementioned infection models in greater detail. In Section 3, we establish upper and lower bounds for the linear threshold model, which we extend to triggering models in Section 4. 
Section 5 addresses the question of maximizing the lower bounds established in Sections 3 and 4, and discusses theoretical guarantees achievable using greedy algorithms and convex relaxations of the otherwise intractable influence maximization problem. We report the results of simulations in Section 6, and conclude the paper with a selection of open research questions in Section 7.\n\n2 Preliminaries\n\nIn this section, we introduce basic notation and define the infection models to be analyzed in our paper. The network of individuals is represented by a directed graph $G = (V, E)$, where $V$ is the set of vertices and $E \\subseteq V \\times V$ is the set of edges. Furthermore, each directed edge $(i, j)$ possesses a weight $b_{ij}$, whose interpretation changes with the specific model we consider. We denote the weighted adjacency matrix of $G$ by $B = (b_{ij})$. We distinguish nodes as either being infected or uninfected based on whether or not the information contagion has reached them. Let $\\bar{A} := V \\setminus A$.\n\n2.1 Linear threshold models\n\nWe first describe the linear threshold model, introduced by Kempe et al. [9]. In this model, the edge weights $(b_{ij})$ denote the influence that node $i$ has on node $j$. The chance that a node is infected depends on two quantities: the set of infected neighbors at a particular time instant and a random node-specific threshold that remains constant over time. For each $i \\in V$, we impose the condition $\\sum_j b_{ji} \\le 1$. The thresholds $\\{\\theta_i : i \\in V\\}$ are i.i.d. uniform random variables on $[0, 1]$. Beginning from an initially infected set $A \\subseteq V$, the contagion proceeds in discrete time steps, as follows: at every time step, each vertex $i$ computes the total incoming weight from all infected neighbors, i.e., $\\sum_{j \\text{ infected}} b_{ji}$. If this quantity exceeds $\\theta_i$, vertex $i$ becomes infected. Once a node becomes infected, it remains infected for every succeeding time step. 
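To make the threshold dynamics concrete, the following is a minimal Monte Carlo sketch in Python using only the standard library; it is an illustration, not the authors' code. It assumes vertices labeled 0, ..., n - 1, a dense weight matrix stored as nested lists with B[j][i] = b_ji (the influence of j on i), and synchronous updates in which every vertex is tested against the infected set from the previous step. The function names simulate_lt and estimate_influence_lt are hypothetical.

```python
import random


def simulate_lt(B, A, rng):
    # One run of the linear threshold process.
    # B[j][i] holds b_ji, the weight of edge j -> i; A is the seed set.
    n = len(B)
    theta = [rng.random() for _ in range(n)]  # i.i.d. Uniform[0, 1) thresholds
    infected = set(A)
    while True:
        # Synchronous round: each test uses the infected set from the previous step.
        newly = [i for i in range(n)
                 if i not in infected
                 and sum(B[j][i] for j in infected) > theta[i]]
        if not newly:
            return infected  # no new infections: the process has stabilized
        infected.update(newly)


def estimate_influence_lt(B, A, runs=10000, seed=0):
    # Monte Carlo estimate of I(A): the average final number of infected vertices.
    rng = random.Random(seed)
    return sum(len(simulate_lt(B, A, rng)) for _ in range(runs)) / runs
```

On the path 0 -> 1 -> 2 with both weights equal to 1, every threshold is exceeded, so estimate_influence_lt returns exactly 3.0; for general weights the estimate only approximates I(A), with a Monte Carlo error that shrinks like one over the square root of the number of runs.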
Note that the process necessarily stabilizes after at most $|V|$ time steps. The expected size of the infection when the process stabilizes is known as the influence of $A$ and is denoted by $I(A)$. We may interpret the threshold $\\theta_i$ as the level of immunity of node $i$.\n\nKempe et al. [9] established the monotonicity and submodularity of the function $I : 2^V \\to \\mathbb{R}$. As discussed in Section 5.1, these properties are key to the problem of influence maximization, which concerns maximizing $I(A)$ for a fixed size of the set $A$. An important step used in describing the submodularity of $I$ is the \u201creachability via live-edge paths\u201d interpretation of the linear threshold model. Since this interpretation is also crucial to our analysis, we describe it below.\n\nReachability via live-edge paths: Consider the weighted adjacency matrix $B$ of the graph. We create a subgraph of $G$ by selecting a subset of \u201clive\u201d edges, as follows: each vertex $i$ designates at most one incoming edge as a live edge, with edge $(j, i)$ being selected with probability $b_{ji}$. (No incoming edge is selected with probability $1 - \\sum_j b_{ji}$.) The \u201creach\u201d of a set $A$ is defined as the set of all vertices $i$ such that a path exists from $A$ to $i$ consisting only of live edges. The distribution of the set of nodes infected in the final state of the threshold model is identical to the distribution of reachable nodes under the live-edge model, when both are seeded with the same set $A$.\n\n2.2 Independent cascade models\n\nKempe et al. [9] also analyzed the problem of influence maximization in independent cascade models, a class of models motivated by interacting particle systems in probability theory [7, 15]. Similar to the linear threshold model, the independent cascade model begins with a set $A$ of initially infected nodes in a directed graph $G = (V, E)$, and the infection spreads in discrete time steps. 
If a vertex $i$ becomes infected at time $t$, it attempts to infect each uninfected neighbor $j$ via the edge from $i$ to $j$ at time $t + 1$. The entries $(b_{ij})$ capture the probability that $i$ succeeds in infecting $j$. This process continues until no more infections occur, which again happens after at most $|V|$ time steps.\n\nThe influence function for this model was also shown to be monotonic and submodular, where the main step again relied on a \u201creachability via live-edge paths\u201d model. In this case, the interpretation is straightforward: given a graph $G$, every edge $(i, j) \\in E$ is independently designated as a live edge with probability $b_{ij}$. It is then easy to see that the reach of $A$ again has the same distribution as the set of infected nodes in the final state of the independent cascade model.\n\n2.3 Triggering models\n\nTo unify the above models, Kempe et al. [9] introduced the \u201ctriggering model,\u201d which evolves as follows: each vertex $i$ chooses a random subset of its neighbors as triggers, where the choice of triggers for a given node is independent of the choice for all other nodes. If a node $i$ is uninfected at time $t$ but a vertex in its trigger set becomes infected, vertex $i$ becomes infected at time $t + 1$. Note that the triggering model may be interpreted as a \u201creachability via live-edge paths\u201d model if edge $(j, i)$ is designated as live when $i$ chooses $j$ to be in its trigger set. The entry $b_{ij}$ represents the probability that edge $(i, j)$ is live. Clearly, the linear threshold and independent cascade models are special cases of the triggering model when the distributions of the trigger sets are chosen appropriately.\n\n2.4 Notation\n\nFinally, we introduce some notational conventions. For a matrix $M \\in \\mathbb{R}^{n \\times n}$, we write $\\rho(M)$ to denote the spectral radius of $M$. We write $\\|M\\|_{\\infty,\\infty}$ to denote the $\\ell_\\infty$-operator norm of $M$, i.e., its maximum absolute row sum. 
The matrix $\\mathrm{Diag}(M)$ denotes the matrix with diagonal entries equal to the diagonal entries of $M$ and all other entries equal to 0. We write $\\mathbf{1}_S$ to denote the all-ones vector supported on a set $S$.\n\nFor a given vertex subset $A \\subseteq V$ in the graph with weighted adjacency matrix $B$, define the vector $b_{\\bar{A}} \\in \\mathbb{R}^{|\\bar{A}|}$ indexed by $i \\in \\bar{A}$, such that $b_{\\bar{A}}(i) = \\sum_{j \\in A} b_{ji}$. Thus, $b_{\\bar{A}}(i)$ records the total incoming weight from $A$ into $i$. A walk in the graph $G$ is a sequence of vertices $\\{v_1, v_2, \\dots, v_r\\}$ such that $(v_i, v_{i+1}) \\in E$ for $1 \\le i \\le r - 1$. A path is a walk with no repeated vertices. We define the weight of a walk to be $\\omega(w) := \\prod_{e \\in w} b_e$, where the product is over all edges $e \\in E$ included in $w$. (The weight of a walk of length 0 is defined to be 1.) For a set of walks $W = \\{w_1, w_2, \\dots, w_r\\}$, we denote the sum of the weights of all walks in $W$ by $\\omega(W) = \\sum_{i=1}^{r} \\omega(w_i)$.\n\n3 Influence bounds for linear threshold models\n\nWe now derive upper and lower bounds for the influence of a set $A \\subseteq V$ in the linear threshold model.\n\n3.1 Upper bound\n\nWe begin with upper bounds. 
We have the following main result, which bounds the influence as a function of appropriate sub-blocks of the weighted adjacency matrix:\n\nTheorem 1. For any set $A \\subseteq V$, we have the bound\n$$I(A) \\le |A| + b_{\\bar{A}}^T (I - B_{\\bar{A},\\bar{A}})^{-1} \\mathbf{1}_{\\bar{A}}. \\tag{1}$$\nHere, $I$ inside the parentheses denotes the identity matrix.\n\nIn fact, the proof of Theorem 1 shows that the bound (1) may be strengthened to\n$$I(A) \\le |A| + b_{\\bar{A}}^T \\left( \\sum_{i=1}^{n-|A|} B_{\\bar{A},\\bar{A}}^{i-1} \\right) \\mathbf{1}_{\\bar{A}}, \\tag{2}$$\nsince the upper bound is obtained by considering paths from vertices in $A$ to vertices in $\\bar{A}$ and summing over paths of various lengths (see also Theorem 4 below). The bound (2) is exact when the underlying graph $G$ is a directed acyclic graph (DAG). 
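Both upper bounds reduce to linear algebra on the block B_{Ā,Ā}. The sketch below (standard-library Python, hypothetical helper names) evaluates the right-hand sides of (1) and (2) under the same nested-list convention B[j][i] = b_ji; it assumes that I - B_{Ā,Ā} is nonsingular, and it uses a small hand-written Gauss-Jordan elimination in place of a numerical library, so it is only meant for small graphs.

```python
def _solve(M, rhs):
    # Gauss-Jordan elimination with partial pivoting for a small dense system M x = rhs.
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def lt_upper_bounds(B, A):
    # Returns the right-hand sides of bounds (1) and (2) for the seed set A.
    n = len(B)
    Abar = [i for i in range(n) if i not in A]
    b = [sum(B[j][i] for j in A) for i in Abar]   # b_Abar(i): weight from A into i
    BB = [[B[i][k] for k in Abar] for i in Abar]  # the block B_{Abar, Abar}
    m = len(Abar)
    # Bound (1): |A| + b^T (I - B_{Abar,Abar})^{-1} 1
    I_minus = [[(1.0 if r == c else 0.0) - BB[r][c] for c in range(m)]
               for r in range(m)]
    x = _solve(I_minus, [1.0] * m)
    ub1 = len(A) + sum(bi * xi for bi, xi in zip(b, x))
    # Bound (2): |A| + b^T (sum_{i=1}^{n-|A|} B^{i-1}) 1, via repeated mat-vec products
    v = [1.0] * m
    total = [0.0] * m
    for _ in range(n - len(A)):
        total = [t + vi for t, vi in zip(total, v)]
        v = [sum(BB[r][c] * v[c] for c in range(m)) for r in range(m)]
    ub2 = len(A) + sum(bi * ti for bi, ti in zip(b, total))
    return ub1, ub2
```

On the path 0 -> 1 -> 2 with both weights equal to 0.5 and A = {0}, which is a DAG, both values come out as 1.75 (up to rounding), equal to the exact influence 1 + 0.5 + 0.25, in line with the exactness of (2) on DAGs.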
However, the bound (1) may be preferable in some cases from the point of view of computation or interpretation.\n\n3.2 Lower bounds\n\nWe also establish lower bounds on the influence. The following theorem provides a family of lower bounds, indexed by $m \\ge 1$:\n\nTheorem 2. For any $m \\ge 1$, we have the following natural lower bound on the influence of $A$:\n$$I(A) \\ge \\sum_{k=0}^{m} \\omega(P_A^k), \\tag{3}$$\nwhere $P_A^k$ is the set of all paths from $A$ to $\\bar{A}$ of length $k$ such that only the starting vertex lies in $A$.\n\nWe note some special cases in which the bounds may be written explicitly:\n$$m = 1: \\quad I(A) \\ge |A| + b_{\\bar{A}}^T \\mathbf{1}_{\\bar{A}} := \\mathrm{LB}_1(A), \\tag{4}$$\n$$m = 2: \\quad I(A) \\ge |A| + b_{\\bar{A}}^T (I + B_{\\bar{A},\\bar{A}}) \\mathbf{1}_{\\bar{A}} := \\mathrm{LB}_2(A), \\tag{5}$$\n$$m = 3: \\quad I(A) \\ge |A| + b_{\\bar{A}}^T \\left( I + B_{\\bar{A},\\bar{A}} + B_{\\bar{A},\\bar{A}}^2 - \\mathrm{Diag}(B_{\\bar{A},\\bar{A}}^2) \\right) \\mathbf{1}_{\\bar{A}}. \\tag{6}$$\n\nRemark: As noted in Chen et al. [5], computing the exact influence is #P-hard precisely because it is difficult to write down an expression for $\\omega(P_A^k)$ for arbitrary values of $k$. For $3 < m \\le 7$, we may use the path-counting techniques of Movarraei et al. [18, 16, 17] to obtain explicit lower bounds. Note that as $m$ increases, the sequence of lower bounds approaches the true value of $I(A)$.\n\nThe lower bound (4) has a very simple interpretation. When $|A|$ is fixed, the function $\\mathrm{LB}_1(A)$ computes, up to the additive constant $|A|$, the aggregate weight of edges from $A$ to $\\bar{A}$. Furthermore, we may show that the function $\\mathrm{LB}_1$ is monotonic. Hence, maximizing $\\mathrm{LB}_1$ with respect to $A$ is equivalent to finding a maximum cut in the directed graph. (For more details, see Section 5.) 
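The lower bounds of Theorem 2 can be checked on small graphs by brute force. The sketch below (hypothetical names, standard-library Python, the same convention B[j][i] = b_ji, and an assumed zero diagonal, i.e., no self-loops) enumerates every path with at most m edges that starts in A and never re-enters A, which evaluates (3) directly, and also computes the closed forms (4) and (5). The enumeration is exponential in m and is intended only as a reference implementation.

```python
def lt_lower_bound(B, A, m):
    # LB_m(A) from (3): total weight of all paths with at most m edges that
    # start at a seed and visit no other seed; exponential time, small graphs only.
    n = len(B)
    total = 0.0

    def extend(v, weight, visited, length):
        nonlocal total
        total += weight  # the path ending at v (length-0 paths add 1 per seed)
        if length == m:
            return
        for w in range(n):
            if B[v][w] > 0 and w not in visited and w not in A:
                visited.add(w)
                extend(w, weight * B[v][w], visited, length + 1)
                visited.remove(w)

    for a in A:
        extend(a, 1.0, {a}, 0)
    return total


def closed_form_lb1_lb2(B, A):
    # Closed forms (4) and (5); (5) assumes B has a zero diagonal.
    n = len(B)
    Abar = [i for i in range(n) if i not in A]
    b = [sum(B[j][i] for j in A) for i in Abar]               # b_Abar(i)
    lb1 = len(A) + sum(b)                                      # |A| + b^T 1
    row = [1.0 + sum(B[i][k] for k in Abar) for i in Abar]     # (I + B_{Abar,Abar}) 1
    lb2 = len(A) + sum(bi * ri for bi, ri in zip(b, row))
    return lb1, lb2
```

For the graph of the Example in Section 3.3 with n = 6, relabeled so that vertex 1 becomes 0 and vertex 2 becomes 1, lt_lower_bound returns 1.5 for m = 1 and 2.5 for every m ≥ 2, since no path in that graph has more than two edges.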
The lower bounds (5) and (6) also take into account the weight of paths of length 2 and 3 from $A$ to $\\bar{A}$.\n\n3.3 Closeness of bounds\n\nA natural question concerns the proximity of the upper bound (1) to the lower bounds in Theorem 2. The bounds may be far apart in general, as illustrated by the following example.\n\nExample: Consider a graph $G$ with vertex set $\\{1, 2, \\dots, n\\}$ and edge weights given by\n$$w_{ij} = \\begin{cases} 0.5, & \\text{if } i = 1 \\text{ and } j = 2, \\\\ 0.5, & \\text{if } i = 2 \\text{ and } 3 \\le j \\le n, \\\\ 0, & \\text{otherwise.} \\end{cases}$$\nLet $A = \\{1\\}$. We may check that $\\mathrm{LB}_1(A) = 1.5$. Furthermore, vertex 2 is infected with probability 0.5 and each vertex $j \\ge 3$ with probability $0.5 \\cdot 0.5 = 0.25$, so $I(A) = 1 + 0.5 + \\frac{n-2}{4} = \\frac{n+4}{4}$, and any upper bound necessarily exceeds this quantity. Hence, the gap between the upper and lower bounds may grow linearly in the number of vertices. (Similar examples may be computed for $\\mathrm{LB}_2$, as well.)\n\nThe reason for the linear gap in the above example is that vertex 2 has a very large outgoing weight; i.e., it is highly infectious. Our next result shows that if the graph does not contain any highly infectious vertices, the upper and lower bounds are guaranteed to differ by at most a constant factor. The result is stated in terms of the maximum row sum $\\lambda_{\\bar{A},\\infty} = \\|B_{\\bar{A},\\bar{A}}\\|_{\\infty,\\infty}$, which corresponds to the maximum outgoing weight of the nodes in $\\bar{A}$ (counting only edges into $\\bar{A}$).\n\nTheorem 3. Suppose $\\lambda_{\\bar{A},\\infty} < 1$. 
Then, writing UB for the upper bound (1),\n$$\\frac{\\mathrm{UB}}{\\mathrm{LB}_1} \\le \\frac{1}{1 - \\lambda_{\\bar{A},\\infty}} \\quad \\text{and} \\quad \\frac{\\mathrm{UB}}{\\mathrm{LB}_2} \\le \\frac{1}{1 - \\lambda_{\\bar{A},\\infty}^2}.$$\n\nSince the column sums of $B$ are bounded above by 1 in a linear threshold model, we have the following corollary:\n\nCorollary 1. Suppose $B$ is symmetric and $A \\subsetneq V$. Then\n$$\\frac{\\mathrm{UB}}{\\mathrm{LB}_1} \\le \\frac{1}{1 - \\lambda_{\\bar{A},\\infty}} \\quad \\text{and} \\quad \\frac{\\mathrm{UB}}{\\mathrm{LB}_2} \\le \\frac{1}{1 - \\lambda_{\\bar{A},\\infty}^2}.$$\n\nNote that if $\\lambda_\\infty = \\|B\\|_{\\infty,\\infty}$, we certainly have $\\lambda_{\\bar{A},\\infty} \\le \\lambda_\\infty$ for any choice of $A \\subseteq V$, because each row of $B_{\\bar{A},\\bar{A}}$ is part of a row of the nonnegative matrix $B$. 
Hence, Theorem 3 and Corollary 1 hold a fortiori with $\\lambda_{\\bar{A},\\infty}$ replaced by $\\lambda_\\infty$.\n\n4 Influence bounds for triggering models\n\nWe now generalize our discussion to the broader class of triggering models. Recall that in this model, $b_{ij}$ records the probability that $(i, j)$ is a live edge.\n\n4.1 Upper bound\n\nWe begin by deriving an upper bound, which shows that inequality (2) holds for any triggering model:\n\nTheorem 4. In a general triggering model, the influence of $A \\subseteq V$ satisfies inequality (2).\n\nThe approach we use for general triggering models relies on slightly more sophisticated observations than the proof for linear threshold models. Furthermore, the finite sum in inequality (2) may not in general be replaced by an infinite sum, as in the statement of Theorem 1 for the case of linear threshold models. This is because if $\\rho(B_{\\bar{A},\\bar{A}}) > 1$, the infinite series will not converge.\n\n4.2 Lower bound\n\nWe also have a general lower bound:\n\nTheorem 5. Let $A \\subseteq V$. The influence of $A$ satisfies the inequality\n$$I(A) \\ge \\sum_{i \\in V} \\sup_{p \\in P_{A \\to i}} \\omega(p) := \\mathrm{LB}_{\\mathrm{trig}}(A), \\tag{7}$$\nwhere $P_{A \\to i}$ is the set of all paths from $A$ to $i$ such that only the starting vertex lies in $A$.\n\nThe proof of Theorem 5 shows that the bound (7) is sharp when at most one path exists from $A$ to each vertex $i$. In the case of linear threshold models, the bound (7) is not directly comparable to the bounds stated in Theorem 2, since it involves maximal-weight paths rather than paths of certain lengths. 
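Since (7) takes a supremum of products of weights, it can be evaluated with a shortest-path search: when every weight lies in [0, 1], maximizing a product of weights is the same as minimizing a sum of nonnegative costs -log(b), so Dijkstra's algorithm applies. Because weights are at most 1, a best path never gains by passing through a second seed, so a multi-source search from all of A gives the same supremum as restricting to paths whose only seed is the starting vertex. The following standard-library Python sketch follows this reduction; the name lb_trig and the dense nested-list representation B[j][i] = b_ji are assumptions made for illustration.

```python
import heapq
import math


def lb_trig(B, A):
    # LB_trig(A) = sum over i of the largest path weight from A to i.
    # Assumes all weights lie in [0, 1], so the costs -log(b) are nonnegative
    # and Dijkstra's algorithm finds the maximum-product paths.
    n = len(B)
    dist = [math.inf] * n
    heap = []
    for a in A:
        dist[a] = 0.0  # the length-0 path at a seed has weight 1
        heap.append((0.0, a))
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale queue entry
        for w in range(n):
            if B[v][w] > 0:
                nd = d - math.log(B[v][w])
                if nd < dist[w]:
                    dist[w] = nd
                    heapq.heappush(heap, (nd, w))
    # Unreachable vertices keep dist = inf and contribute exp(-inf) = 0.
    return sum(math.exp(-d) for d in dist)
```

On the graph of the Example in Section 3.3 with n = 6, every vertex is reached by a unique path from A, so lb_trig returns 2.5 (up to rounding), matching LB_2 there; when several paths reach a vertex, only the heaviest one is counted.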
Hence, situations exist in which either bound is tighter than the other (e.g., see the Example in Section 3.3).\n\n4.3 Independent cascade models\n\nWe now apply the general bounds obtained for triggering models to the case of independent cascade models. Theorem 4 implies the following \u201cworst-case\u201d upper bounds on influence, which depend on $A$ only through $|A|$:\n\nTheorem 6. The influence of $A \\subseteq V$ in an independent cascade model satisfies\n$$I(A) \\le |A| + \\lambda_\\infty |A| \\cdot \\frac{1 - \\lambda_\\infty^{n-|A|}}{1 - \\lambda_\\infty}. \\tag{8}$$\nIn particular, if $\\lambda_\\infty < 1$, we have\n$$I(A) \\le \\frac{|A|}{1 - \\lambda_\\infty}. \\tag{9}$$\n\nNote that when $\\lambda_\\infty > 1$, the bound (8) exceeds $n$ for all large enough $n$, so the bound is trivial.\n\nIt is instructive to compare Theorem 6 with the results of Lemonnier et al. [13]. The hazard matrix of an independent cascade model with weighted adjacency matrix $(b_{ij})$ is defined by\n$$H_{ij} = -\\log(1 - b_{ij}), \\quad \\forall (i, j).$$\nThe following result is stated in terms of the spectral radius $\\rho = \\rho\\left(\\frac{H + H^T}{2}\\right)$:\n\nProposition 1 (Corollary 1 in Lemonnier et al. [13]). Let $A \\subsetneq V$, and suppose $\\rho < 1 - \\delta$, where $\\delta = \\left(\\frac{|A|}{4(n - |A|)}\\right)^{1/3}$. Then\n$$I(A) \\le |A| + \\sqrt{\\frac{\\rho}{1 - \\rho}} \\sqrt{|A|(n - |A|)}.$$\n\nAs illustrated in the following example, the bound in Theorem 6 may be significantly tighter than the bound provided in Proposition 1.\n\nExample: Consider a directed Erdős-Rényi graph on $n$ vertices, where each edge $(i, j)$ is independently present with probability $\\frac{c}{n}$. Suppose $c < 1$. 
For any set $A$, the bound (9) gives\n$$I(A) \\le \\frac{|A|}{1 - c}. \\tag{10}$$\nIt is easy to check that $\\rho\\left(\\frac{H + H^T}{2}\\right) = -(n - 1) \\log\\left(1 - \\frac{c}{n}\\right)$. For large values of $n$, we have $\\rho \\to c < 1$, so Proposition 1 implies the (approximate) bound $I(A) \\le |A| + \\sqrt{\\frac{c}{1 - c}} \\sqrt{|A|(n - |A|)}$. In particular, this bound increases with $n$, unlike our bound (10). Although the example is specific to Erdős-Rényi graphs, we conjecture that whenever $\\|B\\|_{\\infty,\\infty} < 1$, the bound in Theorem 6 is tighter than the bound in Proposition 1.\n\n5 Maximizing influence\n\nWe now turn to the question of choosing a set $A \\subseteq V$ of cardinality at most $k$ that maximizes $I(A)$.\n\n5.1 Submodular maximization\n\nWe begin by reviewing the notion of submodularity, which will be crucial in our discussion of influence maximization algorithms. We have the following definition:\n\nDefinition 1 (Submodularity). A set function $f : 2^V \\to \\mathbb{R}$ is submodular if either of the following equivalent conditions holds:\n(i) For any two sets $S, T \\subseteq V$,\n$$f(S \\cup T) + f(S \\cap T) \\le f(S) + f(T). \\tag{11}$$\n(ii) For any two sets $S \\subseteq T \\subseteq V$ and any $x \\notin T$, the following inequality holds:\n$$f(T \\cup \\{x\\}) - f(T) \\le f(S \\cup \\{x\\}) - f(S). \\tag{12}$$\nThe left and right sides of inequality (12) are the discrete derivatives of $f$ in the direction $x$, evaluated at $T$ and $S$, respectively.\n\nSubmodular functions arise in a wide variety of applications. Although submodular functions resemble convex and concave functions, optimizing them may be quite challenging; in fact, many submodular function maximization problems are NP-hard. However, nonnegative submodular functions may be maximized efficiently up to a constant factor if they are also monotonic, where monotonicity is defined as follows:\n\nDefinition 2 (Monotonicity). 
A function $f : 2^V \\to \\mathbb{R}$ is monotonic if for any two sets $S \\subseteq T \\subseteq V$,\n$$f(S) \\le f(T).$$\nEquivalently, a function is monotonic if its discrete derivative is nonnegative at all points.\n\nWe have the following celebrated result, which guarantees that the output of the greedy algorithm provides a $(1 - 1/e)$-approximation to the cardinality-constrained maximization problem:\n\nProposition 2 (Theorem 4.2 of Nemhauser and Wolsey [19]). Let $f : 2^V \\to \\mathbb{R}_+$ be a monotonic submodular function. For any $k \\ge 0$, define $m^*(k) = \\max_{|S| \\le k} f(S)$. Suppose we construct a sequence of sets $\\{S_0 = \\emptyset, S_1, \\dots, S_k\\}$ in a greedy fashion, such that $S_{i+1} = S_i \\cup \\{x\\}$, where $x$ maximizes the discrete derivative of $f$ evaluated at $S_i$. Then\n$$f(S_k) \\ge \\left(1 - \\frac{1}{e}\\right) m^*(k).$$\n\n5.2 Greedy algorithms\n\nKempe et al. [9] leverage Proposition 2 and the submodularity of the influence function to derive guarantees for a greedy algorithm for influence maximization in the linear threshold model. However, due to the intractability of exact influence calculations, each step of the greedy algorithm requires approximating the influence of several augmented sets. This leads to an overall runtime of $O(nk)$ times the runtime for simulations and introduces an additional source of error.\n\nAs the results of this section establish, the lower bounds $\\{\\mathrm{LB}_m\\}_{m \\ge 1}$ and $\\mathrm{LB}_{\\mathrm{trig}}$ appearing in Theorems 2 and 5 are also conveniently submodular, implying that Proposition 2 also applies when a greedy algorithm is employed. In contrast to the algorithm studied by Kempe et al. [9], however, our proposed greedy algorithms do not involve expensive simulations, since the functions $\\mathrm{LB}_m$ and $\\mathrm{LB}_{\\mathrm{trig}}$ are relatively straightforward to evaluate. 
This means the resulting algorithm is extremely fast to compute even on large networks.\n\nTheorem 7. The lower bounds $\\{\\mathrm{LB}_m\\}_{m \\ge 1}$ are monotone and submodular. 
Thus, for any $k \\le n$, a greedy algorithm that maximizes $\\mathrm{LB}_m$ at each step yields a $(1 - 1/e)$-approximation to $\\max_{A \\subseteq V : |A| \\le k} \\mathrm{LB}_m(A)$.\n\nTheorem 8. The function $\\mathrm{LB}_{\\mathrm{trig}}$ is monotone and submodular. Thus, for any $k \\le n$, a greedy algorithm that maximizes $\\mathrm{LB}_{\\mathrm{trig}}$ at each step yields a $(1 - 1/e)$-approximation to $\\max_{A \\subseteq V : |A| \\le k} \\mathrm{LB}_{\\mathrm{trig}}(A)$.\n\nNote that maximizing $\\mathrm{LB}_m(A)$ or $\\mathrm{LB}_{\\mathrm{trig}}(A)$ over $|A| \\le k$ necessarily provides a lower bound on $\\max_{A \\subseteq V : |A| \\le k} I(A)$.\n\n6 Simulations\n\nIn this section, we report the results of various simulations. In the first set of simulations, we generated an Erdős-Rényi graph with 900 vertices and edge probability $2/n$; a preferential attachment graph with 900 vertices, 10 initial vertices, and 3 edges for each added vertex; and a $30 \\times 30$ grid. We generated 33 instances of edge probabilities for each graph, as follows: for each instance and each vertex $i$, we chose $\\gamma(i)$ uniformly in $[\\gamma_{\\min}, 0.8]$, where $\\gamma_{\\min}$ ranged from 0.0075 to 0.75 in increments of 0.0075. Each incoming edge of vertex $i$ was then chosen with probability $\\frac{1 - \\gamma(i)}{d(i)}$, where $d(i)$ is the degree of $i$. An initial infection set $A$ of size 10 was chosen at random, and 50 simulations of the infection process were run to estimate the true influence. The upper and lower bounds and the value of $I(A)$ computed via simulations are shown in Figure 1. Note that the gap between the upper and lower bounds is indeed controlled for smaller values of $\\lambda_{\\bar{A},\\infty}$, agreeing with the predictions of Theorem 3.\n\nFor the second set of simulations, we generated 10 of each of the following graphs: an Erdős-Rényi graph with 100 vertices and edge probability $2/n$; a preferential attachment graph with 100 vertices, 10 initial vertices, and 3 additional edges for each added vertex; and a grid graph with 100 vertices. 
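The greedy procedure of Proposition 2, which Theorems 7 and 8 justify for LB_m and LB_trig and which these simulations apply to LB_1, LB_2 and UB, can be sketched generically. The Python below takes any set function f and a cardinality budget k and repeatedly adds the element with the largest discrete derivative; paired with LB_1, whose value is |A| plus the total weight of edges leaving A, it needs no simulation at all. The names greedy_maximize and lb1 are hypothetical, the weight convention B[j][i] = b_ji is assumed, ties are broken by iteration order, and the set function is re-evaluated from scratch at every step, which is simple but not the fastest possible implementation.

```python
def greedy_maximize(f, V, k):
    # Proposition 2: start from the empty set and, up to k times, add the
    # element x with the largest discrete derivative f(S | {x}) - f(S).
    S = set()
    for _ in range(k):
        best, best_gain = None, None
        base = f(S)
        for x in V:
            if x in S:
                continue
            gain = f(S | {x}) - base
            if best_gain is None or gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break  # every element has already been chosen
        S.add(best)
    return S


def lb1(B, A):
    # LB_1(A) = |A| + total weight of edges from A to its complement.
    n = len(B)
    return len(A) + sum(B[j][i] for j in A for i in range(n) if i not in A)
```

For a weight matrix B, greedy_maximize(lambda S: lb1(B, S), range(len(B)), 10) would select a seed set of size 10; by Theorem 7, its LB_1 value is within a factor 1 - 1/e of the best LB_1 value achievable with 10 seeds.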
For each of the 10 realizations, we also picked a value of $\\gamma(i)$ for each vertex $i$ uniformly in $[0.075, 0.8]$. The corresponding edge probabilities were assigned as before. We then selected sets $A$ of size 10 using greedy algorithms to maximize $\\mathrm{LB}_1$, $\\mathrm{LB}_2$, and $\\mathrm{UB}$, as well as the estimated influence based on 50 simulated infections. Finally, we used 200 simulations to approximate the actual influence of each resulting set. The average influences, along with the average influence of a uniformly random subset of vertices of size 10, are plotted in Figure 2. Note that the greedy algorithms all perform comparably, although the sets selected using $\\mathrm{LB}_2$ and $\\mathrm{UB}$ appear slightly better. The fact that the algorithm that uses $\\mathrm{UB}$ performs well is somewhat unsurprising, since it takes into account the influence from all paths. However, note that maximizing $\\mathrm{UB}$ does not lead to the theoretical guarantees we have derived for $\\mathrm{LB}_1$ and $\\mathrm{LB}_2$. In Table 1, we report the runtimes scaled by the runtime of the $\\mathrm{LB}_1$ algorithm. As expected, the $\\mathrm{LB}_1$ algorithm is fastest, and the other algorithms are slower by factors ranging from roughly 2.4 to 1300.\n\n7 Discussion\n\nWe have developed novel upper and lower bounds on the influence function in various contagion models, and studied the problem of influence maximization subject to a cardinality constraint. Note that all of our methods may be extended via the conditional expectation decomposition employed by Lee et al. [11] to obtain sharper influence bounds for certain graph topologies. It would be interesting to derive theoretical guarantees for the quality of improvement in such cases; we leave this exploration for future work. 
Other open questions involve quantifying the gap between the upper and lower bounds derived in the case of general triggering models, and obtaining theoretical guarantees for non-greedy algorithms in our lower bound maximization problem.\n\n
[Figure 1 plot, three panels: vertices infected versus $\\lambda_{\\bar{A},\\infty}$, with curves LB1, LB2, Simulation, and UB.]\nFigure 1: Lower bounds, upper bounds, and simulated influence for Erdős-Rényi, preferential attachment, and 2D-grid graphs, respectively. For small values of $\\lambda_{\\bar{A},\\infty}$, our bounds are tight.\n\n[Figure 2 plot, three panels: influence versus $|A|$, with curves LB1, LB2, Simulation, UB, and Random.]\nFigure 2: Simulated influence for sets $A$ selected by greedy algorithms and uniformly at random on Erdős-Rényi, preferential attachment, and 2D-grid graphs, respectively. All greedy algorithms perform similarly, but the algorithms maximizing the simulated influence and $\\mathrm{UB}$ are much more computationally intensive.\n\nGraph | LB1 | LB2 | UB | Simulation\nErdős-Rényi | 1.00 | 2.36 | 27.43 | 710.58\nPreferential attachment | 1.00 | 2.56 | 28.49 | 759.83\n2D-grid | 1.00 | 2.43 | 47.08 | 1301.73\n\nTable 1: Runtimes for the influence maximization algorithms, scaled by the runtime of the greedy $\\mathrm{LB}_1$ algorithm. The corresponding lower bounds are much easier to compute, allowing for faster algorithms.\n\nReferences\n\n[1] L. A. Adamic and E. Adar. Friends and neighbors on the Web. Social Networks, 25(3):211–230, 2003.\n[2] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 946–957. SIAM, 2014.\n[3] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1029–1038. ACM, 2010.\n[4] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 199–208. ACM, 2009.\n[5] W. Chen, Y. Yuan, and L. Zhang. 
Scalable influence maximization in social networks under the linear threshold model. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM '10, pages 88–97, Washington, DC, USA, 2010. IEEE Computer Society.\n[6] P. Domingos and M. Richardson. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57–66. ACM, 2001.\n[7] R. Durrett. Lecture Notes on Particle Systems and Percolation. Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole Advanced Books & Software, 1988.\n[8] M. Hecker, S. Lambeck, S. Toepfer, E. Van Someren, and R. Guthke. Gene regulatory network inference: data integration in dynamic models\u2014a review. Biosystems, 96(1):86–103, 2009.\n[9] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pages 137–146, New York, NY, USA, 2003. ACM.\n[10] W. O. Kermack and A. G. McKendrick. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 115(772):700–721, 1927.\n[11] E. J. Lee, S. Kamath, E. Abbe, and S. R. Kulkarni. Spectral bounds for independent cascade model with sensitive edges. 
In 2016 Annual Conference on Information Science and Systems (CISS), pages 649\u2013653, March 2016.\n\n[12] R. Lemonnier, K. Scaman, and N. Vayatis. Tight bounds for influence in diffusion networks and application to bond percolation and epidemiology. In Advances in Neural Information Processing Systems, pages 846\u2013854, 2014.\n\n[13] R. Lemonnier, K. Scaman, and N. Vayatis. Spectral bounds in random graphs applied to spreading phenomena and percolation. arXiv e-prints, March 2016.\n\n[14] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.\n\n[15] T. Liggett. Interacting Particle Systems, volume 276. Springer Science & Business Media, 2012.\n\n[16] N. Movarraei and S. Boxwala. On the number of paths of length 5 in a graph. International Journal of Applied Mathematical Research, 4(1):30\u201351, 2015.\n\n[17] N. Movarraei and S. Boxwala. On the number of paths of length 6 in a graph. International Journal of Applied Mathematical Research, 4(2):267\u2013280, 2015.\n\n[18] N. Movarraei and M. Shikare. On the number of paths of lengths 3 and 4 in a graph. International Journal of Applied Mathematical Research, 3(2):178\u2013189, 2014.\n\n[19] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177\u2013188, 1978.\n\n[20] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167\u2013256, 2003.\n\n[21] K. Scaman, R. Lemonnier, and N. Vayatis. Anytime influence bounds and the explosive behavior of continuous-time diffusion networks. In Advances in Neural Information Processing Systems, pages 2017\u20132025, 2015.\n\n[22] O. Sporns. The human connectome: A complex network. Annals of the New York Academy of Sciences, 1224(1):109\u2013125, 2011.\n\n[23] C. Zhou, P. Zhang, J. Guo, and L. Guo.
An upper bound based greedy algorithm for mining top-k influential nodes in social networks. In Proceedings of the 23rd International Conference on World Wide Web, pages 421\u2013422. ACM, 2014.", "award": [], "sourceid": 2265, "authors": [{"given_name": "Justin", "family_name": "Khim", "institution": "University of Pennsylvania"}, {"given_name": "Varun", "family_name": "Jog", "institution": "University of Wisconsin - Madison"}, {"given_name": "Po-Ling", "family_name": "Loh", "institution": "Berkeley"}]}