{"title": "The Importance of Communities for Learning to Influence", "book": "Advances in Neural Information Processing Systems", "page_first": 5862, "page_last": 5871, "abstract": "We consider the canonical problem of influence maximization in social networks. Since the seminal work of Kempe, Kleinberg, and Tardos there have been two largely disjoint efforts on this problem. The first studies the problem associated with learning the generative model that produces cascades, and the second focuses on the algorithmic challenge of identifying a set of influencers, assuming the generative model is known. Recent results on learning and optimization imply that in general, if the generative model is not known but rather learned from training data, no algorithm for influence maximization can yield a constant factor approximation guarantee using polynomially many samples drawn from any distribution. In this paper we describe a simple algorithm for maximizing influence from training data. The main idea behind the algorithm is to leverage the strong community structure of social networks and identify a set of individuals who are influential but whose communities have little overlap. Although in general the approximation guarantee of such an algorithm is unbounded, we show that this algorithm performs well experimentally.
To analyze its performance, we prove this algorithm obtains a constant factor approximation guarantee on graphs generated through the stochastic block model, traditionally used to model networks with community structure.", "full_text": "The Importance of Communities for Learning to Influence\n\nEric Balkanski (Harvard University) ericbalkanski@g.harvard.edu\nNicole Immorlica (Microsoft Research) nicimm@microsoft.com\nYaron Singer (Harvard University) yaron@seas.harvard.edu\n\nAbstract\n\nWe consider the canonical problem of influence maximization in social networks. Since the seminal work of Kempe, Kleinberg, and Tardos [KKT03] there have been two largely disjoint efforts on this problem. The first studies the problem associated with learning the generative model that produces cascades, and the second focuses on the algorithmic challenge of identifying a set of influencers, assuming the generative model is known. Recent results on learning and optimization imply that in general, if the generative model is not known but rather learned from training data, no algorithm for influence maximization can yield a constant factor approximation guarantee using polynomially many samples drawn from any distribution. In this paper we describe a simple algorithm for maximizing influence from training data. The main idea behind the algorithm is to leverage the strong community structure of social networks and identify a set of individuals who are influential but whose communities have little overlap. Although in general the approximation guarantee of such an algorithm is unbounded, we show that this algorithm performs well experimentally. To analyze its performance, we prove this algorithm obtains a constant factor approximation guarantee on graphs generated through the stochastic block model, traditionally used to model networks with community structure.\n\n1 Introduction\n\nFor well over a decade now, there has been extensive work on the canonical problem of influence maximization in social networks. First posed by Domingos and Richardson [DR01, RD02] and elegantly formulated and further developed by Kempe, Kleinberg, and Tardos [KKT03], influence maximization is the algorithmic challenge of selecting individuals who can serve as early adopters of a new idea, product, or technology in a manner that will trigger a large cascade in the social network. In their seminal paper, Kempe, Kleinberg, and Tardos characterize a family of natural influence processes for which selecting a set of individuals that maximizes the resulting cascade reduces to maximizing a submodular function under a cardinality constraint. Since submodular functions can be maximized within a 1 − 1/e approximation guarantee, one can then obtain desirable guarantees for the influence maximization problem. There have since been two largely separate agendas of research on the problem. The first line of work is concerned with learning the underlying submodular function from observations of cascades [LK03, AA05, LMF+07, GBL10, CKL11, GBS11, NS12, GLK12, DSSY12, ACKP13, DSGRZ13, FK14, DBB+14, CAD+14, DGSS14, DLBS14, NPS15, HO15]. The second line of work focuses on algorithmic challenges revolving around maximizing influence, assuming the underlying function that generates the diffusion process is known [KKT05, MR07, SS13, BBCL14, HS15, HK16, AS16].\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nIn this paper, we consider the problem of learning to influence, where the goal is to maximize influence from observations of cascades. 
This problem synthesizes both the problem of learning the function from training data and that of maximizing influence given the influence function. A natural approach for learning to influence is to first learn the influence function from cascades, and then apply a submodular optimization algorithm to the function learned from data. Somewhat counter-intuitively, it turns out that this approach yields desirable guarantees only under very strong learnability conditions.¹ In some cases, when there are sufficiently many samples, and one can observe exactly which node attempts to influence whom at every time step, these learnability conditions can be met. A slight relaxation, however (e.g. when there are only partial observations [NPS15, HXKL16]), can lead to sharp inapproximability.\n\nA recent line of work shows that even when a function is statistically learnable, optimizing the function learned from data can be inapproximable [BRS17, BS17]. In particular, even when the submodular function f : 2^N → R is a coverage function (which is PMAC learnable [BDF+12, FK14]), one would need to observe exponentially many samples {(S_i, f(S_i))}_{i=1}^m to obtain a constant factor approximation guarantee. Since coverage functions are special cases of the well-studied models of influence (independent cascade, linear and submodular threshold), this implies that when the influence function is not known but learned from data, the influence maximization problem is intractable.\n\nLearning to influence social networks. As with all impossibility results, the inapproximability discussed above holds for worst case instances, and it may be possible that such instances are rare for influence in social networks. In recent work, it was shown that when a submodular function has bounded curvature, there is a simple algorithm that can maximize the function under a cardinality constraint from samples [BRS16]. Unfortunately, simple examples show that submodular functions that dictate influence processes in social networks do not have bounded curvature. Are there other reasonable conditions on social networks that yield desirable approximation guarantees?\n\nMain result. In this paper we present a simple algorithm for learning to influence. This algorithm leverages the idea that social networks exhibit strong community structure. At a high level, the algorithm observes cascades and aims to select a set of nodes that are influential but belong to different communities. Intuitively, when an influential node from a certain community is selected to initiate a cascade, the marginal contribution of adding another node from that same community is small, since the nodes in that community were likely already influenced. This observation can be translated into a simple algorithm which performs very well in practice. Analytically, since community structure is often modeled using stochastic block models, we prove that the algorithm obtains a constant factor approximation guarantee in such models, under mild assumptions.\n\n1.1 Technical overview\n\nThe analysis for the approximation guarantees lies at the intersection of combinatorial optimization and random graph theory. We formalize the intuition that the algorithm leverages the community structure of social networks in the standard model used to analyze communities, the stochastic block model. Intuitively, the algorithm obtains good approximations by picking the nodes that have the largest individual influence, while avoiding picking multiple nodes in the same community by pruning nodes with high influence overlap. The individual influence of nodes and their overlap are estimated by the algorithm with what we call first and second order marginal contributions of nodes, which can be estimated from samples. We then use phase transition results for Erdős–Rényi random graphs and branching process techniques to compare these individual influences for nodes in different communities in the stochastic block model and to bound the overlap of pairs of nodes.\n\nThe optimization from samples model. Optimization from samples was recently introduced by [BRS17] in the context of submodular optimization; we give the definition for general set functions.\n\n¹ In general, the submodular function f : 2^N → R needs to be learnable everywhere within arbitrary precision, i.e. for every set S one needs to assume that the learner can produce a surrogate function f̃ : 2^N → R s.t. for every S ⊆ N the surrogate guarantees (1 − ε)f(S) ≤ f̃(S) ≤ (1 + ε)f(S), for ε ∈ o(1) [HS16, HS17].\n\nDefinition 1. A class of functions F = {f : 2^N → R} is α-optimizable from samples over distribution D under constraint M if there exists an algorithm s.t. for all f ∈ F, given a set of samples {(S_i, f(S_i))}_{i=1}^m where the sets S_i are drawn i.i.d. from D, the algorithm returns S ∈ M s.t.\n\nPr_{S_1,...,S_m ∼ D}[ E[f(S)] ≥ α · max_{T ∈ M} f(T) ] ≥ 1 − δ,\n\nwhere the expectation is over the decisions of the algorithm and m ∈ poly(|N|, 1/δ).\n\nWe focus on bounded product distributions D, so every node a is, independently, in S ∼ D with some probability p_a ∈ [1/poly(n), 1 − 1/poly(n)]. We assume this is the case throughout the paper.\n\nInfluence process. We assume that the influence process follows the standard independent cascade model. In the independent cascade model, a node a influences each of its neighbors b with some probability q_ab, independently. 
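As a concrete illustration, the cascade process just described can be simulated with a short Monte Carlo sketch. This is our own illustration, not the authors' code; the graph representation, the function names, and treating influence attempts as directed edge flips are assumptions made for the example.

```python
import random

def sample_influenced(neighbors, q, seed_set, rng=random):
    """One realization of the independent cascade: each influence attempt
    along an edge (a, b) succeeds independently with probability q[(a, b)];
    the influenced nodes are those reachable from the seed set via
    successful attempts (a BFS over 'live' edges)."""
    influenced = set(seed_set)
    frontier = list(seed_set)
    while frontier:
        a = frontier.pop()
        for b in neighbors.get(a, []):
            if b not in influenced and rng.random() < q[(a, b)]:
                influenced.add(b)
                frontier.append(b)
    return influenced

def estimate_f(neighbors, q, seed_set, trials=1000):
    """Monte Carlo estimate of f(S), the expected number of nodes influenced."""
    total = sum(len(sample_influenced(neighbors, q, seed_set))
                for _ in range(trials))
    return total / trials
```

For instance, on the chain 1 → 2 → 3 with all edge probabilities equal to 1, a seed set {1} influences all three nodes in every realization.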
Thus, given a seed set of nodes S, the set of nodes influenced consists of the nodes connected to some node in S in the random subgraph of the network which contains every edge ab independently with probability q_ab. We define f(S) to be the expected number of nodes influenced by S according to the independent cascade model over some weighted social network.\n\nThe learning to influence model: optimization from samples for influence maximization. The learning to influence model is an interpretation of the optimization from samples model [BRS17] for the specific problem of influence maximization in social networks. We are given a collection of samples {(S_i, |cc(S_i)|)}_{i=1}^m where the sets S_i are seed sets of nodes and |cc(S_i)| is the number of nodes influenced by S_i, i.e., the number of nodes that are connected to S_i in the random subgraph of the network. This number of nodes is a random variable with expected value f(S_i) := E[|cc(S_i)|] over the realization of the influence process. Each sample is an independent realization of the influence process. The goal is then to find a set of nodes S under a cardinality constraint k which maximizes the influence in expectation, i.e., find a set S of size at most k which maximizes the expected number of nodes f(S) influenced by seed set S.\n\n2 The Algorithm\n\nWe present the main algorithm, COPS. This algorithm is based on a novel optimization from samples technique which detects overlap in the marginal contributions of two different nodes, which is useful to avoid picking two nodes that have intersecting influence over the same collection of nodes.\n\n2.1 Description of COPS\n\nCOPS consists of two steps. It first orders nodes in decreasing order of first order marginal contribution, which is the expected marginal contribution of a node a to a random set S ∼ D. Then, it iteratively removes nodes a whose marginal contribution overlaps with the marginal contribution of at least one node before a in the ordering. The solution is the k first nodes in the pruned ordering.\n\nAlgorithm 1 COPS, learns to influence networks with COmmunity Pruning from Samples.\nInput: Samples S = {(S, f(S))}, acceptable overlap α.\n  Order nodes according to their first order marginal contributions.\n  Iteratively remove from this ordering nodes a whose marginal contribution has overlap of at least α with at least one node before a in this ordering.\n  return the k first nodes in the ordering\n\nThe strong performance of this algorithm for the problem of influence maximization is best explained with the concept of communities. Intuitively, this algorithm first orders nodes in decreasing order of their individual influence and then removes nodes which are in the same community. This second step allows the algorithm to obtain a diverse solution which influences multiple different communities of the social network. In comparison, previous algorithms in optimization from samples [BRS16, BRS17] only use first order marginal contributions and perform well if the function is close to linear. Due to the high overlap in influence between nodes in the same community, influence functions are far from being linear, and these algorithms have poor performance for influence maximization since they only pick nodes from a very small number of communities.\n\n2.2 Computing overlap using second order marginal contributions\n\nWe define second order marginal contributions, which are used to compute the overlap between the marginal contributions of two nodes.\n\nDefinition 2. The second order expected marginal contribution of a node a to a random set S containing node b is\n\nv_b(a) := E_{S ∼ D : a ∉ S, b ∈ S}[f(S ∪ {a}) − f(S)].\n\nThe first order marginal contribution v(a) of node a is defined similarly as the marginal contribution of a node a to a random set S, i.e., v(a) := E_{S ∼ D : a ∉ S}[f(S ∪ {a}) − f(S)]. These contributions can be estimated arbitrarily well for product distributions D by taking the difference between the average value of samples containing a and b and the average value of samples containing b but not a (see Appendix B for details).\n\nThe subroutine OVERLAP(a, b, α), α ∈ [0, 1], compares the second order marginal contribution of a to a random set containing b with the first order marginal contribution of a to a random set. If b causes the marginal contribution of a to decrease by at least a factor of 1 − α, then we say that a has marginal contribution with overlap of at least α with node b.\n\nAlgorithm 2 OVERLAP(a, b, α), returns true if a and b have marginal contributions that overlap by at least a factor α.\nInput: Samples S = {(S, f(S))}, nodes a and b, acceptable overlap α.\n  If the second order marginal contribution v_b(a) is at least a factor of 1 − α smaller than the first order marginal contribution v(a), return that node a has overlap of at least α with node b.\n\nOVERLAP is used to detect nodes in the same community. In the extreme case where two nodes a and b are in a community C where any node in C influences all of community C, the second order marginal contribution v_b(a) of a to a random set S containing b is v_b(a) = 0, since b already influences all of C so a does not add any value, while v(a) ≈ |C|. 
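The estimators and the two algorithms above can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' implementation: all names are hypothetical, the estimators are the difference-of-averages described above, and we resolve the pruning step by checking overlap only against nodes already kept, which is one natural reading of Algorithm 1.

```python
def first_order(samples, a):
    """Estimate v(a): average value of samples containing a minus
    average value of samples not containing a."""
    with_a = [v for s, v in samples if a in s]
    without_a = [v for s, v in samples if a not in s]
    return sum(with_a) / len(with_a) - sum(without_a) / len(without_a)

def second_order(samples, a, b):
    """Estimate v_b(a): the same difference, restricted to samples containing b."""
    both = [v for s, v in samples if a in s and b in s]
    b_only = [v for s, v in samples if a not in s and b in s]
    return sum(both) / len(both) - sum(b_only) / len(b_only)

def overlap(samples, a, b, alpha):
    """Algorithm 2 (OVERLAP): True if v_b(a) is at least a factor
    1 - alpha smaller than v(a)."""
    return second_order(samples, a, b) < (1 - alpha) * first_order(samples, a)

def cops(samples, nodes, k, alpha):
    """Algorithm 1 (COPS): order nodes by first order marginal contribution,
    prune nodes that overlap an earlier kept node, return the k survivors."""
    ordering = sorted(nodes, key=lambda a: first_order(samples, a), reverse=True)
    solution = []
    for a in ordering:
        if all(not overlap(samples, a, b, alpha) for b in solution):
            solution.append(a)
        if len(solution) == k:
            break
    return solution
```

On a toy instance with two disjoint "communities" {1, 2} (worth 10) and {3} (worth 5), nodes 1 and 2 overlap completely (v_2(1) = 0) while 1 and 3 do not, so COPS with k = 2 picks one node from each community.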
In the opposite case where a and b are in two communities which are not connected in the network, we have v(a) = v_b(a), since adding b to a random set S has no impact on the value added by a.\n\n2.3 Analyzing community structure\n\nThe main benefit of COPS is that it leverages the community structure of social networks. To formalize this explanation, we analyze our algorithm in the standard model used to study the community structure of networks, the stochastic block model. In this model, a fixed set of nodes V is partitioned into communities C_1, ..., C_ℓ. The network is then a random graph G = (V, E) where edges are added to E independently and where an intra-community edge is in E with much larger probability than an inter-community edge. These edges are added with identical probability q^sb_C for every edge in the same community C, but with different probabilities for edges inside different communities C_i and C_j. We illustrate this model in Figure 1.\n\n3 Dense Communities and Small Seed Set in the Stochastic Block Model\n\nIn this section, we show that COPS achieves a 1 − O(|C_k|^{−1}) approximation, where C_k is the kth largest community, in the regime with dense communities and small seed sets, which is described below. We show that the algorithm picks a node from each of the k largest communities with high probability, which is the optimal solution. In the next section, we show a constant factor approximation algorithm for a generalization of this setting, which requires a more intricate analysis. In order to focus on the main characteristics of the community structure as an explanation for the performance of the algorithm, we make the following simplifying assumptions for the analysis.\n\nFigure 1: An illustration of the stochastic block model with communities C_1, C_2, C_3 and C_4 of sizes 6, 4, 4 and 4. The optimal solution for influence maximization with k = 4 is in green. Picking the k first nodes in the ordering by marginal contributions without pruning, as in [BRS16], leads to a solution with nodes from only C_1 (red). By removing nodes with overlapping marginal contributions, COPS obtains a diverse solution.\n\nWe first assume that there are no inter-community edges.² We also assume that the random graph obtained from the stochastic block model is redrawn for every sample, and that we aim to find a good solution in expectation over both the stochastic block model and the independent cascade model.\n\nFormally, let G = (V, E) be the random graph over n nodes obtained from an independent cascade process over the graph generated by the stochastic block model. Similarly as for the stochastic block model, edge probabilities for the independent cascade model may vary between different communities and are identical within a single community C, where all edges have weights q^ic_C. Thus, an edge e between two nodes in a community C is in E with probability p_C := q^ic_C · q^sb_C, independently for every edge, where q^ic_C and q^sb_C are the edge probabilities in the independent cascade model and the stochastic block model respectively. The total influence of seed set S is then |cc_G(S)|, where cc_G(S) is the set of nodes connected to S in G, and we drop the subscript when it is clear from context. Thus, the objective function is f(S) := E_G[|cc(S)|]. We describe the two assumptions for this section.\n\nDense communities. We assume that for the k largest communities C, p_C > 3 log |C| / |C| and C has super-constant size (|C| = ω(1)). This assumption corresponds to communities where the probability p_C that a node a_i ∈ C influences another node a_j ∈ C is large. Since the subgraph G[C] of G induced by a community C is an Erdős–Rényi random graph, we get that G[C] is connected with high probability (see Appendix C).\n\nLemma 3. [ER60] Assume C is a "dense" community; then the subgraph G[C] of G is connected with probability 1 − O(|C|^{−2}).\n\nSmall seed set. We also assume that the seed sets S ∼ D are small enough so that they rarely intersect with a fixed community C, i.e., Pr_{S ∼ D}[S ∩ C = ∅] ≥ 1 − o(1). This assumption corresponds to cases where the set of early influencers is small, which is usually the case in cascades.\n\nThe analysis in this section relies on two main lemmas. We first show that the first order marginal contribution of a node is approximately the size of the community it belongs to (Lemma 4). Thus, the ordering by marginal contributions orders elements by the size of the community they belong to. Then, we show that any node a ∈ C such that there is a node b ∈ C before a in the ordering is pruned (Lemma 5). Regarding the distribution S ∼ D generating the samples, as previously mentioned, we consider any bounded product distribution. This implies that w.p. 1 − 1/poly(n), the algorithm can compute marginal contribution estimates ṽ that are all a 1/poly(n)-additive approximation to the true marginal contributions v (see Appendix B for a formal analysis of the estimates). Thus, we give the analysis for the true marginal contributions, which, with probability 1 − 1/poly(n) over the samples, easily extends to arbitrarily good estimates.\n\nThe following lemma shows that the ordering by first order marginal contributions corresponds to the ordering by decreasing community sizes.\n\nLemma 4. For all a ∈ C where C is one of the k largest communities, the first order marginal contribution of node a is approximately the size of its community, i.e., (1 − o(1))|C| ≤ v(a) ≤ |C|.\n\nProof. Assume a is a node in one of the k largest communities. Let D_a and D_{−a} denote the distributions S ∼ D conditioned on a ∈ S and a ∉ S respectively. 
We also denote marginal contributions by f_S(a) := f(S ∪ {a}) − f(S). We obtain\n\nv(a) = E_{S ∼ D_{−a}, G}[f_S(a)] ≥ Pr_{S ∼ D_{−a}}[S ∩ C = ∅] · Pr_G[cc(a) = C] · E_{S ∼ D_{−a} : S ∩ C = ∅, G : cc(a) = C}[f_S(a)] = Pr_{S ∼ D_{−a}}[S ∩ C = ∅] · Pr_G[cc(a) = C] · |C| ≥ (1 − o(1)) · |C|\n\nwhere the last inequality is by the small seed set assumption and since C is connected with probability 1 − o(1) (Lemma 3 and |C| = ω(1) by the dense community assumption). For the upper bound, v(a) is trivially at most the size of a's community since there are no inter-community edges.\n\n² The analysis easily extends to cases where inter-community edges form with probability significantly smaller than q^sb_C, for all C.\n\nThe next lemma shows that the algorithm does not pick two nodes in the same community.\n\nLemma 5. With probability 1 − o(1), for all pairs of nodes a, b such that a, b ∈ C where C is one of the k largest communities, OVERLAP(a, b, α) = True for any constant α ∈ [0, 1).\n\nProof. Let a, b be two nodes in one of the k largest communities C and let D_{−a,b} denote the distribution S ∼ D conditioned on a ∉ S and b ∈ S. Then,\n\nv_b(a) = E_{S ∼ D_{−a,b}}[f_S(a)] ≤ Pr[b ∈ cc(a)] · 0 + Pr[b ∉ cc(a)] · |C| = o(1) ≤ o(1) · v(a)\n\nwhere the last equality holds since G[C] is not connected w.p. O(|C|^{−2}) by Lemma 3 and since |C| = ω(1) by the dense community assumption, which concludes the proof.\n\nBy combining Lemmas 4 and 5, we obtain the main result for this section (proof in Appendix D).\n\nTheorem 6. In the dense communities and small seed set setting, COPS with α-overlap allowed, for any constant α ∈ (0, 1), is a 1 − o(1)-approximation algorithm for learning to influence from samples drawn from a bounded product distribution D.\n\n4 Constant Approximation for General Stochastic Block Model\n\nIn this section, we relax the assumptions from the previous section and show that COPS is a constant factor approximation algorithm in this more demanding setting. Recall that G is the random graph obtained from both the stochastic block model and the independent cascade model. A main observation used in the analysis is that the random subgraph G[C], for some community C, is an Erdős–Rényi random graph G_{|C|, p_C}.\n\nRelaxation of the assumptions. Instead of only considering dense communities where p_C = Ω((log |C|)/|C|), we consider both tight communities C where p_C ≥ (1 + ε)/|C| for some constant ε > 0, and loose communities C where p_C ≤ (1 − ε)/|C| for some constant ε > 0.³ We also relax the small seed set assumption to the reasonable non-ubiquitous seed set assumption. Instead of having a seed set S ∼ D rarely intersect with a fixed community C, we only assume that Pr_{S ∼ D}[S ∩ C = ∅] ≥ ε for some constant ε > 0. Again, since seed sets are of small size in practice, it seems reasonable that with some constant probability a community does not contain any seeds.\n\nOverview of analysis. At a high level, the analysis exploits the remarkably sharp threshold for the phase transition of Erdős–Rényi random graphs. This phase transition (Lemma 7) tells us that a tight community C contains w.h.p. a giant connected component with a constant fraction of the nodes from C. Thus, a single node from a tight community influences a constant fraction of its community in expectation. 
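This sharp threshold is easy to observe empirically. The following sketch is our own illustration under assumed parameters, not part of the paper's experiments: it samples G(n, p) just above and just below the threshold p = 1/n, which mirrors the tight and loose regimes for a single community, and compares the sizes of the largest connected components.

```python
import random

def largest_component(n, p, rng):
    """Sample an Erdos-Renyi graph G(n, p) and return the size of its
    largest connected component, using a simple union-find."""
    parent = list(range(n))

    def find(x):
        # path-halving find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # flip each potential edge independently with probability p
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())

rng = random.Random(0)
n = 2000
# "tight" regime: p = (1 + eps)/|C| -> a giant component of constant fraction
giant = largest_component(n, 1.5 / n, rng)
# "loose" regime: p = (1 - eps)/|C| -> all components of size O(log |C|)
small = largest_component(n, 0.5 / n, rng)
```

With these parameters the supercritical graph typically has a largest component containing over half the nodes, while the subcritical one has only logarithmically small components.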
The ordering by first order marginal contributions thus ensures a constant factor approximation of the value from nodes in tight communities (Lemma 10). On the other hand, we show that a node from a loose community influences only at most a constant number of nodes in expectation (Lemma 8), using branching processes. Since the algorithm checks for overlap using second order marginal contributions, it picks at most one node from any tight community (Lemma 11). Combining all the pieces together, we obtain a constant factor approximation (Theorem 12).\n\n³ Thus, we consider all possible sizes of communities except communities of size converging to exactly 1/p_C, which is unlikely to occur in practice.\n\nWe first state the result for the giant connected component in a tight community, which is an immediate corollary of the prominent giant connected component result in the Erdős–Rényi model.\n\nLemma 7. [ER60] Let C be a tight community with |C| = ω(1); then G[C] has a "giant" connected component containing a constant fraction of the nodes in C w.p. 1 − o(1).\n\nThe following lemma analyzes the influence of a node in a loose community through the lens of Galton–Watson branching processes to show that such a node influences at most a constant number of nodes in expectation. The proof is deferred to Appendix E.\n\nLemma 8. Let C be a loose community; then f({a}) ≤ c for all a ∈ C and some constant c.\n\nWe can now upper bound the value of the optimal solution S*. Let C_1, ..., C_t be the t ≤ k tight communities that have at least one node in the optimal solution S* and that are of super-constant size, i.e., |C_i| = ω(1). Without loss of generality, we order these communities in decreasing order of their size |C_i|.\n\nLemma 9. Let S* be the optimal set of nodes and let C_i and t be defined as above. There exists a constant c such that f(S*) ≤ ∑_{i=1}^t |C_i| + c · k.\n\nProof. Let S*_A and S*_B be a partition of the optimal nodes into nodes that are in tight communities with super-constant individual influence and nodes that are not in such a community. The influence f(S*_A) is trivially upper bounded by ∑_{i=1}^t |C_i|. Next, there exists some constant c s.t. f(S*_B) ≤ ∑_{a ∈ S*_B} f({a}) ≤ c · k, where the first inequality is by submodularity and the second since nodes in loose communities have constant individual influence by Lemma 8 and nodes in a tight community without super-constant individual influence have constant influence by definition. We conclude that, by submodularity, f(S*) ≤ f(S*_A) + f(S*_B) ≤ ∑_{i=1}^t |C_i| + c · k.\n\nNext, we argue that the solution returned by the algorithm is a constant factor away from ∑_{i=1}^t |C_i|.\n\nLemma 10. Let a be the ith node in the ordering by first order marginal contribution after the pruning, and let C_i be the ith largest tight community with super-constant individual influence and with at least one node in the optimal solution S*. Then, f({a}) ≥ ε|C_i| for some constant ε > 0.\n\nProof. By definition of C_i, we have |C_1| ≥ ··· ≥ |C_i|, and these are all tight communities. Let b be a node in C_j for j ∈ [i], let 1_{gc(C)} be the indicator variable indicating if there is a giant component in community C, and let gc(C) be this giant component. We get\n\nv(b) ≥ Pr[1_{gc(C_j)}] · Pr_{S ∼ D_{−b}}[S ∩ C_j = ∅] · Pr[b ∈ gc(C_j)] · E[|gc(C_j)| : b ∈ gc(C_j)] ≥ (1 − o(1)) · ε_1 · ε_2 · ε_3|C_j| ≥ ε|C_j|\n\nfor some constants ε_1, ε_2, ε_3, ε > 0, by Lemma 7 and the non-ubiquitous assumption. Similarly as in Theorem 6, if a and b are in different communities, OVERLAP(a, b, α) = False for α ∈ (0, 1]. 
Thus, there is at least one node b ∈ ∪_{j=1}^i C_j at position i or later in the ordering after the pruning, with v(b) ≥ ε|C_j| for some j ∈ [i]. By the ordering by first order marginal contributions and since node a is in the ith position, v(a) ≥ v(b), and we get that f({a}) ≥ v(a) ≥ v(b) ≥ ε|C_j| ≥ ε|C_i|.\n\nNext, we show that the algorithm never picks two nodes from the same tight community; the proof is deferred to Appendix E.\n\nLemma 11. If a, b ∈ C and C is a tight community, then OVERLAP(a, b, α) = True for α = o(1).\n\nWe combine the above lemmas to obtain the approximation guarantee of COPS (proof in Appendix E).\n\nTheorem 12. With overlap allowed α = 1/poly(n), COPS is a constant factor approximation algorithm for learning to influence from samples drawn from a bounded product distribution D in the setting with tight and loose communities and non-ubiquitous seed sets.\n\n5 Experiments\n\nIn this section, we compare the performance of COPS and three other algorithms on real and synthetic networks. 
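For concreteness, training data in the learning to influence model can be generated as in the following sketch. This is a hypothetical illustration (function names and graph representation are ours): each seed set is drawn from a product distribution with marginal probability k/n, as in the experimental setup below, and the label is the number of nodes influenced in one realization of the independent cascade.

```python
import random

def draw_samples(nodes, neighbors, q, k, m, rng=random):
    """Generate m samples (S_i, |cc(S_i)|): each seed set S_i contains
    every node independently with probability k/n (so its expected size
    is k), and the label is the number of nodes reached in one cascade."""
    n = len(nodes)
    samples = []
    for _ in range(m):
        seed = {a for a in nodes if rng.random() < k / n}
        # one realization of the independent cascade: BFS over live edges
        influenced, frontier = set(seed), list(seed)
        while frontier:
            a = frontier.pop()
            for b in neighbors.get(a, []):
                if b not in influenced and rng.random() < q[(a, b)]:
                    influenced.add(b)
                    frontier.append(b)
        samples.append((seed, len(influenced)))
    return samples
```

Such samples are exactly the input consumed by COPS, which never observes the network itself, only seed sets and influence counts.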
We show that COPS performs well in practice: it outperforms the previous optimization from samples algorithm and gets closer to the solution obtained when given complete access to the influence function.\n\nFigure 2: Empirical performance of COPS against the GREEDY upper bound, the previous optimization from samples algorithm MARGI, and a random set. Panels show performance on the DBLP and Facebook networks (as a function of the edge weight q and of k) and on the Stochastic Block Model 1, Stochastic Block Model 2, Preferential Attachment, and Erdős–Rényi synthetic networks (as a function of n).\n\nExperimental setup. The first synthetic network considered is the stochastic block model, SBM 1, where communities have random sizes, with one community of size significantly larger than the other communities. We maintained the same expected community size as n varied. In the second stochastic block model, SBM 2, all communities have the same expected size and the number of communities was fixed as n varied. 
The third and fourth synthetic networks were an Erdős–Rényi (ER) random graph and a preferential attachment (PA) model. Experiments were also conducted on two publicly available real networks [LK15]. The first is a subgraph of the Facebook social network with n = 4k and m = 88k. The second is a subgraph of the DBLP co-authorship network, which has ground truth communities as described in [LK15]; nodes of degree at most 10 were pruned to obtain n = 54k and m = 361k, and the 1.2k nodes with degree at least 50 were considered as potential nodes in the solution.

Benchmarks. We compared the COPS algorithm against three benchmarks. The standard GREEDY algorithm in the value query model is an upper bound, since it is the optimal efficient algorithm given value query access to the function, while COPS is in the more restricted setting with only samples. MARGI is the optimization from samples algorithm that picks the k nodes with the highest first order marginal contribution [BRS16] and does not use second order marginal contributions. RANDOM simply returns a random set. All the samples are drawn from the product distribution with marginal probability k/n, so that samples have expected size k. We further describe the parameters of each plot in Appendix F.

Empirical evaluation. COPS significantly outperforms the previous optimization from samples algorithm MARGI, getting much closer to the GREEDY upper bound. We observe that the stronger the community structure of the network, the better COPS performs compared to MARGI, e.g., SBM vs. ER and PA (which do not have a community structure). When the edge weight q := q_{i.c.} for the cascades is small, the function is near-linear and MARGI performs well, whereas when it is large, there is a lot of overlap and COPS performs better.
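The first order marginal contributions that MARGI ranks by (and that COPS uses for its initial ordering) can be estimated directly from samples; a minimal sketch under a simple assumption: the estimate for a node is the average influence value of samples containing it minus the average value of samples without it. The function names are illustrative, not the paper's.

```python
def first_order_marginals(nodes, samples):
    """Estimate first order marginal contributions from
    (seed_set, influence_value) samples."""
    est = {}
    for v in nodes:
        with_v = [val for s, val in samples if v in s]
        without_v = [val for s, val in samples if v not in s]
        if with_v and without_v:
            est[v] = (sum(with_v) / len(with_v)
                      - sum(without_v) / len(without_v))
        else:
            # No information about v in the samples.
            est[v] = 0.0
    return est

def margi(nodes, samples, k):
    """MARGI benchmark: the k nodes with highest estimated contribution."""
    est = first_order_marginals(nodes, samples)
    return sorted(nodes, key=lambda v: -est[v])[:k]
```

This estimator uses only first order information, which is why MARGI can be trapped by a single dominant community, as observed below for SBM 1.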
The performance of COPS as a function of the overlap allowed (experiment in Appendix F) can be explained as follows: its performance slowly increases as the overlap allowed increases, since COPS can pick from a larger collection of nodes, until it drops when too much overlap is allowed and COPS picks mostly very close nodes from the same community. For SBM 1, with one larger community, MARGI is trapped into only picking nodes from that larger community and performs even worse than RANDOM. As n increases, the number of nodes influenced increases roughly linearly for SBM 2, where the number of communities is fixed, since the number of nodes per community increases linearly; this is not the case for SBM 1.

References

[AA05] Eytan Adar and Lada A. Adamic. Tracking information epidemics in blogspace. In WI, 2005.

[ACKP13] Bruno D. Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. In KDD, 2013.

[AS16] Rico Angell and Grant Schoenebeck. Don't be greedy: Leveraging community structure to find high quality seed sets for influence maximization. arXiv preprint arXiv:1609.06520, 2016.

[BBCL14] Christian Borgs, Michael Brautbar, Jennifer T. Chayes, and Brendan Lucier. Maximizing social influence in nearly optimal time. In SODA, 2014.

[BDF+12] Ashwinkumar Badanidiyuru, Shahar Dobzinski, Hu Fu, Robert Kleinberg, Noam Nisan, and Tim Roughgarden. Sketching valuation functions. In SODA, 2012.

[BHK] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of data science.

[BRS16] Eric Balkanski, Aviad Rubinstein, and Yaron Singer.
The power of optimization from samples. In NIPS, 2016.

[BRS17] Eric Balkanski, Aviad Rubinstein, and Yaron Singer. The limitations of optimization from samples. In STOC, 2017.

[BS17] Eric Balkanski and Yaron Singer. The sample complexity of optimizing a convex function. In COLT, 2017.

[CAD+14] Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon M. Kleinberg, and Jure Leskovec. Can cascades be predicted? In WWW, 2014.

[CKL11] Flavio Chierichetti, Jon M. Kleinberg, and David Liben-Nowell. Reconstructing patterns of information diffusion from incomplete observations. In NIPS, 2011.

[DBB+14] Abir De, Sourangshu Bhattacharya, Parantapa Bhattacharya, Niloy Ganguly, and Soumen Chakrabarti. Learning a linear influence model from transient opinion dynamics. In CIKM, 2014.

[DGSS14] Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schölkopf. Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In ICML, 2014.

[DLBS14] Nan Du, Yingyu Liang, Maria-Florina Balcan, and Le Song. Influence function learning in information diffusion networks. In ICML, 2014.

[DR01] Pedro Domingos and Matthew Richardson. Mining the network value of customers. In KDD, 2001.

[DSGRZ13] Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In NIPS, 2013.

[DSSY12] Nan Du, Le Song, Alexander J. Smola, and Ming Yuan. Learning networks of heterogeneous influence. In NIPS, 2012.

[ER60] Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.

[FK14] Vitaly Feldman and Pravesh Kothari.
Learning coverage functions and private release of marginals. In COLT, 2014.

[GBL10] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning influence probabilities in social networks. In KDD, 2010.

[GBS11] Manuel Gomez-Rodriguez, David Balduzzi, and Bernhard Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML, 2011.

[GLK12] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data, 5(4):21, 2012.

[HK16] Xinran He and David Kempe. Robust influence maximization. In KDD, 2016.

[HO15] Jean Honorio and Luis Ortiz. Learning the structure and parameters of large-population graphical games from behavioral data. Journal of Machine Learning Research, 16:1157–1210, 2015.

[HS15] Thibaut Horel and Yaron Singer. Scalable methods for adaptively seeding a social network. In WWW, 2015.

[HS16] Thibaut Horel and Yaron Singer. Maximization of approximately submodular functions. In NIPS, 2016.

[HS17] Avinatan Hassidim and Yaron Singer. Submodular maximization under noise. In COLT, 2017.

[HXKL16] Xinran He, Ke Xu, David Kempe, and Yan Liu. Learning influence functions from incomplete observations. In NIPS, 2016.

[KKT03] David Kempe, Jon M.
Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.

[KKT05] David Kempe, Jon M. Kleinberg, and Éva Tardos. Influential nodes in a diffusion model for social networks. In ICALP, 2005.

[LK03] David Liben-Nowell and Jon M. Kleinberg. The link prediction problem for social networks. In CIKM, 2003.

[LK15] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, 2015.

[LMF+07] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie S. Glance, and Matthew Hurst. Patterns of cascading behavior in large blog graphs. In SDM, 2007.

[MR07] Elchanan Mossel and Sébastien Roch. On the submodularity of influence in social networks. In STOC, 2007.

[NPS15] Harikrishna Narasimhan, David C. Parkes, and Yaron Singer. Learnability of influence in networks. In NIPS, 2015.

[NS12] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In SIGMETRICS/Performance, 2012.

[RD02] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In KDD, 2002.

[SS13] Lior Seeman and Yaron Singer. Adaptive seeding in social networks.
In FOCS, 2013.