{"title": "On the Convexity of Latent Social Network Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 1741, "page_last": 1749, "abstract": "In many real-world scenarios, it is nearly impossible to collect explicit social network data. In such cases, whole networks must be inferred from underlying observations. Here, we formulate the problem of inferring latent social networks based on network diffusion or disease propagation data. We consider contagions propagating over the edges of an unobserved social network, where we only observe the times when nodes became infected, but not who infected them. Given such node infection times, we then identify the optimal network that best explains the observed data. We present a maximum likelihood approach based on convex programming with a l1-like penalty term that encourages sparsity. Experiments on real and synthetic data reveal that our method near-perfectly recovers the underlying network structure as well as the parameters of the contagion propagation model. Moreover, our approach scales well as it can infer optimal networks on thousands of nodes in a matter of minutes.", "full_text": "On the Convexity of Latent Social Network Inference\n\nSeth A. Myers\n\nInstitute for Computational\n\nand Mathematical Engineering\n\nStanford University\n\nJure Leskovec\n\nDepartment of Computer Science\n\nStanford University\n\njure@cs.stanford.edu\n\nsamyers@stanford.edu\n\nAbstract\n\nIn many real-world scenarios, it is nearly impossible to collect explicit social net-\nwork data. In such cases, whole networks must be inferred from underlying ob-\nservations. Here, we formulate the problem of inferring latent social networks\nbased on network diffusion or disease propagation data. We consider contagions\npropagating over the edges of an unobserved social network, where we only ob-\nserve the times when nodes became infected, but not who infected them. 
Given such node infection times, we then identify the optimal network that best explains the observed data. We present a maximum likelihood approach based on convex programming with an l1-like penalty term that encourages sparsity. Experiments on real and synthetic data reveal that our method near-perfectly recovers the underlying network structure as well as the parameters of the contagion propagation model. Moreover, our approach scales well, as it can infer optimal networks of thousands of nodes in a matter of minutes.

1 Introduction

Social network analysis has traditionally relied on self-reported data collected via interviews and questionnaires [27]. As collecting such data is tedious and expensive, traditional social network studies typically involved a very limited number of people (usually fewer than 100). The emergence of large-scale social computing applications has made massive social network data [16] available, but there are important settings where network data is hard to obtain and the whole network must thus be inferred from the data. For example, populations like injection drug users or men who have sex with men are "hidden" or "hard-to-reach". Collecting social networks of such populations is nearly impossible, and thus whole networks have to be inferred from observational data.

Even though inferring social networks has been attempted in the past, such work usually assumes that pairwise interaction data is already available [5]. In this case, the problem of network inference reduces to deciding whether to include the interaction between a pair of nodes as an edge in the underlying network. For example, inferring networks from pairwise interactions of cell-phone call [5] or email [4, 13] records simply reduces to selecting the right threshold τ such that an edge (u, v) is included in the network if u and v interacted more than τ times in the dataset. 
Similarly, inferring networks of interactions between proteins in a cell usually reduces to determining the right threshold [9, 20].

We address the problem of inferring the structure of unobserved social networks in a much more ambitious setting. We consider a diffusion process where a contagion (e.g., disease, information, product adoption) spreads over the edges of the network, and all that we observe are the infection times of nodes, but not who infected whom, i.e., we do not observe the edges over which the contagion spread. The goal then is to reconstruct the underlying social network along the edges of which the contagion diffused.

We think of a diffusion on a network as a process where neighboring nodes switch states from inactive to active. The network over which activations propagate is usually unknown and unobserved. Commonly, we only observe the times when particular nodes get "infected" but we do not observe who infected them. In the case of information propagation, as bloggers discover new information, they write about it without explicitly citing the source [15]. Thus, we only observe the time when a blog gets "infected" but not where it got infected from. Similarly, in disease spreading, we observe people getting sick without usually knowing who infected them [26]. And, in a viral marketing setting, we observe people purchasing products or adopting particular behaviors without explicitly knowing who was the influencer that caused the adoption or the purchase [11]. Thus, the question is: if we assume that the network is static over time, is it possible to reconstruct the unobserved social network over which the diffusions took place? What is the structure of such a network?

We develop a convex programming based approach for inferring latent social networks from diffusion data. 
We first formulate a generative probabilistic model of how, on a fixed hypothetical network, contagions spread through the network. We then write down the likelihood of the observed diffusion data under a given network and diffusion model parameters. Through a series of steps we show how to obtain a convex program with an l1-like penalty term that encourages sparsity. We evaluate our approach on synthetic as well as real-world email and viral marketing datasets. Experiments reveal that we can near-perfectly recover the underlying network structure as well as the parameters of the propagation model. Moreover, our approach scales well, since we can infer optimal networks of a thousand nodes in a matter of minutes.

Further related work. There are several different lines of work connected to our research. First is network structure learning for estimating the dependency structure of directed graphical models [7] and probabilistic relational models [7]. However, these formulations are often intractable and one has to resort to heuristic solutions. Recently, graphical Lasso methods [25, 21, 6, 19] for static sparse graph estimation and extensions to time-evolving graphical models [1, 8, 22] have been proposed with considerable success. Our work is similar in the sense that we "regress" the infection times of a target node on the infection times of other nodes. Additionally, our work is related to the link prediction problem [12, 23, 18, 24], but differs in that this line of work assumes that part of the network is already visible to us.

The work most closely related to ours, however, is [10], which also infers networks from cascade data. The algorithm proposed there (called NetInf) assumes that the weights of the edges in the latent network are homogeneous, i.e., all connected nodes in the network infect/influence their neighbors with the same probability. 
When this assumption holds, the algorithm is very accurate and computationally feasible, but here we remove this assumption in order to address a more general problem. Furthermore, where [10] is an approximation algorithm, our approach guarantees optimality while easily handling networks with thousands of nodes.

2 Problem Formulation and the Proposed Method

We now define the problem of inferring a latent social network based on network diffusion data, where we only observe the identities of infected nodes. Thus, for each node we know the interval during which the node was infected, whereas the source of each node's infection is unknown. We assume only that an infected node was previously infected by some other previously infected node to which it is connected in the latent social network (which we are trying to infer). Our methodology can handle a wide class of information diffusion and epidemic models, like the independent contagion model, the Susceptible–Infected (SI), Susceptible–Infected–Susceptible (SIS), or even the Susceptible–Infected–Recovered (SIR) model [2]. We show that calculating the maximum likelihood estimator (MLE) of the latent network (under any of the above diffusion models) is equivalent to a convex problem that can be solved efficiently.

Problem formulation: The cascade model. We start by first introducing the model of the diffusion process. As the contagion spreads through the network, it leaves a trace that we call a cascade. Assume a population of N nodes, and let A be the N × N weighted adjacency matrix of the network that is unobserved and that we aim to infer. Each entry (i, j) of A models the conditional probability of infection transmission:

    A_{ij} = P(node i infects node j | node i is infected).

The temporal properties of most types of cascades, especially disease spread, are governed by a transmission (or incubation) period. 
The transmission time model w(t) specifies how long it takes for the infection to transmit from one node to another, and the recovery model r(t) models how long a node remains infected before it recovers. Thus, whenever some node i, which was infected at time τ_i, infects another node j, the time separating the two infection times is sampled from w(t), i.e., the infection time of node j is τ_j = τ_i + t, where t is distributed according to w(t). Similarly, the duration of each node's infection is sampled from r(t). Both w(t) and r(t) are general probability distributions with strictly nonnegative support.

A cascade c is initiated by randomly selecting a node to become infected at time t = 0. Let τ_i denote the time of infection of node i. When node i becomes infected, it infects each of its neighbors in the network independently, with probabilities governed by A. Specifically, if i becomes infected and j is susceptible, then j will become infected with probability A_{ij}. Once it has been determined which of i's neighbors will be infected, the infection time of each newly infected neighbor is the sum of τ_i and an interval of time sampled from w(t). The transmission time for each new infection is sampled independently from w(t).

Once a node becomes infected, different scenarios happen depending on the model. In the SIS model, node i becomes susceptible to infection again at time τ_i + r_i. On the other hand, under the SIR model, node i recovers and can never be infected again. Our work here mainly considers the SI model, where a node remains infected forever, i.e., r_i = ∞. It is important to note, however, that our approach can handle all of these models with almost no modification to the algorithm.

For each cascade c, we then observe the node infection times τ_i^c as well as the durations of infection, but the source of each node's infection remains hidden. 
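To make the generative process concrete, here is a minimal sketch of SI cascade generation (our illustration, not the authors' implementation); `A` is the transmission-probability matrix and `sample_w` is a hypothetical sampler drawing from w(t):

```python
import heapq
import random

def simulate_si_cascade(A, sample_w, seed):
    """Simulate one SI cascade. A[i][j] = P(i infects j | i infected);
    sample_w() draws a transmission time from w(t).
    Returns {node: infection time} for all infected nodes."""
    best = {seed: 0.0}   # earliest known infection time per node
    pq = [(0.0, seed)]   # (time, node) events, processed in time order
    tau = {}
    while pq:
        t_i, i = heapq.heappop(pq)
        if i in tau:
            continue     # already infected at an earlier time (SI: no reinfection)
        tau[i] = t_i
        # upon infection, i attempts to infect each neighbor once, independently
        for j, a_ij in enumerate(A[i]):
            if j not in tau and a_ij > 0 and random.random() < a_ij:
                t_j = t_i + sample_w()
                if t_j < best.get(j, float("inf")):
                    best[j] = t_j   # keep only the earliest successful attempt
                    heapq.heappush(pq, (t_j, j))
    return tau
```

Note that a node may receive several successful infection attempts; only the earliest one determines its infection time, which the priority queue enforces.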
The goal then is, based on the observed set of cascade infection times D, to infer the weighted adjacency matrix A, where A_{ij} models the edge transmission probability.

Maximum Likelihood Formulation. Let D be the set of observed cascades. For each cascade c, let τ_i^c be the time of infection for node i. Note that if node i did not get infected in cascade c, then τ_i^c = ∞. Also, let X^c(t) denote the set of all nodes that are in an infected state at time t in cascade c. We know the infection of each node was the result of an unknown, previously infected node to which it is connected, so the component of the likelihood function for each infection will depend on all previously infected nodes. Specifically, the likelihood function for a fixed given A is

L(A; D) = \prod_{c \in D} \Big[ \Big( \prod_{i;\, \tau_i^c < \infty} P\big(i \text{ infected at } \tau_i^c \mid X^c(\tau_i^c)\big) \Big) \cdot \Big( \prod_{i;\, \tau_i^c = \infty} P\big(i \text{ never infected} \mid X^c(t)\ \forall t\big) \Big) \Big]

        = \prod_{c \in D} \Big[ \Big( \prod_{i;\, \tau_i^c < \infty} \Big( 1 - \prod_{j;\, \tau_j^c \le \tau_i^c} \big(1 - w(\tau_i^c - \tau_j^c)\, A_{ji}\big) \Big) \Big) \cdot \Big( \prod_{i;\, \tau_i^c = \infty} \ \prod_{j;\, \tau_j^c < \infty} (1 - A_{ji}) \Big) \Big].

The likelihood function is composed of two terms. Consider some cascade c. First, for every node i that got infected at time τ_i^c we compute the probability that at least one other previously infected node could have infected it. Second, for every non-infected node, we compute the probability that no other node ever infected it. Note that we assume that both the cascades and the infections are conditionally independent. 
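The likelihood above can be evaluated directly on small instances. A sketch under our own conventions (a cascade is a dict mapping infected nodes to times, the seed's exogenous infection at time 0 is skipped, and `w` is the transmission-time density):

```python
import math

def log_likelihood(A, cascades, w):
    """log L(A; D). cascades: list of {node: infection time}; nodes
    absent from a dict were never infected (tau = infinity).
    w(dt) is the transmission-time density, assumed <= 1."""
    n = len(A)
    ll = 0.0
    for tau in cascades:
        for i in range(n):
            if i in tau:
                if tau[i] == 0.0:
                    continue  # cascade seed: infected exogenously
                # 1 - P(no previously infected j transmitted to i)
                p_none = 1.0
                for j, t_j in tau.items():
                    if j != i and t_j <= tau[i]:
                        p_none *= 1.0 - w(tau[i] - t_j) * A[j][i]
                ll += math.log(1.0 - p_none)
            else:
                # i never infected: no infected node j ever transmitted to i
                for j in tau:
                    ll += math.log(1.0 - A[j][i])
    return ll
```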
Moreover, in the case of the SIS model each node can be infected multiple times during a single cascade, so there will be multiple observed values for each τ_i^c and the likelihood function would have to include each infection time in the product. We omit this detail for the sake of clarity.

The maximum likelihood estimate of A is then a solution of min_A −log L(A; D) subject to the constraints 0 ≤ A_{ij} ≤ 1 for each i, j.

Since a node cannot infect itself, the diagonal of A is strictly zero, leaving the optimization problem with N(N − 1) variables. This makes scaling to large networks problematic. We can, however, break this problem into N independent subproblems, each with only N − 1 variables, by observing that the incoming edges of a node can be inferred independently of the incoming edges of any other node. Note that there is no restriction on the structure of A (for example, it is not in general a stochastic matrix), so the columns of A can be inferred independently.

Let node i be the current node of interest for which we would like to infer its incoming connections. Then the MLE of the ith column of A (designated A_{:,i}), which models the strengths of i's incoming edges, is the solution of min_{A_{:,i}} −log L_i(A_{:,i}; D), subject to the constraints 0 ≤ A_{ji} ≤ 1 for each j, where

L_i(A_{:,i}; D) = \prod_{c \in D;\, \tau_i^c < \infty} \Big[ 1 - \prod_{j;\, \tau_j^c \le \tau_i^c} \big(1 - w(\tau_i^c - \tau_j^c)\, A_{ji}\big) \Big] \cdot \prod_{c \in D;\, \tau_i^c = \infty} \Big[ \prod_{j \in c;\, \tau_j^c < \infty} (1 - A_{ji}) \Big].

Lastly, the number of variables can be further reduced by observing that if node j is never infected in the same cascade as node i, then the MLE of A_{ji} is 0, and A_{ji} can thus be excluded from the set of variables. 
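This variable-reduction step can be read off directly from the cascades. A small helper (our sketch) collects, for a target node i, the only nodes j whose weight A_{ji} is not forced to zero, namely those infected no later than i in at least one cascade:

```python
def candidate_parents(i, cascades):
    """Nodes j for which A[j][i] need not be fixed at zero.
    cascades: list of {node: infection time} dicts."""
    cands = set()
    for tau in cascades:
        if i in tau:
            # only nodes infected before (or at) i's infection time can be parents
            cands.update(j for j, t in tau.items() if t <= tau[i] and j != i)
    return cands
```

In sparse cascades this set is far smaller than N − 1, which is what makes the per-node subproblems cheap in practice.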
This dramatically reduces the number of variables, since in practice the true A does not induce large cascades, so cascades are sparse in the number of nodes they infect [14, 17].

Towards the convex problem. The Hessian of the log-likelihood/likelihood function is indefinite in general, and this could make finding the globally optimal MLE of A difficult. Here, we derive a convex optimization problem that is equivalent to the above MLE problem. This not only guarantees convergence to a globally optimal solution, but also allows the use of highly optimized convex programming methods.

We begin with the problem max_{A_{:,i}} L_i(A_{:,i}; D) subject to 0 ≤ A_{ji} ≤ 1 for each j. If we make the change of variables B_{ji} = 1 − A_{ji} and \gamma_c = 1 - \prod_{j \in X^c(\tau_i^c)} \big(1 - w(\tau_i^c - \tau_j^c)\, A_{ji}\big), the problem becomes

\max_{\gamma_c, B_{:,i}} \ \prod_{c \in D;\, \tau_i^c < \infty} \gamma_c \ \cdot \prod_{c \in D;\, \tau_i^c = \infty} \ \prod_{j \in c;\, \tau_j^c < \infty} B_{ji}

subject to

    0 \le B_{ji} \le 1 \quad \forall j,
    0 \le \gamma_c \le 1 \quad \forall c,
    \gamma_c + \prod_{j \in X^c(\tau_i^c)} \big(1 - w_j^c + w_j^c B_{ji}\big) \le 1 \quad \forall c,

where we use the shorthand notation w_j^c ≡ w(τ_i^c − τ_j^c) (note that i is fixed). Also, note that the last constraint on γ_c is an inequality instead of an equality constraint. The objective function strictly increases when either γ_c or B_{ji} increases, so this inequality will always be binding at the solution, i.e., the equality will always hold. The reason we use the inequality is that it turns the constraint into an upper bound on a posynomial (assuming w(t) ≤ 1 ∀ t). Furthermore, with this change of variables the objective function is a monomial, and our problem satisfies all the requirements of a geometric program. 
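The substitution can be sanity-checked numerically on a toy instance (our sketch, for a single infected cascade with a few candidate parents): with B_ji = 1 − A_ji and γ_c defined as above, the factor 1 − ∏(1 − w A) equals γ_c, and the posynomial in the last constraint equals 1 − γ_c, so the constraint is binding exactly at equality.

```python
import math
import random

random.seed(1)
# toy data for one cascade: w_j^c values and the A_ji column, chosen at random
w_vals = [random.random() for _ in range(3)]
a_col = [random.random() for _ in range(3)]

# change of variables
b_col = [1.0 - a for a in a_col]                               # B_ji = 1 - A_ji
gamma = 1.0 - math.prod(1.0 - w * a for w, a in zip(w_vals, a_col))

# original likelihood factor for this infected cascade equals gamma
factor = 1.0 - math.prod(1.0 - w * a for w, a in zip(w_vals, a_col))
assert abs(factor - gamma) < 1e-12

# the posynomial in the constraint evaluates to 1 - gamma,
# i.e., gamma + prod(1 - w + w*B) = 1 holds with equality
posy = math.prod(1.0 - w + w * b for w, b in zip(w_vals, b_col))
assert abs(gamma + posy - 1.0) < 1e-12
```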
Now, in order to convexify the geometric program, we apply the change of variables \hat\gamma_c = \log(\gamma_c) and \hat B_{ji} = \log(B_{ji}), and take the reciprocal of the objective function to turn the problem into a minimization. Finally, we take the logarithm of the objective function as well as of the constraints, and we are left with the following convex optimization problem:

\min_{\hat\gamma_c, \hat B_{:,i}} \ \sum_{c \in D;\, \tau_i^c < \infty} -\hat\gamma_c \ - \sum_{c \in D;\, \tau_i^c = \infty} \ \sum_{j \in c;\, \tau_j^c < \infty} \hat B_{ji}

subject to

    \hat B_{ji} \le 0 \quad \forall j,
    \hat\gamma_c \le 0 \quad \forall c,
    \log\Big[ \exp \hat\gamma_c + \prod_{j;\, \tau_j^c \le \tau_i^c} \big(1 - w_j^c + w_j^c \exp \hat B_{ji}\big) \Big] \le 0 \quad \forall c.

Network sparsity. In general, social networks are sparse, in the sense that on average nodes are connected to a constant number rather than a constant fraction of other nodes in the network. To encourage a sparse MLE solution, an l1 penalty term can be added to the original (pre-convexification) log-likelihood function, making the objective function

-\log L_i(A_{:,i}; D) + \rho \sum_{j=1}^{N} |A_{ji}|,

where ρ is the sparsity parameter. Experimentation has indicated that including this penalty function dramatically increases the performance of the method; however, if we apply the same convexification process to this new augmented objective function, the resulting function is

\sum_{c \in D;\, \tau_i^c < \infty} -\hat\gamma_c \ - \sum_{c \in D;\, \tau_i^c = \infty} \ \sum_{j \in c;\, \tau_j^c < \infty} \hat B_{ji} \ - \ \rho \sum_{j=1}^{N} \exp \hat B_{ji},

whose penalty term is concave, making the whole problem non-convex. Instead, we propose the use of the penalty function \rho \sum_{j=1}^{N} \frac{1}{1 - A_{ji}}. This penalty function still promotes a sparse solution, and even though we no longer have a geometric program, we can convexify the objective function so that global convexity is preserved:

\sum_{c \in D;\, \tau_i^c < \infty} -\hat\gamma_c \ - \sum_{c \in D;\, \tau_i^c = \infty} \ \sum_{j \in c;\, \tau_j^c < \infty} \hat B_{ji} \ + \ \rho \sum_{j=1}^{N} \exp\big(-\hat B_{ji}\big).

Implementation. We use the SNOPT7 library to solve the likelihood optimization. We break the network inference down into a series of subproblems corresponding to the inference of the inbound edges of each node. Special care is needed for the sparsity penalty function. The presence of the l1 penalty makes the method extremely effective at predicting the presence of edges in the network, but it has the effect of distorting the estimated edge transmission probabilities. To correct for this, the inference problem is first solved with the l1 penalty. In the resulting solution, the edge transmission probabilities that have been set to zero are then restricted to remain at zero, and the problem is re-solved with the sparsity parameter set to ρ = 0. This preserves the precision and recall of the algorithm's edge predictions while still generating accurate edge transmission probability estimates. Moreover, with the implementation described above, most 1000-node networks can be inferred inside of 10 minutes, running on a laptop. A freely-distributable (but non-scalable) MATLAB implementation can be found at http://snap.stanford.edu/connie.

3 Experiments

In this section, we evaluate our network inference method, which we refer to as ConNIe (Convex Network Inference), on a range of datasets and network topologies. This includes both synthetically generated networks as well as real social networks, and both simulated and real diffusion data. 
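The paper solves these per-node subproblems with SNOPT7; as an illustration only, the same convex program (with the exp(−B̂) penalty) can be handed to a general-purpose NLP solver. A sketch using scipy's SLSQP, with our own conventions for cascades and a crude −20 lower bound standing in for −∞:

```python
import numpy as np
from scipy.optimize import minimize

def infer_incoming_edges(i, cascades, w, rho=0.1):
    """Sketch of the per-node convex subproblem (a stand-in for the
    authors' SNOPT7 setup). cascades: list of {node: infection time};
    w(dt): transmission-time density (assumed <= 1). Returns {j: A_ji}."""
    # candidate parents: nodes infected before i in some cascade
    cands = sorted({j for tau in cascades if i in tau
                    for j, t in tau.items() if t < tau[i]})
    idx = {j: k for k, j in enumerate(cands)}
    pos = [tau for tau in cascades if i in tau and tau[i] > 0]  # i infected (non-seed)
    neg = [tau for tau in cascades if i not in tau]             # i never infected
    nb, nc = len(cands), len(pos)

    # x = [Bhat_j for candidates] + [gammahat_c for infected cascades]
    def objective(x):
        bh = x[:nb]
        return (-x[nb:].sum()
                - sum(bh[idx[j]] for tau in neg for j in tau if j in idx)
                + rho * np.exp(-bh).sum())

    cons = []
    for c, tau in enumerate(pos):
        parents = [(idx[j], w(tau[i] - t)) for j, t in tau.items() if t < tau[i]]
        def g(x, c=c, parents=parents):
            bh = x[:nb]
            prod = np.prod([1 - wc + wc * np.exp(bh[k]) for k, wc in parents])
            return -np.log(np.exp(x[nb + c]) + prod)   # feasible when >= 0
        cons.append({"type": "ineq", "fun": g})

    bounds = [(-20.0, 0.0)] * (nb + nc)                # Bhat, gammahat <= 0
    x0 = np.concatenate([np.full(nb, -0.1), np.full(nc, -5.0)])
    res = minimize(objective, x0, method="SLSQP", bounds=bounds, constraints=cons)
    bh = res.x[:nb]
    return {j: float(1.0 - np.exp(bh[idx[j]])) for j in cands}
```

Since the program is convex in (B̂, γ̂), any solver that reaches a local optimum here has in fact found the global one; recovering A_ji = 1 − exp(B̂_ji) inverts the change of variables.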
In our experiments we focus on the SI model, as it best applies to the real data we use.

3.1 Synthetic data

Each of the synthetic data experiments begins with the construction of the network. We ran our algorithm on a directed scale-free network constructed using the preferential attachment model [3], and also on an Erdős–Rényi random graph. Both networks have 512 nodes and 1024 edges. In each case, the networks were constructed as unweighted graphs, and then each edge (i, j) was assigned a uniformly random transmission probability A_{ij} between 0.05 and 1.

Transmission time model. In all of our experiments, we assume that the model w(t) of transmission times is known. We experimented with various realistic models for the transmission time [2]: exponential (w(t) = \alpha e^{-\alpha t}), power-law (w(t) \propto (\alpha - 1) t^{-\alpha}), and the Weibull distribution (w(t) = \frac{k}{\alpha} (\frac{t}{\alpha})^{k-1} e^{-(t/\alpha)^k}), as it has been argued that a Weibull distribution with α = 9.5 and k = 2.3 best describes the propagation model of the SARS outbreak in Hong Kong [26]. Notice that our method does not make any assumption about the structure of w(t). For example, our approach can handle the exponential and power-law distributions, which both have a mode at 0 and monotonically decrease in t, as well as the Weibull distribution, which can have its mode at any value.

We generate cascades by first selecting a random starting node of the infection. From there, the infection is propagated to other nodes until no new infections occur: an infected node i transmits the infection to an uninfected node j with probability A_{ij}, and if transmission occurs then the propagation time t is sampled according to the distribution w(t). 
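All three transmission-time models can be sampled with standard generators. A sketch using the parameterizations given above (note that numpy's `pareto` draws a Lomax variate, hence the +1 shift to obtain the t ≥ 1 power law):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_w(model, alpha, k=None, size=1):
    """Draw transmission times for the three w(t) models in the text
    (our sketch, not the authors' code)."""
    if model == "exponential":   # w(t) = alpha * exp(-alpha * t)
        return rng.exponential(scale=1.0 / alpha, size=size)
    if model == "powerlaw":      # w(t) = (alpha - 1) * t^(-alpha), t >= 1
        return rng.pareto(alpha - 1.0, size=size) + 1.0
    if model == "weibull":       # w(t) = (k/alpha)(t/alpha)^(k-1) exp(-(t/alpha)^k)
        return alpha * rng.weibull(k, size=size)
    raise ValueError(f"unknown model: {model}")
```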
The cascade is then given to the algorithm in the form of a series of timestamps corresponding to when each node was infected.

Figure 1: (a)-(c): Precision and recall of ConNIe compared to NetInf for the SI diffusion model, run on a synthetic scale-free graph with synthetically generated cascades. Transmission time models used are power law (PL), exponential (Exp), and Weibull (WB). All networks contain 512 nodes, and the weight of each edge was sampled from a uniform random distribution between 0 and 1. For the MLE method, the PR curves were generated by varying the sparsity parameter ρ between 0 and 1000. (d)-(f): Mean square error of the edge transmission probabilities for the two algorithms. The dotted green line indicates the number of edges in the true network.

Not to make the problem too easy, we generate just enough cascades so that 99% of all edges of the network transmitted at least one infection. The number of cascades needed for this depends on the underlying network. Overall, we generate on the same order of cascades as there are nodes in the network.

Quantifying performance. 
To assess the performance of ConNIe, we consider both the accuracy of the edge prediction and the accuracy of the edge transmission probabilities. For edge prediction, we record the precision and recall of the algorithm. We simply vary the value of ρ to obtain networks with different numbers of edges, and then for each such inferred network we compute the precision (the number of correctly inferred edges divided by the total number of inferred edges) and the recall (the number of correctly inferred edges divided by the total number of edges in the unobserved network). For large values of ρ inferred networks have high precision but low recall, while for low values of ρ the precision will be poor but the recall will be high.

To assess the accuracy of the estimated edge transmission probabilities A_{ij}, we compute the mean-square error (MSE). The MSE is taken over the union of the potential edge positions (node pairs) where there is an edge in the latent network and the edge positions in which the algorithm has predicted the presence of an edge. For potential edge locations with no edge present, the weight is set to 0.

Comparison to other methods. We compare our approach to NetInf, an iterative algorithm based on submodular function optimization [10]. NetInf first reconstructs the most likely structure of each cascade, and then, based on this reconstruction, it selects the next most likely edge of the social network. The algorithm assumes that the weights of all edges have the same constant value (i.e., all nonzero A_{ij} have the same value). 
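The edge-prediction and MSE metrics described above can be computed directly from the true and inferred matrices; a sketch (the tolerance and empty-set conventions are ours):

```python
def edge_metrics(A_true, A_pred, tol=1e-9):
    """Precision/recall over edge presence, and MSE over the union of
    true and predicted edge positions, as described in the text."""
    n = len(A_true)
    true_e = {(i, j) for i in range(n) for j in range(n)
              if i != j and A_true[i][j] > tol}
    pred_e = {(i, j) for i in range(n) for j in range(n)
              if i != j and A_pred[i][j] > tol}
    tp = len(true_e & pred_e)
    precision = tp / len(pred_e) if pred_e else 1.0
    recall = tp / len(true_e) if true_e else 1.0
    union = true_e | pred_e   # absent edges contribute weight 0
    mse = (sum((A_true[i][j] - A_pred[i][j]) ** 2 for i, j in union)
           / len(union)) if union else 0.0
    return precision, recall, mse
```

Sweeping ρ and recomputing these metrics traces out the precision-recall curves reported below.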
To apply this algorithm to the problem we are considering, we first use NetInf to infer the network structure and then estimate the edge transmission probabilities A_{ij} by counting the fraction of times a cascade was predicted to have propagated along the edge (i, j).

Figure 1 shows the precision-recall curves for the scale-free synthetic network with the three transmission models w(t). The results for the Erdős–Rényi random graph were omitted due to space restrictions, but they were very similar. Notice that our approach achieves the break-even point (the point where precision equals recall) well above 0.85. This is a notable result: we were especially careful not to generate too many cascades, since more cascades mean more evidence that makes the problem easier. Also in Figure 1 we plot the mean squared error of the estimates of the edge transmission probability A_{ij} as a function of the number of edges in the inferred network. The green vertical line indicates the point where the inferred network contains the same number of edges as the real network. Notice that ConNIe estimates the edge weights with error less than 0.05, which is more than a factor of two smaller than the error of the NetInf algorithm. This, of course, is expected, as NetInf assumes the network edge weights are homogeneous, which is not the case.

Figure 2: (a)-(b): Precision-recall break-even point for the two methods as a function of the number of observed cascades, with a power law (PL) and exponential (EXP) transmission distribution. (c)-(d): Mean square error at the PR break-even point as a function of the number of observed cascades. (e): PR break-even point versus the perturbation size applied to the infection times. (f): Runtime as a function of network size.

We also tested the robustness of our algorithm. Figure 2 shows the accuracy (precision-recall break-even point as well as edge MSE) as a function of the number of observed diffusions, as well as the effect of noise in the infection times. Noise was added to the cascades by adding independent normally distributed perturbations to each of the observed infection times, and the noise-to-signal ratio was calculated as the average perturbation over the average infection transmission time. The plot shows that ConNIe is robust against such perturbations, as it can still accurately infer the network with noise-to-signal ratios as high as 0.4.

3.2 Experiments on Real data

Real social networks. We also experiment with three real-world networks. First, we consider a small collaboration network between 379 scientists doing research on networks. 
Second, we ex-\nperiment on a real email social network of 593 nodes and 2824 edges that is based on the email\ncommunication in a small European research institute.\n\nFor the edges in the collaboration network we simply randomly assigned their edge transmission\nprobabilities. For the email network, the number of emails sent from a person i to a person j\nindicates the connection strength. Let there be a rumor cascading through a network, and assume\nthe probability that any one email contains this rumor is \ufb01xed at \u03be. Then if person i sent person j\nmij emails, the probability of i infecting j with the rumor is Aij = 1 \u2212 (1 \u2212 \u03c6)(1 \u2212 \u03be)mij . The\nparameter \u03c6 simply enforces a minimum edge weight between the pairs who have exchanged least\none email. We set \u03be = .001 and \u03c6 = .05.\nFor the email network we generated cascades using the power-law transmission time model, while\nfor the collaboration network we used the Weibull distribution for sampling transmission times. We\nthen ran the network inference on cascades, and Figure 3 gives the results. Similarly as with syn-\nthetic networks our approach achieves break even points of around 0.95 on both datasets. Moreover,\nthe edge transmission probability estimation error is less than 0.03. This is ideal: our method is ca-\npable of near perfect recovery of the underlying social network over which a relatively small number\nof contagions diffused.\n\n7\n\n\fi\n\ni\n\nn\no\ns\nc\ne\nr\nP\n\ni\n\ni\n\nn\no\ns\nc\ne\nr\nP\n\n1\n\n0.8\n\n0.6\n\n0.4\n\n0.2\n\n0\n\n \n0\n\n1\n\n0.8\n\n0.6\n\n0.4\n\n0.2\n\n0\n\n \n0\n\nConNIe\nNetinf\n\n0.5\n\nRecall\n\nConNIe\nNetinf\n\n0.5\n\nRecall\n\n \n\n1\n\n \n\n1\n\n0.04\n\n0.03\n\nE\nS\nM\n\n0.02\n\n0.01\n\n \n\n0\n2000\n\n0.055\n\n0.05\n\n0.045\n\n0.04\n\n0.035\n\nE\nS\nM\n\n0.03\n\n \n\n600\n\n \n\nConNIe\nNetinf\n\n2500\n3000\nNum. of Edges\n\n3500\n\n \n\nConNIe\nNetinf\n\n700\n\n800\n\n900\n\n1000\n\nNum. 
Figure 3: The precision-recall curve of the network estimation (left) and the mean-square error of the predicted transmission probabilities as a function of the number of edges being predicted (middle). The top row shows the results for the email network, and the bottom row for the collaboration network. (Right) Precision-recall curve for inferring a real recommendation network based on real product recommendation data.

Real social networks and real cascades. Last, we investigate a large person-to-person recommendation network, consisting of four million people who made sixteen million recommendations on half a million products [14]. People generate cascades as follows: a node (person) v buys product p at time t, and then recommends it to nodes {w1, . . . , wn}. These nodes wi can then buy the product (with the option to recommend it to others). We trace cascades of purchases on a small subset of the data. We consider a recommendation network of 275 users and 1522 edges and a set of 5,767 recommendations on 625 different products between these users. Since the edge transmission model is unknown, we model it with a power-law distribution with parameter α = 2.

We present the results in the rightmost plot of Figure 3. Our approach recovers the underlying social network surprisingly accurately: the break-even point of our approach is 0.74, while NetInf scores 0.55. Moreover, our approach took less than 20 seconds to infer this network.
Since there are no ground-truth edge transmission probabilities to compare against, we cannot compute the error of the edge weight estimation.

4 Conclusion

We have presented a general solution to the problem of inferring latent social networks from network diffusion data. We formulated a maximum likelihood problem, and by solving an equivalent convex problem we can guarantee the optimality of the solution. Furthermore, l1 regularization can be used to enforce a sparse solution while still preserving convexity. We evaluated our algorithm on a wide set of synthetic and real-world networks with several different cascade propagation models, and found our method to be more general and robust than the competing approaches. Experiments reveal that our method near-perfectly recovers the underlying network structure as well as the parameters of the edge transmission model. Moreover, our approach scales well: it can infer optimal networks on thousands of nodes in a matter of minutes.

One possible avenue for future work is to also learn the parameters of the underlying model of diffusion times w(t). It would be fruitful to apply our approach to other datasets, like the spread of a news story breaking across the blogosphere, a SARS outbreak, or a new marketing campaign on a social networking website, and to extend it to additional models of diffusion. By inferring and modeling the structure of such latent social networks, we can gain insight into the positions and roles various nodes play in the diffusion process and assess the range of influence of nodes in the network.

Acknowledgements. This research was supported in part by NSF grants CNS-1010921, IIS-1016909, LLNL grant B590105, the Albert Yu and Mary Bechmann Foundation, IBM, Lightspeed, Microsoft and Yahoo.

References

[1] A. Ahmed and E. Xing. Recovering time-varying networks of dependencies in social and biological studies.
PNAS, 106(29):11878, 2009.
[2] N. T. J. Bailey. The Mathematical Theory of Infectious Diseases and its Applications. Hafner Press, 2nd edition, 1975.
[3] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 1999.
[4] M. Choudhury, W. A. Mason, J. M. Hofman, and D. J. Watts. Inferring relevant social networks from interpersonal communication. In WWW '10, pages 301–310, 2010.
[5] N. Eagle, A. S. Pentland, and D. Lazer. Inferring friendship network structure by using mobile phone data. PNAS, 106(36):15274–15278, 2009.
[6] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostat, 9(3):432–441, 2008.
[7] L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. JMLR, 3:707, 2003.
[8] Z. Ghahramani. Learning dynamic Bayesian networks. Adaptive Processing of Sequences and Data Structures, page 168, 1998.
[9] L. Giot, J. Bader, C. Brouwer, A. Chaudhuri, B. Kuang, Y. Li, Y. Hao, C. Ooi, B. Godwin, et al. A protein interaction map of Drosophila melanogaster. Science, 302(5651):1727, 2003.
[10] M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In KDD '10, 2010.
[11] S. Hill, F. Provost, and C. Volinsky. Network-based marketing: Identifying likely adopters via consumer networks. Statistical Science, 21(2):256–276, 2006.
[12] R. Jansen, H. Yu, D. Greenbaum, et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 302(5644):449–453, October 2003.
[13] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 2006.
[14] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM TWEB, 1(1):2, 2007.
[15] J. Leskovec, L. Backstrom, and J. Kleinberg.
Meme-tracking and the dynamics of the news cycle. In KDD '09, pages 497–506, 2009.
[16] J. Leskovec and E. Horvitz. Planetary-scale views on a large instant-messaging network. In WWW '08, 2008.
[17] J. Leskovec, A. Singh, and J. M. Kleinberg. Patterns of influence in a recommendation network. In PAKDD '06, pages 380–389, 2006.
[18] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM '03, pages 556–559, 2003.
[19] N. Meinshausen and P. Buehlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, pages 1436–1462, 2006.
[20] M. Middendorf, E. Ziv, and C. Wiggins. Inferring network mechanisms: the Drosophila melanogaster protein interaction network. PNAS, 102(9):3192, 2005.
[21] M. Schmidt, A. Niculescu-Mizil, and K. Murphy. Learning graphical model structure using l1-regularization paths. In AAAI, volume 22, page 1278, 2007.
[22] L. Song, M. Kolar, and E. Xing. Time-varying dynamic Bayesian networks. In NIPS '09.
[23] B. Taskar, M. F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. NIPS '03.
[24] J. Vert and Y. Yamanishi. Supervised graph inference. NIPS '05.
[25] M. J. Wainwright, P. Ravikumar, and J. D. Lafferty. High-dimensional graphical model selection using ℓ1-regularized logistic regression. In PNAS, 2006.
[26] J. Wallinga and P. Teunis. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Amer. J. of Epidemiology, 160(6):509–516, 2004.
[27] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.