{"title": "Learning Networks of Heterogeneous Influence", "book": "Advances in Neural Information Processing Systems", "page_first": 2780, "page_last": 2788, "abstract": "Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting events in the future. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we attempt to address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence among different entities in a network are heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the influence functions between networked entities.", "full_text": "Learning Networks of Heterogeneous In\ufb02uence\n\nNan Du\u2217 Le Song\u2217 Alex Smola\u2020 Ming Yuan\u2217\nGeorgia Institute of Technology\u2217, Google Research\u2020\ndunan@gatech.edu lsong@cc.gatech.edu\nalex@smola.org myuan@isye.gatech.edu\n\nAbstract\n\nInformation, disease, and in\ufb02uence diffuse over networks of entities in both nat-\nural systems and human society. Analyzing these transmission networks plays\nan important role in understanding the diffusion processes and predicting future\nevents. 
However, the underlying transmission networks are often hidden and incomplete,\nand we observe only the time stamps when cascades of events happen.\nIn this paper, we address the challenging problem of uncovering the hidden network\nonly from the cascades. The structure discovery problem is complicated by\nthe fact that the influence between networked entities is heterogeneous, which cannot\nbe described by a simple parametric model. Therefore, we propose a kernel-based\nmethod which can capture a diverse range of different types of influence\nwithout any prior assumption. In both synthetic and real cascade data, we show\nthat our model can better recover the underlying diffusion network and drastically\nimprove the estimation of the transmission functions among networked entities.\n\n1 Introduction\nNetworks have been powerful abstractions for modeling a variety of natural and artificial systems\nthat consist of a large collection of interacting entities. Due to the recent increasing availability\nof large-scale networks, network modeling and analysis have been extensively applied to study\nthe spreading and diffusion of information, ideas, and even viruses in social and information networks\n(see e.g., [17, 5, 18, 1, 2]). However, the process of influence and diffusion often occurs in\na hidden network that might not be easily observed and identified directly. For instance, when a\ndisease spreads among people, epidemiologists can know only when a person gets sick, but they\ncan hardly ever know where and from whom he or she got infected. Similarly, when consumers\nrush to buy some particular products, marketers can know when purchases occurred, but they cannot\ntrack further where the recommendations originally came from [12]. In all such cases, we can\nobserve only the time stamp when a piece of information has been received by a particular entity,\nbut the exact path of diffusion is missing. 
Therefore, it is an interesting and challenging question\nwhether we can uncover the diffusion paths based just on the time stamps of the events.\nThere are many recent studies on estimating correlation or causal structures from multivariate\ntime-series data (see e.g., [2, 6, 13]). However, in these models, time is treated as a discrete index and not\nmodeled as a random variable. In the diffusion network discovery problem, time is treated explicitly\nas a continuous variable, and one is interested in capturing how the occurrence of an event at one\nnode affects the time of its occurrence at other nodes. This problem has recently been explored by\na number of studies in the literature. Specifically, Meyers and Leskovec inferred the diffusion network\nby learning the infection probability between two nodes using convex programming, in a method called\nCONNIE [14]. Gomez-Rodriguez et al. inferred the network connectivity using submodular optimization,\nin a method called NETINF [4]. However, both CONNIE and NETINF assume that the transmission\nmodel for each pair of nodes is fixed with a predefined transmission rate. Recently, Gomez-Rodriguez\net al. proposed an elegant method, called NETRATE [3], using a continuous temporal dynamics\nmodel to allow variable diffusion rates across network edges. NETRATE makes fewer\nassumptions and achieves better performance in various aspects than the previous two approaches.\nHowever, the limitation of NETRATE is that it requires the influence model on each edge to have a\nfixed parametric form, such as an exponential, power-law, or Rayleigh distribution, although the model\nparameters learned from cascades could be different.\n\n(a) Pair 1\n\n(b) Pair 2\n\n(c) Pair 3\n\nFigure 1: The histograms of the interval between the time when a post appeared in one site and the time when\na new post in another site links to it. Dotted and dashed lines are densities fitted by NETRATE. The solid lines are\ngiven by KernelCascade.\n\nIn practice, the patterns of information diffusion (or a spreading disease) among entities can be quite\ncomplicated and different from each other, going far beyond what a single family of parametric\nmodels can capture. For example, on Twitter, an active user can be online for more than 12 hours a\nday, and he may instantly respond to any interesting message. However, an inactive user may just\nlog in and respond once a day. As a result, the spreading pattern of the messages between the active\nuser and his friends can be quite different from that of the inactive user.\nAnother example is from the information diffusion in the blogosphere: the hyperlinks between posts can\nbe viewed as some kind of information flow from one media site to another, and the time difference\nbetween two linked posts reveals the pattern of diffusion. In Figure 1, we examined three pairs of\nmedia sites from the MemeTracker dataset [3, 9], and plotted the histograms of the intervals between\nthe moment when a post first appeared in one site and the moment when it was linked by a new\npost in another site. We can observe that information can have very different transmission patterns\nfor these pairs. Parametric models fitted by NETRATE may capture the simple pattern in Figure 1(a),\nbut they might miss the multimodal patterns in Figure 1(b) and Figure 1(c). In contrast, our method,\ncalled KernelCascade, is able to fit both kinds of pattern accurately and thus can handle the heterogeneity.\nIn the remainder of this paper, we present the details of our approach KernelCascade. Our key idea\nis to model the continuous information diffusion process using survival analysis by kernelizing the\nhazard function. 
We obtain a convex optimization problem with group-lasso-type regularization\nand develop a fast block-coordinate descent algorithm for solving the problem. The sparsity patterns\nof the coefficients provide us with the structure of the diffusion network. In both synthetic and real-world\ndata, our method can better recover the underlying diffusion networks and drastically improve the\nestimation of the transmission functions among networked entities.\n2 Preliminary\nIn this section, we present some basic concepts from survival analysis [7, 8], which are essential\nfor our later modeling. Given a nonnegative random variable T corresponding to the time when an\nevent happens, let f(t) be the probability density function of T and $F(t) = \\Pr(T \\le t) = \\int_0^t f(x)\\,dx$\nbe its cumulative distribution function. The probability that an event does not happen up to time t\nis thus given by the survival function S(t) = Pr(T \u2265 t) = 1 \u2212 F(t). The survival function is a\ncontinuous and monotonically decreasing function with S(0) = 1 and S(\u221e) = lim_{t\u2192\u221e} S(t) = 0.\nGiven f(t) and S(t), we can define the instantaneous risk (or rate) that an event which has not happened\nup to time t happens at time t by the hazard function\n\n$$h(t) = \\lim_{\\Delta t \\to 0} \\frac{\\Pr(t \\le T \\le t + \\Delta t \\mid T \\ge t)}{\\Delta t} = \\frac{f(t)}{S(t)}. \\qquad (1)$$\n\nWith this definition, h(t)\u2206t will be the approximate probability that an event happens in [t, t + \u2206t)\ngiven that the event has not happened yet up to t. Furthermore, the hazard function h(t) is also\nrelated to the survival function S(t) via the differential equation $h(t) = -\\frac{d}{dt} \\log S(t)$, where we\nhave used f(t) = \u2212S'(t). 
Solving the differential equation with boundary condition S(0) = 1, we\ncan recover the survival function S(t) and the density function f(t) based on the hazard function\nh(t), i.e.,\n\n$$S(t) = \\exp\\left(-\\int_0^t h(x)\\,dx\\right) \\quad \\text{and} \\quad f(t) = h(t) \\exp\\left(-\\int_0^t h(x)\\,dx\\right). \\qquad (2)$$\n\n[Figure 1 panels: x-axis t (hours), y-axis pdf; legend: histogram, exp, rayleigh, KernelCascade]\n\n(a) Hidden network\n\n(b) Node e gets infected at time t4\n\n(c) Node e survives\n\nFigure 2: Cascades over a hidden network. Solid lines in panel (a) represent connections in a hidden network.\nIn panels (b) and (c), filled circles indicate infected nodes while empty circles represent uninfected ones. Nodes\na, b, c and d are the parents of node e; they got infected at t0 < t1 < t2 < t3 respectively and tended to infect\nnode e. In panel (b), node e survives the influence of nodes a, b and c, shown in green dashed lines. However, it was infected\nby node d. In panel (c), node e survives even though all its parents got infected.\n3 Modeling Cascades using Survival Analysis\nWe use survival analysis to model information diffusion for networked entities. We will largely\nfollow the presentation of Gomez-Rodriguez et al. [3], but add clarification when necessary. We\nassume that there is a fixed population of N nodes connected in a directed network G = (V, E).\nNeighboring nodes are allowed to directly influence each other. Nodes along a directed path may\ninfluence each other only through a diffusion process. 
Because the true underlying network is unknown,\nour observations are only the time stamps when events occur to each node in the network.\nThe time stamps are then organized as cascades, each of which corresponds to a particular event.\nFor instance, a piece of news posted on the CNN website about \u201cFacebook went public\u201d can be treated\nas an event. It can spread across the blogosphere and trigger a sequence of posts from other sites\nreferring to it. Each site will have a time stamp when this particular piece of news is being discussed\nand cited. The goal of the model is to capture the interplay between the hidden diffusion network\nand the cascades of observed event time stamps.\nMore formally, a directed edge, j \u2192 i, is associated with a transmission function $f_{ji}(t_i \\mid t_j)$, which\nis the conditional likelihood of an event happening to node i at time $t_i$ given that the same event has\nalready happened to node j at time $t_j$. The transmission function attempts to capture the temporal\ndependency between the two successive events for nodes i and j. In addition, we focus on shift-invariant\ntransmission functions whose value only depends on the time difference, i.e.,\n$f_{ji}(t_i \\mid t_j) = f_{ji}(t_i - t_j) = f_{ji}(\\Delta_{ji})$ where $\\Delta_{ji} := t_i - t_j$. Given the likelihood function, we can compute the\ncorresponding survival function $S_{ji}(\\Delta_{ji})$ and hazard function $h_{ji}(\\Delta_{ji})$. When there is no directed\nedge j \u2192 i, the transmission function and hazard function are both identically zero, i.e., $f_{ji}(\\Delta_{ji}) = 0$\nand $h_{ji}(\\Delta_{ji}) = 0$, but the survival function is identically one, i.e., $S_{ji}(\\Delta_{ji}) = 1$. Therefore, the\nstructure of the diffusion network is reflected in the non-zero patterns of a collection of transmission\nfunctions (or hazard functions).\nA cascade is an N-dimensional vector $t^c := (t_1^c, \\ldots, t_N^c)^\\top$ with i-th dimension recording the time\nstamp when event c occurs to node i. Furthermore, $t_i^c \\in [0, T^c] \\cup \\{\\infty\\}$, and the symbol \u221e labels\nnodes that have not been influenced during observation window $[0, T^c]$; it does not imply that\nnodes are never influenced. The \u2018clock\u2019 is set to 0 at the start of each cascade. A dataset can\ncontain a collection, C, of cascades $\\{t^1, \\ldots, t^{|C|}\\}$. The time stamps assigned to nodes by a cascade\ninduce a directed acyclic graph (DAG) by defining node j as the parent of i if $t_j < t_i$. Thus, it\nis meaningful to refer to parents and children within a cascade [3], which is different from the\nparent-child structural relation on the true underlying diffusion network. Since the true network is\ninferred from many cascades (each of which imposes its own DAG structure), the inferred network\nis typically not a DAG.\nThe likelihood $\\ell(t^c)$ of a cascade induced by event c is then simply a product of the individual\nlikelihoods $\\ell_i(t^c)$ that event c occurs to each node i. Depending on whether event c actually occurs\nto node i in the data, we can compute this individual likelihood as:\nEvent c did occur at node i. We assume that once an event occurs at node i under the influence of\na particular parent j in a cascade, the same event will not happen again. In Figure 2(b), node e is\nsusceptible given its parents a, b, c and d. 
However, only node d is the first parent who infects node e.\nBecause each parent could be equally likely to first influence node i, the likelihood is just a simple\nsum over the likelihoods of the mutually disjoint events that node i has survived the influence\nof all the other parents except the first parent j, i.e.,\n\n$$\\ell_i^+(t^c) = \\sum_{j:\\, t_j^c < t_i^c} f_{ji}(\\Delta_{ji}^c) \\prod_{k:\\, k \\ne j,\\, t_k^c < t_i^c} S_{ki}(\\Delta_{ki}^c) = \\sum_{j:\\, t_j^c < t_i^c} h_{ji}(\\Delta_{ji}^c) \\prod_{k:\\, t_k^c < t_i^c} S_{ki}(\\Delta_{ki}^c).$$\n\n[...]\n\n$$\\underbrace{\\prod_{t_i^c > T^c} \\cdots}_{\\text{uninfected nodes}} \\times \\underbrace{\\prod_{t_i^c \\le T^c} \\cdots}_{\\text{infected nodes}} \\qquad (4)$$\n\nTherefore, the likelihood of all cascades is a product of these individual cascade likelihoods,\ni.e., $\\ell(\\{t^1, \\ldots, t^{|C|}\\}) = \\prod_{c=1,\\ldots,|C|} \\ell(t^c)$. In the end, we take the negative log of this likelihood\nfunction and regroup all terms associated with edges pointing to node i together to derive\n\n\uf8f6\uf8f7\uf8f8 (6)\n\n\uf8eb\uf8ec\uf8ed(cid:88)\nL({t1, . . . , t|C|}) = \u2212(cid:88)\n\n(cid:88)\n\n(cid:88)\n\n(cid:88)\n\ni\n\nj\n\n{c|tc\n\ni}\nj