{"title": "COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution", "book": "Advances in Neural Information Processing Systems", "page_first": 1954, "page_last": 1962, "abstract": "Information diffusion in online social networks is affected by the underlying network topology, but it also has the power to change it. Online users are constantly creating new links when exposed to new information sources, and in turn these links are alternating the way information spreads. However, these two highly intertwined stochastic processes, information diffusion and network evolution, have been predominantly studied separately, ignoring their co-evolutionary dynamics.We propose a temporal point process model, COEVOLVE, for such joint dynamics, allowing the intensity of one process to be modulated by that of the other. This model allows us to efficiently simulate interleaved diffusion and network events, and generate traces obeying common diffusion and network patterns observed in real-world networks. Furthermore, we also develop a convex optimization framework to learn the parameters of the model from historical diffusion and network evolution traces. We experimented with both synthetic data and data gathered from Twitter, and show that our model provides a good fit to the data as well as more accurate predictions than alternatives.", "full_text": "COEVOLVE: A Joint Point Process Model for\nInformation Diffusion and Network Co-evolution\n\nMehrdad Farajtabar\u2217\n\nShuang Li\u2217\n\nYichen Wang\u2217\n\nHongyuan Zha\u2217\n\nManuel Gomez-Rodriguez\u2020\n\nLe Song\u2217\n\nGeorgia Institute of Technology\u2217\n\nMPI for Software Systems\u2020\n\n{mehrdad,yichen.wang,sli370}@gatech.edu\n\n{zha,lsong}@cc.gatech.edu\n\nmanuelgr@mpi-sws.org\n\nAbstract\n\nInformation diffusion in online social networks is affected by the underlying net-\nwork topology, but it also has the power to change it. 
Online users are constantly\ncreating new links when exposed to new information sources, and in turn these\nlinks are altering the way information spreads. However, these two highly\nintertwined stochastic processes, information diffusion and network evolution, have\nbeen predominantly studied separately, ignoring their co-evolutionary dynamics.\nWe propose a temporal point process model, COEVOLVE, for such joint\ndynamics, allowing the intensity of one process to be modulated by that of the other.\nThis model allows us to efficiently simulate interleaved diffusion and network\nevents, and generate traces obeying common diffusion and network patterns\nobserved in real-world networks. Furthermore, we also develop a convex\noptimization framework to learn the parameters of the model from historical diffusion and\nnetwork evolution traces. We experimented with both synthetic data and data\ngathered from Twitter, and show that our model provides a good fit to the data as\nwell as more accurate predictions than alternatives.\n\n1 Introduction\nOnline social networks, such as Twitter or Weibo, have become large information networks where\npeople share, discuss and search for information of personal interest as well as breaking news [1].\nIn this context, users often forward to their followers information they are exposed to via their\nfollowees, triggering the emergence of information cascades that travel through the network [2],\nand constantly create new links to information sources, triggering changes in the network itself\nover time. 
Importantly, recent empirical studies with Twitter data have shown that information\ndiffusion and network evolution are coupled and that network changes are often triggered by information\ndiffusion [3, 4, 5].\nWhile there have been many recent works on modeling information diffusion [2, 6, 7, 8] and network\nevolution [9, 10, 11], most of them treat these two stochastic processes independently and separately,\nignoring the influence one may have on the other over time. Thus, to better understand information\ndiffusion and network evolution, there is an urgent need for joint probabilistic models of the two\nprocesses, which are largely nonexistent to date.\nIn this paper, we propose a probabilistic generative model, COEVOLVE, for the joint dynamics of\ninformation diffusion and network evolution. Our model is based on the framework of temporal\npoint processes, which explicitly characterize the continuous time interval between events, and it\nconsists of two interwoven and interdependent components (refer to Appendix B for an illustration):\nI. Information diffusion process. We design an \u201cidentity revealing\u201d multivariate Hawkes\nprocess [12] to capture the mutual excitation behavior of retweeting events, where the intensity of\nsuch events in a user is boosted by previous events from her time-varying set of followees.\nAlthough Hawkes processes have been used for information diffusion before [13, 14, 15, 16, 17, 18,\n19], the key innovation of our approach is to explicitly model the excitation due to a particular\nsource node, hence revealing the identity of the source. Such a design reflects the reality that\ninformation sources are explicitly acknowledged, and it also allows a particular information source to\nacquire new links at a rate according to her \u201cinformativeness\u201d.\n\nII. Network evolution process. 
We model link creation as an \u201cinformation driven\u201d survival process,\nand couple the intensity of this process with retweeting events. Although survival processes have\nbeen used for link creation before [20, 21], the key innovation in our model is to incorporate\nretweeting events as the driving force for such processes. Since our model has captured the source\nidentity of each retweeting event, new links will be targeted toward the information sources, with\nan intensity proportional to their degree of excitation and each source\u2019s influence.\n\nOur model is designed in such a way that it allows the two processes, information diffusion and\nnetwork evolution, to unfold simultaneously on the same time scale and exert bidirectional influence\non each other, allowing sophisticated coevolutionary dynamics to be generated (e.g., see Figure 5).\nImportantly, the flexibility of our model does not prevent us from efficiently simulating diffusion\nand link events from the model and learning its parameters from real world data:\n\u2022 Efficient simulation. We design a scalable sampling procedure that exploits the sparsity of the\ngenerated networks. Its complexity is O(nd log m), where n is the number of samples, m is the\nnumber of nodes and d is the maximum number of followees per user.\n\u2022 Convex parameter learning. We show that the model parameters that maximize the joint\nlikelihood of observed diffusion and link creation events can be found via convex optimization.\n\nFinally, we experimentally verify that our model can produce coevolutionary dynamics of\ninformation diffusion and network evolution, and generate retweet and link events that obey common\ninformation diffusion patterns (e.g., cascade structure, size and depth), static network patterns (e.g.,\nnode degree) and temporal network patterns (e.g., shrinking diameter) described in related\nliterature [22, 10, 23]. Furthermore, we show that, by modeling the coevolutionary dynamics, our model\nprovides significantly more accurate link and diffusion event predictions than alternatives on a large\nscale Twitter dataset [3].\n\n2 Background on Temporal Point Processes\nA temporal point process is a random process whose realization consists of a list of discrete events\nlocalized in time, {ti} with ti \u2208 R+ and i \u2208 Z+. Many different types of data produced in online\nsocial networks can be represented as temporal point processes, such as the times of retweets and\nlink creations. A temporal point process can be equivalently represented as a counting process, N (t),\nwhich records the number of events before time t. Let the history H(t) be the list of times of events\n{t1, t2, . . . , tn} up to but not including time t. Then, the number of observed events in a small time\nwindow dt between [t, t+dt) is $dN(t) = \\sum_{t_i \\in \\mathcal{H}(t)} \\delta(t - t_i)\\,dt$, and hence $N(t) = \\int_0^t dN(s)$, where\n\u03b4(t) is a Dirac delta function. More generally, given a function f (t), we can define the convolution\nwith respect to dN (t) as\n\n$$f(t) \\star dN(t) := \\int_0^t f(t - \\tau)\\,dN(\\tau) = \\sum_{t_i \\in \\mathcal{H}(t)} f(t - t_i). \\tag{1}$$\n\nThe point process representation of temporal data is fundamentally different from the discrete time\nrepresentation typically used in social network analysis. It directly models the time interval between\nevents as random variables, and avoids the need to pick a time window to aggregate events. It allows\ntemporal events to be modeled in a more fine grained fashion, and has a remarkably rich theoretical\nsupport [24].\nAn important way to characterize temporal point processes is via the conditional intensity function,\na stochastic model for the time of the next event given all the times of previous events. 
Formally, the conditional intensity function \u03bb\u2217(t) (intensity, for short) is the conditional probability of\nobserving an event in a small window [t, t + dt) given the history H(t), i.e.,\n\n$$\\lambda^*(t)\\,dt := \\mathbb{P}\\{\\text{event in } [t, t + dt) \\mid \\mathcal{H}(t)\\} = \\mathbb{E}[dN(t) \\mid \\mathcal{H}(t)], \\tag{2}$$\n\nwhere one typically assumes that only one event can happen in a small window of size dt,\ni.e., dN (t) \u2208 {0, 1}. Then, given a time t\u2032 \u2265 t, we can also characterize the conditional\nprobability that no event happens during [t, t\u2032) and the conditional density that an event occurs at time t\u2032\nas $S^*(t') = \\exp\\big(-\\int_t^{t'} \\lambda^*(\\tau)\\,d\\tau\\big)$ and $f^*(t') = \\lambda^*(t')\\,S^*(t')$ respectively [24]. Furthermore, we can\nexpress the log-likelihood of a list of events {t1, t2, . . . , tn} in an observation window [0, T ) as\n\n$$\\mathcal{L} = \\sum_{i=1}^{n} \\log \\lambda^*(t_i) - \\int_0^T \\lambda^*(\\tau)\\,d\\tau, \\quad T \\geq t_n. \\tag{3}$$\n\nThis simple log-likelihood will later enable us to learn the parameters of our model from observed\ndata.\nFinally, the functional form of the intensity \u03bb\u2217(t) is often designed to capture the phenomena of\ninterest. Some useful functional forms we will use later are [24]:\n\n(i) Poisson process. The intensity is assumed to be independent of the history H(t), but it can be\na time-varying function, i.e., \u03bb\u2217(t) = g(t) \u2265 0;\n(ii) Hawkes process. The intensity models a mutual excitation between events, i.e.,\n\n$$\\lambda^*(t) = \\mu + \\alpha\\,\\kappa_\\omega(t) \\star dN(t) = \\mu + \\alpha \\sum_{t_i \\in \\mathcal{H}(t)} \\kappa_\\omega(t - t_i), \\tag{4}$$\n\nwhere \u03ba\u03c9(t) := exp(\u2212\u03c9t)I[t \u2265 0] is an exponential triggering kernel and \u00b5 \u2265 0 is a baseline\nintensity independent of the history. Here, the occurrence of each historical event increases the\nintensity by a certain amount determined by the kernel and the weight \u03b1 \u2265 0, making the intensity\nhistory dependent and a stochastic process by itself. We will focus on the exponential kernel in\nthis paper. However, other functional forms for the triggering kernel, such as the log-logistic function,\nare possible, and our model does not depend on this particular choice; and,\n(iii) Survival process. There is only one event for an instantiation of the process, i.e.,\n\n$$\\lambda^*(t) = g^*(t)\\,(1 - N(t)), \\tag{5}$$\n\nwhere \u03bb\u2217(t) becomes 0 if an event already happened before t.\n\n3 Generative Model of Information Diffusion and Network Co-evolution\nIn this section, we use the above background on temporal point processes to formulate our\nprobabilistic generative model for the joint dynamics of information diffusion and network evolution.\n\n3.1 Event Representation\nWe model the generation of two types of events: tweet/retweet events, er, and link creation events,\nel. Instead of just the time t, we record each event as a triplet\n\n$$e^r \\text{ or } e^l := (\\underbrace{u}_{\\text{destination}},\\ \\underbrace{s}_{\\text{source}},\\ \\underbrace{t}_{\\text{time}}). \\tag{6}$$\n\nFor a retweet event, the triplet means that the destination node u retweets at time t a tweet originally\nposted by source node s. Recording the source node s reflects the real world scenario that\ninformation sources are explicitly acknowledged. Note that the occurrence of event er does not mean\nthat u is directly retweeting from or is connected to s. This event can happen when u is retweeting\na message by another node u\u2032 where the original information source s is acknowledged. Node u\nwill pass on the same source acknowledgement to its followers (e.g., \u201cI agree @a @b @c @s\u201d).\nOriginal tweets posted by node u are allowed in this notation. In this case, the event will simply be\ner = (u, u, t). Given a list of retweet events up to but not including time t, the history $\\mathcal{H}^r_{us}(t)$ of\nretweets by u due to source s is $\\mathcal{H}^r_{us}(t) = \\{e^r_i = (u_i, s_i, t_i) \\mid u_i = u \\text{ and } s_i = s\\}$. The entire history\nof retweet events is denoted as $\\mathcal{H}^r(t) := \\cup_{u,s \\in [m]} \\mathcal{H}^r_{us}(t)$.\nFor a link creation event, the triplet means that destination node u creates at time t a link to source\nnode s, i.e., from time t on, node u starts following node s. To ease the exposition, we restrict\nourselves to the case where links cannot be deleted and thus each (directed) link is created only\nonce. However, our model can be easily augmented to consider multiple link creations and deletions\nper node pair, as discussed in Section 8. We denote the link creation history as Hl(t).\n3.2 Joint Model with Two Interwoven Components\nGiven m users, we use two sets of counting processes to record the generated events, one for\ninformation diffusion and the other for network evolution. More specifically,\n\nI. Retweet events are recorded using a matrix N (t) of size m \u00d7 m for each fixed time point t. The\n(u, s)-th entry in the matrix, Nus(t) \u2208 {0} \u222a Z+, counts the number of retweets of u due to\nsource s up to time t. These counting processes are \u201cidentity revealing\u201d, since they keep track of\nthe source node that triggers each retweet. This matrix N (t) can be dense, since Nus(t) can be\nnonzero even when node u does not directly follow s. We also let dN (t) := ( dNus(t) )u,s\u2208[m].\nII. Link events are recorded using an adjacency matrix A(t) of size m \u00d7 m for each fixed time point\nt. The (u, s)-th entry in the matrix, Aus(t) \u2208 {0, 1}, indicates whether u is directly following s.\nThat is, Aus(t) = 1 means the directed link has been created before t. For simplicity of exposition,\nwe do not allow self-links. The matrix A(t) is typically sparse, but the number of nonzero entries\ncan change over time. 
We also define dA(t) := ( dAus(t) )u,s\u2208[m].\n\nThen the interwoven information diffusion and network evolution processes can be characterized\nusing their respective intensities E[dN (t)|Hr(t) \u222a Hl(t)] = \u0393\u2217(t) dt and E[dA(t)|Hr(t) \u222a\nHl(t)] = \u039b\u2217(t) dt, where \u0393\u2217(t) = ( \u03b3\u2217us(t) )u,s\u2208[m] and \u039b\u2217(t) = ( \u03bb\u2217us(t) )u,s\u2208[m]. The sign\n\u2217 means that the intensity matrices will depend on the joint history, Hr(t) \u222a Hl(t), and hence their\nevolution will be coupled. By this coupling, we make: (i) the counting processes for link creation\n\u201cinformation driven\u201d and (ii) the evolution of the linking structure change the information\ndiffusion process. Refer to Appendix B for an illustration of our joint model. In the next two sections,\nwe will specify the details of these two intensity matrices.\n\n3.3 Information Diffusion Process\nWe model the intensity, \u0393\u2217(t), for retweeting events using a multivariate Hawkes process [12]:\n\n$$\\gamma^*_{us}(t) = \\mathbb{I}[u = s]\\,\\eta_u + \\mathbb{I}[u \\neq s]\\,\\beta_s \\sum_{v \\in \\mathcal{F}_u(t)} \\kappa_{\\omega_1}(t) \\star \\big(A_{uv}(t)\\,dN_{vs}(t)\\big), \\tag{7}$$\n\nwhere I[\u00b7] is the indicator function and Fu(t) := {v \u2208 [m] : Auv(t) = 1} is the current set of\nfollowees of u. The term \u03b7u \u2265 0 is the intensity of original tweets by a user u on his own initiative,\nbecoming the source of a cascade, and the term $\\beta_s \\sum_{v \\in \\mathcal{F}_u(t)} \\kappa_{\\omega_1}(t) \\star (A_{uv}(t)\\,dN_{vs}(t))$ models the\npropagation of peer influence over the network, where the triggering kernel \u03ba\u03c91(t) models the decay\nof peer influence over time.\nNote that the retweet intensity matrix \u0393\u2217(t) is by itself a stochastic process that depends on the\ntime-varying network topology, the non-zero entries in A(t), whose growth is controlled by the network\nevolution process in Section 3.4. 
Hence the model design captures the influence of the network\ntopology and each source\u2019s influence, \u03b2s, on the information diffusion process. More specifically,\nto compute \u03b3\u2217us(t), one first finds the current set Fu(t) of followees of u, and then aggregates\nthe retweets of these followees that are due to source s. Note that these followees may or may\nnot directly follow source s. Then, the more frequently node u is exposed to retweets of tweets\noriginating from source s via her followees, the more likely she will also retweet a tweet originating\nfrom source s. Once node u retweets due to source s, the corresponding Nus(t) will be incremented,\nand this in turn will increase the likelihood of triggering retweets due to source s among the followers\nof u. Thus, the source does not simply broadcast the message to nodes directly following her but\nher influence propagates through the network even to those nodes that do not directly follow her.\nFinally, this information diffusion model allows a node to repeatedly generate events in a cascade,\nand is very different from the independent cascade or linear threshold models [25] which allow at\nmost one event per node per cascade.\n\n3.4 Network Evolution Process\nWe model the intensity, \u039b\u2217(t), for link creation using a combination of survival and Hawkes processes:\n\n$$\\lambda^*_{us}(t) = (1 - A_{us}(t))\\,\\big(\\mu_u + \\alpha_u\\,\\kappa_{\\omega_2}(t) \\star dN_{us}(t)\\big), \\tag{8}$$\n\nwhere the term 1 \u2212 Aus(t) effectively ensures a link is created only once, and after that, the\ncorresponding intensity is set to zero. The term \u00b5u \u2265 0 denotes a baseline intensity, which models when a\nnode u decides to follow a source s spontaneously at her own initiative. The term \u03b1u\u03ba\u03c92(t) \u22c6 dNus(t)\ncorresponds to the retweets of node u due to tweets originally published by source s, where the\ntriggering kernel \u03ba\u03c92(t) models the decay of interests over time. 
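As a minimal sketch of Eq. (8), and assuming the exponential triggering kernel of Eq. (4), the link creation intensity for a single pair (u, s) could be evaluated as below. The function and argument names (link_intensity, retweet_times_us, link_exists) are hypothetical and do not come from the paper or its released code.

```python
import math

def link_intensity(t, retweet_times_us, link_exists, mu_u, alpha_u, omega2):
    # Sketch of Eq. (8) for one (u, s) pair; all names here are illustrative.
    # retweet_times_us: times of retweets by u that are due to source s.
    # link_exists: True once u already follows s, i.e. A_us(t) = 1.
    if link_exists:
        # Survival factor (1 - A_us(t)): a link can be created only once.
        return 0.0
    # Exponential kernel convolved with the retweet counting process N_us.
    excitation = sum(
        math.exp(-omega2 * (t - ti)) for ti in retweet_times_us if ti < t
    )
    return mu_u + alpha_u * excitation
```

With no past retweets the sketch returns the baseline mu_u, and each retweet of s by u adds a contribution that decays at rate omega2.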
Here, the higher the corresponding\nretweet intensity, the more likely u will find information by source s useful and will create a direct\nlink to s.\nThe link creation intensity \u039b\u2217(t) is also a stochastic process by itself, which depends on the retweet\nevents, and is driven by the retweet count increments dNus(t). It captures the influence of retweets\non the link creation, and closes the loop of mutual influence between information diffusion and\nnetwork topology.\nNote that creating a link is more than just adding a path or allowing information sources to take\nshortcuts during diffusion. The network evolution makes fundamental changes to the diffusion\ndynamics and stationary distribution of the diffusion process in Section 3.3. As shown in [14],\ngiven a fixed network structure A, the expected retweet intensity \u00b5s(t) at time t due to source\ns will depend on the network structure in a highly nonlinear fashion, i.e.,\n$\\mu_s(t) := \\mathbb{E}[\\Gamma^*_{\\cdot s}(t)] = \\big(e^{(A - \\omega_1 I)t} + \\omega_1 (A - \\omega_1 I)^{-1}(e^{(A - \\omega_1 I)t} - I)\\big)\\,\\eta_s$, where \u03b7s \u2208 Rm has a single nonzero entry\nwith value \u03b7s and $e^{(A - \\omega_1 I)t}$ is the matrix exponential. When t \u2192 \u221e, the stationary intensity\n$\\bar{\\mu}_s = (I - A/\\omega)^{-1}\\,\\eta_s$ is also nonlinearly related to the network structure. Thus, given two network\nstructures A(t) and A(t\u2032) at two points in time, which differ by a few edges, the effect of\nthese edges on the information diffusion is not simply additive. Depending on how\nthese newly created edges modify the eigen-structure of the sparse matrix A(t), their effect on\nthe information diffusion can be drastic.\n\nRemark 1. In our model, each user is exposed to information through a time-varying set of\nneighbors. 
By doing so, we couple information diffusion with the network evolution, increasing the\npractical applicability of our model to real-network datasets. The particular definition of exposure\n(e.g., a retweet\u2019s neighbor) will depend on the type of historical information that is available.\nRemarkably, the flexibility of our model allows for different types of diffusion events, which we can\nbroadly classify into two categories. In the first category, events correspond to the times when an\ninformation cascade hits a person, for example, through a retweet from one of her neighbors, but\nshe does not explicitly like or forward the associated post. In the second category, the person decides\nto explicitly like or forward the associated post and events correspond to the times when she does\nso. Intuitively, events in the latter category are more prone to trigger new connections but are also\nless frequent. Therefore, the latter category is mostly suitable for large event datasets, for example those\ngenerated synthetically. In contrast, the events in the former category are less likely to inspire new links\nbut are found in abundance. Therefore, the former category is very suitable for real-world sparse data. Consequently, in\nthe synthetic experiments we used the latter and in the real one we used the former. It is noteworthy that\nEq. (8) is written based on the latter category, whereas Fig. 7 in the appendix is drawn based on the former.\n\n4 Efficient Simulation of Coevolutionary Dynamics\nWe can simulate samples (link creations, tweets and retweets) from our model by adapting Ogata\u2019s\nthinning algorithm [26], originally designed for multidimensional Hawkes processes. 
However, a\nnaive implementation of Ogata\u2019s algorithm would scale poorly: for each sample, we would\nneed to re-evaluate \u0393\u2217(t) and \u039b\u2217(t), so to draw n samples we would need to perform O(m^2 n^2)\noperations, where m is the number of nodes.\nWe designed a sampling procedure that is especially well suited to the structure of our model. The\nalgorithm is based on the following key idea: if we consider each intensity function in \u0393\u2217(t) and\n\u039b\u2217(t) as a separate Hawkes process and draw a sample from each, it is easy to show that the\nminimum among all these samples is a valid sample from the model [12]. Drawing samples\nfrom all intensities would not, by itself, improve the computational complexity. However, when the network\nis sparse, whenever we sample a new node (or link) event from the model, only a small number\nof intensity functions, in the local neighborhood of the node (or the link), will change. As a\nconsequence, we can reuse most of the samples from the intensity functions for the next new sample\nand find which intensity functions we need to change in O(log m) operations, using a heap.\nFinally, we exploit the properties of the exponential function to update individual intensities for each\nnew sample in O(1): let ti and ti+1 be two consecutive events, then, we can compute \u03bb\u2217(ti+1) as\n(\u03bb\u2217(ti) \u2212 \u00b5) exp(\u2212\u03c9(ti+1 \u2212 ti)) + \u00b5 without the need to revisit all previous events.\nThe complete simulation algorithm is summarized in Algorithm 2 in Appendix C. By using\nAlgorithm 2, we reduce the complexity from O(n^2 m^2) to O(nd log m), where d is the maximum number\nof followees per node. That means our algorithm scales logarithmically with the number of nodes\nand linearly with the number of edges at any point in time during the simulation. 
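The constant-time update described above can be illustrated with a small sketch for a single univariate Hawkes process with exponential kernel (Eq. (4)). This is not Algorithm 2 from the paper: the helper names are hypothetical, and the sketch assumes that the intensity right after an event equals the intensity right before it plus the jump alpha.

```python
import math

def intensity_brute_force(t, events, mu, alpha, omega):
    # Direct evaluation of Eq. (4): revisit every event strictly before t.
    return mu + alpha * sum(
        math.exp(-omega * (t - ti)) for ti in events if ti < t
    )

def intensities_recursive(events, mu, alpha, omega):
    # Intensity just before each event time, using O(1) work per event.
    # Assumption of this sketch: lam_after is the value right after the
    # previous event, i.e. the left limit plus the jump alpha.
    left_limits = []
    lam_after = mu
    prev = None
    for t in events:
        if prev is None:
            lam_before = mu
        else:
            # Decay the excess over the baseline, then add the baseline back.
            lam_before = (lam_after - mu) * math.exp(-omega * (t - prev)) + mu
        left_limits.append(lam_before)
        lam_after = lam_before + alpha
        prev = t
    return left_limits
```

Both functions should agree up to floating-point error; the recursive version simply avoids summing over the whole history at every event.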
We also note that\nthe events for link creations, tweets and retweets are generated in a temporally intertwined and\ninterleaving fashion by Algorithm 2. This is because every new retweet event will modify the intensity\nfor link creation, and after each link creation we also need to update the retweet intensities.\n\n[Figure 1: Coevolutionary dynamics for synthetic data. a) Spike trains of link and retweet events. b)\nLink and retweet intensities. c) Cross covariance of link and retweet intensities.]\n\n[Figure 2: Degree distributions when network sparsity level reaches 0.001 for fixed \u03b1 = 0.1. Panels: (a) \u03b2 = 0, (b) \u03b2 = 0.001, (c) \u03b2 = 0.1, (d) \u03b2 = 0.8; each panel shows the data with a power-law fit and a Poisson fit.]\n\n5 Efficient Parameter Estimation from Coevolutionary Events\nGiven a collection of retweet events $\\mathcal{E} = \\{e^r_i\\}$ and link creation events $\\mathcal{A} = \\{e^l_i\\}$ recorded within\na time window [0, T ), we can easily estimate the parameters needed in our model using maximum\nlikelihood estimation. Here, we compute the joint log-likelihood L({\u00b5u} ,{\u03b1u} ,{\u03b7u} ,{\u03b2s}) of\nthese events using Eq. (3), i.e.,\n\n$$\\mathcal{L} = \\underbrace{\\sum_{e^r_i \\in \\mathcal{E}} \\log\\big(\\gamma^*_{u_i s_i}(t_i)\\big) - \\sum_{u,s \\in [m]} \\int_0^T \\gamma^*_{us}(\\tau)\\,d\\tau}_{\\text{tweet / retweet}} + \\underbrace{\\sum_{e^l_i \\in \\mathcal{A}} \\log\\big(\\lambda^*_{u_i s_i}(t_i)\\big) - \\sum_{u,s \\in [m]} \\int_0^T \\lambda^*_{us}(\\tau)\\,d\\tau}_{\\text{links}}. \\tag{9}$$\n\nFor the terms corresponding to retweets, the log term only sums over the actual observed events,\nbut the integral term actually sums over all possible combinations of destination and source pairs,\neven if there is no event between a particular pair of destination and source. For such pairs with\nno observed events, the corresponding counting processes have essentially survived the observation\nwindow [0, T ), and the term $-\\int_0^T \\gamma^*_{us}(\\tau)\\,d\\tau$ simply corresponds to the log survival probability.\nTerms corresponding to links have a similar structure to those for retweets.\nSince \u03b3\u2217us(t) and \u03bb\u2217us(t) are linear in the parameters (\u03b7u, \u03b2s) and (\u00b5u, \u03b1u) respectively, log(\u03b3\u2217us(t))\nand log(\u03bb\u2217us(t)) are concave functions in these parameters. Integration of \u03b3\u2217us(t) and \u03bb\u2217us(t) still results\nin linear functions of the parameters. Thus the overall objective in Eq. (9) is concave, and the global\noptimum can be found by many algorithms. In our experiments, we adapt the efficient algorithm\ndeveloped in previous work [18, 19]. Furthermore, the optimization problem decomposes into m\nindependent problems, one per node u, and can be readily parallelized.\n6 Properties of Simulated Co-evolution, Networks and Cascades\u2217\nIn this section, we perform an empirical investigation of the properties of the networks and\ninformation cascades generated by our model. In particular, we show that our model can generate\nco-evolutionary retweet and link dynamics and a wide spectrum of static and temporal network patterns\nand information cascades. 
Appendix D contains additional simulation results and visualizations.\nAppendix E contains an evaluation of our model estimation method on synthetic data.\n\n[Figure 3: Diameter for network sparsity 0.001. Panels (a) and (b) show the diameter against sparsity\nover time for fixed \u03b1 = 0.1, and for fixed \u03b2 = 0.1 respectively.]\n\n[Figure 4: Distribution of cascade structure, size and depth for different \u03b1 values and fixed \u03b2 = 0.2.]\n\nRetweet and link coevolution. Figures 1(a,b) visualize the retweet and link events, aggregated\nacross different sources, and the corresponding intensities for one node and one realization, picked\nat random. Here, it is already apparent that retweets and link creations are clustered in time and often\nfollow each other. Further, Figure 1(c) shows the cross-covariance of the retweet and link creation\nintensity, computed across multiple realizations, for the same node, i.e., if f (t) and g(t) are two\nintensities, the cross-covariance is a function of the time lag \u03c4 defined as $h(\\tau) = \\int f(t + \\tau)\\,g(t)\\,dt$.\nIt can be seen that the cross-covariance has its peak around 0, i.e., retweets and link creations are\nhighly correlated and co-evolve over time. 
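For intuition, the lagged integral h defined above can be approximated on a regular time grid with a Riemann sum. The sketch below is only an illustration under that assumption: the function name and the sampling step dt are hypothetical, and it is not the procedure used to produce Figure 1(c).

```python
def cross_covariance(f, g, lag, dt=1.0):
    # Riemann-sum approximation of h(lag * dt), the integral of
    # f(t + tau) * g(t) over t, for two intensities sampled on the same
    # regular grid with step dt. Grid points whose shifted index falls
    # outside the sampled range are skipped.
    total = 0.0
    for i in range(len(g)):
        j = i + lag
        if 0 <= j < len(f):
            total += f[j] * g[i]
    return total * dt
```

Scanning lag over a symmetric range and locating the maximum gives the position of the peak; a peak near lag 0 is what the text describes as retweets and link creations co-evolving.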
For ease of exposition, we illustrated co-evolution using\none node; however, we found consistent results across nodes.\nDegree distribution. Empirical studies have shown that the degree distribution of online social\nnetworks and microblogging sites follows a power law [9, 1], and argued that it is a consequence of\nthe rich get richer phenomenon. The degree distribution of a network is a power law if the expected\nnumber of nodes md with degree d is given by md \u221d d\u2212\u03b3, where \u03b3 > 0. Intuitively, the higher the\nvalues of the parameters \u03b1 and \u03b2, the more closely the resulting degree distribution follows a power law;\nthe lower their values, the closer the distribution is to that of an Erdos-Renyi random graph [27]. Figure 2\nconfirms this intuition by showing the degree distribution for different values of \u03b2.\nSmall (shrinking) diameter. There is empirical evidence that online social networks\nand microblogging sites exhibit a relatively small diameter, which shrinks (or flattens) as the network\ngrows [28, 9, 22]. Figures 3(a-b) show the diameter of the largest connected component (LCC)\nagainst the sparsity of the network over time for different values of \u03b1 and \u03b2. Although at the\nbeginning there is a short increase in the diameter due to the merging of small connected components,\nthe diameter decreases as the network evolves. Here, nodes arrive in the network when they follow\n(or are followed by) a node in the largest connected component.\nCascade patterns. Our model can produce the most commonly occurring cascade structures as\nwell as heavy-tailed cascade size and depth distributions, as observed in historical Twitter data [23].\nFigure 4 summarizes the results. 
The higher the \u03b1 value, the shallower and wider the cascades.\n\n7 Experiments on Real Dataset\nIn this section, we validate our model using a large Twitter dataset containing nearly 550,000 tweet,\nretweet and link events from more than 280,000 users [3]. We will show that our model can capture\nthe co-evolutionary dynamics and, by doing so, it predicts retweet and link creation events more\naccurately than several alternatives. Appendix F contains detailed information about the dataset and\nadditional experiments.\nRetweet and link coevolution. Figures 5(a, b) visualize the retweet and link events, aggregated\nacross different sources, and the corresponding intensities given by our trained model for one node,\npicked at random. Here, it is already apparent that retweets and link creations are clustered in time\nand often follow each other, and our fitted model intensities successfully track such behavior.\nFurther, Figure 5(c) compares the cross-covariance between the empirical retweet and link creation\nintensities and between the retweet and link creation intensities given by our trained model,\ncomputed across multiple realizations, for the same node. The similarity between both cross-covariances\nis striking and both have their peak around 0, i.e., retweets and link creations are highly correlated and\nco-evolve over time. For ease of exposition, as in Section 6, we illustrated co-evolution using one\nnode; however, we found consistent results across nodes (see Appendix F).\nLink prediction. We use our model to predict the identity of the source for each test link event,\ngiven the historical (link and retweet) events before the time of the prediction, and compare its\nperformance with two state-of-the-art methods, denoted as TRF [3] and WENG [5]. 
TRF measures the probability of creating a link from a source at a given time by simply computing the proportion\nof new links created from the source with respect to the total number of links created up to the given\ntime. WENG considers different link creation strategies and makes a prediction by combining them.\nWe evaluate the performance by computing the probability of all potential links under each\nmethod, and then computing (i) the average rank of all true (test) events (AvgRank) and (ii) the\nsuccess probability (SP) that the true (test) events rank among the top-1 potential events at each test\ntime (Top-1).\n\n\u2217 Implementation code is available at https://github.com/farajtabar/Coevolution\n\nFigure 5: Coevolutionary dynamics for real data. (a) Spike trains of link and retweet events (x-axis: event occurrence time). (b)\nEstimated link and retweet intensities (x-axis: event occurrence time; y-axis: intensity). (c) Empirical and estimated cross-covariance of link and\nretweet intensities (x-axis: lag).\n\nFigure 6: Prediction performance in the Twitter dataset by means of average rank (AR) and success\nprobability that the true (test) events rank among the top-1 events (Top-1), plotted against the number of events (# events, \u00d710^5).\nPanels: (a) Links: AR, (b) Links: Top-1, (c) Activity: AR, (d) Activity: Top-1. Panels (a-b) compare COEVOLVE, TRF and WENG;\npanels (c-d) compare COEVOLVE and HAWKES.\n\n
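The two ranking measures defined above can be made concrete with a minimal sketch. It assumes that, for each test event, a method supplies a score for every candidate (a potential source or node) and that the identity of the true candidate is known. The function names, the toy scores and the pessimistic handling of ties are assumptions of this sketch; the evaluation protocol described here does not specify them.

```python
from typing import Dict, Hashable, List, Tuple

# One test event: (score per candidate, identity of the true candidate).
TestEvent = Tuple[Dict[Hashable, float], Hashable]


def rank_of_true(scores: Dict[Hashable, float], true_item: Hashable) -> int:
    """1-based rank of the true candidate when sorting by descending score.

    Ties are broken pessimistically (an assumption of this sketch): any other
    candidate with an equal score is counted as ranked ahead of the true one.
    """
    true_score = scores[true_item]
    ahead = sum(
        1 for item, s in scores.items() if item != true_item and s >= true_score
    )
    return 1 + ahead


def avg_rank_and_top1(events: List[TestEvent]) -> Tuple[float, float]:
    """Return (average rank, fraction of test events whose true candidate is ranked first)."""
    ranks = [rank_of_true(scores, true_item) for scores, true_item in events]
    avg_rank = sum(ranks) / len(ranks)
    top1 = sum(1 for r in ranks if r == 1) / len(ranks)
    return avg_rank, top1


# Hypothetical toy scores for three test events over candidates a, b, c.
toy_events: List[TestEvent] = [
    ({"a": 0.7, "b": 0.2, "c": 0.1}, "a"),  # true candidate ranked 1st
    ({"a": 0.1, "b": 0.5, "c": 0.4}, "c"),  # true candidate ranked 2nd
    ({"a": 0.3, "b": 0.3, "c": 0.4}, "a"),  # tie with b, so ranked 3rd
]

avg_rank, top1 = avg_rank_and_top1(toy_events)
print(f"AvgRank = {avg_rank:.2f}, Top-1 = {top1:.2f}")
```

A lower average rank and a higher Top-1 value both indicate better predictions, which is how the two measures should be read when comparing methods.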
We summarize the results in Fig. 6(a-b), where we consider an increasing number of\ntraining retweet/tweet events. Our model outperforms TRF and WENG consistently. For example,\nfor 8 \u00b7 10^4 training events, our model achieves an SP 2.5 times larger than that of TRF and WENG.\nActivity prediction. We use our model to predict the identity of the node that is going to generate\neach test diffusion event, given the historical events before the time of the prediction, and compare\nits performance with a baseline consisting of a Hawkes process without network evolution. For\nthe Hawkes baseline, we take a snapshot of the network right before the prediction time, and use\nall historical retweeting events to fit the model. Here, we evaluate the performance via the\nsame two measures as in the link prediction task and summarize the results in Figure 6(c-d) against\nan increasing number of training events. The results show that, by modeling the co-evolutionary\ndynamics, our model performs significantly better than the baseline.\n8 Discussion\nWe proposed a joint continuous-time model of information diffusion and network evolution, which\ncaptures the coevolutionary dynamics, mimics the most common static and temporal network\npatterns observed in real-world networks and information diffusion data, and predicts network\nevolution and information diffusion more accurately than previous state-of-the-art methods. Using point\nprocesses to model intertwined events in information networks opens up many interesting directions for future\nmodeling work. Our current model is just a showcase of the rich set of possibilities offered by a point\nprocess framework, which have rarely been explored before in large-scale social network modeling. 
For example, we can generalize our model to support link deletion by introducing an intensity\nmatrix \u039e^\u2217(t) modeling link deletions as survival processes, i.e., \u039e^\u2217(t) = (g^\u2217_{us}(t) A_{us}(t))_{u,s\u2208[m]},\nand then letting the counting process A(t) associated with the adjacency matrix evolve as\nE[dA(t) | H^r(t) \u222a H^l(t)] = \u039b^\u2217(t) dt \u2212 \u039e^\u2217(t) dt. We can also let the number of nodes vary\nover time. Furthermore, a large and diverse range of point processes can also be used in the\nframework without changing the efficiency of the simulation and the convexity of the parameter\nestimation, e.g., by conditioning the intensity on additional external features, such as node attributes.\nAcknowledgments\nThe authors would like to thank Demetris Antoniades and Constantine Dovrolis for providing them\nwith the dataset. The research was supported in part by NSF/NIH BIGDATA 1R01GM108341, ONR\nN00014-15-1-2340, NSF IIS-1218749, NSF CAREER IIS-1350983.\n\nReferences\n[1] H. Kwak, C. Lee, H. Park, and others. What is Twitter, a social network or a news media? WWW, 2010.\n[2] J. Cheng, L. Adamic, P. A. Dow, and others. Can cascades be predicted? WWW, 2014.\n[3] D. Antoniades and C. Dovrolis. Co-evolutionary dynamics in social networks: A case study of Twitter. arXiv:1309.6001, 2013.\n[4] S. Myers and J. Leskovec. The bursty dynamics of the Twitter information network. WWW, 2014.\n[5] L. Weng, J. Ratkiewicz, N. Perra, B. Goncalves, C. Castillo, F. Bonchi, R. Schifanella, F. Menczer, and A. Flammini. The role of information diffusion in the evolution of social networks. KDD, 2013.\n[6] N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. NIPS, 2013.\n[7] M. Gomez-Rodriguez, D. Balduzzi, and B. Sch\u00f6lkopf. Uncovering the temporal dynamics of diffusion networks. ICML, 2011.\n[8] M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 
Inferring networks of diffusion and influence. KDD, 2010.\n[9] D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. Computer Science Department, page 541, 2004.\n[10] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. JMLR, 2010.\n[11] J. Leskovec, L. Backstrom, R. Kumar, and others. Microscopic evolution of social networks. KDD, 2008.\n[12] T.J. Liniger. Multivariate Hawkes Processes. PhD thesis, ETHZ, 2009.\n[13] C. Blundell, J. Beck, and K. Heller. Modelling reciprocating relationships with Hawkes processes. NIPS, 2012.\n[14] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, H. Zha, and L. Song. Shaping social activity by incentivizing users. NIPS, 2014.\n[15] T. Iwata, A. Shah, and Z. Ghahramani. Discovering latent influence in online social activities via shared cascade Poisson processes. KDD, 2013.\n[16] S. Linderman and R. Adams. Discovering latent network structure in point process data. ICML, 2014.\n[17] I. Valera and M. Gomez-Rodriguez. Modeling adoption of competing products and conventions in social media. ICDM, 2015.\n[18] K. Zhou, H. Zha, and L. Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. AISTATS, 2013.\n[19] K. Zhou, H. Zha, and L. Song. Learning triggering kernels for multi-dimensional Hawkes processes. ICML, 2013.\n[20] D. Hunter, P. Smyth, D. Q. Vu, and others. Dynamic egocentric models for citation networks. ICML, 2011.\n[21] D. Q. Vu, D. Hunter, P. Smyth, and A. Asuncion. Continuous-time regression models for longitudinal networks. NIPS, 2011.\n[22] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. KDD, 2005.\n[23] S. Goel, D. J. Watts, and D. G. Goldstein. The structure of online diffusion networks. EC, 2012.\n[24] O. Aalen, O. Borgan, and H. 
Gjessing. Survival and event history analysis: a process point of view, 2008.\n[25] D. Kempe, J. Kleinberg, and \u00c9. Tardos. Maximizing the spread of influence through a social network. KDD, 2003.\n[26] Y. Ogata. On Lewis\u2019 simulation method for point processes. IEEE TIT, 27(1):23\u201331, 1981.\n[27] P. Erd\u0151s and A. R\u00e9nyi. On the evolution of random graphs. Hungar. Acad. Sci, 5:17\u201361, 1960.\n[28] L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four degrees of separation. WebSci, 2012.\n[29] M. Granovetter. The strength of weak ties. American Journal of Sociology, pages 1360\u20131380, 1973.\n[30] D. Romero and J. Kleinberg. The directed closure process in hybrid social-information networks, with an analysis of link formation on Twitter. ICWSM, 2010.\n[31] J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. WWW, 2013.\n[32] D.J. Watts and S.H. Strogatz. Collective dynamics of small-world networks. Nature, 1998.\n[33] T. Gross and B. Blasius. Adaptive coevolutionary networks: a review. Royal Society Interface, 2008.\n[34] P. Singer, C. Wagner, and M. Strohmaier. Factors influencing the co-evolution of social and content networks in online social media. Modeling and Mining Ubiquitous Social Media, pages 40\u201359. Springer, 2012.\n", "award": [], "sourceid": 1192, "authors": [{"given_name": "Mehrdad", "family_name": "Farajtabar", "institution": "Georgia Tech"}, {"given_name": "Yichen", "family_name": "Wang", "institution": "Georgia Institute of Technology"}, {"given_name": "Manuel", "family_name": "Gomez Rodriguez", "institution": "MPI SWS"}, {"given_name": "Shuang", "family_name": "Li", "institution": "Georgia Institute of Technology"}, {"given_name": "Hongyuan", "family_name": "Zha", "institution": "Georgia Tech"}, {"given_name": "Le", "family_name": "Song", "institution": "Georgia Institute of Technology"}]}