{"title": "Reconstructing Parameters of Spreading Models from Partial Observations", "book": "Advances in Neural Information Processing Systems", "page_first": 3467, "page_last": 3475, "abstract": "Spreading processes are often modelled as a stochastic dynamics occurring on top of a given network with edge weights corresponding to the transmission probabilities. Knowledge of veracious transmission probabilities is essential for prediction, optimization, and control of diffusion dynamics. Unfortunately, in most cases the transmission rates are unknown and need to be reconstructed from the spreading data. Moreover, in realistic settings it is impossible to monitor the state of each node at every time, and thus the data is highly incomplete. We introduce an efficient dynamic message-passing algorithm, which is able to reconstruct parameters of the spreading model given only partial information on the activation times of nodes in the network. The method is generalizable to a large class of dynamic models, as well to the case of temporal graphs.", "full_text": "Reconstructing Parameters of Spreading Models\n\nfrom Partial Observations\n\nCenter for Nonlinear Studies and Theoretical Division T-4\n\nLos Alamos National Laboratory, Los Alamos, NM 87545, USA\n\nAndrey Y. Lokhov\n\nlokhov@lanl.gov\n\nAbstract\n\nSpreading processes are often modelled as a stochastic dynamics occurring on top\nof a given network with edge weights corresponding to the transmission probabili-\nties. Knowledge of veracious transmission probabilities is essential for prediction,\noptimization, and control of diffusion dynamics. Unfortunately, in most cases the\ntransmission rates are unknown and need to be reconstructed from the spreading\ndata. Moreover, in realistic settings it is impossible to monitor the state of each\nnode at every time, and thus the data is highly incomplete. 
We introduce an efficient\ndynamic message-passing algorithm, which is able to reconstruct parameters of the\nspreading model given only partial information on the activation times of nodes in\nthe network. The method is generalizable to a large class of dynamic models, as\nwell as to the case of temporal graphs.\n\n1 Introduction\n\nKnowledge of the underlying parameters of the spreading model is crucial for understanding the\nglobal properties of the dynamics and for development of effective control strategies for an\noptimal dissemination or mitigation of diffusion [1, 2]. However, in many realistic settings effective\ntransmission probabilities are not known a priori and need to be recovered from a limited number of\nrealizations of the process. Examples of such situations include spreading of a disease [3], propagation\nof information and opinions in a social network [4], correlated infrastructure failures [5], or activation\ncascades in biological and neural networks [6]: the precise model and parameters, as well as the propagation\npaths, are often unknown, and one is left at most with several observed diffusion traces. It can be\nargued that for many interesting systems, even the functional form of the dynamic model is uncertain.\nNevertheless, the reconstruction problem still makes sense in this case: the common approach is to\nassume some simple and reasonable form of the dynamics, and recover the parameters of the model\nwhich explain the data in the most accurate and minimalistic way; this is crucial for understanding\nthe basic mechanisms of the spreading process, as well as for making further predictions without\noverfitting. For example, if only a small number of samples is available, a few-parameter model\nshould be used.\nIn practice, it is very costly or even impossible to record the state of each node at every time step of\nthe dynamics: we might only have access to a subset of nodes, or monitor the state of the system\nat particular times. 
For instance, surveys may give some information on the health or awareness of\ncertain individuals, but there is no way to get a detailed account for the whole population; neural\navalanches are usually recorded in cortical slices, representing only a small part of the brain; it is\ncostly to deploy measurement devices on each unit of a complex infrastructure system; finally, hidden\nnodes play an important role in artificial learning architectures. This is precisely the setting\nthat we address in this article: reconstruction of parameters of a propagation model in the presence\nof nodes with hidden information, and/or partial information in time. It is not surprising that this\nchallenging problem turns out to be notably harder than its detailed counterpart and requires new\nalgorithms which would be robust with respect to missing observations.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nRelated work. The inverse problem of network and couplings reconstruction in the dynamic setting\nhas attracted considerable attention in the past several years. However, most of the existing works\nare focused on learning the propagation networks under the assumption of availability of full diffusion\ninformation. The papers [7, 8, 9, 10] developed inference methods based on the maximization of\nthe likelihood of the observed cascades, leading to distributed and convex optimization algorithms\nin the case of continuous and discrete dynamics, principally for the variants of the independent\ncascade (IC) model [11]. These algorithms have been further improved under the sparse recovery\nframework [12, 13], particularly efficient for structure learning of treelike networks. A careful\nrigorous analysis of these likelihood-based and alternative [14, 15] reconstruction algorithms gives\nan estimate of the number of observed cascades required for an exact network recovery with high\nprobability. 
Precise conditions for parameter recovery at a given accuracy are still lacking. The\nfact that the aforementioned algorithms rely on the fully observed spreading history represents an\nimportant limitation in the case of incomplete information. The case of missing time information has\nbeen addressed in two recent papers: focusing primarily on tree graphs, [16] studied the structure\nlearning problem in which only initial and final spreading states are observed; [17] addressed the\nnetwork reconstruction problem in the case of partial time snapshots of the network, using relaxation\noptimization techniques and assuming that a full probabilistic trace for each node in the network is\navailable. A standard technique for dealing with incomplete data involves maximizing the likelihood\nmarginalized over the hidden information; for example, this approach has been used in [18] for\nidentifying the diffusion source. In what follows, we use this method for benchmarking our results.\nOverview of results. In this article, we propose a different algorithm, based on recently introduced\ndynamic message-passing (DMP) equations for cascading processes [19, 20], which will be referred\nto as DMPREC (DMP-reconstruction) throughout the text. Making use of all available information,\nit yields significantly more accurate reconstruction results, outperforming the likelihood method\nand having a substantially lower algorithmic complexity, independent of the number of nodes with\nunobserved information. More generally, the DMPREC framework can be easily adapted to allow\nreconstruction of heterogeneous transmission probabilities in a large class of cascading processes,\nincluding the IC and threshold models, SIR and other epidemiological models, rumor spreading\ndynamics, etc., as well as for processes occurring on dynamically changing networks.\n\n2 Problem formulation\n\nModel. 
For the sake of simplicity and definiteness, we assume that cascades follow the dynamics of\nthe stochastic susceptible-infected (SI) model in discrete time, defined on a network $G = (V, E)$ with set\nof nodes denoted by $V$ and set of directed edges $E$ [3]. Each node $i \\in V$ at times $t = 1, 2, \\ldots, T$\ncan be in either of two states: susceptible (S) or infected (I). At each time step, node $i$ in the I state\ncan activate one of its susceptible neighbors $j$ with probability $\\alpha_{ij}$ (footnote 1). The dynamics is non-recurrent:\nonce the node is activated (infected), it can never change its state back to susceptible. In what follows,\nthe network $G$ is supposed to be known.\n\nFootnote 1: We chose this two-state model since it has slightly more general dynamic rules compared to the popular IC\nmodel [11] with an additional restriction: a node infected at time $t$ has a single chance to activate its susceptible\nneighbors at time step $t+1$, while further infection attempts in subsequent rounds are not allowed. The DMPREC\nmethod presented below can be easily applied to the case of the IC model by noticing that it corresponds to the SIR\nmodel with a recovery probability equal to one, for which the DMP equations are known [20].\n\nIncomplete observations and inference problem. We assume that the input is formed from $M$\nindependent cascades, where a cascade $\\Sigma^c$ is defined as a collection of activation times of nodes in\nthe network $\\{\\tau_i^c\\}_{i \\in V}$. Each cascade is observed up to the final observation time $T$. Notice that $T$ is\nan important parameter: intuitively, the larger $T$ is, the more information is contained in cascades,\nand the fewer samples are needed. We assume that $T$ is given and fixed, being related to the availability\nof the finite-time observation window. If node $i$ in cascade $c$ does not get activated at a certain time\nprior to the horizon $T$, we put by definition $\\tau_i^c = T$; hence, $\\tau_i^c = T$ means that node $i$ changes its\nstate at time $T$ or later. The full information on the cascades $\\Sigma = \\cup_c \\Sigma^c$ is divided into the observed\npart, $\\Sigma_O$, and the hidden part, $\\Sigma_H$. Thus, in general $\\Sigma_O$ contains only a subset of activation times\nin $\\mathcal{T} \\subseteq [0, T]$ for a subset of observed nodes $O \\subseteq V$. The task is to reconstruct the\ncouplings $\\{\\alpha^*_{ij}\\}_{(ij) \\in E} \\equiv G_{\\alpha^*}$, where $G_{\\alpha^*}$ with a star denotes the original transmission probabilities\nthat have been used to generate the data.\nMaximization of the likelihood. Similarly to the formulations considered in [7, 8, 10], it is possible\nto explicitly write the expression for the likelihood of the discrete-time SI model in the case of fully\navailable information $\\Sigma_O = \\Sigma$ under the assumption that the data has been generated using the\ncouplings $G_\\alpha$:\n\n$$P(\\Sigma \\mid G_\\alpha) = \\prod_{i \\in V} \\prod_{1 \\le c \\le M} P_i(\\tau_i^c \\mid \\Sigma^c, G_\\alpha), \\qquad (1)$$\n\nwith\n\n$$P_i(\\tau_i^c \\mid \\Sigma^c, G_\\alpha) = \\left( \\prod_{t'=0}^{\\tau_i^c - 2} \\prod_{k \\in \\partial i} \\left( 1 - \\alpha_{ki} \\mathbf{1}_{\\tau_k^c \\le t'} \\right) \\right) \\left[ 1 - \\prod_{k \\in \\partial i} \\left( 1 - \\alpha_{ki} \\mathbf{1}_{\\tau_k^c \\le \\tau_i^c - 1} \\right) \\right]^{\\mathbf{1}_{\\tau_i^c < T}}, \\qquad (2)$$\n\nwhere $\\partial i$ denotes the set of neighbors of node $i$ in the graph $G$, and $\\mathbf{1}$ is the indicator function. 
The\nexpression (2) has the following meaning: the probability that node $i$ has been activated at time $\\tau_i$\ngiven the activation times of its neighbors is equal to the probability that the activation signal has\nnot been transmitted by any infected neighbor of $i$ until the time $\\tau_i - 2$ (first term in the product),\nand that at least one of the active neighbors actually transmitted the infection at time $\\tau_i - 1$ (second\nterm). A straightforward adaptation of the NETRATE algorithm, suggested in [8], to the present\nsetting implies that the estimate of the transmission probabilities $\\widehat{G}_{\\alpha^*}$ is obtained as a solution of\nthe convex optimization problem\n\n$$\\widehat{G}_{\\alpha^*} = \\arg\\min \\left( -\\ln P(\\Sigma \\mid G_\\alpha) \\right), \\qquad (3)$$\n\nwhich can be solved locally for each node $i$ and its neighborhood due to the factorization of the\nlikelihood (1) under the assumption of asymmetry of the couplings. In the case of partial observations,\nthe optimization problem (3) is not well defined since it requires the full knowledge of activation\ntimes for each node. A simple and natural extension of this scheme, which we will refer to as the\nmaximum likelihood estimator (MLE), is to consider the likelihood function marginalized over\nunknown activation times:\n\n$$P(\\Sigma_O \\mid G_\\alpha) = \\sum_{\\{\\tau_h^c\\},\\, h \\in H} P(\\Sigma \\mid G_\\alpha). \\qquad (4)$$\n\nAn exact evaluation of (4) is a computationally hard high-dimensional integration problem with\ncomplexity proportional to $T^H$ in the presence of $H$ nodes with hidden information. In order to\ncorrect for this fact, we propose a heuristic scheme which we denote as the heuristic two-stage\n(HTS) algorithm. The idea of HTS consists of completing the missing part $\\{\\tau_h^c\\}_{h \\in H}$ of the cascades\nat each step of the optimization process with the most probable values according to the current\nestimate of the couplings $\\widehat{G}_\\alpha$, $\\widehat{\\Sigma}_H = \\arg\\max P(\\Sigma \\mid \\widehat{G}_\\alpha)$, and solving the optimization problem\n(3) using the full information on the cascades $\\Sigma = \\Sigma_O \\cup \\widehat{\\Sigma}_H$; these two alternating steps are iterated\nuntil the global convergence of the algorithm. An exact (brute-force) estimation of $\\widehat{\\Sigma}_H$ requires an\nexponential number of operations $T^H$, as does the original MLE formulation. However, we found that\nin practice the computational time can be significantly reduced with the use of Monte Carlo\nsampling. The corresponding approximation is based on the observation that the likelihood (1) is\nnon-zero only for $\\{\\tau_i^c\\}_{i \\in V}$ forming possible (realizable) cascades. Hence, for each $c$, we sample\n$L_{H,T}$ auxiliary cascades, and choose the set of $\\{\\tau_h^c\\}_{h \\in H}$ maximizing (1). $L_{H,T}$ is typically a large\nsampling parameter, growing with $T$ and $H$ to ensure a proper convergence. This procedure leads\nto an algorithm with a complexity $O(N M |E|^2 L_{H,T})$ at each step of the optimization, where $|E|$\ndenotes the number of edges; see the journal version of the paper [21] for a more in-depth discussion.\nHence, both MLE and HTS algorithms are practically intractable; the remaining part of the paper\nis devoted to the development of an accurate algorithm with a polynomial-time computational\ncomplexity for this hard problem. 
The next section introduces dynamic message-passing equations\nwhich serve as a basis for such an algorithm.\n\n3 Dynamic message-passing equations\n\nThe dynamic message-passing equations for the SI model in continuous [19] and discrete [20] settings\nallow one to compute marginal probabilities that node $i$ is in the state S at time $t$:\n\n$$P_S^i(t) = P_S^i(0) \\prod_{k \\in \\partial i} \\theta^{k \\to i}(t) \\qquad (5)$$\n\nfor $t > 0$ and a given initial condition $P_S^i(0)$. The variables $\\theta^{k \\to i}(t)$ represent the probability that\nnode $k$ did not pass the activation signal to the node $i$ until time $t$. The intuition behind the key\nEquation (5) is that the probability of node $i$ to be susceptible at time $t$ is equal to the probability of\nbeing in the S state at the initial time multiplied by the probability that none of its neighbors infected it until\ntime $t$. The quantities $\\theta^{k \\to i}(t)$ can be computed iteratively using the following expressions:\n\n$$\\theta^{k \\to i}(t) = \\theta^{k \\to i}(t-1) - \\alpha_{ki} \\phi^{k \\to i}(t-1), \\qquad (6)$$\n\n$$\\phi^{k \\to i}(t) = (1 - \\alpha_{ki}) \\phi^{k \\to i}(t-1) + P_S^k(0) \\left( \\prod_{l \\in \\partial k \\setminus i} \\theta^{l \\to k}(t-1) - \\prod_{l \\in \\partial k \\setminus i} \\theta^{l \\to k}(t) \\right), \\qquad (7)$$\n\nwhere $\\partial k \\setminus i$ denotes the set of neighbors of $k$ excluding $i$. Equation (6) expresses the fact that\n$\\theta^{k \\to i}(t)$ can only decrease if the infection is actually transmitted along the directed link $(ki) \\in E$;\nthis happens with probability $\\alpha_{ki}$ times $\\phi^{k \\to i}(t-1)$, which denotes the probability that node $k$ is in\nthe state I at time $t$, but has not transmitted the infection to node $i$ until time $t - 1$. 
Equation\n(7), which allows one to close the system of dynamic equations, describes the evolution of the probability\n$\\phi^{k \\to i}(t)$: at time $t - 1$, it decreases if the infection is transmitted (first term in the sum), and increases\nif node $k$ goes from the state S to the state I (difference of terms 2 and 3). Note that node $i$ is\nexcluded from the corresponding products over $\\theta$-variables because this equation is conditioned on\nthe fact that $i$ is in the state S, and therefore cannot infect $k$. Equations (6) and (7) are iterated\nin time starting from initial conditions $\\theta^{i \\to j}(0) = 1$ and $\\phi^{i \\to j}(0) = 1 - P_S^i(0)$, which are consistent\nwith the definitions above. The name \u201cDMP equations\u201d comes from the fact that the whole scheme can\nbe interpreted as the procedure of passing \u201cmessages\u201d along the edges of the network.\nTheorem 1. DMP equations for the SI model, defined by Equations (5)-(7), yield exact marginal\nprobabilities on tree networks. On general networks, the quantities $P_S^i(t)$ give a lower bound on the values\nof marginal probabilities.\n\nProof Sketch. The exactness of the solution on tree graphs immediately follows from the fact that the\nDMP equations can be derived from belief propagation equations on time trajectories [20], which\nprovide exact marginals on trees. 
The fact that $P_S^i(t)$ computed according to (5) represents a lower\nbound on marginal probabilities in general networks can be derived from a counting argument,\nconsidering multiple infection paths on a loopy graph which contribute to the computation of $P_S^i(t)$,\neffectively lowering its value through Equation (5); the proof technique is borrowed from [19],\nwhere similar dynamic equations in the continuous-time case have been considered. \u25a1\nUsing the definition (5) of $P_S^i(t)$, it is convenient to define the marginal probability $m_i(t)$ of activation\nof node $i$ at time $t$:\n\n$$m_i(t) = P_S^i(0) \\left[ \\prod_{k \\in \\partial i} \\theta^{k \\to i}(t-1) - \\prod_{k \\in \\partial i} \\theta^{k \\to i}(t) \\right]. \\qquad (8)$$\n\nAs often happens with message-passing algorithms, although they are exact only on tree networks,\nDMP equations provide accurate results even on loopy networks. An example is provided in\nFigure 1, where the DMP-predicted marginals are compared with the values obtained from extensive\nsimulations of the dynamics on a network of retweets with $N = 96$ nodes [22]. This observation will\nallow us to use DMP equations as a suitable approximation tool on general networks. In the next\nsection we describe an efficient reconstruction algorithm, DMPREC, which is based on the resolution\nof the dynamics given by DMP equations and makes use of all available information.\n\n4 Proposed algorithm: DMPREC\n\nProbability of cascades and free energy. The marginalization over hidden nodes in (4) creates a\ncomplex relation between couplings in the whole graph, resulting in a non-explicit expression. 
The\nmain idea behind the DMPREC algorithm is to approximate the likelihood of observed cascades (4)\nthrough the marginal probability distributions (5) and (8):\n\n$$P(\\Sigma_O \\mid G_\\alpha) \\approx \\prod_{c=1}^{M} \\prod_{i \\in O} \\left[ m_i(\\tau_i^c \\mid G_\\alpha) \\mathbf{1}_{\\tau_i^c \\le T} + P_S^i(\\tau_i^c \\mid G_\\alpha) \\mathbf{1}_{\\tau_i^c = T} \\right]. \\qquad (9)$$\n\nFigure 1: Illustration of the accuracy of DMP equations on a network of retweets with $N = 96$ nodes [22].\n(a) Comparison of DMP-predicted $P_S^i(t)$ with $P_S^i(t)$ estimated from $10^6$ runs of Monte Carlo simulations with\n$t = 10$ and one infected node at initial time. The couplings $\\{\\alpha_{ij}\\}$ have been generated uniformly at random in\nthe range $[0, 1]$. (b) Visualization of the network topology created with the Gephi software.\n\nThe expression (9) is at the core of the suggested algorithm. As there is no tractable way to compute\nexactly the joint probability of partial observations, we approximate it using a mean-field-type\napproach as a product of marginal probabilities provided by the dynamic message-passing equations.\nThe reasoning behind this approach is that each marginal is expressed through an average of all\npossible realizations of dynamics with a given initial condition; this is in contrast with the likelihood\nfunction which considers only the particular instance realized in the given cascade. Therefore, equation\n(9) summarizes the effect of different propagation paths, and the maximization of this probability\nfunction will yield the most likely consensus between the ensemble of couplings in the network.\nPrecisely this key property makes the reconstruction possible in the case involving nodes with\nhidden information via maximization of the objective (9), which can be interpreted as a cost function\nrepresenting the product of individual probabilities of activation taken precisely at the value of the\nobserved infection times. 
Starting from this expression, one can define the associated \u201cfree energy\u201d:\n\n$$f_{\\mathrm{DMP}} = -\\ln P(\\Sigma_O \\mid G_\\alpha) = \\sum_{i \\in O} f^i_{\\mathrm{DMP}}, \\qquad (10)$$\n\nwhere $f^i_{\\mathrm{DMP}} = -\\sum_c \\ln \\left[ m_i(\\tau_i^c) \\mathbf{1}_{\\tau_i^c \\le T-1} + P_S^i(T-1) \\mathbf{1}_{\\tau_i^c = T} \\right]$. In the last expression for $f^i_{\\mathrm{DMP}}$ we\nused the fact that $m_i(T) + P_S^i(T) = P_S^i(T-1)$. Our goal is to minimize the free energy (10)\nwith respect to $\\{\\alpha_{ij}\\}_{(ij) \\in E}$. A similar approach has been previously outlined by [23] as a way to\nlearn homogeneous couplings in the spreading source inference algorithm. In order to carry out this\noptimization task, we need to develop an efficient way of gradient evaluation.\nComputation of the gradient. The gradient of the free energy reads (note that the indicator functions\npoint to disjoint events):\n\n$$\\frac{\\partial f^i_{\\mathrm{DMP}}}{\\partial \\alpha_{rs}} = -\\sum_c \\left[ \\frac{\\partial m_i(\\tau_i^c \\mid G_\\alpha) / \\partial \\alpha_{rs}}{m_i(\\tau_i^c \\mid G_\\alpha)} \\mathbf{1}_{\\tau_i^c \\le T-1} + \\frac{\\partial P_S^i(T-1 \\mid G_\\alpha) / \\partial \\alpha_{rs}}{P_S^i(T-1 \\mid G_\\alpha)} \\mathbf{1}_{\\tau_i^c = T} \\right], \\qquad (11)$$\n\nwhere the derivatives of the marginal probabilities can be computed explicitly by taking the derivative\nof the DMP equations (5)-(8). Let us denote $\\partial \\theta^{k \\to i}(t) / \\partial \\alpha_{rs} \\equiv p^{k \\to i}_{rs}(t)$ and $\\partial \\phi^{k \\to i}(t) / \\partial \\alpha_{rs} \\equiv q^{k \\to i}_{rs}(t)$.\nSince the dynamic messages at initial time $\\{\\theta^{i \\to j}(0)\\}$ and $\\{\\phi^{i \\to j}(0)\\}$ do not depend\non the couplings, we have $p^{k \\to i}_{rs}(0) = q^{k \\to i}_{rs}(0) = 0$ for all $k, i, r, s$, and these quantities can be\ncomputed iteratively using the analogues of Equations (6) and (7):\n\n$$p^{k \\to i}_{rs}(t) = p^{k \\to i}_{rs}(t-1) - \\alpha_{ki} q^{k \\to i}_{rs}(t-1) - \\phi^{k \\to i}(t-1) \\mathbf{1}_{k=r,\\, i=s}, \\qquad (12)$$\n\n$$q^{k \\to i}_{rs}(t) = (1 - \\alpha_{ki}) q^{k \\to i}_{rs}(t-1) - \\phi^{k \\to i}(t-1) \\mathbf{1}_{k=r,\\, i=s} + P_S^k(0) \\sum_{l \\in \\partial k \\setminus i} p^{l \\to k}_{rs}(t-1) \\prod_{n \\in \\partial k \\setminus \\{i, l\\}} \\theta^{n \\to k}(t-1) - P_S^k(0) \\sum_{l \\in \\partial k \\setminus i} p^{l \\to k}_{rs}(t) \\prod_{n \\in \\partial k \\setminus \\{i, l\\}} \\theta^{n \\to k}(t). \\qquad (13)$$\n\nUsing these quantities, the derivatives of the marginals entering in Equation (11) can be written as\n\n$$\\frac{\\partial P_S^i(t)}{\\partial \\alpha_{rs}} = P_S^i(0) \\sum_{k \\in \\partial i} p^{k \\to i}_{rs}(t) \\prod_{l \\in \\partial i \\setminus k} \\theta^{l \\to i}(t), \\qquad \\frac{\\partial m_i(t)}{\\partial \\alpha_{rs}} = \\frac{\\partial P_S^i(t-1)}{\\partial \\alpha_{rs}} - \\frac{\\partial P_S^i(t)}{\\partial \\alpha_{rs}}. \\qquad (14)$$\n\n[Figure 1 plot residue: panel (a) has axes MC-predicted $P_S^i(t)$ and DMP-predicted $P_S^i(t)$, both ranging from 0 to 1; panel (b) is the network visualization.]\n\nThe following observation shows that at least on tree networks, corresponding to the regime in\nwhich DMP equations have been derived, the values of the original transmission probabilities $G_{\\alpha^*}$\ncorrespond to the point at which the gradient of the free energy takes zero value.\nClaim 1. On a tree network, in the limit of a large number of samples $M \\to \\infty$, the derivative of the\nfree energy is equal to zero at the values of couplings $G_{\\alpha^*}$ used for generating cascades.\n\nProof. Let us first look at samples originating from the same initial condition. 
According to Theorem\n1, the DMP equations are exact on tree graphs, and hence it is easy to see that\n\n$$\\lim_{M \\to \\infty} f^i_{\\mathrm{DMP}} = -\\sum_{t \\le T-1} m_i(t \\mid G_{\\alpha^*}) \\ln m_i(t \\mid G_\\alpha) - P_S^i(T-1 \\mid G_{\\alpha^*}) \\ln P_S^i(T-1 \\mid G_\\alpha). \\qquad (15)$$\n\nTherefore, since each ratio of the form $m_i(t \\mid G_{\\alpha^*}) / m_i(t \\mid G_\\alpha)$ equals one at $G_\\alpha = G_{\\alpha^*}$,\n\n$$\\lim_{M \\to \\infty} \\left. \\frac{\\partial f^i_{\\mathrm{DMP}}}{\\partial \\alpha_{rs}} \\right|_{G_{\\alpha^*}} = -\\frac{\\partial}{\\partial \\alpha_{rs}} \\left[ \\sum_{t \\le T-1} m_i(t \\mid G_{\\alpha^*}) + P_S^i(T-1 \\mid G_{\\alpha^*}) \\right] = 0,$$\n\nsince the expression inside the brackets sums exactly to one. This result trivially holds by summing up\nsamples with different initial conditions. Combination of this result with the definition (10) completes\nthe proof.\n\nThe DMPREC algorithm consists of running the message-passing equations for the derivatives of the\ndynamic variables (12), (13) in parallel with DMP equations (5)-(7), allowing for the computation\nof the gradient of the free energy (11) through (14), which is used afterwards in the optimization\nprocedure. Let us analyse the computational complexity of each parameter update step. The\nnumber of runs is equal to the number of distinct initial conditions in the ensemble of observed\ncascades, so if all $M$ cascades start with distinct initial conditions, the complexity of the DMPREC\nalgorithm is equal to $O(|E|^2 T M)$ for each step of the update of $\\{\\alpha_{rs}\\}_{(rs) \\in E}$. Hence, in a typical\nsituation where each cascade is initiated at one particular node, the number of runs will be limited by\n$N$, and the overall update-step complexity of DMPREC will be $O(|E|^2 T N)$.\nMissing information in time. On top of inaccessible nodes, the state of the network can be monitored\nat a lower frequency compared to the natural time scale of the dynamics. It is easy to adapt the\nalgorithm to the case of observations at $K$ time steps $\\mathcal{T} \\equiv \\{t_k\\}_{k \\in [1, K]}$. 
Since the activation time\n$\\tau_i^c$ of node $i$ in cascade $c$ is now known only up to the interval $[t_{k_i^c} + 1, t_{k_i^c + 1}] \\equiv \\delta_{k_i^c}$, where\n$t_{k_i^c} < \\tau_i^c \\le t_{k_i^c + 1}$, one should maximize $\\sum_{t \\in \\delta_{k_i^c}} m_i(t) = P_S^i(t_{k_i^c}) - P_S^i(t_{k_i^c + 1}) \\equiv \\Delta_{k_i^c} P_S^i(t \\mid G_\\alpha)$\ninstead of $m_i(\\tau_i^c)$ in this case. This leads to obvious modifications to the expressions (10) and (11),\nusing the differences of derivatives at corresponding times instead of one-step differences as in (14).\nFor instance, if the final time is not included in the observations, we have\n\n$$f^i_{\\mathrm{DMP}} = -\\sum_c \\ln \\left[ \\Delta_{k_i^c} P_S^i(t \\mid G_\\alpha) \\right], \\qquad \\frac{\\partial f^i_{\\mathrm{DMP}}}{\\partial \\alpha_{rs}} = -\\sum_c \\left[ \\frac{\\partial \\Delta_{k_i^c} P_S^i(t \\mid G_\\alpha) / \\partial \\alpha_{rs}}{\\Delta_{k_i^c} P_S^i(t \\mid G_\\alpha)} \\right].$$\n\n5 Numerical results\n\nWe evaluate the performance of the DMPREC algorithm on synthetic and real-world networks under the\nassumption of partial observations. In numerical experiments, we focus primarily on the presence of\ninaccessible nodes, which is a more computationally difficult case compared to the setting of missing\ninformation in time. An example involving partial time observations is shown in Section 5.1.\n\n5.1 Tests with synthetic data\nExperimental setup. In the tests described in this section, the couplings $\\{\\alpha_{ij}\\}$ are sampled uniformly\nin the range $[0, 1]$, and the final observation time is set to $T = 10$. Each cascade is generated using the\ndiscrete-time SI model defined in Section 2 from randomly selected sources. In the case of inaccessible\nnodes, the activation-time data are hidden in all the samples for $H$ randomly selected nodes. We use the\nlikelihood methods for benchmarking the accuracy of our approach. 
The MLE algorithm introduced\nabove is not tractable even on small graphs, therefore we compare the results of DMPREC with\nthe HTS algorithm outlined in Section 2. Still, HTS has a very high computational complexity,\nand therefore we are bound to run comparative tests on small graphs: a connected component of an\nartificially generated network with $N = 20$, sampled using a power-law degree distribution, and a\nreal directed network of relationships in a New England monastery with $N = 18$ nodes [24]. Both\nalgorithms are initialized with $\\alpha_{ij} = 0.5$ for all $(ij) \\in E$. The accuracy of reconstruction is assessed\nusing the $\\ell_1$ norm of the difference between reconstructed and original couplings, normalized over\nthe number of directed edges in the graph (footnote 2). Intuitively, this measure gives an average expected error\nfor each parameter $\\alpha_{ij}$.\n\nFigure 2: Tests for DMPREC and HTS on a small power-law network: (a) for fixed number of nodes with\nunobserved information $H = 5$, (b) for fixed number of samples $M = 6400$. (c) Scatter plot of $\\{\\alpha_{ij}\\}$ obtained\nwith DMPREC versus original parameters $\\{\\alpha^*_{ij}\\}$ in the case of missing information in time with $M = 6400$,\n$T = 10$; the state of the network is observed every other time step.\n\nFigure 3: Numerical results for the real-world Monastery network of [24]: (a) for fixed number of nodes with\nunobserved information $H = 4$, (b) for fixed number of samples $M = 6400$. (c) The topology of the network\n(thickness of edges proportional to $\\{\\alpha^*_{ij}\\}$ used for generating cascades).\n\nResults. In Figure 2 we present results for a small power-law network with short loops, which\nis not a favorable situation for DMP equations derived in the treelike approximation of the graph.\nFigures 2 (a) and 2 (b) show the average reconstruction error as a function of $M$\n(for fixed $H/N = 0.25$) and $H$ (for fixed $M = 6400$), respectively. 
DMPREC clearly outperforms\nthe HTS algorithm, yielding surprisingly accurate reconstruction of transmission probabilities even\nin the case where half of the network nodes do not report any information. Most importantly, DMPREC\nachieves reconstruction with a significantly lower computational time: for example, while it took\nmore than 24 hours to compute the point corresponding to $H = 4$ and $M = 6400$ with HTS (MLE\nat this test point took several weeks to converge), the computation involving DMPREC converged to\nthe presented level of accuracy in less than 10 minutes on a standard laptop. These times illustrate\nthe hardness of the learning problem involving incomplete information.\nWe have also used this case study network to test the estimation of transmission probabilities with the\nDMPREC algorithm when the state of the network is recorded only at a subset of times $\\mathcal{T} \\subseteq [0, T]$.\nResults for the case where every other time stamp is missing are given in Figure 2 (c): couplings\nestimated with DMPREC are compared to the original values $\\{\\alpha^*_{ij}\\}$; despite the fact that only 50% of\ntime stamps are available, the inferred couplings show an excellent agreement with the ground truth.\nEquivalent results for the real-world relationship network extracted from the study [24], which contains\nboth directed and undirected links, are shown in Figure 3; the ability of DMPREC to capture the\nmutual dependencies of different couplings through dynamic correlations is even more pronounced in\nthis case, with almost perfect reconstruction of couplings for large $M$ and a rather weak dependence\non the number of nodes with removed observations. We have run tests on larger synthetic networks\nwhich show similar reconstruction results for DMPREC, but where comparisons with the likelihood\nmethod could not be carried out. In the next section we focus on an application involving real-world\ndata which represents a more interesting and important case for the validation of the algorithm.\n\nFootnote 2: Note that this measure excludes those few parameters which are impossible to reconstruct: e.g. no algorithm\ncan learn the coupling associated with the ingoing edge of the hidden node located at the leaf of a network.\n\n[Figures 2 and 3 plot residue: average error $\\langle |\\alpha_{ij} - \\alpha^*_{ij}| \\rangle$ versus $M$ ($\\times 0.64$) and versus $H$, with curves for HTS and DMPrec; Figure 2 (c) plots $\\alpha_{ij}$ versus $\\alpha^*_{ij}$.]\n\n5.2 Test with real-world data\n\nAs a proxy for the real statistics, we used the data provided by the Bureau of Transportation Statistics\n[25], from which we reconstructed a part of the U.S. air transportation network, where airports are\nthe nodes, and directed links correspond to traffic between them. The reason behind this choice is\nthat the majority of large-scale influenza pandemics over the past several decades\nwere air-traffic-mediated epidemics. For illustration purposes, we selected the top $N = 30$\nairports ranked according to the total number of passenger enplanements and commonly classified as\nlarge hubs, and extracted a sub-network of flights between them. The weight of each edge is defined\nby the annual number of transported passengers, aggregated over multiple routes; we have pruned\nlinks with a relatively low traffic (below 10% of the traffic level on the busiest routes), so that the\ntotal number of remaining directed links is $|E| = 210$. 
The final weights are based on the assumption\nthat the probability of infection transmission is proportional to the flux; the weights have been\nrenormalized accordingly so that the busiest route received the coupling \u03b1ij = 0.5. The resulting\nnetwork is depicted in Figure 4. We have generated M = 10,000 independent cascades in this\nnetwork, and have hidden the information at H = 15 nodes (50% of airports) selected at random.\nWe observe that even with such a large portion of missing information, the reconstructed\nparameters show a good agreement with the original ones.\n\nFigure 4: Left: Sub-network of flights between major U.S. hubs, where the thickness of edges is proportional to\nthe aggregated traffic between them; nodes which do not report information are indicated in red. Right: Scatter\nplots of reconstructed {\u03b1ij} versus original {\u03b1\u2217ij} couplings for H = 0 and H = 15 and M = 10,000.\n\n6 Conclusions and path forward\n\nFrom the algorithmic point of view, the presence of nodes with\nincomplete information considerably complicates the inference of spreading parameters because the reconstruction can no\nlonger be performed independently for each neighborhood. In this paper, it is shown how the dynamic\ninterdependence of parameters can be exploited to recover the couplings in the\nsetting involving hidden information. Let us discuss several directions for future work. DMPREC\ncan be straightforwardly generalized to more complicated spreading models using a generic form\nof DMP equations [20] and the key approximation ingredient (9), as well as adapted to the case\nof temporal graphs by encoding network dynamics via time-dependent coefficients \u03b1ij(t), which\nmight be more appropriate in certain real situations. 
It would also be useful to extend the present\nframework to the case of continuous dynamics using the continuous-time version of DMP equations\nof [19]. An important direction would be to generalize the learning problem beyond the assumption\nof a known network, and formulate precise conditions for detection of hidden nodes and for a perfect\nnetwork recovery in this case. Finally, in the spirit of active learning, we anticipate that DMPREC\ncould be helpful for problems involving an optimal placement of observers in situations where\ncollection of full measurements is costly.\nAcknowledgements. The author is grateful to M. Chertkov and T. Misiakiewicz for discussions and comments,\nand acknowledges support from the LDRD Program at Los Alamos National Laboratory by the National Nuclear\nSecurity Administration of the U.S. Department of Energy under Contract No. DE-AC52-06NA25396.\n\n[Figure 4, right panels: scatter plots of \u03b1ij versus \u03b1\u2217ij; H = 0: \u2329|\u03b1ij - \u03b1\u2217ij|\u232a = 0.0400; H = 15: \u2329|\u03b1ij - \u03b1\u2217ij|\u232a = 0.0473.]\n\nReferences\n[1] C. Nowzari, V. Preciado, and G. Pappas. Analysis and control of epidemics: A survey of spreading\nprocesses on complex networks. Control Systems, IEEE, 36(1):26\u201346, 2016.\n\n[2] A. Y. Lokhov and D. Saad. Optimal deployment of resources for maximizing impact in spreading processes.\narXiv preprint arXiv:1608.08278, 2016.\n\n[3] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. 
Epidemic processes in complex\nnetworks. Rev. Mod. Phys., 87:925\u2013979, 2015.\n\n[4] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and\ndynamics. Physics reports, 424(4):175\u2013308, 2006.\n\n[5] I. Dobson, B. A. Carreras, V. E. Lynch, and D. E. Newman. Complex systems analysis of series of\nblackouts: Cascading failure, critical points, and self-organization. Chaos, 17(2):026103, 2007.\n\n[6] R. O\u2019Dea, J. J. Crofts, and M. Kaiser. Spreading dynamics on spatially constrained complex brain networks.\nJ. R. Soc. Interface, 10(81):20130016, 2013.\n\n[7] S. Myers and J. Leskovec. On the convexity of latent social network inference. In Advances in Neural\nInformation Processing Systems, pages 1741\u20131749, 2010.\n\n[8] M. Gomez-Rodriguez, D. Balduzzi, and B. Sch\u00f6lkopf. Uncovering the temporal dynamics of diffusion\nnetworks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML\n\u201911, pages 561\u2013568, New York, NY, USA, June 2011. ACM.\n\n[9] N. Du, L. Song, M. Yuan, and A. J. Smola. Learning networks of heterogeneous influence. In Advances in\nNeural Information Processing Systems, pages 2780\u20132788, 2012.\n\n[10] P. Netrapalli and S. Sanghavi. Learning the graph of epidemic cascades. In ACM SIGMETRICS Performance\nEvaluation Review, volume 40, pages 211\u2013222. ACM, 2012.\n\n[11] D. Kempe, J. Kleinberg, and \u00c9. Tardos. Maximizing the spread of influence through a social network.\nIn Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data\nmining, pages 137\u2013146. ACM, 2003.\n\n[12] H. Daneshmand, M. Gomez-Rodriguez, L. Song, and B. Sch\u00f6lkopf. Estimating diffusion network structures:\nRecovery conditions, sample complexity & soft-thresholding algorithm. In Proceedings of the 31st\nInternational Conference on Machine Learning (ICML-14), volume 2014, page 793, 2014.\n\n[13] J. 
Pouget-Abadie and T. Horel. Inferring graphs from cascades: A sparse recovery framework. In\nProceedings of The 32nd International Conference on Machine Learning, pages 977\u2013986, 2015.\n\n[14] B. Abrahao, F. Chierichetti, R. Kleinberg, and A. Panconesi. Trace complexity of network inference. In\nProceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining,\npages 491\u2013499. ACM, 2013.\n\n[15] V. Gripon and M. Rabbat. Reconstructing a graph from path traces. In Information Theory Proceedings\n(ISIT), 2013 IEEE International Symposium on, pages 2488\u20132492. IEEE, 2013.\n\n[16] K. Amin, H. Heidari, and M. Kearns. Learning from contagion (without timestamps). In Proceedings of\nthe 31st International Conference on Machine Learning, pages 1845\u20131853, 2014.\n\n[17] E. Sefer and C. Kingsford. Convex risk minimization to infer networks from probabilistic diffusion data at\nmultiple scales. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on, 2015.\n\n[18] M. Farajtabar, M. Gomez-Rodriguez, N. Du, M. Zamani, H. Zha, and L. Song. Back to the past: Source\nidentification in diffusion networks from partially observed cascades. In Proceedings of the Eighteenth\nInternational Conference on Artificial Intelligence and Statistics (AISTATS), pages 232\u2013240, 2015.\n\n[19] B. Karrer and M. E. Newman. Message passing approach for general epidemic models. Physical Review E,\n82(1):016101, 2010.\n\n[20] A. Y. Lokhov, M. M\u00e9zard, and L. Zdeborov\u00e1. Dynamic message-passing equations for models with\nunidirectional dynamics. Physical Review E, 91(1):012811, 2015.\n\n[21] A. Y. Lokhov and T. Misiakiewicz. Efficient reconstruction of transmission probabilities in a spreading\nprocess from partial observations. arXiv preprint arXiv:1509.06893, 2016.\n\n[22] R. Rossi and N. Ahmed. Network repository, 2013. http://networkrepository.com.\n[23] F. Altarelli, A. Braunstein, L. 
Dall\u2019Asta, A. Lage-Castellanos, and R. Zecchina. Bayesian inference of\nepidemics on networks via belief propagation. Physical review letters, 112(11):118701, 2014.\n\n[24] S. F. Sampson. Crisis in a cloister. PhD thesis, Cornell University, Ithaca, 1969.\n[25] Bureau of transportation statistics. http://www.rita.dot.gov/bts/.\n", "award": [], "sourceid": 1722, "authors": [{"given_name": "Andrey", "family_name": "Lokhov", "institution": "Los Alamos National Laboratory"}]}