{"title": "Continuous-Time Regression Models for Longitudinal Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 2492, "page_last": 2500, "abstract": "The development of statistical models for continuous-time longitudinal network data is of increasing interest in machine learning and social science. Leveraging ideas from survival and event history analysis, we introduce a continuous-time regression modeling framework for network event data that can incorporate both time-dependent network statistics and time-varying regression coefficients. We also develop an efficient inference scheme that allows our approach to scale to large networks. On synthetic and real-world data, empirical results demonstrate that the proposed inference approach can accurately estimate the coefficients of the regression model, which is useful for interpreting the evolution of the network; furthermore, the learned model has systematically better predictive performance compared to standard baseline methods.", "full_text": "Continuous-Time Regression Models for\n\nLongitudinal Networks\n\nDuy Q. Vu\n\nDepartment of Statistics\n\nPennsylvania State University\n\nUniversity Park, PA 16802\ndqv100@stat.psu.edu\n\nDavid R. Hunter\n\nDepartment of Statistics\n\nPennsylvania State University\n\nUniversity Park, PA 16802\n\nArthur U. Asuncion\u2217\n\nDepartment of Computer Science\nUniversity of California, Irvine\n\nIrvine, CA 92697\n\nasuncion@ics.uci.edu\n\nPadhraic Smyth\n\nDepartment of Computer Science\nUniversity of California, Irvine\n\nIrvine, CA 92697\n\ndhunter@stat.psu.edu\n\nsmyth@ics.uci.edu\n\nAbstract\n\nThe development of statistical models for continuous-time longitudinal network\ndata is of increasing interest in machine learning and social science. 
Leveraging ideas from survival and event history analysis, we introduce a continuous-time regression modeling framework for network event data that can incorporate both time-dependent network statistics and time-varying regression coefficients. We also develop an efficient inference scheme that allows our approach to scale to large networks. On synthetic and real-world data, empirical results demonstrate that the proposed inference approach can accurately estimate the coefficients of the regression model, which is useful for interpreting the evolution of the network; furthermore, the learned model has systematically better predictive performance compared to standard baseline methods.

1 Introduction

The analysis of the structure and evolution of network data is an increasingly important task in a variety of disciplines, including biology and engineering. The emergence and growth of large-scale online social networks also provides motivation for the development of longitudinal models for networks over time. While in many cases the data for an evolving network are recorded on a continuous time scale, a common approach is to analyze "snapshot" data (also known as collapsed panel data), where multiple cross-sectional snapshots of the network are recorded at discrete time points. Various statistical frameworks have been previously proposed for discrete snapshot data, including dynamic versions of exponential random graph models [1, 2, 3] as well as dynamic block models and matrix factorization methods [4, 5]. In contrast, there is relatively little work to date on continuous-time models for large-scale longitudinal networks.

In this paper, we propose a general regression-based modeling framework for continuous-time network event data. Our methods are inspired by survival and event history analysis [6, 7]; specifically, we employ multivariate counting processes to model the edge dynamics of the network.
Building on recent work in this context [8, 9], we use both multiplicative and additive intensity functions that allow for the incorporation of arbitrary time-dependent network statistics; furthermore, we consider time-varying regression coefficients for the additive approach. The additive form in particular enables us to develop an efficient online inference scheme for estimating the time-varying coefficients of the model, allowing the approach to scale to large networks. On synthetic and real-world data, we show that the proposed scheme accurately estimates these coefficients and that the learned model is useful for both interpreting the evolution of the network and predicting future network events.

*Current affiliation: Google Inc.

The specific contributions of this paper are: (1) we formulate a continuous-time regression model for longitudinal network data with time-dependent statistics (and time-varying coefficients for the additive form); (2) we develop an accurate and efficient inference scheme for estimating the regression coefficients; and (3) we perform an experimental analysis on real-world longitudinal networks and demonstrate that the proposed framework is useful in terms of prediction and interpretability.

The next section introduces the general regression framework, and the associated inference scheme is described in detail in Section 3. Section 4 describes the experimental results on synthetic and real-world networks. Finally, we discuss related work and conclude with future research directions.

2 Regression models for continuous-time network data

Below we introduce multiplicative and additive regression models for the edge formation process in a longitudinal network.
We also describe non-recurrent event models and give examples of time-dependent statistics in this context.

2.1 General framework

Assume in our network that nodes arrive according to some stochastic process and directed edges among these nodes are created over time. Given the ordered pair (i, j) of nodes in the network at time t, let N_ij(t) be a counting process denoting the number of edges from i to j up to time t. In this paper, each N_ij(t) will equal zero or one, though this can be generalized. Combining the individual counting processes of all potential edges gives a multivariate counting process N(t) = (N_ij(t) : i, j ∈ {1, ..., n}, i ≠ j); we make no assumption about the independence of individual edge counting processes. (See [7] for an overview of counting processes.) We do not consider an edge dissolution process in this paper, although in theory it is possible to do so by placing a second counting process on each edge for dissolution events. (See [10, 3] for different examples of formation–dissolution process models.) As proposed in [9], we model the multivariate counting process via the Doob–Meyer decomposition [7],

    N(t) = ∫_0^t λ(s) ds + M(t),    (1)

where essentially λ(t) and M(t) may be viewed as the (deterministic) signal and (martingale) noise, respectively.
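As a toy numerical illustration of the decomposition (1) (our own sketch, not an example from the paper): for a homogeneous Poisson counting process with constant intensity, the compensator ∫_0^T λ ds equals λT, so the residual M(T) = N(T) − λT should behave as mean-zero noise.

```python
import numpy as np

# Toy check of the Doob-Meyer decomposition (1) for a homogeneous Poisson
# process: the compensator on [0, T] is lam * T, so the martingale residual
# M(T) = N(T) - lam * T has mean ~0 and variance ~lam * T.
# (Illustrative rate, horizon, and replication count; not from the paper.)
rng = np.random.default_rng(0)
lam, T, reps = 2.0, 10.0, 2000
N_T = rng.poisson(lam * T, size=reps)   # N(T) across `reps` independent runs
M_T = N_T - lam * T                     # martingale "noise" term
print(round(M_T.mean(), 2), round(M_T.var(), 1))
```

For a general network intensity λ_ij(t) the compensator is no longer linear in t, but the same signal-plus-noise reading applies edge by edge.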
To model the so-called intensity process λ(t), we denote the entire past of the network, up to but not including time t, by H_{t−} and consider for each potential directed edge (i, j) two possible intensity forms, the multiplicative Cox and the additive Aalen functions [7], respectively:

    λ_ij(t | H_{t−}) = Y_ij(t) α_0(t) exp[β^T s(i, j, t)];    (2)
    λ_ij(t | H_{t−}) = Y_ij(t) [β_0(t) + β(t)^T s(i, j, t)],    (3)

where the "at risk" indicator function Y_ij(t) equals one if and only if (i, j) could form an edge at time t, a concept whose interpretation is determined by the context (e.g., see Section 2.2). In equations (2) and (3), s(i, j, t) is a vector of p statistics for the directed edge (i, j) constructed from H_{t−}; examples of these statistics are given in Section 2.2. In each of the two models, the intensity process depends on a linear combination of the coefficients β, which can be time-varying in the additive Aalen formulation. When all elements s_k(i, j, t) of the statistic vector equal zero, we obtain the baseline hazards α_0(t) and β_0(t).

The two intensity forms above, the Cox and the Aalen, each have their respective strengths (e.g., see [7, chapter 4]). In particular, the coefficients of the Aalen model are quite easy to estimate via linear regression, unlike those of the Cox model. We leverage this computational advantage to develop an efficient inference algorithm for the Aalen model later in this paper.
On the other hand, the Cox model forces the hazard function to be non-negative, while the Aalen model does not; however, in our experiments on both simulated and real-world data we did not encounter any issues with negative hazard functions when using the Aalen model.

2.2 Non-recurrent event models for network formation processes

If t_i^arr and t_j^arr are the arrival times of nodes i and j, then the risk indicator of equations (2) and (3) is Y_ij(t) = I(max(t_i^arr, t_j^arr) < t ≤ t_{e_ij}). The time t_{e_ij} of directed edge (i, j) is taken to be +∞ if the edge is never formed during the observation time. The reason for the upper bound t_{e_ij} is that the counting process is non-recurrent; i.e., once an edge forms, it can never form again.

The network statistics s(i, j, t) of equations (2) and (3), corresponding to the ordered pair (i, j), can be time-invariant (such as gender match) or time-dependent (such as the number of two-paths from i to j just before time t). Since it has been found empirically that most new edges in social networks are created between nodes separated by two hops [11], we limit our statistics to the following:

1. Out-degree of sender i: s_1(i, j, t) = Σ_{h∈V, h≠i} N_ih(t−)
2. In-degree of sender i: s_2(i, j, t) = Σ_{h∈V, h≠i} N_hi(t−)
3. Out-degree of receiver j: s_3(i, j, t) = Σ_{h∈V, h≠j} N_jh(t−)
4. In-degree of receiver j: s_4(i, j, t) = Σ_{h∈V, h≠j} N_hj(t−)
5. Reciprocity: s_5(i, j, t) = N_ji(t−)
6. Transitivity: s_6(i, j, t) = Σ_{h∈V, h≠i,j} N_ih(t−) N_hj(t−)
7. Shared contactees: s_7(i, j, t) = Σ_{h∈V, h≠i,j} N_ih(t−) N_jh(t−)
8. Triangle closure: s_8(i, j, t) = Σ_{h∈V, h≠i,j} N_hi(t−) N_jh(t−)
9. Shared contacters: s_9(i, j, t) = Σ_{h∈V, h≠i,j} N_hi(t−) N_hj(t−)

Here N_ji(t−) denotes the value of the counting process for the pair (j, i) right before time t.
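As a concrete sketch (our own illustration, not code from the paper), the nine statistics above can be computed directly from the counting-process values N_uv(t−); the hypothetical helper below assumes the already-formed edges are stored as a set of ordered pairs.

```python
import numpy as np

def network_stats(edges, i, j, n):
    """Sketch of the nine statistics s(i, j, t) of Section 2.2, computed from
    the set `edges` of ordered pairs (u, v) with N_uv(t-) = 1 among n nodes.
    (Hypothetical helper for illustration; a real implementation would update
    these incrementally, as in Section 3.)"""
    N = np.zeros((n, n), dtype=int)
    for u, v in edges:
        N[u, v] = 1
    others = [h for h in range(n) if h != i and h != j]
    return np.array([
        N[i, :].sum(),                            # 1. out-degree of sender i
        N[:, i].sum(),                            # 2. in-degree of sender i
        N[j, :].sum(),                            # 3. out-degree of receiver j
        N[:, j].sum(),                            # 4. in-degree of receiver j
        N[j, i],                                  # 5. reciprocity
        sum(N[i, h] * N[h, j] for h in others),   # 6. transitivity (i -> h -> j)
        sum(N[i, h] * N[j, h] for h in others),   # 7. shared contactees
        sum(N[h, i] * N[j, h] for h in others),   # 8. triangle closure
        sum(N[h, i] * N[h, j] for h in others),   # 9. shared contacters
    ])

# Example: statistics for the candidate edge (2, 0) given four formed edges.
s = network_stats({(0, 1), (1, 0), (0, 2), (2, 1)}, i=2, j=0, n=3)
```

In the non-recurrent setting of this section, s(i, j, t) would only be evaluated for pairs (i, j) that are still at risk, i.e., not yet connected.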
While this paper focuses on the non-recurrent setting for simplicity, one can also develop recurrent models within this framework by capturing an alternative set of statistics specialized for the recurrent case [8, 12, 9]. Such models are useful for data where interaction edges occur multiple times (e.g., email data).

3 Inference techniques

In this section, we describe algorithms for estimating the coefficients of the multiplicative Cox and additive Aalen models. We also discuss an efficient online inference technique for the Aalen model.

3.1 Estimation for the Cox model

Recent work has posited Cox models similar to (2) with the goal of estimating general network effects [8, 12] or citation network effects [9]. Typically, α_0(t) is considered a nuisance parameter, and estimation of β proceeds by maximization of the so-called partial likelihood of Cox [13]:

    L(β) = Π_{e=1}^m [ exp(β^T s(i_e, j_e, t_e)) / Σ_{i=1}^n Σ_{j≠i} Y_ij(t_e) exp(β^T s(i, j, t_e)) ],    (4)

where m is the number of edge formation events, and t_e, i_e, and j_e are the time, sender, and receiver of the e-th event. In this paper, maximization is performed via the Newton–Raphson algorithm. The covariance matrix of β̂ is estimated as the inverse of the negative Hessian matrix at the last iteration. We use the caching method of [9] to compute the likelihood, the score vector, and the Hessian matrix more efficiently. We will illustrate this method through the computation of the likelihood, where the most expensive computation is for the denominator

    κ(t_e) = Σ_{i=1}^n Σ_{j≠i} Y_ij(t_e) exp(β^T s(i, j, t_e)).    (5)

For models such as the one in Section 2.2, a naïve update of κ(t_e) needs O(pn^2) operations, where n is the current number of nodes.
A naïve calculation of log L(β) needs O(mpn^2) operations (where m is the number of edge events), which is costly since m and n may be large. Calculations of the score vector and Hessian matrix are similar, though they involve higher powers of p.

Alternatively, as in [9], we may simply write κ(t_e) = κ(t_{e−1}) + Δκ(t_e), where Δκ(t_e) captures all of the possible changes that occur during the time interval [t_{e−1}, t_e). Since we assume in this paper that edges do not dissolve, it is necessary to keep track only of the group of edges whose covariates change during this interval, which we call U_{e−1}, and those that first become at risk during this interval, which we call C_{e−1}. These groups of edges may be cached in memory during an initialization step; then, subsequent calculations of Δκ(t_e) are simple functions of the values of s(i, j, t_{e−1}) and s(i, j, t_e) for (i, j) in these two groups (for C_{e−1}, only the time-t_e statistic is relevant).

The number of edges cached at each time step tends to be small, generally O(n), because our network statistics s are limited to those based on node degrees and two-paths. This leads to substantial computational savings; since we must still initialize κ(t_1), the total computational complexity of each Newton–Raphson iteration is O(p^2 n^2 + m(p^2 n + p^3)).

3.2 Estimation for the Aalen model

Inference in model (3) proceeds not for the β_k parameters directly but rather for their time-integrals

    B_k(t) = ∫_0^t β_k(s) ds.    (6)

The reason for this is that B(t) = [B_1(t), ..., B_p(t)] may be estimated straightforwardly using a procedure akin to simple least squares [7]: first, impose some ordering on the n(n − 1) possible ordered pairs (i, j) of nodes, and take W(t) to be the n(n − 1) × p matrix whose (i, j)-th row equals Y_ij(t) s(i, j, t)^T.
Then

    B̂(t) = ∫_0^t J(s) W^−(s) dN(s) = Σ_{t_e ≤ t} J(t_e) W^−(t_e) ΔN(t_e)    (7)

is the estimator of B(t), where the multivariate counting process N(t_e) uses the same ordering of its n(n − 1) entries as the W(t) matrix,

    W^−(t) = [W(t)^T W(t)]^{−1} W(t)^T,

and J(t) is the indicator that W(t) has full column rank, where we take J(t) W^−(t) = 0 whenever W(t) does not have full column rank. As with typical least squares, a covariance matrix for B̂(t) may also be estimated [7]; we give a formula for this matrix in equation (11). If estimates of β_k(t) are desired for the sake of interpretability, a kernel smoothing method may be used:

    β̂_k(t) = (1/b) Σ_{t_e} K((t − t_e)/b) ΔB̂_k(t_e),    (8)

where b is the bandwidth parameter, ΔB̂_k(t_e) = B̂_k(t_e) − B̂_k(t_{e−1}), and K is a bounded kernel function with compact support [−1, 1], such as the Epanechnikov kernel.

3.3 Online inference for the Aalen model

Similar to the caching method for the Cox model in Section 3.1, it is possible to streamline the computations for estimating the integrated Aalen model coefficients B(t). First, we rewrite (7) as

    B̂(t) = Σ_{t_e ≤ t} J(t_e) [W(t_e)^T W(t_e)]^{−1} W(t_e)^T ΔN(t_e) = Σ_{t_e ≤ t} A^{−1}(t_e) W(t_e)^T ΔN(t_e),    (9)

where A(t_e) = W(t_e)^T W(t_e), and J(t_e) is omitted because, for large network data sets and for reasonable choices of starting observation times, the covariate matrix is always of full rank. The computation of W(t_e)^T ΔN(t_e) is simple because ΔN(t_e) consists of all zeros except for a single entry equal to one.
The most expensive computation is updating the (p + 1) × (p + 1) matrix A(t_e) at every event time t_e; inverting A(t_e) is not expensive since p is relatively small.

Using U_{e−1} and C_{e−1} as in Section 3.1, the component (k, l) of the matrix A(t_e) corresponding to covariates k and l can be written as A_kl(t_e) = A_kl(t_{e−1}) + ΔA_kl(t_{e−1}), where

    ΔA_kl(t_{e−1}) = − Σ_{(i,j) ∈ U_{e−1}} W_ijk(t_{e−1}) W_ijl(t_{e−1}) + Σ_{(i,j) ∈ U_{e−1} ∪ C_{e−1}} W_ijk(t_e) W_ijl(t_e).    (10)

For models such as the one presented in Section 2.2, if n is the current number of nodes, the cost of naïvely calculating A_kl(t_e) by iterating through all "at-risk" edges is nearly n^2. As in Section 3.1, the cost will be O(n) if we instead use caching together with equation (10). In other cases, there may be restrictions on the set of edges at risk at a particular time. Here the computational burden of the naïve calculation can be substantially smaller than O(n^2); yet it is generally the case that using (10) will still provide a substantial reduction in computing effort.

Our online inference algorithm during the time interval [t_{e−1}, t_e) may be summarized as follows:

1. Update A(t_{e−1}) using equation (10).
2. Compute B̂(t_{e−1}) = B̂(t_{e−2}) + A^{−1}(t_{e−1}) W(t_{e−1})^T ΔN(t_{e−1}).
3. Compute and cache the network statistics changed by event e − 1, then initialize U_{e−1} with a list of those at-risk edges whose network statistics are changed by this event.
4. Compute and cache all values of network statistics changed during the time interval [t_{e−1}, t_e). Define C_{e−1} as the set of edges that switch to at-risk during this interval.
5.
Before considering the event e:
   (a) Compute look-ahead summations at time t_{e−1} indexed by U_{e−1}.
   (b) Update the covariate matrix W(t_{e−1}) based on the cache.
   (c) Compute forward summations at time t_e indexed by U_{e−1} and C_{e−1}.

For the first event, A(t_1) must be initialized by naïve summation over all current at-risk edges, which requires O(p^2 n^2) calculations. Assuming that the number n of nodes stays roughly the same over each of the m edge events, the overall computational complexity of this online inference algorithm is thus O(p^2 n^2 + m(p^2 n + p^3)). If a covariance matrix estimate for B̂(t) is desired, it can also be derived online using the ideas above, since we may write it as

    Σ̂(t) = Σ_{t_e ≤ t} W^−(t_e) diag{ΔN(t_e)} W^−(t_e)^T = Σ_{t_e ≤ t} A^{−1}(t_e) [W_{i_e j_e}(t_e) ⊗ W_{i_e j_e}(t_e)] A^{−1}(t_e),    (11)

where W_{i_e j_e}(t_e) denotes the vector W(t_e)^T ΔN(t_e) and ⊗ is the outer product.

4 Experimental analysis

In this section, we empirically analyze the ability of our inference methods to estimate the regression coefficients as well as the predictive power of the learned models. Before discussing the experimental results, we briefly describe the synthetic and real-world data sets that we use for evaluation.

We simulate two data sets, SIM-1 and SIM-2, from ground-truth regression coefficients. In particular, we simulate a network formation process starting from time unit 0 until time 1200, where nodes arrive in the network at a constant rate λ_0 = 10 (i.e., on average, 10 nodes join the network at each time unit); the resulting simulated networks have 11,997 nodes. The edge formation process is simulated via Ogata's modified thinning algorithm [14] with an additive conditional intensity function.
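The paper does not give pseudocode for the simulator; the following is a minimal sketch of the thinning idea for a single counting process, where the additive `intensity` is a toy rate of our own choosing and `lam_max` is an assumed upper bound on it over the horizon.

```python
import numpy as np

def thin_events(intensity, lam_max, T, rng):
    """Minimal Ogata-style thinning sketch for one counting process on [0, T]:
    propose candidate times from a homogeneous Poisson process with rate
    lam_max (an upper bound on the conditional intensity), and accept each
    candidate t with probability intensity(t, history) / lam_max."""
    events, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)      # next candidate time
        if t > T:
            return events
        if rng.random() * lam_max < intensity(t, events):
            events.append(t)                     # candidate accepted

# Toy additive intensity (illustrative, not the paper's model): a baseline
# rate plus a small excitation term per past event.
rng = np.random.default_rng(1)
times = thin_events(lambda t, hist: 0.5 + 0.05 * len(hist),
                    lam_max=5.0, T=20.0, rng=rng)
```

For the network simulation, the same accept/reject step is applied with the total intensity Σ_ij λ_ij(t), and an accepted event is then attributed to a particular pair (i, j) in proportion to its intensity.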
From time 0 to 1000, the baseline coefficient is set to β_0 = 10^−6; the coefficients for sender out-degree and receiver in-degree are set to β_1 = β_4 = 10^−7; the coefficients for reciprocity, transitivity, and shared contacters are set to β_5 = β_6 = β_9 = 10^−5; and the coefficients for sender in-degree, receiver out-degree, shared contactees, and triangle closure are set to 0. For SIM-1, these coefficients are kept constant and 118,672 edges are created. For SIM-2, between times 1000 and 1150, we increase the coefficients for transitivity and shared contacters to β_6 = β_9 = 4 × 10^−5, and after time 1150 the coefficients return to their original values; in this case, 127,590 edges are created.

We also evaluate our approach on two real-world data sets, IRVINE and METAFILTER. IRVINE is a longitudinal data set derived from an online social network of students at UC Irvine [15]. This data set has 1,899 users and 20,296 directed contact edges between users, with timestamps for each node arrival and edge creation event. This longitudinal network spans April to October of 2004. The METAFILTER data set is from a community weblog where users can share links and discuss Web content.¹ This data set has 51,362 users and 76,791 directed contact edges between users. The continuous-time observation spans 8/31/2007 to 2/5/2011.
Note that both data sets are non-recurrent in that the creation of an edge between two nodes occurs at most once.

¹The METAFILTER data are available at http://mssv.net/wiki/index.php/Infodump

[Figure 1 appears here, with panels (a) Constant / Transitivity, (b) Constant / Shared Contacters, (c) Piecewise / Transitivity, (d) Piecewise / Shared Contacters.]

Figure 1: (a,b) Estimated time-varying coefficients on SIM-1; (c,d) estimated time-varying coefficients on SIM-2. Ground-truth coefficients are also shown as red dashed lines.

[Figure 2 appears here, with panels (a) Sender Out-Degree, (b) Reciprocity, (c) Transitivity, (d) Shared Contacters.]

Figure 2: Estimated time-varying coefficients on IRVINE data.
These plots suggest that there are two distinct phases of network evolution, consistent with an independent analysis of these data [15].

[Figure 3 appears here, with panels (a) Sender Out-Degree, (b) Reciprocity, (c) Transitivity, (d) Shared Contacters.]

Figure 3: Estimated time-varying coefficients on METAFILTER. Here, the network effects continuously change during the observation time.

4.1 Recovering the time-varying regression coefficients

This section focuses on the ability of our additive Aalen modeling approach to estimate the time-varying coefficients, given an observed longitudinal network.

The first set of experiments attempts to recover the ground-truth coefficients on SIM-1 and SIM-2. We run the inference algorithm described in Section 3.3 and use an Epanechnikov smoothing kernel (with a bandwidth of 10 time units) to obtain smoothed coefficients. On SIM-1, Figures 1(a,b) show the estimated coefficients associated with the transitivity and shared contacters statistics, as well as the ground-truth coefficients. Likewise, Figures 1(c,d) show the same estimated and ground-truth coefficients for SIM-2. These results demonstrate that our inference algorithm can accurately recover the ground-truth coefficients in cases where the coefficients are fixed (SIM-1) and modulated (SIM-2).
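As a small self-contained sketch of the smoothing step (8) (our own toy increments, not the SIM-1 estimates): with a constant underlying coefficient, kernel-smoothing the increments of the cumulative estimate should recover that constant away from the boundaries.

```python
import numpy as np

def smooth_beta(event_times, dB, grid, b):
    """Epanechnikov kernel smoothing of the increments dB_k(t_e) of the
    estimated cumulative coefficient, as in equation (8):
    beta_hat(t) = (1/b) * sum_e K((t - t_e) / b) * dB_k(t_e),
    with K(u) = 0.75 * (1 - u^2) on [-1, 1] and bandwidth b."""
    u = (grid[:, None] - event_times[None, :]) / b
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return K @ dB / b

# Toy check (synthetic increments, not the paper's estimates): a constant
# coefficient beta(t) = 0.01 yields increments dB ~ 0.01 * (inter-event gap),
# so the smoothed curve should sit near 0.01 on interior grid points.
rng = np.random.default_rng(2)
t_e = np.sort(rng.uniform(0.0, 100.0, 500))
dB = 0.01 * np.diff(np.concatenate(([0.0], t_e)))
beta_hat = smooth_beta(t_e, dB, np.linspace(20.0, 80.0, 7), b=10.0)
```

The compact support of the kernel means only events within one bandwidth of t contribute, which is what makes bandwidth selection the main tuning choice here.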
We also tried other settings for the ground-truth coefficients (e.g., multiple sinusoidal-like bumps) and found that our approach can accurately recover the coefficients in those cases as well.

On the IRVINE and METAFILTER data, we also learn time-varying coefficients, which are useful for interpreting network evolution. Figure 2 shows several of the estimated coefficients for the IRVINE data, using an Epanechnikov kernel (with a bandwidth of 30 days). These coefficients suggest the existence of two distinct phases in the evolution of the network. In the first phase of network formation, the network grows at an accelerated rate. Positive coefficients for sender out-degree, reciprocity, and transitivity in these plots imply that users with a high number of friends tend to make more friends, tend to reciprocate their relations, and tend to make friends with their friends' friends, respectively. However, these coefficients decrease towards zero (the blue line), and the network enters a second phase in which it is structurally stable. Both of these phases have also been observed in an independent study of the data [15]. Figure 3 shows the estimated coefficients for METAFILTER, using an Epanechnikov kernel (with a bandwidth of 30 days). Interestingly, the coefficients suggest that there is a marked change in the edge formation process around 7/10/10. Unlike the IRVINE coefficients, the estimated METAFILTER coefficients continue to vary over time.

Table 1: Lengths of the building, training, and test periods.
The numbers of events are given in parentheses.

              Building                     Training                    Test
IRVINE        4/15/04 – 5/11/04 (7073)     5/12/04 – 5/31/04 (7646)    6/1/04 – 10/19/04 (5507)
METAFILTER    6/15/04 – 12/21/09 (60376)   12/22/09 – 7/9/10 (8763)    7/10/10 – 2/5/11 (7620)

4.2 Predicting future links

We perform rolling prediction experiments over the real-world data sets to evaluate the predictive power of the learned regression models. Following the evaluation methodology of [9], we split each longitudinal data set into three periods: a statistics-building period, a training period, and a test period (Table 1). The statistics-building period is used solely to build up the network statistics, while the training period is used to learn the coefficients and the test period is used to make predictions. Throughout the training and test periods, the time-dependent statistics are continuously updated. Furthermore, for the additive Aalen model, we use the online inference technique from Section 3.3. When we predict an event in the test period, all the previous events from the test period are used as training data as well. Meanwhile, for the multiplicative Cox model, we adaptively learn the model in batch-online fashion: during the test period, every 10 days, we retrain the model (using the Newton–Raphson technique described in Section 3.1) with additional training examples coming from the test set. Our Newton–Raphson implementation uses a step-halving procedure, halving the length of each step if necessary until log L(β) increases. The iterations continue until every element of ∇ log L(β) is smaller than 10^−3 in absolute value, or until the relative increase in log L(β) is less than 10^−100, or until 100 Newton–Raphson iterations are reached, whichever occurs first.

The baseline that we consider is logistic regression (LR) with the same time-dependent statistics used in the Aalen and Cox models.
Note that logistic regression is a competitive baseline that has been used in previous link prediction studies (e.g., [11]). We learn the LR model in the same adaptive batch-online fashion as the Cox model. We also use case-control sampling to address the imbalance between positive and negative cases (since at each "positive" edge event there are on the order of n^2 "negative" training cases). At each event, we sample K negative training examples for that same time point. We use two settings for K in the experiments: K = 10 and K = 50.

To make predictions using the additive Aalen model, one would need to extrapolate the time-varying coefficients to future time points. For simplicity, we use a uniform smoothing kernel (weighting all observations equally), with a window size of 1 or 10 days. A more advanced extrapolation technique could yield even better predictive performance for the Aalen model.

Each model can provide us with the probability of an edge formation event between two nodes at a given point in time, and so we can calculate a cumulative recall metric across all test events:

    Recall = ( Σ_{(i→j, t) ∈ TestSet} I[j ∈ Top(i, t, K)] ) / |TestSet|,    (12)

where Top(i, t, K) is the top-K list of i's potential "friends," ranked based on intensity λ_ij(t).

We evaluate the predictive performance of the Aalen model (with smoothing windows of 1 and 10 days), the Cox model, and the LR baseline (with case-control ratios 1:10 and 1:50). Figure 4(a) shows the recall results on IRVINE. In this case, both the Aalen and Cox models outperform the LR baseline; furthermore, it is interesting to note that the Aalen model with time-varying coefficients does not outperform the Cox model.
One explanation for this result is that the IRVINE coefficients are fairly stable (apart from the initial phase, as shown in Figure 2), and thus time-varying coefficients do not provide additional predictive power in this case. Also note that LR with ratio 1:10 outperforms 1:50. We also tried an LR ratio of 1:3 (not shown) but found that it performed nearly identically to LR 1:10; thus, both the Aalen and Cox models outperform the baseline substantially on these data.

Figure 4(b) shows the recall results on METAFILTER. As in the previous case, both the Aalen and Cox models significantly outperform the LR baseline. However, the Aalen model with time-varying coefficients also substantially outperforms the Cox model with time-fixed coefficients. In this case, estimating time-varying coefficients improves predictive performance, which makes sense because we have seen in Figure 3 that METAFILTER's coefficients tend to vary more over time. We also calculated precision results (not shown) on these data sets, which confirm these conclusions.

[Figure 4 appears here: recall versus cut-point K (K = 1 to 20) for Adaptive LR (1:10), Adaptive LR (1:50), Adaptive Cox, Aalen (Uniform-1), and Aalen (Uniform-10), on (a) IRVINE and (b) METAFILTER.]

Figure 4: Predictive performance of the additive Aalen model, multiplicative Cox model, and logistic regression baseline on the IRVINE and METAFILTER data sets, using recall as the metric.

5 Related Work and Conclusions

Evolving networks have been descriptively analyzed in exploratory fashion in a variety of domains, including email data [16], citation graphs [17], and online social networks [18].
On the modeling side, temporal versions of exponential random graph models [1, 2, 3] and latent space models [19, 4, 5, 20] have been developed. Such methods operate on cross-sectional snapshot data, while our framework models continuous-time network event data. It is worth noting that continuous-time Markov process models for longitudinal networks have been proposed previously [21]; however, these approaches have only been applied to very small networks, while our regression-based approach can scale to large networks. Recently, there has also been work on inferring unobserved time-varying networks from evolving nodal attributes which are observed [22, 23, 24]. In this paper, the main focus is the statistical modeling of observed continuous-time networks.

More recently, survival and event history models based on the Cox model have been applied to network data [8, 12, 9]. A significant difference between our previous work [9] and this paper is that scalability is achieved in our earlier work by restricting the approach to "egocentric" modeling, in which counting processes are placed only on nodes. In contrast, here we formulate scalable inference techniques for the general "relational" setting, where counting processes are placed on edges. Prior work also assumed static regression coefficients, while here we develop a framework for time-varying coefficients for the additive Aalen model. Regression models with varying coefficients have been previously proposed in other contexts [25], including a time-varying version of the Cox model [26], although to the best of our knowledge such models have not been developed or fitted on longitudinal networks.

A variety of link prediction techniques have also been investigated by the machine learning community over the past decade (e.g., [27, 28, 29]).
Many of these methods use standard classifiers (such as logistic regression) and take advantage of key features (such as similarity measures among nodes) to make accurate predictions. While our focus is not on feature engineering, we note that arbitrary network and nodal features such as those developed for link prediction can be incorporated into our continuous-time regression framework. Other link prediction techniques based on matrix factorization [30] and random walks [11] have also been studied. While these link prediction techniques mainly focus on making accurate predictions, our proposed approach not only gives accurate predictions but also provides a statistical model (with time-varying coefficient estimates) that can be useful in evaluating scientific hypotheses.

In summary, we have developed multiplicative and additive regression models for large-scale continuous-time longitudinal networks. On simulated and real-world data, we have shown that the proposed inference approach can accurately estimate regression coefficients and that the learned model can be used for interpreting network evolution and predicting future network events. An interesting direction for future work would be to incorporate time-dependent nodal attributes (such as textual content) into this framework and to investigate regularization methods for these models.

Acknowledgments

This work is supported by ONR under the MURI program, Award Number N00014-08-1-1015.

References

[1] S. Hanneke and E. P. Xing. Discrete temporal models of social networks. In Proc. 2006 Conf. on Statistical Network Analysis, pages 115–125. Springer-Verlag, 2006.

[2] D. Wyatt, T. Choudhury, and J. Bilmes. Discovering long range properties of social networks with multi-valued time-inhomogeneous models. In Proc. 24th AAAI Conf. on AI, 2010.

[3] P. N. Krivitsky and M. S. Handcock. A separable model for dynamic networks. 
Under review, November 2010. http://arxiv.org/abs/1011.1937.

[4] W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In Proc. 26th Intl. Conf. on Machine Learning, pages 329–336. ACM, 2009.

[5] J. Foulds, C. DuBois, A. Asuncion, C. Butts, and P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. In AI and Statistics, volume 15 of JMLR W&C Proceedings, pages 287–295, 2011.

[6] P. K. Andersen, O. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer, 1993.

[7] O. O. Aalen, O. Borgan, and H. K. Gjessing. Survival and Event History Analysis: A Process Point of View. Springer, 2008.

[8] C. T. Butts. A relational event framework for social action. Soc. Meth., 38(1):155–200, 2008.

[9] D. Q. Vu, A. U. Asuncion, D. R. Hunter, and P. Smyth. Dynamic egocentric models for citation networks. In Proc. 28th Intl. Conf. on Machine Learning, pages 857–864, 2011.

[10] P. Holland and S. Leinhardt. A dynamic model for social networks. J. Math. Soc., 5:5–20, 1977.

[11] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In Proc. 4th ACM Intl. Conf. on Web Search and Data Mining, pages 635–644. ACM, 2011.

[12] P. O. Perry and P. J. Wolfe. Point process modeling for directed interaction networks. Under review, October 2011. http://arxiv.org/abs/1011.1703.

[13] D. R. Cox. Regression models and life-tables. J. Roy. Stat. Soc., Series B, 34:187–220, 1972.

[14] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume 1. Probability and its Applications (New York). Springer, New York, 2nd edition, 2008.

[15] P. Panzarasa, T. Opsahl, and K. M. Carley. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. J. Amer. Soc. 
for Inf. Sci. and Tech., 60(5):911–932, 2009.

[16] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, 2006.

[17] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proc. 11th ACM SIGKDD Intl. Conf. on Knowledge Discovery in Data Mining, pages 177–187. ACM, 2005.

[18] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. 2nd ACM SIGCOMM Wkshp. on Social Networks, pages 37–42. ACM, 2009.

[19] P. Sarkar and A. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations, 7(2):31–40, 2005.

[20] Q. Ho, L. Song, and E. Xing. Evolving cluster mixed-membership blockmodel for time-varying networks. In AI and Statistics, volume 15 of JMLR W&C Proceedings, pages 342–350, 2011.

[21] T. A. B. Snijders. Models for longitudinal network data. Mod. Meth. in Soc. Ntwk. Anal., pages 215–247, 2005.

[22] S. Zhou, J. Lafferty, and L. Wasserman. Time varying undirected graphs. Machine Learning, 80:295–319, 2010.

[23] A. Ahmed and E. P. Xing. Recovering time-varying networks of dependencies in social and biological studies. Proc. Natl. Acad. Scien., 106(29):11878–11883, 2009.

[24] M. Kolar, L. Song, A. Ahmed, and E. P. Xing. Estimating time-varying networks. Ann. Appl. Stat., 4(1):94–123, 2010.

[25] Z. Cai, J. Fan, and R. Li. Efficient estimation and inferences for varying-coefficient models. J. Amer. Stat. Assn., 95(451):888–902, 2000.

[26] T. Martinussen and T. H. Scheike. Dynamic Regression Models for Survival Data. Springer, 2006.

[27] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Amer. Soc. for Inf. Sci. and Tech., 58(7):1019–1031, 2007.

[28] M. Al Hasan, V. Chaoji, S. Salem, and M. 
Zaki. Link prediction using supervised learning. In SDM '06: Workshop on Link Analysis, Counter-terrorism and Security, 2006.

[29] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In Proc. 19th Intl. World Wide Web Conference, pages 641–650. ACM, 2010.

[30] D. M. Dunlavy, T. G. Kolda, and E. Acar. Temporal link prediction using matrix and tensor factorizations. ACM Transactions on Knowledge Discovery from Data, 5(2):10, February 2011.