{"title": "Nonparametric Multi-group Membership Model for Dynamic Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 1385, "page_last": 1393, "abstract": "Relational data\u2014like graphs, networks, and matrices\u2014is often dynamic, where the relational structure evolves over time. A fundamental problem in the analysis of time-varying network data is to extract a summary of the common structure and the dynamics of underlying relations between entities. Here we build on the intuition that changes in the network structure are driven by the dynamics at the level of groups of nodes. We propose a nonparametric multi-group membership model for dynamic networks. Our model contains three main components. We model the birth and death of groups with respect to the dynamics of the network structure via a distance dependent Indian Buffet Process. We capture the evolution of individual node group memberships via a Factorial Hidden Markov model. And, we explain the dynamics of the network structure by explicitly modeling the connectivity structure. We demonstrate our model\u2019s capability of identifying the dynamics of latent groups in a number of different types of network data. Experimental results show our model achieves higher predictive performance on the future network forecasting and missing link prediction.", "full_text": "Nonparametric Multi-group Membership Model\n\nfor Dynamic Networks\n\nMyunghwan Kim\nStanford University\nStanford, CA 94305\n\nJure Leskovec\n\nStanford University\nStanford, CA 94305\n\nmykim@stanford.edu\n\njure@cs.stanford.edu\n\nRelational data\u2014like graphs, networks, and matrices\u2014is often dynamic, where the relational struc-\nture evolves over time. A fundamental problem in the analysis of time-varying network data is to\nextract a summary of the common structure and the dynamics of the underlying relations between\nthe entities. 
Here we build on the intuition that changes in the network structure are driven by dy-\nnamics at the level of groups of nodes. We propose a nonparametric multi-group membership model\nfor dynamic networks. Our model contains three main components: We model the birth and death of\nindividual groups with respect to the dynamics of the network structure via a distance dependent In-\ndian Buffet Process. We capture the evolution of individual node group memberships via a Factorial\nHidden Markov model. And, we explain the dynamics of the network structure by explicitly mod-\neling the connectivity structure of groups. We demonstrate our model\u2019s capability of identifying the\ndynamics of latent groups in a number of different types of network data. Experimental results show\nthat our model provides improved predictive performance over existing dynamic network models on\nfuture network forecasting and missing link prediction.\n\n1 Introduction\n\nStatistical analysis of social networks and other relational data is becoming an increasingly impor-\ntant problem as the scope and availability of network data increases. Network data\u2014such as the\nfriendships in a social network\u2014is often dynamic in a sense that relations between entities rise and\ndecay over time. A fundamental problem in the analysis of such dynamic network data is to extract\na summary of the common structure and the dynamics of the underlying relations between entities.\n\nAccurate models of structure and dynamics of network data have many applications. 
They allow us to predict missing relationships [20, 21, 23], recommend potential new relations [2], identify clusters and groups of nodes [1, 29], forecast future links [4, 9, 11, 24], and even predict group growth and longevity [15].\n\nHere we present a new approach to modeling network dynamics by considering time-evolving interactions between groups of nodes as well as the arrival and departure dynamics of individual nodes to these groups. We develop a dynamic network model, Dynamic Multi-group Membership Graph Model, that identifies the birth and death of individual groups as well as the dynamics of nodes joining and leaving groups in order to explain changes in the underlying network linking structure. Our nonparametric model considers an infinite number of latent groups, where each node can belong to multiple groups simultaneously. We capture the evolution of individual node group memberships via a Factorial Hidden Markov model. However, in contrast to recent works on dynamic network modeling [4, 5, 11, 12, 14], we explicitly model the birth and death dynamics of individual groups by using a distance-dependent Indian Buffet Process [7]. Under our model only active/alive groups influence relationships in a network at a given time. A further innovation of our approach is that we not only model relations between the members of the same group but also account for links between members and non-members. By explicitly modeling group lifespan and group connectivity structure we achieve greater modeling flexibility, which leads to improved performance on link prediction and network forecasting tasks as well as to increased interpretability of obtained results.\n\nThe rest of the paper is organized as follows: Section 2 provides the background and Section 3 presents our generative model and motivates its parametrization. We discuss related work in Section 4 and present the model inference procedure in Section 5. 
Last, in Section 6 we provide experimental results as well as an analysis of the social network from the movie The Lord of the Rings.\n\n2 Models of Dynamic Networks\n\nFirst, we describe general components of modern dynamic network models [4, 5, 11, 14]. In the next section we will then describe our own model and point out the differences to the previous work.\n\nDynamic networks are generally conceptualized as discrete time series of graphs on a fixed set of nodes N. A dynamic network Y is represented as a time series of adjacency matrices $Y^{(t)}$ for each time $t = 1, 2, \\ldots, T$. In this work, we limit our focus to unweighted directed as well as undirected networks. So, each $Y^{(t)}$ is an $N \\times N$ binary matrix where $Y_{ij}^{(t)} = 1$ if a link from node i to j exists at time t and $Y_{ij}^{(t)} = 0$ otherwise.\n\nEach node i of the network is associated with a number of latent binary features that govern the interaction dynamics with other nodes of the network. We denote the binary value of feature k of node i at time t by $z_{ik}^{(t)} \\in \\{0, 1\\}$. Such latent features can be viewed as assigning nodes to multiple overlapping, latent clusters or groups [1, 21]. In our work, we interpret these latent features as memberships to latent groups such as social communities of people with the same interests or hobbies. We allow each node to belong to multiple groups simultaneously. We model each node-group membership using a separate Bernoulli random variable [17, 22, 29]. This is in contrast to mixed-membership models where the distribution over an individual node\u2019s group memberships is modeled using a multinomial distribution [1, 5, 12]. The advantage of our multiple-membership approach is as follows. 
Mixed-membership models (i.e., multinomial distribution over group memberships) essentially assume that by increasing the amount of a node\u2019s membership to some group k, the same node\u2019s membership to some other group k\u2032 has to decrease (due to the condition that the probabilities normalize to 1). On the other hand, multiple-membership models do not suffer from this assumption and allow nodes to truly belong to multiple groups. Furthermore, we consider a nonparametric model of groups which does not restrict the number of latent groups ahead of time. Hence, our model adaptively learns the appropriate number of latent groups for a given network at a given time.\n\nIn dynamic network models, one also specifies a process by which nodes dynamically join and leave groups. We assume that each node i can join or leave a given group k according to a Markov model. However, since each node can join multiple groups independently, we naturally consider factorial hidden Markov models (FHMM) [8], where latent group membership of each node independently evolves over time. To be concrete, each membership $z_{ik}^{(t)}$ evolves through a 2-by-2 Markov transition probability matrix $Q_k^{(t)}$ where each entry $Q_k^{(t)}[r, s]$ corresponds to $P(z_{ik}^{(t)} = s \\mid z_{ik}^{(t-1)} = r)$, where $r, s \\in \\{0 = \\text{non-member}, 1 = \\text{member}\\}$.\n\nNow, given node group memberships $z_{ik}^{(t)}$ at time t one also needs to specify the process of link generation. Links of the network are realized according to a link function $f(\\cdot)$. A link from node i to node j at time t occurs with probability determined by the link function $f(z_{i\\cdot}^{(t)}, z_{j\\cdot}^{(t)})$. In our model, we develop a link function that not only accounts for links between group members but also models links between the members and non-members of a given group.\n\n3 Dynamic Multi-group Membership Graph Model\n\nNext we shall describe our Dynamic Multi-group Membership Graph Model (DMMG) and point out the differences with the previous work. In our model, we pay close attention to the three processes governing network dynamics: (1) birth and death dynamics of individual groups, (2) evolution of memberships of nodes to groups, and (3) the structure of network interactions between group members as well as non-members. We now proceed by describing each of them in turn.\n\nModel of active groups. Links of the network are influenced not only by nodes changing memberships to groups but also by the birth and death of groups themselves. New groups can be born and old ones can die. However, without explicitly modeling group birth and death there exists ambiguity between group membership change and the birth/death of groups. For example, consider two disjoint groups k and l such that their lifetimes and members do not overlap. In other words, group l is born after group k dies out. However, if group birth and death dynamics are not explicitly modeled, then the model could interpret that the two groups correspond to a single latent group where all the members of k leave the group before the members of l join the group. To resolve this ambiguity we devise an explicit model of birth/death dynamics of groups by introducing a notion of active groups.\n\nUnder our model, a group can be in one of two states: it can be either active (alive) or inactive (not yet born or dead). However, once an active group becomes inactive, it can never be active again. That is, once a group dies, it can never be alive again. 
To ensure coherence of a group\u2019s state over time, we build on the idea of distance-dependent Indian Buffet Processes (dd-IBP) [7]. The IBP is named after a metaphorical process that gives rise to a probability distribution, where customers enter an Indian Buffet restaurant and sample some subset of an infinitely long sequence of dishes. In the context of networks, nodes usually correspond to \u2018customers\u2019 and latent features/groups correspond to \u2018dishes\u2019. However, we apply dd-IBP in a different way. We regard each time step t as a \u2018customer\u2019 that samples a set of active groups $\\mathcal{K}_t$. So, at the first time step t = 1, we have a $\\mathrm{Poisson}(\\lambda)$ number of groups that are initially active, i.e., $|\\mathcal{K}_1| \\sim \\mathrm{Poisson}(\\lambda)$. To account for death of groups we then consider that each active group at time t \u2212 1 can become inactive at the next time step t with probability \u03b3. On the other hand, $\\mathrm{Poisson}(\\gamma\\lambda)$ new groups are also born at time t. Thus, at each time currently active groups can die, while new ones can also be born. The hyperparameter \u03b3 controls how often new groups are born and how often old ones die. For instance, there will be almost no newborn or dead groups if \u03b3 \u2248 0, while there would be no temporal group coherence and practically all the groups would die between consecutive time steps if \u03b3 = 1.\n\nFigure 1(a) gives an example of the above process. Black circles indicate active groups and white circles denote inactive (not yet born or dead) groups. Groups 1 and 3 exist at t = 1 and Group 2 is born at t = 2. At t = 3, Group 3 dies but Group 4 is born. Without our group activity model, Group 3 could have been reused with a completely new set of members and Group 4 would have never been born. Our model can distinguish these two disjoint groups.\n\nFormally, we denote the number of active groups at time t by $K_t = |\\mathcal{K}_t|$. 
We also denote the state (active/inactive) of group k at time t by $W_k^{(t)} = \\mathbf{1}\\{k \\in \\mathcal{K}_t\\}$. For convenience, we also define the set of newly active groups at time t to be $\\mathcal{K}_t^+ = \\{k \\mid W_k^{(t)} = 1, W_k^{(t')} = 0 \\ \\forall t' < t\\}$ and $K_t^+ = |\\mathcal{K}_t^+|$. Putting it all together we can now fully describe the process of group birth/death as follows:\n\n$$K_t^+ \\sim \\begin{cases} \\mathrm{Poisson}(\\lambda), & \\text{for } t = 1 \\\\ \\mathrm{Poisson}(\\gamma\\lambda), & \\text{for } t > 1 \\end{cases} \\qquad W_k^{(t)} \\sim \\begin{cases} \\mathrm{Bernoulli}(1 - \\gamma), & \\text{if } W_k^{(t-1)} = 1 \\\\ 1, & \\text{if } \\sum_{t'=1}^{t-1} K_{t'}^+ < k \\le \\sum_{t'=1}^{t} K_{t'}^+ \\\\ 0, & \\text{otherwise} \\end{cases} \\tag{1}$$\n\nNote that under this model an infinite number of active groups can exist. This means our model automatically determines the right number of active groups and each node can belong to many groups simultaneously. We now proceed by describing the model of node group membership dynamics.\n\nDynamics of node group memberships. We capture the dynamics of nodes joining and leaving groups by assuming that latent node group memberships form a Markov chain. In this framework, node memberships to active groups evolve through time according to Markov dynamics:\n\n$$P(z_{ik}^{(t)} \\mid z_{ik}^{(t-1)}) = Q_k = \\begin{pmatrix} 1 - a_k & a_k \\\\ b_k & 1 - b_k \\end{pmatrix},$$\n\nwhere matrix $Q_k[r, s]$ denotes a Markov transition from state r to state s, which can be a fixed parameter, group specific, or otherwise domain dependent as long as it defines a Markov transition matrix. 
Thus, the transition of node i\u2019s membership to active group k can be defined as follows:\n\n$$a_k, b_k \\sim \\mathrm{Beta}(\\alpha, \\beta), \\qquad z_{ik}^{(t)} \\sim W_k^{(t)} \\cdot \\mathrm{Bernoulli}\\left(a_k^{1 - z_{ik}^{(t-1)}} (1 - b_k)^{z_{ik}^{(t-1)}}\\right). \\tag{2}$$\n\nTypically, \u03b2 > \u03b1, which ensures that a group\u2019s memberships are not too volatile over time.\n\nFigure 1: (a) Group activity model. Birth and death of groups: Black circles represent active and white circles represent inactive (unborn or dead) groups. A dead group can never become active again. (b) Link function model. $z_i^{(t)}$ denotes binary node group memberships. Entries of link affinity matrix $\\Theta_k$ denote linking parameters between all 4 combinations of members ($z_i^{(t)} = 1$) and non-members ($z_i^{(t)} = 0$). To obtain link probability $p_{ij}^{(t)}$, individual affinities $\\Theta_k[z_i^{(t)}, z_j^{(t)}]$ are combined using a logistic function $g(\\cdot)$.\n\nRelationship between node group memberships and links of the network. Last, we describe the part of the model that establishes the connection between a node\u2019s memberships to groups and the links of the network. We achieve this by defining a link function f(i, j), which for a given pair of nodes i, j determines their interaction probability $p_{ij}^{(t)}$ based on their group memberships.\n\nWe build on the Multiplicative Attribute Graph model [16, 18], where each group k is associated with a link affinity matrix $\\Theta_k \\in \\mathbb{R}^{2 \\times 2}$. Each of the four entries of the link affinity matrix captures the tendency of linking between the group\u2019s members, members and non-members, as well as non-members themselves. 
While traditionally link affinities were considered to be probabilities, we relax this assumption by allowing affinities to be arbitrary real numbers and then combine them through a logistic function to obtain a final link probability.\n\nThe model is illustrated in Figure 1(b). Given group memberships $z_{ik}^{(t)}$ and $z_{jk}^{(t)}$ of nodes i and j at time t the binary indicators \u201cselect\u201d an entry $\\Theta_k[z_{ik}^{(t)}, z_{jk}^{(t)}]$ of matrix $\\Theta_k$. This way linking tendency from node i to node j is reflected based on their membership to group k. We then determine the overall link probability $p_{ij}^{(t)}$ by combining the link affinities via a logistic function $g(x) = \\exp(x)/(1 + \\exp(x))$. Thus,\n\n$$p_{ij}^{(t)} = f(z_{i\\cdot}^{(t)}, z_{j\\cdot}^{(t)}) = g\\left(\\epsilon_t + \\sum_{k=1}^{\\infty} \\Theta_k[z_{ik}^{(t)}, z_{jk}^{(t)}]\\right), \\qquad Y_{ij}^{(t)} \\sim \\mathrm{Bernoulli}(p_{ij}^{(t)}) \\tag{3}$$\n\nwhere $\\epsilon_t$ is a density parameter that reflects the varying link density of the network over time.\n\nNote that due to the potentially infinite number of groups the sum of an infinite number of link affinities may not be tractable. To resolve this, we notice that for a given $\\Theta_k$ subtracting $\\Theta_k[0, 0]$ from all its entries and then adding this value to $\\epsilon_t$ does not change the overall linking probability $p_{ij}^{(t)}$. Thus, we can set $\\Theta_k[0, 0] = 0$ and then only a finite number of affinities selected by $z_{ik}^{(t)}$ have to be considered. For all other entries of $\\Theta_k$ we use $N(0, \\nu^2)$ as a prior distribution.\n\nTo sum up, Figure 2 illustrates the three components of the DMMG in a plate notation. A group\u2019s state $W_k^{(t)}$ is determined by the dd-IBP process and each node-group membership $z_{ik}^{(t)}$ is defined as the FHMM over active groups. Then, the link between nodes i and j is determined based on the groups they belong to and the corresponding group link affinity matrices $\\Theta$.\n\nFigure 2: Dynamic Multi-group Membership Graph Model. Network Y depends on each node\u2019s group memberships Z and active groups W. Links of Y appear via link affinities $\\Theta$.\n\n4 Related Work\n\nClassically, non-Bayesian approaches such as exponential random graph models [10, 27] have been used to study dynamic networks. On the other hand, in the Bayesian approaches to dynamic network analysis latent variable models have been most widely used. These approaches differ by the structure of the latent space that they assume. For example, Euclidean space models [13, 24] place nodes in a low dimensional Euclidean space and the network evolution is then modeled as a regression problem of a node\u2019s future latent location. In contrast, our model uses HMMs, where latent variables stochastically depend on the state at the previous time step. Related to our work are dynamic mixed-membership models where a node is probabilistically allocated to a set of latent features. Examples of this model include the dynamic mixed-membership block model [5, 12] and the dynamic infinite relational model [14]. However, the critical difference here is that our model uses multi-memberships where a node\u2019s membership to one group does not limit its membership to other groups. Probably most related to our work here are the DRIFT [4] and LFP [11] models. Both of these models consider Markov switching of latent multi-group memberships over time. 
DRIFT uses the infinite factorial HMM [6], while LFP adds \u201csocial propagation\u201d to the Markov processes so that network links of each node at a given time directly influence group memberships of the corresponding node at the next time. Compared to these models, we uniquely incorporate the model of group birth and death and present a novel and powerful linking function.\n\n5 Model Inference via MCMC\n\nWe develop a Markov chain Monte Carlo (MCMC) procedure to approximate samples from the posterior distribution of the latent variables in our model. More specifically, there are five types of variables that we need to sample: node group memberships $Z = \\{z_{ik}^{(t)}\\}$, group states $W = \\{W_k^{(t)}\\}$, group membership transitions $Q = \\{Q_k\\}$, link affinities $\\Theta = \\{\\Theta_k\\}$, and density parameters $\\epsilon = \\{\\epsilon_t\\}$. By sampling each type of variable while fixing all the others, we end up with many samples representing the posterior distribution $P(Z, W, Q, \\Theta, \\epsilon \\mid Y, \\lambda, \\gamma, \\alpha, \\beta)$. We shall now explain a sampling strategy for each variable type.\n\nSampling node group memberships Z. To sample node group membership $z_{ik}^{(t)}$, we use the forward-backward recursion algorithm [26]. The algorithm first defines a deterministic forward pass which runs down the chain starting at time one, and at each time point t collects information from the data and parameters up to time t in a dynamic programming cache. A stochastic backward pass starts at time T and samples each $z_{ik}^{(t)}$ in backwards order using the information collected during the forward pass. In our case, we only need to sample $z_{ik}^{(T_k^B : T_k^D)}$ where $T_k^B$ and $T_k^D$ indicate the birth time and the death time of group k. Due to space constraints, we discuss further details in the extended version of the paper [19].\n\nSampling group states W. 
To update active groups, we use the Metropolis-Hastings algorithm with the following proposal distribution $P(W \\to W')$: We add a new group, remove an existing group, or update the life time of an active group with the same probability 1/3. When adding a new group k\u2032 we select the birth and death time of the group at random such that $1 \\le T_{k'}^B \\le T_{k'}^D \\le T$. For removing groups we randomly pick one of existing groups k\u2032\u2032 and remove it by setting $W_{k''}^{(t)} = 0$ for all t. Finally, to update the birth and death time of an existing group, we select an existing group and propose new birth and death time of the group at random. Once new state vector W\u2032 is proposed we accept it with probability\n\n$$\\min\\left(1, \\frac{P(Y \\mid W') P(W' \\mid \\lambda, \\gamma) P(W' \\to W)}{P(Y \\mid W) P(W \\mid \\lambda, \\gamma) P(W \\to W')}\\right). \\tag{4}$$\n\nWe compute $P(W \\mid \\lambda, \\gamma)$ and $P(W' \\to W)$ in a closed form, while we approximate the posterior $P(Y \\mid W)$ by sampling L Gibbs samples while keeping W fixed.\n\nSampling group membership transition matrix Q. Beta distribution is a conjugate prior of Bernoulli distribution and thus we can sample each $a_k$ and $b_k$ in $Q_k$ directly from the posterior distribution: $a_k \\sim \\mathrm{Beta}(\\alpha + N_{01,k}, \\beta + N_{00,k})$ and $b_k \\sim \\mathrm{Beta}(\\alpha + N_{10,k}, \\beta + N_{11,k})$, where $N_{rs,k}$ is the number of nodes that transition from state r to s in group k ($r, s \\in \\{0 = \\text{non-member}, 1 = \\text{member}\\}$).\n\nSampling link affinities \u0398. Once node group memberships Z are determined, we update the entries of link affinity matrices $\\Theta_k$. Direct sampling of \u0398 is intractable because of non-conjugacy of the logistic link function. An appropriate method in such case would be the Metropolis-Hastings that accepts or rejects the proposal based on the likelihood ratio. 
However, to avoid low acceptance rates and quickly move toward the mode of the posterior distribution, we develop a method based on Hybrid Monte Carlo (HMC) sampling [3]. We guide the sampling using the gradient of log-likelihood function with respect to each $\\Theta_k$. Because links $Y_{ij}^{(t)}$ are generated independently given group memberships Z, the gradient with respect to $\\Theta_k[x, y]$ can be computed by\n\n$$-\\frac{1}{2\\sigma^2}\\Theta_k^2 + \\sum_{i,j,t}\\left(Y_{ij}^{(t)} - p_{ij}^{(t)}\\right)\\mathbf{1}\\{z_{ik}^{(t)} = x, z_{jk}^{(t)} = y\\}. \\tag{5}$$\n\nUpdating density parameter \u03b5. Parameter vector \u03b5 is defined over a finite dimension T. Therefore, we can update \u03b5 by maximizing the log-likelihood given all the other variables. We compute the gradient update for each $\\epsilon_t$ and directly update $\\epsilon_t$ via a gradient step.\n\nUpdating hyperparameters. The number of groups over all time periods is given by a Poisson distribution with parameter $\\lambda(1 + \\gamma(T - 1))$. Hence, given \u03b3 we sample \u03bb by using a Gamma conjugate prior. Similarly, we can use the Beta conjugate prior for the group death process (i.e., Bernoulli distribution) to sample \u03b3. However, hyperparameters \u03b1 and \u03b2 do not have a conjugate prior, so we update them by using a gradient method based on the sampled values of $a_k$ and $b_k$.\n\nTime complexity of model parameter estimation. Last, we briefly comment on the time complexity of our model parameter estimation procedure. Each sample $z_{ik}^{(t)}$ requires computation of link probability $p_{ij}^{(t)}$ for all $j \\ne i$. Since the expected number of active groups at each time is \u03bb, this requires $O(\\lambda N^2 T)$ computations of $p_{ij}^{(t)}$. By caching the sum of link affinities between every pair of nodes, sampling Z as well as W requires $O(\\lambda N^2 T)$ time. Sampling \u0398 and \u03b5 also requires $O(\\lambda N^2 T)$ because the gradient of each $p_{ij}^{(t)}$ needs to be computed. 
Overall, our approach takes $O(\\lambda N^2 T)$ to obtain a single sample, while models that are based on the interaction matrix between all groups [4, 5, 11] require $O(K^2 N^2 T)$, where K is the expected number of groups. Furthermore, it has been shown that $O(\\log N)$ groups are enough to represent networks [16, 18]. Thus, in practice K (i.e., \u03bb) is of order log N and the running time for each sample is $O(N^2 T \\log N)$.\n\n6 Experiments\n\nWe evaluate our model on three different tasks. For quantitative evaluation, we perform missing link prediction as well as future network forecasting and show our model gives favorable performance when compared to current dynamic and static network models. We also analyze the dynamics of groups in a dynamic social network of characters in a movie \u201cThe Lord of the Rings: The Two Towers.\u201d\n\nExperimental setup. For the two prediction experiments, we use the following three datasets. First, the NIPS co-authorships network connects two people if they appear on the same publication in the NIPS conference in a given year. Network spans T = 17 years (1987 to 2003). Following [11] we focus on a subset of 110 most connected people over all time periods. Second, the DBLP co-authorship network is obtained from 21 Computer Science conferences from 2000 to 2009 (T = 10) [28]. We focus on 209 people by taking 7-core of the aggregated network for the entire time. Third, the INFOCOM dataset represents the physical proximity interactions between 78 students at the 2006 INFOCOM conference, recorded by wireless detector remotes given to each attendee [25]. As in [11] we use the processed data that removes inactive time slices to have T = 50.\n\nTo evaluate the predictive performance of our model, we compare it to three baseline models. 
For a naive baseline model, we regard the relationship between each pair of nodes as the instance of independent Bernoulli distribution with Beta(1, 1) prior. Thus, for a given pair of nodes, the link probability at each time equals to the expected probability from the posterior distribution given network data. Second baseline is LFRM [21], a model of static networks. For missing link prediction, we independently fit LFRM to each snapshot of dynamic networks. For network forecasting task, we fit LFRM to the most recent snapshot of a network. Even though LFRM does not capture time dynamics, we consider this to be a strong baseline model. Finally, for the comparison with dynamic network models, we consider two recent state of the art models. The DRIFT model [4] is based on an infinite factorial HMM and authors kindly shared their implementation. We also consider the LFP model [11] for which we were not able to obtain the implementation, but since we use the same datasets, we compare performance numbers directly with those reported in [11].\n\nTable 1: Missing link prediction. The best scoring method in each column is marked with *. Our DMMG performs the best in all cases. All improvements are statistically significant at 0.01 significance level.\n\nModel | NIPS TestLL | NIPS AUC | NIPS F1 | DBLP TestLL | DBLP AUC | DBLP F1 | INFOCOM TestLL | INFOCOM AUC | INFOCOM F1\nNaive | -2030 | 0.808 | 0.177 | -12051 | 0.814 | 0.300 | -17821 | 0.677 | 0.252\nLFRM | -880 | 0.777 | 0.195 | -3783 | 0.784 | 0.146 | -8689 | 0.946 | 0.703\nDRIFT | -758 | 0.866 | 0.296 | -3108 | 0.916 | 0.421 | -6654 | 0.973 | 0.757\nDMMG | -624* | 0.916* | 0.434* | -2684* | 0.939* | 0.492* | -6422* | 0.976* | 0.764*\n\nTo evaluate predictive performance, we use various standard evaluation metrics. First, to assess goodness of inferred probability distributions, we report the log-likelihood of held-out edges. 
Sec-\nond, to verify the predictive performance, we compute the area under the ROC curve (AUC). Last,\nwe also report the maximum F1-score (F1) by scanning over all possible precision/recall thresholds.\n\nTask 1: Predicting missing links. To generate the datasets for the task of missing link prediction,\nwe randomly hold out 20% of node pairs (i.e., either link or non-link) throughout the entire time\nperiod. We then run each model to obtain 400 samples after 800 burn-in samples for each of 10\nMCMC chains. Each sample gives a link probability for a given missing entry, so the \ufb01nal link\nprobability of a missing entry is computed by averaging the corresponding link probability over all\nthe samples. This \ufb01nal link probability provides the evaluation metric for a given missing data entry.\n\nTable 1 shows average evaluation metrics for each model and dataset over 10 runs. We also compute\nthe p-value on the difference between two best results for each dataset and metric. Overall, our\nDMMG model signi\ufb01cantly outperforms the other models in every metric and dataset. Particularly\nin terms of F1-score we gain up to 46.6% improvement over the other models.\n\nBy comparing the naive model and LFRM, we observe that LFRM performs especially poorly\ncompared to the naive model in two networks with few edges (NIPS and DBLP). Intuitively this\nmakes sense because due to the network sparsity we can obtain more information from the temporal\ntrajectory of each link than from each snapshot of network. However, both DRIFT and DMMG\nsuccessfully combine the temporal and the network information which results in better predictive\nperformance. Furthermore, we note that DMMG outperforms the other models by a larger margin\nas networks get sparser. DMMG makes better use of temporal information because it can explicitly\nmodel temporally local links through active groups.\n\nLast, we also compare our model to the LFP model. 
The LFP paper reports AUC ROC score of \u223c0.85 for NIPS and \u223c0.95 for INFOCOM on the same task of missing link prediction with 20% held-out missing data [11]. Performance of our DMMG on these same networks under the same conditions is 0.916 for NIPS and 0.976 for INFOCOM, which is a strong improvement over LFP.\n\nTask 2: Future network forecasting. Here we are given a dynamic network up to time $T_{obs}$ and the goal is to predict the network at the next time $T_{obs} + 1$. We follow the experimental protocol described in [4, 11]: We train the models on the first $T_{obs}$ networks, fix the parameters, and then for each model we run MCMC sampling one time step into the future. For each model and network, we obtain 400 samples with 10 different MCMC chains, resulting in 400K network samples. These network samples provide a probability distribution over links at time $T_{obs} + 1$.\n\nTable 2 shows performance averaged over different $T_{obs}$ values ranging from 3 to T-1. Overall, DMMG generally exhibits the best performance, but performance results seem to depend on the dataset. DMMG performs the best at 0.001 significance level in terms of AUC and F1 for the NIPS dataset, and at 0.05 level for the INFOCOM dataset. While DMMG improves performance on AUC (9%) and F1 (133%), DRIFT achieves the best log-likelihood on the NIPS dataset. In light of our previous observations, we conjecture that this is due to change in network edge density between different snapshots. On the DBLP dataset, DRIFT gives the best log-likelihood, the naive model performs best in terms of AUC, and DMMG is the best on F1 score. However, in all cases of DBLP dataset, the differences are not statistically significant. Overall, DMMG performs the best on NIPS and INFOCOM and provides comparable performance on DBLP.\n\nTable 2: Future network forecasting. The best score in each column is marked with *. DMMG performs best on NIPS and INFOCOM while results on DBLP are mixed.\n\nModel | NIPS TestLL | NIPS AUC | NIPS F1 | DBLP TestLL | DBLP AUC | DBLP F1 | INFOCOM TestLL | INFOCOM AUC | INFOCOM F1\nNaive | -547 | 0.524 | 0.130 | -3248 | 0.668* | 0.243 | -774 | 0.673 | 0.270\nLFRM | -356 | 0.398 | 0.011 | -1680 | 0.492 | 0.024 | -760 | 0.640 | 0.248\nDRIFT | -148* | 0.672 | 0.084 | -1324* | 0.650 | 0.122 | -661 | 0.782 | 0.381\nDMMG | -170 | 0.732* | 0.196* | -1347 | 0.652 | 0.245* | -625* | 0.804* | 0.392*\n\nFigure 3: Group arrival and departure dynamics of different characters in the Lord of the Rings. Panels: (a) Group 1, (b) Group 2, (c) Group 3. Dark areas in the plots correspond to a given node\u2019s (y-axis) membership to each group over time (x-axis, t = 1 to 5). The y-axis lists the 21 characters: haldir, gandalf, merry, frodo, sam, gollum, pippin, aragorn, legolas, gimli, saruman, eowyn, eomer, theoden, grima, hama, faramir, arwen, elrond, galadriel, madril.\n\nTask 3: Case study of \u201cThe Lord of the Rings: The Two Towers\u201d social network. Last, we also investigate groups identified by our model on a dynamic social network of characters in a movie, The Lord of the Rings: The Two Towers. Based on the transcript of the movie we created a dynamic social network on 21 characters and T = 5 time epochs, where we connect a pair of characters if they co-appear inside some time window.\n\nWe fit our model to this network and examine the results in Figure 3. 
Our model identified three\ndynamic groups, which all nicely correspond to the Lord of the Rings storyline. For example,\nthe core of Group 1 corresponds to Aragorn, elf Legolas, dwarf Gimli, and people in Rohan who\nin the end all fight against the Orcs. Similarly, Group 2 corresponds to hobbits Sam, Frodo and\nGollum on their mission to destroy the ring in Mordor, who are later joined by Faramir and ranger\nMadril. Interestingly, Group 3, which revolves around Merry and Pippin, only forms at t=2 when they start\ntheir journey with Treebeard and later fight against wizard Saruman. Although the fighting occurs in two\nseparate places, we find that some scenes are not distinguishable, so it looks as if Merry and Pippin\nfought together with Rohan\u2019s army against Saruman\u2019s army.\n\nAcknowledgments\n\nWe thank Creighton Heaukulani and Zoubin Ghahramani for sharing data and code. This research\nhas been supported in part by NSF IIS-1016909, CNS-1010921, IIS-1149837, IIS-1159679, IARPA\nAFRL FA8650-10-C-7058, Okawa Foundation, Docomo, Boeing, Allyes, Volkswagen, Intel, Alfred\nP. Sloan Fellowship and the Microsoft Faculty Fellowship.\n\nReferences\n\n[1] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. JMLR, 9, 2008.\n\n[2] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In WSDM, 2011.\n\n[3] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216\u2013222, 1987.\n\n[4] J. Foulds, A. U. Asuncion, C. DuBois, C. T. Butts, and P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. In AISTATS, 2011.\n\n[5] W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In ICML, 2009.\n\n[6] J. V. Gael, Y. W. Teh, and Z. Ghahramani. The infinite factorial hidden Markov model. In NIPS, 2009.\n\n[7] S. J. 
Gershman, P. I. Frazier, and D. M. Blei. Distance dependent infinite latent feature models. arXiv:1110.5454, 2012.\n\n[8] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29(2-3):245\u2013273, 1997.\n\n[9] F. Guo, S. Hanneke, W. Fu, and E. P. Xing. Recovering temporally rewiring networks: a model-based approach. In ICML, 2007.\n\n[10] S. Hanneke, W. Fu, and E. P. Xing. Discrete temporal models of social networks. Electron. J. Statist., 4:585\u2013605, 2010.\n\n[11] C. Heaukulani and Z. Ghahramani. Dynamic probabilistic models for latent feature propagation in social networks. In ICML, 2013.\n\n[12] Q. Ho, L. Song, and E. P. Xing. Evolving cluster mixed-membership blockmodel for time-varying networks. In AISTATS, 2011.\n\n[13] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. JASA, 97(460):1090\u20131098, 2002.\n\n[14] K. Ishiguro, T. Iwata, N. Ueda, and J. Tenenbaum. Dynamic infinite relational model for time-varying relational data analysis. In NIPS, 2010.\n\n[15] S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM, 2012.\n\n[16] M. Kim and J. Leskovec. Modeling social networks with node attributes using the multiplicative attribute graph model. In UAI, 2011.\n\n[17] M. Kim and J. Leskovec. Latent multi-group membership graph model. In ICML, 2012.\n\n[18] M. Kim and J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113\u2013160, 2012.\n\n[19] M. Kim and J. Leskovec. Nonparametric multi-group membership model for dynamic networks. arXiv:1311.2079, 2013.\n\n[20] J. R. Lloyd, P. Orbanz, Z. Ghahramani, and D. M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In NIPS, 2012.\n\n[21] K. T. Miller, T. L. Griffiths, and M. I. Jordan. 
Nonparametric latent feature models for link prediction. In NIPS, 2009.\n\n[22] M. M\u00f8rup, M. N. Schmidt, and L. K. Hansen. Infinite multiple membership relational modeling for complex networks. In MLSP, 2011.\n\n[23] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.\n\n[24] P. Sarkar and A. W. Moore. Dynamic social network analysis using latent space models. In NIPS, 2005.\n\n[25] J. Scott, R. Gass, J. Crowcroft, P. Hui, C. Diot, and A. Chaintreau. CRAWDAD data set cambridge/haggle (v. 2009-05-29), May 2009.\n\n[26] S. L. Scott. Bayesian methods for hidden Markov models. JASA, 97(457):337\u2013351, 2002.\n\n[27] T. A. B. Snijders, G. G. van de Bunt, and C. E. G. Steglich. Introduction to stochastic actor-based models for network dynamics. Social Networks, 32(1):44\u201360, 2010.\n\n[28] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In KDD\u201908, 2008.\n\n[29] J. Yang and J. Leskovec. Community-affiliation graph model for overlapping community detection. In ICDM, 2012.", "award": [], "sourceid": 699, "authors": [{"given_name": "Myunghwan", "family_name": "Kim", "institution": "Stanford University"}, {"given_name": "Jure", "family_name": "Leskovec", "institution": "Stanford University"}]}