{"title": "Dynamic Infinite Relational Model for Time-varying Relational Data Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 919, "page_last": 927, "abstract": "We propose a new probabilistic model for analyzing dynamic evolutions of relational data, such as additions, deletions and split & merge, of relation clusters like communities in social networks. Our proposed model abstracts observed time-varying object-object relationships into relationships between object clusters. We extend the infinite Hidden Markov model to follow dynamic and time-sensitive changes in the structure of the relational data and to estimate a number of clusters simultaneously. We show the usefulness of the model through experiments with synthetic and real-world data sets.", "full_text": "Dynamic In\ufb01nite Relational Model\n\nfor Time-varying Relational Data Analysis\n\nKatsuhiko Ishiguro\n\nTomoharu Iwata\n\nNaonori Ueda\n\nNTT Communication Science Laboratories\n\nfishiguro,iwata,uedag@cslab.kecl.ntt.co.jp\n\nKyoto, 619-0237 Japan\n\nJoshua Tenenbaum\n\nMIT\n\nBoston, MA.\njbt@mit.edu\n\nAbstract\n\nWe propose a new probabilistic model for analyzing dynamic evolutions of rela-\ntional data, such as additions, deletions and split & merge, of relation clusters like\ncommunities in social networks. Our proposed model abstracts observed time-\nvarying object-object relationships into relationships between object clusters. We\nextend the in\ufb01nite Hidden Markov model to follow dynamic and time-sensitive\nchanges in the structure of the relational data and to estimate a number of clusters\nsimultaneously. We show the usefulness of the model through experiments with\nsynthetic and real-world data sets.\n\n1 Introduction\n\nAnalysis of \u201crelational data\u201d, such as the hyperlink structure on the Internet, friend links on social\nnetworks, or bibliographic citations between scienti\ufb01c articles, is useful in many aspects. 
Many statistical models for relational data have been presented [10, 1, 18]. The stochastic block model (SBM) [11] and the infinite relational model (IRM) [8] partition objects into clusters so that the relations between clusters abstract the relations between objects well. The SBM requires specifying the number of clusters in advance, while the IRM estimates the number of clusters automatically. In contrast, the mixed membership model [2] associates each object with multiple clusters (roles) rather than a single cluster.

These models treat relations as static information. However, much real-world relational data is time-varying. For example, hyperlinks on the Internet are not stationary: links disappear while new ones appear every day. Human relationships in a company sometimes change drastically when an organization splits or groups merge, e.g. due to mergers and acquisitions. One of our modeling goals is to detect such sudden changes in network structure that occur over time.

Recently, some researchers have investigated the dynamics of relational data. Tang et al. [13] proposed a spectral clustering-based model for multi-mode, time-evolving relations. Yang et al. [16] developed a time-varying SBM. They assume an HMM-like transition probability matrix that governs the cluster assignments of objects for all time steps. This model has only one transition probability matrix for the entire data, so it cannot represent more complicated time variations such as split & merge of clusters that occur only temporarily. Fu et al. [4] proposed a time-series extension of the mixed membership model. [4] assumes a continuous world view: roles follow a mixed membership structure, and model parameters evolve continuously in time. This model is very general for time-series relational data modeling, and is well suited to tracking gradual and continuous changes of relationships. 
Some works in bioinformatics [17, 5] have also adopted similar strategies. However, a continuous model approach does not necessarily best capture the sudden transitions of relationships we are interested in. In addition, these previous models assume that the number of clusters is fixed and known, which is difficult to determine a priori.

In this paper we propose yet another time-varying relational data model, one that deals with temporal and dynamic changes of cluster structures such as additions, deletions, and split & merge of clusters. Instead of the continuous world view of [4], we assume a discrete structure: distinct clusters with discrete transitions over time, allowing for birth, death, and split & merge dynamics. More specifically, we extend the IRM to time-varying relational data by using a variant of the infinite HMM (iHMM) [15, 3]. By incorporating the idea of the iHMM, our model is able to infer clusters of objects without specifying the number of clusters in advance. Furthermore, we assume multiple transition probabilities that depend on time steps and clusters. This specific form of iHMM enables the model to represent time-sensitive dynamic properties such as split & merge of clusters. Inference is performed efficiently with a slice sampler.

2 Infinite Relational Model

We first explain the infinite relational model (IRM) [8], which can estimate the number of hidden clusters from relational data. In the IRM, a Dirichlet process (DP) is used as a prior over partitions into an unknown number of clusters, and is denoted by DP(γ, G_0), where γ > 0 is a concentration parameter and G_0 is a base measure. We write G ∼ DP(γ, G_0) when a distribution G(θ) is sampled from the DP. In this paper, we implement the DP using a stick-breaking process [12], which is based on the fact that G is represented as an infinite mixture of θs: G(θ) = Σ_{k=1}^{∞} β_k δ_{θ_k}(θ), θ_k ∼ G_0. β = (β_1, β_2, ...) is a mixing ratio vector with infinite elements whose sum equals one, constructed in a stochastic way:

β_k = v_k ∏_{l=1}^{k-1} (1 - v_l), v_k ∼ Beta(1, γ). (1)

Here v_k is drawn from a Beta distribution with parameter γ.

The IRM is an application of the DP to relational data. Let us assume a binary two-place relation on the set of objects D = {1, 2, ..., N}, i.e., D × D → {0, 1}. For simplicity, we only discuss a two-place relation on the identical domain (D × D). The IRM divides the set of N objects into multiple clusters based on the observed relational data X = {x_{i,j} ∈ {0, 1}; 1 ≤ i, j ≤ N}. The IRM is able to infer the number of clusters at the same time because it uses the DP as a prior distribution over cluster partitions. The observation x_{i,j} ∈ {0, 1} denotes the existence of a relation between objects i, j ∈ {1, 2, ..., N}: if there is (not) a relation between i and j, then x_{i,j} = 1 (0). We allow asymmetric relations x_{i,j} ≠ x_{j,i} throughout the paper.

The probabilistic generative model (Fig. 1(a)) of the IRM is as follows:

β | γ ∼ Stick(γ) (2)
z_i | β ∼ Multinomial(β) (3)
η_{k,l} | ξ, ψ ∼ Beta(ξ, ψ) (4)
x_{i,j} | Z, H ∼ Bernoulli(η_{z_i, z_j}). (5)

Here, Z = {z_i}_{i=1}^{N} and H = {η_{k,l}}_{k,l=1}^{∞}. In Eq. (2), "Stick" is the stick-breaking process (Eq. (1)). We sample a cluster index for object i, z_i = k, k ∈ {1, 2, ...}, using β as in Eq. (3). In Eq. (4), η_{k,l} is the strength of the relation between the objects in clusters k and l. The observed relational data x_{i,j} are generated by Eq. (5), conditioned on the cluster assignments Z and the strengths H.

3 Dynamic Infinite Relational Model (dIRM)

3.1 Time-varying relational data

First, we define the time-varying relational data considered in this paper. Time-varying relational data X have three subscripts t, i, and j: X = {x_{t,i,j} ∈ {0, 1}}, where i, j ∈ {1, 2, ..., N} and t ∈ {1, 2, ..., T}. x_{t,i,j} = 1 (0) indicates that there is (not) an observed relationship between objects i and j at time step t. T is the number of time steps, and N is the number of objects. We assume that there is no relation between objects belonging to different time steps t and t'. The time-varying relational data X is thus a set of T (static) relational data, one for each time step.

Figure 1: Graphical models of (a) the IRM (Eqs. (2)-(5)), (b) the "tIRM" (Eqs. (7)-(10)), and (c) the dIRM (Eqs. (11)-(15)). Circle nodes denote variables, square nodes are constants, and shaded nodes indicate observations.

It is natural to assume that every object transits between different clusters as time evolves. Observing several real-world time-varying relational data sets, we assume the following properties of transitions:

• P1. Cluster assignments in consecutive time steps have higher correlations.
• P2. Time evolutions of clusters are neither stationary nor uniform.
• P3. The number of clusters is time-varying and unknown a priori.

P1 is a common assumption for many kinds of time-series data, not limited to relational data. For example, a member of a firm community on an SNS will belong to the same community for a long time. The hyperlink structure of a news website may alter because of breaking news, but most of the site does not change as rapidly every minute.

P2 tries to distinguish occasional and drastic changes from frequent and minor modifications in relational networks. Such unstable changes are observed elsewhere. 
For example, human relationships in companies evolve every day, but a merger of departments sometimes brings about drastic changes. On an SNS, a user community for the upcoming Olympic games may exist only for a limited time: it will not last for years after the games end. This causes the addition and deletion of a user cluster (community). P3 is indispensable for tracking such changes of clusters.

3.2 Naive extensions of IRM

We attempt to modify the IRM to satisfy these properties. We first consider several straightforward solutions based on the IRM for analyzing time-varying relational data.

The simplest way is to convert the time-varying relational data X into "static" relational data X̃ = {x̃_{i,j}} and apply the IRM to X̃. For example, we can generate X̃ as follows:

x̃_{i,j} = 1 if (1/T) Σ_{t=1}^{T} x_{t,i,j} > σ, and x̃_{i,j} = 0 otherwise, (6)

where σ denotes a threshold. This solution cannot represent time changes of clustering because it assumes the same clustering result for all time steps (z_{1,i} = z_{2,i} = ... = z_{T,i}).

We may instead separate the time-varying relational data X into a series of time-step-wise relational data X_t and apply the IRM to each X_t. In this case, we obtain a different clustering result for each time step, but the analysis ignores the dependency of the data over time.

Another solution is to extend the object assignment variable z_i to be time-dependent, z_{t,i}. The resulting "tIRM" model is described as follows (Fig. 1(b)):

β | γ ∼ Stick(γ) (7)
z_{t,i} | β ∼ Multinomial(β) (8)
η_{k,l} | ξ, ψ ∼ Beta(ξ, ψ) (9)
x_{t,i,j} | Z_t, H ∼ Bernoulli(η_{z_{t,i}, z_{t,j}}). (10)

Here, Z_t = {z_{t,i}}_{i=1}^{N}. Since β is shared over all time steps, we may expect the clustering results of different time steps to be highly correlated. However, this model assumes that the z_{t,i} are conditionally independent of each other for all t given β. This implies that the tIRM is not suitable for modeling time evolutions, since the order of time steps is ignored in the model.

3.3 Dynamic IRM

To address the three conditions P1-P3 above, we propose a new probabilistic model called the dynamic infinite relational model (dIRM). The generative model is given below:

β | γ ∼ Stick(γ) (11)
π_{t,k} | α_0, κ, β ∼ DP(α_0 + κ, (α_0 β + κ δ_k) / (α_0 + κ)) (12)
z_{t,i} | z_{t-1,i}, Π_t ∼ Multinomial(π_{t, z_{t-1,i}}) (13)
η_{k,l} | ξ, ψ ∼ Beta(ξ, ψ) (14)
x_{t,i,j} | Z_t, H ∼ Bernoulli(η_{z_{t,i}, z_{t,j}}). (15)

Here, Π_t = {π_{t,k} : k = 1, ..., ∞}. A graphical model of the dIRM is presented in Fig. 1(c).

β in Eq. (11) represents time-averaged memberships (mixing ratios) of clusters. The newly introduced π_{t,k} = (π_{t,k,1}, π_{t,k,2}, ..., π_{t,k,l}, ...) in Eq. (12) is the transition probability that an object staying in cluster k ∈ {1, 2, ...} at time t-1 will move to cluster l ∈ {1, 2, ...} at time t. Because of the DP, this transition probability is able to handle infinitely many hidden states, as in the iHMM [14].

The DP used in Eq. (12) has an additional term κ > 0, which is introduced by Fox et al. [3]. 
δ_k is a vector whose elements are all zero except the kth element, which is one. Because the base measure in Eq. (12) is biased by κ and δ_k, the kth element of π_{t,k} prefers to take a larger value than the other elements. This implies that the DP encourages self-transitions of objects, which achieves property P1 for time-varying relational data.

One difference from conventional iHMMs [14, 3] lies in P2, which is achieved by making the transition probability π time-dependent. π_{t,k} is sampled for every time step t; thus, we can model time-varying patterns of transitions, including additions, deletions, and split & merge of clusters as extreme cases. These changes happen only temporarily; therefore, time-dependent transition probabilities are indispensable for our purpose. Note that the transition probability is also dependent on the cluster index k, as in conventional iHMMs. The dIRM can also determine the number of clusters automatically thanks to the DP, which gives us property P3.

Equation (13) generates a cluster assignment for object i at time t, based on the cluster where the object was previously (z_{t-1,i}) and its transition probability π. Equation (14) generates a strength parameter η for the pair of clusters k and l, and then we obtain the observed sample x_{t,i,j} by Eq. (15).

The difference between iHMMs and the dIRM is two-fold. One is the time-dependent transition probability of the dIRM discussed above. The other is that iHMMs have one hidden state sequence s_{1:t} to be inferred, while the dIRM needs to estimate multiple hidden state sequences z_{1:t,i} given one time-sequence observation. 
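As a concrete illustration, the generative process of Eqs. (11)-(15) can be simulated with a finite truncation of the stick-breaking process. The sketch below is ours, not the authors' code: the truncation level K, the hyperparameter values, and the initialization of z_1 from β (the paper leaves the first time step implicit) are all illustrative assumptions.

```python
import numpy as np

def stick_breaking(gamma, K):
    # Truncated stick-breaking construction of beta (Eq. (1)):
    # beta_k = v_k * prod_{l<k} (1 - v_l), leftover mass aggregated in beta[K].
    v = np.random.beta(1.0, gamma, size=K)
    beta = np.empty(K + 1)
    beta[:K] = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    beta[K] = 1.0 - beta[:K].sum()
    return beta

def sample_dirm(N=20, T=5, K=10, gamma=1.0, alpha0=1.0, kappa=2.0, xi=1.0, psi=1.0):
    beta = stick_breaking(gamma, K)
    # Finite-K analogue of the sticky DP in Eq. (12):
    # pi_{t,k} ~ Dirichlet(alpha0 * beta + kappa * delta_k), one row per (t, k).
    pi = np.empty((T, K + 1, K + 1))
    for t in range(T):
        for k in range(K + 1):
            pi[t, k] = np.random.dirichlet(alpha0 * beta + kappa * (np.arange(K + 1) == k))
    eta = np.random.beta(xi, psi, size=(K + 1, K + 1))        # Eq. (14)
    z = np.empty((T, N), dtype=int)
    z[0] = np.random.choice(K + 1, size=N, p=beta)            # assumed initialization from beta
    for t in range(1, T):
        for i in range(N):
            z[t, i] = np.random.choice(K + 1, p=pi[t, z[t - 1, i]])   # Eq. (13)
    x = np.random.binomial(1, eta[z[:, :, None], z[:, None, :]])      # Eq. (15)
    return z, x
```

Because κ inflates the kth entry of each Dirichlet concentration, sampled objects tend to stay in their current cluster, which is the self-transition bias discussed above.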
Thus, we may interpret the dIRM as an extension of the iHMM that has N (= the number of objects) hidden sequences to handle relational data.

4 Inference

We use a slice sampler [15], which enables fast and efficient sampling of the sequential hidden states. The slice sampler introduces auxiliary variables U = {u_{t,i}}. Given U, the number of clusters can be reduced to a finite number during the inference, which enables efficient sampling of the variables.

4.1 Sampling parameters

First, we explain the sampling of an auxiliary variable u_{t,i}. We assume a uniform prior on u_{t,i}, and define the joint distribution of u, z, and x as:

p(x_{t,i,j}, u_{t,i}, u_{t,j}, z_{t-1:t,i}, z_{t-1:t,j}) = I(u_{t,i} < π_{t, z_{t-1,i}, z_{t,i}}) I(u_{t,j} < π_{t, z_{t-1,j}, z_{t,j}}) η_{z_{t,i}, z_{t,j}}^{x_{t,i,j}} (1 - η_{z_{t,i}, z_{t,j}})^{1 - x_{t,i,j}}. (16)

Here, I(·) is 1 if the predicate holds and zero otherwise. Using Eq. (16), we can derive the posterior of u_{t,i} as follows:

u_{t,i} ∼ Uniform(0, π_{t, z_{t-1,i}, z_{t,i}}). (17)

Next, we explain the sampling of an object assignment variable z_{t,i}. We define the following message variable p:

p_{t,i,k} = p(z_{t,i} = k | X_{1:t}, U_{1:t}, Π, H, β). (18)

Sampling of z_{t,i} is similar to the forward-backward algorithm for the original HMM. First, we compute the above message variables from t = 1 to t = T (forward filtering). Next, we sample z_{t,i} from t = T to t = 1 using the computed message variables (backward sampling).

In forward filtering we compute the following equation from t = 1 to t = T:

p_{t,i,k} ∝ p(x_{t,i,i} | z_{t,i} = k, H) ∏_{j≠i} p(x_{t,i,j} | z_{t,i} = k, H) p(x_{t,j,i} | z_{t,i} = k, H) Σ_{l: u_{t,i} < π_{t,l,k}} p_{t-1,i,l}. (19)

Note that the summation is conditioned on u_{t,i}. The number of cluster indices l that satisfy this condition is limited to a certain finite number; thus, we can evaluate the above equation.

In backward sampling, we sample z_{t,i} from t = T to t = 1 from the equation below:

p(z_{t,i} = k | z_{t+1,i} = l) ∝ p_{t,i,k} π_{t+1,k,l} I(u_{t+1,i} < π_{t+1,k,l}). (20)

Because of I(u < π), the candidate cluster indices k are limited to a finite set. Therefore, the variety of sampled z_{t,i} is limited to a certain finite number K given U.

Given U and Z, we have a finite number K of realized clusters. Thus, computing the posteriors of π_{t,k} and η_{k,l} becomes easy and straightforward. First, β is treated as a (K+1)-dimensional vector whose mixing ratios of unrepresented clusters are aggregated in β_{K+1} = 1 - Σ_{k=1}^{K} β_k. m_{t,k,l} denotes the number of objects i such that z_{t-1,i} = k and z_{t,i} = l. Also, let N_{k,l} denote the number of x_{t,i,j} such that z_{t,i} = k and z_{t,j} = l, and let n_{k,l} denote the number of x_{t,i,j} such that z_{t,i} = k, z_{t,j} = l, and x_{t,i,j} = 1. Then we obtain the following posteriors:

π_{t,k} ∼ Dirichlet(α_0 β + κ δ_k + m_{t,k}) (21)
η_{k,l} ∼ Beta(ξ + n_{k,l}, ψ + N_{k,l} - n_{k,l}). (22)

m_{t,k} is a (K+1)-dimensional vector whose lth element is m_{t,k,l} (with m_{t,k,K+1} = 0).

We omit the derivation of the posterior of β since it is almost the same as that of Fox et al. 
[3].

4.2 Sampling hyperparameters

Sampling hyperparameters is important for obtaining good results. This could normally be done by putting vague prior distributions on them [14]. However, it is difficult to evaluate the precise posteriors of some hyperparameters [3]. Instead, we reparameterize and sample each hyperparameter in terms of a ∈ (0, 1) [6]. For example, if the hyperparameter γ is assumed to be Gamma-distributed, we convert γ by a = γ / (1 + γ). Sampling a can be achieved from a uniform grid on (0, 1): we compute (unnormalized) posterior probability densities at several values of a and choose one to update the hyperparameter.

Figure 2: Examples from the real-world datasets. (a) IOtables data, observations at t = 1; (b) IOtables data, observations at t = 5; (c) Enron data, observations at t = 2; (d) Enron data, observations at t = 10.

5 Experiments

The performance of the dIRM is compared with the original IRM [8] and its naive extension tIRM (described in Eqs. (7)-(10)). To apply the IRM to time-varying relational data, we apply Eq. (6) to X with threshold σ = 0.5. The difference between the tIRM (Eqs. (7)-(10)) and the dIRM is that the tIRM does not incorporate the dependency between successive time steps while the dIRM does. Hyperparameters were estimated simultaneously in all experiments.

5.1 Datasets and measurements

We prepared two synthetic datasets (Synth1 and Synth2). To synthesize a dataset, we first determined the number of time steps T, the number of clusters K, and the number of objects N. Next, we manually assigned z_{t,i} so as to obtain cluster split & merge, additions, and deletions. After obtaining Z, we defined the connection strengths between clusters H = {η_{k,l}}. In this experiment, each η_{k,l} takes one of two values: η = 0.1 (weakly connected) or η = 0.9 (strongly connected). The observation X was randomly generated according to Z and H. 
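A minimal version of this synthesis step might look as follows. The specific label pattern (one cluster splitting at a chosen time step) and the dimensions are hypothetical, but η is restricted to the two values 0.1 and 0.9 as in the experiment.

```python
import numpy as np

def make_synthetic(T=6, N=16, split_at=3, seed=0):
    # Hand-assigned cluster labels z with one split event, then X ~ Bernoulli(eta).
    rng = np.random.default_rng(seed)
    z = np.zeros((T, N), dtype=int)
    z[:, N // 2:] = 1                       # two clusters at the start
    z[split_at:, 3 * N // 4:] = 2           # cluster 1 splits into clusters 1 and 2
    eta = np.full((3, 3), 0.1)              # weakly connected pairs: eta = 0.1
    np.fill_diagonal(eta, 0.9)              # strongly connected (within-cluster): eta = 0.9
    x = rng.binomial(1, eta[z[:, :, None], z[:, None, :]])
    return z, x
```

Deletions, additions, and merges can be scripted the same way by editing the label array z directly before sampling X.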
Synth1 is smaller (N = 16) and stable, while Synth2 is much larger (N = 54) and its objects actively transit between clusters.

Two real-world datasets were also collected. The first is the National Input-Output Tables for Japan (IOtables) provided by the Statistics Bureau of the Ministry of Internal Affairs and Communications of Japan. The IOtables summarize the transactions of goods and services between industrial sectors. We used an inverse coefficient matrix, which is a part of the IOtables. Each element e_{i,j} of the matrix represents the production invoked in the ith sector by one unit of demand in the jth sector. We generated x_{i,j} from e_{i,j} by binarization: x_{i,j} = 1 if e_{i,j} exceeds the average, and x_{i,j} = 0 otherwise. We collected data from 1985, 1990, 1995, 2000, and 2005, at a resolution of 32 sectors. Thus we obtain time-varying relational data with N = 32 and T = 5.

The other real-world dataset is the Enron e-mail dataset [9], used in many studies including [13, 4]. We extracted the e-mails sent in 2001 and divided them into monthly transactions, so the number of time steps is T = 12. The full dataset contains N = 151 persons. x_{t,i,j} = 1 (0) if there is (not) an e-mail sent from i to j at time (month) t. We also generated a smaller dataset (N = 68) by excluding those who sent few e-mails. Quantitative measurements were computed with this smaller dataset.

Fig. 2 presents examples from the IOtables dataset ((a), (b)) and the Enron dataset ((c), (d)). The IOtables dataset is characterized by its stable relationships, compared to the Enron dataset. In the Enron dataset, the amount of communication rapidly increases after the media reported on the Enron scandals.

We used three evaluation measurements. The first is the Rand index, which computes the similarity between the true and estimated clustering results [7]. The Rand index takes its maximum value (1) if the two clustering results completely match. 
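For reference, the (unadjusted) Rand index can be computed by pair counting; this small helper is our illustration, not the evaluation code used in the paper.

```python
from itertools import combinations

def rand_index(a, b):
    # Fraction of object pairs on which the two clusterings agree:
    # a pair is concordant if it is co-clustered in both labelings or in neither.
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```

Because only co-membership of pairs is compared, the index is invariant to permutations of the cluster labels, which is what makes it usable across samplers whose label indices differ.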
We computed the Rand index between the ground truth Z_t and the estimate Ẑ_t for each time step, and averaged the indices over the T steps. We also computed the error in the estimated number of clusters: the difference in the number of realized clusters between Z_t and Ẑ_t was computed for each time step, and these errors were averaged over the T steps.

Table 1: Computed Rand indices, numbers of erroneous clusters, and averaged test-data log likelihoods.

             Rand index            # of erroneous clusters     Test log likelihood
Data       IRM    tIRM   dIRM     IRM    tIRM   dIRM      IRM      tIRM     dIRM
Synth1     0.796  0.946  0.982    1.00   0.20   0.13      -0.542   -0.508   -0.505
Synth2     0.433  0.734  0.847    3.00   0.98   0.65      -0.692   -0.393   -0.318
IOtables   -      -      -        -      -      -         -0.354   -0.358   -0.291
Enron      -      -      -        -      -      -         -0.120   -0.135   -0.106

We calculated these two measurements for the synthetic datasets. The third measure is an (approximate) test-data log likelihood. For all datasets, we generated noisy datasets in which some observation values are inverted. The number of inverted elements was kept small so that the inversions would not affect the global clustering results. The ratio of inverted elements over all elements was set to 5% for the two synthetic datasets, 1% for the IOtables data, and 0.5% for the Enron data. We ran inference on the noisy datasets and computed the likelihood that the inverted observations take their real values. We used the averaged log likelihood per observation as the measurement.

5.2 Results

First, we present the quantitative results. Table 1 lists the computed Rand indices, the errors in the estimated number of clusters, and the test-data log likelihoods. We confirmed that the dIRM outperformed the other models on all datasets for all measures. 
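The third measure described in Section 5.1 can be sketched as follows. This is our reading of the procedure: a small fraction of entries is flipped, and (after inference on the noisy data, which is omitted here) the true values of the flipped entries are scored under the inferred Bernoulli rates. The names `z`, `eta`, and `inverted_loglik` are ours and stand for the inferred quantities.

```python
import numpy as np

def inverted_loglik(x_true, z, eta, flip_frac=0.05, seed=1):
    # Pick a small fraction of cells to invert, then average the log likelihood
    # that each inverted cell takes its *true* value under rate eta[z_i, z_j].
    rng = np.random.default_rng(seed)
    n_flip = max(1, int(flip_frac * x_true.size))
    t, i, j = np.unravel_index(
        rng.choice(x_true.size, size=n_flip, replace=False), x_true.shape)
    p = eta[z[t, i], z[t, j]]
    ll = np.where(x_true[t, i, j] == 1, np.log(p), np.log1p(-p))
    return ll.mean()
```

A model that clusters well assigns high rates to the true 1s and low rates to the true 0s among the flipped cells, so a higher (less negative) average indicates a better fit, matching the ordering reported in Table 1.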
In particular, the dIRM showed good results on the Synth2 and Enron datasets, where the changes in relationships are highly dynamic and unstable. On the other hand, the dIRM did not achieve a remarkable improvement over the tIRM on the Synth1 dataset, whose temporal changes are small. Thus we can say that the dIRM is superior in modeling time-varying relational data, especially dynamic ones.

Next, we evaluate the results on the real-world datasets qualitatively. Figure 3 shows the results for the IOtables data: panel (a) illustrates the estimated η_{k,l} using the dIRM, and panel (b) presents the time evolution of the cluster assignments. The dIRM obtained reasonable and stable industrial clusters, as shown in Fig. 3(b). For example, the dIRM groups the machine industries into cluster 5, and infrastructure-related industries into cluster 13. We believe that the self-transition bias κ helps the model find these stable clusters. The relationships between clusters presented in Fig. 3(a) are also intuitively understandable. For example, demand for the machine industries (cluster 5) invokes large productions in the "iron and steel" sector (cluster 7). The "commerce & trade" and "enterprise services" sectors (cluster 10) connect strongly to almost all the other sectors.

There are some interesting cluster transitions. First, look at the "finance, insurance" sector. At t = 1, this sector belongs to cluster 14. Afterwards, however, the sector transits to cluster 1, which does not connect strongly with clusters 5 and 7. This may indicate a shift of money away from these matured industries. Next, the "transport" sector enlarges its role in the market by moving to cluster 14, which causes the deletion of cluster 8. 
Finally, note the transitions of the "telecom, broadcast" sector. From 1985 to 2000, this sector is in cluster 9, which is rather independent of the other clusters. In 2005, however, the cluster separated, and the telecom industry merged into cluster 1, an influential cluster. This result is consistent with the rapid growth of ICT technologies and their large impact on the world.

We now discuss the results on the Enron dataset. Because this e-mail dataset contains many individuals' names, we refrain from cataloging the object assignments as we did for the IOtables dataset. Figure 4(a) tells us that clusters 1-7 are relatively separated communities. For example, the members of cluster 4 belong to a restricted business domain such as energy, gas, or pipeline businesses. Cluster 5 is a community of financial and monetary departments, and cluster 7 is a community of managers such as vice presidents and CFOs.

One interesting result of the dIRM is the discovery of cluster 9. This cluster sends notably many messages to the other clusters, especially to the management cluster 7. Only three objects belong to this cluster over the time steps, but these members are the key persons at each time.

Figure 3: (a) Example of the estimated η_{k,l} (strength of the relationship between clusters k, l) for the IOtables data by the dIRM. (b) Time-varying cluster assignments for selected clusters by the dIRM.

Figure 4: (a) Example of the estimated η_{k,l} for the Enron dataset using the dIRM. (b) Number of objects belonging to each cluster at each time step for the Enron dataset using the dIRM.

First, the CEO of Enron America stayed in cluster 9 in May (t = 5). Next, the founder of Enron was a member of the cluster in August (t = 8); the CEO of Enron resigned that month, and the founder made an announcement to calm the public. Finally, the COO belongs to the cluster in October (t = 10). 
This is the month in which newspapers reported the accounting violations.

Fig. 4(b) presents the time evolution of the cluster memberships, i.e., the number of objects belonging to each cluster at each time step. In contrast to the IOtables dataset, the Enron e-mail dataset is very dynamic, as can be seen from Fig. 2(c), (d). For example, the volume of cluster 6 (an inactive cluster) decreases as time evolves. This result reflects the fact that the transactions between employees increase as the scandal is progressively revealed. On the contrary, cluster 4 is stable in membership; thus, we can imagine that the group of energy and gas businesses is a dense and strong community. This is also true for cluster 5.

6 Conclusions

We proposed a new time-varying relational data model that is able to represent dynamic changes of cluster structures. The dynamic IRM (dIRM) incorporates a variant of the iHMM and represents time-sensitive dynamic properties such as split & merge of clusters. We explained the generative model of the dIRM and presented an inference algorithm based on a slice sampler. Experiments with synthetic and real-world time-series datasets showed that the proposed model improves the precision of time-varying relational data analysis. We will apply this model to other datasets to study the capability and the reliability of the model. We are also interested in modifying the dIRM to deal with multi-valued observation data.

References
[1] A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453:98-101, 2008.
[2] E. Erosheva, S. Fienberg, and J. Lafferty. Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 101(Suppl 1):5220-5227, 2004.
[3] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. An HDP-HMM for systems with state persistence. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.
[4] W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.
[5] O. Hirose, R. Yoshida, S. Imoto, R. Yamaguchi, T. Higuchi, D. S. Charnock-Jones, C. Print, and S. Miyano. Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models. Bioinformatics, 24(7):932-942, 2008.
[6] P. D. Hoff. Subset clustering of binary sequences, with an application to genomic abnormality data. Biometrics, 61(4):1027-1036, 2005.
[7] L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2(1):193-218, 1985.
[8] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
[9] B. Klimt and Y. Yang. The Enron corpus: A new dataset for email classification research. In Proceedings of the European Conference on Machine Learning (ECML), 2004.
[10] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, pages 556-559. ACM, 2003.
[11] K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077-1087, 2001.
[12] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639-650, 1994.
[13] L. Tang, H. Liu, J. Zhang, and Z. Nazeri. Community evolution in dynamic multi-mode networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 677-685, 2008.
[14] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[15] J. Van Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.
[16] T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin. A Bayesian approach toward finding communities and their evolutions in dynamic social networks. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2009.
[17] R. Yoshida, S. Imoto, and T. Higuchi. Estimating time-dependent gene networks from time series microarray data by dynamic linear models with Markov switching. In Proceedings of the International Conference on Computational Systems Bioinformatics, 2005.
[18] S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In Advances in Neural Information Processing Systems 21 (NIPS), 2009.
", "award": [], "sourceid": 318, "authors": [{"given_name": "Katsuhiko", "family_name": "Ishiguro", "institution": null}, {"given_name": "Tomoharu", "family_name": "Iwata", "institution": null}, {"given_name": "Naonori", "family_name": "Ueda", "institution": null}, {"given_name": "Joshua", "family_name": "Tenenbaum", "institution": null}]}