{"title": "Non-stationary continuous dynamic Bayesian networks", "book": "Advances in Neural Information Processing Systems", "page_first": 682, "page_last": 690, "abstract": "Dynamic Bayesian networks have been applied widely to reconstruct the structure of regulatory processes from time series data. The standard approach is based on the assumption of a homogeneous Markov chain, which is not valid in many real-world scenarios. Recent research efforts addressing this shortcoming have considered undirected graphs, directed graphs for discretized data, or over-flexible models that lack any information sharing between time series segments. In the present article, we propose a non-stationary dynamic Bayesian network for continuous data, in which parameters are allowed to vary between segments, and in which a common network structure provides essential information sharing across segments. Our model is based on a Bayesian change-point process, and we apply a variant of the allocation sampler of Nobile and Fearnside to infer the number and location of the change-points.", "full_text": "Non-stationary continuous dynamic\n\nBayesian networks\n\nDepartment of Statistics, TU Dortmund University, 44221 Dortmund, Germany\n\nMarco Grzegorczyk\n\ngrzegorczyk@statistik.tu-dortmund.de\n\nBiomathematics & Statistics Scotland (BioSS)\n\nJCMB, The King\u2019s Buildings, Edinburgh EH93JZ, United Kingdom\n\nDirk Husmeier\n\ndirk@bioss.ac.uk\n\nAbstract\n\nDynamic Bayesian networks have been applied widely to reconstruct the structure\nof regulatory processes from time series data. The standard approach is based on\nthe assumption of a homogeneous Markov chain, which is not valid in many real-\nworld scenarios. 
Recent research efforts addressing this shortcoming have con-\nsidered undirected graphs, directed graphs for discretized data, or over-\ufb02exible\nmodels that lack any information sharing among time series segments.\nIn the\npresent article, we propose a non-stationary dynamic Bayesian network for con-\ntinuous data, in which parameters are allowed to vary among segments, and in\nwhich a common network structure provides essential information sharing across\nsegments. Our model is based on a Bayesian multiple change-point process, where\nthe number and location of the change-points is sampled from the posterior distri-\nbution.\n\nIntroduction\n\n1\nThere has recently been considerable interest in structure learning of Bayesian networks. Exam-\nples from the topical \ufb01eld of systems biology are the reconstruction of transcriptional regulatory\nnetworks from gene expression data [1], the inference of signal transduction pathways from pro-\ntein concentrations [2], and the identi\ufb01cation of neural information \ufb02ow operating in the brains of\nsongbirds [3]. In particular, dynamic Bayesian networks (DBNs) have been applied, as they allow\nfeedback loops and recurrent regulatory structures to be modelled while avoiding the ambiguity\nabout edge directions common to static Bayesian networks. The standard assumption underpinning\nDBNs is that of stationarity: time-series data are assumed to have been generated from a homoge-\nneous Markov process. However, regulatory interactions and signal transduction processes in the\ncell are usually adaptive and change in response to external stimuli. Likewise, neural information\n\ufb02ow slowly adapts via Hebbian learning to make the processing of sensory information more ef-\n\ufb01cient. 
The assumption of stationarity is therefore too restrictive in many circumstances, and can potentially lead to erroneous conclusions.

In the recent past, various research efforts have addressed this issue and proposed models that relax the stationarity assumption. Talih and Hengartner [4] proposed a time-varying Gaussian graphical model (GGM), in which the time-varying variance structure of the data was inferred with reversible jump (RJ) Markov chain Monte Carlo (MCMC). A limitation of this approach is that changes of the network structure between adjacent segments are restricted to at most a single edge, and the total number of segments is assumed known a priori. Xuan and Murphy [5] developed a related non-stationary GGM based on a product partition model. The method allows for separate structures

                      Proposed       Robinson &        Lèbre          Grzegorczyk     Ko et al.
                      here           Hartemink (2009)  (2008)         et al. (2008)   (2007)
  Score               Marginal       Marginal          Marginal       Marginal        BIC
                      Likelihood     Likelihood        Likelihood     Likelihood
  Change-points       node-specific  whole network     node-specific  whole network   node-specific
  Structure constant  Yes            No                No             Yes             Yes
  Data format         Continuous     Discrete          Continuous     Continuous      Continuous
  Latent variables    Change-point   Change-point      Change-point   Free            Free
                      process        process           process        allocation      allocation

Table 1: Overview of how our model compares with various related, recently published models.

in different segments, where the number of structures is inferred from the data. The inference algorithm iterates between a convex optimization for determining the graph structure and a dynamic programming algorithm for calculating the segmentation. The latter aspect imposes restrictions on the graph structure (decomposability), though.
Moreover, both the models of [4] and [5] are based on undirected graphs, whereas most processes in systems biology, like neural information flow, signal transduction and transcriptional regulation, are intrinsically of a directed nature. To address this shortcoming, Robinson and Hartemink [6] and Lèbre [7] each proposed a non-stationary dynamic Bayesian network. Both methods allow for different network structures in different segments of the time series, where the location of the change-points and the total number of segments are inferred from the data with RJMCMC. The essential difference between the two methods is that the model proposed in [6] is a non-stationary version of the BDe score [8], which requires the data to be discretized. The method proposed in [7] is based on the Bayesian linear regression model of [9], which avoids the need for data discretization.

Allowing the network structure to change between segments leads to a highly flexible model. However, this approach faces a conceptual and a practical problem. The practical problem is potential model over-flexibility¹. Owing to the high costs of postgenomic high-throughput experiments, time series in systems biology are typically rather short. Modelling short time series segments with separate network structures will almost inevitably lead to inflated inference uncertainty, which calls for some information sharing between the segments. The conceptual problem is related to the very premise of a flexible network structure. This assumption is reasonable for some scenarios, like morphogenesis, where the different segments are e.g. associated with the embryonic, larval, pupal, and adult stages of the fruit fly (as discussed in [6]). However, for most cellular processes on a shorter time scale, it is questionable whether it is the structure rather than just the strength of the regulatory interactions that changes with time.
To use the analogy of the traffic flow network invoked in [6]: it is not the road system (the network structure) that changes between off-peak and rush hours, but the intensity of the traffic flow (the strength of the interactions). In the same vein, it is not the ability of a transcription factor to potentially bind to the promoter of a gene and thereby initiate transcription (the interaction structure), but the extent to which this happens (the interaction strength).

The objective of the present work is to propose and assess a non-stationary continuous-valued DBN that introduces information sharing among different time series segments via a constrained structure. Our model is non-stationary with respect to the parameters, while the network structure is kept fixed among segments. Our model complements the one proposed in [6] in two other aspects: the score is a non-stationary generalization of the BGe [10] rather than the BDe score, thus avoiding the need for data discretization, and the patterns of non-stationarity are node-specific, thereby providing extra model flexibility. Our work is based on [11], [12], and [13]. Like [11], our model is effectively a mixture of BGe models. We replace the free allocation model of [11] by a change-point process to incorporate our prior notion that adjacent time points in a time series are likely to be governed by similar distributions. We borrow from [12] the concept of node-specific change-points to enable greater model flexibility. However, as opposed to [12], we do not approximate the scoring function by BIC [14], but compute the proper marginal likelihood. The objective of inference is to infer the location and the node-specific number of change-points from the posterior distribution.

¹Note that as opposed to [7], [6] partially addresses this issue via a prior distribution that discourages changes in the network structure.
An overview of how our method is related to various recently published models is provided in Table 1.

2 Methodology

2.1 The dynamic BGe network

DBNs are flexible models for representing probabilistic relationships between interacting variables (nodes) X_1, …, X_N via a directed graph G. An edge pointing from X_i to X_j indicates that the realization of X_j at time point t, symbolically X_j(t), is conditionally dependent on the realization of X_i at time point t−1, symbolically X_i(t−1). The parent node set of node X_n in G, π_n = π_n(G), is the set of all nodes from which an edge points to node X_n in G. Given a data set D, where D_{n,t} and D_{(π_n,t)} are the t-th realizations X_n(t) and π_n(t) of X_n and π_n, respectively, and 1 ≤ t ≤ m represents time, DBNs are based on the following homogeneous Markov chain expansion:

    P(D | G, θ) = ∏_{n=1}^{N} ∏_{t=2}^{m} P(X_n(t) = D_{n,t} | π_n(t−1) = D_{(π_n,t−1)}, θ_n)    (1)

where θ is the total parameter vector, composed of node-specific subvectors θ_n, which specify the local conditional distributions in the factorization. From Eq. (1) and under the assumption of parameter independence, P(θ|G) = ∏_n P(θ_n|G), the marginal likelihood is given by

    P(D | G) = ∫ P(D | G, θ) P(θ | G) dθ = ∏_{n=1}^{N} Ψ(D_n^{π_n}, G)    (2)

    Ψ(D_n^{π_n}, G) = ∫ ∏_{t=2}^{m} P(X_n(t) = D_{n,t} | π_n(t−1) = D_{(π_n,t−1)}, θ_n) P(θ_n | G) dθ_n    (3)

where D_n^{π_n} := {(D_{n,t}, D_{π_n,t−1}) : 2 ≤ t ≤ m} is the subset of data pertaining to node X_n and parent set π_n. We choose a linear Gaussian distribution for the local conditional distribution P(X_n | π_n, θ_n) in Eq. (1). Under fairly weak regularity conditions discussed in [10] (parameter modularity and conjugacy of the prior²), the integral in Eq. (3) has a closed-form solution, given by Eq. (24) in [10]. The resulting expression is called the BGe score³.

2.2 The non-stationary dynamic change-point BGe model (cpBGe)

To obtain a non-stationary DBN, we generalize Eq. (1) with a node-specific mixture model:

    P(D | G, V, K, θ) = ∏_{n=1}^{N} ∏_{t=2}^{m} ∏_{k=1}^{K_n} P(X_n(t) = D_{n,t} | π_n(t−1) = D_{(π_n,t−1)}, θ_n^k)^{δ_{V_n(t),k}}    (4)

where δ_{V_n(t),k} is the Kronecker delta, V is a matrix of latent variables V_n(t), V_n(t) = k indicates that the realization of node X_n at time t, X_n(t), has been generated by the kth component of a mixture with K_n components, and K = (K_1, …, K_N). Note that the matrix V divides the data into several disjoint subsets, each of which can be regarded as pertaining to a separate BGe model with parameters θ_n^k. The vectors V_n are node-specific, i.e. different nodes can have different break-points. The probability model defined in Eq. (4) is effectively a mixture model with local probability distributions P(X_n | π_n, θ_n^k) and it can hence, under a free allocation of the latent variables, approximate any probability distribution arbitrarily closely. In the present work, we change the assignment of data points to mixture components from a free allocation to a change-point process. This effectively reduces the complexity of the latent variable space and incorporates our prior belief that, in a

²The conjugate prior is a normal-Wishart distribution. For the present study, we chose the hyperparameters of this distribution maximally uninformative subject to the regularity conditions discussed in [10].

³The score equivalence aspect of the BGe model is not required for DBNs, because edge reversals are not permissible.
However, formulating our method in terms of the BGe score is advantageous when adapting the proposed framework to non-linear static Bayesian networks along the lines of [12].

time series, adjacent time points are likely to be assigned to the same component. From Eq. (4), the marginal likelihood conditional on the latent variables V is given by

    P(D | G, V, K) = ∫ P(D | G, V, K, θ) P(θ) dθ = ∏_{n=1}^{N} ∏_{k=1}^{K_n} Ψ(D_n^{π_n}[k, V_n], G)    (5)

    Ψ(D_n^{π_n}[k, V_n], G) = ∫ ∏_{t=2}^{m} P(X_n(t) = D_{n,t} | π_n(t−1) = D_{(π_n,t−1)}, θ_n^k)^{δ_{V_n(t),k}} P(θ_n^k | G) dθ_n^k    (6)

Eq. (6) is similar to Eq. (3), except that it is restricted to the subset D_n^{π_n}[k, V_n] := {(D_{n,t}, D_{π_n,t−1}) : V_n(t) = k, 2 ≤ t ≤ m}. Hence when the regularity conditions defined in [10] are satisfied, the expression in Eq. (6) has a closed-form solution: it is given by Eq. (24) in [10] restricted to the subset of the data that has been assigned to the kth mixture component (or kth segment). The joint probability distribution of the proposed cpBGe model is given by:

    P(G, V, K, D) = P(D | G, V, K) · P(G) · P(V | K) · P(K)
                  = P(G) · ∏_{n=1}^{N} { P(V_n | K_n) · P(K_n) · ∏_{k=1}^{K_n} Ψ(D_n^{π_n}[k, V_n], G) }    (7)

In the absence of genuine prior knowledge about the regulatory network structure, we assume for P(G) a uniform distribution on graphs, subject to a fan-in restriction of |π_n| ≤ 3. As prior probability distributions on the node-specific numbers of mixture components K_n, P(K_n), we take iid truncated Poisson distributions with shape parameter λ = 1, restricted to 1 ≤ K_n ≤ K_MAX (we set K_MAX = 10 in our simulations). The prior distribution on the latent variable vectors, P(V | K) = ∏_{n=1}^{N} P(V_n | K_n), is implicitly defined via the change-point process as follows. We identify K_n with K_n − 1 change-points b_n = {b_{n,1}, …, b_{n,K_n−1}} on the continuous interval [2, m]. For notational convenience we introduce the pseudo change-points b_{n,0} = 2 and b_{n,K_n} = m. For node X_n the observation at time point t is assigned to the kth component, symbolically V_n(t) = k, if b_{n,k−1} ≤ t < b_{n,k}. Following [15] we assume that the change-points are distributed as the even-numbered order statistics of L := 2(K_n − 1) + 1 points u_1, …, u_L uniformly and independently distributed on the interval [2, m]. The motivation for this prior, instead of taking K_n uniformly distributed points, is to encourage a priori an equal spacing between the change-points, i.e. to discourage mixture components (i.e. segments) that contain only a few observations. The even-numbered order statistics prior on the change-point locations b_n induces a prior distribution on the node-specific allocation vectors V_n. Deriving a closed-form expression is involved. However, the MCMC scheme we discuss in the next section does not sample V_n directly, but is based on local modifications of V_n via birth, death and reallocation moves. All that is required for the acceptance probabilities of these moves are P(V_n | K_n) ratios, which are straightforward to compute.

2.3 MCMC inference

We now describe an MCMC algorithm to obtain a sample {G^i, V^i, K^i}_{i=1,…,I} from the posterior distribution P(G, V, K | D) ∝ P(G, V, K, D) of Eq. (7). We combine the structure MCMC algorithm⁴ [17, 18] with the change-point model used in [15], and draw on the fact that conditional on the allocation vectors V, the model parameters can be integrated out to obtain the marginal likelihood terms Ψ(D_n^{π_n}[k, V_n], G) in closed form, as shown in the previous section. Note that this approach is equivalent to the idea underlying the allocation sampler proposed in [13]. The resulting algorithm is effectively an RJMCMC scheme [15] in the discrete space of network structures and latent allocation vectors, where the Jacobian in the acceptance criterion is always 1 and can be omitted. With probability p_G = 0.5 we perform a structure MCMC move on the current graph G^i and leave the latent variable matrix and the numbers of mixture components unchanged, symbolically V^{i+1} = V^i and K^{i+1} = K^i. A new candidate graph G^{i+1} is randomly drawn out of the set of graphs N(G^i) that can be reached from the current graph G^i by deletion or addition of a single edge. The proposed graph G^{i+1} is accepted with probability:

    A(G^{i+1} | G^i) = min{ 1, [P(D | G^{i+1}, V^i, K^i) / P(D | G^i, V^i, K^i)] · [P(G^{i+1}) / P(G^i)] · [|N(G^i)| / |N(G^{i+1})|] }    (8)

⁴An MCMC algorithm based on Eq. (10) in [16] is computationally less efficient than when applied to static Bayesian networks or stationary DBNs, since the local scores would have to be re-computed every time the positions of the change-points change.

Figure 1: Networks from which synthetic data were generated. Panels (a-c) show elementary network motifs [20]. Panel (d) shows a protein signal transduction network studied in [2] (nodes: raf, mek, erk, akt, pka, pkc, plcg, pip2, pip3, p38, jnk), with an added feedback loop on the root node.

where |.| is the cardinality, and the marginal likelihood terms have been specified in Eq. (5). The graph is left unchanged, symbolically G^{i+1} := G^i, if the move is not accepted.

With the complementary probability 1 − p_G we leave the graph G^i unchanged and perform a move on (V^i, K^i), where V^i_n is the latent variable vector of X_n in V^i, and K^i = (K^i_1, …, K^i_N). We randomly select a node X_n and change its current number of components K^i_n via a change-point birth or death move, or its latent variable vector V^i_n via a change-point reallocation move. The change-point birth (death) move increases (decreases) K^i_n by 1 and may also have an effect on V^i_n. The change-point reallocation move leaves K^i_n unchanged and may have an effect on V^i_n. Under fairly mild regularity conditions (ergodicity), the MCMC sampling scheme converges to the desired posterior distribution if the acceptance probabilities for the three change-point moves (K^i_n, V^i_n) → (K^{i+1}_n, V^{i+1}_n) are chosen of the form min(1, R), see [15], with

    R = [ ∏_{k=1}^{K^{i+1}_n} Ψ(D_n^{π_n}[k, V^{i+1}_n], G) ] / [ ∏_{k=1}^{K^i_n} Ψ(D_n^{π_n}[k, V^i_n], G) ] × A × B    (9)

where A = P(V^{i+1}_n | K^{i+1}_n) P(K^{i+1}_n) / [P(V^i_n | K^i_n) P(K^i_n)] is the prior probability ratio, and B is the inverse proposal probability ratio. The exact form of these factors depends on the move type and is provided in the supplementary material. We note that the implementation of the dynamic programming scheme proposed in [19] has the prospect of improving the convergence and mixing of the Markov chain, which we will investigate in our future work.

3 Results on synthetic data

To assess the performance of the proposed model, we applied it to a set of synthetic data generated from different networks, as shown in Figure 1. The structures in Figure panels 1a-c constitute elementary network motifs, as studied e.g. in [20].
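The node-specific change-point prior described in Section 2.2 is easy to simulate from. The sketch below is our own illustrative reimplementation (the function names are ours, not the authors'): change-points are drawn as the even-numbered order statistics of L = 2(K_n − 1) + 1 uniform points on [2, m], and the induced allocation vector assigns V_n(t) = k whenever b_{n,k−1} ≤ t < b_{n,k}.

```python
import random


def sample_changepoints(K_n, m, rng=random):
    """Draw the K_n - 1 change-points on [2, m] as the even-numbered order
    statistics of L = 2*(K_n - 1) + 1 iid uniform points (cf. Section 2.2)."""
    L = 2 * (K_n - 1) + 1
    u = sorted(rng.uniform(2, m) for _ in range(L))
    # even-numbered order statistics u_2, u_4, ... (1-based indexing)
    return u[1::2]


def allocation_vector(changepoints, m):
    """Build V_n from the change-points: V_n(t) = k iff b_{n,k-1} <= t < b_{n,k}.
    The final pseudo change-point b_{n,K_n} = m is treated as inclusive here,
    so that t = m belongs to the last segment."""
    bounds = list(changepoints) + [float("inf")]
    V = {}
    for t in range(2, m + 1):
        k = 1
        while t >= bounds[k - 1]:
            k += 1
        V[t] = k
    return V
```

Because adjacent time points fall into the same half-open interval unless a change-point separates them, the resulting allocation is piecewise constant in t, which is exactly the prior belief the change-point process is meant to encode.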
The network in Figure 1d was extracted from the systems biology literature [2] and represents a well-studied protein signal transduction pathway. We added an extra feedback loop on the root node to allow the generation of a Markov chain with non-zero autocorrelation; note that this modification is not biologically implausible [21].

We generated data with a mixture of piece-wise linear processes and sinusoidal transfer functions. The advantage of the first approach is the exact knowledge of the true process change-points; the second approach is more realistic (smooth function) with a stronger mismatch between model and data-generation mechanism. For example, the network in Figure 1c was modelled as

    X(t+1) = φ_X(t);   Y(t+1) = φ_Y(t);   W(t+1) = W(t) + 2π/m + c_W · φ_W(t)
    Z(t+1) = c_X · X(t) + c_Y · Y(t) + sin(W(t)) + c_Z · φ_Z(t+1)    (10)

where the φ.(.) are iid standard Normally distributed. We employed different values c_X = c_Y ∈ {0.25, 0.5} and c_Z, c_W ∈ {0.25, 0.5, 1} to vary the signal-to-noise ratio and the amount of autocorrelation in W. For each parameter configuration, 25 time series with 41 time points were independently generated. For the other networks, data were generated in a similar way. Owing to space restrictions, the complete model specifications have to be relegated to the supplementary material.

  cpBGe vs. ...          (a)        (b)        (c)        (d)
  ... vs. Grz. et al.    0.074      0.753      <0.0001    0.013
  ... vs. Ko et al.      <0.0001    <0.0001    <0.0001    0.002
  ... vs. BGe            <0.0001    <0.0001    <0.0001    0.060
  ... vs. BDe            <0.0001    <0.0001    <0.0001    <0.0001

Figure 2: Comparison of AUC scores on the synthetic data. The panels (a-d) correspond to those of Figure 1. The horizontal axis in each panel represents the proposed cpBGe model. The vertical axis represents the following competing models: BDe (△), BGe (⊔), the method of Ko et al. [12] (○), and the method of Grzegorczyk et al. [11] (⋆), adapted as described in the text. Different symbols of the same shape correspond to different signal-to-noise ratios (SNR) and autocorrelation times (ACT). Each symbol shows a comparison of two average AUC scores, averaged over 25 (panels a-c) or 5 (panel d) time series independently generated for a given SNR/ACT setting. The diagonal line indicates equal performance; symbols below this line indicate that the proposed cpBGe model outperforms the competing model. The table at the top shows an overview of the corresponding p-values obtained from a two-sided paired t-test with Bonferroni correction. For all but three cases the cpBGe model outperforms the competing model at the standard 5% significance level.

To each data set, we applied the proposed cpBGe model as described in Section 2. We compared its performance with four alternative schemes. We chose the classical stationary DBNs based on BDe [8] and BGe [10]. Note that for these models the parameters can be integrated out analytically, and only the network structure has to be learned. The latter was sampled from the posterior distribution with structure MCMC [17, 18]. Note that the BDe model requires discretized data, which we effected with the information bottleneck algorithm [22]. Our comparative evaluation also included two non-linear/non-stationary models with a clearly defined network structure (for the sake of comparability with our approach).
We chose the method of Ko et al. [12] for its flexibility and comparative ease of implementation. The inference scheme is based on the application of the EM algorithm [23] to a node-specific mixture model subject to a BIC penalty term [14]. We implemented this algorithm according to the authors' specification in MATLAB©, using the software package NETLAB [24]. We also compared our model with the approach proposed by Grzegorczyk et al. [11]. We applied the software available from the authors' website. We replaced the authors' free allocation model by the change-point process used for our model. This was motivated by the fact that for a fair comparison, the same prior knowledge about the data structure (time series) should be used. In all other aspects we applied the method as described in [11]. All MCMC simulations were divided into a burn-in and a sampling phase, where the length of the burn-in phase was chosen such that standard convergence criteria based on potential scale reduction factors [25] were met. The software implementations of all methods used in our study are available upon request. For lack of space, further details have to be relegated to the supplementary material.

To assess the network reconstruction accuracy, various criteria have been proposed in the literature. In the present study, we chose receiver operating characteristic (ROC) curves computed from the marginal posterior probabilities of the edges (and the ranking thereby induced). Owing to the large number of simulations – for each network and parameter setting the simulations were repeated on 25 (Figures 2a-c) or 5 (Figure 2d) independently generated time series – we summarized the performance by the area under the curve (AUC), ranging from 0.5 (expected random predictor) to 1.0 (perfect predictor).
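The AUC summary used above can be computed directly from the edge ranking via the Mann-Whitney identity: the AUC equals the probability that a randomly chosen true edge is scored higher than a randomly chosen non-edge, with ties counting one half. The following sketch is our own illustration of that evaluation step, not the authors' code:

```python
def auc_from_edge_scores(scores, truth):
    """Area under the ROC curve for edge scores (e.g. marginal posterior
    edge probabilities) against a 0/1 ground-truth adjacency, using the
    Mann-Whitney rank-sum identity; ties count as 1/2."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    # count pairs where a true edge outranks a non-edge
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A predictor that ranks every true edge above every non-edge scores 1.0, while constant scores give 0.5, matching the "expected random predictor" baseline quoted above.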
The results are shown in Figure 2 and suggest that the proposed cpBGe model tends to significantly outperform the competing models. A more detailed analysis with an investigation of how the signal-to-noise ratio and the autocorrelation parameters affect the relative performance of the methods has to be relegated to the supplementary material for lack of space.

Figure 3: Results on the Arabidopsis gene expression time series. Top panels: Average posterior probability of a change-point (vertical axis) at a specific transition time plotted against the transition time (horizontal axis) for two selected circadian genes (left: LHY, centre: TOC1) and averaged over all 9 genes (right). The vertical dotted lines indicate the boundaries of the time series segments, which are related to different entrainment conditions and time intervals. Bottom left and centre panels: Co-allocation matrices for the two selected genes LHY and TOC1. The axes represent time. The grey shading indicates the posterior probability of two time points being assigned to the same mixture component, ranging from 0 (black) to 1 (white). Bottom right panel: Predicted regulatory network of nine circadian genes in Arabidopsis thaliana. Empty circles represent morning genes. Shaded circles represent evening genes. Edges indicate predicted interactions with a marginal posterior probability greater than 0.5.

4 Results on Arabidopsis gene expression time series

We have applied our method to microarray gene expression time series related to the study of circadian regulation in plants.
Arabidopsis thaliana seedlings, grown under artificially controlled Te-hour-light/Te-hour-dark cycles, were transferred to constant light and harvested at 13 time points in τ-hour intervals. From these seedlings, RNA was extracted and assayed on Affymetrix GeneChip oligonucleotide arrays. The data were background-corrected and normalized according to standard procedures⁵, using the GeneSpring© software (Agilent Technologies). We combined four time series, which differed with respect to the pre-experiment entrainment condition and the time intervals: Te ∈ {10h, 12h, 14h}, and τ ∈ {2h, 4h}. The data, with detailed information about the experimental protocols, can be obtained from [27], [11], and [28]. We focused our analysis on 9 circadian genes⁶ (i.e. genes involved in circadian regulation). We combined all four time series into a single set. The objective was to test whether the proposed cpBGe model would detect the different experimental phases. Since the gene expression values at the first time point of a time series segment have no relation with the expression values at the last time point of the preceding segment, the corresponding boundary time points were appropriately removed from the data⁷. This ensures that for all pairs of consecutive time points a proper conditional dependence relation determined by the nature of the regulatory cellular processes is given.

⁵We used RMA rather than GCRMA for reasons discussed in [26].
⁶These 9 circadian genes are LHY, TOC1, CCA1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3.
⁷A proper mathematical treatment is given in Section 3 of the supplementary material.

The top panel of Figure 3 shows the marginal posterior
There is a slight difference between the heights of the posterior probability peaks for LHY and TOC1. This behaviour is also captured by the co-allocation matrices in the bottom row of Figure 3. This deviation indicates that the two genes are affected by the changing experimental conditions (entrainment, time interval) in different ways, and it thus provides a useful tool for further exploratory analysis. The bottom right panel of Figure 3 shows the gene interaction network that is predicted when keeping all edges with marginal posterior probability above 0.5. There are two groups of genes. Empty circles in the figure represent morning genes (i.e. genes whose expression peaks in the morning), shaded circles represent evening genes (i.e. genes whose expression peaks in the evening). There are several directed edges pointing from the group of morning genes to the evening genes, mostly originating from gene CCA1. This result is consistent with the findings in [29], where the morning genes were found to activate the evening genes, with CCA1 being a central regulator. Our reconstructed network also contains edges pointing in the opposite direction, from the evening genes back to the morning genes. This finding is also consistent with [29], where the evening genes were found to inhibit the morning genes via a negative feedback loop. In the reconstructed network, the connectivity within the group of evening genes is sparser than within the group of morning genes. This finding is consistent with the fact that following the light-dark cycle entrainment, the experiments were carried out in constant-light condition, resulting in a higher overall activity of the morning genes. Within the group of evening genes, the reconstructed network contains an edge between GI and TOC1. This interaction has been confirmed in [30].
Hence while a proper evaluation of the reconstruction accuracy is currently unfeasible \u2013\nlike [6] and many related studies, we lack a gold-standard owing to the unknown nature of the true\ninteraction network \u2013 our study suggests that the essential features of the reconstructed network are\nbiologically plausible and consistent with the literature.\n\n5 Discussion\n\nWe have proposed a continuous-valued non-stationary dynamic Bayesian network, which constitutes\na non-stationary generalization of the BGe model. This complements the work of [6], where a\nnon-stationary BDe model was proposed. We have argued that a \ufb02exible network structure can\nlead to practical and conceptual problems, and we therefore only allow the parameters to vary\nwith time. We have presented a comparative evaluation of the network reconstruction accuracy\non synthetic data. Note that such a study is missing from recent related studies on this topic, like [6]\nand [7], presumably because their overall network structure is not properly de\ufb01ned. Our \ufb01ndings\nsuggest that the proposed non-stationary BGe model achieves a clear performance improvement\nover the classical stationary models BDe and BGe as well as over the non-linear/non-stationary\nmodels of [12] and [11]. The application of our model to gene expression time series from circadian\nclock-regulated genes in Arabidopsis thaliana has led to a plausible data segmentation, and the\nreconstructed network shows features that are consistent with the biological literature.\n\nThe proposed model is based on a multiple change-point process. This scheme provides the ap-\nproximation of a non-linear regulation process by a piecewise linear process under the assumption\nthat the temporal processes are suf\ufb01ciently smooth. A straightforward modi\ufb01cation would be the\nreplacement of the change-point process by the allocation model of [13] and [11]. 
Replacing the change-point process with the allocation model would result in a fully flexible mixture model, which is preferable if the smoothness assumption for the temporal processes is violated. It would also provide a non-linear Bayesian network for static rather than time series data. While the algorithmic implementation is straightforward, the increased complexity of the latent variable configuration space would introduce additional challenges for the mixing and convergence properties of the MCMC sampler. The development of more effective proposal moves, as well as a comparison with alternative non-linear Bayesian network models, like [31], is a promising subject for future research.

Acknowledgements

Marco Grzegorczyk is supported by the Graduate School \u201cStatistische Modellbildung\u201d of the Department of Statistics, University of Dortmund. Dirk Husmeier is supported by the Scottish Government Rural and Environment Research and Analysis Directorate (RERAD).

References

[1] N. Friedman, M. Linial, I. Nachman, and D. Pe\u2019er. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7:601\u2013620, 2000.

[2] K. Sachs, O. Perez, D. Pe\u2019er, D. A. Lauffenburger, and G. P. Nolan. Protein-signaling networks derived from multiparameter single-cell data. Science, 308:523\u2013529, 2005.

[3] V. A. Smith, J. Yu, T. V. Smulders, A. J. Hartemink, and E. D. Jarvis. Computational inference of neural information flow networks. PLoS Computational Biology, 2:1436\u20131449, 2006.

[4] M. Talih and N. Hengartner. Structural learning with time-varying components: Tracking the cross-section of financial time series. Journal of the Royal Statistical Society B, 67(3):321\u2013341, 2005.

[5] X. Xuan and K. Murphy. Modeling changing dependency structure in multivariate time series.
In Zoubin Ghahramani, editor, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages 1055\u20131062. Omnipress, 2007.

[6] J. W. Robinson and A. J. Hartemink. Non-stationary dynamic Bayesian networks. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1369\u20131376. Morgan Kaufmann Publishers, 2009.

[7] S. L\u00e8bre. Analyse de processus stochastiques pour la g\u00e9nomique : \u00e9tude du mod\u00e8le MTD et inf\u00e9rence de r\u00e9seaux bay\u00e9siens dynamiques. PhD thesis, Universit\u00e9 d\u2019\u00c9vry-Val-d\u2019Essonne, 2008.

[8] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:245\u2013274, 1995.

[9] C. Andrieu and A. Doucet. Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC. IEEE Transactions on Signal Processing, 47(10):2667\u20132676, 1999.

[10] D. Geiger and D. Heckerman. Learning Gaussian networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 235\u2013243, San Francisco, CA, 1994. Morgan Kaufmann.

[11] M. Grzegorczyk, D. Husmeier, K. Edwards, P. Ghazal, and A. Millar. Modelling non-stationary gene regulatory processes with a non-homogeneous Bayesian network and the allocation sampler. Bioinformatics, 24(18):2071\u20132078, 2008.

[12] Y. Ko, C. Zhai, and S. L. Rodriguez-Zas. Inference of gene pathways using Gaussian mixture models. In BIBM International Conference on Bioinformatics and Biomedicine, pages 362\u2013367, Fremont, CA, 2007.

[13] A. Nobile and A. T. Fearnside. Bayesian finite mixtures with an unknown number of components: The allocation sampler. Statistics and Computing, 17(2):147\u2013162, 2007.

[14] G. Schwarz. Estimating the dimension of a model.
Annals of Statistics, 6:461\u2013464, 1978.

[15] P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711\u2013732, 1995.

[16] N. Friedman and D. Koller. Being Bayesian about network structure. Machine Learning, 50:95\u2013126, 2003.

[17] P. Giudici and R. Castelo. Improving Markov chain Monte Carlo model search for data mining. Machine Learning, 50:127\u2013158, 2003.

[18] D. Madigan and J. York. Bayesian graphical models for discrete data. International Statistical Review, 63:215\u2013232, 1995.

[19] P. Fearnhead. Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing, 16:203\u2013213, 2006.

[20] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31:64\u201368, 2002.

[21] M. K. Dougherty, J. Muller, D. A. Ritt, M. Zhou, X. Z. Zhou, T. D. Copeland, T. P. Conrads, T. D. Veenstra, K. P. Lu, and D. K. Morrison. Regulation of Raf-1 by direct feedback phosphorylation. Molecular Cell, 17:215\u2013224, 2005.

[22] A. J. Hartemink. Principled Computational Methods for the Validation and Discovery of Genetic Regulatory Networks. PhD thesis, MIT, 2001.

[23] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1):1\u201338, 1977.

[24] I. T. Nabney. NETLAB: Algorithms for Pattern Recognition. Springer Verlag, New York, 2004.

[25] A. Gelman and D. B. Rubin. Inference from iterative simulation using multiple sequences. Statistical Science, 7:457\u2013472, 1992.

[26] W. K. Lim, K. Wang, C. Lefebvre, and A. Califano. Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks. Bioinformatics, 23(13):i282\u2013i288, 2007.

[27] K. D. Edwards, P. E. Anderson, A. Hall, N. S.
Salathia, J. C. W. Locke, J. R. Lynn, M. Straume, J. Q. Smith, and A. J. Millar. Flowering locus C mediates natural variation in the high-temperature response of the Arabidopsis circadian clock. The Plant Cell, 18:639\u2013650, 2006.

[28] T. C. Mockler, T. P. Michael, H. D. Priest, R. Shen, C. M. Sullivan, S. A. Givan, C. McEntee, S. A. Kay, and J. Chory. The diurnal project: Diurnal and circadian expression profiling, model-based pattern matching and promoter analysis. Cold Spring Harbor Symposia on Quantitative Biology, 72:353\u2013363, 2007.

[29] C. R. McClung. Plant circadian rhythms. Plant Cell, 18:792\u2013803, 2006.

[30] J. C. W. Locke, M. M. Southern, L. Kozma-Bognar, V. Hibberd, P. E. Brown, M. S. Turner, and A. J. Millar. Extension of a genetic network model by iterative experimentation and mathematical analysis. Molecular Systems Biology, 1:(online), 2005.

[31] S. Imoto, S. Kim, T. Goto, S. Aburatani, K. Tashiro, S. Kuhara, and S. Miyano. Bayesian networks and nonparametric heteroscedastic regression for nonlinear modeling of genetic networks. Journal of Bioinformatics and Computational Biology, 1(2):231\u2013252, 2003.