{"title": "Time-Varying Dynamic Bayesian Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 1732, "page_last": 1740, "abstract": "Directed graphical models such as Bayesian networks are a favored formalism to model the dependency structures in complex multivariate systems such as those encountered in biology and neural sciences. When the system is undergoing dynamic transformation, often a temporally rewiring network is needed for capturing the dynamic causal influences between covariates. In this paper, we propose a time-varying dynamic Bayesian network (TV-DBN) for modeling the structurally varying directed dependency structures underlying non-stationary biological/neural time series. This is a challenging problem due the non-stationarity and sample scarcity of the time series. We present a kernel reweighted $\\ell_1$ regularized auto-regressive procedure for learning the TV-DBN model. Our method enjoys nice properties such as computational efficiency and provable asymptotic consistency. Applying TV-DBN to time series measurements during yeast cell cycle and brain response to visual stimuli reveals interesting dynamics underlying the respective biological systems.", "full_text": "Time-Varying Dynamic Bayesian Networks\n\nSchool of Computer Science, Carnegie Mellon University\n\nLe Song, Mladen Kolar and Eric P. Xing\n{lesong, mkolar, epxing}@cs.cmu.edu\n\nAbstract\n\nDirected graphical models such as Bayesian networks are a favored formalism\nfor modeling the dependency structures in complex multivariate systems such as\nthose encountered in biology and neural science. When a system is undergo-\ning dynamic transformation, temporally rewiring networks are needed for cap-\nturing the dynamic causal in\ufb02uences between covariates. 
In this paper, we propose\ntime-varying dynamic Bayesian networks (TV-DBN) for modeling the structurally\nvarying directed dependency structures underlying non-stationary biological/neural\ntime series. This is a challenging problem due to the non-stationarity and\nsample scarcity of time series data. We present a kernel reweighted $\\ell_1$-regularized\nauto-regressive procedure for this problem which enjoys nice properties such as\ncomputational efficiency and provable asymptotic consistency. To our knowledge,\nthis is the first practical and statistically sound method for structure learning of\nTV-DBNs. We applied TV-DBNs to time series measurements during yeast cell cycle\nand brain response to visual stimuli. In both cases, TV-DBNs reveal interesting\ndynamics underlying the respective biological systems.\n\n1 Introduction\nAnalysis of biological networks has led to numerous advances in understanding the organizational\nprinciples and functional properties of various biological systems, such as gene regulatory\nsystems [1] and central nervous systems [2]. However, most such results are based on static networks,\nthat is, networks with invariant topology over a given set of biological entities. A major challenge in\nsystems biology is to understand and model, quantitatively, the dynamic topological and functional\nproperties of biological networks. We refer to these time- or condition-specific biological circuitries\nas time-varying networks or structural non-stationary networks, which are ubiquitous in biological\nsystems. For example, (i) over the course of a cell cycle, there may exist multiple biological \u201cthemes\u201d\nthat determine functions of each gene and their regulatory relations, and these \u201cthemes\u201d are dynamic\nand stochastic. As a result, the molecular networks at each time point are context-dependent and\ncan undergo systematic rewiring rather than being invariant over time [3]. 
(ii) The emergence of\na unified cognitive moment relies on the coordination of scattered mosaics of functionally\nspecialized brain regions. Neural assemblies, distributed local networks of neurons transiently linked by\ndynamic connections, enable the emergence of coherent behaviour and cognition [4].\nA key technical hurdle preventing us from an in-depth investigation of the mechanisms that drive\ntemporal biological processes is the unavailability of serial snapshots of time-varying networks\nunderlying biological processes. Current technology does not allow for experimentally determining a\nseries of time-specific networks for a realistic dynamic biological system. Usually, only time series\nmeasurements of the activities of the nodes can be made, such as microarray, EEG or fMRI. Our\ngoal is to recover the latent time-varying networks underlying biological processes, with temporal\nresolution up to every single time point based on time series measurements of the nodal states.\nRecently, there has been a surge of interest along this direction [5, 6, 7, 8, 9, 10]. However, most\nexisting approaches are computationally expensive, making large scale genome-wide reverse\nengineering nearly infeasible. Furthermore, these methods also lack formal statistical characterization of\nthe estimation procedure. For instance, non-stationary dynamic Bayesian networks are introduced\nin [9], where the structures are learned via MCMC sampling; such an approach is not likely to scale\nup to more than 1000 nodes, and without a regularization term it is also prone to overfitting when\nthe dimension of the data is high but the number of observations is small. More recent efforts have\nfocused on efficient kernel reweighted or total-variation penalized sparse structure recovery\nmethods for undirected time-varying networks [10, 11, 12], which possess both attractive computational\nschemes and rigorous statistical consistency results. 
However, what has not been addressed so far is\nhow to recover directed time-varying networks. Our current paper advances in this direction.\nMore specifically, we propose time-varying dynamic Bayesian networks (TV-DBN) for modeling\nthe directed time-evolving network structures underlying non-stationary biological time series. To\nmake this problem statistically tractable, we rely on the assumption that the underlying network\nstructures are sparse and vary smoothly across time. We propose a kernel reweighted $\\ell_1$-regularized\nauto-regressive approach for learning this sequence of networks. Our approach has the following\nattractive properties: (i) The aggregation of observations from adjacent time points by kernel\nreweighting greatly alleviates the statistical problem of sample scarcity when the networks can change at\neach time point whereas only one or a few time series replicates are available. (ii) The problem\nof structural estimation for a TV-DBN decomposes into a collection of simpler and atomic\nstructural learning problems. We can choose from a battery of highly scalable $\\ell_1$-regularized least-square\nsolvers for learning each structure. (iii) We can formally characterize the conditions under which our\nestimation procedure is structurally consistent: as time series are sampled in increasing resolution,\nour algorithm can recover the true structure of the underlying TV-DBN with high probability.\nIt is worth emphasizing that our approach is very different from earlier approaches, such as the\nstructure learning algorithms for dynamic Bayesian networks [13], which learn time-homogeneous\ndynamic systems with fixed node dependencies, or approaches which start from an a priori static\nnetwork and then trace time-dependent activities [3]. The Achilles\u2019 heel of this latter approach\nis that edges that are transient over a short period of time may be missed by the summary static\nnetwork in the first place. 
Furthermore, our approach is also different from change point based\nalgorithms [14, 8] which first segment time series and then fit an invariant structure to each segment.\nThese approaches can only recover piece-wise stationary models rather than constantly varying\nnetworks. In our experiments, we demonstrate the advantage of TV-DBNs using synthetic networks.\nWe also apply TV-DBNs to real world datasets: a gene expression dataset measured during yeast\ncell cycle; and an EEG dataset recorded during a motor imagination task. In both cases, TV-DBNs\nreveal interesting time-varying causal structures of the underlying biological systems.\n\n2 Preliminary\nWe concern ourselves with stochastic processes in time or space domains, such as the dynamic\ncontrol of gene expression during cell cycle, or the sequential activation of brain areas during cognitive\ndecision making, in which the state of a variable at one time point is determined by the states of a\nset of variables at previous time points. Models describing stochastic temporal processes can be\nnaturally represented as dynamic Bayesian networks (DBN) [15]. Taking the transcriptional\nregulation of gene expression as an example, let $X^t := (X_1^t, \\ldots, X_p^t)^\\top \\in \\mathbb{R}^p$ be a vector representing the\nexpression levels of $p$ genes at time $t$. A stochastic dynamic process can then be modeled by a \u201cfirst-order\nMarkovian transition model\u201d $p(X^t \\mid X^{t-1})$, which defines the probabilistic distribution of gene\nexpressions at time $t$ given those at time $t - 1$. Under this assumption, the likelihood of the observed\nexpression levels of these genes over a time series of $T$ steps can be expressed as:\n\n$$p(X^1, \\ldots, X^T) = p(X^1) \\prod_{t=2}^{T} p(X^t \\mid X^{t-1}) = p(X^1) \\prod_{t=2}^{T} \\prod_{i=1}^{p} p(X_i^t \\mid X_{\\pi_i}^{t-1}), \\tag{1}$$\n\nwhere we assume that the topology of the networks is specified by a set of regulatory relations\n$X_{\\pi_i}^{t-1} := \\{X_j^{t-1} : X_j^{t-1} \\text{ regulates } X_i^t\\}$, and hence the transition model $p(X^t \\mid X^{t-1})$ factors over\nindividual genes, i.e., $\\prod_i p(X_i^t \\mid X_{\\pi_i}^{t-1})$. Each $p(X_i^t \\mid X_{\\pi_i}^{t-1})$ can be viewed as a regulatory gate\nfunction that takes multiple covariates (regulators) and produces a single response.\nA simple form of the transition model $p(X^t \\mid X^{t-1})$ in a DBN is a linear dynamics model:\n\n$$X^t = A \\cdot X^{t-1} + \\epsilon, \\qquad \\epsilon \\sim \\mathcal{N}(0, \\sigma^2 I), \\tag{2}$$\n\nwhere $A \\in \\mathbb{R}^{p \\times p}$ is a matrix of coefficients relating the expressions at time $t - 1$ to those of the\nnext time point, and $\\epsilon$ is a vector of isotropic zero mean Gaussian noise with variance $\\sigma^2$. In this\ncase, the gate function that defines the conditional distribution $p(X_i^t \\mid X_{\\pi_i}^{t-1})$ can be expressed as a\nunivariate Gaussian, i.e., $p(X_i^t \\mid X_{\\pi_i}^{t-1}) = \\mathcal{N}(X_i^t; A_{i\\cdot} X^{t-1}, \\sigma^2)$, where $A_{i\\cdot}$ denotes the $i$th row of\nthe matrix $A$. This model is also known as an auto-regressive model.\nThe major reason for favoring DBNs over standard Bayesian networks (BN) or undirected\ngraphical models is their enhanced semantic interpretability. An edge in a BN does not necessarily imply\ncausality due to the Markov equivalence of different edge configurations in the network [16]. In\nDBNs (of the type defined above), all directed edges only point from time $t - 1$ to $t$, which bear a\nnatural causal implication and are more likely to suggest regulatory relations. 
The auto-regressive\nmodel in (2) also offers an elegant formal framework for consistent estimation of the structures of\nDBNs; we can read off the edges between variables in $X^{t-1}$ and $X^t$ by simply identifying the\nnon-zero entries in the transition matrix $A$. For example, the non-zero entries of $A_{i\\cdot}$ represent the\nset of regulators $X_{\\pi_i}$ that directly lead to a response on $X_i$.\nContrary to what the name dynamic Bayesian networks may suggest, DBNs are time-invariant models\nand the underlying network structures do not change over time. That is, the dependencies between\nvariables in $X^{t-1}$ and $X^t$ are fixed, and both $p(X^t \\mid X^{t-1})$ and $A$ are invariant over time. The term\n\u201cdynamic\u201d only means that the DBN can model dynamical systems. In the sequel, we will present a\nnew formalism where the structures of DBNs are time-varying rather than invariant.\n\n3 A New Formalism: Time-Varying Dynamic Bayesian Networks\nWe will focus on recovering the directed time-varying network structure (or the locations of\nnon-zero entries in $A$) rather than the exact edge values. This is related to the structure estimation\nproblems studied in [11, 12], but in our case for auto-regressive models (and hence directed\nnetworks). Structure estimation results in sparse models for easy interpretation, but it is statistically\nmore challenging than the value estimation problem. This is also different from estimating a\nnon-stationary model in the conventional sense, where one is interested in recovering the exact values of the\nvarying coefficients [17, 18]. To make this distinction clear, we use the following 3 examples:\n\n$$B_1 = \\begin{pmatrix} 0 & 1 & 0 \\\\ 0 & 0 & 1 \\\\ 0 & 0 & 0 \\end{pmatrix}, \\quad B_2 = \\begin{pmatrix} 0 & 0.1 & 0 \\\\ 0 & 0 & 3 \\\\ 0 & 0 & 0 \\end{pmatrix}, \\quad B_3 = \\begin{pmatrix} 0 & 1 & 0.1 \\\\ 0 & 0 & 1.1 \\\\ 0 & 0.1 & 0 \\end{pmatrix}. \\tag{3}$$\n\nMatrices $B_1$ and $B_2$ encode the same graph structure, since the locations of their non-zero entries\nare exactly the same. Although $B_1$ is closer to $B_3$ than to $B_2$ in terms of matrix values (e.g., measured in\nFrobenius norm), $B_1$ and $B_3$ encode very different graph structures.\nFormally, let graph $G^t = (V, E^t)$ represent the conditional independence relations between the\ncomponents of random vectors $X^{t-1}$ and $X^t$. The vertex set $V$ is a common set of variables\nunderlying $X^{1:T}$, i.e., each node in $V$ corresponds to a sequence of variables $X_i^{1:T}$. The edge set\n$E^t \\subseteq V \\times V$ contains directed edges from components of $X^{t-1}$ to those of $X^t$; an edge $(i, j) \\notin E^t$\nif and only if $X_i^t$ is conditionally independent of $X_j^{t-1}$ given the rest of the variables in the model.\nDue to the time-varying nature of the networks, the transition model $p^t(X^t \\mid X^{t-1})$ in (1) becomes\ntime dependent. In the case of the auto-regressive DBN in (2), its time-varying extension becomes:\n\n$$X^t = A^t \\cdot X^{t-1} + \\epsilon, \\qquad \\epsilon \\sim \\mathcal{N}(0, \\sigma^2 I), \\tag{4}$$\n\nand our goal is to estimate the non-zero entries in the sequence of time dependent transition matrices\n$\\{A^t\\}$ ($t = 1 \\ldots T$). The directed edges $E^t := E^t(A^t)$ in network $G^t$ associated with each $A^t$ can\nbe recovered via $E^t = \\{(i, j) \\in V \\times V \\mid i \\neq j, A_{ij}^t \\neq 0\\}$.\n\n4 Estimating Time-Varying DBN\n\nNote that if we follow the naive assumption that each temporal snapshot is a completely different\nnetwork, the task of jointly estimating $\\{A^t\\}$ by maximizing the log-likelihood would be statistically\nimpossible because the estimator would suffer from extremely high variance due to sample scarcity.\nTherefore, we make a statistically tractable yet realistic assumption that the underlying network\nstructures are sparse and vary smoothly across time; and hence temporally adjacent networks are\nmore likely to share common edges than temporally distal networks.\nOverall, we have designed a procedure that decomposes the problem of estimating the time-varying\nnetworks along two orthogonal axes. The first axis is along the time, where we estimate the network\nfor each time point separately by reweighting the observations accordingly; and the second axis is\nalong the set of genes, where we estimate the neighborhood for each gene separately and then join\nthese neighborhoods to form the overall network. One benefit of such decomposition is that the\nestimation problem is reduced to a set of atomic optimizations, one for each node i (i = 1 . . . |V|) at\neach time point t\u2217 (t\u2217 = 1 . . . 
T ):\n\n$$\\hat{A}_{i\\cdot}^{t^*} = \\operatorname*{argmin}_{A_{i\\cdot}^{t^*} \\in \\mathbb{R}^{1 \\times p}} \\frac{1}{T} \\sum_{t=1}^{T} w^{t^*}(t) \\left(x_i^t - A_{i\\cdot}^{t^*} x^{t-1}\\right)^2 + \\lambda \\left\\| A_{i\\cdot}^{t^*} \\right\\|_1, \\tag{5}$$\n\nwhere $\\lambda$ is a parameter for the $\\ell_1$-regularization term, which controls the number of non-zero\nentries in the estimated $\\hat{A}_{i\\cdot}^{t^*}$, and hence the sparsity of the networks; $w^{t^*}(t)$ is the weighting of an\nobservation from time $t$ when we estimate the network at time $t^*$. More specifically, it is defined as\n$w^{t^*}(t) = \\frac{K_h(t - t^*)}{\\sum_{t=1}^{T} K_h(t - t^*)}$, where $K_h(\\cdot) = K(\\cdot / h)$ is a symmetric nonnegative kernel function and $h$ is\nthe kernel bandwidth. We use a Gaussian RBF kernel, $K_h(t) = \\exp(-t^2 / h)$, in our later experiments.\nNote that multiple measurements at the same time point are considered as i.i.d. observations and can\nbe trivially handled by assigning them the same weights.\nThe objective defined in (5) is essentially a weighted regression problem. The square loss function\nis due to the fact that we are fitting a linear model with uncorrelated Gaussian noise. Two other\nkey components of the objective are: (i) a kernel reweighting scheme for aggregating observations\nacross time; and (ii) an $\\ell_1$-regularization for sparse structure estimation. The first component\noriginates from our assumption that the structural changes of the network vary smoothly across time. This\nassumption allows us to borrow information across time by reweighting the observations from\ndifferent time points and then treating them as if they were i.i.d. observations. Intuitively, the weighting\nshould place more emphasis on observations at or near time point $t^*$ with weights becoming smaller\nas observations move further away from time point $t^*$. 
The second component is to promote sparse\nstructure and avoid model overfitting. This is also consistent with the biological observation that\nnetworks underlying biological processes are parsimonious in structure. For example, a transcription\nfactor only controls a small fraction of target genes at a particular time point or under a specific\ncondition [19]. It is well-known that $\\ell_1$-regularized least square linear regression has a parsimonious\nproperty, and exhibits model-selection consistency (i.e., recovers the set of true non-zero regression\ncoefficients asymptotically) in noisy settings even when $p \\gg T$ [20].\nNote that our procedure can also be easily extended to learn the structure of auto-regressive models\nof higher order $D$: $X^t = \\sum_{d=1}^{D} A^t(d) \\cdot X^{t-d} + \\epsilon$, $\\epsilon \\sim \\mathcal{N}(0, \\sigma^2 I)$. The change we need to\nmake is to incorporate the higher order auto-regressive coefficients in the square loss function, i.e.,\n$(x_i^t - \\sum_{d=1}^{D} A_{i\\cdot}^{t^*}(d) x^{t-d})^2$, and penalize the $\\ell_1$-norms of these $A_{i\\cdot}^{t^*}(d)$ correspondingly.\n\n5 Optimization\n\nEstimating time-varying networks using the decomposition scheme above requires solving a\ncollection of optimization problems in (5). In a genome-wide reverse engineering task, there can be tens\nof thousands of genes and hundreds of time points, so one can easily have a million optimization\nproblems. Therefore, it is essential to use an efficient algorithm for solving the atomic optimization\nproblem in (5), which can be trivially parallelized for each gene at each time point.\nInstead of solving the form of the optimization problem in (5), we will push the weighting\n$w^{t^*}(t)$ into the square loss function by scaling the covariates and response variables by $\\sqrt{w^{t^*}(t)}$,\ni.e. $\\tilde{x}_i^t \\leftarrow \\sqrt{w^{t^*}(t)} \\, x_i^t$ and $\\tilde{x}^{t-1} \\leftarrow \\sqrt{w^{t^*}(t)} \\, x^{t-1}$. After this transformation, the optimization\nproblem becomes a standard $\\ell_1$-regularized least-square problem which can be solved via a\nbattery of highly scalable and specialized solvers, such as the shooting algorithm [21]. The shooting\nalgorithm is a simple, straightforward and fast algorithm that iteratively solves a system of\nnonlinear equations related to the optimality condition of problem (5): $\\frac{2}{T} \\sum_{t=1}^{T} (A_{i\\cdot}^{t^*} \\tilde{x}^{t-1} - \\tilde{x}_i^t) \\tilde{x}_j^{t-1} = -\\lambda \\, \\mathrm{sign}(A_{ij}^{t^*})$ ($\\forall j = 1 \\ldots p$).\nAt each iteration of the shooting algorithm, one entry of $A_{i\\cdot}$ is\nupdated by holding all other entries fixed. Overall, our procedure for estimating time-varying networks\nis summarized in Algorithm 1, which uses the shooting algorithm as the key building block (steps 7-10).\n\nAlgorithm 1: Procedure for Estimating Time-Varying DBN\nInput: Time series $\\{x^1, \\ldots, x^T\\}$, regularization parameter $\\lambda$ and kernel parameter $h$.\nOutput: Time-varying networks $\\{A^1, \\ldots, A^T\\}$.\n1  begin\n2    Introduce variable $A^0$ and randomly initialize it\n3    for $i = 1 \\ldots p$ do\n4      for $t^* = 1 \\ldots T$ do\n5        Initialize: $A_{i\\cdot}^{t^*} \\leftarrow A_{i\\cdot}^{t^*-1}$\n6        Scale time series: $\\tilde{x}_i^t \\leftarrow \\sqrt{w^{t^*}(t)} \\, x_i^t$, $\\tilde{x}^{t-1} \\leftarrow \\sqrt{w^{t^*}(t)} \\, x^{t-1}$ ($t = 1 \\ldots T$)\n7        while $A_{i\\cdot}^{t^*}$ has not converged do\n8          for $j = 1 \\ldots p$ do\n9            Compute: $S_j \\leftarrow \\frac{2}{T} \\sum_{t=1}^{T} (\\sum_{k \\neq j} A_{ik}^{t^*} \\tilde{x}_k^{t-1} - \\tilde{x}_i^t) \\tilde{x}_j^{t-1}$, $b_j = \\frac{2}{T} \\sum_{t=1}^{T} \\tilde{x}_j^{t-1} \\tilde{x}_j^{t-1}$\n10           Update: $A_{ij}^{t^*} \\leftarrow (\\mathrm{sign}(S_j - \\lambda) \\lambda - S_j) / b_j$ if $|S_j| > \\lambda$, otherwise 0\n11 end\n\n
In step 5, we use a warm start for each atomic optimization problem: since the networks\nvary smoothly across time, we can use $A_{i\\cdot}^{t^*-1}$ as a good initialization for $A_{i\\cdot}^{t^*}$ for further speedup.\n\n6 Statistical Properties\n\nIn this section, we study the statistical consistency of the estimation procedure in Section 4. Our\nanalysis is different from the consistency results presented by [11] on recovering time-varying\nundirected graphical models. Their analysis deals with Frobenius norm consistency which is a weaker\nresult than the structural consistency we pursue here. Our structural consistency result for the TV-DBN\nestimation procedure follows the proof strategy of [20]; however, the analysis is complicated by two\nmajor factors. First, time series observations are very often non-i.i.d.: current observations may\ndepend on past history. Second, we are modeling non-stationary processes, where we need to deal\nwith the additional bias term that arises due to locally stationary approximation to non-stationarity.\nIn the following, we state our assumptions and theorem, but leave the detailed proof of this theorem\nfor a full version of the paper (a sketch of the proof can be found in the appendix).\n\nTheorem 1 Assume that the conditions below hold:\n\n1. Elements of $A^t$ are smooth functions with bounded second derivatives, i.e. there exists a\nconstant $L > 0$ s.t. $|\\frac{\\partial}{\\partial t} A_{ij}^t| < L$ and $|\\frac{\\partial^2}{\\partial t^2} A_{ij}^t| < L$.\n\n2. The minimum absolute value of non-zero elements of $A^t$ is bounded away from zero at\nobservation points, and this bound tends to zero as we observe more and more samples,\ni.e., $a_{\\min} := \\min_{t \\in \\{1/T, 2/T, \\ldots, 1\\}} \\min_{i \\in [p], j \\in S_i^t} |A_{ij}^t| > 0$.\n\n3. Let $\\Sigma^t = E[X^t (X^t)^T] = [\\sigma_{ij}(t)]_{i,j=1}^{p}$ and let $S_i^t$ denote the set of non-zero elements of\nthe $i$-th row of the matrix $A^t$, i.e. $S_i^t = \\{j \\in [p] : A_{ij}^t \\neq 0\\}$. Assume that there exists a\nconstant $d \\in (0, 1]$ s.t. $\\max_{j \\in S_i^t, k \\neq j} |\\sigma_{jk}(t)| \\leq \\frac{d}{s}$, $\\forall i \\in [p], t \\in [0, 1]$, where $s$ is an upper\nbound on the number of non-zero elements, i.e. $s = \\max_{t \\in [0,1]} \\max_{i \\in [p]} |S_i^t|$.\n\n4. The kernel $K(\\cdot) : \\mathbb{R} \\mapsto \\mathbb{R}$ is a symmetric function and has bounded support on $[0, 1]$. There\nexists a constant $M_K$ s.t. $\\max_{x \\in \\mathbb{R}} |K(x)| \\leq M_K$ and $\\max_{x \\in \\mathbb{R}} K(x)^2 \\leq M_K$.\n\nLet the regularization parameter scale as $\\lambda = O(\\sqrt{(\\log p) / (T h)})$, and let the minimum absolute non-zero\nentry $a_{\\min}$ of $A^{t^*}$ be sufficiently large ($a_{\\min} \\geq 2\\lambda$). If $h = O(T^{1/3})$ and $\\frac{s \\log p}{T h} = o(1)$ then\n\n$$P[\\mathrm{supp}(\\hat{A}^{t^*}) = \\mathrm{supp}(A^{t^*})] \\to 1, \\quad T \\to \\infty, \\quad \\forall t^* \\in [0, 1]. \\tag{6}$$\n\n7 Experiments\n\nTo the best of our knowledge, this is the first practical method for structure learning of non-stationary\nDBNs. Thus we mainly compare with static DBN structure learning methods. The goal is to\ndemonstrate the advantage of TV-DBNs for modeling time-varying structures of non-stationary processes\nwhich are ignored by traditional approaches. We conducted 3 experiments using synthetic data,\ngene expression data and EEG signals. In these experiments, TV-DBNs either better recover the\nunderlying networks, or provide better explanatory power for the underlying biological processes.\nSynthetic Data In this experiment, we generate synthetic time series using a first-order\nauto-regressive model with smoothly varying model structures. More specifically, we first generate\n8 different anchor transition matrices $A^{t_1} \\ldots A^{t_8}$, each of which corresponds to an Erd\u0151s-R\u00e9nyi\nrandom graph of node size $p = 50$ and average indegree of 2 (we have also experimented with\n$p = 75$ and 100 which provide similar results). 
We then evenly space these 8 anchor matrices,\nand interpolate a suitable number of intermediate matrices to match the number of observations\n$T$. Due to the interpolation, the average indegree of each node is around 4. With the sequence of\n$\\{A^t\\}$ ($t = 1 \\ldots T$), we simulate the time series according to equation (4) with noise variance $\\sigma^2 = 1$.\nWe then study the behavior of TV-DBNs and static DBNs [22] in recovering the underlying varying\nnetworks as we increase the number of observations $T$. We also compare with a piecewise constant\nDBN that estimates a static network for each segment obtained from change point detection [14].\nFor the TV-DBN, we choose the bandwidth parameter $h$ of the Gaussian kernel according to\nthe spacing between two adjacent anchor matrices ($T/7$) such that $\\exp(-\\frac{T^2}{49h}) = \\exp(-1)$.\nFor all methods, we choose the regularization parameter such that the resulting networks have\nan average indegree of 4. We evaluate the performance using an F1 score, which is the\nharmonic mean of precision and recall scores in retrieving the true time-varying network edges.\n\nFigure 1: F1 score of estimating time-varying\nnetworks for different methods.\n\nWe can see that estimating a static DBN or a piecewise\nconstant DBN does not provide a good estimation of the\nnetwork structures (Figure 1). In contrast, the TV-DBN\nleads to a significantly higher F1 score, and its\nperformance also benefits quickly from increasing the number\nof observations. Note that these results are not\nsurprising since time-varying networks simply fit better with the\ndata generating process. As time-varying networks occur\noften in biological systems, we expect TV-DBNs will be\nuseful for studying biological systems.\nYeast Gene Regulatory Networks. 
In this experiment, we will reverse engineer the time-varying\ngene regulatory networks from time series of gene expression measured across two yeast cell cycles.\nA yeast cell cycle is divided into four stages: S phase for DNA synthesis, M phase for mitosis,\nand G1 and G2 phases separating S and M phases. We use two time series (alpha30 and alpha38)\nfrom [23] which are technical replicates of each other with a sampling interval of 5 minutes and\na total of 25 time points across two yeast cell cycles. We consider a set of 3626 genes which\nare common to both arrays. We choose the bandwidth parameter $h$ such that the weighting decays\nto $\\exp(-1)$ for half of a cell cycle, i.e. $\\exp(-6^2/h) = \\exp(-1)$. We choose the regularization\nparameter such that the sparsity of the networks is around 0.01.\nDuring the cell cycle of yeasts, there exist multiple underlying \u201cthemes\u201d that determine the\nfunctionalities of each gene and their relationships to each other, and such themes are dynamical and\nstochastic. As a result, the gene regulatory networks at each time point are context-dependent and\ncan undergo systematic rewiring, rather than being invariant over time. A summary of the estimated\ntime-varying networks is visualized in Figure 2. We group genes according to 50 ontology groups.\nWe can see that the most active groups of genes are related to background processes such as\ncytoskeleton organization, enzyme regulator activity, and ribosome activity. We can also spot transient\ninteractions, for instance, between genes related to site of polarized growth and nucleolus (time\npoint 18), and between genes related to ribosome and cellular homeostasis (time point 24). Note\nthat, although gene expressions are measured across two cell cycles, the values do not necessarily\nexhibit periodic behavior. 
In fact, only a small fraction of yeast genes (less than 20%) has been\nreported to exhibit cycling behavior [23].\n\n[Figure 2 panels: (a) t1, (b) t2, (c) t4, (d) t6, (e) t8, (f) t10, (g) t12, (h) t14, (i) t16, (j) t18, (k) t20, (l) t22, (m) t24]\nFigure 2: Interactions between gene ontological groups. The weight of an edge between two ontological\ngroups is the total number of connections between genes in the two groups. We thresholded the edge weight\nsuch that only the dominant interactions are displayed.\n\nTable 1: The number of enriched unique gene\nsets discovered by the static and time-varying\nnetworks respectively. Here we are interested\nin recall score: the time-varying networks model\nthe biological system better.\n\n            DBN   TV-DBN\nTF            7       23\nKnockout      7       26\nOntology     13       77\n\nNext we study gene sets that are related to a specific\nstage of the cell cycle where we expect to see periodic\nbehavior. In particular, we obtain gene sets known to\nbe related to G1, S and S/G2 stage respectively (see footnote 1). We\nuse interactivity, which is the total number of edges a\ngroup of genes is connected to, to describe the activity\nof each group of genes. Since the regulatory networks\nare directed, we can examine both indegree and\noutdegree separately for each gene set. In Figure 3(a)(b)(c), the interactivities of these genes indeed\nexhibit periodic behavior which corresponds well with their supposed functions in cell cycles.\nWe also plot the histogram of indegree and outdegree (averaged across time) for the time-varying\nnetworks in Figure 3(d). We find that the outdegrees approximately follow a scale free distribution\nwith largest outdegree reaching 90. This corresponds well with the biological observation that there\nare a few genes (regulators) that regulate a lot of other genes. The indegree distribution is very\ndifferent from that of the outdegree, and it exhibits a clear peak between 5 and 6. 
This also corresponds\nwell with biological observations that most genes are controlled only by a few regulators.\nTo further assess the modeling power of the time-varying networks and its advantage over static\nnetwork, we perform gene set enrichment studies. More speci\ufb01cally, we use three types of infor-\nmation to de\ufb01ne the gene sets: transcription factor binding targets (TF), gene knockout signatures\n(Knockout), and gene ontology (Ontology) groups [24]. We partition the genes in the time varying\nnetworks at each time point into 50 groups using spectral clustering, and then test whether these\ngroups are enriched with genes from any prede\ufb01ned gene sets. We use a max-statistic and a 99%\ncon\ufb01dence level for the test [25]. Table 1 indicates that time-varying networks are able to discover\nmore functional groups as de\ufb01ned by the genes sets than static networks as commonly used in bio-\nlogical literature. In the appendix, we also visualize the time spans of these active functional groups.\nIt can be seen that many of them are dynamic and transient, and not captured by a static network.\nBrain Response to Visual Stimuli. In this experiment, we will explore the interactions between\nbrain regions in response to visual stimuli using TV-DBNs. We use the EEG dataset from [26]\nwhere \ufb01ve healthy subjects (labeled \u2018aa\u2019, \u2018al\u2019, \u2018av\u2019, \u2018aw\u2019 and \u2018ay\u2019 respectively) were required to\nimagine body part movement based on visual cues in order to generate EEG changes. We focus our\n\n1We obtain gene sets from http://genome-www.stanford.edu/cellcycle/data/rawdata/KnowGenes.doc.\n\n7\n\n\f(a)\n\n(b)\n\n(c)\n\n(d)\n\nFigure 3: (a) Genes speci\ufb01c to G1 phase are being regulated periodically; we can see that the average in-\ndegree of these genes increases during G1 stage and starts to decline right after the G1 phase. 
(b) S phase\nspecific genes periodically regulate other genes; we can see that the average outdegree of these genes peaks at\nthe end of S phase and starts to decline right after S phase. (c) The interactivity of S/G2 specific genes also\nshows nice correspondence with their functional roles; we can see that the average outdegree increases until G2\nphase and then starts to decline. (d) Indegree and outdegree distribution averaged over 24 time points.\n\n[Figure 4 panels: subjects \u2018al\u2019 and \u2018av\u2019 at t = 1.0s, t = 1.5s, t = 2.0s and t = 2.5s]\n\nFigure 4: Temporal progression of brain interactions for subject \u2018al\u2019 and BCI \u201cilliterate\u201d \u2018av\u2019. The plots for the\nother 3 subjects can be found in the appendix. The dots correspond to EEG electrode positions in the 10-5 system.\n\nanalysis on trials related to right hand imagination, and signals in the window [1.0, 2.5] seconds after\nthe visual cue is presented. We bandpass filter the data at 8\u201312 Hz to obtain EEG alpha activity. We\nfurther normalize each EEG channel to zero mean and unit variance, and estimate the time-varying\nnetworks for all five subjects using exactly the same regularization parameter and kernel bandwidth\n(h such that exp(\u2212(0.5)^2/h) = exp(\u22121), i.e. h = 0.25). We tried a range of different regularization parameters, but\nobtained results qualitatively similar to Figure 4.\nWhat is particularly interesting in this dataset is that subject \u2018av\u2019 is called BCI \u201cilliterate\u201d; he/she\nis unable to generate clear EEG changes during motor imagination. The estimated time-varying\nnetworks reveal that the brain interactions of subject \u2018av\u2019 are particularly weak and that the brain\nconnectivity actually decreases as the experiment proceeds. In contrast, the other four subjects show\nincreased brain interaction as they engage in active imagination. In particular, these increased\ninteractions occur between visual and motor cortex. 
This dynamic coherence between visual and motor\ncortex corresponds nicely to the fact that subjects are consciously transforming visual stimuli into\nmotor imaginations, which involves the motor cortex. It seems that subject \u2018av\u2019 fails to perform such\nintegration due to the disruption of brain interactions.\n\n8 Conclusion\nIn this paper, we propose time-varying dynamic Bayesian networks (TV-DBN) for modeling the\nvarying network structures underlying non-stationary biological time series. We have designed a\nsimple and scalable kernel reweighted structural learning algorithm to make the learning possible.\nGiven the rapid advances in data collection technologies for biological systems, we expect that complex,\nhigh-dimensional, and feature-rich data from complex dynamic biological processes, such as\ncancer progression, immune responses, and developmental processes, will continue to grow. Thus,\nwe believe our new method is a timely contribution that can narrow the gap between imminent\nmethodological needs and the available data and offer deeper understanding of the mechanisms and\nprocesses underlying biological networks.\n\nAcknowledgments LS is supported by a Ray and Stephenie Lane Research Fellowship. EPX is supported\nby grants ONR N000140910758, NSF DBI-0640543, NSF DBI-0546594, NSF IIS-0713379 and an Alfred P.\nSloan Research Fellowship. We also thank Grace Tzu-Wei Huang for helpful discussions.\n\nReferences\n[1] A. L. Barabasi and Z. N. Oltvai. Network biology: Understanding the cell\u2019s functional organization.\nNature Reviews Genetics, 5(2):101\u2013113, 2004.\n\n[2] Francisco Varela, Jean-Philippe Lachaux, Eugenio Rodriguez, and Jacques Martinerie. The brainweb:\nPhase synchronization and large-scale integration. Nature Reviews Neuroscience, 2:229\u2013239, 2001.\n\n[3] N. Luscombe, M. Babu, H. Yu, M. Snyder, S. Teichmann, and M. Gerstein. Genomic analysis of regulatory\nnetwork dynamics reveals large topological changes. 
Nature, 431:308\u2013312, 2004.\n\n[4] Eugenio Rodriguez, Nathalie George, Jean-Philippe Lachaux, Jacques Martinerie, Bernard Renault, and\nFrancisco J. Varela. Perception\u2019s shadow: long-distance synchronization of human brain activity. Nature,\n397(6718):430\u2013433, 1999.\n\n[5] M. Talih and N. Hengartner. Structural learning with time-varying components: Tracking the cross-section\nof financial time series. J. Royal Stat. Soc. B, 67(3):321\u2013341, 2005.\n\n[6] S. Hanneke and E. P. Xing. Discrete temporal models of social networks. In Workshop on Statistical\nNetwork Analysis, ICML06, 2006.\n\n[7] F. Guo, S. Hanneke, W. Fu, and E. P. Xing. Recovering temporally rewiring networks: A model-based\napproach. In International Conference on Machine Learning, 2007.\n\n[8] X. Xuan and K. Murphy. Modeling changing dependency structure in multivariate time series. In International\nConference on Machine Learning, 2007.\n\n[9] J. Robinson and A. Hartemink. Non-stationary dynamic bayesian networks. In Neural Information Processing\nSystems, 2008.\n\n[10] Amr Ahmed and Eric P. Xing. Tesla: Recovering time-varying networks of dependencies in social and\nbiological studies. Proceedings of the National Academy of Sciences, in press, 2009.\n\n[11] S. Zhou, J. Lafferty, and L. Wasserman. Time varying undirected graphs. In Computational Learning\nTheory, 2008.\n\n[12] L. Song, M. Kolar, and E. Xing. Keller: Estimating time-evolving interactions between genes. In Bioinformatics\n(ISMB), 2009.\n\n[13] N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using bayesian networks to analyze expression data.\nJournal of Computational Biology, 7:601\u2013620, 2000.\n\n[14] N. Dobigeon, J. Tourneret, and M. Davy. Joint segmentation of piecewise constant autoregressive processes\nby using a hierarchical model and a bayesian sampling approach. IEEE Transactions on Signal\nProcessing, 55(4):1251\u20131263, 2007.\n\n[15] K. Kanazawa, D. Koller, and S. Russell. 
Stochastic simulation algorithms for dynamic probabilistic\nnetworks. Uncertainty in AI, 1995.\n\n[16] L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models with link uncertainty.\nJournal of Machine Learning Research, 2002.\n\n[17] R. Dahlhaus. Fitting time series models to nonstationary processes. Ann. Statist., 25:1\u201337, 1997.\n\n[18] C. Andrieu, M. Davy, and A. Doucet. Efficient particle filtering for jump markov systems: Application to\ntime-varying autoregressions. IEEE Transactions on Signal Processing, 51(7):1762\u20131770, 2003.\n\n[19] E. H. Davidson. Genomic Regulatory Systems. Academic Press, 2001.\n\n[20] Florentina Bunea. Honest variable selection in linear and logistic regression models via \u21131 and \u21131 + \u21132\npenalization. Electronic Journal of Statistics, 2:1153, 2008.\n\n[21] W. Fu. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical\nStatistics, 7(3):397\u2013416, 1998.\n\n[22] M. Schmidt, A. Niculescu-Mizil, and K. Murphy. Learning graphical model structure using l1-regularization\npaths. In AAAI, 2007.\n\n[23] Tata Pramila, Wei Wu, Shawna Miles, William Noble, and Linda Breeden. The forkhead transcription factor\nhcm1 regulates chromosome segregation genes and fills the s-phase gap in the transcriptional circuitry\nof the cell cycle. Genes and Development, 20:2266\u20132278, 2006.\n\n[24] Jun Zhu, Bin Zhang, Erin Smith, Becky Drees, Rachel Brem, Leonid Kruglyak, Roger Bumgarner, and\nEric E Schadt. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory\nnetworks. Nature Genetics, 40:854\u2013861, 2008.\n\n[25] T. Nichols and A. Holmes. Nonparametric permutation tests for functional neuroimaging: a primer with\nexamples. Human Brain Mapping, 15:1\u201325, 2001.\n\n[26] G. Dornhege, B. Blankertz, G. Curio, and K.R. M\u00fcller. 
Boosting bit rates in non-invasive EEG single-trial\nclassifications by feature combination and multi-class paradigms. IEEE Trans. Biomed. Eng., 51:993\u20131002, 2004.\n", "award": [], "sourceid": 858, "authors": [{"given_name": "Le", "family_name": "Song", "institution": null}, {"given_name": "Mladen", "family_name": "Kolar", "institution": null}, {"given_name": "Eric", "family_name": "Xing", "institution": null}]}