{"title": "Modelling transcriptional regulation using Gaussian Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 785, "page_last": 792, "abstract": null, "full_text": "Modelling transcriptional regulation using Gaussian processes\n\nNeil D. Lawrence School of Computer Science University of Manchester, U.K. neill@cs.man.ac.uk\n\nGuido Sanguinetti Department of Computer Science University of Sheffield, U.K. guido@dcs.shef.ac.uk\n\nMagnus Rattray School of Computer Science University of Manchester, U.K. magnus@cs.man.ac.uk\n\nAbstract\nModelling the dynamics of transcriptional processes in the cell requires the knowledge of a number of key biological quantities. While some of them are relatively easy to measure, such as mRNA decay rates and mRNA abundance levels, it is still very hard to measure the active concentration levels of the transcription factor proteins that drive the process and the sensitivity of target genes to these concentrations. In this paper we show how these quantities for a given transcription factor can be inferred from gene expression levels of a set of known target genes. We treat the protein concentration as a latent function with a Gaussian process prior, and include the sensitivities, mRNA decay rates and baseline expression levels as hyperparameters. We apply this procedure to a human leukemia dataset, focusing on the tumour repressor p53 and obtaining results in good accordance with recent biological studies.\n\nIntroduction\nRecent advances in molecular biology have brought about a revolution in our understanding of cellular processes. Microarray technology now allows measurement of mRNA abundance on a genomewide scale, and techniques such as chromatin immunoprecipitation (ChIP) have largely unveiled the wiring of the cellular transcriptional regulatory network, identifying which genes are bound by which transcription factors. 
However, a full quantitative description of the regulatory mechanism of transcription requires the knowledge of a number of other biological quantities: first of all the concentration levels of active transcription factor proteins, but also a number of gene-specific constants such as the baseline expression level for a gene, the rate of decay of its mRNA and the sensitivity with which target genes react to a given transcription factor protein concentration. While some of these quantities can be measured (e.g. mRNA decay rates), most of them are very hard to measure with current techniques, and have therefore to be inferred from the available data. This is often done following one of two complementary approaches. One can formulate a large scale simplified model of regulation (for example assuming a linear response to protein concentrations) and then combine network architecture data and gene expression data to infer transcription factors' protein concentrations on a genome-wide scale. This line of research was started in [3] and then extended further to include gene-specific effects in [10, 11]. Alternatively, one can formulate a realistic model of a small subnetwork where few transcription factors regulate a small number of established target genes, trying to include the finer points of the dynamics of transcriptional regulation. In this paper we follow the second approach, focussing on the simplest subnetwork consisting of one transcription factor regulating its target genes, but using a detailed model of the interaction dynamics to infer the transcription factor concentrations and the gene-specific constants. This problem was recently studied by Barenco et al. [1] and by Rogers et al. [9]. In these studies, parametric models were developed describing the rate of production of certain genes as a function of the concentration of transcription factor protein at some specified time points. 
Markov chain Monte Carlo (MCMC) methods were then used to carry out Bayesian inference of the protein concentrations, requiring substantial computational resources and limiting the inference to the discrete time-points where the data was collected. We show here how a Gaussian process model provides a simple and computationally efficient method for Bayesian inference of continuous transcription factor concentration profiles and associated model parameters. Gaussian processes have been used effectively in a number of machine learning and statistical applications [8] (see also [2, 6] for the work that is most closely related to ours). Their use in this context is novel, as far as we know, and leads to several advantages. Firstly, it allows for the inference of continuous quantities (concentration profiles) without discretization, therefore accounting naturally for the temporal structure of the data. Secondly, it avoids the use of cumbersome interpolation techniques to estimate mRNA production rates from mRNA abundance data, and it allows us to deal naturally with the noise inherent in the measurements. Finally, it greatly outstrips MCMC techniques in terms of computational efficiency, which we expect to be crucial in future extensions to more complex (and realistic) regulatory networks. The paper is organised as follows: in the first section we discuss linear response models. These are simplified models in which the mRNA production rate depends linearly on the transcription factor protein concentration. Although the linear assumption is not verified in practice, it has the advantage of giving rise to an exactly tractable inference problem. We then discuss how to extend the formalism to model cases where the dependence of mRNA production rate on transcription factor protein concentration is not linear, and propose a MAP-Laplace approach to carry out Bayesian inference. In the third section we test our model on the leukemia data set studied in [1]. 
Finally, we discuss further extensions of our work. MATLAB code to recreate the experiments is available on-line.\n\n1 Linear Response Model\nLet the data set under consideration consist of T measurements of the mRNA abundance of N genes. We consider a linear differential equation that relates a given gene j's expression level x_j(t) at time t to the concentration of the regulating transcription factor protein f(t),\n\ndx_j/dt = B_j + S_j f(t) - D_j x_j(t).   (1)\n\nHere, B_j is the basal transcription rate of gene j, S_j is the sensitivity of gene j to the transcription factor and D_j is the decay rate of the mRNA. Crucially, the dependence of the mRNA transcription rate on the protein concentration (response) is linear. Assuming a linear response is a crude simplification, but it can still lead to interesting results in certain modelling situations. Equation (1) was used by Barenco et al. [1] to model a simple network consisting of the tumour suppressor transcription factor p53 and five of its target genes. We will consider more general models in section 2. The equation given in (1) can be solved to recover\n\nx_j(t) = B_j/D_j + k_j exp(-D_j t) + S_j exp(-D_j t) ∫_0^t f(u) exp(D_j u) du,   (2)\n\nwhere k_j arises from the initial conditions, and is zero if we assume an initial baseline expression level x_j(0) = B_j/D_j. We will model the protein concentration f as a latent function drawn from a Gaussian process prior distribution. It is important to notice that equation (2) involves only linear operations on the function f(t). This implies immediately that the mRNA abundance levels will also be modelled as a Gaussian process, and the covariance function of the marginal distribution p(x_1, . . .
, x_N) can be worked out explicitly from the covariance function of the latent function f.\n\nLet us rewrite equation (2) as\n\nx_j(t) = B_j/D_j + L_j[f](t)   (3)\n\nwhere we have set the initial conditions such that k_j in equation (2) is equal to zero and\n\nL_j[f](t) = S_j exp(-D_j t) ∫_0^t f(u) exp(D_j u) du\n\nis the linear operator relating the latent function f to the mRNA abundance of gene j, x_j(t). If the covariance function associated with f(t) is given by k_ff(t, t') then elementary functional analysis yields that cov(L_j[f](t), L_k[f](t')) = L_j L_k[k_ff](t, t'). Explicitly, this is given by the following formula\n\nk_{x_j x_k}(t, t') = S_j S_k exp(-D_j t - D_k t') ∫_0^t ∫_0^{t'} exp(D_j u) exp(D_k u') k_ff(u, u') du du'.   (4)\n\nIf the process prior over f(t) is taken to be a squared exponential kernel,\n\nk_ff(t, t') = exp(-(t - t')²/l²),\n\nwhere l controls the width of the basis functions¹, the integrals in equation (4) can be computed analytically. The resulting covariances are obtained as\n\nk_{x_j x_k}(t, t') = S_j S_k (√π l/2) [h_kj(t', t) + h_jk(t, t')]   (5)\n\nwhere\n\nh_kj(t', t) = (exp(γ_k²)/(D_j + D_k)) { exp[-D_k(t' - t)] [erf((t' - t)/l - γ_k) + erf(t/l + γ_k)] - exp[-(D_k t' + D_j t)] [erf(t'/l - γ_k) + erf(γ_k)] }.\n\nHere erf(x) = (2/√π) ∫_0^x exp(-y²) dy and γ_k = D_k l/2. We can therefore compute a likelihood which relates instantiations from all the observed genes, {x_j(t)}_{j=1}^N, through dependencies on the parameters {B_j, S_j, D_j}_{j=1}^N. The effect of f(t) has been marginalised. To infer the protein concentration levels, one also needs the \"cross-covariance\" terms between x_j(t) and f(t'), which are obtained as\n\nk_{x_j f}(t, t') = S_j exp(-D_j t) ∫_0^t exp(D_j u) k_ff(u, t') du.   (6)\n\nAgain, this can be obtained explicitly for squared exponential priors on the latent function f as\n\nk_{x_j f}(t', t) = (S_j √π l/2) exp(γ_j²) exp[-D_j(t' - t)] [erf((t' - t)/l - γ_j) + erf(t/l + γ_j)].\n\nStandard Gaussian process regression techniques [see e.g. 
8] then yield the mean and covariance function of the posterior process on f as\n\nf_post = K_fx K_xx^{-1} x,   K_f^post = K_ff - K_fx K_xx^{-1} K_xf   (7)\n\nwhere x denotes collectively the observed variables x_j(t) and capital K denotes the matrix obtained by evaluating the covariance function of the processes on every pair of observed time points.\n\n¹ The scale of the process is ignored to avoid a parameterisation ambiguity with the sensitivities.\n\nThe model parameters B_j, D_j and S_j can be estimated by type II maximum likelihood. Alternatively, they can be assigned vague gamma prior distributions and estimated a posteriori using MCMC sampling. In practice, we will allow the mRNA abundance of each gene at each time point to be corrupted by some noise, so that we can model the observations at times t_i for i = 1, . . . , T as\n\ny_j(t_i) = x_j(t_i) + ε_j(t_i)   (8)\n\nwith ε_j(t_i) ~ N(0, σ_ji²). Estimates of the confidence levels associated with each mRNA measurement can be obtained for Affymetrix microarrays using probe-level processing techniques such as the mmgMOS model of [4]. The covariance of the noisy process is simply obtained as\n\nK_yy = Σ + K_xx,   with Σ = diag(σ_11², . . . , σ_1T², . . . , σ_N1², . . . , σ_NT²).\n\n2 Non-linear Response Model\nWhile the linear response model presents the advantage of being exactly tractable in the important squared exponential case, a realistic model of transcription should account for effects such as saturation and ultrasensitivity which cannot be captured by a linear function. Also, all the quantities in equation (1) are positive, but one cannot constrain samples from a Gaussian process to be positive. Modelling the response of the transcription rate to protein concentration using a positive nonlinear function is an elegant way to enforce this constraint. 
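To see the effect of such a positivity constraint, the sketch below (Python rather than the authors' MATLAB; the constants B, S, D, the length scale and the grid are made-up illustrative values, not fitted quantities from the paper) draws a latent function from a squared exponential Gaussian process prior, passes it through the positive response g(f) = S exp(f), and integrates the differential equation forward in time:

```python
import numpy as np

# Hypothetical gene-specific constants (illustrative, not fitted values).
B, S, D = 0.5, 1.0, 0.8   # basal rate, sensitivity, mRNA decay rate
l = 2.0                    # squared exponential length scale

t = np.linspace(0.0, 10.0, 200)
dt = t[1] - t[0]

# Squared exponential prior covariance k(t, t') = exp(-(t - t')^2 / l^2).
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / l ** 2)
rng = np.random.default_rng(0)
f = rng.multivariate_normal(np.zeros(len(t)), K + 1e-8 * np.eye(len(t)))

# Positive response g(f) = S * exp(f); forward Euler integration of
# dx/dt = B + g(f(t)) - D * x with x(0) = B / D.
x = np.empty_like(t)
x[0] = B / D
for i in range(1, len(t)):
    x[i] = x[i - 1] + dt * (B + S * np.exp(f[i - 1]) - D * x[i - 1])

# Because B, S and the response are positive (and dt * D < 1 here),
# the mRNA abundance stays positive even though f itself changes sign.
assert np.all(x > 0)
```

The latent function f is unconstrained, but the modelled abundance x never leaves the positive half-line, which is the point of the nonlinear response.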
2.1 Formalism\n\nLet the response of the mRNA transcription rate to transcription factor protein concentration levels be modelled by a nonlinear function g with a target-specific vector γ_j of parameters, so that\n\ndx_j/dt = B_j + g(f(t), γ_j) - D_j x_j,\n\nx_j(t) = B_j/D_j + exp(-D_j t) ∫_0^t g(f(u), γ_j) exp(D_j u) du,   (9)\n\nwhere we again set x_j(0) = B_j/D_j and assign a Gaussian process prior distribution to f(t). In this case the induced distribution of x_j(t) is no longer a Gaussian process. However, we can derive the functional gradient of the likelihood and prior, and use this to learn the Maximum a Posteriori (MAP) solution for f(t) and the parameters by (functional) gradient descent. Given noise-corrupted data y_j(t_i) as above, the log-likelihood of the data Y = {y_j(t_i)} is given by\n\nlog p(Y | f, {B_j, γ_j, D_j}, θ) = -(1/2) Σ_{j=1}^N Σ_{i=1}^T (x_j(t_i) - y_j(t_i))²/σ_ji² - (1/2) Σ_{j=1}^N Σ_{i=1}^T log σ_ji² - (NT/2) log(2π)   (10)\n\nwhere θ denotes collectively the parameters of the prior covariance on f (in the squared exponential case, θ = l²). The functional derivative of the log-likelihood with respect to f is then obtained as\n\nδ log p(Y|f)/δf(t) = -Σ_{i=1}^T Σ_{j=1}^N [(x_j(t_i) - y_j(t_i))/σ_ji²] Θ(t_i - t) g'(f(t)) e^{-D_j(t_i - t)}   (11)\n\nwhere Θ(x) is the Heaviside step function and we have omitted the model parameters for brevity. The negative Hessian of the log-likelihood with respect to f is given by\n\nw(t, t') = -δ² log p(Y|f)/δf(t)δf(t') = δ(t - t') Σ_{i=1}^T Σ_{j=1}^N [(x_j(t_i) - y_j(t_i))/σ_ji²] Θ(t_i - t) g''(f(t)) e^{-D_j(t_i - t)} + Σ_{i=1}^T Θ(t_i - t) Θ(t_i - t') Σ_{j=1}^N σ_ji^{-2} g'(f(t)) g'(f(t')) e^{-D_j(2t_i - t - t')}   (12)\n\nwhere g'(f) = ∂g/∂f and g''(f) = ∂²g/∂f².\n\n2.2 Implementation\n\nWe discretise in time t and compute the gradient and Hessian on a grid using approximate Riemann quadrature. In the simplest case, we choose a uniform grid [t_p], p = 1, . . . , M, so that Δ = t_p - t_{p-1} is constant. We write f = [f_p] to be the vector realisation of the function f at the grid points. 
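Under this discretisation, the integral in equation (9) becomes a Riemann sum, x_j(t_i) ≈ B_j/D_j + Δ Σ_{p: t_p ≤ t_i} g(f_p) e^{-D_j(t_i - t_p)}. The following Python sketch (illustrative values only; the paper's own code is MATLAB) implements this quadrature and checks it against the exact solution for a constant latent function:

```python
import numpy as np

def predicted_abundance(f, grid, t_obs, B, D, g):
    """Riemann-quadrature approximation to equation (9):
    x(t) = B/D + exp(-D t) * integral_0^t g(f(u)) exp(D u) du."""
    delta = grid[1] - grid[0]                      # uniform grid spacing
    heav = (t_obs[:, None] >= grid[None, :])       # Heaviside theta(t_i - t_p)
    kern = np.exp(-D * (t_obs[:, None] - grid[None, :]) * heav) * heav
    return B / D + delta * kern @ g(f)

# Illustrative check against the exact solution for constant f = 0:
# with g(f) = exp(f), x(t) = B/D + (1 - exp(-D t)) / D.
grid = np.linspace(0.0, 10.0, 5000)
t_obs = np.array([2.0, 5.0, 8.0])
B, D = 0.4, 0.7                                    # made-up constants
x_num = predicted_abundance(np.zeros_like(grid), grid, t_obs, B, D, np.exp)
x_exact = B / D + (1.0 - np.exp(-D * t_obs)) / D
assert np.allclose(x_num, x_exact, atol=1e-2)      # O(delta) quadrature error
```

The same forward map is what the gradient and Hessian below differentiate with respect to the grid values f_p.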
The gradient of the log-likelihood is then given by\n\n∂ log p(Y|f)/∂f_p = -Δ Σ_{i=1}^T Σ_{j=1}^N [(x_j(t_i) - y_j(t_i))/σ_ji²] Θ(t_i - t_p) g'(f_p) e^{-D_j(t_i - t_p)}   (13)\n\nand the negative Hessian of the log-likelihood is\n\nW_pq = -∂² log p(Y|f)/∂f_p ∂f_q = δ_pq Δ Σ_{i=1}^T Σ_{j=1}^N [(x_j(t_i) - y_j(t_i))/σ_ji²] Θ(t_i - t_q) g''(f_q) e^{-D_j(t_i - t_q)} + Δ² Σ_{i=1}^T Θ(t_i - t_p) Θ(t_i - t_q) Σ_{j=1}^N σ_ji^{-2} g'(f_p) g'(f_q) e^{-D_j(2t_i - t_p - t_q)}   (14)\n\nwhere δ_pq is the Kronecker delta. In these and the following formulae t_i is understood to mean the index of the grid point corresponding to the ith data point, whereas t_p and t_q correspond to the grid points themselves. We can then compute the gradient and Hessian of the (discretised) un-normalised log posterior Ψ(f) = log p(Y|f) + log p(f) [see 8, chapter 3],\n\n∇Ψ(f) = ∇ log p(Y|f) - K^{-1} f,   ∇∇Ψ(f) = -(W + K^{-1})   (15)\n\nwhere K is the prior covariance matrix evaluated at the grid points. These can be used to find the MAP solution f̂ using Newton's method. The Laplace approximation to the log-marginal likelihood is then (ignoring terms that do not involve model parameters)\n\nlog p(Y) ≈ log p(Y|f̂) - (1/2) f̂ᵀ K^{-1} f̂ - (1/2) log |I + KW|.   (16)\n\nWe can also optimise the log-marginal with respect to the model and kernel parameters. The gradient of the log-marginal with respect to the kernel parameters θ is [8]\n\n∂ log p(Y|θ)/∂θ = (1/2) f̂ᵀ K^{-1} (∂K/∂θ) K^{-1} f̂ - (1/2) tr[(I + KW)^{-1} W (∂K/∂θ)] + Σ_p [∂ log p(Y|θ)/∂f̂_p] [∂f̂_p/∂θ]   (17)\n\nwhere the final term is due to the implicit dependence of f̂ on θ.\n\n2.3 Example: exponential response\n\nAs an example, we consider the case in which\n\ng(f(t), γ_j) = S_j exp(f(t))   (18)\n\nwhich provides a useful way of constraining the protein concentration to be positive. Substituting equation (18) in equations (13) and (14) one obtains\n\n∂ log p(Y|f)/∂f_p = -Δ Σ_{i=1}^T Σ_{j=1}^N [(x_j(t_i) - y_j(t_i))/σ_ji²] Θ(t_i - t_p) S_j e^{f_p - D_j(t_i - t_p)}\n\nW_pq = -δ_pq ∂ log p(Y|f)/∂f_p + Δ² Σ_{i=1}^T Σ_{j=1}^N Θ(t_i - t_p) Θ(t_i - t_q) σ_ji^{-2} S_j² e^{f_p + f_q - D_j(2t_i - t_p - t_q)}.\n\nThe terms required in equation (17) are\n\n∂ log p(Y|θ)/∂f̂_p = -(AW)_pp - (1/2) A_pp ∂ log p(Y|f̂)/∂f̂_p,   ∂f̂/∂θ = A K^{-1} (∂K/∂θ) K^{-1} f̂,\n\nwhere A = (W + K^{-1})^{-1}.\n\n3 Results\nTo test the efficacy of our method, we used a recently published biological data set which was studied using a linear response model by Barenco et al. [1]. This study focused on the tumour suppressor protein p53. mRNA abundance was measured at regular intervals in three independent human cell lines using Affymetrix U133A oligonucleotide microarrays. The authors then restricted their interest to five known target genes of p53: DDB2, p21, SESN1/hPA26, BIK and TNFRSF10b. They estimated the mRNA production rates by using quadratic interpolation between any three consecutive time points. They then discretised the model and used MCMC sampling (assuming a log-normal noise model) to obtain estimates of the model parameters B_j, S_j, D_j and f(t). To make the model identifiable, the value of the mRNA decay of one of the target genes, p21, was measured experimentally. Also, the scale of the sensitivities was fixed by choosing p21's sensitivity to be equal to one, and f(0) was constrained to be zero. Their predictions were then validated by doing explicit protein concentration measurements and growing mutant cell lines where the p53 gene had been knocked out.\n\n3.1 Linear response analysis\n\nWe first analysed the data using the simple linear response model used by Barenco et al. [1]. Raw data was processed using the mmgMOS model of [4], which also provides estimates of the credibility associated with each measurement. Data from the different cell lines were treated as independent instantiations of f but sharing the model parameters {B_j, S_j, D_j, θ}. We used a squared exponential covariance function for the prior distribution on the latent function f. The inferred posterior mean function for f, together with 95% confidence intervals, is shown in Figure 1(a). 
The pointwise estimates inferred by Barenco et al. are shown as crosses in the plot. The posterior mean function matches well the prediction obtained by Barenco et al.² Notice that the right hand tail of the inferred mean function shows an oscillatory behaviour. We believe that this is an artifact caused by the squared exponential covariance; the steep rise between time zero and time two forces the length scale of the function to be small, hence giving rise to wavy functions [see page 123 in 8]. To avoid this, we repeated the experiment using the \"MLP\" covariance function for the prior distribution over f [12]. Posterior estimation cannot be obtained analytically in this case so we resorted to the MAP-Laplace approximation described in section 2. The MLP covariance is obtained as the limit of a neural network with an infinite number of sigmoidal hidden units and has the following covariance function\n\nk(t, t') = arcsin[(w t t' + b) / √((w t² + b + 1)(w t'² + b + 1))]   (19)\n\nwhere w and b are parameters known as the weight and the bias variance. The results using this covariance function are shown in Figure 1(b). The resulting profile does not show the unexpected oscillatory behaviour and has tighter credibility intervals. Figure 2 shows the results of inference on the values of the hyperparameters B_j, S_j and D_j. The columns on the left, shaded grey, show results from our model and the white columns are the estimates obtained in [1]. The hyperparameters were assigned a vague gamma prior distribution (a = b = 0.1, corresponding to a mean of 1 and a variance of 10). Samples from the posterior distribution were obtained using Hybrid Monte Carlo [see e.g. 7]. The results are in good accordance with the results obtained by Barenco et al. Differences in the estimates of the basal transcription rates are probably due to the different methods used for probe-level processing of the microarray data. 
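The MLP covariance of equation (19) is simple to implement; as a sketch (Python, with arbitrary illustrative weight and bias variances rather than the values fitted in the experiments), one can evaluate it on a time grid and verify that it yields a symmetric positive semi-definite covariance matrix:

```python
import numpy as np

def mlp_covariance(t, t2, w=1.0, b=1.0):
    """MLP ('infinite neural network') covariance, equation (19):
    k(t, t') = arcsin((w t t' + b) / sqrt((w t^2 + b + 1)(w t'^2 + b + 1)))."""
    t = np.asarray(t, dtype=float)[:, None]
    t2 = np.asarray(t2, dtype=float)[None, :]
    num = w * t * t2 + b
    den = np.sqrt((w * t ** 2 + b + 1.0) * (w * t2 ** 2 + b + 1.0))
    # Cauchy-Schwarz guarantees |num / den| <= 1, so arcsin is well defined.
    return np.arcsin(num / den)

grid = np.linspace(0.0, 12.0, 50)
K = mlp_covariance(grid, grid, w=0.5, b=2.0)   # illustrative hyperparameters

# A valid covariance matrix is symmetric positive semi-definite
# (up to numerical round-off in the smallest eigenvalue).
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-9
```

Unlike the squared exponential, this kernel is non-stationary, which is what allows a steep initial rise without forcing wiggles into the tail of the inferred function.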
3.2 Non-linear response analysis\n\nWe then used the non-linear response model of section 2 in order to constrain the inferred protein concentrations to be positive. We achieved this by using an exponential response of the transcription rate to the logged protein concentration. The inferred MAP solutions for the latent function f are plotted in Figure 3 for the squared exponential prior (a) and for the MLP prior (b).\n\n² Barenco et al. also constrained the latent function to be zero at time zero.\n\nFigure 1: Predicted protein concentration for p53 using a linear response model: (a) squared exponential prior on f; (b) MLP prior on f. Solid line is mean prediction, dashed lines are 95% credibility intervals. The prediction of Barenco et al. was pointwise and is shown as crosses.\n\nFigure 2: Results of inference on the hyperparameters for the p53 data studied in [1]. The bar charts show (a) basal transcription rates; grey columns are estimates obtained with our model, white columns are the estimates obtained by Barenco et al. (b) Similar for sensitivities. (c) Similar for decay rates.\n\n4 Discussion\nIn this paper we showed how Gaussian processes can be used effectively in modelling the dynamics of a very simple regulatory network motif. This approach has many advantages over standard parametric approaches: first of all, there is no need to restrict the inference to the observed time points, and the temporal continuity of the inferred functions is accounted for naturally. Secondly, Gaussian processes allow noise information to be accounted for in a natural way. 
It is well known that biological data exhibit large variability, partly because of technical noise (due, for example, to the difficulty of measuring mRNA abundance for genes expressed at low levels), and partly because of the differences between cell lines. Accounting for these sources of noise in a parametric model can be difficult (particularly when estimates of the derivatives of the measured quantities are required), while Gaussian processes can incorporate this information naturally. Finally, MCMC parameter estimation in a discretised model can be computationally expensive due to the high correlations between variables. This is a consequence of treating the protein concentrations as parameters, and means that many MCMC iterations are needed to obtain reliable samples. Parameter estimation can be achieved easily in our framework by type II maximum likelihood or by using efficient Monte Carlo sampling techniques only on the model hyperparameters. While the results shown in the paper are encouraging, this is still a very simple modelling situation. For example, it is well known that transcriptional delays can play a significant role in determining the dynamics of many cellular processes [5]. These effects can be introduced naturally in a Gaussian process model; however, the data must be sampled at a reasonably high frequency in order for delays to become identifiable in a stochastic model, which is often not the case with microarray data sets. Another natural extension of our work would be to consider more biologically meaningful nonlinearities, such as the popular Michaelis-Menten model of transcription used in [9]. Finally, networks consisting of a single transcription factor are very useful to study small systems of particular interest such as p53. 
However, our ultimate goal would be to describe regulatory pathways consisting of more genes. These can be dealt with in the general framework described in this paper, but careful thought will be needed to overcome the greater computational difficulties.\n\nFigure 3: Predicted protein concentration for p53 using an exponential response: (a) shows results of using a squared exponential prior covariance on f; (b) shows results of using an MLP prior covariance on f. Solid line is mean prediction, dashed lines show 95% credibility intervals. The results shown are for exp(f), hence the asymmetry of the credibility intervals. The prediction of Barenco et al. was pointwise and is shown as crosses.\n\nAcknowledgements\nWe thank Martino Barenco for useful discussions and for providing the data. We gratefully acknowledge support from BBSRC Grant No BBS/B/0076X \"Improved processing of microarray data with probabilistic models\".\n\nReferences\n[1] M. Barenco, D. Tomescu, D. Brewer, R. Callard, J. Stark, and M. Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3):R25, 2006.\n[2] T. Graepel. Solving noisy linear operator equations by Gaussian processes: Application to ordinary and partial differential equations. In T. Fawcett and N. Mishra, editors, Proceedings of the International Conference on Machine Learning, volume 20, pages 234-241. AAAI Press, 2003.\n[3] J. C. Liao, R. Boscolo, Y.-L. Yang, L. M. Tran, C. Sabatti, and V. P. Roychowdhury. Network component analysis: Reconstruction of regulatory signals in biological systems. Proceedings of the National Academy of Sciences USA, 100(26):15522-15527, 2003.\n[4] X. Liu, M. Milo, N. D. Lawrence, and M. Rattray. A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics, 21(18):3637-3644, 2005.\n[5] N. A. Monk. Unravelling nature's networks. 
Biochemical Society Transactions, 31:1457-1461, 2003.\n[6] R. Murray-Smith and B. A. Pearlmutter. Transformations of Gaussian process priors. In J. Winkler, N. D. Lawrence, and M. Niranjan, editors, Deterministic and Statistical Methods in Machine Learning, volume 3635 of Lecture Notes in Artificial Intelligence, pages 110-123, Berlin, 2005. Springer-Verlag.\n[7] R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.\n[8] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2005.\n[9] S. Rogers, R. Khanin, and M. Girolami. Model based identification of transcription factor activity from microarray data. In Probabilistic Modeling and Machine Learning in Structural and Systems Biology, Tuusula, Finland, 17-18th June 2006.\n[10] C. Sabatti and G. M. James. Bayesian sparse hidden components analysis for transcription regulation networks. Bioinformatics, 22(6):739-746, 2006.\n[11] G. Sanguinetti, M. Rattray, and N. D. Lawrence. A probabilistic dynamical model for quantitative inference of the regulatory mechanism of transcription. Bioinformatics, 22(14):1753-1759, 2006.\n[12] C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5):1203-1216, 1998.\n", "award": [], "sourceid": 3119, "authors": [{"given_name": "Neil", "family_name": "Lawrence", "institution": null}, {"given_name": "Guido", "family_name": "Sanguinetti", "institution": null}, {"given_name": "Magnus", "family_name": "Rattray", "institution": null}]}