{"title": "Inference in continuous-time change-point models", "book": "Advances in Neural Information Processing Systems", "page_first": 2717, "page_last": 2725, "abstract": "We consider the problem of Bayesian inference for continuous time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights.", "full_text": "Inference in continuous-time change-point models\n\nFlorian Stimberg\nComputer Science, TU Berlin\nflostim@cs.tu-berlin.de\n\nAndreas Ruttor\nComputer Science, TU Berlin\nruttor@cs.tu-berlin.de\n\nManfred Opper\nComputer Science, TU Berlin\nopperm@cs.tu-berlin.de\n\nGuido Sanguinetti\nSchool of Informatics, University of Edinburgh\ngsanguin@inf.ed.ac.uk\n\nAbstract\n\nWe consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights.\n\n1 Introduction\n\nContinuous-time stochastic models play a prominent role in many scientific fields, from biology to physics to economics. 
While it is often easy to simulate from a stochastic model, it is usually hard to solve inference or parameter estimation problems, or to assess quantitatively the fit of a model to observations. In recent years this has motivated an increasing interest in the machine learning and statistics community in Bayesian inference approaches for stochastic dynamical systems, with applications ranging from biology [1\u20133] to genetics [4] to spatio-temporal systems [5].\n\nIn this paper, we are interested in modelling and inference for systems exhibiting multi-stable behavior. These systems are characterized by stable periods and rapid transitions between different equilibria. Very common in the physical and biological sciences, they are also highly relevant in economics and finance, where unexpected events can trigger sudden changes in trading behavior [6].\n\nWhile there have been a number of approaches to Bayesian change-point inference [7\u20139], most of them expect the observations to be independent and to come directly from the change-point process. In many systems this is not the case, because observations are only available from a dynamic process whose parameters are change-point processes. There have been other algorithms for detecting indirectly observed change-point processes [10], but we emphasize that we are also (and sometimes mostly) interested in the dynamical parameters of the system.\n\nWe present both an exact and an MCMC-based approach for Bayesian inference in multi-stable stochastic systems. We describe in detail two specific scenarios: the classic change-point scenario, whereby the latent process takes a new value at each jump, and a bistable scenario, where the latent process is a stochastic telegraph process. We test our model extensively on simulated data, showing good convergence properties of the sampling algorithm. 
We then apply our approach to two very diverse data sets in finance and systems biology, demonstrating that the approach leads to valid inferences and interesting insights into the nature of the system.\n\n2 The generative model\n\nWe consider a system of N stochastic differential equations (SDEs)\n\ndxi = (Ai(t) \u2212 \u03bbixi) dt + \u03c3i(t) dWi,   (1)\n\nof the Ornstein-Uhlenbeck type for i = 1, . . . , N, which are driven by independent Wiener processes Wi(t). The time dependencies in the drift Ai(t) and in the diffusion terms \u03c3i(t) account for sudden changes in the system and are further modelled by stochastic Markov jump processes. Our prior assumption is that change points, where Ai and \u03c3i change their values, constitute Poisson events. This means that the times \u2206t between consecutive change points are independent exponentially distributed random variables with density p(\u2206t) = f exp(\u2212f \u2206t), where f denotes their expected number per time unit. We will consider two different models for the values of Ai and \u03c3i in this paper:\n\n\u2022 Model 1 assumes that at each of the change points Ai and \u03c3i are drawn independently from fixed prior densities pA(\u00b7) and p\u03c3(\u00b7). The number of change points up to time t is counted by the Poisson process \u00b5(t), so that Ai(t) = Ai^\u00b5(t) and \u03c3i(t) = \u03c3i^\u00b5(t) are piecewise constant functions of time.\n\n\u2022 Model 2 restricts the parameters Ai(t) and \u03c3i(t) to two possible values Ai^0, Ai^1 and \u03c3i^0, \u03c3i^1, which are time-independent random variables with corresponding priors. We select the parameters according to the telegraph process \u00b5(t), which switches between \u00b5 = 0 and \u00b5 = 1 at each change point.\n\nFor both models, Ai(t) and \u03c3i(t) are unobserved. However, we have a data set of M noisy observations Y \u2261 {y1, . . . 
, yM } of the process x(t) = (x1(t), . . . , xN(t)) at discrete times tj, j = 1, . . . , M, i.e. we assume that yj = x(tj) + \u03bej with independent Gaussian noise \u03bej \u223c N(0, \u03c3o^2).\n\n3 Bayesian Inference\n\nGiven data Y we are interested in the posterior distribution of all unobserved quantities, which are the paths of the stochastic processes X \u2261 x[0:T ], Z \u2261 (A[0:T ], \u03c3[0:T ]) in a time interval [0 : T ] and the model parameters \u039b = ({\u03bbi}). For simplicity, we have not used a prior for the rate f and treated it as a fixed quantity. The joint probability of these quantities is given by\n\np(Y, X, Z, \u039b) = p(Y |X)p(X|Z, \u039b)p(Z)p(\u039b).   (2)\n\nA Gibbs sampling approach to this distribution is nontrivial, because the sample paths are infinite-dimensional objects, and a naive temporal discretization may introduce extra errors.\n\nInference is greatly facilitated by the fact that, conditioned on Z and \u039b, X is an Ornstein-Uhlenbeck process, i.e. a Gaussian Markov process. Since the data likelihood p(Y |X) is also Gaussian, it is possible to integrate out the process X analytically, leading to a marginal posterior\n\np(Z|Y, \u039b) \u221d p(Y |Z, \u039b)p(Z)   (3)\n\nover the simpler piecewise constant sample paths of the jump processes. Details on how to compute the likelihood p(Y |Z, \u039b) are given in the supplementary material.\n\nWhen inference on posterior values X is required, we can use the fact that X|Y, Z, \u039b is an inhomogeneous Ornstein-Uhlenbeck process, which allows for an explicit analytical computation of marginal means and variances at each time.\n\nThe jump processes Z = {\u03c4, \u0398} are completely determined by the set of change points \u03c4 \u2261 {\u03c4j} and the values \u0398 \u2261 {Aj, \u03c3j} to which the system jumps at the change points. 
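For concreteness, the generative model of Section 2 (model 2, the telegraph case) can be simulated with a simple Euler-Maruyama scheme. This is an illustrative sketch, not the authors' code; the function name `simulate_model2` and all parameter values are assumptions chosen for the example:

```python
import math
import random

def simulate_model2(T=100.0, dt=0.01, f=0.02, lam=0.1,
                    A=(0.0, 1.0), sigma=(0.1, 0.4),
                    obs_every=100, obs_noise=0.05, seed=0):
    """Euler-Maruyama simulation of dx = (A_mu - lam*x) dt + sigma_mu dW,
    where mu(t) is a two-state telegraph process with switching rate f.
    Returns the latent path xs, the telegraph path mus, and noisy
    observations obs taken every obs_every steps (all values illustrative)."""
    rng = random.Random(seed)
    n = int(T / dt)
    x, mu = 0.0, 0
    xs, mus, obs = [], [], []
    for i in range(n):
        # a change point occurs in [t, t+dt) with probability ~ f*dt
        if rng.random() < f * dt:
            mu = 1 - mu
        # OU step with state-dependent drift offset and diffusion
        x += (A[mu] - lam * x) * dt + sigma[mu] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(x)
        mus.append(mu)
        if i % obs_every == 0:
            # Gaussian observation noise, as in the observation model y = x + xi
            obs.append(x + rng.gauss(0.0, obs_noise))
    return xs, mus, obs
```

Data generated this way plays the role of the synthetic benchmarks of Section 6.1; conditioned on the simulated telegraph path, the x-path is an (inhomogeneous) Ornstein-Uhlenbeck process as exploited by the inference scheme.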
Since p(Z) = p(\u0398|\u03c4)p(\u03c4) and p(\u0398|\u03c4, Y, \u039b) \u221d p(Y |Z, \u039b)p(\u0398|\u03c4), we can see that, conditioned on a set of, say, m change points, the distribution of \u0398 is finite (and usually relatively low) dimensional, and one can draw samples from it using standard methods. In fact, if the prior density pA of the drift values is Gaussian, then it is easy to see that the posterior is also Gaussian.\n\n4 MCMC sampler architecture\n\nWe use a Metropolis-within-Gibbs sampler, which alternates between sampling the parameters \u039b, \u0398 from p(\u039b|Y, \u03c4, \u0398), p(\u0398|Y, \u03c4, \u039b) and the positions \u03c4 of change points from p(\u03c4|Y, \u0398, \u039b). Sampling from p(\u039b|Y, \u03c4, \u0398), as well as sampling the \u03c3i from p(\u0398|Y, \u03c4, \u039b), is done by a Gaussian random walk Metropolis-Hastings sampler on the logarithm of the parameters, to ensure positivity. Sampling the Ai, on the other hand, can be done directly if the prior p(Ai) is Gaussian, because then p(Ai|Y, \u03c4, \u039b, {\u03c3i}) is also Gaussian.\n\nFinally, we need to draw change points from their density p(\u03c4|Y, \u0398, \u039b) \u221d p(Y |Z, \u039b)p(\u0398|\u03c4)p(\u03c4). Their number m is a random variable with a Poisson prior distribution, and for fixed m each \u03c4i is uniformly distributed in [0 : T ]. Therefore the prior probability of the sorted list \u03c41, . . . , \u03c4m is given by\n\np(\u03c41, . . . , \u03c4m|f ) \u221d f^m e^{\u2212fT}.   (4)\n\nFor sampling change points we use a Metropolis-Hastings step, which accepts a proposal \u03c4\u2217 for the positions of the change points with probability\n\nA = min(1, [p(\u03c4\u2217|Y, \u0398, \u039b) / p(\u03c4|Y, \u0398, \u039b)] \u00b7 [q(\u03c4|\u03c4\u2217) / q(\u03c4\u2217|\u03c4)]),   (5)\n\nwhere q(\u03c4\u2217|\u03c4) is the proposal probability to generate \u03c4\u2217 starting from \u03c4. Otherwise the old sample is used again. As proposal for a new \u03c4-path we choose one of three (model 1) or five (model 2) possible actions, which modify the current sample:\n\n\u2022 Moving a change point: One change point is chosen at random with equal probability and the new jump time is drawn from a normal distribution with the old jump time as the mean. The normal distribution is truncated at the neighboring jump times to ensure that the order of jump times stays the same.\n\n\u2022 Adding a change point: We use a uniform distribution over the whole time interval [0 : T ] to draw the time of the added jump. In case of model 1 the parameter set \u0398i for the new interval stays the same and is only changed in the following update of all the \u0398 sets. For model 2 it is randomly decided whether the telegraph process \u00b5(t) is inverted before or after the new change point. This is necessary to allow \u00b5 to change on both ends.\n\n\u2022 Removing a change point: The change point to remove is chosen at random. For model 1 the newly joined interval inherits the parameters with equal probability from the interval before or after the removed change point. As for adding a change point, when using model 2 we choose to invert \u00b5 either after or before the removed jump time.\n\nFor model 2 we also need the option to add or remove two jumps, because adding or removing one jump inverts the whole process after or before it, which leads to poor acceptance rates. 
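The "moving a change point" proposal can be sketched as follows. This is an illustrative sketch rather than the authors' implementation; `log_lik` stands in for the marginal log likelihood log p(Y | \u03c4) assumed available from the analytic OU integration, and the truncated-normal density correction makes the Hastings ratio exact:

```python
import math
import random

def trunc_norm_logpdf(x, mean, sd, lo, hi):
    """Log density of a Normal(mean, sd^2) truncated to [lo, hi]."""
    z = 0.5 * (math.erf((hi - mean) / (sd * math.sqrt(2.0)))
               - math.erf((lo - mean) / (sd * math.sqrt(2.0))))
    return (-0.5 * ((x - mean) / sd) ** 2
            - math.log(sd * math.sqrt(2.0 * math.pi)) - math.log(z))

def move_change_point(taus, log_lik, T, sd=1.0, rng=random):
    """One Metropolis-Hastings 'move' step: pick a change point at random,
    propose a new time from a normal truncated at the neighbouring change
    points (so the ordering is preserved), and accept with the MH ratio.
    The uniform-order-statistics prior on positions cancels because the
    number of change points m is unchanged by this move."""
    if not taus:
        return taus
    j = rng.randrange(len(taus))
    lo = taus[j - 1] if j > 0 else 0.0
    hi = taus[j + 1] if j + 1 < len(taus) else T
    # sample the truncated normal by simple rejection
    while True:
        prop = rng.gauss(taus[j], sd)
        if lo < prop < hi:
            break
    new = taus[:j] + [prop] + taus[j + 1:]
    # likelihood ratio times the asymmetric-proposal (Hastings) correction
    log_a = (log_lik(new) - log_lik(taus)
             + trunc_norm_logpdf(taus[j], prop, sd, lo, hi)
             - trunc_norm_logpdf(prop, taus[j], sd, lo, hi))
    return new if math.log(rng.random()) < log_a else taus
```

The add/remove (birth/death) moves additionally change m and therefore pick up the prior factor f^m e^{\u2212fT} of equation (4) in the acceptance ratio.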
When adding or removing two jumps instead, \u00b5 only changes between these two jumps.\n\n\u2022 Adding two change points: The first change point is drawn as for adding a single one; the second one is drawn uniformly from the interval between the new and the next change point.\n\n\u2022 Removing two change points: We choose one of the change points, except the last one, at random and delete it along with the following one.\n\nWhile the proposal does not use any information from the data, it is very fast to compute and quickly converges to reasonable states, although we initialize the change points simply by drawing from p(\u03c4).\n\n5 Exact inference\n\nIn the case of small systems described by model 2 it is also feasible to calculate the marginal probability distribution q(\u00b5, x, t) for the state variables x, \u00b5 at time t of the posterior process directly. For that purpose, we use a smoothing algorithm, which is quite similar to the well-known method for state inference in hidden Markov models. In order to improve clarity we only discuss the case of a one-dimensional Ornstein-Uhlenbeck process x(t) here, but the generalization to multiple dimensions is straightforward.\n\nFigure 1: Comparison of the results of the MCMC sampler and the exact inference: (top left) true path of x (black) and the noisy observations (blue crosses). (bottom left) True path of \u00b5 (black) and posterior of p(\u00b5 = 1) from the exact inference (green) and the MCMC sampler (red dashed). (right) Convergence of the sampler: mean difference between sampler result and exact inference of p(\u00b5 = 1) for different numbers of samples (red crosses) and the result of a power law regression for more than 100 samples (green, y = 2.8359 x^\u22120.48585).\n\nAs our model has the Markov property, the exact marginal posterior is given by\n\nq(\u00b5, x, t) = (1/L) p(\u00b5, x, t)\u03c8(\u00b5, x, t).   (6)\n\nHere p(\u00b5, x, t) denotes the marginal filtering distribution, which is the probability density of the state (x, \u00b5) at time t conditioned on the observations up to time t. The normalization constant L is equal to the total likelihood of all observations. And the last factor \u03c8(\u00b5, x, t) is the likelihood of the observations after time t under the condition that the process started in state (x, \u00b5) at time t.\n\nThe initial condition for the forward message p(\u00b5, x, t) is the prior over the initial state of the system. The time evolution of the forward message is given by the forward Chapman-Kolmogorov equation\n\n[\u2202/\u2202t + (\u2202/\u2202x)(A\u00b5 \u2212 \u03bbx) \u2212 (\u03c3\u00b5^2/2) \u2202^2/\u2202x^2] p(\u00b5, x, t) = \u03a3_{\u03bd\u2260\u00b5} [f_{\u03bd\u2192\u00b5} p(\u03bd, x, t) \u2212 f_{\u00b5\u2192\u03bd} p(\u00b5, x, t)].   (7)\n\nHere f_{\u03bd\u2192\u00b5} denotes the transition rate from discrete state \u03bd to discrete state \u00b5 \u2208 {0, 1} of model 2, which has the values\n\nf_{0\u21921} = f_{1\u21920} = f,   f_{0\u21920} = f_{1\u21921} = 0.   (8)\n\nIncluding an observation yj at time tj leads to a jump of the filtering distribution,\n\np(\u00b5, x, tj^+) = p(\u00b5, x, tj^\u2212)p(yj|x),   (9)\n\nwhere p(yj|x) denotes the local likelihood of that observation given by the noise model and p(\u00b5, x, tj^\u2213) are the values of the forward message directly before and after time point tj. 
By integrating equation (7) forward in time from the first observation to the last, we obtain the exact solution to the filtering problem of our model.\n\nSimilarly, we integrate backward in time from the last observation at time T to the first one in order to compute \u03c8(\u00b5, x, t). The initial condition here is \u03c8(\u00b5, x, t_N^+) = 1. Between observations the time evolution of the backward message is given by the backward Chapman-Kolmogorov equation\n\n[\u2202/\u2202t + (A\u00b5 \u2212 \u03bbx) \u2202/\u2202x + (\u03c3\u00b5^2/2) \u2202^2/\u2202x^2] \u03c8(\u00b5, x, t) = \u03a3_{\u03bd\u2260\u00b5} f_{\u00b5\u2192\u03bd} [\u03c8(\u00b5, x, t) \u2212 \u03c8(\u03bd, x, t)].   (10)\n\nAnd each observation is taken into account by the jump condition\n\n\u03c8(\u00b5, x, tj^\u2212) = \u03c8(\u00b5, x, tj^+)p(yj|x(tj)).   (11)\n\nFigure 2: Synthetic results on a four-dimensional diffusion process with diagonal diffusion matrix: (top left) true paths with subsampled data points (dots); (top right) intensity of the posterior point process (the probability of a change point in a given interval is given by the integral of the intensity); actual change points are shown as vertical dotted lines. (bottom row) posterior processes for A (left) and \u03c3^2 (right) with a one standard deviation confidence interval. 
True paths are shown as black dashed lines.\n\nAfterwards, Lq(\u00b5, x, t) can be calculated by multiplying the forward message p(\u00b5, x, t) and the backward message \u03c8(\u00b5, x, t). Normalizing that quantity according to\n\n\u03a3_\u00b5 \u222b q(\u00b5, x, t) dx = 1   (12)\n\nthen gives us the marginal posterior as well as the total likelihood L = p(y1, . . . , yN |A, b, . . . ) of all observations. Note that we only need to calculate L for one time point, as it is a time-independent quantity. Minimizing \u2212log L as a function of the parameters can then be used to obtain maximum likelihood estimates. As an analytical solution for equations (7) and (10) does not exist, we have to integrate them numerically on a grid. A detailed description is given in the supplementary material.\n\n6 Results\n\n6.1 Synthetic Data\n\nAs a first consistency check, we tested the model on simulated data. The availability of an exact solution to the inference problem provides us with an excellent way of monitoring convergence of our sampler. Figure 1 shows the results of sampling on data generated from model 2, with parameter settings such that only the diffusion constant changes, making it a fairly challenging problem. Despite the rather noisy nature of the data (top left panel), the approach gives a reasonable reconstruction of the latent switching process (bottom left panel). The comparison between exact inference and MCMC is also instructive, showing that the sampled posterior does indeed converge to the true posterior after a relatively short burn-in period (Figure 1, right panel). 
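On a grid, the combination step q \u221d p \u00b7 \u03c8 with the normalization of equation (12) reduces to elementwise multiplication and a quadrature sum. A minimal sketch (the rectangular-rule quadrature, the grid layout p_fwd[mu][k] \u2248 p(\u00b5, x_k, t), and the function name are assumptions for illustration):

```python
def marginal_posterior(p_fwd, psi_bwd, dx):
    """Combine forward filtering messages p(mu, x, t) with backward
    likelihoods psi(mu, x, t) into the smoothed marginal q(mu, x, t),
    following q = p * psi / L, with L fixed by sum_mu \u222b q dx = 1.
    Messages live on a grid with spacing dx: p_fwd[mu][k] ~ p(mu, x_k, t).
    Returns (q, L), where L is the total likelihood of all observations."""
    # pointwise product of forward and backward messages
    q = [[p * s for p, s in zip(pm, sm)] for pm, sm in zip(p_fwd, psi_bwd)]
    # rectangular-rule quadrature over x, summed over the discrete state mu
    L = sum(sum(row) for row in q) * dx
    return [[v / L for v in row] for row in q], L
```

Since L is time independent, evaluating it at a single time point suffices, which is what makes the maximum likelihood use of \u2212log L mentioned above cheap once the messages are available.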
A power law regression of the mean absolute difference between exact and MCMC (after burn-in) on the number of samples yields a decrease with approximately the square root of the number of samples (exponent 0.48), as expected.\n\nFigure 3: Stochastic gene expression during competence: (left) fluorescence intensity for comS protein over 36 hrs; (right) inferred comK activation profile using model 2 (see text).\n\nTo test the performance of the inference approach on model 1, we simulated data from a four-dimensional diffusion process with diagonal diffusion and change points in the drift and diffusion (at the same times). The results of the sampling-based inference are shown in Figure 2. Once again, the results indicate that the sampled distribution was able to accurately identify the change points (top right panel) and the values of the parameters (bottom panels). The results are based on 260,000 samples and were obtained in approximately twelve hours on a standard workstation. Unfortunately, in this higher dimensional example we do not have access to the true posterior, as numerical integration of a high-dimensional PDE proved computationally prohibitive.\n\n6.2 Characterization of noise in stochastic gene expression\n\nRecent developments in microscopy technology have led to the startling discovery that stochasticity plays a crucial role in biology [11]. 
A particularly interesting development is the distinction between intrinsic and extrinsic noise [12]: given a biological system, intrinsic noise arises as a consequence of fluctuations due to the low numbers of the molecular species composing the system, while extrinsic noise is caused by external changes influencing the system of interest. A currently open question is how to characterize mathematically the difference between intrinsic and extrinsic noise, and a widely mooted opinion is that either the amplitude or the spectral characteristics of the two types of noise should be different [13]. To provide a proof-of-principle investigation into these issues, we tested our model on real stochastic gene expression data subject to extrinsic noise in Bacillus subtilis [14]. Here, single-cell fluorescence levels of the protein comS were assayed through time-lapse microscopy over a period of 36 hours. During this period, the protein was subjected to extrinsic noise in the form of activation of the regulator comK, which controls comS expression with a switch-like behavior (Hill coefficient 5). Activation of comK produces a striking phenotype called competence, whereby the cell stops dividing, becoming visibly much longer than sister cells. The data used are shown in Figure 3, left panel.\n\nTo determine whether the noise characteristics are different in the presence of comK activity, we modelled the data using two different models: model 2, where both the offset A and the diffusion \u03c3 can take two different values, and a constrained version of model 2 where the diffusion constant cannot switch (as in [15]). In both cases we drew 500,000 posterior samples, discarding an initial burn-in of 10,000 samples. Both models predict two clear change points representing the activation and inactivation of comK at approximately 5 and 23 hrs, respectively (Figure 3, right panel, showing model 2 results). 
Also, both models are in close agreement on the inferred kinetic parameters A, b, and \u03bb (Figure 4, left panel, showing a comparison of the \u03bb posteriors), consistent with the fact that the mean trajectory for both models must be the same.\n\nNaturally, model 2 predicted two different values for the diffusion constant depending on the activity state of comK (Figure 4, central panel). The two posterior distributions for \u03c31 and \u03c32 appear to be well separated, lending support to the unconstrained version of model 2 being a better description of the data.\n\nFigure 4: Stochastic gene expression during competence: (left) posterior estimates of \u03bb (solid) for switching \u03c3 (red) and non-switching \u03c3 (blue) with common prior (dashed); (center) posterior estimates of \u03c31^2 (red solid), \u03c32^2 (green solid) and the non-switching \u03c3 posterior (blue solid) with common prior (dashed); (right) posterior distribution of f (A, b, \u03c31, \u03c32) (see text), indicating the incompatibility of the simple birth-death model of the steady state with the data.\n\nWhile this is an interesting result in itself, it is perhaps not surprising. We can gain some insight by considering the underlying discrete dynamics of comS protein counts, which our model approximates as a continuous variable [16]. As we are dealing with bacterial cells, transcription and translation are tightly coupled, so that we can reasonably assume that protein production is given by a Poisson process. At steady state in the absence of comK, the production of comS proteins will be given by a birth-death process with birth rate b and death rate \u03bb, while in the presence of comK the birth rate would change to A + b. 
Defining\n\n\u03c10 = b/\u03bb,   \u03c11 = (A + b)/\u03bb,   (13)\n\nthis simple birth-death model implies a Poisson distribution of the steady-state comS protein levels in the two comK states, with parameters \u03c10, \u03c11 respectively. Unfortunately, we only measure the counts of comS protein up to a proportionality constant (due to the arbitrary units of fluorescence); this means that the basic property of Poisson distributions of having the same mean and variance cannot be tested easily. However, if we consider the ratio of signal-to-noise ratios in the two states, we obtain a quantity which is independent of the fluorescence units, namely\n\n[mean(N1)/stdev(N1)] / [mean(N0)/stdev(N0)] = \u221a(\u03c11/\u03c10) = \u221a((A + b)/b).   (14)\n\nThis relationship is not enforced in our model, but, if the simple birth-death interpretation is supported by the data, it should emerge naturally in the posterior distributions. To test this, we plot in the right panel of Figure 4 the posterior distribution of\n\nf (A, b, \u03c31, \u03c32) = [(A + b)/\u03c32] / [b/\u03c31] \u2212 \u221a((A + b)/b),   (15)\n\nthe difference between the posterior estimate of the ratio of the signal-to-noise ratios in the two comK states and the prediction from the birth-death model. The overwhelming majority of the posterior probability mass is away from zero, indicating that the data does not support the predictions of the birth-death interpretation of the steady states. A possible explanation of this unexpected result is that the continuous approximation breaks down in the low-abundance state (corresponding to no comK activation); the expected number of particles in the comK-inactive state is given by \u03c10 and has posterior mean 25.8. 
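The statistic of equation (15) is cheap to evaluate on posterior samples. A minimal sketch (the function name is an assumption; \u03c31 and \u03c32 here denote the steady-state standard deviations in the comK-inactive and comK-active states, as in the text):

```python
import math

def snr_ratio_gap(A, b, sigma1, sigma2):
    """f(A, b, sigma1, sigma2) from equation (15): the difference between
    the ratio of signal-to-noise ratios in the two comK states,
    ((A+b)/sigma2) / (b/sigma1), and the birth-death prediction
    sqrt((A+b)/b).  Under the simple birth-death interpretation this
    quantity should concentrate near zero in the posterior."""
    return ((A + b) / sigma2) / (b / sigma1) - math.sqrt((A + b) / b)
```

As a sanity check, a Poisson-consistent parameter set (with \u03bb = 1: means b and A + b, standard deviations \u221ab and \u221a(A + b)) gives a gap of exactly zero, so posterior mass away from zero signals a departure from the birth-death picture.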
The breakdown of the OU approximation at these levels of protein expression would be surprising, and would sound a call for caution when using SDEs to model single-cell data, as advocated in large parts of the literature [2]. An alternative and biologically more exciting explanation would be that the assumption that the decay rates are the same irrespective of the activity of comK is wrong. Notice that, if we assumed different decay rates in the two states, the first term in equation (15) would not change, while the second would scale with a factor \u221a(\u03bb0/\u03bb1). Our results would then predict that comK regulation at the transcriptional level alone cannot explain the data, and that comS dynamics must be regulated both transcriptionally and post-transcriptionally.\n\nFigure 5: Analysis of DAX data: (left) monthly closing values with data points (red crosses); (center) A process with notable events highlighted (the introduction of the 'German NASDAQ', the dot-com bubble, the early 2000s recession, and the global financial crisis); (right) \u03c3 process.\n\n6.3 Change point detection in financial data\n\nAs an example of another application of our methodology, we applied model 1 to financial data taken from the German stock exchange (DAX). The data, shown in Figure 5, consist of monthly closing values; we subsampled them at quarterly intervals. The posterior processes for A and \u03c3 are shown in the central and right panels of Figure 5, respectively. An inspection of these results reveals several interesting change points which can be related to known events; for convenience, we highlight a few of them in the central panel of Figure 5. 
Clearly evident are the changes caused by the introduction of the Neuer Markt (the German equivalent of the NASDAQ) in 1997, as well as the dot-com bubble (and subsequent recession) in the early 2000s and the global financial crisis in 2008. Interestingly, in our results the diffusion (or volatility, as it is more commonly termed in financial modelling) seems not to be particularly affected by recent events (after surging at the introduction of the Neuer Markt). A possible explanation is the rather long time interval between data points: volatility is expected to be particularly high on the micro-time scale, or at best the daily scale. Therefore the effective sampling rate we use may be too sparse to capture these changes.\n\n7 Discussion\n\nIn this paper, we proposed a Bayesian approach to inference in multi-stable systems. The basic model is a system of SDEs whose drift and diffusion coefficients can change abruptly at random, exponentially distributed times. We described the approach for two special models: a system of SDEs with coefficients changing at the change points of a Poisson process (model 1) and a system of SDEs whose coefficients can change between two sets of values according to a random telegraph process (model 2). Each model is particularly suitable for specific applications: while model 1 is important in financial modelling and industrial applications, model 2 extends a number of similar models already employed in systems biology [3, 15, 17]. Testing our models in specific applications reveals that they often lead to interpretable predictions. For example, in the analysis of DAX data, the model correctly captures known important events such as the dot-com bubble. In an application to biological data, the model leads to non-obvious predictions of considerable biological interest.\n\nWith regard to the computational costs stated in this paper, it has to be noted that the sampler was implemented in Matlab. 
A new implementation in C++ for model 2 showed over 12 times faster computation for a data set with 10 OU processes and 2 telegraph processes. A similar improvement is to be expected for model 1.\n\nThere are several interesting possible avenues for furthering this work. While the inference scheme we propose is practical in many situations, scaling to higher dimensional problems may become computationally intensive. It would therefore be interesting to investigate approximate inference solutions like the ones presented in [15]. Another interesting direction would be to extend the current work to a factorial design; these can be important, particularly in biological applications, where multiple factors can interact in determining gene expression [17, 18]. Finally, our models are naturally non-parametric in the sense that the number of change points is not a priori determined. It would be interesting to explore further non-parametric extensions where the system can exist in a finite but unknown number of regimes, in the spirit of non-parametric models for discrete-time dynamical systems [19].\n\nReferences\n\n[1] Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray. Modelling transcriptional regulation using Gaussian processes. In B. Sch\u00f6lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19. 2007.\n\n[2] Darren J. Wilkinson. Stochastic Modelling for Systems Biology. Chapman & Hall / CRC, London, 2006.\n\n[3] Guido Sanguinetti, Andreas Ruttor, Manfred Opper, and C\u00e9dric Archambeau. Switching regulatory models of cellular stress response. Bioinformatics, 25:1280\u20131286, 2009.\n\n[4] Ido Cohn, Tal El-Hay, Nir Friedman, and Raz Kupferman. Mean field variational approximation for continuous-time Bayesian networks. In Proceedings of the twenty-fifth conference on Uncertainty in Artificial Intelligence (UAI), 2009.\n\n[5] Andreas Ruttor and Manfred Opper. 
Approximate inference in reaction-diffusion processes. JMLR W&CP, 9:669\u2013676, 2010.\n\n[6] Tobias Preis, Johannes Schneider, and H. Eugene Stanley. Switching processes in financial markets. Proceedings of the National Academy of Sciences USA, 108(19):7674\u20137678, 2011.\n\n[7] Paul Fearnhead and Zhen Liu. Efficient Bayesian analysis of multiple changepoint models with dependence across segments. Statistics and Computing, 21(2):217\u2013229, 2011.\n\n[8] Paolo Giordani and Robert Kohn. Efficient Bayesian inference for multiple change-point and mixture innovation models. Journal of Business and Economic Statistics, 26(1):66\u201377, 2008.\n\n[9] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. An HDP-HMM for systems with state persistence. In Proc. International Conference on Machine Learning, July 2008.\n\n[10] Yunus Saatci, Ryan Turner, and Carl Edward Rasmussen. Gaussian process change point models. In ICML, pages 927\u2013934, 2010.\n\n[11] Vahid Shahrezaei and Peter Swain. The stochastic nature of biochemical networks. Curr. Opin. in Biotech., 19(4):369\u2013374, 2008.\n\n[12] Michael B. Elowitz, Arnold J. Levine, Eric D. Siggia, and Peter S. Swain. Stochastic gene expression in a single cell. Science, 297(5584):1129\u20131131, 2002.\n\n[13] Avigdor Eldar and Michael B. Elowitz. Functional roles for noise in genetic circuits. Nature, 467(7312):167\u2013173, 2010.\n\n[14] G\u00fcrol M. S\u00fcel, Jordi Garcia-Ojalvo, Louisa M. Liberman, and Michael B. Elowitz. An excitable gene regulatory circuit induces transient cellular differentiation. Nature, 440(7083):545\u2013550, 2006.\n\n[15] Manfred Opper, Andreas Ruttor, and Guido Sanguinetti. Approximate inference in continuous time Gaussian-jump processes. In J. Lafferty, C. K. I. Williams, R. Zemel, J. Shawe-Taylor, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1822\u20131830. 2010.\n\n[16] N. G. van Kampen. 
Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam, 1981.\n\n[17] Manfred Opper and Guido Sanguinetti. Learning combinatorial transcriptional dynamics from gene expression data. Bioinformatics, 26(13):1623\u20131629, 2010.\n\n[18] H. M. Shahzad Asif and Guido Sanguinetti. Large scale learning of combinatorial transcriptional dynamics from gene expression. Bioinformatics, 27(9):1277\u20131283, 2011.\n\n[19] Matthew Beal, Zoubin Ghahramani, and Carl Edward Rasmussen. The infinite hidden Markov model. In S. Becker, S. Thrun, and L. Saul, editors, Advances in Neural Information Processing Systems 14, pages 577\u2013584. 2002.\n", "award": [], "sourceid": 1480, "authors": [{"given_name": "Florian", "family_name": "Stimberg", "institution": null}, {"given_name": "Manfred", "family_name": "Opper", "institution": null}, {"given_name": "Guido", "family_name": "Sanguinetti", "institution": null}, {"given_name": "Andreas", "family_name": "Ruttor", "institution": null}]}