{"title": "Maximum likelihood trajectories for continuous-time Markov chains", "book": "Advances in Neural Information Processing Systems", "page_first": 1437, "page_last": 1445, "abstract": "Continuous-time Markov chains are used to model systems in which transitions between states as well as the time the system spends in each state are random. Many computational problems related to such chains have been solved, including determining state distributions as a function of time, parameter estimation, and control. However, the problem of inferring most likely trajectories, where a trajectory is a sequence of states as well as the amount of time spent in each state, appears unsolved. We study three versions of this problem: (i) an initial value problem, in which an initial state is given and we seek the most likely trajectory until a given final time, (ii) a boundary value problem, in which initial and final states and times are given, and we seek the most likely trajectory connecting them, and (iii) trajectory inference under partial observability, analogous to finding maximum likelihood trajectories for hidden Markov models. We show that maximum likelihood trajectories are not always well-defined, and describe a polynomial time test for well-definedness. When well-definedness holds, we show that each of the three problems can be solved in polynomial time, and we develop efficient dynamic programming algorithms for doing so.", "full_text": "Maximum likelihood trajectories for continuous-time Markov chains\n\nTheodore J. Perkins\n\nOttawa Hospital Research Institute\n\nOttawa, Ontario, Canada\ntperkins@ohri.ca\n\nAbstract\n\nContinuous-time Markov chains are used to model systems in which transitions between states as well as the time the system spends in each state are random. Many computational problems related to such chains have been solved, including determining state distributions as a function of time, parameter estimation, and control.
However, the problem of inferring most likely trajectories, where a trajectory is a sequence of states as well as the amount of time spent in each state, appears unsolved. We study three versions of this problem: (i) an initial value problem, in which an initial state is given and we seek the most likely trajectory until a given final time, (ii) a boundary value problem, in which initial and final states and times are given, and we seek the most likely trajectory connecting them, and (iii) trajectory inference under partial observability, analogous to finding maximum likelihood trajectories for hidden Markov models. We show that maximum likelihood trajectories are not always well-defined, and describe a polynomial time test for well-definedness. When well-definedness holds, we show that each of the three problems can be solved in polynomial time, and we develop efficient dynamic programming algorithms for doing so.\n\n1 Introduction\n\nA continuous-time Markov chain (CTMC) is a model of a dynamical system which, upon entering some state, remains in that state for a random real-valued amount of time (called the dwell time or occupancy time) and then transitions randomly to a new state. CTMCs are used in a wide variety of domains. In stochastic chemical kinetics, states may correspond to the conformation of a molecule such as a protein, peptide or nucleic acid polymer, and transitions correspond to conformational changes (e.g., [1]). Alternatively, the state may correspond to the numbers of different types of molecules in an interacting system, and transitions are the result of chemical reactions between molecules [2]. In phylogenetics, the states may correspond to the genomes of different organisms, and transitions to the evolutionary events (mutations) that separate those organisms [3].
Other application domains include queueing theory, process control and manufacturing, quality control, formal verification, and robot navigation.\n\nMany computational problems associated with CTMCs have been solved, often by generalizing methods developed for discrete-time Markov chains (DTMCs). For example, stationary distributions for CTMCs can be computed in a manner very similar to that for DTMCs [4]. Estimating the parameters of a CTMC from fully observed data involves estimating state transition probabilities, just as for DTMCs, but adds estimation of the state dwell time distributions. Estimating parameters from partially observed data can be done by a generalization of the well-known Baum-Welch algorithm for parameter estimation for hidden Markov models [5] or by Bayesian methods [6, 7]. When the state of a CTMC is observed periodically through time, but some transitions between observation times may go unseen, the parameter estimation problem can also be solved through embedding techniques [8]. In scenarios such as manufacturing or robot navigation, one may assume that the state transitions or dwell times are under at least partial control. When control choices are made once for each state entered, dynamic programming and related methods can be used to develop optimal control strategies [9]. When control choices are made continuously in time, methods for hybrid system control are more appropriate [10].\n\nAnother fundamental and well-studied problem for CTMCs is to compute, given an initial state and time, the state distribution or most likely state at a later time. These problems are readily solved for DTMCs by dynamic programming [11], but for CTMCs, solutions have a somewhat different flavor. One approach is based on the forward Chapman-Kolmogorov equations [4], called the Master equation in the stochastic chemical kinetics literature [12].
These specify a system of ordinary differential equations that describe how the probabilities of being in each state change over time. Solving the equations, sometimes analytically but more often numerically, yields the entire state distribution as a function of time. Alternatively, one can uniformize the CTMC, which produces a DTMC along with a probability distribution for the number of transitions to perform. The process obtained by choosing the number of transitions, and then producing a trajectory with that many transitions from the DTMC, has the same state distribution as the original CTMC. This representation allows particularly efficient computation of the state distribution if that distribution is only required at one or a small number of different times. Finally, especially in the chemical kinetics community, stochastic simulation algorithms are popular [13]. These approaches simply simulate trajectories from the CTMC to produce empirical, numerical estimates of state distributions or other features of the dynamics.\n\nDespite the extensive work on a variety of problems related to CTMCs, to the best of our knowledge, the problem of finding most likely trajectories has not been addressed. With this paper, we attempt to fill that gap.
We propose dynamic programming solutions to three variants of the problem: (i) an initial value problem, where a starting state and final time are given, and we seek the most likely sequence of states and dwell times occurring up until the final time, (ii) a boundary value problem, where initial and final states and times are given, and we seek the most likely intervening trajectory, and (iii) a problem involving partial observability, where we have a sequence of “observations” that may not give full state information, and we want to infer the most likely trajectory that the system followed in producing the observations.\n\n2 Definitions\n\nA CTMC is defined by four things: (i) a finite state set S, (ii) initial state probabilities P_s for s ∈ S, (iii) state transition probabilities P_{ss′} for s, s′ ∈ S, and (iv) state dwell time parameters λ_s for each s ∈ S. Let S_t ∈ S denote the state of the system at time t ∈ [0, +∞). The rules for the evolution of the system are that it starts in state S_0, which is chosen according to the distribution P_s. At any time t, when the system is in state S_t = s, the system stays in state s for a random amount of time that is exponentially distributed with parameter λ_s. When the system finally leaves state s, the next state of the system is s′ ≠ s with probability P_{ss′}.\n\nA trajectory of the CTMC is a sequence of states along with the dwell times in all but the last state, U = (s_0, t_0, s_1, t_1, . . . , s_{k−1}, t_{k−1}, s_k). The meaning of this trajectory is that the system started in state s_0, where it stayed for time t_0, then transitioned to state s_1, where it stayed for time t_1, and so on. Eventually, the system reaches state s_k, where it remains. Let U_t = (s_0, t_0, s_1, t_1, . . . , s_{k_t−1}, t_{k_t−1}, s_{k_t}) be a random variable describing the trajectory of the system up until time t.
In particular, this means that there are k_t state transitions up until time t (where k_t is itself a random variable), the system enters state s_{k_t} sometime at or before time t, and it remains in state s_{k_t} until sometime after time t.\n\nGiven the initial state S_0 and a time t, the likelihood of a particular trajectory U is\n\nl(U_t = U | S_0) = 0, if s_0 ≠ S_0 or Σ_{i=0}^{k−1} t_i > t;\nl(U_t = U | S_0) = (Π_{i=0}^{k−1} λ_{s_i} e^{−λ_{s_i} t_i} P_{s_i s_{i+1}}) (e^{−λ_{s_k}(t − Σ_i t_i)}), otherwise.   (1)\n\nWhen Σ_i t_i > t, the likelihood is zero, because it means that the specified transitions have not completed by time t. Otherwise, the terms inside the first parentheses account for the likelihood of the dwell times and the state transitions in the sequence, and the term inside the second parentheses accounts for the probability that the dwell time in the final state does not complete before time t.\n\nWith this notation, the initial value problem we study is easily stated as\n\narg max_U l(U_t = U | S_0 = s),   (2)\n\nwhere s ∈ S and t > 0 are both given. The boundary value problem we study is\n\narg max_U l(U_t = U | S_0 = s, S_t = s′).   (3)\n\nHere, the given s and s′ are any states in S, possibly the same state, and t > 0 is also given.\n\nA hidden continuous-time Markov chain (HCTMC) adds an observation model to the CTMC. In particular, we assume a finite set of possible observations O. When the system is observed and it is in state s ∈ S, the observer sees observation o ∈ O with probability P_{so}. Let O = (o_1, τ_1, o_2, τ_2, . . . , o_m, τ_m) denote a sequence of observations and the times at which they are made. We assume that the observation times are fixed, being chosen ahead of time, and depend in no way on the evolution of the chain itself. Given a trajectory of the system U = (s_0, t_0, s_1, t_1, . . .
, s_{k−1}, t_{k−1}, s_k), let U(t) denote the state of the system at time t implied by that sequence. Then, the probability of an observation sequence O given the trajectory U can be written as\n\nP(O | U_{τ_m} = U) = Π_{i=1}^{m} P_{U(τ_i) o_i}.   (4)\n\nThe final problem we study in this paper is that of finding the most likely trajectory given an observation sequence:\n\narg max_U l(U_{τ_m} = U | O) = arg max_U P(O | U_{τ_m} = U) l(U_{τ_m} = U),   (5)\n\nwhere the equality holds because the two objectives are proportional.\n\n3 Solving the initial and boundary value problems\n\nIn this section we develop solutions to problems (2) and (3). The first step in this development is to show that we can analytically optimize the dwell times if we are given the state sequence. This is covered in the next subsection. Following that, we develop a dynamic program to find optimal state sequences, assuming that the dwell times are set to their optimal values relative to the state sequence.\n\n3.1 Maximum likelihood dwell times\n\nConsider a particular trajectory U = (s_0, t_0, s_1, t_1, . . . , s_{k−1}, t_{k−1}, s_k). Given S_0 and a time t, the likelihood of that particular trajectory, l(U_t = U | S_0), is given above by Equation (1). Let us assume that S_0 = s_0, as we have no need to consider U starting from the wrong state, and let us maximize l(U_t = U | S_0) with respect to the dwell times. To be concise, let T_t^k = {(t_0, t_1, . . . , t_{k−1}) : t_i ≥ 0 for all 0 ≤ i < k and Σ_i t_i ≤ t}. This is the set of all feasible dwell times for the states up until state s_k.
Then we can write the desired optimization as\n\narg max_{(t_0, . . . , t_{k−1}) ∈ T_t^k} (Π_{i=0}^{k−1} λ_{s_i} e^{−λ_{s_i} t_i} P_{s_i s_{i+1}}) (e^{−λ_{s_k}(t − Σ_i t_i)}).   (6)\n\nIt is more convenient to maximize the logarithm, which gives us\n\narg max_{(t_0, . . . , t_{k−1}) ∈ T_t^k} (Σ_{i=0}^{k−1} (log λ_{s_i} − λ_{s_i} t_i + log P_{s_i s_{i+1}})) − λ_{s_k}(t − Σ_j t_j).   (7)\n\nDropping the terms that do not depend on any of the t_i and rearranging, we find the equivalent problem\n\narg max_{(t_0, . . . , t_{k−1}) ∈ T_t^k} Σ_{i=0}^{k−1} (λ_{s_k} − λ_{s_i}) t_i.   (8)\n\nThe solution can be obtained by inspection. If λ_{s_k} ≤ λ_{s_i} for all 0 ≤ i < k, then every coefficient in (8) is nonpositive, so setting all t_i = 0 is optimal. That is, the system transitions instantaneously through the states s_0, s_1, . . . , s_{k−1} and then dwells in state s_k for (at least) time t.¹ Otherwise, let j be such that λ_{s_j} is minimal for 0 ≤ j < k. Then an optimal solution has t_j = t, and all other t_i = 0. Intuitively, this says that if state s_j has the largest expected dwell time (corresponding to the smallest λ parameter), then the most likely setting of dwell times is obtained by assuming all of the time t is spent in state s_j, and all other transitions happen instantaneously. This is not unintuitive, although it is dissatisfying in the sense that the most likely set of dwell times is not typical. For example, none are near their expected value. Moreover, the basic character of the solution, namely that all the time t goes into waiting at the slowest state, is independent of t. Nevertheless, being able to solve explicitly for the most likely dwell times for a given state sequence makes it much easier to find the most likely U_t.
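To make Equations (1) and (8) concrete, here is a minimal Python sketch, not the author's code. The helper names log_likelihood and ml_dwell_times and the dictionaries lam (holding λ_s) and P (holding P_{ss′}) are hypothetical; the numbers are the Figure 1 values λ_x = λ_z = 1, λ_y = 1/10, P_xy = 2/3, P_xz = 1/3 and P_yz = 1, restricted to states x, y and z. The first function evaluates the logarithm of Equation (1); the second applies the by-inspection solution of problem (8).

```python
import math

# Toy chain of Figure 1, states x, y, z only (values as given in the text).
lam = {'x': 1.0, 'y': 0.1, 'z': 1.0}
P = {'x': {'y': 2 / 3, 'z': 1 / 3}, 'y': {'z': 1.0}}


def log_likelihood(states, dwells, lam, P, t):
    # Log of Equation (1) for U = (s_0, t_0, ..., t_{k-1}, s_k), assuming S_0 = s_0.
    # states has k + 1 entries and dwells has k entries.
    if sum(dwells) > t:
        return -math.inf  # the listed transitions have not completed by time t
    ll = 0.0
    for i, ti in enumerate(dwells):
        s, nxt = states[i], states[i + 1]
        # density of dwelling t_i in s_i, then jumping to s_{i+1}
        ll += math.log(lam[s]) - lam[s] * ti + math.log(P[s][nxt])
    # probability that the final state s_k is not left before time t
    ll -= lam[states[-1]] * (t - sum(dwells))
    return ll


def ml_dwell_times(states, lam, t):
    # Maximizer of problem (8): put all of t into the slowest of s_0..s_{k-1},
    # unless s_k is at least as slow, in which case every t_i = 0.
    k = len(states) - 1
    dwells = [0.0] * k
    if k == 0:
        return dwells
    j = min(range(k), key=lambda i: lam[states[i]])
    if lam[states[j]] < lam[states[-1]]:
        dwells[j] = t
    return dwells


path = ['x', 'y', 'z']
best = ml_dwell_times(path, lam, 1.0)
print(best, math.exp(log_likelihood(path, best, lam, P, 1.0)))
```

For the path (x, y, z) with t = 1, the sketch puts all of t into y, giving likelihood (2/3)(1/10)e^{−1/10} ≈ 0.0603; scanning feasible dwell times on a grid never does better, which is a cheap sanity check of the closed form.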
So, let us press onwards.\n\n3.2 Dynamic programming for the most likely state sequence\n\nSubstituting our solution for the t_i back into Equation (1), and continuing our assumption that s_0 = S_0, we obtain\n\nmax_{(t_0, . . . , t_{k−1}) ∈ T_t^k} l(U_t = U | S_0) =\n(Π_{i=0}^{k−1} λ_{s_i} P_{s_i s_{i+1}}) e^{−λ_{s_k} t}, if λ_{s_k} ≤ λ_{s_i} for all 0 ≤ i < k;\n(Π_{i=0}^{k−1} λ_{s_i} P_{s_i s_{i+1}}) e^{−(min_{i=0}^{k−1} λ_{s_i}) t}, otherwise;\nwhich in both cases equals (Π_{i=0}^{k−1} λ_{s_i} P_{s_i s_{i+1}}) e^{−(min_{i=0}^{k} λ_{s_i}) t}.   (9)\n\nThis leads to a dynamic program for finding the state sequence that maximizes the likelihood. As is typical, we build maximum likelihood paths of increasing length by finding the best ways of extending shorter paths. The main difference from a more typical scenario is that to score an extension we need to know not just the score and final state of the shorter path, but also the smallest dwell time parameter along that path. Define a (k, s, λ)-trajectory to be one that includes k ∈ {0, 1, 2, . . .} state transitions, ends at state s_k = s, and for which the smallest dwell time parameter of any state along the trajectory is λ. Then define F_k(s, λ) to be the maximum achievable l(U_t = U | S_0), where we restrict attention to U that are (k, s, λ)-trajectories. We initialize the dynamic program as\n\nF_0(S_0, λ_{S_0}) = e^{−t λ_{S_0}},\nF_0(s, λ) = 0 for all (s, λ) ≠ (S_0, λ_{S_0}).\n\nTo compute F_k(s, λ) for larger k, we first observe that F_k(s, λ) is undefined if λ > λ_s. This is because there are no (k, s, λ)-trajectories if λ > λ_s. The fact that a trajectory ends at state s implies that the minimum dwell time parameter along the trajectory can be no greater than λ_s.
So, we only compute F_k(s, λ) for λ ≤ λ_s.\n\nTo determine F_{k+1}(s, λ), we must consider two cases. If λ < λ_s, then the best (k + 1, s, λ)-trajectory must come from some (k, s′, λ)-trajectory. That is, the length-k trajectory must already have a dwell time parameter of λ along it. The state s′ can be any state other than s. If λ = λ_s, then the best (k + 1, s, λ)-trajectory may be an extension of any (k, s′, λ′)-trajectory with λ′ ≥ λ and s′ ≠ s. To be more concise, define\n\nG(s, λ) = {λ}, if λ < λ_s;\nG(s, λ) = {λ_{s″} : s″ ∈ S, λ_{s″} ≥ λ}, if λ = λ_s.   (10)\n\nWe then compute F for increasing k as\n\nF_{k+1}(s, λ) = max_{s′ ≠ s, λ′ ∈ G(s, λ)} F_k(s′, λ′) λ_{s′} P_{s′s} e^{−t(λ − λ′)}.\n\nThe first term on the right hand side accounts for the likelihood of the best (k, s′, λ′)-trajectory. The next two terms account for the dwell in s′ and the transition probability to s. The final term accounts for any difference between the smallest dwell time parameters along the k- and (k + 1)-transition trajectories.\n\n¹ If the reader is not comfortable with a dwell time exactly equal to zero, one may instead take t_i = 0 as a shorthand for an infinitesimal but positive dwell time. Alternatively, the optimization problem can be modified to explicitly require t_i > 0. However, this does nothing to change the fundamental nature of the solution, while resulting in a significantly more laborious exposition.\n\nFigure 1: A continuous-time Markov chain used as a demonstration domain. The five circles correspond to states, and the arrows to transitions between states.
States are also labeled with their dwell time parameters.\n\nBecause the set of possible states S is finite, so is the set of possible dwell time parameters λ_s for s ∈ S. The size of the table F_k for each k is thus at most |S|². If we limit k to some maximum value K, then the total size of all the tables is at most K|S|², and the total computational effort is O(K|S|³). To solve the initial value problem (2), we scan over all values of k, s and λ to find the maximum value of F_k(s, λ). Such a value implies that the most likely state sequence ends at state s after k state transitions. We can use a traceback to reconstitute the full sequence of states, and the result of the previous section to obtain the most likely dwell times. To solve the boundary value problem (3), we do the same, except that we only scan over values of k and λ, looking for the maximum value of F_k(s′, λ), where s′ is the given final state.\n\n3.3 Examples\n\nIn this section, we use the toy chain depicted in Figure 1 to demonstrate the algorithm of the previous section, and to highlight some properties of maximum likelihood trajectories. First, suppose that we know the system is in state x at time zero and in state z at time t. There are two different paths, (x, z) and (x, y, z), that lead from x to z. If we ignore the issue of dwell times and consider only the transition probabilities, then the path (x, y, z) seems more probable. Its probability is P_xy P_yz = 2/3 · 1 = 2/3, whereas the direct path (x, z) simply has probability P_xz = 1/3. However, if we consider the dwell times as well, the story can change. For example, suppose that t = 1. Note that λ_y = 1/10, so that the expected dwell time in state y is 10. If the chain enters state y, the chance of it leaving y before time t = 1 is quite small.
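The dynamic program of Section 3.2 can be sketched in Python as follows. This is a minimal sketch, not the author's implementation; the function name most_likely_sequence and the bound K on the number of transitions are our assumptions. It is written in forward form: each (k, s′, λ′) entry is pushed to every successor s with new minimum min(λ′, λ_s), which covers exactly the two cases of G(s, λ) in (10). Passing final=None scans every table entry, as for the initial value problem (2); passing a final state restricts the scan, as for the boundary value problem (3). The values λ_a = λ_b = 0.5 are hypothetical, since Figure 1 leaves them symbolic.

```python
import math


def most_likely_sequence(lam, P, s0, t, K, final=None):
    # Section 3.2 dynamic program over (k, s, lambda)-trajectories.
    # F[k][(s, m)]: best value of Equation (9) over k-transition paths from s0
    # that end at s and whose smallest dwell parameter is m.
    F = [{(s0, lam[s0]): math.exp(-t * lam[s0])}]
    back = [{}]
    for k in range(K):
        Fk, Bk = {}, {}
        for (sp, mp), val in F[k].items():
            for s, p in P.get(sp, {}).items():
                if s == sp or p <= 0.0:
                    continue
                m = min(mp, lam[s])  # smallest dwell parameter after the extension
                # dwell in sp, jump to s, and correct the exponent from mp to m
                cand = val * lam[sp] * p * math.exp(-t * (m - mp))
                if cand > Fk.get((s, m), 0.0):
                    Fk[(s, m)] = cand
                    Bk[(s, m)] = (sp, mp)
        F.append(Fk)
        back.append(Bk)
    # Initial value problem: scan every entry; boundary value problem: only s == final.
    cands = [(val, k, key) for k in range(K + 1) for key, val in F[k].items()
             if final is None or key[0] == final]
    if not cands:
        return None
    val, k, key = max(cands)
    path = [key[0]]
    while k > 0:  # traceback through the stored predecessors
        key = back[k][key]
        k -= 1
        path.append(key[0])
    return val, path[::-1]


# Figure 1 chain; lam_a = lam_b = 0.5 is a hypothetical choice.
lam = {'x': 1.0, 'y': 0.1, 'z': 1.0, 'a': 0.5, 'b': 0.5}
P = {'x': {'y': 2 / 3, 'z': 1 / 3}, 'y': {'z': 1.0}, 'z': {'a': 1.0},
     'a': {'b': 1.0}, 'b': {'a': 1.0}}
print(most_likely_sequence(lam, P, 'x', 1.0, K=4, final='z'))
print(most_likely_sequence(lam, P, 'x', 2.0, K=4, final='z'))
```

On the x-to-z boundary problem this sketch should reproduce the scores quoted next: the direct path wins with 0.1226 when t = 1, and the path through y wins with 0.0546 when t = 2.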
If we run the dynamic programming algorithm of the previous section to find the most likely trajectory, it finds (s_0 = x, t_0 = 0, s_1 = z) to be most likely, with a score of 0.1226. Along the way, it computes the likelihood of the most likely path going through y, which is (s_0 = x, t_0 = 0, s_1 = y, t_1 = t, s_2 = z). It prefers to place all the dwell time t in state y, because that state is most likely to have a long dwell time. However, the total score of this trajectory is still only 0.0603, making the direct path the more likely one. On the other hand, if t = 2, then the path through y becomes more likely, by a score of 0.0546 to 0.0451. If t = 10, then the path through y still has a likelihood of 0.0245, whereas the direct path has a likelihood below 2 × 10^{−5}, because it is highly unlikely to remain in x and/or z for so long.\n\nNext, suppose that we know S_0 = a and that we are interested in knowing the most likely trajectory out until time t, regardless of the final state of that trajectory. For simplicity, suppose also that λ_a = λ_b = λ. There is only one possible state sequence containing k transitions for each k = 0, 1, 2, . . ., and the likelihood of any such sequence turns out to be independent of the dwell times (assuming the dwell times total no more than time t):\n\n(Π_{i=0}^{k−1} λ e^{−λ t_i}) e^{−λ(t − Σ_i t_i)} = e^{−λt} λ^k.   (11)\n\nIf λ < 1, this implies the optimal trajectory has the system remaining at state a. However, if λ = 1 then all trajectories of all lengths have the same likelihood. If λ > 1, then there are trajectories of arbitrarily large likelihood, but no maximum likelihood trajectory.
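Equation (11) and the cycle test described in the next paragraph can be checked with the short Python sketch below. The names eq11_likelihood and has_unbounded_cycle are hypothetical, as are the trial values of λ_a = λ_b. Instead of searching for a positive-weight cycle under weights log(P_{ss′} λ_s), it negates the weights and runs the Bellman-Ford negative-cycle check, one standard polynomial-time way (O(|S| · |E|)) to perform that test; the source does not prescribe a specific algorithm.

```python
import math


def eq11_likelihood(lam_ab, t, k):
    # Equation (11): likelihood of the k-transition sequence a, b, a, ... from S_0 = a
    # when lam_a = lam_b = lam_ab (independent of how the dwell times are chosen).
    return math.exp(-lam_ab * t) * lam_ab ** k


def has_unbounded_cycle(lam, P, tol=1e-12):
    # True if some cycle satisfies prod_i P_{s_i s_{i+1}} * lam_{s_i} > 1.
    # Bellman-Ford on weights w(s, s2) = -log(P[s][s2] * lam[s]): such a cycle is
    # exactly a negative-weight cycle (the positive-weight cycle of the text, negated).
    states = list(lam)
    edges = [(s, s2, -math.log(p * lam[s]))
             for s, row in P.items() for s2, p in row.items() if p > 0.0]
    dist = {s: 0.0 for s in states}  # as if relaxed once from a virtual source
    for _ in range(len(states) - 1):
        for s, s2, w in edges:
            if dist[s] + w < dist[s2] - tol:
                dist[s2] = dist[s] + w
    # an edge that can still be relaxed after these rounds implies a negative cycle
    return any(dist[s] + w < dist[s2] - tol for s, s2, w in edges)


P = {'x': {'y': 2 / 3, 'z': 1 / 3}, 'y': {'z': 1.0}, 'z': {'a': 1.0},
     'a': {'b': 1.0}, 'b': {'a': 1.0}}
for lam_ab in (0.5, 1.0, 2.0):  # hypothetical values for lam_a = lam_b
    lam = {'x': 1.0, 'y': 0.1, 'z': 1.0, 'a': lam_ab, 'b': lam_ab}
    print(lam_ab, [round(eq11_likelihood(lam_ab, 1.0, k), 4) for k in range(4)],
          has_unbounded_cycle(lam, P))
```

With λ_a = λ_b = 2, the a–b cycle has product 4 > 1 and is flagged, matching the unbounded case of Equation (11); with λ_a = λ_b = 1 the cycle product is exactly 1, so nothing is flagged even though all trajectory lengths tie.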
Intuitively, because the likelihood of a dwell time can be greater than one, the likelihood of a trajectory can be increased by including short dwells in states with high dwell parameters λ.\n\nIn general, if a continuous-time Markov chain has a cycle of states (s_0, s_1, . . . , s_k = s_0) such that Π_{i=0}^{k−1} P_{s_i s_{i+1}} λ_{s_i} > 1, then maximum likelihood trajectories do not exist. Rather, a sequence of trajectories with ever-increasing likelihood can be found starting from any state from which the cycle is reachable. One should, thus, always check the chain for this property before seeking maximum likelihood trajectories. This can be easily done in polynomial time. For example, one can label the edges of the transition graph with the weights log(P_{ss′} λ_s) for the edge from s to s′, and then check the graph for the existence of a positive-weight cycle, a well-known polynomial-time computation.\n\n[Figure 1 diagram: states a, b, x, y, z with λ_x = 1, λ_y = 1/10, λ_z = 1, and λ_a, λ_b left symbolic; P_xz = 1/3, P_xy = 2/3, P_yz = P_za = P_ab = P_ba = 1.]\n\nFigure 2: Abstract example of a continuous-time trajectory of a chain, along with observations taken at fixed time intervals.\n\n4 Solving the partially observable problem\n\nWe now turn to problem (5), where we are given an observation sequence O = (o_1, τ_1, o_2, τ_2, . . . , o_m, τ_m) and want to find the most likely trajectory U. For simplicity, we assume that τ_1 = 0. The following can be straightforwardly generalized to allow the first observation to take place sometime after the trajectory begins. Similarly, we restrict attention to trajectories whose dwell times satisfy Σ_i t_i ≤ τ_m, so that we do not concern ourselves with extrapolating the trajectory beyond the final observation time. The conditional likelihood of such a trajectory U = (s_0, t_0, s_1, t_1, . . .
, s_{k−1}, t_{k−1}, s_k) can be written as\n\nl(U_{τ_m} = U | O) ∝ P(O | U_{τ_m} = U) l(U_{τ_m} = U)   (12)\n= (Π_{i=1}^{m} P_{U(τ_i) o_i}) (P_{s_0} (Π_{i=0}^{k−1} λ_{s_i} e^{−λ_{s_i} t_i} P_{s_i s_{i+1}}) (e^{−λ_{s_k}(τ_m − Σ_i t_i)})).   (13)\n\nThe term in the first parentheses is P(O | U_{τ_m} = U), and the term in the second parentheses is l(U_{τ_m} = U). The only differences between the second parentheses and Equation (1) are that we now include the probability of starting in state s_0, and we have implicitly assumed that Σ_i t_i ≤ τ_m, as mentioned above. This form, however, is not convenient for optimizing U. To optimize U, we need to rewrite l(U_{τ_m} = U) in a way that separates the likelihood into events happening in each interval of time between observations.\n\n4.1 Decomposing trajectory likelihood by observation intervals\n\nFor simplicity, let us further restrict attention to trajectories U that do not include a transition into a state s_i precisely at any observation time τ_j. We do not have space here to show that this restriction does not affect the value of the optimization problem; this will be addressed in the full paper. The likelihood of the trajectory can be written in terms of the events in each observation interval. For example, consider the trajectory and observations depicted in Figure 2. In the first interval, the system starts in state s_0 and transitions to s_1, where it stays until time τ_2. The likelihood of this happening is P_{s_0} λ_{s_0} e^{−λ_{s_0} t_0} P_{s_0 s_1} e^{−λ_{s_1}(τ_2 − t_0)}. In the second observation interval, the system never leaves state s_1. The probability of this happening is e^{−λ_{s_1}(τ_3 − τ_2)}. Finally, in the third interval, the system continues in state s_1 before transitioning to state s_2 and then s_3, where it remains until the final observation.
The likelihood of this happening is λ_{s_1} e^{−λ_{s_1}(t_0 + t_1 − τ_3)} P_{s_1 s_2} λ_{s_2} e^{−λ_{s_2} t_2} P_{s_2 s_3} e^{−λ_{s_3}(τ_4 − t_0 − t_1 − t_2)}. If we multiply these together, we obtain the full likelihood of the trajectory, P_{s_0} (Π_{i=0}^{2} λ_{s_i} e^{−λ_{s_i} t_i} P_{s_i s_{i+1}}) e^{−λ_{s_3}(τ_4 − Σ_j t_j)}.\n\nIn general, let U_i = (s_{i0}, t_{i0}, s_{i1}, t_{i1}, . . . , s_{ik_i}) denote the sequence of states and dwell times of trajectory U during the time interval [τ_i, τ_{i+1}). The first dwell time t_{i0}, if any, is measured with respect to the start of the time interval. The component of the likelihood of the whole trajectory U attributable to the ith time interval is nothing other than l(U_{τ_{i+1}−τ_i} = U_i | S_0 = s_{i0}). Thus, the likelihood of the whole trajectory can be written as\n\nl(U_{τ_m} = U) = P_{s_0} Π_{i=1}^{m−1} l(U_{τ_{i+1}−τ_i} = U_i | S_0 = s_{i0}).   (14)\n\n4.2 Dynamic programming for the optimal trajectory\n\nCombining Equations (12) and (14), we find\n\nl(U_{τ_m} = U | O) ∝ P_{U(0)} P_{U(0) o_1} Π_{i=1}^{m−1} l(U_{τ_{i+1}−τ_i} = U_i | S_0 = U(τ_i)) P_{U(τ_{i+1}) o_{i+1}}.   (15)\n\nThe first two terms account for the probability of the initial state and the probability of the first observation given the initial state. The terms inside the product account for the likelihood of the ith interval of the trajectory, and the probability of the (i + 1)st observation, given the state at the end of the ith interval of the trajectory.\n\nOne immediate implication of this rewriting of the conditional likelihood is the following. At times τ_i and τ_{i+1}, the system is in states U(τ_i) and U(τ_{i+1}). If U is to maximize the conditional likelihood, it had better be that the fragment of the trajectory between those two times, U_i, is a maximum likelihood trajectory from state U(τ_i) to state U(τ_{i+1}) in time τ_{i+1} − τ_i.
If it is not, then an alternative, higher likelihood trajectory fragment could be swapped into U, resulting in a higher conditional likelihood. Let us define\n\nH_t(s, s′) = max_{U′} l(U_t = U′ | S_0 = s, S_t = s′)   (16)\n\nto be the maximum achievable likelihood by any trajectory from state s to state s′ in time t. Then a necessary condition for U to maximize the conditional likelihood is\n\nl(U_{τ_{i+1}−τ_i} = U_i | S_0 = U(τ_i)) = H_{τ_{i+1}−τ_i}(U(τ_i), U(τ_{i+1})).   (17)\n\nMoreover, to find an optimal U, we can simply assume that the above condition holds, and concern ourselves only with finding the best endpoints for each time interval, U(τ_i) and U(τ_{i+1}). (Of course, the endpoint of one interval must be the same as the initial point of the next interval.) Specifically, define J_i(s) to be the likelihood of the most likely trajectory covering the time interval [τ_1, τ_i], accounting for the first i observations, and ending at state s. Then we can compute J as follows. To initialize, we set\n\nJ_1(s) = P_s P_{s o_1}.   (18)\n\nThen, for i = 1, 2, . . . , m − 1,\n\nJ_{i+1}(s) = max_{s′} J_i(s′) H_{τ_{i+1}−τ_i}(s′, s) P_{s o_{i+1}}.   (19)\n\nWe can then reconstruct the most likely trajectory by finding the s that maximizes J_m(s) and tracing back to the beginning. This algorithm is identical to the Viterbi algorithm for finding most likely state sequences for hidden Markov models, with the exception that the state transition probabilities in the Viterbi algorithm are replaced by the H_{τ_{i+1}−τ_i}(s′, s) terms above, which can, of course, be computed based on the results of the previous section.\n\n4.3 Examples\n\nTo demonstrate this algorithm, let us return to the CTMC depicted in Figure 1.
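Before the worked example, recursions (18) and (19) can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: observations are equally spaced by a hypothetical interval dt (with τ_1 = 0), each H_dt(s′, s) is obtained by re-running the Section 3.2 dynamic program from s′ with at most K transitions, and the discrete symbols 0, h and 3 with the emission table emit are invented stand-ins for the Gaussian observation model described next. States a and b are omitted for brevity. The function returns only the states at the observation times; the full trajectory would be recovered by tracing back through each H computation and applying Section 3.1.

```python
import math


def endpoint_likelihoods(lam, P, s0, t, K):
    # Section 3.2 dynamic program from s0 with at most K transitions; returns
    # {s: H_t(s0, s)}, the best trajectory likelihood from s0 to s in time t.
    layer = {(s0, lam[s0]): math.exp(-t * lam[s0])}
    best = {s0: layer[(s0, lam[s0])]}
    for _ in range(K):
        nxt = {}
        for (sp, mp), val in layer.items():
            for s, p in P.get(sp, {}).items():
                if s == sp or p <= 0.0:
                    continue
                m = min(mp, lam[s])
                cand = val * lam[sp] * p * math.exp(-t * (m - mp))
                if cand > nxt.get((s, m), 0.0):
                    nxt[(s, m)] = cand
        for (s, _m), val in nxt.items():
            best[s] = max(best.get(s, 0.0), val)
        layer = nxt
    return best


def infer_states(lam, P, P0, emit, obs, dt, K=10):
    # Recursions (18)-(19) for equally spaced observations (tau_{i+1} - tau_i = dt):
    # Viterbi with H_dt(s', s) in place of HMM transition probabilities.
    states = list(lam)
    H = {s: endpoint_likelihoods(lam, P, s, dt, K) for s in states}
    J = {s: P0.get(s, 0.0) * emit[s][obs[0]] for s in states}  # Equation (18)
    back = []
    for o in obs[1:]:
        newJ, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda sp: J[sp] * H[sp].get(s, 0.0))
            newJ[s] = J[prev] * H[prev].get(s, 0.0) * emit[s][o]  # Equation (19)
            ptr[s] = prev
        J = newJ
        back.append(ptr)
    s = max(states, key=lambda q: J[q])
    path = [s]
    for ptr in reversed(back):  # traceback of the states at observation times
        s = ptr[s]
        path.append(s)
    return path[::-1]


# Hypothetical setup: Figure 1 chain without a and b; symbols 0, h, 3
# stand in for low, high and mid observations.
lam = {'x': 1.0, 'y': 0.1, 'z': 1.0}
P = {'x': {'y': 2 / 3, 'z': 1 / 3}, 'y': {'z': 1.0}}
P0 = {'x': 1.0}
emit = {'x': {'0': 0.9, 'h': 0.05, '3': 0.05},
        'y': {'0': 0.05, 'h': 0.9, '3': 0.05},
        'z': {'0': 0.05, 'h': 0.05, '3': 0.9}}
print(infer_states(lam, P, P0, emit, ['0', '0', '3', '3'], dt=1.0))  # ['x', 'x', 'z', 'z']
print(infer_states(lam, P, P0, emit, ['0', 'h', '3'], dt=1.0))  # ['x', 'y', 'z']
```

Caching H per interval length is the main design choice here: with equal spacing it is computed once, so the Viterbi pass itself costs O(m|S|²), as for a hidden Markov model.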
We assume that λ_a = λ_b = 1, that the system always starts in state x, and that when we observe the system, we get a real-valued Gaussian observation with standard deviation 1 and means 0, 10, 3, 100 and 100 for states x, y, z, a and b respectively.² The left side of Figure 3 shows three sample sequences of 20 observations. The right side of the figure shows the most likely trajectories inferred under different assumptions. First, if we assume the time interval between observations is t = 1, and we consider observations O_A, then the most likely trajectory has the system in state x up through the 10th observation, after which it instantly transitions to state z and remains there. This makes sense, as the lower observations at the start of the series are more likely in state x. If we consider instead the sequence O_B, which has a high observation at time t = 11, the procedure infers that the system was in state y at that time. Moreover, it predicts that the system switches into y immediately after the 10th observation, and stays there until just before the 12th observation, taking advantage of the fact that longer dwell times are more likely in state y than in the other states. If we consider observations O_C, which have a spike at t = 5, the transit to state y is moved earlier, and state z is used to explain observations at t = 6 onward, even though the first few are relatively unlikely in that state.\n\n² Although our derivations above assume the observation set O is finite, the same approach goes through if O is continuous and individual observations have likelihoods instead of probabilities.\n\nFigure 3: Left: three length-20 observation sequences, O_A, O_B, and O_C. All three are the same at most points, but the 11th observation of O_B is 10, and the 5th observation of O_C is 10.
Right: most likely trajectories inferred by our algorithm, assuming the underlying CTMC is the one given in Figure 1, with parameters given in the text.\n\nIf we return to observations O_A, but assume that the time interval between observations is t = 2, then the most likely trajectory differs from the one for t = 1. Although the same states are used to explain the observations, the most likely trajectory has the system transitioning from x to y immediately after the 10th observation and dwelling there until just before the 11th observation, where the state becomes z. This is because, as explained previously, this is the more likely trajectory from x to z given t = 2. If we assume the time interval between observations is t = 20, then a wider range of observations during the trajectory are attributed to state y. Intuitively, this is because, although the observations are somewhat unlikely under state y, it is extremely unlikely for the system to dwell for so long in state z as to account for all of the observations from the 11th onward.\n\n5 Discussion\n\nWe have provided correct, efficient algorithms for inferring most likely trajectories of CTMCs given either initial or initial and final states of the chain, or given noisy/partial observations of the chain. Given the enormous practical import of the analogous problems for discrete-time chains, we are hopeful that our methods will prove useful additions to the toolkit of methods available for analyzing continuous-time chains. An alternative, existing approach to the problems we have addressed here is to discretize time, producing a DTMC which is then analyzed by standard methods [14]. A problem with this approach, however, is that if the time step is taken too large, the discretized chain can collapse a whole set of transition sequences of the CTMC into a single “pseudotransition”, obscuring the real behavior of the system in continuous time.
If the time step is sufficiently small, then the DTMC should produce substantially the same solutions as our approach. However, the time complexity of the calculations increases as the time step shrinks, which can be a problem if we are interested in long time intervals and/or there are states with very short expected dwell times, which necessitate very small time steps.

A related problem on which we are working is to find the most probable state sequence of a continuous-time chain under similar informational assumptions. By this, we mean that the dwell times, rather than being optimized, are marginalized out, so that we are left with only the sequence of states and not the particular times at which they occurred. In many applications, this state sequence may be of greater interest than the dwell times, especially since, as we have shown, maximum likelihood dwell times are often infinitesimal and hence not representative of typical system behavior. Moreover, this version of the problem has the advantage of always being well-defined: because state sequences have probabilities rather than likelihoods, a most probable state sequence will always exist.

Acknowledgments

Funding for this work was provided in part by the Natural Sciences and Engineering Research Council of Canada and by the Ottawa Hospital Research Institute.

References

[1] F. G. Ball and J. A. Rice. Stochastic models for ion channels: introduction and bibliography. Mathematical Biosciences, 112(2):189, 1992.
[2] D. J. Wilkinson. Stochastic modelling for systems biology. Chapman & Hall/CRC, 2006.
[3] M. Holder and P. O. Lewis. Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics, 4(4):275–284, 2003.
[4] H. M. Taylor and S. Karlin. An introduction to stochastic modeling. Academic Press, 1998.
[5] D. R. Fredkin and J. A. Rice. Maximum likelihood estimation and identification directly from single-channel recordings. Proceedings: Biological Sciences, pages 125–132, 1992.
[6] R. Rosales, J. A. Stark, W. J. Fitzgerald, and S. B. Hladky. Bayesian restoration of ion channel records using hidden Markov models. Biophysical Journal, 80(3):1088–1103, 2001.
[7] M. A. Suchard, R. E. Weiss, and J. S. Sinsheimer. Bayesian selection of continuous-time Markov chain evolutionary models. Molecular Biology and Evolution, 18(6):1001–1013, 2001.
[8] D. T. Crommelin and E. Vanden-Eijnden. Fitting timeseries by continuous-time Markov chains: A quadratic programming approach. Journal of Computational Physics, 217(2):782–805, 2006.
[9] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, 1994.
[10] S. Hedlund and A. Rantzer. Optimal control of hybrid systems. In Proceedings of the 38th IEEE Conference on Decision and Control, volume 4, 1999.
[11] D. P. Bertsekas. Dynamic programming and optimal control. Athena Scientific, Belmont, Mass., 1995.
[12] N. G. Van Kampen. Stochastic processes in physics and chemistry. North-Holland, 2007.
[13] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81:2340–2361, 1977.
[14] A. Hordijk, D. L. Iglehart, and R. Schassberger. Discrete time methods for simulating continuous time Markov chains. Advances in Applied Probability, pages 772–788, 1976.", "award": [], "sourceid": 822, "authors": [{"given_name": "Theodore", "family_name": "Perkins", "institution": null}]}