Eric Xing, Michael Jordan, Richard Karp, Stuart J. Russell
We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
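The generative structure described above can be illustrated with a minimal sketch. It assumes a two-state hidden Markov chain over motif positions, a four-letter alphabet, and illustrative parameter values; the function name `sample_motif_instances` and all numbers are hypothetical and not taken from the paper. Each position draws a hidden state, the state selects a Dirichlet component, the component yields that position's multinomial parameters, and monomers are then sampled per position.

```python
import numpy as np


def sample_motif_instances(init, trans, alphas, length, n_instances, rng):
    """Sample aligned motif instances from a hidden-Markov Dirichlet-mixture prior.

    init:   (K,) initial state distribution (hypothetical values).
    trans:  (K, K) row-stochastic state transition matrix.
    alphas: (K, A) Dirichlet parameters, one row per hidden state.
    """
    n_states, n_symbols = alphas.shape
    states = np.empty(length, dtype=int)
    thetas = np.empty((length, n_symbols))
    for pos in range(length):
        # The hidden Markov process picks the Dirichlet component per position.
        probs = init if pos == 0 else trans[states[pos - 1]]
        states[pos] = rng.choice(n_states, p=probs)
        # Position-specific multinomial parameters drawn from that component.
        thetas[pos] = rng.dirichlet(alphas[states[pos]])
    # Each column is one motif position; each row is one motif instance.
    instances = np.stack(
        [rng.choice(n_symbols, size=n_instances, p=thetas[pos])
         for pos in range(length)],
        axis=1,
    )
    return states, thetas, instances


rng = np.random.default_rng(0)
init = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
# Assumed components: state 0 favors peaked (conserved) positions,
# state 1 favors near-uniform (heterogeneous) positions.
alphas = np.array([[0.5, 0.5, 0.5, 0.5],
                   [5.0, 5.0, 5.0, 5.0]])
states, thetas, instances = sample_motif_instances(
    init, trans, alphas, length=8, n_instances=5, rng=rng
)
```

Here `instances` is an integer array of shape `(5, 8)` whose entries index the alphabet (for example A, C, G, T). The sketch covers sampling only; it does not implement the variational EM fitting or the motif detection procedure.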