Pieter Abbeel, Andrew Ng
First-order Markov models have been successfully applied to many problems, for example in modeling sequential data using Markov chains, and modeling control problems using the Markov decision process (MDP) formalism. If a first-order Markov model's parameters are estimated from data, the standard maximum likelihood estimator considers only the first-order (single-step) transitions. But for many problems, the first-order conditional independence assumptions are not satisfied, and as a result the higher-order transition probabilities may be poorly approximated. Motivated by the problem of learning an MDP's parameters for control, we propose an algorithm for learning a first-order Markov model that explicitly takes into account higher-order interactions during training. Our algorithm uses an optimization criterion different from maximum likelihood, and allows us to learn models that capture longer-range effects, but without giving up the benefits of using first-order Markov models. Our experimental results also show the new algorithm outperforming conventional maximum likelihood estimation in a number of control problems where the MDP's parameters are estimated from data.
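
To make the motivating problem concrete, the following is a minimal sketch of the standard maximum likelihood estimator for a first-order Markov chain, together with a comparison between the two-step transition probabilities the fitted model implies and those observed in the data. It illustrates the failure mode described above, not the algorithm proposed in this paper. The function names (`fit_first_order_mle`, `k_step`, `empirical_k_step`) and the toy sequence are assumptions introduced purely for illustration.

```python
def fit_first_order_mle(sequence, num_states):
    """Maximum likelihood transition matrix from single-step transition counts."""
    counts = [[0] * num_states for _ in range(num_states)]
    for s, s_next in zip(sequence, sequence[1:]):
        counts[s][s_next] += 1
    P = []
    for row in counts:
        total = sum(row)
        # Assumption for this sketch: unvisited states get a uniform row.
        P.append([c / total if total else 1.0 / num_states for c in row])
    return P


def mat_mul(A, B):
    """Plain-Python product of two square matrices."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def k_step(P, k):
    """k-step transition probabilities implied by a first-order model (P^k)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


def empirical_k_step(sequence, num_states, k):
    """k-step transition frequencies counted directly from the data."""
    counts = [[0] * num_states for _ in range(num_states)]
    for s, s_k in zip(sequence, sequence[k:]):
        counts[s][s_k] += 1
    return [[c / sum(row) if sum(row) else 1.0 / num_states for c in row]
            for row in counts]


# Toy data with second-order structure: the pattern 0, 0, 1, 1 repeated.
seq = [0, 0, 1, 1] * 25

P = fit_first_order_mle(seq, num_states=2)
model_2 = k_step(P, 2)                  # what the fitted model predicts
data_2 = empirical_k_step(seq, 2, 2)    # what the data actually shows

print("model   P(s_{t+2}=1 | s_t=0):", model_2[0][1])
print("data    P(s_{t+2}=1 | s_t=0):", data_2[0][1])
```

In this toy sequence, state 0 is always followed by state 1 two steps later, yet the single-step counts are nearly uniform, so the fitted first-order model assigns that two-step transition a probability close to one half. This gap between single-step fit and multi-step accuracy is the kind of mismatch the abstract refers to.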