Policy gradients in linearly-solvable MDPs

Part of Advances in Neural Information Processing Systems 23 (NIPS 2010)


Authors

Emanuel Todorov

Abstract

We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
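For context, a brief sketch of the linearly-solvable MDP setting the abstract refers to (standard in the LMDP literature; the notation below is ours and not necessarily the paper's). Writing v(x) for the cost-to-go, z(x) = exp(-v(x)) for the desirability function, l(x) for the state cost, and p(x'|x) for the passive dynamics, the Bellman equation becomes linear in z, and the optimal control is obtained by reweighting the passive dynamics:

\[
z(x) = e^{-\ell(x)} \sum_{x'} p(x' \mid x)\, z(x'),
\qquad
u^{*}(x' \mid x) = \frac{p(x' \mid x)\, z(x')}{\sum_{x''} p(x'' \mid x)\, z(x'')}.
\]

Because the optimal policy is determined directly by the (state-only) cost-to-go rather than a state-action quantity, this is the structure that makes it plausible for compatible function approximation and natural policy gradients to rest on estimates of v alone, as claimed in the abstract.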