Loading [MathJax]/jax/output/CommonHTML/jax.js

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)

Bibtex Metadata Paper

Authors

Bruno Scherrer, Boris Lesner

Abstract

We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error ϵ at each iteration, it is well-known that one can compute stationary policies that are $\frac{2\gamma{(1-\gamma)^2}\epsilonoptimal.Afterarguingthatthisguaranteeistight,wedevelopvariationsofValueandPolicyIterationforcomputingnonstationarypoliciesthatcanbeupto\frac{2\gamma}{1-\gamma}\epsilonoptimal,whichconstitutesasignificantimprovementintheusualsituationwhen\gammaiscloseto1$. Surprisingly, this shows that the problem of computing near-optimal non-stationary policies'' is much simpler than that of computing near-optimal stationary policies''.