RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning

Part of Advances in Neural Information Processing Systems 27 (NIPS 2014)



Marek Petrik, Dharmashankar Subramanian


<p>We describe how to use robust Markov decision processes for value function approximation with state aggregation. Compared with classical methods such as fitted value iteration, the robustness reduces sensitivity to the approximation error of sub-optimal policies. This reduces the bounds on the gamma-discounted infinite-horizon performance loss by a factor of 1/(1-gamma) while preserving polynomial-time computational complexity. Our experimental results show that the robust representation can significantly improve solution quality at minimal additional computational cost.</p>
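<p>To make the contrast between robust and classical state aggregation concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a small tabular MDP and a hard assignment of states to aggregates, and it models robustness in the simplest possible way: for each aggregate state and action, an adversary picks the worst original state inside the aggregate, whereas the classical variant averages uniformly over those states. The function name <code>aggregated_value_iteration</code>, the array layout, and the random test problem are all assumptions made for illustration.</p>

```python
import numpy as np


def aggregated_value_iteration(P, r, agg, gamma, robust, iters=500):
    """Value iteration over aggregate states (illustrative sketch only).

    P:      (A, S, S) array, P[a, s, s2] = transition probability s -> s2 under a
    r:      (S, A) array of rewards
    agg:    (S,) integer array, agg[s] = index of the aggregate containing s
            (every aggregate index 0..max must contain at least one state)
    robust: if True, take the worst state in each aggregate (assumed
            adversarial weights); if False, average uniformly over the aggregate
    """
    n_agg = int(agg.max()) + 1
    v = np.zeros(n_agg)
    for _ in range(iters):
        # q[s, a]: one-step lookahead from original state s, where each
        # successor state is valued by the value of its aggregate
        q = r + gamma * np.einsum("asn,n->sa", P, v[agg])
        new_v = np.empty(n_agg)
        for k in range(n_agg):
            block = q[agg == k]  # rows: original states in aggregate k
            per_action = block.min(axis=0) if robust else block.mean(axis=0)
            new_v[k] = per_action.max()  # agent picks the best action
        v = new_v
    return v


# Hypothetical random problem: 6 states, 2 actions, 3 aggregates of 2 states.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
r = rng.random((n_states, n_actions))
agg = np.array([0, 0, 1, 1, 2, 2])

v_robust = aggregated_value_iteration(P, r, agg, gamma, robust=True)
v_average = aggregated_value_iteration(P, r, agg, gamma, robust=False)
```

<p>In this sketch the robust values never exceed the averaged ones, because the minimum over an aggregate is at most its mean and both updates are monotone. The robust variant is therefore the more pessimistic estimate of the two; the performance-loss bounds stated in the abstract are results of the paper and are not demonstrated by this toy example.</p>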