Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Arnab Nilim, Laurent Ghaoui
Optimal solutions to Markov Decision Problems (MDPs) are very sen- sitive with respect to the state transition probabilities. In many practi- cal problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to real- world problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our al- gorithm involves a statistically accurate yet numerically efficient repre- sentation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the origi- nal Bellman recursion. Hence, robustness can be added at practically no extra computing cost.