Convergence of Stochastic Iterative Dynamic Programming Algorithms

Jaakkola, Tommi; Jordan, Michael; Singh, Satinder

Convergence of Stochastic Iterative Dynamic Programming Algorithms

Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh

Advances in Neural Information Processing Systems 6 (NIPS 1993)

Abstract

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learn(cid:173) ing problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the pow(cid:173) erful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(") and Q-Iearning belong.

Abstract

Name Change Policy