Adaptive Choice of Grid and Time in Reinforcement Learning

Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)


Stephan Pareigis


We propose local error estimates together with algorithms for adaptive a posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time and an infinite-horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.
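
To make the roles of the two discretization parameters concrete, the following is a minimal sketch of value iteration on a time- and space-discretized Bellman equation. It is not the paper's algorithm and contains no adaptive refinement. Everything specific in it is an assumption chosen for illustration: the one-dimensional dynamics x' = u, the running cost g(x) = x^2, the control set {-1, 0, 1}, the discount rate rho, the explicit Euler time step tau, the uniform grid spacing h, and the function name solve_discrete_bellman.

```python
import numpy as np


def solve_discrete_bellman(h, tau, rho=1.0, tol=1e-8, max_iter=100_000):
    """Value iteration for a discretized Bellman equation (illustrative toy problem).

    Assumed setup, not taken from the paper:
      dynamics      x' = u on the state interval [-1, 1]
      running cost  g(x) = x**2
      controls      u in {-1, 0, 1}
      discount      exp(-rho * tau) per time step of length tau
    """
    # Uniform space grid with spacing h.
    grid = np.linspace(-1.0, 1.0, int(round(2.0 / h)) + 1)
    controls = np.array([-1.0, 0.0, 1.0])
    beta = np.exp(-rho * tau)  # discount factor for one time step
    V = np.zeros_like(grid)

    for iteration in range(1, max_iter + 1):
        candidates = []
        for u in controls:
            # One explicit Euler step of length tau, kept inside the domain.
            next_state = np.clip(grid + tau * u, -1.0, 1.0)
            # Cost accumulated over the step plus the discounted value at
            # the successor state, linearly interpolated between grid nodes.
            step_cost = tau * grid**2
            candidates.append(step_cost + beta * np.interp(next_state, grid, V))
        V_new = np.min(candidates, axis=0)  # minimize over controls
        if np.max(np.abs(V_new - V)) < tol:
            return grid, V_new, iteration
        V = V_new
    return grid, V, max_iter


if __name__ == "__main__":
    # Same grid, two different time steps: compare iteration counts.
    for tau in (0.2, 0.05):
        grid, V, iterations = solve_discrete_bellman(h=0.1, tau=tau)
        print(f"tau={tau}: {iterations} iterations, V(1)={V[-1]:.4f}")
```

Running the script with a fixed h and several values of tau gives a rough feel for how the ratio of time step to grid spacing affects both the number of iterations and the computed values, which is the trade-off the abstract refers to.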