Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Many researchers have explored methods for hierarchical reinforcement learning (RL) with temporal abstractions, in which abstract actions are defined that can perform many primitive actions before terminating. However, little is known about learning with state abstractions, in which aspects of the state space are ignored. In previous work, we developed the MAXQ method for hierarchical RL. In this paper, we define five conditions under which state abstraction can be combined with the MAXQ value function decomposition. We prove that the MAXQ-Q learning algorithm converges under these conditions and show experimentally that state abstraction is important for the successful application of MAXQ-Q learning.