Part of Advances in Neural Information Processing Systems 6 (NIPS 1993)
Parti-game is a new algorithm for learning from delayed rewards in high dimensional real-valued state-spaces. In high dimensions it is essential that learning does not explore or plan over state space uniformly. Part i-game maintains a decision-tree partitioning of state-space and applies game-theory and computational geom(cid:173) etry techniques to efficiently and reactively concentrate high reso(cid:173) lution only on critical areas. Many simulated problems have been tested, ranging from 2-dimensional to 9-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and uncurl(cid:173) ing snake robots in restricted spaces. In all cases, a good solution is found in less than twenty trials and a few minutes.
1 REINFORCEMENT LEARNING
Reinforcement learning [Samuel, 1959, Sutton, 1984, Watkins, 1989, Barto et al., 1991] is a promising method for control systems to program and improve themselves. This paper addresses its biggest stumbling block: the curse of dimensionality [Bell(cid:173) man, 1957], in which costs increase exponentially with the number of state variables.
Some earlier work [Simons et al., 1982, Moore, 1991, Chapman and Kaelbling, 1991, Dayan and Hinton, 1993] has considered recursively partitioning state-space while learning from delayed rewards. The new ideas in the parti-game algorithm in-