Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Andres Rodriguez, Ronald Parr, Daphne Koller
The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, that is, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state that is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables. In this paper, we investigate two methods of full belief state reinforcement learning and one novel method for reinforcement learning using factored approximate belief states. We compare the performance of these algorithms on several well-known problems from the literature. Our results demonstrate the importance of approximate belief state representations for large problems.
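To make the two kinds of belief state concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a tiny tabular POMDP with hypothetical arrays `T` (transition probabilities) and `O` (observation probabilities), and shows an exact Bayes-filter belief update followed by a factored approximation that keeps only the per-variable marginals of a joint belief, discarding correlations. All function and variable names are illustrative assumptions.

```python
import numpy as np

def belief_update(belief, T, O, action, obs):
    """Exact belief update for a tabular POMDP (illustrative).

    b'(s') is proportional to O[action][s', obs] * sum_s T[action][s, s'] * b(s).
    """
    predicted = belief @ T[action]                # predict next-state distribution
    unnormalized = predicted * O[action][:, obs]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()      # renormalize to a distribution

def factor_belief(joint):
    """Approximate a joint belief over two state variables (shape (n1, n2))
    by the product of its marginals, dropping correlations between them."""
    m1 = joint.sum(axis=1)  # marginal over the first variable
    m2 = joint.sum(axis=0)  # marginal over the second variable
    return m1, m2, np.outer(m1, m2)

# Hypothetical two-state, one-action, two-observation model.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])   # T[a][s, s']
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])   # O[a][s', o]
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, T, O, action=0, obs=0)

# Hypothetical correlated joint belief over two binary state variables.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
m1, m2, approx = factor_belief(joint)
```

The factored form stores `n1 + n2` numbers instead of `n1 * n2`, which is the kind of saving that matters as the number of state variables grows. In this example the approximation is uniform even though the joint belief is not, which illustrates what is lost when correlations are marginalized out.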