A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Part of Advances in Neural Information Processing Systems 11 (NIPS 1998)

Nobuo Suematsu, Akira Hayashi


We describe a Reinforcement Learning algorithm for partially observable environments using short-term memory, which we call BLHT. Since BLHT learns a stochastic model based on Bayesian Learning, the overfitting problem is reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
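
To make the idea of Bayesian prediction from short-term memory concrete, the following is a minimal sketch and not the BLHT algorithm itself. It assumes a fixed-length memory window of the last k (action, percept) pairs and a symmetric Dirichlet prior over next percepts; the class name `WindowedDirichletModel` and the parameters `k` and `alpha` are hypothetical and chosen only for illustration. The prior pseudo-counts keep rarely seen memory contexts from producing overconfident predictions, which is one simple way a Bayesian estimate can temper overfitting.

```python
from collections import defaultdict, deque


class WindowedDirichletModel:
    """Illustrative only: predicts the next percept from a fixed-length
    short-term memory using Dirichlet-smoothed counts. This is an assumed
    stand-in, not the model structure used by BLHT."""

    def __init__(self, percepts, k=2, alpha=1.0):
        self.percepts = list(percepts)   # finite set of possible percepts
        self.alpha = alpha               # Dirichlet pseudo-count per percept
        self.memory = deque(maxlen=k)    # last k (action, percept) pairs
        self.counts = defaultdict(lambda: defaultdict(int))

    def _context(self, action):
        # The context is the current memory contents plus the chosen action.
        return (tuple(self.memory), action)

    def predict(self, action):
        """Posterior predictive distribution over the next percept."""
        seen = self.counts.get(self._context(action), {})
        total = sum(seen.values()) + self.alpha * len(self.percepts)
        return {p: (seen.get(p, 0) + self.alpha) / total for p in self.percepts}

    def update(self, action, percept):
        """Record an observed transition, then shift the memory window."""
        self.counts[self._context(action)][percept] += 1
        self.memory.append((action, percept))
```

With no data for a context the prediction is uniform, and it sharpens as transitions from that context are observed. Extending the same counts to rewards would follow the same pattern under these assumptions.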