An Environment Model for Nonstationary Reinforcement Learning

Samuel P. M. Choi, Dit-Yan Yeung, Nevin Lianwen Zhang

Advances in Neural Information Processing Systems 12 (NIPS 1999)

Reinforcement learning in nonstationary environments is generally regarded as an important yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. Each mode indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. While HM-MDP is a special case of partially observable Markov decision processes (POMDP), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning, requiring less data and time than the general POMDP approach.
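To make the model concrete, here is a minimal sketch in Python of an HM-MDP as a generative environment. The class name, array layouts, and step interface are illustrative assumptions for exposition, not the paper's implementation: each hidden mode carries its own MDP transition and reward arrays, and the active mode follows a Markov chain the agent never observes directly.

```python
import numpy as np

class HMMDP:
    """Sketch of a hidden-mode MDP (illustrative; not the paper's code).

    Each of M hidden modes indexes a complete MDP over shared state and
    action spaces; the active mode evolves by its own Markov chain.
    """

    def __init__(self, mode_trans, state_trans, rewards, rng=None):
        self.mode_trans = np.asarray(mode_trans)    # (M, M) mode-to-mode probabilities
        self.state_trans = np.asarray(state_trans)  # (M, A, S, S) per-mode MDP dynamics
        self.rewards = np.asarray(rewards)          # (M, S, A) per-mode expected rewards
        self.rng = rng if rng is not None else np.random.default_rng()
        self.n_modes = self.mode_trans.shape[0]
        self.n_states = self.state_trans.shape[-1]

    def step(self, mode, state, action):
        """Advance one step; the agent sees the state and reward, never the mode."""
        reward = self.rewards[mode, state, action]
        next_state = self.rng.choice(
            self.n_states, p=self.state_trans[mode, action, state])
        next_mode = self.rng.choice(self.n_modes, p=self.mode_trans[mode])
        return next_mode, next_state, reward
```

Since states and rewards are fully observed and only the mode is hidden, a learner need only track a belief over the M modes rather than over the M x S joint states of the equivalent POMDP formulation, which is one intuition for why treating the environment as a general POMDP adds unnecessary complexity.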