Natalia Hernandez-Gardiol, Sridhar Mahadevan
Department of Computer Science
Michigan State University, East Lansing, MI 48824
firstname.lastname@example.org
A key challenge for reinforcement learning is scaling up to large partially observable domains. In this paper, we show how a hierarchy of behaviors can be used to create and select among variable-length short-term memories appropriate for a task. At higher levels in the hierarchy, the agent abstracts over lower-level details and looks back over a variable number of high-level decisions in time. We formalize this idea in a framework called Hierarchical Suffix Memory (HSM). HSM uses a memory-based SMDP learning method to rapidly propagate delayed reward across long decision sequences. We describe a detailed experimental study comparing memory vs. hierarchy using the HSM framework on a realistic corridor navigation task.
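To illustrate how an SMDP backup can carry delayed reward across a temporally extended behavior, the following is a minimal sketch of the standard tabular SMDP Q-learning update. It is not the memory-based method of HSM itself, and the function name, the dictionary-based Q-table, and the parameters `alpha` and `gamma` are assumptions made for illustration only.

```python
def smdp_q_update(q, state, action, rewards, next_state, actions,
                  alpha=0.1, gamma=0.9):
    """One tabular SMDP Q-learning backup (illustrative sketch).

    `rewards` lists the primitive-step rewards observed while the chosen
    behavior `action` was executing, so its length is the behavior's duration.
    """
    tau = len(rewards)
    # Discounted return accumulated while the behavior ran.
    ret = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Value of the best behavior available once this one terminates.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # The bootstrap term is discounted by gamma**tau, not a single gamma.
    q[(state, action)] = old + alpha * (ret + gamma ** tau * best_next - old)
    return q[(state, action)]
```

Because a single backup spans all `tau` primitive steps of the behavior, reward reaches the decision point that launched the behavior in one update rather than one primitive step at a time.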