Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper addresses the problem of catastrophic forgetting when learning different tasks in RL. The proposed approach is based on experience replay. While the approach is of moderate novelty, it has an interesting property compared to more complex state-of-the-art approaches (e.g., Progress & Compress and Elastic Weight Consolidation): it does not require the tasks and their boundaries to be known beforehand.

The bulk of the paper is devoted to the experiments, which consider three DMLab tasks. It is shown that the efficiency of the approach depends neither on the fraction of past vs. novel tasks in the replay mixture, nor on the size of the memory buffer.

Complementary investigations are required in the camera-ready to better understand how and why the approach works:
* using a visualization of the internal state of the network, to understand whether the learner stores the policies for the different tasks in different regions of the (latent) state space;
* assessing the sample complexity of each task and setting the size of the memory buffer to 1, 1/2, and 1/3 of this sample complexity;
* examining the sensitivity of the approach to the truncated importance sampling coefficients \bar c and \bar \rho.