This is a borderline paper. It has considerable strengths: a much more realistic continual learning setting for practical applications (there was some disagreement among the reviewers on this point, although this area chair sides with the authors), solid positioning with respect to the literature, a method that performs very well in the proposed setting, and strong empirical results. Its main weaknesses are a lack of clarity at times, critical details relegated to the appendix, and a lack of discussion or evaluation of how this setting would work in RL (I can easily imagine this, and the authors sketched it in the rebuttal, but it should still have been included in the paper). The authors do need to address these weaknesses before the paper is ready for publication; another round of revisions would really benefit it. However, I want to commend the authors, since I think that this is great work, and I look forward to its publication.