Title: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Even after the discussion and the author response, there was still some disagreement among the reviewers. The paper proposes a simple yet novel and very interesting idea. A few concerns about clarity remain, but those can be fixed in the final version (see updated reviews). Overall, this is a solid paper that (as always) would benefit from a more thorough empirical evaluation. One reviewer proposed adding a baseline: a domain-randomized robust policy trained on a variety of tasks. In summary, this is an interesting idea that might not be 100% fleshed out but is nevertheless ready to be "put out there". It would be good to mention the AAMAS extended abstract in the final version and to discuss how the paper relates to it.