Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper is well-written and easy to follow. To the best of my knowledge, the work is novel and provides an extension to the successor representation. If data are available, it would be interesting to see how similar or dissimilar human/animal behavior is to the proposed model, especially to examine the new point of view of hippocampal function. The choice of experimental setting could be briefly explained for readers not very familiar with the relevant literature. For example, given that we interpret the results by looking at whether the model learns the barrier, it would be good to explain why the barrier is crucial to the experiment/model evaluation. _____________________ After author response: The authors promised more discussion of neuroscience and comparison with earlier models, which would be valuable for this paper. So I am happy to increase my score by 1 point. Please also make some effort to make the paper more readable for a broader audience, as suggested by other reviewers and myself.
I don't have any major technical criticisms of the paper. However, I didn't feel that the experimental results really highlighted the advantages of this approach. Specifically, the authors never compare against any other method for solving POMDPs. I think this is necessary for making a compelling case for this method. Is it more sample efficient, more computationally efficient, more flexible? The links to neuroscience are very intriguing to me (as a neuroscientist), though I fully understand that the authors don't really have space to explore them. Nonetheless, one possibility might be to focus more on these applications instead of the toy experimental results. Then the impetus for demonstrating computational superiority over other methods is replaced with an impetus for demonstrating that this model can explain aspects of neural computation (e.g., in the hippocampus) that alternative models cannot. Minor: p. 3: "set features" -> "set of features". There seems to be inconsistent indexing of various quantities (e.g., M, V, T) by the policy pi. -------------- Comments after rebuttal: I was already quite enthusiastic about this paper. I don't think it's necessary to increase my rating. I'm happy that the authors are interested in providing more neuroscience context for their ideas.
The paper presents a POMDP continuous approximation scheme for the RL successor representation, relevant when states are not known but must be inferred, allowing for a realistic learning algorithm in between model-free and model-based RL. I am not aware of anyone having attempted something similar; this is clearly a very original submission. This is a dense paper (in a good way) with a lot of very interesting results. The density meant that some of the results were a little hard to follow, but I do not doubt the importance of these results. This should be of relevance to a large proportion of NeurIPS attendees. After author response: I still think some more details on the simulations would have been beneficial, and I am not sure how much more the authors' response really adds, but I am still excited about this paper.