Review for NeurIPS paper: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

NeurIPS 2020

Meta Review

Exploration in RL with function approximation is an important problem, and any advance on the topic is of interest to the community. The reviewers all agreed on the algorithmic and technical contribution of the paper, in particular the introduction of sensitivity sampling and its analysis in the regret proof. This convinced us that the paper deserves acceptance. Nonetheless, I encourage the authors to improve the current submission. As pointed out by R3, the assumptions used in the paper are quite strong and may limit the generality of the results. The authors should stress the potential limitations arising from these assumptions and contrast them more clearly with the related literature. In particular, using the Eluder dimension as a measure of complexity is definitely interesting but, as of today, it lacks interpretability. In fact, there are very few families of MDPs for which a meaningful bound on the Eluder dimension is available (i.e., linear and GLM). The authors should point this out and, if possible, clarify whether good bounds can be obtained for other classes of problems.