The work has clear positives:
+ the paper presents a novel algorithm that achieves low regret
+ novel consideration of adaptive discretization for model-based RL (prior work focuses on the model-free case)
+ important practical focus on computational resources
+ novel theory

However, there are significant issues as well:
- the experiments are not presented or discussed within the context of the rest of the paper; they appear to contradict the main messages of the paper
- there is confusion over how the experiments were implemented and evaluated (e.g., proper averaging over independent runs, fair treatment of hyperparameters, etc.). See R1 for more details
- the proposed algorithm exhibits space complexity that increases monotonically; the authors suggested simply capping it
- poor discussion of model-based RL with function approximation (linear Dyna, recent deep learning approaches, etc.)
- relatedly, there is no clear argument for why one would explore adaptive discretization approaches rather than other approaches. The paper is doing something different from the majority of the community; that can be good, but it should be directly addressed

Summary of the discussion. The reviewers thought the experiments considerably weakened the paper, and that it would be best if they were removed from the paper. The strongest advocate of the paper had low confidence and did not have much to say during the discussion. The biggest critic also had low confidence. The most knowledgeable reviewer, R5, landed at weak accept, concluding that the theory is interesting on its own but that the paper needs significant revision to deal with the experiment issue. The meta reviewer agrees. The authors seem to agree that the paper could be much better if it were pitched more strongly towards the theory side.