The paper focuses on efficiently exploring MDPs with high dimensional state representations, by combining an unsupervised algorithm for learning a low-dimensional representation and then solving the problem in this low-dimensional space. The paper is largely theoretic and show that in certain conditions, near-optimal policies can be found with polynomial complexity in the number of latent states. The reviewers mostly agreed on the following points. The paper is considered well-written, and presents theoretically strong results that are sound, novel, and non-trivial. As weaknesses of the paper the reviewers mentioned the lack of empirical results in more realistic settings and restrictive assumptions. The reviewers had a couple of doubts about the paper, which mostly were addressed in the rebuttal (e.g. is improvement due to exploration, how is a discrete set of latent states, errorbars). Everything considered, in accordance with the opinion of the reviewer, I recommend acceptance for this paper. I would like to ask the authors to make sure the clarifications given in the rebuttal are reflected in the final version of the manuscript.