Sun, Dec 8th through Sat, Dec 14th, 2019 at Vancouver Convention Center
The paper proposes a method for stopping unnecessary exploration in RL with a bounded regret guarantee. The stopping method, called e-stop, learns from state-only demonstrations provided by an expert. The paper is well written and easy to follow, and the theoretical analysis of the method is compelling. The experiments are rather minimalistic, but they support the theoretical analysis. A drawback of this work is that it is limited to discrete MDPs, which could limit its impact, since modern RL methods increasingly target continuous, high-dimensional state spaces.
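To make the reviewed idea concrete, here is a minimal sketch of how an e-stop mechanism of this kind might be implemented. This is an illustrative reconstruction, not the paper's actual algorithm or code: the environment interface, the `EStopWrapper` class, and the fixed-penalty choice are all assumptions. The core idea it demonstrates is terminating an episode early, with a penalty, whenever the agent leaves the set of states visited in the expert's state-only demonstrations, which is what curtails unnecessary exploration in a discrete MDP.

```python
class ChainMDP:
    """Toy 5-state chain MDP: action 1 moves right, action 0 moves left.
    Reward 1.0 is given only on reaching the final state."""

    def __init__(self, n=5):
        self.n = n
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        self.s = max(0, min(self.n - 1, self.s + (1 if action == 1 else -1)))
        reward = 1.0 if self.s == self.n - 1 else 0.0
        done = self.s == self.n - 1
        return self.s, reward, done


class EStopWrapper:
    """Hypothetical e-stop wrapper: ends the episode with a fixed penalty
    as soon as the agent enters a state outside the support of the
    expert's (state-only) demonstrations."""

    def __init__(self, env, expert_states, penalty=-1.0):
        self.env = env
        self.support = frozenset(expert_states)  # states the expert visited
        self.penalty = penalty

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        if state not in self.support:
            # e-stop: truncate the episode instead of exploring off-support
            return state, self.penalty, True
        return state, reward, done


# Usage: expert demonstrations only cover states {0, 1, 2}, so stepping
# into state 3 triggers the e-stop and ends the episode with the penalty.
env = EStopWrapper(ChainMDP(), expert_states={0, 1, 2})
env.reset()
print(env.step(1))  # → (1, 0.0, False)
print(env.step(1))  # → (2, 0.0, False)
print(env.step(1))  # → (3, -1.0, True)
```

Restricting rollouts to the expert's visited states in this way shrinks the effective MDP the learner must explore, which is the intuition behind the bounded-regret result the review refers to.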