Sun, Dec 8th through Sat, Dec 14th, 2019 at Vancouver Convention Center
The paper proposes a method for stopping unnecessary exploration in RL with a bounded regret guarantee. The stopping method, called e-stop, learns from state-only demonstrations provided by an expert. The paper is well written and easy to follow, and the theoretical analysis of the method is compelling. The experiments are rather minimalistic, but they support the theoretical analysis. A drawback of this work is that it is limited to discrete MDPs, which could limit its impact, since modern RL methods increasingly target continuous, high-dimensional state spaces.
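To make the reviewed idea concrete, here is a minimal sketch of how an e-stop mechanism of this kind might be implemented. This is an illustrative reconstruction, not the paper's actual algorithm or code: the environment interface, the `EStopWrapper` class, and the fixed-penalty choice are all assumptions. The core idea it demonstrates is terminating an episode early, with a penalty, whenever the agent leaves the set of states visited in the expert's state-only demonstrations, which is what curtails unnecessary exploration in a discrete MDP.

```python
class ChainMDP:
    """Toy 5-state chain MDP: action 1 moves right, action 0 moves left.
    Reward 1.0 is given only on reaching the final state."""

    def __init__(self, n=5):
        self.n = n
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        self.s = max(0, min(self.n - 1, self.s + (1 if action == 1 else -1)))
        reward = 1.0 if self.s == self.n - 1 else 0.0
        done = self.s == self.n - 1
        return self.s, reward, done


class EStopWrapper:
    """Hypothetical e-stop wrapper: ends the episode with a fixed penalty
    as soon as the agent enters a state outside the support of the
    expert's (state-only) demonstrations."""

    def __init__(self, env, expert_states, penalty=-1.0):
        self.env = env
        self.support = frozenset(expert_states)  # states the expert visited
        self.penalty = penalty

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        if state not in self.support:
            # e-stop: truncate the episode instead of exploring off-support
            return state, self.penalty, True
        return state, reward, done


# Usage: expert demonstrations only cover states {0, 1, 2}, so stepping
# into state 3 triggers the e-stop and ends the episode with the penalty.
env = EStopWrapper(ChainMDP(), expert_states={0, 1, 2})
env.reset()
print(env.step(1))  # → (1, 0.0, False)
print(env.step(1))  # → (2, 0.0, False)
print(env.step(1))  # → (3, -1.0, True)
```

Restricting rollouts to the expert's visited states in this way shrinks the effective MDP the learner must explore, which is the intuition behind the bounded-regret result the review refers to.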