Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
The paper presents HAAR, a hierarchical reinforcement learning approach based on the idea of using the advantage (temporal-difference error) of the high-level controller to provide the reward signal for the lower layer. The reviewers judged this approach to be novel, and the empirical results are promising. Analytical results provide improvement guarantees similar to those of a base algorithm such as TRPO.

Several areas for improvement were mentioned, and many of these were addressed in the rebuttal. For example, the reviewers were pleased to see the additional experiment showing performance from random skill initialization. The questions remaining after the rebuttal were as follows. First, it was not clear to what extent the approach may require full observability, and whether the present experiments were specifically constructed with this in mind. A clear specification of each observation space should be provided in the camera-ready version, and limitations of the approach (e.g., with respect to partial observability) should be discussed. In addition, there are remaining questions about the precise assumptions underlying the presented analysis; the current claim is too broad, as it is not qualified by the specific assumptions made.

Overall, the paper is judged to make a valuable contribution. I urge the authors to carefully consider all reviewer suggestions to improve the camera-ready version of the paper.
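To make the core idea concrete, here is a minimal sketch of reusing the high-level TD error as the low-level reward. This is an illustrative assumption about the mechanism described above, not the authors' implementation; the function names, the even split of the advantage over the k low-level steps, and the toy value function are all invented for illustration.

```python
import numpy as np

def td_error(value_fn, s, s_next, reward, gamma=0.99):
    """One-step TD error (advantage estimate) of the high-level value function."""
    return reward + gamma * value_fn(s_next) - value_fn(s)

def low_level_rewards(value_fn, s, s_next, reward, k, gamma=0.99):
    """Distribute the high-level advantage estimate evenly over the
    k low-level steps that make up one high-level step (a hypothetical
    scheme for illustration)."""
    delta = td_error(value_fn, s, s_next, reward, gamma)
    return np.full(k, delta / k)

# Toy example with a hand-crafted value function: V(s) = s.
v = lambda s: float(s)
r = low_level_rewards(v, s=0.0, s_next=1.0, reward=0.5, k=5, gamma=0.9)
# delta = 0.5 + 0.9 * 1.0 - 0.0 = 1.4, so each of the 5 steps gets 0.28.
```

The point of such a scheme is that the low-level policy is rewarded exactly insofar as its behavior improved the high-level controller's outlook, which is what ties the two layers' objectives together.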