Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper presents an appealing idea: combining current max-entropy methods in RL with Monte-Carlo Tree Search. A theoretical result shows an improved rate of convergence, while empirical results show improved sample efficiency. The initial reviews were quite positive; I noted only a small number of issues, raised in the reviews of R1 and R3. In our discussion after reading the author feedback, R3 noted that some of his concerns had not been addressed. R2 replied that these concerns are relatively minor and can be addressed in the final version. With final scores of (8, 7, 6), the paper has a good chance of being accepted at NeurIPS.