NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, Vancouver Convention Center
Paper ID: 5293
Title: Hierarchical Decision Making by Generating and Following Natural Language Instructions

After author feedback and reviewer discussion, this paper received divergent final ratings of 7 (R1), 3 (R2), and 4 (R3). Given this lack of consensus, the AC read the paper, reviews, feedback, and discussion closely, and in this case decided to accept.

Of the three reviews, R2's rating was the most negative. R2's review noted that the paper was well written and praised the inclusion of challenging linguistic phenomena in the dataset, while raising concerns about the characterization of the language as ‘latent’ and requesting additional details (e.g., regarding the RNN encoder and the ranking approach). In the context of the review itself, R2's rating (3, ‘clear reject’) appears to be calibrated to a relatively strict standard, possibly stricter than that of some other NeurIPS reviewers.

Turning to the paper itself, in the opinion of the AC, the use of natural language to decompose complex tasks into manageable subgoals is an important research direction that is worthy of further study. As noted in the author feedback and by R1, natural language has some advantages over program specifications, e.g., interpretability, applicability to multiple tasks, and, in some cases, free availability. In this paper, the authors show, in a real-time strategy game, that models trained to generate and then follow natural language instructions outperform models trained only to directly imitate human actions. This is not a trivial finding, and it has not been widely demonstrated. The paper is clearly written, and it comes with code and detailed supplementary material. It is therefore the opinion of the AC that this paper would be of significant interest to the NeurIPS community, including groups working on hierarchical RL, options frameworks, hindsight experience replay, etc.

Having said that, R2 and R3 raise valid concerns. R2 correctly notes that a variable which is explicitly annotated is not latent. R3 raises concerns regarding the claim that the increase in performance can be attributed to the compositionality of language. It is the opinion of the AC that these issues can, and should, be addressed in the camera-ready version. Regarding compositionality, it might be advisable to make a weaker claim that does not specifically invoke compositionality, which is not studied in detail in the paper.