NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
This submission drew a great deal of discussion -- primarily on the point of the role of learning. All reviewers agreed that the approach had the potential to learn interesting, non-trivial things, but did not feel that the current experiments demonstrated this effectively -- despite strong performance on the task. Some examples of questions that were not answered by the main draft but came up in the discussion:

[Training Data] The training data provides edges in the dependency graph, subgoals, and pairs of predicate values and images. One question was whether the union of the seen dependency graphs constitutes the entire true underlying graph. Similarly, do all predicate-object pairs occur? These questions all get at "what is left to generalize to after training has finished?"

[Generalization] Following up on that thought, where is the evidence of learned, generalized representations? Does the satisfaction network perform well on known predicates and objects in never-before-seen configurations? What about the reachability or dependency networks on combinations of known but never-before-paired subgoals? Does the precondition network understand that cooked(X) requires On(X, pot) and Activated(stove) regardless of X? What if X is new?

[Speed] One potential benefit of the precondition network is speed -- having learned useful / likely preconditions from the demonstrations. Given that the demonstrations were generated using A* over the planning domain, has the precondition network learned to prune out inefficient paths to goals?

[Data Efficiency] While the demonstration counts were provided in the supplementary material, it would be good to know how the performance of the approach varies with them.

[Baselines] There are no strong baselines here for understanding the generalization of the learned components. The results certainly show that planning makes sense and that the decomposition into separate networks is a good one, but not whether the learning is generalizing as described above. An ideal baseline would be something that can leverage the demonstrations and the planning algorithm without the possibility of generalizing. Something like a rule-learning algorithm, limited to observed transitions, might do the trick (a sketch of one such baseline follows at the end of this review). For inference, the satisfaction network could still be used to estimate state.

What is clear is that the proposed method can effectively learn from demonstrations without explicitly learning the planning domain's action rules the way classical planners do. This contribution is itself valuable, even though the source for the detailed demonstrations may not yet be available. It seems likely that this paper would spur further work in this area. The pair of experimental settings makes for convincing results. I do, however, recommend the authors consider studying the questions posed above to demonstrate that the models have learned non-trivial things about the environments beyond what is directly supervised by the demonstrations -- after all, it is somewhat clear that a large enough neural network with an infinite data generator like this could learn the set of binary and unary prediction tasks in a small synthetic environment.
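To make the [Baselines] suggestion concrete, below is a minimal Python sketch of the kind of memorization-only rule learner we have in mind. The demonstration interface is an assumption on our part (the Demo objects, their transitions attribute, and the subgoal/precondition encoding are all hypothetical, not details from the submission). By construction it can only replay observed transitions, so any gap between it and the learned precondition network would be attributable to generalization.

    # Hypothetical sketch of a memorization-only precondition baseline.
    # All names and the demonstration format are assumptions, not details
    # taken from the submission.
    from collections import defaultdict, namedtuple

    class TabularPreconditionBaseline:
        """Records exact subgoal -> precondition-set transitions seen in
        demonstrations. Queries for unseen subgoals return None, so the
        baseline cannot generalize by construction."""

        def __init__(self):
            # Maps each observed subgoal to the set of precondition
            # sets that co-occurred with it in the demonstrations.
            self.table = defaultdict(set)

        def fit(self, demonstrations):
            # Each demonstration is assumed to expose (subgoal,
            # preconditions) pairs extracted from the planner traces.
            for demo in demonstrations:
                for subgoal, preconditions in demo.transitions:
                    self.table[subgoal].add(frozenset(preconditions))

        def predict(self, subgoal):
            # Exact-match lookup only; the min-by-size tie-break is an
            # arbitrary heuristic over observed precondition sets.
            candidates = self.table.get(subgoal)
            return min(candidates, key=len) if candidates else None

    # Toy usage with a hypothetical demonstration record:
    Demo = namedtuple("Demo", "transitions")
    demos = [Demo(transitions=[
        ("cooked(potato)", {"On(potato, pot)", "Activated(stove)"}),
    ])]
    baseline = TabularPreconditionBaseline()
    baseline.fit(demos)
    baseline.predict("cooked(potato)")  # returns the observed precondition set
    baseline.predict("cooked(tomato)")  # None -- never seen, cannot generalize

Plugging such a baseline into the same planner would isolate the contribution of learning: wherever the full model succeeds and the table lookup fails, the networks must be generalizing beyond the demonstrated transitions.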