Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
It would be interesting to increase the complexity of the tasks. Also, I'd love to see a comparison to other related methods and an explanation of why you're not using language to describe the tasks, e.g. https://arxiv.org/abs/1712.07294
The authors propose a new method called compositional plan vectors (CPV) to handle complex tasks which involve compositions of skills. The method is based on a simple but effective idea of using the addition of task representation vectors to compose tasks. In my opinion, learning complex tasks involving many subskills is an important and challenging problem, and the proposed method provides a novel and interesting way to approach it. The paper is well written and easy to follow. The authors position their method well with respect to prior work. The experiments test the proposed model in 2 diverse simulation environments. The proposed model is shown to perform better than a competitive baseline in both environments. CPV also outperforms the baselines in generalizing to unseen longer tasks. One of my concerns about the submission is the lack of motivation for introducing new environments. Using existing environments makes it easier to compare with prior methods and helps reproducibility. The authors should justify the need for new environments.
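As a toy illustration of the additive composition described above, the sketch below represents a task as the sum of its skill vectors, so that subtracting the embedding of the completed part leaves a vector for what remains. This is not the authors' implementation: the skill names and the fixed one-hot vectors are hypothetical stand-ins for embeddings that the method would learn.

```python
# Hypothetical, hand-picked skill embeddings (assumption for illustration only;
# in the reviewed method the embeddings would be learned from trajectories).
SKILLS = {
    "chop_tree": [1.0, 0.0, 0.0],
    "build_house": [0.0, 1.0, 0.0],
    "mine_ore": [0.0, 0.0, 1.0],
}


def add(u, v):
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    """Element-wise difference of two vectors of equal length."""
    return [a - b for a, b in zip(u, v)]


def embed_task(skill_names):
    """Embed a composite task as the sum of its skill vectors."""
    total = [0.0, 0.0, 0.0]
    for name in skill_names:
        total = add(total, SKILLS[name])
    return total


# Embedding of the full reference task and of the part completed so far.
reference = embed_task(["chop_tree", "build_house", "mine_ore"])
progress = embed_task(["chop_tree"])

# The difference is the vector a policy could be conditioned on:
# it equals the embedding of the skills still to be performed.
remaining = sub(reference, progress)
```

Because the composition is a plain sum, the embedding in this sketch does not depend on the order in which skills are listed, and longer tasks are represented in the same space as shorter ones.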
Summary
The paper proposes a new method for better and more efficient generalization to more complex tasks at test time in the setting of one-shot imitation learning. The main idea is to condition the policy on the difference between the embedding of a reference trajectory and the embedding of the agent's partial trajectory (for the same task, but starting from a potentially different state of the environment).

Main Comments
I found the experimental section to be slightly thin, and I would like to see how this method performs on at least one more complex task. It would also be good to include a discussion of the types of environments where we can expect this method to perform best and where we can expect it to fail or perform worse than other relevant algorithms.
I also think more comparisons with other approaches for one-shot imitation learning (such as Duan et al. 2017) are needed to strengthen the paper. How does CPV compare to other imitation learning algorithms such as Behavioral Cloning, DAgger, or GAIL?
I believe that the paper's related work section could be significantly improved. It seems to be lacking some important references on learning with task embeddings / goal representations, as well as works on multi-task learning, transfer learning, and task compositionality. Some examples of relevant references are:
C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine. Learning modular neural network policies for multi-task and multi-robot transfer. In ICRA, 2017.
B. C. da Silva, G. Konidaris, and A. G. Barto. Learning parameterized skills. In ICML, 2012.
T. Schaul, D. Horgan, K. Gregor, and D. Silver. Universal value function approximators. In ICML, 2015.
S. P. Singh. The efficient learning of multiple task sequences. In NIPS, 1991.
Haarnoja et al. 2018. Composable Deep Reinforcement Learning for Robotic Manipulation.
Hausman et al. 2018. Learning an Embedding Space for Transferable Robot Skills.

Minor Comments
You mention that the TECNet normalizes the embedding. Is the embedding normalized in the same way in your model? It would be good to add some experiments to tease apart the effects of this normalization. How much of the difference in performance between CPV and the TECNet is due to this normalization?
Some details regarding the state representation, the model architecture, and the training parameters are missing.
It would be interesting to see whether this method can be applied in RL contexts as well, and how well it generalizes to new tasks.

Clarity
The paper is clearly written overall.

Significance
The problem of generalizing to unseen tasks is a very important and unsolved one in the field. The paper seems relevant for the community, although I believe the method could be applied outside of imitation learning as well. While the paper doesn't introduce a completely novel concept or approach, the way it combines the different building blocks seems promising for obtaining a better-performing method for one-shot imitation learning and generalization to unseen tasks. I believe people in the community would learn something new from reading the paper and could build on its findings.

Originality
The paper doesn't introduce a fundamentally novel idea, but the way it combines previous ideas is not entirely trivial, and the authors demonstrate its benefits compared to relevant ablations.

Quality
The paper seems technically correct to me.

----------
UPDATE: I have read the rebuttal and the other reviews. The authors addressed most of my concerns, added comparisons with a strong baseline, and evaluated their approach on another environment. The results show that their method is able to do well on certain compositional tasks on which the baseline fails to gain any reward. For all these reasons, I have increased my score by 1 point (total of 7).