Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Originality: No prior work attempts HER in visual domains without explicit goal conditioning; this work is the first to do so. The design and training of HALGAN seem rather standard and similar to InfoGAN. Quality: The paper is technically sound, and the performance of HALGAN + HER is validated in two environments. Clarity: The paper is well written and easy to follow. The details are adequate for readers to understand and reproduce the work. Significance: The authors hope this method could be a step toward reducing the sample complexity of RL training and eventually toward training agents in the real world, but the environments shown in the paper are primitive. It is easy to synthesize images in these two environments, while it is not clear how HALGAN will perform in diverse and realistic 3D indoor scenes.
Originality: The authors make clear the distinction from related work. They are not the first to integrate GANs for generating goals in RL, but they do so in a new and interesting way. Quality: The comparison with baselines is thorough, showing the benefit of this approach for these domains. However, page 7 claims that HALGAN RL agents need fewer samples than standard RL, yet HALGAN must itself be exposed to enough samples of successful trajectories to be able to hallucinate goal states effectively. Are the 1000 samples used to train the HALGAN shown in Figure 3(f) 1000 examples of /goal/ states, or just states in general? How long does the agent have to explore before a HALGAN can be trained? This discussion needs to be clearer and more careful. A second question is why the generator of HALGAN does not take as input the s_t that it is trying to modify. Without seeing the original state, how can HALGAN generate a goal if the goal is potentially occluded by something in the state? If this can never happen, are these environments realistic? Will the HALGAN approach work in more complex environments? Clarity: Overall the paper is very clear; the diagram in Figure 2 and the rest of the figures make the contributions easy to understand. The distinction from prior work is clear, as is the motivation. Minor point: there is a missing period at the end of the paragraph that starts Section 5 (ending with "hallucinations of the goal are discussed next"). Significance: The idea makes sense and could be useful; however, it is not clear how often real-world tasks have visible goals that must be added to the state. Further discussion of real-world examples where this is the case could help bolster the significance of the paper.
The paper extends HER to visual environments where unsuccessful trajectories can be hallucinated to appear successful by altering the agent’s observations using a generative model. While the idea presented in the paper is interesting, I have questions about the scalability of this approach. Specifically, the approach requires training a generative network to produce observations that contain the goal from the environment. I am interested to know how the authors intend to scale this approach up to more complex visual domains such as Atari, DeepMind Lab, etc. In these domains, hallucinating a goal image that lies within the space of observations of an environment seems difficult, and how to do so is unclear from the approach presented in this paper.
After reading the rebuttal: The main strengths and weaknesses of the paper are as follows (from my perspective):
* (strength) The authors introduce a generative approach for applying Hindsight Experience Replay (HER) in visual domains: the idea is simple and has the potential to improve current deep RL methods.
* (weakness) Currently, the paper does not seem to include a detailed discussion of how the generative model was trained to produce images containing the goal information. The authors do clarify this in their feedback, and it would be useful if they also added this discussion to the next version of the paper. More importantly, including this discussion would be useful for the deep RL community.
* (weakness) Their current approach to training the generative model relies on manually annotating the goal images, which may prevent the algorithm from scaling. Addressing this could make their approach more impactful.