This paper tackles the important problem of learning from sparse or delayed rewards. Reviewers liked the simplicity of the proposed approach and found the results impressive. The rebuttal addressed most of the main concerns and added important missing baselines, and the reviewers reached a consensus that the paper should be accepted. The authors are encouraged to improve the final manuscript based on the reviews. Results on widely studied sparse-reward tasks would have made the paper even stronger, but the current evaluation warrants acceptance as a poster.