Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
All reviewers recommend accepting the paper. The authors' response addressed most of the reviewers' concerns. While the AC recommends accepting the paper, the AC encourages the authors to consider the comments of reviewer 1, specifically those regarding the literature review and the hyper-parameter selection in the experimental section. Changing only the backup mechanism while keeping all other hyper-parameters fixed as in the Nature DQN model is indeed a good experimental setup. However, the optimal operating regime might differ between models (even when they share architectures and training protocols): for instance, we could 'afford' a larger learning rate if we have a better backup mechanism. Furthermore, it would be informative to include experiments longer than 10M frames (at least on some key games). In any case, the paper is novel, and it is certainly valuable from a practical perspective to successfully implement the backward update idea in the deep reinforcement learning domain (with the current empirical evaluation). The theoretical convergence analysis of the algorithm is also valuable.