Reviewers agreed that this paper makes a solid contribution, and appreciated the principled derivation of the algorithm, which builds on the f-divergence formulation of imitation learning and distribution correction in off-policy RL.