Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Meta Review
The paper is well written and makes solid theoretical contributions. The reviewers are also satisfied with the rebuttal. Their common concern is that the paper lacks experimental evaluation.