Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Although the reviewers like the paper, they are all concerned about the novelty of the results. The authors admit in their response that the novelty lies not in the estimator (IR or SIR) but in its application to RL. However, the main assumption in the paper's analysis (Assumption 1) states that the samples in the buffer are i.i.d., which is not a valid assumption in RL. Nevertheless, we think that introducing IR or SIR and using it for off-policy evaluation is important and beneficial to the community. I strongly recommend that the authors address the issues raised by the reviewers and clarify the contribution of the work, including the fact that Assumption 1 conflicts with the online nature of RL, in the final version of the paper.