This paper introduces a simple idea for MARL: using importance weights to correct for off-policy data. Generally, the reviewers agree that the paper is clear and well written. Although the main idea is very natural and intuitive, as pointed out by Reviewer 4, it is not intuitive that it would actually work. Therefore, one of the strengths of this paper is to show that intuition fails us in this case. The reviewers point out some weaknesses in the empirical sections, in particular the comparisons with other methods, and we hope that the authors will be able to address some of these in the final version of the paper.