The paper proposes an off-policy policy iteration scheme for factored action spaces in which different actions (action dimensions) are persistent at different frequencies. The reviewers agree that the proposed approach is sound, novel, and well motivated, and that the paper is well written. There is some disagreement about how broad the range of applications of the proposed method is and what this means for the paper's impact (R5); some concerns regarding the scalability of the approach (R5); and a desire for evaluation on environments not designed by the authors (R2). The AC believes that, although the application domain may be somewhat niche and the proposed method is the result of fairly straightforward reasoning about basic properties of MDPs (this is not meant negatively; such basic ideas are often overlooked), on balance the paper will be useful and of interest to the community. The authors are encouraged to take into account the feedback from the reviewers and to expand the discussion of related work; the lack of a discussion of semi-MDPs seems like an important oversight. (Intra-option learning may be of particular relevance to the off-policy learning scheme proposed by the authors.) Furthermore, the authors are encouraged to address the scalability concerns raised by R5.