__ Summary and Contributions__: The paper formalises and solves the problem of assigning treatments under the knowledge of a causal graph that captures both the effects of treatments on outcomes and the effects of potential confounders, observed or unobserved. Three algorithms are proposed and evaluated: an exact solution based on dynamic programming, a greedy approximation for high-dimensional settings, and a model-free approach motivated by the RL literature.
Apart from the new algorithms and the experimental work, the paper is full of theoretical insights that characterise the identification of the models under observed and unobserved confounders.

__ Strengths__: - The problem is very relevant, particularly in healthcare.
- The paper is extremely well written, and full of interesting theoretical and practical results.
- The experimental section is convincing.

__ Weaknesses__: I don't see any major weaknesses in this work, and I think it is a pretty solid contribution. One can always discuss whether the causal graph that the authors assume is given is realistic, depending on the problem, but in this case I think the authors consider a scenario that is general enough.

__ Correctness__: Yes, to my knowledge the approach is correct and the code seems to be well written too.

__ Clarity__: Yes, the paper is very clear. Some sections are a bit dense, but the authors have done a good job motivating the results and introducing the notation carefully.

__ Relation to Prior Work__: Previous work is cited and I am not aware of any important piece of literature that is missing.

__ Reproducibility__: Yes

__ Additional Feedback__: I think this is pretty solid work and I don't have any major complaint or comment.

__ Summary and Contributions__: This paper aims at designing a method for searching for an effective medical treatment. The authors use the causal inference framework to formalise the problem and provide a dynamic programming algorithm to solve it. The proposed method is evaluated on both synthetic and real-world data sets, and a comparison has been conducted.

__ Strengths__: The paper studies an interesting problem.
The experiment is on a real world data set.

__ Weaknesses__: I do not see how the causal inference framework helps formulate and solve the problem.
Firstly, I am confused by the use of the potential outcome Y_s(a). If "a" is binary, it normally takes the value 1 or 0, representing treatment and control. We can only observe one of the two potential outcomes, and the other (the counterfactual potential outcome) needs to be estimated. When reading this paper, I could not find a discussion of the observed potential outcome and the counterfactual outcome, and I could not see their link to the treatment effect either.
Secondly, I doubt whether Figure 1(a) is realistic. It says that the outcome Y_T is determined only by A_T. It is surprising that the most recent treatments and outcomes do not affect the current outcome. Note that A_T is set by a doctor and is thus a manipulation in the causal graph; once A_T is set, Y_T is determined by A_T alone. I think this is very simple and unrealistic in practice.
Thirdly, the proposed method and its variants perform similarly to a model-free RL method, a naïve dynamic-programming-based method (NDP), on the real-world data set. The authors argue that the proposed methods offer a more transparent trade-off between search time and treatment efficacy, but this has not been elaborated in the experiment section.

__ Correctness__: No flaws are identified.

__ Clarity__: The paper is well written.

__ Relation to Prior Work__: Discussions on related work are reasonable.

__ Reproducibility__: Yes

__ Additional Feedback__: In causal inference, the distinction between observed outcomes and counterfactual outcomes is very important. In the paper, this distinction is not clear. The authors point to Theorem 1 to link the two. I read Section 4 again but could not find such a distinction, so I cannot see the connection.
From the authors' reply ("To solve our policy optimization problem (1), it is not necessary to impute all counterfactuals. For example, in the binary case given by R2, if an action a = 0 has been tried, and Y(0) observed, only the probability that Y(1) > Y(0) is required to solve the problem."), I am not even sure the authors' understanding is correct. Each individual has two potential outcomes, Y(0) and Y(1). When Y(0) is observed for an individual, we need to estimate his/her counterfactual potential outcome Y(1). The individual treatment effect is Y(1) - Y(0), and the average treatment effect in a population is E[Y(1) - Y(0)]. If we do not estimate the counterfactual outcome for an individual, we do not know the individual treatment effect; if we do not estimate the counterfactual outcomes of everyone in the population, we do not have the average treatment effect. I am not sure how the authors estimate the probability that Y(1) > Y(0) without imputing all counterfactuals. In this expression, Y(1) is a potential outcome, not an observed outcome. This is exactly what I would like to know: how to estimate the probability that Y(1) > Y(0) from an observational data set.
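To make concrete what the authors' claim could mean, here is a minimal sketch of how P(Y(1) > Y(0)) might be computed without imputing a point-wise counterfactual for each individual, assuming a hypothetical joint model of the two potential outcomes (a bivariate Gaussian with illustrative parameters; all names and numbers below are my own assumptions, not the authors' method). The point is that only the conditional distribution of Y(1) given the observed Y(0) is needed, not an imputed value of Y(1) itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters of an assumed bivariate Gaussian joint model
# over the potential outcomes (Y(0), Y(1)); these are not from the paper.
mu0, mu1 = 0.0, 0.5   # marginal means of Y(0) and Y(1)
sd0, sd1 = 1.0, 1.0   # marginal standard deviations
rho = 0.3             # assumed correlation between Y(0) and Y(1)

def prob_y1_exceeds_y0(y0_obs, n=100_000):
    """Estimate P(Y(1) > y0_obs | Y(0) = y0_obs) under the assumed model.

    The counterfactual Y(1) is never imputed as a single value; we only
    draw from its conditional distribution given the observed Y(0).
    """
    # Conditional distribution of Y(1) given Y(0) = y0_obs (Gaussian formula)
    cond_mean = mu1 + rho * (sd1 / sd0) * (y0_obs - mu0)
    cond_sd = sd1 * np.sqrt(1.0 - rho**2)
    y1_draws = rng.normal(cond_mean, cond_sd, size=n)
    return (y1_draws > y0_obs).mean()

# The probability shrinks as the observed Y(0) gets better,
# which is what a search-for-a-better-treatment rule would use.
print(prob_y1_exceeds_y0(0.2))
```

Of course, this only pushes the question back one level: the joint model (here, the correlation rho) is itself not identified from observational data without further assumptions, which is exactly the issue I would like the authors to address.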

__ Summary and Contributions__: Some methods are discussed to search for causally near-optimal treatments, such as the best antibiotic to use for a particular patient.
A greedy approach is recommended.

__ Strengths__: Theoretical claims are sound. I have not heard of this approach before (am I naive?), but the motivation makes sense. I was particularly taken with the empirical example--this clearly indicates how this approach might be of use to practitioners.

__ Weaknesses__: It was stated that experts still perform better than this approach on the empirical example, which is definitely a limitation. The remark was that the experts might be privy to information that the program was not, a typical problem when trying to provide programmatic advice in place of expert advice. However, if that's the case, would it be feasible to include more information in the optimization to close the gap? If not, what is the intended scenario? In a hospital, say, does the patient's information get coded into the program, the button pushed, and an antibiotic output, to be considered by the physician? Does a ranked list of potential antibiotics get output for the physician to choose from? It's unclear to me.

__ Correctness__: The theoretical claims, and the arguments in the appendix, look good. The empirical methodology is very well done.

__ Clarity__: The paper is VERY WELL WRITTEN! Thanks!

__ Relation to Prior Work__: Yes--different from previous contributions (that I know of).

__ Reproducibility__: Yes

__ Additional Feedback__: I appreciated the author feedback; I thought it was very clear.

__ Summary and Contributions__: The paper proposes a formal problem space for efficiently identifying a treatment which meets or exceeds a predefined level of efficacy. This problem space does not involve identifying a more efficacious treatment, or combination of treatments, but rather focuses on getting to an acceptably effective treatment, using an iterative process, as quickly as possible.

__ Strengths__: The problem space is clearly defined, mathematically/statistically defensible, and may spur additional experiments to surpass the baselines that the authors have established.
The manuscript itself was clearly written, easy to follow, and provided ample support for the claims that were made.
The inclusion of (a) proofs, (b) synthetic data, (c) real-world observational data are laudable.

__ Weaknesses__: The primary weaknesses of the paper do not concern the methods, rigor, or writing of the submission, but whether:
a) the specific problem for which the approach was developed is meaningful, AND
b) the results for the best approach are meaningful in the context of the application chosen by the authors.

__ Correctness__: Yes, claims logically flow from the problem statement and results. The methodology is well supported.
One nit-picky question if I may:
In Figure 2a, it is curious that the mean effect of the emulated doctor is so much lower than that of the only real-world data point, the actual doctor. This may indicate that the approach for defining the emulated doctor needs refinement and that its curve may be artificially low.

__ Clarity__: Clearly written paper. Perhaps a bit too much mixing of mathematical notation in the middle of sentences, but not out of reason.

__ Relation to Prior Work__: The current work was clearly linked to prior contributions across multiple fields.

__ Reproducibility__: Yes

__ Additional Feedback__: Overall, this is a very clear and well supported manuscript. However, given this work is being presented with specific healthcare applications in mind, it's challenging to separate the clinically unimpactful results from the rigorous definition of the problem space and methodology.
The results clearly demonstrate that the authors' approach is superior to clinical baselines and alternative approaches for the application addressed in this paper. However, the improvement offered is minuscule.
Additionally, the authors state (and I agree) that the approach may not be applicable for the great majority of clinical use cases involving observational data.
Finally, the authors also acknowledge that policies which minimize search time may lead to sub-optimal short-term response. This may be critical in many applications, including the antibiotic use case that the authors presented. Short-term response, even if insufficient in the long term, may often keep a patient alive long enough to find a more optimal solution. Getting to the finish line faster is not very exciting if the patient has died halfway through the race.
In summary, the paper is excellent but the impact is minimal.