Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper addresses the problem of inferring causal effects from observational data under the “no-hidden confounders” scenario. Recently there has been much interest in this problem from the machine learning community, including several papers proposing neural net architectures tailored to it. This paper proposes a new regularization scheme for the task. The idea is inspired by TMLE, a well-known method for doubly-robust estimation of treatment effects. However, TMLE is only an inspiration: the regularization scheme and resulting architecture are distinct and novel. Indeed, this is the first time I’ve seen the estimating-equations approach, common in the econometrics literature, being directly and fruitfully taken up to create a novel deep net architecture and optimization objective.

The idea is followed by an extensive set of experiments. First, it is shown that the method achieves good results on the widely used IHDP benchmark. Then, the paper uses the large ACIC 2018 benchmark, which includes 101 different and purposely diverse datasets. The proposed method shows excellent results compared to relevant baselines. Finally, the paper includes a quite in-depth (considering the space constraints) examination of what drives the method’s performance, using an ablation study and a careful simulation study.

Two of the reviewers agreed that the paper has substantial novelty. The third reviewer thought it more incremental. One of the reviewers who believed the paper is original and novel was nonetheless concerned about many specific issues relating to clarity and presentation, and about the depth of the experimental results. That reviewer also had doubts about the part of the theoretical results that pertains to consistency.

In my estimation, and after considering the authors’ response:

1. The idea is definitely novel. Indeed, I find this to be one of the most novel ideas I’ve seen in the last 3 years in the field of applying ML to causal effect inference.

2. The experiments are extensive, and definitely on par with the existing published literature in the field. The subsections “Why does Dragonnet work?” and “When does targeted regularization work?” are exactly the kind of examination I believe we would all want to see in papers proposing novel optimization schemes and architectures.

3. I found the paper to be clear overall. The specific points correctly raised by the reviewers were satisfactorily addressed in the authors’ response. I believe the point raised by one of the reviewers about consistency stems from a misunderstanding; it is explained in the paper itself and in the authors’ response.

As the reviewers pointed out, the paper would have gained from a more thorough analysis of how this specific regularization helps in the finite-sample case: how is the hypothesis class implied by the regularizer better for causal effect inference? Though this would no doubt help, I do not think it is a necessary requirement for acceptance. There is enough novelty in the idea itself. Some other improvements I would like to see are an account of which epsilons are actually chosen by the optimization procedure, and of how other baselines perform on the ACIC benchmark.
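
For readers less familiar with the terms used above, the following is a minimal sketch of a doubly-robust estimating equation for the average treatment effect and of a TMLE-style epsilon perturbation of the outcome model. It is not the authors' implementation: the function names, the simulated data, and the closed-form least-squares choice of epsilon are assumptions made for illustration (in the paper, epsilon is described as being chosen by the optimization procedure).

```python
import numpy as np


def clever_covariate(t, g):
    """h(t, x) = t / g(x) - (1 - t) / (1 - g(x)), with g the propensity score."""
    return t / g - (1 - t) / (1 - g)


def aipw_ate(y, t, q0, q1, g):
    """Doubly-robust (AIPW) estimate of the average treatment effect.

    q0, q1: outcome-model predictions under control and treatment.
    Solves mean(q1 - q0 + h * (y - q_t) - psi) = 0 for psi.
    """
    q_t = np.where(t == 1, q1, q0)
    return np.mean(q1 - q0 + clever_covariate(t, g) * (y - q_t))


def perturb_outcome_model(y, t, q0, q1, g):
    """Shift the outcome model by eps * h, with eps fitted by least squares.

    Assumption for this sketch: eps minimizes mean((y - q_t - eps * h)^2).
    Returns (eps, perturbed q0, perturbed q1).
    """
    q_t = np.where(t == 1, q1, q0)
    h = clever_covariate(t, g)
    eps = np.sum(h * (y - q_t)) / np.sum(h * h)
    return eps, q0 - eps / (1 - g), q1 + eps / g


def simulate(n=20000, seed=0):
    """Hypothetical data: one confounder x, true treatment effect of 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    g = 1.0 / (1.0 + np.exp(-0.5 * x))   # true propensity score
    t = rng.binomial(1, g)
    y = x + 2.0 * t + rng.normal(size=n)
    return x, t, y, g


if __name__ == "__main__":
    x, t, y, g = simulate()
    # Deliberately wrong outcome model (all zeros), correct propensity score.
    bad_q0, bad_q1 = np.zeros_like(y), np.zeros_like(y)
    print("naive plug-in with bad outcome model:", np.mean(bad_q1 - bad_q0))
    print("AIPW with bad outcome model:", aipw_ate(y, t, bad_q0, bad_q1, g))
    eps, q0_new, q1_new = perturb_outcome_model(y, t, bad_q0, bad_q1, g)
    print("epsilon:", eps)
    print("plug-in after perturbation:", np.mean(q1_new - q0_new))
```

After the perturbation, the average of h times the residual is zero by construction, so the plain plug-in estimate from the perturbed outcome model coincides with the doubly-robust estimate computed from that same model. Reporting the fitted epsilon, as requested above, would indicate how large a correction the outcome model needed.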