NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 2684
Title: Policy Evaluation with Latent Confounders via Optimal Balance

Reviewer 1

# Originality

The problem, i.e., policy evaluation in the presence of unobserved confounders, is not entirely new, and there is a great deal of previous work on this topic. But it is worth noting that this paper provides some rigorous theoretical results, albeit under some strong assumptions.

# Quality

The theoretical part of this paper is technically sound, but the experimental part appears not that convincing. The setup seems a bit simple. I think it is necessary to conduct one or two more complicated or real-world experiments, e.g., a higher-dimensional Z, more complicated distributions over Z, a more complex function for Y, etc.

# Clarity

The paper is well organised and easy to follow. The notation is also presented clearly.

# Significance

Although the results are of some interest, their real-world applicability is quite limited due to strong assumptions. For example, the authors assume access to an identified latent confounder model, which is almost impossible in practice. Also, some assumptions in Section 3.3 are unlikely to hold in real-world applications. As such, the proposed approach is probably of limited importance in practice.

Reviewer 2

Update: I am happy with the authors' response. Thus I will keep my original score.

Original comments: In this paper, the authors propose a new method for policy evaluation when one only has proxies for the true confounders. In particular, they show that the policy value can be consistently estimated at a root-n convergence rate even when the outcome function is unknown.

Pros: Policy evaluation in the presence of latent confounders is a very important question in contextual bandits. Although there is a large body of literature assuming unconfoundedness, the problem of developing efficient policy evaluation methods that allow for latent confounders has seldom been investigated. It is good to see that this paper fills this gap.

Cons: In this paper, the authors only provide explicit bounds and algorithms when the function class is an RKHS. It would be interesting to see explicit bounds and an implementation of such an algorithm for other classes of models, such as neural networks.

Reviewer 3

This paper aims to evaluate a policy when the treatment and the outcome are confounded by some unobserved variables. In general, this problem is impossible, and the authors assume that there are some proxy variables for the hidden confounders. The authors propose to solve the policy evaluation problem by designing a weighted estimator, which computes a weight for each instance (an outcome, treatment, proxy triple). When there is no unobserved confounder, inverse propensity weights make the weighted estimator unbiased irrespective of the outcome. The authors first show that, in the presence of proxy variables, there exist weights that achieve an unbiased evaluation of the policy. However, such weights depend on the mean outcome functions and cannot be computed without knowledge of the outcome function. The existence of such weights motivates the authors to consider minimizing the conditional mean squared error (CMSE) over a class of functions. If the weights achieve an error of O(1/n) for the CMSE and the outcome function belongs to the class of functions considered, then the resulting weighted estimator will be consistent. In order to handle the CMSE, the paper then considers a suitable upper bound, with a loss of at most O(1/n), and tries to minimize this upper bound. The upper bound has many interesting properties; in particular, it becomes a quadratic function of the weights when the class of functions is a ball in an RKHS. This immediately gives a quadratic program (QP) for deriving the weights when the mean outcome functions come from an RKHS, and consistency follows. Subsequently, the authors perform a simulation study and compare their method with various other policy evaluation methods. The proposed method appears to converge with an increasing number of samples, whereas methods like inverse propensity scoring or direct substitution do not. I find that the paper provides a significant contribution to the literature on policy evaluation.
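To make the RKHS case concrete: when the function class is an RKHS ball, the worst-case balance objective reduces to a quadratic in the weights. The following is a minimal illustrative sketch, not the paper's actual algorithm. It assumes an RBF kernel, replaces the constrained QP with a simple ridge-regularized version whose minimizer has a closed form, and uses made-up Gaussian data; the names `rbf_gram` and `balance_weights` are hypothetical.

```python
import numpy as np

def rbf_gram(A, B, bandwidth=1.0):
    """RBF (Gaussian) kernel Gram matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def balance_weights(X_sample, X_target, bandwidth=1.0, reg=1e-2):
    """Ridge-regularized kernel-balancing weights (a simplified stand-in for
    the paper's QP): minimize w^T K w - 2 k^T w + reg * ||w||^2, whose
    minimizer solves the linear system (K + reg*I) w = k."""
    K = rbf_gram(X_sample, X_sample, bandwidth)               # n x n Gram matrix
    k = rbf_gram(X_sample, X_target, bandwidth).mean(axis=1)  # target mean embedding
    n = X_sample.shape[0]
    return np.linalg.solve(K + reg * np.eye(n), k)

# Toy check: the sample is drawn from N(0, 1) while the target population is
# shifted to N(1, 1); balancing should upweight sample points near the target.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))       # observed sample
X_star = rng.normal(1.0, 1.0, size=(200, 1))  # shifted target population
w = balance_weights(X, X_star)
reweighted_mean = float((w * X[:, 0]).sum() / w.sum())
```

In this toy setting the reweighted sample mean moves from roughly 0 toward the target mean of 1, which is the qualitative behaviour the reviewer's summary describes: the weights are chosen so that the weighted sample "looks like" the target distribution for every function in the RKHS ball.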
The general framework of minimizing an adversarial balance objective might be useful in other settings, such as instrumental variables. I thought the paper was well written; the motivation behind the theorems and lemmas was clear, and I could follow the arguments. I have a couple of questions and some minor suggestions for the authors.

1. Following Theorem 4, the text mentions a matrix \Sigma. I don't think such a matrix was introduced in the text.
2. In the simulation setting, the proxy variables have dimension 10 and the latent variable has dimension 1. Do the authors expect similar results for high-dimensional latent variables?
3. The proxy variables can also be only weakly influenced by the latent confounders. It would be nice to see how robust the proposed method is to the strength of the influence of the unobserved confounders on the proxy variables. Even a discussion in this regard would be helpful for readers.
4. I am a bit surprised to see that there is no edge from X to T. Although the outcomes are completely determined by the treatment and the hidden confounder (say, intelligence), the proxies (e.g., test scores) might affect the treatment assignment. Can this be incorporated into the current framework?