Review for NeurIPS paper: Distributionally Robust Local Non-parametric Conditional Estimation

NeurIPS 2020

Distributionally Robust Local Non-parametric Conditional Estimation

Review 1

Summary and Contributions: This paper considers the problem of non-parametric conditional estimation such as nearest neighbor regression in the presence of adversarial examples. Following some existing literature, it formulates the problem as distributional robust optimization and derives a solution. For conditional mean estimation the solution reduces to a convex program. Some analysis of rate of convergence is then provided under the condition that the conditional distribution is smooth.

Strengths: + The problem is timely. While there has been quite a bit of work on adversarially robust nearest neighbors and other non-parametric (see below for missing literature review), the problem is far from solved. This paper presents a reasonable solution. + The proposed method appears to be technically correct. + While DRO and perturbations wrt infinity Wasserstein balls have been used in the context of adversarial examples and classification with neural networks before, it is a new approach for non-parametric estimation. So the formulation and the results are a novel alternative way to look at the problem.

Weaknesses: - The approach seems to be somewhat non-transparent -- for example, for nearest neighbors, it is a little unclear what exactly the solution is doing, and why averaging over such a large ball is needed. I would recommend adding some discussion on this. - While the paper's approach is different, it is missing comparisons to recent theoretical and empirical work on robustness of nearest neighbors and other non-parametric methods. See below for more details.

Correctness: Methodology appears to be correct although I did not verify the proofs.

Clarity: Mostly well-written -- although the approach could be made more transparent.

Relation to Prior Work: The paper is missing references to and comparisons with important related works on adversarial examples for nearest neighbors and other non-parametric methods. For example, [1] provides a direct convergence rate for robustness of nearest neighbors to adversarial examples; it would be good to discuss how the bounds in this paper are different. Similarly, the convex program proposed by this paper feels similar to [5] as well as [2]; it would be good to discuss the relationship between these works. [1] Analyzing the robustness of nearest neighbors to adversarial examples, ICML 2018. [2] On the Geometry of Adversarial Examples, https://arxiv.org/abs/1811.00525 [3] Robustness for non-parametric methods: a defense and an attack, AISTATS 2020. [4] When are non-parametric methods robust?, ICML 2020. [5] Evaluating the Robustness of Nearest Neighbor Classifiers: A Primal-Dual Perspective, https://arxiv.org/abs/1906.03972

Reproducibility: Yes

Additional Feedback: Overall this is a solid paper, and a good fit for the conference. The paper could be significantly strengthened by making the approach more transparent and citing, and discussing directly related previous work. -- The authors have addressed my questions, and my opinion (accept) remains unchanged. I urge the authors to add these discussions and the context of prior work to the final version of the paper.

Review 2

Summary and Contributions: Upon reading the response and the other reviews, I raised my score. I am still not entirely convinced about the strength of empirical evaluation. Even though MNIST is a benchmark dataset, the proposed task does not seem like a natural setting for non-parametric conditional estimation. ------------- The paper introduces a new estimator for conditional statistics (such as the conditional mean). The estimator uses distributionally robust optimization to ensure robustness of the estimator. The authors demonstrate that under certain assumptions the estimator is efficiently computable, and they also provide generalization and consistency guarantees.

Strengths: The analysis of the computational efficiency of the proposed estimator is interesting and novel.

Weaknesses: While I agree that many tasks in machine learning can be cast as conditional estimation problems, I would have liked to see a concrete setting to which the estimator is particularly well suited, and where it can be shown (either theoretically or empirically) to perform competitively with existing methods. I found the empirical evaluation to be very limited, because: 1) both the studied tasks (mean estimation with synthetic data, and digit estimation on MNIST) are of minimal practical interest, and 2) the numerical results are not very convincing. The theoretical contributions of the paper involve demonstrating that the proposed estimator is under certain settings tractable and consistent, but fail to illustrate its strengths.

Correctness: Yes.

Clarity: Yes.

Relation to Prior Work: Much of the discussion of related work feels out of place and overly broad.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: This paper discusses the problem of robust non-parametric conditional estimation for mean, quantiles, and alike. They formulate the problem using infinity Wasserstein ball around the empricial distribution and under some conditions prove consistency, upper bounds and computational complexity of the problem. The theoretical arguments are interesting on their own, and they also perform experiments on MNIST and synthetic examples and compare their approach with other estimators.

Strengths: The theory developed is nice, especially I found Proposition 2.2, Theorem 2.3, and Proposition 3.4 very interesting. It is a novel application of simple methods in optimization applied to a statistical problem. The problem is at the very heart of machine learning and statistics and can have huge impacts on the community. The numerical experiment was very insightful and showed how other estimators behave in terms of robustness.

Weaknesses: Some theory developed is incomplete and needs further toughts. The result of Proposition 3.1 just provides an upperbound on the expected value of the estimate, while no lowerbounds are provided. Moreover, the result of Example 3.3 is not very interesting, as it is under very tight conditions (linearity of the conditional expectation with respect to x), and a common use-case is not provided. The experimental results on MNIST is not very pleasing and does not show the superiority against NW estimate.

Correctness: The results, as far as I understood and skimmed through the appendix, seems correct, however, I did not verfied the proofs line by line.

Clarity: Yes, it is a well written paper.

Relation to Prior Work: Yes, it is clear what others miss in terms of robustness or their assumptions.

Reproducibility: Yes

Additional Feedback: ===== EDIT ===== after reading the authors' rebuttal, I am convinced that the score I gave is fair.