Reviews: Efficient Nonparametric Smoothness Estimation

NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona

Paper ID:	593
Title:	Efficient Nonparametric Smoothness Estimation

Reviewer 1

Summary

This paper estimates s-Sobolev inner products of two distributions, from sets of independent samples from each distribution. When used to estimate Sobolev norms, a single set is split into two. The main constribution is in obtaining computationally tractable optimal (for some ranges of s) estimators. The techniques are not fundamentally novel. But their use is interesting, and reasonable assumptions are used to simplify the problem. In particular, a compact support allows working with discrete Fourier frequencies, and the estimation is done entirely in the frequency domain, using the fact that the functionals of interest are expectations, replacing them with empirical means. The bias-variance control is done using bandlimiting. Some asymptotic analysis is also provided, and used to obtain two-sample tests.

Qualitative Assessment

- This paper is generally well written, the results seem solid and potentially impactful. - Some of the work referred to actually derives adaptive estimators, with respect to the ambient smoothness s', when the latter is not known in advance. One caveat here is that to obtain the best rates, s' should be known, and thus the estimator is not adaptive as is. This is brought up in the Conclusions, but that discussion is a bit vague. - The comparisons in Lines 258-265 is particularly insightful. More elaboration may be a good thing. Some remarks: - The caption of Table 1 says "for which..." twice. The letter k is used in two different senses, change one. - There's no need in this context in conjugating f on Line 63. But in Equation (5), shouldn't \psi_z be conjugated? - Line 142, say a word about the W^{2s,4} pseudonorm. - In Equation (14), typo: change H^2 to H^s. - Line 150-151, the consistency goes through only if s < s' - Line 211, statistical (risk) for computational efficiency - Line 244, if and only (if), double parentheses - To be clear Theorems 4/5 are asymptotic. So there is no finite-sample analysis of the power of the two-sample tests in the paper. So the statement on Line 254 may be a bit strong.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 2

Summary

The main subject of this paper is the analysis of a new way to estimate Sobolev quantities of densities from samples. Instead of plugging in an estimator of the density and using the associated quantities of these estimators, the authors propose and analyse an estimation scheme based on a truncated Fourier decomposition. A study of the bias and variance shows that this provides minimax optimal statistical performance. Additionally, this estimator is computationally efficient, in comparaison with existing methods.

Qualitative Assessment

Overall, I find that this is a very good idea. It is nice that it has a minimax optimal statistical performance. It is almost surprising that it has not been considered in the past, as it seems quite simple. The most important part for me is the novelty of this approach.

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

Reviewer 3

Summary

In this article, the authors provide a non parametric estimation for inner product between density when they belong to a Sobolev space. The estimator is based on a Fourier decomposition and the authors show finite sample bounds and asymptotic results. Finally, a small empirical study is presented and the link with two-sample testing is made.

Qualitative Assessment

The main document is well-written and clear overall. But there are a lot of typos in the supplementary file that made the proofs hard to read-proof. For example, the name of the theorem do not match with the main document, p2 eq (3) |y| is missing, eq(9) remove C_1 The beginning of section 6 is very confuse, my advice would be to change (1) and (2) by i) and ii). In my opinion, the main weakness of the paper is that the estimator is very classical, at least for density estimation. As a matter of fact, all the proofs consist in applying a 80's theorem. Moreover, I do not think that this estimator is really interesting for real data, the numerical section is too short and on too limited (distance between Gaussian in 1D or 3D). I suggest to make a stronger numerical section, on real data sets with an example of two-sample testing.

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

Reviewer 4

Summary

This paper focuses on estimation of Sobolev quantities of distribution functions, which is a generalization of L^2 quantities. Specifically, a non-parametric estimator from samples is proposed and studied. The authors analyze the bias, variance and asymptotic properties of the estimator under certain conditions and run simulations to show the performance of the estimator.

Qualitative Assessment

Pros: 1. The authors provides a clear definition and explanation of Sobolev quantities, and make good connection with the classical L^2 quantities. This helps readers to easily understand the problem. 2. The authors have very solid mathematical abilities, which make them able to prove the theoretical properties of the estimator. This paper is a very strong mathematical paper. Cons: 1. My concern is mainly about the practical usage of Sobolev quantities. As I know, there are no downstream application of Sobolev quantities, which makes this paper a little bit uninteresting in the NIPS community. I believe that this paper will attract more attention in statistic community. 2. A minor question: Is that possible to prove a minimax lower bound on the l-2 error?

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 5

Summary

The paper proposed a non-parametric estimator for Sobolev quantities (norms, inner products, and distances) of probability density. The estimator operates in the Fourier (frequency) domain, and it unbiasedly estimates the related quantities that are truncated to a certain frequency. For the theoretical aspect, the convergence rate of the mean square error is provided, which matches several previous results. Moreover, the asymptotic distributions are derived for the inner product estimator and the distance estimator when $p=q$. For the practical aspect, the computational complexity is emphasized which in linear w.r.t. the sample size $n$. This makes the proposed estimator more computationally tractable than previous estimator. Also, simulations are provided that validates the theory.

Qualitative Assessment

Sobolev quantities (norms, inner products, and distances) of probability density are important to a lot of applications in machine learning and signal processing. This paper proposed a non-parametric estimator for such quantities with the computational complexity linear in the sample size $n$. The convergence rate of the mean square error, and the asymptotic distributions are provided for such estimator. Moreover, a few simulations are provided that validates the theory. Generally speaking, I think this is a pretty nice theoretical paper. The idea is simple and well presented: the estimator looks at the probability distribution in its Fourier domain, and provides an unbiased estimate of the desired quantities that are truncated to a certain frequency. Here are a few suggestions & comments. 1. My biggest concern is the usefulness of the proposed estimator. Choosing to estimate the frequency-truncated version of the Sobolev quantities seems a natural idea to me. However, I am a little less convinced why we should use the proposed estimator in practice instead of using some other methods. For example, we can look the quantity in some other domain rather than the Fourier domain. The only reason that convinces me is its linear computational complexity. I would be better convinced if some optimality guarantees are presented. For example, some converse results that show the estimator is indeed rate optimal. Or some real world data experiments that show the proposed estimator is robust in practice. Some intuitions suggesting the Fourier domain is reasonable to look at will also help. Also, I admit I am not an expert in the non-linear functional estimation of continuous distributions. It would be helpful if the authors can clarify a little more why estimating the Sobolev quantities, especially higher order Sobolev quantities, are very useful. A few examples will work here. 2. The paper uses Gaussian distribution and uniform distribution for simulation. It seems a little weird to me to use the Gaussian distribution, which violates the bounded support assumption. Also, it would be better to use some less smooth distributions. Of course, the paper would be much stronger if there were some real-world data experiments presented. 3. There are a few typos in section 7, which, of course, are just some minor issues. For example, line 195 and line 196, I believe it is equation (13) that we should look at instead of (14). Also, line 196 and 199, I think in the expression of $Z_n$, the exponent should be positive instead of negative.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 6

Summary

This paper presents a method for estimating Sobolev quantities which are a measure of function smoothness. The authors prove a bias and variance bond for the estimator and also derive its asymptotic distribution. Their approach is unique among methods for analyzing distributional functional estimation which may be useful for analyzing other methods.

Qualitative Assessment

EDIT: After reading the authors' response, I have revised some of my scores up. This revision, however, is conditional on the authors clarification of the issues I brought up below about the proofs and the experiments. The potential impact of this paper lies in the possibility for adapting the authors' method of analysis to other estimation problems and in potentially estimating density smoothness for estimating other distributional functionals. There are a few issues with the clarity of the paper and the proofs. If these issues are sufficiently addressed, then I would recommend this paper for publication in NIPS. Main Paper 1. There are some organizational issues where notation is used either before defining it or not at all. Examples include Section 4 where the samples X_j and Y_j are not defined. Additionally, the norms in the constant C_2 in Theorem 2 are not defined in the main paper. In the latter case, if the authors do not want to define these norms in the main paper, it would be sufficient to note that the expression for the constant is in the supp. material. 2. There are some issues with the experimental results. First, it seems that when you take into consideration the different scales of figures 1 and 2, the estimator does better in the higher dimensional case than in the univariate case. Generally, the opposite is true. Is there some explanation for this? Also, figures 2c and 2d should be redrawn on the same scale. With the current scale, it looks like the estimator performs better for H^1 than for H^0. EDIT: The authors' response makes sense. It would be good to clarify this in the paper somewhere. Supplementary material (in order of appearance) 1. Eq. (3) is missing a summation symbol. 2. In Eq. (8), the final term should be ((k+z)z)^(2s). The notation of k_j and k is also confusing in this equation and the ones that follow which makes it difficult to verify this section. 3. In the final equation on page 2 (right after line 23), the RHS depends on k_j. Yet the LHS sums over all j. In fact given this summation, it's not clear how the RHS depends on any of the variables as it appears to sum over all of them. Thus it doesn't make sense to maximize over k in the next part. 4. The middle eq. in (12) is missing z^(2s) 5. On line 33, please clarify that this is Theorem 4 from the supplemental material. 6. It would be helpful to clarify that Theorem 2 in the supp. material is a restatement of Thm 4 from the paper. 7. The sufficient condition claimed for Theorem 2 on line 50 is not clear to me. Please clarify more. Part of the confusion is that \sigma_n is not defined. In addition, the paper could benefit from another read through by the authors to catch some minor typos and grammatical errors.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)