Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper is of interest first for its technical results. Nonparametric estimation under adversarial losses and GANs are clearly the focus of much current ML research, and in my opinion this work is a significant contribution to these subjects. Its rates unify or improve many existing results in the literature and cover a wide variety of settings, thanks to the generality of Besov spaces. The paper is also very well written: it clearly introduces the substantial mathematical background needed on wavelets and Besov spaces and discusses the results honestly. It then provides a very interesting interpretation of the results as well as an advanced discussion of the different smoothness regimes. Overall, this paper provides a deeper understanding of the problem. I really appreciated the authors' investment in the writing, and I enjoyed reading this paper. The proofs are also detailed and clearly written. They use classic tools from statistics (Fano's lemma, concentration bounds, wavelet estimators) and look correct; I did not find any flaw.
My comments are mainly the following: 1. About \epsilon in Theorem 9: I think it may depend on the parameters of the Besov space. As the design of both the discriminator network and the generator network depends on \epsilon, does this imply that the result is not adaptive, i.e., that in order to achieve the minimax rate, we need to know \sigma_g, p_g, q_g? 2. Could the authors briefly summarize the technical contributions of the paper? It seems that Theorem 9 mainly depends on previous studies of the approximation ability of fully-connected ReLU networks. 3. GANs achieve the minimax convergence rate over the Besov space, while the same rate can also be achieved by wavelet thresholding. This implies that, if we consider Besov spaces and the minimax rate, GANs cannot outperform wavelet thresholding. In order to demonstrate the superiority of GANs, especially in image analysis, is it possible to study function classes more restrictive than the Besov space? This comment may be beyond the scope of the paper, but I do think it is closely related to the study of the statistical properties of GANs. ------------------------------------------------------------- Thank you for the response, which clarifies the adaptivity and theoretical contributions. My score remains the same.
- This paper is poorly written and hard to follow. Its structure and organization could be substantially improved. The notation is unclear, and the terminology is not defined. For example, see lines 45-48. The formal problem statement (section 2.2) is vague as well. - Unclear novelty. Many technical terms are used without any context; they are not explained and, further, it is not clear how those concepts support the claims made in the paper. For example, in lines 125-156, the authors say: "We end this section with a brief survey of known results for estimating distributions under specific Besov IPM losses, noting that our results (Equations (3) and (4) below) generalize all these rates." The authors then spend the rest of that section listing known results about L_p distances, Wasserstein distances, KS and Sobolev distances; however, they never explain how exactly their results generalize all these known results. The reviewer has the impression that several parts of the paper were taken directly from a textbook without a proper introduction of terminology. - Unclear relevance to the machine learning community. I understand that this paper describes theoretical work and, because of that, does not include empirical results and benchmarks. Nevertheless, it is unclear what the direct relevance of the new results is for GANs and nonparametric density estimation. The authors provide a discussion of the results in section 5. However, it is unclear and confusing. Again, the notation is not defined; for example, see lines 271-283. -----Update----- The rebuttal addressed many of my concerns, in particular those related to the terminology used in the paper and the relevance of the new results for GANs and nonparametric density estimation. I highly encourage the authors to include those additional explanations in the paper and to remove alternative notations (or define them in an appendix) that are not needed to understand the paper.
The author response has convinced me to increase my initial score for this submission.