NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper considers the estimation of f-divergences (of the form \int f(dQ/dP) dP = \int f(q(z)/p(z)) p(z) dz, for convex f) in situations where the density p(z) is known but q(z) is only partially known, i.e., it is a mixture of the form q(z) = \int q(z|x) w(x) dx with known components q(z|x) but an unknown weight distribution W (which, however, can be sampled from). The main message is (1) that these situations arise naturally in the practice of deep neural networks, and (2) that the rates of estimation are parametric in this situation, rather than the worst-case nonparametric rates (which depend exponentially on the dimension of Z). The estimator simply replaces q(z) with a corresponding empirical mixture \hat q(z) built from a sample {X_i}_{i=1}^n drawn from the weight distribution W (sketched below).

The results are non-trivial and consider various conditions relating Q(Z|x), Q(Z), and P(Z), as well as most common instances of f-divergences. The paper, however, leaves it unclear whether these rates are tight in these situations (except perhaps in a few cases where they match the best possible parametric rates, although the meaning of 'sample size' differs from the traditional setting in which Z itself is sampled).

The main concern about the work, raised by some reviewers, is the assumption that the densities are known or almost known. In other words, the conditions at present seem limited to the NN applications described, which leaves a sense of narrowness in the work's potential appeal. However, this might not be a problem given the popularity of all things NN.
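To make the plug-in construction above concrete (the notation \hat q, \hat D_f is ours, not necessarily the paper's): given a sample {X_i}_{i=1}^n from W, one forms the empirical mixture \hat q(z) = (1/n) \sum_{i=1}^n q(z|X_i) and plugs it into the divergence, \hat D_f(Q||P) = \int f(\hat q(z)/p(z)) p(z) dz. Since p(z) and the components q(z|x) are known, this outer integral can in principle be evaluated exactly or approximated by Monte Carlo over z ~ P; the statistical error then comes only from replacing W by its empirical counterpart, which is what makes parametric rates in n plausible.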