#### Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

#### Authors

*Meyer Scetbon, Gael Varoquaux*

#### Abstract

Are two sets of observations drawn from the same distribution? This
question is the two-sample testing problem.
Kernel methods lead to tests with many appealing properties. Indeed, state-of-the-art
approaches use the $L^2$ distance between kernel-based
distribution representatives to derive their test statistics. Here, we show that
$L^p$ distances (with $p\geq 1$) between these
distribution representatives give metrics on the space of distributions that are
well suited to detecting differences between distributions, as they
metrize weak convergence. Moreover, for analytic kernels,
we show that the $L^1$ geometry gives improved testing power for
scalable computational procedures. Specifically, we derive a finite-dimensional
approximation of the metric, given as the $\ell_1$ norm of a vector that captures differences of expectations of analytic functions evaluated at spatial locations or frequencies (i.e., features). The features can be chosen to
maximize the differences of the distributions and give interpretable
indications of how they differ. Using an $\ell_1$ norm gives better detection
because, with analytic kernels, the differences between representatives are dense
(non-zero almost everywhere). The tests are consistent, yet
much faster than state-of-the-art quadratic-time kernel-based tests. Experiments
on artificial and real-world problems demonstrate
a better power/time tradeoff than the state of the art, which is based on
$\ell_2$ norms, and in some cases, better outright power than even the most
expensive quadratic-time tests. This performance gain is retained even in high dimensions.
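
To make the finite-dimensional construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel, a hand-picked set of test locations, and an unnormalized statistic; the function names (`gaussian_features`, `l1_statistic`, `permutation_p_value`) and the `bandwidth` parameter are hypothetical. The null distribution is calibrated here with a generic permutation test, which is a swapped-in technique and not necessarily the calibration used in the paper.

```python
import numpy as np


def gaussian_features(X, locations, bandwidth=1.0):
    """Evaluate a Gaussian kernel between samples X (n, d) and locations (J, d).

    Returns an (n, J) matrix whose entry (i, j) is k(x_i, t_j).
    The Gaussian kernel is an assumption; it is one example of an analytic kernel.
    """
    sq_dists = ((X[:, None, :] - locations[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def embedding_difference(X, Y, locations, bandwidth=1.0):
    """Difference of empirical mean embeddings at the J test locations."""
    mean_x = gaussian_features(X, locations, bandwidth).mean(axis=0)
    mean_y = gaussian_features(Y, locations, bandwidth).mean(axis=0)
    return mean_x - mean_y


def l1_statistic(X, Y, locations, bandwidth=1.0):
    """Unnormalized l1 norm of the difference vector (illustrative only)."""
    return np.abs(embedding_difference(X, Y, locations, bandwidth)).sum()


def permutation_p_value(X, Y, locations, bandwidth=1.0, n_permutations=200, seed=0):
    """Generic permutation calibration (an assumption, not the paper's procedure)."""
    rng = np.random.default_rng(seed)
    observed = l1_statistic(X, Y, locations, bandwidth)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        stat = l1_statistic(pooled[perm[:n]], pooled[perm[n:]], locations, bandwidth)
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(3.0, 1.0, size=(200, 2))   # clearly shifted distribution
locations = rng.normal(0.0, 1.0, size=(5, 2))

p_value = permutation_p_value(X, Y, locations)
print("l1 statistic:", l1_statistic(X, Y, locations))
print("permutation p-value:", p_value)
```

Each evaluation costs time linear in the number of samples for a fixed number of locations, which is the source of the speed advantage over quadratic-time tests. In this sketch the locations are drawn at random; the abstract notes that they can instead be chosen to maximize the differences between the distributions.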