Nearly-Tight Bounds for Testing Histogram Distributions

Canonne, Clément L; Diakonikolas, Ilias; Kane, Daniel; Liu, Sihan

Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track

Authors

Clément L Canonne, Ilias Diakonikolas, Daniel Kane, Sihan Liu

Abstract

We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$ are probability distributions that are piecewise constant over a set of $k$ intervals. Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound, showing that the testing problem has sample complexity $\widetilde{\Theta}(\sqrt{nk}/\epsilon + k/\epsilon^2 + \sqrt{n}/\epsilon^2)$.
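To make the definitions in the abstract concrete, here is a minimal Python sketch. It is not the paper's testing algorithm: it works on a fully known probability vector rather than on samples, and all function names are hypothetical. It assumes that a $k$-histogram is a distribution with at most $k$ constant pieces, and it evaluates the three terms of the stated bound with logarithmic factors and constants ignored.

```python
import math
from typing import Sequence


def num_pieces(p: Sequence[float], tol: float = 1e-12) -> int:
    """Count the maximal runs of (approximately) equal adjacent values in p."""
    if len(p) == 0:
        return 0
    pieces = 1
    for prev, cur in zip(p, p[1:]):
        # A new constant piece starts wherever adjacent probabilities differ.
        if abs(cur - prev) > tol:
            pieces += 1
    return pieces


def is_k_histogram(p: Sequence[float], k: int, tol: float = 1e-12) -> bool:
    """Assumption: a k-histogram has at most k constant pieces."""
    return num_pieces(p, tol) <= k


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sample_bound(n: int, k: int, eps: float) -> float:
    """sqrt(nk)/eps + k/eps^2 + sqrt(n)/eps^2, ignoring log factors and constants."""
    return math.sqrt(n * k) / eps + k / eps**2 + math.sqrt(n) / eps**2


if __name__ == "__main__":
    p = [0.1, 0.1, 0.3, 0.3, 0.2]  # three constant pieces over a domain of size 5
    print(num_pieces(p), is_k_histogram(p, 3), is_k_histogram(p, 2))
    print(sample_bound(n=100, k=4, eps=0.5))
```

The value returned by `sample_bound` only indicates how the three terms scale with $n$, $k$, and $\epsilon$; it should not be read as an exact sample count.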
