Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Maryam Aliakbarpour, Mark Bun, Adam Smith
Hypothesis selection, also known as density estimation, is a fundamental problem in statistics and learning theory. Suppose we are given a sample set from an unknown distribution $P$ and a finite class of candidate distributions (called hypotheses) $\mathcal{H} \coloneqq \{H_1, H_2, \ldots, H_n\}$. The aim is to design an algorithm that selects a distribution $\hat{H}$ in $\mathcal{H}$ that best fits the data. The algorithm's accuracy is measured by the distance between $\hat{H}$ and $P$ compared with the distance of the closest distribution in $\mathcal{H}$ to $P$ (denoted by $OPT$). Concretely, we aim for $\|\hat{H} - P\|_{TV}$ to be at most $\alpha \cdot OPT + \epsilon$ for some small $\epsilon$ and $\alpha$. While it is possible to decrease the value of $\epsilon$ as the number of samples increases, $\alpha$ is an inherent characteristic of the algorithm. In fact, one cannot hope to achieve $\alpha < 3$ even when there are only two candidate hypotheses, unless the number of samples is proportional to the domain size of $P$ [Bousquet, Kane, Moran '19]. Finding the best $\alpha$ has been one of the main focuses of studies of the problem since the early work of [Devroye, Lugosi '01]. Prior to our work, no algorithm was known that achieves $\alpha = 3$ in near-linear time. We provide the first algorithm that operates in almost linear time ($\tilde{O}(n/\epsilon^3)$ time) and achieves $\alpha = 3$. This result improves upon a long list of results in hypothesis selection. Previously known algorithms either had worse time complexity, a larger factor $\alpha$, or extra assumptions about the problem setting. In addition to this algorithm, we provide another (almost) linear-time algorithm with better dependency on the additive accuracy parameter $\epsilon$, albeit with a slightly worse accuracy parameter, $\alpha = 4$.
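For intuition about the guarantee $\|\hat{H} - P\|_{TV} \le \alpha \cdot OPT + \epsilon$, the sketch below implements the classical minimum-distance (Scheffé) selector of Devroye and Lugosi over an explicit finite domain. This is a baseline for comparison, not the paper's near-linear-time algorithm: it makes $\Theta(n^2)$ pairwise comparisons, and under standard sample-size assumptions it attains the factor-3 guarantee up to the sampling error $\epsilon$. The function name and the explicit probability-vector representation are illustrative assumptions.

```python
import numpy as np

def minimum_distance_estimate(hypotheses, samples):
    """Classical Devroye-Lugosi minimum-distance (Scheffe) selector.

    hypotheses: (n, d) array; each row is a probability vector over {0, ..., d-1}.
    samples:    1-D integer array of observations drawn from the unknown P.
    Returns the index of the selected hypothesis.  Runs in O(n^2 d) time,
    in contrast to the near-linear-time algorithms described in the paper.
    """
    n, d = hypotheses.shape
    # Empirical distribution of the sample.
    emp = np.bincount(samples, minlength=d) / len(samples)

    best_idx, best_score = 0, np.inf
    for i in range(n):
        worst = 0.0
        for j in range(n):
            if i == j:
                continue
            # Scheffe set A_ij = {x : H_i(x) > H_j(x)}.
            A = hypotheses[i] > hypotheses[j]
            # Discrepancy between H_i's mass and the empirical mass on A_ij.
            worst = max(worst, abs(hypotheses[i][A].sum() - emp[A].sum()))
        if worst < best_score:
            best_idx, best_score = i, worst
    return best_idx

# Example (hypothetical data): P is close to the first hypothesis,
# so the selector should typically return index 0.
rng = np.random.default_rng(0)
H = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.1, 0.8]])
P = np.array([0.45, 0.35, 0.2])
samples = rng.choice(3, size=2000, p=P)
print(minimum_distance_estimate(H, samples))  # expected: 0
```

The quadratic number of Scheffé comparisons is exactly the bottleneck the paper's $\tilde{O}(n/\epsilon^3)$-time algorithm avoids while keeping $\alpha = 3$.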