Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Clément L. Canonne, Joy Qiping Yang
This paper studies the problem of \emph{entropy identity testing}: given sample access to an unknown distribution p and a fully described distribution q (both discrete distributions over a support of size k), and the promise that either p=q or |H(p)−H(q)| \geq \varepsilon, where H(\cdot) denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability. We establish a near-optimal sample complexity bound of \tilde{\Theta}(\sqrt{k}/\varepsilon + 1/\varepsilon^2) for this problem, and show how to apply it to the problem of identity testing for in-degree-d n-dimensional Bayesian networks, obtaining an upper bound of \tilde{O}( 2^{d/2} n^{3/2}/\varepsilon^2 + n^2/\varepsilon^4 ). This improves on the sample complexity bound of \tilde{O}(2^{d/2} n^2/\varepsilon^4) from Canonne, Diakonikolas, Kane, and Stewart (2020), which required an additional assumption on the structure of the (unknown) Bayesian network.
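To make the testing task concrete, here is a toy sketch of an entropy identity tester. It uses the naive plug-in entropy estimator, which requires far more samples than the near-optimal tester of the paper; the function names and the eps/2 acceptance threshold are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a distribution given as a probability vector."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # 0 * log(0) is taken to be 0
    return float(-(p * np.log(p)).sum())

def naive_entropy_identity_test(samples, q, eps):
    """Toy tester: accept 'p == q' iff the plug-in estimate of H(p), computed
    from the empirical frequencies of the samples, lies within eps/2 of H(q).
    Illustrative only: the plug-in estimator is biased and sample-hungry."""
    k = len(q)
    counts = np.bincount(np.asarray(samples), minlength=k)
    p_hat = counts / counts.sum()
    return abs(shannon_entropy(p_hat) - shannon_entropy(q)) <= eps / 2

# Example over a support of size k = 8, with q uniform:
rng = np.random.default_rng(0)
k, eps = 8, 0.5
q = np.full(k, 1 / k)
same = naive_entropy_identity_test(rng.integers(0, k, size=100_000), q, eps)
# A point mass has entropy 0, far from H(q) = ln 8 ≈ 2.08:
diff = naive_entropy_identity_test(np.zeros(100_000, dtype=int), q, eps)
print(same, diff)
```

Here `same` comes out True (samples drawn from q itself) and `diff` comes out False (a point mass, whose entropy differs from H(q) by far more than eps).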