Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Ananda Theertha Suresh
For a dataset of label-count pairs, an anonymized histogram is the multiset of counts. Anonymized histograms appear in various potentially sensitive contexts such as password-frequency lists, degree distribution in social networks, and estimation of symmetric properties of discrete distributions. Motivated by these applications, we propose the first differentially private mechanism to release anonymized histograms that achieves near-optimal privacy utility trade-off both in terms of number of items and the privacy parameter. Further, if the underlying histogram is given in a compact format, the proposed algorithm runs in time sub-linear in the number of items. For anonymized histograms generated from unknown discrete distributions, we show that the released histogram can be directly used for estimating symmetric properties of the underlying distribution.