Locally differentially private estimation of functionals of discrete distributions

Part of Advances in Neural Information Processing Systems 34 pre-proceedings (NeurIPS 2021)


Bibtek download is not available in the pre-proceeding


Cristina Butucea, Yann ISSARTEL


We study the problem of estimating non-linear functionals of discrete distributions in the context of local differential privacy. The initial data $x_1,\ldots,x_n \in[K]$ are supposed i.i.d. and distributed according to an unknown discrete distribution $p = (p_1,\ldots,p_K)$. Only $\alpha$-locally differentially private (LDP) samples $z_1,...,z_n$ are publicly available, where the term 'local' means that each $z_i$ is produced using one individual attribute $x_i$. We exhibit privacy mechanisms (PM) that are interactive, i.e. they are allowed to use already published confidential data, or non-interactive. We describe the behavior of the squared error risk in estimating the power sum $F_{\gamma} = \sum_{k=1}^K p_k^{\gamma}$, $\gamma >0$ as a function of $K, \, n$ and $\alpha$. In the non-interactive case, we study several plug-in type estimators of $F_{\gamma}$, for all $\gamma >0$, that are similar to the MLE which has been analyzed by Jiao et al. (2017) in the multinomial model. However, due to the privacy constraint the rates we attain are slower and similar to those obtained in the Gaussian model by Collier et al. (2020). In the interactive case, we introduce for all $\gamma >1$ a two-step procedure which attains the faster parametric rates $(n \alpha^2)^{-1/2}$ when $\gamma \geq 2$. We give lower bounds results over all $\alpha-$LDP privacy mechanisms and over all estimators using the private samples.