{"title": "Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 1529, "page_last": 1537, "abstract": "We provide a detailed study of the estimation of probability distributions---discrete and continuous---in a stringent setting in which data is kept private even from the statistician.  We give sharp minimax rates of convergence for estimation in these locally private settings, exhibiting fundamental tradeoffs between privacy and convergence rate, as well as providing tools to allow movement along the privacy-statistical efficiency continuum. One of the consequences of our results is that Warner's classical work on randomized response is an optimal way to perform survey sampling while maintaining privacy of the respondents.", "full_text": "Local Privacy and Minimax Bounds:\n\nSharp Rates for Probability Estimation\n\nJohn C. Duchi1\n\nMichael I. Jordan1,2\n\nMartin J. Wainwright1,2\n\n1Department of Electrical Engineering and Computer Science\nUniversity of California, Berkeley\n\n2Department of Statistics\n\n{jduchi,jordan,wainwrig}@eecs.berkeley.edu\n\nAbstract\n\nWe provide a detailed study of the estimation of probability distributions\u2014\ndiscrete and continuous\u2014in a stringent setting in which data is kept private even\nfrom the statistician. We give sharp minimax rates of convergence for estimation\nin these locally private settings, exhibiting fundamental trade-offs between pri-\nvacy and convergence rate, as well as providing tools to allow movement along\nthe privacy-statistical ef\ufb01ciency continuum. 
One of the consequences of our results is that Warner's classical work on randomized response is an optimal way to perform survey sampling while maintaining privacy of the respondents.\n\n1 Introduction\n\nThe original motivation for providing privacy in statistical problems, first discussed by Warner [23], was that “for reasons of modesty, fear of being thought bigoted, or merely a reluctance to confide secrets to strangers,” respondents to surveys might prefer to be able to answer certain questions non-truthfully, or at least without the interviewer knowing their true response. With this motivation, Warner considered the problem of estimating the fractions of the population belonging to certain strata, which can be viewed as probability estimation within a multinomial model. In this paper, we revisit Warner's probability estimation problem, doing so within a theoretical framework that allows us to characterize optimal estimation under constraints on privacy. We also apply our theoretical tools to a further probability estimation problem—that of nonparametric density estimation.\n\nIn the large body of research on privacy and statistical inference [e.g., 23, 14, 10, 15], a major focus has been on the problem of reducing disclosure risk: the probability that a member of a dataset can be identified given released statistics of the dataset. The literature has stopped short, however, of providing a formal treatment of disclosure risk that would permit decision-theoretic tools to be used in characterizing trade-offs between the utility of achieving privacy and the utility associated with an inferential goal. Recently, a formal treatment of disclosure risk known as “differential privacy” has been proposed and studied in the cryptography, database and theoretical computer science literatures [11, 1]. 
Differential privacy has strong semantic privacy guarantees that make it a good candidate for declaring a statistical procedure or data collection mechanism private, and it has been the focus of a growing body of recent work [13, 16, 24, 21, 6, 18, 8, 5, 9].\n\nIn this paper, we bring together the formal treatment of disclosure risk provided by differential privacy with the tools of minimax decision theory to provide a theoretical treatment of probability estimation under privacy constraints. Just as in classical minimax theory, we are able to provide lower bounds on the convergence rates of any estimator, in our case under a restriction to estimators that guarantee privacy. We complement these results with matching upper bounds that are achievable using computationally efficient algorithms. We thus bring classical notions of privacy, as introduced by Warner [23], into contact with differential privacy and statistical decision theory, obtaining quantitative trade-offs between privacy and statistical efficiency.\n\n1.1 Setting and contributions\n\nLet us develop some basic formalism before describing our main results. We study procedures that receive private views Z1, . . . , Zn ∈ Z of an original set of observations, X1, . . . , Xn ∈ X, where X is the (known) sample space. In our setting, Zi is drawn conditional on Xi via the channel distribution Qi(Zi | Xi = x); typically we omit the dependence of Qi on i. We focus in this paper on the non-interactive setting (in information-theoretic terms, on memoryless channels), where Qi is chosen prior to seeing data; see Duchi et al. [9] for more discussion.\n\nWe assume each of these private views Zi is α-differentially private for the original data Xi. To give a precise definition for this type of privacy, known as “local privacy,” let σ(Z) be the σ-field on Z over which the channel Q is defined. 
Then Q provides α-local differential privacy if\n\nsup{ Q(S | Xi = x) / Q(S | Xi = x′) : S ∈ σ(Z), and x, x′ ∈ X } ≤ exp(α).   (1)\n\nThis formulation of local privacy was first proposed by Evfimievski et al. [13]. The likelihood ratio bound (1) is attractive for many reasons. It means that any individual providing data guarantees his or her own privacy—no further processing or mistakes by a collection agency can compromise one's data—and the individual has plausible deniability about taking a value x, since any outcome z is nearly as likely to have come from some other initial value x′. The likelihood ratio also controls the error rate in tests for the presence of points x in the data [24].\n\nIn the current paper, we study minimax convergence rates when the data provided satisfies the local privacy guarantee (1). Our two main results quantify the penalty that must be paid when local privacy at a level α is provided in multinomial estimation and density estimation problems. At a high level, our first result implies that for estimation of a d-dimensional multinomial probability mass function, the effective sample size of any statistical estimation procedure decreases from n to nα²/d whenever α is a sufficiently small constant. A consequence of our results is that Warner's randomized response procedure [23] enjoys optimal sample complexity; it is interesting to note that even with the recent focus on privacy and statistical inference, the optimal privacy-preserving strategy for problems such as survey collection has been known for almost 50 years.\n\nOur second main result, on density estimation, exhibits an interesting departure from standard minimax estimation results. 
If the density being estimated has β continuous derivatives, then classical results on density estimation [e.g., 26, 25, 22] show that the minimax integrated squared error scales (in the sample size n) as n^{−2β/(2β+1)}. In the locally private case, we show that there is a difference in the polynomial rate of convergence: we obtain a scaling of (α²n)^{−2β/(2β+2)}. We give efficiently implementable algorithms that attain sharp upper bounds as companions to our lower bounds, which in some cases exhibit the necessity of non-trivial sampling strategies to guarantee privacy.\n\nNotation: Given distributions P and Q defined on a space X, each absolutely continuous with respect to a measure µ (with densities p and q), the KL-divergence between P and Q is\n\nD_kl(P‖Q) := ∫_X dP log(dP/dQ) = ∫_X p log(p/q) dµ.\n\nLetting σ(X) denote an appropriate σ-field on X, the total variation distance between P and Q is\n\n‖P − Q‖_TV := sup_{S∈σ(X)} |P(S) − Q(S)| = (1/2) ∫_X |p(x) − q(x)| dµ(x).\n\nLet X be distributed according to P and Y | X be distributed according to Q(· | X), and let M = ∫ Q(· | x) dP(x) denote the marginal of Y. The mutual information between X and Y is\n\nI(X; Y) := E_P[D_kl(Q(· | X)‖M(·))] = ∫ D_kl(Q(· | X = x)‖M(·)) dP(x).\n\nA random variable Y has Laplace(α) distribution if its density is p_Y(y) = (α/2) exp(−α|y|). We write a_n ≲ b_n to denote a_n = O(b_n) and a_n ≍ b_n to denote a_n = O(b_n) and b_n = O(a_n). For a convex set C ⊂ R^d, we let Π_C denote the orthogonal projection operator onto C.\n\n2 Background and Problem Formulation\n\nIn this section, we provide the necessary background on the minimax framework used throughout the paper, more details of which can be found in standard sources [e.g., 17, 25, 26, 22]. We also reference our longer paper [9] on statistical inference under differential privacy constraints; we restate two theorems from that paper [9] to keep our presentation self-contained.\n\n2.1 Minimax framework\n\nLet P denote a class of distributions on the sample space X, and let θ : P → Θ denote a function defined on P. The range Θ depends on the underlying statistical model; for example, for density estimation, Θ may consist of the set of probability densities defined on [0, 1]. We let ρ denote the semi-metric on the space Θ that we use to measure the error of an estimator for θ, and Φ : R+ → R+ be a non-decreasing function with Φ(0) = 0 (for example, Φ(t) = t²).\n\nRecalling that Z is the domain of the private variables Zi, let θ̂ : Z^n → Θ denote an arbitrary estimator for θ. Let Q_α denote the set of conditional (or channel) distributions guaranteeing α-local privacy (1). Looking uniformly over all channels Q ∈ Q_α, we define the central object of interest for this paper, the α-private minimax rate for the family θ(P),\n\nM_n(θ(P), Φ ◦ ρ, α) := inf_{θ̂, Q∈Q_α} sup_{P∈P} E_{P,Q}[Φ(ρ(θ̂(Z1, . . . , Zn), θ(P)))],   (2)\n\nassociated with estimating θ based on (Z1, . . . , Zn). We remark here (see also the discussion in [9]) that the private minimax risk (2) is different from previous work on optimality in differential privacy (e.g. [2, 16, 8]): prior work focuses on accurate estimation of a sample quantity θ(x1:n) based on the sample x1:n, while we provide lower bounds on error of the population estimator θ(P). 
Lower bounds on population estimation imply those on sample estimation, so our lower bounds are stronger than most of those in prior work.\n\nA standard route for lower bounding the minimax risk (2) is by reducing the estimation problem to the testing problem of identifying a point θ ∈ Θ from a collection of well-separated points [26, 25]. Given an index set V, the indexed family of distributions {P_ν, ν ∈ V} ⊂ P is a 2δ-packing of Θ if ρ(θ(P_ν), θ(P_ν′)) ≥ 2δ for all ν ≠ ν′ in V. The setup is that of a standard hypothesis testing problem: nature chooses V ∈ V uniformly at random, then data (X1, . . . , Xn) are drawn i.i.d. from P^n_ν, conditioning on V = ν. The problem is to identify the member ν of the packing set V.\n\nIn this work we have the additional complication that all the statistician observes are the private samples Z1, . . . , Zn. To that end, if we let Q^n(· | x1:n) denote the conditional distribution of Z1, . . . , Zn given that X1 = x1, . . . , Xn = xn, we define the marginal channel M^n_ν via the expression\n\nM^n_ν(A) := ∫ Q^n(A | x1, . . . , xn) dP_ν(x1, . . . , xn) for A ∈ σ(Z^n).   (3)\n\nLetting ψ : Z^n → V denote an arbitrary testing procedure, we have the following minimax bound, whose two parts are known as Le Cam's two-point method [26, 22] and Fano's inequality [25, 7, 22].\n\nLemma 1 (Minimax risk bound). For the previously described estimation and testing problems,\n\nM_n(θ(P), Φ ◦ ρ, Q) ≥ Φ(δ) inf_ψ P(ψ(Z1, . . . , Zn) ≠ V),   (4)\n\nwhere the infimum is taken over all testing procedures. For a binary test specified by V = {ν, ν′},\n\ninf_ψ P(ψ(Z1, . . . , Zn) ≠ V) = 1/2 − (1/2) ‖M^n_ν − M^n_ν′‖_TV,   (5a)\n\nand more generally,\n\ninf_ψ P(ψ(Z1, . . . , Zn) ≠ V) ≥ 1 − (I(Z1, . . . , Zn; V) + log 2) / log |V|.   (5b)\n\n2.2 Information bounds\n\nThe main step in proving minimax lower bounds is to control the divergences involved in the lower bounds (5a) and (5b). We review two results from our work [9] that obtain such bounds as a function of the amount of privacy provided. The second of the results provides a variational upper bound on the mutual information I(Z1, . . . , Zn; V), in that we optimize jointly over subsets S ⊂ X. To state the proposition, we require a bit of notation: for each i ∈ {1, . . . , n}, let P_{ν,i} be the distribution of Xi conditional on the random packing element V = ν, and let M^n_ν be the marginal distribution (3) induced by passing Xi through Q. Define the mixture distribution P̄_i = (1/|V|) Σ_{ν∈V} P_{ν,i}. We can then state a proposition summarizing the results we require from Duchi et al. [9]:\n\nProposition 1 (Information bounds). For any ν, ν′ ∈ V and α ≥ 0,\n\nD_kl(M^n_ν‖M^n_ν′) ≤ 4(e^α − 1)² Σ_{i=1}^n ‖P_{ν,i} − P_{ν′,i}‖²_TV.   (6)\n\nAdditionally, for V chosen uniformly at random from V, we have the variational bound\n\nI(Z1, . . . , Zn; V) ≤ (e^α (e^α − e^{−α})² / |V|) Σ_{i=1}^n sup_{S∈σ(X)} Σ_{ν∈V} (P_{ν,i}(S) − P̄_i(S))².   (7)\n\nBy combining Proposition 1 with Lemma 1, it is possible to derive sharp lower bounds on arbitrary estimation procedures under α-local privacy. 
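To make the machinery concrete, here is a small numerical sanity check (our own illustration, not from the paper) of the single-sample analog of the bound (6): for a binary α-private randomized-response channel and a two-point Bernoulli family, the KL divergence between the induced marginals stays below 4(e^α − 1)² times the squared total variation between the underlying distributions.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

alpha, delta = 1.0, 0.1
# Two-point family: P_nu = Bernoulli(1/2 + delta), P_nu' = Bernoulli(1/2 - delta).
p1, p2 = 0.5 + delta, 0.5 - delta
tv = abs(p1 - p2)  # total variation distance between the two Bernoullis

# alpha-private binary channel: report the true bit with probability e^a / (1 + e^a).
keep = math.exp(alpha) / (1 + math.exp(alpha))
m1 = p1 * keep + (1 - p1) * (1 - keep)  # marginal probability M_nu(Z = 1)
m2 = p2 * keep + (1 - p2) * (1 - keep)  # marginal probability M_nu'(Z = 1)

kl_marginal = kl_bernoulli(m1, m2)
bound = 4 * (math.exp(alpha) - 1) ** 2 * tv ** 2  # single-sample analog of (6)
assert 0 < kl_marginal <= bound
```

The privatized marginals are far closer together than the raw distributions, which is exactly the contraction that drives the private lower bounds.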
In the remainder of the paper, we demonstrate this combination for probability estimation problems; we provide proofs of all results in [9].\n\n3 Multinomial Estimation under Local Privacy\n\nIn this section we return to the classical problem of avoiding answer bias in surveys, the original motivation for studying local privacy [23].\n\n3.1 Minimax rates of convergence for multinomial estimation\n\nLet ∆_d := {θ ∈ R^d | θ ≥ 0, Σ_{j=1}^d θ_j = 1} denote the probability simplex in R^d. The multinomial estimation problem is defined as follows. Given a vector θ ∈ ∆_d, samples X are drawn i.i.d. from a multinomial with parameters θ, where P_θ(X = j) = θ_j for j ∈ {1, . . . , d}, and the goal is to estimate θ. In one of the earliest evaluations of privacy, Warner [23] studied the Bernoulli variant of this problem and proposed randomized response: for a given survey question, respondents provide a truthful answer with probability p > 1/2 and lie with probability 1 − p.\n\nIn our setting, we assume the statistician sees α-locally private (1) random variables Z_i for the corresponding samples X_i from the multinomial. In this case, we have the following result, which characterizes the minimax rate of estimation of a multinomial in both mean-squared error E[‖θ̂ − θ‖²₂] and absolute error E[‖θ̂ − θ‖₁]; the latter may be more relevant for probability estimation problems.\n\nTheorem 1. 
There exist universal constants 0 < c_ℓ ≤ c_u < 5 such that for all α ∈ [0, 1], the minimax rate for multinomial estimation satisfies the bounds\n\nc_ℓ min{1, d/(nα²)} ≤ M_n(∆_d, ‖·‖²₂, α) ≤ c_u min{1, d/(nα²)},   (8)\n\nand\n\nc_ℓ min{1, d/√(nα²)} ≤ M_n(∆_d, ‖·‖₁, α) ≤ c_u min{1, d/√(nα²)}.   (9)\n\nTheorem 1 shows that providing local privacy can sometimes be quite detrimental to the quality of statistical estimators. Indeed, let us compare this rate to the classical rate in which there is no privacy. Then estimating θ via proportions (i.e., maximum likelihood), we have\n\nE[‖θ̂ − θ‖²₂] = Σ_{j=1}^d E[(θ̂_j − θ_j)²] = (1/n) Σ_{j=1}^d θ_j(1 − θ_j) ≤ (1/n)(1 − 1/d) < 1/n.\n\nBy inequality (8), for suitably large sample sizes n, the effect of providing differential privacy at a level α causes a reduction in the effective sample size of n ↦ nα²/d.\n\n[Z]_j = x_j with probability e^{α/2}/(1 + e^{α/2}), and [Z]_j = 1 − x_j with probability 1/(1 + e^{α/2}).   (10)\n\n3.2 Optimal mechanisms: attainability for multinomial estimation\n\nAn interesting consequence of the lower bound in (8) is the following fact that we now demonstrate: Warner's classical randomized response mechanism [23] (with minor modification) achieves the optimal convergence rate. There are also other relatively simple estimation strategies that achieve convergence rate d/(nα²); the perturbation approach Dwork et al. [11] propose, where Laplace(α) noise is added to each coordinate of a multinomial sample, is one such strategy. 
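The randomized response channel (10) is straightforward to implement and check numerically. A minimal sketch (our own illustration; the exhaustive maximum over all outcomes z is feasible only for small d) verifies that the worst-case likelihood ratio over basis-vector inputs is exactly exp(α), so the bound (1) holds:

```python
import itertools
import math
import random

def randomized_response(x, alpha, rng):
    """Channel (10): keep each coordinate of the 0/1 vector x w.p. e^{a/2}/(1 + e^{a/2})."""
    keep = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    return [xj if rng.random() < keep else 1 - xj for xj in x]

def channel_prob(z, x, alpha):
    """Q(Z = z | x) for the product channel (10)."""
    keep = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    p = 1.0
    for zj, xj in zip(z, x):
        p *= keep if zj == xj else 1 - keep
    return p

d, alpha = 3, 0.5
basis = [[1 if j == i else 0 for j in range(d)] for i in range(d)]  # e_1, ..., e_d

z_sample = randomized_response(basis[0], alpha, random.Random(0))
assert all(zj in (0, 1) for zj in z_sample)

# Worst-case likelihood ratio over all outcomes z and all pairs of inputs x, x'.
worst = max(
    channel_prob(z, x, alpha) / channel_prob(z, xp, alpha)
    for z in itertools.product([0, 1], repeat=d)
    for x in basis
    for xp in basis
)
assert abs(worst - math.exp(alpha)) < 1e-9  # the likelihood-ratio bound (1) is tight
```

The maximum is attained when z agrees with x exactly on the two coordinates where x and x′ differ, matching the triangle-inequality argument below.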
Nonetheless, the ease of use and explainability of randomized response, coupled with our optimality results, provide support for randomized response as a preferred method for private estimation of population probabilities.\n\nWe now prove that randomized response attains the optimal rate of convergence. There is a bijection between multinomial samples x ∈ {1, . . . , d} and the d standard basis vectors e_1, . . . , e_d ∈ R^d, so we abuse notation and represent samples x as either when designing estimation strategies. In randomized response, we construct the private vector Z ∈ {0, 1}^d from a multinomial observation x ∈ {e_1, . . . , e_d} by sampling d coordinates independently via the procedure (10). We claim that this channel (10) is α-differentially private: indeed, note that for any x, x′ ∈ ∆_d and any vector z ∈ {0, 1}^d we have\n\nQ(Z = z | x) / Q(Z = z | x′) = exp((α/2)(‖z − x‖₁ − ‖z − x′‖₁)) ∈ [exp(−α), exp(α)],\n\nwhere we used the triangle inequality to assert that |‖z − x‖₁ − ‖z − x′‖₁| ≤ ‖x − x′‖₁ ≤ 2. We can compute the expected value and variance of the random variables Z; indeed, by definition (10)\n\nE[Z | x] = (e^{α/2}/(1 + e^{α/2})) x + (1/(1 + e^{α/2}))(1 − x) = ((e^{α/2} − 1)/(1 + e^{α/2})) x + (1/(1 + e^{α/2})) 1.\n\nSince the Z are Bernoulli, we obtain the variance bound E[‖Z − E[Z]‖²₂] < d/4 + 1 < d. Recalling the definition of the projection Π_{∆_d} onto the simplex, we arrive at the natural estimator\n\nθ̂_part := ((e^{α/2} + 1)/(e^{α/2} − 1)) (1/n) Σ_{i=1}^n (Z_i − (1/(1 + e^{α/2})) 1)  and  θ̂ := Π_{∆_d}(θ̂_part).   (11)\n\nThe projection of θ̂_part onto the probability simplex can be done in time linear in the dimension d of the problem [3], so the estimator (11) is efficiently computable. 
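Brucker [3] gives an O(d) routine for this projection; the simpler sort-based O(d log d) variant below (a standard alternative we sketch for illustration, not the paper's exact algorithm) conveys the idea:

```python
def project_to_simplex(v, s=1.0):
    """Euclidean projection of v onto the scaled simplex {x : x >= 0, sum(x) = s}."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - s) / j
        if uj > t:  # u_j remains an active (positive) coordinate under threshold t
            theta = t
    # Shift every coordinate down by the final threshold and clip at zero.
    return [max(vj - theta, 0.0) for vj in v]

p = project_to_simplex([0.4, 0.3, -0.2, 0.6])
assert abs(sum(p) - 1.0) < 1e-9 and min(p) >= 0.0
```

Each coordinate is shifted by a common threshold and clipped at zero, so negative entries of θ̂_part (which arise from the debiasing step) are mapped back into ∆_d.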
Since projections only decrease distance, vectors in the simplex are at most distance √2 apart, and E_θ[θ̂_part] = θ, we find\n\nE[‖θ̂ − θ‖²₂] ≤ min{2, E[‖θ̂_part − θ‖²₂]} ≤ min{2, (d/n)((e^{α/2} + 1)/(e^{α/2} − 1))²} ≲ min{1, d/(nα²)}.\n\nA similar argument shows that randomized response is minimax optimal for the ℓ₁-loss as well.\n\n4 Density Estimation under Local Privacy\n\nIn this section, we turn to studying a nonparametric statistical problem in which the effects of local differential privacy turn out to be somewhat more severe. We show that for the problem of density estimation, instead of just a multiplicative loss in the effective sample size as in the previous section, imposing local differential privacy leads to a different convergence rate.\n\nIn more detail, we consider estimation of probability densities f : R → R+, ∫ f(x) dx = 1 and f ≥ 0, defined on the real line, focusing on a standard family of densities of varying smoothness [e.g. 22]. Throughout this section, we let β ∈ N denote a fixed positive integer. Roughly, we consider densities that have bounded βth derivative, and we study density estimation using the squared L²-norm ‖f‖²₂ := ∫ f²(x) dx as our metric; in formal terms, we impose these constraints in terms of Sobolev classes (e.g. [22, 12]). Let the countable collection of functions {ϕ_j}_{j=1}^∞ be an orthonormal basis for L²([0, 1]). 
Then any function f ∈ L²([0, 1]) can be expanded as a sum Σ_{j=1}^∞ θ_j ϕ_j in terms of the basis coefficients θ_j := ∫ f(x) ϕ_j(x) dx, where {θ_j}_{j=1}^∞ ∈ ℓ²(N). The Sobolev space F_β[C] is obtained by enforcing a particular decay rate on the coefficients θ:\n\nDefinition 1 (Elliptical Sobolev space). For a given orthonormal basis {ϕ_j} of L²([0, 1]), smoothness parameter β > 1/2 and radius C, the function class F_β[C] is given by\n\nF_β[C] := { f ∈ L²([0, 1]) | f = Σ_{j=1}^∞ θ_j ϕ_j such that Σ_{j=1}^∞ j^{2β} θ²_j ≤ C² }.\n\nIf we choose the trigonometric basis as our orthonormal basis, then membership in the class F_β[C] corresponds to certain smoothness constraints on the derivatives of f. More precisely, for j ∈ N, consider the orthonormal basis for L²([0, 1]) of trigonometric functions:\n\nϕ_0(t) = 1, ϕ_{2j}(t) = √2 cos(2πjt), ϕ_{2j+1}(t) = √2 sin(2πjt).   (12)\n\nNow consider a β-times almost everywhere differentiable function f for which |f^{(β)}(x)| ≤ C for almost every x ∈ [0, 1] satisfying f^{(k)}(0) = f^{(k)}(1) for k ≤ β − 1. Uniformly for such f, there is a universal constant c such that f ∈ F_β[cC] [22, Lemma A.3]. Thus, Definition 1 (essentially) captures densities that have Lipschitz-continuous (β − 1)th derivative. In the sequel, we write F_β when the bound C in F_β[C] is O(1). It is well known [26, 25, 22] that the minimax risk for non-private estimation of densities in the class F_β scales as\n\nM_n(F_β, ‖·‖²₂, ∞) ≍ n^{−2β/(2β+1)}.   (13)\n\nOur main result is to demonstrate that the classical rate (13) is no longer attainable when we require α-local differential privacy. 
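To make the gap concrete, a quick computation (ours, purely for illustration) compares the two polynomial exponents: classically the risk scales as n^{−2β/(2β+1)}, while under α-local privacy it scales as (α²n)^{−2β/(2β+2)}.

```python
# Exponent in the minimax rate: classical, cf. (13), vs. alpha-locally private, cf. (14).
for beta in (1, 2, 3):
    classical = 2 * beta / (2 * beta + 1)  # risk ~ n^{-classical}
    private = 2 * beta / (2 * beta + 2)    # risk ~ (alpha^2 n)^{-private}
    assert private < classical             # privacy strictly slows the polynomial rate
    print(f"beta={beta}: classical exponent {classical:.3f}, private exponent {private:.3f}")
```

For β = 1 this is the degradation from n^{−2/3} to n^{−1/2} noted for Lipschitz densities below.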
In Sections 4.2 and 4.3, we show how to achieve the (new) optimal rate using histogram and orthogonal series estimators.\n\n4.1 Lower bounds on density estimation\n\nWe begin by giving our main lower bound on the minimax rate of estimation of densities when the data are kept differentially private, providing the proof in the longer paper [9].\n\nTheorem 2. Consider the class of densities F_β defined using the trigonometric basis (12). For some α ∈ [0, 1], suppose Z_i are α-locally private (1) for the samples X_i ∈ [0, 1]. There exists a constant c_β > 0, dependent only on β, such that\n\nM_n(F_β, ‖·‖²₂, α) ≥ c_β (nα²)^{−2β/(2β+2)}.   (14)\n\nIn comparison with the classical minimax rate (13), the lower bound (14) involves a different polynomial exponent: privacy reduces the exponent from 2β/(2β + 1) to 2β/(2β + 2). For example, for Lipschitz densities we have β = 1, and the rate degrades from n^{−2/3} to n^{−1/2}.\n\nInterestingly, no estimator based on Laplace (or exponential) perturbation of the samples X_i themselves can attain the rate of convergence (14). In their study of the deconvolution problem, Carroll and Hall [4] show that if samples X_i are perturbed by additive noise W, where the characteristic function φ_W of the additive noise has tails behaving as |φ_W(t)| = O(|t|^{−a}) for some a > 0, then no estimator can deconvolve the samples X + W and attain a rate of convergence better than n^{−2β/(2β+2a+1)}. Since the Laplace distribution's characteristic function has tails decaying as t^{−2}, no estimator based on perturbing the samples directly can attain a rate of convergence better than n^{−2β/(2β+5)}. 
If the lower bound (14) is attainable, we must then study privacy mechanisms that are not simply based on direct perturbation of the samples {X_i}_{i=1}^n.\n\n4.2 Achievability by histogram estimators\n\nWe now turn to the mean-squared errors achieved by specific practical schemes, beginning with the special case of Lipschitz density functions (β = 1), for which it suffices to consider a private version of a classical histogram estimate. For a fixed positive integer k ∈ N, let {X_j}_{j=1}^k denote the partition of X = [0, 1] into the intervals\n\nX_j = [(j − 1)/k, j/k) for j = 1, 2, . . . , k − 1, and X_k = [(k − 1)/k, 1].\n\nAny histogram estimate of the density based on these k bins can be specified by a vector θ ∈ k∆_k, where we recall ∆_k ⊂ R^k_+ is the probability simplex. Any such vector defines a density estimate via the sum f_θ := Σ_{j=1}^k θ_j 1_{X_j}, where 1_E denotes the characteristic (indicator) function of the set E.\n\nLet us now describe a mechanism that guarantees α-local differential privacy. Given a data set {X_1, . . . , X_n} of samples from the distribution f, consider the vectors\n\nZ_i := e_k(X_i) + W_i, for i = 1, 2, . . . , n,   (15)\n\nwhere e_k(X_i) ∈ ∆_k is a k-vector with the jth entry equal to one if X_i ∈ X_j, and zeroes in all other entries, and W_i is a random vector with i.i.d. Laplace(α/2) entries. The variables {Z_i}_{i=1}^n so-defined are α-locally differentially private for {X_i}_{i=1}^n.\n\nUsing these private variables, we then form the density estimate f̂ := f_θ̂ = Σ_{j=1}^k θ̂_j 1_{X_j} based on\n\nθ̂ := Π_k((k/n) Σ_{i=1}^n Z_i),   (16)\n\nwhere Π_k denotes the Euclidean projection operator onto the set k∆_k. By construction, we have f̂ ≥ 0 and ∫_0^1 f̂(x) dx = 1, so f̂ is a valid density estimate.\n\nProposition 2. Consider the estimate f̂ based on k = (nα²)^{1/4} bins in the histogram. For any 1-Lipschitz density f : [0, 1] → R+, we have\n\nE_f[‖f̂ − f‖²₂] ≤ 5(α²n)^{−1/2} + √α n^{−3/4}.   (17)\n\nFor any fixed α > 0, the first term in the bound (17) dominates, and the O((α²n)^{−1/2}) rate matches the minimax lower bound (14) in the case β = 1: the privatized histogram estimator is minimax-optimal for Lipschitz densities. This result provides the private analog of the classical result that histogram estimators are minimax-optimal (in the non-private setting) for Lipschitz densities.\n\n4.3 Achievability by orthogonal projection estimators\n\nFor higher degrees of smoothness (β > 1), histogram estimators no longer achieve optimal rates in the classical setting [20]. Accordingly, we turn to estimators based on orthogonal series and show that even under local privacy, they achieve the lower bound (14) for all orders of smoothness β ≥ 1. Recall the elliptical Sobolev space (Definition 1), in which a function f is represented as f = Σ_{j=1}^∞ θ_j ϕ_j, where θ_j = ∫ f(x) ϕ_j(x) dx. This representation underlies the classical method of orthonormal series estimation: given a data set, {X_1, X_2, . . . , X_n}, drawn i.i.d. according to a density f ∈ L²([0, 1]), we first compute the empirical basis coefficients\n\nθ̂_j = (1/n) Σ_{i=1}^n ϕ_j(X_i) and then set f̂ = Σ_{j=1}^k θ̂_j ϕ_j,\n\nwhere the value k ∈ N is chosen either a priori based on known properties of the estimation problem or adaptively, for example, using cross-validation [12, 22].\n\nIn the setting of local privacy, we consider a mechanism that, instead of releasing the vector of coefficients (ϕ_1(X_i), . . . , ϕ_k(X_i)) for each data point, employs a random vector Z_i = (Z_{i,1}, . 
. . , Z_{i,k}) with the property that E[Z_{i,j} | X_i] = ϕ_j(X_i) for each j = 1, 2, . . . , k. We assume the basis functions are uniformly bounded; i.e., there exists a constant B_0 = sup_j sup_x |ϕ_j(x)| < ∞. For a fixed number B strictly larger than B_0 (to be specified momentarily), consider the following scheme:\n\nSampling strategy. Given a vector τ ∈ [−B_0, B_0]^k, construct τ̃ ∈ {−B_0, B_0}^k with coordinates τ̃_j sampled independently from {−B_0, B_0} with probabilities 1/2 + τ_j/(2B_0) and 1/2 − τ_j/(2B_0).   (18)\n\nSample T from a Bernoulli(e^α/(e^α + 1)) distribution. Then choose Z ∈ {−B, B}^k via\n\nZ ∼ Uniform on {z ∈ {−B, B}^k : ⟨z, τ̃⟩ > 0} if T = 1, and Z ∼ Uniform on {z ∈ {−B, B}^k : ⟨z, τ̃⟩ ≤ 0} if T = 0.   (19)\n\nBy inspection, Z is α-differentially private for any initial vector in the box [−B_0, B_0]^k, and moreover, the samples (19) are efficiently computable (for example by rejection sampling). Starting from the vector τ ∈ R^k, τ_j = ϕ_j(X_i), in the above sampling strategy we have\n\nE[[Z]_j | X = x] = c_k (B/(B_0 √k)) (e^α/(e^α + 1) − 1/(e^α + 1)) ϕ_j(x) = c_k (B/(B_0 √k)) ((e^α − 1)/(e^α + 1)) ϕ_j(x),   (20)\n\nfor a constant c_k that may depend on k but is O(1) and bounded away from 0. Consequently, to attain the unbiasedness condition E[[Z_i]_j | X_i] = ϕ_j(X_i), it suffices to take B = O(B_0 √k / α).\n\nThe full sampling and inferential scheme are as follows: (i) given a data point X_i, construct the vector τ = [ϕ_j(X_i)]_{j=1}^k; (ii) sample Z_i according to strategy (19) using τ and the bound B = B_0 √k (e^α + 1)/(c_k(e^α − 1)). (The constant c_k is as in the expression (20).) 
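The strategy (19) fits in a short rejection sampler. A minimal sketch (our own; the function and variable names are ours, and we fix B = 2 arbitrarily rather than using the calibrated value B = B_0 √k (e^α + 1)/(c_k(e^α − 1)) from the text):

```python
import math
import random

def private_coeff_sample(tau, alpha, B0, B, rng):
    """One draw of Z via strategy (19), using rejection sampling for the uniform halves."""
    k = len(tau)
    # Round tau to a corner tau_tilde of {-B0, B0}^k, unbiasedly coordinate by coordinate.
    tau_tilde = [B0 if rng.random() < 0.5 + tj / (2 * B0) else -B0 for tj in tau]
    # T = 1 with probability e^alpha / (e^alpha + 1).
    T = 1 if rng.random() < math.exp(alpha) / (math.exp(alpha) + 1) else 0
    while True:  # rejection-sample a uniform corner of {-B, B}^k in the chosen half
        z = [B if rng.random() < 0.5 else -B for _ in range(k)]
        inner = sum(zj * tj for zj, tj in zip(z, tau_tilde))
        # <z, tau_tilde> > 0 goes with T = 1; <z, tau_tilde> <= 0 goes with T = 0.
        if (inner > 0) == (T == 1):
            return z

rng = random.Random(1)
z = private_coeff_sample([0.2, -0.5, 0.9], alpha=1.0, B0=1.0, B=2.0, rng=rng)
assert all(abs(zj) == 2.0 for zj in z)
```

Averaging many such draws and rescaling as in (20) recovers the basis evaluations ϕ_j(X_i) in expectation, which is exactly what the estimator (21) below requires.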
Using the estimator\n\nf̂ := (1/n) Σ_{i=1}^n Σ_{j=1}^k Z_{i,j} ϕ_j,   (21)\n\nwe obtain the following proposition.\n\nProposition 3. Let {ϕ_j} be a B_0-bounded orthonormal basis for L²([0, 1]). There exists a constant c (depending only on C and B_0) such that the estimator (21) with k = (nα²)^{1/(2β+2)} satisfies\n\nsup_{f∈F_β[C]} E_f[‖f − f̂‖²₂] ≤ c (nα²)^{−2β/(2β+2)}.\n\nPropositions 2 and 3 make clear that the minimax lower bound (14) is sharp, as claimed.\n\nBefore concluding our exposition, we make a few remarks on other potential density estimators. Our orthogonal-series estimator (21) (and sampling scheme (20)), while similar in spirit to that proposed by Wasserman and Zhou [24, Sec. 6], is different in that it is locally private and requires a different noise strategy to obtain both α-local privacy and an optimal convergence rate. Lei [19] considers private M-estimators based on first performing a histogram density estimate, then using this to construct a second estimator; his estimator is not locally private, and the resulting M-estimators have sub-optimal convergence rates. Finally, we remark that density estimators that are based on orthogonal series and Laplace perturbation are sub-optimal: they can achieve (at best) rates of (nα²)^{−2β/(2β+3)}, which is polynomially worse than the sharp result provided by Proposition 3. It appears that appropriately chosen noise mechanisms are crucial for obtaining optimal results.\n\n5 Discussion\n\nWe have linked minimax analysis from statistical decision theory with differential privacy, bringing some of their respective foundational principles into close contact. In this paper particularly, we showed how to apply our divergence bounds to obtain sharp bounds on the convergence rate for certain nonparametric problems in addition to standard finite-dimensional settings. 
By providing sharp convergence rates for many standard statistical inference procedures under local differential privacy, we have developed and explored some tools that may be used to better understand privacy-preserving statistical inference and estimation procedures. We have identified a fundamental continuum along which privacy may be traded for utility in the form of accurate statistical estimates, providing a way to adjust statistical procedures to meet the privacy or utility needs of the statistician and the population being sampled. Formally identifying this trade-off in other statistical problems should allow us to better understand the costs and benefits of privacy; we believe we have laid some of the groundwork to do so.

Acknowledgments

JCD was supported by a Facebook Graduate Fellowship and an NDSEG fellowship. Our work was supported in part by the U.S. Army Research Laboratory, U.S. Army Research Office under grant number W911NF-11-1-0391, and Office of Naval Research MURI grant N00014-11-1-0688.

References

[1] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the 26th ACM Symposium on Principles of Database Systems, 2007.

[2] A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. In Advances in Cryptology, volume 5157 of Lecture Notes in Computer Science, pages 451-468. Springer, 2008.

[3] P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163-166, 1984.

[4] R. Carroll and P. Hall. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404):1184-1186, 1988.

[5] K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation.
In Proceedings of the 29th International Conference on Machine Learning, 2012.

[6] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069-1109, 2011.

[7] T. M. Cover and J. A. Thomas. Elements of Information Theory, Second Edition. Wiley, 2006.

[8] A. De. Lower bounds in differential privacy. In Proceedings of the Ninth Theory of Cryptography Conference, 2012. URL http://arxiv.org/abs/1107.2183.

[9] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. arXiv:1302.3203 [math.ST], 2013. URL http://arxiv.org/abs/1302.3203.

[10] G. T. Duncan and D. Lambert. Disclosure-limited data dissemination. Journal of the American Statistical Association, 81(393):10-18, 1986.

[11] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265-284, 2006.

[12] S. Efromovich. Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer-Verlag, 1999.

[13] A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, pages 211-222, 2003.

[14] I. P. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, 67(337):7-18, 1972.

[15] S. E. Fienberg, U. E. Makov, and R. J. Steele. Disclosure limitation using perturbation and related methods for categorical data. Journal of Official Statistics, 14(4):485-502, 1998.

[16] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second Annual ACM Symposium on the Theory of Computing, pages 705-714, 2010. URL http://arxiv.org/abs/0907.3754.

[17] I. A.
Ibragimov and R. Z. Has'minskii. Statistical Estimation: Asymptotic Theory. Springer-Verlag, 1981.

[18] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793-826, 2011.

[19] J. Lei. Differentially private M-estimators. In Advances in Neural Information Processing Systems 25, 2011.

[20] D. Scott. On optimal and data-based histograms. Biometrika, 66(3):605-610, 1979.

[21] A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, 2011.

[22] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.

[23] S. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63-69, 1965.

[24] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375-389, 2010.

[25] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5):1564-1599, 1999.

[26] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423-435. Springer-Verlag, 1997.