{"title": "Differentially Private Uniformly Most Powerful Tests for Binomial Data", "book": "Advances in Neural Information Processing Systems", "page_first": 4208, "page_last": 4218, "abstract": "We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a \u2018Neyman-Pearson lemma\u2019 for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin \u201cTruncated-Uniform-Laplace\u201d (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact p-values, which are easily computed in terms of the Tulap random variable. We show that our results also apply to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that our tests have exact type I error, and are more powerful than current techniques.", "full_text": "Differentially Private Uniformly Most Powerful Tests\n\nfor Binomial Data\n\nJordan Awan\n\nDepartment of Statistics\nPenn State University\n\nUniversity Park, PA 16802\n\nawan@psu.edu\n\nAleksandra Slavkovi\u00b4c\nDepartment of Statistics\nPenn State University\n\nUniversity Park, PA 16802\n\nsesa@psu.edu\n\nAbstract\n\nWe derive uniformly most powerful (UMP) tests for simple and one-sided hypothe-\nses for a population proportion within the framework of Differential Privacy (DP),\noptimizing \ufb01nite sample performance. We show that in general, DP hypothesis tests\ncan be written in terms of linear constraints, and for exchangeable data can always\nbe expressed as a function of the empirical distribution. 
Using this structure, we prove a 'Neyman-Pearson lemma' for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin "Truncated-Uniform-Laplace" (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact p-values, which are easily computed in terms of the Tulap random variable. We show that our results also apply to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that our tests have exact type I error, and are more powerful than current techniques.

1 Introduction

Differential Privacy (DP), introduced by DMNS06, offers a rigorous measure of disclosure risk. To satisfy DP, a procedure cannot be a deterministic function of the sensitive data, but must incorporate additional randomness, beyond sampling. Subject to the DP constraint, it is natural to search for a procedure which maximizes the utility of the output. Many works address the goal of minimizing the distance between the output of the randomized DP procedure and standard non-private algorithms, but few attempt to infer properties about the underlying population (for some notable exceptions, see related work), which is typically the goal in statistics and scientific research. In this paper, we study the setting where each individual contributes a sensitive binary value, and we wish to infer the population proportion via hypothesis tests, subject to DP. In particular, we derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses, optimizing finite sample performance. UMP tests are fundamental to classical statistics, being closely linked to sufficiency, likelihood inference, and confidence sets. However, finding UMP tests can be hard, and in many cases they do not even exist (see Sch96, Section 4.4).
Our results are the first to achieve UMP tests under (ε, δ)-DP, and are among the first steps towards a general theory of optimal inference under DP.

Related work  Vu and Slavković [VS09] are among the first to perform hypothesis tests under DP. They develop private tests for population proportions as well as for independence in 2 × 2 contingency tables. In both settings, they fix the noise-adding distribution, and use approximate sampling distributions to perform these DP tests. A similar approach is used by Sol14 to develop tests for normally distributed data. The work of VS09 is extended by WLK15 and GLRV16, developing additional tests for multinomial data. To implement their tests, WLK15 develop asymptotic sampling distributions, verifying via simulations that the type I errors are reliable. On the other hand, GLRV16 use simulations to compute an empirical type I error. Uhler et al. [USF13] develop DP chi-squared tests and p-values for GWAS data, and derive the exact sampling distribution of their noisy statistic. Working under "Local Differential Privacy," a stronger notion of privacy than DP, GR18 develop multinomial tests based on asymptotic distributions. Given a DP output, She17 and BRC17 develop significance tests for regression coefficients.

Outside the hypothesis testing setting, there is some work on optimal population inference under DP. Duchi et al. [DJW18] give general techniques to derive minimax rates under local DP, and in particular give minimax optimal point estimates for the mean, median, generalized linear models, and nonparametric density estimation.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
Karwa and Vadhan [KV17] develop nearly optimal confidence intervals for normally distributed data with finite sample guarantees, which could potentially be inverted to give UMP-unbiased tests.

Related work on developing optimal DP mechanisms for general loss functions, such as GV16a and GRS09, gives mechanisms that optimize symmetric convex loss functions, centered at a real statistic. Similarly, AS18 derive optimal mechanisms among the class of K-Norm Mechanisms.

Our contributions  The previous literature on DP hypothesis testing has a few characteristics in common: 1) nearly all of the proposed methods first add noise to the data, and perform their test as a post-processing procedure, 2) all of the hypothesis tests use either asymptotic distributions or simulations to derive approximate decision rules, and 3) while each procedure is derived intuitively from classical theory, none is shown to be optimal among all possible DP algorithms.

In contrast, in this paper we search over all DP hypothesis tests at level α, deriving the uniformly most powerful (UMP) test for a population proportion. In Section 3, we show that arbitrary DP hypothesis tests, which report 'Reject' or 'Fail to Reject', can be written in terms of linear inequalities. In Theorem 3.2, we show that for exchangeable data, DP tests need only depend on the empirical distribution. We use this structure to find closed-form DP-UMP tests for simple hypotheses in Theorems 4.5 and 5.2, and extend these results to obtain one-sided DP-UMP tests in Corollary 5.3. These tests are closely tied to our proposed "Truncated-Uniform-Laplace" (Tulap) distribution, which extends both the discrete Laplace distribution (studied in GRS09) and the Staircase distribution of GV16a to the setting of (ε, δ)-DP. We prove that the Tulap distribution satisfies (ε, δ)-DP in Theorem 6.1.
While the tests developed in the previous sections only output 'Reject' or 'Fail to Reject', in Section 6 we show that our DP-UMP tests can be stated as a post-processing of a Tulap random variable. From this formulation, we obtain exact p-values via Theorem 6.2 and Algorithm 1 which agree with our DP-UMP tests. In Section 7, we show that our results apply to distribution-free hypothesis tests of continuous data. In Section 8, we verify through simulations that our UMP tests have exact type I error, and are more powerful than current techniques.

2 Background and notation

We use capital letters to denote random variables and lowercase letters for particular values. For a random variable X, we denote F_X as its cumulative distribution function (cdf), and f_X as either its probability density function (pdf) or probability mass function (pmf), depending on the context. For any set 𝒳, the n-fold cartesian product of 𝒳 is 𝒳^n = {(x_1, x_2, ..., x_n) | x_i ∈ 𝒳}. We denote elements of 𝒳^n with an underscore to emphasize that they are vectors. The Hamming distance metric on 𝒳^n is H : 𝒳^n × 𝒳^n → Z_{≥0}, defined by H(x, x′) = #{i | x_i ≠ x′_i}.

Differential Privacy, introduced by DMNS06, provides a formal measure of disclosure risk. The notion of DP that we give in Definition 2.1 more closely resembles the formulation in WZ10, which uses the language of distributions rather than random mechanisms. It is important to emphasize that the notion of Differential Privacy in Definition 2.1 does not involve any distribution model on 𝒳^n.

Definition 2.1 (Differential Privacy: DMNS06, WZ10). Let ε > 0, δ ≥ 0, and n ∈ {1, 2, ...} be given. Let 𝒳 be any set, and (𝒴, ℱ) be a measurable space. Let 𝒫 = {P_x | x ∈ 𝒳^n} be a set of probability measures on (𝒴, ℱ).
We say that 𝒫 satisfies (ε, δ)-Differential Privacy ((ε, δ)-DP) if for all B ∈ ℱ and all x, x′ ∈ 𝒳^n such that H(x, x′) = 1, we have P_x(B) ≤ e^ε P_{x′}(B) + δ.

In Definition 2.1, we interpret x ∈ 𝒳^n as the database we collect, where 𝒳 is the set of possible values that one individual can contribute, and Y ∼ P_x as the statistical result we report to the public. With this interpretation, if a set of distributions satisfies (ε, δ)-DP for small values of ε and δ, then if one person's data is changed in the database, the distribution of Y does not change much. Ideally, ε is a small value less than 1, and δ ≪ 1/n allows us to disregard events which have small probability. We refer to (ε, 0)-DP as pure DP, and (ε, δ)-DP as approximate DP.

The focus of this paper is to find uniformly most powerful (UMP) hypothesis tests, subject to DP. As the output of a DP method is necessarily a random variable, we work with randomized hypothesis tests, which we review in Definition 2.2. Our notation follows that of Sch96, Chapter 4.

Definition 2.2 (Hypothesis Test). Let (X_1, ..., X_n) ∈ 𝒳^n be distributed X_i iid∼ f_θ, where θ ∈ Θ. Let Θ_0, Θ_1 be a partition of Θ. A (randomized) test of H_0 : θ ∈ Θ_0 versus H_1 : θ ∈ Θ_1 is a measurable function φ : 𝒳^n → [0, 1]. We say a test φ is at level α if sup_{θ∈Θ_0} E_{f_θ} φ ≤ α. The power of φ at θ is denoted β_φ(θ) = E_{f_θ} φ.

Let Φ be a set of tests.
We say that φ* ∈ Φ is the uniformly most powerful level α (UMP-α) test among Φ for H_0 : θ ∈ Θ_0 versus H_1 : θ ∈ Θ_1 if 1) sup_{θ∈Θ_0} β_{φ*}(θ) ≤ α and 2) for any φ ∈ Φ such that sup_{θ∈Θ_0} β_φ(θ) ≤ α, we have β_{φ*}(θ) ≥ β_φ(θ) for all θ ∈ Θ_1.

In Definition 2.2, φ(x) is the probability of rejecting the null hypothesis, given that we observe x ∈ 𝒳^n. That is, the output of a test is either 'Reject' or 'Fail to Reject' with respective probabilities φ(x) and 1 − φ(x). While the condition of (ε, δ)-DP does not involve the randomness of X, for hypothesis testing the level and power of a test depend on the model for X. In Section 3, we study the set of hypothesis tests which satisfy (ε, δ)-DP.

3 Problem setup and exchangeability condition

We begin this section by considering arbitrary hypothesis testing problems under DP. Let φ : 𝒳^n → [0, 1] be any test. Since the only possible outputs of the mechanism are 'Reject' or 'Fail to Reject' with probabilities φ(x) and 1 − φ(x), the test φ satisfies (ε, δ)-DP if and only if for all x, x′ ∈ 𝒳^n such that H(x, x′) = 1,

    φ(x) ≤ e^ε φ(x′) + δ   and   (1 − φ(x)) ≤ e^ε (1 − φ(x′)) + δ.    (1)

Remark 3.1. For any simple hypothesis test, where Θ_0 and Θ_1 are both singleton sets, the DP-UMP test φ* is the solution to a linear program.
If 𝒳 is finite, this observation allows one to explore the structure of DP-UMP tests through numerical linear program solvers.

Given the random vector X ∈ 𝒳^n, initially it may seem that we need to consider all φ which are arbitrary functions of X. However, assuming that X is exchangeable, Theorem 3.2 below says that for any DP hypothesis test, we need only consider tests which are functions of the empirical distribution of X. In other words, φ need not consider the order of the entries in X. This result is reminiscent of De Finetti's Theorem (see Sch96, Theorem 1.48) in classical statistics.

Theorem 3.2. Let Θ be a set and {µ_θ}_{θ∈Θ} be a set of exchangeable distributions on 𝒳^n. Let φ : 𝒳^n → [0, 1] be a test satisfying (1). Then there exists φ′ : 𝒳^n → [0, 1] satisfying (1) which only depends on the empirical distribution of X, such that ∫ φ′(x) dµ_θ = ∫ φ(x) dµ_θ, for all θ ∈ Θ.

Proof. Define φ′ by φ′(x) = (1/n!) Σ_{π∈σ(n)} φ(π(x)), where σ(n) is the symmetric group on n letters. For any π ∈ σ(n), φ(π(x)) satisfies (ε, δ)-DP. By exchangeability, ∫ φ(π(x)) dµ_θ = ∫ φ(x) dµ_θ. Since condition (1) is closed under convex combinations, and integrals are linear, the result follows.

We now state the particular problem which is the focus for the remainder of the paper, where each individual contributes a sensitive binary value to the database. Let X ∈ {0, 1}^n be a random vector, where X_i is the sensitive data of individual i. We model X as X_i iid∼ Bern(θ), where θ is unknown. Then the statistic X = Σ_{i=1}^n X_i ∼ Binom(n, θ) encodes the empirical distribution of X.
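Remark 3.1 can be made concrete for the binomial setting just described: the simple-vs-simple DP-UMP over the vector (φ(0), ..., φ(n)) is a small linear program, maximizing power at θ1 subject to the level constraint and the DP inequalities between adjacent sums. The sketch below assumes numpy and scipy are available; the function name `dp_ump_lp` is ours, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binom


def dp_ump_lp(n, theta0, theta1, alpha, eps, delta=0.0):
    """Numerically solve for the DP-UMP test of H0: theta = theta0
    vs H1: theta = theta1 as a linear program over phi(0), ..., phi(n)."""
    e = np.exp(eps)
    p0 = binom.pmf(np.arange(n + 1), n, theta0)
    p1 = binom.pmf(np.arange(n + 1), n, theta1)
    A, b = [p0], [alpha]          # level constraint: E_{theta0} phi <= alpha
    for x in range(1, n + 1):
        row = np.zeros(n + 1)
        row[x], row[x - 1] = 1.0, -e
        A.append(row.copy());  b.append(delta)            # phi(x) <= e phi(x-1) + delta
        A.append(-row);        b.append(e - 1 + delta)    # 1 - phi(x) <= e(1 - phi(x-1)) + delta
        row2 = np.zeros(n + 1)
        row2[x - 1], row2[x] = 1.0, -e
        A.append(row2.copy()); b.append(delta)            # phi(x-1) <= e phi(x) + delta
        A.append(-row2);       b.append(e - 1 + delta)    # 1 - phi(x-1) <= e(1 - phi(x)) + delta
    res = linprog(-p1, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(0.0, 1.0)] * (n + 1), method="highs")
    return res.x, -res.fun        # optimal test and its power at theta1
```

For small n this reproduces the structure of the closed-form tests derived in Sections 4 and 5, and it is a useful independent check on them.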
By Theorem 3.2, we can restrict our attention to tests which are functions of X. Such tests φ : {0, 1, ..., n} → [0, 1] satisfy (ε, δ)-DP if and only if for all x ∈ {1, 2, ..., n},

    φ(x) ≤ e^ε φ(x − 1) + δ                    (2)
    φ(x − 1) ≤ e^ε φ(x) + δ                    (3)
    (1 − φ(x)) ≤ e^ε (1 − φ(x − 1)) + δ        (4)
    (1 − φ(x − 1)) ≤ e^ε (1 − φ(x)) + δ.       (5)

We denote the set of all tests which satisfy (2)-(5) as D^n_{ε,δ} = {φ : φ satisfies (2)-(5)}.

Remark 3.3. For arbitrary DP hypothesis testing problems, the number of constraints generated by (1) could be very large, even infinite, but for our problem we only have 4n constraints.

4 Simple DP-UMP tests when δ = 0

In this section, we derive the DP-UMP test when δ = 0 for simple hypotheses. In particular, given n, ε > 0, α > 0, θ0 < θ1, and X ∼ Binom(n, θ), we find the UMP test at level α among D^n_{ε,0} for testing H0 : θ = θ0 versus H1 : θ = θ1.

Before developing these tests, we introduce the Truncated-Uniform-Laplace (Tulap) distribution, defined in Definition 4.1, which is central to all of our main results. To motivate this distribution, recall that GV16a show that for general loss functions, adding discrete Laplace noise L ∼ DLap(e^{−ε}) to X is optimal under (ε, 0)-DP. For this reason, it is natural to consider a test which post-processes X + L. However, we know by classical UMP theory that since X + L is discrete, a randomized test is required. Instead of using a randomized test, by adding uniform noise U ∼ Unif(−1/2, 1/2) to X + L, we obtain a continuous sampling distribution, from which a deterministic test is available. We call the distribution of (X + L + U) | X as Tulap(X, b, 0).
The distribution Tulap(X, b, q) is obtained by truncating within the central (1 − q)th-quantiles of Tulap(X, b, 0).

In Definition 4.1, we use the nearest integer function [·] : R → Z. For any real number t ∈ R, [t] is defined to be the integer nearest to t. If there are two distinct integers which are nearest to t, we take [t] to be the even one. Note that [−t] = −[t] for all t ∈ R.

Definition 4.1 (Truncated-Uniform-Laplace (Tulap)). Let N and N0 be real-valued random variables. Let m ∈ R, b ∈ (0, 1) and q ∈ [0, 1). We say that N0 ∼ Tulap(m, b, 0) and N ∼ Tulap(m, b, q) if N0 and N have the following cdfs:

    F_N0(x) = (b^{−[x−m]} / (1 + b)) (b + (x − m − [x − m] + 1/2)(1 − b))      if x ≤ [m],
    F_N0(x) = 1 − (b^{[x−m]} / (1 + b)) (b + ([x − m] − (x − m) + 1/2)(1 − b))  if x > [m],

and

    F_N(x) = 0                            if F_N0(x) < q/2,
    F_N(x) = (F_N0(x) − q/2) / (1 − q)    if q/2 ≤ F_N0(x) ≤ 1 − q/2,
    F_N(x) = 1                            if F_N0(x) > 1 − q/2.

Note that a Tulap random variable Tulap(m, b, q) is continuous and symmetric about m.

Remark 4.2. The Tulap distribution extends the staircase and discrete Laplace distributions as follows: Tulap(0, b, 0) =d Staircase(b, 1/2) and [Tulap(0, b, 0)] =d DLap(b), where Staircase(b, γ) is the distribution in GV16a. GV16a show that for a real-valued statistic T and convex symmetric loss functions centered at T, the optimal noise distribution for ε-DP is Staircase(b, γ) for b = e^{−ε} and some γ ∈ (0, 1). If the statistic is a count, then GRS09 show that DLap(b) is optimal.
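Definition 4.1 translates directly into code. The sketch below implements F_N0 and F_N (Python's built-in `round` matches the ties-to-even nearest-integer map [·]), and numerically checks the symmetry about m noted above as well as the step recurrence F_N0(u + 1) = min{e^ε F_N0(u), 1 − e^{−ε}(1 − F_N0(u))} that Lemma 4.3 below relies on; function names are ours:

```python
import math


def tulap0_cdf(x, m, b):
    """cdf of N0 ~ Tulap(m, b, 0), as in Definition 4.1."""
    u = x - m
    k = round(u)  # nearest integer, ties to even, matching [.]
    # Branching on u <= 0 is safe: for k = 0 the two case formulas coincide.
    if u <= 0:
        return b ** (-k) / (1 + b) * (b + (u - k + 0.5) * (1 - b))
    return 1 - b ** k / (1 + b) * (b + (k - u + 0.5) * (1 - b))


def tulap_cdf(x, m, b, q):
    """cdf of N ~ Tulap(m, b, q): Tulap(m, b, 0) restricted to its central 1-q mass."""
    F0 = tulap0_cdf(x, m, b)
    # Equivalent to the three cases of Definition 4.1: the middle expression
    # is negative below the q/2 quantile and exceeds 1 above the 1 - q/2 quantile.
    return min(max((F0 - q / 2) / (1 - q), 0.0), 1.0)
```

The recurrence check below is exactly the computation behind the equivalence of 2) and 3) in Lemma 4.3.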
Our results agree with these works when δ = 0, and extend them to the case of arbitrary δ.

Now that we have defined the Tulap distribution, we are ready to develop the UMP test among D^n_{ε,0} for the simple hypotheses H0 : θ = θ0 versus H1 : θ = θ1. In classical statistics, the UMP for this test is given by the Neyman-Pearson lemma; however, in the DP framework, our test must satisfy (2)-(5). Within these constraints, we follow the logic behind the Neyman-Pearson lemma as follows. Let φ ∈ D^n_{ε,0}. Thinking of φ(x) defined recursively, equations (2)-(5) give upper and lower bounds for φ(x) in terms of φ(x − 1). Since θ1 > θ0, and binomial distributions have a monotone likelihood ratio (MLR) in X, larger values of X give more evidence for θ1 over θ0. Thus, φ(x) should be increasing in x as much as possible, subject to (2)-(5). Lemma 4.3 shows that taking φ(x) to be such a function is equivalent to having φ(x) be the cdf of a Tulap random variable.

Lemma 4.3. Let ε > 0 be given. Let φ : {0, 1, 2, ..., n} → (0, 1). The following are equivalent:

1) There exists m ∈ (0, 1) such that φ(0) = m and φ(x) = min{e^ε φ(x − 1), 1 − e^{−ε}(1 − φ(x − 1))} for x = 1, ..., n.

2) There exists m ∈ (0, 1) such that φ(0) = m and, for x = 1, ..., n,

    φ(x) = e^ε φ(x − 1)                if φ(x − 1) ≤ 1/(1 + e^ε),
    φ(x) = 1 − e^{−ε}(1 − φ(x − 1))    if φ(x − 1) > 1/(1 + e^ε).

3) There exists m ∈ R such that φ(x) = F_N0(x − m) for x = 0, 1, 2, ..., n, where N0 ∼ Tulap(0, b = e^{−ε}, 0).

Proof Sketch. First show that 1) and 2) are equivalent by checking which constraint is active. Then verify that F_N0(x − m) satisfies the recurrence of 2).
This can be done using the properties of the Tulap cdf, stated in Lemma 10.2, found in the Supplementary Material.

While the form of 1) in Lemma 4.3 is intuitive, the connection to the Tulap cdf in 3) allows for a usable closed form of the test. This connection with the Tulap distribution is crucial for the development in Section 6, which shows that the test in Lemma 4.3 can be achieved by post-processing X + N, where N is distributed as Tulap.

It remains to show that the tests in Lemma 4.3 are in fact UMP among D^n_{ε,0}. The main tool used to prove this is Lemma 4.4, which is a standard result in classical hypothesis testing theory.

Lemma 4.4. Let (𝒳, ℱ, µ) be a measure space and let f and g be two densities on 𝒳 with respect to µ. Suppose that φ1, φ2 : 𝒳 → [0, 1] are such that ∫ φ1 f dµ ≥ ∫ φ2 f dµ, and there exists k ≥ 0 such that φ1 ≥ φ2 when g ≥ kf and φ1 ≤ φ2 when g < kf. Then ∫ φ1 g dµ ≥ ∫ φ2 g dµ.

Proof. Note that (φ1 − φ2)(g − kf) ≥ 0 for almost all x ∈ 𝒳 (with respect to µ). This implies that ∫ (φ1 − φ2)(g − kf) dµ ≥ 0. Hence, ∫ φ1 g dµ − ∫ φ2 g dµ ≥ k (∫ φ1 f dµ − ∫ φ2 f dµ) ≥ 0.

Next we present our key result, Theorem 4.5, which can be viewed as a 'Neyman-Pearson lemma' for binomial data under (ε, 0)-DP. We extend this result in Theorem 5.2 for (ε, δ)-DP.

Theorem 4.5. Let ε > 0, α ∈ (0, 1), 0 ≤ θ0 < θ1 ≤ 1, and n ≥ 1 be given. Observe X ∼ Binom(n, θ), where θ is unknown.
Set the decision rule φ* : Z → [0, 1] by φ*(x) = F_N0(x − m), where N0 ∼ Tulap(0, b = e^{−ε}, 0) and m is chosen such that E_{θ0} φ*(X) = α. Then φ* is the UMP-α test of H0 : θ = θ0 versus H1 : θ = θ1 among D^n_{ε,0}.

Proof Sketch. Let φ be any other test which satisfies (2)-(5) at level α. Then, since φ* can be written in the form of 1) in Lemma 4.3, there exists y ∈ Z such that φ*(x) ≥ φ(x) when x ≥ y and φ*(x) ≤ φ(x) when x < y. By MLR of the binomial distribution and Lemma 4.4, we have β_{φ*}(θ1) ≥ β_φ(θ1).

While the classical Neyman-Pearson lemma results in an acceptance and a rejection region, the DP-UMP always has some probability of rejecting the null, due to the constraints (2)-(5). As ε ↑ ∞, the DP-UMP converges to the non-private UMP.

5 Simple and one-sided DP-UMP tests when δ ≥ 0

In this section, we extend the results of Section 4 to allow for δ ≥ 0. We begin by proposing the form of the DP-UMP test for simple hypotheses. As in Section 4, the DP-UMP test is increasing in x as much as (2)-(5) allow. Lemma 5.1 states that such a test can be written as the cdf of a Tulap random variable, where the parameter q depends on ε and δ. We omit the proof of Theorem 5.2, which mimics the proof of Theorem 4.5.

Lemma 5.1. Let ε > 0 and δ ≥ 0 be given, and set b = e^{−ε} and q = 2δb/(1 − b + 2δb). Let φ : {0, 1, 2, ..., n} → [0, 1]. The following are equivalent:

1) There exists y ∈ {0, 1, 2, ..., n} and m ∈ (0, 1) such that

    φ(x) = 0    if x < y,
    φ(x) = m    if x = y,
    φ(x) = min{e^ε φ(x − 1) + δ, 1 − e^{−ε}(1 − φ(x − 1)) + e^{−ε}δ, 1}    if x > y.

2) There exists y ∈ {0, 1, 2, ..., n} and m ∈ (0, 1) such that

    φ(x) = 0                                  if x < y,
    φ(x) = m                                  if x = y,
    φ(x) = e^ε φ(x − 1) + δ                   if x > y and φ(x − 1) ≤ (1 − δ)/(1 + e^ε),
    φ(x) = 1 − e^{−ε}(1 − φ(x − 1)) + e^{−ε}δ  if x > y and (1 − δ)/(1 + e^ε) ≤ φ(x − 1) ≤ 1 − δ,
    φ(x) = 1                                  if x > y and φ(x − 1) > 1 − δ.

3) There exists m ∈ R such that φ(x) = F_N(x − m), where N ∼ Tulap(0, b, q).

Proof Sketch. The equivalence of 1) and 2) only requires determining which constraints are active. To show the equivalence of 2) and 3), we verify that F_N(x − m) satisfies the recurrence of 2), using the expression of F_N(x) in terms of F_N0(x) given in Definition 4.1, and the results of Lemma 4.3.

Theorem 5.2. Let ε > 0, δ ≥ 0, α ∈ (0, 1), 0 ≤ θ0 < θ1 ≤ 1, and n ≥ 1 be given. Observe X ∼ Binom(n, θ), where θ is unknown. Set b = e^{−ε} and q = 2δb/(1 − b + 2δb). Define φ* : Z → [0, 1] by φ*(x) = F_N(x − m), where N ∼ Tulap(0, b, q) and m is chosen such that E_{θ0} φ*(X) = α. Then φ* is the UMP-α test of H0 : θ = θ0 versus H1 : θ = θ1 among D^n_{ε,δ}.

So far we have focused on simple hypothesis tests, but since our test only depends on θ0, and not on θ1, our test is in fact the DP-UMP for one-sided tests, as stated in Corollary 5.3.
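The constant m in Theorems 4.5 and 5.2 is the solution of the one-dimensional equation E_{θ0} φ*(X) = α; since the size Σ_x F_N(x − m) P_{θ0}(X = x) is continuous and decreasing in m, a simple bisection suffices. A self-contained sketch (function names are ours; the Tulap cdf is re-implemented from Definition 4.1):

```python
import math


def tulap_cdf(x, b, q):
    """cdf of N ~ Tulap(0, b, q) from Definition 4.1."""
    k = round(x)  # nearest integer, ties to even
    if x <= 0:
        F0 = b ** (-k) / (1 + b) * (b + (x - k + 0.5) * (1 - b))
    else:
        F0 = 1 - b ** k / (1 + b) * (b + (k - x + 0.5) * (1 - b))
    return min(max((F0 - q / 2) / (1 - q), 0.0), 1.0)


def size(m, n, theta0, b, q):
    """E_{theta0} phi*(X) with phi*(x) = F_N(x - m)."""
    return sum(math.comb(n, x) * theta0 ** x * (1 - theta0) ** (n - x)
               * tulap_cdf(x - m, b, q) for x in range(n + 1))


def calibrate_m(n, theta0, alpha, eps, delta, tol=1e-10):
    """Bisection for m solving E_{theta0} phi*(X) = alpha."""
    b = math.exp(-eps)
    q = 2 * delta * b / (1 - b + 2 * delta * b)
    lo, hi = -n - 40.0, 2 * n + 40.0   # size(lo) ~ 1 and size(hi) ~ 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if size(mid, n, theta0, b, q) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

The same routine calibrates m1 and m2 in Corollary 5.3, since ψ*(x) = 1 − F_N(x − m2) has size 1 minus a size of the above form.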
Corollary 5.3 also shows that we can use our tests to build DP-UMP tests for H0 : θ ≥ θ0 versus H1 : θ < θ0 as well. Hence, Corollary 5.3 is our most general result so far, containing Theorems 4.5 and 5.2 as special cases.

Corollary 5.3. Let X ∼ Binom(n, θ). Set φ*(x) = F_N(x − m1) and ψ*(x) = 1 − F_N(x − m2), where N ∼ Tulap(0, b = e^{−ε}, q = 2δb/(1 − b + 2δb)) and m1, m2 are chosen such that E_{θ0} φ*(X) = α and E_{θ0} ψ*(X) = α. Then φ*(X) is UMP-α among D^n_{ε,δ} for testing H0 : θ ≤ θ0 versus H1 : θ > θ0, and ψ*(X) is UMP-α among D^n_{ε,δ} for testing H0 : θ ≥ θ0 versus H1 : θ < θ0.

6 Optimal one-sided private p-values

For the DP-UMP tests developed in Sections 4 and 5, the output is simply to 'Reject' or 'Fail to Reject' H0. In scientific research, however, p-values are often used to weigh the evidence in favor of the alternative hypothesis over the null. Informally, a p-value is the smallest level α for which a test outputs 'Reject'. A more formal definition is given in Definition 10.4, in the Supplementary Material. In this section, we show that our proposed DP-UMP tests can be achieved by post-processing a Tulap random variable. Using this, we develop a differentially private algorithm for releasing a private p-value which agrees with the DP-UMP tests in Sections 4 and 5. While we state our p-values for one-sided tests, they also apply to simple tests as a special case.

Since our DP-UMP test from Theorem 5.2 rejects with probability φ*(x) = F_N(x − m), given N ∼ F_N, φ*(x) rejects the null if and only if X + N ≥ m. So, our DP-UMP tests can be stated as a post-processing of X + N.
Theorem 6.1 states that releasing X + N satisfies (ε, δ)-DP. By the post-processing property of DP (see DR14, Proposition 2.1), once we release X + N, any function of X + N also satisfies (ε, δ)-DP. Thus, we can compute our private UMP-α tests as a function of X + N for any α. The smallest α for which we reject the null is the p-value for that test. In fact, Algorithm 1 and Theorem 6.2 give a more elegant method of computing this p-value.

Theorem 6.1. Let 𝒳 be any set, and T : 𝒳^n → Z with Δ(T) = sup |T(x) − T(x′)| = 1, where the supremum is over the set {(x, x′) ∈ 𝒳^n × 𝒳^n | H(x, x′) = 1}. Then the set of distributions {Tulap(T(x), b = e^{−ε}, q = 2δb/(1 − b + 2δb)) | x ∈ 𝒳^n} satisfies (ε, δ)-DP.

Proof Sketch. Since Tulap random variables are continuous and have MLR in T(x), by Lemma 10.3 in the Supplementary Material, it suffices to show that for all t ∈ R, the cdf of a Tulap random variable F_N(t − T(x)) satisfies (1), with φ(x) replaced by F_N(t − T(x)). This is already established in Lemma 5.1, by the equivalence of 1) and 3).

Theorem 6.2. Let ε > 0, δ ≥ 0, X ∼ Binom(n, θ) where θ is unknown, and Z | X ∼ Tulap(X, b = e^{−ε}, q = 2δb/(1 − b + 2δb)). Then

1) p(θ0, Z) := P(X + N ≥ Z | Z) is a p-value for H0 : θ ≤ θ0 versus H1 : θ > θ0, where the probability is over X ∼ Binom(n, θ0) and N ∼ Tulap(0, b, q).

2) Let 0 < α < 1 be given.
The test φ*(x) = P_{Z∼Tulap(x,b,q)}(p(θ0, Z) ≤ α | X) is UMP-α for H0 : θ ≤ θ0 versus H1 : θ > θ0 among D^n_{ε,δ}.

3) The output of Algorithm 1 is equal to p(θ0, Z).

It follows from Theorem 6.2 that p(θ0, Z) is the stochastically smallest possible p-value for the hypothesis test H0 : θ ≤ θ0 versus H1 : θ > θ0 under (ε, δ)-DP. Note that 1 − p(θ0, Z) = P(X + N ≤ Z | Z) is the p-value for H0 : θ ≥ θ0 versus H1 : θ < θ0, which agrees with the UMP-α test in Corollary 5.3.

Algorithm 1  UMP one-sided p-value for binomial data under (ε, δ)-DP
INPUT: n ∈ N, θ0 ∈ (0, 1), ε > 0, δ ≥ 0, Z ∼ Tulap(X, b = e^{−ε}, q = 2δb/(1 − b + 2δb))
1: Set F_N as the cdf of N ∼ Tulap(0, b, q)
2: Set F = (F_N(0 − Z), F_N(1 − Z), ..., F_N(n − Z))^T
3: Set B = ((n choose 0) θ0^0 (1 − θ0)^{n−0}, (n choose 1) θ0^1 (1 − θ0)^{n−1}, ..., (n choose n) θ0^n (1 − θ0)^{n−n})^T
OUTPUT: F^T B

To implement Algorithm 1, we must be able to sample a Tulap random variable, which Algorithm 2 provides. The algorithm is based on the expression of Tulap(m, b, 0) in terms of geometric and uniform variables, and uses rejection sampling when q > 0 (see Bis06, Chapter 11 for an introduction to rejection sampling).
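Algorithm 1 is only a few lines of code. A self-contained sketch (the Tulap cdf is re-implemented from Definition 4.1; note that F_N(x − Z) = P(N ≥ Z − x) by the symmetry and continuity of the Tulap distribution, which is why F^T B equals P(X + N ≥ Z | Z)):

```python
import math


def tulap_cdf(x, b, q):
    """cdf of N ~ Tulap(0, b, q) (Definition 4.1)."""
    k = round(x)  # nearest integer, ties to even
    if x <= 0:
        F0 = b ** (-k) / (1 + b) * (b + (x - k + 0.5) * (1 - b))
    else:
        F0 = 1 - b ** k / (1 + b) * (b + (k - x + 0.5) * (1 - b))
    return min(max((F0 - q / 2) / (1 - q), 0.0), 1.0)


def ump_p_value(Z, n, theta0, eps, delta):
    """Algorithm 1: p(theta0, Z) = P(X + N >= Z | Z) for H0: theta <= theta0."""
    b = math.exp(-eps)
    q = 2 * delta * b / (1 - b + 2 * delta * b)
    # Step 2: F_x = F_N(x - Z); Step 3: B_x = Binom(n, theta0) pmf.
    F = [tulap_cdf(x - Z, b, q) for x in range(n + 1)]
    B = [math.comb(n, x) * theta0 ** x * (1 - theta0) ** (n - x)
         for x in range(n + 1)]
    return sum(f * p for f, p in zip(F, B))
```

As noted above, `1 - ump_p_value(Z, n, theta0, eps, delta)` is the p-value for the opposite one-sided test H0 : θ ≥ θ0.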
A detailed proof that the output of this algorithm follows the correct distribution can be found in Lemma 10.1 in the Supplementary Material.

Algorithm 2  Sample from Tulap distribution: N ∼ Tulap(m, b, q)
INPUT: m ∈ R, b ∈ (0, 1), q ∈ [0, 1).
1: Draw G1, G2 iid∼ Geom(1 − b) and U ∼ Unif(−1/2, 1/2)
2: Set N = G1 − G2 + U + m
3: If F_N0(N) < q/2 or F_N0(N) > 1 − q/2, where N0 ∼ Tulap(m, b, 0), go to 1:
OUTPUT: N

Remark 6.3. Since we know that releasing X + N, where N is a Tulap random variable, satisfies (ε, δ)-DP, we can compute more than just p-values by post-processing X + N. We can also compute point estimates for θ, derive the posterior distribution of θ given a prior, and compute confidence intervals for θ as post-processing of X + N. In the full version of this paper, we will study each of these objectives, and connect confidence intervals with the DP-UMP tests derived here.

Remark 6.4. One may wonder about the asymptotic properties of the DP-UMP test. It is not hard to show that for any fixed ε > 0, δ, and θ0 ∈ (0, 1), our proposed DP-UMP test has asymptotic relative efficiency (ARE) of 1, relative to the non-private UMP test (see vdV00, Section 14.3 for an introduction to ARE). Let X ∼ Binom(n, θ0). Define the two test statistics as T1 = X and T2 = X + N, where N ∼ Tulap(0, b, q). The ARE of the DP-UMP relative to the non-private UMP test is (C2/C1)^2, where

    C_i = lim_{n→∞} ( (d/dθ) E_θ T_i |_{θ=θ0} ) / sqrt(n Var_{θ0}(T_i)),   for i = 1, 2.

We compute E_θ T_i = nθ, Var_{θ0}(T1) = nθ0(1 − θ0), and Var_{θ0}(T2) = nθ0(1 − θ0) + Var(N). Since Var(N) is a constant, we have that C1 = C2 = (θ0(1 − θ0))^{−1/2}.

7 Application to distribution-free inference

In this section, we show how our DP-UMP tests for count data can be used to test certain hypotheses for continuous data. In particular, we give a DP version of the sign and median tests, allowing one to test the median of either paired or independent samples. For an introduction to the sign and median tests, see Sections 5.4 and 6.4 of GC14. Let ε > 0 and δ ∈ [0, 1) be given, and let N ∼ Tulap(0, b, q) for b = e^{−ε} and q = 2δb/(1 − b + 2δb).

Sign test: We observe n iid pairs (X_i, Y_i) for i = 1, ..., n. Then for all i = 1, ..., n, X_i =d X and Y_i =d Y for some random variables X and Y. We assume that for any pair (X_i, Y_i) we can determine whether X_i > Y_i or not. For simplicity, we also assume that there are no pairs with X_i = Y_i. Denote the unknown probability θ = P(X > Y). We want to test a hypothesis such as H0 : θ ≤ θ0 versus H1 : θ > θ0. The sign test uses the test statistic T = #{X_i > Y_i}. Since the sensitivity of T is 1, by Theorem 6.1, T + N satisfies (ε, δ)-DP. Note that the test statistic is distributed as T ∼ Binom(n, θ). Using Algorithm 1, we obtain a private p-value for the sign test as a post-processing of T + N.

Median test: We observe two independent sets of iid data {X_i}_{i=1}^n and {Y_i}_{i=1}^n, where all X_i and Y_i are distinct values, and we have a total ordering on these values. Then there exist random variables X and Y such that X_i =d X and Y_i =d Y for all i. We want to test H0 : median(X) ≤ median(Y) versus H1 : median(X) > median(Y). The median test uses the test statistic T = #{i | rank(X_i) > n}, where rank(X_i) = #{X_j ≤ X_i} + #{Y_j ≤ X_i}. Since the sensitivity of T is 1, by Theorem 6.1, T + N satisfies (ε, δ)-DP.
When median(X) = median(Y), T ∼ HyperGeom(n = n, m = n, k = n). Using Algorithm 1, with B replaced by the pmf of HyperGeom(n = n, m = n, k = n), we obtain a private p-value for the median test as a post-processing of T + N.

8 Simulations

In this section, we study both the empirical power and the empirical type I error of our DP-UMP test against the normal approximation proposed by [VS09]. We define the empirical power to be the proportion of times a test 'Rejects' when the alternative is true, and the empirical type I error as the proportion of times a test 'Rejects' when the null is true. For our simulations, we focus on small samples, as the noise introduced by DP methods is most impactful in this setting.

In Figure 1, we plot the empirical power of our UMP test, the Normal Approximation from [VS09], and the non-private UMP. For each n, we generate 10,000 samples from Binom(n, .95). We privatize each X by adding N ∼ Tulap(0, e^{−ε}, 0) for the DP-UMP and L ∼ Lap(1/ε) for the Normal Approximation. We compute the UMP p-value via Algorithm 1 and the approximate p-value for X + L using the cdf of N(X, n/4 + 2/ε^2). The empirical power is given by (10000)^{−1} #{p-value < .05}. The DP-UMP test indeed gives higher power compared to the Normal Approximation, but the approximation does not lose too much power. Next we see that type I error is another issue.

In Figure 2 we plot the empirical type I error of the DP-UMP and the Normal Approximation tests. We fix ε = 1 and δ = 0, and vary θ0. For each θ0, we generate 100,000 samples from Binom(30, θ0). For each sample, we compute the DP-UMP and Normal Approximation tests at type I error α = .05. We plot the proportion of times we reject the null, as well as moving-average curves.
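Concretely, the UMP p-value computation used in these simulations (Algorithm 1 with δ = 0, so q = 0) can be sketched as below. The closed-form Tulap(0, b, 0) cdf is an assumption derived from its geometric-difference-plus-uniform construction, and the function names are illustrative.

```python
import math

def tulap0_cdf(x, b):
    # cdf of Tulap(0, b, 0); closed form assumed from the
    # geometric-difference-plus-uniform construction.
    k = round(x)  # nearest-integer cell containing x
    if x <= 0:
        return b ** (-k) / (1 + b) * (b + (x - k + 0.5) * (1 - b))
    return 1.0 - tulap0_cdf(-x, b)

def ump_p_value(z, n, theta0, eps):
    # Algorithm 1 with delta = 0: p(theta0, Z) = F^T B, i.e.
    # P(X + N >= z) for X ~ Binom(n, theta0) and N ~ Tulap(0, b, 0).
    b = math.exp(-eps)
    return sum(
        tulap0_cdf(x - z, b) * math.comb(n, x) * theta0**x * (1 - theta0) ** (n - x)
        for x in range(n + 1)
    )
```

For an observed noisy statistic z = x + N, ump_p_value(z, n, theta0, eps) returns the right-tailed p-value, which the simulation compares against α = .05.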
The DP-UMP, which provably has type I error α = .05, achieves empirical type I error very close to .05, but the Normal Approximation has a higher type I error for small values of θ0, and a lower type I error for large values of θ0.

Figure 1: Empirical power for the UMP and Normal Approximation tests for H0 : θ ≤ .9 versus H1 : θ > .9. The true value is θ = .95. ε = 1 and δ = 0. n varies along the x-axis.

Figure 2: Empirical type I error α for the UMP and Normal Approximation tests for H0 : θ ≤ θ0 versus H1 : θ > θ0. θ0 varies along the x-axis. n = 30, ε = 1, and δ = 0. Target is α = .05.

9 Discussion and future directions

In this paper, we derived uniformly most powerful simple and one-sided tests for binary data among all DP α-level tests. Previously, while various hypothesis tests under DP have been proposed, none have satisfied such an optimality criterion. While our initial DP-UMP tests only output 'Reject' or 'Fail to Reject', we showed that they can be achieved by post-processing a noisy sufficient statistic. This allows us to produce private p-values which agree with the DP-UMP tests. Our results can also be applied to obtain p-values for distribution-free tests, to test some hypotheses about continuous data under DP.

A simple, yet fundamental, observation that underlies our results is that DP tests can be written in terms of linear constraints. This idea alone allows for a new perspective on DP hypothesis testing, which is particularly applicable to other discrete problems, such as multinomial models or differences of population proportions. Stating the problem in this form allows for the consideration of all possible DP tests, and allows the exploration of UMP tests through numerical linear program solvers.

While the focus of this work is on hypothesis testing, these results can also be applied to obtain optimal-length confidence intervals for binomial data. In fact, classical statistical theory establishes a connection between UMP tests and Uniformly Most Accurate (UMA) confidence intervals. Besides confidence intervals, the p-value function for the test H0 : θ ≥ θ0 versus H1 : θ < θ0 is a cdf which generates a confidence distribution; see [XS13] for a review. Since this p-value corresponds to the DP-UMP test, this confidence distribution is stochastically more concentrated about the true θ than any other private confidence distribution. In the full paper, we plan to explore confidence intervals and confidence distributions in detail, establishing connections between our approach here and optimal inference in these settings.

We showed that for exchangeable data, DP tests need only depend on the empirical distribution. For binary data, the empirical distribution is equivalent to the sample sum, which is a complete sufficient statistic for the binomial model. However, in general it is not clear whether optimal DP tests are always a function of complete sufficient statistics, as is the case for classical UMP tests. It would be worth investigating whether there is a notion of sufficiency which applies to DP tests.

When δ = 0, our optimal noise-adding mechanism, the proposed Tulap distribution, is related to the discrete Laplace distribution, which [GRS09] and [GV16a] also found to be optimal for a general class of loss functions. For δ > 0, a truncated discrete Laplace distribution is optimal for our problem. Little previous work has looked into optimal noise-adding mechanisms for approximate DP. [GV16b] studied this problem to some extent, but did not explore truncated Laplace distributions.
Steinke [Ste18] proposes that truncated Laplace can be viewed as the canonical distribution for approximate DP, in the way that Laplace is canonical for pure DP. Further exploration of the use of truncated Laplace distributions in the approximate-DP setting may be of interest.

Acknowledgements

We would like to thank Vishesh Karwa and Matthew Reimherr for helpful discussions and feedback on previous drafts. We also thank the reviewers for their helpful comments and suggestions, which have contributed to many improvements in the presentation of this work. This work is supported in part by NSF Award No. SES-1534433 to The Pennsylvania State University.

References

[AS18] Jordan Awan and Aleksandra Slavković. Structure and sensitivity in differential privacy: Comparing k-norm mechanisms. ArXiv e-prints, January 2018. Under review.

[Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[BRC17] A. Barrientos, J. Reiter, A. Machanavajjhala, and Y. Chen. Differentially private significance tests for regression coefficients. ArXiv e-prints, May 2017.

[CB02] G. Casella and R. L. Berger. Statistical Inference. Duxbury Advanced Series in Statistics and Decision Sciences. Thomson Learning, 2002.

[DJW18] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 113(521):182–201, 2018.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to Sensitivity in Private Data Analysis, pages 265–284.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9:211–407, August 2014.

[GC14] J. D. Gibbons and S. Chakraborti. Nonparametric Statistical Inference, Fourth Edition: Revised and Expanded. Taylor & Francis, 2014.

[GLRV16] Marco Gaboardi, Hyun Lim, Ryan Rogers, and Salil Vadhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testing. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2111–2120, New York, New York, USA, 20–22 Jun 2016. PMLR.

[GR18] Marco Gaboardi and Ryan Rogers. Local private hypothesis testing: Chi-square tests. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1626–1635, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.

[GRS09] Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility-maximizing privacy mechanisms. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, STOC '09, pages 351–360, New York, NY, USA, 2009. ACM.

[GV16a] Quan Geng and Pramod Viswanath. The optimal noise-adding mechanism in differential privacy. IEEE Transactions on Information Theory, 62(2):925–951, February 2016.

[GV16b] Quan Geng and Pramod Viswanath. Optimal noise adding mechanisms for approximate differential privacy. IEEE Transactions on Information Theory, 62(2):952–969, 2016.

[IK06] Seidu Inusah and Tomasz J. Kozubowski.
A discrete analogue of the Laplace distribution. Journal of Statistical Planning and Inference, 136(3):1090–1102, 2006.

[KV17] Vishesh Karwa and Salil P. Vadhan. Finite sample differentially private confidence intervals. CoRR, abs/1711.03908, 2017.

[Sch96] M. J. Schervish. Theory of Statistics. Springer Series in Statistics. Springer New York, 1996.

[She17] Or Sheffet. Differentially private ordinary least squares. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3105–3114, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.

[Sol14] Eftychia Solea. Differentially private hypothesis testing for normal random variables. Master's thesis, The Pennsylvania State University, May 2014.

[Ste18] Thomas Steinke. Private correspondence, 2018.

[USF13] Caroline Uhler, Aleksandra Slavković, and Stephen Fienberg. Privacy-preserving data sharing for genome-wide association studies. Journal of Privacy and Confidentiality, 5, 2013.

[vdV00] A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2000.

[VS09] Duy Vu and Aleksandra Slavković. Differential privacy for clinical trial data: Preliminary evaluations. In Proceedings of the 2009 IEEE International Conference on Data Mining Workshops, ICDMW '09, pages 138–143, Washington, DC, USA, 2009. IEEE Computer Society.

[WLK15] Y. Wang, J. Lee, and D. Kifer. Revisiting differentially private hypothesis tests for categorical data. ArXiv e-prints, November 2015.

[WZ10] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.

[XS13] Min-ge Xie and Kesar Singh.
Con\ufb01dence distribution, the frequentist distribution\nestimator of a parameter: A review. International Statistical Review, 81(1):3\u201339, 2013.\n\n11\n\n\f", "award": [], "sourceid": 2070, "authors": [{"given_name": "Jordan", "family_name": "Awan", "institution": "Penn State University"}, {"given_name": "Aleksandra", "family_name": "Slavkovi\u0107", "institution": "Pennsylvania State University"}]}