{"title": "Differentially Private Bayesian Inference for Exponential Families", "book": "Advances in Neural Information Processing Systems", "page_first": 2919, "page_last": 2929, "abstract": "The study of private inference has been sparked by growing concern regarding the analysis of data when it stems from sensitive sources. We present the first method for private Bayesian inference in exponential families that properly accounts for noise introduced by the privacy mechanism. It is efficient because it works only with sufficient statistics and not individual data. Unlike other methods, it gives properly calibrated posterior beliefs in the non-asymptotic data regime.", "full_text": "Differentially Private Bayesian Inference for\n\nExponential Families\n\nGarrett Bernstein\n\nCollege of Information and Computer Sciences\n\nUniversity of Massachusetts Amherst\n\nAmherst, MA 01002\n\ngbernstein@cs.umass.edu\n\nDaniel Sheldon\n\nCollege of Information and Computer Sciences\n\nUniversity of Massachusetts Amherst\n\nAmherst, MA 01002\n\nsheldon@cs.umass.edu\n\nAbstract\n\nThe study of private inference has been sparked by growing concern regarding the\nanalysis of data when it stems from sensitive sources. We present the \ufb01rst method\nfor private Bayesian inference in exponential families that properly accounts for\nnoise introduced by the privacy mechanism. It is ef\ufb01cient because it works only\nwith suf\ufb01cient statistics and not individual data. Unlike other methods, it gives\nproperly calibrated posterior beliefs in the non-asymptotic data regime.\n\n1\n\nIntroduction\n\nDifferential privacy is the dominant standard for privacy [1]. A randomized algorithm that satis\ufb01es\ndifferential privacy offers protection to individuals by guaranteeing that its output is insensitive to\nchanges caused by the data of any single individual entering or leaving the data set. 
An algorithm can be made differentially private by applying one of several general-purpose mechanisms to randomize the computation in an appropriate way, for example, by adding noise calibrated to the sensitivity of the quantity being computed, where sensitivity captures how much the quantity depends on any individual’s data [1]. Due to the obvious importance of protecting individual privacy while drawing population-level inferences from data, differentially private algorithms have been developed for a broad range of machine learning tasks [2–9].\n\nThere is a growing interest in private methods for Bayesian inference [10–14]. In Bayesian inference, a modeler selects a prior distribution p(θ) over some parameter, observes data x that depends probabilistically on θ through a model p(x | θ), and then reasons about θ through the posterior distribution p(θ | x), which quantifies updated beliefs and uncertainty about θ after observing x. Bayesian inference is a core machine learning task and there is an obvious need to be able to conduct it in a way that protects privacy when x is sensitive. Additionally, recent work has identified surprising connections between sampling from posterior distributions and differential privacy: for example, a single perfect sample from p(θ | x) satisfies differential privacy for some setting of the privacy parameter [10–13].\n\nAn “obvious” way to conduct private Bayesian inference is to privatize the computation of the posterior, that is, to design a differentially private algorithm A that outputs y = A(x) with the goal that y ≈ p(θ | x) is a privatized representation of the posterior. 
However, using y directly as “the posterior” will not correctly quantify beliefs, because the Bayesian modeler never observes x; she observes y, and her posterior beliefs are now quantified by p(θ | y).\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nThis paper will take a different approach to private Bayesian inference by designing a pair of algorithms: the release mechanism A computes a private statistic y = A(x) of the input data; the inference algorithm P computes p(θ | y). These algorithms should satisfy the following criteria:\n\n• Privacy. The release mechanism A is differentially private. By the post-processing property of differential privacy [15], all further computations are also private.\n• Calibration. The inference algorithm P can efficiently compute or approximate the correct posterior, p(θ | y) (see Section 4 for our process to measure calibration).\n• Utility. Informally, the statistic y should capture “as much information as possible” about x so that p(θ | y) is “close” to p(θ | x) (see Section 4 for our process to measure utility).\n\nImportantly, the release mechanism A is public, so the distribution p(y | x) is known. Williams and McSherry first suggested conducting inference on the output of a differentially private algorithm and showed how to do this for the factored exponential mechanism [16]; see also [17–20].\n\nOur work focuses specifically on Bayesian inference when the private data X = x_{1:n} is an iid sample of (publicly known) size n from an exponential family model p(x_i | θ). Exponential families include many of the most familiar parametric probability models. We will adopt a straightforward release mechanism where the Laplace mechanism [1] is used to release noisy sufficient statistics y [12, 19], which are a finite-dimensional quantity that captures all the information about θ [21].\n\nThe technical challenge is then to develop an efficient general-purpose inference algorithm P. One challenge is computational efficiency. The exact posterior p(θ | y) ∝ ∫ p(θ) p(x_{1:n} | θ) p(y | x_{1:n}) dx_{1:n} integrates over all possible data sets [16], which is intractable to do directly for large n. We integrate instead over the sufficient statistics s, which have fixed dimension and completely characterize the posterior; furthermore, since they are a sum over individuals, p(s | θ) is asymptotically normal. We develop an efficient Gibbs sampler that uses a normal approximation for s together with variable augmentation to model the Laplace noise in a way that yields simple updates [22].\n\nA second challenge is that the sufficient statistics may be unbounded, which makes their release incompatible with the Laplace mechanism. We address this by imposing truncation bounds and only computing statistics from data that fall within the bounds. 
We show how to use automatic differentiation and a “random sum” central limit theorem to compute the parameters of the normal approximation p(s | θ) for a truncated exponential family when the number of individuals that fall within the truncation bounds is unknown.\n\nOur overall contribution is the pairing of an existing simple release mechanism A with a novel, efficient, and general-purpose Gibbs sampler P that meets the criteria outlined above for private Bayesian inference in any univariate exponential family or multivariate exponential family with bounded sufficient statistics.¹ We show empirically that when compared with competing methods, ours is the only one that provides properly calibrated beliefs about θ in the non-asymptotic regime, and that it provides good utility compared with other private Bayesian inference approaches.\n\n2 Differential Privacy\n\nDifferential privacy requires that an individual’s data has a limited effect on the algorithm’s behavior. In our setting, a data set X = x_{1:n} := (x_1, ..., x_n) consists of records from n individuals, where x_i ∈ R^d is the data of the ith individual. We will assume n is known. Differential privacy reasons about the hypothesis that one individual chooses not to remove their data from the data set, and their record is replaced by another one.² Let nbrs(X) denote the set of data sets that differ from X by exactly one record, i.e., if X′ ∈ nbrs(X), then X′ = (x_{1:i−1}, x′_i, x_{i+1:n}) for some i.\n\n¹ There are remaining technical challenges for multivariate models with unbounded sufficient statistics that we leave for future work.\n\n² This variant assumes n remains fixed, which is sometimes called bounded differential privacy [23].\n\nDefinition 1 (Differential Privacy; Dwork et al. [1]). 
A randomized algorithm A satisfies ε-differential privacy if for any input X, any X′ ∈ nbrs(X) and any subset of outputs O ⊆ Range(A),\n\nPr[A(X) ∈ O] ≤ exp(ε) Pr[A(X′) ∈ O].\n\nWe achieve differential privacy by injecting noise into statistics that are computed on the data. Let f be any function that maps data sets to R^d. The amount of noise depends on the sensitivity of f.\n\nDefinition 2 (Sensitivity). The sensitivity of a function f is Δ_f = max_{X, X′ ∈ nbrs(X)} ‖f(X) − f(X′)‖_1.\n\nWe drop the subscript f when it is clear from context. Our approach achieves differential privacy through the application of the Laplace mechanism.\n\nDefinition 3 (Laplace Mechanism; Dwork et al. [1]). Given a function f that maps data sets to R^m, the Laplace mechanism outputs the random variable L(X) ∼ Lap(f(X), Δ_f/ε) from the Laplace distribution, which has density Lap(z; u, b) = (2b)^{−m} exp(−‖z − u‖_1/b). This corresponds to adding zero-mean independent noise u_i ∼ Lap(0, Δ_f/ε) to each component of f(X).\n\nA final important property of differential privacy is post-processing [15]: if an algorithm A is ε-differentially private, then any algorithm that takes as input only the output of A, and does not use the original data set X, is also ε-differentially private.\n\n3 Private Bayesian Inference in Exponential Families\n\nWe consider the canonical setting of Bayesian inference in an exponential family. The modeler posits a prior distribution p(θ), assumes the data x_{1:n} is an iid sample from an exponential family model p(x | θ), and wishes to compute the posterior p(θ | x_{1:n}). An exponential family in natural parameterization has density\n\np(x | η) = h(x) exp(η⊤ t(x) − A(η)),\n\nwhere η are the natural parameters, t(x) is the sufficient statistic, A(η) = log ∫ h(x) exp(η⊤ t(x)) dx is the log-partition function, and h(x) is the base measure. The density of the full data is\n\np(x_{1:n} | η) = h(x_{1:n}) exp(η⊤ t(x_{1:n}) − n A(η)),\n\nwhere h(x_{1:n}) = ∏_{i=1}^n h(x_i) and t(x_{1:n}) = Σ_{i=1}^n t(x_i). Notice that once normalizing constants are dropped, this density depends on the data only through the sufficient statistics, s = t(x_{1:n}).\n\nWe will write exponential families more generally as p(x | θ) to indicate the case when the natural parameters η = η(θ) depend on a different parameter vector θ.\n\nEvery exponential family distribution has a conjugate prior distribution p(θ; λ) [24] with hyperparameters λ. A conjugate prior has the property that, if it is used as the prior, then the posterior belongs to the same family, i.e., p(θ | x_{1:n}; λ) = p(θ; λ′) for some λ′ that depends only on λ, n, and the sufficient statistics s. We write this function as λ′ = Conjugate-Update(λ, s, n); our methods are not tied to the specific choice of conjugate prior, only that the posterior parameters can be calculated in this form. See supplementary material for a general form of Conjugate-Update.\n\n3.1 Release Algorithm: Noisy Sufficient Statistics\n\nIf privacy were not a concern, the Bayesian modeler would simply compute the sufficient statistics s = t(x_{1:n}) and use them to update the posterior beliefs. However, to maintain privacy, the modeler must access the sensitive data only through a randomized release mechanism A. 
As a result, in order to obtain proper posterior beliefs the modeler must account for the randomization of the release mechanism by performing inference.\n\nWe take the simple approach of releasing noisy sufficient statistics via the Laplace mechanism, as in [12, 13, 19]. Sufficient statistics are a natural quantity to release. They are an “information bottleneck”: a finite-dimensional quantity that captures all the relevant information about θ. The released value is y = A(x_{1:n}) ∼ Lap(s, Δ_s/ε). Because s = t(x_{1:n}) = Σ_{i=1}^n t(x_i) is a sum over individuals, the sensitivity is Δ_s = max_{x, x′ ∈ R^d} ‖t(x) − t(x′)‖_1. When t(·) is unbounded this quantity becomes infinite; we will modify the release mechanism so the sensitivity is finite (Sec. 3.3).\n\n3.2 Basic Inference Approach: Bounded Sufficient Statistics\n\nThe goal of the inference algorithm P is to compute p(θ | y). We first develop the basic approach for the simpler case when t(x) is bounded, and then extend both A and P to handle the unbounded case. The full joint distribution of the probability model can be expressed as:\n\np(θ, s, y) = p(θ) p(s | θ) p(y | s),\n\nwhere p(θ) = p(θ; λ) is a conjugate prior and the goal is to compute a representation of p(θ | y) ∝ ∫_s p(θ, s, y) ds by integrating over the sufficient statistics.\n\nWe will develop a Gibbs sampler to sample from this distribution. There are two main challenges. First, the distribution p(s | θ) is obtained by marginalizing over the data sample x_{1:n}, and is usually not known in closed form. We will address this with an asymptotically correct normal approximation. Second, when resampling s within the Gibbs algorithm, we require the full conditional distribution of s given the other variables, which is proportional to p(s | θ) p(y | s). 
Care must be taken to make it easy to sample from this conditional distribution. We address this via variable augmentation. We discuss our approach to both challenges in detail below.\n\nNormal approximation of p(s | θ). The exact form of the sufficient statistic distribution p(s | θ) is obtained by marginalizing over the data:\n\np(s | θ) = ∫_{t^{−1}(s)} p(x_{1:n} | θ) dx_{1:n},   t^{−1}(s) := {x_{1:n} : t(x_{1:n}) = s}.\n\nIn general, the exact form of this distribution is not available. In some cases it is (for example, if x ∼ Bernoulli(θ) then s ∼ Binomial(n, θ)), but even then it may not lead to a tractable full conditional for s.\n\nProperties of exponential families pave the way toward a general approach that always leads to a tractable full conditional. By the central limit theorem (CLT), because s = Σ_i t(x_i) is a sum of iid random variables, it is asymptotically normal. It can be approximated as p(s | θ) ≈ N(s; nμ, nΣ), where μ = E[t(x)] and Σ = Var[t(x)] are the mean and variance of the sufficient statistic of a single individual. This approximation is asymptotically correct: (1/√n)(s − nμ) → N(0, Σ) in distribution [25]. The quantities μ and Σ can be computed using well-known properties of exponential families [25]:\n\nμ = E[t(x)] = ∂A(η)/∂η⊤,   Σ = Var[t(x)] = ∂²A(η)/(∂η ∂η⊤),   (1)\n\nwhere η = η(θ) is the natural parameter.\n\nNote that we will not use this approximation for Gibbs updates of θ. Instead, we will compute the conditional p(θ | s) using standard conjugacy formulas. 
In this sense, we maintain two views of the joint distribution p(θ, s): when updating θ, it is the standard exponential family model, which leads to conjugate updates; when updating s, it is approximated as p(θ) N(s; nμ, nΣ), which will lead to simple updates when combined with a variable augmentation technique.\n\nVariable augmentation for p(y | s). We seek a tractable form for the full conditional of s under the normal approximation, which is the product of a normal density and a Laplace density:\n\np(s | θ, y) ∝ N(s; nμ, nΣ) Lap(y; s, Δ_s/ε).\n\nA similar situation arises in the Bayesian Lasso [22], and we will employ the same variable augmentation trick. A Laplace random variable z ∼ Lap(u, b) can be written as a scale mixture of normals by introducing a latent variable σ² ∼ Exp(1/(2b²)), i.e., the distribution with density 1/(2b²) exp(−σ²/(2b²)), and letting z ∼ N(u, σ²). We apply this separately to each dimension of the vector y so that:\n\nσ_j² ∼ Exp(ε²/(2Δ_s²)),   y ∼ N(s, diag(σ²)).\n\nAlgorithm 1 Gibbs Sampler, Bounded Δ_s\n1: Initialize θ, s, σ²\n2: repeat\n3:   θ ∼ p(θ; λ′) where λ′ = Conjugate-Update(λ, s, n)\n4:   Calculate μ = E[t(x)] and Σ = Var[t(x)] (e.g., use Eq. (1))\n5:   s ∼ NormProduct(nμ, nΣ, y, diag(σ²))\n6:   1/σ_j² ∼ InverseGaussian(ε/(Δ_s |y − s|), ε²/Δ_s²)\n\nSubroutine NormProduct\n1: input: μ_1, Σ_1, μ_2, Σ_2\n2: Σ_3 = (Σ_1^{−1} + Σ_2^{−1})^{−1}\n3: μ_3 = Σ_3 (Σ_1^{−1} μ_1 + Σ_2^{−1} μ_2)\n4: return: N(μ_3, Σ_3)\n\nThe Gibbs Sampler. 
After the normal approximation and variable augmentation, the generative process is:\n\nθ ∼ p(θ; λ),\ns ∼ N(nμ, nΣ),\nσ_j² ∼ Exp(ε²/(2Δ_s²)) for all j,\ny ∼ N(s, diag(σ²)).\n\nThe final Gibbs sampling algorithm is shown in Algorithm 1. Note that the update for θ is based on conjugacy in the exact distribution p(θ, s), while the update for s uses the density of the generative process above, so that p(s | θ, σ², y) ∝ p(s | θ) p(y | σ², s), which is a product of two normal densities:\n\nN(s; nμ, nΣ) N(y; s, diag(σ²)) ∝ N(s; μ_s, Σ_s),\n\nwhere μ_s and Σ_s are defined in Algorithm 1 [26]. The update for σ² follows Park and Casella [22]; the inverse Gaussian density is InverseGaussian(x; m, v) = √(v/(2πx³)) exp(−v(x − m)²/(2m²x)). Full derivations are given in the supplement.\n\n3.3 Unbounded Sufficient Statistics and Truncated Exponential Families\n\nThe Laplace mechanism does not apply when the sufficient statistics are unbounded, because Δ_s = max_{x,y} ‖t(x) − t(y)‖_1 = ∞. Thus, we need a new release mechanism A and inference algorithm P. We present a solution for the case when x is univariate. All elements of the solution can generalize to higher dimensions, except that one step will have running time that is exponential in d; we leave improvement of this to future work and focus on the simpler univariate case.\n\nRelease mechanism. Our solution is to truncate the support of the (now univariate) p(x | θ) to x ∈ [a, b], where a and b are finite bounds provided by the modeler. 
If the modeler cannot select bounds a priori, they may be selected privately as a preliminary step using a variant of the exponential mechanism (see PrivateQuantile in Smith [27]).³ Then, given truncation bounds, the data owner redacts individuals where x_i ∉ [a, b] and reports the truncated sufficient statistics ŝ = Σ_{i=1}^n 1_{[a,b]}(x_i) · t(x_i), where 1_S(x) is the indicator function of the set S. The sensitivity of ŝ is now Δ_ŝ = max_{x,y ∈ R} ‖t̂(x) − t̂(y)‖_1 where t̂(x) = 1_{[a,b]}(x) t(x). An easy upper bound for this quantity (see supplement) is:\n\nΔ_ŝ ≤ Σ_{j=1}^d max{ max_{x ∈ [a,b]} |t_j(x)|, max_{x,y ∈ [a,b]} |t_j(x) − t_j(y)| },\n\nwhere t_j(x) is the jth component of the sufficient statistics. The bounds [a, b] will be selected so this quantity is bounded. The released value is y ∼ Lap(ŝ, Δ_ŝ/ε).\n\nInference: Truncated Exponential Family. Several new challenges arise for inference. The quantity ŝ is no longer a sufficient statistic for the model p(x | θ), and we will need new insights to understand p(ŝ | θ) and p(θ | ŝ). Since ŝ is a sum over individuals where x_i ∈ [a, b], it will be useful to examine the probability of the event x ∈ [a, b] as well as the conditional distribution of x given this event. 
To facilitate a general development, assume a generic truncation interval [v, w], not necessarily equal to [a, b]. Let F(x; θ) = ∫_{−∞}^x p(x′ | θ) dx′ be the CDF of the original (univariate) exponential family model. It is clear that Pr(x ∈ [v, w]) = F(w; θ) − F(v; θ). The conditional distribution of x given x ∈ [v, w] is a truncated exponential family, which, in its natural parameterization is:\n\np̂(x | η) = 1_{[v,w]}(x) h(x) exp(η⊤ t(x) − Â(η)),   Â(η) = log ∫_v^w h(x) exp(η⊤ t(x)) dx.   (2)\n\nNote that this is still an exponential family model (with a modified base measure), and all of the standard results apply, such as the existence of a conjugate prior and the formulas in Eq. (1) for the mean and variance of t(x) under the truncated distribution.\n\n³ Selecting truncation bounds will consume some of the privacy budget and modify the release mechanism A. We do not consider inference with respect to this part of the release mechanism.\n\n[Figure: diagram with nodes θ, s, y, and σ.]\n\nRandom sum CLT for p(ŝ | θ). We would like to again apply an asymptotic normal approximation for ŝ, but we do not know how many individuals fall within the truncation bounds. The “random sum CLT” of Robbins [28] applies to the setting where the number of terms in the sum is itself a random variable. The sum can be rewritten as ŝ = Σ_{k=1}^N t(x_{i_k}), where {i_1, ..., i_N} is the set of indices of individuals with data inside the truncation bounds, i.e., the indices such that x_{i_k} ∈ [v, w]. The number N is now a random variable distributed as N ∼ Binom(n, q), where q = F(w; θ) − F(v; θ).\n\nProposition 1. Let μ̂ = E_p̂[t(x)] and Σ̂ = Var_p̂[t(x)] be the mean and variance of t(x) in the truncated exponential family. Then ŝ = Σ_{k=1}^N t(x_{i_k}) is asymptotically normal with mean and variance:\n\nm := E[ŝ] = E[N] μ̂ = nq μ̂,\nV := Var(ŝ) = E[N] Σ̂ + Var[N] μ̂μ̂⊤ = nq Σ̂ + nq(1 − q) μ̂μ̂⊤.\n\nSpecifically, (1/√n)(ŝ − m) → N(0, Σ̄) in distribution as n → ∞, where Σ̄ = V/n = q Σ̂ + q(1 − q) μ̂μ̂⊤.\n\nProof. Each term of the sum has mean μ̂ and variance Σ̂, and the number of terms is N ∼ Binom(n, q). The result follows from Robbins [28].\n\nComputing μ̂ and Σ̂ by automatic differentiation (autodiff). To use the normal approximation we need to compute μ̂ and Σ̂.\n\nLemma 1. Let p(x | θ) be a univariate exponential family model and let p̂(x | θ) be the corresponding exponential family model truncated to generic interval [v, w]. Then\n\nμ̂ = E_p̂[t(x)] = E_p[t(x)] + ∂/∂η⊤ log(F(w; η) − F(v; η)),   (3)\nΣ̂ = Var_p̂[t(x)] = Var_p[t(x)] + ∂²/(∂η ∂η⊤) log(F(w; η) − F(v; η)).   (4)\n\nProof. It is straightforward to derive from Eq. (2) that Â(η) = A(η) + log(F(w; η) − F(v; η)). The result follows from applying Eq. (1) to this expression for Â(η). See the supplement for derivation of Â(η) and proof of this lemma.\n\nWe will use Equations (3) and (4) to compute μ̂ and Σ̂ by using autodiff to compute the desired derivatives. 
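As a rough numerical check of Lemma 1 and Proposition 1, the sketch below works through the exponential model p(x | θ) = θ exp(−θx), for which η = −θ, t(x) = x, E_p[t(x)] = 1/θ, Var_p[t(x)] = 1/θ², and F(x; η) = 1 − exp(ηx). It is not the authors' implementation: central finite differences stand in for autodiff, and the function names (log_mass, truncated_moments, rs_clt) are hypothetical.

```python
import math

# Illustrative sketch only. Finite differences replace autodiff, and all
# function names are hypothetical. Model: exponential with rate theta > 0,
# natural parameter eta = -theta, sufficient statistic t(x) = x.

def log_mass(eta, v, w):
    # log(F(w; eta) - F(v; eta)) with F(x; eta) = 1 - exp(eta * x), 0 <= v < w.
    return math.log(math.exp(eta * v) - math.exp(eta * w))

def truncated_moments(theta, v, w, h=1e-4):
    # Eqs. (3) and (4): untruncated moments plus derivatives of log_mass.
    eta = -theta
    f_minus = log_mass(eta - h, v, w)
    f_zero = log_mass(eta, v, w)
    f_plus = log_mass(eta + h, v, w)
    d1 = (f_plus - f_minus) / (2.0 * h)               # first derivative in eta
    d2 = (f_plus - 2.0 * f_zero + f_minus) / (h * h)  # second derivative in eta
    mu_hat = 1.0 / theta + d1           # E_p[t(x)] = 1 / theta
    sigma_hat = 1.0 / theta ** 2 + d2   # Var_p[t(x)] = 1 / theta^2
    return mu_hat, sigma_hat

def rs_clt(theta, v, w, n):
    # Proposition 1: mean m and variance V of the truncated sum over n people.
    q = math.exp(-theta * v) - math.exp(-theta * w)   # F(w) - F(v)
    mu_hat, sigma_hat = truncated_moments(theta, v, w)
    m = n * q * mu_hat
    big_v = n * q * sigma_hat + n * q * (1.0 - q) * mu_hat ** 2
    return m, big_v

if __name__ == '__main__':
    print(truncated_moments(2.0, 0.1, 1.5))
    print(rs_clt(2.0, 0.1, 1.5, n=100))
```

With a real autodiff library, the two finite-difference lines would be replaced by first- and second-derivative calls on log_mass; the rest of the computation stays the same.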
If the mean and variance E_p[t(x)] and Var_p[t(x)] of the untruncated distribution are not known, we can apply autodiff to compute them as well using Eq. (1).\n\nWhen x is multivariate, analogous expressions can be derived for μ̂ and Σ̂. The adjustment factors will include multivariate CDFs, with a number of terms that grows exponentially in d. This is currently the main limitation in applying our methods to multivariate models with unbounded sufficient statistics.\n\nConjugate updates for p(θ | ŝ). The final issue is the distribution p(θ | ŝ), which is no longer characterized by conjugacy because ŝ are not the full sufficient statistics. We again turn to variable augmentation. Let ŝ_ℓ = Σ_{i=1}^n 1_{[−∞,a]}(x_i) t(x_i) and ŝ_u = Σ_{i=1}^n 1_{[b,∞]}(x_i) t(x_i) be the sufficient statistics for the individuals that fall in the lower portion [−∞, a] and upper portion [b, ∞] of the support of x, respectively. We will instantiate ŝ_ℓ and ŝ_u as latent variables and model their distributions using the random sum CLT approximation from Prop. 1 and Lemma 1 (but with different truncation bounds). Let ŝ_c = ŝ be the sufficient statistics for the “center” portion, and define the three truncation intervals as [v_ℓ, w_ℓ] = [−∞, a], [v_c, w_c] = [a, b] and [v_u, w_u] = [b, ∞]. The full sufficient statistics are equal to s = ŝ_ℓ + ŝ_c + ŝ_u. Conditioned on all other variables, each component is multivariate normal, so the sum s is also multivariate normal. We can therefore sample s and then sample from p(θ | s) using conjugacy. We will also need to draw ŝ_c separately to be used to update σ².\n\nAlgorithm 2 Gibbs Sampler, Unbounded Δ_s\n1: Initialize θ, ŝ, σ², a, b\n2: [v_ℓ, w_ℓ] ← [−∞, a]\n3: [v_c, w_c] ← [a, b]\n4: [v_u, w_u] ← [b, ∞]\n5: repeat\n6:   m_r, V_r ← RS-CLT(θ, v_r, w_r) for r ∈ {ℓ, c, u}\n7:   m′_c, V′_c ← NormProduct(m_c, V_c, y, diag(σ²))\n8:   s ∼ N(m_ℓ + m′_c + m_u, V_ℓ + V′_c + V_u)\n9:   θ ∼ p(θ; λ′) where λ′ = Conjugate-Update(λ, s, n)\n10:  Recalculate m_c and V_c, then draw ŝ_c ∼ N(m_c, V_c)\n11:  1/σ_j² ∼ InverseGaussian(ε/(Δ_ŝ |y − ŝ_c|), ε²/Δ_ŝ²)\n12: until\n\nAlgorithm 3 RS-CLT\n1: input: θ, v, w\n2: q ← F(w; θ) − F(v; θ)\n3: μ̂, Σ̂ ← autodiff of Eqns. (3), (4)\n4: m ← nq μ̂\n5: V ← nq Σ̂ + nq(1 − q) μ̂μ̂⊤\n6: return: m, V\n\nThe Gibbs Sampler. 
The (approximate) generative process in the unbounded case is:\n\nθ ∼ p(θ; λ),\nŝ_r ∼ N(m_r, V_r) for r ∈ {ℓ, c, u}, where m_r, V_r = RS-CLT(θ, v_r, w_r),\nσ_j² ∼ Exp(ε²/(2Δ_ŝ²)) for all j,\ny ∼ N(ŝ_c, diag(σ²)).\n\nThe Gibbs sampler to sample from this distribution is given in Algorithm 2. Note that in Line 8 we employ rejection sampling in which sufficient statistics are sampled until the values drawn are valid for the given data model, e.g., s must be positive for the binomial distribution. The RS-CLT algorithm to compute parameters of the random sum CLT is shown in Algorithm 3.\n\n4 Experiments\n\nWe design experiments to measure the calibration and utility of our method for posterior inference. We conduct experiments for the binomial model with beta prior, the multinomial model with Dirichlet prior, and the exponential model with gamma prior. The last model is unbounded and requires truncation; we set the bounds to keep the middle 95% of individuals, which is reasonable to assume known a priori for some cases, such as modeling human height.\n\nMethods. We run our Gibbs sampler for 5000 iterations after 2000 burn-in iterations (see supplementary material for convergence results), which we compare to two baselines. The first method uses the same release mechanism as our Gibbs sampler and performs conjugate updates using the noisy sufficient statistics [12, 13]. This method converges to the true posterior as n → ∞ because the Laplace noise will eventually become negligible compared to sampling variability [12]. However, the noise is not negligible for moderate n; we refer to this method as “naive”. For truncated models we allow the naive method to “cheat” by accessing the noisy untruncated sufficient statistics s. 
Thus the method is not private, and receives strictly more information than our Gibbs sampler, but with noise of the same magnitude. This allows us to demonstrate miscalibration without highly technical modifications to the baseline method to be able to deal with truncated sufficient statistics.\n\nThe second baseline is a version of the one-posterior sampling (OPS) mechanism [11\u201313], which employs the exponential mechanism [29] to release samples from a privatized posterior. We release 100 samples using the method of [12], each with \u03b5_ops = \u03b5/100, such that the entire algorithm achieves \u03b5-differential privacy. Private MCMC sampling [11] is a more sophisticated method to release multiple samples from a privatized posterior and could potentially make better use of the privacy budget; however, private MCMC will also necessarily be miscalibrated, and only achieves the weaker privacy guarantee of (\u03b5, \u03b4)-differential privacy for \u03b4 > 0, so it would not be directly comparable to our method. OPS serves as a suitable baseline that achieves \u03b5-differential privacy. We include OPS only for experiments on the binomial model, for which it requires the support of \u03b8 to be truncated to [a_0, 1 \u2212 a_0] where a_0 > 0. We set a_0 = 0.1.\n\nWe also include a non-private posterior for comparison, which performs conjugate updates using the non-noisy sufficient statistics.\n\nEvaluation. We evaluate both the calibration and utility of the posterior. For calibration we adapt a method of Cook et al. [30]: the idea is to draw i.i.d. samples (\u03b8_i, x_i) from the joint model p(\u03b8)p(x | \u03b8), and conduct posterior inference in each trial. Let F_i(\u03b8) be the CDF of the true posterior p(\u03b8 | x_i) in trial i. Then we know that U_i = F_i(\u03b8_i) is uniformly distributed, because \u03b8_i \u223c p(\u03b8 | x_i) (see supplementary material). 
In other words, the actual parameter \u03b8_i is equally likely to land at any quantile of the posterior. To test the posterior inference procedure, we instead compute U_i as the quantile at which \u03b8_i lands within a set of samples from the approximate posterior. After M trials of the whole procedure we test for uniformity of U_{1:M} using the Kolmogorov-Smirnov goodness-of-fit test [31], which measures the maximum distance between the empirical CDF of U_{1:M} and the uniform CDF; lower values are better and zero corresponds to perfect uniformity. We also visualize the empirical CDFs to assess calibration qualitatively.\n\nHigher utility of a private posterior is indicated by closeness to the non-private posterior, which we measure with maximum mean discrepancy (MMD), a kernel-based statistical test to determine if two sets of samples are drawn from different distributions [32]. Given m i.i.d. samples (p, q) \u223c P \u00d7 Q, an unbiased estimate of the MMD is\n\nMMD^2(P, Q) = 1/(m(m \u2212 1)) \u2211_{i \u2260 j}^m (k(p_i, p_j) + k(q_i, q_j) \u2212 k(p_i, q_j) \u2212 k(p_j, q_i)),\n\nwhere k is a continuous kernel function; we use a standard normal kernel. The higher the value the more likely the two samples are drawn from different distributions.\n\nResults. Figure 1a shows the results for three models and varying n and \u03b5. Our method (Gibbs) achieves the same calibration level as non-private posterior inference for all settings. The naive method ignores noise and is too confident about parameter values implied by treating the noisy sufficient statistics as true ones; it is only well-calibrated with increasing n and \u03b5 when noise becomes negligible relative to population size. OPS is not calibrated because it samples from an over-dispersed version of p(\u03b8 | x).\n\nFigure 1b shows the empirical CDF plots for n = 1000 and \u03b5 = 0.01. Our method and the non-private method are both perfectly calibrated. 
The naive method\u2019s over-confidence in the wrong sufficient statistics causes its posterior to usually be too tight at the wrong value; thus the true parameter always lies in a tail of the approximate posterior, so too much mass is placed near 0 and 1. OPS shows the opposite behavior: its posterior is always too diffuse, so the true parameter lies close to the middle. For multinomial we show measures only for the parameter of the first category, but results hold for all categories.\n\nFigure 1c shows the MMD test statistic between each method and the non-private posterior, used as a measure of utility. Our method consistently achieves utility at least as good as the naive method for binomial and multinomial models. We omit OPS, which is never calibrated. For the exponential model (not shown) we did not obtain conclusive utility comparisons due to the lack of a naive baseline that properly handles truncation; the \u201ccheating\u201d naive method from our calibration experiments sometimes attains higher utility than our method, and sometimes lower, but this comparison is not meaningful because it receives strictly more information.\n\nFigure 1: (a) Calibration as Kolmogorov-Smirnov statistic vs. number of individuals at \u03b5 = [0.01, 0.10] for binomial, multinomial, and exponential models. (b) Empirical CDF plots at (n = 1000; \u03b5 = 0.01) for binomial, multinomial, and exponential models. (c) Utility as MMD with non-private posterior vs. number of individuals at \u03b5 = [0.01, 0.10] for binomial and multinomial models.\n\n5 Conclusion\n\nWe presented a Gibbs sampling approach for private posterior inference in exponential family models. Rather than trying to approximate the posterior of p(\u03b8 | x_{1:n}), we divide our procedure into a private release mechanism y = A(x_{1:n}) and an inference algorithm P that computes p(\u03b8 | y). The release mechanism is designed to facilitate inference. 
We develop a general-purpose Gibbs sampler that applies to any exponential family model that has bounded sufficient statistics; a truncated version applies to univariate models with unbounded sufficient statistics. The Gibbs sampler uses general properties of exponential families to approximate the distribution of the sufficient statistics, and therefore avoids the need to reason about individuals. Promising lines of future work are to develop efficient methods for multivariate exponential families with unbounded sufficient statistics, and to develop methods for conditional models based on exponential families, such as generalized linear models.\n\nAcknowledgments\n\nThis material is based upon work supported by the National Science Foundation under Grant Nos. 1522054 and 1617533.\n\nReferences\n\n[1] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265\u2013284. Springer, 2006.\n\n[2] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, pages 289\u2013296, 2009.\n\n[3] Benjamin I.P. Rubinstein, Peter L. Bartlett, Ling Huang, and Nina Taft. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. arXiv preprint arXiv:0911.5708, 2009.\n\n[4] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? 
SIAM Journal on Computing, 40(3):793\u2013826, 2011.\n\n[5] Mart\u00edn Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308\u2013318. ACM, 2016.\n\n[6] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069\u20131109, 2011.\n\n[7] Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. Journal of Machine Learning Research, 1(41):3\u20131, 2012.\n\n[8] Prateek Jain and Abhradeep Thakurta. Differentially private learning with kernels. ICML (3), 28:118\u2013126, 2013.\n\n[9] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 464\u2013473. IEEE, 2014.\n\n[10] Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, and Benjamin I.P. Rubinstein. Robust and private Bayesian inference. In International Conference on Algorithmic Learning Theory, pages 291\u2013305. Springer, 2014.\n\n[11] Yu-Xiang Wang, Stephen Fienberg, and Alex Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 2493\u20132502, 2015.\n\n[12] James Foulds, Joseph Geumlek, Max Welling, and Kamalika Chaudhuri. On the theory and practice of privacy-preserving Bayesian data analysis. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pages 192\u2013201, 2016.\n\n[13] Zuhe Zhang, Benjamin I.P. Rubinstein, and Christos Dimitrakakis. On the differential privacy of Bayesian inference. 
In Thirtieth AAAI Conference on Artificial Intelligence, 2016.\n\n[14] Joseph Geumlek, Shuang Song, and Kamalika Chaudhuri. Renyi differential privacy mechanisms for posterior sampling. In Advances in Neural Information Processing Systems, pages 5295\u20135304, 2017.\n\n[15] Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Found. and Trends in Theoretical Computer Science, 2014.\n\n[16] Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In Advances in Neural Information Processing Systems, pages 2451\u20132459, 2010.\n\n[17] Vishesh Karwa, Aleksandra B. Slavkovi\u0107, and Pavel Krivitsky. Differentially private exponential random graphs. In International Conference on Privacy in Statistical Databases, pages 143\u2013155. Springer, 2014.\n\n[18] Vishesh Karwa and Aleksandra B. Slavkovi\u0107. Inference using noisy degrees: Differentially private beta-model and synthetic graphs. The Annals of Statistics, 44(1):87\u2013112, 2016.\n\n[19] Garrett Bernstein, Ryan McKenna, Tao Sun, Daniel Sheldon, Michael Hay, and Gerome Miklau. Differentially private learning of undirected graphical models using collective graphical models. In International Conference on Machine Learning, pages 478\u2013487, 2017.\n\n[20] Aaron Schein, Zhiwei Steven Wu, Mingyuan Zhou, and Hanna Wallach. Locally private Bayesian inference for count models. NIPS 2017 Workshop: Advances in Approximate Bayesian Inference, 2018.\n\n[21] R.A. Fisher. On the mathematical foundations of theoretical statistics. Phil. Trans. R. Soc. Lond. A, 222(594-604):309\u2013368, 1922.\n\n[22] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681\u2013686, 2008.\n\n[23] Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 193\u2013204. 
ACM, 2011.\n\n[24] Persi Diaconis and Donald Ylvisaker. Conjugate priors for exponential families. The Annals of Statistics, pages 269\u2013281, 1979.\n\n[25] Peter J. Bickel and Kjell A. Doksum. Mathematical statistics: basic ideas and selected topics, volume I, volume 117. CRC Press, 2015.\n\n[26] Kaare Brandt Petersen and Michael Syskind Pedersen. The matrix cookbook. Technical University of Denmark, 7(15):510, 2008.\n\n[27] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, pages 813\u2013822, 2011.\n\n[28] Herbert Robbins. The asymptotic distribution of the sum of a random number of random variables. Bulletin of the American Mathematical Society, 54(12):1151\u20131161, 1948.\n\n[29] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS\u201907. 48th Annual IEEE Symposium on, pages 94\u2013103. IEEE, 2007.\n\n[30] Samantha R. Cook, Andrew Gelman, and Donald B. Rubin. Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics, 15(3):675\u2013692, 2006.\n\n[31] Frank J. Massey Jr. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68\u201378, 1951.\n\n[32] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch\u00f6lkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723\u2013773, 2012.", "award": [], "sourceid": 1532, "authors": [{"given_name": "Garrett", "family_name": "Bernstein", "institution": "University of Massachusetts Amherst"}, {"given_name": "Daniel", "family_name": "Sheldon", "institution": "University of Massachusetts Amherst"}]}