{"title": "Probabilistic Inference and Differential Privacy", "book": "Advances in Neural Information Processing Systems", "page_first": 2451, "page_last": 2459, "abstract": "We identify and investigate a strong connection between probabilistic inference and differential privacy, the latter being a recent privacy definition that permits only indirect observation of data through noisy measurement. Previous research on differential privacy has focused on designing measurement processes whose output is likely to be useful on its own. We consider the potential of applying probabilistic inference to the measurements and measurement process to derive posterior distributions over the data sets and model parameters thereof. We find that probabilistic inference can improve accuracy, integrate multiple observations, measure uncertainty, and even provide posterior distributions over quantities that were not directly measured.", "full_text": "Probabilistic Inference and Differential Privacy

Oliver Williams
Microsoft Research
Mountain View, CA 94043
olliew@microsoft.com

Frank McSherry
Microsoft Research
Mountain View, CA 94043
mcsherry@microsoft.com

Abstract

We identify and investigate a strong connection between probabilistic inference and differential privacy, the latter being a recent privacy definition that permits only indirect observation of data through noisy measurement. Previous research on differential privacy has focused on designing measurement processes whose output is likely to be useful on its own. We consider the potential of applying probabilistic inference to the measurements and measurement process to derive posterior distributions over the data sets and model parameters thereof.
We find that probabilistic inference can improve accuracy, integrate multiple observations, measure uncertainty, and even provide posterior distributions over quantities that were not directly measured.

1 Introduction

There has recently been significant interest in the analysis of data sets whose individual records are too sensitive to expose directly, examples of which include medical information, financial data, and personal data from social networking sites. Data like these are rich sources of information from which models could be learned for a variety of important applications. Although agencies with the resources to collate such data are unable to grant outside parties direct access to them, they may be able to safely release aggregate statistics of the data set. Progress in this area has so far been driven by researchers inventing sophisticated learning algorithms which are applied directly to the data and output model parameters which can be proven to respect the privacy of the data set. Proving these privacy properties requires an intricate analysis of each algorithm on a case-by-case basis. While this does result in many valuable algorithms and results, it is not a scalable solution, for two reasons: first, to solve a new learning problem, one must invent and analyze a new privacy-preserving algorithm; second, one must then convince the owner of the data to run this algorithm. Both of these steps are challenging.

In this paper, we show a natural connection between differential privacy, one of the leading privacy definitions, and probabilistic inference. Specifically, differential privacy exposes the conditional distribution of its observable outputs given any input data set.
Combining the conditional distributions of differentially-private observations with generative models for the data permits new inferences about the data without the need to invent and analyze new differentially-private computations. In some cases, one can rely on previously reported differentially-private measurements. When this is not sufficient, one can use off-the-shelf differentially-private "primitives" pre-sanctioned by owners of the data. As well as this flexibility, probabilistic inference can improve the accuracy of existing approaches, provide a measure of uncertainty in any predictions made, combine multiple observations in a principled way, and integrate prior knowledge about the data or parameters.

The following section briefly introduces differential privacy. In Section 3 we explore the marginal likelihood of the differentially-private observations given generative model parameters for the data. In general this likelihood consists of a high-dimensional integration over the space of all data sets; however, we show that for a rich subclass of differentially-private computations this distribution can be efficiently approximated via upper and lower bounds, derived using variational techniques. Section 4 shows several experimental results validating our hypothesis that probabilistic inference can be fruitfully applied to differentially-private computation. In particular, we show how the application of principled, probabilistic inference to measurements made by an existing, heuristic algorithm for logistic regression improves performance, as well as providing confidence on the predictions made.

1.1 Related work

There is a substantial amount of research on privacy, and differential privacy in particular, connected with machine learning and statistics.
Nonetheless, we are unaware of any research that uses exact knowledge of the conditional distribution over outputs given inputs to perform inference over model parameters, or other features of the data. Much of the existing statistical literature is concerned with identifying cases when the differentially-private observations are "as good" as traditional statistical estimators, in terms of efficiency [1], power [2], minimax rates [3], and robustness [4]. Instead, we are concerned with the cases where it is valuable to acknowledge and manage the uncertainty in the observations. As we demonstrate experimentally, such cases abound.

Chaudhuri and Monteleoni [5, 6] introduced the NIPS community to the problem of differentially-private logistic regression. Although we will also consider the problem of logistic regression (and compare our findings with theirs), we should stress that the aim of the paper is not specifically to attack the problem of logistic regression. Rather, the problem serves as a good example where prior work on differentially-private logistic regression can be improved through probabilistic inference.

2 Differential Privacy

Differential privacy [7] applies to randomized computations executed against a dataset and returning an aggregate result for the entire set. It prevents inference about specific records by requiring that the result of the computation yield nearly identical distributions for similar data sets. Formally, a randomized computation M satisfies ε-differential privacy if for any two possible input data sets A and B, and any subset of possible outputs S,

    P(M(A) ∈ S) ≤ P(M(B) ∈ S) × exp(ε × |A ⊖ B|),    (1)

where A ⊖ B is the set of records in A or B, but not both.
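The bound (1) can be seen concretely in the simplest differentially-private primitive: adding Laplace(1/ε) noise to a count. The sketch below is ours, not a construction from this paper; it verifies numerically that the density of the released value on a data set differs from that on a neighboring data set (counts differing by one record) by at most a factor of exp(ε):

```python
import math
import random

def laplace_count(true_count, epsilon):
    """Release a count with Laplace(1/epsilon) noise (inverse-CDF sampling)."""
    u = random.random() - 0.5
    return true_count - math.copysign(math.log(1.0 - 2.0 * abs(u)), u) / epsilon

def laplace_density(z, count, epsilon):
    """Density of the released value z when the true count is `count`."""
    return (epsilon / 2.0) * math.exp(-epsilon * abs(z - count))

# Neighboring data sets A and B with |A (-) B| = 1: counts 100 vs 101.
eps = 0.1
for z in (-5.0, 0.0, 99.5, 150.0):
    ratio = laplace_density(z, 100, eps) / laplace_density(z, 101, eps)
    assert ratio <= math.exp(eps) + 1e-12  # pointwise version of the bound in (1)
```

Pointwise density ratios bounded by exp(ε) imply the same bound for any output set S, which is exactly the guarantee in (1).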
When A ⊖ B is small, the relative bound on probabilities limits the inference an attacker can make about whether the true underlying data were actually A or B. Inferences about the presence, absence, or specific values of individual records are strongly constrained.

One example of a differentially-private computation is the exponential mechanism [8], characterized by a function φ : Dⁿ × Z → ℝ scoring each pair of data set and possible result with a real value. When the φ function satisfies |ln φ(z, A) − ln φ(z, B)| ≤ |A ⊖ B| for all z, the following distribution satisfies 2ε-differential privacy:

    P(M(X) = z) = φ(z, X)^ε / Σ_{z′∈Z} φ(z′, X)^ε    (2)

The exponential mechanism is fully general for differential privacy; any differentially-private mechanism M can be encoded in a φ function using the density of M(X) at z.

While any differentially-private mechanism can be expressed as a φ function, verifying that a function φ satisfies the constraint |ln φ(z, A) − ln φ(z, B)| ≤ |A ⊖ B| is generally not easy, and requires some form of proof on a case-by-case basis. One case that does not require a specialized proof is when the φ function can be expressed as φ(z, X) = Π_i φ(z, x_i). This subclass is useful practically, as data providers can ensure differential privacy by clamping each φ(z, x_i) value to the range [e⁻¹, e⁺¹], without having to understand the φ function.
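A sampler for per-record products of this kind can be sketched as follows. This is our own illustrative code, not the paper's: `phi` is any per-record score supplied by the analyst, clamped to [e⁻¹, e⁺¹] before use, and the sampling is done in log space for numerical stability:

```python
import math
import random

def clamp(phi_value):
    """Force each per-record score into [e^-1, e^+1], as privacy requires."""
    return min(max(phi_value, math.exp(-1.0)), math.exp(1.0))

def factored_exp_mechanism(data, candidates, phi, epsilon):
    """Sample z with probability proportional to prod_i clamp(phi(z, x_i))^epsilon."""
    log_scores = [epsilon * sum(math.log(clamp(phi(z, x))) for x in data)
                  for z in candidates]
    m = max(log_scores)                      # subtract the max before exponentiating
    weights = [math.exp(s - m) for s in log_scores]
    r = random.uniform(0.0, sum(weights))
    for z, w in zip(candidates, weights):
        r -= w
        if r <= 0.0:
            return z
    return candidates[-1]
```

For example, with Boolean records, `phi = lambda z, x: z if x else 1 - z`, and candidates `[0.1, ..., 0.9]`, this reproduces the coin-flipping measurement used later in Section 3.2.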
We will refer to this subclass as the factored exponential mechanism.

As we can see from the definition of the exponential mechanism, a differentially-private mechanism draws its guarantees from its inherent randomness, rather than from secrecy about its specification. Although differential privacy has many other redeeming features, it is this feature alone that we exploit in the remainder of the work. By the same token, although there are many other privacy definitions with varying guarantees, we can apply inference to any definition exhibiting one key feature: an explicit probabilistic relationship between the input data sets and output observations.

Figure 1: Graphical models. (a) If the data X = {x_i} are directly observable (shaded nodes), the canonical learning task is to infer the posterior over θ given a model relating X and θ. (b) In the private setting, the data are not observable; instead we observe the private measurement z, related to X by a known measurement process.

3 Inference and privacy

Differential privacy limits what can be inferred about a single record in a data set, but does not directly limit inference about larger-scale, aggregate properties of data sets. For example, many tasks in machine learning and statistics infer global parameters describing a model of the data set without explicit dependence on any single record, and we may still expect to see a meaningful relationship between differentially-private measurements and model parameters.

One way to model a data set is to propose a generative probabilistic model for the data p(X|θ). In Figure 1(a) we show a graphical model for the common case, in which we seek to infer the parameters θ given the observed iid data X = {x_i}. In Figure 1(b) we see a graphical model for the case considered in this paper, in which the data are not directly observed due to privacy.
Instead, information about X is revealed by the measurement z, which is generated from X according to a known conditional distribution p(z|X), for example as given in (2). We therefore reason about θ via the marginal likelihood

    p(z|θ) = ∫ dX p(X|θ) p(z|X).    (3)

Armed with the marginal likelihood, it is possible to bring all the techniques of probabilistic inference to bear. This will generally include adding a prior distribution over θ, and combining multiple measurements to form a posterior

    p(θ|z_1 ... z_m, π) ∝ p(θ|π) Π_j p(z_j|θ)    (4)

where π stands for any non-private information about θ we may have available.

While this is superficially clean, there is a problem: the integration in (3) is over the space of all data sets and is therefore challenging to compute whenever it cannot be solved analytically. Section 4 will show some results in which we tackle this head-on via MCMC; however, this only works for data sets of moderate size. Therefore, the remainder of this section is devoted to the development of several bounds on the marginal likelihood for cases in which the measurement is generated via the factored exponential mechanism. These bounds can be computed without requiring an integration over all X.

3.1 Factored exponential mechanism

The factored exponential mechanism of Section 2 is a special case of differentially-private mechanism that admits efficient approximation of the marginal likelihood. We will be able to use the independence in p(X|θ) = Π_i p(x_i|θ) and φ(z, X) = Π_i φ(z, x_i) to factorize lower and upper bounds on the integral (3), resulting in a small number of integrals over only the domain of records, rather than the domain of data sets.
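When the record space is small, (3) and (4) can also be computed exactly by brute force. The following toy setup is ours, not a construction from the paper: X is n Bernoulli(θ) records, each measurement is a Laplace-noised count of ones (so the integral in (3) collapses to a sum over the count k), and multiple measurements are combined on a grid of θ values under a uniform prior, as in (4):

```python
import math

def laplace_pdf(z, mu, eps):
    # Density of a count released with Laplace(1/eps) noise.
    return (eps / 2.0) * math.exp(-eps * abs(z - mu))

def marginal_likelihood(z, theta, n, eps):
    # Eq (3): the integral over data sets reduces to a sum over the count k.
    return sum(math.comb(n, k) * theta**k * (1 - theta)**(n - k)
               * laplace_pdf(z, k, eps) for k in range(n + 1))

def posterior(measurements, thetas, n, eps):
    # Eq (4): uniform prior over the grid times a product of likelihoods.
    w = [math.prod(marginal_likelihood(z, t, n, eps) for z in measurements)
         for t in thetas]
    total = sum(w)
    return [x / total for x in w]

grid = [i / 20 for i in range(1, 20)]          # theta in {0.05, ..., 0.95}
post = posterior([68.0, 72.0], grid, n=100, eps=0.1)
```

With two noisy counts near 70 out of n = 100 records, the posterior concentrates around θ ≈ 0.7; a third measurement simply multiplies in another likelihood term.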
As we will see, the bounds are often quite tight.

    p(z|θ) ≥ [ Σ_{z′∈Z} ( ∫ dx p(x|θ) φ(z′, x)^ε / φ(z, x)^ε )ⁿ ]⁻¹    (5a)

    p(z|θ) ≤ e^{−H[q]} ( ∫ dx p(x|θ) φ(z, x)^ε / Π_{z′∈Z} φ(z′, x)^{ε q(z′)} )ⁿ    (5b)

where the upper bound is defined in terms of a variational distribution q(z) [9] such that Σ_{z∈Z} q(z) = 1, and H[q] is the Shannon entropy of q. Notice that the integrations appearing in either bound are over the space of a single record in a data set, and not over the entire data set as they were in (3).

Proof of lower bound. To prove the lower bound, we apply Jensen's inequality with the (convex) function f(x) = 1/x to the marginal likelihood of the exponential mechanism:

    ∫ dX p(X|θ) φ(z, X)^ε / Σ_{z′∈Z} φ(z′, X)^ε
        = ∫ dX p(X|θ) [ Σ_{z′∈Z} φ(z′, X)^ε / φ(z, X)^ε ]⁻¹
        ≥ [ ∫ dX p(X|θ) Σ_{z′∈Z} φ(z′, X)^ε / φ(z, X)^ε ]⁻¹
        = [ Σ_{z′∈Z} ∫ dX p(X|θ) φ(z′, X)^ε / φ(z, X)^ε ]⁻¹,

which now factorizes, as

    ∫ dX p(X|θ) φ(z′, X)^ε / φ(z, X)^ε
        = ∫ dx_1 ∫ dx_2 ... ∫ dx_n Π_i p(x_i|θ) φ(z′, x_i)^ε / φ(z, x_i)^ε
        = Π_i ∫ dx_i p(x_i|θ) φ(z′, x_i)^ε / φ(z, x_i)^ε
        = ( ∫ dx p(x|θ) φ(z′, x)^ε / φ(z, x)^ε )ⁿ.  □

Proof of upper bound. We can lower-bound the normalizing term Σ_{z′∈Z} φ(z′, X)^ε in (2) by introducing a variational distribution q(z) and applying Jensen's inequality with the (concave) function f(x) = log x:

    Σ_{z′∈Z} φ(z′, X)^ε = exp log Σ_{z′∈Z} q(z′) [ φ(z′, X)^ε / q(z′) ]
        ≥ exp( Σ_{z′∈Z} q(z′) log [ φ(z′, X)^ε / q(z′) ] )
        = e^{H[q]} Π_{z′∈Z} φ(z′, X)^{ε q(z′)}.

Applying this bound to the marginal likelihood gives us the bound

    ∫ dX p(X|θ) φ(z, X)^ε / Σ_{z′∈Z} φ(z′, X)^ε
        ≤ e^{−H[q]} ∫ dX p(X|θ) φ(z, X)^ε / Π_{z′∈Z} φ(z′, X)^{ε q(z′)}
        = e^{−H[q]} ∫ dX Π_i [ p(x_i|θ) φ(z, x_i)^ε / Π_{z′∈Z} φ(z′, x_i)^{ε q(z′)} ]
        = e^{−H[q]} ( ∫ dx p(x|θ) φ(z, x)^ε / Π_{z′∈Z} φ(z′, x)^{ε q(z′)} )ⁿ.  □

While the upper bound is true for any q distribution, the tightest bound is found for the q which minimizes the
bound.

Figure 2: Error in upper and lower bounds for the coin-flipping problem. (a) For each epsilon, we plot the maximum across all θ of the error between the true distribution and each of the upper and lower bounds. (b) For n = 100 and ε = 0.5, we show the shape of the upper bound, lower bound, and true distribution when the differentially-private measurement returned was z = 0.7.

3.1.1 Choosing a φ function

The upper and lower bounds in (5) are true for any admissible φ function, but leave unanswered the question of what to choose in this role. In the absence of privacy we might try to find a good fit for the parameters θ by maximum likelihood. In the private setting this is not possible because the data are not directly observable, but the output of the factored exponential mechanism has a very similar form:

    Max likelihood:    θ* = argmax_{θ∈Θ} Π_i p(x_i|θ)    (6a)
    Exp. mechanism:    z* = noisymax_{z∈Z} Π_i φ(z, x_i)^ε    (6b)

where noisymax_{z∈Z} f(z) samples from f(z) / Σ_{z′∈Z} f(z′). By making the analogy between (6a) and (6b), we might let z range over elements of Θ (or a finite subset), and take φ(z, x_i) to be the likelihood of x_i under parameters z. The exponential mechanism is then likely to choose parameters z that fit the data well, informing us that the posterior over θ is likely in the vicinity of z. For φ to be admissible, we must clamp very small values of φ up to 1/e, limiting the ability of very poorly fit records to influence our decisions strongly.

3.2 Evaluation of the bounds

To demonstrate the effectiveness of these bounds we consider a problem in which it is possible to analytically compute the marginal likelihood. This is the case in which the database X contains a set of Boolean values corresponding to independent samples from a Bernoulli distribution with probability θ
This is the case in which the database X contains\na set of Boolean values corresponding to independent samples from a Bernoulli distribution with\nprobability \u03b8\n\np(x|\u03b8) = \u03b8x(1 \u2212 \u03b8)(1\u2212x).\n\nlog 0.1) that is,\n\n(7)\nFor our test, we took Z to be the nine multiples of 0.1 between 0.1 and 0.9, and log \u03c6(z, xi) =\n[log p(x|\u03b8)]log 0.9\nthe log likelihood clammped such that \u03c6(z, x) lies in the interval\n[e\u22121, e+1], as required by privacy.\nWe see in \ufb01gure 2a that the error in both the upper and lower bounds, across the entire density\nfunction, is essentially zero for small epsilon. As epsilon increases the bounds deteriorate, but we\nare most interested in the case of small values of epsilon, where privacy guarantees are meaningfully\nstrong. Figure 2b shows the shape of the two bounds, and the true density between, for epsilon =\n0.5. This large value was chosen as it is in the region for which the bounds are less tight and the\ndifference between the bounds and the truth can be seen.\n\n5\n\nUpper boundLower bound-4-3-2-10log10(epsilon)-0.02-0.0100.01Max. errorUpper boundActualLower bound00.20.40.60.8theta0.10.20.30.4p(z|theta)\fapproximately minimized by setting q(z) \u221d exp(cid:0)n(cid:82) dx p(x|\u03b8) log \u03c6(z, x)(cid:1). In general, however,\n\nThe upper bound is de\ufb01ned in terms of a variational distribution q. For these experiments q was\n\nthese (and other) test show that both bounds are equally good for reasonable values of \u0001 and we\ntherefore use the lower bound for the experiments in this paper, since it is simpler to compute.\n\n4 Experiments\n\nWe consider two scenarios for the experimental validation of the utility of probabilistic inference.\nFirst, we consider applying probabilistic inference to an existing differentially-private computation,\nspeci\ufb01cally a logistic regression heuristic taken from a suite of differentially-private algorithms. 
The heuristic is not representable in the factored exponential mechanism, and as such we must attempt to approximate the full integral over the space of data sets directly. In our second experiment, we choose a problem and measurement process appropriate for the factored exponential mechanism, principal components analysis, previously only ever addressed through noisy observation of the covariance matrix.

4.1 Logistic Regression

To examine the potential of probabilistic inference to improve the quality of existing differentially-private computations, we consider a heuristic algorithm for logistic regression included in the Privacy Integrated Queries distribution [10]. This heuristic uses a noisy sum primitive to repeatedly compute and step in the direction of an approximate gradient. When the number of records is large compared to the noise introduced, the approximate gradient is relatively accurate, and the algorithm performs well. When the records are fewer or the privacy requirements demand more noise, its performance suffers. Probabilistic inference has the potential to improve performance by properly integrating the information extracted from the data across the multiple gradient measurements and managing the uncertainty associated with the noisy measurements.

We test our proposals against three synthetic data sets (CM1 and CM2 from [5] and one of our own: SYNTH) and two data sets from the UCI repository (PIMA and ADULT) [11]. Details of these data sets appear in Table 1. The full ADULT data set was split into training and test sets, chosen so as to force the marginal frequency of positive and negative examples to 50%.

                        SYNTH     CM1      CM2     PIMA    ADULT
    Records              1000    17500    17500     691    16000
    Dimensions              4       10       10       8        6
    Positive examples     497     8770     8694     237     7841
    Test set records     1000    17500    17500    767*     8000

Table 1: Data sets used and their statistics.
Attribute values in SYNTH are sampled uniformly from a hypercube of unit volume, centered at the origin. CM1 and CM2 are both sampled uniformly at random from the surface of the unit hypersphere in 10 dimensions; CM1 is linearly separable, whereas CM2 is not (see [5]). PIMA and ADULT are standard data sets [11] containing diabetes records and census data respectively, both of which correspond to the types of data one might expect to be protected by differential privacy. The total PIMA data set is so small that we reused the full data set as test data (indicated by *).

4.1.1 Error Rates and Log-Likelihood

Tables 2 and 3 report the classification accuracy of several approaches when the privacy parameter ε is set to 0.1 and 1.0 respectively. These results are computed from 50 executions of the heuristic gradient descent algorithm.

We can see a trend of general improvement from the heuristic approach to the probabilistic inference, both in terms of the average error rate and the standard deviation. For the CM1 and CM2 data sets at ε = 0.1, we see substantial improvement over the reported results of [5]. Please note that the experiments were run on different data than in [5], drawn from the same distribution, and that different numbers of repetitions were used in [5] for the computation of the standard deviation and mean.

                  SYNTH            CM1             CM2             PIMA            ADULT
    Heuristic     37.40 ± 15.75    3.93 ± 1.57     9.32 ± 1.18     44.26 ± 8.50    43.15 ± 7.85
    Inference     29.14 ± 5.54     2.72 ± 0.84     8.84 ± 0.79     45.70 ± 6.31    36.07 ± 6.32
    Benchmark     16.40            0.00            5.40            19.48           26.09
    NIPS 08 [5]                    14.26 ± 12.84   19.03 ± 11.05

Table 2: Error Rates with ε = 0.1. All measurements are in per cent; errors are reported as the mean ± one standard deviation computed from 50 independent executions with random starting points. Heuristic corresponds to the last estimate made by noisy gradient ascent. Inference entries correspond to the expected error, computed over the approximate posterior for θ found via MCMC. Benchmark is the best maximum likelihood solution found by gradient ascent when the data are directly observable, and forms a baseline for expected performance. NIPS08 corresponds to the results given in [5]; these values were copied from that paper and are provided for comparison.

                  SYNTH            CM1             CM2            PIMA            ADULT
    Heuristic     17.31 ± 1.12     0.00 ± 0.00     5.67 ± 0.19    35.67 ± 6.45    31.30 ± 4.16
    Inference     17.16 ± 0.94     0.01 ± 0.02     5.69 ± 0.13    36.47 ± 8.56    29.36 ± 1.31
    Benchmark     16.40            0.00            5.40           19.48           26.09

Table 3: Error Rates with ε = 1.0. All measurements are in per cent; see the caption of Table 2.

4.1.2 Exchanging Iterations for Accuracy

The heuristic gradient ascent algorithm has an important configuration parameter determining the number of iterations of ascent, and consequently the accuracy permitted in each round (which must be lower if more rounds are to be run, to keep the cumulative privacy cost constant). The performance of the algorithm can be very sensitive to this parameter, as too few iterations indicate too little about the data, and too many render each iteration meaningless. In Figure 3a we consider several parameterizations of the heuristic, taking varying numbers of steps with varying degrees of accuracy in each step. Each colored path describes an execution with a fixed level of accuracy in each iteration, and all are plotted on the common scale of total privacy consumption.
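The heuristic being configured in these runs can be sketched roughly as follows. This is our own simplified reading of noisy-sum gradient ascent, not the PINQ implementation: it assumes features clipped so each record perturbs each gradient coordinate by at most 1, splits the ε budget evenly across iterations, and ignores per-coordinate budget accounting:

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sampling of Laplace(0, scale) noise.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def noisy_gradient_ascent(X, y, total_eps, iters, lr=1.0):
    # Split the privacy budget evenly across iterations; each iteration
    # releases a noisy sum of per-record log-likelihood gradients.
    eps_per = total_eps / iters
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (yi - p) * xi[j]   # magnitude below 1 when |x_j| <= 1
        noisy = [g + laplace(1.0 / eps_per) for g in grad]
        w = [wj + lr * nj / len(X) for wj, nj in zip(w, noisy)]
    return w
```

Fewer iterations leave a larger per-iteration ε (less noise per step); more iterations shrink it, which is the trade-off traced by the paths in Figure 3a.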
All of these paths roughly describe a common curve, suggesting that careful configuration is not required for these approaches: probabilistic inference appears to extract an amount of information that depends mainly on the total privacy consumption, and less on the specific details of its collection. This experiment was performed on the CM2 data set, and the corresponding result from [5] is indicated by the 'X'.

4.1.3 Integrating Auxiliary Information

To further demonstrate the power of the probabilistic inference approach, we consider the plausible scenario in which we are provided with a limited number of additional data points, obtained without privacy protection (for example, if we independently run a small survey of our own). These additional samples are easily incorporated into the graphical model by adding them as descendants of θ in Figure 1b. Figure 3b shows how the performance on SYNTH (which contains 1000 data points) improves as the quantity of additional examples increases. Even with very few additional examples, probabilistic inference is capable of exploiting this information and performance improves dramatically.

Figure 3: (a) Paths of varying ε. (b) Incorporating non-private observations. A compelling benefit of probabilistic inference is how easily alternate sources of information are added. The horizontal line indicates the performance of the benchmark maximum likelihood solution computed from the data without privacy.

4.2 Principal components

To demonstrate inference on another model, and to highlight the applicability of the factored exponential mechanism, we consider the problem of probabilistically finding the first principal component of a data set, where we model the data as iid draws from a Gaussian

    p(x|θ) = N(0, θθᵀ + σ²I).    (8)

Figure 4: Posterior distribution as a function of ε (panels: ε = 0.003, ε = 0.01, ε = 0.1). The same synthetic data set under differentially-private measurements with varying epsilon. For each measurement, 1000 samples of the full posterior over θ are drawn and overlaid on this figure to indicate the modes and concentration of the density. The posterior is noticeably more concentrated and accurate as epsilon increases.

An important advantage of our approach is its ability to capture uncertainty in the parameters and act accordingly. Figure 4 demonstrates three instances of inference applied to the same data set with three different values of ε. As ε increases, the concentration of the posterior over the parameters increases. We stress that the posterior and its concentration are returned to the analyst; each image is the result of a single differentially-private measurement, rather than a visualization of multiple runs. The measurement associated with ε = 0.003 is revealing, as it corresponds to the off-axis mode of the posterior. Although centered on this incorrect answer, the posterior indicates lack of confidence, and there is non-negligible mass over the correct answer.

5 Conclusions

Most work in the area of learning from private data forms an intrinsic analysis. That is, a complex algorithm is run by the owner of the data, directly on that data, and a single output is produced which appropriately indicates the desired parameters (modulo noise). In contrast, this paper has shown that it is possible to do a great deal with an extrinsic analysis, where standard, primitive measurements are made against the data, and a posterior over model parameters is inferred post hoc.

This paper brings together two complementary lines of research: the design and analysis of differentially-private algorithms, and probabilistic inference.
Our primary goal is not to weigh in on new differentially-private algorithms, nor to find new methods for probabilistic inference; it is to present the observation that the two approaches are complementary in a way that can be mutually enriching.

References

[1] A. Smith. Efficient, differentially private point estimators. 2008. arXiv:0809.4794.
[2] A. Slavkovic and D. Vu. Differential privacy for clinical trial data: Preliminary evaluations. In Proceedings of the International Workshop on Privacy Aspects of Data Mining, PADM09, 2009.
[3] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
[4] C. Dwork and J. Lei. Differential privacy and robust statistics. In STOC, 2009.
[5] K. Chaudhuri and C. Monteleoni. Privacy-preserving logistic regression. In NIPS, pages 289–296, 2008.
[6] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. 2010.
[7] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
[8] F. McSherry and K. Talwar. Differential privacy via mechanism design. In FOCS, 2007.
[9] M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
[10] F. McSherry. Privacy integrated queries. In ACM SIGMOD, 2009.
[11] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.
", "award": [], "sourceid": 1276, "authors": [{"given_name": "Oliver", "family_name": "Williams", "institution": null}, {"given_name": "Frank", "family_name": "Mcsherry", "institution": null}]}