{"title": "Differential Privacy without Sensitivity", "book": "Advances in Neural Information Processing Systems", "page_first": 956, "page_last": 964, "abstract": "The exponential mechanism is a general method to construct a randomized estimator that satisfies $(\\varepsilon, 0)$-differential privacy. Recently, Wang et al. showed that the Gibbs posterior, which is a data-dependent probability distribution that contains the Bayesian posterior, is essentially equivalent to the exponential mechanism under certain boundedness conditions on the loss function. While the exponential mechanism provides a way to build an $(\\varepsilon, 0)$-differentially private algorithm, it requires boundedness of the loss function, which is quite stringent for some learning problems. In this paper, we focus on $(\\varepsilon, \\delta)$-differential privacy of Gibbs posteriors with convex and Lipschitz loss functions. Our result extends the classical exponential mechanism, allowing the loss functions to have an unbounded sensitivity.", "full_text": "Differential Privacy without Sensitivity\n\nKentaro Minami\n\nThe University of Tokyo\n\nHiromi Arai\n\nThe University of Tokyo\n\nkentaro minami@mist.i.u-tokyo.ac.jp\n\narai@dl.itc.u-tokyo.ac.jp\n\nIssei Sato\n\nThe University of Tokyo\nsato@k.u-tokyo.ac.jp\n\nHiroshi Nakagawa\n\nThe University of Tokyo\n\nnakagawa@dl.itc.u-tokyo.ac.jp\n\nAbstract\n\nThe exponential mechanism is a general method to construct a randomized estimator that satisfies (ε, 0)-differential privacy. Recently, Wang et al. showed that the Gibbs posterior, which is a data-dependent probability distribution that contains the Bayesian posterior, is essentially equivalent to the exponential mechanism under certain boundedness conditions on the loss function. 
While the exponential mechanism provides a way to build an (ε, 0)-differentially private algorithm, it requires boundedness of the loss function, which is quite stringent for some learning problems. In this paper, we focus on (ε, δ)-differential privacy of Gibbs posteriors with convex and Lipschitz loss functions. Our result extends the classical exponential mechanism, allowing the loss functions to have an unbounded sensitivity.\n\n1 Introduction\n\nDifferential privacy is a notion of privacy that provides a statistical measure of privacy protection for randomized statistics. In the field of privacy-preserving learning, constructing estimators that satisfy (ε, δ)-differential privacy is a fundamental problem. In recent years, differentially private algorithms for various statistical learning problems have been developed [8, 14, 3].\nUsually, the estimator construction procedure in statistical learning contains the following minimization problem of a data-dependent function. Given a dataset Dn = {x1, . . . , xn}, a statistician chooses a parameter θ that minimizes a cost function L(θ, Dn). A typical example of a cost function is the empirical risk function, that is, a sum of loss functions ℓ(θ, xi) evaluated at each sample point xi ∈ Dn. For example, the maximum likelihood estimator (MLE) is given by the minimizer of the empirical risk with loss function ℓ(θ, x) = − log p(x | θ).\nTo achieve a differentially private estimator, one natural idea is to construct an algorithm based on posterior sampling, namely drawing a sample from a certain data-dependent probability distribution. The exponential mechanism [16], which can be regarded as a posterior sampling, provides a general method to construct a randomized estimator that satisfies (ε, 0)-differential privacy. 
The probability density of the output of the exponential mechanism is proportional to exp(−βL(θ, Dn))π(θ), where π(θ) is an arbitrary prior density function, and β > 0 is a parameter that controls the degree of concentration. The resulting distribution is highly concentrated around the minimizer θ* ∈ argminθ L(θ, Dn). Note that most differentially private algorithms involve a procedure to add some noise (e.g. the Laplace mechanism [12], objective perturbation [8, 14], and gradient perturbation [3]), while the posterior sampling explicitly designs the density of the output distribution.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nFigure 1: An example of a logistic loss function ℓ(θ, x) := log(1 + exp(−yθ⊤z)). Considering two points x± = (z, ±1), the difference of the losses |ℓ(θ, x+) − ℓ(θ, x−)| increases proportionally to the size of the parameter space (solid lines). In this case, the value of β in the exponential mechanism, which is inversely proportional to the maximum difference of the loss function, becomes very small. On the other hand, the difference of the gradients |∇ℓ(θ, x+) − ∇ℓ(θ, x−)| does not exceed twice the Lipschitz constant (dashed lines). Hence, our analysis based on the Lipschitz property is not influenced by the size of the parameter space.\n\nTable 1: Regularity conditions for (ε, δ)-differential privacy of the Gibbs posterior. Instead of the boundedness of the loss function, our analysis in Theorem 7 requires its Lipschitz property and convexity. 
Unlike the classical exponential mechanism, our result explains the "shrinkage effect" or "contraction effect", namely, the upper bound for β depends on the concavity of the prior π and the size of the dataset n.\n\nMethod | (ε, δ) | Loss function ℓ | Prior π | Shrinkage\nExponential mechanism [16] | δ = 0 | Bounded sensitivity | Arbitrary | No\nTheorem 7 | δ > 0 | Lipschitz and convex | Log-concave | Yes\nTheorem 10 | δ > 0 | Bounded, Lipschitz and strongly convex | Log-concave | Yes\n\nWe define the density of the Gibbs posterior distribution as\n\nGβ(θ | Dn) := exp(−β ∑_{i=1}^n ℓ(θ, xi)) π(θ) / ∫ exp(−β ∑_{i=1}^n ℓ(θ, xi)) π(θ) dθ.   (1)\n\nThe Gibbs posterior plays important roles in several learning problems, especially in PAC-Bayesian learning theory [6, 21]. In the context of differential privacy, Wang et al. [20] recently pointed out that the Bayesian posterior, which is a special version of (1) with β = 1 and a specific loss function, satisfies (ε, 0)-differential privacy because it is equivalent to the exponential mechanism under a certain regularity condition. Bassily et al. [3] studied an application of the exponential mechanism to private convex optimization.\nIn this paper, we study the (ε, δ)-differential privacy of the posterior sampling with δ > 0. In particular, we consider the following statement.\nClaim 1. Under a suitable condition on the loss function ℓ and the prior π, there exists an upper bound B(ε, δ) > 0 such that the Gibbs posterior Gβ(θ | Dn) with β ≤ B(ε, δ) satisfies (ε, δ)-differential privacy. 
The value of B(ε, δ) does not depend on the boundedness of the loss function.\n\nWe point out here that the analyses of (ε, 0)-differential privacy and (ε, δ)-differential privacy with δ > 0 are conceptually different in the regularity conditions they require. On the one hand, the exponential mechanism essentially requires the boundedness of the loss function to satisfy (ε, 0)-differential privacy. On the other hand, the boundedness is not a necessary condition in (ε, δ)-differential privacy. In this paper, we give a new sufficient condition for (ε, δ)-differential privacy based on convexity and the Lipschitz property. Our analysis widens the application range of the exponential mechanism in the following aspects (see also Table 1).\n\n• (Removal of the boundedness assumption) If the loss function is unbounded, which is usually the case when the parameter space is unbounded, the Gibbs posterior does not satisfy (ε, 0)-differential privacy in general. Still, in some cases, we can build an (ε, δ)-differentially private estimator.\n\n• (Tighter evaluation of β) Even when the difference of the loss function is bounded, our analysis can yield a better scheme for determining the appropriate value of β for a given privacy level. Figure 1 shows an example with the logistic loss.\n\n• (Shrinkage and contraction effect) Intuitively speaking, the Gibbs posterior becomes robust against a small change of the dataset if the prior π has a strong shrinkage effect (e.g. a Gaussian prior with a small variance), or if the size of the dataset n tends to infinity. 
In our analysis, the upper bound of β depends on π and n, which explains such shrinkage and contraction effects.\n\n1.1 Related work\n\n(ε, δ)-differential privacy of Gibbs posteriors has been studied by several authors. Mir ([18], Chapter 5) proved that a Gaussian posterior in a specific problem satisfies (ε, δ)-differential privacy. Dimitrakakis et al. [10] considered Lipschitz-type sufficient conditions, yet their result requires some modification of the definition of the neighborhood on the database.\nIn general, the utility of sensitivity-based methods suffers from the size of the parameter space Θ. Thus, getting around the dependency on the size of Θ is a fundamental problem in the study of differential privacy. For discrete parameter spaces, a general range-independent algorithm for (ε, δ)-differentially private maximization was developed in [7].\n\n1.2 Notations\n\nThe set of all probability measures on a measurable space (Θ, T) is denoted by M^1_+(Θ). A map between two metric spaces f : (X, dX) → (Y, dY) is said to be L-Lipschitz if dY(f(x1), f(x2)) ≤ L dX(x1, x2) holds for all x1, x2 ∈ X. Let f be a twice continuously differentiable function defined on a subset of R^d. f is said to be m(> 0)-strongly convex if the eigenvalues of its Hessian ∇²f are bounded by m from below. f is said to be M-smooth if the eigenvalues of its Hessian ∇²f are bounded by M from above.\n\n2 Differential privacy with sensitivity\n\nIn this section, we review the definition of (ε, δ)-differential privacy and the exponential mechanism.\n\n2.1 Differential privacy\n\nDifferential privacy is a notion of privacy that provides a degree of privacy protection in a statistical sense. More precisely, differential privacy defines a closeness between any two output distributions that correspond to adjacent datasets.\nIn this paper, we assume that a dataset D = Dn = (x1, . . . 
, xn) is a vector that consists of n points in an abstract attribute space X, where each entry xi ∈ X represents information contributed by one individual. Two datasets D, D' are said to be adjacent if dH(D, D') = 1, where dH is the Hamming distance defined on the space of all possible datasets X^n.\nWe describe the definition of differential privacy in terms of randomized estimators. A randomized estimator is a map ρ : X^n → M^1_+(Θ) from the space of datasets to the space of probability measures.\n\nDefinition 2 (Differential privacy). Let ε > 0 and δ ≥ 0 be given privacy parameters. We say that a randomized estimator ρ : X^n → M^1_+(Θ) satisfies (ε, δ)-differential privacy if for any adjacent datasets D, D' ∈ X^n, an inequality\n\nρ_D(A) ≤ e^ε ρ_D'(A) + δ   (2)\n\nholds for every measurable set A ⊂ Θ.\n\n2.2 The exponential mechanism\n\nThe exponential mechanism [16] is a general construction of (ε, 0)-differentially private distributions. For an arbitrary function L : Θ × X^n → R, we define the sensitivity by\n\n∆L := sup_{D,D' ∈ X^n : dH(D,D')=1} sup_{θ ∈ Θ} |L(θ, D) − L(θ, D')|,   (3)\n\nwhich is the largest possible difference of two adjacent functions L(·, D) and L(·, D') with respect to the supremum norm.\n\nTheorem 3 (McSherry and Talwar). Suppose that the sensitivity of the function L(θ, Dn) is finite. Let π be an arbitrary base measure on Θ. Take a positive number β so that β ≤ ε/(2∆L). Then a probability distribution whose density with respect to π is proportional to exp(−βL(θ, Dn)) satisfies (ε, 0)-differential privacy.\n\nWe consider the particular case that the cost function is given in the sum form L(θ, Dn) = ∑_{i=1}^n ℓ(θ, xi). Recently, Wang et al. [20] examined two typical cases in which ∆L is finite. The following statement slightly generalizes their result.\n\nTheorem 4 (Wang et al.). (a) Suppose that the loss function ℓ is bounded by A, namely |ℓ(θ, x)| ≤ A holds for all x ∈ X and θ ∈ Θ. Then ∆L ≤ 2A, and the Gibbs posterior (1) satisfies (4βA, 0)-differential privacy.\n(b) Suppose that for any fixed θ ∈ Θ, the difference |ℓ(θ, x1) − ℓ(θ, x2)| is bounded by L for all x1, x2 ∈ X. Then ∆L ≤ L, and the Gibbs posterior (1) satisfies (2βL, 0)-differential privacy.\n\nThe condition ∆L < ∞ is crucial for Theorem 3 and cannot be removed. However, in practice, statistical models of interest do not necessarily satisfy such boundedness conditions. Here we give two simple examples, Bernoulli and Gaussian mean estimation problems, in which the sensitivities are unbounded.\n\n• (Bernoulli mean) Let ℓ(p, x) = −x log p − (1 − x) log(1 − p) (p ∈ (0, 1), x ∈ {0, 1}) be the negative log-likelihood of the Bernoulli distribution. Then |ℓ(p, 0) − ℓ(p, 1)| is unbounded.\n• (Gaussian mean) Let ℓ(θ, x) = (1/2)(θ − x)² (θ ∈ R, x ∈ R) be the negative log-likelihood of the Gaussian distribution with a unit variance. 
Then |ℓ(θ, x) − ℓ(θ, x')| is unbounded if x ≠ x'.\n\nThus, in the next section, we consider an alternative proof technique for (ε, δ)-differential privacy that does not require such boundedness conditions.\n\n3 Differential privacy without sensitivity\n\nIn this section, we state our main results for (ε, δ)-differential privacy in the form of Claim 1.\nThere is a well-known sufficient condition for (ε, δ)-differential privacy:\n\nTheorem 5 (See for example Lemma 2 of [13]). Let ε > 0 and δ > 0 be privacy parameters. Suppose that a randomized estimator ρ : X^n → M^1_+(Θ) satisfies a tail-bound inequality of the log-density ratio\n\nρ_D { log (dρ_D/dρ_D') ≥ ε } ≤ δ   (4)\n\nfor every adjacent pair of datasets D, D'. Then ρ satisfies (ε, δ)-differential privacy.\n\nTo control the tail behavior (4) of the log-density ratio function log(dρ_D/dρ_D'), we consider the concentration around its expectation. Roughly speaking, inequality (4) holds if there exists an increasing function α(t) that satisfies an inequality\n\nρ_D { log (dρ_D/dρ_D') ≥ DKL(ρ_D, ρ_D') + t } ≤ exp(−α(t)),  ∀t > 0,   (5)\n\nwhere log(dGβ,D/dGβ,D') is the log-density ratio function, and DKL(ρ_D, ρ_D') := E_{ρ_D} log(dρ_D/dρ_D') is the Kullback-Leibler (KL) divergence. Suppose that the Gibbs posterior Gβ,D, whose density G(θ | D) is defined by (1), satisfies inequality (5) for a certain α(t) = α(t, β). Then Gβ,D satisfies (4) if there exist β, t > 0 that satisfy the following two conditions.\n1. 
KL-divergence bound: DKL(Gβ,D, Gβ,D') + t ≤ ε\n2. Tail-probability bound: exp(−α(t, β)) ≤ δ\n\n3.1 Convex and Lipschitz loss\n\nHere, we examine the case in which the loss function ℓ is Lipschitz and convex, and the parameter space Θ is the entire Euclidean space R^d. Due to the unboundedness of the domain, the sensitivity ∆L can be infinite, in which case the exponential mechanism cannot be applied.\n\nAssumption 6. (i) Θ = R^d.\n(ii) For any x ∈ X, ℓ(·, x) is non-negative, L-Lipschitz and convex.\n(iii) − log π(·) is twice differentiable and mπ-strongly convex.\n\nIn Assumption 6, the loss function ℓ(·, x) and the difference |ℓ(·, x1) − ℓ(·, x2)| can be unbounded. Thus, the classical argument of the exponential mechanism in Section 2.2 cannot be applied. Nevertheless, our analysis shows that the Gibbs posterior satisfies (ε, δ)-differential privacy.\n\nTheorem 7. Let β ∈ (0, 1] be a fixed parameter, and D, D' ∈ X^n be an adjacent pair of datasets. Under Assumption 6, the inequality\n\nGβ,D { log (dGβ,D/dGβ,D') ≥ ε } ≤ exp( −(mπ/(8L²β²)) (ε − 2L²β²/mπ)² )   (6)\n\nholds for any ε > 2L²β²/mπ.\n\nThe Gibbs posterior Gβ,D satisfies (ε, δ)-differential privacy if β > 0 is taken so that the right-hand side of (6) is bounded by δ. It is elementary to check the following statement:\n\nCorollary 8. Let ε > 0 and 0 < δ < 1 be privacy parameters. Taking β so that it satisfies
β ≤ (ε/2L) √( mπ / (1 + 2 log(1/δ)) ),   (7)\n\nthe Gibbs posterior Gβ,D satisfies (ε, δ)-differential privacy.\n\nNote that the right-hand side of (6) depends on the strong concavity mπ. The strong concavity parameter corresponds to the precision (i.e. inverse variance) of the Gaussian, and a distribution with large mπ becomes spiky. Intuitively, if we use a prior that has a strong shrinkage effect, then the posterior becomes robust against a small change of the dataset, and consequently differential privacy can be satisfied with little effort. This observation is justified in the following sense: the upper bound of β grows proportionally to √mπ. In contrast, the classical exponential mechanism does not have that kind of prior-dependency.\n\n3.2 Strongly convex loss\n\nLet ℓ̃ be a strongly convex function defined on the entire Euclidean space R^d. If ℓ is a restriction of ℓ̃ to a compact L2-ball, the Gibbs posterior can satisfy (ε, 0)-differential privacy with a certain privacy level ε > 0 because of the boundedness of ℓ. However, using the boundedness of ∇ℓ rather than that of ℓ itself, we can give another guarantee for (ε, δ)-differential privacy.\n\nAssumption 9. Suppose that a function ℓ̃ : R^d × X → R is twice differentiable and mℓ-strongly convex with respect to its first argument. Let π̃ be a finite measure over R^d such that − log π̃(·) is twice differentiable and mπ-strongly convex. Let G̃β,D be a Gibbs posterior on R^d whose density with respect to the Lebesgue measure is proportional to exp(−β ∑_i ℓ̃(θ, xi)) π̃(θ). Assume that the mean of G̃β,D is contained in an L2-ball of radius κ:\n\n∥ E_{G̃β,D}[θ] ∥₂ ≤ κ,  ∀D ∈ X^n.   (8)\n\nDefine a positive number α > 1. Assume that (Θ, ℓ, π) satisfies the following conditions.\n(i) Θ is a compact L2-ball centered at the origin, and its radius RΘ satisfies RΘ ≤ κ + α√(d/mπ).\n(ii) For any x ∈ X, ℓ(·, x) is L-Lipschitz and convex. In other words, L := sup_{x∈X} sup_{θ∈Θ} ∥∇θ ℓ(θ, x)∥₂ is bounded.\n(iii) π is given by a restriction of π̃ to Θ.\n\nThe following statements are the counterparts of Theorem 7 and its corollary.\n\nTheorem 10. Let β ∈ (0, 1] be a fixed parameter, and D, D' ∈ X^n be an adjacent pair of datasets. Under Assumption 9, the inequality\n\nGβ,D { log (dGβ,D/dGβ,D') ≥ ε } ≤ exp( −((nmℓβ + mπ)/(4C'β²)) (ε − C'β²/(nmℓβ + mπ))² )   (9)\n\nholds for any ε > C'β²/(nmℓβ + mπ). Here, we defined C' := 2CL²(1 + log(α²/(α² − 1))), where C > 0 is a universal constant that does not depend on any other quantities.\n\nCorollary 11. Under Assumption 9, there exists an upper bound B(ε, δ) = B(ε, δ, n, mℓ, mπ, α) > 0 such that Gβ(θ | Dn) with β ≤ B(ε, δ) satisfies (ε, δ)-differential privacy.\n\nSimilar to Corollary 8, the upper bound on β depends on the prior. 
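To make this prior dependence concrete, the bound (7) of Corollary 8 can be evaluated directly. The following sketch (the privacy parameters and Lipschitz constant are illustrative choices, not values taken from the paper) shows that the admissible β scales as √mπ:

```python
import math

def beta_bound(eps, delta, lipschitz, m_pi):
    """Corollary 8: beta <= (eps / (2 L)) * sqrt(m_pi / (1 + 2 log(1/delta)))."""
    return (eps / (2.0 * lipschitz)) * math.sqrt(m_pi / (1.0 + 2.0 * math.log(1.0 / delta)))

# Illustrative privacy parameters (not values from the paper).
eps, delta, lipschitz = 1.0, 1e-6, 1.0
b1 = beta_bound(eps, delta, lipschitz, m_pi=1.0)
b4 = beta_bound(eps, delta, lipschitz, m_pi=4.0)
# Quadrupling the prior's strong-concavity parameter m_pi doubles the admissible beta.
print(b1, b4)
```

For the logistic-regression example of Section 3.3, the same function applies with the Lipschitz constant r and mπ = nλ, recovering the bound (13).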
Moreover, the right-hand side of (9) decreases to 0 as the size of the dataset n increases, which implies that (ε, δ)-differential privacy is satisfied almost for free if the size of the dataset is large.\n\n3.3 Example: Logistic regression\n\nIn this section, we provide an application of Theorem 7 to the problem of linear binary classification. Let Z := {z ∈ R^d, ∥z∥₂ ≤ r} be the space of input variables. The space of observations is the set of input variables equipped with a binary label X := {x = (z, y) ∈ Z × {−1, +1}}. The problem is to determine a parameter θ = (a, b) of the linear classifier fθ(z) = sgn(a⊤z + b).\nDefine a loss function ℓLR by\n\nℓLR(θ, x) := log(1 + exp(−y(a⊤z + b))).   (10)\n\nThe ℓ2-regularized logistic regression estimator is given by\n\nθ̂LR = argmin_{θ ∈ R^{d+1}} { (1/n) ∑_{i=1}^n ℓLR(θ, xi) + (λ/2)∥θ∥₂² },   (11)\n\nwhere λ > 0 is a regularization parameter. The corresponding Gibbs posterior has a density\n\nGβ(θ | D) ∝ ∏_{i=1}^n σ(yi(a⊤zi + b))^β φ_{d+1}(θ | 0, (nλ)^{-1}I),   (12)\n\nwhere σ(u) = (1 + exp(−u))^{-1} is the sigmoid function, and φ_{d+1}(θ | µ, Σ) is the density of the (d + 1)-dimensional Gaussian distribution. It is easy to check that ℓLR(·, x) is r-Lipschitz and convex, and that − log φ_{d+1}(· | 0, (nλ)^{-1}I) is (nλ)-strongly convex. Hence, by Corollary 8, the Gibbs posterior satisfies (ε, δ)-differential privacy if\n\nβ ≤ (ε/2r) √( nλ / (1 + 2 log(1/δ)) ).   (13)\n\n4 Approximation Arguments\n\nIn practice, exact samplers of Gibbs posteriors (1) are rarely available. Actual implementations involve some approximation processes. 
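For example, the Gibbs posterior (12) for regularized logistic regression admits no closed-form sampler, but a generic MCMC chain targeting its unnormalized log-density −β ∑ᵢ ℓLR(θ, xi) − (nλ/2)∥θ∥₂² is straightforward to write. The sketch below uses a random-walk Metropolis kernel on synthetic data; the data, the inverse temperature β, and the chain hyperparameters are all illustrative assumptions, not settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points (z_i, y_i) with ||z_i||_2 <= r = 1 and labels in {-1, +1}.
n, d = 200, 3
Z = rng.normal(size=(n, d))
Z /= np.maximum(1.0, np.linalg.norm(Z, axis=1, keepdims=True))
y = np.sign(Z @ np.ones(d) + 0.1 * rng.normal(size=n))

beta, lam = 0.1, 1.0  # illustrative inverse temperature and regularization

def log_target(theta):
    """log of the unnormalized Gibbs posterior (12): -beta * sum_i l_LR(theta, x_i) - (n*lam/2)||theta||^2."""
    a, b = theta[:-1], theta[-1]
    margins = y * (Z @ a + b)
    loss = np.logaddexp(0.0, -margins).sum()  # sum of logistic losses
    return -beta * loss - 0.5 * n * lam * (theta @ theta)

# Random-walk Metropolis chain.
theta = np.zeros(d + 1)
lp = log_target(theta)
accepts = 0
for _ in range(2000):
    prop = theta + 0.05 * rng.normal(size=d + 1)
    lp_prop = log_target(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp, accepts = prop, lp_prop, accepts + 1
print("acceptance rate:", accepts / 2000)
```

Note that such a finite-length chain is only an approximate sampler, which is exactly why the total-variation argument of this section is needed.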
Markov Chain Monte Carlo (MCMC) methods and Variational Bayes (VB) [1] are commonly used to obtain approximate samplers of Gibbs posteriors. The next proposition, which is easily obtained as a variant of Proposition 3 of [20], gives a differential privacy guarantee under approximation.\n\nProposition 12. Let ρ : X^n → M^1_+(Θ) be a randomized estimator that satisfies (ε, δ)-differential privacy. If for all D there exists an approximate sampling procedure ρ'_D such that dTV(ρ_D, ρ'_D) ≤ γ, then the randomized mechanism D ↦ ρ'_D satisfies (ε, δ + (1 + e^ε)γ)-differential privacy. Here, dTV(µ, ν) = sup_{A∈T} |µ(A) − ν(A)| is the total variation distance.\n\nWe now describe a concrete example of MCMC, the Langevin Monte Carlo (LMC). Let θ^(0) ∈ R^d be an initial point of the Markov chain. The LMC algorithm for the Gibbs posterior Gβ,D consists of the following iterations:\n\nθ^(t+1) = θ^(t) − h∇U(θ^(t)) + √(2h) η^(t+1),   (14)\nU(θ) = β ∑_{i=1}^n ℓ(θ, xi) − log π(θ).   (15)\n\nHere η^(1), η^(2), . . . ∈ R^d are noise vectors independently drawn from a centered Gaussian N(0, I). This algorithm can be regarded as a discretization of a stochastic differential equation that has stationary distribution Gβ,D, and its convergence property has been studied in a finite-time sense [9, 5, 11]. Let us denote by ρ^(t) the law of θ^(t). If dTV(ρ^(t), Gβ,D) ≤ γ holds for all t ≥ T, then the privacy of the LMC sampler is obtained from Proposition 12. In fact, we can prove by Corollary 1 of [9] the following proposition.\n\nProposition 13. Assume that Assumption 6 holds. Let ℓ(θ, x) be Mℓ-smooth for all x ∈ X, and − log π(θ) be Mπ-smooth. 
Let d ≥ 2 and γ ∈ (0, 1/2). We can choose β > 0, by Corollary 8, so that Gβ,D satisfies (ε, δ)-differential privacy. Let us set the step size h of the LMC algorithm (14) as\n\nh = 2mπγ² / ( d(nβMℓ + Mπ)² [ 4 log(1/γ) + d log((nβMℓ + Mπ)/mπ) ] ),   (16)\n\nand set T as\n\nT = ( d(nβMℓ + Mπ)² / (4mπ²γ²) ) [ 4 log(1/γ) + d log((nβMℓ + Mπ)/mπ) ]².   (17)\n\nThen, after T iterations of (14), θ^(T) satisfies (ε, δ + (1 + e^ε)γ)-differential privacy.\n\nThe algorithm suggested in Proposition 13 is closely related to the differentially private stochastic gradient Langevin dynamics (DP-SGLD) proposed by Wang et al. [20]. Ignoring the computational cost, we can take the approximation error level γ > 0 arbitrarily small, while the convergence property to the target posterior distribution is not necessarily ensured for DP-SGLD.\n\n5 Proofs\n\nIn this section, we give a formal proof of Theorem 7 and a proof sketch of Theorem 10.\nThere is a vast literature on techniques for obtaining concentration inequalities of the form (5) (see, for example, [4]). The logarithmic Sobolev inequality (LSI) is a useful tool for this purpose. We say that a probability measure µ over Θ ⊂ R^d satisfies LSI with constant DLS if the inequality\n\nEµ[f² log f²] − Eµ[f²] log Eµ[f²] ≤ 2 DLS Eµ ∥∇f∥₂²   (18)\n\nholds for any integrable function f, provided the expectations in the expression are defined. 
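For intuition, (18) can be checked numerically. The standard Gaussian on the real line has a 1-strongly convex potential, so it satisfies LSI with DLS = 1 (cf. Lemma 14 below). The sketch evaluates both sides of (18) by quadrature: f(x) = exp(x/2) is a well-known extremal case for the Gaussian LSI, where (18) holds with equality, while a generic f gives strict inequality (the grid and the test functions are illustrative choices):

```python
import numpy as np

# Riemann-sum quadrature against the standard Gaussian N(0, 1) density.
x = np.linspace(-12.0, 12.0, 400001)
dx = x[1] - x[0]
w = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def lsi_sides(f, fprime, d_ls=1.0):
    """Both sides of (18): Ent(f^2) on the left, 2 * D_LS * E|f'|^2 on the right."""
    f2 = f(x) ** 2
    ef2 = np.sum(f2 * w) * dx
    lhs = np.sum(f2 * np.log(f2) * w) * dx - ef2 * np.log(ef2)
    rhs = 2.0 * d_ls * np.sum(fprime(x) ** 2 * w) * dx
    return lhs, rhs

# Extremal case f(x) = exp(x/2): equality in the Gaussian LSI.
lhs1, rhs1 = lsi_sides(lambda t: np.exp(t / 2.0), lambda t: 0.5 * np.exp(t / 2.0))
# Non-extremal case f(x) = sqrt(1 + x^2): strict inequality.
lhs2, rhs2 = lsi_sides(lambda t: np.sqrt(1.0 + t**2), lambda t: t / np.sqrt(1.0 + t**2))
print(lhs1, rhs1, lhs2, rhs2)
```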
It is known [15, 4] that if µ satisfies LSI, then every real-valued L-Lipschitz function F behaves in a sub-Gaussian manner:\n\nµ{ F ≥ Eµ[F] + t } ≤ exp( −t² / (2L²DLS) ).   (19)\n\nIn our analysis, we utilize the LSI technique for the following two reasons: (a) a sub-Gaussian tail bound of the log-density ratio is obtained from (19), and (b) an upper bound on the KL-divergence is directly obtained from LSI, which appears to be difficult to prove by any other argument.\nRoughly speaking, LSI holds if the logarithm of the density is strongly concave. In particular, for a Gibbs measure on R^d, the following fact is known.\n\nLemma 14 ([15]). Let U : R^d → R be a twice differentiable, m-strongly convex and integrable function. Let µ be a probability measure on R^d whose density is proportional to exp(−U). Then µ satisfies LSI (18) with constant DLS = m^{-1}.\n\nIn this context, the strong convexity of U is related to the curvature-dimension condition CD(m, ∞), which can be used to prove LSI on general Riemannian manifolds [19, 2].\n\nProof of Theorem 7. For simplicity, we assume that ℓ(·, x) (∀x ∈ X) is twice differentiable. For general Lipschitz and convex loss functions (e.g. the hinge loss), the theorem can be proved using a mollifier argument. Since U(·) = β ∑_i ℓ(·, xi) − log π(·) is mπ-strongly convex, the Gibbs posterior Gβ,D satisfies LSI with constant mπ^{-1}.\nLet D, D' ∈ X^n be a pair of adjacent datasets. Considering an appropriate permutation of the elements, we can assume that D = (x1, . . . , xn) and D' = (x'1, . . . , x'n) differ in the first element, namely, x1 ≠ x'1 and xi = x'i (i = 2, . . . , n). By the assumption that ℓ(·, x) is L-Lipschitz, we have\n\n∥ ∇ log (dGβ,D/dGβ,D') ∥₂ = β ∥ ∇(ℓ(θ, x1) − ℓ(θ, x'1)) ∥₂ ≤ 2βL,   (20)\n\nand the log-density ratio log(dGβ,D/dGβ,D') is 2βL-Lipschitz. Then, by the concentration inequality (19) for Lipschitz functions, we have\n\n∀t > 0,  Gβ,D { log (dGβ,D/dGβ,D') ≥ DKL(Gβ,D, Gβ,D') + t } ≤ exp( −mπt² / (8L²β²) ).   (21)\n\nWe will show an upper bound on the KL-divergence. To simplify the notation, we write F := dGβ,D/dGβ,D'. Noting that\n\n∥∇√F∥₂² = ∥∇ exp(2^{-1} log F)∥₂² = ∥ (√F/2) ∇ log F ∥₂² ≤ (F/4) · (2βL)²,   (22)\n\nand that\n\nDKL(Gβ,D, Gβ,D') = E_{Gβ,D}[log F] = E_{Gβ,D'}[F log F] − E_{Gβ,D'}[F] log E_{Gβ,D'}[F],   (23)\n\nwe have, from LSI (18) with f = √F,\n\nDKL(Gβ,D, Gβ,D') ≤ (2/mπ) E_{Gβ,D'} ∥∇√F∥₂² ≤ (2L²β²/mπ) E_{Gβ,D'}[F] = 2L²β²/mπ.   (24)\n\nCombining (21) and (24), we have\n\nGβ,D { log (dGβ,D/dGβ,D') ≥ ε } ≤ Gβ,D { log (dGβ,D/dGβ,D') ≥ DKL(Gβ,D, Gβ,D') + ε − 2L²β²/mπ } ≤ exp( −(mπ/(8L²β²)) (ε − 2L²β²/mπ)² )   (25)\n\nfor any ε > 2L²β²/mπ.\n\nProof sketch for Theorem 10. 
The proof is almost the same as that of Theorem 7. It is sufficient to show that the set of Gibbs posteriors {Gβ,D, D ∈ X^n} simultaneously satisfies LSI with the same constant. Since the negative logarithm of the density is m := (nmℓβ + mπ)-strongly convex, the probability measure G̃β,D satisfies LSI with constant m^{-1}. By the Poincaré inequality for G̃β,D, the variance of ∥θ∥₂ is bounded by d/m ≤ d/mπ. By the Chebyshev inequality, we can check that the mass of the parameter space is lower-bounded as G̃β,D(Θ) ≥ p := 1 − α^{-2}. Then, by Corollary 3.9 of [17], Gβ,D := G̃β,D|Θ satisfies LSI with constant C(1 + log p^{-1})m^{-1}, where C > 0 is a universal numeric constant.\n\nAcknowledgments\n\nThis work was supported by JSPS KAKENHI Grant Number JP15H02700.\n\nReferences\n\n[1] P. Alquier, J. Ridgway, and N. Chopin. On the properties of variational approximations of Gibbs posteriors, 2015. Available at http://arxiv.org/abs/1506.04091.\n[2] D. Bakry, I. Gentil, and M. Ledoux. Analysis and Geometry of Markov Diffusion Operators. Springer, 2014.\n[3] R. Bassily, A. Smith, and A. Thakurta. Differentially private empirical risk minimization: Efficient algorithms and tight error bounds. In FOCS, 2014.\n[4] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.\n[5] S. Bubeck, R. Eldan, and J. Lehec. Finite-time analysis of projected Langevin Monte Carlo. In NIPS, 2015.\n[6] O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. IMS, 2007.\n[7] K. Chaudhuri, D. Hsu, and S. Song. The large margin mechanism for differentially private maximization. In NIPS, 2014.\n[8] K. Chaudhuri, C. Monteleoni, and A.D. Sarwate. 
Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.\n[9] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities, 2014. Available at http://arxiv.org/abs/1412.7392.\n[10] C. Dimitrakakis, B. Nelson, and B. Rubinstein. Robust and private Bayesian inference. In Algorithmic Learning Theory, 2014.\n[11] A. Durmus and E. Moulines. Non-asymptotic convergence analysis for the unadjusted Langevin algorithm, 2015. Available at http://arxiv.org/abs/1507.05021.\n[12] C. Dwork. Differential privacy. In ICALP, pages 1–12, 2006.\n[13] R. Hall, A. Rinaldo, and L. Wasserman. Differential privacy for functions and functional data. Journal of Machine Learning Research, 14:703–727, 2013.\n[14] D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. In COLT, 2012.\n[15] M. Ledoux. Concentration of Measure and Logarithmic Sobolev Inequalities, volume 1709 of Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics. Springer, 1999.\n[16] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.\n[17] E. Milman. Properties of isoperimetric, functional and transport-entropy inequalities via concentration. Probability Theory and Related Fields, 152:475–507, 2012.\n[18] D. Mir. Differential privacy: an exploration of the privacy-utility landscape. PhD thesis, Rutgers University, 2013.\n[19] C. Villani. Optimal Transport: Old and New. Springer, 2009.\n[20] Y. Wang, S. Fienberg, and A. Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In ICML, 2015.\n[21] T. Zhang. From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation. 
The Annals of Statistics, 34(5):2180–2210, 2006.", "award": [], "sourceid": 576, "authors": [{"given_name": "Kentaro", "family_name": "Minami", "institution": "The University of Tokyo"}, {"given_name": "Hiromi", "family_name": "Arai", "institution": "The University of Tokyo"}, {"given_name": "Issei", "family_name": "Sato", "institution": "The University of Tokyo"}, {"given_name": "Hiroshi", "family_name": "Nakagawa", "institution": "The University of Tokyo"}]}