{"title": "The Large Margin Mechanism for Differentially Private Maximization", "book": "Advances in Neural Information Processing Systems", "page_first": 1287, "page_last": 1295, "abstract": "A basic problem in the design of privacy-preserving algorithms is the \\emph{private maximization problem}: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy. This problem has been used as a sub-routine in many privacy-preserving algorithms for statistics and machine learning. Previous algorithms for this problem are either range-dependent---i.e., their utility diminishes with the size of the universe---or only apply to very restricted function classes. This work provides the first general purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy. Its applicability is demonstrated on two fundamental tasks in data mining and machine learning.", "full_text": "The Large Margin Mechanism for Differentially Private Maximization\n\nKamalika Chaudhuri\nUC San Diego\nLa Jolla, CA\nkamalika@cs.ucsd.edu\n\nDaniel Hsu\nColumbia University\nNew York, NY\ndjhsu@cs.columbia.edu\n\nShuang Song\nUC San Diego\nLa Jolla, CA\nshs037@eng.ucsd.edu\n\nAbstract\n\nA basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy. This problem has been used as a sub-routine in many privacy-preserving algorithms for statistics and machine learning.\nPrevious algorithms for this problem are either range-dependent—i.e., their utility diminishes with the size of the universe—or only apply to very restricted function classes. 
This work provides the first general purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy. Its applicability is demonstrated on two fundamental tasks in data mining and machine learning.\n\n1 Introduction\n\nDifferential privacy [17] is a cryptographically motivated definition of privacy that has recently gained significant attention in the data mining and machine learning communities. An algorithm for processing sensitive data enforces differential privacy by ensuring that the likelihood of any outcome does not change by much when a single individual's private data changes. Privacy is typically guaranteed by adding noise either to the sensitive data, or to the output of an algorithm that processes the sensitive data. For many machine learning tasks, this leads to a corresponding degradation in accuracy or utility. Thus a central challenge in differentially private learning is to design algorithms with better tradeoffs between privacy and utility for a wide variety of statistics and machine learning tasks.\n\nIn this paper, we study the private maximization problem, a fundamental problem that arises while designing privacy-preserving algorithms for a number of statistical and machine learning applications. We are given a sensitive dataset D ∈ X^n comprised of records from n individuals. We are also given a data-dependent objective function f : U × X^n → R, where U is a universe of K items to choose from, and f(i, ·) is (1/n)-Lipschitz for all i ∈ U. That is, |f(i, D′) − f(i, D′′)| ≤ 1/n for all i and for any D′, D′′ ∈ X^n differing in just one individual's entry. Always selecting an item that exactly maximizes f(·, D) is generally non-private, so the goal is to select, in a differentially private manner, an item i ∈ U with as high an objective f(i, D) as possible. 
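As a concrete illustration of this setup, consider an item-frequency objective; the dataset and items below are our own toy example, not drawn from this paper, and the sketch simply verifies the (1/n)-Lipschitz property on a pair of neighboring datasets:

```python
from fractions import Fraction

def f(i, D):
    # Illustrative objective: the fraction of records in D equal to item i.
    # Changing one individual's record moves the count by at most 1, so
    # f(i, .) is (1/n)-Lipschitz, exactly as the problem definition requires.
    n = len(D)
    return Fraction(sum(1 for x in D if x == i), n)

# Neighboring datasets: they differ in a single individual's entry.
D  = [1, 1, 2, 3, 1]
D2 = [1, 1, 2, 3, 2]

n = len(D)
for i in (1, 2, 3):
    assert abs(f(i, D) - f(i, D2)) <= Fraction(1, n)
```

Returning the exact argmax of such an objective is what the private maximization problem forbids doing naively.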
This is a very general algorithmic problem that arises in many applications, including private PAC learning [25] (choosing the most accurate classifier), private decision tree induction [21] (choosing the most informative split), private frequent itemset mining [5] (choosing the most frequent itemset), private validation [12] (choosing the best tuning parameter), and private multiple hypothesis testing [32] (choosing the most likely hypothesis).\n\nThe most common algorithms for this problem are the exponential mechanism [28], and a computationally efficient alternative from [5], which we call the max-of-Laplaces mechanism. These algorithms are general—they do not require any additional conditions on f to succeed—and hence have been widely applied. However, a major limitation of both algorithms is that their utility suffers from an explicit range-dependence: the utility deteriorates with increasing universe size. The range-dependence persists even when there is a single clear maximizer of f(·, D), or a few near maximizers, and even when the maximizer remains the same after changing the entries of a large number of individuals in the data. Getting around range-dependence has therefore been a goal for designing algorithms for this problem.\n\nThis problem has also been addressed by [31, 3], who provide algorithms that are range-independent and satisfy approximate differential privacy, a relaxed version of differential privacy. However, none of these algorithms is general; they explicitly fail unless additional special conditions on f hold. For example, the algorithm from [31] provides a range-independent result only when there is a single clear maximizer i* such that f(i*, D) is greater than the second highest value by some margin; the algorithm from [3] also has restrictive conditions that limit its applicability (see Section 2.2). 
Thus, a challenge is to develop a private maximization algorithm that is both range-independent and free of additional conditions; this is necessary to ensure that an algorithm is widely applicable and provides good utility when the universe size is large.\n\nIn this work, we provide the first such general purpose range-independent private maximization algorithm. Our algorithm is based on two key insights. The first is that private maximization is easier when there is a small set of near maximizing items j ∈ U for which f(j, D) is close to the maximum value max_{i∈U} f(i, D). A plausible algorithm based on this insight is to first find a set of near maximizers, and then run the exponential mechanism on this set. However, finding this set directly in a differentially private manner is very challenging. Our second insight is that only the number ℓ of near maximizers needs to be found in a differentially private manner—a task that is considerably easier. Provided there is a margin between the maximum value and the (ℓ+1)-th maximum value of f(i, D), running the exponential mechanism on the items with the top ℓ values of f(i, D) results in approximate differential privacy as well as good utility.\n\nOur algorithm, which we call the large margin mechanism, automatically exploits large margins when they exist to simultaneously (i) satisfy approximate differential privacy (Theorem 2), as well as (ii) provide a utility guarantee that depends (logarithmically) only on the number of near maximizers, rather than the universe size (Theorem 3). We complement our algorithm with a lower bound, showing that the utility of any approximate differentially private algorithm must deteriorate with the number of near maximizers (Theorem 1). 
A consequence of our lower bound is that range-independence cannot be achieved with pure differential privacy (Proposition 1), which justifies our relaxation to approximate differential privacy.\n\nFinally, we show the applicability of our algorithm to two problems from data mining and machine learning: frequent itemset mining and private PAC learning. For the first problem, an application of our method gives the first algorithm for frequent itemset mining that simultaneously guarantees approximate differential privacy and utility independent of the itemset universe size. For the second problem, our algorithm achieves tight sample complexity bounds for private PAC learning analogous to the shell bounds of [26] for non-private learning.\n\n2 Background\n\nThis section reviews differential privacy and introduces the private maximization problem.\n\n2.1 Definitions of Differential Privacy and Private Maximization\n\nFor the rest of the paper, we consider randomized algorithms A : X^n → Δ(S) that take as input datasets D ∈ X^n comprised of records from n individuals, and output values in a range S. Two datasets D, D′ ∈ X^n are said to be neighbors if they differ in a single individual's entry. A function φ : X^n → R is L-Lipschitz if |φ(D) − φ(D′)| ≤ L for all neighbors D, D′ ∈ X^n.\n\nThe following definitions of (approximate) differential privacy are from [17] and [20].\n\nDefinition 1 (Differential Privacy). 
A randomized algorithm A : X^n → Δ(S) is said to be (α, δ)-approximate differentially private if, for all neighbors D, D′ ∈ X^n and all subsets S′ ⊆ S,\n\nPr(A(D) ∈ S′) ≤ e^α Pr(A(D′) ∈ S′) + δ.\n\nThe algorithm A is α-differentially private if it is (α, 0)-approximate differentially private. Smaller values of the privacy parameters α > 0 and δ ∈ [0, 1] imply stronger guarantees of privacy.\n\nDefinition 2 (Private Maximization). In the private maximization problem, a sensitive dataset D ∈ X^n comprised of records from n individuals is given as input; there is also a universe U := {1, . . . , K} of K items, and a function f : U × X^n → R such that f(i, ·) is (1/n)-Lipschitz for all i ∈ U. The goal is to return an item i ∈ U such that f(i, D) is as large as possible while satisfying (approximate) differential privacy.\n\nAlways returning the exact maximizer of f(·, D) is non-private, as changing a single individual's private value can potentially change the maximizer. Our goal is to design a randomized algorithm that outputs an approximate maximizer with high probability. (We loosely refer to the expected f(·, D) value of the chosen item as the utility of the algorithm.)\n\nNote that this problem is different from private release of the maximum value of f(·, D); a solution for the latter is easily obtained by adding Laplace noise with standard deviation O(1/(αn)) to max_{i∈U} f(i, D) [17]. Privately returning a nearly maximizing item itself is much more challenging. Private maximization is a core problem in the design of differentially private algorithms, and arises in numerous statistical and machine learning tasks. 
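For contrast, the easy problem of releasing the maximum value can be sketched as follows; this is a minimal illustration of the Laplace-noise approach just described, with function and variable names of our choosing (it releases only the value, not the identity of the maximizer):

```python
import math
import random

def release_max_value(D, f, universe, alpha, rng):
    # max_i f(i, D) is itself (1/n)-Lipschitz when every f(i, .) is,
    # so adding Laplace noise of scale 1/(alpha*n) (standard deviation
    # O(1/(alpha*n))) gives an alpha-differentially private release.
    n = len(D)
    scale = 1.0 / (alpha * n)
    # Inverse-CDF sampling of a centered Laplace(scale) variate.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return max(f(i, D) for i in universe) + noise
```

Note that this sketch says nothing about which item attains the maximum; returning that item privately is the hard problem studied here.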
The examples of frequent itemset mining and PAC learning are discussed in Sections 4.1 and 4.2.\n\n2.2 Previous Algorithms for Private Maximization\n\nThe standard algorithm for private maximization is the exponential mechanism [28]. Given a privacy parameter α > 0, the exponential mechanism randomly draws an item i ∈ U with probability p_i ∝ e^{nαf(i,D)/2}; this guarantees α-differential privacy. While the exponential mechanism is widely used because of its generality, a major limitation is its range-dependence—i.e., its utility diminishes with the universe size K. To be more precise, consider the following example where X := U = [K] and\n\nf(i, D) := (1/n) |{j ∈ [n] : D_j ≥ i}|    (1)\n\n(where D_j is the j-th entry in the dataset D). When D = (1, 1, . . . , 1), there is a clear maximizer i* = 1, which only changes when the entries of at least n/2 individuals in D change. It stands to reason that any algorithm should report i = 1 in this case with high probability. However, the exponential mechanism outputs i = 1 only with probability e^{nα/2}/(K − 1 + e^{nα/2}), which is small unless n = Ω(log(K)/α). This implies that the utility of the exponential mechanism deteriorates with K.\n\nAnother general purpose algorithm is the max-of-Laplaces mechanism from [5]. Unfortunately, this algorithm is also range-dependent. Indeed, our first observation is that all α-differentially private algorithms that succeed on a wide class of private maximization problems share this same drawback.\n\nProposition 1 (Lower bound for differential privacy). Let A be any α-differentially private algorithm for private maximization, α ∈ (0, 1), and n ≥ 2. 
There exists a domain X, a function f : U × X^n → R such that f(i, ·) is (1/n)-Lipschitz for all i ∈ U, and a dataset D ∈ X^n such that:\n\nPr( f(A(D), D) > max_{i∈U} f(i, D) − log((K−1)/2)/(αn) ) < 1/2.\n\nWe remark that results similar to Proposition 1 have appeared in [23, 2, 10, 11, 7]; we simply reframe those results here in the context of private maximization.\n\nProposition 1 implies that in order to remove range-dependence, we need to relax the privacy notion. We consider a relaxation of the privacy constraint to (α, δ)-approximate differential privacy with δ > 0.\n\nThe approximate differentially private algorithm from [31] applies in the case where there is a single clear maximizer whose value is much larger than that of the rest. This algorithm adds Laplace noise with standard deviation O(1/(αn)) to the difference between the largest and the second-largest values of f(·, D), and outputs the maximizer if this noisy difference is larger than O(log(1/δ)/(αn)); otherwise, it outputs Fail. Although this solution has high utility for the example in (1) with D = (1, 1, . . . , 1), it fails even when there is a single additional item j ∈ U with f(j, D) close to the maximum value; for instance, D = (2, 2, . . . , 2).\n\n[3] provides an approximate differentially private algorithm that applies when f satisfies a condition called ℓ-bounded growth. This condition entails the following: first, for any i ∈ U, adding a single individual to any dataset D can either keep f(i, D) constant, or increase it by 1/n; and second, f(i, D) can only increase in this case for at most ℓ items i ∈ U. 
The utility of this algorithm depends only on log ℓ, rather than log K. In contrast, our algorithm does not require the first condition. Furthermore, to ensure that our algorithm's utility only depends on log ℓ, it suffices that there be at most ℓ near maximizers, which is substantially less restrictive than the ℓ-bounded growth condition.\n\nAs mentioned earlier, we avoid range-dependence with an algorithm that finds and optimizes over near maximizers of f(·, D). We next specify what we mean by near maximizers using a notion of margin.\n\n3 The Large Margin Mechanism\n\nWe now present our new algorithm for private maximization, called the large margin mechanism, along with its privacy and utility guarantees.\n\n3.1 Margins\n\nWe first introduce the notion of margin on which our algorithm is based. Given an instance of the private maximization problem and a positive integer ℓ ∈ N, let f^(ℓ)(D) denote the ℓ-th highest value of f(·, D). We adopt the convention that f^(K+1)(D) = −∞.\n\nCondition 1 ((ℓ, γ)-margin condition). For any ℓ ∈ N and γ > 0, we say a dataset D ∈ X^n satisfies the (ℓ, γ)-margin condition if\n\nf^(ℓ+1)(D) < f^(1)(D) − γ\n\n(i.e., there are at most ℓ items within γ of the top item according to f(·, D)).[1]\n\nBy convention, every dataset satisfies the (K, γ)-margin condition. Intuitively, an (ℓ, γ)-margin condition with a relatively large γ implies that there are at most ℓ near maximizers, so the private maximization problem is easier when D satisfies an (ℓ, γ)-margin condition with small ℓ.\n\nHow large should γ be for a given ℓ? 
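As a concrete reference point, Condition 1 can be checked directly (and non-privately) by sorting the objective values; the example values in the sketch below are hypothetical:

```python
def satisfies_margin(values, ell, gamma):
    # values[i] = f(i, D) for each item i in the universe.
    # Condition 1: f^(ell+1)(D) < f^(1)(D) - gamma, i.e. at most `ell`
    # items lie within gamma of the top value. By convention the
    # (K, gamma)-margin condition always holds (f^(K+1) = -infinity).
    ranked = sorted(values, reverse=True)
    if ell >= len(ranked):
        return True
    return ranked[ell] < ranked[0] - gamma

# Hypothetical objective values for a universe of 5 items.
vals = [0.90, 0.88, 0.50, 0.10, 0.05]
assert satisfies_margin(vals, ell=2, gamma=0.2)       # top 2 well separated from the rest
assert not satisfies_margin(vals, ell=1, gamma=0.05)  # 0.88 is within 0.05 of 0.90
```

The point of the algorithm, of course, is that this check must be carried out privately; the sketch only fixes intuition for what is being certified.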
The following lower bound suggests that in order to have n = O(log(ℓ)/α), we need γ to be roughly log(ℓ)/(αn).\n\nTheorem 1 (Lower bound for approximate differential privacy). Fix any α ∈ (0, 1), ℓ > 1, and δ ∈ [0, (1 − exp(−α))/(2(ℓ − 1))]; and assume n ≥ 2. Let A be any (α, δ)-approximate differentially private algorithm, and γ := min{1/2, log((ℓ − 1)/2)/(nα)}. There exists a domain X, a function f : U × X^n → R such that f(i, ·) is (1/n)-Lipschitz for all i ∈ U, and a dataset D ∈ X^n such that:\n\n1. D satisfies the (ℓ, γ)-margin condition.\n\n2. Pr( f(A(D), D) > f^(1)(D) − γ ) < 1/2.\n\nA consequence of Theorem 1 is that complete range-independence for all (1/n)-Lipschitz functions f is not possible, even with approximate differential privacy. For instance, if D satisfies an (ℓ, Ω(log(ℓ)/(αn)))-margin condition only when ℓ = Ω(K), then n must be Ω(log(K)/α) in order for an approximate differentially private algorithm to be useful.\n\n3.2 Algorithm\n\nThe lower bound in Theorem 1 suggests the following algorithm. First, privately determine a pair (ℓ, γ), with ℓ as small as possible and γ = Ω(log(ℓ)/(αn)), such that D satisfies the (ℓ, γ)-margin\n\n[1] Our notion of margins here is different from the usual notion of margins from statistical learning that underlies linear prediction methods like support vector machines and boosting. 
In fact, our notion is more closely related to the shell decomposition bounds of [26], which we discuss in Section 4.2.\n\nAlgorithm 1 The large margin mechanism LMM(α, δ, D)\ninput Privacy parameters α > 0 and δ ∈ (0, 1), database D ∈ X^n.\noutput Item I ∈ U.\n1: For each r = 1, 2, . . . , K, let t(r) ≤ T(r) be the thresholds from the analysis, both of order O(1/n + log(r/δ)/(nα)).\n2: Draw Z ∼ Lap(3/α).\n3: Let m := f^(1)(D) + Z/n. {Estimate of max value.}\n4: Draw G ∼ Lap(6/α) and Z_1, Z_2, . . . , Z_{K−1} iid ∼ Lap(12/α).\n5: Let ℓ := 1. {Adaptively determine a value ℓ such that D satisfies the (ℓ, t(ℓ))-margin condition.}\n6: while ℓ < K do\n7:   if m − f^(ℓ+1)(D) > (Z_ℓ + G)/n + T(ℓ) then\n8:     Break out of while-loop with current value of ℓ.\n9:   else\n10:    Let ℓ := ℓ + 1.\n11:  end if\n12: end while\n13: Let U_ℓ be the set of ℓ items in U with highest f(i, D) value (ties broken arbitrarily).\n14: Draw I ∼ p where p_i ∝ 1{i ∈ U_ℓ} exp(nαf(i, D)/6). {Exponential mechanism on top ℓ items.}\n15: return I.\n\ncondition. Then, run the exponential mechanism on the set U_ℓ ⊆ U of items with the ℓ highest f(·, D) values. 
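The second stage of this plan, the exponential mechanism restricted to the top ℓ items, can be sketched as follows. The sketch assumes ℓ is already in hand (choosing it privately is the substance of Algorithm 1) and uses the generic exponential-mechanism exponent nαf/2 from Section 2.2 rather than the constants of Algorithm 1:

```python
import math
import random

def exp_mechanism_top_ell(values, ell, alpha, n, rng):
    # Run the exponential mechanism on U_ell, the `ell` items with the
    # highest objective values; `values[i] = f(i, D)`.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = order[:ell]
    # Subtract the max before exponentiating for numerical stability.
    vmax = values[top[0]]
    weights = [math.exp(n * alpha * (values[i] - vmax) / 2.0) for i in top]
    total = sum(weights)
    r = rng.random() * total
    for i, w in zip(top, weights):
        r -= w
        if r <= 0:
            return i
    return top[-1]

rng = random.Random(1)
vals = [0.9, 0.85, 0.1, 0.05]
chosen = exp_mechanism_top_ell(vals, ell=2, alpha=1.0, n=200, rng=rng)
assert chosen in (0, 1)  # only the top-2 items can ever be returned
```

By construction the utility loss of this stage scales with log ℓ rather than log K, since only ℓ items compete for probability mass.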
This sounds rather natural and simple, but an immediate concern with this approach is that the set U_ℓ itself depends on the sensitive dataset D, and it may have high sensitivity in the sense that the membership of many items in U_ℓ can change when a single individual's private value is changed. Thus differentially private computation of U_ℓ appears challenging.\n\nIt turns out we do not need to guarantee the privacy of the set U_ℓ, but rather just of a valid (ℓ, γ) pair. This is essentially because when D satisfies the (ℓ, γ)-margin condition, the probability that the exponential mechanism picks an item i that occurs in U_ℓ when the sensitive dataset is D but not in U_ℓ when the sensitive dataset is its neighbor D′ is very small.\n\nMoreover, we can find such a valid (ℓ, γ) pair using a differentially private search procedure based on the sparse vector technique [22]. Combining these ideas gives a general (and adaptive) algorithm whose loss of utility due to privacy is only O(log(ℓ/δ)/(αn)) when the dataset satisfies an (ℓ, O(log(ℓ/δ)/(αn)))-margin condition. We call this general algorithm the large margin mechanism (Algorithm 1), or LMM for short.\n\n3.3 Privacy and Utility Guarantees\n\nWe first show that LMM satisfies approximate differential privacy.\n\nTheorem 2 (Privacy guarantee). LMM(α, δ, ·) satisfies (α, δ)-approximate differential privacy.\n\nThe proof of Theorem 2 is in Appendix A.1. The following theorem, proved in Appendix A.2, provides a guarantee on the utility of LMM.\n\nTheorem 3 (Utility guarantee). Pick any η ∈ (0, 1). 
Suppose D ∈ X^n satisfies the (ℓ*, γ*)-margin condition with\n\nγ* = (21/(nα)) ln(3/η) + T(ℓ*).\n\nThen with probability at least 1 − η, I := LMM(α, δ, D) satisfies\n\nf(I, D) ≥ f^(1)(D) − 6 ln(2ℓ*/η)/(nα).\n\n(Above, T(ℓ*) is as defined in Algorithm 1.)\n\nRemark 1. Fix some α, δ ∈ (0, 1). Theorem 3 states that if the dataset D satisfies the (ℓ*, γ*)-margin condition, for some positive integer ℓ* and γ* = C log(ℓ*/δ)/(nα) for some universal constant C > 0, then the value f(I, D) of the item I returned by LMM is within O(log(ℓ*)/(nα)) of the maximum, with high probability. There is no explicit dependence on the cardinality K of the universe U.\n\n4 Illustrative Applications\n\nWe now describe applications of LMM to problems from data mining and machine learning.\n\n4.1 Private Frequent Itemset Mining\n\nFrequent Itemset Mining (FIM) is the following popular data mining problem: given the purchase lists of users (say, for an online grocery store), the goal is to find the sets of items that are purchased together most often. The work of [5] provides the first differentially private algorithms for FIM. However, as these algorithms rely on the exponential mechanism and the max-of-Laplaces mechanism, their utilities degrade with the total number of possible itemsets. Subsequent algorithms exploit other properties of itemsets or avoid directly finding the most frequent itemset [34, 27, 15, 8].\n\nLet I be the set of items that can be purchased, and let B be the maximum length of a user's purchase list. Let U ⊆ 2^I be the family of itemsets of interest. For simplicity, we let U be the family of all itemsets of size r, and consider the problem of picking the itemset with the (approximately) highest frequency. This is a private maximization problem where D is the users' lists of purchased items, and f(i, D) is the fraction of users who purchase an itemset i ∈ U. Let f_max be the highest frequency of an itemset in D. Let L be the total number of itemsets with non-zero frequency, so L ≤ n·(B choose r), which is ≪ |I|^r whenever B ≪ |I|. Applying LMM gives the following guarantee.\n\nCorollary 1. Suppose we use LMM(α, δ, ·) on the FIM problem above. Then there exists a constant C > 0 such that the following holds. If f_max ≥ C · log(L/δ)/(nα), then with probability ≥ 1 − δ, the frequency of the itemset I_LMM output by LMM is\n\nf(I_LMM, D) ≥ f_max − O( log(L/δ)/(nα) ).\n\nIn contrast, the itemset I_EM returned by the exponential mechanism is only guaranteed to satisfy\n\nf(I_EM, D) ≥ f_max − O( r log(|I|/δ)/(nα) ),\n\nwhich is significantly worse than Corollary 1 whenever L ≪ |I|^r (as is typically the case). Second, to ensure differential privacy by running the exponential mechanism, one needs a priori knowledge of the set U (and thus the universe of items I) independently of the observed data; otherwise the process will not be end-to-end differentially private. In contrast, our algorithm does not need to know I in order to provide end-to-end differential privacy. Finally, unlike [31], our algorithm does not require a gap between the top two itemset frequencies.\n\n4.2 Private PAC Learning\n\nWe now consider private PAC learning with a finite hypothesis class H with bounded VC dimension d [25]. Here, the dataset D consists of n labeled training examples drawn iid from a fixed distribution. 
The error err(h) of a hypothesis h ∈ H is the probability that it misclassifies a random example drawn from the same distribution. The goal is to return a hypothesis h ∈ H with error as low as possible. A standard procedure that has been well-studied in the literature simply returns the minimizer ĥ ∈ H of the empirical error êrr(h, D) computed on the training data D, but this does not guarantee (approximate) differential privacy. The work of [25] instead uses the exponential mechanism to select a hypothesis h_EM ∈ H. With probability ≥ 1 − δ_0,\n\nerr(h_EM) ≤ min_{h∈H} err(h) + O( sqrt(d log(n/δ_0)/n) + (log |H| + log(1/δ_0))/(αn) ).    (2)\n\nThe dependence on log |H| is improved to d log |Σ| by [7] when the data entries come from a finite set Σ. The subsequent work of [4] introduces the notion of representation dimension, and shows how it relates to differentially private learning in the discrete and finite case, and [3] provides improved convergence bounds with approximate differential privacy that exploit the structure of some specific hypothesis classes. For the case of infinite hypothesis classes and continuous data distributions, [10] shows that distribution-free private PAC learning is not generally possible, but distribution-dependent learning can be achieved under certain conditions.\n\nWe provide a sample complexity bound of a rather different character compared to previous work. Our bound only relies on uniform convergence properties of H, and can be significantly tighter than the bounds from [25] when the number of hypotheses with error close to min_{h∈H} err(h) is small. 
Indeed, the bounds are a private analogue of the shell bounds of [26], which characterize the structure of the hypothesis class as a function of the properties of a decomposition based on hypotheses' error rates. In many situations, these bounds are significantly tighter than those that do not involve the error distributions.\n\nFollowing [26], we divide the hypothesis class H into R = O(sqrt(n/(d log n))) shells; the t-th shell H(t) is defined by\n\nH(t) := { h ∈ H : err(h) ≤ min_{h′∈H} err(h′) + C_0 t sqrt(d log(n/δ_0)/n) }.\n\nAbove, C_0 > 0 is the constant from uniform convergence bounds—i.e., C_0 is the smallest c > 0 such that for all h ∈ H, with probability ≥ 1 − δ_0, we have |êrr(h, D) − err(h)| ≤ c·sqrt(d log(n/δ_0)/n). Observe that H(t + 1) ⊆ H(t); and moreover, with probability ≥ 1 − δ_0, all h ∈ H(t) have êrr(h, D) ≤ min_{h′∈H} err(h′) + C_0 · (t + 1)·sqrt(d log(n/δ_0)/n).\n\nDefine t*(n) as the smallest integer t ∈ N such that\n\nlog(|H(t + 1)|) + log(1/δ) ≤ (C_0 α t / C) · sqrt(dn log n),\n\nwhere C > 0 is the constant from Remark 1. Then, with probability ≥ 1 − δ_0, the dataset D with f = 1 − êrr satisfies the (ℓ, γ)-margin condition, with ℓ = |H(t*(n) + 1)| and γ = C log(|H(t*(n) + 1)|/δ)/(αn). Therefore, we have the following guarantee for applying LMM to this problem.\n\nCorollary 2. Suppose we use LMM(α, δ, ·) on the learning problem above (with U = H and f = 1 − êrr). 
Then, with probability ≥ 1 − δ_0 − δ, the hypothesis h_LMM returned by LMM satisfies\n\nerr(h_LMM) ≤ min_{h∈H} err(h) + O( sqrt(d log(n/δ_0)/n) + log(|H(t*(n) + 1)|/δ)/(αn) ).\n\nThe dependence on log |H| from (2) is replaced here by log(|H(t*(n) + 1)|/δ), which can be vastly smaller, as discussed in [26].\n\n5 Additional Related Work\n\nThere has been a large amount of work on differential privacy for a wide range of statistical and machine learning tasks over the last decade [6, 30, 13, 21, 33, 24, 1]; for overviews, see [18] and [29]. In particular, algorithms for the private maximization problem (and variants) have been used as subroutines in many applications; examples include PAC learning [25], principal component analysis [14], performance validation [12], and multiple hypothesis testing [32].\n\nA separation between pure and approximate differential privacy has been shown in several previous works [19, 31, 3]. The first approximate differentially private algorithm that achieves a separation is the Propose-Test-Release (PTR) framework [19]. Given a function, PTR determines an upper bound on its local sensitivity at the input dataset through a search procedure; noise proportional to this upper bound is then added to the actual function value. We note that the PTR framework does not directly apply to our setting, as sensitivity is not generally defined for a discrete universe.\n\nIn the context of private PAC learning, the work of [3] gives the first separation between pure and approximate differential privacy. In addition to using the algorithm from [31], they devise two additional algorithmic techniques: a concave maximization procedure for learning intervals, and an algorithm for the private maximization problem under the ℓ-bounded growth condition discussed in Section 2.2. 
The first algorithm is specific to their problem and does not appear to apply to general private maximization problems. The second algorithm has a sample complexity bound of n = O(log(ℓ)/α) when the function f satisfies the ℓ-bounded growth condition.\n\nLower bounds for approximate differential privacy have been shown by [7, 16, 11, 9], and the proof of our Theorem 1 borrows some techniques from [11].\n\n6 Conclusion and Future Work\n\nIn this paper, we have presented the first general and range-independent algorithm for approximate differentially private maximization. The algorithm automatically adapts to the available large margin properties of the sensitive dataset, and reverts to worst-case guarantees when such properties are lacking. We have illustrated the applicability of the algorithm in two fundamental problems from data mining and machine learning; in future work, we plan to study other applications where range-independence is a substantial boon.\n\nAcknowledgments. We thank an anonymous reviewer for suggesting the simpler variant of LMM based on the exponential mechanism. (The original version of LMM used a max-of-truncated-exponentials mechanism, which gives the same guarantees up to constant factors.) This work was supported in part by the NIH under U54 HL108460 and the NSF under IIS 1253942.\n\nReferences\n\n[1] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization, revisited. arXiv:1405.7085, 2014.\n\n[2] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. In Theory of Cryptography, pages 437–454. Springer, 2010.\n\n[3] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. In RANDOM, 2013.\n\n[4] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. 
In ITCS, pages 97–110, 2013.

[5] Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. Discovering frequent patterns in sensitive data. In KDD, 2010.

[6] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS, 2005.

[7] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM, 60(2):12, 2013.

[8] Luca Bonomi and Li Xiong. Mining frequent patterns with differential privacy. Proceedings of the VLDB Endowment, 6(12):1422–1427, 2013.

[9] Mark Bun, Jonathan Ullman, and Salil Vadhan. Fingerprinting codes and the price of approximate differential privacy. In STOC, 2014.

[10] Kamalika Chaudhuri and Daniel Hsu. Sample complexity bounds for differentially private learning. In COLT, 2011.

[11] Kamalika Chaudhuri and Daniel Hsu. Convergence rates for differentially private statistical estimation. In ICML, 2012.

[12] Kamalika Chaudhuri and Staal A. Vinterbo. A stability-based validation procedure for differentially private machine learning. In Advances in Neural Information Processing Systems, pages 2652–2660, 2013.

[13] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.

[14] Kamalika Chaudhuri, Anand D. Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. In Advances in Neural Information Processing Systems, pages 998–1006, 2012.

[15] Rui Chen, Noman Mohammed, Benjamin C. M. Fung, Bipin C. Desai, and Li Xiong. Publishing set-valued data via differential privacy. In VLDB, 2011.

[16] Anindya De. Lower bounds in differential privacy. In Ronald Cramer, editor, Theory of Cryptography, volume 7194 of Lecture Notes in Computer Science, pages 321–338. Springer-Verlag, 2012.

[17] C. Dwork, F.
McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, 2006.

[18] Cynthia Dwork. Differential privacy: A survey of results. In Theory and Applications of Models of Computation, pages 1–19. Springer, 2008.

[19] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 371–380. ACM, 2009.

[20] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, 2006.

[21] A. Friedman and A. Schuster. Data mining with differential privacy. In KDD, 2010.

[22] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, 2010.

[23] Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 705–714. ACM, 2010.

[24] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In COLT, 2012.

[25] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.

[26] John Langford and David McAllester. Computable shell decomposition bounds. Journal of Machine Learning Research, 5:529–547, 2004.

[27] Ninghui Li, Wahbeh Qardaji, Dong Su, and Jianneng Cao. PrivBasis: frequent itemset mining with differential privacy. In VLDB, 2012.

[28] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, 2007.

[29] A. D. Sarwate and K. Chaudhuri. Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE Signal Processing Magazine, 30(5):86–94, Sept 2013. ISSN 1053-5888.
doi: 10.1109/MSP.2013.2259911.

[30] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In STOC, 2011.

[31] Adam Smith and Abhradeep Thakurta. Differentially private feature selection via stability arguments, and the robustness of the lasso. In COLT, 2013.

[32] Caroline Uhler, Aleksandra B. Slavkovic, and Stephen E. Fienberg. Privacy-preserving data sharing for genome-wide association studies. arXiv:1205.0739, 2012.

[33] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.

[34] Chen Zeng, Jeffrey F. Naughton, and Jin-Yi Cai. On differentially private frequent itemset mining. In VLDB, 2012.