{"title": "Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers", "book": "Advances in Neural Information Processing Systems", "page_first": 1876, "page_last": 1884, "abstract": "As increasing amounts of sensitive personal information finds its way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances. Though the differential privacy model provides a framework to analyze such mechanisms for databases belonging to a single party, this framework has not yet been considered in a multi-party setting. In this paper, we propose a privacy-preserving protocol for composing a differentially private aggregate classifier using classifiers trained locally by separate mutually untrusting parties. The protocol allows these parties to interact with an untrusted curator to construct additive shares of a perturbed aggregate classifier. We also present a detailed theoretical analysis containing a proof of differential privacy of the perturbed aggregate classifier and a bound on the excess risk introduced by the perturbation. We verify the bound with an experimental evaluation on a real dataset.", "full_text": "Multiparty Differential Privacy via Aggregation of\n\nLocally Trained Classi\ufb01ers\n\nManas A. Pathak\n\nCarnegie Mellon University\n\nPittsburgh, PA\n\nmanasp@cs.cmu.edu\n\nShantanu Rane\n\nCambridge, MA\n\nrane@merl.com\n\nMitsubishi Electric Research Labs\n\nCarnegie Mellon University\n\nBhiksha Raj\n\nPittsburgh, PA\n\nbhiksha@cs.cmu.edu\n\nAbstract\n\nAs increasing amounts of sensitive personal information \ufb01nds its way into data\nrepositories, it is important to develop analysis mechanisms that can derive ag-\ngregate information from these repositories without revealing information about\nindividual data instances. 
Though the differential privacy model provides a framework to analyze such mechanisms for databases belonging to a single party, this framework has not yet been considered in a multi-party setting. In this paper, we propose a privacy-preserving protocol for composing a differentially private aggregate classifier using classifiers trained locally by separate mutually untrusting parties. The protocol allows these parties to interact with an untrusted curator to construct additive shares of a perturbed aggregate classifier. We also present a detailed theoretical analysis containing a proof of differential privacy of the perturbed aggregate classifier and a bound on the excess risk introduced by the perturbation. We verify the bound with an experimental evaluation on a real dataset.

1 Introduction

In recent years, individuals and corporate entities have gathered large quantities of personal data. Often, they may wish to contribute the data towards the computation of functions such as statistics, responses to queries, classifiers, etc. In the process, however, they risk compromising the privacy of the individuals by releasing sensitive information such as medical or financial records, addresses, telephone numbers, and preferences of various kinds that the individuals may not want exposed. Merely anonymizing the data is not sufficient: an adversary with access to publicly available auxiliary information can still recover information about individuals, as was the case with the de-anonymization of the Netflix dataset [1].

In this paper, we address the problem of learning a classifier from a multi-party collection of such private data. A set of parties P1, P2, . . . , PK each possess data D1, D2, . . . , DK. The aim is to learn a classifier from the union of all the data D1 ∪ D2 ∪ . . . ∪ DK. 
We specifically consider a logistic regression classifier but, as we shall see, the techniques are generally applicable to any classification algorithm. The conditions we impose are that (a) none of the parties is willing to share its data with one another or with any third party (e.g., a curator), and (b) the computed classifier cannot be reverse engineered to learn about any individual data instance possessed by any contributing party.

The conventional approach to learning functions in this manner is through secure multi-party computation (SMC) [2]. Within SMC, individual parties use a combination of cryptographic techniques and oblivious transfer to jointly compute a function of their private data [3, 4, 5]. The techniques typically guarantee that none of the parties learns anything about the individual data besides what may be inferred from the final result of the computation. Unfortunately, this does not satisfy condition (b) above. For instance, when the outcome of the computation is a classifier, it does not prevent an adversary from postulating the presence of data instances whose absence might change the decision boundary of the classifier, and verifying the hypothesis using auxiliary information, if any. Moreover, for all but the simplest computational problems, SMC protocols tend to be highly expensive, requiring iterated encryption and decryption and repeated communication of encrypted partial results between participating parties.

An alternative theoretical model for protecting the privacy of individual data instances is differential privacy [6]. Within this framework, a stochastic component is added to any computational mechanism, typically by the addition of noise. 
A mechanism evaluated over a database is said to satisfy differential privacy if the probability of the mechanism producing a particular output is almost the same regardless of the presence or absence of any individual data instance in the database. Differential privacy provides statistical guarantees that the output of the computation does not carry information about individual data instances. On the other hand, in multiparty scenarios where the data used to compute a function are distributed across several parties, it does not provide any mechanism for preserving the privacy of the contributing parties from one another or, alternately, from a curator who computes the function from the combined data.

We provide an alternative solution: in our approach, the individual parties locally compute an optimal classifier with their data. The individual classifiers are then averaged to obtain the final aggregate classifier. The aggregation is performed through a secure protocol that also adds a stochastic component to the averaged classifier, such that the resulting aggregate classifier is differentially private, i.e., no inference may be made about individual data instances from the classifier. This procedure satisfies both criteria (a) and (b) mentioned above. Furthermore, it is significantly less expensive than any SMC protocol for computing the classifier on the combined data.

We also present theoretical guarantees on the classifier. We provide a fundamental result that the excess risk of an aggregate classifier obtained by averaging classifiers trained on individual subsets, compared to the optimal classifier computed on the combined data in the union of all subsets, is bounded by a quantity that depends on the size of the smallest subset. We prove that the addition of the noise does indeed result in a differentially private classifier. 
We also provide a bound on the true excess risk of the differentially private averaged classifier compared to the optimal classifier trained on the combined data. Finally, we present an experimental evaluation of the proposed technique on the UCI Adult dataset, a subset of the 1994 census database, and empirically show that the differentially private classifier trained using the proposed method provides performance close to the optimal classifier when the distribution of data across parties is reasonably equitable.

2 Differential Privacy

In this paper, we consider the differential privacy model introduced by Dwork [6]. Given any two databases D and D′ differing by one element, which we will refer to as adjacent databases, a randomized query function M is said to be differentially private if the probability that M produces a response S on D is close to the probability that M produces the same response S on D′. As the query output is almost the same in the presence or absence of any individual entry, with high probability nothing can be learned about any individual entry from the output.

Definition. A randomized function M with a well-defined probability density P satisfies ε-differential privacy if, for all adjacent databases D and D′ and for any S ∈ range(M),

|log [P(M(D) = S) / P(M(D′) = S)]| ≤ ε.   (1)

In a classification setting, the training dataset may be thought of as the database and the algorithm learning the classification rule as the query mechanism. A classifier satisfying differential privacy implies that no additional details about the individual training data instances can be obtained with certainty from the output of the learning algorithm, beyond the a priori background knowledge. 
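Definition (1) can be illustrated with the standard Laplace mechanism for a simple count query (the same noise distribution is used later for the classifier). The counts and parameters below are hypothetical; this is a minimal sketch, not part of the paper's protocol:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value + Lap(sensitivity / epsilon), the standard calibration."""
    return true_value + rng.laplace(scale=sensitivity / epsilon)

def laplace_log_density(x, loc, scale):
    """Log-density of the Laplace distribution at x."""
    return -np.log(2 * scale) - np.abs(x - loc) / scale

# Adjacent databases D and D': one entry changed, so a count query differs by 1.
count_D, count_Dprime = 42, 43        # hypothetical query answers on D and D'
sensitivity, epsilon = 1.0, 0.5
scale = sensitivity / epsilon

released = laplace_mechanism(count_D, sensitivity, epsilon, np.random.default_rng(0))

# Definition (1): for every possible output S, the log-ratio of the output
# densities under D and D' stays within [-epsilon, epsilon].
S = np.linspace(30.0, 55.0, 101)
log_ratio = (laplace_log_density(S, count_D, scale)
             - laplace_log_density(S, count_Dprime, scale))
assert np.abs(log_ratio).max() <= epsilon + 1e-12
```

The assertion holds because the log-density ratio equals (|S − count_D′| − |S − count_D|)/scale, which the triangle inequality bounds by sensitivity/scale = ε.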
Differential privacy provides an ad omnia guarantee, as opposed to most other models that provide ad hoc guarantees against a specific set of attacks and adversarial behaviors. Even by evaluating the differentially private classifier over a large number of test instances, an adversary cannot learn the exact form of the training data.

2.1 Related Work

Dwork et al. [7] proposed a mechanism for creating functions satisfying differential privacy by adding a perturbation term sampled from the Laplace distribution, scaled by the sensitivity of the function. Chaudhuri and Monteleoni [8] use this mechanism [7] to create a differentially private logistic regression classifier by perturbing the estimated parameters with multivariate Laplacian noise scaled by the sensitivity of the classifier. They also propose another method to learn classifiers satisfying differential privacy by adding to the objective function a linear perturbation term scaled by Laplacian noise. Nissim et al. [9] show that one can create a differentially private function by adding noise from a Laplace distribution scaled by the smooth sensitivity of the function. While this mechanism results in a function with lower error, the smooth sensitivity of a function can be difficult to compute in general. They also propose the sample and aggregate framework for replacing the original function with a related function for which the smooth sensitivity can be computed easily. Smith [10] presents a method for differentially private unbiased MLE using this framework.

All the previous methods are inherently designed for the case where a single curator has access to the entire data and is interested in releasing a differentially private function computed over the data. 
To the best of our knowledge, ours is the first method designed for releasing a differentially private classifier computed over training data owned by different parties who do not wish to disclose the data to each other. Our technique was principally motivated by the sample and aggregate framework, in which we consider the samples to be owned by the individual parties. Similar to [10], we choose a simple average as the aggregation function, and the parties together release a perturbed aggregate classifier which satisfies differential privacy. In the multi-party case, however, adding the perturbation to the classifier is no longer straightforward, and it is necessary to provide a secure protocol to do this.

3 Multiparty Classification Protocol

The problem we address is as follows: a number of parties P1, . . . , PK possess data sets D1, . . . , DK, where Dj = (x, y)|j includes a set of instances x and their binary labels y. We want to train a logistic regression classifier on the combined data such that no party is required to expose any of its data, and no information about any single data instance can be obtained from the learned classifier. The protocol can be divided into the following three phases:

3.1 Training Local Classifiers on Individual Datasets

Each party Pj uses its data set (x, y)|j to learn an ℓ2-regularized logistic regression classifier with weights ŵj, obtained by minimizing the following objective function:

ŵj = argmin_w J(w) = argmin_w (1/nj) Σ_i log(1 + exp(−yi wT xi)) + λ wT w,   (2)

where λ > 0 is the regularization parameter. Note that no data or information has been shared yet.

3.2 Publishing a Differentially Private Aggregate Classifier

The proposed solution, illustrated by Figure 1, proceeds as follows. 
The parties collaborate to compute an aggregate classifier given by ŵs = (1/K) Σ_j ŵj + η, where η is a d-dimensional random variable sampled from a Laplace distribution scaled with the parameter 2/(n(1) ε λ), and n(1) = minj nj. As we shall see later, composing an aggregate classifier in this manner incurs only a well-bounded excess risk over training a classifier directly on the union of all data, while enabling the parties to maintain their privacy. We also show in Section 4.1 that the noise term η ensures that the classifier ŵs satisfies differential privacy, i.e., that individual data instances cannot be discerned from the aggregate classifier. The definition of the noise term η above may appear unusual at this stage, but it has an intuitive explanation: a classifier constructed by aggregating locally trained classifiers is limited by the performance of the individual classifier trained on the least number of data instances. This will be formalized in Section 4.2. We note that the parties Pj cannot simply take their individually trained classifiers ŵj, perturb them with a noise vector, and publish the perturbed classifiers, because aggregating such classifiers will not give the correct η ∼ Lap(2/(n(1) ε λ)) in general. Since individual parties cannot simply add noise to their classifiers to impose differential privacy, the actual averaging operation must be performed such that the individual parties do not expose their own classifiers or the number of data instances they possess. We therefore use a private multiparty protocol, interacting with an untrusted curator "Charlie", to perform the averaging. 
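Ignoring the secure protocol for a moment, the quantity the parties jointly release can be sketched in the clear: each party minimizes objective (2) on its own data, the classifiers are averaged, and Laplace noise with parameter 2/(n(1) ε λ) is added. The data, dimensions, and optimizer below are hypothetical stand-ins (plain gradient descent rather than the parties' actual solvers), and in the real protocol the averaging and noise addition happen under encryption:

```python
import numpy as np

def train_local(X, y, lam, lr=0.1, iters=500):
    """l2-regularized logistic regression (objective (2)) by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        sigma = 1.0 / (1.0 + np.exp(y * (X @ w)))   # sigma(-y_i w^T x_i)
        grad = (X * (-y * sigma)[:, None]).mean(axis=0) + 2 * lam * w
        w -= lr * grad
    return w

def perturbed_aggregate(local_ws, n_sizes, epsilon, lam, rng):
    """w_s = (1/K) sum_j w_j + eta, with eta ~ Lap(2 / (n_(1) * epsilon * lam))."""
    n1 = min(n_sizes)                               # size of the smallest dataset
    eta = rng.laplace(scale=2.0 / (n1 * epsilon * lam), size=local_ws[0].shape)
    return np.mean(local_ws, axis=0) + eta

rng = np.random.default_rng(0)
lam = 0.01
# Hypothetical data for K = 3 parties, d = 5 features, labels in {-1, +1}.
parties = []
for n_j in (200, 300, 250):
    X = rng.normal(size=(n_j, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n_j))
    parties.append((X, y))

local_ws = [train_local(X, y, lam) for X, y in parties]
w_s = perturbed_aggregate(local_ws, [len(y) for _, y in parties],
                          epsilon=1.0, lam=lam, rng=rng)
```

A test instance x′ would then be labeled by the sign of ŵs·x′, exactly as in the testing phase described later.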
The outcome of the protocol is such that each of the parties obtains additive shares of the final classifier ŵs, and these shares must be added to obtain ŵs.

Figure 1: Multiparty protocol to securely compute additive shares of ŵs.

Privacy-Preserving Protocol

We use asymmetric-key additively homomorphic encryption [11]. A desirable property of such schemes is that we can perform operations on the ciphertext elements which map into known operations on the corresponding plaintext elements: for an additively homomorphic encryption function ξ(·), ξ(a) · ξ(b) = ξ(a + b) and ξ(a)^b = ξ(ab). Note that the additively homomorphic scheme employed here is semantically secure, i.e., repeated encryption of the same plaintext will result in different ciphertexts. For the ensuing protocol, encryption keys are considered public and decryption keys are privately owned by the specified parties. Assuming the parties to be honest-but-curious, the steps of the protocol are as follows.

Stage 1. Finding the index of the smallest database, obfuscated by permutation.

1. Each party Pj splits its database length as nj = aj + bj, where aj and bj are integers representing additive shares of nj for j = 1, 2, ..., K. Denote the K-length vectors of additive shares by a and b respectively.

2. The parties Pj mutually agree on a permutation π1 on the index vector (1, 2, ..., K). This permutation is unknown to Charlie. Then, each party Pj sends its share aj to party Pπ1(j), and sends its share bj to Charlie with the index changed according to the permutation. Thus, after this step, the parties have permuted additive shares given by π1(a) while Charlie has permuted additive shares π1(b).

3. The parties Pj generate a key pair (pk, sk), where pk is a public key for homomorphic encryption and sk is the secret decryption key known only to the parties but not to Charlie. 
Denote element-wise encryption of a by ξ(a). The parties send ξ(π1(a)) = π1(ξ(a)) to Charlie.

4. Charlie generates a random vector r = (r1, r2, ..., rK), where the elements ri are integers chosen uniformly at random and equally likely to be positive or negative. Then, he computes ξ(π1(aj)) · ξ(rj) = ξ(π1(aj) + rj); in vector notation, he computes ξ(π1(a) + r). Similarly, by subtracting the same random integers in the same order from his own shares, he obtains π1(b) − r, where π1 is the permutation unknown to him and applied by the parties. Then, Charlie selects a permutation π2 at random and obtains π2(ξ(π1(a) + r)) = ξ(π2(π1(a) + r)) and π2(π1(b) − r). He sends ξ(π2(π1(a) + r)) to the individual parties in the following order: first element to P1, second element to P2, ..., K-th element to PK.

5. Each party decrypts the signal received from Charlie. At this point, the parties P1, P2, ..., PK respectively possess the elements of the vector π2(π1(a) + r) while Charlie possesses the vector π2(π1(b) − r). Since π1 is unknown to Charlie and π2 is unknown to the parties, the indices in both vectors have been completely obfuscated. Note also that adding the vector collectively owned by the parties and the vector owned by Charlie would give π2(π1(a) + r) + π2(π1(b) − r) = π2(π1(a + b)) = π2(π1(n)). 
The situation in this step is similar to that encountered in the "blind and permute" protocol used for minimum-finding by Du and Atallah [12].

6. Let π2(π1(a) + r) = ã and π2(π1(b) − r) = b̃. Then ni > nj ⇒ ãi + b̃i > ãj + b̃j ⇒ ãi − ãj > b̃j − b̃i. For each (i, j) pair with i, j ∈ {1, 2, ..., K}, these comparisons can be solved by any implementation of a secure millionaire protocol [2]. When all the comparisons are done, Charlie finds the index j̃ such that ãj̃ + b̃j̃ = minj nj. The true index corresponding to the smallest database has already been obfuscated by the steps of the protocol, and Charlie holds only an additive share of minj nj, so he cannot know the true length of the smallest database.

Stage 2. Obliviously obtaining an encrypted noise vector from the smallest database.

1. Charlie constructs a K-length indicator vector u such that uj̃ = 1 and all other elements are 0. He then obtains the permuted vector π2⁻¹(u), where π2⁻¹ inverts π2. He generates a key pair (pk′, sk′) for an additively homomorphic function ζ(·), where only the encryption key pk′ is publicly available to the parties Pj. Charlie then transmits ζ(π2⁻¹(u)) = π2⁻¹(ζ(u)) to the parties Pj.

2. The parties mutually obtain the permuted vector π1⁻¹(π2⁻¹(ζ(u))) = ζ(v), where π1⁻¹ inverts the permutation π1 originally applied by the parties Pj in Stage 1. Now that both permutations have been removed, the index of the non-zero element of the indicator vector v corresponds to the true index of the smallest database. However, since the parties Pj cannot decrypt ζ(·), they cannot find out this index.

3. For j = 1, . . .
, K, party Pj generates ηj, a d-dimensional noise vector sampled from a Laplace distribution with parameter 2/(nj ε λ). Then, it obtains a d-dimensional vector ψj where, for i = 1, . . . , d, ψj(i) = ζ(v(j))^ηj(i) = ζ(v(j) ηj(i)).

4. All parties Pj now compute a d-dimensional noise vector ψ such that, for i = 1, . . . , d,

ψ(i) = Π_j ψj(i) = Π_j ζ(v(j) ηj(i)) = ζ(Σ_j v(j) ηj(i)).

The reader will notice that, by construction, the above equation selects only the Laplace noise terms for the smallest database, while rejecting the noise terms of all other databases. This is because v has an element with value 1 at the index corresponding to the smallest database and zeroes everywhere else. Thus, the decryption of ψ is equal to η, the desired perturbation term defined at the beginning of this section.

Stage 3. Generating secret additive shares of ŵs.

1. One of the parties, say P1, generates a d-dimensional random integer noise vector s and transmits ψ(i) · ζ(s(i)) for all i = 1, . . . , d to Charlie. Using s effectively prevents Charlie from discovering η, and therefore ensures that no information is leaked about the database owners Pj. P1 computes ŵ1 − Ks.

2. Charlie decrypts ψ(i) · ζ(s(i)) to obtain η(i) + s(i) for i = 1, . . . , d. At this stage, the participants hold the following d-dimensional vectors: Charlie has K(η + s), P1 has ŵ1 − Ks, and all other parties Pj, j = 2, . . . , K, have ŵj. None of the K + 1 participants can share this data for fear of compromising differential privacy.

3. Finally, Charlie and the K database-owning parties run a simple secure function evaluation protocol [13], at the end of which each of the K + 1 participants obtains an additive share of K ŵs. 
This protocol is provably private against honest-but-curious participants when there is no collusion. The resulting shares are published.

The above protocol ensures the following: (a) none of the K + 1 participants, or users of the perturbed aggregate classifier, can find out the size of any database, and therefore none of the parties knows who contributed η; (b) neither Charlie nor any of the parties Pj can individually remove the noise η after the additive shares are published. This last property is important because, if anyone could knowingly remove the noise term, the resulting classifier would no longer provide differential privacy.

3.3 Testing Phase

A test participant Dave, having a test data instance x′ ∈ Rd, applies the trained classifier as follows: he adds the published shares and divides by K to obtain the differentially private classifier ŵs. He can then compute the sigmoid t = 1 / (1 + exp(−ŵsT x′)) and classify x′ with label −1 if t ≤ 1/2 and with label 1 if t > 1/2.

4 Theoretical Analysis

4.1 Proof of Differential Privacy

We show that the perturbed aggregate classifier satisfies differential privacy. We use the bound on the sensitivity of the regularized logistic regression classifier proved in Corollary 2 of [8], restated in the appendix as Theorem 6.1.

Theorem 4.1. The classifier ŵs preserves ε-differential privacy. For any two adjacent datasets D and D′,

|log [P(ŵs|D) / P(ŵs|D′)]| ≤ ε.

Proof. Consider the case where one instance of the training dataset D is changed, resulting in an adjacent dataset D′. This implies a change in one element of the training dataset of one party, and thereby a change in the corresponding locally learned vector ŵj. 
Assuming that the change is in the dataset of party Pj, the change in the learned vectors is only in ŵj; let us denote the new classifier by ŵ′j. In Theorem 6.1, we bound the sensitivity of ŵj as ‖ŵj − ŵ′j‖1 ≤ 2/(nj λ). Following an argument similar to [7], and considering that we learn the same vector ŵs using either of the training datasets D and D′, we have

P(ŵs|D) / P(ŵs|D′) = P(ŵj + η|D) / P(ŵ′j + η|D′) = exp[(n(1) ε λ / 2) ‖ŵj‖1] / exp[(n(1) ε λ / 2) ‖ŵ′j‖1]
  ≤ exp[(n(1) ε λ / 2) ‖ŵj − ŵ′j‖1] ≤ exp[(n(1) ε λ / 2) · 2/(nj λ)] = exp[ε n(1)/nj] ≤ exp(ε),

by the definition of function sensitivity and because n(1) ≤ nj. Similarly, we can lower bound the ratio by exp(−ε).

4.2 Analysis of Excess Error

In the following discussion, we consider how much excess error is introduced when using a perturbed aggregate classifier ŵs satisfying differential privacy, as opposed to the unperturbed classifier w* trained on the entire training data while ignoring the privacy constraints, as well as the unperturbed aggregate classifier ŵ.

We first establish a bound on the ℓ2 norm of the difference between the aggregate classifier ŵ and the classifier w* trained over the entire training data. To prove the bound, we apply Lemma 1 from [8], restated as Lemma 6.2 in the appendix. Please refer to the appendix for the proof of the following theorem.

Theorem 4.2. 
Given the aggregate classifier ŵ, the classifier w* trained over the entire training data, and n(1) the size of the smallest training dataset,

‖ŵ − w*‖2 ≤ (K − 1) / (n(1) λ).

The bound is inversely proportional to the number of instances in the smallest dataset. This indicates that when the datasets are of disparate sizes, ŵ can be very different from w*. The largest possible value for n(1) is n/K, in which case all parties have an equal amount of training data and ŵ is closest to w*. In the one-party case, K = 1, the bound indicates that the norm of the difference is upper bounded by zero, a valid sanity check, as the aggregate classifier ŵ is then the same as w*.

We use this result to establish a bound on the empirical risk of the perturbed aggregate classifier ŵs = ŵ + η over the empirical risk of the unperturbed classifier w* in the following theorem. Please refer to the appendix for the proof.

Theorem 4.3. If all data instances xi lie in a unit ball, then with probability at least 1 − δ, the empirical regularized excess risk of the perturbed aggregate classifier ŵs over the classifier w* trained over the entire training data satisfies

J(ŵs) ≤ J(w*) + (K − 1)²(λ + 1) / (2 n(1)² λ²) + 2d²(λ + 1) log²(d/δ) / (n(1)² ε² λ²) + 2d(K − 1)(λ + 1) log(d/δ) / (n(1)² ε λ²).

The bound reflects error from two factors: aggregation and perturbation. The bound increases for smaller values of ε, implying a tighter definition of differential privacy, indicating a clear trade-off between privacy and utility. 
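The trade-off in Theorem 4.3 can be made concrete by evaluating the three bound terms numerically. The sketch below plugs in the dimension d = 123 and the n(1) values from the experiments section; the function name and the specific parameter choices are illustrative, not from the paper:

```python
import math

def excess_risk_bound(K, n1, d, lam, eps, delta):
    """Sum of the three terms of the empirical excess-risk bound of Theorem 4.3."""
    agg = (K - 1) ** 2 * (lam + 1) / (2 * n1 ** 2 * lam ** 2)        # aggregation
    pert = (2 * d ** 2 * (lam + 1) * math.log(d / delta) ** 2
            / (n1 ** 2 * eps ** 2 * lam ** 2))                       # perturbation
    cross = (2 * d * (K - 1) * (lam + 1) * math.log(d / delta)
             / (n1 ** 2 * eps * lam ** 2))                           # cross term
    return agg + pert + cross

# Smaller eps (stronger privacy) and smaller n1 (more disparate splits)
# both inflate the bound.
b_loose = excess_risk_bound(K=5, n1=6512, d=123, lam=1.0, eps=1.0, delta=0.05)
b_tight = excess_risk_bound(K=5, n1=6512, d=123, lam=1.0, eps=0.1, delta=0.05)
b_small = excess_risk_bound(K=5, n1=3256, d=123, lam=1.0, eps=1.0, delta=0.05)
assert b_tight > b_loose and b_small > b_loose
```

Only the perturbation and cross terms depend on ε, which is why the bound, like the test error in the experiments, degrades as ε shrinks while the pure aggregation penalty stays fixed.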
The bound is also inversely proportional to n(1)², implying an increase in excess risk when the parties have training datasets of disparate sizes.

In the limiting case ε → ∞, we add a perturbation term η sampled from a Laplace distribution of infinitesimally small variance, and the perturbed classifier becomes almost the same as the unperturbed aggregate classifier ŵ, satisfying a very loose definition of differential privacy. With such a value of ε, our bound becomes

J(ŵ) ≤ J(w*) + (K − 1)²(λ + 1) / (2 n(1)² λ²).   (3)

Similar to the analysis of Theorem 4.2, the excess error of the aggregate classifier is inversely proportional to the size of the smallest dataset n(1), and in the one-party case K = 1 the bound becomes zero, as the aggregate classifier ŵ is the same as w*. Also, for a small value of ε in the one-party case, with K = 1 and n(1) = n, our bound reduces to that in Lemma 3 of [8]:

J(ŵs) ≤ J(w*) + 2d²(λ + 1) log²(d/δ) / (n² ε² λ²).   (4)

While the previous theorem gives a bound on the empirical excess risk over a given training dataset, it is important to consider a bound on the true excess risk of ŵs over w*. Let us denote the true risk of the classifier ŵs by J̃(ŵs) = E[J(ŵs)] and, similarly, the true risk of the classifier w* by J̃(w*) = E[J(w*)]. In the following theorem, we apply the result from [14], which uses the bound on the empirical excess risk to form a bound on the true excess risk. Please refer to the appendix for the proof.

Theorem 4.4. 
If all training data instances xi lie in a unit ball, then with probability at least 1 − δ, the true excess risk of the perturbed aggregate classifier ŵs over the classifier w* trained over the entire training data satisfies

J̃(ŵs) ≤ J̃(w*) + 2(K − 1)²(λ + 1) / (2 n(1)² λ²) + 4d²(λ + 1) log²(d/δ) / (n(1)² ε² λ²) + 4d(K − 1)(λ + 1) log(d/δ) / (n(1)² ε λ²) + (16 / (λn)) [32 + log(1/δ)].

5 Experiments

We perform an empirical evaluation of the proposed differentially private classifier to obtain a characterization of the increase in error due to perturbation. We use the Adult dataset from the UCI machine learning repository [15], consisting of personal information records extracted from the census database; the task is to predict whether a given person has an annual income over $50,000. The choice of dataset is motivated by its being a realistic example for the application of data privacy techniques. The original Adult dataset has six continuous and eight categorical features. We use pre-processing similar to [16]: the continuous features are discretized into quintiles, and each quintile is represented by a binary feature; each categorical feature is converted to as many binary features as its cardinality. The dataset contains 32,561 training and 16,281 test instances, each with 123 features.1

Figure 2: Classifier performance evaluated for w*, w* + η, and ŵs for different data splits vs. ε

In Figure 2, we compare the test error of perturbed aggregate classifiers trained over data from five parties for different values of ε. 
We consider three situations: all parties with equal datasets of 6512 instances each (even split, $n_{(1)} = 20\%$ of $n$); parties with datasets of 4884, 6512, 6512, 6512, and 8141 instances ($n_{(1)} = 15\%$ of $n$); and parties with datasets of 3256, 6512, 6512, 6512, and 9769 instances ($n_{(1)} = 10\%$ of $n$). We also compare with the error of the classifier trained on the combined training data and with its perturbed version satisfying differential privacy. We set the regularization parameter $\lambda = 1$, and the displayed results are averaged over 200 executions.

The perturbed aggregate classifier trained with the largest value $n_{(1)} = 6512$ consistently outperforms those with lower values of $n_{(1)}$, as our theory suggests. Also, the test error of all perturbed aggregate classifiers drops with increasing $\epsilon$, comparatively faster for the even split, and converges to the test error of the classifier trained over the combined data. As expected, the differentially private classifier trained over the entire training data performs much better than the perturbed aggregate classifiers, with an error equal to that of the unperturbed classifier except for small values of $\epsilon$. The lower error of this classifier comes at the cost of the parties' privacy, as they would need to share their data in order to train the classifier over the combined data.

6 Conclusion

We proposed a method for composing an aggregate classifier satisfying $\epsilon$-differential privacy from classifiers locally trained by multiple mutually untrusting parties. The upper bound on the excess risk of the perturbed aggregate classifier, as compared to the optimal classifier trained over the complete data without privacy constraints, is inversely proportional to the privacy parameter $\epsilon$, suggesting an inherent tradeoff between privacy and utility.
The bound is also inversely proportional to the size of the smallest training dataset, implying the best performance when the datasets are of equal sizes. Experimental results on the UCI Adult data exhibit the behavior suggested by the bound, and we observe that the proposed method provides classification performance close to that of the optimal non-private classifier for appropriate values of $\epsilon$. In future work, we seek to generalize the theoretical analysis of the perturbed aggregate classifier to the setting in which each party has data generated from a different distribution.

¹The dataset can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#a9a

References

[1] Arvind Narayanan and Vitaly Shmatikov. De-anonymizing social networks. In IEEE Symposium on Security and Privacy, pages 173–187, 2009.

[2] Andrew Yao. Protocols for secure computations (extended abstract). In IEEE Symposium on Foundations of Computer Science, 1982.

[3] Jaideep Vaidya, Chris Clifton, Murat Kantarcioglu, and A. Scott Patterson. Privacy-preserving decision trees over vertically partitioned data. TKDD, 2(3), 2008.

[4] Jaideep Vaidya, Murat Kantarcioglu, and Chris Clifton. Privacy-preserving naive Bayes classification. VLDB Journal, 17(4):879–898, 2008.

[5] Jaideep Vaidya, Hwanjo Yu, and Xiaoqian Jiang. Privacy-preserving SVM classification. Knowledge and Information Systems, 14(2):161–178, 2008.

[6] Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages and Programming, 2006.

[7] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis.
In Theory of Cryptography Conference, pages 265–284, 2006.

[8] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Neural Information Processing Systems, pages 289–296, 2008.

[9] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In ACM Symposium on Theory of Computing, pages 75–84, 2007.

[10] Adam Smith. Efficient, differentially private point estimators. arXiv:0809.4794v1 [cs.CR], 2008.

[11] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In EUROCRYPT, 1999.

[12] Mikhail Atallah and Jiangtao Li. Secure outsourcing of sequence comparisons. International Journal of Information Security, 4(4):277–287, 2005.

[13] Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the ACM Symposium on the Theory of Computing, pages 1–10, 1988.

[14] Karthik Sridharan, Shai Shalev-Shwartz, and Nathan Srebro. Fast rates for regularized objectives. In Neural Information Processing Systems, pages 1545–1552, 2008.

[15] A. Frank and A. Asuncion. UCI machine learning repository, 2010.

[16] John Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods — Support Vector Learning, pages 185–208, 1999.