{"title": "Near-optimal Differentially Private Principal Components", "book": "Advances in Neural Information Processing Systems", "page_first": 989, "page_last": 997, "abstract": "Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We demonstrate that on real data, there is a large performance gap between the existing method and our method. We show that the sample complexity for the two procedures differs in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling.", "full_text": "Near-optimal Differentially Private Principal Components\n\nKamalika Chaudhuri\nUC San Diego\nkchaudhuri@ucsd.edu\n\nAnand D. Sarwate\nTTI-Chicago\nasarwate@ttic.edu\n\nKaushik Sinha\nUC San Diego\nksinha@cs.ucsd.edu\n\nAbstract\n\nPrincipal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. 
In this paper we investigate the\ntheory and empirical performance of differentially private approximations to PCA\nand propose a new method which explicitly optimizes the utility of the output.\nWe demonstrate that on real data, there is a large performance gap between the\nexisting method and our method. We show that the sample complexity for the two\nprocedures differs in the scaling with the data dimension, and that our method is\nnearly optimal in terms of this scaling.\n\n1\n\nIntroduction\n\nDimensionality reduction is a fundamental tool for understanding complex data sets that arise in\ncontemporary machine learning and data mining applications. Even though a single data point\ncan be represented by hundreds or even thousands of features, the phenomena of interest are often\nintrinsically low-dimensional. By reducing the \u201cextrinsic\u201d dimension of the data to its \u201cintrinsic\u201d di-\nmension, analysts can discover important structural relationships between features, more ef\ufb01ciently\nuse the transformed data for learning tasks such as classi\ufb01cation or regression, and greatly reduce\nthe space required to store the data. One of the oldest and most classical methods for dimensionality\nreduction is principal components analysis (PCA), which computes a low-rank approximation to the\nsecond moment matrix of a set of points in Rd. The rank k of the approximation is chosen to be the\nintrinsic dimension of the data. We view this procedure as specifying a k-dimensional subspace of\nRd.\nMuch of today\u2019s machine-learning is performed on the vast amounts of personal information col-\nlected by private companies and government agencies about individuals, such as customers, users,\nand subjects. These datasets contain sensitive information about individuals and typically involve\na large number of features. 
It is therefore important to design machine-learning algorithms which discover important structural relationships in the data while taking into account its sensitive nature.\nWe study approximations to PCA which guarantee differential privacy, a cryptographically motivated definition of privacy [9] that has gained significant attention over the past few years in the machine-learning and data-mining communities [19, 21, 20, 10, 23]. Differential privacy measures privacy risk by a parameter α that bounds the log-likelihood ratio of the output of a (private) algorithm under two databases differing in a single individual.\nThere are many general tools for providing differential privacy. The sensitivity method [9] computes the desired algorithm (PCA) on the data and then adds noise proportional to the maximum change that can be induced by changing a single point in the data set. The PCA algorithm is very sensitive in this sense because the top eigenvector can change by 90° by changing one point in the data set. Relaxations such as smoothed sensitivity [24] are difficult to compute in this setting as well. The SULQ method of Blum et al. [2] adds noise to the second moment matrix and then runs PCA on the noisy matrix. As our experiments show, the amount of noise required is often quite severe and SULQ seems impractical for data sets of moderate size.\nThe general SULQ method does not take into account the quality of approximation to the non-private PCA output. We address this by proposing a new method, PPCA, that is an instance of the exponential mechanism of McSherry and Talwar [22]. For any k < d, this differentially private method outputs a k-dimensional subspace; the output is biased towards subspaces which are close to the output of PCA. In our case, the method corresponds to sampling from the matrix Bingham distribution. 
We implement this method using a Markov Chain Monte Carlo (MCMC) procedure due to Hoff [15] and show that it achieves significantly better empirical performance.\nIn order to understand the performance gap, we prove sample complexity bounds in the case k = 1 for SULQ and PPCA, as well as a general lower bound on the sample complexity for any differentially private algorithm. We show that (up to log factors) the sample complexity scales as Ω(d^{3/2} √(log d)) for SULQ and as O(d) for PPCA. Furthermore, any differentially private algorithm requires Ω(d) samples, showing that PPCA is nearly optimal in terms of sample complexity as a function of data dimension. These theoretical results suggest that our experiments exhibit the limit of how well α-differentially private algorithms can perform, and our experiments show that this gap should persist for general k.\nThere are several interesting open questions suggested by this work. One set of issues is computational. Differential privacy is a mathematical definition, but algorithms must be implemented using finite precision machines. Privacy and computation interact in many places, including pseudorandomness, numerical stability, optimization, and in the MCMC procedure we use to implement PPCA; investigating the impact of approximate sampling is an avenue for future work. A second set of issues is theoretical – while the privacy guarantees of PPCA hold for all k, our theoretical analysis of sample complexity applies only to k = 1, in which the distance and angles between vectors are related. An interesting direction is to develop theoretical bounds for general k; challenges here are providing the right notion of approximation of PCA, and extending the theory using packings of Grassman or Stiefel manifolds.\n\n2 Preliminaries\n\nThe data given to our algorithm is a set of n vectors D = {x1, x2, . . . 
, xn} where each xi corresponds to the private value of one individual, xi ∈ R^d, and ‖xi‖ ≤ 1 for all i. Let X = [x1, . . . , xn] be the matrix whose columns are the data vectors {xi}. Let A = (1/n) XX^T denote the d × d second moment matrix of the data. The matrix A is positive semidefinite, and has Frobenius norm at most 1.\nThe problem of dimensionality reduction is to find a \u201cgood\u201d low-rank approximation to A. A popular solution is to compute a rank-k matrix Â which minimizes the norm ‖A − Â‖F, where k is much lower than the data dimension d. The Schmidt approximation theorem [25] shows that the minimizer is given by the singular value decomposition, also known as the PCA algorithm in some areas of computer science.\nDefinition 1. Suppose A is a positive semidefinite matrix whose first k eigenvalues are distinct. Let the eigenvalues of A be λ1(A) ≥ λ2(A) ≥ · · · ≥ λd(A) ≥ 0 and let Λ be a diagonal matrix with Λii = λi(A). The matrix A decomposes as\n\nA = V Λ V^T ,   (1)\n\nwhere V is an orthonormal matrix of eigenvectors. The top-k subspace of A is the matrix\n\nVk(A) = [v1 v2 · · · vk] ,   (2)\n\nwhere vi is the i-th column of V in (1).\nGiven the top-k subspace and the eigenvalue matrix Λ, we can form an approximation A(k) = Vk(A) Λk Vk(A)^T to A, where Λk contains the k largest eigenvalues in Λ. In the special case k = 1 we have A(1) = λ1(A) v1 v1^T, where v1 is the eigenvector corresponding to λ1(A). We refer to v1 as the top eigenvector of the data. For a d × k matrix V̂ with orthonormal columns, the quality of V̂ in approximating A can be measured by\n\nqF(V̂) = tr(V̂^T A V̂) .   (3)\n\nThe V̂ which maximizes qF(V̂) has columns equal to {vi : i ∈ [k]}, corresponding to the top k eigenvectors of A.\nOur theoretical results apply to the special case k = 1. 
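For readers who want to reproduce the non-private baseline, the construction above (form A = (1/n)XX^T, take the top-k eigenvectors) can be sketched in a few lines of NumPy; the function name and toy data below are ours, not part of the paper:

```python
import numpy as np

def top_k_subspace(X, k):
    """Top-k PCA subspace V_k(A) of A = (1/n) X X^T (eqs. (1)-(2)).

    X is d x n with one data point per column, ||x_i|| <= 1.
    Returns a d x k matrix with orthonormal columns.
    """
    d, n = X.shape
    A = (X @ X.T) / n                 # second moment matrix
    evals, evecs = np.linalg.eigh(A)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]   # sort descending, as in Definition 1
    return evecs[:, order[:k]]

# toy usage: 100 points in R^5, rescaled so every column has norm <= 1
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))
X = X / np.maximum(1.0, np.linalg.norm(X, axis=0))
Vk = top_k_subspace(X, 2)
```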
For these results, we measure the inner product between the output vector v̂1 and the true top eigenvector v1:\n\nqA(v̂1) = |⟨v̂1, v1⟩| .   (4)\n\nThis is related to (3). If we write v̂1 in the basis spanned by {vi}, then\n\nqF(v̂1) = λ1 qA(v̂1)² + Σ_{i=2}^{d} λi ⟨v̂1, vi⟩² .\n\nOur proof techniques use the geometric properties of qA(·).\nDefinition 2. A randomized algorithm A(·) is a (ρ, η)-close approximation to the top eigenvector if for all data sets D of n points,\n\nP (qA(A(D)) ≥ ρ) ≥ 1 − η,   (5)\n\nwhere the probability is taken over A(·).\nWe study approximations to Â that preserve the privacy of the underlying data. The notion of privacy that we use is differential privacy, which quantifies the privacy guaranteed by a randomized algorithm P applied to a data set D.\nDefinition 3. An algorithm A(B) taking values in a set T provides α-differential privacy if\n\nsup_S sup_{D,D′} μ(S | B = D) / μ(S | B = D′) ≤ e^α,   (6)\n\nwhere the first supremum is over all measurable S ⊆ T, the second is over all data sets D and D′ differing in a single entry, and μ(·|B) is the conditional distribution (measure) on T induced by the output A(B) given a data set B. The ratio is interpreted to be 1 whenever the numerator and denominator are both 0.\nDefinition 4. An algorithm A(B) taking values in a set T provides (α, δ)-differential privacy if\n\nP (A(D) ∈ S) ≤ e^α P (A(D′) ∈ S) + δ,   (7)\n\nfor all measurable S ⊆ T and all data sets D and D′ differing in a single entry.\nHere α and δ are privacy parameters, where low α and δ ensure more privacy. For more details about these definitions, see [9, 26, 8]. 
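The two utility measures above are straightforward to compute; a short sketch (helper names are ours) that also checks on a diagonal example that the top eigenvector maximizes both:

```python
import numpy as np

def q_F(V_hat, A):
    """q_F(V_hat) = tr(V_hat^T A V_hat), eq. (3)."""
    return float(np.trace(V_hat.T @ A @ V_hat))

def q_A(v_hat, v1):
    """q_A(v_hat) = |<v_hat, v1>|, the k = 1 measure of eq. (4)."""
    return abs(float(v_hat @ v1))

# diagonal example: the eigenvectors are the standard basis vectors,
# so the top eigenvector e_1 scores highest under both measures
A = np.diag([0.5, 0.3, 0.1])
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
assert q_A(v1, v1) == 1.0
assert q_F(v1.reshape(-1, 1), A) > q_F(v2.reshape(-1, 1), A)
```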
The second privacy guarantee is weaker; the parameter δ bounds the probability of failure, and δ is typically chosen to be quite small.\nIn this paper we are interested in proving results on the sample complexity of differentially private algorithms that approximate PCA. That is, for a given α and ρ, how large must the number of individuals n in the data set be such that the algorithm is α-differentially private and also a (ρ, η)-close approximation to PCA? It is well known that as the number of individuals n grows, it is easier to guarantee the same level of privacy with relatively less noise or perturbation, and therefore the utility of the approximation also improves. Our results characterize how privacy and utility scale with n and the tradeoff between them for fixed n.\n\nRelated Work Differential privacy was proposed by Dwork et al. [9], and has spawned an extensive literature of general methods and applications [1, 21, 27, 6, 24, 3, 22, 10]. Differential privacy has been shown to have strong semantic guarantees [9, 17] and is resistant to many attacks [12] that succeed against some other definitions of privacy. There are several standard approaches for designing differentially-private data-mining algorithms, including input perturbation [2], output perturbation [9], the exponential mechanism [22], and objective perturbation [6]. To our knowledge, other than the SULQ method [2], which provides a general differentially-private input perturbation algorithm, this is the first work on differentially-private PCA. Independently, [14] consider the problem of differentially-private low-rank matrix reconstruction for applications to sparse matrices; provided certain coherence conditions hold, they provide an algorithm for constructing a rank 2k approximation B to a matrix A such that ‖A − B‖F is O(‖A − Ak‖F) plus some additional terms which depend on d, k and n; here Ak is the best rank k approximation to A. 
Because of their additional assumptions, their bounds are generally incomparable to ours, and our bounds are superior for dense matrices.\nThe data-mining community has also considered many different models for privacy-preserving computation – see Fung et al. for a survey with more references [11]. Many of the models used have been shown to be susceptible to composition attacks, when the adversary has some amount of prior knowledge [12]. An alternative line of privacy-preserving data-mining work [28] is in the Secure Multiparty Computation setting; one work [13] studies privacy-preserving singular value decomposition in this model. Finally, dimension reduction through random projection has been considered as a technique for sanitizing data prior to publication [18]; our work differs from this line of work in that we offer differential privacy guarantees, and we only release the PCA subspace, not actual data. Independently, Kapralov and Talwar [16] have proposed a dynamic programming algorithm for differentially private low rank matrix approximation which involves sampling from a distribution induced by the exponential mechanism. The running time of their algorithm is O(d^6), where d is the data dimension.\n\n3 Algorithms and results\n\nIn this section we describe differentially private techniques for approximating (2). The first is a modified version of the SULQ method [2]. Our new algorithm for differentially-private PCA, PPCA, is an instantiation of the exponential mechanism due to McSherry and Talwar [22]. Both procedures provide differentially private approximations to the top-k subspace: SULQ provides (α, δ)-differential privacy and PPCA provides α-differential privacy.\n\nInput perturbation. The only differentially-private approximation to PCA prior to this work is the SULQ method [2]. 
The SULQ method perturbs each entry of the empirical second moment matrix A to ensure differential privacy and releases the top k eigenvectors of this perturbed matrix. In particular, SULQ recommends adding a matrix N of i.i.d. Gaussian noise of variance 8d² log²(d/δ)/(n²α²) and applies the PCA algorithm to A + N. This guarantees a weaker privacy definition known as (α, δ)-differential privacy. One problem with this approach is that with probability 1 the matrix A + N is not symmetric, so the largest eigenvalue may not be real and the entries of the corresponding eigenvector may be complex. Thus the SULQ algorithm is not a good candidate for practical privacy-preserving dimensionality reduction.\nHowever, a simple modification to the basic SULQ approach does guarantee (α, δ)-differential privacy. Instead of adding an asymmetric Gaussian matrix, the algorithm can add a symmetric matrix with i.i.d. Gaussian entries N. That is, for 1 ≤ i ≤ j ≤ d, the variable Nij is an independent Gaussian random variable with variance β². Note that this matrix is symmetric but not necessarily positive semidefinite, so some eigenvalues may be negative but the eigenvectors are all real. A derivation for the noise variance is given in Theorem 1.\n\nAlgorithm 1: Algorithm MOD-SULQ (input perturbation)\ninputs: d × n data matrix X, privacy parameter α, parameter δ\noutputs: d × k matrix V̂k = [v̂1 v̂2 · · · v̂k] with orthonormal columns\n1 Set A = (1/n) XX^T.\n2 Set β = ((d + 1)/(nα)) √(2 log((d² + d)/(2δ√(2π)))) + 1/(n√α). Generate a d × d symmetric random matrix N whose entries are i.i.d. drawn from N(0, β²).\n3 Compute V̂k = Vk(A + N) according to (2).\n\nExponential mechanism. Our new method, PPCA, randomly samples a k-dimensional subspace from a distribution that ensures differential privacy and is biased towards high utility. 
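The input-perturbation step of Algorithm 1 (MOD-SULQ) can be sketched as follows. For brevity this sketch keeps only the leading (d+1)/(nα) factor of the noise scale β, omitting the logarithmic term involving δ, so it is an illustration rather than a faithful implementation:

```python
import numpy as np

def mod_sulq_sketch(X, alpha, k, rng):
    """Input perturbation in the style of Algorithm 1 (MOD-SULQ), simplified.

    Adds a symmetric matrix N with i.i.d. N(0, beta^2) entries for i <= j
    to A = (1/n) X X^T and returns the top-k eigenvectors of A + N.
    beta keeps only the (d+1)/(n*alpha) factor; the log(.) term of
    Algorithm 1 (which involves delta) is omitted in this sketch.
    """
    d, n = X.shape
    A = (X @ X.T) / n
    beta = (d + 1) / (n * alpha)            # illustrative noise scale only
    upper = np.triu(rng.normal(scale=beta, size=(d, d)))
    N = upper + np.triu(upper, 1).T         # symmetrize: N_ij = N_ji
    evals, evecs = np.linalg.eigh(A + N)    # symmetric => real spectrum
    return evecs[:, np.argsort(evals)[::-1][:k]]

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))
X = X / np.maximum(1.0, np.linalg.norm(X, axis=0))
V_hat = mod_sulq_sketch(X, alpha=0.5, k=2, rng=rng)
```

Because A + N is symmetric, `eigh` always returns a real spectrum and real eigenvectors, which is exactly the point of the symmetrization.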
The distribution from which our released subspace is sampled is known in the statistics literature as the matrix Bingham distribution [7], which we denote by BMFk(B). The algorithm is in terms of general k < d but our theoretical results focus on the special case k = 1 where we wish to release a one-dimensional approximation to the data covariance matrix. The matrix Bingham distribution takes values on the set of all k-dimensional subspaces of R^d and has a density equal to\n\nf(V) = (1 / 1F1(k/2, d/2, B)) exp(tr(V^T B V)),   (8)\n\nwhere V is a d × k matrix whose columns are orthonormal and 1F1(k/2, d/2, B) is a confluent hypergeometric function [7, p.33].\n\nAlgorithm 2: Algorithm PPCA (exponential mechanism)\ninputs: d × n data matrix X, privacy parameter α, dimension k\noutputs: d × k matrix V̂k = [v̂1 v̂2 · · · v̂k] with orthonormal columns\n1 Set A = (1/n) XX^T\n2 Sample V̂k ∼ BMF(n(α/2)A)\n\nBy combining results on the exponential mechanism [22] along with properties of the PCA algorithm, we can show that this procedure is differentially private. In many cases, sampling from the distribution specified by the exponential mechanism may be difficult computationally, especially for continuous-valued outputs. We implement PPCA using a recently-proposed Gibbs sampler due to Hoff [15]. Gibbs sampling is a popular Markov Chain Monte Carlo (MCMC) technique in which samples are generated according to a Markov chain whose stationary distribution is the density in (8). Assessing the \u201cburn-in time\u201d and other factors for this procedure is an interesting question in its own right; further details are in Section E.3.\n\nOther approaches. There are other general algorithmic strategies for guaranteeing differential privacy. 
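The paper samples from the matrix Bingham distribution with Hoff's Gibbs sampler; as a self-contained illustration of what Algorithm 2 targets in the k = 1 case, here is a simple random-walk Metropolis chain for the density proportional to exp((nα/2) v^T A v) on the unit sphere. This is our own sketch, not the authors' sampler, and the perturb-then-normalize proposal is only approximately symmetric, so treat it as a heuristic:

```python
import numpy as np

def ppca_k1_metropolis(A, n, alpha, steps=5000, step=0.1, seed=0):
    """Heuristic sampler for v with density ~ exp((n*alpha/2) v^T A v)
    on the unit sphere: the k = 1 (vector Bingham) target of Algorithm 2.
    The paper itself uses Hoff's Gibbs sampler for the matrix case.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    log_f = lambda u: 0.5 * n * alpha * float(u @ A @ u)
    for _ in range(steps):
        prop = v + step * rng.normal(size=d)   # random-walk proposal
        prop /= np.linalg.norm(prop)           # project back onto the sphere
        if np.log(rng.uniform()) < log_f(prop) - log_f(v):
            v = prop                           # Metropolis accept
    return v

# with a large eigengap the sampled direction concentrates
# near the top eigenvector e_1 of A
A = np.diag([1.0, 0.1, 0.1])
v = ppca_k1_metropolis(A, n=1000, alpha=1.0)
```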
The sensitivity method [9] adds noise proportional to the maximum change that can be induced by changing a single point in the data set. Consider a data set D with m + 1 copies of a unit vector u and m copies of a unit vector u′ with u ⊥ u′, and let D′ have m copies of u and m + 1 copies of u′. Then v1(D) = u but v1(D′) = u′, so ‖v1(D) − v1(D′)‖ = √2. Thus the global sensitivity does not scale with the number of data points, so as n increases the variance of the noise required by the Laplace mechanism [9] will not decrease. An alternative to global sensitivity is smooth sensitivity [24]; except for special cases, such as the sample median, smooth sensitivity is difficult to compute for general functions. A third method for computing private, approximate solutions to high-dimensional optimization problems is objective perturbation [6]; to apply this method, we require the optimization problems to have certain properties (namely, strong convexity and bounded norms of gradients), which do not apply to PCA.\n\nMain results. Our theoretical results are sample complexity bounds for PPCA and MOD-SULQ as well as a general lower bound on the sample complexity for any α-differentially private algorithm. These results show that PPCA is nearly optimal in terms of the scaling of the sample complexity with respect to the data dimension d, privacy parameter α, and eigengap Δ. We further show that MOD-SULQ requires more samples as a function of d, despite having a slightly weaker privacy guarantee. Proofs are deferred to the supplementary material.\nEven though both algorithms can output the top-k PCA subspace for general k ≤ d, we prove results for the case k = 1. Finding the scaling behavior of the sample complexity with k is an interesting open problem that we leave for future work; challenges here are finding the right notion of approximation of the PCA, and extending the theory using packings of Grassman or Stiefel manifolds.\nTheorem 1. 
For the β in Algorithm 1, the MOD-SULQ algorithm is (α, δ)-differentially private.\nTheorem 2. Algorithm PPCA is α-differentially private.\nThe fact that these two algorithms are differentially private follows from some simple calculations. Our first sample complexity result provides an upper bound on the number of samples required by PPCA to guarantee a certain level of privacy and accuracy. The sample complexity n of PPCA grows linearly with the dimension d, inversely with α, and inversely with the correlation gap (1 − ρ) and eigenvalue gap λ1(A) − λ2(A).\nTheorem 3 (Sample complexity of PPCA). If n > (d/(α(1 − ρ)(λ1 − λ2))) (log(1/η)/d + log(4λ1/((1 − ρ²)(λ1 − λ2)))), then PPCA is a (ρ, η)-close approximation to PCA.\nOur second result shows a lower bound on the number of samples required by any α-differentially-private algorithm to guarantee a certain level of accuracy for a large class of datasets, and uses proof techniques in [4, 5].\nTheorem 4 (Sample complexity lower bound). Fix d, α, and Δ ≤ 1/2. For any ρ sufficiently close to 1, no α-differentially private algorithm A can approximate PCA with expected utility greater than ρ on all databases with n points in dimension d having eigenvalue gap Δ, where n < max{ d/(αΔ), (1/80) d/(αΔ√(1 − ρ)) }.\nTheorem 3 shows that if n scales like (d/(α(1 − ρ)Δ)) log(d/(1 − ρ²)) then PPCA produces an approximation v̂1 that has correlation ρ with v1, whereas Theorem 4 shows that n must scale like d/(αΔ√(1 − ρ)) for any α-differentially private algorithm. In terms of scaling with d, α and Δ, the upper and lower bounds match, and they also match up to square-root factors with respect to the correlation. 
By contrast, the following lower bound on the number of samples required by MOD-SULQ to ensure a certain level of accuracy shows that MOD-SULQ has a less favorable scaling with dimension.\nTheorem 5 (Sample complexity lower bound for MOD-SULQ). There are constants c and c′ such that if n < c (d^{3/2} √(log(d/δ)) / α)(1 − c′(1 − ρ)), then there is a dataset of size n in dimension d such that the top PCA direction v1 and the output v̂1 of MOD-SULQ satisfy E[|⟨v̂1, v1⟩|] ≤ ρ.\nNotice that the dependence on d grows as d^{3/2} in MOD-SULQ as opposed to d in PPCA. Dimensionality reduction via PCA is often used in applications where the data points occupy a low dimensional space but are presented in high dimensions. These bounds suggest that PPCA is better suited to such applications than MOD-SULQ. We next turn to validating this intuition on real data.\n\n4 Experiments\n\nWe chose four datasets from four different domains – kddcup99, which includes features of 494,021 network connections, census, a demographic data set on 199,523 individuals, localization, a medical dataset with 164,860 instances of sensor readings on individuals engaged in different activities, and insurance, a dataset on product usage and demographics of 9,822 individuals. After preprocessing, the dimensions of these datasets are 116, 513, 44 and 150 respectively. We chose k to be 4, 8, 10, and 11 such that the top-k PCA subspace had qF(Vk) at least 80% of ‖A‖F. More details are in Appendix E in the supplementary material.\nWe ran three algorithms on these data sets: standard (non-private) PCA, MOD-SULQ with α = 0.1 and δ = 0.01, and PPCA with α = 0.1. As a sanity check, we also tried a uniformly generated random projection – since this projection is data-independent we would expect it to have low utility. Standard PCA is non-private; changing a single data point will change the output, and hence violate differential privacy. 
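The random-projection sanity check used in these experiments is data-independent; a small sketch (our own construction, on synthetic data) of that baseline and how it compares to the non-private top-k subspace under qF:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 20, 1000, 4

# synthetic data with a decaying spectrum, columns rescaled to norm <= 1
X = rng.normal(size=(d, n)) * np.linspace(1.0, 0.05, d)[:, None]
X = X / np.maximum(1.0, np.linalg.norm(X, axis=0))
A = (X @ X.T) / n

# non-private top-k PCA subspace
evals, evecs = np.linalg.eigh(A)
Vk = evecs[:, np.argsort(evals)[::-1][:k]]

# data-independent baseline: orthonormal basis of a random k-dim subspace
U, _ = np.linalg.qr(rng.normal(size=(d, k)))

q_F = lambda V: float(np.trace(V.T @ A @ V))
assert q_F(Vk) > q_F(U)   # the PCA subspace captures strictly more energy
```

Since Vk maximizes qF over all k-dimensional subspaces, a random subspace is almost surely strictly worse, which is why random projections serve as a low-utility reference point.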
We measured the utility qF(U), where U is the k-dimensional subspace output by the algorithm; qF(U) is maximized when U is the top-k PCA subspace, and thus this reflects how close the output subspace is to the true PCA subspace in terms of representing the data. Although our theoretical results hold for qA(·), the \u201cenergy\u201d qF(·) is more relevant in practice for larger k.\nFigures 1(a), 1(b), 1(c), and 1(d) show qF(U) as a function of sample size for the k-dimensional subspace output by PPCA, MOD-SULQ, non-private PCA, and random projections. Each value in the figure is an average over 5 random permutations of the data, as well as 10 random starting points of the Gibbs sampler per permutation (for PPCA), and 100 random runs per permutation (for MOD-SULQ and random projections).\n\n[Figure 1: Utility qF(U) versus sample size n for the four data sets: (a) census, (b) kddcup, (c) localization, (d) insurance. Each panel compares non-private PCA, PPCA, SULQ, and random projections.]\n\nTable 1: Classification accuracy in the k-dimensional subspaces reported by the different algorithms for kddcup99 (k = 4) and localization (k = 10).\n\nDataset | Non-private PCA | PPCA | MOD-SULQ | Random projections\nKDDCUP | 98.97 ± 0.05 | 98.95 ± 0.05 | 98.18 ± 0.65 | 98.23 ± 0.49\nLOCALIZATION | 100 ± 0 | 100 ± 0 | 97.06 ± 2.17 | 96.28 ± 2.34\n\nThe plots show that 
PPCA always outperforms MOD-SULQ, and approaches the performance of non-private PCA with increasing sample size. By contrast, for most of the problems and sample sizes considered by our experiments, MOD-SULQ does not perform much better than random projections. The only exception is localization, which has much lower dimension (44). This confirms that MOD-SULQ does not scale very well with the data dimension d. The performance of both MOD-SULQ and PPCA improves as the sample size increases; the improvement is faster for PPCA than for MOD-SULQ. However, to be fair, MOD-SULQ is simpler and hence runs faster than PPCA. At the sample sizes in our experiments, the performance of non-private PCA does not improve much with a further increase in samples. Our theoretical results suggest that the performance of differentially private PCA cannot be significantly improved over these experiments.\n\nEffect of privacy on classification. A common use of a dimension reduction algorithm is as a precursor to classification or clustering; to evaluate the effectiveness of the different algorithms, we projected the data onto the subspace output by the algorithms, and measured the classification accuracy using the projected data. The classification results are summarized in Table 1. We chose the normal vs. all classification task in kddcup99, and the falling vs. all classification task in localization.1 We used a linear SVM for all classification experiments.\nFor the classification experiments, we used half of the data as a holdout set for computing a projection subspace. We projected the classification data onto the subspace computed based on the holdout set; 10% of this data was used for training and parameter-tuning, and the rest for testing. We repeated the classification process 5 times for 5 different (random) projections for each algorithm, and then ran the entire procedure over 5 random permutations of the data. Each value in the figure is thus an average over 5 × 5 = 25 rounds of classification.\n\n1 For the other two datasets, census and insurance, the classification accuracy of linear SVM after (non-private) PCA is as low as always predicting the majority label.\n\n[Figure 2: Plot of qF(U) versus the privacy parameter α for a synthetic data set with n = 5,000, d = 10, and k = 2, comparing non-private PCA, SULQ, and PPCA.]\n\nThe classification results show that our algorithm performs almost as well as non-private PCA for classification in the top-k PCA subspace, while the performance of MOD-SULQ and random projections is a little worse. The classification accuracy while using MOD-SULQ and random projections also appears to have higher variance compared to our algorithm and non-private PCA; this can be explained by the fact that these projections tend to be farther from the PCA subspace, in which the data has higher classification accuracy.\n\nEffect of the privacy requirement. To check the effect of the privacy requirement, we generated a synthetic data set of n = 5,000 points drawn from a Gaussian distribution in d = 10 with mean 0 and whose covariance matrix had eigenvalues {0.5, 0.30, 0.04, 0.03, 0.02, 0.01, 0.004, 0.003, 0.001, 0.001}. 
In this case the space spanned by the top two eigenvectors has most of the energy, so we chose k = 2 and plotted the utility qF(·) for non-private PCA, MOD-SULQ with δ = 0.05, and PPCA. We drew 100 samples from each privacy-preserving algorithm and the plot of the average utility versus α is shown in Figure 2. As α increases, the privacy requirement is relaxed and both MOD-SULQ and PPCA approach the utility of PCA without privacy constraints. However, for moderate α, PPCA still captures most of the utility, whereas the gap between MOD-SULQ and PPCA becomes quite large.\n\n5 Conclusion\n\nIn this paper we investigated the theoretical and empirical performance of differentially private approximations to PCA. Empirically, we showed that MOD-SULQ and PPCA differ markedly in how well they approximate the top-k subspace of the data. The reason for this, theoretically, is that the sample complexity of MOD-SULQ scales as d^{3/2} √(log d) whereas PPCA scales as d. Because PPCA uses the exponential mechanism with qF(·) as the utility function, it is not surprising that it performs well. However, MOD-SULQ often had a performance comparable to random projections, indicating that the real data sets we used were too small for it to be effective. We furthermore showed that PPCA is nearly optimal, in that any differentially private approximation to PCA must use Ω(d) samples.\nOur investigation brought up many interesting issues to consider for future work. The description of differentially private algorithms assumes an ideal model of computation: real systems require additional security assumptions that have to be verified. The difference between truly random noise and pseudorandomness and the effects of finite precision can lead to a gap between the theoretical ideal and practice. 
Numerical optimization methods used in objective perturbation [6] can only produce approximate solutions, and have complex termination conditions unaccounted for in the theoretical analysis. Our MCMC sampling has this flavor: we cannot sample exactly from the Bingham distribution because we must determine the Gibbs sampler's convergence empirically. Accounting for these effects is an interesting avenue for future work that can bring theory and practice together.
Finally, more germane to the work on PCA here is to prove sample complexity results for general k rather than the case k = 1 considered here. For k = 1 the utility functions qF(·) and qA(·) are related, but for general k it is not immediately clear what metric best captures the idea of "approximating" PCA. Developing a framework for such approximations is of interest more generally in machine learning.

References

[1] BARAK, B., CHAUDHURI, K., DWORK, C., KALE, S., MCSHERRY, F., AND TALWAR, K. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS (2007), pp. 273–282.

[2] BLUM, A., DWORK, C., MCSHERRY, F., AND NISSIM, K. Practical privacy: the SuLQ framework. In PODS (2005), pp. 128–138.

[3] BLUM, A., LIGETT, K., AND ROTH, A. A learning theory approach to non-interactive database privacy. In STOC (2008), R. E. Ladner and C. Dwork, Eds., ACM, pp. 609–618.

[4] CHAUDHURI, K., AND HSU, D. Sample complexity bounds for differentially private learning. In COLT (2011).

[5] CHAUDHURI, K., AND HSU, D. Convergence rates for differentially private statistical estimation. In ICML (2012).

[6] CHAUDHURI, K., MONTELEONI, C., AND SARWATE, A. D. Differentially private empirical risk minimization. Journal of Machine Learning Research 12 (March 2011), 1069–1109.

[7] CHIKUSE, Y. Statistics on Special Manifolds. No. 174 in Lecture Notes in Statistics.
Springer, New York, 2003.

[8] DWORK, C., KENTHAPADI, K., MCSHERRY, F., MIRONOV, I., AND NAOR, M. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT (2006), vol. 4004, pp. 486–503.

[9] DWORK, C., MCSHERRY, F., NISSIM, K., AND SMITH, A. Calibrating noise to sensitivity in private data analysis. In 3rd IACR Theory of Cryptography Conference (2006), pp. 265–284.

[10] FRIEDMAN, A., AND SCHUSTER, A. Data mining with differential privacy. In KDD (2010), pp. 493–502.

[11] FUNG, B. C. M., WANG, K., CHEN, R., AND YU, P. S. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42, 4 (June 2010), 53 pages.

[12] GANTA, S. R., KASIVISWANATHAN, S. P., AND SMITH, A. Composition attacks and auxiliary information in data privacy. In KDD (2008), pp. 265–273.

[13] HAN, S., NG, W. K., AND YU, P. Privacy-preserving singular value decomposition. In ICDE (29 March 2009 – 2 April 2009), pp. 1267–1270.

[14] HARDT, M., AND ROTH, A. Beating randomized response on incoherent matrices. In STOC (2012).

[15] HOFF, P. D. Simulation of the matrix Bingham–von Mises–Fisher distribution, with applications to multivariate and relational data. J. Comp. Graph. Stat. 18, 2 (2009), 438–456.

[16] KAPRALOV, M., AND TALWAR, K. On differentially private low rank approximation. In Proc. of SODA (2013).

[17] KASIVISWANATHAN, S. P., AND SMITH, A. A note on differential privacy: Defining resistance to arbitrary side information. CoRR abs/0803.3946 (2008).

[18] LIU, K., KARGUPTA, H., AND RYAN, J. Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans. Knowl. Data Eng. 18, 1 (2006), 92–106.

[19] MACHANAVAJJHALA, A., KIFER, D., ABOWD, J. M., GEHRKE, J., AND VILHUBER, L. Privacy: Theory meets practice on the map. In ICDE (2008), pp. 277–286.

[20] MCSHERRY, F.
Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD Conference (2009), pp. 19–30.

[21] MCSHERRY, F., AND MIRONOV, I. Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In KDD (2009), pp. 627–636.

[22] MCSHERRY, F., AND TALWAR, K. Mechanism design via differential privacy. In FOCS (2007), pp. 94–103.

[23] MOHAMMED, N., CHEN, R., FUNG, B. C. M., AND YU, P. S. Differentially private data release for data mining. In KDD (2011), pp. 493–501.

[24] NISSIM, K., RASKHODNIKOVA, S., AND SMITH, A. Smooth sensitivity and sampling in private data analysis. In STOC (2007), D. S. Johnson and U. Feige, Eds., ACM, pp. 75–84.

[25] STEWART, G. On the early history of the singular value decomposition. SIAM Review 35, 4 (1993), 551–566.

[26] WASSERMAN, L., AND ZHOU, S. A statistical framework for differential privacy. JASA 105, 489 (2010).

[27] WILLIAMS, O., AND MCSHERRY, F. Probabilistic inference and differential privacy. In NIPS (2010).

[28] ZHAN, J. Z., AND MATWIN, S. Privacy-preserving support vector machine classification. IJIIDS 1, 3/4 (2007), 356–385.