{"title": "Differentially Private Covariance Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 14213, "page_last": 14222, "abstract": "The covariance matrix of a dataset is a fundamental statistic that can be used for calculating optimum regression weights as well as in many other learning and data analysis settings. For datasets containing private user information, we often want to estimate the covariance matrix in a way that preserves differential privacy. While there are known methods for privately computing the covariance matrix, they all have one of two major shortcomings. Some, like the Gaussian mechanism, only guarantee (epsilon, delta)-differential privacy, leaving a non-trivial probability of privacy failure. Others give strong epsilon-differential privacy guarantees, but are impractical, requiring complicated sampling schemes, and tend to perform poorly on real data. \n\nIn this work we propose a new epsilon-differentially private algorithm for computing the covariance matrix of a dataset that addresses both of these limitations. We show that it has lower error than existing state-of-the-art approaches, both analytically and empirically. In addition, the algorithm is significantly less complicated than other methods and can be efficiently implemented with rejection sampling.", "full_text": "Differentially Private Covariance Estimation\n\nKareem Amin\n\nkamin@google.com\nGoogle Research NY\n\nTravis Dick\n\ntdick@cs.cmu.edu\n\nCarnegie Mellon University\n\nAlex Kulesza\n\nkulesza@google.com\nGoogle Research NY\n\nAndr\u00b4es Mu\u02dcnoz Medina\nammedina@google.com\nGoogle Research NY\n\nSergei Vassilvitskii\nsergeiv@google.com\nGoogle Research NY\n\nAbstract\n\nThe task of privately estimating a covariance matrix is a popular one due to its\napplications to regression and PCA. 
While there are known methods for releasing private covariance matrices, these algorithms either achieve only (ε, δ)-differential privacy or require very complicated sampling schemes, ultimately performing poorly on real data. In this work we propose a new ε-differentially private algorithm for computing the covariance matrix of a dataset that addresses both of these limitations. We show that it has lower error than existing state-of-the-art approaches, both analytically and empirically. In addition, the algorithm is significantly less complicated than other methods and can be efficiently implemented with rejection sampling.

1 Introduction

Differential privacy has emerged as a standard framework for thinking about user privacy in the context of large scale data analysis [Dwork et al., 2014a]. While differential privacy does not protect against all attack vectors, it does provide formal guarantees about possible information leakage. A key feature of differential privacy is its robustness to post-processing: once a mechanism is certified as differentially private, arbitrary post-processing can be performed on its outputs without additional privacy impact.

The past decade has seen the emergence of a wide range of techniques for modifying classical learning algorithms to be differentially private [McSherry and Mironov, 2009, Chaudhuri et al., 2011, Jain et al., 2012, Abadi et al., 2016]. These algorithms typically train directly on the raw data, but inject carefully designed noise in order to produce differentially private outputs. A more general (and challenging) alternative approach is to first preprocess the dataset using a differentially private mechanism and then freely choose among standard off-the-shelf algorithms for learning.
This not only provides more flexibility in the design of the learning system, but also removes the need for access to sensitive raw data (except for the initial preprocessing step). This approach thus falls under the umbrella of data release: since the preprocessed dataset is differentially private, it can, in principle, be released without leaking any individual's data.

In this work we consider the problem of computing, in a differentially private manner, a specific preprocessed representation of a dataset: its covariance matrix. Formally, given a data matrix X ∈ R^{d×n}, where each column corresponds to a data point, we aim to compute a private estimate of C = XX^T that can be used in place of the raw data, for example, as the basis for standard linear regression algorithms. Our methods provide privacy guarantees for the columns of X.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

There are many existing techniques that can be applied to this problem. We distinguish ε-differentially private algorithms, which promise what is sometimes referred to as pure differential privacy, from (ε, δ)-differentially private algorithms, which may fail to preserve privacy with some probability δ. While algorithms in the pure differential privacy setting give stronger privacy guarantees, they tend to be significantly more difficult to implement, and often underperform empirically when compared to the straightforward algorithms in the (ε, δ) setting.

In this work, we give a new practical ε-differentially private algorithm for covariance matrix estimation. At a high level, the algorithm is natural. It approximates the eigendecomposition of the covariance matrix C by estimating the collections of eigenvalues and eigenvectors separately. Since the eigenvalues are insensitive to changes in a single column of X, we can accurately estimate them using the Laplace mechanism.
To estimate the eigenvectors, the algorithm uses the exponential mechanism to sample a direction θ from the unit sphere that approximately maximizes θ^T C θ, subject to the constraint of being orthogonal to the approximate eigenvectors sampled so far. The overall privacy guarantee for the combined method then follows from basic composition.

Our empirical results demonstrate lower reconstruction error for our algorithm when compared to other methods on both simulated and real-world datasets. This is especially striking in the high-privacy/low-ε regime, where we outperform all existing methods. We note that there is a different regime where our bounds no longer compete with those of the Gaussian mechanism, namely when ε, δ, and the number of data points are all sufficiently large (i.e., when privacy is "easy"). This suggests a two-pronged approach for the practitioner: utilize simple perturbation techniques when the data is insensitive to any one user and privacy parameters are lax, and more careful reconstruction when the privacy parameters are tight or the data is scarce, as is often the case in the social sciences and medical research.

Our main results can be summarized as follows:

• We prove our algorithm improves the privacy/utility trade-off by achieving lower error at a given privacy parameter compared with previous pure differentially private approaches (Theorem 2).
• We derive a non-uniform allocation of the privacy budget for estimating the eigenvectors of the covariance matrix giving the strongest utility guarantee from our analysis (Corollary 1).
• We show that our algorithm is practical: a simple rejection sampling scheme can be used for the core of the implementation (Algorithm 2).
• Finally, we perform an empirical evaluation of our algorithm, comparing it to existing methods on both synthetic and real-world datasets (Section 4).
To the best of our knowledge, this is the first comparative empirical evaluation of different private covariance estimation methods, and we show that our algorithm outperforms all of the baselines, especially in the high privacy regime.

1.1 Database Sanitization for Ridge Regression

Our motivation for private covariance estimation is training regression models. In practice, regression models are trained using different subsets of features, multiple regularization parameters, and even varying target variables. If we were to directly apply differentially private learning algorithms for each of these learning tasks, our privacy costs would accumulate with every model we trained. Our goal is to instead pay the privacy cost only once, computing a single data structure that can be used multiple times to tune regression models. In this section, we show that a private estimate of the covariance matrix C = XX^T summarizes the data sufficiently well for all of these ridge regression learning tasks with only a one-time privacy cost. Therefore, we can view differentially private covariance estimation as a database sanitization scheme for ridge regression.

Formally, given a data matrix X ∈ R^{d×n} with columns x_1, ..., x_n, we denote the ith entry of x_j by x_j(i). Consider using ridge regression to learn a linear model for estimating some target feature x(t) as a function of x(−t), where x(−t) denotes the vector obtained by removing the tth feature of x ∈ R^d. That is, we want to solve the following regularized optimization problem:

    w_α = argmin_{w ∈ R^{d−1}} (1/n) Σ_{j=1}^n (1/2)(w^T x_j(−t) − x_j(t))² + α‖w‖₂².

We can write the solution to the ridge regression problem in closed form as follows. Let A ∈ R^{(d−1)×n} be the matrix consisting of all but the tth row of X and y = (x_1(t), ..., x_n(t)) ∈ R^n be the tth row of X (as a column vector).
Then the solution to the ridge regression problem with regularization parameter α is given by w_α = (AA^T + 2αnI)^{−1} A y.

Given access to just the covariance matrix C = XX^T, we can compute the above closed form ridge regression model. Suppose first that the target feature is t = d. Then, writing X in block form, we have

    C = XX^T = [ A ; y^T ] [ A^T  y ] = [ AA^T     Ay  ]
                                        [ y^T A^T  y^T y ].

Now it is not hard to see we can recover w_α by using the block entries of the full covariance matrix. The following lemma quantifies how much the error of estimating C privately affects the regression solution w_α. The proof can be found in Appendix A.

Lemma 1. Let X ∈ R^{d×n} be a data matrix, C = XX^T ∈ R^{d×d}, and Ĉ ∈ R^{d×d} be a symmetric approximation to C. Fix any target feature t and regularization parameter α. Let w_α and ŵ_α be the ridge regression models learned for predicting feature t from C and Ĉ, respectively. Then

    ‖w_α − ŵ_α‖₂ ≤ (‖C − Ĉ‖_{2,1} + ‖C − Ĉ‖₂ · ‖ŵ_α‖₂) / (λ_min(C) + 2αn),

where ‖M‖_{2,1} denotes the L_{2,1}-norm of M (the maximum 2-norm of its columns).

Both ‖C − Ĉ‖_{2,1} and ‖C − Ĉ‖₂ are upper bounded by the Frobenius error ‖C − Ĉ‖_F. Therefore, in our analysis of our differentially private covariance estimation mechanism, we will focus on bounding the Frobenius error.
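The recovery of w_α from the block entries of C can be sketched as follows (a minimal illustration; the function name is ours, not the paper's):

```python
import numpy as np

def ridge_from_covariance(C, t, alpha, n):
    """Recover the ridge solution w_alpha = (AA^T + 2*alpha*n*I)^{-1} A y
    for target feature t using only the blocks of C = X X^T."""
    idx = [i for i in range(C.shape[0]) if i != t]
    AAt = C[np.ix_(idx, idx)]   # block AA^T
    Ay = C[idx, t]              # block Ay
    d1 = len(idx)
    return np.linalg.solve(AAt + 2 * alpha * n * np.eye(d1), Ay)
```

Because every block used here is read off the covariance matrix, the same private estimate Ĉ can be reused for any target feature t and any regularization parameter α without additional privacy cost.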
The bound in Lemma 1 also holds with ‖ŵ_α‖₂ replaced by ‖w_α‖₂ on the right hand side; however, we prefer the stated version since it can be computed by the practitioner.

1.2 Related Work

A variety of techniques exist for computing differentially private estimates of covariance matrices, including both general mechanisms that can be applied in this setting as well as specialized methods that take advantage of problem-specific structure.

A naïve approach using a standard differential privacy mechanism would be to simply add an appropriate amount of Laplace noise independently to every element in the true covariance matrix C. However, the amount of noise required makes such a mechanism impractical, as the sensitivity, and hence the amount of noise added, grows linearly in the dimension. A better approach is to add Gaussian noise [Dwork et al., 2014b]; however, this results in (ε, δ)-differential privacy, where, with some probability δ, the outcome is not private. Similarly, Upadhyay [2018] proposes a private way of generating low dimensional representations of X. This is a slightly different task than covariance estimation. Moreover, their algorithm is only (ε, δ)-differentially private for values of δ bounded below in terms of n log n, which makes the privacy regime incomparable to the one proposed in this paper. Another approach, proposed in Chaudhuri et al. [2012], is to compute a private version of PCA. This approach has two limitations. First, it only works for computing the top eigenvectors, and can fail to give non-trivial results for computing the full covariance matrix. Second, the sampling itself is quite involved and requires the use of a Gibbs sampler.
Since it is generally impossible to know when the sampler converges, adding noise in this manner can violate privacy guarantees.

The algorithm we propose bears the most resemblance to the differentially private low-rank matrix approximation proposed by Kapralov and Talwar [2013], which approximates the SVD. Their algorithm computes a differentially private rank-1 approximation of a matrix C, subtracts this matrix from C and then iterates the process on the residual. Similarly, our approach iteratively generates estimates of the eigenvectors of the matrix, but repeatedly projects the matrix onto the subspace orthogonal to the previously estimated eigenvectors. We demonstrate the benefit of this projective update both in our analytical bounds and empirical results. This ultimately allows us to rely on a simple rejection sampling technique proposed by Kent et al. [2018] to select our eigenvectors.

Other perturbation approaches include recent work on estimating sparse covariance matrices by Wang and Xu [2019]. Their setup differs from ours in that they assume all columns in the covariance matrix have s-sparsity. There was also an attempt by Jiang et al. [2016] to use Wishart-distributed noise to privately estimate a covariance matrix. However, Imtiaz and Sarwate [2016] proposed the same algorithm and later discovered that the algorithm was in fact not differentially private.

Wang [2018] also study the effectiveness of differentially private covariance estimation for private linear regression (and compare against several other private regression approaches).
However, they only consider the Laplace and Gaussian mechanisms for private covariance estimation and do not study the quality of the estimated covariance matrices, only their performance for regression tasks.

2 Preliminaries

Let X ∈ R^{d×n} be a data matrix where each column corresponds to a d-dimensional data point. Throughout the paper, we assume that the columns of the data matrix have ℓ2-norm at most one.¹ Our goal is to privately release an estimate of the unnormalized and uncentered covariance matrix C = XX^T ∈ R^{d×d}.

We say that two data matrices X and X̃ are neighbors if they differ on at most one column, denoted by X ∼ X̃. We want algorithms that are ε-differentially private with respect to neighboring data matrices. Formally, an algorithm A is ε-differentially private if for every pair of neighboring data matrices X and X̃ and every set O of possible outcomes, we have:

    Pr(A(X) ∈ O) ≤ e^ε · Pr(A(X̃) ∈ O).    (1)

A useful consequence of this definition is composability.

Lemma 2. Suppose an algorithm A₁ : R^{d×n} → Y₁ is ε₁-differentially private and a second algorithm A₂ : R^{d×n} × Y₁ → Y₂ is ε₂-differentially private. Then the composition A(X) = A₂(X, A₁(X)) is (ε₁ + ε₂)-differentially private.

Our main algorithm uses this property and multiple applications of the following mechanisms.

Laplace Mechanism. Let Lap(α) denote the Laplace distribution with parameter α. Given a query f : R^{d×n} → R^k mapping data matrices to vectors, the ℓ1-sensitivity of the query is given by Δf = max_{X∼X̃} ‖f(X) − f(X̃)‖₁. For a given privacy parameter ε, the Laplace mechanism approximately answers queries by outputting f(X) + (Y₁, ..., Y_k), where each Y_i is independently sampled from the Lap(Δf/ε) distribution.
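As a minimal illustration (the function name is ours), the Laplace mechanism for a vector-valued query can be sketched as:

```python
import numpy as np

def laplace_mechanism(f_X, sensitivity, eps, rng):
    """Release f(X) + per-coordinate Lap(sensitivity/eps) noise.
    `f_X` is the (already computed) true query answer."""
    scale = sensitivity / eps
    return f_X + rng.laplace(loc=0.0, scale=scale, size=np.shape(f_X))
```

In Algorithm 1 below, this is applied to the eigenvalue vector of C, a query whose ℓ1-sensitivity is at most 2, e.g. `laplace_mechanism(np.linalg.eigvalsh(C), 2.0, eps0, rng)`.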
The privacy and utility guarantees of the Laplace mechanism are summarized in the following lemma.

Lemma 3. The Laplace mechanism preserves ε-differential privacy and, for any β > 0, we have Pr(max_i |Y_i| ≥ (Δf/ε) log(k/β)) ≤ β.

Exponential Mechanism. The exponential mechanism can be used to privately select an approximately optimal outcome from an arbitrary domain. Formally, let (Y, μ) be a measure space and g : (X, y) ↦ g(X, y) be the utility of outcome y for data matrix X. The sensitivity of g is given by Δg = max_{X∼X̃, y} |g(X, y) − g(X̃, y)|. For ε > 0, the exponential mechanism samples y from the density proportional to f_exp(y) = exp(ε g(X, y)/(2Δg)), defined with respect to the base measure μ.

Lemma 4 (McSherry and Talwar [2007]). The exponential mechanism preserves ε-differential privacy. Let OPT = max_y g(X, y) and G_τ = {y ∈ Y : g(X, y) ≥ OPT − τ}. If ŷ is the output of the exponential mechanism, we have Pr(ŷ ∉ G_{2τ}) ≤ exp(−ετ/(2Δg)) / μ(G_τ).

In our algorithm, we will apply the exponential mechanism in order to choose unit-length approximate eigenvectors. Therefore, the space of outcomes Y will be the unit sphere S^{d−1} = {θ ∈ R^d : ‖θ‖₂ = 1}. For convenience, we will use the uniform distribution on the sphere, denoted by μ, as our base measure (this is proportional to the surface area). For example, the density p(θ) = 1 is the uniform distribution on S^{d−1} and μ(S^{d−1}) = 1.

3 Iterative Eigenvalue Sampling for Covariance Estimation

In this section we describe our ε-differentially private covariance estimation mechanism. In fact, our method produces a differentially private approximation to the eigendecomposition of C = XX^T.

¹If the columns of X have ℓ2-norm bounded by a known value B, we can rescale the columns by 1/B to obtain a matrix X′ with column norm at most 1.
Since XX^T = B²X′X′^T, estimating the covariance matrix of X′ gives an estimate of the covariance matrix of X with Frobenius error inflated by a factor B².

We first estimate the vector of eigenvalues, a query that has ℓ1-sensitivity at most 2. Next, we show how to use the exponential mechanism to approximate the top eigenvector of the covariance matrix C. Inductively, after estimating the top k eigenvectors θ̂₁, ..., θ̂_k of C, we project the data onto the (d − k)-dimensional orthogonal subspace and apply the exponential mechanism to approximate the top eigenvector of the remaining projected covariance matrix. Once all eigenvalues and eigenvectors have been estimated, the algorithm returns the reconstructed covariance matrix. Pseudocode for our method is given in Algorithm 1. In Section 3.2 we discuss a rejection-sampling algorithm of Kent et al. [2018] that can be used for sampling the distribution defined in step (a) of Algorithm 1. It is worth mentioning that if we only sample k eigenvectors, Algorithm 1 would return a rank-k approximation of the matrix C.

Algorithm 1 Iterative Eigenvector Sampling
Input: C = XX^T ∈ R^{d×d}, privacy parameters ε₀, ..., ε_d.
1. Initialize C₁ = C, P₁ = I ∈ R^{d×d}, and λ̂_i = λ_i(C) + Lap(2/ε₀) for i = 1, ..., d.
2. For i = 1, ..., d:
   (a) Sample û_i ∈ S^{d−i} proportional to f_{C_i}(u) = exp((ε_i/4) u^T C_i u) and let θ̂_i = P_i^T û_i.
   (b) Find an orthonormal basis P_{i+1} ∈ R^{(d−i)×d} orthogonal to θ̂₁, ..., θ̂_i.
   (c) Let C_{i+1} = P_{i+1} C P_{i+1}^T ∈ R^{(d−i)×(d−i)}.
3.
Output Ĉ = Σ_{i=1}^d λ̂_i θ̂_i θ̂_i^T.

Our approach is similar to the algorithm of Kapralov and Talwar [2013] with one significant difference: in their algorithm, rather than projecting onto the orthogonal subspace of the first k estimated eigenvectors, they subtract the rank-one matrix given by λ̂_i θ̂_i θ̂_i^T from C, where λ̂_i is the estimate of the ith eigenvalue. There are several advantages to using projections. First, the projection step exactly eliminates the variance along the direction θ̂_i, while the rank-one subtraction will fail to do so if the estimated eigenvalues are incorrect (effectively causing us to pay for the eigenvalue approximation twice: once in the reconstruction of the covariance matrix and once because it prevents us from removing the variance along the direction θ̂_i before estimating the remaining eigenvectors). Second, the analysis of the algorithm is substantially simplified because we are guaranteed that the estimated eigenvectors θ̂₁, ..., θ̂_d are orthogonal, and we do not require bounds for rank-one updates on the spectrum of a matrix.

We now show that Algorithm 1 is differentially private. The algorithm applies the Laplace mechanism once and the exponential mechanism d times, so the result follows from bounding the sensitivity of the relevant queries and applying basic composition.

Theorem 1. Algorithm 1 preserves (Σ_{i=0}^d ε_i)-differential privacy.

We now focus on the main contribution of this paper: a utility guarantee for Algorithm 1 in terms of the Frobenius distance between Ĉ and the true covariance matrix C as a function of the privacy parameters used for each step.
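A compact sketch of Algorithm 1 with the budget split evenly (ε/2 for the eigenvalues and ε/(2d) per eigenvector). Purely as a runnable stand-in, step (a) here uses a short Metropolis random walk on the sphere; the paper's exact rejection sampler (Section 3.2) should be used instead, and the MCMC stand-in does not carry the formal privacy guarantee:

```python
import numpy as np
from scipy.linalg import null_space

def sample_direction(Ci, eps_i, rng, steps=2000):
    """Stand-in Metropolis sampler for f(u) ∝ exp((eps_i/4) u^T Ci u)
    on the unit sphere; the paper uses exact rejection sampling."""
    m = Ci.shape[0]
    u = rng.normal(size=m)
    u /= np.linalg.norm(u)
    for _ in range(steps):
        v = u + 0.3 * rng.normal(size=m)
        v /= np.linalg.norm(v)
        if np.log(rng.uniform()) < (eps_i / 4) * (v @ Ci @ v - u @ Ci @ u):
            u = v
    return u

def private_covariance(C, eps, rng):
    """Sketch of Algorithm 1: noisy eigenvalues, then iteratively
    sampled eigenvectors on the shrinking orthogonal subspace."""
    d = C.shape[0]
    eps0, eps_i = eps / 2, eps / (2 * d)
    lam_hat = np.sort(np.linalg.eigvalsh(C))[::-1] \
        + rng.laplace(scale=2 / eps0, size=d)
    P = np.eye(d)                      # P_1: rows span current subspace
    thetas = []
    for i in range(d):
        Ci = P @ C @ P.T               # projected covariance C_i
        u = sample_direction(Ci, eps_i, rng)
        thetas.append(P.T @ u)         # lift back to R^d
        if i < d - 1:                  # basis orthogonal to all theta_hat
            P = null_space(np.array(thetas)).T
    Theta = np.column_stack(thetas)
    return Theta @ np.diag(lam_hat) @ Theta.T
```

The projection step guarantees the sampled directions are exactly orthonormal, matching the structure the analysis below relies on.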
An important consequence of this analysis is that we can optimize the allocation of our total privacy budget ε among the d + 1 queries in order to get the best bound.

First we provide a utility guarantee for the exponential mechanism applied to approximating the top eigenvector of a matrix C. This result is similar to the rank-one approximation guarantee given by Kapralov and Talwar [2013], but we include a proof in the appendix for completeness.

Lemma 5. Let X ∈ R^{d×n} be a data matrix and C = XX^T. For any β > 0, with probability at least 1 − β over û sampled from the density proportional to f_C(u) = exp((ε/4) u^T C u) on S^{d−1}, we have

    û^T C û ≥ λ₁(C) − O((1/ε)(d log λ₁(C) + log(1/β))).

The following result characterizes the dependence of the Frobenius error on the errors in the estimated eigenvalues and eigenvectors. In particular, given that the eigenvalue estimates all have bounded error, the dependence on the ith eigenvector estimate θ̂_i is only through the quantity λ_i(C) − θ̂_i^T C θ̂_i, which measures how much less variance of C is captured by θ̂_i as compared to the true ith eigenvector. Moreover, the contribution of θ̂_i is roughly weighted by λ̂_i. This observation allows us to tune the privacy budgeting across the d eigenvector queries, allocating more budget (at runtime) to the eigenvectors with large estimated eigenvalues. Empirically, we find that this budget allocation step improves performance in some settings.

Lemma 6. Let C ∈ R^{d×d} be any positive semidefinite matrix. Let θ̂₁, ..., θ̂_d be any orthonormal vectors and λ̂₁, ..., λ̂_d be estimates of the eigenvalues of C satisfying |λ̂_i − λ_i(C)| ≤ τ for all i ∈ [d].
Then

    ‖C − Θ̂Λ̂Θ̂^T‖_F ≤ sqrt(2 Σ_{i=1}^d λ_i(C) · (λ_i(C) − θ̂_i^T C θ̂_i)) + τ√d,

where Θ̂ is the matrix with columns θ̂_i and Λ̂ is the diagonal matrix with entries λ̂_i.

Proof. Let Λ ∈ R^{d×d} be the diagonal matrix of true eigenvalues of C. We have ‖C − Θ̂Λ̂Θ̂^T‖_F ≤ ‖C − Θ̂ΛΘ̂^T‖_F + ‖Θ̂(Λ − Λ̂)Θ̂^T‖_F. The second term is bounded by τ√d, so it remains to bound the first term. We have that

    ‖C − Θ̂ΛΘ̂^T‖²_F = ‖C‖²_F + ‖Θ̂ΛΘ̂^T‖²_F − 2 tr(C Θ̂ΛΘ̂^T)
                     = 2 Σ_i λ_i(C)² − 2 Σ_i λ_i(C) θ̂_i^T C θ̂_i
                     = 2 Σ_i λ_i(C)(λ_i(C) − θ̂_i^T C θ̂_i),

where the second equation follows from the fact that the first two terms are both equal to Σ_i λ_i(C)² and the cyclic property of the trace. The final bound follows by taking the square root.

We are now ready to prove our main utility guarantee for Algorithm 1. The remaining analysis focuses on the effect of working with the projected covariance matrices C_i. One interesting observation is that our algorithm does not have error accumulating across its iterations due to the projection step. Following Lemma 6, we only need to show that θ̂_i captures nearly as much of the variance of C as the ith eigenvector. Fortunately, if our estimates θ̂₁, ..., θ̂_{i−1} have errors, then the orthogonal subspace only contains more variance, and thus the sampling step in round i actually becomes easier. In this sense Algorithm 1 is "self-correcting".

Theorem 2. Let Ĉ be the output of Algorithm 1 run with inputs C and privacy parameters ε₀, ..., ε_d.
For any β > 0, with probability at least 1 − β we have

    ‖C − Ĉ‖_F ≤ Õ(sqrt(Σ_{i=1}^d d λ_i(C)/ε_i) + √d/ε₀),

where the Õ notation suppresses logarithmic terms in d, λ₁(C), and β.

If Algorithm 1 is used to obtain a rank-k approximation, the above theorem can be modified to show that the distance from the best rank-k approximation would be in O(sqrt(Σ_{i=1}^k d λ_i(C)/ε_i) + √d/ε₀). Since Theorem 2 bounds the error in terms of the privacy parameters ε₀, ..., ε_d, we can tune our allocation of the total privacy budget of ε across the d + 1 private operations in order to obtain the tightest possible bound. In order to preserve privacy, we tune based on the estimated eigenvalues λ̂₁, ..., λ̂_d obtained in step (1) of Algorithm 1 rather than using the true eigenvalues. The following result makes precise the natural intuition that more effort should be made to estimate those eigenvectors with larger (estimated) eigenvalues; its proof can be found in Appendix B.

Corollary 1. Fix any privacy parameter ε and any failure probability β > 0, let ε₀ = ε/2, and let ε_i = (ε/2) · √(λ̂_i + τ) / Σ_j √(λ̂_j + τ), where τ = (2/ε₀) log(2d/β). Then Algorithm 1 run with ε₀, ..., ε_d preserves ε-differential privacy and, with probability at least 1 − β, the output Ĉ satisfies

    ‖C − Ĉ‖_F ≤ Õ(√(d/ε) · Σ_{i=1}^d √(λ̂_i + 1/ε) + √d/ε).

3.1 Comparison of Bounds

In this section we compare the bound provided by Theorem 2 to previous state-of-the-art results.

Comparison to Kapralov and Talwar [2013]. The bounds given by Kapralov and Talwar [2013], when applied to the case of recovering the full-rank covariance matrix, bound the spectral error ‖C − Ĉ‖₂ by ζλ₁(C) (for some ζ > 0) under the condition that λ₁(C) is sufficiently large.
In particular, Theorem 18 from their paper shows that there exists an ε-differentially private algorithm with the above guarantee whenever λ₁(C) ≥ C₁d⁴/(εζ⁶) for some constant C₁. Since ‖C − Ĉ‖₂ ≤ ‖C − Ĉ‖_F, we can directly compare both algorithms after slightly rewriting our bounds. The following result shows that we improve the necessary lower bound on λ₁(C) by a factor of d/ζ⁴ (ignoring log terms).

Corollary 2. For any ζ > 0 and any positive semidefinite matrix C, with probability at least 0.99 (or any fixed success probability), running Algorithm 1 with ε₀ = ε/2 and ε_i = ε/(2d) for i = 1, ..., d preserves ε-differential privacy and outputs Ĉ such that ‖C − Ĉ‖_F ≤ O(ζλ₁(C)) if λ₁(C) ≥ (2d³/(εζ²)) log(d/(εζ)).

Comparison to Gaussian Mechanism. We can also directly compare to the error bounds for the Gaussian mechanism given by Dwork et al. [2014b]. Theorem 9 in their paper gives ‖C − Ĉ‖_F ≤ O(d^{3/2}√(log(1/δ))/ε), where ε and δ are the (approximate) differential privacy parameters. Using privacy parameters ε₀ = ε/2 and ε_i = ε/(2d) for i = 1, ..., d, Theorem 2 implies that with high probability we have ‖C − Ĉ‖_F ≤ O(d^{3/2}√(λ₁(C) log λ₁(C))/ε + √d/ε). For all values of δ > 0, our algorithm provides a stronger privacy guarantee than the Gaussian mechanism. On the other hand, whenever λ₁(C) log λ₁(C) ≤ log(1/δ)/ε, our utility guarantee is tighter. Given that λ₁(C) = O(n), where n is the number of data points, we see that our algorithm admits better utility guarantees in both the low data regime and the high privacy regime.

3.2 Sampling on the Sphere

To implement Algorithm 1, we need a subprocedure for drawing samples from the densities proportional to exp((ε/4) u^T C u) defined on the sphere S^{d−1}, where C is a covariance matrix and ε is the desired privacy parameter.
This density belongs to a family called Bingham distributions. Kapralov and Talwar [2013] also discuss this sampling problem and, while their algorithm could also be used in our setting, we instead rely on a simpler rejection-sampling scheme proposed by Kent et al. [2018]. This sampling technique is exact and we find empirically that it is very efficient. Pseudocode for their method is given in Algorithm 2 in the appendix.

Recall that rejection sampling allows us to generate samples from the distribution with density proportional to f, provided we can sample from the distribution with density proportional to a similar function g, called the envelope. Kent et al. [2018] propose to use the angular central Gaussian distribution as an envelope. This distribution has a matrix parameter Ω and unnormalized density (defined on the sphere S^{d−1}) given by g(u) = (u^T Ω u)^{−d/2}. To sample from this distribution, we can simply sample z from the mean-zero Gaussian distribution with covariance given by Ω^{−1} and output u = z/‖z‖₂. Kent et al. [2018] provide a choice of the parameter Ω to minimize the number of rejected samples. They show that under some reasonable assumptions the expected number of rejections grows like O(√d) (see [Kent et al., 2018] for more details). In our experiments we observed the median number of samples was less than d, and the mean was around 2d. We believe that our empirical rejection counts are larger than the asymptotic bounds of Kent et al. [2018] because the dimensionality of our datasets is not large enough.

4 Experiments

We now present the results of an extensive empirical evaluation of the performance of our algorithm. Given a data matrix X, we study the performance of the algorithm on two tasks: (i) privately estimating the covariance matrix C = XX^T, and (ii) privately regressing to predict one of the columns of X from the others.
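The rejection scheme described in Section 3.2 can be sketched as follows. This is our own illustration, not the paper's Algorithm 2: the parameter b follows the Kent et al. [2018] recommendation (the root of Σ_i 1/(b + 2λ_i) = 1), and the acceptance bound is derived from the envelope Ω = I + (2/b)A by maximizing exp(−t)(1 + 2t/b)^{d/2} over t ≥ 0:

```python
import numpy as np

def sample_bingham(A, rng):
    """Draw one exact sample from f(u) ∝ exp(-u^T A u) on the sphere
    (A symmetric psd) via the angular central Gaussian envelope."""
    d = A.shape[0]
    lam = np.linalg.eigvalsh(A)
    A = A - lam.min() * np.eye(d)   # shift so min eigenvalue is 0;
    lam = lam - lam.min()           # does not change the density
    # Solve sum_i 1/(b + 2*lam_i) = 1 by bisection; a root exists in
    # (0, d] because the smallest shifted eigenvalue is 0.
    lo, hi = 1e-12, float(d)
    for _ in range(100):
        b = 0.5 * (lo + hi)
        lo, hi = (b, hi) if np.sum(1.0 / (b + 2 * lam)) > 1 else (lo, b)
    Omega_inv = np.linalg.inv(np.eye(d) + (2.0 / b) * A)
    log_M = -(d - b) / 2 + (d / 2) * np.log(d / b)  # sup of f/g
    while True:
        z = rng.multivariate_normal(np.zeros(d), Omega_inv)
        u = z / np.linalg.norm(z)
        t = u @ A @ u
        if np.log(rng.uniform()) < -t + (d / 2) * np.log1p(2 * t / b) - log_M:
            return u

def sample_eigvec_direction(C, eps, rng):
    """Sample u ∝ exp((eps/4) u^T C u): on the sphere this equals the
    Bingham density with A = (eps/4)(lam_max(C) I - C)."""
    lam_max = np.linalg.eigvalsh(C).max()
    return sample_bingham((eps / 4) * (lam_max * np.eye(C.shape[0]) - C), rng)
```

The wrapper uses the fact that adding a multiple of the identity to the exponent only changes the normalizing constant on the unit sphere, so the maximization form used in step (a) of Algorithm 1 and the standard Bingham form coincide.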
Due to space constraints we present only the results of (i) and present the results of (ii) in Appendix C.

Figure 1: Results comparing our algorithm across the wine, airfoil and adult data sets. (a) Comparison to KT and L. Error is normalized Frobenius distance. (b) Comparison to the Gaussian mechanism. The legend G-x corresponds to a value of δ = 10^{−x}.

We compare the performance of our algorithm to a number of different baselines. We begin with two general purpose output perturbation methods: the Laplace mechanism and the Gaussian mechanism.

• The Laplace mechanism [Dwork et al., 2006] (L). The output is given by Ĉ = C + M, where M is a matrix with entries distributed Lap(2d/ε).
• The Gaussian mechanism [Dwork et al., 2014b] (G). Notably, the Gaussian mechanism achieves (ε, δ)-differential privacy, hence its privacy guarantees are weaker for the same value of ε. Our goal is to measure if we can achieve similar utility under stricter privacy constraints. We experiment with different values of δ.
• The algorithm proposed by Kapralov and Talwar [2013] (KT). This algorithm is ε-differentially private. We use Algorithm 2 for the vector sampling subroutine.
• Algorithm 1 with adaptive privacy splitting (AD). We allocate the privacy budget in the manner suggested by Corollary 1.
• Algorithm 1 with uniform privacy splitting (IT-U).
Same as above except the privacy bud-\n\nmanner suggested by Corollary 1.\n\nget used to sample eigenvectors is split uniformly.\n\nn\n\nOne \ufb01nal modi\ufb01cation we apply to all algorithms that release a covariance matrix is to round the\neigenvalues of the private matrix to fall in the interval [0, n], since this bound is data-independent\nand is easy to derive analytically.\nWe measure the performance of our algorithm on three different datasets: Wine, Adult, and Airfoil\nfrom the UCI repository2, These datasets have dimensions ranging from 13 to 108, and number\nof points from 200 to 49,000. The approximation error of each algorithm is measured using the\nnormalized Frobenius distance k \u02c6CCkF\n. To investigate the privacy/utility trade-off, we run each al-\ngorithm with privacy parameter \u270f 2{ 0.01, 0.1, 0.2, 0.5, 1.0, 2.0, 4.0}. For the Gaussian mechanism,\nwe also varied the parameter 2{ 1e16, 1e10, 1e3} We ran each experiment 50 times, showing\nthe average error in Figure 1.\nThe \ufb01rst thing to notice is that our algorithm consistently outperforms all others except for the single\ncase of the wine data set with \u270f = 0.01. Recall that the Gaussian mechanism has an additional\nfailure probability , thus the privacy guarantees we obtain are strictly better for the same value of \u270f.\nTherefore, it is particularly striking that we consistently beat the Gaussian mechanism even for the\nvery relaxed value of = .001.\nAnother important observation from this experiment is that the adaptive and non adaptive privacy\nbudget splitting seems to not have a big effect on the performance of the algorithm. Finally, we see\nthat the performance gap between AD and KT is largest on the dataset with the highest dimension.\nThis phenomenon is in line with the analysis of Section 3.1. We explore this effect in more detail in\nAppendix C.\nFinally, as we detail in Appendix C our approach outperforms the output perturbation method of\nChaudhuri et al. 
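The Laplace-mechanism baseline (L) and the eigenvalue rounding applied to all covariance-releasing algorithms can be sketched as below. This is a minimal illustration, not the paper's reference implementation; it assumes each row of X has L2 norm at most 1 (so that the Lap(2d/ε) scale is appropriate), and the noise matrix is symmetrized here only so that the eigendecomposition is well defined, a convention the text does not fix.

```python
import numpy as np

def laplace_covariance(X, epsilon, rng):
    """Baseline (L): release C_hat = C + M, where C = X^T X (the sum of
    outer products of the data points) and M has entries Lap(2d/epsilon).
    Assumes each row of X has L2 norm at most 1."""
    n, d = X.shape
    C = X.T @ X
    M = rng.laplace(scale=2.0 * d / epsilon, size=(d, d))
    M = (M + M.T) / 2.0  # symmetrize so eigendecomposition is well defined
    return C + M

def round_eigenvalues(C_hat, n):
    """Round the eigenvalues of the private matrix into [0, n], the
    data-independent bound used for all covariance-releasing algorithms."""
    vals, vecs = np.linalg.eigh(C_hat)
    vals = np.clip(vals, 0.0, float(n))
    return vecs @ np.diag(vals) @ vecs.T

rng = np.random.default_rng(0)
n, d = 200, 13  # e.g. roughly the shape of the Wine dataset
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # row norms <= 1
C_hat = round_eigenvalues(laplace_covariance(X, epsilon=1.0, rng=rng), n)
err = np.linalg.norm(C_hat - X.T @ X) / n  # normalized Frobenius distance
```

The rounding step never hurts: the true eigenvalues of C already lie in [0, n], so projecting the noisy eigenvalues back into that interval can only reduce error.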
[2011] on the regression task, even though the latter achieves (ε, δ)-differential privacy. As we mentioned previously, the private covariance matrix output by our algorithm can also be used to tune regularization parameters without affecting the privacy budget, thus giving additional freedom to practitioners in tuning their algorithms.

²https://archive.ics.uci.edu/ml/datasets/

5 Conclusion

We presented a new algorithm for differentially private covariance estimation, studied it analytically, and demonstrated its performance on a number of synthetic and real-world datasets. To the best of our knowledge this is the first ε-differentially private algorithm to admit a utility guarantee that grows as O(d^{3/2}) with the dimension of the dataset. Previously, such bounds could only be achieved at the cost of (ε, δ)-differential privacy. We also showed that the average Frobenius approximation error of our algorithm decreases as O(1/√n), which is slower than the O(1/n) rate of the Gaussian and Laplace mechanisms. This poses an open question of whether the suboptimal dependency on n is necessary in order to achieve pure differential privacy or to achieve a dependency on the dimension of O(d^{3/2}).

Looking more broadly, practical machine learning and data analysis typically require a significant amount of tuning: feature selection, hyperparameter selection, experimenting with regularization, and so on.
If this tuning is performed using the underlying private dataset, then in principle all of these count against the privacy budget of the algorithm designer (who must also, of course, have access to that private dataset). By producing a differentially private summary of the dataset from which multiple models can be trained with no additional privacy cost, our approach allows a practitioner to operate freely, without worrying about privacy budgets or the secure handling of private data. We believe that finding techniques for computing private representations in other settings is an exciting direction for future research.

References

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.

Keith Ball. An elementary introduction to modern convex geometry. Flavors of Geometry, 31:1–58, 1997.

K. Chaudhuri, A. Sarwate, and K. Sinha. Near-optimal algorithms for differentially-private principal component analysis. In NIPS, 2012.

Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Third Theory of Cryptography Conference, pages 265–284, 2006.

Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014a.

Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 11–20.
ACM, 2014b.

Hafiz Imtiaz and Anand D. Sarwate. Symmetric matrix perturbation for differentially-private principal component analysis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20–25, 2016, pages 2339–2343, 2016.

Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Conference on Learning Theory, pages 24.1–24.34, 2012.

Wuxuan Jiang, Cong Xie, and Zhihua Zhang. Wishart mechanism for differentially private principal components analysis. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12–17, 2016, Phoenix, Arizona, USA, pages 1730–1736, 2016.

Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In Proceedings of SODA, pages 1395–1414, 2013.

John T. Kent, Asaad M. Ganeiber, and Kanti V. Mardia. A new unified approach for the simulation of a wide class of directional distributions. Journal of Computational and Graphical Statistics, 27(2):291–301, 2018.

Daniel Kifer, Adam D. Smith, and Abhradeep Thakurta. Private convex optimization for empirical risk minimization with applications to high-dimensional regression. In Proceedings of COLT, pages 25.1–25.40, 2012.

F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.

Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building privacy into the Netflix prize contenders. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 627–636. ACM, 2009.

Jalaj Upadhyay. The price of privacy for low-rank factorization. In Proceedings of NeurIPS, pages 4180–4191, 2018.

Di Wang and Jinhui Xu. Differentially private high dimensional sparse covariance matrix estimation. CoRR, abs/1901.06413, 2019.
URL http://arxiv.org/abs/1901.06413.

Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of UAI, 2018.