{"title": "The Price of Privacy for Low-rank Factorization", "book": "Advances in Neural Information Processing Systems", "page_first": 4176, "page_last": 4187, "abstract": "In this paper, we study the price one has to pay to release a \\emph{differentially private low-rank factorization} of a matrix. We consider various settings that are close to real-world applications of low-rank factorization: (i) the manner in which matrices are updated (row by row or in an arbitrary manner), (ii) whether matrices are distributed or not, and (iii) how the output is produced (once at the end of all updates, also known as \\emph{one-shot algorithms}, or continually). Even though these settings are well studied without privacy, surprisingly, there are no private algorithms for these settings (except when a matrix is updated row by row). We present the first set of differentially private algorithms for all these settings. \n\nWhen the private matrix is updated in an arbitrary manner, our algorithms promise differential privacy with respect to two stronger privacy guarantees than previously studied, use space and time \\emph{comparable} to the non-private algorithms, and achieve \\emph{optimal accuracy}. To complement our positive results, we also prove that the space required by our algorithms is optimal up to logarithmic factors. When data matrices are distributed over multiple servers, we give a non-interactive differentially private algorithm with communication cost independent of the dimension. In short, we give algorithms that incur {\\em optimal cost across all parameters of interest}. 
We also perform experiments to verify that all our algorithms perform well in practice and outperform the previously best known algorithm for a large range of parameters.", "full_text": "The Price of Privacy for Low-rank Factorization\n\nJalaj Upadhyay\n\nJohns Hopkins University\n\nBaltimore, MD - 21201, USA.\n\njalaj@jhu.edu\n\nAbstract\n\nIn this paper, we study the price one has to pay to release a differentially private low-rank factorization of a matrix. We consider various settings that are close to real-world applications of low-rank factorization: (i) the manner in which matrices are updated (row by row or in an arbitrary manner), (ii) whether matrices are distributed or not, and (iii) how the output is produced (once at the end of all updates, also known as one-shot algorithms, or continually). Even though these settings are well studied without privacy, surprisingly, there are no private algorithms for these settings (except when a matrix is updated row by row). We present the first set of differentially private algorithms for all these settings.\nWhen the private matrix is updated in an arbitrary manner, our algorithms promise differential privacy with respect to two stronger privacy guarantees than previously studied, use space and time comparable to the non-private algorithms, and achieve optimal accuracy. To complement our positive results, we also prove that the space required by our algorithms is optimal up to logarithmic factors. When data matrices are distributed over multiple servers, we give a non-interactive differentially private algorithm with communication cost independent of the dimension. In short, we give algorithms that incur optimal cost across all parameters of interest. 
We also perform experiments to verify that all our algorithms perform well in practice and outperform the previously best known algorithm for a large range of parameters.\n\n1 Introduction\n\nLow-rank factorization (LRF) of matrices is a fundamental component used in many applications, such as clustering [15, 19, 43], data mining [5], recommendation systems [20], information retrieval [49, 53], learning distributions [2, 34], and web search [1, 36]. In these applications, given an m × n matrix A, a common approach is to first compute three matrices: a diagonal positive semidefinite matrix Σ̃_k ∈ R^{k×k} and two matrices, Ũ_k ∈ R^{m×k} and Ṽ_k ∈ R^{n×k}, with orthonormal columns. The requirement then is that the product B := ŨΣ̃Ṽ^T is as close to A as possible. More formally,\n\nProblem 1 ((α, β, γ, k)-LRF). Given parameters 0 < α, β < 1, γ, an m × n matrix A, and a target rank k, compute a rank-k factorization Ũ_k, Σ̃_k, and Ṽ_k such that\n\nPr[ ‖A − Ũ_k Σ̃_k Ṽ_k^T‖_F ≤ (1 + α)‖A − [A]_k‖_F + γ ] ≥ 1 − β,\n\nwhere ‖·‖_F denotes the Frobenius norm and [A]_k is the best rank-k approximation of A. We refer to the parameter γ as the additive error and to α as the multiplicative error.\n\nPractical matrices are often large, distributed over many servers, and dynamically updated [41, 42]; hence many works have considered these settings in order to reduce latency, synchronization issues, and resource overhead [6, 9, 16, 13, 14, 17, 27, 40, 44, 48, 52]. Moreover, these applications use confidential datasets, and the use of ad hoc mechanisms can lead to serious privacy leaks [47]. 
Therefore, for any practical deployment [3, 25], one would like to simultaneously maintain a strong privacy guarantee and minimize the space requirements, communication, and computational costs.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nReference | Updates | Comments | Additive Error\nTheorem 1 | Turnstile | One-shot; A − A′ = uv^T | Õ((√(mk)·α^{−1} + √(kn))·ε^{−1})\nTheorem 2 | Turnstile | One-shot; ‖A − A′‖_F = 1 | Õ((√(mk)·α^{−2} + √(kn))·ε^{−1})\nTheorem 4 | Turnstile | Continually; A − A′ = uv^T | Õ((√(mk)·α^{−1} + √(kn))·ε^{−1}·log T)\nTheorem 4 | Turnstile | Continually; ‖A − A′‖_F = 1 | Õ((√(mk)·α^{−1} + √(kn))·ε^{−1}·log T)\nCorollary 1 | Row-wise | One-shot; ‖A − A′‖_F = 1 | Õ((αε)^{−1}·√(nk))\nTheorem 5 | Row-wise (local privacy) | One-shot; ‖A − A′‖_F = 1 | Õ(k·α^{−2}·ε^{−1}·√m)\nTable 1: Our Results for (ε, Θ(n^{−log n}))-Differentially Private Algorithms (T: stream length, m ≥ n).\n\nUnfortunately, existing private algorithms for LRF do not consider these settings (except in the central model when matrices are received row by row [24]). For example, known algorithms either use multiple passes over the data matrix [29, 30, 31, 35] or cannot handle arbitrary updates [24]. Similarly, known algorithms that continually release output are for monotonic functions [23], thereby excluding Problem 1. Private algorithms like [24, 30] that can be extended to the distributed setting use multiple rounds of interaction, have large communication cost, and/or result in trivial error bounds. 
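As a concrete non-private baseline for Problem 1, the guarantee can be checked numerically against the truncated SVD; the following is a minimal sketch (NumPy assumed; the helper names are ours, not from the paper):

```python
import numpy as np

def best_rank_k(A, k):
    """[A]_k: the best rank-k approximation of A, via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

def satisfies_lrf(A, Uk, Sk, Vk, k, alpha, gamma):
    """Check the Problem 1 error event:
    ||A - Uk Sk Vk^T||_F <= (1 + alpha) * ||A - [A]_k||_F + gamma."""
    opt = np.linalg.norm(A - best_rank_k(A, k), 'fro')
    err = np.linalg.norm(A - Uk @ Sk @ Vk.T, 'fro')
    return err <= (1 + alpha) * opt + gamma

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
k = 5
# The exact truncated SVD is an (alpha = 0, gamma = 0) solution.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
assert satisfies_lrf(A, Uk, Sk, Vk, k, alpha=0.0, gamma=1e-9)
```

A private algorithm must achieve this error event with probability at least 1 − β while also randomizing enough to hide neighboring inputs.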
Moreover, known private algorithms are inefficient compared to non-private algorithms: O(mnk) time and O(mn) space, compared to time linear in the sparsity of the matrix and O((m + n)k/α) space [9, 13]. In fact, for rank-k matrices, Musco and Woodruff [45] state that Problem 1 is equivalent to the well-studied matrix completion problem, for which one can have an Õ(n · poly(k))-time non-private algorithm [32]. Under the same assumptions, the private algorithm takes O(mnk) time [30]. This motivates the central thesis of this paper: What is the price of privacy for non-trivial private algorithms?\n\n1.1 Overview of the Results\n\nWe give a unified approach and the first set of algorithms for solving Problem 1 in various settings: (i) when the private matrix is updated row by row or in an arbitrary manner, (ii) when the private matrix is distributed or not, and (iii) when the output is produced once at the end of all updates (one-shot algorithms) or continually. We show that one does not have to pay the price of privacy (more than what is required in terms of additive error and space). On a high level, we show the following:\n1. When a private matrix is streamed, we propose differentially private algorithms with respect to two stronger privacy guarantees than previously studied. We also show that these algorithms can be extended to the continual release model. Our algorithms use basic linear algebra. This makes them easy to code, and therefore, to optimize.\n2. We complement our positive results with a matching lower bound on the space required. Our algorithms are also time efficient and achieve optimal accuracy.\n3. In the distributed setting, we give a non-interactive differentially private algorithm with communication cost independent of the dimension.\nAll our results are summarized in Table 1.\n\n2 Preliminaries\n\nIn this paper, we give algorithms that are private under the notion of differential privacy. 
Differential privacy has emerged as a de facto notion of privacy over the last few years. Formally, it is defined as follows:\nDefinition 1 ((ε, δ)-differential privacy). A randomized algorithm M gives (ε, δ)-differential privacy if, for all neighboring datasets A and A′ and all measurable sets S in the range of M, Pr[M(A) ∈ S] ≤ exp(ε)·Pr[M(A′) ∈ S] + δ, where the probability is over the coin tosses of M.\n\nWe consider two stronger privacy guarantees than previously studied: Priv1 and Priv2. In Priv1, we call two matrices A and A′ neighboring if A − A′ = uv^T for some unit vectors u and v. In Priv2, we consider two matrices A and A′ neighboring if ‖A − A′‖_F ≤ 1.\n\nOur algorithm relies heavily on some results from the theory of random projections.\nDefinition 2. A distribution D_R of t × m matrices satisfies (α, δ)-subspace embedding for generalized regression if it has the following property: for any matrices P ∈ R^{m×n} and Q ∈ R^{m×n′} such that rank(P) ≤ r, with probability 1 − δ over Φ ∼ D_R, if X̃ = argmin_X ‖Φ(PX − Q)‖_F and X̂ = argmin_{X ∈ R^{n×n′}} ‖PX − Q‖_F, then ‖PX̃ − Q‖_F ≤ (1 + α)‖PX̂ − Q‖_F.\n\nDefinition 3. 
A distribution D_A over v × m matrices satisfies (α, δ)-affine subspace embedding if it has the following property: for any matrices D ∈ R^{m×n} and E ∈ R^{m×n′} such that rank(D) ≤ r, with probability 1 − δ over S ∼ D_A, simultaneously for all X ∈ R^{n×n′}, ‖S(DX − E)‖_F^2 = (1 ± α)‖DX − E‖_F^2.\nAn example distribution D_R with t = O(α^{−2} log(1/δ)) is the distribution of random matrices whose entries are sampled i.i.d. from N(0, 1/t).\n\n3 A Meta Low-Space Differentially Private Algorithm\n\nOur aim in this section is to present a unified algorithmic approach in the form of a meta algorithm (see Algorithm 1). This serves the purpose of illustrating the key ideas. For example, since one of our goals is private algorithms under the turnstile model, we are restricted to only use linear sketches [38]; what we show is that advanced yet inexpensive post-processing combined with careful analysis can lead to a small-error differentially private LRF.\nOur Techniques. Our algorithm is based on two observations: (i) there is a way to maintain differentially private sketches of A (henceforth, we call such sketches noisy sketches) that incurs sub-optimal accuracy in terms of additive error, and (ii) one can apply post-processing to these noisy sketches to obtain optimal additive error.\nTo illustrate point (i) and why we need post-processing, consider the following vanilla algorithm for approximating the right singular vectors: compute B = ΦA + N1, where Φ satisfies a certain embedding property (for example, [44, Theorem 1]) and N1 ∼ N(0, ρ1^2)^{Õ(n^2)×n} for ρ1 as defined in Figure 1. The output is [B]_k, the best rank-k approximation of B. This already gives a good approximation. 
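The Gaussian example distribution above can be checked empirically against the property of Definition 3: a sketch with enough rows preserves ‖DX − E‖_F up to a 1 ± α factor. A minimal sketch (NumPy assumed; the dimensions are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, v = 2000, 5, 400
D = rng.standard_normal((m, n))
E = rng.standard_normal((m, 3))
X = rng.standard_normal((n, 3))

# Gaussian sketch S ~ N(0, 1/v)^{v x m}; by concentration (in the spirit of
# Definition 3), ||S(DX - E)||_F should be close to ||DX - E||_F.
S = rng.standard_normal((v, m)) / np.sqrt(v)
exact = np.linalg.norm(D @ X - E, 'fro')
sketched = np.linalg.norm(S @ (D @ X - E), 'fro')
ratio = sketched / exact  # concentrates around 1 as v grows
assert 0.8 < ratio < 1.2  # holds with high probability at these sizes
```

The point of the definition is that this closeness holds simultaneously for all X, which is what lets the algorithm solve regression problems on the sketch instead of on the full matrix.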
Let m ≫ n^2 and let [Ũ]_k[Σ̃]_k[Ṽ]_k^T be the singular value decomposition of [B]_k. Then, by the embedding property of Φ [44, Theorem 1],\n\n‖A − A[Ṽ]_k[Ṽ]_k^T‖_F ≤ ‖(A + Φ†N1) − (A + Φ†N1)[Ṽ]_k[Ṽ]_k^T‖_F + ‖Φ†N1‖_F + ‖Φ†N1[Ṽ]_k[Ṽ]_k^T‖_F\n ≤ (1 − α)^{−1}‖B(I − [Ṽ]_k[Ṽ]_k^T)‖_F + O(‖Φ†N1‖_F + ‖Φ†N1[Ṽ]_k[Ṽ]_k^T‖_F)\n ≤ (1 − α)^{−1}‖B(I − [V]_k[V]_k^T)‖_F + O(‖Φ†N1‖_F + ‖Φ†N1[Ṽ]_k[Ṽ]_k^T‖_F)\n ≤ (1 + α)(1 − α)^{−1}‖A − [A]_k‖_F + O(‖Φ†N1‖_F + ‖Φ†N1[Ṽ]_k[Ṽ]_k^T‖_F).\n\nThe term in O(·) can be bounded using the embedding property of Φ, but this incurs a large error. The question is whether we can further improve it to get optimal additive error. We show that it is possible using careful post-processing (point (ii) above). That is, we can extract the top-k singular components of the input matrix A from sketches that are appropriately perturbed to preserve differential privacy.\nThe underlying idea is as follows: suppose we know the singular value decomposition of [A]_k := [U]_k[Σ]_k[V]_k^T. Then, for finding a matrix B such that B ≈ A, it suffices to compute Ũ that approximates [U]_k, Σ̃ that approximates [Σ]_k, and Ṽ that approximates [V]_k, and set B := ŨΣ̃Ṽ^T. However, this oversimplified overview does not guarantee privacy. 
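The vanilla computation above (a noisy sketch B = ΦA + N1 followed by rank-k truncation) can be simulated directly; a minimal sketch (NumPy assumed; the noise scale is illustrative and not calibrated to (ε, δ)):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, t = 500, 60, 5, 40
# Low-rank signal plus small noise, so that [A]_k is meaningful.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 0.05 * rng.standard_normal((m, n))

Phi = rng.standard_normal((t, m)) / np.sqrt(t)  # embedding sketch
N1 = 0.1 * rng.standard_normal((t, n))          # privacy noise (illustrative)
B = Phi @ A + N1
Vk = np.linalg.svd(B, full_matrices=False)[2][:k].T  # top-k right sing. vecs of B

sv = np.linalg.svd(A, compute_uv=False)
opt = np.sqrt((sv[k:] ** 2).sum())              # ||A - [A]_k||_F
err = np.linalg.norm(A - A @ Vk @ Vk.T, 'fro')
# Projecting A onto the noisy sketch's subspace is near-optimal on this input,
# but the additive gap err - opt is exactly what post-processing must control.
assert opt - 1e-6 <= err <= 2 * opt + 10.0
```

On well-conditioned inputs the gap is small; the analysis in the paper is about bounding it in the worst case without paying the crude O(·) term above.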
In the rest of this exposition, we give a brief overview of how we turn this simplistic overview into a private algorithm.\nChallenges in computing differentially private low-rank factorization. The two traditional methods to preserve privacy, input perturbation and output perturbation, do not provide both privacy and small additive error. For example, if we use output perturbation to compute the sketches, Yc = AΦ + N and Yr = ΨA + N′ for appropriate sketching matrices Φ and Ψ and noise matrices N and N′, and use known random projection results, then we get an additive error term that can be arbitrarily large (more specifically, it depends on the Frobenius norm of A and has the form ‖N·L_A·N′‖_F for some linear function L_A of A). More precisely, we can show that min_{r(X)≤k} ‖Yc X Yr − A‖_F ≤ ‖A − [A]_k‖_F + ‖N·L_A·N′‖_F + ‖N·L_A·ΨA‖_F + ‖AΦ·L_A·N′‖_F.\n\nAlgorithm 1 PRIVATE-OPTIMAL-LRF(A; (ε, δ); α; k)\n1: Set η = max{k, α^{−1}}, t = O(η·α^{−1}·log(k/δ)), v = O(η·α^{−2}·log(k/δ)), and σ_min = 16·log(1/δ)·√(t(1 + α)(1 − α)^{−1}·ln(1/δ))/ε, ρ1 = √((1 + α)·ln(1/δ))/ε, ρ2 = √(1 + α)·ρ1. Sample Φ ∼ N(0, 1/t)^{(m+n)×t}, Ψ ∼ N(0, 1/t)^{t×m}, S ∼ N(0, 1/v)^{v×m}, T ∼ N(0, 1/v)^{v×(m+n)}, with every entry sampled i.i.d. Sample N1 ∼ N(0, ρ1^2)^{t×(m+n)} and N2 ∼ N(0, ρ2^2)^{v×v}. Keep N1, N2, Φ private.\n2: Set Â = (A  σ_min·I_m) by padding σ_min·I_m to the columns of A, where I_m denotes an m × m identity matrix. Compute Yc = ÂΦ, Yr = ΨÂ + N1, and Z = SÂT^T + N2.\n3: Compute U ∈ R^{m×t}, whose columns are an orthonormal basis for the column space of Yc, and a matrix V ∈ R^{t×(m+n)}, whose rows are an orthonormal basis for the row space of Yr.\n4: Compute an SVD of SU =: Ũ_s Σ̃_s Ṽ_s^T ∈ R^{v×t} and an SVD of VT^T =: Ũ_t Σ̃_t Ṽ_t^T ∈ R^{t×v}.\n5: Compute an SVD of Ṽ_s Σ̃_s^† [Ũ_s^T Z Ṽ_t]_k Σ̃_t^† Ũ_t^T. Let it be U′Σ′V′^T.\n6: Output Ũ = UU′, the diagonal matrix Σ̃ = Σ′, and Ṽ = V^T V′.\n\nWhile min_{r(X)≤k} ‖Yc X Yr − A‖_F can be lower bounded using the techniques we use in this paper, the additive term ‖N·L_A·N′‖_F can have large Frobenius norm.\nOn the other hand, input perturbation of A followed by multiplication by Gaussian matrices Ω1 and Ω2, as in [8, 57, 58], can leak private data for a subtle reason. Every row of Ω1A (and every column of AΩ2) has a multivariate Gaussian distribution if the determinant of A^TA (respectively, AA^T) is non-zero. If m < n, one can prove that computing AΩ1 preserves privacy; but since A is not a full-column-rank matrix, the multivariate Gaussian distribution is not defined. The trick of considering the subspace orthogonal to the kernel space of A [8] does not work because the spans of A and A′ may not coincide for neighboring matrices A and A′. If the spans do not coincide, then one can easily differentiate the two cases with high probability, violating differential privacy. 
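The pipeline of Algorithm 1 can be sketched end to end as follows; this is a simplified illustration (NumPy assumed), with small fixed sketch sizes and without the calibration of σ_min, ρ1, ρ2 to (ε, δ) from step 1:

```python
import numpy as np

def private_lrf(A, k, t, v, sigma_min, rho1, rho2, rng):
    """Sketch of PRIVATE-OPTIMAL-LRF (Algorithm 1); constants simplified."""
    m, n = A.shape
    Ahat = np.hstack([A, sigma_min * np.eye(m)])             # step 2: pad A
    Phi = rng.standard_normal((m + n, t)) / np.sqrt(t)       # kept private
    Psi = rng.standard_normal((t, m)) / np.sqrt(t)
    S = rng.standard_normal((v, m)) / np.sqrt(v)
    T = rng.standard_normal((v, m + n)) / np.sqrt(v)
    Yc = Ahat @ Phi                                          # input perturbation via padding
    Yr = Psi @ Ahat + rho1 * rng.standard_normal((t, m + n)) # output perturbation
    Z = S @ Ahat @ T.T + rho2 * rng.standard_normal((v, v))
    U = np.linalg.qr(Yc)[0]                 # step 3: column basis of Yc, m x t
    V = np.linalg.qr(Yr.T)[0].T             # rows = row basis of Yr, t x (m+n)
    Us, ss, Vst = np.linalg.svd(S @ U, full_matrices=False)      # step 4
    Ut, st, Vtt = np.linalg.svd(V @ T.T, full_matrices=False)
    core = Us.T @ Z @ Vtt.T
    Uc, sc, Vct = np.linalg.svd(core, full_matrices=False)
    core_k = Uc[:, :k] * sc[:k] @ Vct[:k, :]                     # [.]_k
    M = Vst.T @ np.diag(1 / ss) @ core_k @ np.diag(1 / st) @ Ut.T  # step 5
    Up, sp, Vpt = np.linalg.svd(M, full_matrices=False)
    return U @ Up[:, :k], np.diag(sp[:k]), (V.T @ Vpt.T)[:, :k]  # step 6

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 30))
k, t, v = 5, 30, 100
# Noiseless run (sigma_min = rho1 = rho2 = 0) as a pipeline sanity check.
Uk, Sk, Vk = private_lrf(A, k, t, v, 0.0, 0.0, 0.0, rng)
Ahat = np.hstack([A, np.zeros((40, 40))])
err = np.linalg.norm(Ahat - Uk @ Sk @ Vk.T, 'fro')
sv = np.linalg.svd(A, compute_uv=False)
opt = np.sqrt((sv[k:] ** 2).sum())
assert err <= 3 * opt
```

With the noise parameters set to zero the pipeline reduces to a non-private sketched LRF, which conveniently checks that the post-processing in steps 4-6 recovers a near-optimal rank-k factorization; the privacy argument of the paper concerns the calibrated noisy version.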
In fact, until this work, it was not even clear whether using input perturbation yields a low-rank approximation (see the comment after Theorem IV.2 and the discussion in Section V in Blocki et al. [8])!\nOur Algorithm. We use input perturbation, with a careful choice of parameters, on one of the sketches and output perturbation on the other two sketches, and we show that this incurs optimal additive error and preserves privacy. The intuitive reason why this incurs small additive error is that only one of the sketches, Yr or Yc, undergoes output perturbation, so there is no term like ‖N·L_A·N′‖_F as above. This allows us to show that Yc and Yr (or equivalently, their orthonormal bases U and V as formed in Algorithm 1) approximate the spans of [U]_k and [V]_k up to a small additive error.\nOnce we have extracted “good enough” U and V, our problem reduces to computing argmin_{rk(X)≤k} ‖A − UXV‖_F. This would require storing the whole matrix A, something that we wish to avoid. To avoid storing the whole A, we use the fact that S and T are sampled from a distribution of random matrices with the property that, for all appropriate X, ‖A − UXV‖_F ≈ ‖S(A − UXV)T^T‖_F. In other words, without privacy, argmin_{rk(X)≤k} ‖S(A − UXV)T^T‖_F can be used to get a “good” approximation of [Σ]_k. The exact method to perform and analyze the approximation of [Σ]_k is slightly more involved because we only have access to the noisy version of SAT^T, i.e., Z (in fact, this is one of the places where we need careful post-processing to output an approximation to Σ_k up to a rotation and a small additive error). Finally, we arrive at the main result, stated below for the case when m ≤ n (the result when m > n can be derived by just swapping m and n).\nTheorem 1 (Main result). 
Let m, n, k ∈ N and let α, ε, δ be the input parameters (with m ≤ n). Let η and σ_min be as defined in Algorithm 1. Given an m × n matrix A with nn(A) non-zero entries, let (A 0) be the matrix formed by appending an all-zero m × m matrix to A. Then PRIVATE-OPTIMAL-LRF (Algorithm 1) is (3ε, 3δ)-differentially private under Priv1 and outputs a factorization Ũ, Σ̃, Ṽ such that\n1. With probability 9/10 over the random coins of PRIVATE-OPTIMAL-LRF,\n\n‖(A 0) − ŨΣ̃Ṽ^T‖_F ≤ (1 + α)‖A − [A]_k‖_F + O(σ_min·√m + ε^{−1}·√(kn·ln(1/δ))).\n\n2. The space used by PRIVATE-OPTIMAL-LRF is O((m + n)·η·α^{−1}·log(k/δ)).\n\nProof Sketch. The proof of Theorem 1 is presented in the supplementary material. Here, we give a brief sketch of part 1 (for m ≤ n) to illustrate the key points. The intuition that there is no term like ‖N·L_A·N′‖_F does not directly yield optimal additive error. This is because, even if we do not get an additive error term with a large value like ‖N·L_A·N′‖_F, if not analyzed precisely, one can either get a non-analytic expression for the error terms or one that is difficult to analyze. To get analytic expressions for all the error terms that are also easier to analyze, we introduce two carefully chosen optimization problems (equation (3)) so that the intermediate terms in our analysis satisfy certain properties (see the proof sketch below for the exact requirements). Let Â be as defined in Figure 1. 
Part 1 follows from the following chain of inequalities and bounding ‖ÂΦ·L_A·N1‖_F:\n\n‖M_k − (A 0)‖_F ≤ ‖M_k − Â‖_F + O(σ_min·√m)\n ≤ (1 + α)‖Â − [Â]_k‖_F + ‖ÂΦ·L_A·N1‖_F + O(σ_min·√m)   (1)\n ≤ (1 + α)‖A − [A]_k‖_F + ‖ÂΦ·L_A·N1‖_F + O(σ_min·√m),\n\nwhere the matrix L_A satisfies the following properties: (a) ‖ÂΦ·L_A·ΨÂ − Â‖_F ≤ (1 + α)‖Â − [Â]_k‖_F, (b) L_A has rank at most k, and (c) ΨAΦ·L_A is a rank-k projection matrix. We use the subadditivity of the norm to prove the first inequality and Weyl's perturbation theorem [7] to prove the third inequality. Proving the second inequality is the technically involved part. For this, we need to find a candidate L_A. We first assume we have such a candidate L_A with all three properties. Once we have such an L_A, we can prove part (b) as follows:\n\nmin_{rk(X)≤k} ‖UXV − B‖_F ≤ ‖ÂΦ·L_A·ΨÂ − Â‖_F + ‖ÂΦ·L_A·N1‖_F + ‖S†N2(T^T)†‖_F\n ≤ (1 + α)‖Â − [Â]_k‖_F + ‖ÂΦ·L_A·N1‖_F + ‖S†N2(T^T)†‖_F,   (2)\n\nwhere B = A + S†N2(T^T)†. The first inequality follows from the subadditivity of the Frobenius norm, the fact that U and V are orthonormal bases of Yc and Yr, and property (b), to exploit that the minimum on the left-hand side is over rank-k matrices. We then use the approximation guarantee of property (a) to get the second inequality. 
Using the fact that S and T are Gaussian matrices, we can lower bound the left-hand side of equation (2), up to an additive term, as follows:\n\n‖(A 0) − ŨΣ̃Ṽ^T‖_F − ‖S†N2(T^T)†‖_F ≤ (1 + α)^3 · min_{rk(X)≤k} ‖UXV − B‖_F,\n\nwhere Ũ, Σ̃, and Ṽ are as in Algorithm 1. We upper bound the right-hand side of equation (2) by using Markov's inequality combined with the fact that both S and T are Gaussian matrices and that L_A satisfies property (c). Scaling the value of α by a constant gives part 1. So all that remains is to find a candidate matrix L_A. We construct such an L_A using the following two optimization problems:\n\nProb1: min_X ‖Ψ(ÂΦ([Â]_kΦ)†X − Â)‖_F   and   Prob2: min_X ‖ÂΦ([Â]_kΦ)†X − Â‖_F.   (3)\n\nWe prove that a solution to Prob1 gives us a candidate L_A. This completes the proof.\n\nFrom Priv1 to Priv2. If we try to use the idea described above to prove differential privacy under Priv2, we end up with an additive error that depends linearly on min{m, n}. This is because we need to perturb the input matrix by noise proportional to min{√(km), √(kn)} to preserve differential privacy under Priv2. We show that, by maintaining noisy sketches Y = AΦ + N1 and Z = SA + N2 for appropriately chosen noise matrices N1 and N2 and sketching matrices Φ and S, followed by some post-processing, we can have an optimal-error differentially private algorithm under Priv2. Here, we require S to satisfy the same property as in the case of Priv1. 
However, the lack of symmetry between S and Φ requires us to decouple the effects of the noise matrices to get a tight bound on the additive error. In total, we get an efficient (ε, δ)-differentially private algorithm that uses O((m·α^{−1} + n)·k·α^{−1}) space and outputs an (α, 99/100, γ, k)-LRF for γ = Õ((√(km)·α^{−2} + √(kn))·√(log(1/δ))/ε).\n\n4 Differentially Private Algorithms for Streaming Matrices\n\nWe next give more details of our results when matrices are streamed. Unless specified, for ease of presentation, we assume that k ≥ 1/α, δ = Θ(n^{−log n}), and Õ(·) hides a poly log n factor.\n\nReference | Privacy Notion | Streaming | Additive Error | Space Required\nThis work | A − A′ = uv^T | Turnstile | Õ((√(km)·α^{−1} + √(kn))·ε^{−1}) | Õ((m + n)·k·α^{−1})\nThis work | ‖A − A′‖_F = 1 | Turnstile | Õ((√(km)·α^{−2} + √(kn))·ε^{−1}) | Õ((m·α^{−1} + n)·k·α^{−1})\nHardt-Roth [30] | A − A′ = e_s·v^T | × | Õ((√(km) + kc·√n)·ε^{−1}) | O(mn)\nUpadhyay [57] | A − A′ = e_s·v^T | Row-wise | Õ((k^2·√n + m)·ε^{−1}) | Õ((m + n)·k·α^{−1})\nLower Bounds | All of the above | Turnstile | Ω(√(km) + √(kn)) [30] | Ω((m + n)·k·α^{−1})\nTable 2: Comparison of Results (‖u‖_2, ‖v‖_2 = 1, e_s: standard basis vector, k ≤ 1/α).\n\nTo capture the scenarios where data matrices are constantly updated, we consider the turnstile update model (see the survey [46] for further motivation). Formally, in a turnstile update model, a matrix A ∈ R^{m×n} is initialized
to an all-zero matrix and is updated by a sequence of triples {i, j, Δ}, where 1 ≤ i ≤ m, 1 ≤ j ≤ n, and Δ ∈ R. Each update results in a change to the (i, j)-th entry of A as follows: A_{i,j} ← A_{i,j} + Δ.\nAn algorithm is differentially private under the turnstile update model if, for all possible matrices updated in the turnstile update model and all runs of the algorithm, the output of the algorithm is (ε, δ)-differentially private. A straightforward application of known privacy techniques to make known space-optimal non-private algorithms [9] differentially private incurs a large additive error. In other words, it is an open question whether we can solve Problem 1 with good accuracy while preserving differential privacy and receiving the matrix in the turnstile update model. We resolve this question positively. We say two data streams are neighboring if they are formed by neighboring matrices. We show the following:\nTheorem 2. Let A be an m × n matrix streamed in the turnstile update model. Then there is an efficient (ε, δ)-differentially private algorithm under Priv1 that uses Õ((m + n)·k·α^{−1}) space and computes an (α, 99/100, γ, k)-LRF, where γ = Õ((√(mk)·α^{−1} + √(kn))/ε). There is also an efficient (ε, δ)-differentially private algorithm under Priv2 that computes an (α, 99/100, γ, k)-LRF, where γ = Õ((√(mk)·α^{−2} + √(kn))/ε).\n\nBefore we argue the tightness of Theorem 2 with respect to both space and additive error, we compare our result with previous works. All the private algorithms prior to this work compute a low-rank approximation of either the matrix A or its covariance A^TA. One can compute a factorization from their output at the expense of an extra O(mn^2) time and O(mn) space (Dwork et al.
[24] requires an extra O(n^3) time and O(n^2) space to output an LRF of A^TA). Some works, like [11, 35, 30, 29], compute an LRF under the spectral norm instead of the Frobenius norm.\nIn other words, Hardt and Roth [30] and Upadhyay [57] study the problem closest to ours (the differences being that they do not consider turnstile updates and output a low-rank matrix). Therefore, we compare Theorem 2 only with these two results. We do not make any assumptions on the private matrix. This allows us to cover matrices of all forms and relaxations in a unified manner. We next compare the accuracy, privacy guarantees, space, and time required in more detail (see Table 2).\nBoth Hardt and Roth [30] and Upadhyay [57] give a rank-O(k) approximation instead of a rank-k approximation, incur a multiplicative error of √(1 + k/p), where p is an oversampling parameter (typically, p = Θ(k)), and assume m ≤ n. Therefore, for a reasonable comparison, we consider Theorem 2 when α = Θ(1) and m ≤ n. Our additive error is smaller than that of Upadhyay [57] by a factor of Õ(k^{3/2}). To make a reasonable comparison with Hardt and Roth [30], we consider their result without the incoherence assumption, which roughly says that no single row of the matrix is significantly correlated with any of the right singular vectors of the matrix. Then Hardt and Roth [30, Theorems 4.2 and 4.7] results in an additive error of Õ((√(km) + ck·√n)·ε^{−1}), where c is the maximum entry in their projection matrix. In other words, we improve Hardt and Roth [30] by an Õ(c·√k) factor.\nOur algorithms are more efficient than previous algorithms in terms of space and time, even though earlier algorithms output a rank-O(k) matrix and cannot handle updates in the turnstile model. Upadhyay [57] takes more time than Hardt and Roth [30]. The algorithm of Hardt and Roth [30] uses O(mn) space since it is a private version of Halko et al.
[28] and has to store the entire matrix: both of the stages of Halko et al. [28] require the matrix explicitly. One of the motivations mentioned in Hardt and Roth [30] is sparse private incoherent matrices (see the discussion in Hardt and Roth [30, Sec 1.1]), but their algorithm uses this only to reduce the additive error and not the running time.\nOn the other hand, our algorithms use sublinear space and almost match the running time of the most efficient non-private algorithms in the turnstile model [9, 13].\nOur privacy guarantees are also more general than those of previous works, which consider two matrices A and A′ neighboring either if A − A′ = e_i·e_j^T [29, 31, 33] or A − A′ = e_i·v^T for some unit vector v [24, 30, 57], depending on whether a user's data is an entry of the matrix or a row of the matrix. It is easy to see that these privacy guarantees are special cases of Priv1 and Priv2.\nTightness of Additive Error. Hardt and Roth [30] showed a lower bound of Ω(√(kn) + √(km)) on the additive error by showing a reduction to the linear reconstruction attack [18]. In other words, any algorithm that outputs a low-rank matrix with additive error o(√(kn) + √(km)) cannot be differentially private! This lower bound holds even when the private algorithm can access the private matrix any number of times. Our results show that one can match the lower bound for constant α, a setting considered in Hardt and Roth [30], up to a small logarithmic factor, while allowing access to the private matrix only in the turnstile model.\nSpace Lower Bound and Optimality of the Algorithm Under Priv1. 
Our algorithms use the same space as the non-private algorithm up to a logarithmic factor, which is known to be optimal for γ = 0 [12]. However, we incur a non-zero additive error γ, which is inevitable [30], and it is not clear whether we can achieve a better-space algorithm when γ ≠ 0.
We complement Theorem 2 with a lower bound on the space required for low-rank approximation with non-trivial additive error. Our result holds for any randomized algorithm and, therefore, also holds for any private algorithm. We believe this makes our result of independent interest.
Theorem 3. The space required by any randomized algorithm to solve (α, 1/6, O(m + n), k)-LRF in the turnstile update model is Ω((n + m)kα^{-1}).
Any differentially private algorithm incurs an additive error Ω(√(kn)). Moreover, known differentially private low-rank approximations [30] set α = √2 − 1. This thus proves optimality for all k ≥ 3.
Under Bounded Norm Assumptions. In some practical applications, matrices are more structured. One such special case is when the rows of the private matrix A have bounded norm and one would like to approximate A^TA. This problem was studied by Dwork et al. [24]. We consider matrices that are updated row-wise: all the updates at time τ ≤ T are of the form {i_τ, A^(τ)}, where 1 ≤ i_τ ≤ m, A^(τ) ∈ R^n, and i_τ ≠ i_τ′ for all τ ≠ τ′. We show the following by using A^TA as the input matrix:
Corollary 1. Given an A ∈ R^{m×n} updated by inserting one row at a time such that every row has a bounded norm 1 and m > n, there is an (ε, δ)-differentially private algorithm under Priv2 that uses Õ(nkα^{-2}) space and outputs a rank-k matrix B such that ‖A^TA − B‖_F ≤ (1 + α)‖A^TA − [A^TA]_k‖_F + Õ(√(km) + √(nk)/(αε)).
We do not violate the lower bound of Dwork et al. [24] because their lower bound is valid when α = 0, which is not possible for low-space algorithms due to Theorem 3. Dwork et al. [24] bypassed their lower bound under a stronger assumption known as singular value separation: the difference between the k-th singular value and all k′-th singular values for k′ > k is at least ω(√n). In other words, our result shows that we do not need singular value separation while using significantly less space, Õ(nk/α^2) as compared to O(n^2), if we are ready to pay a small multiplicative error.
Adapting to Continual Release Model. Until now, we gave algorithms that produce the output only at the end of the stream. There is a related model called (ε, δ)-differential privacy under T-continual release [23]. In this model, the server receives a stream of length T and produces an output after every update, such that every output is (ε, δ)-differentially private. We modify our meta algorithm to work in this model by using the fact that we only store noisy linear sketches of the private matrix during the updates, and the low-rank factorization is computed through post-processing on only the noisy sketches. That is, we can use the generic transformation of [23] to maintain the sketch of the updates.
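To illustrate why linearity of the sketches makes this transformation work, here is a minimal non-private sketch in Python/NumPy (all names are hypothetical, the sketch matrix is a plain Gaussian map, and the calibrated noise needed for privacy is omitted): one sketch is kept per dyadic time interval, so the sketch of any time range is the sum of the O(log T) stored sketches covering it.

```python
import numpy as np

rng = np.random.default_rng(0)
T, m, n, sk = 8, 16, 12, 6                 # stream length, matrix dims, sketch size
S = rng.normal(size=(sk, m)) / np.sqrt(sk)  # shared random (linear) sketch matrix

# One sketch per dyadic interval [j*2^l, (j+1)*2^l). Updates are linear, so the
# sketch of a union of disjoint intervals is the sum of their sketches.
levels = int(np.log2(T))
dyadic = {(l, j): np.zeros((sk, n))
          for l in range(levels + 1) for j in range(T >> l)}

updates = [(rng.integers(m), rng.integers(n), rng.normal()) for _ in range(T)]
for t, (i, j, val) in enumerate(updates):
    upd = np.zeros((m, n))
    upd[i, j] = val                         # turnstile update arriving at time t
    for l in range(levels + 1):
        dyadic[(l, t >> l)] += S @ upd      # add to every covering dyadic interval

def range_sketch(lo, hi):
    """Sum the O(log T) disjoint dyadic sketches covering the range [lo, hi)."""
    out, l = np.zeros((sk, n)), 0
    while lo < hi:
        if lo & 1:
            out += dyadic[(l, lo)]; lo += 1
        if hi & 1:
            hi -= 1; out += dyadic[(l, hi)]
        lo >>= 1; hi >>= 1; l += 1
    return out
```

A single-entry update only touches one column of S times a scalar, so the per-update work can be made much cheaper than the dense product shown here; the dense form is used only for clarity.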
A factorization for any time range can then be obtained by aggregating the sketches for the specified range using range queries. This gives the first algorithm that provides differentially private continual release of LRF. We show the following.
Theorem 4. Let A ∈ R^{m×n} be the private matrix streamed over T time epochs. Then there is an (ε, δ)-differentially private algorithm under Priv1 that outputs a rank-k factorization under continual release for T time epochs such that γ = Õ(ε^{-1}(√(mk)α^{-1} + √(kn)) log T).

5 Noninteractive Local Differentially Private PCA

Till now, we have considered a single server that receives the private matrix in a streamed manner. We next consider another variant of differential privacy known as local differential privacy (LDP) [21, 22, 26, 59]. In the local model, each individual applies a differentially private algorithm locally to their data and shares only the output of the algorithm, called a report, with a server that aggregates users' reports. A multi-player protocol is ε-LDP if, for all possible inputs and runs of the protocol, the transcript of player i's interactions with the server is ε-LDP.
One can study two variants of local differential privacy depending on whether the server and the users interact more than once or not. In the interactive variant, the server sends several messages, each to a subset of users. In the noninteractive variant, the server sends a single message to all the users at the start of the protocol and sends no message after that.
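To make the noninteractive pattern concrete, here is a minimal Python/NumPy sketch (this is not the algorithm of Theorem 5; the names, the toy sizes, and the Gaussian-mechanism calibration are illustrative assumptions): each user releases a single noisy report of its row's covariance contribution, and the server recovers a principal subspace purely by post-processing the aggregate, with no further interaction.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 500, 20, 3                       # users, dimension, target rank
eps, delta = 1.0, 1e-5

# Gaussian-mechanism noise scale: swapping one user's unit-norm row a for a'
# changes that user's report by at most ||a a^T - a' a'^T||_F <= 2.
sigma = 2.0 * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

A = rng.normal(size=(m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # each user holds one unit row

def local_report(row):
    """Single noninteractive report: noisy covariance contribution."""
    return np.outer(row, row) + rng.normal(scale=sigma, size=(n, n))

# Server side: aggregate the reports, then post-process (free under DP).
C_hat = sum(local_report(A[u]) for u in range(m))
C_hat = (C_hat + C_hat.T) / 2                   # symmetrize before eigendecomposition
_, eigvecs = np.linalg.eigh(C_hat)
U = eigvecs[:, -k:]                             # estimated rank-k principal subspace
```

At this toy scale the honestly calibrated noise dominates the signal; the point is only the protocol shape: one report per user, no messages back from the server, and all spectral work happening server-side on the noisy aggregate.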
Smith, Thakurta, and Upadhyay [54] argued that noninteractive locally private algorithms are ideal for implementation.
The natural extension of Problem 1 to the local model is when the matrix is distributed among the users such that every user has one row of the matrix and users are responsible for the privacy of their row vector. Unfortunately, known private algorithms (including the results presented till now) do not yield a non-trivial additive error in the local model. For example, if we convert Theorem 2 to the local model, we end up with an additive error Õ(√(kmn)). This is worse than the trivial bound of O(√(mn)): for example, when A ∈ {0, 1}^{m×n}, the trivial output of the all-zero matrix incurs an error of at most O(√(mn)). In fact, existing lower bounds in the local model suggest that one is likely to incur an error that is an O(√m) factor worse than in the central model, where m is the number of users. However, owing to the result of Dwork et al. [24], we can hope to achieve a non-trivial result for differentially private principal component analysis. This problem has been studied without privacy under the row-partition model [6, 39, 37, 9, 27, 50, 51, 55]. We exploit the fact that our meta algorithm only stores differentially private sketches of the input matrix to give a noninteractive algorithm for low-rank principal component analysis (PCA) under local differential privacy. This produces an (ε, δ)-locally differentially private algorithm that is noninteractive. We then use the generic transformation of Bun et al. [10] to get the following result.
Theorem 5. Let m, n ∈ N and α, ε, δ be the input parameters.
Let k be the desired rank of the factorization and η = max{k, α^{-1}}. Let v = O(ηα^{-2} log(k/δ)). Given a private input matrix A ∈ R^{m×n} distributed in a row-wise manner amongst m users, there is an efficient ε-local differentially private algorithm under Priv2 that uses O(v^2) words of communication from the users to the central server and outputs a rank-k orthonormal matrix U such that, with probability 9/10,
‖A − UU^TA‖_F ≤ (1 + O(α))‖A − [A]_k‖_F + O(v√(m log(1/δ))/ε).

6 Discussion on Neighboring Relation

The two privacy guarantees considered in this paper have natural reasons to be considered. Priv1 generalizes the earlier privacy guarantees and captures the setting where any two matrices differ in only one spectrum. Since Priv1 is defined in terms of the spectrum of matrices, Priv1 captures one of the natural privacy requirements in all the applications of LRF. Priv2 is stronger than Priv1. To motivate the definition of Priv2, consider a graph G := (V, E) that stores the career information of people in a set P since their graduation. The vertex set V is the set of all companies. An edge e = (u, v) ∈ E has weight ∑_{p∈P} (t_{p,e}/t_p), where t_{p,e} is the time for which the person p held a job at v after leaving his/her job at u, and t_p is the total time elapsed since his/her graduation. Graphs like G are useful because the weight on every edge e = (u, v) depends on the number of people who changed their job status from u to v (and the time they spent at v). Therefore, data analysts might want to mine such graphs for various statistics. In the past, graph statistics have been extensively studied for static graphs under edge-level privacy (see, e.g., [24, 30, 29, 56, 57]): the presence or absence of a person corresponds to a change in a single edge.
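To make the weight definition concrete, here is a small Python sketch with hypothetical career records (for simplicity, t_p is taken as the total time across a person's recorded jobs) that computes the edge weights and shows how a single person's data is spread across several edges.

```python
from collections import defaultdict

# Hypothetical career records: person -> list of (left_company, joined_company,
# years spent at the joined company).
careers = {
    "ana": [("Acme", "Beta", 2.0), ("Beta", "Cyan", 3.0)],
    "bob": [("Acme", "Beta", 4.0), ("Beta", "Acme", 1.0)],
}

def edge_weights(careers):
    """w(u, v) = sum over people p of t_{p,(u,v)} / t_p."""
    weights = defaultdict(float)
    for p, jobs in careers.items():
        t_p = sum(t for _, _, t in jobs)   # total time since graduation
        for u, v, t_pe in jobs:
            weights[(u, v)] += t_pe / t_p
    return dict(weights)

w = edge_weights(careers)
# Removing one person ("ana") changes the weight of several edges at once,
# which is exactly what edge-level privacy fails to capture.
w_without_ana = edge_weights({p: j for p, j in careers.items() if p != "ana"})
changed = {e for e in set(w) | set(w_without_ana)
           if abs(w.get(e, 0.0) - w_without_ana.get(e, 0.0)) > 1e-12}
```

In this toy instance, dropping "ana" perturbs two edges simultaneously, whereas an edge-level neighboring relation only protects a change to one edge.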
On the other hand, in graphs like G, the presence or absence of a person would be reflected on many edges. Applying earlier results on edge-level privacy to such graphs would lead to either a large additive error or a loss in the privacy parameters ε, δ. Priv2 is an attempt to understand whether we can achieve a non-trivial guarantee on the additive error without degrading the privacy parameters.

Figure 1: Empirical Evaluation of Additive Error of Our Algorithm.

7 Empirical Evaluation of Our Algorithms

In this section, we give a glimpse of our experimental evaluation of the additive error and compare it with the best known results. The details and discussion of our empirical evaluation are in the supplementary material. Two important parameters in our bounds are k and α; Hardt and Roth [30] consider a constant α. Therefore, we analyze the additive error with respect to the change in α in order to better understand the effect of differential privacy on low-space low-rank approximation of matrices. The results of this experiment are presented in Figure 1 ((a)-(d)), with the y-axis (accuracy) on a logarithmic scale to better illustrate the accuracy improvement shown by our algorithm. In both of these experiments, we see that the additive error incurred by our algorithm is less than the additive error incurred by Hardt and Roth [30]. We note that the matrices are highly incoherent, as all the entries are sampled i.i.d. We also consider the role of k in our locally private algorithm. The results of this experiment are presented in Figure 1 ((e)-(f)). The error of our algorithm is consistently less than the expected error.

8 Conclusion

In this paper, we study differentially private low-rank approximation in various settings of practical importance. We give the first algorithms with optimal accuracy, space requirements, and runtime for all of these settings.
Our results rely crucially on careful analysis, and our algorithms heavily exploit advanced yet inexpensive post-processing. Prior to this work, only two known private algorithms for Problem 1 use any form of post-processing for LRF: Hardt and Roth [30] use simple pruning of the entries of a matrix formed in an intermediate step, while Dwork et al. [24] use the best rank-k approximation of the privatized matrix. These post-processing steps either make the algorithm suited only for static matrices or are expensive.
There are a few key takeaways from this paper: (i) maintaining differentially private sketches of the row space and column space of a matrix already gives sub-optimal accuracy, but this can be significantly improved by careful, inexpensive post-processing, and (ii) the structural properties of linear sketches can be carefully exploited to get tight bounds on the error. Prior to this work, it was not clear whether the techniques we use in this paper yield low-rank approximation (see the comment after Theorem IV.2 and the discussion in Section V in Blocki et al. [8]). Therefore, we believe our techniques will find use in many related private algorithms, as evidenced by the recent result of Arora et al. [4].
Acknowledgements. The author would like to thank Adam Smith for useful feedback on this paper. This research was supported in part by NSF BIGDATA grant IIS-1447700, NSF BIGDATA grant IIS-154648, and NSF BIGDATA grant IIS-1838139.

References
[1] Dimitris Achlioptas, Amos Fiat, Anna R Karlin, and Frank McSherry. Web search via hub synthesis. In FOCS, pages 500–509. IEEE, 2001.
[2] Dimitris Achlioptas and Frank McSherry. On spectral learning of mixtures of distributions. In Learning Theory, pages 458–469. Springer, 2005.
[3] Apple. Apple tries to peek at user habits without violating privacy. The Wall Street Journal, 2016.
[4] Raman Arora, Vladimir Braverman, and Jalaj Upadhyay.
Differentially private robust low-rank\napproximation. In Advances in Neural Information Processing Systems, pages 4141\u20134149,\n2018.\n\n[5] Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared Saia. Spectral analysis of data.\n\nIn STOC, pages 619\u2013626. ACM, 2001.\n\n[6] Zheng-Jian Bai, Raymond H Chan, and Franklin T Luk. Principal component analysis for\ndistributed data sets with updating. In International Workshop on Advanced Parallel Processing\nTechnologies, pages 471\u2013483. Springer, 2005.\n\n[7] Rajendra Bhatia. Matrix analysis, volume 169. Springer Science & Business Media, 2013.\n\n[8] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss\n\nTransform Itself Preserves Differential Privacy. In FOCS, pages 410\u2013419, 2012.\n\n[9] Christos Boutsidis, David P. Woodruff, and Peilin Zhong. Optimal principal component analysis\n\nin distributed and streaming models. In STOC, pages 236\u2013249, 2016.\n\n[10] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy.\n\narXiv preprint arXiv:1711.04740, 2017.\n\n[11] Kamalika Chaudhuri, Anand D Sarwate, and Kaushik Sinha. Near-optimal differentially private\n\nprincipal components. In NIPS, pages 998\u20131006, 2012.\n\n[12] Kenneth L. Clarkson and David P. Woodruff. Numerical linear algebra in the streaming model.\n\nIn STOC, pages 205\u2013214, 2009.\n\n[13] Kenneth L Clarkson and David P Woodruff. Low rank approximation and regression in input\n\nsparsity time. In STOC, pages 81\u201390. ACM, 2013.\n\n[14] Kenneth L Clarkson and David P Woodruff. Low-rank psd approximation in input-sparsity time.\nIn Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms,\npages 2061\u20132072. SIAM, 2017.\n\n[15] Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu.\nDimensionality reduction for k-means clustering and low rank approximation. In STOC, pages\n163\u2013172. 
ACM, 2015.
[16] Michael B Cohen, Cameron Musco, and Christopher Musco. Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1758–1777. SIAM, 2017.
[17] Amit Deshpande and Santosh Vempala. Adaptive sampling and fast low-rank matrix approximation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 292–303. Springer, 2006.
[18] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In PODS, pages 202–210. ACM, 2003.
[19] Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, and V Vinay. Clustering large graphs via the singular value decomposition. Machine Learning, 56(1-3):9–33, 2004.
[20] Petros Drineas, Iordanis Kerenidis, and Prabhakar Raghavan. Competitive recommendation systems. In STOC, pages 82–90. ACM, 2002.
[21] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429–438. IEEE, 2013.
[22] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006.
[23] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In STOC, pages 715–724, 2010.
[24] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis. In STOC, pages 11–20, 2014.
[25] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response.
In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067. ACM, 2014.
[26] Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211–222. ACM, 2003.
[27] Dan Garber, Ohad Shamir, and Nathan Srebro. Communication-efficient algorithms for distributed stochastic principal component analysis. arXiv preprint arXiv:1702.08169, 2017.
[28] Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[29] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2861–2869. Curran Associates, Inc., 2014.
[30] Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In STOC, pages 1255–1268, 2012.
[31] Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation. In STOC, pages 331–340, 2013.
[32] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pages 665–674. ACM, 2013.
[33] Wuxuan Jiang, Cong Xie, and Zhihua Zhang. Wishart mechanism for differentially private principal components analysis. arXiv preprint arXiv:1511.05680, 2015.
[34] Ravindran Kannan, Hadi Salmasian, and Santosh Vempala. The spectral method for general mixture models. In Learning Theory, pages 444–457. Springer, 2005.
[35] Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In SODA, volume 5, page 1.
SIAM, 2013.
[36] Jon M Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.
[37] Yann-Aël Le Borgne, Sylvain Raybaud, and Gianluca Bontempi. Distributed principal component analysis for wireless sensor networks. Sensors, 8(8):4821–4850, 2008.
[38] Yi Li, Huy L. Nguyen, and David P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 174–183, 2014.
[39] Yingyu Liang, Maria-Florina F Balcan, Vandana Kanchanapally, and David Woodruff. Improved distributed principal component analysis. In Advances in Neural Information Processing Systems, pages 3113–3121, 2014.
[40] Avner Magen and Anastasios Zouzias. Low rank matrix-valued Chernoff bounds and approximate matrix multiplication. In SODA, pages 1422–1436. SIAM, 2011.
[41] Michael W Mahoney. Randomized algorithms for matrices and data. Foundations and Trends® in Machine Learning, 3(2):123–224, 2011.
[42] Ivan Markovsky. Structured low-rank approximation and its applications. Automatica, 44(4):891–909, 2008.
[43] Frank McSherry. Spectral partitioning of random graphs. In FOCS, pages 529–537. IEEE, 2001.
[44] Xiangrui Meng and Michael W Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In STOC, pages 91–100. ACM, 2013.
[45] Cameron Musco and David P Woodruff. Sublinear time low-rank approximation of positive semidefinite matrices. arXiv preprint arXiv:1704.03371, 2017.
[46] Shanmugavelayutham Muthukrishnan. Data streams: Algorithms and applications. Now Publishers Inc, 2005.
[47] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In Security and Privacy, 2008. SP 2008.
IEEE Symposium on, pages 111–125. IEEE, 2008.
[48] Nam H Nguyen, Thong T Do, and Trac D Tran. A fast and efficient algorithm for low-rank approximation of a matrix. In STOC, pages 215–224. ACM, 2009.
[49] Christos H Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: A probabilistic analysis. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 159–168. ACM, 1998.
[50] Jack Poulson, Bryan Marker, Robert A Van de Geijn, Jeff R Hammond, and Nichols A Romero. Elemental: A new framework for distributed memory dense matrix computations. ACM Transactions on Mathematical Software (TOMS), 39(2):13, 2013.
[51] Yongming Qu, George Ostrouchov, Nagiza Samatova, and Al Geist. Principal component analysis for dimension reduction in massive distributed data sets. In Proceedings of IEEE International Conference on Data Mining (ICDM), 2002.
[52] Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In FOCS, pages 143–152. IEEE, 2006.
[53] John Shawe-Taylor and Nello Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
[54] A. Smith, A. Thakurta, and J. Upadhyay. Is interaction necessary for distributed private learning? In IEEE Symposium on Security & Privacy, 2017.
[55] Françoise Tisseur and Jack Dongarra. A parallel divide and conquer algorithm for the symmetric eigenvalue problem on distributed memory architectures. SIAM Journal on Scientific Computing, 20(6):2223–2236, 1999.
[56] Jalaj Upadhyay. Random projections, graph sparsification, and differential privacy. In ASIACRYPT (1), pages 276–295, 2013.
[57] Jalaj Upadhyay. Differentially private linear algebra in the streaming model. arXiv preprint arXiv:1409.5414, 2014.
[58] Jalaj Upadhyay.
Randomness efficient fast-Johnson-Lindenstrauss transform with applications in differential privacy and compressed sensing. arXiv preprint arXiv:1410.2470, 2014.
[59] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.