{"title": "Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited", "book": "Advances in Neural Information Processing Systems", "page_first": 965, "page_last": 974, "abstract": "In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions ($p\\ll n$), we first show that if the loss function is $(\\infty, T)$-smooth, we can avoid a dependence of the sample complexity, to achieve error $\\alpha$, on the exponential of the dimensionality $p$ with base $1/\\alpha$ ({\\em i.e.,} $\\alpha^{-p}$),\n which answers a question in \\cite{smith2017interaction}. Our approach is based on polynomial approximation. Then, we propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. The error bound is asymptotically the same as the original one. With some additional assumptions, we also give an efficient algorithm for the server. 
\n In the case of high dimensions ($n\ll p$), we show that if the loss function is a convex generalized linear function, the error can be bounded by using the Gaussian width of the constrained set, instead of $p$, which improves the one in \cite{smith2017interaction}.", "full_text": "Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited*

Di Wang, Marco Gaboardi, Jinhui Xu
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY, 14260
Email: {dwang45,gaboardi,jinhui}@buffalo.edu

Abstract

In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions (p ≪ n), we first show that if the loss function is (∞, T)-smooth, we can avoid a dependence of the sample complexity, to achieve error α, on the exponential of the dimensionality p with base 1/α (i.e., α^{-p}), which answers a question in [19]. Our approach is based on polynomial approximation. Then, we propose player-efficient algorithms with 1-bit communication complexity and O(1) computation cost for each player. The error bound is asymptotically the same as the original one. With some additional assumptions, we also give an efficient algorithm for the server. In the case of high dimensions (n ≪ p), we show that if the loss function is a convex generalized linear function, the error can be bounded by using the Gaussian width of the constrained set instead of p, which improves the one in [19].

1 Introduction

Differential privacy [7] has emerged as a rigorous notion of privacy which allows accurate data analysis with a guaranteed bound on the increase in harm for each individual who contributes her data.
Methods to guarantee differential privacy have been widely studied, and have recently been adopted in industry [15, 8].

Two main user models have emerged for differential privacy: the central model and the local one. In the central model, data are managed by a trusted central entity which is responsible for collecting them and for deciding which differentially private data analysis to perform and release. A classical use case for this model is the collection of census data [9]. In the local model, each individual manages his/her own data and discloses it to a server through some differentially private mechanism. The server collects the (now private) data of each individual and combines them into a resulting data analysis. A classical application of this model is the collection of statistics from user devices, as in the case of Google's Chrome browser [8] and Apple's iOS-10 [15, 20].

In the local model, there are two basic kinds of protocols: interactive and non-interactive. Bassily and Smith [2] have recently investigated the power of non-interactive differentially private protocols. This type of protocol seems more appealing for real-world applications, since it can be implemented more easily (i.e., it is less affected by network latency issues). Both Google and Apple use the non-interactive model in their projects [15, 8].

*This research was supported in part by the National Science Foundation (NSF) under Grant No. CCF-1422324, CCF-1716400, CCF-1718220 and CNS-1565365.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Despite being used in industry, the local model has been much less studied than the central one.
Part of the reason for this is that there are intrinsic limitations in what one can do in the local model. As a consequence, many basic questions that are well studied in the central model have not yet been completely understood in the local model.

In this paper, we study differentially private Empirical Risk Minimization in the non-interactive local model. Before presenting our contributions and comparing them with previous works, we first discuss our motivations.

Problem setting [19, 24, 23]. Given a convex, closed and bounded constraint set C ⊆ R^p, a data universe D, and a loss function ℓ : C × D ↦ R, a dataset D = {x_1, x_2, ..., x_n} ∈ D^n defines an empirical risk function L̂(θ; D) = (1/n) Σ_{i=1}^n ℓ(θ, x_i). When the inputs are drawn i.i.d. from an unknown underlying distribution P on D, we can also define the population risk function L_P(θ) = E_{D∼P^n}[ℓ(θ; D)]. This gives two kinds of excess risk: the empirical one, Err_D(θ_priv) = L̂(θ_priv; D) − min_{θ∈C} L̂(θ; D), and the population one, Err_P(θ_priv) = L_P(θ_priv) − min_{θ∈C} L_P(θ).

The problem that we study in this paper is finding θ_priv ∈ C under non-interactive local differential privacy (see Definition 1) which makes the empirical and population excess risks as low as possible. Alternatively, when the dimensionality p is constant or low, we can express this problem in terms of sample complexity: finding the smallest n that achieves Err_D ≤ α and Err_P ≤ α, where α is the user-specified error tolerance (or simply, the error).

Motivation. Smith et al. [19] prove the following result concerning this problem for general convex 1-Lipschitz loss functions over a bounded constraint set.

Theorem 1.
Under the assumptions above, there is a non-interactive ε-LDP algorithm such that for every distribution P on D, with probability 1 − β, we have

Err_P(θ_priv) ≤ Õ( ( √p log^2(1/β) / (ε^2 n) )^{1/(p+1)} ).   (1)

A similar result holds for Err_D, with at least Ω(n^{1/(p+1)}) computation and communication complexity for each user. Alternatively, to achieve error α, the sample complexity must satisfy n = Ω̃( √p c^p ε^{−2} α^{−(p+1)} ), where c is some constant (approximately 2). More importantly, they also show that, in general, the dependence of the sample size on the dimensionality p in the terms α^{−(p+1)} and c^p is unavoidable.

This situation is somewhat undesirable: when the dimensionality is high and the target error is low, the dependency on α^{−(p+1)} could make the sample size quite large. However, several results have already shown that for some specific loss functions the exponential dependency on the dimensionality can be avoided. For example, Smith et al. [19] show that, in the case of linear regression, there is a non-interactive (ε, δ)-LDP algorithm² whose sample complexity for achieving error α for the empirical risk is n = Ω( p log(1/δ) ε^{−2} α^{−2} ). Similarly, Zheng et al. [27] showed that for logistic regression, if the sample complexity satisfies n > O( p (8r/α)^{4r log log(8r/α)} (4r/ε)^{2cr log(8r/α) + 2} (1/(α^2 ε^2)) ), where c and r are independent of p, then there is a non-interactive (ε, δ)-LDP algorithm with Err_P(θ_priv) ≤ α.

This propels us to the following natural questions: i) Is there any algorithm that has a lower sample complexity than the one in Theorem 1? ii) The above discussion indicates that there is a gap between the general case and the case of specific loss functions. Is it possible to introduce some natural conditions on the loss function that guarantee non-interactive ε-LDP with sample complexity that is not exponential in the dimensionality p? iii) The computation and communication costs for each user in the protocols of Smith et al. [19] depend on n, which could be high for large datasets. Is it possible to make them independent of n? iv) The bounds in Smith et al. [19] are not very meaningful in high dimensions. However, in machine learning it is quite common to have high dimensionality, i.e., n ≪ p. Thus, can we obtain more meaningful bounds for the high dimensional case? Below we investigate the answer to each question.

²Although these two results are formulated for non-interactive (ε, δ)-LDP, in the rest of the paper we will focus on non-interactive ε-LDP algorithms.

Our Contributions

1. We first show that by using Bernstein polynomial approximation, it is possible to achieve a non-interactive ε-LDP algorithm in constant or low dimensions with the following properties. If the loss function is (8, T)-smooth (see Definition 5), then with a sample complexity of n = Ω̃( (c_0 p^{1/4})^p α^{−(2+p/2)} ε^{−2} ), the excess empirical risk is guaranteed to satisfy Err_D ≤ α. If the loss function is (∞, T)-smooth, the sample complexity can be further improved to n ≥ Ω̃( 4^{p(p+1)} D_p^2 p ε^{−2} α^{−4} ), where D_p depends only on p. Note that in the first case the sample complexity is lower than the one in [19] when α ≤ O(1/p), and in the second case the sample complexity depends only polynomially on α^{−1}, instead of the exponential dependence as in [19].
Furthermore, our algorithm does not assume convexity of the loss function and thus can be applied to non-convex loss functions.

2. Then, we address the efficiency issue, which has only been partially studied before [19]. Following an approach similar to [2], we propose an algorithm for our loss functions which has only 1-bit communication cost and O(1) computation cost for each client, and achieves asymptotically the same error bound as the original one. Additionally, we give a novel analysis for the server: if the loss function is convex and Lipschitz, the loss function is (∞, T)-smooth, and the convex set satisfies some natural conditions, then we have an algorithm which achieves an error bound of O(pα) and runs in time polynomial in 1/α (instead of exponential time as in [19]).

3. Next, we consider the high dimensional case, and show that if the loss function is a convex generalized linear function, then an ε-LDP algorithm is achievable with a risk bound depending only on n and the Gaussian width of C, which is much smaller than the one in [19]. In particular, if C is an ℓ1-norm ball or a distribution simplex, the risk bound depends only on n and log p, instead of p.

4. Lastly, we show the generality of our technique by applying the polynomial approximation approach to other problems.
We give non-interactive ε-LDP algorithms for answering the class of k-way marginal queries and the class of smooth queries, by using different types of polynomial approximations (details are in the Supplementary Material).

| Methods | Sample Complexity (omitting Poly(p) terms) | Communication Cost (each user) | Computation Cost (each user) | Running time for the server | Assumptions |
| --- | --- | --- | --- | --- | --- |
| Claim 4 in [19] | Ω̃(4^p α^{-(p+2)} ε^{-2}) | 1 | O(1) | O((1/α)^p) | Lipschitz |
| Theorem 10 in [19] | Ω̃(2^p α^{-(p+1)} ε^{-2}) | Ω(n^{1/(p+1)}) | Ω(n^{1/(p+1)}) | Not mentioned | Lipschitz and convex |
| This paper | Ω̃((c_0 p^{1/4})^p α^{-(2+p/2)} ε^{-2}) | 1 | O(1) | O((1/α)^{p/2}) | (8, T)-smooth |
| This paper | Ω̃(4^{p(p+1)} D_p^2 p ε^{-2} α^{-4}) | 1 | O(1) | O(Poly(1/α)) | (∞, T)-smooth |

Table 1: Comparison with existing results in [19] (we assume p is a constant). When the error α ≤ O(1/p), the sample complexity for (8, T)-smooth loss functions is lower than the existing results. When the error α ≤ O(1/16^p), the sample complexity for (∞, T)-smooth loss functions is lower than the previous results.

Table 1 shows some comparisons with the results in [19]. Due to the space limit, all proofs and some details of the algorithms are left to the Supplementary Material.

2 Related Works

ERM in the local model of differential privacy has been studied in [12, 3, 6, 5, 27, 19, 25]. Kasiviswanathan et al. [12] showed a general equivalence between learning in the local model and learning in the statistical query model. Duchi et al. [6, 5] gave the lower bound O(√d/(√n ε)) and optimal algorithms for general convex optimization; however, their optimal procedure needs many rounds of interactions.
The works that are most related to ours are [27, 19]. Zheng et al. [27] considered some specific loss functions in high dimensions, such as sparse linear regression and kernel ridge regression. Note that although [27] also studied a class of loss functions (i.e., smooth generalized linear loss functions) and used the polynomial approximation approach, the functions investigated in our paper are more general (they include linear regression and logistic regression), and the approximation techniques are quite different. Smith et al. [19] studied general convex loss functions for the population excess risk and showed that the dependence on the exponential of the dimensionality is unavoidable. In this paper, we show that such a dependence in the term α is actually avoidable for a class of loss functions. This even holds for non-convex loss functions, which is quite different from all existing works. We also study the high dimensional case by using dimension reduction. The polynomial approximation approach has been used in the central model in [1, 26, 21, 27], and dimension reduction has been used in the local model in [2, 27].

3 Preliminaries

Differential privacy in the local model. In LDP, we have a data universe D, n players, each holding a private data record x_i ∈ D, and a server that is in charge of coordinating the protocol. An LDP protocol proceeds in T rounds. In each round, the server sends a message, which we sometimes call a query, to a subset of the players, requesting them to run a particular algorithm. Based on the queries, each player i in the subset selects an algorithm Q_i, runs it on her data, and sends the output back to the server.

Definition 1.
[12, 19] An algorithm Q is ε-locally differentially private (LDP) if for all pairs x, x' ∈ D and for all events E in the output space of Q, we have Pr[Q(x) ∈ E] ≤ e^ε Pr[Q(x') ∈ E]. A multi-player protocol is ε-LDP if for all possible inputs and runs of the protocol, the transcript of each player's interaction with the server is ε-LDP. If T = 1, we say that the protocol is ε non-interactive LDP.

Since we only consider non-interactive LDP throughout the paper, below we will write LDP for non-interactive LDP. As an example that will be useful in the sequel, the next lemma shows an ε-LDP algorithm for computing a 1-dimensional average.

Lemma 1. Algorithm 1 is ε-LDP. Moreover, if player i ∈ [n] holds a value v_i ∈ [0, b] and n > log(2/β) with 0 < β < 1, then, with probability at least 1 − β, the output a ∈ R satisfies: |a − (1/n) Σ_{i=1}^n v_i| ≤ 2b √(log(2/β)) / (√n ε).

Algorithm 1 1-dim LDP-AVG
1: Input: Player i ∈ [n] holding data v_i ∈ [0, b], privacy parameter ε.
2: for each player i do
3:   Send z_i = v_i + Lap(b/ε)
4: end for
5: for the server do
6:   Output a = (1/n) Σ_{i=1}^n z_i.
7: end for

Bernstein polynomials and approximation. We give here some basic definitions that will be used in the sequel; more details can be found in [1, 13, 14].

Definition 2. Let k be a positive integer. The Bernstein basis polynomials of degree k are defined as b_{v,k}(x) = (k choose v) x^v (1 − x)^{k−v} for v = 0, ..., k.

Definition 3. Let f : [0, 1] ↦ R and let k be a positive integer. Then, the Bernstein polynomial of f of degree k is defined as B_k(f; x) = Σ_{v=0}^k f(v/k) b_{v,k}(x). We denote by B_k the Bernstein operator B_k(f)(x) = B_k(f, x).

Definition 4.
[14] Let h be a positive integer. The iterated Bernstein operator of order h is defined as B_k^{(h)} = I − (I − B_k)^h = Σ_{i=1}^h (h choose i) (−1)^{i−1} B_k^i, where I = B_k^0 denotes the identity operator and B_k^i is defined as B_k^i = B_k ∘ B_k^{i−1}. The iterated Bernstein polynomial of order h can be computed as B_k^{(h)}(f; x) = Σ_{v=0}^k f(v/k) b_{v,k}^{(h)}(x), where b_{v,k}^{(h)}(x) = Σ_{i=1}^h (h choose i) (−1)^{i−1} B_k^{i−1}(b_{v,k}; x).

The iterated Bernstein operator can approximate multivariate (h, T)-smooth functions well.

Definition 5. [14] Let h be a positive integer and T > 0 a constant. A function f : [0, 1]^p ↦ R is (h, T)-smooth if it is in the class C^h([0, 1]^p) and its partial derivatives up to order h are all bounded by T. We say it is (∞, T)-smooth if it is (h, T)-smooth for every h ∈ N.

Definition 6. Assume f : [0, 1]^p ↦ R and let k_1, ..., k_p, h be positive integers. The multivariate iterated Bernstein polynomial of order h at y = (y_1, ..., y_p) is defined as:

B^{(h)}_{k_1,...,k_p}(f; y) = Σ_{v_1=0}^{k_1} ... Σ_{v_p=0}^{k_p} f(v_1/k_1, ..., v_p/k_p) Π_{i=1}^p b^{(h)}_{v_i,k_i}(y_i).   (2)

We denote B_k^{(h)} = B^{(h)}_{k_1,...,k_p} if k = k_1 = ... = k_p.

Theorem 2. [1] If f : [0, 1]^p ↦ R is a (2h, T)-smooth function, then for all positive integers k and all y ∈ [0, 1]^p, we have |f(y) − B_k^{(h)}(f; y)| ≤ O(pT D_h k^{−h}), where D_h is a universal constant depending only on h.

Our settings. We conclude this section by making explicit the settings that we will consider throughout the paper.
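To make Definitions 3 and 4 concrete, here is a minimal one-dimensional sketch (the helper names are ours, not from the paper). For f(t) = t^2 the classical identity B_k(f; x) = x^2 + x(1 − x)/k gives an O(1/k) error, while one extra iteration (h = 2) improves it to x(1 − x)/k^2, i.e. O(1/k^2), consistent with the k^{−h} rate in Theorem 2:

```python
import math

def bernstein(f, k, x):
    """Degree-k Bernstein polynomial B_k(f; x) of f : [0,1] -> R (Definition 3)."""
    return sum(f(v / k) * math.comb(k, v) * x**v * (1 - x)**(k - v)
               for v in range(k + 1))

def iterated_bernstein(f, k, h, x):
    """Iterated Bernstein polynomial B_k^{(h)}(f; x) (Definition 4), i.e.
    (I - (I - B_k)^h) f at x, via the expansion sum_i C(h,i)(-1)^(i-1) B_k^i f."""
    def power_of_Bk(i):
        # Evaluate (B_k^i f)(x) by composing the operator i times.
        g = f
        for _ in range(i):
            g = (lambda gp: lambda t: bernstein(gp, k, t))(g)
        return g(x)
    return sum(math.comb(h, i) * (-1) ** (i - 1) * power_of_Bk(i)
               for i in range(1, h + 1))

f = lambda t: t * t
plain = bernstein(f, 10, 0.5)                 # exact value: 0.25 + 0.25/10  = 0.275
iterated = iterated_bernstein(f, 10, 2, 0.5)  # exact value: 0.25 + 0.25/100 = 0.2525
```

The error at x = 0.5 drops from 0.025 to 0.0025 when going from h = 1 to h = 2, which is the behavior the mechanism below exploits.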
We assume that there is a constraint set C ⊆ [0, 1]^p and that for every x ∈ D and θ ∈ C, ℓ(·, x) is well defined on [0, 1]^p and ℓ(θ, x) ∈ [0, 1]. These closed intervals can be extended to arbitrary bounded closed intervals. Our settings are similar to the 'Typical Settings' in [19], where C ⊆ [0, 1]^p appears in their Theorem 10, and ℓ(θ, x) ∈ [0, 1] follows from their 1-Lipschitz requirement and ‖C‖_2 ≤ 1.

4 Low Dimensional Case

Definition 6 and Theorem 2 tell us that if we know the values of the empirical risk function (i.e., the average of the sum of loss functions) on each of the grid points (v_1/k, v_2/k, ..., v_p/k), where (v_1, ..., v_p) ∈ T = {0, 1, ..., k}^p for some large k, then we can approximate it well. Our main observation is that this can be done in the local model by estimating the average of the sum of loss functions on each of the grid points using Algorithm 1. This is the idea of Algorithm 2.

Algorithm 2 Local Bernstein Mechanism
1: Input: Player i ∈ [n] holding data x_i ∈ D, public loss function ℓ : [0, 1]^p × D ↦ [0, 1], privacy parameter ε > 0, and parameter k.
2: Construct the grid T = {(v_1/k, ..., v_p/k)}_{v_1,...,v_p}, where (v_1, ..., v_p) ∈ {0, 1, ..., k}^p.
3: for each grid point v = (v_1/k, ..., v_p/k) ∈ T do
4:   for each player i ∈ [n] do
5:     Calculate ℓ(v; x_i).
6:   end for
7:   Run Algorithm 1 with ε = ε/(k+1)^p and b = 1, and denote the output as L̃(v; D).
8: end for
9: for the server do
10:   Construct the Bernstein polynomial, as in (2), for the perturbed empirical loss L̃(v; D). Denote by L̃(·, D) the corresponding function.
11:   Compute θ_priv = arg min_{θ∈C} L̃(θ; D).
12: end for

Theorem 3.
For any ε > 0 and 0 < β < 1, Algorithm 2 is ε-LDP. Assume that the loss function ℓ(·, x) is (2h, T)-smooth for all x ∈ D, for some positive integer h and constant T. If n, ε and β satisfy the condition n = Ω( log(1/β) 4^{p(h+1)} / (ε^2 D_h^2) ), then by setting k = O( ( D_h √(pn) ε / ( 2^{(h+1)p} √(log(1/β)) ) )^{1/(h+p)} ), with probability at least 1 − β we have:

Err_D(θ_priv) ≤ Õ( (log(1/β))^{h/(2(h+p))} D_h^{p/(p+h)} 2^{(h+1)p·h/(h+p)} p^{p/(2(h+p))} n^{−h/(2(h+p))} ε^{−h/(h+p)} ),   (3)

where Õ hides the log and T terms.

From (3) we can see that in order to achieve error α, the sample complexity needs to be n = Ω̃( log(1/β) D_h^{2p/h} p^{p/h} 4^{(h+1)p} ε^{−2} α^{−(2+2p/h)} ). This implies the following special cases.

Corollary 1. If the loss function ℓ(·, x) is (8, T)-smooth for all x ∈ D and some constant T, and n, ε, β, k satisfy the condition of Theorem 3 with h = 4, then with probability at least 1 − β, the sample complexity to achieve error α is n = Õ( (4^5 D_4^{1/2} p^{1/4})^p α^{−(2+p/2)} ε^{−2} ).

Note that the sample complexity for general convex loss functions in [19] is n = Ω̃( 2^p α^{−(p+1)} ε^{−2} ), which is considerably worse than ours when α ≤ O(1/p).

Corollary 2. If the loss function ℓ(·, x) is (∞, T)-smooth for all x ∈ D and some constant T, and n, ε, β, k satisfy the condition of Theorem 3 with h = p, then with probability at least 1 − β, the output θ_priv of Algorithm 2 satisfies Err_D(θ_priv) ≤ Õ( (log(1/β))^{1/4} D_p^{1/2} 2^{p(p+1)/2} p^{1/4} n^{−1/4} ε^{−1/2} ), where Õ hides the log and T terms.
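As a sanity check on the mechanism behind Theorem 3, here is a minimal one-dimensional (p = 1) simulation of Algorithm 2 (a sketch of ours, not the authors' code; for simplicity it uses the plain h = 1 Bernstein polynomial). Each of the (k+1) grid values of the empirical risk is released by the Laplace-noise averaging of Algorithm 1, with the privacy budget split as ε/(k+1), and the server fits the Bernstein polynomial through the noisy grid:

```python
import math
import numpy as np

def local_bernstein_1d(data, loss, k, eps, rng=None):
    """Sketch of Algorithm 2 for p = 1. Returns a callable approximation of
    the empirical risk theta -> (1/n) sum_i loss(theta, x_i). Each of the
    (k+1) grid averages is released at budget eps/(k+1), so the whole
    non-interactive protocol is eps-LDP."""
    rng = np.random.default_rng(rng)
    n, d = len(data), k + 1
    grid_vals = []
    for v in range(k + 1):
        theta = v / k
        true_vals = np.array([loss(theta, x) for x in data])      # each in [0, 1]
        noisy = true_vals + rng.laplace(scale=d / eps, size=n)    # Lap(b/eps'), b = 1
        grid_vals.append(noisy.mean())                            # \tilde{L}(v/k; D)
    def L_tilde(theta):
        # Bernstein polynomial (Definition 3) through the noisy grid values.
        return sum(grid_vals[v] * math.comb(k, v)
                   * theta**v * (1 - theta)**(k - v) for v in range(k + 1))
    return L_tilde

# Example: squared loss on [0, 1]; for large n the minimizer of the private
# surrogate should be close to the true empirical minimizer (the data mean).
rng = np.random.default_rng(0)
data = rng.uniform(0.2, 0.8, size=200_000)
L = local_bernstein_1d(data, lambda th, x: (th - x) ** 2, k=8, eps=4.0, rng=1)
grid = np.linspace(0, 1, 201)
theta_priv = grid[np.argmin([L(t) for t in grid])]
```

In higher dimensions the grid has (k+1)^p points, which is exactly the exponential per-player cost that Section 5 then removes.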
So, to achieve error α, with probability at least 1 − β, we have the sample complexity:

n = Ω̃( max{ 4^{p(p+1)} log(1/β) D_p^2 p ε^{−2} α^{−4}, log(1/β) 4^{p(p+1)} / (ε^2 D_p^2) } ).   (4)

It is worth noticing from (3) that as h/p grows, the dependence on α decreases. Thus, for loss functions that are (∞, T)-smooth, we can get a smaller dependency than the α^{−4} term in (4). For example, if we take h = 2p, then the sample complexity is n = Ω( max{ c^{p^2} log(1/β) D_{2p} √p ε^{−2} α^{−3}, c_2^{p^2} log(1/β) / (ε^2 D_{2p}^2) } ) for some constants c, c_2. When h → ∞, the dependency on the error becomes α^{−2}, which is the optimal bound, even for convex functions.

Our analysis of the empirical excess risk does not use the convexity assumption. While this gives a bound which is not optimal, even for p = 1, it also means that our result holds for non-convex loss functions and constraint sets, as long as the losses are smooth enough.

From (4), we can see that our sample complexity is lower than the one in [19] when α ≤ O(1/16^p). Note that to achieve the best performance for the ERM problem in low dimensional space, the error is quite often set to be extremely small, e.g., α = 10^{−10} ∼ 10^{−14} [10].

Using the convexity assumption on the loss function and a lemma in [18], we can also give a bound on the population excess risk; details are in the Supplementary Material.

Corollaries 1 and 2 provide answers to our motivating questions.
That is, for loss functions which are (8, T)-smooth, we can obtain a lower sample complexity; if they are (∞, T)-smooth, there is an ε-LDP algorithm for the empirical and population excess risks achieving error α with a sample complexity whose dependence on α is independent of the dimensionality p. This result does not contradict the results of Smith et al. [19]: the example used there to show the unavoidable dependency of the sample complexity on α^{−Ω(p)} is actually non-smooth.

However, in our result for the (∞, T)-smooth case, as in the one by Smith et al. [19], there is still a dependency of the sample complexity on a term c^p, for some constant c. Which condition would allow a sample complexity independent of this term remains an open question. We leave it for future research and focus instead on the efficiency and further applications of our method.

5 More Efficient Algorithms

Algorithm 2 has computation time and communication complexity for each player which are exponential in the dimensionality. This is clearly problematic for any realistic practical application. For this reason, in this section we study more efficient algorithms. For convenience, in this part we focus only on the case of (∞, T)-smooth loss functions; the results can easily be extended to general cases.

Consider the following lemma, showing an ε-LDP algorithm for computing a p-dimensional average (notice the extra conditions on n and p compared with Lemma 1).

Lemma 2.
[16] Consider player i ∈ [n] holding data v_i ∈ R^p with each coordinate between 0 and b. Then for 0 < β < 1 and ε > 0 such that n ≥ 8p log(8p/β) and √n ≥ (12/ε) √(log(32/β)), there is an ε-LDP algorithm, LDP-AVG, such that with probability at least 1 − β the output a ∈ R^p satisfies: max_{j∈[p]} |a_j − (1/n) Σ_{i=1}^n [v_i]_j| ≤ O( (bp/(√n ε)) √(log(p/β)) ).³ Moreover, the computation cost for each user is O(1).

³Note that here we use a weak version of their result.

By using this lemma and discretizing the grid with some interval steps, we can design an algorithm which requires O(1) computation time and O(log n) bits of communication per player (see Supplementary Material). However, we would like to do even better and obtain constant communication complexity. Instead of discretizing the grid, we apply a technique, first proposed by Bassily and Smith [2], which permits to transform any 'sampling resilient' ε-LDP protocol into a protocol with 1-bit communication complexity. Roughly speaking, a protocol is sampling resilient if its output on any dataset S can be approximated well by its output on a random subset of half of the players.

Since our algorithm only uses the LDP-AVG protocol, we can show that it is indeed sampling resilient. Inspired by this result, we propose Algorithm 3 and obtain the following theorem.

Theorem 4. For 0 < ε ≤ ln 2 and 0 < β < 1, Algorithm 3 is ε-LDP. If the loss function ℓ(·, x) is (∞, T)-smooth for all x ∈ D and n = Ω( max{ log(1/β) 4^{p(p+1)} / (ε^2 D_p^2), p(k+1)^p log(k+1), (1/ε^2) log(1/β) } ), then by setting k = O( ( D_p √(pn) ε / ( 2^{(p+1)p} √(log(1/β)) ) )^{1/(2p)} ), the results in Corollary 2 hold with probability at least 1 − 4β. Moreover, for each player the time complexity is O(1), and the communication complexity is 1 bit.

Algorithm 3 Player-Efficient Local Bernstein Mechanism with 1-bit communication per player
1: Input: Player i ∈ [n] holding data x_i ∈ D, public loss function ℓ : [0, 1]^p × D ↦ [0, 1], privacy parameter ε ≤ ln 2, and parameter k.
2: Preprocessing:
3: Generate n independent public strings
4: y_1 = Lap(1/ε), ..., y_n = Lap(1/ε).
5: Construct the grid T = {(v_1/k, ..., v_p/k)}_{v_1,...,v_p}, where (v_1, ..., v_p) ∈ {0, 1, ..., k}^p.
6: Partition [n] randomly into d = (k+1)^p subsets I_1, I_2, ..., I_d, and associate each I_j to a grid point T(j) ∈ T.
7: for each player i ∈ [n] do
8:   Find I_l such that i ∈ I_l. Calculate v_i = ℓ(T(l); x_i).
9:   Compute p_i = (1/2) · Pr[v_i + Lap(1/ε) = y_i] / Pr[Lap(1/ε) = y_i].
10:  Sample a bit b_i from Bernoulli(p_i) and send it to the server.
11: end for
12: for the server do
13:   for i = 1, ..., n do
14:     If b_i = 1, set z̃_i = y_i; otherwise z̃_i = 0.
15:   end for
16:   for each l ∈ [d] do
17:     Compute v_l = (n/|I_l|) Σ_{i∈I_l} z̃_i. Denote by (v_1/k, ..., v_p/k) ∈ T the grid point corresponding to I_l; then set L̂((v_1/k, ..., v_p/k); D) = v_l.
18:   end for
19:   Construct the Bernstein polynomial for the perturbed empirical loss L̃ as in Algorithm 2. Denote by L̃(·, D) the corresponding function.
20:   Compute θ_priv = arg min_{θ∈C} L̃(θ; D).
21: end for

Now we study the algorithm from the perspective of the server's complexity. The polynomial construction takes O(n) time; the most inefficient part is finding θ_priv = arg min_{θ∈C} L̃(θ, D). In fact, this function may be non-convex; but unlike general non-convex functions, it can be α-uniformly approximated by a convex function L̂(·; D) if the loss function is convex (by the proof of Theorem 3), although we do not have access to it. Thus, we can see this problem as an instance of Approximately-Convex Optimization, which has been studied recently by Risteski and Li [17].

Definition 7. [17] We say that a convex set C is µ-well-conditioned for µ ≥ 1 if there exists a function F : R^p ↦ R such that C = {x | F(x) ≤ 0} and for every x ∈ ∂C: ‖∇²F(x)‖_2 / ‖∇F(x)‖_2 ≤ µ.

Lemma 3 (Theorem 3.2 in [17]).
Let \u0001, \u2206 be two real numbers such that \u2206 \u2264 max{ \u00012\np}\u00d7 1\n\u221a\n16348.\n\u00b5\nThen, there exists an algorithm A such that for any given \u2206-approximate convex function \u02dcf over a \u00b5-\nwell-conditioned convex set C \u2286 Rp of diameter 1 (that is, there exists a 1-Lipschitz convex function\nf : C (cid:55)\u2192 R such that for every x \u2208 C,|f (x) \u2212 \u02dcf (x)| \u2264 \u2206), A returns a point \u02dcx \u2208 C with probability\n\u03b4 ) and with the following guarantee \u02dcf (\u02dcx) \u2264 minx\u2208C \u02dcf (x) + \u0001.\nat least 1 \u2212 \u03b4 in time Poly(p, 1\nBased on Lemma 3 (for \u02dcL(\u03b8; D)) and Corollary 2, and taking \u0001 = O(p\u03b1), we have the following.\nTheorem 5. Under\n=\npp\u0001\u22122\u03b1\u22124), that the loss function (cid:96)(\u00b7, x) is 1-Lipschitz and convex for\n\u02dc\u2126(4p(p+1) log(1/\u03b2)D2\nevery x \u2208 D, that the constraint set C is convex and (cid:107)C(cid:107)2 \u2264 1, and satis\ufb01es \u00b5-well-condition\nproperty (see De\ufb01nition 7), if the error \u03b1 satis\ufb01es \u03b1 \u2264 C \u00b5\n\u221a\np for some universal constant C, then\nthere is an algorithm A which runs in Poly(n, 1\n\u03b2 ) time4 for the server, and with probability\n1 \u2212 2\u03b2 the output \u02dc\u03b8priv of A satis\ufb01es \u02dcL(\u02dc\u03b8priv; D) \u2264 min\u03b8\u2208C \u02dcL(\u03b8; D) + O(p\u03b1), which means that\nErrD(\u02dc\u03b8priv) \u2264 O(p\u03b1).\nCombining with Theorem 4, 5 and Corollary 2, and taking \u03b1 = \u03b1\nTheorem 6. 
Under the conditions of Corollary 2 and Theorems 4 and 5, and for any α with Cµ/√p > α > 0, if we further set n = Ω̃(4^{p(p+1)} log(1/β) D_p² p⁵ ε^{−2} α^{−4}), then there is an ε-LDP algorithm with O(1) running time and 1-bit communication per player, and Poly(1/α, log(1/β)) running time for the server. Furthermore, with probability at least 1 − 5β, the output θ̃_priv satisfies Err_D(θ̃_priv) ≤ O(α).

6 High Dimensional Case

In the previous sections, p is assumed to be either constant or low. In this section, we present a general method for a family of loss functions, called generalized linear functions, in high dimensions.

A function ℓ(w, x) is called a Generalized Linear Function (GLF) [18] if ℓ(w, x) = f(⟨w, y⟩, z) for x = (y, z), where y ∈ R^p is the data and z is the label. GLFs form a rather general family that includes many frequently encountered loss functions, such as logistic regression, hinge loss, and linear regression. We assume that the dataset satisfies ‖y_i‖ ≤ 1 and ‖z_i‖ ≤ 1 for all i ∈ [n]. We also assume that f is 1-Lipschitz and convex, and that the constraint set C satisfies ‖C‖₂ ≤ 1 and is isotropic.⁵

Our algorithm is inspired by the one in [11]. We first perform a dimension reduction on the whole dataset; that is, we form D′ = {(Φy₁, z₁), …, (Φyₙ, zₙ)}, where Φ ∈ R^{m×p}. Then we run a modified version of the algorithm in [19] (since the algorithm in [19] assumes log n ≥ p). After obtaining the private estimator w̄ ∈ R^m, we use a compressive sensing technique (by solving an optimization problem [22]) to recover w_priv ∈ R^p.
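To make the GLF form concrete, here is a minimal sketch of logistic loss written in the form ℓ(w, x) = f(⟨w, y⟩, z). The function names `f` and `glf_loss` and the toy numbers are ours, not from the paper; f(u, z) = log(1 + exp(−zu)) is convex and 1-Lipschitz in u when |z| ≤ 1, matching the assumptions above.

```python
import numpy as np

# Logistic loss as a generalized linear function l(w, x) = f(<w, y>, z):
# f(u, z) = log(1 + exp(-z * u)) is convex and 1-Lipschitz in u for |z| <= 1.
def f(u, z):
    return np.log1p(np.exp(-z * u))

def glf_loss(w, y, z):
    # Only the inner product <w, y> and the label z enter the loss.
    return f(np.dot(w, y), z)

w = np.array([0.3, -0.4])   # ||w||_2 = 0.5 <= 1
y = np.array([0.3, 0.4])    # ||y||_2 = 0.5 <= 1
z = 1.0                     # label with |z| <= 1
print(glf_loss(w, y, z))    # ≈ 0.7288
```

Hinge loss (f(u, z) = max(0, 1 − zu)) and absolute-error linear regression (f(u, z) = |u − z|) fit the same template.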
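The reduce-then-recover pipeline just described can be sketched as follows. This is a non-private illustration under stated assumptions: the modified ε-LDP solver of [19] is replaced by the exact reduced-space image Φw* (so no noise is added), and C is taken to be the ℓ₁ ball B₁^p as in Corollary 3, so that the recovery step min ‖w‖_C subject to Φw = w̄ becomes ℓ₁ minimization, solvable as a linear program.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

p, m = 200, 40                          # high-dimensional: reduced dim m << p
w_star = np.zeros(p)
w_star[:3] = [0.5, -0.3, 0.2]           # sparse, ||w*||_1 = 1, so w* lies in C = B_1^p

# Dimension reduction as in Lemma 4: Phi = (1/sqrt(m)) * Phi_tilde,
# where Phi_tilde has i.i.d. standard Gaussian (hence sub-Gaussian) rows.
Phi = rng.standard_normal((m, p)) / np.sqrt(m)

# Stand-in for the private reduced-space estimator w_bar in R^m
# (the paper obtains it via the modified eps-LDP algorithm of [19]).
w_bar = Phi @ w_star

# Step 7 of Algorithm 4 with C = B_1^p: min ||w||_1 s.t. Phi w = w_bar,
# written as an LP over (w_plus, w_minus) with w = w_plus - w_minus >= 0 split.
c = np.ones(2 * p)                      # objective: sum(w_plus) + sum(w_minus)
res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=w_bar,
              bounds=[(0, None)] * (2 * p))
w_priv = res.x[:p] - res.x[p:]

print(np.linalg.norm(w_priv - w_star))  # small: the sparse w* is recovered
```

With m = 40 Gaussian measurements and a 3-sparse target in R^200, ℓ₁ minimization recovers w* up to LP tolerance, which is the compressive sensing step [22] behind Algorithm 4.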
Our method is based on the following lemma in [4].

Lemma 4. Let Φ̃ ∈ R^{m×p} be a random matrix whose rows are i.i.d. mean-zero, isotropic, sub-Gaussian random vectors in R^p with ψ = ‖Φ̃_i‖_{ψ₂}. Let Φ = (1/√m) Φ̃ and let S be a set of points in R^p. Then there is a constant C > 0 such that, for any 0 < γ, β < 1, Pr[sup_{a∈S} |‖Φa‖₂ − ‖a‖₂| ≥ γ‖a‖₂] ≤ β, provided that m ≥ C ψ⁴ γ^{−2} max{G_S², log(1/β)}.

Theorem 7. Under the assumptions above, for any ε ≤ 1/4, Algorithm 4 is O(ε)-LDP. Moreover, setting m = Θ(ψ⁴ (G_C + √(log n))² log(n/β) / γ²), where γ = Θ((√(G_C + √(log n)) · log(1/β) · ⁴√(log(n/β))) / (√n ε)), then with

⁴Note that since we assume n to be at least exponential in p, the algorithm is not fully polynomial.
⁵A convex set K is isotropic if a random vector chosen uniformly from K according to the volume is isotropic, where a random vector a is isotropic if E[⟨a, b⟩²] = ‖b‖₂² for all b ∈ R^p; examples include suitably scaled polytopes.

Algorithm 4 DR-ERM-LDP
1: Input: Player i ∈ [n] holding data x_i = (y_i, z_i) ∈ D, where ‖y_i‖ ≤ 1; privacy parameter ε.
2: The server generates a random sub-Gaussian matrix Φ ∈ R^{m×p} as in Lemma 4 and sends the seed of this random matrix to all players.
3: for each player i do
4:     Calculate x′_i = (Φy_i, z_i).
5:     Run the modified ε-LDP algorithm of [19] (see the Supplementary Material for more details) on D′ = {x′_i}_{i=1}^n with constraint set C = ΦC and loss function f.
The server gets the output w̄ ∈ R^m.
6: end for
7: The server solves the following problem: w_priv = arg min_{w∈R^p} ‖w‖_C, subject to Φw = w̄.

probability at least 1 − β, Err_D(w_priv) = Õ(((log(1/β) · ψ · √(G_C + √(log n)) · ⁴√(log(n/β))) / (√n ε))^{1/(1+m)}), where ψ is the sub-Gaussian norm of the distribution of Φ and G_C is the Gaussian width of C.

Corollary 3. If Φ is a standard Gaussian random matrix, C is the ℓ₁-norm ball B₁^p or the distribution simplex in R^p, and n ≪ p ≤ e^{cn} for some constant c, then the bound in Theorem 7 is just Õ(((log(1/β) · √(log p) · ⁴√(log(n/β))) / (√n ε))^{1/(1+m)}), where m = O(nε² log p √(log(n/β))). Note that the bound in this case is always better than the one in Theorem 1, since it is always O(1).

References

[1] Francesco Aldà and Benjamin I. P. Rubinstein. The Bernstein mechanism: Function release under differential privacy. In AAAI, pages 1705–1711, 2017.

[2] Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 127–135. ACM, 2015.

[3] Amos Beimel, Kobbi Nissim, and Eran Omri. Distributed private data analysis: Simultaneously solving how and what. In CRYPTO, volume 5157, pages 451–468. Springer, 2008.

[4] Sjoerd Dirksen. Dimensionality reduction with subgaussian matrices: A unified theory. Foundations of Computational Mathematics, 16(5):1367–1396, 2016.

[5] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429–438.
IEEE, 2013.

[6] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, (just-accepted), 2017.

[7] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876, pages 265–284. Springer, 2006.

[8] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067. ACM, 2014.

[9] Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, Matthew Graham, Mark Kutzbach, and Lars Vilhuber. Utility cost of formal privacy for releasing national employer-employee statistics. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 1339–1354, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4197-4. doi: 10.1145/3035918.3035940. URL http://doi.acm.org/10.1145/3035918.3035940.

[10] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315–323, 2013.

[11] Shiva Prasad Kasiviswanathan and Hongxia Jin. Efficient private empirical risk minimization for high-dimensional learning. In International Conference on Machine Learning, pages 488–497, 2016.

[12] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.

[13] G. G. Lorentz. Bernstein Polynomials. AMS Chelsea Publishing Series. Chelsea Publishing Company, 1986. ISBN 9780828403238.

[14] Charles Micchelli. The saturation class and iterates of the Bernstein polynomials.
Journal of Approximation Theory, 8(1):1–18, 1973.

[15] Joe Near. Differential privacy at scale: Uber and Berkeley collaboration. In Enigma 2018, Santa Clara, CA, 2018. USENIX Association.

[16] Kobbi Nissim and Uri Stemmer. Clustering algorithms for the centralized and local models. CoRR, abs/1707.04766, 2017.

[17] Andrej Risteski and Yuanzhi Li. Algorithms and matching lower bounds for approximately-convex optimization. In Advances in Neural Information Processing Systems, pages 4745–4753, 2016.

[18] Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Stochastic convex optimization. In COLT, 2009.

[19] Adam Smith, Abhradeep Thakurta, and Jalaj Upadhyay. Is interaction necessary for distributed private learning? In IEEE Symposium on Security and Privacy, 2017.

[20] Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and XiaoFeng Wang. Privacy loss in Apple's implementation of differential privacy on macOS 10.12. CoRR, abs/1709.02753, 2017.

[21] Justin Thaler, Jonathan Ullman, and Salil Vadhan. Faster algorithms for privately releasing marginals. In International Colloquium on Automata, Languages, and Programming, pages 810–821. Springer, 2012.

[22] Roman Vershynin. Estimation in high dimensions: A geometric perspective. In Sampling Theory, a Renaissance, pages 3–66. Springer, 2015.

[23] Di Wang and Jinhui Xu. Differentially private empirical risk minimization with smooth non-convex loss functions: A non-stationary view. In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, USA, January 27–February 1, 2019.

[24] Di Wang, Minwei Ye, and Jinhui Xu. Differentially private empirical risk minimization revisited: Faster and more general. In Advances in Neural Information Processing Systems, pages 2722–2731, 2017.

[25] Di Wang, Adam Smith, and Jinhui Xu.
Differentially private empirical risk minimization in non-interactive local model via polynomial of inner product approximation. In Algorithmic Learning Theory, ALT 2019, 22–24 March 2019, Chicago, IL, USA, 2019.

[26] Ziteng Wang, Chi Jin, Kai Fan, Jiaqi Zhang, Junliang Huang, Yiqiao Zhong, and Liwei Wang. Differentially private data releasing for smooth queries. The Journal of Machine Learning Research, 17(1):1779–1820, 2016.

[27] Kai Zheng, Wenlong Mou, and Liwei Wang. Collect at once, use effectively: Making non-interactive locally private learning possible. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, pages 4130–4139, 2017.