{"title": "Practical Locally Private Heavy Hitters", "book": "Advances in Neural Information Processing Systems", "page_first": 2288, "page_last": 2296, "abstract": "We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error -- TreeHist and Bitstogram. In both algorithms, server running time is $\\tilde O(n)$ and user running time is $\\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $\\tilde O(n^{5/2})$ server time and $\\tilde O(n^{3/2})$ user time. With a typically large number of participants in local algorithms ($n$ in the millions), this reduction in time complexity, in particular at the user side, is crucial for the use of such algorithms in practice. We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code.", "full_text": "Practical Locally Private Heavy Hitters\n\nRaef Bassily\u2217\n\nKobbi Nissim\u2020\n\nUri Stemmer\u2021\n\nAbhradeep Thakurta\u00a7\n\nAbstract\n\nWe present new practical local differentially private heavy hitters algorithms\nachieving optimal or near-optimal worst-case error \u2013 TreeHist and Bitstogram.\nIn both algorithms, server running time is \u02dcO(n) and user running time is \u02dcO(1),\nhence improving on the prior state-of-the-art result of Bassily and Smith [STOC\n2015] requiring \u02dcO(n5/2) server time and \u02dcO(n3/2) user time. With a typically\nlarge number of participants in local algorithms (n in the millions), this reduction\nin time complexity, in particular at the user side, is crucial for the use of such\nalgorithms in practice. We implemented Algorithm TreeHist to verify our theo-\nretical analysis and compared its performance with the performance of Google\u2019s\nRAPPOR code.\n\n1\n\nIntroduction\n\nWe revisit the problem of computing heavy hitters with local differential privacy. 
Such computations have already been implemented to provide organizations with valuable information about their user base, while giving users the strong guarantee that their privacy is preserved even if the organization is subpoenaed for the entire information seen during an execution. Two prominent examples are Google's use of RAPPOR in the Chrome browser [10] and Apple's use of differential privacy in iOS-10 [16]. These tools are used for learning new words typed by users and identifying frequently used emojis and frequently accessed websites.

Differential privacy in the local model. Differential privacy [9] provides a framework for rigorously analyzing privacy risk, and hence can help organizations mitigate users' privacy concerns: it ensures that what is learned about any individual user would be (almost) the same whether or not the user's information is used as input to an analysis.

Differentially private algorithms work in two main modalities -- the curator model and the local model. The curator model assumes a trusted centralized curator that collects all the personal information and then analyzes it. The local model, on the other hand, does not involve a central repository. Instead, each piece of personal information is randomized by its provider to protect privacy even if all information provided to the analysis is revealed. Holding a central repository of personal information can become a liability to organizations in the face of security breaches, employee misconduct, subpoenas, etc. This makes the local model attractive for implementation. Indeed, in the last few years Google and Apple have deployed locally differentially private analyses [10, 16].

Challenges of the local model. A disadvantage of the local model is that it requires introducing noise at a significantly higher level than what is required in the curator model.
Furthermore, some tasks which are possible in the curator model are impossible in the local model [9, 14, 7]. To see the effect of noise, consider estimating the number of HIV positives in a given population of n participants. In the curator model, it suffices to add Laplace noise of magnitude O(1/ε) [9], i.e., independent of n. In contrast, a lower bound of Ω(√n/ε) is known for the local model [7]. A higher noise level implies that the number of participants n needs to be large (maybe in the millions for a reasonable choice of ε). An important consequence is that practical local algorithms must exhibit low time, space, and communication complexity, especially at the user side. This is the problem addressed in our work.

*Department of Computer Science & Engineering, The Ohio State University. bassily.1@osu.edu
†Department of Computer Science, Georgetown University. kobbi.nissim@georgetown.edu
‡Center for Research on Computation and Society (CRCS), Harvard University. u@uri.co.il
§Department of Computer Science, University of California Santa Cruz. aguhatha@ucsc.edu

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Heavy hitters and histograms in the local model. Assume each of n users holds an element x_i taken from a domain of size d. A histogram of this data lists (an estimate of) the multiplicity of each domain element in the data. When d is large, a succinct representation of the histogram is desired, either in the form of a frequency oracle -- allowing one to approximate the multiplicity of any domain element -- or in the form of heavy hitters -- listing the multiplicities of the most frequent domain elements, implicitly considering the multiplicities of other domain elements as zero.
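To make the noise gap concrete, the following minimal sketch (not one of this paper's protocols) shows binary randomized response, the canonical ε-LDP primitive for counting: each user flips their private bit with a fixed probability and the server debiases the sum, incurring error on the order of √n for a constant ε. All function and variable names here are illustrative.

```python
import math
import random

def randomize(bit, eps, rng):
    # Report the true bit w.p. e^eps/(e^eps + 1), else its complement.
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p else 1 - bit

def estimate_count(reports, eps):
    # Debias the sum of randomized bits into an unbiased count estimate.
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return (sum(reports) - len(reports) * (1 - p)) / (2 * p - 1)

rng = random.Random(0)
eps, n, true_count = 1.0, 100_000, 30_000
bits = [1] * true_count + [0] * (n - true_count)
reports = [randomize(b, eps, rng) for b in bits]
est = estimate_count(reports, eps)  # close to 30_000, up to O(sqrt(n)) noise
```

With n = 100,000 the estimate typically lands within a few hundred of the truth, illustrating why local protocols need large populations.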
The problem of computing histograms with differential privacy has attracted significant attention both in the curator model [9, 5, 6] and the local model [13, 10, 4]. Of relevance is the work in [15].

We briefly report on the state-of-the-art heavy hitters algorithms of Bassily and Smith [4] and Thakurta et al. [16], which are most relevant for the current work. Bassily and Smith provide matching lower and upper bounds of Θ(√(n log(d))/ε) on the worst-case error of local heavy hitters algorithms. Their local algorithm exhibits optimal communication but a rather high time complexity: server running time is Õ(n^{5/2}) and, crucially, user running time is Õ(n^{3/2}) -- complexity that severely hampers the practicality of this algorithm. The construction by Thakurta et al. is a heuristic with no bounds on server running time and accuracy.¹ User computation time is Õ(1), a significant improvement over [4]. See Table 1.

Our contributions. The focus of this work is on the design of locally private heavy hitters algorithms with near-optimal error, keeping time, space, and communication complexity minimal. We provide two new constructions of heavy hitters algorithms, TreeHist and Bitstogram, that apply different techniques and achieve similar performance. We implemented Algorithm TreeHist and provide measurements in comparison with RAPPOR [10] (the only currently available implementation for local histograms). Our measurements are performed with a setting that is favorable to RAPPOR (i.e., a small input domain), yet they indicate that Algorithm TreeHist performs better than RAPPOR in terms of noise level.

Table 1 details various performance parameters of algorithms TreeHist and Bitstogram, and the reader can check that these are similar up to small factors which we ignore in the following discussion.
Comparing with [4], we improve time complexity both at the server (reduced from Õ(n^{5/2}) to Õ(n)) and at the user (reduced from Õ(n^{3/2}) to O(max(log n, log d)²)). Comparing with [16], we get provable bounds on the server running time and worst-case error. Note that Algorithm Bitstogram achieves optimal worst-case error whereas Algorithm TreeHist is almost optimal, up to a factor of √(log(n)).

Performance metric          TreeHist (this work)     Bitstogram (this work)   Bassily and Smith [4]²
Server time                 Õ(n)                     Õ(n)                     Õ(n^{5/2})
User time                   Õ(1)                     Õ(1)                     Õ(n^{3/2})
Server processing memory    Õ(√n)                    Õ(√n)                    O(n²)
User memory                 Õ(1)                     Õ(1)                     Õ(n^{3/2})
Communication/user          O(1)                     O(1)                     O(1)
Public randomness/user³     Õ(1)                     Õ(1)                     Õ(n^{3/2})
Worst-case error            O(√(n log(n) log(d)))    O(√(n log(d)))           O(√(n log(d)))

Table 1: Achievable performance of our protocols, and comparison to the prior state of the art by Bassily and Smith [4]. For simplicity, the Õ notation hides logarithmic factors in n and d. Dependencies on the failure probability β and the privacy parameter ε are omitted.

¹The underlying construction in [16] is of a frequency oracle.

Elements of the constructions. Main details of our constructions are presented in Sections 3 and 4. Both our algorithms make use of frequency oracles -- data structures that allow estimating various counts.

Algorithm TreeHist identifies heavy hitters and estimates their frequencies by scanning the levels of a binary prefix tree whose leaves correspond to dictionary items. The recovery of the heavy hitters is done in a bit-by-bit manner.
As the algorithm progresses down the tree, it prunes all the nodes that cannot be prefixes of heavy hitters, hence leaving Õ(√n) nodes at every depth. This is done by making queries to a frequency oracle. Once the algorithm reaches the final level of the tree, it identifies the list of heavy hitters. It then invokes the frequency oracle once more on those particular items to obtain more accurate estimates for their frequencies.

Algorithm Bitstogram hashes the input domain into a domain of size roughly √n. The observation behind this algorithm is that if a heavy hitter x does not collide with other heavy hitters, then (h(x), x_i) would have a significantly higher count than (h(x), ¬x_i), where x_i is the i-th bit of x. This allows recovering all bits of x in parallel given an appropriate frequency oracle.

We remark that even though we describe our protocols as operating in phases (e.g., scanning the levels of a binary tree), these phases are done in parallel, and our constructions are non-interactive. All users participate simultaneously, each sending a single message to the server. We also remark that while our focus is on algorithms achieving the optimal (i.e., smallest possible) error, our algorithms are also applicable when the server is interested in a larger error, in which case the server can choose a random subsample of the users to participate in the computation. This will reduce the server runtime and memory usage, and also reduce the privacy cost in the sense that the unsampled users get perfect privacy (so the server might use their data in another analysis).

2 Preliminaries

2.1 Definitions and Notation

Dictionary and users' items: Let V = [d]. We consider a set of n users, where each user i ∈ [n] has some item v_i ∈ V.
Sometimes, we will also use v_i to refer to the binary representation of v_i when it is clear from the context.

Frequencies: For each item v ∈ V, we define the frequency f(v) of that item as the number of users holding it, namely, f(v) ≜ Σ_{i∈[n]} 1(v_i = v), where 1(E) is the indicator function of the event E.

A frequency oracle is a data structure together with an algorithm that, for any given v ∈ V, allows computing an estimate f̂(v) of the frequency f(v).

A succinct histogram is a data structure that provides a (short) list of items v̂_1, ..., v̂_k, called the heavy hitters, together with estimates for their frequencies (f̂(v̂_j) : j ∈ [k]). The frequencies of the items not in the list are implicitly estimated as f̂(v) = 0. We measure the error in a succinct histogram by the ℓ∞ distance between the estimated and true frequencies, max_{v∈[d]} |f̂(v) − f(v)|. We will also consider the maximum error in the estimated frequencies restricted to the items in the list, that is, max_{v̂_j : j∈[k]} |f̂(v̂_j) − f(v̂_j)|.

If a succinct histogram aims to provide ℓ∞ error η, the list does not need to contain more than O(1/η) items (since items with estimated frequencies below η may be omitted from the list, at the price of at most doubling the error).

²The user's run-time and memory in [4] can be improved to O(n) if one assumes random access to the public randomness, which we do not assume in this work.

³Our protocols can be implemented without public randomness while attaining essentially the same performance.

2.2 Local Differential Privacy

In the local model, an algorithm A : V → Z accesses the database v = (v_1, ..., v_n) ∈ V^n only via an oracle that, given any index i ∈ [n], runs a local randomized algorithm (local randomizer) R : V → Z̃ on input v_i and returns the output R(v_i) to A.

Definition 2.1 (Local differential privacy [9, 11]). An algorithm satisfies ε-local differential privacy (LDP) if it accesses the database v = (v_1, ..., v_n) ∈ V^n only via invocations of a local randomizer R, and if for all i ∈ [n], writing R^(1), ..., R^(k) for the algorithm's invocations of R on the data sample v_i, the algorithm A(·) ≜ (R^(1)(·), R^(2)(·), ..., R^(k)(·)) is ε-differentially private. That is, for any pair of data samples v, v′ ∈ V and every S ⊆ Range(A), Pr[A(v) ∈ S] ≤ e^ε · Pr[A(v′) ∈ S].

3 The TreeHist Protocol

In this section, we briefly give an overview of our construction, which is based on a compressed, noisy version of the count sketch. To maintain clarity of the main ideas, we give here a high-level description of our construction. We refer to the full version of this work [3] for a detailed description of the full construction.

We first introduce some objects and public parameters that will be used in the construction:

Prefixes: For a binary string v, we will use v[1 : ℓ] to denote the ℓ-bit prefix of v. Let V = {v ∈ {0,1}^ℓ for some ℓ ∈ [log d]}. Note that the elements of V are arranged in a binary prefix tree of depth log d, where the nodes at level ℓ of the tree represent all binary strings of length ℓ. The items of the dictionary V represent the bottommost level of that tree.

Hashes: Let t, m be positive integers to be specified later. We will consider a set of t pairs of hash functions {(h_1, g_1), . . .
, (ht, gt)}, where for each i \u2208 [t], hi : V \u2192 [m] and gi : V \u2192 {\u22121, +1} are\nindependently and uniformly chosen pairwise independent hash functions.\n\nBasis matrix: Let W \u2208(cid:8)\u22121, +1(cid:9)m\u00d7m be\n\n\u221a\n\nm\u00b7Hm where Hm is the Hadamard transform matrix\nof size m. It is important to note that we do not need to store this matrix. The value of any entry in\nthis matrix can be computed in O(log m) bit operations given the (row, column) index of that entry.\nGlobal parameters: The total number of users n, the size of the Hadamard matrix m, the num-\nber of hash pairs t, the privacy parameter \u0001, the con\ufb01dence parameter \u03b2, and the hash functions\n\n(cid:8)(h1, g1), . . . , (ht, gt)(cid:9) are assumed to be public information. We set t = O(log(n/\u03b2)) and\n\n(cid:18)(cid:113) n\n\n(cid:19)\n\n.\n\nm = O\n\nlog(n/\u03b2)\n\nPublic randomness: In addition to the t hash pairs {(h1, g1), . . . , (ht, gt)}, we assume that the\nserver creates a random partition \u03a0 : [n] \u2192 [log d] \u00d7 [t] that assigns to each user i \u2208 [n] a random\npair ((cid:96)i, ji) \u2190 [log(d)] \u00d7 [t], and another random function Q : [n] \u2190 [m] that assigns4 to each user\ni a uniformly random index ri \u2190 [m]. We assume that such random indices (cid:96)i, ji, ri are shared\nbetween the server and each user.\nFirst, we describe the two main modules of our protocol.\n\n3.1 A local randomizer: LocalRnd\nFor each i \u2208 [n], user i runs her own independent copy of a local randomizer, denoted as\nLocalRnd, to generate her private report. LocalRnd of user i starts by acquiring the index triple\n((cid:96)i, ji, ri) \u2190 [log d] \u00d7 [t] \u00d7 [m] from public randomness. 
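The O(log m) per-entry computation of the basis matrix W mentioned above can be sketched as follows. It assumes the standard Sylvester ordering of the Hadamard matrix with m a power of two, in which entry (r, c) of the ±1 matrix has sign (−1) raised to the popcount of the bitwise AND of r and c; this is a sketch, not the authors' implementation.

```python
def hadamard_entry(r, c):
    # Entry (r, c) of the +-1 (Sylvester-ordered) Hadamard matrix:
    # sign is (-1)^popcount(r AND c), costing O(log m) bit operations.
    return 1 - 2 * (bin(r & c).count("1") & 1)

def sylvester(m):
    # Explicit Sylvester construction, used only to cross-check the formula.
    H = [[1]]
    while len(H) < m:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H8 = sylvester(8)
ok = all(H8[r][c] == hadamard_entry(r, c) for r in range(8) for c in range(8))
```

Because only single entries are ever needed, neither the server nor any user has to materialize the m × m matrix.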
For each user, LocalRnd is invoked twice in the full protocol: once during the first phase of the protocol (called the pruning phase), where the high-frequency items (heavy hitters) are identified, and a second time during the final phase (the estimation phase), to enable the protocol to get better estimates for the frequencies of the heavy hitters.

⁴We could have grouped Π and Q into one random function mapping [n] to [log d] × [t] × [m]; however, we prefer to split them for clarity of exposition, as each source of randomness will be used for a different role.

In the first invocation, LocalRnd of user i performs its computation on the ℓ_i-th prefix of the item v_i of user i, while in the second invocation it performs the computation on the entire string v_i. Apart from this, in both invocations LocalRnd follows similar steps. It first selects the hash pair (h_{j_i}, g_{j_i}) and computes c_i = h_{j_i}(v_i[1 : ℓ̃]) (where ℓ̃ = ℓ_i in the first invocation and ℓ̃ = log d in the second invocation, and v_i[1 : ℓ̃] is the ℓ̃-bit prefix of v_i); then it computes a bit x_i = g_{j_i}(v_i[1 : ℓ̃]) · W_{r_i, c_i} (where W_{r,c} denotes the (r, c) entry of the basis matrix W). Finally, to guarantee ε-local differential privacy, it generates a randomized response y_i based on x_i (i.e., y_i = x_i with probability e^{ε/2}/(1 + e^{ε/2}) and y_i = −x_i with probability 1/(1 + e^{ε/2})), which is sent to the server.

Our local randomizer can be thought of as a transformed, compressed (via sampling), and randomized version of the count sketch [8]. In particular, LocalRnd starts off with steps similar to the standard count sketch algorithm, but then deviates from it: it applies the Hadamard transform to the user's signal, then samples one bit from the result.
By doing so, we can achieve significant savings in space and communication without sacrificing accuracy.

3.2 A frequency oracle: FreqOracle

Suppose we want to allow the server to estimate the frequencies of some given subset V̂ ⊆ {0,1}^ℓ, for some given ℓ ∈ [log d], based on the noisy users' reports. We give a protocol, denoted FreqOracle, for accomplishing this task.

For each queried item v̂ ∈ V̂ and for each hash index j ∈ [t], FreqOracle computes c = h_j(v̂), then collects the noisy reports of the collection of users I_{ℓ,j} that contains every user i whose pair of prefix and hash indices (ℓ_i, j_i) matches (ℓ, j). Next, it estimates the inverse Hadamard transform of the compressed and noisy signal of each user in I_{ℓ,j}. In particular, for each i ∈ I_{ℓ,j}, it computes y_i · W_{r_i, c}, which can be described as a multiplication between y_i e_{r_i} (where e_{r_i} is the indicator vector with 1 at the r_i-th position) and the scaled Hadamard matrix W, followed by selecting the c-th entry of the resulting vector. This brings us back to the standard count sketch representation. It then sums all the results and multiplies the outcome by g_j(v̂) to obtain an estimate f̂_j(v̂) for the frequency of v̂. As in the count sketch algorithm, this is done for every j ∈ [t]; then FreqOracle obtains a high-confidence estimate by computing the median of all t frequency estimates.

3.3 The protocol: TreeHist

The protocol is easier to describe via operations over the nodes of the prefix tree V of depth log d (described earlier).
The protocol runs through two main phases: the pruning (or scanning) phase, and the final estimation phase.

In the pruning phase, the protocol scans the levels of the prefix tree, starting from the top level (which contains just 0 and 1) down to the bottom level (which contains all items of the dictionary). For a given node at level ℓ ∈ [log d], using FreqOracle as a subroutine, the protocol gets an estimate for the frequency of the corresponding ℓ-bit prefix. For any ℓ ∈ [log(d) − 1], before the protocol moves to level ℓ + 1 of the tree, it prunes all the nodes in level ℓ that cannot be prefixes of actual heavy hitters (high-frequency items in the dictionary). Then, as it moves to level ℓ + 1, the protocol considers only the children of the surviving nodes in level ℓ. The construction guarantees that, with high probability, the number of surviving nodes in each level cannot exceed O(√(n / (log(d) log(n)))). Hence, the total number of nodes queried by the protocol (i.e., submitted to FreqOracle) is at most O(√(n log(d) / log(n))).

In the second and final phase, after reaching the final level of the tree, the protocol has already identified a list of candidate heavy hitters; however, their estimated frequencies may not be as accurate as we desire, due to the large variance caused by the random partitioning of users across all the levels of the tree. Hence, it invokes the frequency oracle once more on those particular items, and this time the sampling variance is reduced, as the set of users is partitioned only across the t hash pairs (rather than across log(d) × t bins as in the pruning phase). By doing this, the server obtains more accurate estimates for the frequencies of the identified heavy hitters. The privacy and accuracy guarantees are stated below.
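As an illustration, the level-by-level pruning can be sketched on a toy instance in which exact counts stand in for the noisy FreqOracle estimates; the names and the fixed threshold below are illustrative, not the protocol's actual parameters.

```python
from collections import Counter

def heavy_prefixes(items, bits, threshold):
    # Scan the binary prefix tree level by level: keep only prefixes whose
    # (here: exact) frequency meets the threshold, then extend each survivor
    # by one bit. An exact Counter stands in for the noisy FreqOracle.
    freq = Counter(items)
    survivors = [""]
    for _ in range(bits):
        candidates = [p + b for p in survivors for b in "01"]
        survivors = [c for c in candidates
                     if sum(f for v, f in freq.items() if v.startswith(c)) >= threshold]
    return survivors

data = ["000"] * 6 + ["011"] * 5 + ["110"] * 1
hh = heavy_prefixes(data, bits=3, threshold=4)  # -> ["000", "011"]
```

The point of the pruning is visible even in this toy run: the subtree under the prefix "1" is discarded at the first level, so its leaves are never queried.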
The full details are given in the full version [3].

3.4 Privacy and Utility Guarantees

Theorem 3.1. Protocol TreeHist is ε-local differentially private.

Theorem 3.2. There is a number η = O(√(n log(n/β)) log(d)/ε) such that with probability at least 1 − β, the output list of the TreeHist protocol satisfies the following properties:

1. it contains all items v ∈ V whose true frequencies are above 3η;
2. it does not contain any item v ∈ V whose true frequency is below η;
3. every frequency estimate in the output list is accurate up to an error ≤ O(√(n log(n/β))/ε).

4 Locally Private Heavy Hitters -- Bit by Bit

We now present a simplified description of our second protocol, which captures most of the ideas. We refer the reader to the full version of this work for the complete details.

First Step: Frequency Oracle. Recall that a frequency oracle is a protocol that, after communicating with the users, outputs a data structure capable of approximating the frequency of every domain element v ∈ V. So, if we were to allow the server to have runtime linear in the domain size |V| = d, then a frequency oracle would suffice for computing histograms. As we are interested in protocols with a significantly lower runtime, we will only use a frequency oracle as a subroutine, and query it only for (roughly) √n elements.

Let Z ∈ {±1}^{d×n} be a matrix chosen uniformly at random, and assume that Z is publicly known.⁵ That is, for every domain element v ∈ V and every user j ∈ [n], we have a random bit Z[v, j] ∈ {±1}. As Z is publicly known, every user j can identify its corresponding bit Z[v_j, j], where v_j ∈ V is the input of user j.
Now consider a protocol in which users send randomized responses of their corresponding bits. That is, user j sends y_j = Z[v_j, j] w.p. 1/2 + ε/2 and sends y_j = −Z[v_j, j] w.p. 1/2 − ε/2. We can now estimate the frequency of every domain element v ∈ V as

a(v) = (1/ε) · Σ_{j∈[n]} y_j · Z[v, j].

To see that a(v) is accurate, observe that a(v) is the sum of n independent random variables (one for every user). For the users j holding the input v being estimated (that is, v_j = v), we will have that (1/ε) · E[y_j · Z[v, j]] = 1. For the other users, y_j and Z[v, j] are independent, and hence E[y_j · Z[v, j]] = E[y_j] · E[Z[v, j]] = 0. That is, a(v) can be expressed as the sum of n independent random variables: f(v) variables with expectation 1, and (n − f(v)) variables with expectation 0. The fact that a(v) is an accurate estimate of f(v) now follows from the Hoeffding bound.

Lemma 4.1 (Algorithm Hashtogram). Let ε ≤ 1. Algorithm Hashtogram satisfies ε-LDP. Furthermore, with probability at least 1 − β, Algorithm Hashtogram answers every query v ∈ V with a(v) satisfying: |a(v) − f(v)| ≤ O((1/ε) · √(n log(nd/β))).

Second Step: Identifying Heavy Hitters. Let us assume that we have a frequency oracle protocol with worst-case error τ. We now want to use our frequency oracle in order to construct a protocol that operates in two steps: first, it identifies a small set of potential "heavy hitters", i.e., domain elements that appear in the database at least 2τ times.
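Before describing the second step, the first-step estimator a(v) above can be sketched in a small simulation; names are illustrative, and the dense d × n matrix Z is only viable for a tiny domain (the full version explains how Z can be described succinctly).

```python
import random

def hashtogram_sim(inputs, domain, eps, rng):
    # User j reports y_j = Z[v_j, j] w.p. 1/2 + eps/2 and -Z[v_j, j] otherwise;
    # a(v) = (1/eps) * sum_j y_j * Z[v, j] is an unbiased frequency estimate.
    n = len(inputs)
    Z = {(v, j): rng.choice((-1, 1)) for v in domain for j in range(n)}
    reports = [Z[(inputs[j], j)] * (1 if rng.random() < 0.5 + eps / 2 else -1)
               for j in range(n)]
    return {v: sum(reports[j] * Z[(v, j)] for j in range(n)) / eps
            for v in domain}

rng = random.Random(1)
inputs = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
a = hashtogram_sim(inputs, domain=["a", "b", "c"], eps=0.5, rng=rng)
```

On this run the estimates track the true counts 600/300/100 up to noise of order √n/ε, matching the Hoeffding-bound analysis above.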
Afterwards, it uses the frequency oracle to estimate the frequencies of those potential heavy elements.⁶

Let h : V → [T] be a (publicly known) random hash function, mapping domain elements into [T], where T will be set later.⁷ We will now use h in order to identify the heavy hitters. To that end,

⁵As we describe in the full version of this work, Z has a short description, as it need not be uniform.
⁶Even though we describe the protocol as having two steps, the necessary communication for these steps can be done in parallel, and hence our protocol has only one round of communication.
⁷As with the matrix Z, the hash function h can have a short description length.

Figure 1: Frequency vs. privacy (ε) on the NLTK Brown corpus.

Figure 2: Frequency vs. privacy (ε) on the Demo 3 experiment from RAPPOR.

let v* ∈ V denote such a heavy hitter, appearing at least 2τ times in the database S, and denote t* = h(v*). Assuming that T is big enough, w.h.p. we will have that v* is the only input element (from S) that is mapped (by h) into the hash value t*. Assuming that this is indeed the case, we will now identify v* bit by bit.

For ℓ ∈ [log d], denote S_ℓ = (h(v_j), v_j[ℓ])_{j∈[n]}, where v_j[ℓ] is bit ℓ of v_j. That is, S_ℓ is a database over the domain ([T] × {0,1}), where the row corresponding to user j is (h(v_j), v_j[ℓ]). Observe that every user can compute her own row locally. As v* is a heavy hitter, for every ℓ ∈ [log d] we have that (t*, v*[ℓ]) appears in S_ℓ at least 2τ times. On the other hand, as we assumed that v* is the only input element that is mapped into t*, we get that (t*, 1 − v*[ℓ]) does not appear in S_ℓ at all.
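The bit-by-bit recovery just described can be sketched with exact counts standing in for the frequency oracle; the toy hash and all names below are illustrative.

```python
def recover_candidates(inputs, T, bits, h):
    # For every hash value t and bit position l, pick the more frequent pair
    # (t, b) among the users' rows (h(v), v[l]); concatenating the winning
    # bits yields one candidate string per hash value. Exact counts stand in
    # for the noisy frequency oracle of the protocol.
    candidates = {}
    for t in range(T):
        word = ""
        for l in range(bits):
            c0 = sum(1 for v in inputs if h(v) == t and v[l] == "0")
            c1 = sum(1 for v in inputs if h(v) == t and v[l] == "1")
            word += "0" if c0 >= c1 else "1"
        candidates[t] = word
    return candidates

inputs = ["101"] * 7 + ["011"] * 5
h = lambda v: int(v, 2) % 3          # a toy stand-in for the random hash h
cand = recover_candidates(inputs, T=3, bits=3, h=h)
# cand[h("101")] == "101" and cand[h("011")] == "011"
```

Since "101" and "011" land in different buckets, each is reconstructed exactly; a bucket with no heavy hitter simply yields a junk candidate that the second-step frequency estimates will filter out.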
Recall that our frequency oracle has error at most τ, and hence we can use it to accurately determine the bits of v*.

To make things more concrete, consider the protocol that, for every hash value t ∈ [T], for every coordinate ℓ ∈ [log d], and for every bit b ∈ {0,1}, obtains an estimate (using the frequency oracle) of the multiplicity of (t, b) in S_ℓ (so there are log d invocations of the frequency oracle, and a total of 2T log d estimations). Now, for every t ∈ [T] let us define v̂(t), where bit ℓ of v̂(t) is the bit b s.t. (t, b) is more frequent than (t, 1 − b) in S_ℓ. By the above discussion, we will have that v̂(t*) = v*. That is, the protocol identifies a set of T domain elements, containing all of the heavy hitters. The frequencies of the identified heavy hitters can then be estimated using the frequency oracle.

Remark 4.1. As should be clear from the above discussion, it suffices to take T ≳ n², as this will ensure that there are no collisions among different input elements. As we only care about collisions between "heavy hitters" (appearing in S at least √n times), it would suffice to take T ≳ n to ensure that w.h.p. there are no collisions between heavy hitters. In fact, we could even take T ≳ √n, which would ensure that a heavy hitter x* has no collisions with constant probability, and then amplify our confidence using repetitions.

Lemma 4.2 (Algorithm Bitstogram). Let ε ≤ 1. Algorithm Bitstogram satisfies ε-LDP. Furthermore, the algorithm returns a list L of length Õ(√n) satisfying:

1. With probability 1 − β, for every (v, a) ∈ L we have that |a − f(v)| ≤ O((1/ε) · √(n log(d/β) log(1/β))).
2. W.p. 1 − β, for every v ∈ V s.t. f(v) ≥ O((1/ε) · √(n log(n/β))), we have that v is in L.

5 Empirical Evaluation

We now discuss implementation details of our algorithms mentioned in Section 3.⁸ The main objective of this section is to emphasize the empirical efficacy of our algorithms. [16] recently claimed space optimality for a similar problem, but a formal analysis (or empirical evidence) was not provided.

5.1 Evaluation of the Private Frequency Oracle

The objective of this experiment is to test the efficacy of our algorithm in estimating the frequencies of a known dictionary of user items, under local differential privacy. We estimate the error in estimation while varying the privacy parameter ε. (See Section 2.1 for a refresher on the notation.) We ran the experiment (Figure 1) on a data set drawn uniformly at random from the NLTK Brown corpus [1]. The data set we created has n = 10 million samples drawn i.i.d. from the corpus with replacement (which corresponds to 25,991 unique words), and the system parameters are chosen as follows: number of data samples (n): 10 million; range of the hash function (m): √n; number of hash functions (t): 285. For the hash functions, we used the prefix bits of SHA-256. The estimated frequency is scaled by the number of samples to normalize the result, and each experiment is averaged over ten runs. In this plot, the rank corresponds to the rank of a domain element in the distribution of true frequencies in the data set.
Observations: (i) The plots corroborate the fact that the frequency oracle is indeed unbiased: the average frequency estimate (over ten runs) for each percentile is within one standard deviation of the corresponding true estimate. (ii) The error in the estimates goes down significantly as the privacy parameter ε is increased.

Comparison with RAPPOR [10]. Here we compare our implementation with the only publicly available code for locally private frequency estimation. We took a snapshot of the RAPPOR code base (https://github.com/google/rappor) on May 9th, 2017. To perform a fair comparison, we tested our algorithm against one of the demo experiments available for RAPPOR (Demo3, using the demo.sh script) with the same privacy parameter ε = ln(3), the number of data samples n = 1 million, and the same data set generated by the demo.sh script. In Figure 2 we observe that for higher frequencies both RAPPOR and our algorithm perform similarly, with ours being slightly better. However, in lower frequency regimes, the RAPPOR estimates are zero most of the time, while our estimates are closer to the true estimates. We do not claim our algorithm to be universally better than RAPPOR on all data sets. Rather, through our experiments we want to motivate the need for a more thorough empirical comparison of both algorithms.

5.2 Private Heavy Hitters

In this section, we take on the harder task of identifying the heavy hitters, rather than estimating the frequencies of domain elements. We run our experiments on the NLTK data set described earlier, with the same default system parameters (as in Section 5.1), along with n = 10 million and ε = 2, except now we assume that we do not know the domain. As a part of our algorithm design, we assume that every element in the domain is from the English alphabet [a-z] and has length exactly six letters. Words longer than six letters were truncated, and words shorter than six letters were tagged with ⊥ at the end. We set 15 · √n as the threshold for being a heavy hitter. As with most natural language data sets, the NLTK Brown data follows a power-law distribution with a very long tail. (See the full version of this work for a visualization of the distribution.)

In Table 2 we state the corresponding precision and recall parameters, and the false positive rate. The total number of positive examples is 22 (out of 25,991 unique words), and the total number of negative examples is roughly 3 × 10⁸. The total number of false positives is FP = 60, and of false negatives FN = 3. This corresponds to a vanishing FP rate, considering that the total number of negative examples roughly equals 3 × 10⁸. In practice, if there are false positives, they can be easily pruned using domain expertise. For example, if we are trying to identify new words which users are typing in English [2], then using the domain expertise of English, a set of false positives can be easily ruled out by inspecting the list of heavy hitters output by the algorithm. On the other hand, this cannot be done for false negatives; hence, it is important to have a high recall value. The fact that we have three false negatives is because the frequencies of those words are very close to the threshold of 15√n. While there are other algorithms for finding heavy hitters [4, 13], either they do not provide any theoretical guarantee for the utility [10, 12, 16], or there does not exist a scalable and efficient implementation for them.

⁸The experiments are performed without the Hadamard compression during data transmission.

Data set             unique words    Recall (TPR)       FPR         Precision
NLTK Brown corpus    25991           0.86 (σ = 0.05)    2 × 10⁻⁷    0.24 (σ = 0.04)

Table 2: Private heavy hitters with threshold = 15√n. Here σ corresponds to the standard deviation. TPR and FPR correspond to the true positive rate and false positive rate, respectively.

References

[1] NLTK Brown corpus. www.nltk.org.

[2] Apple tries to peek at user habits without violating privacy. The Wall Street Journal, 2016.

[3] Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Thakurta. Practical locally private heavy hitters. CoRR, abs/1707.04982, 2017.

[4] Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 127-135. ACM, 2015.

[5] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. Theory of Computing, 12(1):1-61, 2016.

[6] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Differentially private release and learning of threshold functions. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October 2015, pages 634-649. IEEE Computer Society, 2015.

[7] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Optimal lower bound for differentially private multi-party aggregation. In Leah Epstein and Paolo Ferragina, editors, Algorithms - ESA 2012 - 20th Annual European Symposium, Ljubljana, Slovenia, September 10-12, 2012, Proceedings, volume 7501 of Lecture Notes in Computer Science, pages 277-288. Springer, 2012.

[8] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP, 2002.

[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265-284. Springer, 2006.

[10] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response.
In CCS, 2014.\n\n[10]\n\n[11] Alexandre Ev\ufb01mievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy\n\nbreaches in privacy preserving data mining. In PODS, pages 211\u2013222. ACM, 2003.\n\n[12] Giulia Fanti, Vasyl Pihur, and Ulfar Erlingsson. Building a rappor with the unknown: Privacy-\npreserving learning of associations and data dictionaries. arXiv preprint arXiv:1503.01214,\n2015.\n\n[13] Justin Hsu, Sanjeev Khanna, and Aaron Roth. Distributed private heavy hitters.\n\nIn Inter-\nnational Colloquium on Automata, Languages, and Programming, pages 461\u2013472. Springer,\n2012.\n\n[14] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam\n\nSmith. What can we learn privately? SIAM Journal on Computing, 40(3):793\u2013826, 2011.\n\n[15] Nina Mishra and Mark Sandler. Privacy via pseudorandom sketches. In Proceedings of the\ntwenty-\ufb01fth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems,\npages 143\u2013152. ACM, 2006.\n\n[16] A.G. Thakurta, A.H. Vyrros, U.S. Vaishampayan, G. Kapoor, J. Freudiger, V.R. Sridhar, and\n\nD. Davidson. Learning new words. US Patent 9594741, 2017.\n\n9\n\n\f", "award": [], "sourceid": 1336, "authors": [{"given_name": "Raef", "family_name": "Bassily", "institution": "The Ohio State University"}, {"given_name": "Kobbi", "family_name": "Nissim", "institution": "Georgetown University"}, {"given_name": "Uri", "family_name": "Stemmer", "institution": "Harvard University"}, {"given_name": "Abhradeep", "family_name": "Guha Thakurta", "institution": "University of California Santa Cruz"}]}
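As a sanity check on the metrics reported in Table 2, the precision, recall, and FPR follow directly from the stated counts: 22 true heavy hitters among 25,991 unique words, FP = 60, FN = 3, and roughly 3 × 10⁸ negative examples. A minimal sketch of the arithmetic (single-run counts; the table averages over runs, hence the standard deviations):

```python
# Counts as reported in Section 5.2.
positives = 22        # true heavy hitters in the data set
FP, FN = 60, 3        # false positives / false negatives
negatives = 3e8       # approximate number of negative examples

TP = positives - FN   # 19 heavy hitters correctly recovered
recall = TP / positives        # true positive rate (TPR)
precision = TP / (TP + FP)
fpr = FP / negatives           # false positive rate

print(round(recall, 2), round(precision, 2), fpr)  # 0.86 0.24 2e-07
```

These match the table entries of 0.86 recall, 0.24 precision, and 2 × 10⁻⁷ FPR.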