{"title": "Local Differential Privacy for Evolving Data", "book": "Advances in Neural Information Processing Systems", "page_first": 2375, "page_last": 2384, "abstract": "There are now several large scale deployments of differential privacy used to collect statistical information about users. However, these deployments periodically recollect the data and recompute the statistics using algorithms designed for a single use. As a result, these systems do not provide meaningful privacy guarantees over long time scales. Moreover, existing techniques to mitigate this effect do not apply in the ``local model'' of differential privacy that these systems use.\n\nIn this paper, we introduce a new technique for local differential privacy that makes it possible to maintain up-to-date statistics over time, with privacy guarantees that degrade only in the number of changes in the underlying distribution rather than the number of collection periods. We use our technique for tracking a changing statistic in the setting where users are partitioned into an unknown collection of groups, and at every time period each user draws a single bit from a common (but changing) group-specific distribution. We also provide an application to frequency and heavy-hitter estimation.", "full_text": "Local Differential Privacy for Evolving Data\n\nComputer and Information Science\n\nComputer and Information Science\n\nComputer and Information Sciences\n\nComputer and Information Science\n\nMatthew Joseph\n\nUniversity of Pennsylvania\nmajos@cis.upenn.edu\n\nJonathan Ullman\n\nNortheastern University\njullman@ccs.neu.edu\n\nAaron Roth\n\nUniversity of Pennsylvania\naaroth@cis.upenn.edu\n\nBo Waggoner\n\nUniversity of Pennsylvania\nbowaggoner@gmail.com\n\nAbstract\n\nThere are now several large scale deployments of differential privacy used to\ncollect statistical information about users. 
However, these deployments periodically\nrecollect the data and recompute the statistics using algorithms designed for a single\nuse. As a result, these systems do not provide meaningful privacy guarantees over\nlong time scales. Moreover, existing techniques to mitigate this effect do not apply\nin the \u201clocal model\u201d of differential privacy that these systems use.\nIn this paper, we introduce a new technique for local differential privacy that makes\nit possible to maintain up-to-date statistics over time, with privacy guarantees that\ndegrade only in the number of changes in the underlying distribution rather than\nthe number of collection periods. We use our technique for tracking a changing\nstatistic in the setting where users are partitioned into an unknown collection of\ngroups, and at every time period each user draws a single bit from a common (but\nchanging) group-speci\ufb01c distribution. We also provide an application to frequency\nand heavy-hitter estimation.\n\n1\n\nIntroduction\n\nAfter over a decade of research, differential privacy [12] is moving from theory to practice, with\nnotable deployments by Google [15, 6], Apple [2], Microsoft [10], and the U.S. Census Bureau [1].\nThese deployments have revealed gaps between existing theory and the needs of practitioners. For\nexample, the bulk of the differential privacy literature has focused on the central model, in which user\ndata is collected by a trusted aggregator who performs and publishes the results of a differentially\nprivate computation [11]. However, Google, Apple, and Microsoft have instead chosen to operate in\nthe local model [15, 6, 2, 10], where users individually randomize their data on their own devices and\nsend it to a potentially untrusted aggregator for analysis [18]. 
In addition, the academic literature has largely focused on algorithms for performing one-time computations, like estimating many statistical quantities [7, 22, 16] or training a classifier [18, 9, 4]. Industrial applications, however, have focused on tracking statistics about a user population, like the set of most frequently used emojis or words [2]. These statistics evolve over time and so must be re-computed periodically.

Together, the two problems of periodically recomputing a population statistic and operating in the local model pose a challenge. Naïvely repeating a differentially private computation causes the privacy loss to degrade as the square root of the number of recomputations, quickly leading to enormous values of ε. This naïve strategy is what is used in practice [15, 6, 2]. As a result, Tang et al. [23] discovered that the privacy parameters guaranteed by Apple's implementation of differentially private data collection can become unreasonably large even in relatively short time periods.¹ Published research on Google and Microsoft's deployments suggests that they encounter similar issues [15, 6, 10].

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

On inspection the naïve strategy of regular statistical updates seems wasteful, as aggregate population statistics don't change very frequently—we expect that the most frequently visited website today will typically be the same as it was yesterday. However, population statistics do eventually change, and if we only recompute them infrequently, then we can be too slow to notice these changes.

The central model of differential privacy allows for an elegant solution to this problem. 
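Before turning to that solution, it is worth quantifying the naïve strategy's cost. The sketch below (an illustration only; the function and parameter names, including `delta_slack`, are ours and not from any deployment) computes the composed privacy parameter after T repeated ε-DP collections, using both basic composition and the square-root-growth bound from advanced composition:

```python
import math

def composed_eps(eps_per_round, T, delta_slack=1e-6):
    """Total privacy parameter after T eps-DP recomputations on the same users.

    Basic composition gives T * eps. The advanced composition theorem gives
    roughly sqrt(2 T ln(1/delta')) * eps + T * eps * (e^eps - 1), i.e.
    square-root growth in T for small eps, at the price of a delta' slack.
    """
    basic = T * eps_per_round
    advanced = (math.sqrt(2 * T * math.log(1 / delta_slack)) * eps_per_round
                + T * eps_per_round * (math.exp(eps_per_round) - 1))
    return min(basic, advanced)
```

Even with square-root growth, the total loss increases without bound as T grows, which is what motivates charging privacy only when the underlying statistic changes.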
For large classes of statistics, we can use the sparse vector technique [13, 22, 16, 11] to repeatedly perform computations on a dataset such that the error required for a fixed privacy level grows not with the number of recomputations, but with the number of times the computation's outcome changes significantly. For statistics that are relatively stable over time, this technique dramatically reduces the overall error. Unfortunately, the sparse vector technique provably has no local analogue [18, 24].

In this paper we present a technique that makes it possible to repeatedly recompute a statistic with error that decays with the number of times the statistic changes significantly, rather than the number of times we recompute the current value of the statistic, all while satisfying local differential privacy. This technique allows for tracking of evolving local data in a way that makes it possible to quickly detect changes, at modest cost, so long as those changes are relatively infrequent. Our approach guarantees privacy under any conditions, and obtains good accuracy by leveraging three assumptions: (1) each user's data comes from one of m evolving distributions; (2) these distributions change relatively infrequently; and (3) users collect a certain amount of data during each reporting period, which we term an epoch. By varying the lengths of the epochs (for example, collecting reports hourly, daily, or weekly), we can trade off more frequent reports versus improved privacy and accuracy.

1.1 Our Results and Techniques

Although our techniques are rather general, we first focus our attention on the problem of privately estimating the average of bits, with one bit held by each user. This simple problem is widely applicable because most algorithms in the local model have the following structure: on each individual's device, data records are translated into a short bit vector using sketching or hashing techniques. 
The bits\nin this vector are perturbed to ensure privacy using a technique called randomized response, and\nthe perturbed vector is then sent to a server for analysis. The server collects the perturbed vectors,\naverages them, and produces a data structure encoding some interesting statistical information about\nthe users as a whole. Thus many algorithms (for example, those based on statistical queries) can be\nimplemented using just the simple primitive of estimating the average of bits.\nWe analyze our algorithm in the following probabilistic model (see Section 3 for a formal description).\nThe population of n users has an unknown partition into subgroups, each of which has size at least\nL, time proceeds in rounds, and in each round each user samples a private bit independently from\ntheir subgroup-speci\ufb01c distribution. The private data for each user consists of the vector of bits\nsampled across rounds, and our goal is to track the total population mean over time. We require\nthat the estimate be private, and ask for the strong (and widely known) notion of local differential\nprivacy\u2014for every user, no matter how other users or the server behave, the distribution of the\nmessages sent by that user should not depend signi\ufb01cantly on that user\u2019s private data.\nTo circumvent the limits of local differential privacy, we consider a slightly relaxed estimation\nguarantee. 
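As background for the protocol design, the bit-averaging primitive just described—randomized response on each bit, followed by a debiased server-side average—can be sketched as follows (a minimal illustration; the function names are ours, not those of any deployed system):

```python
import math
import random

def randomize_bit(x, eps):
    # Report the true bit with probability e^eps / (e^eps + 1) and the
    # flipped bit otherwise; this is eps-differentially private.
    p_true = math.exp(eps) / (math.exp(eps) + 1)
    return x if random.random() < p_true else 1 - x

def estimate_mean(reports, eps):
    # Debias the average of randomized bits to estimate the true bit mean.
    p_true = math.exp(eps) / (math.exp(eps) + 1)
    avg = sum(reports) / len(reports)
    return (avg - (1 - p_true)) / (2 * p_true - 1)
```

The debiasing step inverts the known flip probability, so the estimate is unbiased and its error shrinks as 1/√n in the number of users.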
Specifically, we batch the rounds into T epochs, each consisting of ℓ rounds, and aim in each epoch t to estimate p^t, the population-wide mean across the subgroups and rounds of epoch t. Thus, any sufficiently large changes in this mean will be identified after the current epoch completes, which we think of as introducing a small "delay".

Our main result is an algorithm that takes data generated according to our model, guarantees a fixed level of local privacy ε that grows (up to a certain point) with the number of distributional changes rather than the number of epochs, and guarantees that the estimates released at the end of each epoch are accurate up to error that scales sublinearly in 1/ℓ and only polylogarithmically with the total number of epochs T. Our method improves over the naïve solution of simply recomputing the statistic every epoch—which would lead to either privacy parameter or error that scales linearly with the number of epochs—and offers a quantifiable way to reason about the interaction of collection times, reporting frequency, and accuracy. We note that one can alternatively phrase our algorithm so as to have a fixed error guarantee, and a privacy cost that scales dynamically with the number of times the distribution changes².

¹Although the value of ε that Apple guarantees over the course of say, a week, is not meaningful on its own, Apple does take additional heuristic steps (as does Google) that make it difficult to combine user data from multiple data collections [2, 15, 6]. Thus, they may still provide a strong, if heuristic, privacy guarantee.

Theorem 1.1 (Protocol for Bernoulli Means, Informal Version of Theorem 4.3). In the above model, there is an ε-differentially private local protocol that achieves the following guarantee: with probability at least 1 − δ, while the total number of elapsed epochs t where some subgroup distribution has changed is fewer than

ε · min( L/√(n ln(mT/δ)), √ℓ/(ln(T)·√ln(nT/δ)) ),

the protocol outputs estimates p̃^t where

|p̃^t − p^t| = O( ln(T)·√(ln(nT/δ)/ℓ) ),

where L is the smallest subgroup size, n is the number of users, ℓ is the chosen epoch length, and T is the resulting number of epochs.

To interpret the theorem, consider the setting where there is only one subgroup and L = n. Then to achieve error α we need, ignoring log factors, ℓ ≥ 1/α² and that fewer than ε·min(√n, 1/α) changes have occurred. We emphasize that our algorithm satisfies ε-differential privacy for all inputs without a distributional assumption—only accuracy relies on distributional assumptions.

Finally, we demonstrate the versatility of our method as a basic building block in the design of locally differentially private algorithms for evolving data by applying it to the well-known heavy hitters problem. We do so by implementing a protocol due to [3] on top of our simple primitive. This adapted protocol enables us to efficiently track the evolution of histograms rather than single bits. Given a setting in which each user in each round independently draws an object from a discrete distribution over a dictionary of d elements, we demonstrate how to maintain a frequency oracle (a computationally efficient representation of a histogram) for that dictionary with accuracy guarantees that degrade with the number of times the distribution over the dictionary changes, and only polylogarithmically with the number of rounds. 
We summarize this result below.

Theorem 1.2 (Protocol for Heavy-Hitters, Informal Version of Theorem 5.2). In the above model, there is an ε-differentially private local protocol that achieves the following guarantee: with probability at least 1 − δ, while the total number of elapsed epochs t where some subgroup distribution has changed is fewer than

ε · min( L/√(n ln(mT/δ)), √ℓ/(ln(T)·√ln(nT/δ)) ),

the protocol outputs estimate oracles f̂^t such that for all v ∈ [d]

|f̂^t(v) − P^t(v)| = O( ln(T)·( √(ln(dnT/δ)/ℓ) + √(ln(nT/δ)/n) ) ),

where n is the number of users, L is the smallest subgroup size, P^t is the mean distribution over dictionary elements in epoch t, d is the number of dictionary elements, ℓ is the chosen epoch length, and T is the resulting number of epochs.

1.2 Related Work

The problem of privacy loss for persistent local statistics has been recognized since at least the original work of Erlingsson et al. [15] on RAPPOR (the first large-scale deployment of differential privacy in the local model). Erlingsson et al. [15] offers a heuristic memoization technique that impedes a certain straightforward attack but does not prevent the differential privacy loss from accumulating linearly in the number of times the protocol is run. Ding et al. [10] give a formal analysis of a similar memoization technique, but the resulting guarantee is not differential privacy—instead it is a privacy guarantee that depends on the behavior of other users, and may offer no protection to users with idiosyncratic device usage. In contrast, we give a worst-case differential privacy guarantee.

Our goal of maintaining a persistent statistical estimate is similar in spirit to the model of privacy under continual observation of Dwork et al. [14]. 
The canonical problem for differential privacy under continual observation is to maintain a running count of a stream of bits. However, the problem we study is quite different. In the continual observation model, new users are arriving, while existing users' data does not change. In our model each user receives new information in each round. (Also, we work in the local model, which has not been the focus of the work on continual observation.)

The local model was originally introduced by Kasiviswanathan et al. [18], and the canonical algorithmic task performed in this model has become frequency estimation (and heavy hitters estimation). This problem has been studied in a series of theoretical [17, 3, 5, 8, 2] and practical works [15, 6, 2].

²We can achieve a dynamic, data-dependent privacy guarantee using the notion of ex-post differential privacy [19], for example by using a so-called privacy odometer [21].

2 Local Differential Privacy

We require that our algorithms satisfy local differential privacy. Informally, differential privacy is a property of an algorithm A, and states that the distribution of the output of A is insensitive to changes in one individual user's input. Formally, for every pair of inputs x, x′ differing on at most one user's data, and every set of possible outputs Z, P[A(x) ∈ Z] ≤ e^ε · P[A(x′) ∈ Z]. A locally differentially private algorithm is one in which each user i applies a private algorithm A_i only to their data. Most local protocols are non-interactive: each user i sends a single message that is independent of all other messages. Non-interactive protocols can thus be written as A(x_1, . . . , x_n) = f(A_1(x_1), . . . , A_n(x_n)) for some function f, where each algorithm A_i satisfies ε-differential privacy. Our model requires an interactive protocol: each user i sends several messages over time, and these may depend on the messages sent by other users. This necessitates a slightly more complex formalism.

We consider interactive protocols among the n users and an additional center. Each user runs an algorithm A_i (possibly taking a private input x_i) and the central party runs an algorithm C. We let the random variable tr(A_1, . . . , A_n, C) denote the transcript containing all the messages sent by all of the parties. For a given party i and a set of algorithms A′_{−i}, C′, we let tr_i(x_i; A′_{−i}, C′) denote the messages sent by user i in the transcript tr(A_i(x_i), A′_{−i}, C′). As a shorthand we will write tr_i(x_i), since A′_{−i}, C′ will be clear from context. We say that the protocol is locally differentially private if the function tr_i(x_i) is differentially private for every user i and every (possibly malicious) A′_{−i}, C′.

Definition 2.1. An interactive protocol (A_1, . . . , A_n, C) satisfies ε-local differential privacy if for every user i, every pair of inputs x_i, x′_i for user i, and every set of algorithms A′_{−i}, C′, the resulting algorithm tr_i(x_i) = tr_i(A_i(x_i), A′_{−i}, C′) is ε-differentially private. That is, for every set of possible outputs Z, P[tr_i(x_i) ∈ Z] ≤ e^ε · P[tr_i(x′_i) ∈ Z].

3 Overview: The THRESH Algorithm

Here we present our main algorithm, THRESH. The algorithmic framework is quite general, but for this high level overview we focus on the simplest setting where the data is Bernoulli. In Section 4 we formally present the algorithm for the Bernoulli case and analyze the algorithm to prove Theorem 1.1.

To explain the algorithm we first recall the distributional model. There are n users, each of whom belongs to a subgroup S_j for some j ∈ [m]; denote user i's subgroup by g(i). 
There are R = Tℓ rounds divided into T epochs of length ℓ, denoted E_1, . . . , E_T. In each round r, each user i receives a private bit x^r_i ∼ Ber(μ^r_{g(i)}). We define the population-wide mean by μ^r = (1/n)·(|S_1|μ^r_1 + . . . + |S_m|μ^r_m). For each epoch t, we use p^t to denote the average of the Bernoulli means during epoch t, p^t = (1/ℓ)·Σ_{r∈E_t} μ^r. After every epoch t, our protocol outputs p̃^t such that |p^t − p̃^t| is small.

The goal of THRESH is to maintain some public global estimate p̃^t of p^t. After any epoch t, we can update this global estimate p̃^t using randomized response: each user submits some differentially private estimate of the mean of their data, and the center aggregates these responses to obtain p̃^t. The main idea of THRESH is therefore to update the global estimate only when it might become sufficiently inaccurate, and thus take advantage of the possibly small number of changes in the underlying statistic p^t. The challenge is to privately identify when to update the global estimate.

The Voting Protocol. We identify these "update needed" epochs through a voting protocol. Users will examine their data and privately publish a vote for whether they believe the global estimate needs to be updated. If enough users vote to update the global estimate, we do so (using randomized response). The challenge for the voting protocol is that users must use randomization in their voting process, to keep their data private, so we can only detect when a large number of users vote to update.

First, we describe a naïve voting protocol. In each epoch t, each user i computes a binary vote a^t_i. This vote is 1 if the user concludes from their own samples that the global estimate p̃^{t−1} is inaccurate, and 0 otherwise. 
Each user casts a noisy vote using randomized response accordingly, and if the sum of the noisy votes is large enough then a global update occurs.

The problem with this protocol is that small changes in the underlying mean p^t may cause some users to vote 1 and others to vote 0, and this might continue for an arbitrarily long time without inducing a global update. As a result, each voter "wastes" privacy in every epoch, which is what we wanted to avoid. We resolve this issue by having voters also estimate their confidence that a global update needs to occur, and vote proportionally. As a result, voters who have high confidence will lose more privacy per epoch (but the need for a global update will be detected quickly), while voters with low confidence will lose privacy more slowly (but may end up voting for many rounds).

In more detail, each user i decides their confidence level by comparing |p̂^t_i − p̂^{f(t)}_i|—the difference between the local average of their data in the current epoch and their local average the last time a global update occurred—to a small set of discrete thresholds. Users with the highest confidence will vote in every epoch, whereas users with lower confidence will only vote in a small subset of the epochs. We construct these thresholds and subsets so that in expectation no user votes in more than a constant number of epochs before a global update occurs, and the amount of privacy each user loses from voting will not grow with the number of epochs required before an update occurs.

4 THRESH: The Bernoulli Case

4.1 The THRESH Algorithm (Bernoulli Case)

We now present pseudocode for the algorithm THRESH, including both the general framework as well as the specific voting and randomized response procedures. 
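The confidence-and-schedule idea described above can be sketched in isolation as follows (a simplified illustration with hypothetical threshold values; the paper's exact thresholds τ_b and vote randomization appear in the pseudocode below):

```python
import math

def should_vote(drift, thresholds, epoch, T):
    """Decide whether a user participates in the vote this epoch.

    drift: |local estimate now - local estimate at the last global update|.
    thresholds: increasing confidence thresholds (tau_0 < tau_1 < ...).
    A user whose drift exceeds threshold b (highest such b = confidence
    level b*) votes only in epochs divisible by 2^(floor(log2 T) - b*):
    higher confidence means more frequent voting, and hence faster
    detection, at a higher per-epoch privacy cost.
    """
    b_star = max((b for b, tau in enumerate(thresholds) if drift > tau),
                 default=None)
    if b_star is None:
        return False  # no evidence of a change: never spend privacy voting
    divisor = 2 ** max(0, int(math.log2(T)) - b_star)
    return epoch % divisor == 0
```

The geometric spacing of the voting epochs is what keeps the expected number of truthful votes per change bounded by a constant.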
We emphasize that the algorithm only touches user data through the subroutines VOTE and EST, each of which accesses data from a single user in at most two epochs. Thus, it is an online local protocol in which user i's response in epoch t depends only on user i's data from at most two epochs t and t′ (and the global information that is viewable to all users). THRESH uses carefully chosen thresholds τ_b = 2(b + 1)·√(ln(12nT/δ)/(2ℓ)) for b = −1, 0, . . . , ⌊log(T)⌋ to discretize the confidence of each user; see Section 4.2 for details on this choice.

We begin with a privacy guarantee for THRESH. Our proof uses the standard analysis of the privacy properties of randomized response, combined with the fact that users have a cap on the number of updates that prevents the privacy loss from accumulating. We remark that our privacy proof does not depend on distributional assumptions, which are only used for the proof of accuracy. We sketch a proof here. A full proof appears in the Supplement.

Theorem 4.1. The protocol THRESH satisfies ε-local differential privacy (Definition 2.1).

Proof Sketch: Naïvely applying composition would yield a privacy parameter that scales with T. Instead, we will rely on our defined privacy "caps" c^V_i and c^E_i that limit the number of truthful votes and estimates each user sends. Intuitively, each user sends at most O(ε/a + ε/b) messages that depend on their private data, and the rest are sampled independently of their private data. Thus, we need only bound the privacy "cost" of each of these O(ε/a + ε/b) elements of a user's transcript coming from a different distribution and bound the sum of the costs by ε.

4.2 Accuracy Guarantee

Our accuracy theorem needs the following assumption on L, the size of the smallest subgroup, to guarantee that a global update occurs whenever any subgroup has all of its member users vote "yes".

Assumption 4.2. L > (3 + 32/ε)·√(2n ln(12mT/δ)).

This brings us to our accuracy theorem, followed by a proof sketch (see Supplement for full details).

Theorem 4.3. Given number of users n, number of subgroups m, smallest subgroup size L, number of rounds R, privacy parameter ε, and chosen epoch length ℓ and number of epochs T = R/ℓ, with probability at least 1 − δ, in every epoch t ∈ [T] such that fewer than

(ε/4) · min( L/(8·√(2n ln(12mT/δ))) − 1, (1/⌊log(T)⌋)·√(2ℓ/ln(12nT/δ)) − 1 )

changes have occurred in epochs 1, 2, . . . , t, THRESH outputs p̃^t such that

|p̃^t − p^t| ≤ 4(⌊log(T)⌋ + 2)·√(ln(12nT/δ)/(2ℓ)).

Algorithm 1 Global Algorithm: THRESH
Require: number of users n, number of epochs T, minimum subgroup size L, number of subgroups m, epoch length ℓ, privacy parameter ε, failure parameter δ
1: Initialize global estimate p̃^0 ← −1
2: Initialize vote privacy counters c^V_1, . . . , c^V_n ← 0, . . . , 0
3: Initialize estimate privacy counters c^E_1, . . . , c^E_n ← 0, . . . , 0
4: Initialize vote noise level a ← 4·√(2n ln(12mT/δ)) / (L − 3·√(2n ln(12mT/δ)))
5: Initialize estimate noise level b ← (√(ln(12nT/δ)/(2ℓ)) − √(2 ln(12T/δ)/(2n))) / log(T)
6: for each epoch t ∈ [T] do
7:   for each user i ∈ [n] do
8:     User i publishes a^t_i ← VOTE(i, t)
9:   end for
10:  GlobalUpdate_t ← 1[ (1/n)·Σ^n_{i=1} a^t_i > 1/(e^a + 1) + √(ln(10T/δ)/(2n)) ]
11:  if GlobalUpdate_t then
12:    f(t) ← t
13:    for each i ∈ [n] do
14:      User i publishes p̃^t_i ← EST(i, t)
15:    end for
16:    Aggregate user estimates into global estimate: p̃^t ← (1/n)·Σ^n_{i=1} (p̃^t_i·(e^b + 1) − 1)/(e^b − 1)
17:  else
18:    f(t) ← f(t − 1)
19:    p̃^t ← p̃^{t−1}
20:  end if
21:  Analyst publishes p̃^t
22: end for

Algorithm 2 Local Subroutine: VOTE
Require: user i, epoch t
1: Compute local estimate p̂^t_i ← (1/ℓ)·Σ_{r∈E_t} x^r_i
2: b* ← highest b such that |p̂^t_i − p̂^{f(t)}_i| > τ_b
3: VoteYes^t_i ← (c^V_i < ε/4 and 2^{⌊log T⌋−b*} divides t)
4: if VoteYes^t_i then
5:   c^V_i ← c^V_i + a
6:   a^t_i ← Ber(e^a/(e^a + 1))
7: else
8:   a^t_i ← Ber(1/(e^a + 1))
9: end if
10: Output a^t_i

Algorithm 3 Local Subroutine: EST
Require: user i, epoch t
1: SendEstimate^t_i ← {c^E_i < ε/4}
2: if SendEstimate^t_i then
3:   c^E_i ← c^E_i + b
4:   p̃^t_i ← Ber((1 + p̂^t_i·(e^b − 1))/(e^b + 1))
5: else
6:   p̃^t_i ← Ber(1/(e^b + 1))
7: end if
8: Output p̃^t_i

Proof Sketch: We begin by proving correctness of the voting process. We show that (1) if every user decides that their subgroup distribution has not changed then a global update does not occur, (2) if every user in some subgroup decides that a change has occurred, then a global update occurs, and (3) for each user i the individual user estimates driving these voting decisions are themselves accurate to within t_ℓ = O(√(ln(nT/δ)/ℓ)) of the true μ^t_{g(i)}. Finally, we prove that if every user decides that a change has occurred, then a global update occurs that produces a global estimate p̃^t that is within t_ℓ of the true p^t.

To reason about how distribution changes across multiple epochs affect THRESH, we use the preceding results to show that the number of global updates never exceeds the number of distribution changes. A more granular guarantee then bounds the number of changes any user detects—and the number of times they vote accordingly—as a function of the number of distribution changes. These results enable us to show that each change increases a user's vote privacy cap c^V_i by at most 2 and estimate privacy cap c^E_i by at most 1.

Finally, recall that THRESH has each user i compare their current local estimate p̂^t_i to their local estimate in the last global update, p̂^{f(t)}_i, to decide how to vote, with higher thresholds for |p̂^t_i − p̂^{f(t)}_i| increasing the likelihood of a "yes" vote. This implies that if every user in some subgroup computes a local estimate p̂^t_i such that |p̂^t_i − p̂^{f(t)}_i| exceeds the highest threshold, then every user sends a "yes" vote and a global update occurs, bringing with it the global accuracy guarantee proven above. In turn, we conclude that |p̃^t − p^t| never exceeds the highest threshold, and our accuracy result follows.

We conclude this section with a few remarks about THRESH. First, while the provided guarantee depends on the number of changes of any size, one can easily modify THRESH to be robust to changes of size ≤ c, paying an additive c term in the accuracy. Second, the accuracy's dependence on ℓ offers guidance for its selection: roughly, for desired accuracy α, one should set ℓ = 1/α². Finally, in practice one may want to periodically assess how many users have exhausted their privacy budgets, which we can achieve by extending the voting protocol to estimate the fraction of "live" users. We primarily view this as an implementation detail outside of the scope of the exact problem we study.

5 An Application to Heavy Hitters

We now use the methods developed above to obtain similar guarantees for a common problem in local differential privacy known as heavy hitters. In this problem each of n users has their own dictionary value v ∈ D (e.g. their homepage), and an aggregator wants to learn the most frequently held dictionary values (e.g. the most common homepages), known as "heavy hitters", while satisfying local differential privacy for each user. The heavy hitters problem has attracted significant attention [20, 17, 5, 8]. Here, we show how our techniques combine with an approach of Bassily and Smith [3] to obtain the first guarantees for heavy hitters on evolving data. 
We note that\nour focus on this approach is primarily for expositional clarity; our techniques should apply just as\nwell to other variants, which can lead to more ef\ufb01cient algorithms.\n\n5.1 Setting Overview\n\nAs in the simpler Bernoulli case, we divide time into (cid:96)\u22c5 T rounds and T epochs. Here, in each round\ndictionaryD and trackP 1, . . . ,P T , the weighted average dictionary distribution in each epoch. We\ng(i) over the d values in\nwill require the same Assumption 4.2 as in the Bernoulli case, and we also suppose that d\u00e2 n, a\n\ni from a subgroup-speci\ufb01c distributionP r\n\nr each user i draws a sample vr\n\ncommon parameter regime for this problem.\n\n7\n\n\fIn the Bernoulli case users could reason about the evolution of \u00b5t\n\neach epoch. Since it is reasonable to assume d\u00e2 (cid:96), this is no longer possible in our new setting\u2014P t\n\nj directly from their own (cid:96) samples in\nj\nis too large an object to estimate from (cid:96) samples. However, we can instead adopt a common approach\nin heavy hitters estimation and examine a \u201csmaller\" object using a hash on dictionary samples. We\nwill therefore have users reason about the distribution pt\nj induces, which is a\nmuch smaller joint distribution of m (transformed) Bernoulli distributions. Our hope is that users can\nreliably \u201cdetect changes\u201d by analyzing pt\nj, and the feasibility of this method leans crucially on the\nproperties of the hash in question.\n\nj over hashes thatP t\n\n5.2 Details and Privacy Guarantee\n\ni and then hashes it into \u03a6\u02c6pt\n\nFirst we recall the details of the one-shot protocol from Bassily and Smith [3]. In their protocol, each\n\ncenter aggregates these randomized values into a single \u00afz which induces a frequency oracle.\nWe will modify this to produce a protocol HEAVYTHRESH in the vein of THRESH. 
In each epoch $t$ each user $i$ computes an estimated histogram $\hat{p}^t_i$ and then hashes it into $\Phi \hat{p}^t_i \in \mathbb{R}^w$, where $w = 20n$ (we assume the existence of a subroutine GenProj for generating $\Phi$). Each user votes on whether or not a global update has occurred by comparing $\Phi \hat{p}^t_i$ to their estimate during the most recent update, $\Phi \hat{p}^{f(t)}_i$, in HEAVYVOTE. Next, HEAVYTHRESH aggregates these votes to determine whether or not a global update will occur. Depending on the result, each user then calls their own estimation subroutine HEAVYEST and outputs a randomized response using $\mathcal{R}$ accordingly. If a global update occurs, HEAVYTHRESH aggregates these responses into a new published global hash $\tilde{y}^t$; if not, HEAVYTHRESH publishes $\tilde{y}^{t-1}$. In either case, HEAVYTHRESH publishes $(\Phi, \tilde{y}^t)$ as well. This final output is a frequency oracle, which for any $v \in [d]$ offers an estimate $\langle \Phi e_v, \tilde{y}^t \rangle$ of $\mathcal{P}^t(v)$.

HEAVYTHRESH will use the following thresholds: $\tau_b = 2(b+1)\sqrt{2\ln(16wnT/\delta)/(w\ell)}$ for $b = -1, 0, \ldots, \lceil \log(T) \rceil$. See Section 5.3 for details on this choice. Fortunately, the bulk of our analysis uses tools already developed either in Section 4 or Bassily and Smith [3]. Our privacy guarantee is almost immediate: since HEAVYTHRESH shares its voting protocols with THRESH, the only additional analysis needed is for the estimation randomizer $\mathcal{R}$ (see Supplement). Using the privacy of $\mathcal{R}$, privacy for HEAVYTHRESH follows by the same proof as for the Bernoulli case.

Theorem 5.1.
HEAVYTHRESH is $\varepsilon$-local differentially private.

5.3 Accuracy Guarantee

As above, an accuracy guarantee for HEAVYTHRESH unfolds along similar lines as that for THRESH, with additional recourse to results from Bassily and Smith [3]. We again require Assumption 4.2 and also assume $d = 2^{o(n^2/\ell)}$ (a weak assumption made primarily for neatness in Theorem 1.2). Our result and its proof sketch follow, with details and full pseudocode in the Supplement.

Theorem 5.2. With probability at least $1 - \delta$, in every epoch $t \in [T]$ such that fewer than

$$\frac{\varepsilon}{4} \cdot \min\left( \frac{1}{10}\sqrt{\frac{n}{\ln(320n^2T/\delta)}} - \sqrt{2\ln(320nT/\delta)},\ \frac{\sqrt{2n\ln(12mT/\delta)}}{8\log(T)} \right) - 1$$

changes have occurred in epochs $1, 2, \ldots, t$,

$$\left|\hat{f}^t(v) - \mathcal{P}^t(v)\right| < 4(\log(T)+2)\sqrt{\frac{\ln(320n^2T/\delta)}{10n\ell}} + 5\sqrt{\frac{\ln(16ndT/\delta)}{n}}\left(1 + 20\sqrt{\frac{\ln(16dT/\delta)}{n}}\right).$$

Proof Sketch: Our proof is similar to that of Theorem 4.3 and proceeds by proving analogous versions of the same lemmas, with users checking for changes in the subgroup distribution over observed hashes rather than observed bits. This leads to one new wrinkle in our argument: once we show that the globally estimated hash is close to the true hash, we must translate from closeness of hashes to closeness of the distributions they induce. The rest of the proof, which uses guarantees of user estimate accuracy to (1) guarantee that sufficiently large changes cause global updates and (2) ensure that each change incurs a bounded privacy loss, largely follows that of Theorem 4.3.

References

[1] John M. Abowd. The challenge of scientific reproducibility and privacy protection for statistical agencies. Census Scientific Advisory Committee, 2016.

[2] Apple Differential Privacy Team.
Learning with privacy at scale. Technical report, Apple, 2017.

[3] Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 127–135. ACM, 2015.

[4] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Differentially private empirical risk minimization: Efficient algorithms and tight error bounds. arXiv preprint arXiv:1405.7085, 2014.

[5] Raef Bassily, Uri Stemmer, and Abhradeep Guha Thakurta. Practical locally private heavy hitters. In Advances in Neural Information Processing Systems, pages 2285–2293, 2017.

[6] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. arXiv preprint arXiv:1710.00901, 2017.

[7] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM), 60(2):12, 2013.

[8] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy. arXiv preprint arXiv:1711.04740, 2017.

[9] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.

[10] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Proceedings of Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.

[11] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

[12] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis.
In Theory of Cryptography Conference, pages 265–284. Springer, 2006.

[13] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 381–390. ACM, 2009.

[14] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 715–724. ACM, 2010.

[15] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067. ACM, 2014.

[16] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 61–70. IEEE, 2010.

[17] Justin Hsu, Sanjeev Khanna, and Aaron Roth. Distributed private heavy hitters. In International Colloquium on Automata, Languages, and Programming, pages 461–472. Springer, 2012.

[18] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In Proceedings of the 54th Annual Symposium on Foundations of Computer Science, pages 531–540, 2008.

[19] Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, and Steven Z. Wu. Accuracy first: Selecting a differential privacy level for accuracy constrained ERM. In Advances in Neural Information Processing Systems, pages 2563–2573, 2017.

[20] Nina Mishra and Mark Sandler. Privacy via pseudorandom sketches. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 143–152.
ACM, 2006.

[21] Ryan M. Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and filters: Pay-as-you-go composition. In Advances in Neural Information Processing Systems, pages 1921–1929, 2016.

[22] Aaron Roth and Tim Roughgarden. Interactive privacy via the median mechanism. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 765–774. ACM, 2010.

[23] Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and Xiaofeng Wang. Privacy loss in Apple's implementation of differential privacy on macOS 10.12. arXiv preprint arXiv:1709.02753, 2017.

[24] Jonathan Ullman. Tight lower bounds for locally differentially private selection. Manuscript, 2018.