{"title": "(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings", "book": "Advances in Neural Information Processing Systems", "page_first": 2733, "page_last": 2741, "abstract": "We provide a general technique for making online learning algorithms differentially private, in both the full information and bandit settings. Our technique applies to algorithms that aim to minimize a \emph{convex} loss function which is a sum of smaller convex loss terms, one for each data point. We modify the popular \emph{mirror descent} approach, or rather a variant called \emph{follow the approximate leader}. The technique leads to the first algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work. In many cases, our algorithms (in both settings) match the dependence on the input length, $T$, of the \emph{optimal nonprivate} regret bounds up to logarithmic factors in $T$. Our algorithms require logarithmic space and update time.", "full_text": "(Nearly) Optimal Algorithms for Private Online\nLearning in Full-information and Bandit Settings\n\nAdam Smith\u21e4\n\nPennsylvania State University\nasmith@cse.psu.edu\n\nAbhradeep Thakurta\u2020\nStanford University and\n\nMicrosoft Research Silicon Valley Campus\n\nb-abhrag@microsoft.com\n\nAbstract\n\nWe give differentially private algorithms for a large class of online learning al-\ngorithms, in both the full information and bandit settings. Our algorithms aim\nto minimize a convex loss function which is a sum of smaller convex loss terms,\none for each data point. To design our algorithms, we modify the popular mirror\ndescent approach, or rather a variant called follow the approximate leader.\nThe technique leads to the first algorithms for private online learning in\nthe bandit setting. 
In the full information setting, our algorithms improve over the\nregret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010)\nand Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both\nsettings) match the dependence on the input length, T, of the optimal nonprivate\nregret bounds up to logarithmic factors in T. Our algorithms require logarithmic\nspace and update time.\n\n1 Introduction\n\nThis paper looks at the information leaked by online learning algorithms, and seeks to design\naccurate learning algorithms with rigorous privacy guarantees – that is, algorithms that provably leak\nvery little about individual inputs.\nEven the output of offline (batch) learning algorithms can leak private information. The dual form\nof a support vector machine's solution, for example, is described in terms of a small number of exact\ndata points, revealing these individuals' data in the clear. Considerable effort has been devoted to\ndesigning batch learning algorithms satisfying differential privacy (a rigorous notion of privacy that\nemerged from the cryptography literature [DMNS06, Dwo06]), for example [BDMN05, KLN+08,\nCM08, CMS11, Smi11, KST12, JT13, DJW13].\nIn this work we provide a general technique for making a large class of online learning algorithms\ndifferentially private, in both the full information and bandit settings. Our technique applies to\nalgorithms that aim to minimize a convex loss function which is a sum of smaller convex loss terms,\none for each data point. We modify the popular mirror descent approach (or rather a variant called\nfollow the approximate leader) [Sha11, HAK07].\nIn most cases, the modified algorithms provide similar accuracy guarantees to their nonprivate\ncounterparts, with a small (logarithmic in the stream length) blowup in space and time complexity.\n\nOnline (Convex) Learning: We begin with the full information setting. 
Consider an algorithm\nthat receives a stream of inputs F = ⟨f1, ..., fT⟩, each corresponding to one individual's data. We\ninterpret each input as a loss function on a parameter space C (for example, it might be one term\nin a convex program such as the one for logistic regression). The algorithm's goal is to output a\nsequence of parameter estimates w1, w2, ..., with each wt in C, that roughly minimizes the errors\nΣt ft(wt). The difficulty for the algorithm is that it computes wt based only on f1, ..., ft−1. We\nseek to minimize the a posteriori regret,\n\nRegret(T) = Σ_{t=1}^T ft(wt) − min_{w∈C} Σ_{t=1}^T ft(w)\n\n(1)\n\nIn the bandit setting, the input to the algorithm consists only of f1(w1), f2(w2), .... That is, at each\ntime step t, the algorithm learns only the cost ft−1(wt−1) of the choice wt−1 it made at the previous\ntime step, rather than the full cost function ft−1.\nWe consider three types of adversarial input selection: An oblivious adversary selects the input\nstream f1, ..., fT ahead of time, based on knowledge of the algorithm but not of the algorithm's\nrandom coins. A (strongly) adaptive adversary selects ft based on the output so far w1, w2, ..., wt\n(but not on the algorithm's internal random coins).\nBoth the full-information and bandit settings are extensively studied in the literature (see, e.g.,\n[Sha11, BCB12] for recent surveys). Most of this effort has been spent on online learning problems\nthat are convex, meaning that the loss functions ft are convex (in w) and the parameter set C ⊆ Rp\nis a convex set (note that one can typically "convexify" the parameter space by randomization).\n\n*Supported by NSF awards #0941553 and #0747294.\n†Supported by Sloan Foundation fellowship and Microsoft Research.\n
The\nproblem dimension p is the dimension of the ambient space containing C.\nWe consider various restrictions on the cost functions, such as Lipschitz continuity and strong\nconvexity. A function f : C → R is L-Lipschitz with respect to the ℓ2 metric if |f(x) − f(y)| ≤\nL‖x − y‖₂ for all x, y ∈ C. Equivalently, for every x ∈ C° (the interior of C) and every subgradient\nz ∈ ∂f(x), we have ‖z‖₂ ≤ L. (Recall that z is a subgradient of f at x if the function\nf̃(y) = f(x) + ⟨z, y − x⟩ is a lower bound for f on all of C. If f is convex, then a subgradient\nexists at every point, and the subgradient is unique if and only if f is differentiable at that point.)\nThe function f is H-strongly convex w.r.t. ℓ2 if for every x ∈ C, we can bound f below on C by a\nquadratic function of the form f̃(y) = f(x) + ⟨z, y − x⟩ + (H/2)‖y − x‖₂². If f is twice differentiable,\nH-strong convexity is equivalent to the requirement that all eigenvalues of ∇²f(w) be at least H\nfor all w ∈ C.\nWe denote by D the set of allowable cost functions; the input sequence thus lies in D^T.\nDifferential Privacy, and Challenges for Privacy in the Online Setting: We seek to design online\nlearning algorithms that satisfy differential privacy [DMNS06, Dwo06], which ensures that the\namount of information an adversary learns about a particular cost function ft in the function\nsequence F is almost independent of its presence or absence in F. Each ft can be thought of as private\ninformation belonging to an individual. The appropriate notion of privacy here is one in which the entire\nsequence of outputs of the algorithm (ŵ1, ..., ŵT) is revealed to an attacker (the continual observation\nsetting [DNPR10]). Formally, we say two input sequences F, F′ ∈ D^T are neighbors if they\ndiffer only in one entry (say, replacing ft by f′t).\nDefinition 2 (Differential privacy [DMNS06, Dwo06, DNPR10]). 
A randomized algorithm A is\n(ε, δ)-differentially private if for every two neighboring sequences F, F′ ∈ D^T, and for every event\nO in the output space C^T,\n\nPr[A(F) ∈ O] ≤ e^ε Pr[A(F′) ∈ O] + δ.\n\n(2)\n\nIf δ is zero, then we simply say A is ε-differentially private.\nHere A(F) refers to the entire sequence of outputs produced by the algorithm during its execution.¹\nOur protocols all satisfy ε-differential privacy (that is, with δ = 0). We include δ in the definition\nfor comparison with previous work.\n\n¹As defined, differential privacy requires indistinguishable outputs only for nonadaptively chosen sequences\n(that is, sequences where the inputs at time t are fixed ahead of time and do not depend on the outputs at times\n1, ..., t − 1). The algorithms in our paper (and in previous work) in fact satisfy a stronger adaptive variant,\nin which an adversary selects the input online as the computation proceeds. When δ = 0, the nonadaptive\nand adaptive variants are equivalent [DNPR10]. Moreover, protocols based on "randomized response" or the\n"tree-based sum" protocol of [DNPR10, CSS10] are adaptively secure, even when δ > 0. We do not define the\nadaptive variant here explicitly, but we use it implicitly when proving privacy.\n\nDifferential privacy provides meaningful guarantees against an attacker who has access to\nconsiderable side information: the attacker learns the same things about someone whether or not their\ndata were actually used (see [KS08, DN10, KM12] for further discussion).\nDifferential privacy is particularly challenging to analyze for online learning algorithms, since a\nchange in a single input at the beginning of the sequence may affect outputs at all future times in\nways that are hard to predict. 
For example, a popular algorithm for online learning is online gradient\ndescent: at each time step, the parameter is updated as wt+1 = ΠC(wt − ηt∇ft(wt)),\nwhere ΠC(x) is the nearest point to x in C, and ηt > 0 is a parameter called the learning rate. A\nchange in an input fi (replacing it with f′i) leads to changes in all subsequent outputs wi+1, wi+2, ...,\nroughly pushing them in the direction of ∇fi(wi) − ∇f′i(wi). The effect is amplified by the fact\nthat the gradient of subsequent functions fi+1, fi+2, ... will be evaluated at different points in the\ntwo streams.\n\nPrevious Approaches: Despite the challenges, there are several results on differentially private\nonline learning. A special case, "learning from experts" in the full information setting, was discussed\nin the seminal paper of Dwork, Naor, Pitassi and Rothblum [DNPR10] on privacy under continual\nobservation. In this case, the set of available actions is the simplex Δ({1, ..., p}) and the functions fi\nare linear with coefficients in {0, 1} (that is, ft(w) = ⟨w, ct⟩ where ct ∈ {0, 1}^p). Their algorithm\nguarantees a weaker notion of privacy than the one we consider² but, when adapted to our stronger\nsetting, it yields a regret bound of O(√(pT)/ε).\nJain, Kothari and Thakurta [JKT12] defined the general problem of private online learning, and gave\nalgorithms for learning convex functions over convex domains in the full information setting. They\ngave algorithms that satisfy (ε, δ)-differential privacy with δ > 0 (our algorithms satisfy the stronger\nvariant with δ = 0). Specifically, their algorithms have regret Õ(√T log(1/δ)/ε) for Lipschitz-\nbounded, strongly convex cost functions and Õ(T^{2/3} log(1/δ)/ε) for general Lipschitz convex costs.\nThe idea of [JKT12] for learning strongly convex functions is to bound the sensitivity of the entire\nvector of outputs w1, w2, ... 
to a change in one input (roughly, they show that when fi is changed, a\nsubsequent output wj changes by O(1/|j − i|)).\nUnfortunately, the regret bounds obtained by previous work remain far from the best nonprivate\nbounds. [Zin03] gave an algorithm with regret O(√T) for general Lipschitz functions, assuming L\nand the diameter ‖C‖₂ of C are constants. Ω(√T) regret is necessary (see, e.g., [HAK07]), so the\ndependence on T of [Zin03] is tight. When cost functions in F are H-strongly convex for constant\nH, then the regret can be improved to O(log T) [HAK07], which is also tight. In this work, we give\nnew algorithms that match these nonprivate bounds' dependence on T, up to (poly log T)/ε factors.\nWe note that [JKT12] give one algorithm for a specific strongly convex problem, online linear\nregression, with regret poly(log T). One can view that algorithm as a special case of our results.\nWe are not aware of any previous work on privacy in the bandit setting. One might expect that bandit\nlearning algorithms are easier to make private, since they access data in a much more limited way.\nHowever, even nonprivate algorithms for bandit learning are very delicate, and private versions had\nuntil now proved elusive.\n\nOur Results: In this work we provide a technique for making a large class of online learning\nalgorithms differentially private, in both the full information and bandit settings. In both cases, the idea is\nto search for algorithms whose decisions at time t depend only on previous time steps through a sum\nof observations made at times 1, 2, ..., t. Specifically, our algorithms work by measuring the gradient\n∇ft(wt) when ft is learned, and maintaining a differentially private running sum of the gradients\nobserved so far. We maintain this sum using the tree-based sum protocol of [DNPR10, CSS10]. 
We\nthen show that a class of learning algorithms known collectively as follow the approximate leader\n(the version we use is due to [HAK07]) can be run given only these noisy sums, and that their regret\ncan be bounded even when these sums are inaccurate.\nOur algorithms can be run with space O(log T), and require O(log T) running time at each step.\n\n²Specifically, Dwork et al. [DNPR10] provide single-entry-level privacy, in the sense that a neighboring\ndata set may only differ in one entry of the cost vector for one round. In contrast, we allow the entire cost\nvector to change at one round. Hiding that larger set of possible changes is more difficult, so our algorithms\nalso satisfy the weaker notion of Dwork et al.\n\nOur contributions for the full information setting and their relation to previous work are summarized\nin Table 1. Our main algorithm, for strongly convex functions, achieves regret O(log^{2.5} T/ε), ignoring\nfactors of the dimension p, Lipschitz continuity L and strong convexity H. When strong convexity\nis not guaranteed, we use regularization to ensure it (similar to what is done in nonprivate settings,\ne.g. [Sha11]). Setting parameters carefully, we get regret of O(√(T log^{2.5} T)/ε). These bounds\nessentially match the nonprivate lower bounds of Ω(log T) and Ω(√T), respectively.\nThe results in the full information setting apply even when the input stream is chosen adaptively as\na function of the algorithm's choices at previous time steps. In the bandit setting, we distinguish\nbetween oblivious and adaptive adversaries.\nFurthermore, in the bandit setting, we assume that C is sandwiched between two concentric L2-balls\nof radii r and R (where r < R). We also assume that for all w ∈ C, |ft(w)| ≤ B for all t ∈ [T].\nSimilar assumptions were made in [FKM05, ADX10].\nOur results are summarized in Table 2. 
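The private running-sum primitive described above can be sketched in code. This is a minimal illustration of the tree-based sum idea of [DNPR10, CSS10] in our own notation, not the exact protocol of Appendix A: for simplicity it handles a scalar stream and adds Laplace noise of scale levels/ε per released node, whereas the protocol in the paper adds Gamma-distributed L2 noise to vector sums; the class and method names are ours.

```python
import math
import random

class PrivateRunningSum:
    # Binary ('tree-based') mechanism for releasing every prefix sum
    # z1, z1+z2, ... of a stream under eps-differential privacy.
    # Each stream element touches at most `levels` noisy node sums, so
    # Laplace noise of scale levels/eps per node suffices by composition
    # (assuming each |z_t| <= 1; rescale for other bounds).

    def __init__(self, T, eps):
        self.levels = max(1, T.bit_length())
        self.eps = eps
        self.t = 0
        self.exact = [0.0] * self.levels  # exact dyadic block sums
        self.noisy = [0.0] * self.levels  # their noisy releases

    def _laplace(self, scale):
        if scale == 0.0:  # noiseless mode (eps = infinity), used for testing
            return 0.0
        u = random.random() - 0.5
        return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

    def append(self, z):
        # Merge all blocks below the lowest set bit of t into one new block,
        # then release that block's sum with fresh noise.
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1  # index of lowest set bit
        self.exact[i] = z + sum(self.exact[:i])
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        scale = 0.0 if math.isinf(self.eps) else self.levels / self.eps
        self.noisy[i] = self.exact[i] + self._laplace(scale)

    def prefix_sum(self):
        # Noisy z1 + ... + zt from the <= levels blocks covering [1, t].
        return sum(self.noisy[j] for j in range(self.levels)
                   if (self.t >> j) & 1)
```

Passing eps=float('inf') turns the noise off, which is convenient for checking that the dyadic bookkeeping reproduces exact prefix sums; with finite eps, each released sum aggregates at most log T noisy nodes, which is the source of the polylog(T) error terms in the regret bounds.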
For most of the settings we consider, we match the dependence\non T of the best nonprivate algorithm, though generally not the dependence on the dimension p.\n\nFunction class | Previous private upper bound | Our algorithm | Nonprivate lower bound\nLearning with experts (linear functions over C = Δ({1, ..., p})) | Õ(√(pT)/ε) [DNPR10] | O(√(pT) log^{2.5} T/ε) | Ω(√(T log p))\nLipschitz | Õ(√p T^{2/3} log(1/δ)/ε) [JKT12] | O(√(pT log^{2.5} T)/ε) | Ω(√T)\nLipschitz and strongly convex | Õ(√(pT) log²(1/δ)/ε) [JKT12] | O(p log^{2.5} T/ε) | Ω(log T)\n\nTable 1: Regret bounds for online learning in the full information setting. Bounds in lines 2 and 3\nhide the (polynomial) dependencies on parameters L, H. Notation Õ(·) hides poly(log(T)) factors.\n\nFunction class | Our result | Best nonprivate bound\nLearning with experts (linear functions over C = Δ({1, ..., p})) | Õ(√p T^{3/4}/ε) | O(√(pT)) [AHR08]\nLipschitz | Õ(√p T^{3/4}/ε) | O(√p T^{3/4}) [FKM05]\nLipschitz and strongly convex (Adaptive) | Õ(√p T^{3/4}/ε) | O(p^{2/3} T^{3/4}) [ADX10]\nLipschitz and strongly convex (Oblivious) | Õ(√p T^{2/3}/ε) | O(p^{2/3} T^{2/3}) [ADX10]\n\nTable 2: Regret bounds for online learning in the bandit setting. In all these settings, the best\nknown nonprivate lower bound is Ω(√T). The Õ(·) notation hides poly log factors in T. Bounds hide\npolynomial dependencies on L, H, r and R.\n\nIn the remainder of the text, we refer to appendices for many of the details of algorithms and proofs.\nThe appendices can be found in the "Supplementary Materials" associated to this paper.\n\n2 Private Online Learning: Full-information Setting\n\nIn this section we adapt the Follow The Approximate Leader (FTAL) algorithm of [HAK07] to\ndesign a differentially private variant. 
Our modified algorithm, which we call Private Follow The\nApproximate Leader (PFTAL), needs a new regret analysis as we have to deal with randomness due\nto differential privacy.\n\n2.1 Private Follow The Approximate Leader (PFTAL) with Strongly Convex Costs\n\nAlgorithm 1 Differentially Private Follow the Approximate Leader (PFTAL)\nInput: Cost functions: ⟨f1, ··· , fT⟩ (in an online sequence), strong convexity parameter: H,\nLipschitz constant: L, convex set: C ⊆ Rp and privacy parameter: ε.\n1: ŵ1 ← Any vector from C. Output ŵ1.\n2: Pass ∇f1(ŵ1), L2-bound L and privacy parameter ε to the tree-based aggregation protocol and\nreceive the current partial sum in v̂1.\n3: for time steps t ∈ {1, ··· , T − 1} do\n4: ŵt+1 ← argmin_{w ∈ C} ⟨v̂t, w⟩ + (H/2) Σ_{τ=1}^t ‖w − ŵτ‖₂². Output ŵt+1.\n5: Pass ∇ft+1(ŵt+1), L2-bound L and privacy parameter ε to the tree-based protocol (Algorithm 2)\nand receive the current partial sum in v̂t+1.\n6: end for\n\nThe main idea in the PFTAL algorithm is to execute the well-known Follow The Leader (FTL)\nalgorithm [Han57] using quadratic approximations f̃1, ··· , f̃T of the cost functions f1, ··· , fT.\nRoughly, at every time step (t + 1), PFTAL outputs a vector w that approximately minimizes the\nsum of the approximations f̃1, ··· , f̃t over the convex set C.\nLet ŵ1, ··· , ŵt be the sequence of outputs produced in the first t time steps, and let ft be the cost\nfunction at step t. Consider the following quadratic approximation to ft (as in [HAK07]). Define\n\nf̃t(w) = ft(ŵt) + ⟨∇ft(ŵt), w − ŵt⟩ + (H/2)‖w − ŵt‖₂²\n\n(3)\n\nwhere H is the strong convexity parameter. 
Notice that ft and f̃t have the same value and gradient\nat ŵt (that is, ft(ŵt) = f̃t(ŵt) and ∇ft(ŵt) = ∇f̃t(ŵt)). Moreover, f̃t is a lower bound for ft\neverywhere on C.\nLet w̃t+1 = argmin_{w ∈ C} Σ_{τ=1}^t f̃τ(w) be the "leader" corresponding to the cost functions f̃1, ··· , f̃t.\nMinimizing the sum of the f̃t(w) is the same as minimizing the sum of f̃t(w) − ft(ŵt), since subtracting\na constant term won't change the minimizer. We can thus write w̃t+1 as\n\nw̃t+1 = argmin_{w ∈ C} ⟨Σ_{τ=1}^t ∇fτ(ŵτ), w⟩ + (H/2) Σ_{τ=1}^t ‖w − ŵτ‖₂²\n\n(4)\n\nSuppose ŵ1, ··· , ŵt have been released so far. To release a private approximation to w̃t+1, it\nsuffices to approximate vt+1 = Σ_{τ=1}^t ∇fτ(ŵτ) while ensuring differential privacy. If we fix the\npreviously released information ŵτ, then changing any one cost function will only change one of\nthe summands in vt+1.\nWith the above observation, we abstract out the following problem: Given a set of vectors\nz1, ··· , zT ∈ Rp, compute all the partial sums vt = Σ_{τ=1}^t zτ, while preserving privacy. This problem\nis well studied in the privacy literature. Assuming each zt has L2-norm of at most L0, the following\ntree-based aggregation scheme will ensure that in expectation, the noise (in terms of L2-error) in\neach of the vt is O(√p L0 log^{1.5} T/ε) and the whole sequence v1, ··· , vT is ε-differentially private. We\nnow describe the tree-based scheme.\n\nTree-based Aggregation [DNPR10, CSS10]: Consider a complete binary tree. The leaf nodes are\nthe vectors z1, ··· , zT. (For ease of exposition, assume T to be a power of two. In general,\nwe can work with the smallest power of two greater than T.) Each internal node in the tree stores\nthe sum of all the leaves in its sub-tree. 
In a differentially private version of this tree, we ensure\nthat each node's sub-tree sum is (ε/log₂ T)-differentially private, by adding a noise vector b ∈ Rp\nwhose L2-norm is Gamma distributed and has standard deviation O(√p L0 log₂ T/ε). Since each zt only\naffects log₂ T nodes in the tree, by the composition property [DMNS06], the complete tree will be\nε-differentially private. Moreover, the algorithm's error in estimating any partial sum vt = Σ_{τ=1}^t zτ\ngrows as O(√p L0 log^{1.5} T/ε), since one can compute vt from at most log T nodes in the tree. A formal\ndescription of the tree-based aggregation scheme is given in Appendix A.\nNow we complete the PFTAL algorithm by computing the private version ŵt+1 of w̃t+1 in (4) as\nthe minimizer of the perturbed loss function:\n\nŵt+1 = argmin_{w ∈ C} ⟨v̂t, w⟩ + (H/2) Σ_{τ=1}^t ‖w − ŵτ‖₂²\n\n(5)\n\nHere v̂t is the noisy version of vt, computed using the tree-based aggregation scheme. A formal\ndescription of the algorithm is given in Algorithm 1.\n\nNote on space complexity: For simplicity, in the description of the tree-based aggregation scheme\n(Algorithm 2 in Appendix A) we maintain the complete binary tree. However, it is not hard to show\nthat at any time step t, it suffices to keep track of the vectors (of partial sums) on the path from zt to the\nroot of the tree. So, the amount of space required by the algorithm is O(log T).\n\n2.1.1 Privacy and Utility Guarantees for PFTAL (Algorithm 1)\nIn this section we provide the privacy and regret guarantees for the PFTAL algorithm (Algorithm 1).\nFor detailed proofs of the theorem statements, see Appendix B.\nTheorem 3 (Privacy guarantee). Algorithm 1 is ε-differentially private.\n\nProof Sketch. Given the binary tree, the sequence ŵ2, ··· , ŵT is completely determined. 
Hence,\nit suffices to argue privacy for the collection of noisy sums associated to nodes in the binary tree.\nAt first glance, it seems that each loss function affects only one leaf in the tree, and hence at most\nlog T of the nodes' partial sums. If it were true, that statement would make the analysis simple.\nThe analysis is delicate, however, since the value (gradient zτ) at a leaf τ in the tree depends on the\npartial sums that are released before time τ. Hence, changing one loss function ft actually affects\nall subsequent partial sums. One can get around this by using the fact that differential privacy\ncomposes adaptively [DMNS06]: we can write the computations done on a particular loss function\nft as a sequence of log T smaller differentially private computations, where each computation\nin the sequence depends on the outcome of previous ones. See Appendix B for details.\n\nIn terms of regret guarantee, we show that our algorithm enjoys regret of O(p log^{2.5} T) (assuming\nother parameters to be constants). Compared to the non-private regret bound of O(log T), our regret\nbound has an extra log^{1.5} T factor and an explicit dependence on the dimensionality (p). A formal\nregret bound for the PFTAL algorithm is given in Theorem 4.\nTheorem 4 (Regret guarantee). Let f1, ··· , fT be L-Lipschitz, H-strongly convex functions and let\nC ⊆ Rp be a fixed convex set. For adaptive adversaries, the expected regret satisfies:\n\nE[Regret(T)] = O(p(L + H‖C‖₂)² log^{2.5} T / (εH)).\n\nHere expectation is taken over the random coins of the algorithm and adversary.\n\nResults for Lipschitz Convex Costs: Our algorithm for strongly convex costs can be adapted to\narbitrary Lipschitz convex costs by executing Algorithm 1 on functions ht(w) = ft(w) + (H/2)‖w‖₂²\ninstead of the ft's. 
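The regularization step just described is mechanical, and a small sketch may help. This is our own illustration (the function names are not from the paper): it lifts a merely convex cost and its gradient to the H-strongly convex surrogate h_t(w) = f_t(w) + (H/2)·‖w‖², to which the strongly convex machinery then applies.

```python
def make_strongly_convex(f_t, grad_f_t, H):
    # h_t(w) = f_t(w) + (H/2) * ||w||^2 is H-strongly convex whenever f_t is
    # convex, so an algorithm for strongly convex costs can be run on h_t.
    def h_t(w):
        return f_t(w) + 0.5 * H * sum(x * x for x in w)

    def grad_h_t(w):
        g = grad_f_t(w)
        return [g[i] + H * w[i] for i in range(len(w))]

    return h_t, grad_h_t
```

The price is the extra regularizer term in the regret, which is why H is then tuned to balance the added term against the 1/H dependence of the strongly convex regret bound.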
Setting H = O(p log^{2.5} T/(ε√T)) will give us a regret bound of Õ(√(pT)/ε).\nSee Appendix C for details.\n\n3 Private Online Learning: Bandit Setting\n\nIn this section we adapt the Private Follow the Approximate Leader (PFTAL) from Section 2 to\nthe bandit setting. Existing (nonprivate) bandit algorithms for online convex optimization follow\na generic reduction to the full-information setting [FKM05, ADX10], called the "one-point" (or\n"one-shot") gradient trick. Our adaptation of PFTAL to the bandit setting also uses this technique.\nSpecifically, to define the quadratic lower bounds to the input cost functions (as in (3)), we replace\nthe exact gradient of ft at ŵt with a one-point approximation.\nIn this section we describe our results for strongly convex costs. As in the full information setting,\none may obtain regret bounds for general convex functions in the bandit setting by adding a strongly\nconvex regularizer to the cost functions.\n\nOne-point Gradient Estimates [FKM05]: Suppose one has to estimate the gradient of a function\nf : Rp → R at a point w ∈ Rp via a single query access to f. [FKM05] showed that one can\napproximate ∇f(w) by (p/γ) f(w + γu)u, where γ > 0 is a small real parameter and u is a uniformly\nrandom vector from the p-dimensional unit sphere Sp−1 = {a ∈ Rp : ‖a‖₂ = 1}. More precisely,\n\n∇f(w) = lim_{γ→0} E_u[(p/γ) f(w + γu)u].\n\nFor finite, nonzero values of γ, one can view this technique as estimating the gradient of a smoothed\nversion of f. Given γ > 0, define f̂γ(w) = E_{v∼Bp}[f(w + γv)] where Bp is the unit ball in Rp. 
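The one-point estimate just described can be sketched as follows. This is our own minimal illustration of the [FKM05] estimator (we write gamma for the small smoothing parameter, and the function name is ours), drawing the uniform direction u on the sphere by normalizing a Gaussian vector.

```python
import math
import random

def one_point_gradient(f, w, gamma, rng):
    # One-point estimate of [FKM05]: (p/gamma) * f(w + gamma*u) * u, with u
    # uniform on the unit sphere S^{p-1} (a normalized Gaussian draw).
    # In expectation over u this is the gradient of the smoothed function
    # (f averaged over the gamma-ball), and it needs only one query to f.
    p = len(w)
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    u = [x / norm for x in g]
    value = f([w[i] + gamma * u[i] for i in range(p)])
    return [(p / gamma) * value * u[i] for i in range(p)]
```

For a linear cost f(w) = ⟨a, w⟩ the estimator is unbiased for a itself (E[u uᵀ] = I/p cancels the factor p), so averaging many draws recovers the gradient; for general f it estimates the gradient of the smoothed function, which is exactly how the bandit algorithm uses it.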
That\nis, f̂γ = f ∗ U_{γBp} is the convolution of f with the uniform distribution on the ball γBp of radius γ.\nBy Stokes' theorem, we have E_{u∼Sp−1}[(p/γ) f(w + γu)u] = ∇f̂γ(w).\n\n3.1 Follow the Approximate Leader (Bandit version): Non-private Algorithm\nLet W̃ = ⟨w̃1, ··· , w̃T⟩ be a sequence of vectors in C (the outputs of the algorithm). Corresponding\nto the smoothed function f̂t = ft ∗ U_{γBp}, we define a quadratic lower bound ĝt:\n\nĝt(w) = f̂t(w̃t) + ⟨∇f̂t(w̃t), w − w̃t⟩ + (H/2)‖w − w̃t‖₂²\n\n(6)\n\nNotice that ĝt is a uniform lower bound on f̂t satisfying ĝt(w̃t) = f̂t(w̃t) and ∇ĝt(w̃t) = ∇f̂t(w̃t).\nTo define ĝt, one needs access to ∇f̂t(w̃t). As suggested above, we replace the true gradient with\nthe one-point estimate. Consider the following proxy g̃t for ĝt:\n\ng̃t(w) = f̂t(w̃t) − ⟨∇f̂t(w̃t), w̃t⟩ + ⟨(p/γ) ft(w̃t + γut)ut, w⟩ + (H/2)‖w − w̃t‖₂²\n\n(7)\n\n(the first two terms together form the constant A) where ut is drawn uniformly from the unit sphere Sp−1. Note that in (7) we replaced the gradient\nof f̂t with its one-point approximation only in one of its two occurrences (the inner product with w).\nWe would like to define w̃t+1 as the minimizer of the sum of proxies Σ_{τ=1}^t g̃τ(w). One difficulty\nremains: because ft is only assumed to be defined on C, the approximation (p/γ) ft(w̃t + γut)ut is only\ndefined when w̃t is sufficiently far inside C. Recall from the introduction that we assume C contains\nrBp (the ball of radius r). To ensure that we only evaluate f on C, we actually minimize over a\nsmaller set (1 − ξ)C, where ξ = γ/r. 
We obtain:\n\nw̃t+1 = argmin_{w ∈ (1−ξ)C} Σ_{τ=1}^t g̃τ(w) = argmin_{w ∈ (1−ξ)C} ⟨Σ_{τ=1}^t (p/γ) fτ(w̃τ + γuτ)uτ, w⟩ + (H/2) Σ_{τ=1}^t ‖w − w̃τ‖₂²\n\n(8)\n\n(We have used the fact that to minimize g̃t, one can ignore the constant term A in (7).)\nWe can now state the bandit version of FTAL. At each step t = 1, ..., T:\n\n1. Compute w̃t+1 using (8).\n2. Output ŵt = w̃t + γut.\n\nTheorem 12 (in Appendix D) gives the precise regret guarantees for this algorithm. For adaptive\nadversaries the regret is bounded by Õ(p^{2/3} T^{3/4}) and for oblivious adversaries the regret is bounded\nby Õ(p^{2/3} T^{2/3}).\n\n3.2 Follow the Approximate Leader (Bandit version): Private Algorithm\n\nTo make the bandit version of FTAL ε-differentially private, we replace the value vt =\nΣ_{τ=1}^t ((p/γ) fτ(w†τ + γuτ)uτ) with a private approximation v†t computed using the tree-based sum\nprotocol. Specifically, at each time step t we output\n\nw†t+1 = argmin_{w ∈ (1−ξ)C} ⟨v†t, w⟩ + (H/2) Σ_{τ=1}^t ‖w − w†τ‖₂².\n\n(9)\n\nSee Algorithm 3 (Appendix E.1) for details.\nTheorem 5 (Privacy guarantee). The bandit version of Private Follow The Approximate Leader\n(Algorithm 3) is ε-differentially private.\n\nThe proof of Theorem 5 is exactly the same as that of Theorem 3, and hence we omit the details.\nIn the following theorem we provide the regret guarantee of the Private FTAL (bandit version). For\na complete proof, see Appendix E.2.\nTheorem 6 (Regret guarantee). Let Bp be the p-dimensional unit ball centered at the origin and\nC ⊆ Rp be a convex set such that rBp ⊆ C ⊆ RBp (where 0 < r < R). Let f1, ··· , fT be L-\nLipschitz, H-strongly convex functions such that for all w ∈ C, |fi(w)| ≤ B. 
Setting ξ = γ/r in the\nbandit version of Private Follow The Approximate Leader (Algorithm 3 in Appendix E.1), we obtain\nthe following regret guarantees.\n\n1. (Oblivious adversary) With γ = √p/T^{1/3}, E[Regret(T)] ≤ Õ(Δ √p T^{2/3}).\n\n2. (Adaptive adversary) With γ = √p/T^{1/4}, E[Regret(T)] ≤ Õ(Δ √p T^{3/4}).\n\nHere Δ = (BR + (1 + R/r)L + (H‖C‖₂ + B)²/H)(1 + B/ε). The expectations are taken over the\nrandomness of the algorithm and the adversary.\nOne can remove the dependence on r in Thm. 6 by rescaling C to isotropic position. This increases\nthe expected regret bound by a factor of (LR + ‖C‖₂). See [FKM05] for details.\nBound for general convex functions: Our results in this section can be extended to the setting of\narbitrary Lipschitz convex costs via regularization, as in Appendix C (by adding (H/2)‖w‖₂² to each cost\nfunction ft). With the appropriate choice of H the regret scales as Õ(T^{3/4}/ε) for both oblivious\nand adaptive adversaries. See Appendix E.3 for details.\n\n4 Open Questions\n\nOur work raises several interesting open questions: First, our regret bounds with general convex\nfunctions have the form Õ(√T/ε). We would like to have a regret bound where the parameter 1/ε\nis factored out with lower order terms in the regret, i.e., we would like to have a regret bound of the\nform O(√T) + o(√T/ε).\nSecond, our regret bounds for convex bandits are worse than the non-private bounds for linear and\nmulti-arm bandits. For multi-arm bandits [ACBF02] and for linear bandits [AHR08], the non-private\nregret bound is known to be O(√T). If we use our private algorithm in this setting, we will incur a\nregret of Õ(T^{2/3}). Can we get O(√T) regret for multi-arm or linear bandits?\nFinally, bandit algorithms require internal randomness to get reasonable regret guarantees. 
Can we harness the randomness of nonprivate bandit algorithms in the design of private bandit algorithms? Our current privacy analysis ignores this additional source of randomness.

References

[ACBF02] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002.

[ADX10] Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.

[AHR08] Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, 2008.

[BCB12] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.

[BDMN05] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: The SuLQ framework. In PODS, 2005.

[CM08] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In NIPS, 2008.

[CMS11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.

[CSS10] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. In ICALP, 2010.

[DJW13] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In FOCS, 2013. http://arxiv.org/abs/1302.3203.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.

[DN10] Cynthia Dwork and Moni Naor. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. Journal of Privacy and Confidentiality, 2(1), 2010.

[DNPR10] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In STOC, 2010.

[Dwo06] Cynthia Dwork. Differential privacy. In ICALP, 2006.

[FKM05] Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, 2005.

[HAK07] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Journal of Machine Learning Research, 2007.

[Han57] James Hannan. Approximation to Bayes risk in repeated play. 1957.

[JKT12] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In COLT, 2012.

[JT13] Prateek Jain and Abhradeep Thakurta. Differentially private learning with kernels. In ICML, 2013.

[KLN+08] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In FOCS, 2008.

[KM12] Daniel Kifer and Ashwin Machanavajjhala. A rigorous and customizable framework for privacy. In PODS, 2012.

[KS08] Shiva Prasad Kasiviswanathan and Adam Smith. A note on differential privacy: Defining resistance to arbitrary side information. CoRR, arXiv:0803.3946 [cs.CR], 2008.

[KST12] Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. In COLT, 2012.

[Sha11] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 2011.

[Smi11] Adam Smith. Privacy-preserving statistical estimators with optimal convergence rates. In STOC, 2011.

[Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.
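For concreteness, the tree-based sum protocol invoked in Section 3.2 (the primitive behind the private partial sums $v^\dagger_t$ in (9)) can be sketched as follows. This is a minimal Python sketch of the standard binary-counter aggregation in the spirit of [DNPR10, CSS10], not the paper's Algorithm 3: the class name, interface, and the per-node Laplace scale (sensitivity · levels / ε) are illustrative assumptions.

```python
import numpy as np

class TreeSum:
    """Tree-based private running sums, in the spirit of [DNPR10, CSS10].

    Each stream item enters at most ceil(log2 T) + 1 dyadic blocks, and each
    block is released once with Laplace noise, so a standard composition
    argument gives eps-differential privacy for the whole sequence of
    prefix-sum releases (constants here are illustrative).
    """

    def __init__(self, T, dim, eps, sensitivity, seed=0):
        self.dim = dim
        self.levels = int(np.ceil(np.log2(max(T, 2)))) + 1
        self.scale = sensitivity * self.levels / eps  # per-node Laplace scale
        self.rng = np.random.default_rng(seed)
        self.exact = [None] * self.levels  # exact dyadic-block sums (internal)
        self.noisy = [None] * self.levels  # released noisy dyadic-block sums
        self.t = 0

    def update(self, v):
        """Absorb item v; return a noisy sum of items 1..t."""
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1  # lowest set bit of t
        block = np.asarray(v, dtype=float)
        for j in range(i):  # merge completed lower blocks into the new block
            block = block + self.exact[j]
            self.exact[j] = self.noisy[j] = None
        self.exact[i] = block
        self.noisy[i] = block + self.rng.laplace(0.0, self.scale, self.dim)
        # noisy prefix sum = sum of noisy blocks at the set bits of t,
        # i.e. only O(log t) noise terms enter any released prefix sum
        out = np.zeros(self.dim)
        for j in range(self.levels):
            if (self.t >> j) & 1:
                out += self.noisy[j]
        return out
```

In the private FTAL of Section 3.2, each stream item would be a (norm-bounded) gradient estimate $\tfrac{p}{\delta} f_\tau(w^\dagger_\tau + u_\tau)u_\tau$, and the returned noisy prefix sum would play the role of $v^\dagger_t$ in (9).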