{"title": "Query Complexity of Bayesian Private Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 2431, "page_last": 2440, "abstract": "We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? \n\nOur main result is a query complexity lower bound that is tight up to the first order. We show that if the learner wants to estimate the target within an error of $\\epsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\\log(1/\\epsilon)$ as $\\epsilon \\to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a \\emph{multiplicative} increase in query complexity. The proof  builds on Fano's inequality and properties of certain proportional-sampling estimators.", "full_text": "Query Complexity of Bayesian Private Learning\n\nKuang Xu\n\nStanford Graduate School of Business\n\nStanford, CA 94305, USA\nkuangxu@stanford.edu\n\nAbstract\n\nWe study the query complexity of Bayesian Private Learning: a learner wishes to\nlocate a random target within an interval by submitting queries, in the presence\nof an adversary who observes all of her queries but not the responses. How many\nqueries are necessary and suf\ufb01cient in order for the learner to accurately estimate\nthe target, while simultaneously concealing the target from the adversary?\nOur main result is a query complexity lower bound that is tight up to the \ufb01rst\norder. 
We show that if the learner wants to estimate the target within an error of ε, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than 1/L, then the query complexity is on the order of L log(1/ε) as ε → 0. Our result demonstrates that increased privacy, as captured by L, comes at the expense of a multiplicative increase in query complexity. The proof builds on Fano's inequality and properties of certain proportional-sampling estimators.

1 Introduction

How to learn, while ensuring that a spying adversary does not learn? Enabled by rapid advancements in the Internet, surveillance technologies and machine learning, companies and governments alike have become increasingly capable of monitoring the behavior of individuals or competitors, and of using such data for inference and prediction. Motivated by these developments, the present paper investigates the extent to which it is possible for a learner to protect her knowledge from an adversary who observes, completely or partially, her actions.

We will approach these questions by studying the query complexity of Bayesian Private Learning, a framework proposed by [17] and [13] to investigate the privacy-efficiency trade-off in sequential learning. Our main result is a tight lower bound on query complexity, showing that there is a price to pay for the learner in exchange for improved privacy, whose magnitude scales multiplicatively with the level of privacy desired. In addition, we provide a family of inference algorithms for the adversary, based on proportional sampling, which is provably effective in estimating the target against any learner who does not employ a large number of queries.

1.1 The Model: Bayesian Private Learning

We begin by describing the Bayesian Private Learning model formulated by [17] and [13]. 
A learner is trying to accurately identify the location of a random target, X*, up to some constant additive error, ε, where X* is uniformly distributed in the unit interval, [0, 1). The learner gathers information about X* by submitting n queries, (Q_1, ..., Q_n) ∈ [0, 1)^n, for some n ∈ N. For each query, Q_i, she receives a binary response indicating the target's location relative to the query: R_i = I(X* ≤ Q_i), i = 1, 2, ..., n, where I(·) denotes the indicator function.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

The learner submits the queries in a sequential manner, and subsequent queries may depend on previous responses. Once all n queries are submitted, the learner produces an estimator for the target. The learner's behavior is formally captured by a learner strategy, defined as follows.

Definition 1.1 (Learner Strategy). Fix n ∈ N. Let Y be a uniform random variable over [0, 1), independent from other parts of the system; Y will be referred to as the random seed. A learner strategy, φ = (φ_q, φ_l), consists of two components:

1. Querying mechanism: φ_q = (φ_q^1, ..., φ_q^n) is a sequence of deterministic functions, where φ_q^i : [0, 1)^{i−1} × [0, 1) → [0, 1) takes as input past responses and the random seed, Y, and generates the next query, i.e.,

Q_i = φ_q^i(R^{i−1}, Y), i = 1, ..., n, (1.1)

where R^i denotes the responses from the first i queries: R^i = (R_1, ..., R_i), and R^0 := ∅.

2. Estimator: φ_l : [0, 1)^n × [0, 1) → [0, 1) is a deterministic function that maps all responses, R^n, and Y to a point in the unit interval that serves as a "guess" for X*: X̂ = φ_l(R^n, Y). The estimator X̂ will be referred to as the learner estimator.

We will use Φ_n to denote the family of learner strategies that submit n queries.

The first objective of the learner is to accurately estimate the target, as is formalized in the following definition.

Definition 1.2 (ε-Accuracy). Fix ε ∈ (0, 1). A learner strategy, φ, is ε-accurate if its estimator approximates the target within an absolute error of ε/2 almost surely, i.e.,

P(|X̂ − X*| ≤ ε/2) = 1, (1.2)

where the probability is with respect to the randomness in the target, X*, and the random seed, Y.

We now introduce the notion of privacy: in addition to estimating X*, the learner would like to simultaneously conceal X* from an eavesdropping adversary. Specifically, there is an adversary who knows the learner's query strategy, and observes all of the queries but not the responses. The adversary then uses the query locations to generate her own adversary estimator for X*, denoted by X̂_a, which depends on the queries, (Q_1, ..., Q_n), and any internal, idiosyncratic randomness.

With the adversary's presence in mind, we define the notion of a private learner strategy.

Definition 1.3 ((δ, L)-Privacy). Fix δ ∈ (0, 1) and L ∈ N. A learner strategy, φ, is (δ, L)-private if, for any adversary estimator, X̂_a,

P(|X̂_a − X*| ≤ δ/2) ≤ 1/L, (1.3)

where the probability is measured with respect to the randomness in the target, X*, and any randomness employed by the learner strategy and the adversary estimator.

In particular, if a learner employs a (δ, L)-private strategy, then no adversary estimator can be close to the target within an absolute error of δ/2 with a probability greater than 1/L. 
Therefore, for any fixed δ, the parameter L can be interpreted as the level of desired privacy.

We are now ready to define the main quantity of interest in this paper: query complexity.

Definition 1.4. Fix ε and δ in [0, 1], and L ∈ N. The query complexity, N(ε, δ, L), is the least number of queries needed for an ε-accurate learner strategy to be (δ, L)-private:

N(ε, δ, L) := min{n : Φ_n contains a strategy that is both ε-accurate and (δ, L)-private}.

Footnote 1: Note that the query Q_i does not explicitly depend on the previous queries, {Q_1, ..., Q_{i−1}}, but only on their responses. This is without loss of generality, since for a given value of Y it is easy to see that {Q_1, ..., Q_{i−1}} can be reconstructed once we know their responses and the functions φ_q^1, ..., φ_q^n.

Footnote 2: This definition of privacy is reminiscent of the error metric used in Probably Approximately Correct (PAC) learning ([14]), if we view the adversary as trying to learn a (trivial) constant function to within an L1 error of δ/2 with a probability greater than 1/L.

2 Main Result

The main objective of the paper is to understand how N(ε, δ, L) varies as a function of the input parameters, ε, δ and L. Our results will focus on the regime of parameters where

0 < ε < δ/4 and δ < 1/L. (2.1)

The following theorem is our main result. The upper bound has appeared in [17] and [13] and is included for completeness; the lower bound is the contribution of the present paper.

Theorem 2.1 (Query Complexity of Bayesian Private Learning). Fix ε and δ in (0, 1) and L ∈ N, such that ε < δ/4 and δ < 1/L. The following is true.

1. Upper bound:

N(ε, δ, L) ≤ L log(1/ε) − L(log L − 1) − 1. (2.2)

2. Lower bound:

N(ε, δ, L) ≥ L log(1/ε) − L log(2/δ) − 3L log log(δ/ε). (2.3)

Both the upper and lower bounds in Theorem 2.1 are constructive, in the sense that there is a concrete learner strategy that achieves the upper bound, and an adversary estimator that forces any learner strategy to employ at least as many queries as prescribed by the lower bound.

If we apply Theorem 2.1 in the regime where δ and L stay fixed, while the learner's error tolerance, ε, tends to zero, we obtain the following corollary, in which the upper and lower bounds on query complexity coincide.

Corollary 2.2. Fix δ ∈ (0, 1) and L ∈ N, such that δ < 1/L. Then

N(ε, δ, L) ~ L log(1/ε), as ε → 0. (2.4)

Note that the special case of L = 1 corresponds to when the learner is not privacy-constrained and aims solely to minimize the number of queries. Theorem 2.1 and Corollary 2.2 thus demonstrate that there is a hefty price to pay in exchange for privacy, as the query complexity depends multiplicatively on the level of privacy, L.

3 Motivation and Related Literature

Bayesian Private Learning is a more general model that contains, as a special case (L = 1), the classical problem of sequential learning with binary feedback, with applications in statistics ([11]), information theory ([7]) and optimization ([15]). The Bayesian Private Learning model thus inherits these applications, with the additional feature that, instead of being solely interested in minimizing the number of queries, the decision maker would also like to ensure the privacy of the target. Note that our present model assumes that all responses are noiseless, in contrast to some of the noisy query models in the literature (cf. [10, 1, 15]).

Bayesian Private Learning is a variant of the so-called Private Sequential Learning problem. 
Both models were formulated in [17] and [13], and the main distinction between the two is that the target is drawn randomly in Bayesian Private Learning, while it is chosen in a worst-case fashion (against the adversary) in the original Private Sequential Learning model. [17] and [13] establish matching upper and lower bounds on query complexity for Private Sequential Learning. They also propose the Replicated Bisection algorithm as a learner strategy for the Bayesian variant, but without a matching query complexity lower bound. The present paper closes this gap.

At a higher level, our work is related to a growing body of literature on privacy-preserving mechanisms, in computer science (cf. [5, 9, 6]), operations research (cf. [4, 12]), and statistical learning theory (cf. [2, 8, 16]), but diverges significantly in models and applications. 

Footnote 3: Having ε < δ/4 corresponds to a setting where the learner would like to identify the target with high accuracy, while the adversary is aiming for a coarser estimate; the specific constant 1/4 is likely an artifact of our analysis and could potentially be improved to being closer to 1. Note that the regime where ε > δ is arguably much less interesting, because it is not natural to expect the adversary, who is not engaged in the querying process, to have a higher accuracy requirement than the learner. The requirement that δ < 1/L stems from the following argument. If δ > 1/L, then the adversary can simply draw a point uniformly at random in [0, 1) and be guaranteed that the target will be within δ/2 with a probability greater than 1/L. Thus, the privacy constraint is automatically violated, and no private learner strategy exists. To obtain a nontrivial problem, we therefore need only consider the case where δ < 1/L.

Footnote 4: We will use the asymptotic notation f(x) ~ g(x) to mean that f is on the order of g: f(x)/g(x) → 1 as x approaches a certain limit.
On the methodological front, our proof uses Fano's inequality, a fundamental tool for deriving lower bounds in statistics, information theory, and active learning ([3]).

4 The Upper Bound

The next two sections are devoted to the proof of Theorem 2.1. We first prove the query complexity upper bound, and begin by giving an overview of the main ideas. Consider the special case of L = 1, where the learner is solely interested in finding the target, X*, and not at all concerned with concealing it from the adversary. Here, the problem reduces to the classical setting, where it is well known that the bisection strategy achieves the optimal query complexity (cf. [15]). The bisection strategy recursively queries the mid-point of the interval which the learner knows to contain X*. For instance, the learner would set Q_1 = 1/2, and if the response is R_1 = 1, then she knows that X* lies in the interval [0, 1/2], and sets Q_2 to 1/4; otherwise, Q_2 is set to 3/4. This process repeats for n steps. Because the size of the smallest interval known to contain X* is halved with each additional query, this yields the query complexity

N(ε, δ, 1) = log(1/ε), ε ∈ (0, 1). (4.1)

Unfortunately, once the level of privacy L increases above 1, the bisection strategy is almost never private: it is easy to verify that if the adversary sets X̂_a to be the learner's last query, Q_n, then the target is sure to be within a distance of at most ε. That is, the bisection strategy is not (δ, L)-private for any L > 1, whenever ε < δ/2. 
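To make the classical L = 1 case concrete, the bisection strategy above can be sketched as follows (a minimal illustration; the oracle interface and variable names are ours, not from the paper):

```python
import math

def bisection_learner(response_oracle, eps):
    """Classical bisection: repeatedly query the midpoint of the interval
    known to contain the target, until its length is at most eps."""
    lo, hi = 0.0, 1.0
    queries = []
    for _ in range(math.ceil(math.log2(1.0 / eps))):
        q = (lo + hi) / 2.0
        queries.append(q)
        if response_oracle(q):        # response R_i = 1 iff X* <= Q_i
            hi = q                    # target is to the left of the query
        else:
            lo = q                    # target is to the right of the query
    return (lo + hi) / 2.0, queries   # estimate = midpoint of final interval

# Usage: target X* = 0.3, accuracy eps = 2**-10.
x_star, eps = 0.3, 2**-10
estimate, queries = bisection_learner(lambda q: x_star <= q, eps)
assert abs(estimate - x_star) <= eps / 2   # eps-accurate
# The "last query" attack: Q_n is within eps of the target.
assert abs(queries[-1] - x_star) <= eps
```

Note how the later queries concentrate around X*; this is exactly the leakage that the last-query attack described in the text exploits.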
This is hardly surprising: in the quest for efficiency, the bisection strategy submits queries that become progressively closer to the target, thus rendering its location obvious to the adversary.

Building on the bisection strategy, we arrive at a natural compromise: instead of a single bisection search over the entire unit interval, we could create L identical copies of a bisection search across L disjoint sub-intervals of [0, 1) that are chosen ahead of time, in a manner that makes it impossible to distinguish which search is truly looking for the target. This is the main idea behind the Replicated Bisection strategy, first proposed and analyzed in [13]. We examine this strategy in Section 4, which will yield the query-complexity upper bound, on the order of L log(1/ε).

We now prove the upper bound, which has appeared in [17] and [13], where the authors also proposed, without a formal proof, the Replicated Bisection learner strategy that achieves (δ, L)-privacy with L log(1/ε) − L(log(L) − 1) queries. For completeness, we first review the Replicated Bisection strategy and subsequently give a formal proof of its privacy and accuracy. The main idea behind Replicated Bisection is to create L identical copies of a bisection search in a strictly symmetrical manner, so that the adversary is unable to tell which one of the L searches is associated with the target. The strategy takes as initial inputs ε and L, and proceeds in two phases:

Phase 1 - Non-adaptive Partitioning. The learner submits L − 1 (non-adaptive) queries: Q_1 = 1/L, Q_2 = 2/L, ..., Q_{L−1} = 1 − 1/L. Adjacent queries are separated by a distance of 1/L, and together they partition the unit interval into L disjoint sub-intervals of length 1/L each. We will refer to the interval [(i − 1)/L, i/L) as the ith sub-interval. Because the queries in this phase are non-adaptive, after the first L − 1 queries, while the learner knows which sub-interval contains the target, X*, the adversary has gained no information about X*. We will denote by I* the sub-interval that contains X*.

Phase 2 - Replicated Bisection. The second phase further consists of a sequence of K rounds, K = log(1/(Lε)). In each round, the learner submits one query in each of the L sub-intervals, and the location of the said query relative to the left end of the sub-interval is the same across all sub-intervals. Crucially, in the kth round, the query corresponds to the kth step of a bisection search carried out in the sub-interval I*, which contains the target. The rounds continue until the learner has identified the location of X* with sufficient accuracy within I*. The queries outside of I* serve only the purpose of obfuscation, by maintaining a strict symmetry. Figure 1 in Section A in the Supplemental Note contains the pseudo-code for Phase 2.

Denote by Q* the last query that the learner submits in the sub-interval I* in Phase 2, and by R* its response. It follows by construction that either R* = 1 and X* ∈ [Q* − ε, Q*), or R* = 0 and X* ∈ [Q*, Q* + ε). Therefore, the learner can produce the estimator by setting X̂ to the mid-point of either [Q* − ε, Q*) or [Q*, Q* + ε), depending on the value of R*, and this guarantees an additive error of at most ε/2. We have thus shown that the Replicated Bisection strategy is ε-accurate. The following result shows that it is also private; the proof is given in Section B.1 in the Supplemental Note.

Proposition 4.1. Fix ε and δ in (0, 1) and L ∈ N, such that ε < δ/4 and δ < 1/L. 
The Replicated Bisection strategy is (δ, L)-private.

Finally, we verify the number of queries used by Replicated Bisection: the first phase employs L − 1 queries, and the second phase uses L queries per round, across log(1/(Lε)) rounds, leading to a total of (L − 1) + L log(1/(Lε)) = L log(1/ε) − L(log L − 1) − 1 queries. This completes the proof of the query complexity upper bound in Theorem 2.1.

5 The Lower Bound

Main Ideas. We prove the query complexity lower bound of Theorem 2.1 in this section, which turns out to be significantly more challenging than showing the upper bound. To show that the query complexity is at least, say, n, we have to demonstrate that none of the learner strategies using n − 1 queries, Φ_{n−1}, can be simultaneously private and accurate. Because the sufficient statistic for the adversary to perform estimation is the posterior distribution of the target given the observed queries, a frontal assault on the problem would require that we characterize the resulting target posterior distribution for all strategies, a daunting task given the richness of Φ_{n−1}, which grows rapidly as n increases.

Our proof will instead take an indirect approach. The key idea is that, instead of allowing the adversary to use the entire posterior distribution of the target, we may restrict her to a seemingly much weaker class of proportional-sampling estimators, where the estimator X̂_a is sampled from a distribution proportional to the empirical density of the queries. A proportional-sampling estimator would, for instance, completely ignore the order in which the queries are submitted, which may contain useful information about the target. We will show that, perhaps surprisingly, the proportional-sampling estimators are so powerful that they leave the learner no option but to use a large number of queries. This forms the core of the lower bound argument. The proof further consists of the following steps.

1. 
Discrete Private Learning (Definition 5.2). We formulate a discrete version of the original problem where both the learner and the adversary estimate the discrete index associated with a certain sub-interval that contains the target, instead of the continuous target value. The discrete framework is conceptually clearer, and will allow us to deploy information-theoretic tools with greater ease.

2. Localized Query Complexity (Lemma 5.4). Within the discrete version, we prove a localized query complexity result: conditional on the target being in a coarse sub-interval of [0, 1), any accurate learner still needs to submit a large number of queries within the said sub-interval. The main argument hinges on Fano's inequality and a characterization of the conditional entropy of the queries and the target.

3. Proportional-Sampling Estimator (Definition 5.5). We use the localized query complexity in the previous step to prove a query complexity lower bound for the discrete version of Bayesian Private Learning (Proposition 5.3). This is accomplished by analyzing the performance of the family of proportional-sampling estimators, where the adversary reports the index of a sub-interval that is sampled randomly with probabilities proportional to the number of learner queries each sub-interval contains. We will show that the proportional-sampling estimator succeeds with overwhelming probability whenever an accurate learner strategy submits too few queries, thus obtaining the desired lower bound. In fact, we will prove a more general lower bound, in which the learner is allowed to make mistakes with a positive probability.

4. From Discrete to Continuous (Proposition 5.6). We complete the proof by connecting the discrete version back to the original, continuous problem. 
Via a reduction argument, we show that the original query complexity is always bounded from below by its discrete counterpart with some modified learner error parameters, and the final lower bound is obtained by optimizing over these parameters. The main difficulty in this portion of the proof stems from the fact that an accurate continuous learner estimator is insufficient for generating an accurate discrete estimator that is correct almost surely. We resolve this problem by carefully bounding the learner's probability of estimation error, and applying the discrete query lower bound developed in the previous step, in which the learner is allowed to make mistakes.

Discrete Bayesian Private Learning. We begin by formulating a discrete version of the original problem, where the goal for both the learner and the adversary is to recover a discrete index associated with the target, as opposed to generating a continuous estimator. We first create two nested partitions of the unit interval consisting of equal-length sub-intervals, where one partition is coarser than the other. The objective of the learner is to recover the index associated with the sub-interval containing X* in the finer partition, whereas that of the adversary is to recover the target's index corresponding to the coarser partition (an easier task!). We consider this discrete formulation because it allows for a simpler analysis using Fano's inequality, setting the stage for the localized query complexity lower bound in the next section.

Formally, fix s ∈ (0, 1) such that 1/s is an integer. Define M_s(i) to be the sub-interval

M_s(i) = [(i − 1)s, is), i = 1, 2, ..., 1/s. (5.1)

In particular, the set M_s := {M_s(i) : i = 1, ..., 1/s} is a partition of [0, 1) into 1/s sub-intervals of length s each. We will refer to M_s as the s-uniform partition. Define J(s, x) = j, s.t. 
x ∈ M_s(j). That is, J(s, x) denotes the index of the interval containing x in the s-uniform partition. A visualization of the index J(·, ·) is given in Figure 2 in Section A of the Supplemental Note.

We now formulate an analogous, and slightly more general, definition of accuracy and privacy for the discrete problem. We will use the super-script D to distinguish them from their counterparts in the original, continuous formulation. Just like the learner strategy in Definition 1.1, a discrete learner strategy, φ^D, is allowed to submit queries at any point along [0, 1), and has access to the random seed, Y. The only difference is that, instead of generating a continuous estimator, a discrete learner strategy produces an estimator for the index of the sub-interval containing the target in an ε-uniform partition, J(ε, X*).

Definition 5.1 ((ε, ν)-Accuracy - Discrete Version). Fix ε and ν ∈ (0, 1). A discrete learner strategy, φ^D, is (ε, ν)-accurate if it produces an estimator, Ĵ, such that P(Ĵ ≠ J(ε, X*)) ≤ ν.

Importantly, in contrast to its continuous counterpart in Definition 1.2, where the estimator must satisfy the error criterion with probability one, the discrete learner strategy is allowed to make mistakes up to a probability of ν. The role of the adversary is similarly defined in the discrete formulation: upon observing all n queries, the adversary generates an estimator, Ĵ_a, for the index associated with the sub-interval containing X* in the (coarser) δ-uniform partition, J(δ, X*). The notion of (δ, L)-privacy for a discrete learner strategy is defined in terms of the adversary's (in)ability to estimate the index J(δ, X*).

Definition 5.2 ((δ, L)-Privacy - Discrete Version). Fix δ ∈ (0, 1) and L ∈ N. A discrete learner strategy, φ^D, is (δ, L)-private if, under any adversary estimator Ĵ_a, we have that P(Ĵ_a = J(δ, X*)) ≤ 1/L. We will denote by Φ^D_n the family of discrete learner strategies that employ at most n queries.

We are now ready to define the query complexity of the discrete formulation, as follows:

N^D(ε, ν, δ, L) = min{n : Φ^D_n contains a strategy that is both (ε, ν)-accurate and (δ, L)-private}.

A main result of this subsection is the following lower bound on N^D, which we will convert into one for the original problem in Section 5.

Proposition 5.3 (Query Complexity Lower Bound for Discrete Learner Strategies). Fix ε, ν and δ in (0, 1) and L ∈ N, such that ε < δ < 1/L. We have that

N^D(ε, ν, δ, L) ≥ L [(1 − ν) log(δ/ε) − h(ν)], (5.2)

where h(p) is the Shannon entropy of a Bernoulli random variable with mean p: h(p) = −p log(p) − (1 − p) log(1 − p) for p ∈ (0, 1), and h(0) = h(1) = 0.

Localized Query Complexity Lower Bound. We prove Proposition 5.3 in the next two subsections. The first step, accomplished in the present subsection, is to use Fano's inequality to establish a query complexity lower bound localized to a sub-interval: conditional on the target belonging to a sub-interval in the δ-uniform partition, any discrete-learner strategy must devote a non-trivial number of queries to that sub-interval if it wishes to be reasonably accurate.5

Fix n ∈ N, and a learner strategy φ^D ∈ Φ^D_n. Because the strategy will submit at most n queries, without loss of generality we may assume that if the learner wishes to terminate the process after the first K queries, then she simply sets Q_i to 0 for all i ∈ {K + 1, K + 2, ..., n}, and the responses for those queries will be trivially equal to 0 almost surely. Denote by Q_j the set of queries that lie within the sub-interval M_δ(j):

Q_j := {Q_1, ..., Q_n} ∩ M_δ(j), (5.3)

and by |Q_j| its cardinality. 
Denote by R_j the set of responses for the queries in Q_j. Define π_{j,y} to be the learner's (conditional) probability of error:

π_{j,y} = P(Ĵ ≠ J(ε, X*) | J(δ, X*) = j, Y = y), j ∈ {1, ..., 1/δ}, y ∈ [0, 1). (5.4)

Denote by E_{j,y} the event:

E_{j,y} = {J(δ, X*) = j, Y = y}. (5.5)

We have the following lemma. The proof is based on Fano's inequality, and is given in Section B.2 of the Supplemental Note.

Lemma 5.4 (Localized Query Complexity). Fix ε, ν and δ ∈ (0, 1), ε < δ, and an (ε, ν)-accurate discrete learner strategy. We have that

E[|Q_j| | J(δ, X*) = j, Y = y] ≥ (1 − π_{j,y}) log(δ/ε) − h(π_{j,y}), (5.6)

for all j ∈ {1, ..., 1/δ} and y ∈ [0, 1), and δ Σ_{j=1}^{1/δ} ∫_0^1 π_{j,y} dy ≤ ν.

Proportional-Sampling Estimator. We now use the local complexity result in Lemma 5.4 to complete the proof of Proposition 5.3. The lemma states that if the target were to lie in a given sub-interval in the δ-uniform partition, then an accurate learner strategy would have to place at least log(δ/ε) queries within the said sub-interval on average.

Definition 5.5. A proportional-sampling estimator, Ĵ_a, is generated according to the distribution:

P(Ĵ_a = j) = |Q_j| / Σ_{j'=1}^{1/δ} |Q_{j'}|, j = 1, 2, ..., 1/δ. (5.7)

That is, an index is sampled with a probability proportional to the number of queries that fall within the corresponding sub-interval in the δ-uniform partition.

We next bound the probability of correct estimation when the adversary employs a proportional-sampling estimator: for all j = 1, 2, ..., 1/δ, we have that

P(Ĵ_a = J(δ, X*) | E_{j,y}) = P(Ĵ_a = j | E_{j,y}) = E[ |Q_j| / Σ_{j'=1}^{1/δ} |Q_{j'}| | E_{j,y} ] =(a) (1/n) E[|Q_j| | E_{j,y}] ≥(b) (1/n) ((1 − π_{j,y}) log(δ/ε) − h(π_{j,y})), (5.8)

where step (a) follows from the fact that φ^D ∈ Φ^D_n, and hence Σ_{j'=1}^{1/δ} |Q_{j'}| = n, and step (b) from Lemma 5.4. 
Recall that the learner strategy is (ε, ν)-accurate, and that the random seed Y has a probability density of 1 in [0, 1) and zero everywhere else. Since Eq. (5.8) holds for all j and y, we can integrate and obtain the adversary's overall probability of correct estimation:

P(Ĵ_a = J(δ, X*)) = Σ_{j=1}^{1/δ} ∫_0^1 P(E_{j,y}) P(Ĵ_a = J(δ, X*) | E_{j,y}) dy
≥(a) (δ/n) Σ_{j=1}^{1/δ} ∫_0^1 [(1 − π_{j,y}) log(δ/ε) − h(π_{j,y})] dy
= (1/n) [ (1 − δ Σ_{j=1}^{1/δ} ∫_0^1 π_{j,y} dy) log(δ/ε) − δ Σ_{j=1}^{1/δ} ∫_0^1 h(π_{j,y}) dy ]
≥(b) (1/n) [ (1 − ν) log(δ/ε) − δ Σ_{j=1}^{1/δ} ∫_0^1 h(π_{j,y}) dy ]
≥(c) (1/n) [ (1 − ν) log(δ/ε) − h(δ Σ_{j=1}^{1/δ} ∫_0^1 π_{j,y} dy) ]
≥(d) (1/n) [ (1 − ν) log(δ/ε) − h(ν) ], (5.9)

where P(E_{j,y}) = δ, interpreted as a density in y, step (a) follows from Eq. (5.8), steps (b) and (d) from Lemma 5.4, i.e., δ Σ_{j=1}^{1/δ} ∫_0^1 π_{j,y} dy ≤ ν, and step (c) is a result of Jensen's inequality and the Bernoulli entropy function h(·)'s being concave.

Recall that, in order for a learner strategy to be (δ, L)-private, we must have that P(Ĵ_a = J(δ, X*)) ≤ 1/L for any adversary estimator Ĵ_a. Eq. (5.9) thus implies that n ≥ L [(1 − ν) log(δ/ε) − h(ν)] is a necessary condition. Because this holds for any accurate and private learner policy, we have thus proven Proposition 5.3.

Footnote 5: Since all learner strategies considered in the next two subsections will be for the discrete problem, we will refer to them simply as learner strategies when there is no ambiguity.

From Discrete to Continuous Strategies. We now connect Proposition 5.3 to the original continuous estimation problem. The next proposition is the main result of this subsection. 
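As an aside, the proportional-sampling estimator of Definition 5.5 is straightforward to implement; the following is a minimal sketch (the function and parameter names are ours, for illustration only):

```python
import random
from collections import Counter

def proportional_sampling_estimate(queries, delta, rng=random):
    """Proportional-sampling adversary (in the spirit of Definition 5.5):
    report the index j of a delta-sub-interval M_delta(j) = [(j-1)*delta, j*delta)
    with probability proportional to the number of observed queries it contains.
    Indices run over 1, ..., 1/delta."""
    m = round(1 / delta)                                  # number of sub-intervals
    counts = Counter(min(int(q // delta) + 1, m) for q in queries)
    indices = sorted(counts)
    weights = [counts[j] for j in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

# Usage: a learner whose queries cluster near the target leaks its index.
rng = random.Random(0)
trace = [0.61, 0.63, 0.66, 0.69, 0.12]    # hypothetical query trace
guess = proportional_sampling_estimate(trace, delta=0.25, rng=rng)
assert guess in {1, 3}                    # only intervals 1 and 3 contain queries
```

With four of the five queries falling in the third sub-interval, the adversary reports index 3 with probability 4/5, illustrating why query-efficient (bisection-like) learners are easy prey for this estimator.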
The core of the proof is a reduction that constructs a (λε, λ^{-1})-accurate and (δ, L)-private discrete learner strategy from an ε-accurate and (δ, L)-private continuous learner strategy. The proof is given in Section B.3 of the Supplemental Note.

Proposition 5.6. Fix ε and δ in (0, 1) and L ∈ N, such that ε < δ/4 and δ < 1/L. Fix λ ∈ [2, δ/ε]. We have that

N(ε, δ, L) ≥ N^D(λε, λ^{-1}, δ, L). (5.10)

Completing the Proof of the Lower Bound. We are now ready to establish the query complexity lower bound in Theorem 2.1. Fix ε and δ in (0, 1) and L ∈ N, such that ε < δ/4 and δ < 1/L. Using Propositions 5.3 and 5.6, we have that for any λ ∈ [2, δ/ε],

N(ε, δ, L) ≥ N^D(λε, λ^{-1}, δ, L) ≥ L [(1 − λ^{-1}) log(δ/(λε)) − h(λ^{-1})], (5.11)

where the last step follows from Proposition 5.3 by substituting ν with λ^{-1} and ε with λε. Letting η := λ^{-1}, the above inequality can be rearranged to become

N(ε, δ, L)/L ≥ (1 − η) log(ηδ/ε) − h(η) ≥(a) (1 − η) log(ηδ/ε) + 2(1 − η) log(η) ≥ log(δ/ε) − η log(δ/ε) + 3 log η, (5.12)

where step (a) follows from the assumption that η = λ^{-1} ≤ 1/2, and the fact that h(x) ≤ −2(1 − x) log(x) for all x ∈ (0, 1/2]. Consider the choice λ = log(δ/ε). To verify that λ still belongs to the range [2, δ/ε], note that the assumption that ε < δ/4 ensures λ ≥ 2, and because x > log(x) for all x > 0, we have that λ < δ/ε. Substituting η with (log(δ/ε))^{-1} in Eq. (5.12), we have that N(ε, δ, L)/L ≥ log(δ/ε) − 1 − 3 log log(δ/ε) or, equivalently, N(ε, δ, L) ≥ L log(1/ε) − L log(2/δ) − 3L log log(δ/ε). 
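As a quick sanity check on the arithmetic above, the two bounds of Theorem 2.1 can be evaluated numerically for a concrete parameter setting inside the regime (2.1) (a sketch; base-2 logarithms are assumed, consistent with bisection, and the parameter values are ours):

```python
import math

def upper_bound(eps, delta, L):
    # Theorem 2.1, Eq. (2.2): L*log(1/eps) - L*(log(L) - 1) - 1
    return L * math.log2(1 / eps) - L * (math.log2(L) - 1) - 1

def lower_bound(eps, delta, L):
    # Theorem 2.1, Eq. (2.3): L*log(1/eps) - L*log(2/delta) - 3L*log(log(delta/eps))
    return (L * math.log2(1 / eps) - L * math.log2(2 / delta)
            - 3 * L * math.log2(math.log2(delta / eps)))

# Hypothetical parameters satisfying the regime: 0 < eps < delta/4 and delta < 1/L.
eps, delta, L = 2**-20, 1/8, 4
assert 0 < eps < delta / 4 and delta < 1 / L
lo, hi = lower_bound(eps, delta, L), upper_bound(eps, delta, L)
assert 0 < lo <= hi       # the bounds are consistent with each other
```

For fixed delta and L, both expressions are dominated by the common L*log2(1/eps) term as eps shrinks, which is exactly the first-order agreement stated in Corollary 2.2.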
This completes the proof of the lower bound in Theorem 2.1.

6To avoid the use of rounding in our notation, we will assume that $\delta$ is an integer multiple of $\beta\epsilon$.

6 Concluding Remarks

The main contribution of the present paper is a tight query complexity lower bound for the Bayesian Private Learning problem, which, together with an upper bound in [13], shows that the learner's query complexity depends multiplicatively on the level of privacy, $L$: if an $\epsilon$-accurate learner wishes to ensure that an adversary's probability of making a $\delta$-accurate estimation is at most $1/L$, then she needs to employ on the order of $L\log(\delta/\epsilon)$ queries. Moreover, we show that the multiplicative dependence on $L$ holds even under the more general models of high-dimensional queries and partial adversary monitoring. To prove the lower bound, we develop a set of information-theoretic arguments which involve, as a main ingredient, the analysis of proportional-sampling adversary estimators that exploit the action-information proximity inherent in the learning problem.

The present work leaves open a few interesting directions. Firstly, the current upper and lower bounds are not tight in the regime where the adversary's error criterion, $\delta$, is significantly smaller than $1/L$. Making progress in this regime is likely to require a more delicate argument and possibly new tools. Secondly, our query model assumes that the responses are noiseless, and it will be interesting to explore how the presence of noise (cf. [10, 1, 15]) may impact the design of private query strategies. For instance, a natural generalization of the bisection search algorithm to the noisy setting is the Probabilistic Bisection Algorithm ([7, 15]), where the $n$th query point is the median of the target's posterior distribution in the $n$th time slot. 
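To make the posterior-median idea concrete, here is a minimal sketch of such a strategy on a discretized interval (ours, not an implementation from [7] or [15]; the function name and parameter choices are our own), under a simple noise model in which each binary response is flipped independently with probability $1-p$.

```python
import random

def probabilistic_bisection(target, p=0.9, grid=4000, n_queries=200, seed=1):
    """Sketch of the Probabilistic Bisection Algorithm: each query point is
    the median of the current posterior over a uniform grid on [0, 1); a
    response ("is the target to the right of q?") is correct with
    probability p, and the posterior mass on the side indicated by the
    response is scaled by p, the other side by 1 - p."""
    rng = random.Random(seed)
    post = [1.0 / grid] * grid                 # uniform prior over [0, 1)

    def median(weights):
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if acc >= 0.5:
                return (i + 0.5) / grid
        return 1.0

    for _ in range(n_queries):
        q = median(post)                       # query the posterior median
        truth = target > q
        resp = truth if rng.random() < p else (not truth)
        for i in range(grid):
            on_right = (i + 0.5) / grid > q
            post[i] *= p if on_right == resp else (1.0 - p)
        total = sum(post)
        post = [w / total for w in post]       # renormalize

    return median(post)                        # point estimate of the target
```

With noiseless responses ($p = 1$) this reduces to ordinary bisection on the grid; how to replicate or otherwise privatize such a posterior-driven query sequence against an observing adversary is precisely the difficulty raised below.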
It is conceivable that one may construct a probabilistic query strategy analogous to the Replicated Bisection strategy by replicating queries in $L$ pre-determined sub-intervals. However, it appears challenging to prove that such replications preserve privacy, and still more difficult to see how one may obtain a matching query complexity lower bound in the noisy setting. Finally, one may want to consider richer, and potentially more realistic, active learning models, such as one in which each query reveals to the learner the full gradient of a function at the queried location, instead of only the sign of the gradient as in the present model.

References

[1] Michael Ben-Or and Avinatan Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 221-230. IEEE, 2008.

[2] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069-1109, 2011.

[3] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, 2nd edition. John Wiley & Sons, 2006.

[4] Rachel Cummings, Federico Echenique, and Adam Wierman. The empirical implications of privacy-aware choice. Operations Research, 64(1):67-78, 2016.

[5] Cynthia Dwork. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, pages 1-19. Springer, 2008.

[6] Giulia Fanti, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Spy vs. spy: Rumor source obfuscation. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 271-284. ACM, 2015.

[7] Michael Horstein. Sequential transmission using noiseless feedback. IEEE Transactions on Information Theory, 9(3):136-143, 1963.

[8] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. 
Differentially private online learning. In Conference on Learning Theory (COLT), pages 24.1-24.34, 2012.

[9] Yehuda Lindell and Benny Pinkas. Secure multiparty computation for privacy-preserving data mining. Journal of Privacy and Confidentiality, 1(1):5, 2009.

[10] Ronald L. Rivest, Albert R. Meyer, Daniel J. Kleitman, Karl Winklmann, and Joel Spencer. Coping with errors in binary search procedures. Journal of Computer and System Sciences, 20(3):396-404, 1980.

[11] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400-407, 1951.

[12] John N. Tsitsiklis and Kuang Xu. Delay-predictability trade-offs in reaching a secret goal. Operations Research, 66(2):587-596, 2018.

[13] John N. Tsitsiklis, Kuang Xu, and Zhi Xu. Private sequential learning. In Conference on Learning Theory (COLT), 2018. https://arxiv.org/abs/1805.02136.

[14] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.

[15] Rolf Waeber, Peter I. Frazier, and Shane G. Henderson. Bisection search with noisy responses. SIAM Journal on Control and Optimization, 51(3):2261-2279, 2013.

[16] Martin J. Wainwright, Michael I. Jordan, and John C. Duchi. Privacy aware learning. In Advances in Neural Information Processing Systems, pages 1430-1438, 2012.

[17] Zhi Xu. Private sequential search and optimization. Master's thesis, Massachusetts Institute of Technology, 2017. https://dspace.mit.edu/handle/1721.1/112054.