{"title": "The Limits of Post-Selection Generalization", "book": "Advances in Neural Information Processing Systems", "page_first": 6400, "page_last": 6409, "abstract": "While statistics and machine learning offers numerous methods for ensuring generalization, these methods often fail in the presence of *post selection*---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure a property called *post hoc generalization* (Cummings et al., COLT'16), which says that no person when given the output of the algorithm should be able to find any statistic for which the data differs significantly from the population it came from.\n\nIn this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.", "full_text": "The Limits of Post-Selection Generalization\n\nKobbi Nissim\u2217\n\nGeorgetown University\n\nkobbi.nissim@georgetown.edu\n\nUri Stemmer\u2021\n\nBen-Gurion University\n\nu@uri.co.il\n\nAdam Smith\u2020\nBoston University\nads22@bu.edu\n\nThomas Steinke\n\nIBM Research \u2013 Almaden\n\nposel@thomas-steinke.net\n\nJonathan Ullman\u00a7\nNortheastern University\njullman@ccs.neu.edu\n\nAbstract\n\nWhile statistics and machine learning offers numerous methods for ensuring gener-\nalization, these methods often fail in the presence of post selection\u2014the common\npractice in which the choice of analysis depends on previous interactions with the\nsame dataset. 
A recent line of work has introduced powerful, general-purpose algorithms that ensure a property called post hoc generalization (Cummings et al., COLT'16), which says that no person, given the output of the algorithm, should be able to find any statistic for which the data differs significantly from the population it came from.

In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.

1 Introduction

Consider a dataset X consisting of n independent samples from some unknown population P. How can we ensure that the conclusions drawn from X generalize to the population P? Despite decades of research in statistics and machine learning on methods for ensuring generalization, there is an increased recognition that many scientific findings do not generalize, with some even declaring this to be a "statistical crisis in science" [14]. While there are many reasons a conclusion might fail to generalize, one that is receiving increasing attention is post-selection, in which the choice of method for analyzing the dataset depends on previous interactions with the same dataset. Post-selection can arise from many common practices, such as variable selection, exploratory data analysis, and dataset re-use. Unfortunately, post-selection invalidates traditional methods for ensuring generalization, which assume that the method is independent of the data.

Numerous methods have been devised for statistical inference after post selection (e.g. [16, 18, 12, 13, 23]).
These are primarily special-purpose procedures that apply to specific types of simple post selection that admit direct analysis. A more limited number of methods apply where the data is reused in one of a small number of prescribed ways (e.g. [2, 4]).

∗Supported by NSF award CNS-1565387.
†Supported by NSF awards IIS-1447700 and AF-1763665, a Google Faculty Award and a Sloan Foundation Research Award.
‡Work done while U.S. was a postdoctoral researcher at the Weizmann Institute of Science, supported by a Koshland fellowship, and by the Israel Science Foundation (grants 950/16 and 5219/17).
§Supported by NSF awards CCF-1718088, CCF-1750640, and CNS-1816028, and a Google Faculty Award.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

A recent line of work initiated by Dwork et al. [9] posed the question: Can we design general-purpose algorithms for ensuring generalization in the presence of post-selection? These works (e.g. [9, 8, 19, 1]) identified properties of an algorithm that ensure generalization under post-selection, including differential privacy [10], information-theoretic measures, and compression. They also identified many powerful general-purpose algorithms satisfying these properties, leading to algorithms for post-selection data analysis with greater statistical power than all previously known approaches. Each of the aforementioned properties gives incomparable generalization guarantees, and allows for qualitatively different types of algorithms. However, Cummings et al. [7] identified that the common thread in each of these approaches is to establish a notion of post hoc generalization (which they originally called robust generalization), and initiated a general study of algorithms satisfying this notion.
Informally, an algorithm M satisfies post hoc generalization if there is no way, given only the output of M(X), to identify any statistical query [17] (that is, a bounded, linear, real-valued statistic on the population) such that the value of that query on the dataset is significantly different from its answer on the whole population.

Definition 1.1 (Post Hoc Generalization [7]). An algorithm M : X^n → Y satisfies (ε, δ)-post hoc generalization if for every distribution P over X and every algorithm A that outputs a bounded function q : X → [−1, 1], if X ∼ P⊗n, y ∼ M(X), and q ∼ A(y), then P[|q(P) − q(X)| > ε] ≤ δ, where we use the notation q(P) = E[q(X)] and q(X) = (1/n)·Σ_{i} q(X_i), and the probability is over the sampling of X and any randomness of M, A.

Post hoc generalization is easily satisfied whenever n is large enough to ensure uniform convergence for the class of statistical queries. However, uniform convergence is only satisfied in the unrealistic regime where n is much larger than |X|. Algorithms that satisfy post hoc generalization are interesting in the realistic regime where there exist queries q for which q(P) and q(X) are far apart, but where these queries cannot be found. The definition also extends seamlessly to richer types of statistics than statistical queries. However, restricting to statistical queries only strengthens our negative results.

Since all existing general-purpose algorithms for post-selection data analysis are analyzed via post hoc generalization, it is crucial to understand what we can achieve with algorithms satisfying post hoc generalization.
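The failure mode that motivates Definition 1.1 can be reproduced in a few lines. The following is a minimal numerical sketch (ours, not from the paper; the names n, d, and j_star are illustrative): a fixed, data-independent statistical query generalizes, while a query selected after inspecting the same data overfits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population P: uniform over {-1, +1}^d, so every coordinate query
# q_j(x) = x_j has population value q_j(P) = 0.
n, d = 100, 1000
X = rng.choice([-1.0, 1.0], size=(n, d))

# A fixed, data-independent query generalizes: its empirical value
# q(X) concentrates around q(P) = 0 at rate ~ 1/sqrt(n).
q0_X = X[:, 0].mean()
assert abs(q0_X) < 0.5

# Post selection: after looking at the data, pick the coordinate with
# the largest empirical mean.  The selected query overfits -- its
# empirical value is ~ sqrt(2*log(d)/n) although q(P) is exactly 0.
j_star = int(np.argmax(X.mean(axis=0)))
gap = X[:, j_star].mean() - 0.0  # |q(X) - q(P)| for the selected query
assert gap > 0.2
print(f"post-selected coordinate {j_star}: q(X) - q(P) = {gap:.3f}")
```

With these parameters the post-selected deviation is roughly √(2·log d / n) ≈ 0.37, several times larger than the ≈ 1/√n ≈ 0.1 deviation of a fixed query; algorithms satisfying post hoc generalization are designed precisely so that no such query can be found from their outputs alone.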
In this work we present several strong limitations on the power of such algorithms. Our results identify natural barriers to progress in this area, and highlight important challenges for future research on post-selection data analysis.

1.1 Our Results

Sample Complexity Bounds for Statistical Queries. Our first contribution is strong new lower bounds on any algorithm that satisfies post hoc generalization and answers a sequence of adaptively chosen statistical queries—the setting introduced in Dwork et al. [9] and further studied in [1, 15, 20]. In this model, there is an underlying distribution P. We would like to design an algorithm M that holds a sample X ∼ P⊗n, takes statistical queries q, and returns accurate answers a such that a ≈ q(P). To model post-selection, we consider a data analyst A that issues a sequence of queries q1, ..., qk where each query qj may depend on the answers a1, ..., aj−1 given by the algorithm in response to previous queries.

The simplest algorithm M for this task of answering adaptive statistical queries would return the empirical mean qj(X) = (1/n)·Σ_{i} qj(X_i) in response to each query, and one can show that this algorithm answers each query to within ±ε if n ≥ Õ(k/ε²). Surprisingly, we can improve the sample complexity to n ≥ Õ(√k/ε²) by returning q(X) perturbed with carefully calibrated noise [9, 1]. The analysis of this approach uses post hoc generalization: the noise is chosen so that |a − q(X)| ≤ ε/2 and the noise ensures |q(P) − q(X)| ≤ ε/2 for every query the analyst asks.

Our main result shows that the sample complexity n = Õ(√k/ε²) is essentially optimal for any algorithm that uses the framework of post hoc generalization.

Theorem 1.2 (Informal).
If M takes a sample of size n, satisfies (ε, δ)-post hoc generalization, and for every distribution P over X = {±1}^{k+O(log(n/ε))} and every data analyst A who asks k statistical queries, P[∃j ∈ [k], |qj(P) − aj| > ε] ≤ δ, then n = Ω(√k/ε²), where the probability is taken over X ∼ P⊗n and the coins of M and A.

To prove our theorem, we construct a joint distribution over pairs (A, P) such that when M is given too small a sample X, and A asks k − 1 statistical queries, then either M does not answer all the queries accurately or A outputs a k-th query q∗ such that q∗(P) − q∗(X) > ε. Thus, M cannot be both accurate and satisfy post hoc generalization.

Our proof of this result refines the techniques in [15, 20]—which yield a lower bound of n = Ω(√k) for ε = 1/3.

Our proof circumvents a barrier in previous lower bounds. The previous works use the sequence of queries to uncover almost all of the sample held by the mechanism (a "reconstruction attack" of sorts). Once the analyst has identified all the points in the sample, it is easy to force an error: the analyst randomly asks one of two queries – zero everywhere, or zero on the reconstructed sample and one elsewhere – that "look the same to" M but have different true answers.

We cannot use that approach because in our setting it is impossible to reconstruct any of the sample. Indeed, for the parameter regime we consider, differentially private algorithms could be used to prevent reconstruction with any meaningful confidence. All we can hope for is a weak approximate reconstruction of the sample.
This means the algorithm will have sufficient information to distinguish the aforementioned two queries, and we cannot end the proof the same way as before.

Intuitively, our attack approximately reconstructs the dataset in a way that is only O(ε) better than guessing. This is not enough to completely "cut off" the algorithm and force an error, but, as we will see, it does allow the analyst to construct a query q∗ that overfits – i.e., |q∗(X) − q∗(P)| > ε. Our approximate reconstruction is accomplished using a modification of the reconstruction attack techniques of prior work. Specifically, we employ tools from the fingerprinting codes literature [3, 22, 6], but we output quantitative scores, rather than a hard in/out decision about what is in the sample.

Independently, Wang [24] proved a quantitatively similar bound to Theorem 1.2. However, Wang's bound only applies to algorithms M that receive only the empirical mean q(X) of each query (as opposed to the whole dataset). This precludes mechanisms, such as sample splitting, that treat records asymmetrically. Wang's bound also applies to a slightly different (though closely related) class of statistics.

The dimensionality of X required in our result is at least as large as k, which is somewhat necessary. Indeed, if the support of the distribution is {±1}^d, then there is an algorithm M that takes a sample of size just Õ(√d·log(k)/ε³) [9, 1], so the conclusion is simply false if d ≪ k. Even when d ≪ k, the aforementioned algorithms require running time at least 2^d per query. [15, 20] also showed that any polynomial-time algorithm that answers k queries to constant error requires n = Ω(√k). We improve this result to have the optimal dependence on ε.

Theorem 1.3 (Informal). Assume one-way functions exist and let c > 0 be any constant.
If M takes a sample of size n, has polynomial running time, satisfies (ε, δ)-post hoc generalization, and for every distribution P over X = {±1}^{k^c+O(log(n/ε))} and every data analyst A who asks k statistical queries, P[∃j ∈ [k], |qj(P) − aj| > ε] ≤ δ, then n = Ω(√k/ε²), where the probability is taken over X ∼ P⊗n and the coins of M and A.

We prove the information-theoretic result (Theorem 1.2) in Section 2. Due to space restrictions, we defer the computational result (Theorem 1.3) to the full version of this work.

Negative Results for Composition. Differential privacy provides optimal or near-optimal methods for answering an adaptively chosen sequence of statistical queries. However, even for answering statistical queries, outside constraints sometimes preclude randomized algorithms (to allay reproducibility concerns, for instance). Furthermore, one of the main goals of the emerging study of adaptive data analysis is to understand unstructured, unplanned dataset re-use.

At this point, we know several techniques for reasoning about generalization in the adaptive setting: differential privacy and algorithmic stability, information bounds, and compression (and there may be many more yet to be discovered) [7]. These techniques are not directly comparable, but they all use post hoc generalization as a fundamental unit of their analysis. If post hoc generalization were to compose well, then this would provide an avenue for combining these techniques (and possibly others).
However, we show that this is not the case and, hence, we must search elsewhere for a unifying theory.

Intuitively, we show that, if the same dataset is analyzed by many different algorithms, each satisfying post hoc generalization, then the composition of these algorithms may not satisfy post hoc generalization. That is, combining the information output by several algorithms may permit overfitting even when the individual outputs do not.

The key reason differential privacy is used for adaptive data analysis is that it satisfies strong composition properties – this is what quantitatively distinguishes the technique from, say, data splitting. We show that post hoc generalization does not have even weak adaptive composition properties. This shows a stark difference between differential privacy and post hoc generalization as tools for analyzing adaptive data analysis. This result can be viewed as further motivation for using differential privacy in this setting – its composition properties are special.

Theorem 1.4 states that there is a set of O(log n) algorithms that have almost optimal post hoc generalization, but whose composition does not have any non-trivial post hoc generalization.

Theorem 1.4. For every n ∈ ℕ there is a collection of ℓ = O(log n) algorithms M1, ..., Mℓ that take n samples from a distribution over X = {0, 1}^{O(log n)} such that (1) each of these algorithms is (ε, δ)-post hoc generalizing for every δ > 0 and ε = O(√(log(n/δ)/n^{.999})), but (2) the composition (M1, ..., Mℓ) is not (1.999, .999)-post hoc generalizing.

If we consider a relaxed notion of computational post hoc generalization, then we show that composition can fail even for just two algorithms.
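As a rough illustration of how two individually harmless outputs can jointly enable overfitting, here is a toy sketch of our own, in the spirit of encryption-based counterexamples; it is not the paper's Encrypermute construction, and the pad function and parameters below are assumptions. One algorithm releases the sample encrypted under a pad derived from a single record, and a second releases that one record. Each output alone permits essentially no overfitting, but together they reveal the whole sample.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(1)

def pad(seed: bytes, nbytes: int) -> bytes:
    # Toy pseudorandom pad: SHA-256 in counter mode (illustration only).
    out, ctr = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:nbytes]

# Population P: uniform over 16-byte records; a sample of size n.
n, rec = 64, 16
X = [rng.bytes(rec) for _ in range(n)]

# M1: release records 2..n XORed with a pad derived from record 1.
# To an efficient analyst who never sees X[0], this output looks random.
c = bytes(a ^ b for a, b in zip(b"".join(X[1:]), pad(X[0], (n - 1) * rec)))

# M2: release the single record X[0].  Knowing one point can shift the
# empirical mean of a [-1, 1]-bounded query by only O(1/n).
leak = X[0]

# Composition: decrypt, recover records 2..n, and overfit with the
# indicator query q* of the recovered set.
plain = bytes(a ^ b for a, b in zip(c, pad(leak, len(c))))
recovered = {plain[i * rec:(i + 1) * rec] for i in range(n - 1)}
assert recovered == set(X[1:])

def q_star(x: bytes) -> float:
    # Indicator query of the recovered records.
    return 1.0 if x in recovered else 0.0

q_X = sum(q_star(x) for x in X) / n                                # ~ 1
q_P = sum(q_star(rng.bytes(rec)) for _ in range(10_000)) / 10_000  # ~ 0
assert q_X >= (n - 1) / n and q_P < 0.01
```

In this sketch the first output is only computationally unrevealing (an unbounded analyst could search over pads), mirroring the role one-way functions play in Theorem 1.5; information-theoretically, Theorem 1.4 exhibits a similar failure using O(log n) algorithms.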
Informally, computational post hoc generalization means that Definition 1.1 is satisfied when the algorithm A runs in polynomial time.

Theorem 1.5. Assume one-way functions exist. For every n ∈ ℕ there are two algorithms M1, M2 that take n samples from a distribution over X = {0, 1}^{O(log n)} such that (1) both algorithms are (ε, δ)-computationally post hoc generalizing for every δ > n^{−O(1)} and ε = O(√(log(n/δ)/n^{.999})), but (2) the composition (M1, M2) is not (1.999, .999)-computationally post hoc generalizing.

We prove the information-theoretic result (Theorem 1.4) in Section 3. Due to space restrictions, we defer the computational result (Theorem 1.5) to the full version of this work.

2 Lower Bounds for Statistical Queries

2.1 Post Hoc Generalization for Adaptive Statistical Queries

We are interested in the ability of interactive algorithms satisfying post hoc generalization to answer a sequence of statistical queries. Definition 1.1 applies to such algorithms via the following experiment.

Algorithm 1: AQ_{X,n,k}[M ⇄ A]
  A chooses a distribution P over X
  X ∼ P⊗n and X is given to M (but not to A)
  For j = 1, ..., k:
    A outputs a statistical query qj (possibly depending on q1, a1, ..., qj−1, aj−1)
    M(X) outputs aj

Definition 2.1. An algorithm M is (ε, δ)-post hoc generalizing for k adaptive queries over X given n samples if for every adversary A,

P_{AQ_{X,n,k}[M⇄A]}[ ∃j ∈ [k] : |qj(X) − qj(P)| > ε ] ≤ δ.

2.2 A Lower Bound for Natural Algorithms

We begin with an information-theoretic lower bound for a class of algorithms M that we call natural algorithms. These are algorithms that can only evaluate the query on the sample points they are given. That is, an algorithm M is natural if, when given a sample X = (X1, . . .
, Xn) and a statistical query q : X → [−1, 1], the algorithm M returns an answer a that is a function only of (q(X1), ..., q(Xn)). In particular, it cannot evaluate q on data points of its choice. Many algorithms in the literature have this property. Formally, we define natural algorithms via the game NAQ_{X,n,k}[M ⇄ A]. This game is identical to AQ_{X,n,k}[M ⇄ A] except that when A outputs qj, M does not receive all of qj, but instead receives only q^j_X = (qj(X1), ..., qj(Xn)).

Theorem 2.2 (Lower Bound for Natural Algorithms). There is an adversary A_NAQ such that for every natural algorithm M, and for universe size N = 8n/ε, if

P_{NAQ_{[N],n,k}[M⇄A_NAQ]}[ ∃j ∈ [k] : |qj(X) − qj(P)| > ε ∨ |aj − qj(P)| > ε ] ≤ 1/100,

then n = Ω(√k/ε²). Here the sample X is chosen via the game NAQ_{[N],n,k} (it is sampled uniformly from the domain [N]).

The proof uses the analyst A_NAQ described in Algorithm 2. For notational convenience, A_NAQ actually asks k + 1 queries, but this does not affect the final result.

Algorithm 2: A_NAQ
  Parameters: sample size n, universe size N = 8n/ε, number of queries k, target accuracy ε
  Let P ← U_[N], A1 ← ∅, and τ ← 9ε√(2k·log(96/ε)) + 1
  For j ∈ [k]:
    Sample pj ∼ U_[0,1]
    For i ∈ [N]:
      Sample q̃^j_i ∼ Ber(pj) and let qj(i) ← q̃^j_i if i ∉ Aj, and qj(i) ← 0 if i ∈ Aj
    Ask query qj and receive answer aj
    For i ∈ [N]:
      Let z^j_i ← trunc_{3ε}(aj − pj)·(q^j_i − pj) if i ∉ Aj, and z^j_i ← 0 if i ∈ Aj,
      where trunc_{3ε}(x) takes x ∈ ℝ and returns the nearest point in [−3ε, 3ε] to x
    Let Aj+1 ← { i ∈ [N] : |Σ_{ℓ=1}^j z^ℓ_i| > τ − 1 }
  (N.B. 
By construction, Aj ⊆ Aj+1.)
  For i ∈ [N]:
    Define z_i ← Σ_{j=1}^k z^j_i and q∗_i ← z_i/τ
  Let q∗ : [N] → [−1, 1] be defined by q∗(i) ← q∗_i

In order to prove Theorem 2.2, it suffices to prove that either the answer aj to one of the initial queries qj fails to be accurate (in which case M is not accurate), or that the final query q∗ gives significantly different answers on X and P (in which case M is not robustly generalizing). Formally, we have the following proposition.

Proposition 2.3. For an appropriate choice of k = Θ(ε⁴n²), and for n and 1/ε sufficiently large, for any natural M, with probability at least 2/3, either (1) ∃j ∈ [k] : |aj − qj(P)| > ε, or (2) q∗(X) − q∗(P) > ε, where the probability is taken over the game NAQ_{X,n,k}[M ⇄ A_NAQ] and A_NAQ is specified by Algorithm 2.

We prove Proposition 2.3 using a series of claims. The first claim states that none of the values z_i are ever too large in absolute value, which follows immediately from the definition of the set Aj and the fact that each term z^j_i is bounded.

Claim 2.4. For every i ∈ [N], |z_i| ≤ τ.

The next claim states that, no matter how the mechanism answers, very few of the items not in the sample get "accused" of membership, that is, included in the set Aj.

Claim 2.5 (Few Accusations). P[ |Ak \ X| ≤ εN/8 ] ≥ 1 − e^{−Ω(εn)}.

Proof. Fix the biases p1, ..., pk as well as all the information visible to the mechanism (the query values {q^j_i : i ∈ X, j ∈ [k]}, as well as the answers a1, ..., ak).
We prove that the probability of F is high conditioned on any setting of these variables.

The main observation is that, once we condition on the biases pj, the query values {q^j_i : i ∉ X, j ∈ [k]} are independent with q^j_i ∼ Ber(pj). This is true because M is a natural algorithm (so it sees only the query values for points in X) and, more subtly, because the analyst's decisions about how to sample the pj's, and which points in X to include in the sets Aj, are independent of the query values outside of X. By the principle of deferred decisions, we may thus think of the query values {q^j_i : i ∉ X, j ∈ [k]} as selected after the interaction with the mechanism is complete.

Fix i ∉ X. For every j ∈ [k], we have

E[z^j_i] = E[ trunc_{3ε}(aj − pj)·(q^j_i − pj) ] = E[trunc_{3ε}(aj − pj)]·E[q^j_i − pj] = 0.

By linearity of expectation, we also have E[z_i] = E[Σ_{j=1}^k z^j_i] = 0. Next, note that |z^j_i| ≤ 3ε, since trunc_{3ε}(aj − pj) ∈ [−3ε, 3ε] and q^j_i − pj ∈ [−1, 1]. The terms z^j_i are not independent, since if a partial sum Σ_{j'=1}^ℓ z^{j'}_i ever exceeds τ, then the subsequent values z^j_i for j > ℓ will be set to 0.

Table 1: Notation and intuition for Algorithm 2
  n — Sample size.
  N — Universe size.
  P — Target distribution, uniform on [N].
  Aj — Universe elements "suspected" of being in the sample during the jth round of the attack.
  qj — The query constructed in the jth round.
  z^j_i — If i is not in the sample then E[(aj − pj)·(q^j_i − pj)] = E[aj − pj]·E[q^j_i − pj] = 0. The bigger Σ_{j=1}^k z^j_i is, the more we "suspect" element i of being in the database.
However, we may consider a related sequence given by sums of the terms z̃^j_i = trunc_{3ε}(aj − pj)·(q̃^j_i − pj) (the difference from z^j_i is that we use the values q̃^j_i ∼ Ber(pj) regardless of whether item i is in Aj). Once we have conditioned on the biases and the mechanism's outputs, Σ_{j=1}^k z̃^j_i is a sum of bounded independent random variables. By Hoeffding's Inequality, the sum is O(ε√(k·log(1/ε))) with high probability; specifically, for every i ∉ X,

P[ |Σ_{j=1}^k z̃^j_i| > ε√(18k·log(96/ε)) ] ≤ ε/48.

By Etemadi's Inequality, a related bound holds uniformly over all the intermediate partial sums: for every i ∉ X,

P[ ∃ℓ ∈ [k] : |Σ_{j=1}^ℓ z̃^j_i| > 3ε√(18k·log(96/ε)) ] ≤ 3·(ε/48) = ε/16,

and note that 3ε√(18k·log(96/ε)) = 9ε√(2k·log(96/ε)) = τ − 1.

Finally, notice that by construction the real scores z^j_i are all set to 0 once an item is added to Aj, the sets Aj are nested (Aj ⊆ Aj+1), and a bound on the partial sums of the z̃^j_i applies equally well to the partial sums of the z^j_i. Thus, for every i ∉ X, P[ ∃ℓ ∈ [k] : |Σ_{j=1}^ℓ z^j_i| > τ − 1 ] ≤ ε/16; that is, each i ∉ X is "accused" (added to some Aj) with probability at most ε/16.

Now, the scores z_i are independent across elements i ∉ X (again, because we have fixed the biases pj and the mechanism's outputs).
We can bound the probability that more than εN/8 elements i ∉ X are "accused" over the course of the algorithm using Chernoff's bound:

P[ |Ak \ X| > εN/8 ] ≤ e^{−εN/64} ≤ e^{−Ω(n)}.

The claim now follows by averaging over all of the choices we fixed.

The next claim states that the sum of the scores over all i not in the sample is small.

Claim 2.6. With probability at least 99/100, Σ_{i∈[N]\X} z_i = O(ε√(Nk)).

Proof. Fix a choice of (p1, ..., pk) ∈ [0,1]^k, the in-sample query values (q^1_X, ..., q^k_X) ∈ {0,1}^{n×k}, and the answers (a1, ..., ak) ∈ [0,1]^k. Conditioned on these, the values z_i for i ∉ X are independent and identically distributed. They have expectation 0 (see the proof of Claim 2.5) and are bounded by τ (by Claim 2.4). By Hoeffding's inequality, with probability at least 99/100, Σ_{i∈[N]\X} z_i = O(τ√N) = O(ε√(Nk)), as desired. The claim now follows by averaging over all of the choices we fixed.

Claim 2.7. There exists c > 0 such that, for all sufficiently small ε and sufficiently large n, with probability at least 99/100, either ∃j ∈ [k] : |aj − qj(P)| > ε (large error), or Σ_{i∈[N]} z_i ≥ ck (high scores in sample).

The proof of Claim 2.7 relies on the following key lemma.
The lemma has appeared in various forms [20, 11, 21]; the form we use is [5, Lemma 3.6] (rescaled from {−1, +1} to {0, 1}).

Lemma 2.8 (Fingerprinting Lemma). Let f : {0,1}^m → [0,1] be arbitrary. Sample p ∼ U_[0,1] and sample x1, ..., xm ∼ Ber(p) independently. Then

E[ (f(x) − p)·Σ_{i∈[m]}(x_i − p) + |f(x) − (1/m)·Σ_{i∈[m]} x_i| ] ≥ 1/12.

Proof of Claim 2.7. To make use of the fingerprinting lemma, we consider a variant of Algorithm 2 that does not truncate the quantity aj − pj to the range [−3ε, 3ε] when computing the score z^j_i for each element i. Specifically, we consider scores based on the quantities

ẑ^j_i = (aj − pj)·(q^j_i − pj) if i ∉ Aj, and ẑ^j_i = 0 if i ∈ Aj; and ẑ_i = Σ_{j=1}^k ẑ^j_i.

We prove two main statements: first, that these untruncated scores are equal to the truncated ones with high probability as long as the mechanism's answers are accurate; second, that the expected sum of the untruncated scores is large. Together these give us the desired final statement.

To relate the truncated and untruncated scores, consider the following three key events:

1. ("Few accusations"): Let F be the event that, at every round j, the set of "accused" items outside of the sample is small: |Ak \ X| ≤ εN/8. Since the Aj are nested, event F implies the same condition for all j in [k].

2. ("Low population error"): Let G be the event that at every round j ∈ [k], the mechanism's answer satisfies |aj − qj(P)| ≤ ε.

3. 
("Representative queries"): Let H be the event that |q̃j(P) − pj| ≤ ε for all rounds j ∈ [k]—that is, each query's population average is close to the corresponding sampling bias pj.

Sub-Claim 2.9. Conditioned on F ∩ G ∩ H, the truncated and untruncated scores are equal. Specifically, |aj − pj| ≤ 3ε for all j ∈ [k].

Proof. We can bound the difference |aj − pj| via the triangle inequality:

|aj − pj| ≤ |aj − qj(P)| + |qj(P) − q̃j(P)| + |q̃j(P) − pj|.

The first term is the mechanism's error (bounded by ε when G occurs). The second is the distortion of the population mean introduced by setting the query values of i ∈ Aj to 0. This distortion is at most |Aj|/N. When F occurs, Aj has size at most |X| + |Aj \ X| ≤ n + εN/8 = εN/4, so the second term is at most ε/4. Finally, the last term is bounded by ε when H occurs, by definition. The three terms add to at most 3ε when F, G, and H all occur.

We can bound the probability of H via a Chernoff bound: the probability that a binomial random variable deviates from its mean by εN is at most 2·exp(−ε²N/3).

The technical core of the proof is the use of the fingerprinting lemma to analyze the quantity D combining the sum of the untruncated scores with the summed population errors:

D := Σ_{i=1}^N ẑ_i + Σ_{j=1}^k ( |aj − qj(P)| + |Aj|/(N − |Aj|) ).

Sub-Claim 2.10. E[D] = Ω(k).

Proof. We show that, for each round j, the expected sum of that round's scores satisfies E[Σ_i ẑ^j_i] ≥ 1/12 − E[ |aj − qj(P)| + |Aj|/(N − |Aj|) ]. This is true even when we condition on all the random choices and communication in rounds 1 through j − 1.
Adding up these expectations over all rounds gives the desired expectation bound for D.

First, note that summing ẑ^j_i over all elements i ∈ [N] is the same as summing over that round's unaccused elements i ∈ [N] \ Aj (since ẑ^j_i = 0 for i ∈ Aj). Thus,

Σ_{i=1}^N ẑ^j_i = Σ_{i∈[N]\Aj} ẑ^j_i = (aj − pj)·Σ_{i∈[N]\Aj} (q^j_i − pj).

We can now apply the Fingerprinting Lemma, with m = N − |Aj|, p = pj, x_i = q̃^j_i for i ∉ Aj, and f((x_i)_{i∉Aj}) = aj (note that f depends implicitly on Aj, but since we condition on the outcome of previous rounds, we may take Aj as fixed for round j). We obtain

E[ Σ_{i=1}^N ẑ^j_i ] ≥ 1/12 − E[ |aj − (1/(N − |Aj|))·Σ_{i∉Aj} q^j_i| ].

Now the difference between (1/(N − |Aj|))·Σ_{i∉Aj} q^j_i and the actual population mean (1/N)·Σ_{i=1}^N q^j_i is at most N·(1/(N − |Aj|) − 1/N) = |Aj|/(N − |Aj|). Thus we can upper-bound the term inside the right-hand-side expectation above by |aj − qj(P)| + |Aj|/(N − |Aj|).

A direct corollary of Sub-Claim 2.10 is that there is a constant c′ > 0 such that, with probability at least 199/200, D ≥ c′k. Let us call that event I.

Conditioned on F ∩ G ∩ H, we know that each ẑ_i equals the real score z_i (by the first sub-claim above), that |aj − qj(P)| ≤ 3ε for each j, and that |Ak| ≤ εN/8.
If we also consider the intersection with I, then we have

\[ \sum_{i=1}^{N} z_i \;\ge\; c'k - 3k\varepsilon - k \cdot \frac{\varepsilon/8}{1 - \varepsilon/8} \;\ge\; k(c' - 4\varepsilon) \]

(for sufficiently small ε). By a union bound, the probability of ¬(F ∩ H ∩ I) is at most 1/200 + exp(−Ω(ε²n)) ≤ 1/100 (for sufficiently large n). Thus we get

\[ \mathbb{P}\left[\Big(\sum_{i=1}^{N} z_i \ge ck\Big) \;\text{or}\; (\lnot G)\right] \;\ge\; \frac{99}{100} \,, \]

where c = c′ − 4ε is positive for sufficiently small ε. This completes the proof of Claim 2.7.

To complete the proof of the proposition, suppose that |a_j − q_j(P)| ≤ ε for every j, so that we can assume Σ_{i∈X} z_i = Ω(k). Then we can show that, when n is sufficiently large and k ≳ ε⁴n², the final query q∗ will violate robust generalization. A relatively straightforward calculation (omitted for space) shows that for the query q∗ that we defined, q∗(X) − q∗(P) = Ω(ε√k) − O(1/√N). Choosing an appropriate k = Θ(ε⁴n²), the first term in this bound will be at least 2ε. Also, we have N ≥ n = Θ(√k/ε²), so when k is larger than some absolute constant, the O(1/√N) term is Θ(ε/k^{1/4}) ≤ ε, and hence q∗(X) − q∗(P) > ε. Thus, by Claims 2.6 and 2.7, either M fails to be accurate, so that ∃j ∈ [k] with |a_j − q_j(P)| > ε, or we find a query q∗ such that q∗(X) − q∗(P) > ε.

2.3 Lower Bounds for All Algorithms via Random Masks

We prove Theorem 1.2 by constructing the following transformation from an adversary that defeats all natural algorithms to an adversary that defeats all algorithms.
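To make the wrapping step concrete, here is a minimal Python sketch of the masking idea (the function and variable names are illustrative, not the paper's notation; the XOR ⊕ of ±1 values is realized as multiplication):

```python
import random

def wrap_with_masks(N, k):
    """Sketch of the random-mask wrapping (hypothetical names).

    Every universe element i gets a uniformly random mask m_i in {+-1}^k.
    A round-j query qhat on [N] is lifted to a query q on pairs (i, y):
    since XOR of +-1 values is their product, q agrees with qhat on the
    planted points (i, m_i) and is a uniformly random sign elsewhere.
    """
    masks = [[random.choice((-1, 1)) for _ in range(k)] for _ in range(N)]

    def lift(qhat, j):
        def q(i, y):
            # y[j] XOR m_i[j] XOR qhat(i), with XOR realized as a product
            return y[j] * masks[i][j] * qhat(i)
        return q

    return masks, lift

# The lifted query recovers qhat exactly on the planted points:
masks, lift = wrap_with_masks(N=8, k=3)
qhat = lambda i: 1 if i % 2 == 0 else -1
q = lift(qhat, j=1)
assert all(q(i, masks[i]) == qhat(i) for i in range(8))
```

Intuitively, on the planted points (i, m_i) the lifted query agrees with the natural-algorithm adversary's query, while at any other point its value is an unbiased random sign, so the mechanism gains nothing by evaluating queries outside the dataset.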
The main idea of the reduction is to use random masks to hide information about the evaluation of the queries at points outside of the dataset. This effectively forces the algorithm to behave like a natural algorithm because, intuitively, it does not know where to evaluate the query apart from on the dataset. The reduction is described in Algorithm 3; due to space restrictions, we omit its analysis.

3 Post Hoc Generalization Does Not Compose

In this section we prove that post hoc generalization is not closed under composition.

Theorem 3.1. For every n ∈ ℕ and every α > 0 there is a collection of ℓ = O((1/α) log n) algorithms M_1, ..., M_ℓ : ({0,1}^{5 log n})^n → Y such that (1) for every i = 1, ..., ℓ and δ > 0, M_i satisfies (ε, δ)-post hoc generalization for ε = O(√(log(n/δ)/n^{1−α})), but (2) the composition (M_1, ..., M_ℓ) is not (2 − 2/n⁴, 1 − 1/(2n³))-post hoc generalizing.

The result is based on an algorithm that we call Encrypermute. Before proving Theorem 3.1, we introduce Encrypermute and establish the main property that it satisfies.

Algorithm 3: A_AQ
Parameters: sample size n, universe size N = 8n/ε, number of queries k, target accuracy ε.
Oracle: an adversary A_NAQ for natural algorithms with sample size n, universe size N, number of queries k, target accuracy ε.
Let the data universe be X = {(i, y) : i ∈ [N], y ∈ {±1}^k}
For i ∈ [N]:
    Choose m_i = (m_i^1, ..., m_i^k) ∼ U({±1}^k)
Let P be the uniform distribution over pairs (i, m_i) for i ∈ [N]
For j ∈ [k]:
    Receive the query q̂_j : [N] → [±1] from A_NAQ
    Form the query q_j(i, y) = y^j ⊕ m_i^j ⊕ q̂_j(i)  (NB: q_j(i, m_i) = q̂_j(i))
    Send the query q_j to M and receive the answer a_j
    Send the answer a_j to A_NAQ

Algorithm 4: Encrypermute
Input: parameter k, and a sample X = (x_1, x_2, ..., x_n) ∈ ({0,1}^d)^n for d = 5 log n.
If X contains n distinct elements:
    Let π be the permutation that sorts (x_1, ..., x_k) and identify π with r ∈ {0, 1, ..., k! − 1}
    Let α ∈ [0, 1] be the largest number such that k ≥ n^α and let t ← αk/20  (NB: 2^{dt} ≤ k!)
    Identify (x_{k+1}, ..., x_{k+t}) ∈ ({0,1}^d)^t with a number m ∈ {0, 1, ..., k! − 1}
    Return c = m + r mod k!
Else:
    Return a random number c ∈ {0, 1, ..., k! − 1}

The key facts about Encrypermute are as follows.

Claim 3.2. Let 𝒟 be any distribution over ({0,1}^d)^n. Let D ∼ 𝒟, let X be a random permutation of D, and let C ← Encrypermute(X). Then D and C are independent.

Intuitively, the claim follows from the fact that r is uniformly random and depends only on the permutation, so it is independent of D. Therefore m + r mod k! is random and independent of m.

Lemma 3.3. For all δ > 0, Encrypermute satisfies (ε, δ)-post hoc generalization for ε = √(2 ln(2/δ)/n).

Intuitively, the lemma follows from the fact that C is independent of D. We omit the proofs of both of these claims due to space restrictions.

Proof of Theorem 3.1. Fix α ∈ (0, 1), and let M_1 denote the mechanism that takes a database of size n and outputs the first n^α elements of its sample. As M_1 outputs a sublinear portion of its input, it satisfies post hoc generalization with strong parameters. Specifically, by [7, Lemma 3.5], M_1 is (ε, δ)-post hoc generalizing for ε = O(√(log(n/δ)/n^{1−α})).

Now consider composing M_1 with O((1/α) log n) copies of Encrypermute, with exponentially growing choices for the parameter k, where for the ith copy we set k = (1 + α/20)^i · n^α. By Lemma 3.3, each of these mechanisms satisfies post hoc generalization for ε = O(√(log(1/δ)/n)), so this composition satisfies the assumptions of the theorem.

Let P be the uniform distribution over {0,1}^d, where d = 5 log n, and let X ∼ P^⊗n. By a standard analysis, X contains n distinct elements with probability at least 1 − 1/(2n³). Assuming that this is the case, the first copy of Encrypermute outputs c = m + r mod k!, where m encodes the rows of X in positions n^α + 1, ..., (1 + α/20)n^α, and where r is a deterministic function of the first n^α rows of X. Hence, when composed with M_1, these two mechanisms reveal the first (1 + α/20)n^α rows of X. By induction, the output of the composition of all the copies of Encrypermute with M_1 reveals all of X. Hence, from the output of this composition, we can define the predicate q : {0,1}^d → {±1} that evaluates to 1 on every element of X, and to −1 otherwise. This predicate satisfies q(X) = 1 but q(P) ≤ −1 + 2n/2^d = −1 + 2/n⁴.

References

[1] Raef Bassily, Kobbi Nissim, Adam D. Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. Algorithmic stability for adaptive data analysis. In STOC, 2016.
[2] Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, Linda Zhao, et al. Valid post-selection inference. The Annals of Statistics, 41(2):802–837, 2013.
[3] Dan Boneh and James Shaw. Collusion-secure fingerprinting for digital data.
IEEE Transactions on Information Theory, 44(5):1897–1905, 1998.
[4] Andreas Buja, Richard Berk, Lawrence Brown, Edward George, Emil Pitkin, Mikhail Traskin, Linda Zhao, and Kai Zhang. Models as approximations: A conspiracy of random regressors and model deviations against classical inference in regression. Statistical Science, 1460, 2015.
[5] Mark Bun, Thomas Steinke, and Jonathan Ullman. Make up your mind: The price of online queries in differential privacy. In SODA, 2017.
[6] Mark Bun, Jonathan Ullman, and Salil P. Vadhan. Fingerprinting codes and the price of approximate differential privacy. In STOC, pages 1–10. ACM, May 31 – June 3, 2014.
[7] Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, and Zhiwei Steven Wu. Adaptive learning with robust generalization guarantees. In COLT, 2016.
[8] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Generalization in adaptive data analysis and holdout reuse. In NIPS, 2015.
[9] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Preserving statistical validity in adaptive data analysis. In STOC, 2015.
[10] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[11] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. Robust traceability from trace amounts. In FOCS, 2015.
[12] Bradley Efron. Estimation and accuracy after model selection. Journal of the American Statistical Association, 109(507):991–1007, 2014.
[13] William Fithian, Dennis Sun, and Jonathan Taylor. Optimal inference after model selection. arXiv preprint arXiv:1410.2597, 2014.
[14] Andrew Gelman and Eric Loken. The statistical crisis in science. American Scientist, 102(6):460, 2014.
[15] Moritz Hardt and Jonathan Ullman. Preventing false discovery in interactive data analysis is hard.
In FOCS, 2014.
[16] Clifford M. Hurvich and Chih-Ling Tsai. The impact of model selection on inference in linear regression. The American Statistician, 44(3):214–217, 1990.
[17] Michael J. Kearns. Efficient noise-tolerant learning from statistical queries. In STOC, 1993.
[18] Benedikt M. Pötscher. Effects of model selection on inference. Econometric Theory, 1991.
[19] Daniel Russo and James Zou. Controlling bias in adaptive data analysis using information theory. In AISTATS, 2016.
[20] Thomas Steinke and Jonathan Ullman. Interactive fingerprinting codes and the hardness of preventing false discovery. In COLT, 2015.
[21] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. In FOCS, 2017.
[22] Gábor Tardos. Optimal probabilistic fingerprint codes. J. ACM, 55(2), 2008.
[23] Jonathan Taylor and Robert J. Tibshirani. Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25):7629–7634, 2015.
[24] Yu-Xiang Wang. New Paradigms and Optimality Guarantees in Statistical Learning and Estimation. PhD thesis, Carnegie Mellon University, 2017.