{"title": "Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates", "book": "Advances in Neural Information Processing Systems", "page_first": 4338, "page_last": 4349, "abstract": "We consider the problem of global optimization of an unknown non-convex smooth function with noisy zeroth-order feedback. We propose a local minimax framework to study the fundamental difficulty of optimizing smooth functions with adaptive function evaluations. We show that for functions with fast growth around their global minima, carefully designed optimization algorithms can identify a near global minimizer with many fewer queries than worst-case global minimax theory predicts. For the special case of strongly convex and smooth functions, our implied convergence rates match the ones developed for zeroth-order convex optimization problems. On the other hand, we show that in the worst case no algorithm can converge faster than the minimax rate of estimating an unknown functions in linf-norm. Finally, we show that non-adaptive algorithms, although optimal in a global minimax sense, do not attain the optimal local minimax rate.", "full_text": "OptimizationofSmoothFunctionswithNoisyObservations:LocalMinimaxRatesYiningWang,SivaramanBalakrishnan,AartiSinghDepartmentofMachineLearningandStatisticsCarnegieMellonUniversity,Pittsburgh,PA,15213,USA{yiningwa,aarti}@cs.cmu.edu,siva@stat.cmu.eduAbstractWeconsidertheproblemofglobaloptimizationofanunknownnon-convexsmoothfunctionwithnoisyzeroth-orderfeedback.Weproposealocalminimaxframeworktostudythefundamentaldif\ufb01cultyofoptimizingsmoothfunctionswithadaptivefunctionevaluations.Weshowthatforfunctionswithfastgrowtharoundtheirglobalminima,carefullydesignedoptimizationalgorithmscanidentifyanearglobalminimizerwithmanyfewerqueriesthanworst-caseglobalminimaxtheorypredicts.Forthespecialcaseofstronglyconvexandsmoothfunctions,ourimpliedconvergenceratesmatchtheonesdevelopedforzeroth-orderconvexoptimizationproblems.Ontheotherhand,weshowthatintheworstcasenoalgorithmcanconvergefasterthantheminimaxrateofestimatinganunknownfunctionsin\u20188-norm.Finally,weshowthatnon-adaptivealgorithms,althoughoptimalinaglobalminimaxsense,donotattaintheoptimallocalminimaxrate.1IntroductionGlobalfunctionoptimizationwithstochastic(zeroth-order)queryoraclesisanimportantprobleminoptimization,machinelearningandstatistics.Tooptimizeanunknownboundedfunctionf:X\u00de\u00d1Rde\ufb01nedonaknowncompactd-dimensionaldomainX\u010eRd,thedataanalystmakesnactivequeriesx1,...,xnPXandobservesyt\u201cfpxtq`wt,wti.i.d.\u201eNp0,1q,1t\u201c1,...,n.(1)Thequeriesx1,...,xtareactiveinthesensethattheselectionofxtcandependonpreviousqueriesandtheirresponsesx1,y1,...,xt\u00b41,yt\u00b41.Afternqueries,anestimatepxnPXisproducedthatapproximatelyminimizestheunknownfunctionf.Such\u201cactivequery\u201dmodelsarerelevantinabroadrangeof(noisy)globaloptimizationapplications,forinstanceinhyper-parametertuningofmachinelearningalgorithms[40]andsequentialdesigninmaterialsynthesisexperimentswherethegoalistomaximizestrengthsoftheproducedmaterials[35,41].Sec.2.1givesarigorousformulationoftheactivequerymodelandcontrastsitwiththeclassicalpassivequerymodel.Theerrorofanestimatepxnismeasuredbythedifferenceoffppxnqandtheglobalminimumoff:Lppxn;fq:\u201cfppxnq\u00b4f\u02dawheref\u02da:\u201cinfxPXfpxq.(2)ThroughoutthepaperwetakeXtobethed-dimensionalunitcuber0,1sd,whileourresultscanbeeasilygeneralizedtoothercompactdomainssatisfyingminimalregularityconditions.Whenfbelongstoasmoothnessclass,saytheH\u00f6lderclasswithexponent\u03b1,astraigh
When $f$ belongs to a smoothness class, say the Hölder class with exponent $\alpha$, a straightforward global optimization method is to first sample $n$ points uniformly at random from $\mathcal{X}$ and then construct a nonparametric estimate $\hat{f}_n$ of $f$ using nonparametric regression methods such as (high-order) kernel smoothing or local polynomial regression [17, 46]. Classical analysis shows that the sup-norm reconstruction error $\|\hat{f}_n - f\|_\infty = \sup_{x\in\mathcal{X}} |\hat{f}_n(x) - f(x)|$ can be upper bounded by $\tilde{O}_P(n^{-\alpha/(2\alpha+d)})$. (In the $\tilde{O}(\cdot)$ or $\tilde{O}_P(\cdot)$ notation we drop poly-logarithmic dependency on $n$.) This global reconstruction guarantee then implies an $\tilde{O}_P(n^{-\alpha/(2\alpha+d)})$ upper bound on $L(\hat{x}_n; f)$ by considering $\hat{x}_n \in \mathcal{X}$ such that $\hat{f}_n(\hat{x}_n) = \inf_{x\in\mathcal{X}} \hat{f}_n(x)$ (such an $\hat{x}_n$ exists because $\mathcal{X}$ is closed and bounded). Formally, we have the following proposition (proved in the Appendix) that converts a global reconstruction guarantee into an upper bound on optimization error:

Proposition 1. Suppose $\hat{f}_n(\hat{x}_n) = \inf_{x\in\mathcal{X}} \hat{f}_n(x)$. Then $L(\hat{x}_n; f) \le 2\|\hat{f}_n - f\|_\infty$.
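As an illustration of the passive baseline behind Proposition 1, here is a minimal sketch that samples uniformly, fits a simple nonparametric regressor, and returns the minimizer of the fit over random candidate points. The $k$-nearest-neighbor regressor and the bandwidth heuristic are our own illustrative stand-ins; the paper's analysis uses high-order kernel smoothing or local polynomial regression.

```python
import numpy as np

def knn_fit(X, y, x, k):
    """Predict f(x) by averaging the k nearest observed responses."""
    dist = np.linalg.norm(X - x, axis=1)
    return y[np.argsort(dist)[:k]].mean()

def passive_optimize(oracle, d, n, k=None, n_candidates=2000, rng=None):
    """Passive baseline of Proposition 1: i.i.d. uniform queries, a
    nonparametric fit f_hat, and x_hat = arg-min of f_hat over candidates.
    k-NN regression is an illustrative substitute for the paper's estimators."""
    rng = rng or np.random.default_rng(0)
    k = k or max(1, round(n ** (2.0 / (2.0 + d))))   # alpha = 1 heuristic
    X = rng.uniform(size=(n, d))                      # passive (non-adaptive) design
    y = np.array([oracle(x) for x in X])
    candidates = rng.uniform(size=(n_candidates, d))
    f_hat = np.array([knn_fit(X, y, c, k) for c in candidates])
    return candidates[np.argmin(f_hat)]               # minimizer of the fitted surface
```

Used with the oracle sketched earlier, `passive_optimize(oracle, d=2, n=2000)` returns a near-minimizer whose loss scales like the sup-norm reconstruction error, as Proposition 1 predicts.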
Typically, fundamental limits on the optimal optimization error are understood through the lens of minimax analysis, where the object of study is the (global) minimax risk:

$$\inf_{\hat{x}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\, L(\hat{x}_n, f), \tag{3}$$

where $\mathcal{F}$ is a certain smoothness function class such as the Hölder class. Although optimization appears to be easier than global reconstruction, we show that the $n^{-\alpha/(2\alpha+d)}$ rate is not improvable in the global minimax sense of Eq. (3) over Hölder classes. Such a surprising phenomenon was also noted in previous works [9, 22, 44] for related problems. On the other hand, extensive empirical evidence suggests that non-uniform/active allocations of query points can significantly reduce optimization error in practical global optimization of smooth, non-convex functions [40]. This raises the interesting question of understanding, from a theoretical perspective, under what conditions global optimization of smooth functions is easier than their reconstruction, and what power active/feedback-driven queries bring to global optimization.

In this paper, we propose a theoretical framework that partially answers the above questions. In contrast to classical global minimax analysis of nonparametric estimation problems, we adopt a local analysis which characterizes the optimal convergence rate of optimization error when the underlying function $f$ is within the neighborhood of a "reference" function $f_0$. (See Sec. 2.2 for a rigorous formulation.) Our main results characterize the local convergence rates $R_n(f_0)$ for a wide range of reference functions $f_0 \in \mathcal{F}$. Our contributions can be summarized as follows:

1. We design an iterative (active) algorithm whose optimization error $L(\hat{x}_n; f)$ converges at a rate $R_n(f_0)$ depending on the reference function $f_0$. When the level sets of $f_0$ satisfy certain regularity and polynomial growth conditions, the local rate $R_n(f_0)$ can be upper bounded by $R_n(f_0) = \tilde{O}(n^{-\alpha/(2\alpha+d-\alpha\beta)})$, where $\beta \in [0, d/\alpha]$ is a parameter depending on $f_0$ that characterizes the volume growth of the level sets of $f_0$ (see assumption (A2), Proposition 2 and Theorem 1 for details). The rate matches the global minimax rate $n^{-\alpha/(2\alpha+d)}$ for the worst-case $f_0$, where $\beta = 0$, but has the potential of being much faster when $\beta > 0$. We emphasize that our algorithm has no knowledge of $f_0$, $\alpha$ or $\beta$ and achieves this rate adaptively.

2. We prove local minimax lower bounds that match the $n^{-\alpha/(2\alpha+d-\alpha\beta)}$ upper bound, up to logarithmic factors in $n$. More specifically, we show that even if $f_0$ is known, no (active) algorithm can optimize $f$ in close neighborhoods of $f_0$ at a rate faster than $n^{-\alpha/(2\alpha+d-\alpha\beta)}$. We further show that, if active queries are not available and $x_1, \dots, x_n$ are i.i.d. uniformly sampled from $\mathcal{X}$, the $n^{-\alpha/(2\alpha+d)}$ global minimax rate also applies locally, regardless of how large $\beta$ is. Thus, there is an explicit gap between the local minimax rates of active and uniform query models.

3. In the special case when $f$ is convex, the global optimization problem is usually referred to as zeroth-order convex optimization, and this problem has been widely studied [1, 2, 6, 18, 24, 36]. Our results imply that, when $f_0$ is strongly convex and smooth, the local minimax rate $R_n(f_0)$ is on the order of $\tilde{O}(n^{-1/2})$, which matches the convergence rates in [1]. Additionally, our negative results (Theorem 2) indicate that the $n^{-1/2}$ rate cannot be achieved if $f_0$ is merely convex, which seems to contradict the $n^{-1/2}$ results in [2, 6] that do not require strong convexity of $f$. However, it should be noted that mere convexity of $f_0$ does not imply convexity of $f$ in a neighborhood of $f_0$ (e.g., $\|f - f_0\|_\infty \le \varepsilon$). Our results show significant differences in the intrinsic difficulty of zeroth-order optimization of convex and near-convex functions.
1.1 Related Work

Global optimization, known variously as black-box optimization, Bayesian optimization and the continuum-armed bandit, has a long history in the optimization research community [25, 26] and has also received a significant amount of recent interest in statistics and machine learning [8, 9, 22, 31, 32, 40]. Many previous works [8, 28] have derived rates for non-convex smooth payoffs in "continuum-armed" bandit problems; however, they do not consider local rates specific to objective functions with certain growth conditions around the optima. Among the existing works, [20, 34] are probably the closest to our paper, having studied the similar problem of estimating the set of all optima of a smooth function in Hausdorff distance. For Hölder smooth functions with polynomial growth, [34] derives an $n^{-1/(2\alpha+d-\alpha\beta)}$ minimax rate for $\alpha < 1$ (later improved to $\alpha \ge 1$ in his thesis [33]), which is similar to our Propositions 2 and 3. [20, 34] also discussed adaptivity to unknown smoothness parameters. We however remark on several differences between our work and [34]. First, in [20, 34] only functions with polynomial growth are considered, while in our Theorems 1 and 2 the functionals $\varepsilon_n^U(f_0)$ and $\varepsilon_n^L(f_0)$ are proposed for general reference functions $f_0$ satisfying mild regularity conditions, which include functions with polynomial growth as special cases. In addition, [34] considers the harder problem of estimating maxima sets in Hausdorff distance rather than producing a single approximate optimum $\hat{x}_T$. As a result, the construction of the minimax lower bound in [34] is no longer valid, because an algorithm that fails to distinguish between two functions with different optimal sets can nevertheless produce a good approximate optimizer as long as the two functions under consideration have overlapping optimal sets. New constructions and information-theoretical techniques are therefore required to prove lower bounds under the weaker (one-point) approximate optimization framework. Finally, we prove a minimax lower bound when only uniform query points are available and demonstrate a significant gap between algorithms having access to uniform versus adaptively chosen data points.

[31, 32] impose additional assumptions on the level sets of the underlying function to obtain an improved convergence rate. The level set assumptions considered in the mentioned references are rather restrictive and essentially require the underlying function to be uni-modal, while our assumptions are much more flexible and apply to multi-modal functions as well. In addition, [31, 32] considered a noiseless setting in which exact function evaluations $f(x_t)$ can be obtained, while our paper studies the noise-corrupted model in Eq. (1), for which vastly different convergence rates are derived. Finally, no matching lower bounds were proved in [31, 32]. [43] considered zeroth-order optimization of approximately convex functions and derived necessary and sufficient conditions for the convergence rates to be polynomial in the domain dimension $d$.

The (stochastic) global optimization problem is similar to mode estimation of either densities or regression functions, which has a rich literature [13, 27, 39]. An important difference between statistical mode estimation and global optimization is the way sample/query points $x_1, \dots, x_n \in \mathcal{X}$ are distributed: in mode estimation it is customary to assume the samples are independently and identically distributed, while in global optimization sequential designs of samples/queries are allowed. Furthermore, to estimate/locate the mode of an unknown density or regression function, such a mode has to be well-defined; on the other hand, producing an estimate $\hat{x}_n$ with small $L(\hat{x}_n, f)$ is easier and results in weaker conditions imposed on the underlying function.

Methodology-wise, our iterative procedure also resembles disagreement-based active learning methods [5, 14, 21]. The intermediate steps of candidate point elimination can also be viewed as sequences of level set estimation problems [38, 42, 45] or cluster tree estimation [4, 12] with active queries.

Another line of research has focused on first-order optimization of quasi-convex or non-convex functions [3, 10, 19, 23, 37, 48], in which exact or unbiased evaluations of function gradients are available at query points $x \in \mathcal{X}$. [48] considered a Cheeger's constant restriction on level sets which is similar to our level set regularity assumptions (A2 and A2'). [15, 16] studied local minimax rates of first-order optimization of convex functions. First-order optimization differs significantly from our setting because unbiased gradient estimation is generally impossible in the model of Eq. (1). Furthermore, most works on (first-order) non-convex optimization focus on convergence to stationary points or local minima, while we consider convergence to global minima.

Figure 1: Informal illustration of the algorithm that attains Theorem 1 (details in the appendix). Solid blue curves depict the underlying function $f$ to be optimized, black and red solid dots denote the query points and their responses $\{(x_t, y_t)\}$, and black/red vertical line segments correspond to uniform confidence intervals on function evaluations constructed using the current batch of observed data. The left figure illustrates the first epoch of our algorithm, where query points are uniformly sampled from the entire domain $\mathcal{X}$. Afterwards, sub-optimal locations based on the constructed confidence intervals are removed, and a shrunken "candidate set" $S_1$ is obtained. The algorithm then proceeds to the second epoch, illustrated in the right figure, where query points (in red) are sampled only from the restricted candidate set and shorter confidence intervals (also in red) are constructed and updated. The procedure is repeated until $O(\log n)$ epochs are completed.

2 Background and Notation

We first review standard asymptotic notation that will be used throughout this paper. For two sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$, we write $a_n = O(b_n)$ or $a_n \lesssim b_n$ if $\limsup_{n\to\infty} |a_n|/|b_n| < \infty$, or equivalently $b_n = \Omega(a_n)$ or $b_n \gtrsim a_n$. We write $a_n = \Theta(b_n)$ or $a_n \asymp b_n$ if both $a_n \lesssim b_n$ and $a_n \gtrsim b_n$ hold. We also write $a_n = o(b_n)$, or equivalently $b_n = \omega(a_n)$, if $\lim_{n\to\infty} |a_n|/|b_n| = 0$. For two sequences of random variables $\{A_n\}_{n=1}^\infty$ and $\{B_n\}_{n=1}^\infty$, we write $A_n = O_P(B_n)$ if for every $\epsilon > 0$ there exists $C > 0$ such that $\limsup_{n\to\infty} \Pr[|A_n| > C|B_n|] \le \epsilon$. For $r > 0$, $1 \le p \le \infty$ and $x \in \mathbb{R}^d$, we denote by $B_r^p(x) := \{z \in \mathbb{R}^d : \|z - x\|_p \le r\}$ the $d$-dimensional $\ell_p$-ball of radius $r$ centered at $x$, where the vector $\ell_p$ norm is defined as $\|x\|_p := (\sum_{j=1}^d |x_j|^p)^{1/p}$ for $1 \le p < \infty$ and $\|x\|_\infty := \max_{1\le j\le d} |x_j|$. For any subset $S \subseteq \mathbb{R}^d$ we denote by $B_r^p(x; S)$ the set $B_r^p(x) \cap S$.

2.1 Passive and Active Query Models

Let $U$ be a known random quantity defined on a probability space $\mathcal{U}$. The following definitions characterize all passive and active optimization algorithms:

Definition 1 (the passive query model). Let $x_1, \dots, x_n$ be i.i.d. points uniformly sampled on $\mathcal{X}$ and $y_1, \dots, y_n$ be observations from the model in Eq. (1). A passive optimization algorithm $A$ with $n$ queries is parameterized by a mapping $\phi_n: (x_1, y_1, \dots, x_n, y_n, U) \mapsto \hat{x}_n$ that maps the i.i.d. observations $\{(x_i, y_i)\}_{i=1}^n$ to an estimated optimum $\hat{x}_n \in \mathcal{X}$, potentially randomized by $U$.

Definition 2 (the active query model). An active optimization algorithm can be parameterized by mappings $(\chi_1, \dots, \chi_n, \phi_n)$, where for $t = 1, \dots, n$, $\chi_t: (x_1, y_1, \dots, x_{t-1}, y_{t-1}, U) \mapsto x_t$ produces a query point $x_t \in \mathcal{X}$ based on the previous observations $\{(x_i, y_i)\}_{i=1}^{t-1}$, and $\phi_n: (x_1, y_1, \dots, x_n, y_n, U) \mapsto \hat{x}_n$ produces the final estimate. All mappings $(\chi_1, \dots, \chi_n, \phi_n)$ can be randomized by $U$.
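To make Definitions 1 and 2 concrete, the sketch below phrases an active algorithm as the pair of maps $(\chi_t, \phi_n)$. The class name, the random-search placeholder policy, and the `run` driver are a minimal interface of our own invention for illustration; a passive algorithm is the special case where `chi` ignores the history.

```python
import numpy as np

class ActiveOptimizer:
    """An active algorithm in the sense of Definition 2: chi maps the history
    (x_1, y_1, ..., x_{t-1}, y_{t-1}) to the next query x_t, and phi maps the
    full history to the final estimate x_hat_n."""

    def __init__(self, d, rng=None):
        self.d = d
        self.rng = rng or np.random.default_rng(0)

    def chi(self, history):
        # Placeholder policy: query uniformly at random (this particular
        # choice is effectively passive). A genuinely active method would
        # concentrate queries using the observed history.
        return self.rng.uniform(size=self.d)

    def phi(self, history):
        # Placeholder final estimate: the query with the smallest response.
        xs, ys = zip(*history)
        return xs[int(np.argmin(ys))]

def run(algorithm, oracle, n):
    """Interaction protocol shared by the passive and active query models."""
    history = []
    for _ in range(n):
        x = algorithm.chi(history)   # active: x_t may depend on the history
        history.append((x, oracle(x)))
    return algorithm.phi(history)
```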
2.2 Local Minimax Rates

We use classical local minimax analysis [47] to understand the fundamental information-theoretical limits of noisy global optimization of smooth functions. On the upper bound side, we seek (active) estimators $\hat{x}_n$ such that

$$\sup_{f_0\in\Theta}\ \sup_{f\in\Theta_1,\, \|f-f_0\|_\infty \le \varepsilon_n(f_0)} \Pr_f\big[L(\hat{x}_n; f) \ge C_1 \cdot R_n(f_0)\big] \le 1/4, \tag{4}$$

where $C_1 > 0$ is a positive constant. Here $f_0 \in \Theta$ is referred to as the reference function, and $f \in \Theta_1$ is the true underlying function, which is assumed to be "near" $f_0$. The minimax convergence rate of $L(\hat{x}_n; f)$ is then characterized locally by $R_n(f_0)$, which depends on the reference function $f_0$. The constant of $1/4$ is chosen arbitrarily, and any small constant leads to similar conclusions.

To establish negative results (i.e., local minimax lower bounds), in contrast to the upper bound formulation, we assume the potential active optimization estimator $\hat{x}_n$ has perfect knowledge of the reference function $f_0 \in \Theta$. We then prove local minimax lower bounds of the form

$$\inf_{\hat{x}_n}\ \sup_{f\in\Theta_1,\, \|f-f_0\|_\infty \le \varepsilon_n(f_0)} \Pr_f\big[L(\hat{x}_n; f) \ge C_2 \cdot R_n(f_0)\big] \ge 1/3, \tag{5}$$

where $C_2 > 0$ is another positive constant and $\varepsilon_n(f_0)$, $R_n(f_0)$ are the desired local convergence rates for functions near the reference $f_0$.

Although in some sense classical, the local minimax definition we propose warrants further discussion.

1. Roles of $\Theta$ and $\Theta_1$: The reference function $f_0$ and the true function $f$ are assumed to belong to different but closely related function classes $\Theta$ and $\Theta_1$. In particular, in our paper $\Theta \subseteq \Theta_1$, meaning that less restrictive assumptions are imposed on the true underlying function $f$ compared to those imposed on the reference function $f_0$, on which $R_n$ and $\varepsilon_n$ are based.

2. Upper bounds: It is worth emphasizing that the estimator $\hat{x}_n$ has no knowledge of the reference function $f_0$. From the perspective of upper bounds, we could consider the simpler task of producing $f_0$-dependent bounds (eliminating the second supremum) and instead study the (already interesting) quantity $\sup_{f_0\in\Theta} \Pr_{f_0}\big[L(\hat{x}_n; f_0) \ge C_1 R_n(f_0)\big] \le 1/4$. As indicated above, we maintain the double supremum in the definition because fewer assumptions are then imposed directly on the true underlying function $f$, and further because it allows us to more directly compare our upper and lower bounds.

3. Lower bounds and the choice of the "localization radius" $\varepsilon_n(f_0)$: Our lower bounds allow the estimator knowledge of the reference function (this makes establishing the lower bound more challenging). Eq. (5) implies that no estimator $\hat{x}_n$ can effectively optimize a function $f$ close to $f_0$ beyond the convergence rate of $R_n(f_0)$, even if perfect knowledge of the reference function $f_0$ is available a priori. The $\varepsilon_n(f_0)$ parameter that decides the "range" in which local minimax rates apply is taken to be of the same order as the actual local rate $R_n(f_0)$ in this paper. This is (up to constants) the smallest radius for which we can hope to obtain non-trivial lower bounds: if we considered a much smaller radius than $R_n(f_0)$, then the trivial estimator which outputs the minimizer of the reference function would achieve a faster rate than $R_n(f_0)$. Selecting the smallest possible radius makes establishing the lower bound most challenging but provides a refined picture of the complexity of zeroth-order optimization.
3 Main Results

With this background in place we now turn our attention to our main results. We begin by collecting our assumptions about the true underlying function and the reference function in Section 3.1. We state and discuss the consequences of our upper and lower bounds in Sections 3.2 and 3.3, respectively. We defer most technical proofs to the Appendix and describe our optimization algorithm in Section A.

3.1 Assumptions

We first state and motivate the assumptions that will be used. The first assumption states that $f$ is locally Hölder smooth on its level sets.

(A1) There exist constants $\kappa, \alpha, M > 0$ such that $f$ restricted to $\mathcal{X}_{f,\kappa} := \{x \in \mathcal{X} : f(x) \le f^* + \kappa\}$ belongs to the Hölder class $\Sigma^\alpha(M)$, meaning that $f$ is $k$-times differentiable on $\mathcal{X}_{f,\kappa}$ and furthermore for any $x, x' \in \mathcal{X}_{f,\kappa}$,

$$\sum_{j=0}^{k}\ \sum_{\alpha_1+\cdots+\alpha_d = j} \big|f^{(\alpha,j)}(x)\big| + \sum_{\alpha_1+\cdots+\alpha_d = k} \frac{\big|f^{(\alpha,k)}(x) - f^{(\alpha,k)}(x')\big|}{\|x-x'\|_\infty^{\alpha-k}} \le M. \tag{6}$$

Here $k = \lfloor\alpha\rfloor$ is the largest integer lower bounding $\alpha$ and $f^{(\alpha,j)}(x) := \partial^j f(x)/\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}$. (The particular $\ell_\infty$ norm is used for convenience only and can be replaced by any equivalent vector norm.)

We use $\Sigma_\kappa^\alpha(M)$ to denote the class of all functions satisfying (A1). We remark that (A1) is weaker than the standard assumption that $f$ on its entire domain $\mathcal{X}$ belongs to the Hölder class $\Sigma^\alpha(M)$. This is because regions with function values larger than $f^* + \kappa$ can be easily detected and removed by a pre-processing step. We give further details of the pre-processing step in Section A.3.

Our next assumption concerns the "regularity" of the level sets of the reference function $f_0$. Define $L_{f_0}(\epsilon) := \{x \in \mathcal{X} : f_0(x) \le f_0^* + \epsilon\}$ as the $\epsilon$-level set of $f_0$, and $\mu_{f_0}(\epsilon) := \lambda(L_{f_0}(\epsilon))$ as the Lebesgue measure of $L_{f_0}(\epsilon)$, also known as the distribution function. Define also $N(L_{f_0}(\epsilon), \delta)$ as the smallest number of $\ell_2$-balls of radius $\delta$ that cover $L_{f_0}(\epsilon)$.

(A2) There exist constants $c_0 > 0$ and $C_0 > 0$ such that $N(L_{f_0}(\epsilon), \delta) \le C_0\big[1 + \mu_{f_0}(\epsilon)\,\delta^{-d}\big]$ for all $\epsilon, \delta \in (0, c_0]$.

We use $\Theta_{\mathcal{C}}$ to denote the class of all functions that satisfy (A2) with respect to parameters $\mathcal{C} = (c_0, C_0)$.

At a higher level, the regularity condition (A2) assumes that the level sets are sufficiently "regular" in the sense that covering them with small-radius balls does not require a significantly larger total volume. For example, consider the perfectly regular case of $L_{f_0}(\epsilon)$ being a $d$-dimensional $\ell_2$-ball of radius $r$: $L_{f_0}(\epsilon) = \{x \in \mathcal{X} : \|x - x^*\|_2 \le r\}$. Clearly, $\mu_{f_0}(\epsilon) \asymp r^d$. In addition, the $\delta$-covering number in $\ell_2$ of $L_{f_0}(\epsilon)$ is of the order $1 + (r/\delta)^d \asymp 1 + \mu_{f_0}(\epsilon)\delta^{-d}$, which satisfies the scaling in (A2). When (A2) holds, uniform confidence intervals of $f$ on its level sets are easy to construct, because little statistical efficiency is lost by slightly enlarging the level sets so that complete $d$-dimensional cubes are contained in the enlarged level sets. On the other hand, when regularity of the level sets fails to hold, such nonparametric estimation can be very difficult or even impossible. As an extreme example, suppose the level set $L_{f_0}(\epsilon)$ consists of $n$ standalone, well-spaced points in $\mathcal{X}$: the Lebesgue measure of $L_{f_0}(\epsilon)$ would be zero, but at least $\Omega(n)$ queries are necessary to construct uniform confidence intervals on $L_{f_0}(\epsilon)$. Clearly such an $L_{f_0}(\epsilon)$ violates (A2), because $N(L_{f_0}(\epsilon), \delta) \ge n$ as $\delta \to 0^+$ while $\mu_{f_0}(\epsilon) = 0$.
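The ball example above is easy to check numerically. The sketch below, an illustration of our own rather than anything from the paper, upper-bounds the $\delta$-covering number of a 2-D disk by counting grid squares that each fit inside an $\ell_2$-ball of radius $\delta$, and compares the count against the $1 + \mu\,\delta^{-d}$ scaling of (A2).

```python
import numpy as np

def covering_upper_bound(r, delta):
    """Upper-bound N(L, delta) for L = the 2-D disk of radius r centered at 0,
    by counting axis-aligned grid squares of side delta*sqrt(2) (each such
    square fits inside an l2-ball of radius delta) that intersect the disk."""
    s = delta * np.sqrt(2.0)
    m = int(np.ceil(2 * r / s)) + 1
    count = 0
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            # Nearest point of the square [i*s,(i+1)*s] x [j*s,(j+1)*s] to 0.
            cx = min(max(0.0, i * s), (i + 1) * s)
            cy = min(max(0.0, j * s), (j + 1) * s)
            if cx * cx + cy * cy <= r * r:
                count += 1
    return count

r, d = 0.5, 2
mu = np.pi * r ** 2                       # Lebesgue measure of the disk
for delta in [0.1, 0.05, 0.025]:
    print(delta, covering_upper_bound(r, delta), 1 + mu * delta ** (-d))
```

As $\delta$ shrinks, both columns scale like $\delta^{-2}$, consistent with (A2) for a suitable constant $C_0$.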
3.2 Upper Bound

The following theorem is our main result upper bounding the local minimax rate of noisy global optimization with active queries.

Theorem 1. For any $\alpha, M, \kappa, c_0, C_0 > 0$ and $f_0 \in \Sigma_\kappa^\alpha(M) \cap \Theta_{\mathcal{C}}$, where $\mathcal{C} = (c_0, C_0)$, define

$$\varepsilon_n^U(f_0) := \sup\Big\{\varepsilon > 0 : \varepsilon^{-(2+d/\alpha)}\mu_{f_0}(\varepsilon) \ge n/\log^\omega n\Big\}, \tag{7}$$

where $\omega > 5 + d/\alpha$ is a large constant. Suppose also that $\varepsilon_n^U(f_0) \to 0$ as $n \to \infty$. Then for sufficiently large $n$, there exist an estimator $\hat{x}_n$ with access to $n$ active queries $x_1, \dots, x_n \in \mathcal{X}$, a constant $C_R > 0$ depending only on $\alpha, M, \kappa, c, c_0, C_0$, and a constant $\gamma > 0$ depending only on $\alpha$ and $d$ such that

$$\sup_{f_0\in\Sigma_\kappa^\alpha(M)\cap\Theta_{\mathcal{C}}}\ \sup_{f\in\Sigma_\kappa^\alpha(M),\, \|f-f_0\|_\infty\le\varepsilon_n^U(f_0)} \Pr_f\Big[L(\hat{x}_n, f) > C_R \log^\gamma n \cdot \big(\varepsilon_n^U(f_0) + n^{-1/2}\big)\Big] \le 1/4. \tag{8}$$

Remark 1. Unlike the (local) smoothness class $\Sigma_\kappa^\alpha(M)$, the additional function class $\Theta_{\mathcal{C}}$ that encapsulates (A2) is imposed only on the reference function $f_0$, not on the true function $f$ to be estimated. This makes the assumptions considerably weaker, because the true function $f$ may violate (A2) while our results remain valid.

Remark 2. The estimator $\hat{x}_n$ does not require knowledge of the parameters $\kappa, c_0, C_0$ or $\varepsilon_n^U(f_0)$, and automatically adapts to them, as shown in the next section. While knowledge of the smoothness parameters $\alpha$ and $M$ seems to be necessary, we remark that it is possible to adapt to $\alpha$ and $M$ by running $O(\log^2 n)$ parallel sessions of $\hat{x}_n$ on $O(\log n)$ grids of $\alpha$ and $M$ values, and then using $\Omega(n/\log^2 n)$ single-point queries to decide on the location with the smallest function value. Such an adaptive strategy was suggested in [20] to remove an additional condition in [34], and it also applies to our setting.

Remark 3. By repeating the algorithm independently $t$ times and using the "multiple query" strategy of the above remark, the failure probability of our proposed algorithm can be reduced to as small as $4^{-t}$, an exponentially decaying probability with respect to the number of repetitions $t$.

Remark 4. When the distribution function $\mu_{f_0}(\epsilon)$ does not change abruptly with $\epsilon$, the expression for $\varepsilon_n^U(f_0)$ can be significantly simplified. In particular, if for all $\epsilon \in (0, c_0]$ it holds that

$$\mu_{f_0}(\epsilon/\log n) \ge \mu_{f_0}(\epsilon)/[\log n]^{O(1)}, \tag{9}$$

then $\varepsilon_n^U(f_0)$ can be upper bounded as

$$\varepsilon_n^U(f_0) \le [\log n]^{O(1)} \cdot \sup\Big\{\varepsilon > 0 : \varepsilon^{-(2+d/\alpha)}\mu_{f_0}(\varepsilon) \ge n\Big\}. \tag{10}$$

It is also noted that if $\mu_{f_0}(\epsilon)$ has the polynomial behavior $\mu_{f_0}(\epsilon) \asymp \epsilon^\beta$ for some constant $\beta \ge 0$, then Eq. (9) is satisfied and so is Eq. (10).

The quantity $\varepsilon_n^U(f_0) = \sup\{\varepsilon > 0 : \varepsilon^{-(2+d/\alpha)}\mu_{f_0}(\varepsilon) \ge n/\log^\omega n\}$ is crucial in determining the convergence rate of the optimization error of $\hat{x}_n$ locally around the reference function $f_0$. While the definition of $\varepsilon_n^U(f_0)$ is mostly implicit and involves solving an inequality concerning the distribution function $\mu_{f_0}(\cdot)$, we remark that it admits a simple form when $\mu_{f_0}$ has a polynomial growth rate similar to a local Tsybakov noise condition [29, 46], as shown by the following proposition:

Proposition 2. Suppose $\mu_{f_0}(\epsilon) \lesssim \epsilon^\beta$ for some constant $\beta \in [0, 2+d/\alpha)$. Then $\varepsilon_n^U(f_0) = \tilde{O}(n^{-\alpha/(2\alpha+d-\alpha\beta)})$. In addition, if $\beta \in [0, d/\alpha]$ then $\varepsilon_n^U(f_0) + n^{-1/2} \lesssim \varepsilon_n^U(f_0) = \tilde{O}(n^{-\alpha/(2\alpha+d-\alpha\beta)})$.

We remark that the condition $\beta \in [0, d/\alpha]$ was also adopted in the previous work [34, Remark 6]. Also, for Lipschitz continuous functions ($\alpha = 1$) our conditions are similar to [20] and imply a corresponding near-optimality dimension $d'$ considered in [20]. Proposition 2 can be easily verified by solving the inequality $\varepsilon^{-(2+d/\alpha)}\mu_{f_0}(\varepsilon) \ge n/\log^\omega n$ under the condition $\mu_{f_0}(\epsilon) \lesssim \epsilon^\beta$; we therefore omit its proof.
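The algebra behind Proposition 2 is easy to check numerically: with $\mu_{f_0}(\varepsilon) = \varepsilon^\beta$ and $\beta < 2 + d/\alpha$, the critical $\varepsilon$ solving $\varepsilon^{-(2+d/\alpha)}\varepsilon^\beta = n$ is exactly $n^{-\alpha/(2\alpha+d-\alpha\beta)}$. The sketch below (our own, with made-up parameter values, and with the $\log^\omega n$ factor dropped) solves the implicit inequality by bisection and compares against the closed form.

```python
import numpy as np

def eps_upper(mu, alpha, d, n, lo=1e-12, hi=1.0, iters=200):
    """Largest eps with eps^{-(2+d/alpha)} * mu(eps) >= n (Eq. (7) without the
    log factor), found by bisection on a log scale. Assumes the left-hand side
    is decreasing in eps, as it is for mu(eps) = eps^beta, beta < 2 + d/alpha."""
    crit = lambda e: e ** (-(2.0 + d / alpha)) * mu(e) - n
    for _ in range(iters):
        mid = np.sqrt(lo * hi)            # geometric midpoint
        lo, hi = (mid, hi) if crit(mid) >= 0 else (lo, mid)
    return lo

alpha, d, beta, n = 1.0, 2.0, 1.0, 10 ** 6     # illustrative values
eps_num = eps_upper(lambda e: e ** beta, alpha, d, n)
eps_closed = n ** (-alpha / (2 * alpha + d - alpha * beta))
print(eps_num, eps_closed)                     # both are n^{-1/3} = 0.01 here
```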
The following two examples give simple reference functions $f_0$ that satisfy the $\mu_{f_0}(\epsilon) \lesssim \epsilon^\beta$ condition in Proposition 2 with particular values of $\beta$.

Example 1. The constant function $f_0 \equiv 0$ satisfies (A1), (A2) and the condition in Proposition 2 with $\beta = 0$.

Example 2. Any $f_0 \in \Sigma_\kappa^2(M)$ that is strongly convex satisfies (A1), (A2) and the condition in Proposition 2 with $\beta = d/2$. (A twice differentiable function $f_0$ is strongly convex if there exists $\sigma > 0$ such that $\nabla^2 f_0(x) \succeq \sigma I$ for all $x \in \mathcal{X}$.)

Example 1 is simple to verify, as the volume of the level sets of the constant function $f_0 \equiv 0$ exhibits a phase transition between $\epsilon = 0$ and $\epsilon > 0$, rendering $\beta = 0$ the only parameter option for which $\mu_{f_0}(\epsilon) \lesssim \epsilon^\beta$. Example 2 is more involved, and holds because the strong convexity of $f_0$ lower bounds the growth rate of $f_0$ when moving away from its minimum. We give a rigorous proof of Example 2 in the appendix. We also remark that $f_0$ does not need to be exactly strongly convex for $\beta = d/2$ to hold; the example remains valid for, e.g., piecewise strongly convex functions with a constant number of pieces.

To best interpret the results in Theorem 1 and Proposition 2, it is instructive to compare the "local" rate $n^{-\alpha/(2\alpha+d-\alpha\beta)}$ with the baseline rate $n^{-\alpha/(2\alpha+d)}$, which can be attained by reconstructing $f$ in sup-norm and applying Proposition 1. Since $\beta \ge 0$, the local convergence rate established in Theorem 1 is never slower, and the improvement compared to the baseline rate $n^{-\alpha/(2\alpha+d)}$ is dictated by $\beta$, which governs the growth rate of the volume of the level sets of the reference function $f_0$. In particular, for functions that grow fast when moving away from their minima, the parameter $\beta$ is large and therefore the local convergence rate around $f_0$ can be much faster than $n^{-\alpha/(2\alpha+d)}$.

Theorem 1 also implies concrete convergence rates for the special functions considered in Examples 1 and 2. For the constant reference function $f_0 \equiv 0$, Example 1 and Theorem 1 yield $R_n(f_0) \asymp n^{-\alpha/(2\alpha+d)}$, which matches the baseline rate $n^{-\alpha/(2\alpha+d)}$ and suggests that $f_0 \equiv 0$ is the worst-case reference function. This is intuitive, because $f_0 \equiv 0$ has the most drastic level set change as $\epsilon \to 0^+$, and therefore small perturbations anywhere on $f_0$ result in changes of the optimal locations. On the other hand, if $f_0$ is strongly smooth and convex as in Example 2, Theorem 1 suggests that $R_n(f_0) \asymp n^{-1/2}$, which is significantly better than the $n^{-2/(4+d)}$ baseline rate (note that $f_0$ being strongly smooth implies $\alpha = 2$ in the local smoothness assumption) and also matches existing works on zeroth-order optimization of convex functions [1]. The faster rate holds intuitively because strongly convex functions grow fast when moving away from the minimum, which implies small level set changes. An active query algorithm can then focus most of its queries on the small level sets of the underlying function, resulting in more accurate local function reconstructions and a faster optimization error rate.

Our proof of Theorem 1 is constructive, upper bounding the local minimax optimization error of an explicit algorithm. At a higher level, the algorithm partitions the $n$ active queries evenly into $\log n$ epochs, and level sets of $f$ are estimated at the end of each epoch by comparing (uniform) confidence intervals on a dense grid on $\mathcal{X}$. It is then proved that the volume of the estimated level sets contracts geometrically, until the target convergence rate $R_n(f_0)$ is attained.
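The appendix contains the full algorithm; the sketch below is our own highly simplified rendering of the epoch structure just described (grid-based successive elimination with union-bound confidence intervals), restricted to a 1-D domain and with illustrative constants. It conveys the shrinking-candidate-set mechanism of Figure 1, not the paper's exact procedure or its guarantees.

```python
import numpy as np

def epoch_elimination(oracle, n, n_grid=200, n_epochs=None, delta=0.05, rng=None):
    """Simplified epoch-based elimination on a 1-D grid over [0, 1].

    Each epoch spreads its query budget over the surviving grid points,
    averages repeated noisy evaluations, and keeps only the points whose
    lower confidence bound does not exceed the best upper confidence bound."""
    rng = rng or np.random.default_rng(0)
    n_epochs = n_epochs or max(1, int(np.log2(n)))
    grid = np.linspace(0.0, 1.0, n_grid)
    alive = np.ones(n_grid, dtype=bool)
    per_epoch = n // n_epochs
    means = np.zeros(n_grid)
    for _ in range(n_epochs):
        idx = np.flatnonzero(alive)
        reps = max(1, per_epoch // len(idx))   # queries per surviving point
        for i in idx:
            means[i] = np.mean([oracle([grid[i]]) for _ in range(reps)])
        # Sub-Gaussian confidence width with a union bound over grid x epochs.
        width = np.sqrt(2 * np.log(2 * n_grid * n_epochs / delta) / reps)
        best_ucb = np.min(means[idx] + width)
        alive[idx] = means[idx] - width <= best_ucb   # eliminate sub-optimal points
    return grid[np.flatnonzero(alive)[np.argmin(means[alive])]]
```

As the candidate set shrinks, each surviving point receives more repetitions per epoch, so the confidence intervals tighten exactly where the function is near-optimal; this is the mechanism by which fast level-set growth (large $\beta$) translates into a faster rate.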
3.3 Lower Bounds

We prove local minimax lower bounds that match the upper bounds in Theorem 1 up to logarithmic terms. As we remarked in Section 2.2, in the local minimax lower bound formulation we assume the data analyst has full knowledge of the reference function $f_0$, which makes the lower bounds stronger, as more information is available a priori. To facilitate such strong local minimax lower bounds, the following additional condition is imposed on the reference function $f_0$, of which the data analyst has perfect information.

(A2') There exist constants $c_0', C_0' > 0$ such that $M(L_{f_0}(\epsilon), \delta) \ge C_0'\,\mu_{f_0}(\epsilon)\,\delta^{-d}$ for all $\epsilon, \delta \in (0, c_0']$, where $M(L_{f_0}(\epsilon), \delta)$ is the maximum number of disjoint $\ell_2$-balls of radius $\delta$ that can be packed into $L_{f_0}(\epsilon)$.

We denote by $\Theta'_{\mathcal{C}'}$ the class of functions that satisfy (A2') with respect to parameters $\mathcal{C}' = (c_0', C_0')$. Intuitively, (A2') can be regarded as the "reverse" version of (A2); it basically means that (A2) is "tight". We are now ready to state our main negative result, which shows, from an information-theoretical perspective, that the upper bound in Theorem 1 is not improvable.

Theorem 2. Suppose $\alpha, c_0, C_0, c_0', C_0' > 0$ and $\kappa = \infty$. Denote $\mathcal{C} = (c_0, C_0)$ and $\mathcal{C}' = (c_0', C_0')$. For any $f_0 \in \Theta_{\mathcal{C}} \cap \Theta'_{\mathcal{C}'}$, define

$$\varepsilon_n^L(f_0) := \sup\Big\{\varepsilon > 0 : \varepsilon^{-(2+d/\alpha)}\mu_{f_0}(\varepsilon) \ge n\Big\}. \tag{11}$$

Then there exists a constant $M > 0$ depending on $\alpha, d, \mathcal{C}, \mathcal{C}'$ such that, for any $f_0 \in \Sigma_\kappa^\alpha(M/2) \cap \Theta_{\mathcal{C}} \cap \Theta'_{\mathcal{C}'}$,

$$\inf_{\hat{x}_n}\ \sup_{f\in\Sigma_\kappa^\alpha(M),\, \|f-f_0\|_\infty\le 2\varepsilon_n^L(f_0)} \Pr_f\big[L(\hat{x}_n; f) \ge \varepsilon_n^L(f_0)\big] \ge \frac{1}{3}. \tag{12}$$

Remark 5. For any $f_0$ and $n$ it always holds that $\varepsilon_n^L(f_0) \le \varepsilon_n^U(f_0)$.

Remark 6. If the distribution function $\mu_{f_0}(\epsilon)$ satisfies Eq. (9) in Remark 4, then $\varepsilon_n^L(f_0) \ge \varepsilon_n^U(f_0)/[\log n]^{O(1)}$.

Remark 7. As the upper bound in Theorem 1 might depend exponentially on the domain dimension $d$, there might also be a gap exponential in $d$ between the upper and lower bounds established in Theorems 1 and 2.

Remark 5 shows that there might be a gap between the local minimax upper and lower bounds in Theorems 1 and 2. Nevertheless, Remark 6 shows that under the mild condition that $\mu_{f_0}(\epsilon)$ does not change too abruptly with $\epsilon$, the gap between $\varepsilon_n^U(f_0)$ and $\varepsilon_n^L(f_0)$ is only a poly-logarithmic term in $n$. Additionally, the following proposition derives an explicit expression for $\varepsilon_n^L(f_0)$ for reference functions whose distribution functions have polynomial growth, which matches Proposition 2 up to $\log n$ factors. Its proof is again straightforward.

Proposition 3. Suppose $\mu_{f_0}(\epsilon) \gtrsim \epsilon^\beta$ for some $\beta \in [0, 2+d/\alpha)$. Then $\varepsilon_n^L(f_0) = \Omega(n^{-\alpha/(2\alpha+d-\alpha\beta)})$.

The following proposition additionally shows the existence of $f_0 \in \Sigma_\infty^\alpha(M) \cap \Theta_{\mathcal{C}} \cap \Theta'_{\mathcal{C}'}$ that satisfies $\mu_{f_0}(\epsilon) \asymp \epsilon^\beta$ for any values of $\alpha > 0$ and $\beta \in [0, d/\alpha]$. Its proof is given in the appendix.

Proposition 4. Fix arbitrary $\alpha, M > 0$ and $\beta \in [0, d/\alpha]$. There exist $f_0 \in \Sigma_\kappa^\alpha(M) \cap \Theta_{\mathcal{C}} \cap \Theta'_{\mathcal{C}'}$ with $\kappa = \infty$ and constants $\mathcal{C} = (c_0, C_0)$, $\mathcal{C}' = (c_0', C_0')$ depending only on $\alpha, \beta, M$ and $d$ such that $\mu_{f_0}(\epsilon) \asymp \epsilon^\beta$.

Theorem 2 and Proposition 3 show that the $n^{-\alpha/(2\alpha+d-\alpha\beta)}$ upper bound on the local minimax convergence rate established in Theorem 1 is not improvable up to logarithmic factors of $n$. Such information-theoretical lower bounds on the convergence rates hold even if the data analyst has perfect information of $f_0$, the reference function on which the $n^{-\alpha/(2\alpha+d-\alpha\beta)}$ local rate is based. Our results also imply an $n^{-\alpha/(2\alpha+d)}$ minimax lower bound over all $\alpha$-Hölder smooth functions, showing that without additional assumptions, noisy optimization of smooth functions is as difficult as reconstructing the unknown function in sup-norm.

Our proof of Theorem 2 also differs from existing minimax lower bound proofs for active nonparametric models [11]. The classical approach is to invoke Fano's inequality and to upper bound the KL divergence between different underlying functions $f$ and $g$ using $\|f - g\|_\infty$, corresponding to the point $x \in \mathcal{X}$ that leads to the largest KL divergence. Such an approach, however, does not produce tight lower bounds for our problem. To overcome this difficulty, we borrow the lower bound analysis for bandit pure exploration problems in [7]. In particular, our analysis considers the query distribution of any active query algorithm $A = (\chi_1, \dots, \chi_n, \phi_n)$ under the reference function $f_0$, and bounds the perturbation in query distributions between $f_0$ and $f$ using Le Cam's lemma. Afterwards, an adversarial function choice $f$ can be made based on the query distributions of the considered algorithm $A$.
Theorem 2 applies to any global optimization method that makes active queries, corresponding to the query model in Definition 2. The following theorem, on the other hand, shows that for passive algorithms (Definition 1) the $n^{-\alpha/(2\alpha+d)}$ optimization rate is not improvable even with additional level set assumptions imposed on $f_0$. This demonstrates an explicit gap between passive and adaptive query models in global optimization problems.

Theorem 3. Suppose $\alpha, c_0, C_0, c_0', C_0' > 0$ and $\kappa = \infty$. Denote $\mathcal{C} = (c_0, C_0)$ and $\mathcal{C}' = (c_0', C_0')$. Then there exist a constant $M > 0$ depending on $\alpha, d, \mathcal{C}, \mathcal{C}'$ and an $N$ depending on $M$ such that, for any $f_0 \in \Sigma_\kappa^\alpha(M/2) \cap \Theta_{\mathcal{C}} \cap \Theta'_{\mathcal{C}'}$ satisfying $\varepsilon_n^L(f_0) \le \tilde{\varepsilon}_n^L := [\log n/n]^{\alpha/(2\alpha+d)}$,

$$\inf_{\check{x}_n}\ \sup_{f\in\Sigma_\kappa^\alpha(M),\, \|f-f_0\|_\infty\le 2\tilde{\varepsilon}_n^L} \Pr_f\big[L(\check{x}_n; f) \ge \tilde{\varepsilon}_n^L\big] \ge \frac{1}{3} \qquad \text{for all } n \ge N, \tag{13}$$

where the infimum is over passive estimators $\check{x}_n$ in the sense of Definition 1.

Intuitively, the apparent gap demonstrated by Theorems 2 and 3 between the active and passive query models stems from the observation that a passive algorithm $A$ only has access to uniformly sampled query points $x_1, \dots, x_n$ and therefore cannot focus on a small level set of $f$ in order to improve query efficiency. In addition, for functions that grow faster when moving away from their minima (implying a larger value of $\beta$), the gap between the passive and active query models becomes bigger, as active queries can more effectively exploit the restricted level sets of such functions.

4 Conclusion

In this paper we consider the problem of noisy zeroth-order optimization of general smooth functions. Matching lower and upper bounds on the local minimax convergence rates are established, which are significantly different from classical minimax rates in nonparametric regression problems. Many interesting future directions exist along this line of research, including the exploitation of additive structures in the underlying function $f$ to completely remove the curse of dimensionality, functions with spatially heterogeneous smoothness or level set growth behaviors, and the design of more computationally efficient algorithms that work well in practice.

Acknowledgement

This work is supported by AFRL grant FA8750-17-2-0212. We thank the anonymous reviewers for many helpful suggestions that improved the presentation of this paper.

References

[1] A. Agarwal, O. Dekel, and L. Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proceedings of the annual Conference on Learning Theory (COLT), 2010.

[2] A. Agarwal, D. Foster, D. Hsu, S. Kakade, and A. Rakhlin. Stochastic convex optimization with bandit feedback. SIAM Journal on Optimization, 23(1):213–240, 2013.

[3] N. Agarwal, Z. Allen-Zhu, B. Bullins, E. Hazan, and T. Ma. Finding approximate local minima faster than gradient descent. In Proceedings of the Annual ACM SIGACT Symposium on Theory of Computing (STOC), 2017.

[4] S. Balakrishnan, S. Narayanan, A. Rinaldo, A. Singh, and L. Wasserman. Cluster trees on manifolds. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2013.

[5] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. Journal of Computer and System Sciences, 75(1):78–89, 2009.

[6] S. Bubeck, R. Eldan, and Y. T. Lee. Kernel-based methods for bandit convex optimization. In Proceedings of the annual ACM SIGACT Symposium on Theory of Computing (STOC), 2017.

[7] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In Proceedings of the International Conference on Algorithmic Learning Theory (ALT), 2009.
[8] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12(May):1655–1695, 2011.

[9] A. D. Bull. Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12(Oct):2879–2904, 2011.

[10] Y. Carmon, O. Hinder, J. C. Duchi, and A. Sidford. "Convex until proven guilty": Dimension-free acceleration of gradient descent on non-convex functions. arXiv preprint arXiv:1705.02766, 2017.

[11] R. M. Castro and R. D. Nowak. Minimax bounds for active learning. IEEE Transactions on Information Theory, 54(5):2339–2353, 2008.

[12] K. Chaudhuri, S. Dasgupta, S. Kpotufe, and U. von Luxburg. Consistent procedures for cluster tree estimation and pruning. IEEE Transactions on Information Theory, 60(12):7900–7912, 2014.

[13] H. Chen. Lower rate of convergence for locating a maximum of a function. The Annals of Statistics, 16(3):1330–1334, 1988.

[14] S. Dasgupta, D. J. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2008.

[15] J. Duchi and F. Ruan. Local asymptotics for some stochastic optimization problems: Optimality, constraint identification, and dual averaging. arXiv preprint arXiv:1612.05612, 2016.

[16] J. C. Duchi, J. Lafferty, and Y. Zhu. Local minimax complexity of stochastic convex optimization. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016.

[17] J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. CRC Press, 1996.

[18] A. D. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2005.

[19] R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points - online stochastic gradient for tensor decomposition. In Proceedings of the annual Conference on Learning Theory (COLT), 2015.

[20] J.-B. Grill, M. Valko, and R. Munos. Black-box optimization of noisy functions with unknown smoothness. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2015.

[21] S. Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the International Conference on Machine Learning (ICML), 2007.

[22] E. Hazan, A. Klivans, and Y. Yuan. Hyperparameter optimization: A spectral approach. arXiv preprint arXiv:1706.00764, 2017.

[23] E. Hazan, K. Levy, and S. Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimization. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2015.

[24] K. G. Jamieson, R. Nowak, and B. Recht. Query complexity of derivative-free optimization. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2012.

[25] A. R. Kan and G. T. Timmer. Stochastic global optimization methods part I: Clustering methods. Mathematical Programming, 39(1):27–56, 1987.

[26] A. R. Kan and G. T. Timmer. Stochastic global optimization methods part II: Multi level methods. Mathematical Programming, 39(1):57–78, 1987.

[27] J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23(3):462–466, 1952.

[28] R. D. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems (NIPS), 2005.

[29] A. P. Korostelev and A. B. Tsybakov. Minimax Theory of Image Reconstruction, volume 82. Springer Science & Business Media, 2012.

[30] O. V. Lepski, E. Mammen, and V. G. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. The Annals of Statistics, 25(3):929–947, 1997.

[31] C. Malherbe, E. Contal, and N. Vayatis. A ranking approach to global optimization. In Proceedings of the International Conference on Machine Learning (ICML), 2016.

[32] C. Malherbe and N. Vayatis. Global optimization of Lipschitz functions. In Proceedings of the International Conference on Machine Learning (ICML), 2017.
[33] S. Minsker. Non-asymptotic Bounds for Prediction Problems and Density Estimation. PhD thesis, Georgia Institute of Technology, 2012.

[34] S. Minsker. Estimation of extreme values and associated level sets of a regression function via selective sampling. In Proceedings of the Conference on Learning Theory (COLT), 2013.

[35] N. Nakamura, J. Seepaul, J. B. Kadane, and B. Reeja-Jayan. Design for low-temperature microwave-assisted crystallization of ceramic thin films. Applied Stochastic Models in Business and Industry, 2017.

[36] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience Publication, 1983.

[37] Y. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177–205, 2006.

[38] W. Polonik. Measuring mass concentrations and estimating density contour clusters - an excess mass approach. The Annals of Statistics, 23(3):855–881, 1995.

[39] E. Parzen. On estimation of a probability density and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.

[40] C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning, volume 1. MIT Press, Cambridge, 2006.

[41] B. Reeja-Jayan, K. L. Harrison, K. Yang, C.-L. Wang, A. Yilmaz, and A. Manthiram. Microwave-assisted low-temperature growth of thin films in solution. Scientific Reports, 2, 2012.

[42] P. Rigollet and R. Vert. Optimal rates for plug-in estimators of density level sets. Bernoulli, 15(4):1154–1178, 2009.

[43] A. Risteski and Y. Li. Algorithms and matching lower bounds for approximately-convex optimization. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2016.

[44] J. Scarlett, I. Bogunovic, and V. Cevher. Lower bounds on regret for noisy Gaussian process bandit optimization. In Proceedings of the annual Conference on Learning Theory (COLT), 2017.

[45] A. Singh, C. Scott, and R. Nowak. Adaptive Hausdorff estimation of density level sets. The Annals of Statistics, 37(5B):2760–2782, 2009.

[46] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009.

[47] A. W. van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 1998.

[48] Y. Zhang, P. Liang, and M. Charikar. A hitting time analysis of stochastic gradient Langevin dynamics. In Proceedings of the annual Conference on Learning Theory (COLT), 2017.