{"title": "Quantized Estimation of Gaussian Sequence Models in Euclidean Balls", "book": "Advances in Neural Information Processing Systems", "page_first": 3662, "page_last": 3670, "abstract": "A central result in statistical theory is Pinsker's theorem, which characterizes the minimax rate in the normal means model of nonparametric estimation. In this paper, we present an extension to Pinsker's theorem where estimation is carried out under storage or communication constraints. In particular, we place limits on the number of bits used to encode an estimator, and analyze the excess risk in terms of this constraint, the signal size, and the noise level. We give sharp upper and lower bounds for the case of a Euclidean ball, which establishes the Pareto-optimal minimax tradeoff between storage and risk in this setting.", "full_text": "Quantized Estimation of Gaussian Sequence Models in Euclidean Balls\n\nYuancheng Zhu, John Lafferty\nDepartment of Statistics, University of Chicago\n\nAbstract\n\nA central result in statistical theory is Pinsker\u2019s theorem, which characterizes the minimax rate in the normal means model of nonparametric estimation. In this paper, we present an extension to Pinsker\u2019s theorem where estimation is carried out under storage or communication constraints. In particular, we place limits on the number of bits used to encode an estimator, and analyze the excess risk in terms of this constraint, the signal size, and the noise level. We give sharp upper and lower bounds for the case of a Euclidean ball, which establishes the Pareto-optimal minimax tradeoff between storage and risk in this setting.\n\n1 Introduction\n\nClassical statistical theory studies the rate at which the error in an estimation problem decreases as the sample size increases. Methodology for a particular problem is developed to make estimation efficient, and lower bounds establish how quickly the error can decrease in principle. Asymptotically matching upper and lower bounds together yield the minimax rate of convergence\n\nR_n(F) = inf_{\u02c6f} sup_{f \u2208 F} R(\u02c6f, f).\n\nThis is the worst-case error in estimating an element of a model class F, where R(\u02c6f, f) is the risk or expected loss, and \u02c6f is an estimator constructed on a data sample of size n. The corresponding sample complexity of the estimation problem is\n\nn(\u03b5, F) = min{n : R_n(F) < \u03b5}.\n\nIn the classical setting, the infimum is over all estimators. In contemporary settings, it is increasingly of interest to understand how error depends on computation. For instance, when the data are high dimensional and the sample size is large, constructing the estimator using standard methods may be computationally prohibitive. The use of heuristics and approximation algorithms may make computation more efficient, but it is important to understand the loss in statistical efficiency that this incurs. In the minimax framework, this can be formulated by placing computational constraints on the estimator:\n\nR_n(F, B_n) = inf_{\u02c6f : C(\u02c6f) \u2264 B_n} sup_{f \u2208 F} R(\u02c6f, f).\n\nHere C(\u02c6f) \u2264 B_n indicates that the computation C(\u02c6f) used to construct \u02c6f is required to fall within a \u201ccomputational budget\u201d B_n. Minimax lower bounds on the risk as a function of the computational budget thus determine a feasible region for computation-constrained estimation, and a Pareto-optimal tradeoff for error versus computation.\n\nOne important measure of computation is the number of floating point operations, or the running time of an algorithm. Chandrasekaran and Jordan [3] have studied upper bounds for statistical estimation with computational constraints of this form in the normal means model. However, useful lower bounds are elusive. This is due to the difficult nature of establishing tight lower bounds for this model of computation in the polynomial hierarchy, apart from any statistical concerns. Another important measure of computation is storage, or the space used by a procedure. In particular, we may wish to limit the number of bits used to represent our estimator \u02c6f. The question then becomes, how does the excess risk depend on the budget B_n imposed on the number of bits C(\u02c6f) used to encode the estimator?\n\nThis problem is naturally motivated by certain applications. For instance, the Kepler telescope collects flux data for approximately 150,000 stars [6]. The central statistical task is to estimate the light curve of each star nonparametrically, in order to denoise and detect planet transits. If this estimation is done on board the telescope, the estimated function values may need to be sent back to earth for further analysis. To limit communication costs, the estimates can be quantized. The fundamental question is, what is lost in terms of statistical risk in quantizing the estimates? Or, in a cloud computing environment (such as Amazon EC2), a large number of nonparametric estimates might be constructed over a cluster of compute nodes and then stored (for example in Amazon S3) for later analysis. To limit the storage costs, which could dominate the compute costs in many scenarios, it is of interest to quantize the estimates. How much is lost in terms of risk, in principle, by using different levels of quantization?\n\nWith such applications as motivation, we address in this paper the problem of risk-storage tradeoffs in the normal means model of nonparametric estimation. The normal means model is a centerpiece of nonparametric estimation. It arises naturally when representing an estimator in terms of an orthogonal basis [8, 11]. Our main result is a sharp characterization of the Pareto-optimal tradeoff curve for quantized estimation of a normal means vector, in the minimax sense. We consider the case of a Euclidean ball of unknown radius in R^n. This case exhibits many of the key technical challenges that arise in nonparametric estimation over richer spaces, including the Stein phenomenon and the problem of adaptivity.\n\nAs will be apparent to the reader, the problem we consider is intimately related to classical rate distortion theory [7]. Indeed, our results require a marriage of minimax theory and rate distortion ideas. We thus build on the fundamental connection between function estimation and lossy source coding that was elucidated in Donoho\u2019s 1998 Wald Lectures [4]. This connection can also be used to advantage for practical estimation schemes. As we discuss further below, recent advances on computationally efficient, near-optimal lossy compression using sparse regression algorithms [12] can perhaps be leveraged for quantized nonparametric estimation.\n\nIn the following section, we present relevant background and give a detailed statement of our results. In Section 3 we sketch a proof of our main result on the excess risk for the Euclidean ball case. Section 4 presents simulations to illustrate our theoretical analyses. Section 5 discusses related work, and outlines future directions that our results suggest.\n\n2 Background and problem formulation\n\nIn this section we briefly review the essential elements of rate-distortion theory and minimax theory, to establish notation. We then state our main result, which bridges these classical theories.\n\nIn the rate-distortion setting we have a source that produces a sequence X^n = (X_1, X_2, ..., X_n), each component of which is independent and identically distributed as N(0, \u03c3^2). The goal is to transmit a realization from this sequence of random variables using a fixed number of bits, in such a way that results in the minimal expected distortion with respect to the original data X^n. Suppose that we are allowed to use a total budget of nB bits, so that the average number of bits per variable is B, which is referred to as the rate. To transmit or store the data, the encoder describes the source sequence X^n by an index \u03c6_n(X^n), where\n\n\u03c6_n : R^n \u2192 {1, 2, ..., 2^{nB}} \u2261 C(B)\n\nis the encoding function. The nB-bit index is then transmitted or stored without loss. A decoder, when receiving or retrieving the data, represents X^n by an estimate \u02c7X^n based on the index using a decoding function\n\n\u03c8_n : {1, 2, ..., 2^{nB}} \u2192 R^n.\n\nThe image of the decoding function \u03c8_n is called the codebook, which is a discrete set in R^n with cardinality no larger than 2^{nB}. The process is illustrated in Figure 1, and variously referred to as source coding, lossy compression, or quantization.\n\n[Figure 1: Encoding and decoding process for lossy compression (top) and quantized estimation (bottom). For quantized estimation, the model (mean vector) \u03b8^n is deterministic, not random. Top: X^n \u2192 encoder \u03c6_n \u2192 \u03c6_n(X^n) \u2208 C(B) \u2192 decoder \u03c8_n \u2192 \u02c7X^n = \u03c8_n(\u03c6_n(X^n)). Bottom: \u03b8^n \u2192 X^n \u2192 encoder \u03c6_n \u2192 \u03c6_n(X^n) \u2208 C(B) \u2192 decoder \u03c8_n \u2192 \u02c7\u03b8^n = \u03c8_n(\u03c6_n(X^n)).]\n\nWe call the pair of encoding and decoding functions Q_n = (\u03c6_n, \u03c8_n) an (n, B)-rate distortion code. We will also use Q_n to denote the composition of the two functions, i.e., Q_n(\u00b7) = \u03c8_n(\u03c6_n(\u00b7)).\n\nA distortion measure, or a loss function, d : R \u00d7 R \u2192 R_+ is used to evaluate the performance of the above coding and transmission process. In this paper, we will use the squared loss d(X_i, \u02c7X_i) = (X_i \u2212 \u02c7X_i)^2. The distortion between two sequences X^n and \u02c7X^n is then defined by d_n(X^n, \u02c7X^n) = (1/n) \u2211_{i=1}^n (X_i \u2212 \u02c7X_i)^2, the average of the per observation distortions. We drop the subscript n in d when it is clear from the context. The distortion, or risk, for a (n, B)-rate distortion code Q_n is defined as the expected loss E d(X^n, Q_n(X^n)). Denoting by Q_{n,B} the set of all (n, B)-rate distortion codes, the distortion rate function is defined as\n\nR(B, \u03c3) = liminf_{n \u2192 \u221e} inf_{Q_n \u2208 Q_{n,B}} E d(X^n, Q_n(X^n)).\n\nThis distortion rate function depends on the rate B as well as the source distribution. For the i.i.d. N(0, \u03c3^2) source, according to the well-known rate distortion theorem [7],\n\nR(B, \u03c3) = \u03c3^2 2^{\u22122B}.\n\nWhen B is zero, meaning no information gets encoded at all, this bound becomes \u03c3^2, which is the expected loss when each random variable is represented by its mean. As B approaches infinity, the distortion goes to zero.\n\nThe previous discussion assumes the source random variables are independent and follow a common distribution N(0, \u03c3^2). The goal is to minimize the expected distortion in the reconstruction of X^n after transmitting or storing the data under a communication constraint. Now suppose that X_i \u223c N(\u03b8_i, \u03c3^2) independently for i = 1, 2, ..., n. We assume the variance \u03c3^2 is known and the means \u03b8^n = (\u03b8_1, ..., \u03b8_n) are unknown. Suppose, furthermore, that instead of trying to minimize the recovery distortion d(X^n, \u02c7X^n), we want to estimate the means with a risk as small as possible, but again using a budget of B bits per index.\n\nWithout the communication constraint, this problem has been very well studied [10, 9]. Let \u02c6\u03b8(X^n) \u2261 \u02c6\u03b8^n = (\u02c6\u03b8_1, ..., \u02c6\u03b8_n) denote an estimator of the true mean \u03b8^n. For a parameter space \u0398_n \u2282 R^n, the minimax risk over \u0398_n is defined as\n\ninf_{\u02c6\u03b8^n} sup_{\u03b8^n \u2208 \u0398_n} E d(\u03b8^n, \u02c6\u03b8^n) = inf_{\u02c6\u03b8^n} sup_{\u03b8^n \u2208 \u0398_n} E (1/n) \u2211_{i=1}^n (\u03b8_i \u2212 \u02c6\u03b8_i)^2.\n\nFor the L_2 ball of radius c,\n\n\u0398_n(c) = {(\u03b8_1, ..., \u03b8_n) : (1/n) \u2211_{i=1}^n \u03b8_i^2 \u2264 c^2},   (1)\n\nPinsker\u2019s theorem gives the exact, limiting form of the minimax risk\n\nliminf_{n \u2192 \u221e} inf_{\u02c6\u03b8^n} sup_{\u03b8^n \u2208 \u0398_n(c)} E d(\u03b8^n, \u02c6\u03b8^n) = \u03c3^2 c^2/(\u03c3^2 + c^2).\n\nTo impose a communication constraint, we incorporate a variant of the source coding scheme described above into this minimax framework of estimation. Define a (n, B)-rate estimation code M_n = (\u03c6_n, \u03c8_n), as a pair of encoding and decoding functions, as before. The encoding function \u03c6_n : R^n \u2192 {1, 2, ..., 2^{nB}} is a mapping from observations X^n to an index set. The decoding function is a mapping from indices to models \u02c7\u03b8^n \u2208 R^n. We write the composition of the encoder and decoder as M_n(X^n) = \u03c8_n(\u03c6_n(X^n)) = \u02c7\u03b8^n, which we call a quantized estimator. Denoting by M_{n,B} the set of all (n, B)-rate estimation codes, we then define the quantized minimax risk as\n\nR_n(B, \u03c3, \u0398_n) = inf_{M_n \u2208 M_{n,B}} sup_{\u03b8^n \u2208 \u0398_n} E d(\u03b8^n, M_n(X^n)).\n\n[Figure 2: Risk R versus bits per symbol B. Our result establishes the Pareto-optimal tradeoff in the nonparametric normal means problem for risk versus number of bits: R(\u03c3^2, c^2, B) = c^2 \u03c3^2/(\u03c3^2 + c^2) + c^4 2^{\u22122B}/(\u03c3^2 + c^2). Curves for five signal sizes are shown, c^2 = 2, 3, 4, 5, 6. The noise level is \u03c3^2 = 1. With zero bits, the rate is c^2, the highest point on the risk curve. The rate for large B approaches the Pinsker bound \u03c3^2 c^2/(\u03c3^2 + c^2).]\n\nWe will focus on the case where our parameter space is the L_2 ball defined in (1), and write R_n(B, \u03c3, c) = R_n(B, \u03c3, \u0398_n(c)). In this setting, we let n go to infinity and define the asymptotic quantized minimax risk as\n\nR(B, \u03c3, c) = liminf_{n \u2192 \u221e} R_n(B, \u03c3, c) = liminf_{n \u2192 \u221e} inf_{M_n \u2208 M_{n,B}} sup_{\u03b8^n \u2208 \u0398_n(c)} E d(\u03b8^n, M_n(X^n)).   (2)\n\nNote that we could estimate \u03b8^n based on the quantized data \u02c7X^n = Q_n(X^n). Once again denoting by Q_{n,B} the set of all (n, B)-rate distortion codes, such an estimator is written \u02c7\u03b8^n = \u02c7\u03b8^n(Q_n(X^n)). Clearly, if the decoding functions \u03c8_n of Q_n are injective, then this formulation is equivalent. The quantized minimax risk is then expressed as\n\nR_n(B, \u03c3, \u0398_n) = inf_{\u02c7\u03b8^n} inf_{Q_n \u2208 Q_{n,B}} sup_{\u03b8^n \u2208 \u0398_n} E d(\u03b8^n, \u02c7\u03b8^n).\n\nThe many normal means problem exhibits much of the complexity and subtlety of general nonparametric regression and density estimation problems. It arises naturally in the estimation of a function expressed in terms of an orthogonal function basis [8, 13]. Our main result sharply characterizes
 the excess risk that communication constraints impose on minimax estimation for \u0398(c).\n\n3 Main results\n\nOur first result gives a lower bound on the exact quantized asymptotic risk in terms of B, \u03c3, and c.\n\nTheorem 1. For B \u2265 0, \u03c3 > 0 and c > 0, the asymptotic minimax risk defined in (2) satisfies\n\nR(B, \u03c3, c) \u2265 \u03c3^2 c^2/(\u03c3^2 + c^2) + c^4 2^{\u22122B}/(\u03c3^2 + c^2).   (3)\n\nThis lower bound on the limiting minimax risk can be viewed as the usual minimax risk without quantization, plus an excess risk term due to quantization. If we take B to be zero, the risk becomes c^2, which is obtained by estimating all of the means simply by zero. On the other hand, letting B \u2192 \u221e, we recover the minimax risk in Pinsker\u2019s theorem. This tradeoff is illustrated in Figure 2.\n\nThe proof of the theorem is technical and we defer it to the supplementary material. Here we sketch the basic idea of the proof. Suppose we are able to find a prior distribution \u03c0_n on \u03b8^n and a random vector \u02dc\u03b8^n such that for any (n, B)-rate estimation code M_n the following holds:\n\n\u03c3^2 c^2/(\u03c3^2 + c^2) + c^4 2^{\u22122B}/(\u03c3^2 + c^2)\n  = \u222b E_{X^n} d(\u03b8^n, \u02dc\u03b8^n) d\u03c0_n(\u03b8^n)   (I)\n  \u2264 \u222b E_{X^n} d(\u03b8^n, M_n(X^n)) d\u03c0_n(\u03b8^n)   (II)\n  \u2264 sup_{\u03b8^n \u2208 \u0398_n(c)} E_{X^n} d(\u03b8^n, M_n(X^n)).   (III)\n\nThen taking an infimum over M_n \u2208 M_{n,B} gives us the desired result. In fact, we can take \u03c0_n, the prior on \u03b8^n, to be N(0, c^2 I_n), and the model becomes \u03b8_i \u223c N(0, c^2) and X_i | \u03b8_i \u223c N(\u03b8_i, \u03c3^2). Then according to Lemma 1, inequality (II) holds with \u02dc\u03b8^n being the minimizer to the optimization problem\n\nmin_{p(\u02dc\u03b8^n | X^n, \u03b8^n)} E d(\u03b8^n, \u02dc\u03b8^n)\nsubject to I(X^n; \u02dc\u03b8^n) \u2264 nB,\n  p(\u02dc\u03b8^n | X^n, \u03b8^n) = p(\u02dc\u03b8^n | X^n).\n\nThe equality (I) holds due to Lemma 2. The inequality (III) can be shown by a limiting concentration argument on the prior distribution, which is included in the supplementary material.\n\nLemma 1. Suppose that X_1, ..., X_n are independent and generated by \u03b8_i \u223c \u03c0(\u03b8_i) and X_i | \u03b8_i \u223c p(x_i | \u03b8_i). Suppose M_n is an (n, B)-rate estimation code with risk E d(\u03b8^n, M_n(X^n)) \u2264 D. Then the rate B is lower bounded by the solution to the following problem:\n\nmin_{p(\u02dc\u03b8^n | X^n, \u03b8^n)} I(X^n; \u02dc\u03b8^n)\nsubject to E d(\u03b8^n, \u02dc\u03b8^n) \u2264 D,   (4)\n  p(\u02dc\u03b8^n | X^n, \u03b8^n) = p(\u02dc\u03b8^n | X^n).\n\nThe next lemma gives the solution to problem (4) when we have \u03b8_i \u223c N(0, c^2) and X_i | \u03b8_i \u223c N(\u03b8_i, \u03c3^2).\n\nLemma 2. Suppose \u03b8_i \u223c N(0, c^2) and X_i | \u03b8_i \u223c N(\u03b8_i, \u03c3^2) for i = 1, ..., n. For any random vector \u02dc\u03b8^n satisfying E d(\u03b8^n, \u02dc\u03b8^n) \u2264 D and p(\u02dc\u03b8^n | X^n, \u03b8^n) = p(\u02dc\u03b8^n | X^n) we have\n\nI(X^n; \u02dc\u03b8^n) \u2265 (n/2) log [ c^4 / ((\u03c3^2 + c^2)(D \u2212 \u03c3^2 c^2/(\u03c3^2 + c^2))) ].\n\nCombining the above two lemmas, we obtain a lower bound of the risk assuming that \u03b8^n follows the prior distribution \u03c0_n:\n\nCorollary 1. Suppose M_n is a (n, B)-rate estimation code for the source \u03b8_i \u223c N(0, c^2) and X_i | \u03b8_i \u223c N(\u03b8_i, \u03c3^2), then\n\nE d(\u03b8^n, M_n(X^n)) \u2265 \u03c3^2 c^2/(\u03c3^2 + c^2) + c^4 2^{\u22122B}/(\u03c3^2 + c^2).   (5)\n\n3.1 An adaptive source coding method\n\nWe now present a source coding method, which we will show attains the minimax lower bound asymptotically with high probability. Suppose that the encoder is given a sequence of observations (X_1, ..., X_n), and both the encoder and the decoder know the radius c of the L_2 ball in which the mean vector lies. The steps of the source coding method are outlined below:\n\nStep 1. Generating codebooks. The codebooks are distributed to both the encoder and the decoder.\n(a) Generate codebook B = {1/\u221an, 2/\u221an, ..., \u2308c^2 \u221an\u2309/\u221an}.\n(b) Generate codebook X which consists of 2^{nB} i.i.d. random vectors from the uniform distribution on the n-dimensional unit sphere S^{n\u22121}.\n\nStep 2. Encoding.\n(a) Encode \u02c6b^2 = (1/n) \u2016X^n\u2016^2 \u2212 \u03c3^2 by \u02c7b^2 = argmin{|b^2 \u2212 \u02c6b^2| : b^2 \u2208 B}.\n(b) Encode X^n by \u02c7X^n = argmax{\u27e8X^n, x^n\u27e9 : x^n \u2208 X}.\n\nStep 3. Transmit or store (\u02c7b^2, \u02c7X^n) by their corresponding indices using log c^2 + (1/2) log n + nB bits.\n\nStep 4. Decoding.\n(a) Recover (\u02c7b^2, \u02c7X^n) by the transmitted or stored indices.\n(b) Estimate \u03b8 by \u02c7\u03b8^n = sqrt( n \u02c7b^4 (1 \u2212 2^{\u22122B}) / (\u02c7b^2 + \u03c3^2) ) \u00b7 \u02c7X^n.\n\nWe make several remarks on this quantized estimation method.\n\nRemark 1. The rate of this coding method is B + (log c^2)/n + (log n)/(2n), which is asymptotically B bits.\n\nRemark 2. The method is probabilistic; the randomness comes from the construction of the codebook X. Denoting by M*_{n,B,\u03c3,c} the ensemble of such random quantizers, there is then a natural
 one-to-one mapping between M*_{n,B,\u03c3,c} and (S^{n\u22121})^{2^{nB}} and we attach probability measure to M*_{n,B,\u03c3,c} corresponding to the product uniform distribution on (S^{n\u22121})^{2^{nB}}.\n\nRemark 3. The main idea behind this coding scheme is to encode the magnitude and the direction of the observation vector separately, in such a way that the procedure adapts to sources with different norms of the mean vectors.\n\nRemark 4. The computational complexity of this source coding method is exponential in n. Therefore, like the Shannon random codebook, this is a demonstration of the asymptotic achievability of the lower bound (3), rather than a practical scheme to be implemented. We discuss possible computationally efficient algorithms in Section 5.\n\nThe following shows that with high probability this procedure will attain the desired lower bound asymptotically.\n\nTheorem 2. For a sequence of vectors {\u03b8^n}_{n=1}^\u221e satisfying \u03b8^n \u2208 R^n and \u2016\u03b8^n\u2016^2/n = b^2 \u2264 c^2, as n \u2192 \u221e\n\nP( d(\u03b8^n, M_n(X^n)) > \u03c3^2 b^2/(\u03c3^2 + b^2) + b^4 2^{\u22122B}/(\u03c3^2 + b^2) + C sqrt(log n / n) ) \u2192 0   (6)\n\nfor some constant C that does not depend on n (but could possibly depend on b, \u03c3 and B). The probability measure is with respect to both M_n \u2208 M*_{n,B,\u03c3,c} and X^n \u2208 R^n.\n\nThis theorem shows that the source coding method not only achieves the desired minimax lower bound for the L_2 ball with high probability with respect to the random codebook and source distribution, but also adapts to the true magnitude of the mean vector \u03b8^n. It agrees with the intuition that the hardest mean vector to estimate lies on the boundary of the L_2 ball. Based on Theorem 2 we can obtain a uniform high probability bound for mean vectors in the L_2 ball.\n\nCorollary 2. For any sequence of vectors {\u03b8^n}_{n=1}^\u221e satisfying \u03b8^n \u2208 R^n and \u2016\u03b8^n\u2016^2/n \u2264 c^2, as n \u2192 \u221e\n\nP( d(\u03b8^n, M_n(X^n)) > \u03c3^2 c^2/(\u03c3^2 + c^2) + c^4 2^{\u22122B}/(\u03c3^2 + c^2) + C' sqrt(log n / n) ) \u2192 0\n\nfor some constant C' that does not depend on n.\n\nWe include the details of the proof of Theorem 2 in the supplementary material, which carefully analyzes the three terms in the following decomposition of the loss function:\n\nd(\u03b8^n, \u02c7\u03b8^n) = (1/n) \u2016\u02c7\u03b8^n \u2212 \u03b8^n\u2016^2\n  = (1/n) \u2016\u02c7\u03b8^n \u2212 \u02c6\u03b3 X^n + \u02c6\u03b3 X^n \u2212 \u03b8^n\u2016^2\n  = A_1 + A_2 + A_3,\n\nA_1 = (1/n) \u2016\u02c7\u03b8^n \u2212 \u02c6\u03b3 X^n\u2016^2,\nA_2 = (1/n) \u2016\u02c6\u03b3 X^n \u2212 \u03b8^n\u2016^2,\nA_3 = (2/n) \u27e8\u02c7\u03b8^n \u2212 \u02c6\u03b3 X^n, \u02c6\u03b3 X^n \u2212 \u03b8^n\u27e9,\n\nwhere \u02c6\u03b3 = \u02c6b^2/(\u02c6b^2 + \u03c3^2) with \u02c6b^2 = \u2016X^n\u2016^2/n \u2212 \u03c3^2. Term A_1 characterizes the quantization error. Term A_2 does not involve the random codebook, and is the loss of a type of James-Stein estimator. The cross term A_3 vanishes as n \u2192 \u221e.\n\n[Figure 3: Comparison of the quantized estimates with different rates B, the James-Stein estimator, and the true mean vector. The heights of the bars are the averaged estimates based on 100 replicates. Each large background rectangle indicates the original mean component \u03b8_j. Horizontal axis: Index; vertical axis: Estimate; series: B = 0.1, B = 0.2, B = 0.5, B = 1, and James-Stein.]\n\n4 Simulations\n\nIn this section we present a set of simulation results showing the empirical performance of the proposed quantized estimation method. Throughout the simulation, we fix the noise level \u03c3^2 = 1, while varying the other parameters c and B.\n\nFirst we show in Figure 3 the effect of quantized estimation and compare it with the James-Stein estimator. Setting n = 15 and c = 2, we randomly generate a mean vector \u03b8^n \u2208 R^n with \u2016\u03b8\u2016^2/n = c^2. A random vector X is then drawn from N(\u03b8^n, I_n) and quantized estimates with rates B \u2208 {0.1, 0.2, 0.5, 1} are calculated; for comparison we also compute the James-Stein estimator, given by\n\n\u02c6\u03b8^n_JS = (1 \u2212 (n \u2212 2) \u03c3^2/\u2016X^n\u2016^2) X^n.\n\nWe repeat this sampling and estimation procedure 100 times and report the averaged risk estimates in Figure 3. We see that the quantized estimator essentially shrinks the random vector towards zero. With small rates, the shrinkage is strong, with all the estimates close to zero. Estimates with larger rates approach the James-Stein estimator.\n\nIn our second set of simulations, we choose c from {0.1, 0.5, 1, 5, 10} to reflect different signal-to-noise ratios, and choose B from {0.1, 0.2, 0.5, 1}. For each combination of the values of c and B, we vary n, the dimension of the mean vector, which is also the number of observations. Given a set of parameters c, B and n, a mean vector \u03b8^n is generated uniformly on the sphere \u2016\u03b8^n\u2016^2/n = c^2 and data X^n are generated following the distribution N(\u03b8^n, \u03c3^2 I_n). We quantize the data using the source coding method, and compute the mean squared error between the estimator and the true mean vector. The procedure is repeated 100 times for each of the parameter combinations, and the average and standard deviation of the mean squared errors are recorded. The results are shown in Figure 4. We see that as n increases, the average error decreases and approaches the theoretic lower bound in Theorem 1. Moreover, the standard deviation of the mean squared errors also decreases, confirming the result of Theorem 2 that the convergence is with high probability.\n\n[Figure 4: Mean squared errors and standard deviations of the quantized estimator versus n for different values of (B, c). The horizontal dashed lines indicate the lower bounds. Panels: B = 0.1, B = 0.2, B = 0.5, B = 1; horizontal axis: n; vertical axis: MSE; curves: c = 0.5, c = 1, c = 5, c = 10.]\n\n5 Discussion and future work\n\nIn this paper, we establish a sharp lower bound on the asymptotic minimax risk for quantized estimators of nonparametric normal means for the case of a Euclidean ball. Similar techniques can be applied to the setting where the parameter space is an ellipsoid \u0398 = {\u03b8 : \u2211_{j=1}^\u221e a_j^2 \u03b8_j^2 \u2264 c^2}. A principal case of interest is the Sobolev ellipsoid of order m where a_j^2 \u223c (\u03c0 j)^{2m} as j \u2192 \u221e. The Sobolev ellipsoid arises naturally in nonparametric function estimation and is thus of great importance. We leave this to future work.\n\nDonoho discusses the parallel between rate distortion theory and Pinsker\u2019s work in his Wald Lectures [4]. Focusing on the case of the Sobolev space of order m, which we denote by F_m, it is shown that the Kolmogorov entropy H_\u03b5(F_m) and the rate distortion function R(D, X) satisfy H_\u03b5(F_m) \u224d sup{R(\u03b5^2, X) : P(X \u2208 F_m) = 1} as \u03b5 \u2192 0. This connects the worst-case minimax analysis and least-favorable rate distortion function for the function class. Another information-theoretic formulation of minimax rates lies in the so-called \u201cle Cam equation\u201d H_\u03b5(F) = n \u03b5^2 [14, 15]. However, both are different from the direction we pursue in this paper, which is to impose communication constraints in minimax analysis.\n\nIn other related work, researchers in communications theory have studied estimation problems in sensor networks under communication constraints. Draper and Wornell [5] obtain a result on the so-called \u201cone-step problem\u201d for the quadratic-Gaussian case, which is essentially the same as the statement in our Corollary 1. In fact, they consider a similar setting, but treat the mean vector as random and generated independently from a known normal distribution. In contrast, we assume a fixed but unknown mean vector and establish a minimax lower bound as well as an adaptive source coding method that adapts to the fixed mean vector within the parameter space. Zhang et al. [16] also consider minimax bounds with communication constraints. However, the analysis in [16] is focused on distributed parametric estimation, where the data are distributed between several machines. Information is shared between the machines in order to construct a parameter estimate, and constraints are placed on the amount of communication that is allowed.\n\nIn addition to treating more general ellipsoids, an important direction for future work is to design computationally efficient quantized nonparametric estimators. One possible method is to divide the variables into smaller blocks and quantize them separately. A more interesting and promising approach is to adapt the recent work of Venkataramanan et al. [12] that uses sparse regression for lossy compression. We anticipate that with appropriate modifications, this scheme can be applied to quantized nonparametric estimation to yield practical algorithms, trading off a worse error exponent in the convergence rate to the optimal quantized minimax risk for reduced complexity encoders and decoders.\n\nAcknowledgements\n\nResearch supported in part by NSF grant IIS-1116730, AFOSR grant FA9550-09-1-0373, ONR grant N000141210762, and an Amazon AWS in Education Machine Learning Research grant. The authors thank Andrew Barron, John Duchi, and Alfred Hero for valuable comments on this work.\n\nReferences\n\n[1] T. Tony Cai, Jianqing Fan, and Tiefeng Jiang. Distributions of angles in random packing on spheres. The Journal of Machine Learning Research, 14(1):1837\u20131864, 2013.\n[2] T. Tony Cai and Tiefeng Jiang. Phase transition in limiting distributions of coherence of high-dimensional random matrices. Journal of Multivariate Analysis, 107:24\u201339, 2012.\n[3] Venkat Chandrasekaran and Michael I. Jordan. Computational and statistical tradeoffs via convex relaxation. PNAS, 110(13):E1181\u2013E1190, March 2013.\n[4] David L. Donoho. Wald lecture I: Counting bits with Kolmogorov and Shannon. 2000.\n[5] Stark C. Draper and Gregory W. Wornell. Side information aware coding strategies for sensor networks. Selected Areas in Communications, IEEE Journal on, 22(6):966\u2013976, 2004.\n[6] Jon M. Jenkins et al. Overview of the Kepler science processing pipeline. The Astrophysical Journal Letters, 713(2):L87, 2010.\n[7] Robert G. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, 1968.\n[8] Iain M. Johnstone. Function estimation and Gaussian sequence models. 2002. Unpublished manuscript.\n[9] Michael Nussbaum. Minimax risk: Pinsker bound. Encyclopedia of Statistical Sciences, 3:451\u2013460, 1999.\n[10] Mark Semenovich Pinsker. Optimal filtering of square-integrable signals in Gaussian noise. Problemy Peredachi Informatsii, 16(2):52\u201368, 1980.\n[11] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics, 1st edition, 2008.\n[12] Ramji Venkataramanan, Tuhin Sarkar, and Sekhar Tatikonda. Lossy compression via sparse linear regression: Computationally efficient encoding and decoding. In IEEE International Symposium on Information Theory (ISIT), pages 1182\u20131186. IEEE, 2013.\n[13] Larry Wasserman. All of Nonparametric Statistics. Springer-Verlag, 2006.\n[14] Wing Hung Wong and Xiaotong Shen. Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. The Annals of Statistics, 23:339\u2013362, 1995.\n[15] Yuhong Yang and Andrew Barron. Information-theoretic determination of minimax rates of convergence. The Annals of Statistics, 27(5):1564\u20131599, 1999.\n[16] Yuchen Zhang, John Duchi, Michael Jordan, and Martin J. Wainwright. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In Advances in Neural Information Processing Systems, pages 2328\u20132336, 2013.", "award": [], "sourceid": 1924, "authors": [{"given_name": "Yuancheng", "family_name": "Zhu", "institution": "University of Chicago"}, {"given_name": "John", "family_name": "Lafferty", "institution": "University of Chicago"}]}