{"title": "Provable Tensor Factorization with Missing Data", "book": "Advances in Neural Information Processing Systems", "page_first": 1431, "page_last": 1439, "abstract": "We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need, to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode $n\\times n\\times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \\log^4 n)$ randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemer\\'edi et al. on the spectrum of random graphs. Next, we prove global convergence of alternating minimization with a good initialization. 
Simulations suggest that the dependence of the sample size on dimensionality $n$ is indeed tight.", "full_text": "Provable Tensor Factorization with Missing Data\n\nPrateek Jain, Microsoft Research, Bangalore, India, prajain@microsoft.com\nSewoong Oh, Dept. of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, swoh@illinois.edu\n\nAbstract\n\nWe study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need, to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode $n\\times n\\times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \\log^4 n)$ randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemer\u00e9di et al. on the spectrum of random graphs. We show that this initialization step alone is sufficient to achieve the root mean squared error on the parameters bounded by $C(r^2 n^{3/2} (\\log n)^4/|\\Omega|)$ from $|\\Omega|$ observed entries for some constant $C$ independent of $n$ and $r$. Next, we prove global convergence of alternating minimization with this good initialization. Simulations suggest that the dependence of the sample size on the dimensionality $n$ is indeed tight.\n\n1 Introduction\n\nSeveral real-world applications routinely encounter multi-way data with structure which can be modeled as low-rank tensors. Moreover, in several settings, many of the entries of the tensor are missing, which motivated us to study the problem of low-rank tensor factorization with missing entries. For example, when recording electrical activities of the brain, the electroencephalography (EEG) signal can be represented as a three-way array (temporal, spectral, and spatial axis). Oftentimes signals are lost due to mechanical failure or loose connection. Given numerous motivating applications, several methods have been proposed for this tensor completion probl
em. However, with the exception of 2-way tensors (i.e., matrices), the existing methods for higher-order tensors do not have theoretical guarantees and typically suffer from the curse of local minima. In general, finding a factorization of a tensor is an NP-hard problem, even when all the entries are available. However, it was recently discovered that by restricting attention to a sub-class of tensors such as low-CP rank orthogonal tensors [1] or low-CP rank incoherent tensors [2], one can efficiently find a provably approximate factorization. (Footnote 1: The notion of incoherence we assume in (2) can be thought of as incoherence between the fibers and the standard basis vectors.) In particular, exact recovery of the factorization is possible for a tensor with a low-rank orthogonal CP decomposition [1]. We ask the question of recovering such a CP-decomposition when only a small number of entries are revealed, and show that exact reconstruction is possible even when we do not observe any entry in most of the fibers.\n\nProblem formulation. We study tensors that have an orthonormal CANDECOMP/PARAFAC (CP) tensor decomposition with a small number of components. Moreover, for simplicity of notation and exposition, we only consider symmetric third order tensors. We would like to stress that our techniques generalize easily to handle non-symmetric tensors as well as higher-order tensors. Formally, we assume that the true tensor $T$ has the following form: $$T = \\sum_{\\ell=1}^{r} \\sigma_\\ell (u_\\ell \\otimes u_\\ell \\otimes u_\\ell) \\in \\mathbb{R}^{n\\times n\\times n}, \\qquad (1)$$ with $r \\ll n$, $u_\\ell \\in \\mathbb{R}^n$ with $\\|u_\\ell\\| = 1$, and the $u_\\ell$'s orthogonal to each other. We let $U \\in \\mathbb{R}^{n\\times r}$ be a tall-orthogonal matrix where $u_\\ell$ is the $\\ell$-th column of $U$ and $U_i \\perp U_j$ for $i \\neq j$. We use $\\otimes$ to denote the standard outer product such that the $(i,j,k)$-th element of $T$ is given by $T_{ijk} = \\sum_a \\sigma_a U_{ia} U_{ja} U_{ka}$. We further assume that the $u_i$'s are unstructured, which is formalized by the notion of incoherence commonly assumed in matrix completion problems. The incoherence of a symmetric tensor with orthogonal decomposition is $$\\mu(T) \\equiv \\max_{i\\in[n],\\,\\ell\\in[r]} \\sqrt{n}\\,|U_{i\\ell}|, \\qquad (2)$$ where $[n] = \\{1, \\ldots, n\\}$ is the set of the first $n$ integers. Tensor completion becomes increasingly difficu
lt for tensors with larger $\\mu(T)$, because the \u2018mass\u2019 of the tensor can be concentrated on a few entries that might not be revealed. Out of $n^3$ entries of $T$, a subset $\\Omega \\subseteq [n]\\times[n]\\times[n]$ is revealed. We use $P_\\Omega(\\cdot)$ to denote the projection of a tensor onto the revealed set such that $P_\\Omega(T)_{ijk} = T_{ijk}$ if $(i,j,k)\\in\\Omega$, and $P_\\Omega(T)_{ijk} = 0$ otherwise. We want to recover $T$ exactly using the given entries $P_\\Omega(T)$. We assume that each $(i,j,k)$ for all $i \\le j \\le k$ is included in $\\Omega$ with a fixed probability $p$ (since $T$ is symmetric, we include all permutations of $(i,j,k)$). This is equivalent to fixing the total number of samples $|\\Omega|$ and selecting $\\Omega$ uniformly at random over all $\\binom{n^3}{|\\Omega|}$ choices. The goal is to ensure exact recovery with high probability and for $|\\Omega|$ that is sub-linear in the number of entries ($n^3$).\n\nNotations. For a tensor $T\\in\\mathbb{R}^{n\\times n\\times n}$, we define a linear mapping using $U\\in\\mathbb{R}^{n\\times m}$ as $T[U,U,U]\\in\\mathbb{R}^{m\\times m\\times m}$ such that $T[U,U,U]_{ijk} = \\sum_{a,b,c} T_{abc} U_{ai} U_{bj} U_{ck}$. The spectral norm of a tensor is $\\|T\\|_2 = \\max_{\\|x\\|=1} T[x,x,x]$. The Hilbert-Schmidt norm (Frobenius norm for matrices) of a tensor is $\\|T\\|_F = (\\sum_{i,j,k} T_{ijk}^2)^{1/2}$. The Euclidean norm of a vector is $\\|u\\|_2 = (\\sum_i u_i^2)^{1/2}$. We use $C, C'$ to denote any positive numerical constants and the actual value might change from line to line.\n\n1.1 Algorithm\n\nIdeally, one would like to minimize the rank of a tensor that explains all the sampled entries: $$\\text{minimize}_{\\widehat{T}} \\;\\; \\mathrm{rank}(\\widehat{T}) \\quad \\text{subject to } T_{ijk} = \\widehat{T}_{ijk} \\text{ for all } (i,j,k)\\in\\Omega. \\qquad (3)$$ However, even computing the rank of a tensor is NP-hard in general, where the rank is defined as the minimum $r$ for which a CP-decomposition exists [3]. Instead, we fix the rank of $\\widehat{T}$ by explicitly modeling $\\widehat{T}$ as $\\widehat{T} = \\sum_{\\ell\\in[r]} \\sigma_\\ell (u_\\ell\\otimes u_\\ell\\otimes u_\\ell)$, and solve the following problem: $$\\min_{\\widehat{T},\\,\\mathrm{rank}(\\widehat{T})=r} \\big\\| P_\\Omega(T) - P_\\Omega(\\widehat{T}) \\big\\|_F^2 = \\min_{\\{\\sigma_\\ell, u_\\ell\\}_{\\ell\\in[r]}} \\Big\\| P_\\Omega(T) - P_\\Omega\\Big(\\sum_{\\ell\\in[r]} \\sigma_\\ell (u_\\ell\\otimes u_\\ell\\otimes u_\\ell)\\Big) \\Big\\|_F^2 \\qquad (4)$$ Recently, [4, 5] showed that an alternating minimization technique can recover a matri
x with missing entries exactly. We generalize and modify the algorithm for the case of higher order tensors and study it rigorously for tensor completion. However, due to special structure in higher-order tensors, our algorithm as well as analysis is significantly different than the matrix case (see Section 2.2 for more details). To perform the minimization, we repeat the outer-loop getting refined estimates for all $r$ components. In the inner-loop, we loop over each component and solve for $u_q$ while fixing the others $\\{u_\\ell\\}_{\\ell\\neq q}$. More precisely, we set $\\widehat{T} = u^{t+1}_q\\otimes u_q\\otimes u_q + \\sum_{\\ell\\neq q} \\sigma_\\ell\\, u_\\ell\\otimes u_\\ell\\otimes u_\\ell$ in (4) and then find the optimal $u^{t+1}_q$ by minimizing the least squares objective given by (4). That is, each inner iteration is a simple least squares problem over the known entries, hence can be implemented efficiently and is also embarrassingly parallel.\n\nAlgorithm 1 Alternating Minimization for Tensor Completion\n1: Input: $P_\\Omega(T)$, $\\Omega$, $r$, $\\tau$, $\\mu$\n2: Initialize with $[(u^0_1,\\sigma_1),(u^0_2,\\sigma_2),\\ldots,(u^0_r,\\sigma_r)] = \\mathrm{RTPM}(P_\\Omega(T), r)$ (RTPM of [1])\n3: $[u_1,u_2,\\ldots,u_r] = \\mathrm{Threshold}([u^0_1,u^0_2,\\ldots,u^0_r], \\mu)$ (Clipping scheme of [4])\n4: for all $t = 1,2,\\ldots,\\tau$ do\n5: /* OUTER LOOP */\n6: for all $q = 1,2,\\ldots,r$ do\n7: /* INNER LOOP */\n8: $\\hat{u}^{t+1}_q = \\arg\\min_{u^{t+1}_q} \\|P_\\Omega(T - u^{t+1}_q\\otimes u_q\\otimes u_q - \\sum_{\\ell\\neq q}\\sigma_\\ell\\cdot u_\\ell\\otimes u_\\ell\\otimes u_\\ell)\\|_F^2$\n9: $\\sigma^{t+1}_q = \\|\\hat{u}^{t+1}_q\\|_2$\n10: $u^{t+1}_q = \\hat{u}^{t+1}_q/\\|\\hat{u}^{t+1}_q\\|_2$\n11: end for\n12: $[u_1,u_2,\\ldots,u_r] \\leftarrow [u^{t+1}_1,u^{t+1}_2,\\ldots,u^{t+1}_r]$\n13: $[\\sigma_1,\\sigma_2,\\ldots,\\sigma_r] \\leftarrow [\\sigma^{t+1}_1,\\sigma^{t+1}_2,\\ldots,\\sigma^{t+1}_r]$\n14: end for\n15: Output: $\\widehat{T} = \\sum_{q\\in[r]}\\sigma_q(u_q\\otimes u_q\\otimes u_q)$\n\nThe main novelty in our approach is that we refine all $r$ components iteratively as opposed to the sequential deflation technique used by the existing methods for tensor decomposition (for fully observed tensors). In sequential deflation methods, components $\\{u_1,u_2,\\ldots,u_r\\}$ are estimated sequentially and the estimate of, say, $u_2$ is not used to refine $u_1$. In contrast, our algorithm iterates over all $r$ estimates in the inner loop, so as to obtain refined estimates for all $u_i$'s in the outer loop. We believe that such a technique could be applied to impr
ove the error bounds of (fully observed) tensor decomposition methods as well. As our method is directly solving a non-convex problem, it can easily get stuck in local minima. The key reason our approach can overcome the curse of local minima is that we start with a provably good initial point which is only a small distance away from the optima. To obtain such an initial estimate, we compute a low-rank approximation of the observed tensor using Robust Tensor Power Method (RTPM) [1]. RTPM is a generalization of the widely used power method for computing leading singular vectors of a matrix and can approximate the largest singular vectors up to the spectral norm of the \u201cerror\u201d tensor. Hence, the challenge is to show that the error tensor has small spectral norm (see Theorem 2.1). We perform a thresholding step similar to [4] (see Lemma A.4) after the RTPM step to ensure that the estimates we get are incoherent. Our analysis requires the sampled entries $\\Omega$ to be independent of the current iterates $u_i$, $\\forall i$, which in general is not possible as the $u_i$'s are computed using $\\Omega$. To avoid this issue, we divide the given samples ($\\Omega$) randomly into $r\\cdot\\tau$ equal parts, where $\\tau$ is the number of outer loops (see Algorithm 1).\n\n1.2 Main Result\n\nTheorem 1.1. Consider any rank-$r$ symmetric tensor $T\\in\\mathbb{R}^{n\\times n\\times n}$ with an orthogonal CP decomposition in (1) satisfying $\\mu$-incoherence as defined in (2). For any positive $\\varepsilon > 0$, there exists a positive numerical constant $C$ such that if entries are revealed with probability $$p \\ge C\\,\\frac{\\mu^6 r^5 \\sigma_{\\max}^4 (\\log n)^4 \\log(r\\|T\\|_F/\\varepsilon)}{\\sigma_{\\min}^4 n^{3/2}},$$ where $\\sigma_{\\max} \\equiv \\max_\\ell \\sigma_\\ell$ and $\\sigma_{\\min} \\equiv \\min_\\ell \\sigma_\\ell$, then the following holds with probability at least $1 - n^{-5}\\log_2(4\\sqrt{r}\\|T\\|_F/\\varepsilon)$:\n\u2022 the problem (3) has a unique optimal solution; and\n\u2022 $\\log_2(4\\sqrt{r}\\|T\\|_F/\\varepsilon)$ iterations of Algorithm 1 produce an estimate $\\widehat{T}$ s.t. $\\|T - \\widehat{T}\\|_F \\le \\varepsilon$.\n\nThe above result can be generalized to $k$-mode tensors in a straightforward manner, where exact recovery is guaranteed if $$p \\ge C\\,\\frac{\\mu^6 r^5 \\sigma_{\\max}^{2k-2} (\\log n)^4 \\log(r\\|T\\|_F/\\varepsilon)}{\\sigma_{\\min}^4 n^{k/2}}.$$ However, for simplicity of notations and to emphasize key points of our proof, we only focus on 3-mode tensors in Section 2.3. We provide a proof of The
orem 1.1 in Section 2. For an incoherent, well-conditioned, and low-rank tensor with $\\mu = O(1)$ and $\\sigma_{\\min} = \\Theta(\\sigma_{\\max})$, alternating minimization requires $O(r^5 n^{3/2} (\\log n)^4)$ samples to get within an arbitrarily small normalized error. This is a vanishing fraction of the total number of entries $n^3$. Each step in the alternating minimization requires $O(r|\\Omega|)$ operations, hence the alternating minimization only requires $O(r|\\Omega|\\log(r\\|T\\|_F/\\varepsilon))$ operations. The initialization step requires $O(r^c|\\Omega|)$ operations for some positive numerical constant $c$ as proved in [1]. When $r \\ll n$, the computational complexity scales linearly in the sample size up to a logarithmic factor. A fiber in a third order tensor is an $n$-dimensional vector defined by fixing two of the axes and indexing over the remaining axis. The above theorem implies that among $n^2$ fibers of the form $\\{T[I,e_j,e_k]\\}_{j,k\\in[n]}$, exact recovery is possible even if only $O(n^{3/2}(\\log n)^4)$ fibers have non-zero samples, that is, most of the fibers are not sampled at all. This should be compared to the matrix completion setting where all fibers are required to have at least one sample. However, unlike matrices, the fundamental limit of higher order tensor completion is not known. Building on the percolation of Erd\u0151s-R\u00e9nyi graphs and the coupon-collectors problem, it is known that matrix completion has multiple rank-$r$ solutions when the sample size is less than $C\\mu r n\\log n$ [6], hence exact recovery is impossible. But such arguments do not generalize directly to higher order; see Section 2.5 for more discussion. Interestingly, simulations in Section 1.3 suggest that for $r = O(\\sqrt{n})$, the sample complexity scales as $r^{1/2} n^{3/2}\\log n$. That is, assuming the sample complexity provided by simulations is correct, our result achieves optimal dependence on $n$ (up to log factors). However, the dependency on $r$ is sub-optimal (see Section 2.5 for a discussion).\n\n1.3 Empirical Results\n\nTheorem 1.1 guarantees exact recovery when $p \\ge C r^5 (\\log n)^4/n^{3/2}$. Numerical experiments show that the average recovery rate converges to a universal curve over $\\alpha$, where $p^* = \\alpha r^{1/2}\\ln n/((1-\\rho) n^{3/2})$ in Figure 1. Our bound is tight in its dependency on $n$ up to a poly-logarithmic factor, but is loose in its de
pendency in the rank $r$. Further, the algorithm is able to recover the original tensor exactly even when the factors are not strictly orthogonal. We generate orthogonal matrices $U = [u_1,\\ldots,u_r]\\in\\mathbb{R}^{n\\times r}$ uniformly at random with $n = 50$ and $r = 3$ unless specified otherwise. For a rank-$r$ tensor $T = \\sum_{i=1}^{r} u_i\\otimes u_i\\otimes u_i$, we randomly reveal each entry with probability $p$. A tensor is exactly recovered if the normalized root mean squared error, $\\mathrm{RMSE} = \\|T - \\hat{T}\\|_F/\\|T\\|_F$, is less than $10^{-7}$. (Footnote 2: A MATLAB implementation of Algorithm 1 used to run the experiments is available at http://web.engr.illinois.edu/~swoh/software/optspace.) Varying $n$ and $r$, we plot the recovery rate averaged over 100 instances as a function of $\\alpha$. The degrees of freedom in representing a symmetric tensor is $\\Omega(rn)$. Hence, for large $r$, we need the number of samples to scale as $r$. Hence, the current dependence of $p^* = O(\\sqrt{r})$ can only hold for $r = O(n)$. For not strictly orthogonal factors, the algorithm is robust. A more robust approach for finding an initial guess could improve the performance significantly, especially for non-orthogonal tensors.\n\n[Figure 1: three panels plotting the average recovery rate (0 to 1) against $\\alpha$, for $n\\in\\{50, 100, 200\\}$, for $r\\in\\{2, 3, 4, 5\\}$, and for $\\rho\\in\\{0, 0.2, 0.3, 0.4\\}$.]\nFigure 1: Average recovery rate converges to a universal curve over $\\alpha$ when $p = \\alpha r^{1/2}\\ln n/((1-\\rho) n^{3/2})$, where $\\rho = \\max_{i\\neq j\\in[r]}\\langle u_i,u_j\\rangle$ and $r = O(\\sqrt{n})$.\n\n1.4 Related Work\n\nTensor decomposition and completion: The CP model proposed in [7, 8, 9] is a multidimensional generalization of singular value decomposition of matrices. Computing the CP decomposition involves two steps: first apply a whitening operator to the tensor to get a lower dimensional tensor with orthogonal CP decomposition. Such a whitening operator only exists when $r \\le n$. Then, apply known power-method techniques for exact orthogonal CP decomposition [1]. We use this algorithm as well as the analysis for the initial step of our algorithm. For motivation and examples of orthogonal CP models we refer to [10, 1]. Recently, many heuristics for tensor completion have been developed such as the weighted least squares [11], Gauss-Newton [12], alternating least-squares [13, 14], and trace norm minimization [15]. However, no theoretical guarantees are known for these approaches. In a different context, [16] shows that minimizing a weighted trace norm of flattened tensor provides exact recovery using $O(rn^{3/2})$ samples, but each observation needs to be a dense random projection of the tensor as opposed to observing just a single entry, which is the case in the tensor completion problem. In [17], an adaptive sampling method with an estimation algorithm was proposed that provably recovers a $k$-mode rank-$r$ tensor with $O(n r^{k-0.5}\\mu^{k-1} k\\log(r))$ samples. However, the estimation algorithm as well as the analysis crucially relies on adaptive sampling and does not generalize to random samples.\n\nRelation to matrix completion: Matrix completion has been studied extensively in the last decade since the seminal paper [18]. Since then, provable approaches have been developed, such as nuclear norm minimization [18, 19], OptSpace [20, 21], and Alternating Minimization [4]. However, several aspects of tensor factorization make it challenging to adopt matrix completion approaches directly. First, there is no natural convex surrogate of the ten
sor rank function and developing such a function is in fact a topic of active research [22, 16]. Next, even when all entries are revealed, tensor decomposition methods such as simultaneous power iteration are known to get stuck at local extrema, making it challenging to apply matrix decomposition methods directly. Third, for the initialization step, the best low-rank approximation of a matrix is unique and finding it is trivial. However, for tensors, finding the best low-rank approximation is notoriously difficult. On the other hand, some aspects of tensor decomposition make it possible to prove stronger results. Matrix completion aims to recover the underlying matrix only, since the factors are not uniquely defined due to invariance under rotations. However, for orthogonal CP models, we can hope to recover the individual singular vectors $u_i$'s exactly. In fact, Theorem 1.1 shows that our method indeed recovers the individual singular vectors exactly.\n\nSpectral analysis of tensors and hypergraphs: Theorem 2.1 and Lemma 2.2 should be compared to the copious line of work on spectral analysis of matrices [23, 20], with an important motivation of developing fast algorithms for low-rank matrix approximations. We prove an analogous guarantee for higher order tensors and provide a fast algorithm for low-rank tensor approximation. Theorem 2.1 is also a generalization of the celebrated result of Friedman-Kahn-Szemer\u00e9di [24] and Feige-Ofek [25] on the second eigenvalue of random graphs. We provide an upper bound on the largest second eigenvalue of a random hypergraph, where each edge includes three nodes and each of the $\\binom{n}{3}$ edges is selected with probability $p$.\n\n2 Analysis of the Alternating Minimization Algorithm\n\nIn this section, we provide a proof of Theorem 1.1 and the proof sketches of the required main technical theorems. We refer to the Appendix for formal proofs of the technical theorems and lemmas. There are two key components: a) the analysis of the initialization step (Section 2.1); and b) the convergence of alternating minimization given a sufficiently accurate initialization (Section 2.2). We use these two analyses to prove Theorem 1.1 in Section 2.3.\n\n2.1 Initialization Analysis\n\nWe first show that $(1/p)P_\\Omega(T)$ is close to $T$ in spectral norm, and use it to bound the error of rob
ust power method applied directly to $P_\\Omega(T)$. The normalization by $(1/p)$ compensates for the fact that many entries are missing. For a proof of this theorem, we refer to Appendix A.\n\nTheorem 2.1 (Initialization). For $p = \\alpha/n^{3/2}$ satisfying $\\alpha \\ge \\log n$, there exists a positive constant $C > 0$ such that, with probability at least $1 - n^{-5}$, $$\\frac{1}{T_{\\max}\\, n^{3/2}\\, p}\\,\\|P_\\Omega(T) - pT\\|_2 \\le \\frac{C(\\log n)^2}{\\sqrt{\\alpha}}, \\qquad (5)$$ where $T_{\\max} \\equiv \\max_{i,j,k} T_{ijk}$, and $\\|T\\|_2 \\equiv \\max_{\\|u\\|=1} T[u,u,u]$ is the spectral norm.\n\nNotice that $T_{\\max}$ is the maximum entry in the tensor $T$ and the factor $1/(T_{\\max} n^{3/2} p)$ corresponds to normalization with the worst case spectral norm of $pT$, since $\\|pT\\|_2 \\le T_{\\max} n^{3/2} p$ and the maximum is achieved by $T = T_{\\max}(\\mathbf{1}\\otimes\\mathbf{1}\\otimes\\mathbf{1})$. The following theorem guarantees that $O(n^{3/2}(\\log n)^2)$ samples are sufficient to ensure that we get arbitrarily small error. A formal proof is provided in the Appendix. Together with an analysis of robust tensor power method [1, Theorem 5.1], the next error bound follows from directly substituting (5) and using the fact that for incoherent tensors $T_{\\max} \\le \\sigma_{\\max}\\,\\mu(T)^3\\, r/n^{3/2}$. Notice that the estimates can be computed efficiently, requiring only $O(\\log r + \\log\\log\\alpha)$ iterations, each iteration requiring $O(\\alpha n^{3/2})$ operations. This is close to the time required to read the $|\\Omega| \\simeq \\alpha n^{3/2}$ samples. One caveat is that we need to run robust power method $\\mathrm{poly}(r\\log n)$ times, each with fresh random initializations.\n\nLemma 2.2. For a $\\mu$-incoherent tensor with orthogonal decomposition $T = \\sum_{\\ell=1}^{r}\\sigma^*_\\ell(u^*_\\ell\\otimes u^*_\\ell\\otimes u^*_\\ell)\\in\\mathbb{R}^{n\\times n\\times n}$, there exist positive numerical constants $C, C'$ such that when $\\alpha \\ge C(\\sigma_{\\max}/\\sigma_{\\min})^2 r^5\\mu^6(\\log n)^4$, running $C'(\\log r + \\log\\log\\alpha)$ iterations of the robust tensor power method applied to $P_\\Omega(T)$ achieves $$\\|u^*_\\ell - u^0_\\ell\\|_2 \\le C'\\,\\frac{\\sigma^*_{\\max}}{|\\sigma^*_\\ell|}\\,\\frac{\\mu^3 r(\\log n)^2}{\\sqrt{\\alpha}}, \\qquad \\frac{|\\sigma^*_\\ell - \\sigma_\\ell|}{|\\sigma^*_\\ell|} \\le C'\\,\\frac{\\sigma^*_{\\max}}{|\\sigma^*_\\ell|}\\,\\frac{\\mu^3 r(\\log n)^2}{\\sqrt{\\alpha}},$$ for all $\\ell\\in[r]$ with probability at least $1 - n^{-5}$, where $\\sigma^*_{\\max} = \\max_{\\ell\\in[r]}|\\sigma^*_\\ell|$ an
d $\\sigma^*_{\\min} = \\min_{\\ell\\in[r]}|\\sigma^*_\\ell|$.\n\n2.2 Alternating Minimization Analysis\n\nWe now provide convergence analysis for the alternating minimization part of Algorithm 1 to recover rank-$r$ tensor $T$. Our analysis assumes that $\\|u_i - u^*_i\\|_2 \\le c\\,\\sigma_{\\min}/(r\\sigma_{\\max})$, $\\forall i$, where $c$ is a small constant (dependent on $r$ and the condition number of $T$). The above mentioned assumption can be satisfied using our initialization analysis and by assuming $\\Omega$ is large enough. At a high level, our analysis shows that each step of Algorithm 1 ensures geometric decay of a distance function (specified below) which is \u201csimilar\u201d to $\\max_j\\|u^t_j - u^*_j\\|_2$. Formally, let $T = \\sum_{\\ell=1}^{r}\\sigma^*_\\ell\\cdot u^*_\\ell\\otimes u^*_\\ell\\otimes u^*_\\ell$. WLOG, we can assume that $\\sigma^*_\\ell \\le 1$. Also, let $[U,\\Sigma] = \\{(u_\\ell,\\sigma_\\ell), 1\\le\\ell\\le r\\}$ be the $t$-th step iterates of Algorithm 1. We assume that $u^*_\\ell$, $\\forall\\ell$, are $\\mu$-incoherent and $u_\\ell$, $\\forall\\ell$, are $2\\mu$-incoherent. Define $\\Delta\\sigma_\\ell = |\\sigma_\\ell - \\sigma^*_\\ell|/\\sigma^*_\\ell$, $u_\\ell = u^*_\\ell + d_\\ell$, $(\\Delta\\sigma_\\ell)^{t+1} = |\\sigma^{t+1}_\\ell - \\sigma^*_\\ell|/\\sigma^*_\\ell$, and $u^{t+1}_\\ell = u^*_\\ell + d^{t+1}_\\ell$. Now, define the following distance function: $$d_\\infty([U,\\Sigma],[U^*,\\Sigma^*]) \\equiv \\max_\\ell\\,(\\|d_\\ell\\|_2 + \\Delta\\sigma_\\ell).$$ The next theorem shows that this distance function decreases geometrically with the number of iterations of Algorithm 1. A proof of this theorem is provided in Appendix B.4.\n\nTheorem 2.3. If $d_\\infty([U,\\Sigma],[U^*,\\Sigma^*]) \\le \\frac{1}{1600 r}\\frac{\\sigma^*_{\\min}}{\\sigma^*_{\\max}}$ and $u_i$ is $2\\mu$-incoherent for all $1\\le i\\le r$, then there exists a positive constant $C$ such that for $p \\ge \\frac{C r^2(\\sigma^*_{\\max})^2\\mu^3\\log^2 n}{(\\sigma^*_{\\min})^2 n^{3/2}}$ we have w.p. $\\ge 1 - \\frac{1}{n^7}$, $$d_\\infty([U^{t+1},\\Sigma^{t+1}],[U^*,\\Sigma^*]) \\le \\frac{1}{2}\\,d_\\infty([U,\\Sigma],[U^*,\\Sigma^*]),$$ where $[U^{t+1},\\Sigma^{t+1}] = \\{(u^{t+1}_\\ell,\\sigma^{t+1}_\\ell), 1\\le\\ell\\le r\\}$ are the $(t+1)$-th step iterates of Algorithm 1. Moreover, each $u^{t+1}_\\ell$ is $2\\mu$-incoheren
t for all $\\ell$.\n\n[Figure 2: error (log scale, 1e-16 to 1) against iterations (0 to 30), showing fit error and RMSE for $p = 0.0025$ and $p = 0.1$.]\nFigure 2: Algorithm 1 exhibits linear convergence until machine precision. For the estimate $\\widehat{T}_t$ at the $t$-th iteration, the fit error $\\|P_\\Omega(T - \\widehat{T}_t)\\|_F/\\|P_\\Omega(T)\\|_F$ closely tracks the normalized root mean squared error $\\|T - \\widehat{T}_t\\|_F/\\|T\\|_F$, suggesting that it serves as a good stopping criterion.\n\nNote that our number of samples depends on the number of iterations $\\tau$. But due to linear convergence, our sample complexity increases only by a factor of $\\log(1/\\epsilon)$ where $\\epsilon$ is the desired accuracy.\n\nDifference from Matrix AltMin: Here, we would like to highlight differences between our analysis and analysis of the alternating minimization method for matrix completion (matrix AltMin) [4, 5]. In the matrix case, the singular vectors $u^*_i$'s need not be unique. Hence, the analysis is required to guarantee a decay in the subspace distance $\\mathrm{dist}(U,U^*)$; typically, principal angle based subspace distance is used for analysis. In contrast, orthonormal $u^*_i$'s uniquely define the tensor and hence one can obtain distance bounds $\\|u_i - u^*_i\\|_2$ for each component $u_i$ individually. On the other hand, an iteration of the matrix AltMin iterates over all the vectors $u_i$, $1\\le i\\le r$, where $r$ is the rank of the current iterate, and hence does not have to consider the error in estimation of the fixed components $U_{[r]\\setminus q} = \\{u_\\ell, \\forall\\ell\\neq q\\}$, which is a challenge for the analysis of Algorithm 1 and requires careful decomposition and bounds of the error terms.\n\n2.3 Proof of Theorem 1.1\n\nLet $T = \\sum_{q=1}^{r}\\sigma^*_q(u^*_q\\otimes u^*_q\\otimes u^*_q)$. Denote the initial estimates $U^0 = [u^0_1,\\ldots,u^0_r]$ and $\\sigma^0 = [\\sigma^0_1,\\ldots,\\sigma^0_r]$ to be the output of robust tensor power method at step 2 of Algorithm 1. With a choice of $p \\ge C(\\sigma^*_{\\max})^4\\mu^6 r^4(\\log n)^4/((\\sigma^*_{\\min})^4 n^{3/2})$ as per our assumption, Lemma 2.2 ensures that we have $\\|u^0_q - u^*_q\\| \\le \\sigma^*_{\\min}/(4800\\, r\\sigma_{\\max})$ and $|\\sigma^0_q - \\sigma^*_q| \\le |\\sigma^*_q|\\,\\sigma^*_{\\min}/(4800\\, r\\sigma_{\\max})$ with probability at least $1 - n^{-5}$. This requires running robust tensor powe
r method for $(r\\log n)^c$ random initializations for some positive constant $c$, each requiring $O(|\\Omega|)$ operations ignoring logarithmic factors. To ensure that we have a sufficiently incoherent initial iterate, we perform the thresholding proposed in [4]. In particular, we threshold all the elements of $u^0_i$ (obtained from the RTPM method, see Step 3 of Algorithm 1) that are larger (in magnitude) than $\\mu/\\sqrt{n}$ to be $\\mathrm{sign}(u_\\ell(i))\\,\\mu/\\sqrt{n}$ and then re-normalize to obtain $u_i$. Using Lemma A.4, this procedure ensures that the obtained initial estimate $u_i$ satisfies the two criteria that are required by Theorem 2.3: a) $\\|u_i - u^*_i\\|_2 \\le \\frac{1}{1600 r}\\cdot\\frac{\\sigma^*_{\\min}}{\\sigma^*_{\\max}}$, and b) $u_i$ is $2\\mu$-incoherent. With this initialization, Theorem 2.3 tells us that $O(\\log_2(4 r^{1/2}\\|T\\|_F/\\varepsilon))$ iterations (each iteration requires $O(r|\\Omega|)$ operations) are sufficient to achieve: $$\\|u_q - u^*_q\\|_2 \\le \\frac{\\varepsilon}{4 r^{1/2}\\|T\\|_F} \\quad\\text{and}\\quad |\\sigma_q - \\sigma^*_q| \\le \\frac{|\\sigma^*_q|\\,\\varepsilon}{4 r^{1/2}\\|T\\|_F},$$ for all $q\\in[r]$ with probability at least $1 - n^{-7}\\log_2(4 r^{1/2}\\|T\\|_F/\\varepsilon)$. The desired bound follows from the next lemma with a choice of $\\tilde{\\varepsilon} = \\varepsilon/(4 r^{1/2}\\|T\\|_F)$. For a proof we refer to Appendix B.6.\n\nLemma 2.4. For an orthogonal rank-$r$ tensor $T = \\sum_{q=1}^{r}\\sigma^*_q(u^*_q\\otimes u^*_q\\otimes u^*_q)$ and any rank-$r$ tensor $\\widehat{T} = \\sum_{q=1}^{r}\\sigma_q(u_q\\otimes u_q\\otimes u_q)$ satisfying $\\|u_q - u^*_q\\|_2 \\le \\tilde{\\varepsilon}$ and $|\\sigma_q - \\sigma^*_q| \\le |\\sigma^*_q|\\,\\tilde{\\varepsilon}$ for all $q\\in[r]$ and for all positive $\\tilde{\\varepsilon} > 0$, we have $\\|T - \\widehat{T}\\|_F \\le 4 r^{1/2}\\|T\\|_F\\,\\tilde{\\varepsilon}$.\n\n2.4 Fundamental limit and random hypergraphs\n\nFor matrices, it is known that exact matrix completion is impossible if the underlying graph is disconnected. For Erd\u0151s-R\u00e9nyi graphs, when the sample size is less than $C\\mu r n\\log n$, no algorithm can recover the original matrix [6]. However, for tensor completion and random hypergraphs, such a simple connection does not exist. It is not known how the properties of the hypergraph are related to recovery. In this spirit, a rank-one third-order tensor completion has been studied in a specific context of MAX-3LIN problems. Consider a series of linear equations over $n$ binary variables $x = [x_1 \\ldots x_n]\\in\\{\\pm 1\\}^n$.
An instance of a 3LIN problem consists of a set of linear equations on GF(2), where each equation involves exactly three variables, e.g. $$x_1\\oplus x_2\\oplus x_3 = +1, \\quad x_2\\oplus x_3\\oplus x_4 = -1, \\quad x_3\\oplus x_4\\oplus x_5 = +1 \\qquad (6)$$ We use $-1$ to denote true (or 1 in GF(2)) and $+1$ to denote false (or 0 in GF(2)). Then the exclusive-or operation denoted by $\\oplus$ is the integer multiplication. The MAX-3LIN problem is to find a solution $x$ that satisfies as many equations as possible. This is an NP-hard problem in general, and hence random instances of the problem with a planted solution have been studied [26]. Algorithm 1 provides a provable guarantee for MAX-3LIN with random assignments.\n\nCorollary 2.5. For random MAX-3LIN problem with a planted solution, under the hypotheses of Theorem 1.1, Algorithm 1 finds the correct solution with high probability.\n\nNotice that this tensor has incoherence one and rank one. This implies exact reconstruction for $p \\ge C(\\log n)^4/n^{3/2}$. This significantly improves over a message-passing approach to MAX-3LIN in [26], which is guaranteed to find the planted solution for $p \\ge C(\\log\\log n)^2/(n\\log n)$. It was suggested that a new notion of connectivity called propagation connectivity is a sufficient condition for the solution of random MAX-3LIN problem with a planted solution to be unique [26, Proposition 2]. Precisely, it is claimed that if the hypergraph corresponding to an instance of MAX-3LIN is propagation connected, then the optimal solution for MAX-3LIN is unique and there is an efficient algorithm that finds it. However, the example in (6) is propagation connected but there is no unique solution: both $[1,1,1,-1,-1]$ and $[1,-1,-1,1,-1]$ satisfy the equations. Hence, propagation connectivity is not a sufficient condition for uniqueness of the MAX-3LIN solution.\n\n2.5 Open Problems and Future Directions\n\nTensor completion for non-orthogonal decomposition. Numerical simulations suggest that non-orthogonal CP models can be recovered exactly (without the usual whitening step). It would be interesting to analyze our algorithm under non-orthogonal CP model. However, we would like to point out here that even with a fully observed tensor, exact factorization is known only for orthonormal tensors. Now, given that our method guarantees
not only completion but also tensor factorization (which is essential for large scale applications), our method would require a similar condition.\n\nOptimal dependence on $r$. The numerical results suggest that the threshold sample size scales as $\\sqrt{r}$. This is surprising since the degrees of freedom in describing a CP model scale linearly in $r$, implying that the $\\sqrt{r}$ scaling only holds for $r = O(\\sqrt{n})$. In comparison, for matrix completion the threshold scales as $r$. It is important to understand why this change in dependence in $r$ happens for higher order tensors, and identify how it depends on $k$ for $k$-th order tensor completion.\n\nMis-specified $r$ and $\\mu$. The algorithm requires the knowledge of the rank $r$ and the incoherence $\\mu$. The algorithm is not sensitive to the knowledge of $\\mu$. In fact, all the numerical experiments are run without specifying the incoherence, and without the clipping step. An interesting direction is to understand the price of mis-specified rank and to estimate the true rank from data.\n\nReferences\n\n[1] Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. CoRR, abs/1210.7559, 2012.\n[2] A. Anandkumar, R. Ge, and M. Janzamin. Guaranteed non-orthogonal tensor decomposition via alternating rank-1 updates. arXiv preprint arXiv:1402.5180, 2014.\n[3] V. De Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30(3):1084\u20131127, 2008.\n[4] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. In STOC, pages 665\u2013674, 2013.\n[5] M. Hardt. On the provable convergence of alternating minimization for matrix completion. arXiv preprint arXiv:1312.0925, 2013.\n[6] E. J. Cand\u00e8s and T. Tao. The power of convex relaxation: Near-optimal matrix completion. Information Theory, IEEE Transactions on, 56(5):2053\u20132080, 2010.\n[7] F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. 1927.\n[8] J. Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart-Young decomposition. Psychometrika, 35(3):283\u2013319, 1970.\n[9] Richard A. Harshman. Foundations of the p
arafac procedure: models and conditions for an explanatory multimodal factor analysis. 1970.\n[10] T. Zhang and G. H. Golub. Rank-one approximation to high order tensors. SIAM Journal on Matrix Analysis and Applications, 23(2):534\u2013550, 2001.\n[11] E. Acar, D. M. Dunlavy, T. G. Kolda, and M. M\u00f8rup. Scalable tensor factorizations for incomplete data. Chemometrics and Intelligent Laboratory Systems, 106(1):41\u201356, 2011.\n[12] G. Tomasi and R. Bro. Parafac and missing values. Chemometrics and Intelligent Laboratory Systems, 75(2):163\u2013180, 2005.\n[13] Rasmus Bro. Multi-way analysis in the food industry: models, algorithms, and applications. PhD thesis, K\u00f8benhavns Universitet, 1998.\n[14] B. Walczak and D. L. Massart. Dealing with missing data: Part I. Chemometrics and Intelligent Laboratory Systems, 58(1):15\u201327, 2001.\n[15] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. Pattern Analysis and Machine Intelligence, IEEE Trans. on, 35(1):208\u2013220, 2013.\n[16] C. Mu, B. Huang, J. Wright, and D. Goldfarb. Square deal: Lower bounds and improved relaxations for tensor recovery. arXiv preprint arXiv:1307.5870, 2013.\n[17] A. Krishnamurthy and A. Singh. Low-rank matrix and tensor completion via adaptive sampling. In Advances in Neural Information Processing Systems, pages 836\u2013844, 2013.\n[18] E. J. Cand\u00e8s and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717\u2013772, 2009.\n[19] S. Negahban and M. J. Wainwright. Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 2012.\n[20] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. Information Theory, IEEE Transactions on, 56(6):2980\u20132998, 2010.\n[21] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057\u20132078, 2010.\n[22] R. Tomioka and T. Suzuki. Convex tensor decomposition via structured Schatten norm regularization. In NIPS, pages 1331\u20131339, 2013.\n[23] Y. Azar, A. Fiat, A. Karlin, F. McSherry, and J. Saia. Spectral analysis of data. In Proc. of the 33rd annual ACM symposium on Theory of computing, p
ages 619\u2013626. ACM, 2001.\n[24] J. Friedman, J. Kahn, and E. Szemer\u00e9di. On the second eigenvalue in random regular graphs. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pages 587\u2013598, Seattle, Washington, USA, May 1989. ACM.\n[25] U. Feige and E. Ofek. Spectral techniques applied to sparse random graphs. Random Struct. Algorithms, 27(2):251\u2013275, 2005.\n[26] R. Berke and M. Onsj\u00f6. Propagation connectivity of random hypergraphs. In Stochastic Algorithms: Foundations and Applications, pages 117\u2013126. Springer, 2009.", "award": [], "sourceid": 788, "authors": [{"given_name": "Prateek", "family_name": "Jain", "institution": "Microsoft Research"}, {"given_name": "Sewoong", "family_name": "Oh", "institution": "UIUC"}]}