{"title": "Deep Signature Transforms", "book": "Advances in Neural Information Processing Systems", "page_first": 3105, "page_last": 3115, "abstract": "The signature is an infinite graded sequence of statistics known to characterise a stream of data up to a negligible equivalence class. It is a transform which has previously been treated as a fixed feature transformation, on top of which a model may be built. We propose a novel approach which combines the advantages of the signature transform with modern deep learning frameworks. By learning an augmentation of the stream prior to the signature transform, the terms of the signature may be selected in a data-dependent way. More generally, we describe how the signature transform may be used as a layer anywhere within a neural network. In this context it may be interpreted as a pooling operation. We present the results of empirical experiments to back up the theoretical justification. Code available at \\texttt{github.com/patrick-kidger/Deep-Signature-Transforms}.", "full_text": 
"DeepSignatureTransformsPatricBonnier1,\u2217PatrickKidger1,2,\u2217ImanolPerezArribas1,2,\u2217CristopherSalvi1,2,\u2217TerryLyons1,21MathematicalInstitute,UniversityofOxford2TheAlanTuringInstitute,BritishLibrary{bonnier,kidger,perez,salvi,tlyons}@maths.ox.ac.ukAbstractThesignatureisanin\ufb01nitegradedsequenceofstatisticsknowntocharacteriseastreamofdatauptoanegligibleequivalenceclass.Itisatransformwhichhaspreviouslybeentreatedasa\ufb01xedfeaturetransformation,ontopofwhichamodelmaybebuilt.Weproposeanovelapproachwhichcombinestheadvantagesofthesignaturetransformwithmoderndeeplearningframeworks.Bylearninganaugmentationofthestreampriortothesignaturetransform,thetermsofthesignaturemaybeselectedinadata-dependentway.Moregenerally,wedescribehowthesignaturetransformmaybeusedasalayeranywherewithinaneuralnetwork.Inthiscontextitmaybeinterpretedasapoolingoperation.Wepresenttheresultsofempiricalexperimentstobackupthetheoreticaljusti\ufb01cation.Codeavailableatgithub.com/patrick-kidger/Deep-Signature-Transforms.1Introduction1.1Whatisthesignaturetransform?Whendataisorderedsequentiallythenitcomeswithanaturalpath-likestructure:thedatamaybethoughtofasadiscretisationofapathX:[0,1]\u2192V,whereVissomeBanachspace.InpracticeweshallalwaystakeV=Rdforsomed\u2208N.ForexamplethechangingairpressureataparticularlocationmaybethoughtofasapathinR;themotionofapenonpapermaybethoughtofasapathinR2;thechangeswithin\ufb01nancialmarketsmaybethoughtofasapathinRd,withdpotentiallyverylarge.Givenapath,wemayde\ufb01neitssignature,whichisacollectionofstatisticsofthepath.Themapfromapathtoitssignatureiscalledthesignaturetransform.De\ufb01nition1.1.Letx=(x1,...,xn),wherexi\u2208Rd.Letf=(f1,...,fd):[0,1]\u2192Rdbecontinuous,suchthatf(i\u22121n\u22121)=xi,andlinearontheintervalsinbetween.Thenthesignatureofxisde\ufb01nedasthecollectionofiteratedintegrals2Sig(x)=\uf8eb\uf8ec\uf8ed\uf8eb\uf8edZ\u00b7\u00b7\u00b7Z0<t1<\u00b7\u00b7\u00b7<tk<1kYj=1dfijdt(tj)dt1\u00b7\u00b7\u00b7dtk\uf8f6\uf8f81\u2264i1,...,ik\u2264d\uf8f
6\uf8f7\uf8f8k\u22650.\u2217Equalcontribution.2Forclarityherewehaveusedmorewidely-understoodnotation.Thede\ufb01nitionofthesignaturetransformisusuallywritteninanequivalentbutalternatemannerusingthenotationofstochasticcalculus;seeDe\ufb01nitionA.1inAppendixA.33rdConferenceonNeuralInformationProcessingSystems(NeurIPS2019),Vancouver,Canada.\fWeshalloftenusethetermsignaturetorefertobothapath\u2019ssignatureandthesignaturetransform.Othertextssometimesusethetermpathsignatureinasimilarmanner.Wereferthereaderto[1]foraprimerontheuseofthesignatureinmachinelearning.AbriefoverviewofitskeypropertiesmaybefoundinAppendixA,alongwithassociatedreferences.Inshort,thesignatureofapathdeterminesthepathessentiallyuniquely,anddoessoinanef\ufb01cient,computableway.Furthermore,thesignatureisrichenoughthateverycontinuousfunctionofthepathmaybeapproximatedarbitrarilywellbyalinearfunctionofitssignature;itmaybethoughtofasa\u2018universalnonlinearity\u2019.Takentogetherthesepropertiesmakethesignatureanattractivetoolformachinelearning.Themostsimplewaytousethesignatureisasfeaturetransformation,asitmayoftenbesimplertolearnafunctionofthesignaturethanoftheoriginalpath.OriginallyintroducedandstudiedbyChenin[2,3,4],thesignaturehasseenusein\ufb01nance[5,6,7,8,9],roughpaththeory[10,11]andmachinelearning[12,13,14,15,16,17,18,19,20].1.2ComparisontotheFouriertransformThesignaturetransformismostcloselyanalogoustotheFouriertransform.ThefundamentaldifferencebetweenthesignaturetransformandclassicalsignaltransformssuchasFouriertransformsandwaveletsisthatthelatterareusedtomodelacurveasalinearcombinationinafunctionalbasis.Thesignaturedoesnottrytomodelorparameterisethecurveitself,butinsteadprovidesabasisforfunctionsonthespaceofcurves.Forexample,regularlyseeingthesequence:phonecall,trade,pricemovementinthestreamofof\ufb01cedatamonitoringatradermightbeanindicationofinsidertrading.Suchoccurrencesarestraightforwardtodetectbyviaalinearregressioncomposedwiththesignaturetransform.ModellingthissignalusingFourierseriesorwavel
etswouldbemuchmoreexpensive:linearityofthesetransformsimplythateachchannelmustberesolvedaccuratelyenoughtoseetheorderofevents.Fromasignalprocessingperspective,thesignaturecanbethoughtofasa\ufb01lterwhichisinvarianttoresamplingoftheinputsignal.(SeePropositionA.7inAppendixA).1.3UseofthesignaturetransforminmachinelearningThesignatureisanin\ufb01nitesequence,soinpracticesome\ufb01nitecollectionoftermsmustbeselected.Sincethemagnitudeofthetermsexhibitfactorialdecay,seePropositionA.5inAppendixA,itisusual[21]tosimplychoosethe\ufb01rstNtermsofthissequence,whichwilltypicallybethelargestterms.These\ufb01rstNtermsarecalledthesignatureofdepthNorthetruncatedsignatureofdepthN,andthecorrespondingtransformisdenotedSigN.Butifthefunctiontobelearneddependednontriviallyonthehigherdegreeterms,thencrucialinformationhasnonethelessbeenlost.Thismayberemedied.Applyapointwiseaugmentationtotheoriginalstreamofdatabeforetakingthesignature.Thenthe\ufb01rstNtermsofthesignaturemaybetterencodethenecessaryinformation[19,20].Explicitly,let\u03a6:Rd\u2192Rebe\ufb01xed;onecouldensurethatinformationisnotlostbytaking\u03a6(x)=(x,\u03d5(x))forsome\u03d5.Thenratherthantakingthesignatureofx=(x1,...,xn),wherexi\u2208Rd,insteadtakethesignatureof\u03a6(x)=(\u03a6(x1),...,\u03a6(xn)).Inthiswayonemaycapturehigherorderinformationfromthestreaminthelowerdegreetermsofthesignature.1.4OurworkButhowshouldthisaugmentation\u03a6bechosen?Previousworkhas\ufb01xeditarbitrarily,orexperimentedwithseveraloptionsbeforechoosingone[19,20].Observethatineachcasethemapx7\u2192SigN(\u03a6(x))isstillultimatelyjustafeaturetransformationontopofwhichamodelisbuilt.Ourmoregeneralapproachistoallowtheselectionof\u03a6tobedata-dependent,byhavingitbelearned;inparticularitmaybeaneuralnetwork.Furthermorethereisnoreasonitshouldnecessarilyoperatepointwise,nor(sinceitisnowlearned)needitbeoftheform(x,\u03d5(x)).Inthiswaywemayenjoythebene\ufb01tsofusingsignatureswhileavoidingtheirmainlimitation.Butthismeansthatthesignaturetransformisessentiallyoperatin
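The pipeline x ↦ Sig_N(Φ(x)) of Sections 1.3–1.4 can be made concrete in a few lines. The following is a minimal NumPy sketch, not the paper's implementation (the authors use the Signatory library [22]): each linear segment of the interpolated path has signature equal to the tensor exponential of its increment, and segments combine via the truncated tensor product. The augmentation φ(p) = ‖p‖² used at the end is a hypothetical choice for illustration only.

```python
import numpy as np

def sig(x, depth):
    """Truncated signature (Definition 1.1) of the piecewise linear path
    through the rows of x, levels 1..depth, flattened into one vector.
    Each linear segment has signature exp(increment) in the truncated
    tensor algebra; segments combine via the truncated tensor product."""
    def seg(v):
        levels = [np.ones(())]                     # level 0 is always 1
        for k in range(1, depth + 1):
            levels.append(np.multiply.outer(levels[-1], v) / k)
        return levels
    acc = seg(x[1] - x[0])
    for i in range(1, len(x) - 1):
        nxt = seg(x[i + 1] - x[i])
        acc = [sum(np.multiply.outer(acc[j], nxt[k - j]) for j in range(k + 1))
               for k in range(depth + 1)]
    return np.concatenate([lvl.ravel() for lvl in acc[1:]])

x = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0]])   # a stream in R^2
s = sig(x, depth=2)
assert s.shape == (2 + 4,)                 # d + d^2 scalar terms
assert np.allclose(s[:2], x[-1] - x[0])    # level 1 is the total increment

# The augmented pipeline of Section 1.3: Phi(x) = (x, phi(x)) pointwise,
# here with the hypothetical choice phi(p) = ||p||^2, then the signature.
aug = np.concatenate([x, (x ** 2).sum(axis=1, keepdims=True)], axis=1)
s_aug = sig(aug, depth=2)
assert s_aug.shape == (3 + 9,)             # e + e^2 terms with e = 3
```

Note how the augmentation changes which statistics appear in the low-degree terms: the depth-2 signature of the augmented stream already contains interactions involving φ(x) that would only appear at higher depth otherwise.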
But this means that the signature transform is essentially operating as a layer within a neural network. It consumes a tensor of shape (b, d, n) (corresponding to a batch of size b of paths in R^d that have been sampled n times) and returns a tensor of shape (b, (d^{N+1} − 1)/(d − 1)), where N is the number of terms used in the truncated signature.³ The signature is being used as a pooling operation.

³As (d^{N+1} − 1)/(d − 1) = Σ_{k=0}^{N} d^k is the number of scalar values in a signature with N terms.

There is no reason to stop here. If the signature layer works well once then it is natural to seek to use it again. The obvious problem is that the signature transform consumes a stream of data and returns statistics which have no obvious stream-like qualities. The solution is to lift the input stream to a stream of streams; for example, the stream of data (x_1, ..., x_n) may be lifted to the 'expanding windows' of (x_{1:2}, ..., x_{1:n}), where x_{1:i} = (x_1, ..., x_i). Now apply the signature to each stream to obtain a stream of signatures (Sig_N(x_{1:2}), ..., Sig_N(x_{1:n})), which is essentially a stream in Euclidean space. And now this new stream may be augmented via a neural network and the process repeated again, as many times as we wish.

In this way the signature transform has been elevated from a one-time feature transformation to a first-class layer within a neural network. Thus we may reap the benefits of both the signature transform, with its strong corpus of mathematical theory, and of neural networks, with their great empirical success.

Naturally all of this implies the need for an efficient implementation of the signature transform. Such concerns have motivated the creation of the spin-off Signatory project [22].

The remainder of the paper is laid out as follows. In Section 2 we briefly discuss some related work; in Section 3 we detail the specifics of embedding the signature as a layer within a neural network. Section 4 covers experiments; we demonstrate positive results for generative, supervised, and reinforcement learning problems. Section 5 is the conclusion. Appendix A provides an exposition of the theoretical properties of the signature, and Appendix B specifies implementation details.

2 Related Work

The signature transform is roughly analogous to the use of wavelets or Fourier transforms, and there are also related models based around these, for example [23, 24, 25, 26]. We do not know of a detailed comparison between the use of these various transformations in the context of machine learning.

Some related work using signatures has already been discussed in the previous section. We expand on their proposed models here.

Definition 2.1. Given a set V, the space of streams of data in V is defined as S(V) = {x = (x_1, ..., x_n) : x_i ∈ V, n ∈ N}. Given x = (x_1, ..., x_n) ∈ S(V), the integer n is called the length of x.

Two simple models utilising the signature layer are shown in Figure 1.

[Figure 1: Two simple architectures with a signature layer. (a) Neural-signature model: input stream x → signature transform Sig_N → neural network f_θ → output σ. Trainable parameters: θ. (b) Neural-signature-augment model: input stream x → feature map Φ → signature transform Sig_N → neural network f_θ → output σ. Trainable parameters: θ.]

In principle the universal nonlinearity property of signatures (see Proposition A.6 in Appendix A) guarantees that the model shown in Figure 1a is rich enough to learn any continuous function. (With the neural network taken to be a single linear layer and the input stream assumed to already be time-augmented.) In practice, of course, the signature must be truncated. Furthermore, it is not clear how to appropriately choose the truncation hyperparameter N. Thus a more practical approach is to remove the restriction that the neural network must be linear, and learn a nonlinear function instead. This approach has been applied successfully in various tasks [5, 12, 13, 14, 15, 16, 17, 18].

An alternate model is shown in Figure 1b. Following [19, 20], a pointwise transformation could be applied to the stream before taking the signature transform. That is, applying the feature map Φ: R^d → R^e to the d-dimensional stream of data (x_1, ..., x_n) ∈ S(R^d) yields (Φ(x_1), ..., Φ(x_n)) ∈ S(R^e); the signature of Φ(x) may then potentially capture properties of the stream of data that will yield more effective models.

3 The signature transform as a layer in a neural network

However, there is not always a clear candidate for the feature map Φ, and a good choice is likely to be data-dependent. Thus we propose to make Φ learnable by taking Φ = Φ_θ to be a neural network with trainable parameters θ. In this case, we again obtain the neural network shown in Figure 1b, except that Φ is now also learnable.
The signature has now become a layer within a neural network. It consumes a tensor of shape (b, d, n) (corresponding to a batch of size b of paths in R^d that have been sampled n times) and returns a tensor of shape (b, (d^{N+1} − 1)/(d − 1)), where N is the number of terms used in the truncated signature.

Despite being formed of integrals, the signature is in fact straightforward and efficient to compute exactly; see Section A.3 in Appendix A. More than that, the computation may in fact be described in terms of standard tensor operations. As such it may be backpropagated through without difficulty.

3.1 Stream-preserving neural networks

Let x = (x_1, ..., x_n) ∈ S(R^d). Whatever the choice of Φ_θ, it must preserve the stream-like nature of the data if we are to take a signature afterwards. The simplest way of doing this is to have Φ_θ map R^d → R^e, so that it operates pointwise. This defines Φ(x) by

\[ \Phi(x) = (\Phi_\theta(x_1), \ldots, \Phi_\theta(x_n)) \in S(\mathbb{R}^e). \tag{1} \]

Another way to preserve the stream-like nature is to sweep a one-dimensional convolution along the stream; more generally one could sweep a whole feedforward network along the stream. For some m ∈ N and Φ_θ: R^{d×m} → R^e this defines Φ(x) by

\[ \Phi(x) = (\Phi_\theta(x_1, \ldots, x_m), \ldots, \Phi_\theta(x_{n-m+1}, \ldots, x_n)) \in S(\mathbb{R}^e). \tag{2} \]

More generally still the network could be recurrent, by having memory. Let Φ_0 = 0, fix m ∈ N, and define Φ_k = Φ_θ(x_k, ..., x_{k+m}; Φ_{k−1}) for k = 1, ..., n − m + 1. Then define Φ(x) by

\[ \Phi(x) = (\Phi_1, \ldots, \Phi_{n-m+1}) \in S(\mathbb{R}^e). \tag{3} \]

3.2 Stream-like data

It is worth taking a moment to think what is really meant by 'stream-like nature'. The signature transform is defined on paths; it is applied to a stream of data in S(R^d) by first interpolating the data into a path and then taking the signature. The data is treated as a discretisation, or set of observations, of some underlying path. Note that there is nothing wrong with the path itself having a discrete structure to it; for example a sentence.

In principle one could reshape a tensor of shape (b, nd) with no stream-like nature into one of shape (b, d, n), and then take the signature. However it is not clear what this means mathematically. There is no underlying path. The signature is at this point an essentially arbitrary transformation, without the mathematical guarantees normally associated with it.
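The stream-preserving constructions of equations (1) and (2) amount to the following. This is a hedged NumPy sketch with a fixed random linear map standing in for the learned network Φ_θ (in the paper this would be a trained pointwise network or a one-dimensional convolution):

```python
import numpy as np

def pointwise(phi, x):
    """Equation (1): apply phi : R^d -> R^e to each observation."""
    return np.stack([phi(xi) for xi in x])

def sweep(phi, x, m):
    """Equation (2): sweep phi : R^{d x m} -> R^e along the stream,
    the way a one-dimensional convolution with window size m does."""
    return np.stack([phi(x[i:i + m]) for i in range(len(x) - m + 1)])

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))           # stands in for a learned Phi_theta
x = rng.normal(size=(10, 2))          # a stream of n = 10 points in R^2

y = pointwise(lambda p: W @ p, x)
assert y.shape == (10, 3)             # still a stream: length preserved

z = sweep(lambda w: w.ravel(), x, m=3)
assert z.shape == (8, 6)              # length n - m + 1, as in equation (2)
```

In both cases the output is again an element of S(R^e), so a signature may be taken afterwards.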
3.3 Stream-preserving signatures, using lifts

We would like to apply the signature layer multiple times. However applying the signature transform consumes the stream-like nature of the data, which prevents this. The solution is to construct a stream of signatures in the following way: given a stream x = (x_1, ..., x_n) ∈ S(R^d), let x_{1:k} = (x_1, ..., x_k) for k = 2, ..., n, and apply the signature to each x_{1:k} to obtain the stream

\[ (\mathrm{Sig}_N(x_{1:2}), \ldots, \mathrm{Sig}_N(x_{1:n})) \in S\bigl(\mathbb{R}^{(d^{N+1}-1)/(d-1)}\bigr). \tag{4} \]

The shortest stream it is meaningful to take the signature of is of length two, which is why there is no corresponding Sig_N(x_{1:1}) term. In this way the stream-like nature of the data is preserved through the signature transform.

This notion may be generalised: let ℓ = (ℓ_1, ℓ_2, ..., ℓ_v): S(R^d) → S(S(R^e)), which we refer to as a lift into the space of streams of streams (and v will likely depend on the length of the input to ℓ). Then apply the signature stream-wise to define Sig_N(ℓ(x)) by

\[ \mathrm{Sig}_N(\ell(x)) = \bigl( \mathrm{Sig}_N(\ell_1(x)), \ldots, \mathrm{Sig}_N(\ell_v(x)) \bigr) \in S\bigl(\mathbb{R}^{(e^{N+1}-1)/(e-1)}\bigr). \tag{5} \]

In the example of equation (4), ℓ is given by

\[ \ell(x) = (x_{1:2}, \ldots, x_{1:n}). \tag{6} \]

Other plausible choices for ℓ are to cut up x into multiple pieces, for example

\[ \ell(x) = ((x_1, x_2), (x_3, x_4), \ldots, (x_{2\lfloor n/2 \rfloor - 1}, x_{2\lfloor n/2 \rfloor})), \tag{7} \]

or to take a sliding window

\[ \ell(x) = ((x_1, x_2, x_3), (x_2, x_3, x_4), \ldots, (x_{n-2}, x_{n-1}, x_n)). \tag{8} \]

3.4 Multiple signature layers

By inserting lifts, the signature transform may be composed as many times as desired. That is, suppose we wish to learn a map from S(R^d) to X, where X is some set. (Which may be finite for a classification problem or infinite for a regression problem.) Let c_i, d_i, e_i, N_i ∈ N be such that d_1 = d and d_{i+1} = (c_i^{N_i+1} − 1)/(c_i − 1), for i = 1, ..., k. Let

\[ \Phi_{\theta_i} \colon \mathbb{R}^{d_i \times m_i} \to \mathbb{R}^{e_i}, \qquad \ell_i \colon S(\mathbb{R}^{e_i}) \to S(S(\mathbb{R}^{c_i})), \qquad f_{\theta_{k+1}} \colon S\bigl(\mathbb{R}^{(c_k^{N_k+1}-1)/(c_k-1)}\bigr) \to X, \]

where Φ_{θ_i} and ℓ_i are defined in the manner of equations (1)–(3) and (6)–(8), and θ_1, ..., θ_{k+1} are some trainable parameters.
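The lifts of equations (6)–(8) are simple to state in code. A pure-Python sketch, using lists of observations to stand in for streams:

```python
def expanding_windows(x):
    """The lift of equation (6): x -> (x_{1:2}, x_{1:3}, ..., x_{1:n})."""
    return [x[:k] for k in range(2, len(x) + 1)]

def disjoint_pairs(x):
    """The lift of equation (7): cut the stream up into pairs."""
    return [x[i:i + 2] for i in range(0, len(x) - 1, 2)]

def sliding_windows(x, m=3):
    """The lift of equation (8): all contiguous windows of length m."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]

x = [1, 2, 3, 4, 5, 6]                      # stands in for (x_1, ..., x_6)
assert expanding_windows(x)[0] == [1, 2]
assert len(expanding_windows(x)) == 5       # streams x_{1:2} through x_{1:6}
assert disjoint_pairs(x) == [[1, 2], [3, 4], [5, 6]]
assert sliding_windows(x)[-1] == [4, 5, 6]
```

Applying Sig_N to each element of any of these lifts, as in equation (5), then yields a stream of signatures.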
Then defining compositions in the manner of equations (1)–(5), let

\[ \sigma = \bigl( f_{\theta_{k+1}} \circ \mathrm{Sig}_{N_k} \circ \ell_k \circ \Phi_{\theta_k} \circ \cdots \circ \Phi_{\theta_2} \circ \mathrm{Sig}_{N_1} \circ \ell_1 \circ \Phi_{\theta_1} \bigr)(x). \]

This defines the deep signature model, summarised in Figure 2. An important special case is when X = S(R^e), so that the final network f_{θ_{k+1}} is stream-preserving. Then the overall model x ↦ σ is also stream-preserving. See for example Section 4.1.

Note that in principle it is acceptable to take the trivial lift to a sequence of a single element,

\[ \ell(x) = (x). \tag{9} \]

Taking the signature of this will then essentially remove the stream-like nature, however, so it is suitable only for the final lift of a deep signature model. We observe in particular that this is what is done in the models described in Figure 1, which we identify as special cases of the deep signature model, lacking also any learned transformation before the signature.

It is easy to see that the deep signature model exhibits the universal approximation property. This fact follows from the universal approximation theorem for neural networks [27] and from the universal nonlinearity property of signatures (see Proposition A.6 in Appendix A).

[Figure 2: Deep signature model: input stream x → neural network Φ_{θ_1} → lift ℓ_1 → signature transform Sig_{N_1} → neural network Φ_{θ_2} → ... → lift ℓ_k → signature transform Sig_{N_k} → neural network f_{θ_{k+1}} → output σ. Trainable parameters: θ_1, ..., θ_{k+1}.]
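As a shape-level illustration of the composition defining σ, the following NumPy sketch chains a pointwise map, the expanding-window lift, a stream-wise truncated signature, and a final pointwise readout. The fixed random matrices stand in for the trained networks Φ_{θ_1} and f_{θ_2}; this is not the trained model of the experiments.

```python
import numpy as np

def sig(x, depth=2):
    """Truncated signature, levels 1..depth flattened, via Chen's identity."""
    def seg(v):
        out = [np.ones(())]
        for k in range(1, depth + 1):
            out.append(np.multiply.outer(out[-1], v) / k)
        return out
    acc = seg(x[1] - x[0])
    for i in range(1, len(x) - 1):
        nxt = seg(x[i + 1] - x[i])
        acc = [sum(np.multiply.outer(acc[j], nxt[k - j]) for j in range(k + 1))
               for k in range(depth + 1)]
    return np.concatenate([lvl.ravel() for lvl in acc[1:]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))            # stands in for Phi_theta1, pointwise
W2 = rng.normal(size=(1, 12))           # stands in for f_theta2, pointwise

phi1 = lambda x: x @ W1.T                                  # equation (1)
lift = lambda x: [x[:k] for k in range(2, len(x) + 1)]     # equation (6)
sig_stream = lambda xs: np.stack([sig(w) for w in xs])     # equation (5)
f2 = lambda x: x @ W2.T

x = rng.normal(size=(10, 2))            # an input stream in S(R^2)
out = f2(sig_stream(lift(phi1(x))))
# A depth-2 signature of a 3-channel stream has 3 + 3^2 = 12 terms, and
# the expanding-window lift produced a stream of length n - 1 = 9.
assert out.shape == (9, 1)
```

Because every stage is stream-preserving except the final (here trivial) readout, the block Φ → ℓ → Sig could be repeated before f2, exactly as in Figure 2.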
3.5 Implementation

When using the signature transform as a feature transformation, it suffices to just pre-process and save the entire dataset before training. However when the signature transform is placed within a neural network then the signature transform must be evaluated and backpropagated through for each step of training; this is much more computationally intensive. This has motivated the creation of the separate spin-off Signatory project [22], to efficiently perform and backpropagate through the signature transform.

3.6 Inverting the truncated signature

How well does a truncated signature encode the original stream of data? A simple experiment is to attempt to recover the original stream of data given its truncated signature. We remark that finding a mathematical description of this inversion is a challenging task [28, 29, 30].

Fix a stream of data x = (x_1, ..., x_n) ∈ S(R^d). Assume that the truncated signature Sig_N(x) and the number of steps n ∈ N are known. Now apply gradient descent to minimise

\[ L(y; x) = \bigl\| \mathrm{Sig}_N(y) - \mathrm{Sig}_N(x) \bigr\|_2^2 \]

for y = (y_1, ..., y_n) ∈ S(R^d).

Figure 3 shows four handwritten digits from the PenDigits dataset [31]. The solid blue path is the original path x, whilst the dashed orange path is the reconstructed path y minimising L(y; x). Truncated signatures of order N = 12 were used for this task. We see that the truncated signatures have managed to encode the input paths x almost perfectly.

[Figure 3: Original path (blue) and path reconstructed from its signature (dashed orange) for four handwritten digits in the PenDigits dataset [31].]

4 Numerical experiments

4.1 A generative model for a stochastic process

Generative models are typically trained to learn to transform random noise to a target distribution. One common approach is Generative Adversarial Networks [32]. An alternative approach is to define a distance on the space of distributions by embedding them into a Reproducing Kernel Hilbert Space. The discriminator is then a fixed two-sample test based on a kernel maximum mean discrepancy. This is known as a Generative Moment Matching Network [33, 34, 35].

With this framework we propose a deep signature model to generate sequential data. The discriminator is as in [19, 20]. The natural choice for random noise is Brownian motion B_t. Define the kernel k: S(R^d) × S(R^d) → R by

\[ k(x, y) = \bigl( \mathrm{Sig}_N(\lambda_x x), \mathrm{Sig}_N(\lambda_y y) \bigr), \]

where λ_x ∈ R is a certain normalising constant which guarantees that k is the kernel of a Reproducing Kernel Hilbert Space, and (·, ·) denotes the dot product. Given n samples {x^(i)}_{i=1}^{n} ⊆ S(R^d) from the generator and m samples {y^(i)}_{i=1}^{m} ⊆ S(R^d) from the target distribution, define the loss T by

\[ T\bigl(\{x^{(i)}\}_{i=1}^{n}, \{y^{(i)}\}_{i=1}^{m}\bigr) = \frac{1}{n^2} \sum_{i,j} k(x^{(i)}, x^{(j)}) - \frac{2}{nm} \sum_{i,j} k(x^{(i)}, y^{(j)}) + \frac{1}{m^2} \sum_{i,j} k(y^{(i)}, y^{(j)}). \]

Let the input to the network be time-augmented Brownian motion

\[ B = ((t_1, B_{t_1}), \ldots, (t_n, B_{t_n})) \in S(\mathbb{R}^2). \]
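The kernel k and the loss T may be sketched as follows. This NumPy snippet uses a plain dot product of truncated signatures and omits the normalising constants λ_x from the paper; `sig` is a compact reimplementation via Chen's identity rather than the Signatory implementation used by the authors.

```python
import numpy as np

def sig(x, depth=3):
    """Truncated signature, levels 1..depth flattened, via Chen's identity."""
    def seg(v):
        out = [np.ones(())]
        for k in range(1, depth + 1):
            out.append(np.multiply.outer(out[-1], v) / k)
        return out
    acc = seg(x[1] - x[0])
    for i in range(1, len(x) - 1):
        nxt = seg(x[i + 1] - x[i])
        acc = [sum(np.multiply.outer(acc[j], nxt[k - j]) for j in range(k + 1))
               for k in range(depth + 1)]
    return np.concatenate([lvl.ravel() for lvl in acc[1:]])

def k_sig(x, y):
    """The signature kernel k, as a dot product of truncated signatures.
    (The paper additionally rescales by normalising constants; omitted.)"""
    return sig(x) @ sig(y)

def loss_T(xs, ys):
    """The two-sample statistic T: a biased estimator of the squared
    maximum mean discrepancy between two samples of streams."""
    kxx = np.mean([k_sig(a, b) for a in xs for b in xs])
    kxy = np.mean([k_sig(a, b) for a in xs for b in ys])
    kyy = np.mean([k_sig(a, b) for a in ys for b in ys])
    return kxx - 2.0 * kxy + kyy

rng = np.random.default_rng(0)
walk = lambda: np.cumsum(rng.normal(size=(20, 2)), axis=0)
xs = [walk() for _ in range(8)]
ys = [walk() for _ in range(8)]
assert abs(loss_T(xs, xs)) < 1e-8      # identical samples: zero discrepancy
assert loss_T(xs, ys) >= 0.0           # T is a squared RKHS distance
```

During training, T is minimised over the generator's parameters, with ys drawn from the target distribution.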
[Figure 4: Generative model architecture. The generator maps time-augmented Brownian motion B through a stream-preserving network Φ_{θ_1}, a lift ℓ, a signature transform Sig_N and a stream-preserving network f_{θ_2} to produce a generated stream x; the discriminator applies Sig_M to both the generated stream x and the real stream y and performs a two-sample test. Trainable parameters: θ_1, θ_2. There is an implicit batch dimension throughout.]

Given two stream-preserving neural networks Φ_{θ_1} and f_{θ_2}, and a lift ℓ, the generative model is defined by

\[ x = (f_{\theta_2} \circ \mathrm{Sig}_N \circ \ell \circ \Phi_{\theta_1})(B). \]

The overall model is shown in Figure 4. In a nice twist, both the generator and the discriminator involve the signature. Observe how the generative part is a particular case of the deep signature model, and that furthermore the whole generator-discriminator pair is also a particular case of the deep signature model, with the trivial lift of equation (9) before the second signature layer.

We applied the proposed model to a dataset of 1024 realisations of an Ornstein–Uhlenbeck process [36]. The loss was minimised at 6.6 × 10^-4, which implies that the generated paths are statistically almost indistinguishable from the real Ornstein–Uhlenbeck process. Figure 5 shows the generated paths alongside the original ones. Further implementation details are in Appendix B.

[Figure 5: Generated paths alongside the original paths.]

4.2 Supervised learning with fractional Brownian motion

Fractional Brownian motion [37] is a Gaussian process B^H: [0, ∞) → R that generalises Brownian motion. It is self-similar and exhibits fractal-like behaviour. Fractional Brownian motion depends upon a parameter H ∈ (0, 1), known as the Hurst parameter. Lower Hurst parameters result in noticeably rougher paths. The case of H = 1/2 corresponds to usual Brownian motion.

Fractional Brownian motion has been successfully used to model phenomena in diverse fields. For example, empirical evidence from financial markets [38] suggests that log-volatility is well modelled by fractional Brownian motion with Hurst parameter H ≈ 0.1. Estimating the Hurst parameter of a fractional Brownian motion path is considered a nontrivial task because of the paths' non-stationarity and long-range dependencies [39].
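For illustration, fractional Brownian motion with a given Hurst parameter can be sampled by factorising its covariance, E[B^H_s B^H_t] = (s^{2H} + t^{2H} − |s − t|^{2H})/2, with a Cholesky decomposition. This is the standard Cholesky method, presented here as a sketch; the paper's actual data pipeline is specified in Appendix B.

```python
import numpy as np

def fbm_paths(hurst, n, num, rng):
    """Sample fractional Brownian motion on the grid t = 1/n, ..., 1 by
    factorising its covariance matrix.  The Cholesky factorisation costs
    O(n^3), but only once; fine for illustration."""
    t = np.arange(1, n + 1) / n
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst)
                 - np.abs(s - u) ** (2 * hurst))
    chol = np.linalg.cholesky(cov)
    return (chol @ rng.normal(size=(n, num))).T        # shape (num, n)

rng = np.random.default_rng(0)
paths = fbm_paths(0.5, 100, 4000, rng)
# H = 1/2 is standard Brownian motion, so Var(B_1) should be close to 1.
assert abs(paths[:, -1].var() - 1.0) < 0.15
# The normalisation Var(B^H_1) = 1 holds for every Hurst parameter.
rough = fbm_paths(0.1, 100, 4000, rng)
assert abs(rough[:, -1].var() - 1.0) < 0.15
```

Time-augmenting each sampled path, i.e. pairing each value with its time stamp, produces exactly the streams x_H described next.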
We train a variety of models to perform this estimation. That is, to learn the map x_H ↦ H, where

\[ x_H = ((t_0, B^H_{t_0}), \ldots, (t_n, B^H_{t_n})) \in S(\mathbb{R}^2) \]

for some realisation of B^H.

Table 1: Final test mean squared error (MSE) for the different models, averaged over 3 training runs, ordered from largest to smallest.

Model          | Test MSE mean | Test MSE variance | # Params
Rescaled Range | 7.2 × 10^-2   | 3.7 × 10^-3       | N/A
LSTM           | 4.3 × 10^-2   | 8.0 × 10^-3       | 12961
Feedforward    | 2.8 × 10^-2   | 3.0 × 10^-3       | 10209
Neural-Sig     | 1.1 × 10^-2   | 8.2 × 10^-4       | 10097
GRU            | 3.3 × 10^-3   | 1.3 × 10^-3       | 9729
RNN            | 1.7 × 10^-3   | 4.9 × 10^-4       | 10091
DeepSigNet     | 2.1 × 10^-4   | 8.7 × 10^-5       | 9261
DeeperSigNet   | 1.6 × 10^-4   | 2.1 × 10^-5       | 9686

[Figure 6: Performance at estimating the Hurst parameter for various models, with and without signatures, for a particular (typical) training run.]

The results are shown in Figure 6 and Table 1. Also shown in Table 1 are the results of the rescaled range method [40], which is a mathematically derived method rather than a learned method. RNN, GRU and LSTM models provide baselines in the context of recurrent neural networks. The simple Neural-Sig model outlined previously in Figure 1a provides a baseline from the context of signatures. DeepSigNet and DeeperSigNet are both deep signature models of the form given by Figure 2. DeepSigNet has a single large Neural-Lift-Signature block, whilst DeeperSigNet has three smaller ones. We observe that traditional signature-based models perform slightly worse than traditional recurrent models, but that deep signature models outperform all other models by at least an order of magnitude. Further implementation details are found in Appendix B.

4.3 Non-Markovian deep reinforcement learning

Finally we show how these ideas may be extended, by demonstrating a model that adds a residual connection to the deep signature model; it may also be interpreted as using signatures as the memory of a recurrent neural network. As an example, we apply this architecture to tackle a non-Markovian reinforcement learning problem. This means that the optimal action depends not just on the current state of the environment, but upon the history of past states, so that the agent must maintain a memory.
Let Φ_{θ_1}: R^e ← R^d and f_{θ_2}: R^{d + (e^{N+1}−1)/(e−1)} → {actions} be functions depending on learnable parameters θ_1, θ_2; that is, Φ_{θ_1}: R^d → R^e. Given input x_i ∈ R^d at time i, let

\[ y_i = \Phi_{\theta_1}(x_i), \qquad \sigma_i = \sigma_{i-1} \otimes \mathrm{Sig}_N((y_{i-1}, y_i)), \qquad a_i = f_{\theta_2}(x_i, \sigma_i), \]

where a_i is the action proposed by the network at time i, y_i and σ_i are the memory at time i, and ⊗ denotes the tensor product as in A.13 in Appendix A.

The model is summarised in Figure 7 as a recurrent neural network with signature-based memory. Note that y_i is preserved in memory only to compute the signature at the next time step, as the shortest path it is meaningful to compute the signature of is of length two. However, note that by Proposition A.15 in Appendix A,

\[ \sigma_i = \mathrm{Sig}_N(\Phi_{\theta_1}(x_1), \ldots, \Phi_{\theta_1}(x_i)) \in \mathbb{R}^{(e^{N+1}-1)/(e-1)}. \]

[Figure 7: Agent architecture as a recurrent network: at each step the input x_i is mapped to y_i = Φ_{θ_1}(x_i); together with the stored y_{i−1}, this updates the memory σ_i = σ_{i−1} ⊗ Sig_N((y_{i−1}, y_i)); finally f_{θ_2} maps (x_i, σ_i) to the action a_i. Trainable parameters: θ_1, θ_2.]

Furthermore the x_i, y_i, σ_i and a_i may be collected into streams (x_i)_i ∈ S(R^d), (y_i)_i ∈ S(R^e), (σ_i)_i ∈ S(R^{(e^{N+1}−1)/(e−1)}), (a_i)_i ∈ S({actions}). In this way we may interpret this model as a generalisation of the deep signature model: it has a single Neural-Lift-Signature block, with a skip connection across the whole block. The neural component is given by the neural network Φ_{θ_1}, which is stream-preserving as it operates pointwise, in the manner of equation (1). The lift is the 'expanding window' lift given by equation (6). Finally f_{θ_2} is another neural network, which is again pointwise and thus stream-preserving. This interpretation of the model is demonstrated in Figure 8.

We test this model on a non-Markovian modification to the classical Mountain Car problem [41], in which the agent receives only partial information: it is only given the car's position, and not its velocity. We find that it is capable of learning how to solve the problem within a set number of episodes, whilst a comparable RNN architecture fails to do so. The reinforcement learning technique used was Deep Q-Learning [42], with the specified models performing function approximation on Q. Both models were chosen to have comparable numbers of parameters. Further implementation details can be found in Appendix B.
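The memory update σ_i = σ_{i−1} ⊗ Sig_N((y_{i−1}, y_i)) can be checked numerically: by Chen's identity, concatenating streams multiplies their signatures in the truncated tensor algebra, so the O(1)-per-step recurrence maintains the signature of the whole stream seen so far. A NumPy sketch:

```python
import numpy as np

DEPTH = 3

def seg(v):
    """Signature of one linear segment: the truncated tensor exponential."""
    levels = [np.ones(())]
    for k in range(1, DEPTH + 1):
        levels.append(np.multiply.outer(levels[-1], v) / k)
    return levels

def tensor_mul(a, b):
    """The truncated tensor product, written as a circled times in the text."""
    return [sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
            for k in range(DEPTH + 1)]

def sig_of(y):
    """Signature of a whole stream, by multiplying up its segments."""
    out = seg(y[1] - y[0])
    for i in range(2, len(y)):
        out = tensor_mul(out, seg(y[i] - y[i - 1]))
    return out

rng = np.random.default_rng(0)
y = rng.normal(size=(10, 2))      # stands in for the stream (y_1, ..., y_n)

# The agent's update: sigma_i = sigma_{i-1} (tensor) Sig((y_{i-1}, y_i)).
sigma = seg(y[1] - y[0])
for i in range(2, len(y)):
    sigma = tensor_mul(sigma, seg(y[i] - y[i - 1]))

# Chen's identity: concatenating streams multiplies their signatures, so the
# running memory agrees with the signatures of two halves multiplied together.
joined = tensor_mul(sig_of(y[:5]), sig_of(y[4:]))
assert all(np.allclose(a, b) for a, b in zip(sigma, joined))
```

This is why the agent never needs to store the full history: the fixed-size memory σ_i already encodes Sig_N of everything observed so far.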
[Figure 8: Agent architecture as a residual network: (x_i)_i → Φ_{θ_1} → (y_i)_i → ℓ → Sig_N → (σ_i)_i → f_{θ_2} → (a_i)_i, with a skip connection from (x_i)_i into f_{θ_2}. Trainable parameters: θ_1, θ_2. The lift ℓ is the 'expanding window' lift of equation (6).]

5 Conclusion

There is a strong corpus of theory motivating the use of the signature transform as a tool to understand streams of data. Meanwhile neural networks have enjoyed great empirical success. It is thus desirable to bring them together; in this paper we have described how this may be done in a general fashion, and have provided examples of how this principle may be used in a variety of domains.

There are two key contributions. First, we discuss stream-preserving neural networks, which are what allow for using signature transforms deeper within a network, rather than as just a feature transformation. Second, we discuss lifts, which are what allow for the use of multiple signature transforms. In this way we have significantly extended the use of the signature transform in machine learning: rather than limiting its usage to data preprocessing, we demonstrate how the signature transform, as a universal nonlinearity, may be used as a pooling layer within a neural network.

Acknowledgements

PB was supported by the EPSRC grant EP/R513295/1. PK was supported by the EPSRC grant EP/L015811/1. PK, IPA, CS, TL were supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1.

References

[1] I. Chevyrev and A. Kormilitzin, "A primer on the signature method in machine learning," arXiv preprint arXiv:1603.03788, 2016.
[2] K. T. Chen, "Iterated integrals and exponential homomorphisms," Proc. London Math. Soc., 4, 502-512, 1954.
[3] K. T. Chen, "Integration of paths, geometric invariants and a generalized Baker-Hausdorff formula," Ann. of Math. (2), 65:163-178, 1957.
[4] K. T. Chen, "Integration of paths - a faithful representation of paths by non-commutative formal power series," Trans. Amer. Math. Soc., 89:395-407, 1958.
[5] T. Lyons, H. Ni, and H. Oberhauser, "A feature set for streams and an application to high-frequency financial tick data," ICBDC, 2014.
[6] L. G. Gyurkó, T. Lyons, M. Kontkowski, and J. Field, "Extracting information from the signature of a financial data stream," arXiv preprint arXiv:1307.7244, 2014.
[7] T. Lyons, S. Nejad, and I. P. Arribas, "Nonparametric pricing and hedging of exotic derivatives," arXiv preprint arXiv:1905.00711, 2019.
[8] T. Lyons, S. Nejad, and I. P. Arribas, "Model-free pricing and hedging in discrete time using rough path signatures," arXiv preprint arXiv:1905.01720, 2019.
[9] J. Kalsi, T. Lyons, and I. P. Arribas, "Optimal execution with rough path signatures," arXiv preprint arXiv:1905.00728, 2019.
[10] T. J. Lyons, "Differential equations driven by rough signals," Revista Matemática Iberoamericana, vol. 14, no. 2, pp. 215-310, 1998.
[11] P. K. Friz and N. B. Victoir, Multidimensional stochastic processes as rough paths: theory and applications. Cambridge University Press, 2010.
[12] W. Yang, L. Jin, and M. Liu, "Chinese character-level writer identification using path signature feature, DropStroke and deep CNN," in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 546-550, IEEE, 2015.
[13] Z. Xie, Z. Sun, L. Jin, H. Ni, and T. Lyons, "Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 8, pp. 1903-1917, 2018.
[14] W. Yang, L. Jin, D. Tao, Z. Xie, and Z. Feng, "DropSample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition," Pattern Recognition, vol. 58, pp. 190-203, 2016.
[15] W. Yang, L. Jin, and M. Liu, "DeepWriterID: An end-to-end online text-independent writer identification system," IEEE Intelligent Systems, vol. 31, no. 2, pp. 45-53, 2016.
[16] C. Li, X. Zhang, and L. Jin, "LPSNet: a novel log path signature feature based hand gesture recognition framework," in Proceedings of the IEEE International Conference on Computer Vision, pp. 631-639, 2017.
[17] W. Yang, T. Lyons, H. Ni, C. Schmid, L. Jin, and J. Chang, "Leveraging the path signature for skeleton-based human action recognition," arXiv preprint arXiv:1707.03993, 2017.
[18] W. Yang, L. Jin, H. Ni, and T. Lyons, "Rotation-free online handwritten character recognition using dyadic path signature features, hanging normalization, and deep neural network," in 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 4083-4088, IEEE, 2016.
[19] F. J. Király and H. Oberhauser, "Kernels for sequentially ordered data," Journal of Machine Learning Research, 2019.
[20] I. Chevyrev and H. Oberhauser, "Signature moments to characterize laws of stochastic processes," arXiv preprint arXiv:1810.10971, 2018.
[21] T. Lyons, "Rough paths, signatures and the modelling of functions on streams," arXiv preprint arXiv:1405.4537, 2014.
[22] P. Kidger, "Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU," 2019. https://github.com/patrick-kidger/signatory.
[23] A. Silvescu, "Fourier neural networks," Proceedings of the International Joint Conference on Neural Networks, IEEE, 1999.
[24] L. Mingo, L. Aslanyan, J. Castellanos, M. Diaz, and V. Riazanov, "Fourier neural networks: an approach with sinusoidal activation functions," Int. J. Inf. Theory Appl., 11, 2004.
[25] M. Gashler and S. Ashmore, "Modeling time series data with deep Fourier neural networks," Neurocomputing, 2016.
[26] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Trans. Neural Netw., 1992.
[27] A. Pinkus, "Approximation theory of the MLP model in neural networks," Acta Numer., vol. 8, pp. 143-195, 1999.
[28] T. J. Lyons and W. Xu, "Inverting the signature of a path," Journal of the European Mathematical Society, vol. 20, no. 7, pp. 1655-1687, 2018.
[29] J. Chang, N. Duffield, H. Ni, W. Xu, et al., "Signature inversion for monotone paths," Electronic Communications in Probability, vol. 22, 2017.
[30] J. Chang, Effective algorithms for inverting the signature of a path. PhD thesis, University of Oxford, 2018.
[31] D. Dua and C. Graff, "UCI Machine Learning Repository," 2017.
[32] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.
[33] Y. Li, K. Swersky, and R. Zemel, "Generative moment matching networks," ICML, 2015.
[34] G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, "Training generative neural networks via maximum mean discrepancy optimization," UAI, 2015.
[35] A. Gretton, K. M. Borgwardt, M. Rasch, B. Scholkopf, and A. J. Smola, "A kernel method for the two-sample problem," Advances in Neural Information Processing Systems, 2007.
[36] G. E. Uhlenbeck and L. S. Ornstein, "On the theory of the Brownian motion," Physical Review, vol. 36, no. 5, p. 823, 1930.
[37] Y. Mishura, Stochastic calculus for fractional Brownian motion and related processes, vol. 1929. Springer Science & Business Media, 2008.
[38] J. Gatheral, T. Jaisson, and M. Rosenbaum, "Volatility is rough," Quantitative Finance, 18:6, 933-949, 2018.
[39] L. Lacasa, B. Luque, J. Luque, and J. C. Nuno, "The visibility graph: A new method for estimating the Hurst exponent of fractional Brownian motion," EPL (Europhysics Letters), vol. 86, no. 3, p. 30001, 2009.
[40] H. Hurst, "The long-term storage capacity of reservoirs," Transactions of the American Society of Civil Engineers, 1951.
[41] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, 2016.
[42] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[43] B. M. Hambly and T. J. Lyons, "Uniqueness for the signature of a path of bounded variation and the reduced path group," Annals of Mathematics, vol. 171, no. 1, pp. 109-167, 2010.
[44] I. Perez Arribas, "Derivatives pricing using signature payoffs," arXiv preprint arXiv:1809.09466, 2018.
[45] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," ICLR, 2015.
[46] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," 2017.
[47] J. Reizenstein and B. Graham, "The iisignature library: efficient calculation of iterated-integral signatures and log signatures," arXiv preprint arXiv:1802.08252, 2018.
[48] P. Embrechts, Selfsimilar processes, vol. 21. Princeton University Press, 2009.
[49] C. J. C. H. Watkins, "Learning from delayed rewards," 1989.
", "award": [], "sourceid": 1757, "authors": [{"given_name": "Patrick", "family_name": "Kidger", "institution": "University of Oxford"}, {"given_name": "Patric", "family_name": "Bonnier", "institution": "University of Oxford"}, {"given_name": "Imanol", "family_name": "Perez Arribas", "institution": "University of Oxford"}, {"given_name": "Cristopher", "family_name": "Salvi", "institution": "University of Oxford"}, {"given_name": "Terry", "family_name": "Lyons", "institution": "University of Oxford"}]}