disjoint街头cs枪战

街头cs枪战  时间:2021-04-01  阅读:()
arXiv:0708.
4288v1[cs.
DS]31Aug2007PatternMatchinginTreesandStringsPhilipBilleAPhDDissertationPresentedtotheFacultyoftheITUniversityofCopenhageninPartialFullmentoftheRequirementsforthePhDDegreeJune2007PrefaceTheworkpresentedinthisdissertationwasdonewhileIwasenrolledasaPhDstudentattheITUniversityofCopenhageninthe4-yearPhDprogram.
MyworkwasfundedbytheEU-project"DeepStructure,Singularities,andComputerVision"(ISTProgrammeoftheEuropeanUnion(IST-2001-35443)).
Duringthesummerof2003myadvisorsStephenAlstrupandTheisRauhelefttostarttheirowncompanyandmyadvisorsthenbecameLarsBirkedalandAnna¨OstlinPagh.
IntheperiodfromMarch2003toSeptember2003Iwasonpaternityleave.
IreceivedmyMastersDegreeinJanuary2005.
InSpring2005IvisitedMartinFarach-ColtonatRutgersUniversitytwiceforatotalperiodoftwomonths.
IntheperiodfromOctober2006toApril2007Iwasonanother6monthsofpaternityleave.
Finally,intheremainingperiodIcamebacktonishthepresentdissertation.
IwanttothankalloftheinspiringpeoplethatIhaveworkedwithduringmyPhD.
Inparticular,IwanttothankStephenAlstrupandTheisRauheforintroducingmetotheiruniqueperspectiveonalgorithms.
IalsowanttothankLarsBirkedal,Anna¨OstlinPagh,andRasmusPaghfortonsofguidance.
IamgratefultoMartinFarach-ColtonfortheverypleasantstayatRutgersUniversity.
Thankstoallofmyco-authors:StephenAlstrup,TheisRauhe,IngeLiGrtz,MartinFarach-Colton,RolfFagerberg,ArjanKuijper,OleFoghOlsen,PeterGiblin,andMadsNielsen.
Thankstothepeoplewhoreadandcommentedonearlierdraftsofthedissertation:IngeLiGrtz,SrenDebois,RasmusPagh,andTheisRauhe.
Finally,thankstoallofmycolleaguesattheITUniversityforcreatingafunandinterestingworkenvironment.
iiiAbstractWestudythedesignofecientalgorithmsforcombinatorialpatternmatching.
Moreconcretely,westudyalgorithmsfortreematching,stringmatching,andstringmatchingincompressedtexts.
TreeMatchingSurveyWebeginwithasurveyontreematchingproblemsforlabeledtreesbasedondeleting,inserting,andrelabelingnodes.
Wereviewtheknownresultsforthetreeeditdistanceproblem,thetreealignmentdistanceproblem,andthetreeinclusionproblem.
Thesurveycoversbothorderedandunorderedtrees.
Foreachoftheproblemswepresentoneormoreofthecentralalgorithmsforeachoftheproblemsindetail.
TreeInclusionGivenrooted,ordered,andlabeledtreesPandTthetreeinclusionproblemistodetermineifPcanbeobtainedfromTbydeletingnodesinT.
WeshowthatthetreeinclusionproblemcanbesolvedinO(nT)spacewiththefollowingrunningtimes:minO(lPnT),O(nPlTloglognT+nT),O(nPnTlognT+nTlognT).
HerenSandlSdenotesthenumberofnodesandleavesintreeS∈{P,T},respectively,andweassumethatnP≤nT.
OurresultsmatchesorimprovestheprevioustimecomplexitieswhileusingonlyO(nT)space.
Allpreviousalgorithmsrequired(nPnT)spaceinworst-case.
TreePathSubsequenceGivenrootedandlabeledtreesPandTthetreepathsubsequenceproblemistoreportwhichpathsinParesubsequencesofwhichpathsinT.
Hereapathbeginsattherootandendsataleaf.
WeshowthatthetreepathsubsequenceproblemcanbesolvedinO(nT)spacewiththefollowingrunningtimes:minO(lPnT+nP),O(nPlT+nT),O(nPnTlognT+nT+nPlognP).
AsourresultsforthetreeinclusionproblemthismatchesorimprovestheprevioustimecomplexitieswhileusingonlyO(nT)space.
Allpreviousalgorithmsrequired(nPnT)spaceinworst-case.
RegularExpressionMatchingUsingtheFourRussianTechniqueGivenaregularexpressionRandastringQtheregularexpressionmatchingproblemistodetermineifQmatchesanyofthestringsspeciedbyR.
WegiveanalgorithmforregularexpressionmatchingusingO(nm/logn+n+mlogm)andO(n)space,wheremandnarethelengthsofRandQ,respectively.
ThismatchestherunningtimeofthefastestknownalgorithmfortheproblemwhileimprovingthespacefromO(nm/logn)toO(n).
OuralgorithmisbasedontheFourRussianTechnique.
Weextendourideastoimprovetheresultsfortheapproximateregularexpressionmatchingproblem,thestringeditdistanceproblem,andthesubsequenceindexingproblem.
iiiRegularExpressionMatchingUsingWord-LevelParallelismWerevisittheregularexpressionmatchingproblemanddevelopnewalgorithmsbasedonword-levelparalleltechniques.
OnaRAMwithastandardinstructionsetandwordlengthw≥logn,weshowthattheproblemcanbesolvedinO(m)spacewiththefollowingrunningtimes:O(nmlogww+mlogw)ifm>wO(nlogm+mlogm)if√w0.
GivenaregularexpressionRandastringQtheregularexpressionmatchingproblemistodecideifQ∈L(R).
AniteautomatonisatupleA=(V,E,Σ,θ,Φ),whereVisasetofnodescalledstates,EissetofdirectededgesbetweenstatescalledtransitionseachlabeledbyacharacterfromΣ∪{},θ∈Visastartstate,andΦVisasetofnalstates.
Inshort,Aisanedge-labeleddirectedgraphwithaspecialstartnodeandasetofacceptingnodes.
Aisadeterministicniteautomaton(DFA)ifAdoesnotcontainany-transitions,andalloutgoingtransitionsofanystatehavedierentlabels.
Otherwise,Aisanon-deterministicautomaton(NFA).
WesaythatAacceptsastringQifthereisapathfromthestartstatetoanacceptingstatesuchthattheconcatenationoflabelsonthepathspellsoutQ.
Otherwise,ArejectsQ.
LetRbearegularexpressionoflengthmandletQbeastringoflengthn.
TheclassicsolutiontoregularexpressionmatchingistorstconstructaNFAAacceptingallstringsinL(R).
ThereareseveralNFAconstructionswiththisproperty[MY60,Glu61,Tho68].
Secondly,wesimulateAonQbyproducingasequenceofstate-setsS0,SnsuchthatSiconsistsofallstatesinAforwhichthereisapathfromthestartstateofAthatspellsouttheithprexofQ.
Finally,SncontainsanacceptingstateofAifandonlyifAacceptsQandhencewecandetermineifQmatchesastringinL(R)byinspectingSn.
Thompson[Tho68]gaveasimplewell-knownNFAconstructionforregularexpressionsthatwewillcallaThompson-NFA(TNFA).
ForRtheTNFAAhasatmost2mstatesand4mtransitions,asingleacceptingstate,andcanbecomputedinO(m)time.
Eachofthestate-setinthesimulationofAonQcanbecomputedinO(m)timeusingabreadth-rstsearchofA.
ThisimpliesanalgorithmforregularexpressionmatchingusingO(nm)time.
Eachofthestate-setsonlydependsonthepreviousoneandthereforethespaceisO(m).
ThefulldetailsofThompson'sconstructionisgiveninSection6.
2.
WenotethattheregularexpressionmatchingproblemissometimesdenedasreportingalloftheendingpositionsofsubstringsofQmatchingR.
Thompson'salgorithmcaneasilybeadaptedwithoutlossofeciencyforthisproblem.
Simplyaddthestartstatetothecurrentstate-setbeforecomputingthenextandinspecttheacceptingstateofthestate-setsateachstep.
Allofthealgorithmspresentedinthissectioncanbeadaptedinasimilarfashionsuchthattheboundslistedbelowalsoholdforthisvariation.
InpracticalimplementationsregularexpressionmatchingisoftensolvedbyconvertingtheNFAacceptingtheregularexpressionintoaDFAbeforesimulation.
However,inworst-casethestandardDFA-constructionneedsO(22mm/wσ)space.
WithamoresuccinctrepresentationoftheDFAthespacecanbereducedtoO((2m+σ)m/w)[NR04,WM92b].
Notethatthespacecomplexityisstillexponentialinthelengthoftheregularexpression.
Normally,itisreportedthatthetimecomplexityforsimulatingtheDFAisO(n),however,thisanalysisdoesnotaccountforthelimitedwordsizeofthewordRAM.
Inparticular,sincethereare2mstatesintheDFAeachstaterequires(m)bitstobeaddressed.
Thereforewemayneed(m/w)timetoidentifythenextstateandthusthetotaltimetosimulatetheDFAbecomes(nm/w).
ThisboundismatchedbyNavarroandRanot[NR04]whoshowedhowtosolvetheprobleminO((n+2m)m/w)timeandO((2m+σ)m/w)space.
NavarroandRanot[NR04]suggestedusingatablesplittingtechniquetoimprovethespacecomplexityoftheDFAalgorithmforregularexpressionmatching.
ForanysthistechniquegivesanalgorithmusingO((n+2m/s)sm/w)timeandO((2m/ss+σ)m/w)space.
TheDFA-basedalgorithmsareprimarilyinterestingforsucientlysmallregularexpressions.
Forin-stance,ifm=O(logn)itfollowsthatregularexpressionmatchingcanbesolvedinO(n)timeandO(n+σ)space.
SeveralheuristicscanbeappliedtofurtherimprovetheDFAbasedalgorithms,i.
e.
,weoftendonotneedtollinallentriesinthetable,theDFAcanbestoredinanadjacency-listrepresentationand9minimized,etc.
Noneoftheseimprovetheaboveworst-casecomplexitiesoftheDFAbasedalgorithms.
Myers[Mye92a]showedhowtoecientlycombinethebenetsofNFAswithDFA.
ThekeyideainMyers'algorithmistodecomposetheTNFAbuiltfromRintoO(m/logn)subautomataeachconsistingofΘ(logn)states.
UsingtheFourRussiantechnique[ADKF70]eachsubautomatonisconvertedintoaDFAusingO(2m)=O(n)spacegivingatotalspacecomplexityofO(nm/logn).
ThesubautomatacanthenbesimulatedinconstanttimeleadingtoanalgorithmusingO(nm/logn+(n+m)logm)time.
ThedetailsofMyers'algorithmcanbefoundinSections5.
2and6.
3.
Forvariantsandextensionoftheregularexpressionmatchingproblemsee[KM95b,MOG98,Yam01,NR03,YM03,ISY03].
OurResultsandTechniquesInSection5.
2weimprovethespacecomplexityofMyers'FourRussianalgorithm.
WepresentanalgorithmusingO(nm/logn+n+mlogm)timeandO(n)space(Theorem12).
Hence,wematchorimprovetherunningtimeofMyers'algorithmwhilewesignicantlyimprovethespacecomplexityfromO(nm/logn)toO(n).
AsinMyers'algorithm,ournewresultisachievedusingadecompositionoftheTNFAintosmallsub-automataofΘ(logn)states.
Toimprovethespacecomplexitywegiveamoreecientencoding.
First,werepresentthelabelsoftransitionsineachsubautomatonusingdeterministicdictionaries[HMP01].
Secondly,weboundthenumberofdistinctTNFAswithoutlabelsontransitions.
UsingthisboundweshowthatitispossibletoencodeallTNFAswithx=Θ(logn)statesintotalspaceO(n),therebyobtainingourresult.
Ourspace-ecientFourRussianalgorithmforregularexpressionmatchingisfasterthanThompson'salgorithmwhichusesO(nm)time.
However,toachievethespeedupweuse(n)space,whichmaystillbesignicantlylargerthantheO(m)spaceusedbyThompson'salgorithm.
InChapter6westudyadierentandmorespace-ecientapproachtoregularexpressionmatching.
Specically,weshowthatregularexpressionmatchingcanbesolvedinO(m)spacewiththefollowingrunningtimes(Theorem16):O(nmlogww+mlogw)ifm>wO(nlogm+mlogm)if√wlogn,weachieveanO(lognloglogn)factorspeedupoverThompson'salgorithmusingO(m)space.
Inthiscasewesimultaneouslymatchthebestknowntimeandspaceboundsfortheproblem,withtheexceptionofanO(loglogn)factorintime.
Next,considerthecasewhentheregularexpressionis"small",e.
g.
,m=O(logn).
Inthiscase,wegetanalgorithmusingO(nloglogn)timeandO(logn)space.
Hence,thespaceisimprovedexponentiallyatthecostofanO(loglogn)factorintime.
Inthecaseofanevensmallerregularexpression,e.
g.
,m=O(√logn),theslowdowncanbeeliminatedandweachieveoptimalO(n)time.
Forlargerwordlengths,ourtimeboundsimprove.
Inparticular,whenw>lognloglogntheboundisbetterinallcases,exceptfor√w≤m≤w,andwhenw>log2nitimprovesthetimeboundofMyers'algorithm.
AsinMyers'andourpreviousalgorithmsforregularexpressionmatching,thisalgorithmisbasedonadecompositionoftheTNFA.
However,forthisresult,aslightlymoregeneraldecompositionisneededtohandledierentsizesofsubautomata.
Weprovidethisbyshowinghowany"black-box"algorithmforsimulatingsmallTNFAscanecientlyconvertedintoanalgorithmforsimulatinglargerTNFAs(seeSection6.
3andLemma38).
ToachieveO(m)spacewecannotaordtoencodethesubautomataasintheFourRussianalgorithms.
Insteadwepresenttwoalgorithmsthatsimulatethesubautomatausingword-levelparallelism.
ThemainproblemindoingsoisthecomplicateddependenciesamongstatesinTNFAs.
Astatemaybeconnectedvialongpathsof-transitionstonumberofotherstates,allofwhichhavetobetraversedinparalleltosimulatetheTNFA.
Ourrstalgorithm,presentedinSection6.
4,simulatesTNFAswithO(√w)statesinconstanttimeforeachstep.
Themainideaistoexplicitlyrepresentthetransitiveclosureofthe-pathscompactlyinaconstantnumberofwords.
Combinedwithanumberofsimplewordoperationstoweshowhowtocomputethenextstate-setinconstanttime.
Oursecond,morecomplicated10algorithm,presentedinSection6.
5simulatesTNFAsofwithO(w)statesinO(logw)timeforeachstep.
Insteadofrepresentingthetransitiveclosureoftheofthe-pathsthisalgorithmrecursivelydecomposestheTNFAintoO(logw)levelsthatrepresentincreasinglysmallersubautomata.
Usingthisdecompositionwethenshowtotraverseallofthe-pathsinconstanttimeforeachlevel.
Wecombinethetwoalgorithmswithourblack-boxsimulationoflargeTNFAs,andchoosethebestalgorithminthevariouscasestogetthestatedresult.
1.
4.
3ApproximateRegularExpressionMatchingGivenaregularexpressionR,astringQ,andanerrorthresholdktheapproximateregularexpressionmatchingproblemistodetermineiftheminimumunit-coststringeditdistancebetweenQandastringinL(R)isatmostk.
AsintheaboveletmandnbethelengthsofRandQ,respectively.
MyersandMiller[MM89]introducedtheproblemandgaveanO(nm)timeandO(m)spacealgorithm.
Theiralgorithmisanextensionofthestandarddynamicprogrammingalgorithmforapproximatestringmatchingadaptedtohandleregularexpressions.
Notethatthetimeandspacecomplexitiesarethesameasinthesimplecaseofstrings.
Assumingaconstantsizedalphabet,Wuetal.
[WMM95]proposedaFourRussianalgorithmusingO(mnlog(k+2)logn+n+m)timeandO(m√nlog(k+2)logn+n+m)space.
ThisalgorithmcombinesdecompositionofTNFAsintosubautomataastheearlieralgorithmofMyersforregularexpressionmatching[Mye92a]andthedynamicprogrammingideaofMyersandMiller[MM89]forapproximateregularexpressionmatching.
Recently,Navarro[Nav04]proposedapracticalDFAbasedsolutionforsmallregularexpressions.
Variantsofapproximateregularexpressionmatchingincludingextensionstomorecomplexcostfunctionscanbefoundin[MM89,KM95c,Mye92b,MOG98,NR03].
OurResultsandTechniquesInSection5.
3wepresentanalgorithmforapproximateregularexpressionmatchingusingO(mnlog(k+2)logn+n+mlogm)timeandO(n)spacethatworksforanyalphabet(Theorem13).
Hence,wematchtherunningtimeofWuetal.
[WMM95]whileimprovingthespacecomplexityfromO(m√nlog(k+2)logn+n+m)toO(n).
WeobtaintheresultasasimplecombinationandextensionofthetechniquesusedinourFourRussianalgorithmforregularexpressionmatchingandthealgorithmofWuetal.
[WMM95].
1.
4.
4SubsequenceIndexingRecallthatasubsequenceofastringQisastringthatcanbeobtainedfromQbydeletingzeroormorecharacters.
ThesubsequenceindexingproblemistopreprocessastringQintoadatastructureecientlysupportingqueriesoftheform:"IsPasubsequenceofQ"foranystringP.
Baeza-Yates[BY91]introducedtheproblemandgaveseveralalgorithms.
LetmandndenotethelengthofPandQ,respectively,letσbethesizeofthealphabet.
Baeza-YatesshowedthatthesubsequenceindexingproblemcaneitherbesolvedusingO(nσ)spaceandO(m)querytime,O(nlogσ)spaceandO(mlogσ)querytime,orO(n)spaceandO(mlogn)querytime.
Forthesealgorithmthepreprocessingtimematchesthespacebounds.
ThekeycomponentinBaeza-Yates'solutionsisaDFAcalledthedirectedacyclicsubsequencegraph(DASG).
Baeza-Yatesobtainsthersttrade-olistedabovebyexplicitlyconstructingtheDASGandusingittoanswerqueries.
Thesecondtrade-ofollowsfromanencodedversionoftheDASGandthethirdtrade-oisbasedonsimulatingtheDASGusingpredecessordatastructures.
Severalvariantsofsubsequenceindexinghavebeenstudied,see[DFG+97,BCGM99]andthesurveys[Tro01,CMT03].
OurResultsandTechniquesInSection5.
4weimprovetheboundsforsubsequenceindexing.
WeshowhowtosolvetheproblemusingO(nσ1/2l)spaceandpreprocessingtimeandO(m(l+1))timeforqueries,for0≤l≤loglogσ(Theorem14).
Inparticular,forconstantlwegetadatastructureusingO(nσ)space11andpreprocessingtimeandO(m)querytimeandforl=loglogσwegetadatastructureusingO(n)spaceandpreprocessingtimeandO(mloglogσ)forqueries.
Thekeyideaisasimpletwo-leveldecompositionoftheDASGthatecientlycombinestheexplicitDASGwithafastpredecessorstructure.
UsingtheclassicalvanEmdeBoasdatastructure[vEBKZ77]leadstoO(n)spaceandpreprocessingtimewithO(mloglogσ)querytime.
Togetthefulltrade-o,wereplacethisdatastructurewitharecentonebyThorup[Tho03].
1.
5CompressedStringMatchingCompressedstringmatchingcoversproblemsthatinvolvesearchingforan(uncompressed)patterninacompressedtargettextwithoutdecompressingit.
Thegoalistosearchmoreecientlythantheobviousapproachofdecompressingthetargetandthenperformingthematching.
Moderntextdatabases,e.
g.
,forbiologicaldataandWorldWideWebdata,arehuge.
Tosavetimeandspacethedatamustbekeptincompressedformwhileallowingsearching.
Therefore,ecientalgorithmsforcompressedstringmatchingareneeded.
AmirandBenson[AB92b,AB92a]initiatedthestudyofcompressedstringmatching.
Subsequently,sev-eralresearchershaveproposedalgorithmsforvarioustypesofstringmatchingproblemsandcompressionmethods[AB92b,FT98,KTS+98,KNU03,Nav03,MUN03].
Forinstance,givenastringQoflengthucom-pressedwiththeZiv-Lempel-Welchscheme[Wel84]intoastringoflengthn,Amiretal.
[ABF96]gaveanalgorithmforndingallexactoccurrencesofapatternstringoflengthminO(n+m2)timeandspace.
Algorithmsforfullycompressedpatternmatching,whereboththepatternandthetargetarecompressedhavealsobeenstudied(seethesurveybyRytter[Ryt99]).
InChapter7,westudyapproximatestringmatchingandregularexpressionmatchingproblemsinthecontextofcompressedtexts.
Asinpreviousworkontheseproblems[KNU03,Nav03]wefocusonthepopularZL78andZLWcompressionschemes[ZL78,Wel84].
Thesecompressionschemesadaptivelydividetheinputintosubstrings,calledphrases,whichcanbecompactlyencodedusingreferencestootherphrases.
DuringencodinganddecodingwiththeZL78/ZLWcompressionschemesthephrasesaretypicallystoredinadictionarytrieforfastaccess.
DetailsofZiv-LempelcompressioncanbefoundinSection7.
2.
1.
5.
1CompressedApproximateStringMatchingRecallthatgivenstringsPandQandanerrorthresholdk,theapproximatestringmatchingproblemistondallendingpositionsofsubstringsofQwhoseunit-coststringeditdistancetoPisatmostk.
LetmandudenotethelengthofPandQ,respectively.
Forourpurposesweareparticularlyinterestinginthefastalgorithmsforsmallvaluesofk,namely,theO(uk)timealgorithmbyLandauandVishkin[LV89]andthemorerecentO(uk4/m+u)timealgorithmduetoColeandHariharan[CH02](weassumew.
l.
o.
g.
thatkdepth(nca(v1,v3)),nca(v1,v3)=nca(v2,v3),nca(w1,w2)=nca(w1,w3),andnca(w1,w3)=nca(w2,w3).
(c)isnotaless-constrainedmappingsincedepth(nca(v1,v2))>depth(nca(v1,v3))andnca(v1,v3)=nca(v2,v3),whilenca(w1,w3)=nca(w2,w3)Inthepaper[LST01]analgorithmfortheorderedversionoftheless-constrainededitdistanceproblemusingO(|T1||T2|I31I32(I1+I2))timeandspaceispresented.
Fortheunorderedversion,unliketheconstrainededitdistanceproblem,itisshownthattheproblemisNP-complete.
Thereductionusedissimilartotheonefortheunorderededitdistanceproblem.
ItisalsoreportedthattheproblemisMAXSNP-hard.
Fur-thermore,itisshownthatthereisnoabsoluteapproximationalgorithm2fortheunorderedless-constrainededitdistanceproblemunlessP=NP.
2AnapproximationalgorithmAisabsoluteifthereexistsaconstantc>0suchthatforeveryinstanceI,|A(I)OPT(I)|≤c,whereA(I)andOPT(I)aretheapproximateandoptimalsolutionsofIrespectively[Mot92].
292.
3.
5OtherVariantsInthissectionwesurveyresultsforothervariantsofeditdistance.
LetT1andT2berootedtrees.
TheunitcosteditdistancebetweenT1andT2isdenedasthenumberofeditoperationsneededtoturnT1intoT2.
In[SZ90]theorderedversionofthisproblemisconsideredandafastalgorithmispresented.
IfuistheunitcosteditdistancebetweenT1andT2thealgorithmrunsinO(u2min{|T1|,|T2|}min{L1,L2})time.
ThealgorithmusestechniquesfromUkkonen[Ukk85b]andLandauandVishkin[LV89].
In[Sel77]Selkowconsideredaneditdistanceproblemwhereinsertionsanddeletionsarerestrictedtoleavesofthetrees.
Thiseditdistanceissometimesreferredtoasthe1-degreeeditdistance.
HegaveasimplealgorithmusingO(|T1||T2|)timeandspace.
AnothereditdistancemeasurewhereeditoperationsworkonsubtreesinsteadofnodeswasgivenbyLu[Lu79].
AsimilareditdistancewasgivenbyTanakain[TT88,Tan95].
AshortdescriptionofLu'salgorithmcanbefoundin[SZ97].
2.
4TreeAlignmentDistanceInthissectionweconsiderthealignmentdistanceproblem.
LetT1andT2berooted,labeledtreesandletγbeametriccostfunctiononpairsoflabelsasdenedinSection2.
2.
AnalignmentAofT1andT2isobtainedbyrstinsertingnodeslabeledwithλ(calledspaces)intoT1andT2sothattheybecomeisomorphicwhenlabelsareignored,andthenoverlayingtherstaugmentedtreeontheotherone.
ThecostofapairofopposinglabelsinAisgivenbyγ.
ThecostofAisthesumofcostsofallopposinglabelsinA.
AnoptimalalignmentofT1andT2,isanalignmentofT1andT2ofminimumcost.
Wedenotethiscostbyα(T1,T2).
Figure2.
5showsanexample(from[JWZ95])ofanorderedalignment.
aa(a,a)edbf(e,λ)(λ,f)bccd(b,b)(c,λ)(λ,c)(d,d)(a)(b)(c)Figure2.
5:(a)TreeT1.
(b)TreeT2.
(c)AnalignmentofT1andT2.
Thetreealignmentdistanceproblemisaspecialcaseofthetreeeditingproblem.
Infact,itcorrespondstoarestrictededitdistancewhereallinsertionsmustbeperformedbeforeanydeletions.
Hence,δ(T1,T2)≤α(T1,T2).
Forinstance,assumethatalleditoperationshavecost1andconsidertheexampleinFigure2.
5.
Theoptimalsequenceofeditoperationsisachievedbydeletingthenodelabeledeandtheninsertingthenodelabeledf.
Hence,theeditdistanceis2.
Theoptimalalignment,however,isthetreedepictedin(c)withavalueof4.
Additionally,italsofollowsthatthealignmentdistancedoesnotsatisfythetriangleinequalityandhenceitisnotadistancemetric.
Forinstance,inFigure2.
5ifT3isT1wherethenodelabeledeisdeleted,thenα(T1,T3)+α(T3,T2)=2>4=α(T1,T2).
Itisawellknownfactthateditandalignmentdistanceareequivalentintermsofcomplexityforsequences,see,e.
g.
,Guseld[Gus97].
However,fortreesthisisnottruewhichwewillshowinthefollowingsections.
InSection2.
4.
1andSection2.
4.
2wesurveytheresultsfortheorderedandunorderedtreealignmentdistanceproblemrespectively.
302.
4.
1OrderedTreeAlignmentDistanceInthissectionweconsidertheorderedtreealignmentdistanceproblem.
LetT1andT2betworooted,orderedandlabeledtrees.
TheorderedtreealignmentdistanceproblemwasintroducedbyJiangetal.
in[JWZ95].
ThealgorithmpresentedthereusesO(|T1||T2|(I1+I2)2)timeandO(|T1||T2|(I1+I2))space.
Hence,forsmalldegreetrees,thisalgorithmisingeneralfasterthanthebestknownalgorithmfortheeditdistance.
Wepresentthisalgorithmindetailinthenextsection.
Recently,in[JL01],anewalgorithmwasproposeddesignedforsimilartrees.
Specically,ifthereisanoptimalalignmentofT1andT2usingatmostsspacesthealgorithmcomputesthealignmentintimeO((|T1|+|T2|)log(|T1|+|T2|)(I1+I2)4s2).
Thisalgorithmworksinawaysimilartothefastalgorithmsforcomparingsimilarsequences,see,e.
g.
,Section3.
3.
4in[SM97].
ThemainideaistospeedupthealgorithmofJiangetal.
byonlyconsideringsubtreesofT1andT2whosesizesdierbyatmostO(s).
2.
4.
1.
1Jiang,Wang,andZhang'sAlgorithmInthissectionwepresentthealgorithmofJiangetal.
[JWZ95].
Weonlyshowhowtocomputethealignmentdistance.
Thecorrespondingalignmentcaneasilybeconstructedwithinthesamecomplexitybounds.
Letγbeametriccostfunctiononthelabels.
Forsimplicity,wewillrefertonodesinsteadoflabels,thatis,wewilluse(v,w)fornodesvandwtomean(label(v),label(w)).
Here,vorwmaybeλ.
Weextendthedenitionofαtoincludealignmentsofforests,thatis,α(F1,F2)denotesthecostofanoptimalalignmentofforestF1andF2.
Lemma7Letv∈V(T1)andw∈V(T2)withchildrenv1,viandw1,wjrespectively.
Then,α(θ,θ)=0α(T1(v),θ)=α(F1(v),θ)+γ(v,λ)α(θ,T2(w))=α(θ,F2(w))+γ(λ,w)α(F1(v),θ)=ik=1α(T1(vk),θ)α(θ,F2(w))=jk=1α(θ,T2(wk))Lemma8Letv∈V(T1)andw∈V(T2)withchildrenv1,viandw1,wjrespectively.
Then,α(T1(v),T2(w))=minα(F1(v),F2(w))+γ(v,w)α(θ,T2(w))+min1≤r≤j{α(T1(v),T2(wr))α(θ,T2(wr)}α(T1(v),θ)+min1≤r≤i{α(T1(vr),T2(w))α(T1(vr),θ)}Proof.
ConsideranoptimalalignmentAofT1(v)andT2(w).
Therearefourcases:(1)(v,w)isalabelinA,(2)(v,λ)and(k,w)arelabelsinAforsomek∈V(T1),(3)(λ,w)and(v,h)arelabelsinAforsomeh∈V(T2)or(4)(v,λ)and(λ,w)areinA.
Case(4)neednotbeconsideredsincethetwonodescanbedeletedandreplacedbythesinglenode(v,w)asthenewroot.
Thecostoftheresultingalignmentisbythetriangleinequalityatleastassmall.
Case1:TherootofAislabeledby(v,w).
Hence,α(T1(v),T2(w))=α(F1(v),F2(w))+γ(v,w)Case2:TherootofAislabeledby(v,λ).
Hence,k∈V(T1(ws))forsome1≤r≤i.
Itfollowsthat,α(T1(v),T2(w))=α(T1(v),θ)+min1≤r≤i{α(T1(vr),T2(w))α(T1(vr),θ)}31Case3:Symmetrictocase2.
Lemma9Letv∈V(T1)andw∈V(T2)withchildrenv1,viandw1,wjrespectively.
Foranys,tsuchthat1≤s≤iand1≤t≤j,α(F1(v1,vs),F2(w1,wt))=minα(F1(v1,vs1),F2(w1,wt1))+α(T1(vs),T2(wt))α(F1(v1,vs1),F2(w1,wt))+α(T1(vs),θ)α(F1(v1,vs),F2(w1,wt1))+α(θ,T2(wt))γ(λ,wt)+min1≤k1.
ComputeR1:=Emb(v1)andsetU1:={(r,r)|r∈R1}.
Fori:=2tok,computeRi:=Emb(vi)andUi:=Mop(Ui1,Ri).
Finally,computeR:=Deep(Fl(Deep(Nca(Uk)),label(v))).
IfR=stopandreportthatthereisnodeepembeddingofP(v)inT.
OtherwisereturnR.
Lemma13FortreesPandTandnodev∈V(P),Emb(v)computesthesetofdeepoccurrencesofP(v)inT.
Proof.
ByinductiononthesizeofthesubtreeP(v).
Ifvisaleafweimmediatelyhavethatemb(v,T)=Deep(Fl(L(T),label(v)))andthuscase1follows.
Supposethatvisaninternalnodewithk≥1childrenv1,vk.
Weshowthatemb(P(v),T)=Emb(v).
Considercases2and3ofthealgorithm.
Ifk=1wehavethatw∈Emb(v)impliesthatlabel(w)=label(v)andthereisanodew1∈Emb(v1)suchthat(parent(w1),label(v))=w,thatis,nonodeonthepathbetweenw1andwislabeledlabel(v).
ByinductionEmb(v1)=emb(P(v1),T)andthereforewistherootofanembeddingofP(v)inT.
SinceEmb(v)isthedeepsetofallsuchnodesitfollowsthatw∈emb(P(v),T).
Conversely,ifw∈emb(P(v),T)thenlabel(w)=label(v),thereisanodew1∈emb(P(v1),T)suchthatww1,andnonodeonthepathbetweenwandw1islabeledlabel(v),thatis,(w1,label(v))=w.
Hence,w∈Emb(v).
Beforeconsideringcase3werstshowthatUj=mop(Emb(v1)Emb(vj))byinductiononj,2≤j≤k.
Forj=2itfollowsfromthedenitionofMopthatU2=mop(Emb(v1),Emb(v2)).
Hence,assumethatj>2.
WehaveUj=Mop(Uj1,Emb(vj))=Mop(mop(Emb(v1)Emb(vj1)),Rj).
BydenitionofMop,Ujisthesetofpairssuchthatforanypair(r1,rj1)∈mop(Emb(v1)Emb(vj1)),(r1,rj)∈Uji(rj1,rj)∈mop(mop(Emb(v1)Emb(vj1))|2,Rj).
ByLemma11itfollowsthat(r1,rj)∈Uji(r1,rj)∈mop(Emb(v1)Emb(vj)).
Nextconsiderthecasewhenk>1.
Ifw∈Emb(v)wehavethatlabel(w)=label(v)andtherearenodes(w1,wk)∈mop(emb(P(v1),Temb(P(vk),T))suchthatw=(nca(w1,wk),label(v)).
Clearly,wistherootofanembeddingofP(v)inT.
Assumeforcontradictionthatwisnotadeepembedding,thatis,wuforsomenodeu∈emb(P(v),T).
Sincew=(nca(w1,wk),label(v))theremustbenodesu1uk,suchthatui∈emb(P(vi),T)andu=(nca(u1,uk),label(v)).
However,thiscontradictsthefactthat(w1,wk)∈mop(emb(P(v1),Temb(P(vk),T)).
Ifw∈emb(P(v),T)asimilarargument44PTa1ab2a4bbaa3aababa(a)(b)aabbabbaaababaababaa(c)(d)aabbabbaaababaababaa(e)(f)Figure3.
3:ComputingthedeepoccurrencesofPintoTdepictedin(a)and(b)respectively.
ThenodesinParenumbered1–4foreasyreference.
(c)Case1ofEmb:ThesetEmb(3).
Since3and4areleavesandlabel(3)=label(4)wehaveEmb(3)=Emb(4).
(d)Case2ofEmb.
ThesetEmb(2).
NotethatthemiddlechildoftherootofTisnotinthesetsinceitisnotadeepoccurrence.
(e)Case3ofEmb:Thetwominimalorderedpairsof(d)and(c).
(f)Thenearestcommonancestorsofbothpairsin(e)givetherootnodeofTwhichistheonly(deep)occurrenceofP.
45impliesthatw∈Emb(v).
ThesetL(T)isdeepandinalltreecasesofEmb(V)thereturnedsetisalsodeep.
ByinductionitfollowsthattheinputtoParent,Fl,Nca,andMopisalwaysdeep.
Wewillusethisfacttoouradvantageinthefollowingalgorithms.
3.
4ASimpleTreeInclusionAlgorithmInthissectionweapresentasimpleimplementationofthesetprocedureswhichleadstoanecienttreeinclusionalgorithm.
Subsequently,wemodifyoneoftheprocedurestoobtainafamilyoftreeinclusionalgorithmswherethecomplexitiesdependonthesolutiontoawell-studiedproblemknownasthetreecolorproblem.
3.
4.
1PreprocessingTocomputedeepembeddingswerequireadatastructureforTwhichallowsus,foranyv,w∈V(T),tocomputencaT(v,w)anddetermineifvworvw.
Inlineartimewecancomputepre(v)andpost(v)forallnodesv∈V(T),andwiththeseitisstraightforwardtotestthetwoconditions.
Furthermore,Lemma14(HarelandTarjan[HT84])ForanytreeTthereisadatastructureusingO(nT)spaceandpreprocessingtimewhichsupportsnearestcommonancestorqueriesinO(1)time.
Hence,ourdatastructureuseslinearpreprocessingtimeandspace(seealso[BFC00,AGKR04]formorerecentnearestcommonancestordatastructures).
3.
4.
2ImplementationoftheSetProceduresToanswertreeinclusionquerieswegiveanecientimplementationofthesetprocedures.
Theideaistorepresentsetsofnodesandsetsofpairsofnodesinaleft-to-rightorderusinglinkedlists.
Forthispurposeweintroducesomehelpfulnotation.
LetX=[x1,xk]bealinkedlistofnodes.
ThelengthofX,denoted|X|,isthenumberofelementsinXandthelistwithnoelementsiswritten[].
TheithnodeofX,denotedX[i],isxi.
GivenanynodeythelistobtainedbyappendingytoX,isthelistXy=[x1,xk,y].
Ifforalli,1≤i≤|X|1,X[i]X[i+1]thenXisorderedandifX[i]X[i+1]thenXissemiordered.
AlistY=[(x1,zk)xk,zk)]isanodepairlist.
Byanalogy,wedenelength,append,etc.
forY.
ForapairY[i]=(xi,zi)deneY[i]1=xiandY[i]2=zi.
Ifthelists[Y[1]1,Y[k]1]and[Y[1]2,Y[k]2]arebothorderedorsemiorderedthenYisorderedorsemiordered,respectively.
Thesetproceduresareimplementedusingnodelists.
Alllistsusedintheproceduresareeitherorderedorsemiordered.
AsnotedinSection3.
3wemayassumethattheinputtoalloftheprocedures,exceptDeep,representadeepset,thatis,thecorrespondingnodelistornodepairlistisordered.
WeassumethattheinputlistgiventoDeepissemiorderedandtheoutput,ofcourse,isordered.
Hence,theoutputofalltheothersetproceduresmustbesemiordered.
InthefollowingletXbeanodelist,Yanodepairlist,andαacharacterinΣ.
Thedetailedimplementationofthesetproceduresisgivenbelow.
WeshowthecorrectnessinSection3.
4.
3anddiscussthecomplexityinSection3.
4.
4.
Parent(X):Returnthelist[parent(X[1]parent(X[|X|])].
Nca(Y):Returnthelist[nca(Y[1]nca(Y[|Y|])].
Deep(X):Initially,setx:=X[1]andR:=[].
Fori:=2to|X|do:ComparexandX[i].
Therearethreecases:46Y[j]2Y[i]2X[h]x(a)Y[j]2Y[i]2x=X[h](b)Figure3.
4:Case1and2fromtheimplementationofMop.
In(a)wehaveY[i]2x.
In(b)wehaveY[j]2Y[i]2x=X[h].
1.
xX[i].
SetR:=Rxandx:=X[i].
2.
xX[i].
Setx:=X[i].
3.
X[i]x.
Donothing.
ReturnRx.
TheimplementationofprocedureDeeptakesadvantageofthefactthattheinputlistissemiordered.
Incase1nodeX[i]totherightofour"potentialoutputnode"x.
SinceanynodethatisadescendantofxmustbetotherightofX[i]itcannotnotappearlaterinthelistXthanX[i].
WecanthussafelyaddxtoRatthispoint.
Incase2nodexisanancestorofX[i]andcanthusnotbeintheoutputlist.
Incase3nodeX[i]isanancestorofxandcanthusnotbeintheoutputlist.
Mop(Y,X):Initially,setR:=[].
FindthesmallestjsuchthatY[1]2X[j]andsety:=Y[1]1,x:=X[j],andh:=j.
Ifnosuchjexistsstop.
Fori:=2to|Y|do:Seth:=h+1untilY[i]2X[h]orh>|X|.
Ifh>|X|stopandreturnR:=R(y,x).
Otherwise,compareX[h]andx.
Therearetwocases:1.
IfxX[h]setR:=R(y,x),y:=Y[i]1,andx:=X[h].
2.
Ifx=X[h]sety:=Y[i]1.
ReturnR:=R(y,x).
InprocedureMopwehavea"potentialpair"(y,x)wherey=Y[i]1forsomeiandY[i]2x.
Letjbetheindexsuchthaty=Y[j]1.
Incase1wehavexX[h]andalsoY[j]2Y[i]2sincetheinputlistsareordered(seeFigure3.
4(a)).
Therefore,(y,x)isinsertedintoR.
Incase2wehavex=X[h],i.
e.
,Y[i]2x,andasbeforeY[j]2Y[i]2(seeFigure3.
4(b)).
Therefore(y,x)cannotbeintheoutput,andweset(Y[i]1,x)tobethenewpotentialpair.
Fl(X,α):Initially,setZ:=X,R:=[],andS:=[].
RepeatuntilZ:=[]:Fori:=1to|Z|do:Iflabel(Z[i])=αsetR:=Insert(Z[i],R).
OtherwisesetS:=Sparent(Z[i]).
SetS:=Deep(S),W:=Deep(S,R),andS:=[].
ReturnR.
47TheprocedureFlcallstwoauxiliaryprocedures:Insert(x,R)thattakesanorderedlistRandinsertthenodexsuchthattheresultinglistisordered,andDeep(S,R)thattakestwoorderedlistsandreturnstheorderedlistrepresentingthesetDeep(S∪R)∩S,i.
e.
,Deep(S,R)=[s∈S|z∈R:sz].
BelowwedescribeinmoredetailhowtoimplementFltogetherwiththeauxiliaryprocedures.
WeuseonedoublylinkedlisttorepresentallthelistsZ,S,andR.
ForeachelementinZwehavepointersPredandSuccpointingtothepredecessorandsuccessorinthelist,respectively.
WealsohaveateachelementapointerNextpointingtothenextelementinZ.
InthebeginningNext=Succforallelements,sinceallelementsinthelistareinZ.
WhengoingthroughZinoneiterationwesimplefollowtheNextpointers.
WhenFlcallsInsert(Z[i],R)wesetNext(Pred(Z[i]))toNext(Z[i]).
Thatis,allnodesinthelistnotinZ,i.
e.
,nodesnothavingaNextpointerpointingtothem,areinR.
WedonotexplicitlymaintainS.
InsteadwejustsetsaveParent(Z[i])atthepositioninthelistinsteadofZ[i].
NowDeep(S)canbeperformedfollowingtheNextpointersandremovingelementsfromthedoublylinkedlistaccordinglytoprocedureDeep.
ItremainstoshowhowtocalculateDeep(S,R).
ThiscanbedonebyrunningthroughSfollowingtheNextpointers.
AteachnodescomparePred(s)andSucc(s)withs.
Ifoneofthemisadescendantofsremovesfromthedoublylinkedlist.
UsingthislinkedlistimplementationDeep(S,R)takestimeO(|S|),whereasusingDeeptocalculatethiswouldhaveusedtimeO(|S|+|R|).
3.
4.
3CorrectnessoftheSetProceduresClearly,ParentandNcaarecorrect.
ThefollowinglemmasshowthatDeep,Fl,andMoparealsocorrectlyimplemented.
Fornotationalconveniencewewritex∈X,foralistX,ifx=X[i]forsomei,1≤i≤|X|.
Lemma15ProcedureDeep(X)iscorrect.
Proof.
LetybeanelementinX.
WewillrstprovethatiftherearenodescendantsofyinX,i.
e.
,X∩V(T(y))=,theny∈R.
SinceX∩V(T(y))=wemustatsomepointduringtheprocedurehavex=y,andxwillnotchangebeforexisaddedtoR.
IfyoccursseveraltimesinXwewillhavex=yeachtimewemeetacopyofy(excepttherst)anditfollowsfromtheimplementationthatywilloccurexactlyonceinR.
WewillnowprovethatifthereareanydescendantsofyinV,i.
e.
,X∩V(T(y))=,theny∈R.
LetzbetherightmostanddeepestdescendantofyinV.
Therearetwocases:1.
yisbeforezinX.
Lookatthetimeintheexecutionoftheprocedurewhenwelookatz.
Therearetwocases.
(a)x=y.
Sinceyzwesetx=zandproceed.
Itfollowsthaty∈R.
(b)x=x′=y.
SinceanynodetotheleftofyalsoistotheleftofzandXisansemiorderedlistwemusthavex′∈V(T(y))andthusy∈R.
2.
yisafterzinX.
SincezistherightmostanddeepestdescendantofyandVissemiorderedwemusthavex=zatthetimeintheprocedurewherewelookaty.
Thereforey∈R.
IfyoccursseveraltimesinX,eachcopywillbetakencareofbyeithercase1or2.
Lemma16ProcedureMop(Y,X)iscorrect.
Proof.
Wewanttoshowthatforany1≤licanbeadescendantofanodeinXi.
Ifz∈RinthiscallthenzcannotbeinZinanylatercalls.
ToseethislookatthetimewhenzremovedfromZ.
SincethesetZ∪RisdeepatalltimesnodescendantofzwillappearinZlaterinthiscalltoFl,andnonodeinRcanbeadescendantofz.
SinceanynodeinXj,j>i,isanancestorofsomenodeinDeep(Fl(Xi,αi))neitherzoranydescendantofzcanbeinanyXj,j>i.
ThuszcannotappearinZinanylatercallstoFl.
Nowifz∈Rthenwemighthavez∈Xi+1.
Inthatcase,zwillappearinZintherstiterationoftheprocedurecallFl(Xi+1,αi),butnotinanylatercallssincethelistsaredisjoint,andsincenonodeinXjwherej>i+1canbeadescendantofanodeinXi+1.
Ifz∈Randz∈Xi+1thenclearlyzcannotappearinZinanylatercall.
ThusanodeinTisinZatmosttwiceduringallthecalls.
3.
4.
5ComplexityoftheTreeInclusionAlgorithmUsingthenodelistimplementationofthesetproceduresweget:Theorem6FortreesPandTthetreeinclusionproblemcanbesolvedinO(lPnT)timeandO(nT)space.
Proof.
ByLemma18wecanpreprocessTinO(nT)timeandspace.
Letg(n)denotethetimeusedbyFlonalistoflengthn.
ConsiderthetimeusedbyEmb(root(P)).
Weboundthecontributionforeachnodev∈V(P).
FromLemma18itfollowsthatifvisaleafthecostofvisatmostO(g(lT)).
Hence,byLemma19,thetotalcostofallleavesisO(lPg(lT))=O(lPnT).
IfvhasasinglechildwthecostisO(g(|Emb(w)|)).
IfvhasmorethanonechildthecostofMop,Nca,andDeepisboundedbyw∈child(v)O(|Emb(w)|).
Furthermore,sincethelengthoftheoutputofMop(andthusNca)isatmostz=minw∈child(v)|Emb(w)|thecostofFlisO(g(z)).
Hence,thetotalcostforinternalnodesis,v∈V(P)\L(P)Og(minw∈child(v)|Emb(w)|)+w∈child(v)|Emb(w)|≤v∈V(P)O(g(|Emb(v)|)).
(3.
1)Nextwebound(3.
1).
Foranyw∈child(v)wehavethatEmb(w)andEmb(v)aredisjointorderedlists.
FurthermorewehavethatanynodeinEmb(v)mustbeanancestorofanodeinDeep(Fl(Emb(w),label(v))).
Hence,byLemma19,foranyleaftorootpathδ=v1,vkinP,wehavethatu∈δg(|Emb(u)|)≤O(nT).
LetdenotethesetofallroottoleafpathsinP.
Itfollowsthat,v∈V(T)g(|Emb(v)|)≤p∈u∈pg(|Emb(u)|)≤O(lPnT).
Sincethistimedominatesthetimespentattheleavesthetimeboundfollows.
NextconsiderthespaceusedbyEmb(root(P)).
ThepreprocessingofSection3.
4.
1usesonlyO(nT)space.
Furthermore,byin-ductiononthesizeofthesubtreeP(v)itfollowsimmediatelythatateachstepinthealgorithmatmostO(maxv∈V(P)|Emb(v)|)spaceisneeded.
SinceEmb(v)isadeepembedding,itfollowsthat|Emb(v)|≤lT.
503.
4.
6AnAlternativeAlgorithmInthissectionwepresentanalternativealgorithm.
SincethetimecomplexityofthealgorithmintheprevioussectionisdominatedbythetimeusedbyFl,wepresentanimplementationofthisprocedurewhichleadstoadierentcomplexity.
Denearstlabeldatastructureasadatastructuresupportingqueriesoftheform(v,α),v∈V(T),α∈Σ.
Maintainingsuchadatastructureisknownasthetreecolorproblem.
Thisisawell-studiedproblem,seee.
g.
[Die89,MM96,FM96,AHR98].
WithsuchadatastructureavailablewecancomputeFlasfollows,Fl(X,α):ReturnthelistR:=[(X[1],α)(X[|X|],α)].
Theorem7LetPandTbetrees.
Givenarstlabeldatastructureusings(nT)space,p(nT)preprocessingtime,andq(nT)timeforqueries,thetreeinclusionproblemcanbesolvedinO(p(nT)+nPlT·q(nT))timeandO(s(nT)+nT)space.
Proof.
ConstructingtherstlabeldatastructuresusesO(s(nT))andO(p(nT))time.
AsintheproofofTheorem6wehavethatthetotaltimeusedbyEmb(root(P))isboundedbyv∈V(P)g(|Emb(v)|),whereg(n)isthetimeusedbyFlonalistoflengthn.
SinceEmb(v)isadeepembeddingandeachtakesq(nT)wehave,v∈V(P)g(|Emb(v)|)≤v∈V(P)g(lT)=nPlT·q(nT).
Severalrstlabeldatastructuresareavailable,forinstance,ifwewanttomaintainlinearspacewehave,Lemma20(Dietz[Die89])ForanytreeTthereisadatastructureusingO(nT)space,O(nT)expectedpreprocessingtimewhichsupportsrstlabelqueriesinO(loglognT)time.
Theexpectationinthepreprocessingtimeisduetoperfecthashing.
SinceourdatastructuredoesnotneedtosupportecientupdateswecanremovetheexpectationbyusingthedeterministicdictionaryofHagerupet.
al.
[HMP01].
Thisgivesaworst-casepreprocessingtimeofO(nTlognT),however,usingasimpletwo-levelapproachthiscanbereducedtoO(nT)(seee.
g.
[Tho03]).
Plugginginthisdatastructureweobtain,Corollary1FortreesPandTthetreeinclusionproblemcanbesolvedinO(nPlTloglognT+nT)timeandO(nT)space.
3.
5AFasterTreeInclusionAlgorithmInthissectionwepresentanewtreeinclusionalgorithmwhichhasaworst-casesubquadraticrunningtime.
AsdiscussedintheintroductionthegeneralideaistodivideTintoclustersoflogarithmicsizewhichwecanecientlypreprocessandthenusethistospeedupthecomputationwithalogarithmicfactor.
3.
5.
1ClusteringInthissectionwedescribehowtodivideTintoclustersandhowthemacrotreeiscreated.
ForsimplicityinthepresentationweassumethatTisabinarytree.
IfthisisnotthecaseitisstraightforwardtoconstructabinarytreeB,wherenB≤2nT,andamappingg:V(T)→V(B)suchthatforanypairofnodesv,w∈V(T),label(v)=label(g(v)),vwig(v)g(w),andvwig(v)g(w).
IfthenodesinthesetU=V(B)\{g(v)|v∈V(T)}isassignedaspeciallabelβ∈ΣitfollowsthatforanytreeP,PTiPB.
LetCbeaconnectedsubgraphofT.
AnodeinV(C)incidenttoanodeinV(T)\V(C)isaboundarynode.
TheboundarynodesofCaredenotedbyδC.
AclusterofCisaconnectedsubgraphofCwith51atmosttwoboundarynodes.
AsetofclustersCSisaclusterpartitionofTiV(T)=∪C∈CSV(C),E(T)=∪C∈CSE(C),andforanyC1,C2∈CS,E(C1)∩E(C2)=,|E(C1)|≥1,root(T)∈δCifroot(T)∈V(C).
If|δC|=1wecallCaleafclusterandotherwiseaninternalcluster.
WeusethefollowingrecursiveprocedureClusterT(v,s),adoptedfrom[AR02],whichcreatesaclusterpartitionCSofthetreeT(v)withthepropertythat|CS|=O(s)and|V(C)|≤nT/s.
Asimilarclusterpartitioningachievingthesameresultfollowsfrom[AHT00,AHdLT97,Fre97].
ClusterT(v,s):Foreachchilduofvtherearetwocases:1.
|V(T(u))|+1≤nT/s.
Letthenodes{v}∪V(T(u))bealeafclusterwithboundarynodev.
2.
|V(T(u))|>nT/s.
Pickanodew∈V(T(u))ofmaximumdepthsuchthat|V(T(u))|+2|V(T(w))|≤nT/s.
LetthenodesV(T(u))\V(T(w))∪{v,w}beaninternalclusterwithboundarynodesvandw.
Recursively,computeClusterT(w,s).
Lemma21GivenatreeTwithnT>1nodes,andaparameters,wherenT/s≥2,wecanbuildaclusterpartitionCSinO(nT)time,suchthat|CS|=O(s)and|V(C)|≤nT/sforanyC∈CS.
Proof.
TheprocedureClusterT(root(T),s)clearlycreatesaclusterpartitionofTanditisstraightforwardtoimplementinO(nT)time.
Considerthesizeoftheclusterscreated.
Therearetwocasesforu.
Incase1,|V(T(u))|+1≤nT/sandhencetheclusterC={v}∪V(T(u))hassize|V(C)|≤nT/s.
Incase2,|V(T(u))|+2|V(T(w))|≤nT/sandhencetheclusterC=V(T(u))\V(T(w))∪{v,w}hassize|V(C)|≤nT/s.
Nextconsiderthesizeoftheclusterpartition.
Letc=nT/s.
WesaythataclusterCisbadif|V(C)|≤c/2andgoodotherwise.
Wewillshowthatatleastaconstantfractionoftheclustersintheclusterpartitionaregood.
ItiseasytoverifythattheclusterpartitioncreatedbyprocedureClusterhasthefollowingproperties:(i)LetCbeabadinternalclusterwithboundarynodesvandw(vw).
Thenwhastwochildrenwithatleastc/2descendantseach.
(ii)LetCbeabadleafclusterwithboundarynodev.
Thentheboundarynodeviscontainedinagoodcluster.
By(ii)thenumberofbadleafclustersisnolargerthantwicethenumberofgoodinternalclusters.
By(i)eachbadinternalclusterCissharingitslowestboundarynodeofCwithtwootherclusters,andeachofthesetwoclustersareeitherinternalclustersorgoodleafclusters.
Thistogetherwith(ii)showsthatnumberofbadclustersisatmostaconstantfractionofthetotalnumberofclusters.
Sinceagoodclusterisofsizemorethanc/2,therecanbeatmost2sgoodclustersandthus|CS|=O(s).
LetC∈CSbeaninternalclusterv,w∈δC.
ThespinepathofCisthepathbetweenv,wexcludingvandw.
Anodeonthespinepathisaspinenode.
Anodetotheleftandrightofv,w,oranynodeonthespinepathisaleftnodeandrightnode,respectively.
IfCisaleafclusterwithv∈δCthenanyproperdescendantofvisaleafnode.
LetCSbeaclusterpartitionofTasdescribedinLemma21.
WedeneanorderedmacrotreeM.
OurdenitionofMmaybeviewedasan"ordered"versionofthemacrotreedenedin[AR02].
ThenodesetV(M)consistsoftheboundarynodesinCS.
Additionally,foreachinternalclusterC∈CS,v,w∈δC,vw,wehavethenodess(v,w),l(v,w)andr(v,w)andedges(v,s(v,w)),(s(v,w),w),(l(v,w),s(v,w)),and(r(v,w),s(v,w)).
Thenodesareorderedsuchthatl(v,w)wr(v,w).
ForeachleafclusterC,v∈δC,wehavethenodel(v)andedge(l(v),v).
Sinceroot(T)isaboundarynodeMisrootedatroot(T).
Figure3.
5illustratesthesedenitions.
52vvs(v,w)l(v,w)r(v,w)ww(a)(b)vvl(v)(c)(d)Figure3.
5:Theclusteringandthemacrotree.
(a)Aninternalcluster.
Theblacknodesaretheboundarynodeandtheinternalellipsescorrespondtotheboundarynodes,therightandleftnodes,andspinepath.
(b)Themacrotreecorrespondingtotheclusterin(a).
(c)Aleafcluster.
Theinternalellipsesaretheboundarynodeandtheleafnodes.
(d)Themacrotreecorrespondingtotheclusterin(c).
Toeachnodev∈V(T)weassociateauniquemacronodedenotedc(v).
Letu∈V(C),whereC∈CS.
c(u)=uifuisboundarynode,l(v)ifuisaleafnodeandv∈δC,s(v,w)ifuisaspinenode,v,w∈δC,andvw,l(v,w)ifuisaleftnode,v,w∈δC,andvw,r(v,w)ifuisarightnode,v,w∈δC,andvw.
Conversely,foranymacronodei∈V(M)denethemicroforest,denotedC(i),astheinducedsubgraphofTofthesetofnodes{v|v∈V(T),i=c(v)}.
Wealsoassignasetoflabelstoigivenbylabel(i)={label(v)|v∈V(C(i))}.
IfiisspinenodeoraboundarynodetheuniquenodeinV(C(i))ofgreatestdepthisdenotedbyrst(i).
Finally,foranysetofnodes{i1,ik}V(M)wedeneC(i1,ik)astheinducedsubgraphofthesetofnodesV(C(i1)V(C(ik)).
Thefollowingpropositionsstatesusefulpropertiesofancestors,nearestcommonancestor,andtheleft-to-rightorderinginthemicroforestsandinT.
Thepropositionsfollowsdirectlyfromthedenitionoftheclustering.
SeealsoFigure3.
6.
Proposition2(Ancestorrelations)Foranypairofnodesv,w∈V(T),thefollowinghold(i)Ifc(v)=c(w)thenvTwivC(c(v))w.
(ii)Ifc(v)=c(w),c(v)∈{s(v′,w′),v′},andc(w)∈{l(v′,w′),r(v′,w′)}thenwehavevTwivC(c(v),s(v′,w′),v′)w.
(iii)Inallothercases,wTvic(w)Mc(v).
53v′vww′(a)v′vww′(b)v′vw(c)v′vww′(d)v′wvw′(e)v′wvw′(f)v′vw′w(f)Figure3.
6:Examplesfromthepropositions.
Inallcasesv′andw′aretopandbottomboundarynodesofthecluster,respectively.
(a)Proposition2(ii).
Herec(v)=s(v′,w′)andc(w)=l(v′,w′)(solidellipses).
ThedashedellipsecorrespondstoC(c(v),s(v′,w′),v′).
(b)Proposition3(i)and4(ii).
Herec(v)=c(w)=l(v′,w′)(solidellipse).
ThedashedellipsecorrespondstoC(c(v),s(v′,w′),v′).
(c)Proposition3(ii)and4(i).
Herec(v)=c(w)=l(v′)(solidellipse).
ThedashedellipsecorrespondstoC(c(v),v′).
(d)Proposition3(iii).
Herec(v)=l(v′,w′)andc(w)=s(v′,w′)(solidellipses).
ThedashedellipsecorrespondstoC(c(v),c(w),v′).
(e)Proposition3(iv).
Herec(v)=s(v′,w′)andc(w)=r(v′,w′)(solidellipses).
ThedashedellipsecorrespondstoC(c(v),c(w),v′).
(f)Proposition4(iv).
Herec(v)=r(v′,w′)andc(w)=l(v′,w′)(solidellipses).
ThedashedellipsecorrespondstoC(c(v),c(w),s(v′,w′),v′).
(g)Proposition4(v).
Herec(v)=r(v′,w′)(solidellipse)andw′Mc(w).
ThedashedellipsecorrespondstoC(c(v),s(v′,w′),v′,w′)).
Case(i)saysthatifvandwbelongstothesamemacronodethenvisanancestorofwivisanancestorofwinthemicroforestforthatmacronode.
Case(ii)saysthatifvisaspinenodeoratopboundarynodeandwisaleftorrightnodeinthesameclusterthenvisanancestorofwivisanancestorofwinthemicrotreeinducedbythatcluster(Figure3.
6(a)).
Case(iii)saysthatinallothercasesvisanancestorofwithemacronodevbelongstoisanancestorofthemacronodewbelongstointhemacrotree.
Proposition3(Left-ofrelations)Foranypairofnodesv,w∈V(T),thefollowinghold(i)Ifc(v)=c(w)∈{r(v′,w′),l(v′,w′)}thenvwivC(c(v),v′,s(v′,w′))w.
(ii)Ifc(v)=c(w)=l(v′)thenvwivC(c(v),v′)w.
(iii)Ifc(v)=l(v′,w′)andc(w)=s(v′,w′)thenvwivC(c(v),c(w),v′)w.
(iv)Ifc(v)=s(v′,w′)andc(w)=r(v′,w′)thenvwivC(c(v),c(w),v′)w.
54(v)Inallothercases,vwic(v)Mc(w).
Case(i)saysthatifvandwarebotheitherleftorrightnodesinthesameclusterthenvistotheleftofwivistotheleftofwinthemicrotreeinducedbytheirmacronodetogetherwiththespineandtopboundarynodeofthecluster(Figure3.
6(b)).
Case(ii)saysthatifvandwarebothleafnodesinthesameclusterthenvistotheleftofwivistotheleftofwinthemicrotreeinducedbythatleafcluster(Figure3.
6(c)).
Case(iii)saysthatifvisaleftnodeandwisaspinenodeinthesameclusterthenvistotheleftofwivistotheleftofwinthemicrotreeinducedbytheirtwomacronodesandthetopboundarynodeofthecluster(Figure3.
6(d)).
Case(iv)saysthatifvisaspinenodeandwisarightnodeinthesameclusterthenvistotheleftofwivistotheleftofwinthemicrotreeinducedbytheirtwomacronodesandthetopboundarynodeofthecluster(Figure3.
6(e)).
Inallothercasesvistotheleftofwifthemacronodevbelongstoistotheleftofthemacronodeofwinthemacrotree(Case(v)).
Proposition4(Ncarelations)Foranypairofnodesv,w∈V(T),thefollowinghold(i)Ifc(v)=c(w)=l(v′)thenncaT(v,w)=ncaC(c(v),v′)(v,w).
(ii)Ifc(v)=c(w)∈{l(v′,w′),r(v′,w′)}thenncaT(v,w)=ncaC(c(v),s(v′,w′),v′)(v,w).
(iii)Ifc(v)=c(w)=s(v′,w′)thenncaT(v,w)=ncaC(c(v))(v,w).
(iv)Ifc(v)=c(w)andc(v),c(w)∈{l(v′,w′),r(v′,w′),s(v′,w′)}thenncaT(v,w)=ncaC(c(v),c(w),s(v′,w′),v′)(v,w).
(v)Ifc(v)=c(w),c(v)∈{l(v′,w′),r(v′,w′),s(v′,w′)},andw′Mc(w)thenncaT(v,w)=ncaC(c(v),s(v′,w′),v′,w′)(v,w′).
(vi)Ifc(v)=c(w),c(w)∈{l(v′,w′),r(v′,w′),s(v′,w′)},andw′Mc(v)thenncaT(v,w)=ncaC(c(w′),s(v′,w′),v′,w′)(w,w′).
(vii)Inallothercases,ncaT(v,w)=ncaM(c(v),c(w)).
Case(i)saysthatifvandwareleafnodesinthesameclusterthenthenearestcommonancestorofvandwisthenearestcommonancestorofvandwinthemicrotreeinducedbythatleafcluster(Figure3.
6(c)).
Case(ii)saysthatifvandwarebotheitherleftnodesorrightnodesthenthenearestcommonancestorofvandwisthenearestcommonancestorinthemicrotreeinducedbytheirmacronodetogetherwiththespineandtopboundarynodeofthecluster(Figure3.
6(b)).
Case(iii)saysthatifvandwarebothspinenodesinthesameclusterthenthenearestcommonancestorofvandwisthenearestcommonancestorofvandwinthemicrotreeinducedbytheirmacronode.
Case(iv)saysthatifvandwareindierentmacronodesbutareright,left,orspinenodesinthesameclusterthenthenearestcommonancestorofvandwisthenearestcommonancestorofvandwinthemicrotreeinducedbythatcluster(wecanomitthebottomboundarynode)(Figure3.
6(f)).
Case(v)saysthatifvisaleft,right,orspinenode,andthebottomboundarynodew′ofv'sclusterisanancestorinthemacrotreeofthemacronodecontainingw,thenthenearestcommonancestorofvandwisthenearestcommonancestorofvandw′inthemicrotreeinducedbythemacronodeofv,thespinenode,andthetopandbottomboundarynodesofv'scluster(Figure3.
6(g)).
Case(vi)isthesameascase(v)withvandwinterchanged.
Inallothercasesthenearestcommonancestorofvandwisthenearestcommonancestoroftheirmacronodesinthemacrotree(Case(vii)).
3.
5.
2PreprocessingInthissectionwedescribehowtopreprocessT.
FirstbuildaclusterpartitionCSofthetreeTwithclustersofsizes,tobexedlater,andthecorrespondingmacrotreeMinO(nT)time.
ThemacrotreeispreprocessedasinSection3.
4.
1.
However,sincenodesinMcontainasetoflabels,wenowstoreadictionaryforlabel(v)foreachnodev∈V(M).
Usingthedeterministicusingthedeterministicdictionary55ofHagerupet.
al.
[HMP01]allthesedictionariescanbeconstructedinO(nTlognT)timeandO(nT)space.
Furthermore,weextendthedenitionofsuchthatM(v,α)isthenearestancestorwofvsuchthatα∈label(w).
Nextweshowhowtopreprocessthemicroforests.
ForanyclusterC∈CS,deepsetsX,Y,ZV(C),i∈N,andα∈ΣdenethefollowingproceduresonclusterC.
leftC(i,X):ReturntheleftmostinodesinX.
rightC(i,X):ReturntherightmostinodesinX.
leftofC(X,Y):ReturnallnodesofXtotheleftoftheleftmostnodeinY.
matchC(X,Y,Z),whereX={m1···mk},Y={v1···vk},andZY.
ReturnR:={mj|vj∈Z}.
mopC(X,Y)Returnthepair(R1,R2).
WhereR1=mop(M,N)|1andR2=mop(M,N)|2.
Inadditiontotheseprocedureswealsodenethesetproceduresonclusters,thatis,ParentC,NcaC,DeepC,andFlC,asinSection3.
3.
Collectively,wewillcallthesetheclusterprocedures.
Werepresenttheinputandoutputssetintheproceduresasbitstringsindexedbypreordernumbers.
Specically,asubsetXinaclusterCisgivenbyabitstringb1.
.
.
bs,suchthatbi=1itheithnodeinapreordertraversalofCisinX.
IfCcontainsfewerthansnodesweleavetheremainingvaluesundened.
TheproceduresleftC(i,X)thencorrespondstosettingallbitsinXlargerthantheithsetbittozero.
Similarly,rightC(i,X)correspondstosettingallbitssmallerthantheithlargestsetbittozero.
Similarly,theproceduresleftofC(X,Y),RightofC(X,Y),andmatchC(X,Y,Z)onlydependsonthepreorderofthenodesandthusonlyonthebitstringnotanyotherinformationaboutthecluster.
WecanthusommitthesubscriptCfromtheseveprocedures.
Nextweshowhowtoimplementtheclusterprocedureseciently.
Weprecomputethevalueofallprocedures,exceptFlC,forallpossibleinputsandclusters.
Bydenition,theseproceduresdonotdependonanyspeciclabelingofthenodesinthecluster.
Hence,itsucestoprecomputethevalueforallrooted,orderedtreeswithatmostsnodes.
Thetotalnumberoftheseislessthan22s(considere.
g.
anencodingusingbalancedparenthesis).
Furthermore,thenumberofpossibleinputsetsisatmost2s.
Sinceatmost3setsaregivenasinputtoaclusterprocedure,itfollowsthatwecantabulateallsolutionsusinglessthan23s·22s=25sbitsofmemory.
Hence,choosings≤1/10lognweuseO(212logn)=O(√n)bits.
UsingstandardbitwiseoperationseachsolutioniseasilyimplementedinO(s)timegivingatotaltimeofO(√nlogn).
SincetheprocedureFlCdependsonthealphabet,whichmaybeofsizenT,wecannotecientlyapplythesametrickasabove.
InsteaddeneforanyclusterC∈CS,XV(C),andα∈Σ:AncestorC(X):Returntheset{x|xisanancestorofanodeinX}.
EqC(α):Returntheset{x|x∈V(C),label(x)=α}.
Clearly,AncestorCcanbeimplementedasabove.
ForEqCnotethatthetotalnumberofdistinctlabelsinCisatmosts.
Hence,EqCcanbestoredinadictionarywithatmostsentrieseachofwhichisabitstringoflengths.
Thus,(usingagaintheresultof[HMP01])thetotaltimetobuildallsuchdictionariesisO(nTlognT).
BythedenitionofFlCwehavethat,FlC(X,α)=DeepC(AncestorC(X)∩EqC(α)).
Sinceintersectioncanbeimplementedusingabinaryand-operation,FlC(X,α)canbecomputedinconstanttime.
Later,wewillalsoneedtocomputeunionofbitstringsandwenotethatthiscanbedoneusingabinaryor-operation.
Toimplementthesetproceduresinthefollowingsectionweoftenneedto"restrict"theclusterprocedurestoworkonasubtreeofacluster.
Specically,foranysetofmacronodes{i1,ik}inthesameclusterC56(hence,k≤5),wewillreplacethesubscriptCwithC(i1,ik).
Forinstance,ParentC(s(v,w),l(v,w))(X)={parent(x)|x∈X∩V(C(s(v,w),l(v,w))}∩V(C(s(v,w),l(v,w)).
Toimplementallrestrictedversionsoftheclusterprocedures,wecomputeforeachclusterC∈CSabitstringrepresentingthesetofnodesineachmicroforest.
Clearly,thiscanbedoneinO(nT)time.
Sincethereareatmost5microforestsineachclusteritfollowsthatwecancomputeanyrestrictedversionusinganadditionalconstantnumberofand-operations.
NotethatthetotalpreprocessingtimeandspaceisdominatedbytheconstructionofdeterministicdictionarieswhichuseO(nTlognT)timeandO(nT)space.
3.
5.
3ImplementationoftheSetProceduresUsingthepreprocessingfromtheprevioussectionweshowhowtoimplementthesetproceduresinsublineartime.
Firstwedeneacompactrepresentationofnodesets.
LetTbeatreewithmacrotreeM.
Forsimplicity,weidentifynodesinMwiththeirpreordernumber.
LetSV(T)beanysubsetofnodesofT.
Amicro-macronodearray(abbreviatednodearray)XrepresentingSisanarrayofsizenM.
Theithentry,denotedX[i],representsthesubsetofnodesinC(i),thatis,X[i]=V(C(i))∩S.
ThesetX[i]isencodedusingthesamebitrepresentationasinSection3.
5.
2.
ByourchoiceofparameterintheclusteringthespaceusedforthisrepresentationisO(nT/lognT).
Wenowpresentthedetailedimplementationofthesetproceduresonnodearrays.
LetXbeanodearray.
Parent(X):InitializeanodearrayRofsizenMandseti:=1.
Repeatuntili>nM:Seti:=i+1untilX[i]=.
Therearethreecasesdependingonthetypeofi:1.
i∈{l(v,w),r(v,w)}.
ComputeN:=ParentC(i,s(v,w),v)(X[i]).
Foreachj∈{i,s(v,w),v},setR[j]:=R[j]∪(N∩V(C(j))).
2.
i=l(v).
ComputeN:=ParentC(i,v)(X[i]).
Foreachj∈{i,v},setR[j]:=R[j]∪(N∩V(C(j))).
3.
i∈{l(v,w),r(v,w),l(v)}.
ComputeN:=ParentC(i)(X[i]).
IfN=setR[i]:=R[i]∪N.
Otherwise,ifj:=parentM(i)=⊥setR[j]:=R[j]∪{rst(j)}.
Seti:=i+1.
ReturnR.
ToseethecorrectnessoftheimplementationofprocedureParentconsiderthethreecasesoftheprocedure.
Case1handlesthefactthatleftorrightnodesmayhaveanodeonaspineorboundarynodeasparent.
Sincenoleftorrightnodescanhaveaparentoutsidetheirclusterthereisnoneedtocomputeparentsinthemacrotree.
Case2handlesthefactthataleafnodemayhavetheboundarynodeasparent.
Sincenoleafnodecanhaveaparentoutsideitsclusterthereisnoneedtocomputeparentsinthemacrotree.
Case3handlesboundaryandspinenodes.
Inthiscasethereiseitheraparentwithinthemicroforestorwecanusethemacrotreetocomputetheparentoftherootofthemicrotree.
SincetheinputtoParentisdeepweonlyneedtodooneofthetwothings.
Ifthecomputationofparentinthemicrotreereturnsanodej,thiswilleitherbeaspinenodeoraboundarynode.
Totakecareofthecasewherejisaspinenode,weaddthelowestnode(rst(j))injtotheoutput.
ProcedureParentthuscorrectlycomputesparentforallkindsofmacronodes.
WenowgivetheimplementationofprocedureNca.
TheinputtoprocedureNcaistwonodearraysXandYrepresentingtwosubsetsX,YV(T),|X|=|Y|=k.
TheoutputisanodearrayRrepresentingtheset{nca(Xi,Yi)|1≤i≤k},whereXiandYiistheithelementofXandY,wrt.
totheirpreordernumberinthetree,respectively.
WealsoassumethatwehaveXiYiforalli(sinceNcaisalwayscalledonasetofminimumorderedpairs).
57Nca(X,Y):InitializeanodearrayRofsizenM,seti:=1andj:=1.
Repeatuntili>nMorj>nM:UntilX[i]=seti:=i+1.
UntilY[j]=setj:=j+1.
Compareiandj.
Therearetwocases:1.
i=j.
Therearetwosubcases:(a)iisaboundarynode.
SetR[i]:=X[i],i:=i+1andj:=j+1.
(b)iisnotaboundarynode.
ComparethesizesofX[i]andY[j].
Therearetwocases:–|X[i]|>|Y[j]|.
SetXi:=left(|Y[j]|,X[i]),–|X[i]|=|Y[j]|.
SetXi=X[i].
SetS:=C(i,v),ifi=l(v),C(i,s(v,w),v),ifi∈{l(v,w),r(v,w)},C(i),ifi=s(v,w).
ComputeN:=NcaS(Xi,Yj).
ForeachmacronodehinSsetR[h]:=R[h]∪(N∩V(C(h))).
SetX[i]:=X[i]\Xiandj:=j+1.
2.
i=j.
ComparethesizesofX[i]andY[j].
Therearethreecases:–|X[i]|>|Y[j]|.
SetXi:=left(|Y[j]|,X[i])andYj:=Y[j],–|X[i]||Y[j]|wecomputenearestcommonancestorsoftherst/leftmost|Y[j]|nodesinX[i]andthenodesinY[j].
Duetotheassumptionontheinput(XiYi)weeitherhave|X[i]|>|Y[j]|or|X[i]|=|Y[j]|.
If|X[i]|>|Y[j]|wemustcomputenearestcommonancestorsoftherst/leftmost|Y[j]|nodesinX[i]andthenodesinY[j].
If|X[i]|=|Y[j]|wemustcomputenearestcommonancestorsofallnodesinX[i]andY[j].
WenowcomputenearestcommonancestorsofthedescribednodesinaclusterSdependingonwhatkindofnodeiis.
IfiisaleafnodethenthenearestcommonancestorsofthenodesinX[i]andY[j]iseitheriniorintheboundarynode(Proposition4(i)).
Ifiisaleftorrightnodethenthenearestcommonancestorsmust58beinionthespineorinthetopboundarynode(Proposition4(ii)).
Ifiisaspinenodethenthenearestcommonancestorsmustbeonthespineorinthetopboundarynode(Proposition4(iii)).
Weupdatetheoutputnodearray,removefromX[i]thenodeswehavejustcomputednearestcommonancestorsof,andincrementjsincewehavenowcomputednearestcommonancestorsforallnodesinY[j].
Nowconsiderthecasewherei=j.
FirstwecomparethesizesofthesubsetsrepresentedbyX[i]andY[i].
If|X[i]|>|Y[j]|weshouldcomputenearestcommonancestorsoftherst/leftmost|Y[j]|nodesinX[i]andthenodesinY[j]asinCase1(b).
If|X[i]|nM:Seti:=i+1untilX[i]=.
Comparejandi.
Therearethreecases:1.
ji.
SetS:=C(j,v),ifj=l(v),C(j,s(v,w),v),ifj∈{l(v,w),r(v,w)},C(j),otherwise.
SetR[j]:=DeepS(X[j])andj:=i.
2.
ji.
Ifi∈{l(v,w),r(v,w)}andj=s(v,w)computeN:=DeepC(i,s(v,w),v)(X[i]∪X[j]),andsetR[j]:=R[j]∩N.
Setj:=i.
Seti:=i+1.
SetR[j]:=DeepS(X[j]),whereSissetasinCase1.
ReturnR.
TheaboveDeepprocedureresemblesthepreviousDeepprocedureimplementedonthemacrotreeinthetworstcases.
Thethirdcasefromthepreviousimplementationcanbeomittedsincetheinputlistisnowinpreorder.
Incase1nodeiistotherightofour"potentialoutputnode"j.
Sinceanynodelthatisadescendantofjmustbetotheleftofi(lnMorj>nM:Seti:=i+1untilX[i]=.
Therearethreecases:1.
Ifi=l(v,w)forsomev,wsetj:=j+1untilY[j]=andeitherij,i=j,orj=s(v,w).
2.
Ifi=s(v,w)forsomev,wsetj:=j+1untilY[j]=andeitherijorj=r(v,w).
3.
Ifi∈{r(v,w),l(v)}forsomev,wsetj:=j+1untilY[j]=andeitherijori=j.
4.
Otherwise(iisaboundarynode)setj:=j+1untilY[j]=andij.
Compareiandj.
Therearetwocases:1.
ij:Compares1andj.
Ifs1jsetR[r1]:=R[r1]∪r2,S[s1]:=S[s1]∪s2,and(s1,s2):=(j,leftC(j)(1,Y[j])).
Set(r1,r2):=(i,rightC(i)(1,X[i]))andi=i+1.
2.
Otherwisecompute(r,s):=mopC(i,j,v)(X[i],Y[j]),wherevisthetopboundarynodeintheclusteriandjbelongsto.
Ifr=do:–Compares1andj.
Ifs1jorifs1=jandleftofC(i,j)(X[i],s2)=thensetR[r1]:=R[r1]∪r2,S[s1]:=S[s1]∪s2.
–Set(r1,r2):=(i,r)and(s1,s2):=(j,s).
Therearetwosubcases:(a)i=jori=l(v,w)andj=s(v,w).
SetX[i]:=rightC(i)(X[i])\r2andj:=j+1.
(b)i=s(v,w)andj=r(v,w).
Ifr2=setj:=j+1otherwiseseti:=i+1.
SetR[r1]:=R[r1]∪r2andS[s1]:=S[s1]∪s2.
Return(R,S).
ProcedureMopSimissomewhatsimilartothepreviousimplementationoftheprocedureMopfromSec-tion3.
4.
2.
Weagainhavea"potentialpair"((r1,r2),(s1,s2))butweneedmorecasestotakecareofthedierentkindsofmacronodes.
Werstndthenextnon-emptymacronodei.
Wethenhave4casesdependingonwhichkindofnodeiis.
InCase1iisaleftnode.
DuetoProposition3wecanhavemopini(case(i)),inthespine(case(iii)),60orinanodetotheleftofi(case(v)).
InCase2iisaspinenode.
DuetoProposition3wecanhavemopintherightnode(case(iv))orinanodetotheleftofi(case(v)).
InCase3iisarightnodeoraleafnode.
DuetoProposition3wecanhavemopini(case(i)and(ii))orinanodetotheleftofi(case(v)).
Inthelastcase(Case5)imustbeaboundarynodeandmopmustbeinanodetotheleftofi.
Wethencompareiandj.
Thecasewereijissimilartothepreviousimplementationoftheprocedure.
Wecomparejwithourpotentialpair.
Ifs1jthenwecaninsertr2ands2intoouroutputnodearraysRandS,respectively.
Wealsosets1tojands2totheleftmostnodeinY[j].
Then—bothifs1jors1=j—wesetr1toiandr2totherightmostnodeinX[i].
Wehavethusupdated((r1,r2),(s1,s2))tobeournewpotentialpair.
ThatweonlyneedtherightmostnodeinX[i]andtheleftmostnodeinY[j]followsdirectlyfromthedenitionofmop.
Case2(ij)ismorecomplicated.
Inthiscaseweneedtocomputemopintheclusteriandjbelongsto.
Ifthisresultsinanyminimumorderedpairs(r=)wemustupdateourpotentialpair.
Asinthepreviouscasewecompares1andj,butthistimewemustalsoaddr1ands1totheoutputifs1=jandnonodesinX[i]aretotheleftoftheleftmostnodeins2.
Toseethisrstnotethatsincer1i(theinputisdeep)wemusthaver1=s1andthuss2containsonlyonenodes′.
Ifs′istotheleftofallnodesinX[i]thennonodeinX[i]canbeinaminimumorderedpairwiths′andwecansafelyaddourpotentialpairtotheoutput.
Wethenupdateourpotentialpair.
Finally,weneedtoupdateX[i],i,andj.
Thisupdatedependsonwhichkindofmacronodeswehavebeenworkingon.
InCase(a)weeitherhavei=joriisaleftnodeandjisaspinenode.
InbothcaseswecanhavenodesinX[i]thataretonottotheleftofanynodeinY[j].
TherightmostofthesenodescanbeinaminimumorderedpairwithanodefromanothermacronodeandwethusupdateX[i]tocontainthisnodeonly(ifitexists).
NowallnodesinY[j]mustbetotheleftofallnodesinX[i]inthenextiterationandthusweincrementj.
InCase(b)iisaspinenodeandjisarightnode.
Ifr2=thennonodeinY[j]istotherightofthenodeinX[i].
Sincetheinputarraysaredeep,nonodelaterinthearrayXcanbetotheleftofanynodeinY[j]andwethereforeincrementj.
Ifr2=thenthesinglenodeinX[i]isinthepotentialpairandweincrementi.
WedonotincrementjastherecouldbenodesinX[j]totheleftofthenodesinY[j].
Whenreachingtheendofoneofthearraysweaddourpotentialpairtotheoutputandreturn.
ThecorrectnessoftheprocedurefollowsfromtheproofofLemma16andtheabove.
ProcedureMatchtakesthreenodearraysX,Y,andY′representingdeepsetsX,Y,andY′,where|X|=|Y|,andY′Y.
Theoutputisanodearrayrepresentingtheset{Xj|Yj∈Y′}.
Match(X,Y,Y′)InitializeanodearrayRofsizenM,setXL:=,YL:=,Y′L:=,x:=0,y:=0,i:=1andj:=1.
Repeatuntili>nMorj>nM:UntilX[i]=seti:=i+1.
Setx:=|X[i]|.
UntilY[j]=setj:=j+1.
Sety:=|Y[j]|.
CompareY[j]andY′[j].
Therearetwocases:1.
Y[j]=Y′[j].
Comparexandy.
Therearethreecases:(a)x=y.
SetR[i]:=R[i]∪X[i],i:=i+1,andj:=j+1.
(b)xy.
SetXL:=left(y,X[i]),R[i]:=R[i]∪XL,X[i]:=X[i]\XL,andj:=j+1.
2.
Y[j]=Y′[j].
Comparexandy.
Therearethreecases:(a)x=y.
SetR[i]:=R[i]∪match(X[i],Y[j],Y′[j]),i:=i+1,andj:=j+1.
(b)xy.
SetXL:=left(y,X[i]),R[i]:=R[i]∪match(XL,Y[j],Y′[j]),X[i]:=X[i]\XL,andj:=j+1.
ReturnR.
ProcedureMatchproceedsasfollows.
Firstwendthenextnon-emptyentriesinthetwonodearraysX[i]andY[j].
WethencompareY[j]andY′[j].
IftheyareequalwekeepallnodesinXwiththesamerankasthenodesinY[j].
Wedothisbysplittingintothreecases.
IftherearethesamenumberofnodesX[i]andY[j]weaddallnodesinX[i]totheoutputandincrementiandj.
IftherearemorenodesinY[j]thaninX[i]weaddallnodesinX[i]totheoutputandupdateY[j]tocontainonlytheyxlefmostnodesinY[j].
Wethenincrementianditerate.
IftherearemorenodesinX[i]thaninY[j]weaddtherstynodesinX[i]totheoutput,incrementj,andupdateX[i]tocontainonlythenodeswedidnotaddtotheoutput.
IfY[j]=Y′[j]wecalltheclusterprocedurematch.
AgainwesplitintothreecasesdependingonthenumberofnodesinX[i]andY[j].
IftheyhavethesamenumberofnodeswecanjustcallmatchonX[i],Y[j],andY′[j]andincrementiandj.
If|Y[j]|>|X[i]|wecallmatchwithX[i]theleftmost|X[i]|nodesofY[j]andwiththepartofY′[j]thatareasubsetoftheseleftmost|X[i]|nodesofY[j].
WethenupdateY[j]andY′[j]tocontainonlythenodeswedidnotuseinthecalltomatchandincrementi.
If|Y[j]|nM:UntilX[i]=seti:=i+1.
Therearethreecasesdependingonthetypeofi:1.
i∈{l(v,w),r(v,w)}.
ComputeN:=FlC(i,s(v,w),v)(X[i],α).
IfN=foreachj∈{i,s(v,w),v}setR[j]=R[j]∪(N∩V(C(j))).
Otherwise,setL:=LparentM(v).
2.
i=l(v).
ComputeN:=FlC(i,v)(X[i]).
IfN=foreachj∈{i,v},setR[j]:=R[j]∪(N∩V(C(j))).
Otherwise,setL:=LparentM(v).
3.
i∈{l(v,w),r(v,w),l(v)}.
ComputeN:=FlC(i)(X[i],α).
IfN=setR[i]:=R[i]∪N.
OtherwisesetL:=LparentM(i).
Subsequently,computethelistS:=FlM(L,α).
Foreachnodei∈SsetR[i]:=R[i]∪FlC(S[i])(rst(S[i]),α)).
ReturnR.
TheFlprocedureissimilartoParent.
Thecases1,2and3computeFlonamicroforest.
IftheresultiswithinthemicrotreeweaddittoRandotherwisewestorethenodeinthemacrotreewhichcontainstheparentoftherootofthemicroforestinanodelistL.
SincewealwayscallDeepontheoutputfromFl(X,α)thereisnoneedtocomputeFlinthemacrotreeifNisnonempty.
WethencomputeFlinthemacrotreeonthelistL,storetheresultsinalistS,andusethistocomputethenalresult.
ConsiderthecasesofprocedureFl.
InCase1iisaleftorrightnode.
DuetoProposition2case(i)and(ii)ofanodeinicanbeinionthespineorinthetopboundarynode.
IfthisisnotthecaseitcanbefoundbyacomputationofFloftheparentofthetopboundarynodeofthei'sclusterinthemacrotree(Proposition2case(iii)).
InCase2iisaleafnode.
Thenofanodeinimusteitherbeini,inthetopboundarynode,orcanbefoundbyacomputationofFloftheparentofthetopboundarynodeofthei'sclusterinthemacrotree.
IfiisaspinenodeoraboundarynodeofanodeiniiseitheriniorcanbefoundbyacomputationofFloftheparentofiinthemacrotree.
ThecorrectnessoftheprocedurefollowsfromProposition2,theabove,andthecorrectnessofprocedureFlM.
623.
5.
4ComplexityoftheTreeInclusionAlgorithmToanalysethecomplexityofthenodearrayimplementationwerstboundtherunningtimeoftheaboveimplementationofthesetprocedures.
Allproceduresscantheinputfromleft-to-rightwhilegraduallyproducingtheoutput.
InadditiontothisprocedureFlneedsacalltoanodelistimplementationofFlonthemacrotree.
GiventhedatastructuredescribedinSection3.
5.
2itiseasytocheckthateachstepinthescancanbeperformedinO(1)timegivingatotalofO(nT/lognT)time.
SincethenumberofnodesinthemacrotreeisO(nT/lognT)thecalltothenodelistimplementationofFliseasilydonewithinthesametime.
Hence,wehavethefollowinglemma.
Lemma22ForanytreeTthereisadatastructureusingO(nT)spaceandO(nTlognT)preprocessingtimewhichsupportsallofthesetproceduresinO(nT/lognT)time.
NextconsidercomputingthedeepoccurrencesofPinTusingtheprocedureEmbofSection3.
3andLemma22.
Sinceeachnodev∈V(P)contributesatmostaconstantnumberofcallstosetproceduresitfollowsimmediatelythat,Theorem8FortreesPandTthetreeinclusionproblemcanbesolvedinO(nPnT/lognT+nTlognT)timeandO(nT)space.
CombiningtheresultsinTheorems6,8andCorollary1wehavethemainresultofTheorem5.
6364Chapter4MatchingSubsequencesinTrees6566MatchingSubsequencesinTreesPhilipBilleITUniversityofCopenhagenbeetle@itu.
dkIngeLiGrtzTechnicalUniversityofDenmarkilg@imm.
dtu.
dkAbstractGiventworooted,labeledtreesPandTthetreepathsubsequenceproblemistodeterminewhichpathsinParesubsequencesofwhichpathsinT.
Hereapathbeginsattherootandendsataleaf.
InthispaperweproposethisproblemasausefulqueryprimitiveforXMLdata,andprovidenewalgorithmsimprovingthepreviouslybestknowntimeandspacebounds.
4.
1IntroductionWesaythatatreeislabeledifeachnodeisassignedacharacterfromanalphabetΣ.
Giventwosequencesoflabelednodespandt,wesaythatpisasubsequenceoft,denotedpt,ifpcanbeobtainedbyremovingnodesfromt.
Giventworooted,labeledtreesPandTthetreepathsubsequenceproblem(TPS)istodeterminewhichpathsinParesubsequencesofwhichpathsinT.
Hereapathbeginsattherootandendsataleaf.
Thatis,foreachpathpinPwemustreportallpathstinTsuchthatpt.
ThisproblemwasintroducedbyChen[Che00]whogaveanalgorithmusingO(min(lPnT+nP,nPlT+nT))timeandO(lPdT+nP+nT)space.
Here,nS,lS,anddSdenotesthenumberofnodes,numberofleaves,anddepth,respectively,ofatreeS.
Notethatintheworst-casethisisquadratictimeandspace.
Inthispaperwepresentimprovedalgorithmsgivingthefollowingresult:Theorem9FortreesPandTthetreepathsubsequenceproblemcanbesolvedinO(nP+nT)spacewiththefollowingrunningtimes:minO(lPnT+nP),O(nPlT+nT),O(nPnTlognT+nT+nPlognP).
ThersttwoboundsinTheorem9matchtheprevioustimeboundswhileimprovingthespacetolinear.
ThisisachievedusingaalgorithmthatresemblesthealgorithmofChen[Che00].
Atahighlevel,thealgorithmsareessentiallyidenticalandthereforetheboundsshouldberegardedasanimprovedanalysisofChen'salgorithm.
Thelatterboundisobtainedbyusinganentirelynewalgorithmthatimprovestheworst-casequadratictime.
Specically,wheneverlognP=O(nT/lognT)therunningtimeisimprovedbyalogarithmicfactor.
Notethat–intheworst-case–thenumberofpairsconsistingofapathfromPandapathTis(nPnT),andthereforeweneedatleastasmanybitstoreportthesolutiontoTPS.
Hence,onaRAMwithlogarithmicwordsizeourworst-caseboundisoptimal.
Mostimportantly,allouralgorithmsuselinearspace.
ForpracticalapplicationsthiswilllikelymakeitpossibletosolveTPSonlargetreesandimproverunningtimesincemoreofthecomputationcanbekeptinmainmemory.
67catalogbookbookauthorchapterauthorchapterchapterJohnPaulXMLnametitlesectiontitleJohndatabasesXMLqueries(a)(b)Figure4.
1:(a)Thetrieofqueries1,2,3,orthetreeforquery4.
(b)Afragmentofacatalogofbooks.
4.
1.
1ApplicationsWeproposeTPSasausefulqueryprimitiveforXMLdata.
ThekeyideaisthatanXMLdocumentDmaybeviewedasarooted,labeledtree.
Forexample,supposethatwewanttomaintainacatalogofbooksforabookstore.
AfragmentofapossibleXMLtree,denotedD,correspondingtothecatalogisshowninFig.
4.
1(b).
Inadditiontosupportingfull-textqueries,suchasndalldocumentscontainingtheword"John",wecanalsousethetreestructureofthecatalogtoaskmorespecicqueries,suchasthefollowingexamples:1.
FindallbookswrittenbyJohn,2.
ndallbookswrittenbyPaul,3.
ndallbookswithachapterthathassomethingtodowithXML,or4.
ndallbookswrittenbyJohnandPaulwithachapterthathassomethingtodowithXML.
Thequeries1,2,and3correspondtoapathqueryonD,thatis,computewhichpathsinDthatcontainsaspecicpathasasubsequence.
Forinstance,computingthepathsinDthatcontainthepathofthreenodeslabeled"book","chapter",and"XML",respectively,eectivelyanswersquery3.
MostXML-querylanguages,suchasXPath[CD99],supportsuchqueries.
Usingadepth-rsttraversalofDapathquerycanbesolvedinlineartime.
Moreprecisely,ifqisapathconsistingofnqnodes,answeringthepathqueryonDtakesO(nq+nD)time.
Hence,ifwearegivenpathqueriesq1,qkwecananswertheminO(nq1nqk+knD)time.
However,wecandobetterbyconstructingthetrie,Q,ofq1,qk.
AnsweringallpathsqueriesnowcorrespondtosolvingTPSonQandD.
Asanexamplethequeries1,2,and3formthetrieshowninFig.
4.
1(a).
AslQ≤k,Theorem9givesusanalgorithmwithrunningtimeOnq1nqk+minknD+nQ,nQlD+nD,nQnDlognD+nD+nQlognQ.
(4.
1)SincenQ≤nq1nqkthisisatleastasgoodasansweringthequeriesindividuallyandbetterinmanycases.
Ifmanypathsshareaprex,i.
e.
,queries1and2share"book"and"author",thesizeofnQcanmuchsmallerthannq1nqk.
UsingoursolutiontoTPSwecanecientlytakeadvantageofthissituationsincethelattertwotermsin(4.
1)dependonnQandnotonnq1nqk.
Nextconsiderquery4.
ThisquerycannotbeansweredbysolvingaTPSproblembutisaninstanceofthetreeinclusionproblem(TI).
HerewewanttodecideifPisincludedinT,thatis,ifPcanbeobtainedfromTbydeletingnodesofT.
DeletinganodeyinTmeansmakingthechildrenofychildrenoftheparentofyandthenremovingy.
Itisstraightforwardtocheckthatwecananswerquery4bydecidingifthetreeinFig.
4.
1(a)canbeincludedinthetreeinFig.
4.
1(b).
ThisworkwasperformedwhiletheauthorwasaPhDstudentattheITUniversityofCopenhagen.
68Recently,TIhasbeenrecognizedasanimportantXMLqueryprimitiveandhasrecievedconsiderableattention,seee.
g.
,[SM02,YLH03,YLH04,ZADR03,SN00,TRS02].
Unfortunately,TIisNP-completeingeneral[KM95a]andthereforetheexistingalgorithmsarebasedonheuristics.
ObservethatanecessaryconditionforPtoincludedinTisthatallpathsinParesubsequencesofpathsinT.
Hence,wecanuseTPStoquicklyidentifytreesorpartsoftreesthatcannotbeincludedT.
WebelievethatinthiswayTPScanbeusedasaneective"lter"formanytreeinclusionproblemsthatoccurinpractice.
4.
1.
2TechnicalOverviewGiventwostrings(orlabeledpaths)aandb,itisstraightforwardtodetermineifaisasubsequenceofbbyscanningthecharacterfromlefttorightinb.
ThisusesO(|a|+|b|)time.
WecansolveTPSbyapplyingthisalgorithmtoeachofthepairofpathsinPandT,however,thismayuseasmuchasO(nPnT(nP+nT))time.
Alternatively,Baeza-Yates[BY91]showedhowtopreprocessbinO(|b|log|b|)timesuchthattestingwhetheraisasubsequenceofbcanbedoneinO(|a|log|b|)time.
UsingthisdatastructureoneachpathinTwecansolvetheTPSproblem,however,thismaytakeasmuchasO(n2TlognT+n2PlognT).
Hence,noneoftheavailiablesubsequencealgorithmsonstringsprovideanimmediateecientsolutiontoTPS.
InspiredbytheworkofChen[Che00]wetakeanotherapproach.
WeprovideaframeworkforsolvingTPS.
ThemainideaistotraverseTwhilemaintainingasubsetofnodesinP,calledthestate.
WhenreachingaleafzinTthestaterepresentsthepathsinPthatareasubsequencesofthepathfromtheroottoz.
Ateachstepthestateisupdatedusingasimpleproceduredenedonsubsetofnodes.
TheresultofTheorem9isobtainedbytakingthebestoftwoalgorithmsbasedonourframework:Therstoneusesasimpledatastructuretomaintainthestate.
ThisleadstoanalgorithmusingO(min(lPnT+nP,nPlT+nT))time.
AtahighlevelthisalgorithmresemblesthealgorithmofChen[Che00]andachievesthesamerunningtime.
However,weimprovetheanalysisofthealgorithmandshowaspaceboundofO(nP+nT).
Thisshouldbecomparedtotheworst-casequadraticspaceboundofO(lPdT+nP+nT)givenbyChen[Che00].
Oursecondalgorithmtakesadierentapproachcombiningseveraltechniques.
Startingwithasimplequadratictimeandspacealgorithm,weshowhowtoreducethespacetoO(nPlognT)usingadecompositionofTintodisjointpaths.
WethendividePintosmallsubtreesoflogarithmicsizecalledmicrotrees.
Themicrotreesarethenpreprocessedsuchthatsubsetsofnodesinamicrotreecanbemaintainedinconstanttimeandspace.
Intuitively,thisleadstoalogarithmicimprovementofthetimeandspacebound.
4.
1.
3NotationandDenitionsInthissectionwedenethenotationanddenitionswewillusethroughoutthepaper.
ForagraphGwedenotethesetofnodesandedgesbyV(G)andE(G),respectively.
LetTbearootedtree.
TherootofTisdenotedbyroot(T).
ThesizeofT,denotedbynT,is|V(T)|.
Thedepthofanodey∈V(T),depth(y),isthenumberofedgesonthepathfromytoroot(T)andthedepthofT,denoteddT,isthemaximumdepthofanynodeinT.
Theparentofyisdenotedparent(y).
Anodewithnochildrenisaleafandotherwiseitisaninternalnode.
ThenumberofleavesinTisdenotedlT.
LetT(y)denotethesubtreeofTrootedatanodey∈V(T).
Ifz∈V(T(y))thenyisanancestorofzandifz∈V(T(y))\{y}thenyisaproperancestorofz.
Ifyisa(proper)ancestorofzthenzisa(proper)descendantofy.
WesaythatTislabeledifeachnodeyisassignedacharacter,denotedlabel(y),fromanalphabetΣ.
Thepathfromytoroot(T),ofnodesroot(T)=y1,yk=yisdenotedpath(y).
Hence,wecanformallystateTPSasfollows:GiventworootedtreePandTwithleavesx1,xrandy1,ys,respectively,determineallpairs(i,j)suchthatpath(xi)path(yj).
ForsimplicitywewillassumethatleavesinPandTarealwaysnumberedasaboveandweidentifyeachofthepathsbythenumberofthecorrespondingleaf.
Throughoutthepaperweassumeaunit-costRAMmodelofcomputationwithwordsizeΘ(lognT)andastandardinstructionsetincludingbitwisebooleanoperations,shifts,additionandmultiplication.
Allspacecomplexitiesrefertothenumberofwordsusedbythealgorithm.
69aroot(P)aroot(T)cx1bx2c1ax3⊥2a2b4⊥1b3b5PTFigure4.
2:Thelettersinsidethenodesarethelabels,andtheidentierofeachnodeiswrittenoutsidethenode.
InitiallywehaveX={root(P)}.
Sincelabel(root(P))=a=label(root(T))wereplaceroot(P)withischildrenandgetXroot(T)={x1,x2}.
Sincelabel(1)=label(x1)=label(x2)wegetX1={x3,x2}.
ContinuingthiswaywegetX2={⊥1,x2},X3={⊥1,⊥2},X4={x3,⊥2},andX5={x3,⊥2}.
Thenodes3and5areleavesofTandwethusreportpaths1and2aftercomputingX3andpath2aftercomputingX5.
4.
2AFrameworkforsolvingTPSInthissectionwepresentasimplegeneralalgorithmforthetreepathsubsequenceproblem.
Thekeyingredientinouralgorithmisthefollowingprocedure.
ForanyXV(P)andy∈V(T)dene:Down(X,y):ReturnthesetChild({x∈X|label(x)=label(y)})∪{x∈X|label(x)=label(y)}.
ThenotationChild(X)denotesthesetofchildrenofX.
Hence,Down(X,y)isthesetconsistingofnodesinXwithadierentlabelthanyandthechildrenofthenodesXwiththesamelabelasy.
WewillnowshowhowtosolveTPSusingthisprocedure.
Firstassignauniquenumberintherange{1,lP}toeachleafinP.
Then,foreachi,1≤i≤lP,addapseudo-leaf⊥iasthesinglechildoftheithleaf.
Allpseudo-leavesareassignedaspeciallabelβ∈Σ.
ThealgorithmtraversesTinadepthrstorderandcomputesateachnodeythesetXy.
Wecallthissetthestateaty.
Initially,thestateconsistsof{root(P)}.
Forz∈child(y),thestateXzcanbecomputedfromstateXyasXz=Down(Xy,z).
Ifzisaleafwereportthenumberofeachpseudo-leafinXzasthepathsinPthataresubsequencesofpath(z).
SeeFigure4.
2foranexample.
Toshowthecorrectnessofthisapproachweneedthefollowinglemma.
Lemma23Foranynodey∈V(T)thestateXysatisesthefollowingproperty:x∈Xypath(parent(x))path(y).
Proof.
Byinductiononthenumberofiterationsoftheprocedure.
Initially,X={root(P)}satisesthepropertysinceroot(P)hasnoparent.
SupposethatXyisthecurrentstateandz∈child(y)isthenextnodeinthedepthrsttraversalofT.
BytheinductionhypothesisXysatisestheproperty,thatis,foranyx∈Xy,path(parent(x))path(y)).
Then,Xz=Down(Xy,z)=Child({x∈Xy|label(x)=label(z)})∪{x∈Xy|label(x)=label(z)}.
LetxbeanodeinXy.
Therearetwocases.
Iflabel(x)=label(z)thenpath(x)path(z)sincepath(parent(x))path(y).
Hence,foranychildx′ofxwehavepath(parent(x′))path(z).
Ontheotherhand,iflabel(x)=label(z)thenx∈Xz.
Sincey=parent(z)wehavepath(y)path(z),andhencepath(parent(x))path(y)path(z).
Bytheabovelemmaallpathsreportedataleafz∈V(T)aresubsequencesofpath(z).
Thefollowinglemmashowsthatthepathsreportedataleafz∈V(T)areexactlythepathsinPthataresubsequencesofpath(z).
70Lemma24LetzbealeafinTandlet⊥ibeapseudo-leafinP.
Then,⊥i∈Xzpath(parent(⊥i))path(z).
Proof.
ItfollowsimmediatelyfromLemma23that⊥i∈Xzpath(parent(⊥i))path(z).
Itremainstoshowthatpath(parent(⊥i))path(z)⊥i∈Xz.
Letpath(z)=z1,zk,wherez1=root(T)andzk=z,andletpath(parent(⊥i))=y1,y,wherey1=root(P)andy=parent(⊥i).
Sincepath(parent(⊥i))path(z)therearenodeszji=yifor1≤i≤k,suchthat(i)ji1,itispossibletobuildamicrotreedecompositionMSofPinlineartimesuchthat|MS|=O(nP/s)and|V(M)|≤sforanyM∈MS4.
4.
3ImplementingtheAlgorithmInthissectionweshowhowtoimplementtheDownprocedureusingthemicrotreedecomposition.
FirstdecomposePaccordingtoLemma30foraparameterstobechosenlater.
Hence,eachmicrotreehasatmostsnodesand|MS|=O(nP/s).
WerepresentthestateXcompactlyusingabitvectorforeachmicrotree.
Specically,foranymicrotreeMwestoreabitvectorXM=[b1,bs],suchthatXM[i]=1itheithnodeinapreordertraversalofMisinX.
If|V(M)|0andTC[j]>0andhenceSCandTCsucestorepresentλonC.
ThenumberofcharactersappearinginbothTCandSCisatmostxandhenceeachentryofthevectorsisassignedanintegervalueintherange[1,x].
Thus,thetotalnumberofbitsneededforbothvectorsis2xlogx+1.
Hence,wecanchoosex=Θ(klogk)suchthatthevectorsforacellcanberepresentedinasinglemachineword.
ItfollowsthatifallvectorshavebeenprecomputedwegetanalgorithmforStringEditDistanceusingO(mnlogkk2+m+n)timeandO(2k+min(m,n))space.
Nextweshowhowtocomputevectorseciently.
GivenanycellC,wecanidentifythecharactersappearinginbothSCandTCbysortingSCandthenforeachindexiinTCuseabinarysearchtoseeifTC[i]appearsinSC.
NextwesortthecharactersappearinginbothsubstringsandinserttheirranksintothecorrespondingpositionsinSCandTC.
Allotherpositionsinthevectorsaregiventhevalue0.
ThisalgorithmusesO(xlogx)timeforeachcell.
However,sincethenumberofcellsisO(nmx2)thetotaltimebecomesO(nmlogxx),whichforourchoiceofxisO(nm(logk)2k).
Toimprovethiswegroupthecellsintomacrocellsofy*ycells.
Wethencomputethevectorrepresentationforeachofthesemacrocells.
ThevectorrepresentationforacellCisnowthecorrespondingsubvectorsofthemacrocellcontainingC.
Hence,eachvectorentryisnowintherange[0,xy]andthususeslog(xy+1)bits.
ComputingthevectorrepresentationusesO(xylog(xy))timeforeachmacrocellandsincethenumberofmacrocellsisO(nm(xy)2)thetotaltimetocomputeitisO(nmlog(xy)xy+m+n).
Itfollowsthatwecanchoosey=klogkandx=Θ(klogk)suchthatvectorsforacellcanberepresentedinasingleword.
Furthermore,withthischoiceofxandyallvectorsarecomputedinO(nmlogkk2+m+n)time.
Combinedwiththetimeusedtocomputethedistancewehaveshown:Theorem15ForstringsSandToflengthnandm,respectively,StringEditDistancecanbesolvedinO(mnlogkk2+m+n)timeandO(2k+min(m,n))space.
90Chapter6NewAlgorithmsforRegularExpressionMatching9192NewAlgorithmsforRegularExpressionMatchingPhilipBilleITUniversityofCopenhagenbeetle@itu.
dkAbstractInthispaperwerevisittheclassicalregularexpressionmatchingproblem,namely,givenaregularexpressionRandastringQ,decideifQmatchesoneofthestringsspeciedbyR.
LetmandnbethelengthofRandQ,respectively.
Onastandardunit-costRAMwithwordlengthw≥logn,weshowthattheproblemcanbesolvedinO(m)spacewiththefollowingrunningtimes:8>:O(nmlogww+mlogw)ifm>wO(nlogm+mlogm)if√wwO(nlogm+mlogm)if√wlogn,weachieveanO(lognloglogn)factorspeedupoverThompson'salgorithmusingO(m)space.
Hence,wesimultaneouslymatchthebestknowntimeandspaceboundsfortheproblem,withtheexceptionofanO(loglogn)factorintime.
Moreinterestingly,considerthecasewhentheregularexpressionis"small",e.
g.
,m=O(logn).
Thisisusuallythecaseinmostapplications.
TobeattheO(nlogn)timeofThompson'salgorithm,thefastalgorithms[Mye92a,BFC05]essentiallyconverttheNFAmentionedaboveintoadeterministicniteautomaton(DFA)andthensimulatethisinstead.
ConstructingandstoringtheDFAincursanadditionalexponentialtimeandspacecostinm,i.
e.
,O(2m)=O(n)(see[WM92b,NR04]forcompactDFArepresentations).
However,theDFAcannowbesimulatedinO(n)time,leadingtoanO(n)timeandspacealgorithm.
Surprisingly,ourresultshowsthatthisexponentialblow-upinmcanbeavoidedwithverylittlelossofeciency.
Moreprecisely,wegetanalgorithmusingO(nloglogn)timeandO(logn)space.
Hence,thespaceisimprovedexponentiallyatthecostofanO(loglogn)factorintime.
Inthecaseofanevensmallerregularexpression,e.
g.
,m=O(√logn),theslowdowncanbeeliminatedandweachieveoptimalO(n)time.
Forlargerwordlengthsourtimeboundsimprove.
Inparticular,whenw>lognloglogntheboundisbetterinallcases,exceptfor√w≤m≤w,andwhenw>log2nitimprovesallknowntimeboundsregardlessofhowmuchspaceisused.
ThekeytoobtainourresultsistoavoidexplicitlyconvertingsmallNFAsintoDFAs.
Insteadweshowhowtoeectivelysimulatethemdirectlyusingtheparallelismavailableattheword-levelofthemachinemodel.
Thekindofideaisnotnewandhasbeenappliedtomanyotherstringmatchingproblems,mostfamously,theShift-Oralgorithm[BYG92],andtheapproximatestringmatchingalgorithmbyMyers[Mye99].
However,noneofthesealgorithmscanbeeasilyextendedtoRegularExpressionMatching.
ThemainproblemisthecomplicateddependenciesbetweenstatesinanNFA.
Intuitively,astatemayhavelongpathsof-transitionstoalargenumberofotherstates,allofwhichhavetobetraversedinparallelinthestate-setsimulation.
ToovercomethisproblemwedevelopseveralnewtechniquesultimatelyleadingtoTheorem16.
Forinstance,weintroduceanewhierarchicaldecompositionofNFAssuitableforaparallelstate-setsimulation.
Wealsoshowhowstate-setsimulationsoflargeNFAsecientlyreducestosimulatingsmallNFAs.
Theresultspresentedinthispaperareprimarilyoftheoreticalinterest.
However,webelievethatmostoftheideasareusefulinpractice.
ThepreviousalgorithmsrequirelargetablesforstoringDFAs,andperformalongseriesoflookupsinthesetables.
Asthetablesbecomelargewecanexpectahighnumberofcache-missesduringthelookups,thuslimitingthespeedupinpractice.
Sinceweavoidthesetables,ouralgorithmsdonotsuerfromthisdefect.
Thepaperisorganizedasfollows.
InSec.
6.
2wereviewThompson'sNFAconstruction,andinSec.
6.
3wepresenttheabovementionedreduction.
InSec.
6.
4wepresentourrstsimplealgorithmfortheproblemwhichisthenimprovedinSec.
6.
5.
CombiningthesealgorithmswithourreductionleadstoTheorem16.
WeconcludewithacoupleofremarksandopenproblemsinSec.
6.
6.
6.
2RegularExpressionsandFiniteAutomataInthissectionwebrieyreviewThompson'sconstructionandthestandardstate-setsimulation.
ThesetofregularexpressionsoveranalphabetΣaredenedrecursivelyasfollows:Acharacterα∈Σisaregularexpression.
94(a)(b)(c)(d)αN(S)N(T)N(T)N(S)N(S)Figure6.
1:Thompson'sNFAconstruction.
Theregularexpressionforacharacterα∈ΣcorrespondstoNFA(a).
IfSandTareregularexpressionsthenN(ST),N(S|T),andN(S)correspondtoNFAs(b),(c),and(d),respectively.
Acceptingnodesaremarkedwithadoublecircle.
IfSandTareregularexpressionsthensoistheconcatenation,(S)·(T),theunion,(S)|(T),andthestar,(S).
Unnecessaryparenthesescanberemovedbyobservingthat·and|isassociativeandbyusingthestandardprecedenceoftheoperators,thatisprecedes·,whichinturnprecedes|.
Weoftenremovethe·whenwritingregularexpressions.
ThelanguageL(R)generatedbyRisthesetofallstringsmatchingR.
TheparsetreeT(R)ofRisthebinaryrootedtreerepresentingthehiearchicalstructureofR.
EachleafislabeledbyacharacterinΣandeachinternalnodeislabeledeither·,|,or.
AniteautomatonisatupleA=(V,E,δ,θ,φ),whereVisasetofnodescalledstates,Eissetofdirectededgesbetweenstatescalledtransitions,δ:E→Σ∪{}isafunctionassigninglabelstotransitions,andθ,φ∈Varedistinguishedstatescalledthestartstateandacceptingstate,respectively1.
Intuitively,Aisanedge-labeleddirectedgraphwithspecialstartandacceptingnodes.
Aisadeterministicniteautomaton(DFA)ifAdoesnotcontainany-transitions,andalloutgoingtransitionsofanystatehavedierentlabels.
Otherwise,Aisanon-deterministicautomaton(NFA).
WesaythatAacceptsastringQifthereisapathfromθtoφsuchthattheconcatenationoflabelsonthepathspellsoutQ.
Thompson[Tho68]showedhowtorecursivelyconstructaNFAN(R)acceptingallstringsinL(R).
TherulesarepresentedbelowandillustratedinFig.
6.
1.
N(α)istheautomatonconsistingofstatesθα,φα,andanα-transitionfromθαtoφα.
LetN(S)andN(T)beautomataforregularexpressionsSandTwithstartandacceptingstatesθS,θT,φS,andφT,respectively.
Then,NFAsN(S·T),N(S|T),andN(S)areconstructedasfollows:N(ST):AddstartstateθSTandacceptingstateφST,and-transitions(θST,θS),(φS,θT),and(φT,φST).
N(S|T):AddstartstateθS|TandacceptingstateφS|T,andadd-transitions(θS|T,θS),(θS|T,θT),(φS,φS|T),and(φT,φS|T).
N(S):AddanewstartstateθSandacceptingstateφS,and-transitions(θS,θS),(θS,φS),(φS,φS),and(φS,θS).
1SometimesNFAsareallowedasetofacceptingstates,butthisisnotnecessaryforourpurposes.
95ReadersfamiliarwithThompson'sconstructionwillnoticethatN(ST)isslightlydierentfromtheusualconstruction.
Thisisdonetosimplifyourlaterpresentationanddoesnotaecttheworstcasecomplexityoftheproblem.
AnyautomatonproducedbytheseruleswecallaThompson-NFA(TNFA).
Byconstruction,N(R)hasasinglestartandacceptingstate,denotedθandφ,respectively.
θhasnoincomingtransitionsandφhasnooutgoingtransitions.
Thetotalnumberofstatesis2mandsinceeachstatehasatmost2outgoingtransitionsthatthetotalnumberoftransitionsisatmost4m.
Furthermore,allincomingtransitionshavethesamelabel,andwedenoteastatewithincomingα-transitionsanα-state.
NotethatthestarconstructioninFig.
6.
1(d)introducesatransitionfromtheacceptingstateofN(S)tothestartstateofN(S).
Allsuchtransitionsarecalledbacktransitionsandallothertransitionsareforwardtransitions.
Weneedthefollowingproperty.
Lemma34(Myers[Mye92a])Anycycle-freepathinaTNFAcontainsatmostonebacktransition.
ForastringQoflengthnthestandardstate-setsimulationofN(R)onQproducesasequenceofstate-setsS0,Sn.
TheithsetSi,0≤i≤n,consistsofallstatesinN(R)forwhichthereisapathfromθthatspellsouttheithprexofQ.
Thesimulationcanbeimplementedwiththefollowingsimpleoperations.
Forastate-setSandacharacterα∈Σ,deneMove(S,α):ReturnthesetofstatesreachablefromSviaasingleα-transition.
Close(S):ReturnthesetofstatesreachablefromSvia0ormore-transitions.
SincethenumberofstatesandtransitionsinN(R)isO(m),bothoperationscanbeeasilyimplementedinO(m)time.
TheCloseoperationisoftencalledan-closure.
Thesimulationproceedsasfollows:Initially,S0:=Close({θ}).
IfQ[j]=α,1≤j≤n,thenSj:=Close(Move(Sj1,α)).
Finally,Q∈L(R)iφ∈Sn.
Sinceeachstate-setSjonlydependsonSj1thisalgorithmusesO(mn)timeandO(m)space.
6.
3FromLargetoSmallTNFAsInthissectionweshowhowtosimulateN(R)bysimulatinganumberofsmallerTNFAs.
WewillusethistoachieveourboundswhenRislarge.
6.
3.
1ClusteringParseTreesandDecomposingTNFAsLetRbearegularexpressionoflengthm.
WerstshowhowtodecomposeN(R)intosmallerTNFAs.
ThisdecompositionisbasedonasimpleclusteringoftheparsetreeT(R).
AclusterCisaconnectedsubgraphofT(R)andaclusterpartitionCSisapartitionofthenodesofT(R)intonode-disjointclusters.
SinceT(R)isabinarytreewithO(m)nodes,asimpletop-downprocedureprovidesthefollowingresult(seee.
g.
[Mye92a]):Lemma35GivenaregularexpressionRoflengthmandaparameterx,aclusterpartitionCSofT(R)canbeconstructedinO(m)timesuchthat|CS|=O(m/x),andforanyC∈CS,thenumberofnodesinCisatmostx.
ForaclusterpartitionCS,edgesadjacenttotwoclustersareexternaledgesandallotheredgesareinternaledges.
ContractingallinternaledgesinCSinducesamacrotree,whereeachclusterisrepresentedbyasinglemacronode.
LetCvandCwbetwoclusterswithcorrespondingmacronodesvandw.
WesaythatCvistheparentcluster(resp.
childcluster)ofCwifvistheparent(resp.
child)ofwinthemacrotree.
Therootclusterandleafclustersaretheclusterscorrespondingtotherootandtheleavesofthemacrotree.
AnexampleclusteringofaparsetreeisshowninFig.
6.
2(b).
GivenaclusterpartitionCSofT(R)weshowhowtodivideN(R)intoasetofsmallnestedTNFAs.
EachclusterC∈CSwillcorrespondtoaTNFAA,andweusethetermschild,parent,root,andleaffortheTNFAsinthesamewaywedowithclusters.
ForaclusterC∈CSwithchildrenC1,Cl,insertaspecialpseudo-nodepi,1≤i≤l,inthemiddleoftheexternaledgeconnectingCwithCi.
Welabeleachpseudo-nodebyaspecialcharacterβ∈Σ.
LetTCbe96(b)(a)(c)(d)(e)abcaaabbc·|||···aaabb··cC1C2C3C1C2C3ββA1A3acA2ββFigure6.
2:(a)Theparsetreefortheregularexpressionac|ab.
(b)Aclusteringof(a)intonode-disjointconnectedsubtreesC1,C2,andC3,eachwithatmost3nodes.
(c)Theclusteringfrom(b)extendedwithpseudo-nodes.
(d)ThenesteddecompositionofN(ac|ab).
(e)TheTNFAcorrespondingtoC1.
thetreeinducedbythesetofnodesinCand{p1,pl}.
EachleafinTCislabeledwithacharacterfromΣ∪{β},andhenceTCisawell-formedparsetreeforsomeregularexpressionRCoverΣ∪{β}.
Now,theTNFAAcorrespondingtoCisN(RC).
InA,childTNFAAiisrepresentedbyitsstartandacceptingstateθAiandφAiandapseudo-transitionlabeledβconnectingthem.
AnexampleofthesedenitionsisgiveninFig.
6.
2.
WecallanysetofTNFAsobtainedfromaclusterpartitionasaboveanesteddecompositionASofN(R).
Lemma36GivenaregularexpressionRoflengthmandaparameterx,anesteddecompositionASofN(R)canbeconstructedinO(m)timesuchthat|AS|=O(m/x),andforanyA∈AS,thenumberofstatesinAisatmostx.
Proof.
ConstructtheparsetreeT(R)forRandbuildaclusterpartitionCSaccordingtoLemma35withparametery=x412.
FromCSbuildanesteddecompositionASasdescribedabove.
EachC∈CScorrespondstoaTNFAA∈ASandhence|AS|=O(m/y)=O(m/x).
Furthermore,if|V(C)|≤ywehave|V(TC)|≤2y+1.
EachnodeinTCcontributestwostatestothecorrespondingTNFAA,andhencethetotalnumberofstatesinAisatmost4y+2=x.
Sincetheparsetree,theclusterpartition,andthenesteddecompositioncanbeconstructedinO(m)timetheresultfollows.
6.
3.
2SimulatingLargeAutomataWenowshowhowN(R)canbesimulatedusingtheTNFAsinanesteddecomposition.
ForthispurposewedeneasimpledatastructuretodynamicallymaintaintheTNFAs.
LetASbeanesteddecompositionofN(R)accordingtoLemma36,forsomeparameterx.
LetA∈ASbeaTNFA,letSAbeastate-setofA,letsbeastateinA,andletα∈Σ.
Asimulationdatastructuresupportsthe4operations:MoveA(SA,α),CloseA(SA),MemberA(SA,s),andInsertA(SA,s).
Here,theoperationsMoveAandCloseAaredenedexactlyasinSec.
6.
2,withthemodicationthattheyonlyworkonAandnotN(R).
TheoperationMemberA(SA,s)returnsyesifs∈SAandnootherwiseandInsertA(SA,s)returnsthesetSA∪{s}.
97Inthefollowingsectionsweconsidervariousecientimplementationsofsimulationdatastructures.
Fornowassumethatwehaveablack-boxdatastructureforeachA∈AS.
TosimulateN(R)weproceedasfollows.
First,xanorderingoftheTNFAsinthenesteddecompositionAS,e.
g.
,byapreordertraversalofthetreerepresentedgivenbytheparent/childrelationshipoftheTNFAs.
Thecollectionofstate-setsforeachTNFAinASarerepresentedinastate-setarrayXoflength|AS|.
Thestate-setarrayisindexedbytheabovenumbering,thatis,X[i]isthestate-setoftheithTNFAinAS.
FornotationalconveniencewewriteX[A]todenotetheentryinXcorrespondingtoA.
NotethataparentTNFAsharetwostateswitheachchild,andthereforeastatemayberepresentedmorethanonceinX.
ToavoidcomplicationswewillalwaysassurethatXisconsistent,meaningthatifastatesisincludedinthestate-setofsomeTNFA,thenitisalsoincludedinthestate-setsofallotherTNFAsthatshares.
IfS=A∈ASX[A]wesaythatXmodelsthestate-setSandwriteS≡X.
Nextweshowhowtodoastate-setsimulationofN(R)usingtheoperationsMoveASandCloseAS,whichwedenebelow.
Theseoperationsrecursivelyupdateastate-setarrayusingthesimulationdatastructures.
ForanyA∈AS,state-setarrayX,andα∈ΣdeneMoveAS(A,X,α):1.
X[A]:=MoveA(X[A],α)2.
ForeachchildAiofAintopologicalorderdo(a)X:=MoveAS(Ai,X,α)(b)IfφAi∈X[Ai]thenX[A]:=InsertA(X[A],φAi)3.
ReturnXCloseAS(A,X):1.
X[A]:=CloseA(X[A])2.
ForeachchildAiofAintopologicalorderdo(a)IfθAi∈X[A]thenX[Ai]:=InsertAi(X[Ai],θAi)(b)X:=CloseAS(Ai,X)(c)IfφAi∈X[Ai]thenX[A]:=InsertA(X[A],φAi)(d)X[A]:=CloseA(X[A])3.
ReturnXTheMoveASandCloseASoperationsrecursivelytraversesthenesteddecompositiontop-downprocessingthechildrenintopologicalorder.
Ateachchildthesharedstartandacceptingstatesarepropagatedinthestate-setarray.
Forsimplicity,wehavewrittenMemberAusingthesymbol∈.
Thestate-setsimulationofN(R)onastringQoflengthnproducesthesequenceofstate-setarraysX0,Xnasfollows:LetArbetherootautomatonandletXbeanemptystate-setarray(allentriesinXare).
Initially,setX[Ar]:=InsertAr(X[Ar],θAr)andcomputeX0:=CloseAS(Ar,CloseAS(Ar,X)).
Fori>0wecomputeXifromXi1asfollows:Xi:=CloseAS(Ar,CloseAS(Ar,MoveAS(Ar,Xi1,Q[i])))Finally,weoutputQ∈L(R)iφAr∈Xn[Ar].
ToseethatthisalgorithmcorrectlysolvesRegularExpressionMatchingitsucestoshowthatforanyi,0≤i≤n,Xicorrectlymodelstheithstate-setSiinthestandardstate-setsimulation.
Weneedthefollowinglemma.
Lemma37LetXbeastate-setarrayandletArbetherootTNFAinanesteddecompositionAS.
IfSisthestate-setmodeledbyX,thenMove(S,α)≡MoveAS(Ar,X,α)andClose(S)≡CloseAS(Ar,CloseAS(Ar,X)).
98Proof.
FirstconsidertheMoveASoperation.
LetAbetheTNFAinducedbyallstatesinAanddescendantsofAinthenesteddecomposition,i.
e.
,Aisobtainedbyrecursively"unfolding"thepseudo-statesandpseudo-transitionsinA,replacingthembytheTNFAstheyrepresent.
Weshowbyinductionthatthestate-arrayX′A:=MoveAS(A,X,α)modelsMove(S,α)onA.
Inparticular,plugginginA=Ar,wehavethatMoveAS(Ar,X,α)modelsMove(S,α)asrequired.
Initially,line1updatesX[A]tobethesetofstatesreachablefromasingleα-transitioninA.
IfAisaleaf,line2iscompletelybypassedandtheresultfollowsimmediately.
Otherwise,letA1,AlbethechildrenofAintopologicalorder.
AnyincomingtransitiontoastateθAioroutgoingtransitionfromastateφAiisan-transitionbyThompson'sconstruction.
Hence,noendpointofanα-transitioninAcanbesharedwithanyofthechildrenA1,Al.
Itfollowsthatafterline1theupdatedX[A]isthedesiredstate-set,exceptforthesharedstates,whichhavenotbeenhandledyet.
Byinduction,therecursivecallsinline2(a)handlethechildren.
Amongthesharedstatesonlytheacceptingones,φA1φAl,maybetheendpointofanα-transitionandthereforeline2(b)computesthecorrectstate-set.
TheCloseASoperationproceedsinasimilar,thoughslightlymorecomplicatedfashion.
LetXAbethestate-arraymodelingthesetofstatesreachableviaapathofforward-transitionsinA,andletXAbethestatearraymodellingClose(S)inA.
WeshowbyinductionthatifX′′A:=CloseAS(A,X)thenXAX′′AXA,wheretheinclusionreferstotheunderlyingstate-setsmodeledbythestate-setarrays.
Initially,line1updatesX[A]:=CloseA(X[A]).
IfAisaleafthenclearlyX′′A=XA.
Otherwise,letA1,AlbethechildrenofAintopologicalorder.
Line2recursivelyupdatethechildrenandpropagatethestartandacceptingstatesin(a)and(c).
FollowingeachrecursivecallweagainupdateX[A]:=CloseA(X[A])in(d).
NostateisincludedinX′′Aifthereisno-pathinAorthroughanychildofA.
Furthermore,sincethechildrenareprocessedintopologicalorderitisstraightforwardtoverifythatthesequenceofupdatesinline2ensurethatX′′Acontainallstatesreachableviaapathofforward-transitionsinAorthroughachildofA.
Hence,byinductionwehaveXAX′′AXAasdesired.
Asimilarinductionshowsthatthestate-setarrayCloseAS(Ar,X′′)modelsthesetofstatesreachablefromX′′usingapathconsistingofforward-transitionsandatmost1backtransition.
However,byLemma34thisisexactlythesetofstatesreachablebyapathof-transitions.
Hence,CloseAS(Ar,X′′)modelsClose(S)andtheresultfollows.
ByLemma37thestate-setsimulationcanbedoneusingtheCloseASandMoveASoperationsandthecomplexitynowdirectlydependsonthecomplexitiesofthesimulationdatastructure.
Puttingitalltogetherthefollowingreductioneasilyfollows:Lemma38LetRbearegularexpressionoflengthmoveralphabetΣandletQastringoflengthn.
GivenasimulationdatastructureforTNFAswithx>1)&Dα.
ThisshouldbeunderstoodasCnotation,wheretheright-shiftisunsigned.
ReadersfamiliarwiththeShift-Oralgorithm[BYG92]willnoticethesimilarity.
Toseethecorrectness,observethatstateiisputinS′istate(i1)isinSandtheithstateisanα-state.
Sincetheendpointsofα-transitionsareconsecutiveinthetopologicalorderitfollowsthatS′iscorrect.
Here,state(i1)canonlyinuencestatei,andthismakestheoperationeasytoimplementinparallel.
However,thisisnotthecaseforCloseA.
Here,anystatecanpotentiallyaectalargenumberofstatesreachablethroughlong-paths.
Todealwiththisweusethe2Weuseexponentiationtodenoterepetition,i.
e.
,130=1110.
100followingsteps.
Y:=(S*X)&EZ:=((Y|I)(I>>m))&IS′:=((Z*C)>wmWedescribeindetailwhythis,atrstglancesomewhatcrypticsequence,correctlycomputesS′astheresultofCloseA(S).
ThevariablesYandZaresimplytemporaryvariablesinsertedtoincreasethereadabilityofthecomputation.
LetS=s1.
.
.
sm.
Initially,S*XconcatenatesmcopiesofSwithazerobitbetweeneachcopy,thatis,S*X=s1.
.
.
sm*1(0m1)m1=(0s1.
.
.
sm)m.
Thebitwise&withEgivesY=0y1,1y1,2.
.
.
y1,m0y2,1y2,2.
.
.
y2,m0.
.
.
0ym,1ym,2.
.
.
ym,m,whereyi,j=1istatejisinSandstateiis-reachablefromj.
Inotherwords,thesubstringYi=yi,1.
.
.
yi,mindicatesthesetofstatesinSthathaveapathof-transitionstoi.
Hence,stateishouldbeincludedinCloseA(S)preciselyifatleastoneofthebitsinYiis1.
Thisisdeterminednext.
First(Y|I)(I>>m)setsalltestbitsto1andsubtractsthetestbitsshiftedrightbympositions.
ThisensuresthatifallpositionsinYiare0,theithtestbitintheresultis0andotherwise1.
Thetestbitsarethenextractedwithabitwise&withI,producingthestringZ=z10mz20m.
.
.
zm0m.
Thisisalmostwhatwewantsincezi=1istateiisinCloseA(S).
ThenalcomputationcompressestheZintothedesiredformat.
Themultiplicationproducesthefollowinglength2m2string:Z*C=z10mz20m.
.
.
zm0m*1(0m11)m1=z10m1z1z20m2···z1.
.
.
zk0mk···z1.
.
.
zm10z1.
.
.
zm0z2.
.
.
zm···0kzk+1.
.
.
zm···0m1zm0mInparticular,positionsm(m1)+1throughm2(fromtheleft)containthetestbitscompressedintoastringoflengthm.
Thetwoshiftszeroesallotherbitsandmovesthissubstringtotherightmostpositionintheword,producingthenalresult.
Sincem=O(√w)alloftheaboveoperationscanbedoneinconstanttime.
Finally,observethatInsertAandMemberAaretriviallyimplementedinconstanttime.
Thus,Lemma39ForanyTNFAwithm=O(√w)statesthereisasimulationdatastructureusingO(m)spaceandO(m2)preprocessingtimewhichsupportsalloperationsinO(1)time.
ThemainbottleneckintheabovedatastructureisthestringEthatrepresentsall-paths.
OnaTNFAwithmstatesErequiresatleastm2bitsandhencethisapproachonlyworksform=O(√w).
InthenextsectionweshowhowtousethestructureofTNFAstodobetter.
6.
5Overcomingthe-closureBottleneckInthissectionweshowhowtocomputean-closureonaTNFAwithm=O(w)statesinO(logm)time.
ComparedwiththeresultoftheprevioussectionwequadraticallyincreasethesizeoftheTNFAattheexpenseofusinglogarithmictime.
Thealgorithmiseasilyextendedtoanecientsimulationdatastructure.
ThekeyideaisanewhierarchicaldecompositionofTNFAsdescribedbelow.
6.
5.
1Partial-TNFAsandSeparatorTreesFirstweneedsomedenitions.
LetAbeaTNFAwithparsetreeT.
EachnodevinTuniquelycorrespondstotwostatesinA,namely,thestartandacceptingstatesθA′andφA′oftheTNFAA′withtheparsetreeconsistingofvandalldescendantsofv.
WesayvassociatesthestatesS(v)={θA′,φA′}.
Ingeneral,ifC101X(v)={θPI,φPI}vPOPIθPIθPOφPOφPIPOPI(a)(b)Figure6.
3:(a)InnerandouterpTNFAs.
(b)Thecorrespondingseparatortreeconstruction.
isaclusterofT,i.
e.
,anyconnectedsubgraphofT,wesayCassociatesthesetofstatesS(C)=∪v∈CS(v).
Wedenethepartial-TNFA(pTNFA)forC,asthedirected,labeledsubgraphofAinducedbythesetofstatesS(C).
Inparticular,AisapTNFAsinceitisinducedbyS(T).
ThetwostatesassociatedbytherootnodeofCaredenedtobethestartandacceptingstateofthecorrespondingpTNFA.
Weneedthefollowingresult.
Lemma40ForanypTNFAPwithm>2statesthereexistsapartitioningofPintotwosubgraphsPOandPIsuchthat(i)POandPIarepTNFAswithatmost2/3m+2stateseach,(ii)anytransitionfromPOtoPIendsinθPIandanytransitionfromPItoPOstartsinφPI,and(iii)thepartitioningcanbecomputedinO(m)time.
Proof.
LetPbepTNFAwithm>2statesandletCbethecorrespondingclusterwithtnodes.
SinceCisabinarytreewithmorethan1node,Jordan'sclassicalresult[Jor69]establishesthatwecanndinO(t)timeanedgeeinCwhoseremovalsplitsCintotwoclusterseachwithatmost2/3t+1nodes.
ThesetwoclusterscorrespondtotwopTNFAs,POandPI,andsincem=2teachofthesehaveatmost2/3m+2states.
Hence,(i)and(iii)follows.
For(ii)assumew.
l.
o.
g.
thatPOisthepTNFAcontainingthestartandacceptingstateofP,i.
e.
,θPO=θPandφPO=φP.
Then,POisthepTNFAobtainedfromPbyremovingallstatesofPI.
FromThompson'sconstructionitiseasytocheckthatanytransitionfromPOtoPIendsinθPIandanytransitionfromPItoPOmuststartinφPI.
Intuitively,ifwedrawP,PIis"surrounded"byPO,andthereforewewilloftenrefertoPIandPOastheinnerpTNFAandtheouterpTNFA,respectively(seeFig.
6.
3(a)).
ApplyingLemma40recursivelygivesthefollowingessentialdatastructure.
LetPbeapTNFAwithmstates.
TheseparatortreeforPisabinary,rootedtreeBdenedasfollows:Ifm=2,i.
e.
,PisatrivialpTNFAconsistingoftwostatesθPandφP,thenBisasingleleafnodevthatstoresthesetX(v)={θP,φP}.
Otherwise(m>2),computePOandPIaccordingtoLemma40.
TherootvofBstoresthesetX(v)={θPI,φPI},andthechildrenofvarerootsofseparatortreesforPOandPI,respectively(seeFig.
6.
3(b)).
WiththeaboveconstructioneachnodeintheseparatortreenaturallycorrespondtoapTNFA,e.
g.
,therootcorrespondstoP,thechildrentoPIandPO,andsoon.
WedenotethepTNFAcorrespondingtonodevinBbyP(v).
AsimpleinductioncombinedwithLemma40(i)showsthatifvisanodeofdepthkthenP(v)containsatmost(23)km+6states.
Hence,thedepthofBisatmostd=log3/2m+O(1).
ByLemma40(iii)eachlevelofBcanbecomputedinO(m)timeandthusBcanbecomputedinO(mlogm)totaltime.
1026.
5.
2ARecursive-ClosureAlgorithmWenowpresentasimple-closurealgorithmforapTNFA,whichrecursivelytraversestheseparatortreeB.
WerstgivethehighlevelideaandthenshowhowitcanbeimplementedinO(1)timeforeachlevelofB.
SincethedepthofBisO(logm)thisleadstothedesiredresult.
ForapTNFAPwithmstates,aseparatortreeBforP,andanodevinBdeneCloseP(v)(S):1.
ComputethesetZX(v)ofstatesinX(v)thatare-reachablefromSinP(v).
2.
IfvisaleafreturnS′:=Z,elseletuandwbethechildrenofv,respectively:(a)ComputethesetGV(P(v))ofstatesinP(v)thatare-reachablefromZ.
(b)ReturnS′:=CloseP(u)((S∪G)∩V(P(u)))∪CloseP(w)((S∪G)∩V(P(w))).
Lemma41ForanynodevintheseparatortreeofapTNFAP,CloseP(v)(S)computesthesetofstatesinP(v)reachableviaapathof-transitions.
Proof.
LetSbethesetofstatesinP(v)reachableviaapathof-transitions.
WeneedtoshowthatS=S′.
ItiseasytocheckthatanystateinS′isreachableviaapathof-transitionsandhenceS′S.
Weshowtheotherdirectionbyinductionontheseparatortree.
IfvisleafthenthesetofstatesinP(v)isexactlyX(v).
SinceS′=Ztheclaimfollows.
Otherwise,letuandwbethechildrenofv,andassumew.
l.
o.
g.
thatX(v)={θP(u),φP(u)}.
Considerapathpof-transitionsfromstatestostates′.
Therearetwocasestoconsider:Case1:s′∈V(P(u)).
IfpconsistsentirelyofstatesinP(u)thenbyinductionitfollowsthats′∈CloseP(u)(S∩V(P(u))).
Otherwise,pcontainastatefromP(w).
However,byLemma40(ii)θP(u)isonpandhenceθP(u)∈Z.
Itfollowsthats′∈Gandtherefores′∈CloseP(u)(G∩V(P(u))).
Case2:s′∈V(P(w)).
Asabove,withtheexceptionthatφP(u)isnowthestateinZ.
Inallcasess′∈S′andtheresultfollows.
6.
5.
3ImplementingtheAlgorithmNextweshowhowtoecientlyimplementtheabovealgorithminparallel.
Thekeyingredientisacompactmappingofstatesintopositionsinbitstrings.
SupposeBistheseparatortreeofdepthdforapTNFAPwithmstates.
TheseparatormappingMmapsthestatesofPintoanintervalofintegers[1,l],wherel=3·2d.
Themappingisdenedrecursivelyaccordingtotheseparatortree.
LetvbetherootofB.
Ifvisaleafnodetheintervalis[1,3].
ThetwostatesofP,θPandφP,aremappedtopositions2and3,respectively,whileposition1isleftintentionallyunmapped.
Otherwise,letuandwbethechildrenofv.
Recursively,mapP(u)totheinterval[1,l/2]andP(w)totheinterval[l/2+1,l].
Sincetheseparatortreecontainsatmost2dleavesandeachcontribute3positionsthemappingiswell-dened.
ThesizeoftheintervalforPisl=3·2log3/2m+O(1)=O(m).
Wewillusetheunmappedpositionsastestbitsinouralgorithm.
TheseparatormappingcompactlymapsallpTNFAsrepresentedinBintosmallintervals.
Specically,ifvisanodeatdepthkinB,thenP(v)ismappedtoanintervalofsizel/2koftheform[(i1)·l2k+1,i·l2k],forsome1≤i≤2k.
TheintervalsthatcorrespondtoapTNFAP(v)aremappedandallotherintervalsareunmapped.
WewillrefertoastatesofPbyitsmappedpositionM(s).
Astate-setofPisrepresentedbyabitstringSsuchthat,forallmappedpositionsi,S[i]=1itheiisinthestate-set.
Sincem=O(w),state-setsarerepresentedinaconstantnumberofwords.
ToimplementthealgorithmwedeneasimpledatastructureconsistingoffourlengthlbitstringsXθk,Xφk,Eθk,andEφkforeachlevelkoftheseparatortree.
Fornotationalconvenience,wewillconsiderthestringsatlevelkastwo-dimensionalarraysconsistingof2kintervalsoflengthl/2k,i.
e.
,Xθk[i,j]ispositionjintheithintervalofXθk.
Iftheithintervalatlevelkisunmappedthenallpositionsinthisintervalare0in103allfourstrings.
Otherwise,supposethattheintervalcorrespondstoapTNFAP(v)andletX(v)={θv,φv}.
Thestringsaredenedasfollows:Xθk[i,j]=1iθvis-reachableinP(v)fromstatej,Eθk[i,j]=1istatejis-reachableinP(v)fromθv,Xφk[i,j]=1iφvis-reachableinP(v)fromstatej,Eφk[i,j]=1istatejis-reachableinP(v)fromφv.
Inaddtiontothese,wealsostoreastringIkcontainingatestbitforeachinterval,thatis,Ik[i,j]=1ij=1.
SincethedepthofBisO(logm)thestringsuseO(logm)words.
Withasimpledepth-rstsearchtheycanallbecomputedinO(mlogm)time.
LetSbeabitstringrepresentingastate-setofA.
WeimplementtheoperationCloseA(S)bycomputingasequenceofintermediatestringsS0,Sdeachcorrespondingtoalevelintheaboverecursivealgorithm.
Initially,S0:=SandthenalstringSdistheresultofCloseA(S).
Atlevelk,0≤k>t))&IkFθ:=Zθ(Zθ>>t)Gθ:=Fθ&EθkYφ:=Sk&XφkZφ:=((Yφ|Ik)(Ik>>t))&IkFφ:=Zφ(Zφ>>t)Gφ:=Fφ&EφkSk+1:=Sk|Gθ|GφWearguethatthecomputationcorrectlysimulates(inparallel)aleveloftherecursivealgorithm.
AssumethatatthebeginningoflevelkthestringSkrepresentsthestate-setcorrespondingtherecursivealgorithmafterklevels.
WeinterpretSkasdividedintor=l/2kintervalsoflengtht=l/2k1,eachprexedwithatestbit,i.
e.
,Sk=0s1,1s1,2.
.
.
s1,t0s2,1s2,2.
.
.
s2,t0.
.
.
0sr,1sr,2.
.
.
sr,tAssumerstthatalltheseintervalsaremappedintervalscorrespondingtopTNFAsP(v1)P(vr),andletX(vi)={θvi,φvi},1≤i≤r.
Initially,Sk&XθkproducesthestringYθ=0y1,1y1,2.
.
.
y1,t0y2,1y2,2.
.
.
y2,t0.
.
.
0yr,1yr,2.
.
.
yr,t,whereyi,j=1iθviis-reachableinP(vi)fromstatejandjisinSk.
Then,similartothesecondlineinthesimplealgorithm,(Yθ|Ik)(Ik>>t)&IkproducesastringoftestbitsZθ=z10tz20t.
.
.
zr0t,wherezi=1iatleastoneofyi,1.
.
.
yi,tis1.
Inotherwords,zi=1iθviis-reachableinP(vi)fromanystateinSk∩V(P(vi)).
Intuitively,theZθcorrespondstothe"θ-part"oftheofZ-setintherecursivealgorithm.
Nextwe"copy"thetestbitstogetthestringFθ=Zθ(Zθ>>t)=0zt10zt2.
.
.
0ztr.
Thebitwise&withEθkgivesGθ=0g1,1g1,2.
.
.
g1,t0g2,1g2,2.
.
.
g2,t0.
.
.
0gr,1gr,2.
.
.
gr,t.
Bydenition,gi,j=1istatejis-reachableinP(vi)fromθviandzi=1.
Inotherwords,Gθrepresents,for1≤i≤r,thestatesinP(vi)thatare-reachablefromSk∩V(P(vi))throughθvi.
Again,noticethecorrespondancewiththeG-setintherecursivealgorithm.
Thenext4linesareidenticaltorst4withtheexceptionthatθisexchangedbyφ.
Hence,Gφrepresentsthestatesthat-reachablethroughφv1φvr.
104Finally,Sk|Gθ|GφcomputestheunionofthestatesinSk,Gθ,andGφproducingthedesiredstate-setSk+1forthenextleveloftherecursion.
Intheabove,weassumedthatallintervalsweremapped.
Ifthisisnotthecaseitiseasytocheckthatthealgorithmisstillcorrectsincethestringinourdatastructurecontain0sinallunmappedintervals.
Thealgorithmusesconstanttimeforeachofthed=O(logm)levelsandhencethetotaltimeisO(logm).
6.
5.
4TheSimulationDataStructureNextweshowhowtogetafullsimulationdatastructure.
First,notethatintheseparatormappingtheendpointsoftheα-transitionsareconsecutive(asinSec.
6.
4).
ItfollowsthatwecanusethesamealgorithmasintheprevioussectiontocomputeMoveAinO(1)time.
Thisrequiresadictionaryofbitstrings,Dα,usingadditionalO(m)spaceandO(mlogm)preprocessingtime.
TheInsertA,andMemberAoperationsaretriviallyimplementedinO(1).
Puttingitalltogetherwehave:Lemma42ForaTNFAwithm=O(w)statesthereisasimulationdatastructureusingO(m)spaceandO(mlogm)preprocessingtimewhichsupportsalloperationsinO(logm)time.
CombiningthesimulationdatastructuresfromLemmas39and42withthereductionfromLemma38andtakingthebestresultgivesTheorem16.
Notethatthesimplesimulationdatastructureisthefastestwhenm=O(√w)andnissucientlylargecomparedtom.
6.
6RemarksandOpenProblemsThepresentedalgorithmsassumeaunit-costmultiplicationoperation.
SincethisoperationisnotinAC0(theclassofcircuitsofpolynomialsize(inw),constantdepth,andunboundedfan-in)itisinterestingtoreconsiderwhathappenswithourresultsifweremovemultiplicationfromourmachinemodel.
ThesimulationdatastructurefromSec.
6.
4usesmultiplicationtocomputeCloseAandalsofortheconstanttimehashingtoaccessDα.
Ontheotherhand,thealgorithmofSec.
6.
5onlyusesmultiplicationforthehashing.
However,Lemma42stillholdssincewecansimplyreplacethehashingbybinarysearchtree,whichusesO(logm)time.
ItfollowsthatTheorem16stillholdsexceptfortheO(n+m2)boundinthelastline.
AnotherinterestringpointistocompareourresultswiththeclassicalShift-OralgorithmbyBaeza-YatesandGonnet[BYG92]forexactpatternmatching.
Likeours,theiralgorithmsimulatesaNFAwithmstatesusingword-levelparallelism.
ThestructureofthisNFApermitsaveryecientsimulationwithanO(w)speedupofthesimpleO(nm)timesimulation.
OurresultsgeneralizethistoregularexpressionswithaslightlyworsespeedupofO(w/logw).
WewonderifitispossibletoremovetheO(logw)factorseparatingthesebounds.
Fromapracticalviewpoint,thesimplealgorithmofSec.
6.
4seemsverypromisingsinceonlyabout15instructionsareneededtocarryoutastepinthestate-setsimulation.
Combinedwithideasfrom[NR04]webelievethatthiscouldleadtoapracticalimprovementoverpreviousalgorithms.
6.
7AcknowledgmentsTheauthorwishestothankRasmusPaghandIngeLiGrtzformanycommentsandinterestingdiscussions.
105106Chapter7ImprovedApproximateStringMatchingandRegularExpressionMatchingonZiv-LempelCompressedTexts107108ImprovedApproximateStringMatchingandRegularExpressionMatchingonZiv-LempelCompressedTextsPhilipBilleITUniversityofCopenhagenbeetle@itu.
dkRolfFagerbergUniversityofSouthernDenmarkrolf@imada.
sdu.
dkIngeLiGrtzTechnicalUniversityofDenmarkilg@imm.
dtu.
dkAbstractWestudytheapproximatestringmatchingandregularexpressionmatchingproblemforthecasewhenthetexttobesearchediscompressedwiththeZiv-Lempeladaptivedictionarycompressionschemes.
Wepresentatime-spacetrade-othatleadstoalgorithmsimprovingthepreviouslyknowncomplexitiesforbothproblems.
Inparticular,wesignicantlyimprovethespacebounds,whichinpracticalapplicationsarelikelytobeabottleneck.
7.
1IntroductionModerntextdatabases,e.
g.
forbiologicalandWorldWideWebdata,arehuge.
Tosavetimeandspace,itisdesireableifdatacanbekeptincompressedformandstillallowecientsearching.
MotivatedbythisAmirandBenson[AB92a,AB92b]initiatedthestudyofcompressedpatternmatchingproblems,thatis,givenatextstringQincompressedformZandaspecied(uncompressed)patternP,ndalloccurrencesofPinQwithoutdecompressingZ.
Thegoalistosearchmoreecientlythanthena¨veapproachofdecompressingZintoQandthensearchingforPinQ.
Variouscompressedpatternmatchingalgorithmshavebeenproposeddependingonthetypeofpatternandcompressionmethod,seee.
g.
,[AB92b,FT98,KTS+98,KNU03,Nav03,MUN03].
Forinstance,givenastringQoflengthucompressedwiththeZiv-Lempel-Welchscheme[Wel84]intoastringoflengthn,Amiretal.
[ABF96]gaveanalgorithmforndingallexactoccurrencesofapatternstringoflengthminO(n+m2)timeandspace.
Inthispaperwestudytheclassicalapproximatestringmatchingandregularexpressionmatchingprob-lemsinthecontextofcompressedtexts.
Asinpreviousworkontheseproblems[KNU03,Nav03]wefocusonthepopularZL78andZLWadaptivedictionarycompressionschemes[ZL78,Wel84].
Wepresentanewtechniquethatgivesageneraltime-spacetrade-o.
Theresultingalgorithmsimproveallpreviouslyknowncomplexitiesforbothproblems.
Inparticular,wesignicantlyimprovethespacebounds.
Whensearchinglargetextdatabases,spaceislikelytobeabottleneckandthereforethisisofcrucialimportance.
7.
1.
1ApproximateStringMatchingGivenstringsPandQandanerrorthresholdk,theclassicalapproximatestringmatchingproblemistondallendingpositionsofsubstringsofQwhoseeditdistancetoPisatmostk.
Theeditdistancebetweentwostringsistheminimumnumberofinsertions,deletions,andsubstitutionsneededtoconvertonestringtotheother.
TheclassicaldynamicprogrammingsolutionduetoSellers[Sel80]solvestheprobleminO(um)109timeandO(m)space,whereuandmarethelengthofQandP,respectively.
Severalimprovementsofthisresultareknown,seee.
g.
,thesurveybyNavarro[Nav01a].
Forthispaperweareparticularlyinterestedinthefastsolutionforsmallvaluesofk,namely,theO(uk)timealgorithmbyLandauandVishkin[LV89]andthemorerecentO(uk4/m+u)timealgorithmduetoColeandHariharan[CH02](weassumew.
l.
o.
g.
thatk|A|+k.
Inthesecases,morethankdeletionsorkinsertions,respectively,areneededtotransformBtoA.
7.
3.
1SearchingforMatchesLetPbeastringoflengthmandletkbeanerrorthreshold.
Toavoidtrivialcasesweassumethatkm+kwealsostoretheindexoftheancestorxofzjofdepthm+k.
ThisinformationcaneasilybecomputedwhileconstructingCwithinthesametimeandspacebounds,i.
e.
,usingO(nτ)timeandO(n/τ)space.
Descriptionsarecomputedfromleft-to-rightasfollows.
Initially,setl0=0,u0=0,rpre(z0)=,rsuf(z0)=,MI(z0)=,andMO(z0)=.
Tocomputethedescriptionofzi,1≤i≤n,rstfollowthepathpofreferencesuntilweencounteranelementzj∈C.
Usingtheinformationstoredatzjwesetli:=|p|+ljandui=ui1+li1.
ByLemma43(ii)thedistancetozjisatmost2τandthereforelianduicanbecomputedinO(τ)timegiventhedescriptionofzi1.
Tocomputerpre(zi)wecomputethelabelofthepathfromz0towardszioflengthmin(m+k,li).
Therearetwocasestoconsider:Ifli≤m+kwesimplycomputethelabelofthepathfromzitoz0andletrpre(zi)bethereverseofthisstring.
Otherwise(li>m+k),weusethe"shortcut"storedatzjtondtheancestorzhofdistancem+ktoz0.
Thereverseofthelabelofthepathfromzhtoz0isthenrpre(zi).
Hence,rpre(zi)iscomputedinO(m+k+τ)=O(m+τ)time.
Thestringrsuf(zi)maybethedividedoverseveralphrasesandwethereforerecursivelyfollowpathstowardstherootuntilwehavecomputedtheentirestring.
Itiseasytoseethatthefollowingalgorithmcorrectlydecodesthedesiredsubstringoflengthmin(m+k,ui)endingatpositionui+li1.
1.
Initially,setl:=min(m+k,ui+li1),t:=i,ands:=.
2.
Computethepathpofreferencesfromztoflengthr=min(l,depth(zt))andsets:=s·label(p).
3.
Ifr0.
AniteautomatonisatupleA=(V,E,Σ,θ,Φ),whereVisasetofnodescalledstates,EissetofdirectededgesbetweenstatescalledtransitionseachlabeledbyacharacterfromΣ∪{},θ∈Visastartstate,andΦVisasetofnalstates.
Inshort,Aisanedge-labeleddirectedgraphwithaspecialstartnodeandasetofacceptingnodes.
Aisadeterministicniteautomaton(DFA)ifAdoesnotcontainany-transitions,andalloutgoingtransitionsofanystatehavedierentlabels.
Otherwise,Aisanon-deterministicautomaton(NFA).
ThelabelofapathpinAistheconcatenationoflabelsonthetransitionsinp.
ForasubsetSofstatesinAandcharacterα∈Σ∪{},denethetransitionmap,δ(S,α),asthesetofstatesreachablefromSviaapathlabeledα.
Computingthesetδ(S,α)iscalledastate-settransition.
Weextendδtostringsbydeningδ(S,α·B)=δ(δ(S,α),B),foranystringBandcharacterα∈Σ.
WesaythatAacceptsthestringBifδ({θ},B)∩Φ=.
OtherwiseArejectsQ.
Asintheprevioussection,wesaythatj∈[1,|B|]isamatchithereisani∈[1,j]suchthatAacceptsB[i,j].
Thesetofallmatchesisdenoted(A,B).
GivenaregularexpressionR,anNFAAacceptingpreciselythestringsinL(R)canbeobtainedbyseveralclassicmethods[MY60,Glu61,Tho68].
Inparticular,Thompson[Tho68]gaveasimplewell-knownconstructionwhichwewillrefertoasaThompsonNFA(TNFA).
ATNFAAforRhasatmost2mstates,atmost4mtransitions,andcanbecomputedinO(m)time.
Hence,astate-settransitioncanbecomputedinO(m)timeusingabreadth-rstsearchofAandthereforewecantestacceptanceofQinO(um)timeandO(m)space.
Thissolutioniseasilyadaptedtondallmatchesinthesamecomplexitybyaddingthestartstatetoeachofthecomputedstate-setsimmediatelybeforecomputingthenext.
Formally,δ(S,α·B)=δ(δ(S∪{θ},α),B),foranystringBandcharacterα∈Σ.
Amatchthenoccursatpositionjifδ({θ},Q[1,j])∩Φ=.
116Q=ananasbananerR=(b|n)an12340Danbrnas678501324567bnanstartASu0={s0,s1,s3}Su1={s0,s1,s3}Su2={s0,s1,s3,s4,s5}Su3={s0,s1,s3,s4,s5,s7}Su4={s0,s1,s3}Su5={s0,s2,s1,s3,s5}Su6={s0,s1,s3,s6}Su7={s0,s1,s3}Su8={s0,s1,s3}Z=(0,a)(0,n)(1,n)(1,s)(0,b)(3,a)(2,e)(0,r)eFigure7.
4:ThecompressedstringZrepresentingQandthecorrespondingdictionarytrieD.
TheTNFAAfortheregularexpressionRandthecorrespondingstate-setsSuiaregiven.
Thelastmatchpointersareasfol-lows:lastmatch(s7,zi)={z0}fori=0,1,8,lastmatch(s2,zi)=lastmatch(s4,zi)=lastmatch(s5,zi)={z3}fori=3,6,andlastmatch(s6,zi)={z2}fori=2,7.
Allotherlastmatchpointersare⊥.
Usingthedescriptionwecanndthematches:Sinces2∈Su5theelementz3∈M(s2,z6)representsthematchu6+depth(z3)1=9.
Theothermatchescanbefoundsimilarly.
7.
4.
2SearchingforMatchesLetA=(V,E,Σ,θ,Φ)beaTNFAwithmstates.
GivenacompressedstringZ=z1.
.
.
znrepresentingastringQoflengthuweshowhowtond(A,Q)eciently.
Asintheprevioussectionletliandui,0≤i≤nbethelengthandstartpositionofphrase(zi).
WeprocessZfromleft-to-rightandcomputeadescriptionforziconsistingofthefollowinginformation.
Theintegersliandui.
Thestate-setSui=δ({θ},Q[1,ui]+li1).
ForeachstatesofAthecompressionelementlastmatch(s,zi)=x,wherexistheancestorofziofmaximumdepthsuchthatδ({s},phrase(x))∩Φ=.
Ifthereisnoancestorthatsatisesthis,thenlastmatch(s,zi)=⊥.
AnexampledescriptionisshowninFig.
7.
4.
ThetotalsizeofthedescriptionforziisO(m)andthereforethespaceforalldescriptionsisO(nm).
Inthenextsectionwewillshowhowtocomputethedescriptions.
Assumefornowthatwehaveprocessedz0,zi1.
Weshowhowtondthematcheswithin[ui,ui+li1].
GivenastatesdeneM(s,zi)={x1,xk},wherex1=lastmatch(s,zi),xj=lastmatch(s,parent(xj1)),1e.
,x1,xkisthesequenceofancestorsofziobtainedbyrecursivelyfollowinglastmatchpointers.
BythedenitionoflastmatchandM(s,zi)itfollowsthatM(s,zi)isthesetofancestorsxofssuchthatδ(s,x)∩Φ=.
Hence,ifs∈Sui1theneachelementx∈M(s,zi)representsamatch,namely,ui+depth(x)1.
Eachmatchmayoccurforeachofthe|Sui1|statesandtoavoidreportingduplicatematchesweuseapriorityqueuetomergethesetsM(s,zi)foralls∈Sui1,whilegeneratingthesesetsinparallel.
Asimilarapproachisusedin[Nav03].
ThistakesO(logm)timepermatch.
Sinceeachmatchcanbeduplicatedatmost|Sui1|=O(m)timesthetotaltimeforreportingmatchesisO(occ·mlogm).
1177.
4.
3ComputingDescriptionsNextweshowhowtocomputedescriptionseciently.
Let1≤τ≤nbeaparameter.
Initially,computeasetCofcompressionelementsaccordingtoLemma43withparameterτ.
Foreachelementzj∈Cwestoreljandthetransitionsetsδ(s,phrase(zj))foreachstatesinA.
EachtransitionsetusesO(m)spaceandthereforethetotalspaceusedforzjisO(m2).
DuringtheconstructionofCwecomputeeachofthetransitionsetsbyfollowingthepathofreferencestothenearestelementy∈Candcomputingstate-settransitionsfromytozj.
ByLemma43(ii)thedistancetoyisatmost2τandthereforeallofthemtransitionsetscanbecomputedinO(τm2)time.
Since,|C|=O(n/τ)thetotalpreprocessingtimeisO(n/τ·τm2)=O(nm2)andthetotalspaceisO(n/τ·m2).
Thedescriptionscannowbecomputedasfollows.
TheintegerslianduicanbecomputedasbeforeinO(τ)time.
Alllastmatchpointersforallcompressionelementscaneasilybeobtainedwhilecomputingthetransitionssets.
Hence,weonlyshowhowtocomputethestate-setvalues.
First,letSu0:={θ}.
TocomputeSuifromSui1wecomputethepathptozifromthenearestelementy∈C.
Letp′bethepathfromz0toy.
Sincephrase(zi)=label(p′)·label(p)wecancomputeSui=δ(Sui1,phrase(zi))intwostepsasfollows.
FirstcomputethesetS′=s∈Sui1δ(s,phrase(y)).
(7.
1)Sincey∈Cweknowthetransitionsetsδ(s,phrase(y))andwecanthereforecomputetheunioninO(m2)time.
Secondly,wecomputeSuiasthesetδ(S′,label(p)).
SincethedistancetoyisatmostτthisstepusesO(τm)time.
Hence,allthestate-setsSu0SuncanbecomputedinO(nm(m+τ))time.
7.
4.
4AnalysisCombiningitall,wehaveanalgorithmusingO(nm(m+τ)+occ·mlogm)timeandO(nm+nm2/τ)space.
Notethatsinceweareusing(n)space,hashingisnotneededandthealgorithmworksforZLWaswell.
Insummary,thiscompletestheproofofTheorem18.
7.
4.
5ExploitingWord-levelParallelismIfweusetheword-parallelisminherentintheword-RAMmodel,thealgorithmofNavarro[Nav03]usesO(m/w(2m+nm)+occ·mlogm)timeandO(m/w(2m+nm))space,wherewisthenumberofbitsinawordofmemoryandspaceiscountedasthenumberofwordsused.
ThekeyideainNavarro'salgorithmistocompactlyencodestate-setsinbitstringsstoredinO(m/w)words.
UsingaDFAbasedonaGlushkovautomaton[Glu61]toquicklycomputestate-settransitions,andbitwiseORandANDoperationstocomputeunionsandintersectionsamongstate-sets,itispossibletoobtaintheaboveresult.
TheO(m/w2m)termintheaboveboundsisthetimeandspaceusedtoconstructtheDFA.
AsimilarideacanbeusedtoimproveTheorem18.
However,sinceoursolutionisbasedonThompson'sautomatonwedonotneedtoconstructaDFA.
Moreprecisely,usingthestate-setencodingofTNFAsgivenin[Mye92a,BFC05]astate-settransitioncanbecomputedinO(m/logn)timeafterO(n)timeandspacepreprocessing.
Sincestate-setsareencodedasbitstringseachtransitionsetusesm/lognspaceandtheunionin(7.
1)canbecomputedinO(mm/logn)timeusingabitwiseORoperation.
Asn≥√uinZL78andZLW,wehavethatlogn≥12loguandthereforeTheorem18canbeimprovedbyroughlyafactorlogu.
Specically,wegetanalgorithmusingO(nm/logu(m+τ)+occ·mlogm)timeandO(nmm/logu/τ+nm)space.
118Bibliography[AB92a]AmihoodAmirandGaryBenson.
Ecienttwo-dimensionalcompressedmatching.
InPro-ceedingsofthe2ndDataCompressionConference,pages279–288,1992.
[AB92b]AmihoodAmirandGaryBenson.
Two-dimensionalperiodicityanditsapplications.
InPro-ceedingsofthe3rdAnnualACM-SIAMSymposiumonDiscreteAlgorithms,pages440–452,1992.
[ABF96]AmihoodAmir,GaryBenson,andMartinFarach.
Letsleepingleslie:patternmatchinginZ-compressedles.
J.
Comput.
SystemSci.
,52(2):299–307,1996.
[ADKF70]V.
L.
Arlazarov,E.
A.
Dinic,M.
A.
Kronrod,andI.
A.
Faradzev.
Oneconomicconstructionofthetransitiveclosureofadirectedgraph(inrussian).
englishtranslationinsovietmath.
dokl.
11,1209-1210,1975.
Dokl.
Acad.
Nauk.
,194:487–488,1970.
[AFT06]TatsuyaAkutsu,DaijiFukagawa,andAtsuhiroTakasu.
Approximatingtreeeditdistancethroughstringeditdistance.
InProceedingsofthe17thInternationalSymposiumonAlgorithmsandComputation,LectureNotesinComputerScience,volume4288,pages90–99,2006.
[AGKR04]StephenAlstrup,CyrilGavoille,HaimKaplan,andTheisRauhe.
Nearestcommonancestors:Asurveyandanewalgorithmforadistributedenvironment.
TheoryComput.
Syst.
,37:441–456,2004.
[AGM+90]S.
F.
Altschul,W.
Gish,W.
Miller,E.
W.
Myers,andD.
J.
Lipman.
Basiclocalalignmentsearchtool.
J.
Mol.
Biol.
,215(3):403–410,1990.
[AH94]T.
AkutsuandM.
M.
Halldorsson.
Ontheapproximationoflargestcommonpointsetsandlargestcommonsubtrees.
InProceedingsofthe5thAnnualInternationalSymposiumonAlgorithmsandComputation,LectureNotesinComputerScience,volume834,pages405–413,1994.
[AH97]SusanneAlbersandTorbenHagerup.
Improvedparallelintegersortingwithoutconcurrentwriting.
Inform.
andComput.
,136:25–51,1997.
[AHdLT97]StephenAlstrup,JacobHolm,KristiandeLichtenberg,andMikkelThorup.
Minimizingdiametersofdynamictrees.
InProceedingsofthe24thInternationalColloquiumonAutomata,LanguagesandProgramming,LectureNotesinComputerScience,volume1256,pages270–280,1997.
[AHNR98]ArneAndersson,TorbenHagerup,StefanNilsson,andRajeevRaman.
SortinginlineartimeJ.
Comput.
SystemSci.
,57(1):74–93,1998.
[AHR98]StephenAlstrup,ThoreHusfeldt,andTheisRauhe.
Markedancestorproblems.
InProceedingsofthe39thAnnualIEEESymposiumonFoundationsofComputerScience,pages534–543,1998.
119[AHT00]StephenAlstrup,JacobHolm,andMikkelThorup.
Maintainingcenterandmedianindynamictrees.
InProceedingsofthe7thScandinavianWorkshoponAlgorithmTheory,LectureNotesinComputerScience,volume1851,pages46–56,2000.
[AHU76]A.
V.
Aho,D.
S.
Hirschberg,andJ.
D.
Ullman.
Boundsonthecomplexityofthelongestcommonsubsequenceproblem.
J.
ACM,1(23):1–12,1976.
[Aku06]TatsuyaAkutsu.
Arelationbetweeneditdistancefororderedtreesandeditdistanceforeulerstrings.
Inform.
Process.
Lett.
,100(3):105–109,2006.
[AKW98]AlfredV.
Aho,BrianW.
Kernighan,andPeterJ.
Weinberger.
TheAWKProgrammingLanguage.
Addison-Wesley,1998.
[ALM+98]A.
Arora,C.
Lund,R.
Motwani,M.
Sudan,andM.
Szegedy.
Proofvericationandthehardnessofapproximationproblems.
J.
ACM,45(3):501–555,1998.
[ALP04]AmihoodAmir,MosheLewenstein,andElyPorat.
Fasteralgorithmsforstringmatchingwithkmismatches.
J.
Algorithms,50(2):257–275,2004.
[AR02]StephenAlstrupandTheisRauhe.
Improvedlabelingschemesforancestorqueries.
InPro-ceedingsofthe13thAnnualACM-SIAMSymposiumonDiscreteAlgorithms,pages947–953,2002.
[AS01]LaurentAlonsoandR.
Schott.
Onthetreeinclusionproblem.
ActaInform.
,37(9):653–670,2001.
[ASU86]AlfredV.
Aho,RaviSethi,andJereyD.
Ullman.
Compilers:principles,techniques,andtools.
Addison-WesleyLongmanPublishingCo.
,Inc.
,Boston,MA,USA,1986.
[BCGM99]LucBoasson,PatrickCegielski,I.
Guessarian,andYuriMatiyasevich.
Window-accumulatedsubsequencematchingproblemislinear.
InInProceedingsofthe18thACMSIGMOD-SIGACT-SIGARTsymposiumonPrinciplesofDatabaseSystems,pages327–336,1999.
[BFC00]MichaelA.
BenderandMartinFarach-Colton.
TheLCAproblemrevisited.
InProceedingsofthe4thLatinAmericanSymposiumonTheoreticalInformatics,pages88–94,2000.
[BFC05]PhilipBilleandMartinFarach-Colton.
Fastandcompactregularexpressionmatching,2005.
Submittedtoajournal.
Preprintavailiableatarxiv.
org/cs/0509069.
[Bil05]PhilipBille.
Asurveyontreeeditdistanceandrelatedproblems.
Theoret.
Comput.
Sci.
,337(1-3):217–239,2005.
[Bil06]PhilipBille.
Newalgorithmsforregularexpressionmatching.
InProceedingsofthe33rdInter-nationalColloquiumonAutomata,LanguagesandProgramming,LectureNotesinComputerScience,volume4051,pages643–654,2006.
[BML+04]DenilsonBarbosa,AlbertoO.
Mendelzon,LeonidLibkin,LaurentMignet,andMarceloAre-nas.
EcientincrementalvalidationofXMLdocuments.
InProceedingsofthe20thInterna-tionalConferenceonDataEngineering,page671,2004.
[BY89]RicardoA.
Baeza-Yates.
EcientTextSearching.
PhDthesis,Dept.
ofComputerScience,UniversityofWaterloo,1989.
[BY91]RicardoA.
Baeza-Yates.
Searchingsubsequences.
Theoret.
Comput.
Sci.
,78(2):363–376,1991.
[BYG92]RicardoBaeza-YatesandGastonH.
Gonnet.
Anewapproachtotextsearching.
Commun.
ACM,35(10):74–82,1992.
120[BYN96]RicardoA.
Baeza-YatesandGonzaloNavarro.
Afasteralgorithmforapproximatestringmatching.
InProceedingsofthe7thAnnualSymposiumonCombinatorialPatternMatching,LectureNotesinComputerScience,volume1075,pages1–23,1996.
[CD99]J.
ClarkandS.
DeRose.
XMLpathlanguage(XPath),availableashttp://www.
w3.
org/TR/xpath,1999.
[CGM97]S.
ChawatheandH.
Garcia-Molina.
Meaningfulchangedetectioninstructureddata.
InProceedingsoftheACMSIGMODInternationalConferenceonManagementofData,pages26–37,1997.
[CH02]RichardColeandRameshHariharan.
Approximatestringmatching:Asimplerfasteralgo-rithm.
SIAMJ.
Comput.
,31(6):1761–1782,2002.
[Cha06]TimothyM.
Chan.
All-pairsshortestpathsforunweightedundirectedgraphsinO(mn)time.
InProceedingsofthe17thAnnualACM-SIAMSymposiumonDiscreteAlgorithms,pages514–523,2006.
[Cha07]TimothyM.
Chan.
Morealgorithmsforall-pairsshortestpathsinweightedgraphs.
InPro-ceedingsofthe39thAnnualACMSymposiumonTheoryofComputing,2007.
toappear.
[Che98]WeiminChen.
Moreecientalgorithmfororderedtreeinclusion.
J.
Algorithms,26:370–385,1998.
[Che00]WeiminChen.
Multi-subsequencesearching.
Inform.
Process.
Lett.
,74(5-6):229–233,2000.
[Che01]WeiminChen.
Newalgorithmfororderedtree-to-treecorrectionproblem.
J.
Algorithms,40:135–158,2001.
[CHI99]RichardCole,RameshHariharan,andPiotrIndyk.
Treepatternmatchingandsubsetmatch-ingindeterministicO(nlog3n)-time.
InProceedingsofthe10thAnnualACM-SIAMSympo-siumonDiscreteAlgorithms,pages245–254,1999.
[Chu87]M.
J.
Chung.
O(n2.
5)algorithmforthesubgraphhomeomorphismproblemontrees.
J.
Algorithms,8(1):106–112,1987.
[CLRS01]ThomasH.
Cormen,CharlesE.
Leiserson,RonaldL.
Rivest,andCliordStein.
IntroductiontoAlgorithms,secondedition.
MITPress,2001.
[CLZU03]MaximeCrochemore,GadM.
Landau,andMichalZiv-Ukelson.
Asubquadraticsequencealignmentalgorithmforunrestrictedscoringmatrices.
SIAMJ.
Comput.
,32(6):1654–1673,2003.
[CM07]GrahamCormodeandS.
Muthukrishnan.
Thestringeditdistancematchingproblemwithmoves.
ACMTrans.
Algorithms,3(1):2,2007.
[CMT03]MaximeCrochemore,BorivojMelichar,andZdenˇekTronˇcek.
Directedacyclicsubsequencegraph:Overview.
J.
DiscreteAlgorithms,1(3-4):255–280,2003.
[CR72]StephenA.
CookandRobertA.
Reckhow.
Time-boundedrandomaccessmachines.
InPro-ceedingsofthe4thAnnualACMSymposiumonTheoryofComputing,pages73–80,1972.
[CRGMW96]S.
S.
Chawathe,A.
Rajaraman,H.
Garcia-Molina,andJ.
Widom.
Changedetectioninhierar-chicallystructuredinformation.
InProceedingsoftheACMSIGMODInternationalConferenceonManagementofData,pages493–504,1996.
[DDHS00]KeithDiefendor,PradeepK.
Dubey,RonHochsprung,andHunterScales.
AltiVecextensiontoPowerPCacceleratesmediaprocessing.
IEEEMicro,20(2):85–95,2000.
121[DFG+97]GautamDas,RudolfFleischer,LeszekGasieniec,DimitriosGunopulos,andJuhaK¨arkk¨ainen.
Episodematching.
InProceedingsofthe8thAnnualSymposiumonCombinatorialPatternMatching,LectureNotesinComputerScience,volume1264,pages12–27,1997.
[DGM90]MosheDubiner,ZviGalil,andEdithMagen.
Fastertreepatternmatching.
InProceedingsofthe31stAnnualIEEESymposiumontheFoundationsofComputerScience,pages145–150,1990.
[Die89]P.
F.
Dietz.
Fullypersistentarrays(extendedarray).
InProceedingsoftheWorkshoponAlgorithmsandDataStructures,LectureNotesinComputerScience,volume382,pages67–74,1989.
[DKM+94]MartinDietzfelbinger,AnnaKarlin,KurtMehlhorn,FriedhelmMeyeraufderHeide,HansRohnert,andRobertTarjan.
Dynamicperfecthashing:Upperandlowerbounds.
SIAMJ.
Comput.
,23(4):738–761,1994.
[DMRW06]ErikD.
Demaine,ShayMozes,BenjaminRossman,andOrenWeimann.
AnO(n3)-timealgorithmfortreeeditdistance.
Arxivpreprintcs.
DS/0604037,April2006.
[DMRW07]ErikDemaine,ShayMozes,BenjaminRossman,andOrenWeimann.
Anoptimaldecomposi-tionalgorithmfortreeeditdistance.
InProceedingsofthe34thInternationalColloquiumonAutomata,LanguagesandProgramming,2007.
[DT03]SergeDulucqandLaurentTichit.
Rnasecondarystructurecomparison:exactanalysisoftheZhang-Shashatreeeditalgorithm.
Theoret.
Comput.
Sci.
,306(1-3):471–484,2003.
[DT05]SergeDulucqandHel`eneTouzet.
Decompositionalgorithmsforthetreeeditdistanceproblem.
J.
DiscreteAlgorithms,3(2-4):448–471,2005.
[EGG88]DavidEppstein,ZviGalil,andRaaeleGiancarlo.
Speedingupdynamicprogramming.
InProceedingsofthe29thAnnualIEEESymposiumonFoundationsofComputerScience,pages488–496,1988.
[EGGI92]DavidEppstein,ZviGalil,RaaeleGiancarlo,andGiuseppeF.
Italiano.
Sparsedynamicprogrammingi:Linearcostfunctions.
J.
ACM,39(3):519–545,1992.
[FM96]P.
FerraginaandS.
Muthukrishnan.
Ecientdynamicmethod-lookupforobjectorientedlanguages.
InProceedingsofthe4thAnnualEuropeanSymposiumonAlgorithms,LectureNotesinComputerScience,volume1136,pages107–120,1996.
[Fre97]GregN.
Frederickson.
Ambivalentdatastructuresfordynamic2-edge-connectivityandksmallestspanningtrees.
SIAMJ.
Comput.
,26(2):484–538,1997.
[FT94]MartinFarachandMikkelThorup.
Fastcomparisonofevolutionarytrees.
InProceedingsofthe5thAnnualACM-SIAMSymposiumonDiscreteAlgorithms,pages481–488,1994.
[FT98]MartinFarachandMikkelThorup.
StringmatchinginLempel-Zivcompressedstrings.
Algo-rithmica,20(4):388–404,1998.
[FW93]MichaelL.
FredmanandDanE.
Willard.
Surpassingtheinformationtheoreticboundwithfusiontrees.
J.
Comput.
SystemSci.
,47(3):424–436,1993.
[FW94]MichaelL.
FredmanandDanE.
Willard.
Trans-dichotomousalgorithmsforminimumspanningtreesandshortestpaths.
J.
Comput.
SystemSci.
,48(3):533–551,1994.
[GJ79]MichaelJ.
GareyandDavidS.
Johnson.
ComputersandIntractability:AGuidetotheTheoryofNP-completeness.
Freeman,1979.
122[GK05]MinosGarofalakisandAmitKumar.
Xmlstreamprocessingusingtree-editdistanceembed-dings.
ACMTrans.
DatabaseSyst.
,30(1):279–332,2005.
[Glu61]VictorM.
Glushkov.
Theabstracttheoryofautomata.
RussianMath.
Surveys,16(5):1–53,1961.
[GN98]ArvindGuptaandNaomiNishimura.
Findinglargestsubtreesandsmallestsupertrees.
Algo-rithmica,21:183–210,1998.
[Got82]O.
Gotoh.
Animprovedalgorithmformatchingbiologicalsequences.
J.
MolecularBiology,162(3):705–708,1982.
[GT83]HaroldN.
GabowandRobertEndreTarjan.
Alinear-timealgorithmforaspecialcaseofdis-jointsetunion.
InProceedingsofthe15thAnnualACMSymposiumonTheoryofComputing,pages246–251,1983.
[Gus97]DanGuseld.
Algorithmsonstrings,trees,andsequences:computerscienceandcomputationalbiology.
Cambridge,1997.
[Hag98]TorbenHagerup.
Sortingandsearchingonthewordram.
InProceedingsofthe15thAnnualSymposiumonTheoreticalAspectsofComputerScience,LectureNotesinComputerScience,volume1373,pages366–398,1998.
[Han04]YijieHan.
Improvedalgorithmforallpairsshortestpaths.
Inform.
Process.
Lett.
,91(5):245–250,2004.
[Han06]YijieHan.
AnO(n3(loglogn/logn)5/4)timealgorithmforallpairsshortestpaths.
InPro-ceedingsofthe14thAnnualEuropeanSymposiumonAlgorithms,LectureNotesinComputerScience,volume4168,pages411–417,2006.
[Hir75]D.
S.
Hirschberg.
Alinearspacealgorithmforcomputingmaximalcommonsubsequences.
Commun.
ACM,18(6):341–343,1975.
[HMP01]TorbenHagerup,PeterBroMiltersen,andRasmusPagh.
Deterministicdictionaries.
J.
Algorithms,41(1):69–85,2001.
[HN05]HeikkiHyyr¨oandGonzaloNavarro.
Bit-parallelwitnessesandtheirapplicationstoapproxi-matestringmatching.
Algorithmica,41(3):203–231,2005.
[HO82]ChristophM.
HomannandMichaelJ.
O'Donnell.
Patternmatchingintrees.
J.
ACM,29(1):68–95,1982.
[HP01]HaruoHosoyaandBenjaminPierce.
RegularexpressionpatternmatchingforXML.
InProceedingsofthe28thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgrammingLanguages,pages67–80,2001.
[HT84]D.
HarelandR.
E.
Tarjan.
Fastalgorithmsforndingnearestcommonancestors.
SIAMJ.
Comput.
,13(2):338–355,1984.
[HT02]YijieHanandMikkelThorup.
IntegersortinginO(n√loglogn)expectedtimeandlinearspace.
InProceedingsofthe43rdAnnualIEEESymposiumonFoundationsofComputerScience,pages135–144,2002.
[HTGK03]MatthiasH¨ochsmann,ThomasT¨oller,RobertGiegerich,andStefanKurtz.
Localsimilarityinrnasecondarystructures.
InProceedingsoftheIEEEComputerSocietyConferenceonBioinformatics,pages159–158,2003.
123[ISY03]LucianIlie,BaozhenShan,andShengYu.
Fastalgorithmsforextendedregularexpressionmatchingandsearching.
InProceedingsofthe20thAnnualSymposiumonTheoreticalAspectsofComputerScience,LectureNotesInComputerScience,volumeVol.
2607,pages179–190,2003.
[JHS06]JesperJansson,NgoTrungHieu,andWing-KinSung.
Localgappedsubforestalignmentanditsapplicationinndingrnastructuralmotifs.
J.
Comp.
Biology,13(3):702–718,2006.
[JL01]JesperJanssonandAndrzejLingas.
Afastalgorithmforoptimalalignmentbetweensimi-larorderedtrees.
InProceedingsofthe12thAnnualSymposiumonCombinatorialPatternMatching,LecturenotesofComputerScience,volume2089,2001.
[JL03]JesperJanssonandAndrzejLingas.
Afastalgorithmforoptimalalignmentbetweensimilarorderedtrees.
Fundam.
Inform.
,56(1-2):105–120,2003.
[Jor69]C.
Jordan.
Surlesassemblagesdeslignes.
J.
ReineAngew.
Math.
,70:185–190,1869.
[JP06]JesperJanssonandZeshanPeng.
Algorithmsforndingamostsimilarsubforest.
InPro-ceedingsofthe17thAnnualSymposiumonCombinatorialPatternMatching,LecturenotesofComputerScience,volume4009,pages377–388,2006.
[JWZ95]TaoJiang,LushengWang,andKaizhongZhang.
Alignmentoftrees–analternativetotreeedit.
Theoret.
Comput.
Sci.
,143(1):137–148,1995.
[KA94]DmitryKeselmanandAmihoodAmir.
Maximumagreementsubtreeinasetofevolutionarytrees–metricsandecientalgorithms.
InProceedingsofthe35thAnnualIEEESymposiumonFoundationsofComputerScience,pages758–769,1994.
[Kil92]PekkaKilpel¨ainen.
TreeMatchingProblemswithApplicationstoStructuredTextDatabases.
PhDthesis,UniversityofHelsinki,DepartmentofComputerScience,November1992.
[Kle98]P.
N.
Klein.
Computingtheedit-distancebetweenunrootedorderedtrees.
InProceedingsofthe6thAnnualEuropeanSymposiumonAlgorithms,LectureNotesinComputerScience,volume1461,pages91–102,1998.
[Kle02]PhilipKlein,2002.
Personalcommunication.
[KM93]PekkaKilpel¨ainenandHeikkiMannila.
Retrievalfromhierarchicaltextsbypartialpatterns.
InProceedingsofthe16thConferenceonResearchandDevelopmentinInformationRetrieval,pages214–222,1993.
[KM95a]PekkaKilpel¨ainenandHeikkiMannila.
Orderedandunorderedtreeinclusion.
SIAMJ.
Comput.
,24:340–356,1995.
[KM95b]JamesR.
KnightandEugeneW.
Myers.
Super-patternmatching.
Algorithmica,13(1/2):211–243,1995.
[KM95c]JamesRobertKnightandEugeneW.
Myers.
Approximateregularexpressionpatternmatch-ingwithconcavegappenalties.
Algorithmica,14:85–121,1995.
[KMY95]S.
Khanna,R.
Motwani,andF.
F.
Yao.
Approximationalgorithmsforthelargestcommonsubtreeproblem.
Technicalreport,StanfordUniversity,1995.
[Knu69]DonaldErwinKnuth.
TheArtofComputerProgramming,Volume1.
AddisonWesley,1969.
[KNU03]JuhaK¨arkk¨ainen,GonzaloNavarro,andEskoUkkonen.
ApproximatestringmatchingonZiv-Lempelcompressedtext.
J.
DiscreteAlgorithms,1(3-4):313–338,2003.
124[Kos89]S.
RaoKosaraju.
Ecienttreepatternmatching.
InProceedingsofthe30thAnnualIEEESymposiumontheFoundationsofComputerScience,pages178–183,1989.
[KTS+98]TakuyaKida,MasayukiTakeda,AyumiShinohara,MasamichiMiyazaki,andSetsuoArikawa.
MultiplepatternmatchinginLZWcompressedtext.
InProceedingsofthe8thDataCompres-sionConference,pages103–112,1998.
[KTSK00]PhilipKlein,SrikantaTirthapura,DanielSharvit,andBenKimia.
Atree-edit-distanceal-gorithmforcomparingsimple,closedshapes.
InProceedingsofthe11thAnnualACM-SIAMSymposiumonDiscreteAlgorithms,pages696–704,2000.
[LM01]QuanzhongLiandBongkiMoon.
IndexingandqueryingXMLdataforregularpathexpres-sions.
InProceedingsofthe27thInternationalConferenceonVeryLargeDataBases,pages361–370,2001.
[LMS98]GadM.
Landau,EugeneW.
Myers,andJeanetteP.
Schmidt.
Incrementalstringcomparison.
SIAMJ.
Comput.
,27(2):557–582,1998.
[LST01]ChinLungLu,Zheng-YaoSu,andChuanYiTang.
Anewmeasureofeditdistancebetweenlabeledtrees.
InProceedingsofthe7thAnnualInternationalComputingandCombinatoricsConference,LectureNotesinComputerScience,volume2108,pages338–348,2001.
[Lu79]S.
Y.
Lu.
Atree-to-treedistanceanditsapplicationtoclusteranalysis.
IEEETransactionsonPatternAnalysisandMachineIntelligence,1:219–224,1979.
[Lu84]S.
Y.
Lu.
Atree-matchingalgorithmbasedonnodesplittingandmerging.
IEEETransactionsonPatternAnalysisandMachineIntelligence,6(2):249–256,1984.
[LV89]G.
M.
LandauandU.
Vishkin.
Fastparallelandserialapproximatestringmatching.
J.
Algorithms,10:157–169,1989.
[MKT+00]TetsuyaMatsumoto,TakuyaKida,MasayukiTakeda,AyumiShinohara,andSetsuoArikawa.
Bit-parallelapproachtoapproximatestringmatchingincompressedtexts.
InProceedingsofthe7thInternationalSymposiumonStringProcessingandInformationRetrieval,pages221–228,2000.
[MM88]WebbMillerandEugeneW.
Myers.
Sequencecomparisonwithconcaveweightingfunctions.
Bull.
ofMath.
Biology,50(2):97–120,1988.
[MM89]E.
W.
MyersandW.
Miller.
Approximatematchingofregularexpressions.
Bull.
ofMath.
Biology,51:5–37,1989.
[MM96]S.
MuthukrishnanandMartinM¨uller.
Timeandspaceecientmethod-lookupforobject-orientedprograms.
InProceedingsofthe7thAnnualACM-SIAMSymposiumonDiscreteAlgorithms,pages42–51,1996.
[MN90]K.
MehlhornandS.
N¨ahler.
BoundedordereddictionariesinO(loglogN)timeandO(n)space.
Inform.
Process.
Lett.
,35(4):183–189,1990.
[MNU05]VeliM¨akinen,GonzaloNavarro,andEskoUkkonen.
Transpositioninvariantstringmatching.
J.
Algorithms,56(2):124–153,2005.
[MOG98]EugeneW.
Myers,PauloOliva,andKatiaS.
Guimaraes.
Reportingexactandapproximateregularexpressionmatches.
InProceedingsofthe9thAnnualSymposiumonCombinatorialPatternMatching,LectureNotesinComputerScience,volume1448,pages91–103,1998.
[Mot92]RajeevMotwani.
Lecturenotesonapproximationalgorithmsvolume1.
TechnicalReportSTAN-CS-92-1435,StanfordUniversity,DepartmentofComputerScience,1992.
125[MP80]W.
MasekandM.
Paterson.
Afasteralgorithmforcomputingstringeditdistances.
J.
Comput.
SystemSci.
,20:18–31,1980.
[MR90]HeikkiMannilaandK.
J.
R¨aih¨a.
Onquerylanguagesforthep-stringdatamodel.
InformationModellingandKnowledgeBases,pages469–482,1990.
[MT92]JiriMatouˇsekandR.
Thomas.
Onthecomplexityofndingiso-andothermorphismsforpartialk-trees.
DiscreteMath.
,108:343–364,1992.
[MUN03]VeliM¨akinen,EskoUkkonen,andGonzaloNavarro.
Approximatematchingofrun-lengthcompressedstrings.
Algorithmica,35(4):347–369,2003.
[Mur01]MakotoMurata.
ExtendedpathexpressionsofXML.
InProceedingsofthe20thACMSym-posiumonPrinciplesofDatabaseSystems,pages126–137,2001.
[MY60]R.
McNaughtonandH.
Yamada.
Regularexpressionsandstategraphsforautomata.
IRETrans.
onElectronicComputers,9(1):39–47,1960.
[Mye86]EugeneW.
Myers.
AnO(ND)dierencealgorithmanditsvariations.
Algorithmica,1(2):251–266,1986.
[Mye91]EugeneW.
Myers.
Anoverviewofsequencecomparisonalgorithmsinmolecularbiology.
TechnicalReport91–29,Univ.
ofArizona,Dept.
ofComputerScience,1991.
[Mye92a]E.
W.
Myers.
Afour-russianalgorithmforregularexpressionpatternmatching.
J.
ACM,39(2):430–448,1992.
[Mye92b]EugeneW.
Myers.
Approximatematchingofnetworkexpressionswithspacers.
J.
ofCompu-tationalBiology,3(1):33–51,1992.
[Mye99]GeneMyers.
Afastbit-vectoralgorithmforapproximatestringmatchingbasedondynamicprogramming.
J.
ACM,46(3):395–415,1999.
[Nav01a]GonzaloNavarro.
Aguidedtourtoapproximatestringmatching.
ACMComput.
Surv.
,33(1):31–88,2001.
[Nav01b]GonzaloNavarro.
NR-grep:afastandexiblepattern-matchingtool.
Software–PracticeandExperience,31(13):1265–1312,2001.
[Nav03]GonzaloNavarro.
Regularexpressionsearchingoncompressedtext.
J.
DiscreteAlgorithms,1(5-6):423–443,2003.
[Nav04]GonzaloNavarro.
Approximateregularexpressionsearchingwitharbitraryintegerweights.
NordicJ.
Comput.
,11(4):356–373,2004.
[NKT+01]GonzaloNavarro,TakuyaKida,MasayukiTakeda,AyumiShinohara,andSetsuoArikawa.
Fasterapproximatestringmatchingovercompressedtext.
InProceedingsofthe11thDataCompressionConference,page459,Washington,DC,USA,2001.
IEEEComputerSociety.
[NR98]G.
NavarroandM.
Ranot.
AgeneralpracticalapproachtopatternmatchingoverZiv-Lempelcompressedtext.
TechnicalReportTR/DCC-98-12,Dept.
ofComputerScience,Univ.
ofChile.
,1998.
[NR03]GonzaloNavarroandMathieuRanot.
Fastandsimplecharacterclassesandboundedgapspatternmatching,withapplicationstoproteinsearching.
J.
Comp.
Biology,10(6):903–923,2003.
126[NR04]GonzaloNavarroandMathieuRanot.
Newtechniquesforregularexpressionsearching.
Algorithmica,41(2):89–116,2004.
[NRT00]NaomiNishimura,PrabhakarRagde,andDimitriosM.
Thilikos.
Findingsmallestsupertreesunderminorcontainment.
Int.
J.
Found.
Comput.
Sci.
,11(3):445–465,2000.
[OFW99]StuartOberman,GregFavor,andFredWeber.
AMD3DNow!
technology:Architectureandimplementations.
IEEEMicro,19(2):37–48,1999.
[PT06]MihaiPˇatrascuandMikkelThorup.
Time-spacetrade-osforpredecessorsearch.
InProceed-ingsofthe38thAnnualACMSymposiumonTheoryofComputing,pages232–240,2006.
[PWW97]AlexPeleg,SamWilkie,andUriWeiser.
IntelMMXformultimediaPCs.
Commun.
ACM,40(1):24–38,1997.
[Ric97a]ThorstenRichter.
Anewalgorithmfortheorderedtreeinclusionproblem.
InProceedingsofthe8thAnnualSymposiumonCombinatorialPatternMatching,LectureNotesofComputerScience,volume1264,pages150–166,1997.
[Ric97b]ThorstenRichter.
Anewmeasureofthedistancebetweenorderedtreesanditsapplications,technicalreport85166-cs.
Technicalreport,DepartmentofComputerScience,UniversityofBonn,1997.
[RR92]R.
RameshandI.
V.
Ramakrishnan.
Nonlinearpatternmatchingintrees.
J.
ACM,39(2):295–316,1992.
[Ruˇz04]MilanRuˇzic.
Algorithmsfordeterministicconstructionofecientdictionaries.
InProceedingsofthe12thAnnualEuropeanSymposiumonAlgorithms,LectureNotesinComputerScience,pages592–603,2004.
[Ryt99]WojciechRytter.
Algorithmsoncompressedstringsandarrays.
InProceedingsofthe26thConferenceonCurrentTrendsinTheoryandPracticeofInformaticsonTheoryandPracticeofInformatics,LectureNotesinComputerScience,volume1725,pages48–65,1999.
[Sel77]StanleyM.
Selkow.
Thetree-to-treeeditingproblem.
Inform.
Process.
Lett.
,6(6):184–186,1977.
[Sel80]P.
Sellers.
Thetheoryandcomputationofevolutionarydistances:Patternrecognition.
J.
Algorithms,1:359–373,1980.
[SM97]J.
SetubalandJ.
Meidanis.
IntroductiontoComputationalBiology.
PWSPublishingCompany,1997.
[SM02]TorstenSchliederandHolgerMeuss.
QueryingandrankingXMLdocuments.
J.
Am.
Soc.
Inf.
Sci.
Technol.
,53(6):489–503,2002.
[SN00]T.
SchliederandF.
Naumann.
ApproximatetreeembeddingforqueryingXMLdata.
InACMSIGIRWorkshopOnXMLandInformationRetrieval,2000.
[ST99]R.
ShamirandD.
Tsur.
Fastersubtreeisomorphism.
J.
Algorithms,33:267–280,1999.
[Sta81]RichardM.
Stallman.
Emacstheextensible,customizableself-documentingdisplayeditor.
SIGPLANNot.
,16(6):147–156,1981.
[SWSZ02]DennisShasha,JasonTsong-LiWang,HuiyuanShan,andKaizhongZhang.
Atreegrep:Ap-proximatesearchinginunorderedtrees.
InProceedingsofthe14thInternationalConferenceonScienticandStatisticalDatabaseManagement,pages89–98,2002.
127[SZ90]DennisShashaandKaizhongZhang.
Fastalgorithmsfortheunitcosteditingdistancebetweentrees.
J.
Algorithms,11:581–621,1990.
[SZ97]DennisShashaandKaizhongZhang.
Approximatetreepatternmatching.
InPatternMatchinginString,TreesandArrays,pages341–371.
OxfordUniversity,1997.
[Tai79]Kuo-ChungTai.
Thetree-to-treecorrectionproblem.
J.
ACM,26:422–433,1979.
[Tak04]T.
Takaoka.
Afasteralgorithmfortheall-pairsshortestpathproblemanditsapplication.
InProceedingsofthe10thAnnualInternationalComputingandCombinatoricsConference,LectureNotesinComputerScience,volume3106,pages278–289,2004.
[Tan95]EiichiTanaka.
Anoteonatree-to-treeeditingproblem.
InternationalJournalofPatternRecognitionandArticialIntelligence,9(1):167–172,1995.
[TH99]Shreekant(Ticky)ThakkarandTomHu.
InternetstreamingSIMDextensions.
Computer,32(12):26–34,1999.
[Tho68]K.
Thompson.
Regularexpressionsearchalgorithm.
Commun.
ACM,11:419–422,1968.
[Tho99]MikkelThorup.
Undirectedsingle-sourceshortestpathswithpositiveintegerweightsinlineartime.
J.
ACM,46(3):362–394,1999.
[Tho03]MikkelThorup.
Spaceecientdynamicstabbingwithfastqueries.
InProceedingsofthe33rdAnnualACMSymposiumonTheoryofComputing,pages649–658,2003.
[TONH96]MarcTremblay,J.
MichaelO'Connor,VenkateshNarayanan,andLiangHe.
Visspeedsnewmediaprocessing.
IEEEMicro,16(4):10–20,1996.
[Tou03]Hel`eneTouzet.
Treeeditdistancewithgaps.
Inform.
Process.
Lett.
,85(3):123–129,2003.
[Tou05]Hel`eneTouzet.
Alineartreeeditdistancealgorithmforsimilarorderedtrees.
InProceedingsofthe16thAnnualSymposiumonCombinatorialPatternMatching,LecturenotesinComputerScience,volume3537,pages334–345,2005.
[Tro01]ZdenˇekTronˇcek.
Searchingsubsequences.
Ph.
D.
Thesis,DepartmentofComputerScienceandEngineering,FEECTUinPrague,2001.
[TRS02]A.
Termier,M.
Rousset,andM.
Sebag.
Treender:arststeptowardsXMLdatamining.
InProceedingsofthe2ndInternationalConferenceonDataMining,page450,2002.
[TSKK98]SrikantaTirthapura,DanielSharvit,PhilipKlein,andBenjaminB.
Kimia.
Indexingbasedonedit-distancematchingofshapegraphs.
InProceedingofSPIEInternationalSymposiumonVoice,VideoandDataCommunications,pages91–102,1998.
[TT88]EiichiTanakaandKeikoTanaka.
Thetree-to-treeeditingproblem.
InternationalJournalofPatternRecognitionandArticialIntelligence,2(2):221–240,1988.
[Ukk85a]EskoUkkonen.
Algorithmsforapproximatestringmatching.
Inf.
Control,64(1-3):100–118,1985.
[Ukk85b]EskoUkkonen.
Findingapproximatepatternsinstrings.
J.
Algorithms,6:132–137,1985.
[Val05]GabrielValiente.
Constrainedtreeinclusion.
J.
DiscreteAlgorithms,3(2-4):431–447,2005.
[vEB77]PetervanEmdeBoas.
Preservingorderinaforestinlessthanlogarithmictimeandlinearspace.
Inform.
Process.
Lett.
,6(3):80–82,1977.
128[vEBKZ77]PetervanEmdeBoas,R.
Kaas,andE.
Zijlstra.
Designandimplementationofanecientpriorityqueue.
MathematicalSystemsTheory,10:99–127,1977.
[Wal94]LarryWall.
ThePerlProgrammingLanguage.
PrenticeHallSoftwareSeries,1994.
[Wat95]M.
S.
Waterman.
IntroductiontoComputationalBiology.
Chapman&Hall,1995.
[Wel84]TerryA.
Welch.
Atechniqueforhigh-performancedatacompression.
IEEEComputer,17(6):8–19,1984.
[WF74]RobertA.
WagnerandMichaelJ.
Fischer.
Thestring-to-stringcorrectionproblem.
J.
ACM,21:168–173,1974.
[WM92a]S.
WuandU.
Manber.
Agrep–afastapproximatepattern-matchingtool.
InProceedingsUSENIXWinter1992TechnicalConference,pages153–162,1992.
[WM92b]SunWuandUdiManber.
Fasttextsearching:allowingerrors.
Commun.
ACM,35(10):83–91,1992.
[WMM95]S.
Wu,U.
Manber,andE.
W.
Myers.
Asubquadraticalgorithmforapproximateregularexpressionmatching.
J.
Algorithms,19(3):346–360,1995.
[Wri94]AldenH.
Wright.
Approximatestringmatchingusingwithin-wordparallelism.
Softw.
Pract.
Exper.
,24(4):337–362,1994.
[WZ03]LushengWangandJianyunZhao.
Parametricalignmentoforderedtrees.
Bioinformatics,19(17):2237–2245,2003.
[WZ05]LushengWangandKaizhongZhang.
Spaceecientalgorithmsfororderedtreecomparison.
InProceedingsofhe16thAnnualInternationalSymposiumonAlgorithmsandComputation,LectureNotesinComputerScience,volume3827,pages380–391,2005.
[WZJS94]JasonTsong-LiWang,KaizhongZhang,KarpjooJeong,andDennisShasha.
Asystemforap-proximatetreematching.
IEEETransactionsonKnowledgeandDataEngineering,6(4):559–571,1994.
[Yam01]HiroakiYamamoto.
Anewrecognitionalgorithmforextendedregularexpressions.
InPro-ceedingsofthe12thInternationalSymposiumonAlgorithmsandComputation,LectureNotesinComputerScience,volume2223,pages257–267,2001.
[YLH03]LiangHuaiYang,MongLiLee,andWynneHsu.
EcientminingofXMLquerypatternsforcaching.
InProceedingsofthe29thConferenceonVeryLargeDataBases,pages69–80,2003.
[YLH04]HuaiYang,LiLee,andWynneHsu.
FindinghotquerypatternsoveranXQuerystream.
TheVLDBJournal,13(4):318–332,2004.
[YM03]HiroakiYamamotoandTakashiMiyazaki.
Afastbit-parallelalgorithmformatchingextendedregularexpressions.
InProceedingofthe9thAnnualInternationalComputingandCombina-toricsConference,LectureNotesinComputerScience,volume2697,pages222–231,2003.
[ZADR03]P.
Zezula,G.
Amato,F.
Debole,andF.
Rabitti.
TreesignaturesforXMLqueryingandnavigation.
InProceedingsofthe1stInternationalXMLDatabaseSymposium,pages149–163,2003.
[Zha89]KaizhongZhang.
TheEditingDistanceBetweenTrees:AlgorithmsandApplications.
PhDthesis,CourantInstitute,DepartmentofComputerScience,1989.
129[Zha95]KaizhongZhang.
Algorithmsfortheconstrainededitingproblembetweenorderedlabeledtreesandrelatedproblems.
PatternRecognition,28:463–474,1995.
[Zha96a]KaizhongZhang.
Aconstrainededitdistancebetweenunorderedlabeledtrees.
Algorithmica,15(3):205–222,1996.
[Zha96b]KaizhongZhang.
Ecientparallelalgorithmsfortreeeditingproblems.
InProceedingsofthe7thAnnualSymposiumCombinatorialPatternMatching,LectureNotesinComputerScience,volume1075,pages361–372,1996.
[ZJ94]KaizhongZhangandTaoJiang.
SomeMAXSNP-hardresultsconcerningunorderedlabeledtrees.
Inform.
Process.
Lett.
,49:249–254,1994.
[ZL77]JacobZivandAbrahamLempel.
Auniversalalgorithmforsequentialdatacompression.
IEEETrans.
Inform.
Theory,23(3):337–343,1977.
[ZL78]JacobZivandAbrahamLempel.
Compressionofindividualsequencesviavariable-ratecoding.
IEEETrans.
Inform.
Theory,24(5):530–536,1978.
[ZS89]KaizhongZhangandDennisShasha.
Simplefastalgorithmsfortheeditingdistancebetweentreesandrelatedproblems.
SIAMJ.
Comput.
,18:1245–1262,1989.
[ZSS91]KaizhongZhang,RickStatman,andDennisShasha.
Ontheeditingdistancebetweenun-orderedlabeledtrees.
TechnicalReport289,TheUniversityofWesternOntario,DepartmentofComputerScience,1991.
[ZSS92]KaizhongZhang,RickStatman,andDennisShasha.
Ontheeditingdistancebetweenun-orderedlabeledtrees.
Inform.
Process.
Lett.
,42:133–139,1992.
[ZSW94]KaizhongZhang,DennisShasha,andJasonT.
L.
Wang.
Approximatetreematchinginthepresenceofvariablelengthdon'tcares.
J.
Algorithms,16(1):33–66,1994.
[Zwi04]U.
Zwick.
Aslightlyimprovedsub-cubicalgorithmfortheallpairsshortestpathsproblemwithrealedgelengths.
InProceedingsofthe15thInternationalSymposiumonAlgorithmsandComputation,LectureNotesinComputerScience,volume3341,pages921–932,2004.
[ZWS96]KaizhongZhang,JasonTsong-LiWang,andDennisShasha.
Ontheeditingdistancebetweenundirectedacyclicgraphs.
Int.
J.
Found.
Comput.
Sci.
,7(1):43–58,1996.
130

UCloud云服务器香港临时补货,(Intel)CN2 GIA优化线路,上车绝佳时机

至今为止介绍了很多UCLOUD云服务器的促销活动,UCLOUD业者以前看不到我们的个人用户,即使有促销活动,续费也很少。现在新用户的折扣力很大,包括旧用户在内也有一部分折扣。结果,我们的用户是他们的生存动力。没有共享他们的信息的理由是比较受欢迎的香港云服务器CN2GIA线路产品缺货。这不是刚才看到邮件注意和刘先生的通知,而是补充UCLOUD香港云服务器、INTELCPU配置的服务器。如果我们需要他...

wordpress通用企业主题 wordpress高级企业自适应主题

wordpress高级企业自适应主题,通用型企业展示平台 + 流行宽屏设计,自适应PC+移动端屏幕设备,完美企业站功能体验+高效的自定义设置平台。一套完美自适应多终端移动屏幕设备的WordPress高级企业自适应主题, 主题设置模块包括:基本设置、首页设置、社会化网络设置、底部设置、SEO设置; 可以自定义设置网站通用功能模块、相关栏目、在线客服及更多网站功能。点击进入:wordpress高级企业...

Central美国65折优惠,美国达拉斯机房VPS季付赠送双倍内存

Central美国独立日活动正在进行中,旗下美国达拉斯机房VPS 65折优惠,季付赠送双倍内存(需要发工单),Central租用的Hivelocity的机房,只支持信用卡和加密货币付款,不支持paypal,需要美国独服的可以谨慎入手试试。Central怎么样?Central便宜服务器,Central自称成立于2019年,主营美国达拉斯机房Linux vps、Windows vps、专用服务器和托管...

街头cs枪战为你推荐
酒店回应名媛拼单酒店分房时出现单男单女时,怎样处理?哈利波特罗恩升级当爸电影哈利波特中罗恩一家的红头发为什么后来变成金色的了johncusack有喜欢演员JOHN CUSACK的吗?从哪部片子开始喜欢他的?至今为止他主要参与的电影作品有哪些?安徽汽车网安徽什么汽车网站比较好?留学生认证留学生的学位证书怎样认证?月神谭求男变女类的变身小说www.99cycy.com谁在这个http://www.sifangmall.com网站上买过东西?51sese.com谁有免费看电影的网站?bbs2.99nets.com西安论坛、西安茶馆网、西安社区、西安bbs 的网址是多少?33tutu.com33gan.com改成什么了
青岛虚拟主机 国内免备案主机 plesk 服务器评测 秒解服务器 建站代码 铁通流量查询 52测评网 太原联通测速平台 admit的用法 双十一秒杀 什么是服务器托管 cn3 vip域名 网游服务器 支持外链的相册 优酷黄金会员账号共享 彩虹云 联通网站 四川电信商城 更多