BioMedCentralPage1of23(pagenumbernotforcitationpurposes)BMCBioinformaticsOpenAccessMethodologyarticleAprobabilisticmodelfortheevolutionofRNAstructureIanHolmes*1,2Address:1DepartmentofBioengineering,UniversityofCalifornia,BerkeleyCA94720-1762,USAand2DepartmentofStatistics,1SouthParksRoad,OxfordOX13TG,UKEmail:IanHolmes*-ihh@berkeley.
edu*CorrespondingauthorAbstractBackground:ForthepurposesoffindingandaligningnoncodingRNAgene-andcis-regulatoryelementsinmultiple-genomedatasets,itisusefultobeabletoderivemulti-sequencestochasticgrammars(andhencemultiplealignmentalgorithms)systematically,startingfromhypothesesaboutthevariouskindsofrandommutationeventandtheirrates.
Results:Here,weconsiderahighlysimplifiedevolutionarymodelforRNA,called"TheTKF91StructureTree"(followingThorne,KishinoandFelsenstein's1991modelofsequenceevolutionwithindels),whichwehaveimplementedforpairwisealignmentasproofofprincipleforsuchanapproach.
Themodel,itsstrengthsanditsweaknessesarediscussedwithreferencetofourexamplesoffunctionalncRNAsequences:ariboswitch(guanine),azipcode(nanos),asplicingfactor(U4)andaribozyme(RNaseP).
Asshownbyourvisualisationsofposteriorprobabilitymatrices,theselectedexamplesillustratethreedifferentsignaturesofnaturalselectionthatarehighlycharacteristicofncRNA:(i)co-ordinatedbasepairsubstitutions,(ii)co-ordinatedbasepairindelsand(iii)whole-stemindels.
Conclusions:Althoughallthreetypesofmutation"event"arebuiltintoourmodel,eventsoftype(i)and(ii)arefoundtobebettermodeledthaneventsoftype(iii).
Nevertheless,wehypothesisefromthemodel'sperformanceonpairwisealignmentsthatitwouldformanadequatebasisforaprototypemultiplealignmentandgenefindingtool.
BackgroundOneofthepromisesofcomparativegenomicsistoanno-tatepreviouslyundetectablefunctionalsignalsingenomicsequence,byidentifyingandcharacterisingevo-lutionarilyconservedelements.
Aprincipledwaytoextractsuchsignalsisbyfittingthedatatoprobabilisticmodelsofthemolecularevolutionaryprocess.
Thelogicrunsasfollows:supposetherearevariouskindsofcon-servedelementx,y,z.
.
.
(e.
g.
exons,bitsofRNA,promot-ers,etc)thatmightexplainanobservedsequencehomology.
Foreachofthesescenarios,wecanconstructaprobabilisticmodelMx,My,Mz.
.
.
andcomparethelikeli-hoodoftheobserveddataundereachofthesemodels.
Themodelwiththebestfitindicatesthetypeoffunctionalelementpresentinthesequence.
AgroundbreakingexampleofhowthisprobabilisticapproachcanbeusedistheQRNAprogram,designedasacomparativeRNAgenepredictor[1].
ThethreetypesofelementconsideredbyQRNAarenoncodingRNA(calledRNA),protein-codingexons(calledCODforcodon),andunidentifiedDNAhomology(calledOTHforother).
ThePublished:26October2004BMCBioinformatics2004,5:166doi:10.
1186/1471-2105-5-166Received:27July2004Accepted:26October2004Thisarticleisavailablefrom:http://www.
biomedcentral.
com/1471-2105/5/1662004Holmes;licenseeBioMedCentralLtd.
ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommonsAttributionLicense(http://creativecommons.
org/licenses/by/2.
0),whichpermitsunrestricteduse,distribution,andreproductioninanymedium,providedtheoriginalworkisproperlycited.
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page2of23(pagenumbernotforcitationpurposes)former(RNA)wasmodeledusingaPairwiseStochasticContext-FreeGrammar(PairSCFG);thelattertwo(CODandOTH)usingPairwiseHiddenMarkovModels(PairHMMs).
ThenoncodingRNApredictionsgeneratedahighyieldofexperimentalhits,andofferedaninforma-tion-theoreticglimpseintoamodern-dayRNAworld[2].
Itisnaturaltoconsiderhowsuchanapproachmightbeappliedtoapairwisecomparisonwheretheevolutionary"distance"betweenthetwosequencescanvary.
Oneapproach,analogoustotheBLOSUMseriesofBLASTmatricesforproteins[3],istopartitionasetoftrainingalignmentsintoanadhocnumberofbinsbasedonthepercentagesequenceidentity.
Alignmentsinthesamebin(i.
e.
havingcomparablesequenceidentity)thenrepresentpairsofsequencesatapproximatelyequivalentdistances.
Forexample,theBLOSUM62substitutionmatrixwasesti-matedfrompairwisealignmentswithatleast62%iden-tity.
ThissortofapproachisusedbytheRIBOSUMbasepairsubstitutionmatricesdevelopedforRSEARCH[4],recentversionsofQRNA,andthestemlocprogramintheauthor'sDARTsoftwarepackage.
Analternativeapproach,analogoustothePAMseriesofBLASTmatrices[5],istotreatthe"distance"asatimemeasurement,bypostulatinganunderlyingevolutionarystochasticprocessorcontinuous-timeMarkovchainwhosemutationrateparametersareconstantovertime(calledstationarityinstochasticprocesstheory).
Thisevolutionaryrateapproachusesfewerparameters–andmakesfulleruseofthedata–thanthedividing-into-binsapproach,sinceitpostulatesaninfinitesimalgeneratorforalltime-scalesoftheprocess.
ForthePAMseries,thisgeneratortakestheformofaninstantaneoussubstitutionratematrix;foraprimary-sequencemodel,thegeneratorisacondi-tionally-normalizedPairHMMortransducer[6];foranRNAsecondary-structuremodel,wewillseethatthegen-eratorisaPairSCFG;andsoon.
Furthermore,theevolu-tionaryratemodelissupremelycompatiblewithlikelihood-basedphylogeneticmethods[7].
It'sthereforeworthconsideringsuchevolutionaryrate-basedmodels,although(sincethey'retrickiertoanalysemathematically)they'relesssuitedtoquicksoftwareprototypingthanthe"bin-by-percent-ID"approach.
Withthisinmind,wecanconsidertheevolutionaryrate-basedequivalentsofthethreepairwisegrammarsusedinQRNA.
TheOTHmodel,fornoncodingDNAsequence,isaPairHMMwithaffinegaps;theclosestevolutionaryequivalentisthe"longindel"model[8,9].
Thelongindelmodelincorporatesmulti-residueindelsandsingle-resi-due(point)substitutions;itisbasedontheTKF91model,whichonlyallowssingle-residueindels[10].
Incontrast,thecurrentbestevolutionaryversionsoftheCOD[11]andRNA[12]modelsdonotattempttomodelindels,changesinexon/intronstructureorchangesinRNAsec-ondarystructure.
Thesearedeficiencieswhichmusteven-tuallybeaddressed;ultimatelytheywilllimittheusefulnessofthemodels.
Forexample,thelackofatreat-mentofindelsmeansthatthesemodelscanonlybeusedonapre-generatedalignment;theycannot,bythemselves,beusedtoalignsequences.
Inthisreportwepresentasim-plebutimprovedmodelofRNAstructureevolution,calledtheTKF91StructureTree(Figure1).
Thismodelallowsnotjustcovariantpointsubstitutionsofnucle-otides,butalsocovariantinsertionsanddeletionsofbases,base-pairs,wholestemsandmulti-stemstructures(Figure2).
Althoughwehavenot,inthispaper,appliedtheStructureTreetomultiplealignment,oradaptedittoinclude"longindels",thesimilaritytoexistingmodels[8,13]suggestsverynaturalformsforsuchadaptationsofourmodel.
Furthermore,theTKF91StructureTreeisalge-braicallytractable,yieldingSCFG-basedscoringschemesforsimultaneousRNAalignmentandstructureprediction(fromwhichalignmentalgorithmsnaturallyfollow).
Toourknowledge,thisisthefirstsuchmodelfortheevolu-tionofRNAstructuretobedescribedwithinanevolution-aryrateframework.
AcomputerprogramforsimultaneouspairwisealignmentandsecondarystructurepredictionusingtheTKF91Struc-tureTreehasbeendevelopedinC++.
ThepotentialofthemodelforRNAsequencealignmenthasbeendemon-stratedbytestingthepairwisealigneronfourfunctionalelementsfromtheRFAMdatabase[14]:thepurineribos-witch,thenanostranslationalcontrolelement,theU2splicingfactorandthebacterialnuclearRNasePgene.
TheTKF91StructureTreeisaverysimpleevolutionarymodellackingsome"obvious"features,suchasnaturalselectiontofavourthethermodynamicallystableoverlapofπ-orbitalsbetweenadjacentstackedbasesinRNAdoublehelices.
Thefactthatthemodelappearstoworkreasona-blywell,despitetheexclusionofsuchfeatures,suggeststhatverysimplemodelsofRNAevolutionmayturnouttobesufficienttouncoverasurprisinglylargeproportionofRNAsequencehomology.
MethodsWebeginbyreviewingtheTKF91model[10].
Thismodeldescribestheevolutionofasinglesequenceundertheactionoftwokindsofmutationevent:(i)pointsubstitu-tionevents,whichactonasingleresidueonly;and(ii)single-residueindelevents,whichinsertordeleteasingleresidue.
Theratesofbothtypesofeventareindependentoftheneighboringsequence.
TheTKF91model,asdefinedbyThorneetal,istime-reversible.
Thishastheimplication,calledthepulleyprin-ciplebyFelsenstein,thatthepositionofanancestralnodeinaphylogenetictreecanbeslidaroundlikeapulleyBMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page3of23(pagenumbernotforcitationpurposes)withoutchangingthelikelihoodoftheobserveddata[7].
Aligningapairofobservedpresent-daysequencesisthere-foreidenticaltoaligninganancestralsequencewithitsdescendant,andwecantalkaboutancestor-descendantalignmentwithoutlossofgenerality.
TheTKF91modelcanbeanalysedalgebraically[10],andtheprobabilitydistributionfunction(PDF)overancestor-descendantalignmentscanbeexpressedasaPairHMM[13]andextendedtomultiplesequences(usinga"Multi-pleHMM")[13].
Whileitisstraightforwardtodefineamoregeneral"longindel"modelallowingmulti-residuedeletionsandinsertions[8],theonlyPairHMMsforthisgeneralmodelthathavebeendescribedtodateareapproximations,inspiredbytheformoftheTKF91model:sofarthereisnoexactPairHMMsolutionofthelongindelmodel[8,9,15,16].
Inthispaper,wewillnotbeconsideringsuchlong-indelmodels.
DefinitionoftheTKF91modelThestateoftheTKF91processisdescribedbyaTKF91linksequence:apermanentimmortallinkattheleftendofthesequence,followedbyzeroormoremortallinks.
Overtime,mortallinkscanbedeleted,andnewmortallinkscanbeinsertedtotherightofeitherimmortalormortallinks.
Thiscanbetreatedasabirth-deathprocess(λ0,0)withconstantimmigration(λ0),where"births"areiden-tifiedwithsingle-linkinsertionsoccuringtotheimmedi-aterightoftheparentmortallink.
and"immigration"withinsertionsimmediatelyrightoftheimmortallink.
Afurthersite-independentlabelingisintroducedonmor-tallinksusingthesingletnucleotidealphabet,={A,C,G,U}.
Eachsite'salphabetlabelevolvesasanindependentfour-statereversiblecontinuous-timeMarkovchain(RCTMC)withsubstitutionrateR0(i,j)fromstateitostatej.
Labelsfornewlyinsertedmortallinksaredrawnfromtheequi-libriumdistributionp0(i)ofthissubstitutionprocess.
Byreadingoffthelabelsofmortallinks,thestateoftheTKF91processcanbeequatedtoasequencein*.
AnalysisoftheTKF91modelThefollowingfunctionsof(λn,n)ariseinanalysesofequilibriumandtransitionprobabilitiesintheTKF91model[10].
Heretisatimeparameter.
ATKF91StructureTree,andthecorrespondingRNAsecondarystructure(inset)Figure1ATKF91StructureTree,andthecorrespondingRNAsecondarystructure(inset).
LSAULAGGCSAUSALGAACALAGCCLoopsequenceLoopsequenceStemsequenceStemsequenceUGCCACUGCUUUUAAACGGGGAAGGCAUGCCACGGAUUAUCGUAACGGUGCUAAGCAAACUABMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page4of23(pagenumbernotforcitationpurposes)Hereexp(Rnt)≡exp(A)istheexponentialofthematrixwithelementsAij=Rn(i,j)t.
Themeaningoftheabovefunctionsisasfollows.
αnistheprobabilityofnon-deletion;βn,γnaretheprobabilitiesofinsertion,following(respectively)aninsertionandadele-tion;κnistheprobabilityofcontinuingtheancestralsequence;andMn(i,j)istheconditionalsubstitutionprobabilityfromitoj.
Note(1-γn)κn(1-αn)=βn(delete→deleteandinsert→inserttransitionprobabilitiesareequal).
Notealsolimt→∞βn=κn.
TheequilibriumprobabilitydistributionoversequencesintheTKF91modelisageometricdistributionwithparameterκ0.
Theresiduesatindividualpositionsofthesequenceareindependently,identicallydistributedatequilibriumandaresampledfromtheequilibrumdistri-butionofthepointsubstitutionprocess.
TheTKF91singletgrammarisshowninFigure3.
TheTKF91pairgrammarisshowninFigure4.
Notethattwoalternatesetsofruleprobabilities,jointlyandcondition-allynormalised,canbereadofffromFigure4:thecondi-tionalprobabilitiescanbereadofffromcolumnP(d|a),whilethejointprobabilitiescanbefoundbymultiplyingtheexpressionsincolumnsP(a)andP(d|a)toobtainP(a,d).
AnexamplemutationpathforaTKF91StructureTree,illustratingtheinstantaneoustransitionsoftheprocessFigure2AnexamplemutationpathforaTKF91StructureTree,illustratingtheinstantaneoustransitionsoftheprocess.
Ineachfigure(N),grayarrowsindicatethesitesofpastmutationsinstep(N-1)→(N),whileblackarrowsindicatethesitesofupcomingmutationsinstep(N)→(N+1).
Thetypesofmutationareasfollows.
(1)→(2):Pointsubstitutionofunpairednucleotidesinloopsequences.
(2)→(3):Pointand/orcovariantsubstitutionofpairednucleotidesinstemsequences.
(3)→(4):Insertionsofsingleunpairednucleotidesinloopsequences.
(4)→(5):Deletionsofsingleunpairednucleotidesinloopsequences.
(5)→(6):Insertionanddeletionofpairednucleotidesinstemsequences.
(6)→(7):Insertionofstemsintoloopsequences.
(7)→(8):Deletionofstemsfromloopsequences.
GCSAUSALSALGACAGACLAGCCULLUGCCCUGCUUUAAGGGGAGUUAUALAGCCLAGCSAUSACCLGAAGLSAAUGCCCUGCUUUAAGGGGAUUAGUGCCACUGCUUUUAAACGGGGALACSAUSALAGCCLGACAGLSAGUALAGCCLAGCSAUSALSALGACAGACUGCCACUGCUUUUAAACGGGGACSAUSALSALGAALAGCCUCUGCCCUGCUUUAAGGGGAUGUAAGLUA(5)LGAACCSAUSALSAUGCCGCUUUAGGGAUUAAGLUAGULAGCCUUAA(6)GCUGCGUUGAUACSAUSALSACUUGAUAAGUAGULAGCUUAALSUUGAUALAAGCSAUU(7)GCUGCGUUGA(3)(1)(2)(4)(8)CSAULSACUUGAAGUAGUALSUUGAUALAAGAGCUGGCUGUAUAGCGGGUAUALGAACUGLLSASLGAACLCUCCαβλλλλγnnnnnnnnnnnttt===exp()(exp(()))exp(())11nnnnnnnnnnnttt(exp(()))(exp())(exp(()))/11=λλλκλnnnijMijt(,)exp()=RBMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page5of23(pagenumbernotforcitationpurposes)ExtendingtheTKF91modelVariousextensionstoTKF91havebeenproposed[8,9,15].
Themosttractablekindofextensionchangesthemeaningofa"link"butleavestheindelprocessonlinksintact[15].
OurRNAmodelisonesuchextension,allowingtwodifferentkindsofTKF91modelthatcanbemutuallynestedtoformloopandstemregions.
ConsiderthefollowingextensiontotheTKF91model,whichwecalltheTKF91StructureTree,andwhichisshowninFigures1and2.
ThismodelusesthefactthatanRNAsecondarystructure(excludingpseudoknots,kissingloopsandother"tertiary"interactions)canbeidentifiedwithatree.
Thestateofourstochasticprocesscanthusbedescribedbyarootedtree:everynodeinthistreeiseitherasinglet,paired,looporstemnode.
Thetreecanbebrokenintooverlappingloopsequencesandstemsequences,whichcorrespondtostrandsofunpairedRNA(loops)ordoublehelicesofbasepairedRNA(stems).
Loopsareallowedtocontainunpairednucleotides,andcanalsoserveasabranching-offpointfornestedstems.
Stems,ontheotherhand,areallowedtocontainpairednucleotides,andareterminatedbyaloop(thisreflectsthesmallestunitofRNAstructure,whichisastemterminatedbyaloop).
Thetreeisrootedbyaloopsequence.
Theabovedescriptionwillnowbemademoreprecise.
DefinitionoftheTKF91StructureTreeThestateoftheTKF91StructureTreeisdescribedbyarootedtreewhereeachnodehasdegree≤3.
Therearefourbasickindsofnodeinthetree:singlet,paired,loopandstem.
Singletandpairednodescorrespondtoobservablenucle-otides.
Singletnodes(labeledfrom)representinde-pendentlyevolvingnucleotides,asinTKF91.
Pairednodes(labeledfrom2)representcovariantbasepairs.
Loopandstemnodesdeterminethetreestructure(Figure1).
Loopnodes(labeledL),ofwhichtherootnodeisone,arepresentatthebeginningofloopsequences,whichcontainsingletandstemnodesandarewrittenhorizon-tally.
Stemnodes(labeledS)arepresentatthebeginningofstemsequences,whichcontainpairednodes,aretermi-natedbyaloopnode,andarewrittenvertically.
ThesetofloopandstemnodelabelsiswrittenΦ.
Thefullsetofnodelabelsis∪2∪Φ.
Φ={L,S}={A,C,G,U}2={AA,AC,AG,AU,CA,CC,CG,CU,GA,GC,GG,GU,UA,UC,UG,UU}LoopsequencesAloopsequenceisverysimilartoaTKF91linksequence:aswithTKF91,wehavealeftmostimmortallooplinkfol-lowedbyzeroormoremortallooplinks.
Themortallinksareinsertedanddeletedwithratesλ1and1,inthestyleofTKF91.
EachlinkisalsoanodeintheStructureTree.
Linksarelabeledfrom∪Φ:theimmortallooplinkislabeledL,whilethemortallooplinksarelabeledfrom{A,C,G,U,S}.
AswiththeTKF91model,thealphabetlabe-lingofeachmortallinkevolvesasanindependentfive-stateRCTMCwithsubstitutionrateR1(i,j)fromitojandequilibriumprobabilityp1(i)ofbeinginstatei,plustheadditionalrestrictionthatR1(X,S)=R1(S,X)=0forallX∈:inotherwords,embeddedstemscan'tinterconvertwithsingletnucleotides.
Seestep1→2ofFigure2forexamplesofsingle-nucleotidesubstitutioninloopsequences,andsteps3→4and4→5forsingle-nucle-otideinsertionanddeletion.
TheS-labeledlinkspossessanindependentlyevolvingembeddedstemsequencethatcanbeconsideredto"nest"insidetheloopsequence.
IftheS-linkisdeleted,thentheembeddedstem(andallitschildren)isdeletedwithit.
Conversely,whenanewS-linkisinserted,itisinsertedwithacompletesubtreethatissampledfromtheequilib-riumdistributionoverStructureTrees.
Seesteps6→7and7→8ofFigure2forexamplesofsubstructureinsertionanddeletion.
SincealoopsequenceiseffectivelyaTKF91sequencewithaspecial"fifthnucleotide"characterrepresentinganembeddedstem(theSlink),itobeysthesamestatisticsasaTKF91sequence.
Inparticular,theprobabilitydistribu-tionoverlooplengthsatequilibriumisageometricdistri-butionwithparameterκ1.
AnSCFGwithnonterminals{L}andterminalsforgenerat-ingtheequilibriumdistributionoverTKF91sequencesFigure3AnSCFGwithnonterminals{L}andterminalsforgenerat-ingtheequilibriumdistributionoverTKF91sequences.
HereX∈isagenericterminalsymbol.
TKF91singletrulesRulelhs→rhsP(a)1.
L→XLκ0p0(X)2.
|1κ0BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page6of23(pagenumbernotforcitationpurposes)StemsequencesAstemsequenceisalsoderivedfromaTKF91linksequence.
UnliketheTKF91linksequenceortheloopsequence,however,itiswrittenvertically(ratherthanhor-izontally)inFigure1.
Itconsistsofatopmostimmortalstemlink,zeroormoremortalstemlinks,andabottom-most,terminatinglooplink.
Again,eachlinkisalsoanodeintheStructureTree.
Linksarelabeledfrom2∪Φ:theimmortalstemlinkislabeledS(thisisthenodeintheparentloopsequence),themortallinksarelabeledwiththepairednucleotidealphabet2(eachwithanindependentsixteen-stateRCTMCmodelingcovariantpairsubstitutionalongRNAstems,withsubstitutionratematrixR2(i,j)andequilib-riump2(i)),andtheterminatinglooplinkislabeledL.
ThemortalstemlinksexperienceTKF91-styleinsertionanddeletionwithratesλ2and2(although,inthediagram-maticformofFigure1,newlyinsertedlinksareplacedimmediatelyundertheirparentlink,ratherthanimmedi-atelytotheright).
TheterminatinglooplinkLdoesnotcontributetoinsertionordeletion(soiseffectivelyimmortalbutinert)butpossessesanindependentlyevolvingloopsequence.
Seestep2→3ofFigure2forexamplesofcovariantbasepairsubstitutioninstemsequences,andstep5→6forcovariantbasepairinsertionanddeletion.
Notethattheimmortalstemlink,S,isonlyimmortalfromthepointofviewofthestemsequencebeneathit.
TheSisitselfamortallinkinaparentloopsequence,andmaybedeletedasthatsequenceevolves.
Inthisevent,thelooplinkLwillalsobedeleted,alongwithallitschildren(step7→8,Figure2).
Thus,theonlytrulyimmortallinkistheloopnodeattherootoftheStructureTree,whichhasnoparentstodealdeathfromabove.
Aswiththeloopsequence,astemsequenceiseffectivelyaTKF91sequencewithminormodifications,anditobeysthesamestatisticsasaTKF91sequence.
Theprobabilitydistributionoverstemlengthsatequilibriumisageomet-ricdistributionwithparameterκ2.
AnalysisoftheTKF91StructureTreeFigure5showstheSCFGforgeneratingtheTKF91Struc-tureTreeatequilibrium.
Therearetwononterminals,Φ,andfourterminals,.
APairSCFGwithnonterminals{L1,L2}andterminalsa∪dforgeneratingpairwisealignmentsofTKF91sequencesforanancestorandadescendantFigure4APairSCFGwithnonterminals{L1,L2}andterminalsa∪dforgeneratingpairwisealignmentsofTKF91sequencesforanancestorandadescendant.
HereX,Y∈aregenericterminalsymbols.
TKF91pairrulesRulelhs→rhsP(a)P(d|a)1.
L1→XaYdL1κ0p0(X)(1β0)α0M0(X,Y)2.
|YdL11β0p0(Y)3.
|XaL2κ0p0(X)(1β0)(1α0)4.
|1κ01β05.
L2→XaYdL1κ0p0(X)(1γ0)α0M0(X,Y)6.
|YdL11γ0p0(Y)7.
|XaL2κ0p0(X)(1γ0)(1α0)8.
|1κ01γ0BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page7of23(pagenumbernotforcitationpurposes)Figure6showsthepairstochasticcontext-freegrammarforanancestoranddescendantsequenceseparatedbyevolutionarytimet.
Again,conditionalandjointproba-bilitiescanbothbereadfromthefigure.
NonterminalsareΦ1234;terminalsareafortheancestoranddforthedescendant.
Φ1234={L1,L2,L3,L4,S1,S2,S3,S4}a={Aa,Ca,Ga,Ua}d={Ad,Cd,Gd,Ud}Dynamicprogrammingalignmentofsequencestothesegrammarshasthetypicalcomplexityforsingle-sequence[17]andpairwise[18]SCFGs.
Thatis,forFigure5,thetimecomplexityisO(L3)andthememorycomplexityO(L2),whileforFigure6,thetimecomplexityisO(L3M3)andthememorycomplexityO(L2M2),whereLandMaresequencelengths.
Thisisalsothecomplexityofthesingle-sequenceandtwo-sequenceSankoffalgorithm[19],forwhichSCFGsmayberegardedasaprobabilisticscoringscheme.
Thetimeandmemorycomplexitymaybereducedbytheuseof"banding"techniques[20,21],thatrestrictthedynamicprogrammingcomputationtothe(typically)highest-scoringcentraldiagonalbandofthedynamicprogrammingmatrix,orbymoreflexiblecon-straintsontheDPiteration[18].
GrammartransformationsWenowdescribesometransformationsofFigures5,6per-formedbeforeimplementingthegrammarparsers.
NullcyclesThepresenceinagrammarof"nullcycles"–sequencesofproductionruleswhichcausenonetchange–complicatestheparsingalgorithmsforthatgrammar.
Generally,nullcyclesareavoidedbyprogrammersdesigningSCFGsorHMMsforsequenceanalysis[17].
However,inthegram-marsderivedautomaticallyfortheTKF91StructureTree,nullcyclesarisenaturallyduetothepossibilityofzero-lengthlooporstemsequencesinthemodel.
ThereareseveralclassesofnullcycleinthegrammarsfortheStructureTreemodel,showninTable1.
DegeneraciesAswellasnullcycles,thereareotherundesirabledegener-aciesintheStructureTreegrammars.
Grammaticaldegen-eracyoccurswhenmorethanoneparsehasthesamemeaning,soparsesaredegenerateratherthanunique.
Moststochasticgrammarsusefulforbioinformaticsaredegen-erateinthesensethattherearealwaysmanyfoldsoralignmentsconsistentwiththeobservedsequencedata;thissortofdegeneracyistechnicallycalledambiguity.
Wearemoreconcernedwithotherformsofdegeneracy,suchasstructuraldegeneracy(multipleparsesdenoteasinglepatternofbasepairing)andalignmentdegeneracy(multipleparsesdenoteasinglealignment).
TKF91,ineffect,skirtsalignmentdegeneracybyassigningmeaningtotheorderingofdeletionsandinsertionsinanalignment,butalignmentdegeneraciesariseintheStruc-tureTreemodelbecausetherearemultiplewaystodeleteandinsertthings(e.
g.
deletingawholestem,versusdelet-ingallitselementsindividually).
Therearealsostructuraldegeneraciesarisingfrom"silent"(i.
e.
non-emitting)loopsorstems.
Inadditiontothenullcyclesdescribedabove,theseinclude(forthesingletgrammar)theunde-sirable"loopbifurcation"L→LLandthe"silentbulge"S→S(anullcycle).
AfulllistofdegeneraciesforthesingletandpairgrammarsisshowninTable1.
Preventionofzero-lengthstemsThenullcyclesallinvolvezero-lengthstemsandcanbebroken(NBnotmarginalised;thelikelihoodisdiscarded)byaddingextranonterminalsbeforethecorrespond-ingSk,copyingalloutgoingrulesexceptthenonemittingSk→Lk.
Thisalsoremovestheloopbifurcations,butleavessilentbulgesoftheformSk→.
ThesilentbulgescanberemovedbyaddingnonterminalsbeforeLk,copyingalloutgoingrulesexceptLk→ε,changingtosoastopreventescapefromwithoutanunpairedemission,andaddingnewrulesofAnSCFGwithnonterminalsΦandterminalsforgenerat-ingtheequilibriumdistributionoverStructureTreesFigure5AnSCFGwithnonterminalsΦandterminalsforgenerat-ingtheequilibriumdistributionoverStructureTrees.
HereW,X∈aregenericterminalsymbols.
StructureTreesingletrulesRulelhs→rhsP(a)1.
L→XLκ1p1(X)2.
|SLκ1p1(S)3.
|1κ14.
S→WSXκ2p2(WX)5.
|L1κ2Sk'Sk'Lk'LSLkkk''→LSLkkk'''→Lk'BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page8of23(pagenumbernotforcitationpurposes)APairSCFGwithnonterminalsΦ1234andterminalsa∪dforgeneratingpairwisealignmentsofStructureTreesforanancestorandadescendantFigure6APairSCFGwithnonterminalsΦ1234andterminalsa∪dforgeneratingpairwisealignmentsofStructureTreesforanancestorandadescendant.
HereW,X,Y,Z∈aregenericterminalsymbols.
StructureTreepairrulesRulelhs→rhsP(a)P(d|a)1.
L1→XaYdL1κ1p1(X)(1β1)α1M1(X,Y)2.
|YdL11β1p1(Y)3.
|XaL2κ1p1(X)(1β1)(1α1)4.
|S1L1κ1p1(S)(1β1)α15.
|S4L11β1p1(S)6.
|S3L2κ1p1(S)(1β1)(1α1)7.
|1κ11β18.
S1→WaYdS1ZdXaκ2p2(WX)(1β2)α2M2(WX,YZ)9.
|YdS1Zd1β2p2(YZ)10.
|WaS2Xaκ2p2(WX)(1β2)(1α2)11.
|L11κ21β212.
L2→XaYdL1κ1p1(X)(1γ1)α1M1(X,Y)13.
|YdL11γ1p1(Y)14.
|XaL2κ1p1(X)(1γ1)(1α1)15.
|S1L1κ1p1(S)(1γ1)α116.
|S4L11γ1p1(S)17.
|S3L2κ1p1(S)(1γ1)(1α1)18.
|1κ11γ119.
S2→WaYdS1ZdXaκ2p2(WX)(1γ2)α2M2(WX,YZ)20.
|YdS1Zd1γ2p2(YZ)21.
|WaS2Xaκ2p2(WX)(1γ2)(1α2)22.
|L11κ21γ223.
L3→XaL3κ1p1(X)124.
|S3L3κ1p1(S)125.
|1κ1126.
S3→WaS3Xaκ2p2(WX)127.
|L31κ2128.
L4→YdL41κ1p1(Y)29.
|S4L41κ1p1(S)30.
|11κ131.
S4→YdS4Zd1κ2p2(YZ)32.
|L411κ2BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page9of23(pagenumbernotforcitationpurposes)theformtoallowescapeifthereisagenuinebifurcation.
Amorecarefulanalysis,marginalisingnullcyclesandsilentbulgesratherthansimplyignoringthem,isalmostcertainlypossible.
TransformationtocanonicalformFigures7and8showthesingletandpairwisegrammarswithnullcyclesandsilentbulgesremoved,inthecanoni-calformusedbytheDARTsoftwarepackage[18].
Aswellasthenewsetsofnonterminalsdescribedabove(Φ'forsinglet,forpair)thegrammarincludesnontermi-nalsdedicatedtobifurcations(forsinglet,forpair)andemissions(forsinglet,forpair).
Theseparationofthenonterminalsintonull,bifurcationandsinglet/pairemissionsetsputsthegrammarintheformunderstoodbytheDARTlibrary[18].
ThefullnonterminalalphabetsareΨforsingletstatesandΨ1234forpairstates.
TheasymptoticcomplexityofthedynamicprogrammingrecursionsimpliedbythesegrammarsisunchangedbythetransformationtoDARTform.
ForFigure7,thetimecomplexityisO(L3)andthememorycomplexityO(L2),whileforFigure8,thetimecomplexityisO(L3M3)andthememorycomplexityO(L2M2),whereLandMaresequencelengths.
Again,thecomplexitymaybereducedbytheuseof"banding"[20,21]orother[18]constraints.
ParameterisationoftheTKF91StructureTreeTheExpectationMaximisation(EM)algorithmisoftenusedfortrainingBLOSUM-likemodels,e.
g.
estimatingemissionandtransitionprobabilitiesforPairHMMs[17]orPairSCFGs[1].
Itisalsousefulfortrainingevolutionaryratemodels,whichhaveroughlythesamenumberofparametersandcanmakeuseoflargertrainingsets(sincethetrainingdatadon'thavetobe"binned"bypercentidentity).
Table1:ClassesofdegeneracyintheStructureTreegrammars.
Permutationsandcombinationsofthesecyclesarealsopossible.
DegeneracyFigureNonterminalsequenceRulesequenceCommentNullcycle5L→S→L2,3,56L1→.
.
.
→L14,7,11viaS1L15,32,30viaS4L1L2→.
.
.
→L217,27,25L1→L2.
.
.
6,27,25.
.
.
L2→L115,11,7viaS1L116,32,30viaS4L1L3→S3→L324,25,27L4→S4→L429,30,32Loopbifurcation5L→LL2,56L1→L1L14,11L2→L1L115,11L3→L3L324,27L4→L4L429,32Silentbulge5S→L→S5,3,2CyclicpermutationofL→S→L6S2→L1→S122,4,7S1→L1→S111,4,7CyclicpermutationofL1→S1→L1S3→L3→S327,24,25CyclicpermutationofL3→S3→L3S4→L4→S432,29,30CyclicpermutationofL4→S4→L4LSSLkkkk'''→Φ134'12341234ΦΨΦΦΦ''''''''{,}{,,}{,}{,,====∪∪∪=LSBBCLSLLLaa134134'''''''',,,}SSSBBBBBBBBC13412344132113344113344=11133441234112341123,,}CCLLLLLSSSSaddaadaddaad=SS41234123413412341234}'ΨΦΦ=∪∪∪BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page10of23(pagenumbernotforcitationpurposes)TheEMalgorithmfortheTKF91StructureTreecanbesep-aratedintotwoparts,oneforthesubstitutionprocessandonefortheindelprocess.
Earlierwork[22]showedhowtoestimatethemaximum-likelihoodsubstitutionratematrixRnusingtheEMalgorithm,giventhefollowingsuf-ficientstatistics:,theexpectednumberofinsertionsofstated;,theexpectednumberofalignedsiteswithancestralstateaanddescendantstated.
Aforthcomingpaperdescribeshowtoestimatethemaxi-mum-likelihoodindelratesλn,nforaTKF91modelusingtheEMalgorithm,giventhefollowingsufficientstatistics:,theexpectednumberofdeletedlinksnotfollowedbyaninsertion;,theexpectednumberofsurvivinglinksnotfol-lowedbyaninsertion;,theexpectednumberofdeletedlinksfollowedbyaninsertion;,theexpectednumberofsurvivinglinksfollowedbyaninsertion.
Wecancalculatealltheaboveupdatestatisticssimultane-ouslyfromdata(theE-step)usingaconstrainedversionoftheInside-Outsidealgorithm[18]forthegrammarinFig-ure8,asfollows.
Assumethejointnormalisation,P(d,a),andsupposethatistheposteriorexpectationofthenumberoftimesrulemofFigure8wasapplied,asreturnedbytheInside-Outsidealgorithm.
Foremitrules,letbetheexpectednumberoftimesrulemwasusedtoemitthespecificnonterminalsX,YThenADART-formSCFGwithnonterminalsΨandterminalsforgeneratingancestralStructureTreesFigure7ADART-formSCFGwithnonterminalsΨandterminalsforgeneratingancestralStructureTrees.
HereW,X,Y,Z∈aregenericterminalsymbols.
StructureTreesingletrulesRulelhs→rhsP(a)1.
L→aLκ12.
|Bκ1p1(S)3.
|1κ14.
S→aSκ25.
|L1κ2BifurcationrulesRulelhs→rhsP(a)11.
B→SL112.
B→SL113.
C→SS1Nullcycle-breakingrulesRulelhs→rhsP(a)6.
L→aLκ17.
|Bκ1p1(S)8.
|Cκ21p1(S)2(1κ1)10.
S→aSκ2EmissionrulesRulelhs→rhsP(a)14.
aL→XLp1(X)15.
aS→WS2Xp2(WX)()Ndn,()Nadn()NnD()Nn()NnD()Nnrm.
.
.
rmXYBMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page11of23(pagenumbernotforcitationpurposes)ADART-formPairSCFGwithnonterminalsΨ1234andterminalsa∪dforgeneratingpairwisealignmentsofStructureTreesforanancestorandadescendantFigure8ADART-formPairSCFGwithnonterminalsΨ1234andterminalsa∪dforgeneratingpairwisealignmentsofStructureTreesforanancestorandadescendant.
HereW,X,Y,Z∈aregenericterminalsymbols.
StructureTreepairrulesRulelhs→rhsP(a)P(d|a)1.
L1→adL1κ1(1β1)α12.
|dL11β13.
|aL2κ1(1β1)(1α1)4.
|B11κ1p1(S)(1β1)α15.
|B411β1p1(S)6.
|B32κ1p1(S)(1β1)(1α1)7.
|1κ11β18.
S1→adS1κ2(1β2)α29.
|dS11β210.
|aS2κ2(1β2)(1α2)11.
|L11κ21β212.
L2→adL1κ1(1γ1)α113.
|dL11γ114.
|aL2κ1(1γ1)(1α1)15.
|B11κ1p1(S)(1γ1)α116.
|B411γ1p1(S)17.
|B32κ1p1(S)(1γ1)(1α1)18.
|1κ11γ119.
S2→adS1κ2(1γ2)α220.
|dL11γ221.
|aL2κ2(1γ2)(1α2)22.
|L11κ21γ223.
L3→aL3κ1124.
|B33κ1p1(S)125.
|1κ1126.
S3→aS3κ2127.
|L31κ2128.
L4→dL41κ129.
|B441κ1p1(S)30.
|11κ131.
S4→dS41κ232.
|L411κ2BifurcationrulesRulelhs→rhsP(a)P(d|a)51.
B11→S1L11152.
B41→S4L11153.
B32→S3L21154.
B33→S3L31155.
B44→S4L41156.
B11→S1L11157.
B33→S3L31158.
B44→S4L41159.
C11→S1S11160.
C33→S3S31161.
C44→S4S411Nullcycle-breakingrulesRulelhs→rhsP(a)P(d|a)33.
L1→adL1κ1(1β1)α134.
|dL11β135.
|aL2κ1(1β1)(1α1)36.
|B11κ1p1(S)(1β1)α137.
|B411β1p1(S)38.
|B32κ1p1(S)(1β1)(1α1)39.
|C11κ21(1κ1)(1β1)3α21*p1(S)240.
S1→adS1κ2(1β2)α241.
|dS11β242.
|aS2κ2(1β2)(1α2)43.
L3→aL3κ1144.
|B33κ1p1(S)145.
|C33κ21(1κ1)1*p1(S)246.
S3→aS3κ2147.
L4→dL41κ148.
|B441κ1p1(S)49.
|C441κ21(1κ1)*p1(S)250.
S4→dS41κ2EmissionrulesRulelhs→rhsP(a)P(d|a)62.
adL1→XaYdL1p1(X)M1(X,Y)63.
dL1→YdL11p1(Y)64.
aL2→XaL2p1(X)165.
aL3→XaL3p1(X)166.
dL4→YdL41p1(Y)67.
adS1→WaYdS1ZdXap2(WX)M2(WX,YZ)68.
dS1→YdS1Zd1p2(YZ)69.
aS2→WaS2Xap2(WX)170.
aS3→WaS3Xap2(WX)171.
dS4→YdS4Zd1p2(YZ)BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page12of23(pagenumbernotforcitationpurposes)Thetermsinparenthesesaretobeomittedifthecondi-tionalnormalisation,P(d|a),isused.
Therelationshipbetweentheexpectedinsertandmatchusage,andtheexpectedstart,waitandtransi-tionusageofthepreviouswork[22]isasfollowswhereisdefinedasinthepreviouswork[22]ResultsThepairwisealignerfortheTKF91StructureTreeisdis-tributedaspartoftheDARTpackageatthefollowingURL:http://www.
biowiki.
org/ThealignerisbasedontheStochasticContext-FreeGram-mars(SCFGs)showninFigures7and8,asexplainedintheMethodssection.
ThespecificimplementationusesageneralPairSCFGdynamicprogramming(DP)enginewithacceleratingheuristics,tobedescribedinalaterpaper(manuscriptinpreparation).
Totesttheperformanceofthemodelataligningandpre-dictingstructureofRNAsequence,weconsideredpairsofRNAsequencesfromfourdifferentfamilies,withvaryingdegreesofhomologyatthelevelofsecondarystructure.
Thefourfamilieswerethepurineriboswitch(Figure9),thenanostranslationalcontrolelement(TCE)fromDro-sophila(Figure10),theU2spliceosomalfactor(Figure11)andbacterialnuclearRNaseP(Figure12).
()()NrrrrrNrrrrrD11214151718113467()()rrrrrNrrNrr3335363839113161253++++=+=+D+++=++()()rrNrrrNrrrr3437219212228101140D()()()rNrNrrNrrrXXXXY4222029411646562D==+=++XXXXSrrNrrrrrN∈∑++()636615293748492XXYXYSSWXWXWXrNrrrNrr,(),()()162143639269702==++=+++++=∈∑,,()rrrNrWXYZYZWXWXWXYZWXYZ676871267()Nan,()Nadnswuininijn()()(),(),()sNwNtMaduNinininadniiadnadijn===∑aadnnijadnadRijtMad,(),∑ijadT()ijadnntTTMaiMjddt==∫0AlignmentofpurineriboswitchesfromBacillushalodurans(AP001509.
1)andStreptococcuspneumoniae(AE007476.
1)Figure9AlignmentofpurineriboswitchesfromBacillushalodurans(AP001509.
1)andStreptococcuspneumoniae(AE007476.
1).
AP001509.
1UUAAUCGAGCUCAACACUCUUCGUAUAUCCUC.
UCAAUAUGG.
GAUGAGGGUCUCUAC.
AGGUA.
CCGUAAA.
UACCUAE007476.
1AAAAUUGAAUAUCGUUUUACUUGUUUAU.
GUCGUGAAU.
UGG.
CACGA.
CGUUUCUACAAGGUG.
CCGG.
AA.
CACCUStem00000000.
.
.
.
.
11111.
.
.
.
.
.
.
.
11111.
22222.
22222AP001509.
1AGCUACGAAAAGAAUGCAGUUAAUGUAE007476.
1AACAAUAAGUAAGUCAGCAGUGAGAUStem#.
.
.
00000000.
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page13of23(pagenumbernotforcitationpurposes)Foragivenfamily,denotethetwosequencesinthefamilybyA,B.
Thefollowingcomputationswereperformed:(1A),(1B)Foreachofthetwosequences(A,B)takenindividually,thesecondarystructurewaspredictedwith-outtheaidofcomparativeinformationfromtheothersequence,usingthesingle-sequenceSCFGofFigure7.
AlignmentofnanosTCEsfromDrosophilavirilis(DVU24695)andDrosophilamelanogaster(DRONANOS)Figure10AlignmentofnanosTCEsfromDrosophilavirilis(DVU24695)andDrosophilamelanogaster(DRONANOS).
AlignmentofU2splicingfactorsfromTetrahymenathermophila(X63784.
1)andLeptomonascollosoma(X56455.
1)Figure11AlignmentofU2splicingfactorsfromTetrahymenathermophila(X63784.
1)andLeptomonascollosoma(X56455.
1).
DVU24695G.
.
.
.
.
.
.
.
UGGAA.
.
.
GAAGCUCUGGCAGCUUU.
.
.
UUAAGCGUUUAUAUADRONANOSAAUCCAGCUCUGGAGCAGAGGCUCUGGCAGCUUUUGC.
.
.
AGCGUUUAUAUAStem#00000000001111222222222.
.
.
.
.
222222222.
.
.
.
33333333333DVU24695A.
GAGUUAUAUAUAUGCGCG.
UUCC.
.
.
.
A.
.
.
.
.
.
.
.
CDRONANOSACAUGAAAUAUAUAUACGCAUUCCGAUCAAAGCUGGGUUStem33333333333.
.
1111.
.
.
.
0000000000X63784.
1AUACCUUCUCGGCCUUUUGGCUAAGAUCAAGUGUAGUAUCUGUUCUUAUCAGUGUGAAAACUGAUACUGUCCCUACUAGGX56455.
1AUAUCUUCUCGGCUUUUUAGCUAAGAUCAUGUUUUUAAAAUGUUCUUAUCAGAGUAACUCCUGAUAUUUGCCU.
.
UC.
GGStem#.
.
.
.
.
.
000.
1111.
.
.
.
1111.
000.
222222.
.
.
.
.
.
.
.
222222.
333333.
.
.
.
.
33X63784.
1GACAUGUGGUUUCACAUUAAUUUUUCACAGGGGUCGGAUUCACUAGUGGCUUGCCCUAGUCCCGACGC.
GGUUGCCCUUGX56455.
1GCAAUUAGGAAU.
.
ACGAAAUCUUUGAUCAC.
GCGAGUUUUCCUGGStem#3333.
444444.
.
.
.
.
.
55.
666.
.
66655.
.
.
444444.
.
.
77777788888X63784.
1GCCUGCACGCUACUAAGGAGCGGCUACCCCUGX56455.
1AGUUCCACUCUUUCCAGGCGAAGCUCGCCCUUStem#8.
888888.
777777.
.
.
.
.
.
.
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page14of23(pagenumbernotforcitationpurposes)(2)Thetwosequences(A,B)werethenalignedusingtheTKF91model,withoutmakinguseofanymodelofRNAstructure,usingthePairHMMofFigure4.
(3)Finally,thetwosequences(A,B)werealignedusingtheTKF91StructureTreemodelintroducedinthispaper,usingthePairSCFGofFigure8.
ThesecomputationsallowacomparisonbetweentheTKF91model,thesingle-sequenceSCFGofFigure7andtheTKFStructureTree.
Theresults,includingstructureandalignmentpredictions,areillustratedinacompactvisualrepresentationthatwecalla"fold/alignmentdotplot".
Thekeytointerpretingthefold/alignmentdotplotisshowninFigure13.
Thesubregionslabeleda-fhavethefollowingmeaning:(a)Thistriangulardotplotillustratesthesingle-sequencestructurepredictionforsequenceAofcomputation(1A).
Thepixelcoloratco-ordinates(x,y)representstheposte-riorprobabilitythatresiduesxandyofAarebase-paired,intheabsenceofanyinformationfromsequenceB.
(b)Thistriangulardotplotillustratesthesingle-sequencestructurepredictionforsequenceBofcomputation(1B).
Thepixelcoloratco-ordinates(x,y)representstheposte-riorprobabilitythatresiduesxandyofBarebase-paired,intheabsenceofanyinformationfromsequenceA.
(c)Thisrectangulardotplotillustratesthestructure-freepairwisealignmentofcomputation(2).
Thepixelcoloratco-ordinates(x,y)representstheposteriorprobabilitythatresiduexofAishomologoustoresidueyofB,intheAlignmentofnuclearRNasePgenesfromPichiacanadensis(AF186219.
1)andClavisporaopuntiae(AF186216.
1)Figure12AlignmentofnuclearRNasePgenesfromPichiacanadensis(AF186219.
1)andClavisporaopuntiae(AF186216.
1).
AF186219.
1UCCCUCCAAAGUCUGUAUUUUACCUGCCUACAAAAGGAGGAGUCCCCGGCGGACUUCCUCAGUAUUCGCAGGUGGGAAAUAF186216.
1Stem0000.
11111.
222.
.
.
.
.
222.
.
11111.
0000.
.
.
.
.
.
.
AF186219.
1UCGGUGAAAUCGCUCUGCCCACCAGGGAAAAGGUAAAACUCUCCCUGGUCCUUGGAAGGACUUGUCCUUCUGAGUCUCGUAF186216.
1.
.
.
.
.
.
GUU.
CUCCCCAUCCCUCUCUGGGUGCUUCUGCAUUCAGAGCGAUAUAA.
UGCUCUUStem3.
.
.
.
4444.
55555.
.
AF186219.
1GAGAGAUGC.
.
.
CAAGCGUGGAGACGCUAGGGUGGUCGCCAUAAGAAACUUCA.
ACAGGUCACACUGUU.
.
.
AF186216.
1GAGAGCUUCCUGGGCGUAGAUAGUUACGCCGGGCGUCGCCAUCAGAAA.
AACAGCAGAGCUAC.
.
.
.
.
.
.
.
G.
.
.
.
CUUUStem#.
.
55555.
.
.
.
.
6666666.
.
.
.
6666666.
.
.
7777777.
AF186219.
1.
.
AUGGGAGGCGCCACGGGCAGUUGGUCCCUUUGCAUCCAGAAGGAAGCUUUGGGGCUGUUGAGUGCAAUAUACAGAGCGAF186216.
1GCAUGGGAGGCGGCGCGGAUGGUUGGUCUUUCAAUACGAGG.
AAAGGCUGUUGAGUGCAAUUUGCGGCC.
.
Stem7777777.
4444.
.
3.
888AF186219.
1CUAGAAGGAGUCCUUCCUUCUACGCGUAAAF186216.
1UUUG.
GCAStem#89999.
9999.
8888.
.
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page15of23(pagenumbernotforcitationpurposes)absenceofanystructuralinformationfromthetwosequences.
(d)ThistriangulardotplotillustratesthecomparativestructurepredictionforsequenceAofcomputation(3).
Thepixelcoloratco-ordinates(x,y)representsthemar-Fold-alignmentdotplotkeyFigure13Fold-alignmentdotplotkey.
TheseplotscompareseparateandintegratedmethodsforRNAalignmentandfolding.
Regions(a)and(b)usethesingle-sequenceSCFGofFigure7;region(c)usetheTKF91pairHMMofFigure4;andregions(d),(e)and(f)usethepairSCFGofFigure8,whichisbasedontheTKF91StructureTree(seeResultssection).
SequenceB.
.
.
.
.
.
SequenceA.
.
.
SequenceB.
.
.
SequenceA.
.
.
(f)(a)(c)(b)(e)PredictedbasepairsinBAlignmentofA(horiz.
)toB(vert.
)AlignmentofB(horiz.
)toA(vert.
)PredictedbasepairsinAPredictedbasepairsinB(noncomparative)(noncomparative)(ignoringstructure,primarysequencealignmentonly)(takingstructureintoconsideration)(bycomparisonwithA)(bycomparisonwithB)PredictedbasepairsinA(d)BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page16of23(pagenumbernotforcitationpurposes)ginalposteriorprobabilitythatresiduesxandyofAarebase-paired,summedoverallalignmentstosequenceB.
(e)ThistriangulardotplotillustratesthecomparativestructurepredictionforsequenceBofcomputation(3).
Thepixelcoloratco-ordinates(x,y)representsthemar-ginalposteriorprobabilitythatresiduesxandyofBarebase-paired,summedoverallalignmentstosequenceA.
(f)Thisrectangulardotplotillustratesthestructuralpair-wisealignmentofcomputation(3).
Thepixelcoloratco-ordinates(x,y)representsthemarginalposteriorproba-bilitythatresiduexofBishomologoustoresidueyofA,summedoverallsecondarystructuresofsequencesAandB.
Notethattheorientationofthisplotisflipped(reflectedaboutthediagonalaxis)relativeto(c).
Inaddition,the"true"(published)structuresandalign-mentsareoverlaidonthecomputationalresultsasbluesquares(orbluedots,onthelargerimages).
TherateparametersusedfortheTKF91StructureTreewereobtainedbymaximumlikelihoodtrainingfromarandomselectionofstructurally-annotatedRFAMalign-ments,asfollows:λ1=0.
027,1=0.
03;λ2=0.
007,2=0.
01;p1(S)=0.
01.
ThesubstitutionrateparametersweretakenfromthePFOLDprogram[12].
Theevolutionary"time"betweenthetwosequenceswassetto1ineachcase.
InthecaseoftheRNasePandU2genes,theDPalgorithmswerecon-strainedtoabandalongthemaindiagonaloftheDPmatrix;thisconstraintwasimposedduetolimitedmem-ory.
Nosuchconstraintwasimposedforthepurineribos-witchcomputations.
Theposteriorprobabilitiesoffoldingandalignment(dot-plotsa-f)obtainedbyDPonthesethreeclassesofelementareshowninFigure14(forthepurineriboswitches),Fig-ure15(forthenanosTCEs),Figure16(fortheU2snRNAs)andFigure17(forthebacterialnuclearRNasePgenes).
Inallcases,thePairSCFGsharplyresolvesthemostprobablestemsinthesequences;forthenanos,U2andRNasePsequences,italsoresolvesthepairwisealignment.
PurineriboswitchThepurineriboswitchesareaclassofcis-actingregulatoryelementsthatspecificallybindadenineorguanineandareinvolvedinthepost-translationalregulationofpurinetransportandbiosynthesis[23].
Figure9showsthealign-mentofthetworiboswitchsequences,fromBacillushalo-duransandStreptococcuspneumoniae,whichwastakenfromtheRFAMdatabase[14].
Thetwosecondarystruc-turesofthispairareexactlyidentical,althoughthepri-marysequencesareconsiderablydiverged.
Figure14showstheposteriordotplotsforthepurineriboswitches.
Thisisaneasycaseforthemodel,withastrongsignalandfewgaps.
TheTKF91StructureTreegrammar(Figure8)isabletoidentifyallstemscorrectly,withsomeslightuncertaintyoverthealignment.
Thepri-mary-sequenceTKF91grammar(Figure4)issimilarlyabletofindthecorrectalignment,althoughthesingletfoldinggrammar(Figure7)hasdifficultyresolvingthestems(notethatthisgrammardoesnotmodelbasepairstackingeffects).
NanostranslationalcontrolelementThetranslationalcontrolelement(TCE)isaregulatorysequencefromthe3'untranslatedregionoftheDrosophilananosgene,involvedinpost-translationaldegradationandtransportofnanosmRNA,whichlocalisestothepos-teriorofoocytesandothercelllines[24].
Figure10showsthealignmentofthetwoTCEsequences,fromDrosophilavirilisandDrosophilamelanogaster,whichwascuratedbyhandfromthedescriptionbyGavisetal[24].
Thetwosec-ondarystructuresofthispairsharethesameoverallbifurcating-stemstructure,butwithsomechangesinstemlength.
Figure15showstheposteriordotplotsforthenanosTCEs.
ThistimetheTKF91StructureTreegrammar(Figure8)doesconsiderablybetterthantheprimary-sequenceTKF91grammar(Figure4)atfindingthecorrectalign-ment,probablyduetothegapsattheend(theTKF91grammarinFigure4iseffectivelyaglobalalignerwithlin-eargaps,sothatthealignmentsitproducestendtoformacontinuouslinefromcornertocorneroftheDPmatrix,withoutmajordiscontinuities,ascanbeseeninregion(c)ofFigure15).
Again,theStructureTreedoesmuchbetterthanthesingletfoldinggrammar(Figure7)atdistinguish-ingrealstemsfrombackgroundnoise,sinceitisabletousecovariationofbasepairedresiduesasaclue.
U2snRNATheU2smallnuclearRNArecognizesandbindsthebranchpointregionofintronsinpre-mRNA[25].
Figure11showsthealignmentofthetwosplicingfactors,fromTetrahymenathermophilaandLeptomonascollosoma,wastakenfromtheRFAMdatabase[14].
Thesecondarystructuresofthetwosequencesarequitesimilar,buttheLeptomonasU2hasadeletionofroughly35bpthatelim-inatesanentirestem(stems4–6onFigure11).
Figure16showstheposteriordotplotsfortheU2snRNAs.
Asbefore,theStructureTree'sstempredictions(regions(d)and(e),abovethemaindiagonalofFigure16)arefarmorespecificthanthesingletgrammar'spredictionsBMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page17of23(pagenumbernotforcitationpurposes)(regions(a)and(b),belowthemaindiagonal).
Thepri-mary-sequenceTKF91grammar(Figure4)is,again,hamperedbyitsglobalalignmentandlineargappenalty,andthealignmentinregion(c)isstretchedandalsouncertain.
However,thePairSCFG(Figure8)managestoidentifythe35-bpdeletionandcorrectlyfindsstem4ofFold-alignmentdotplotofpurineriboswitches(seeFigure13forkey)Figure14Fold-alignmentdotplotofpurineriboswitches(seeFigure13forkey).
Fromawiderangeofpotentialstems(faintreddiagonallines,plotsa-b),thePairSCFGclearlyresolvesthethreestrongest(sharpwhitediagonallines,plotsd-e).
Inthiscase,thepri-marysequencealignmentisrelativelyclear(plotc)andsolittlealignmentclarityisgainedbyincludingstructuralinformation(plotf).
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page18of23(pagenumbernotforcitationpurposes)Figure11,thoughstems5–6havealowerprobability(whenpredictingthestructureofthisdeletedregion,thePairSCFGisunabletousecovariationandmustrelyonbasepairinginformationalone).
Fold-alignmentdotplotofnanostranslationalcontrolelements(seeFigure13forkey)Figure15Fold-alignmentdotplotofnanostranslationalcontrolelements(seeFigure13forkey).
Fromarangeofpotentialstems(faintreddiagonallines,plotsa-b),thePairSCFGresolvesthethreetruestemssharplyusingthecomparativesignal(whitediagonallines,plotsd-e).
Someuncertaintyintheprimarysequencealignment(parallelblurredredlines,plotc)isresolvedbyincludingstructuralinformation,includingadeletionintheoutermoststem(brokendiagonalline,plotf).
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page19of23(pagenumbernotforcitationpurposes)BacterialnuclearRNasePNuclearRNasePisaclassofendoribonucleaseribozymeinvolvedintheproductionofmature5'endsoftransferRNAsduringtRNAbiosynthesis[26].
Figure12showsthealignmentofthetworibozymesequences,fromPichiacanadensisandClavisporaopuntiae,whichwastakenfromFold-alignmentdotplotofU2splicingfactors(seeFigure13forkey)Figure16Fold-alignmentdotplotofU2splicingfactors(seeFigure13forkey).
Fromarangeofpotentialstems(faintreddiagonallines,plotsa-b),thePairSCFGresolvesthetruestemssharplyusingthecomparativesignal(whitediagonallines,plotsd-e).
Thepri-marysequencealignmentishighlyuncertain(blurredredlines,plotc)butthisuncertainty,includingthedeletionofawholestem,isresolvedbyincludingstructuralinformation(brokendiagonalline,plotf).
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page20of23(pagenumbernotforcitationpurposes)theRFAMdatabase[14].
Thesecondarystructuresofthetwosequencesarequitedifferent,withmajorchangeinstemlengthanddeletionofwholestemstructures,charac-teristicofthisgenefamily(stems0–2and8–9ofFigureFold-alignmentdotplotofnuclearRNasePgenes(seeFigure13forkey)Figure17Fold-alignmentdotplotofnuclearRNasePgenes(seeFigure13forkey).
Fromawiderangeofpotentialstems(faintreddiag-onallines,plotsa-b),thePairSCFGresolvesseveralsharplyusingthecomparativesignal(sharpwhitediagonallines,plotsd-e).
Uncertaintyintheprimarysequencealignment(wideredlines,plotc)isalsoresolvedbyincludingstructuralinformation(arrowwhiteline,plotf).
BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page21of23(pagenumbernotforcitationpurposes)12).
Figure17showstheposteriordotplotsfortheRNasePgenes.
ThisfamilyisoneofthemostmutableinRFAM,andtheTKF91StructureTreeperformspoorlyonthiscase.
BoththePairHMM(Figure4;region(c)ofFigure17)andthePairSCFG(Figure8;region(f)ofFigure17)getthealignmentalmostentirelywrong,exceptforaregiontowardthe3'endthatdoesn'tcontainanystems(theregionjustbeforestem8ofFigure12).
Asaconsequence,thePairSCFGalsofailstopredictanystemscorrectly;thesingletSCFG(Figure7)doesnobetter.
Region(f)ofFigure17displaysthecontinuous-linealign-mentfromcorner-to-corner,thatischaracteristicofglobalalignerswithlineargaps:unlikethecaseoftheU2align-ment,thestructuralsignalhereisinsufficienttocompen-satefortheindel-modelingdeficienciesoftheTKF91StructureTree.
Thelog-oddsscoreofthe"true"alignment(Figure12)undertheStructureTreemodelishighlynegative(-547bits),suggestingthatthemodelispoorlyadaptedforthisexample.
Comparethiswiththescoresforthepreviousexamples:Figure9scored2bits,Figure10scored-82bitsandFigure11scored35bits.
ThelowscoreforthenanosTCEs(Figure10)wasdueprimarilytothedeletionsintheoutermoststem;thescoreroseto-5bitswithjudicioustrimmingandcarefulchoiceofthe"time"parameter.
DiscussionWehavedescribedareversiblecontinuous-timeMarkovchain,calledthe"TKF91StructureTree",thatdescribesboth(i)covariantsubstitutionsandindelsinRNAsequencecontingentuponaparticularsecondarystructure,and(ii)changesintheunderlyingRNAsecond-arystructure,correspondingtogainandlossofsubstruc-tures.
ApairwisealignmentalgorithmbasedonthemodelhasbeenimplementedinC++andtestedonfourhomologouspairsofRNAfunctionalelementfromRFAM[14].
AswiththeTKF91modelonwhichtheTKF91Struc-tureTreeisbased[10],itshouldbepossible,systemati-cally,todesigncorrespondingalgorithmsformultiplesequences,usingeitherexhaustivedynamicprogramming[6,27]orMarkovChainMonteCarlo[13].
ItshouldbenotedthatthepresentimplementationoftheTKF91StructureTreeisnotdesignedtobeadirectcom-petitortoprogramslikeFOLDALIGN[20],DYNALIGN[21]orCARNAC[28].
Suchpairwisealignmentprogramsareoptimizedforcriterialikealignmentaccuracyandsen-sitivity.
TheTKF91StructureTree,ontheotherhand,wasdesignedasanexpositoryevolutionarymodel,ultimatelyaimedatphylogeneticanalysisofmultipleRNAsequencesinanevolutionarylikelihoodcontext.
Thepair-wisealignmentprogramreportedinthispaperwasimple-mentedtodemonstratethepotentialofthisevolutionarymodel,ratherthanforuseasapracticalalignmenttool.
Theauthor'sSTEMLOCprogram,whichissimilarlybasedonPairSCFGs,hasbeenoptimizedforpracticalapplications(preferringshort-termperformanceadvan-tagesoverlong-termdesignconsiderations)andmaybefreelydownloadedfromhttp://www.
biowiki.
org/.
TheresultsofourtestsonpairwisealignmentsfromRFAMrevealthestrengthsandweaknessesofourmodel.
WhenRNAstructureisverystronglyconservedandindelsarefew,aswiththepurineriboswitchesselectedforthiscomparison(Figure14),theTKF91StructureTreeper-formswellatbothstructurepredictionandalignment.
Onsuchalignments,themodelisexpectedtobesimilartoPFOLD[12],whichusesanSCFGandanevolutionarysubstitutionmodelbutlacksanevolutionarytreatmentofgaps.
Whenthealignmenthasnumerousindelsinloopsandstems,asintheselectednanosTCEs(15),orevenminorrearrangementsofstructure,asintheselectedU2splicefactors(16),theStructureTreestillseemstoworkwell.
However,beyondacertainlevelofstructuralchange,asintheselectedRNasePalignment(17),themodelperformspoorlyandleavesconsiderableroomforimprovement.
Inviewoftheroomforimprovement,wecanidentifyanumberofweaknessesoftheTKF91StructureTreethatcouldbeimprovedinfuturemodels:Sourcesofdegeneracysuchaszero-lengthstemsandloopswereremoved"byhand"fromthePairSCFG(Table1).
Thesedegeneraciescouldhavebeeenspecificallyexcludedfromtheevolutionarymodel,butwiththeapparentcostofmakinganexactsolutionmuchhardertofind.
OnemightexpectthenondegenerategrammarsofFigures7and8toapproximatethetransitionprobabili-tiesofsuchanondegeneratemodel.
Indelratesforwholestems/multistemsaresameasforunpairedresidues.
Innature,stemgainandlossismuchslowerthanunpairedresidueinsertion/deletion,sincetheformerisastructuralchangewhilethelatterisnot.
Multiple-residueindelevents,andhenceaffinegappen-alties,arenotallowed.
Again,thepoorperformanceontheRNasePalignmentmayinpartbeduetothis:thealignmentgeneratedhasmanysmallgapsscatteredthroughout,whereasthe"true"alignmenthasfewer,longergaps.
Thisisacharacteristicartefactofusingapointindelmodel(lineargappenalty)whereamulti-residueindelmodel(affinepenalty)wouldbemoreappropriate.
Stemscannotbedeletedwithoutdeletingalltheir"chil-dren"aswell(i.
e.
allstemsnestedinside).
EmpiricalinspectionofalignmentsinRFAM,however,revealsmanystructureswhereanouterstemhasbeendeletedortrun-BMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page22of23(pagenumbernotforcitationpurposes)cated,whileinnerstemsarepreserved.
Again,perhapsanaffinegappenaltyforcovariantindels(i.
e.
indelsinstems)wouldaddressthis.
Alternatively,onemightcon-trivesomekindof"ragged-end"localalignmentmodel,e.
g.
byembeddingthefiniteTKF91StructureTreeinaninfinite,unobservedStructureTree(c.
f.
[8]),thoughthismaynotbetheidealwaytomodelsucheffects.
Theequilibriumdistributionoverstructuresishighlysimplified.
Forexample,thereiscurrentlynomodelingoffine-scaleenergeticssuchasbasepairstackingpropensitiesduetoπ-orbitalconjugation.
Mathematically,thecom-plexitiesofmodelingsucheffectsaresomewhatsimilartothoseinvolvedinmodelingnearest-neighborsubstitutionbiasesinDNA(suchasmethylation-inducedCpGdeami-nation).
Sincerecentprogresshasbeenmadewithsuchmodels[29,30]wemighteventuallyexpectinclusionofstackingeffectsinmodelsofcovariantRNAsubstitution,aswell.
Bulgescannotbeinsertedintostems,exceptviathefol-lowingawkwardmechanism:theinsertionofabulgeintoastemrequiresthepre-existenceofanullS→L→Stran-sition,wheretheLisempty.
Tofixthis,Lnodescouldbeallowedinstemsequences,justasSnodesareallowedinloopsequences(infact,oneshouldprobablyintroduce"left"and"right"L-nodes,correspondingtoleft&rightbulges).
However,thiswouldincreasethepotentialfordegeneracies.
Wehaveassumedthatallstemsandloopsevolveatthesamerate,whereasempiricalinspectionofRFAMofsuggestsotherwise.
Itisknownthattheanalogousassumptioninproteins(thatallsitesevolveatthesamerate)canskewphylogeneticdistanceestimation[31],andperhapsasimilarcorrectiontothediscretizedgammapri-orsusedinproteinscouldbeappliedhere[32].
Thereisnospecialtreatmentofstructuralfeaturessuchastriloops,tetraloops,triple-Aplatforms,U-turnsandthelike.
Suchfeaturesareoftenobservedtobeevolutionaryconserved[33,34]andseemlikelytobeinvolvedinintermolecularinteractions[35,36].
ItwouldberelativelyeasytoincorporatesuchfeaturesintotheTKF91StructureTree,asspecialclassesofL-orS-branch.
WhilethelengthsofstemsequencesaregeometricallydistributedintheTKF91StructureTree,duetotheirrootsintheTKF91model,empiricalobservationsofrealRNAstructuressuggestthatrealstemlengthsfollowafairlytightlengthdistribution.
Suchapproximationsinmode-lingstemlengthscouldconceivablycontributetopoorerperformanceofthemodel.
(Inpractise,wehavenotobservedunnaturallylongstemsintheoutputoftheTKF91StructureTreealigner,buttheexistenceofalong,perfectinvertedrepeatinthesequencecouldconceivablybringoutthisproblem.
)Despitethesedrawbacks,theresultsofourpreliminarybenchmarksuggestthattheTKF91StructureTreemaybeusefulforaligning(atleastthebetter-conserved)RNAfunctionalelements.
GiventherecentgrowthofRNAsequenceandstructuredatabasessuchasRFAM[14]andSCOR[34],itwouldbeinterestingtocarryoutabroad-scale,empiricalstudyofthemutationsofRNAstructures.
Thiscouldthenbeusedasastartingpointforsystemati-callydesigningandbenchmarkinganimprovedevolutionarymodelforRNA.
Inthemeantime,theresultspresentedheresuggestnewwaysofdesigningevolution-arygrammarsthatrecognisehigher-levelstructuralchangeaswellaspointsubstitutionsandindels,offeringnewwaysofusinghigh-throughputcomparativesequencingtointerpretthecontentsofgenomes.
AcknowledgementsThismanuscripthasevolvedoverthecourseofdiscussionswithSamGrif-fiths-Jones,AlexBateman,SeanEddy,ElenaRivas,EricWesthof,BjarneKnudsen,KushalChakrabarti,JotunHein,GertonLunterandDavidHaus-sler.
Theauthorthankstwoanonymousreviewersforhelpfulcriticism.
TheworkwaspartlydevelopedduringaworkshopinBenasque,SpainorganisedbyElenaRivasandEricWesthofin2003.
TheworkwaspartiallysupportedbygrantsfromtheUK'sEPSRC(codeHAMJW)andMRC(codeHAMKA).
References1.
RivasE,EddySR:NoncodingRNAgenedetectionusingcom-parativesequenceanalysis.
BMCBioinformatics2001,2:8.
2.
RivasE,KleinRJ,JonesTA,EddySR:Computationalidentifica-tionofnoncodingRNAsinE.
colibycomparativegenomics.
CurrentBiology2001,11:1369-1373.
3.
HenikoffS,HenikoffJG:Aminoacidsubstitutionmatricesfromproteinblocks.
ProceedingsoftheNationalAcademyofSciencesoftheUSA1992,89:10915-10919.
4.
KleinRJ,EddySR:RESEARCH:FindinghomologsofsinglestructuredRNAsequences.
BMCBioinformatics2003,4:44.
5.
DayhoffMO,SchwartzRM,OrcuttBC:AModelofEvolutionaryChangeinProteins.
InInAtlasofProteinSequenceandStructureVol-ume5.
Issuesupplement3Editedby:DayhoffMO.
Washington,DC:NationalBiomedicalResearchFoundation;1978:345-352.
6.
HolmesI:Usingguidetreestoconstructmultiple-sequenceevolutionaryHMMs.
InInProceedingsoftheEleventhInternationalConferenceonIntelligentSystemsforMolecularBiologyMenloPark,CA:AAAIPress;2003:147-157.
7.
FelsensteinJ:InferringPhylogeniesSinauerAssociates,Inc;2003.
[ISBN0878931775]8.
MiklósI,LunterG,HolmesI:Alongindelmodelforevolutionarysequencealignment.
MolecularBiologyandEvolution2004,21(3):529-540.
9.
KnudsenB,MiyamotoM:SequenceAlignmentsandPairHid-denMarkovModelsUsingEvolutionaryHistory.
JournalofMolecularBiology2003,333(2):453-460.
10.
ThorneJL,KishinoH,FelsensteinJ:AnEvolutionaryModelforMaximumLikelihoodAlignmentofDNASequences.
JournalofMolecularEvolution1991,33:114-124.
11.
PedersenJS,HeinJ:GenefindingwithahiddenMarkovmodelofgenomestructureandevolution.
Bioinformatics2003,19(2):219-227.
12.
KnudsenB,HeinJ:RNAsecondarystructurepredictionusingstochasticcontext-freegrammarsandevolutionaryhistory.
Bioinformatics1999,15(6):446-454.
13.
HolmesI,BrunoWJ:EvolutionaryHMMs:aBayesianapproachtomultiplealignment.
Bioinformatics2001,17(9):803-820.
PublishwithBioMedCentralandeveryscientistcanreadyourworkfreeofcharge"BioMedCentralwillbethemostsignificantdevelopmentfordisseminatingtheresultsofbiomedicalresearchinourlifetime.
"SirPaulNurse,CancerResearchUKYourresearchpaperswillbe:availablefreeofchargetotheentirebiomedicalcommunitypeerreviewedandpublishedimmediatelyuponacceptancecitedinPubMedandarchivedonPubMedCentralyours—youkeepthecopyrightSubmityourmanuscripthere:http://www.
biomedcentral.
com/info/publishing_adv.
aspBioMedcentralBMCBioinformatics2004,5:166http://www.
biomedcentral.
com/1471-2105/5/166Page23of23(pagenumbernotforcitationpurposes)14.
Griffiths-JonesS,BatemanA,MarshallM,KhannaA,EddySR:Rfam:anRNAfamilydatabase.
NucleicAcidsResearch2003,31:439-441.
15.
ThorneJL,KishinoH,FelsensteinJ:InchingTowardReality:anImprovedLikelihoodModelofSequenceEvolution.
JournalofMolecularEvolution1992,34:3-16.
16.
MiklósI,ToroczkaiZ:AnImprovedModelforStatisticalAlign-ment.
InInFirstWorkshoponAlgorithmsinBioinformaticsBerlin,Hei-delberg:Springer-Verlag;2001.
17.
DurbinR,EddyS,KroghA,MitchisonG:BiologicalSequenceAnalysis:ProbabilisticModelsofProteinsandNucleicAcidsCambridge,UK:Cam-bridgeUniversityPress;1998.
18.
HolmesI,RubinGM:PairwiseRNAstructurecomparisonusingstochasticcontext-freegrammars.
PacificSymposiumonBiocomputing2002.
19.
SankoffD:SimultaneoussolutionoftheRNAfolding,align-ment,andprotosequenceproblems.
SIAMJournalofAppliedMathematics1985,45:810-825.
20.
GorodkinJ,HeyerLJ,StormoGD:FindingthemostsignificantcommonsequenceandstructuremotifsinasetofRNAsequences.
NucleicAcidsResearch1997,25(18):3724-3732.
21.
MathewsDH,TurnerDH:Dynalign:analgorithmforfindingthesecondarystructurecommontotwoRNAsequences.
JournalofMolecularBiology2002,317(2):191-203.
22.
HolmesI,RubinGM:ExpectationMaximizationalgorithmfortraininghiddensubstitutionmodels.
JMolBiol2002,317(5):757-768.
23.
MandalM,BoeseB,BarrickJE,WinklerWC,BreakerRR:Ribos-witchesControlFundamentalBiochemicalPathwaysinBacillussubtilisandOtherBacteria.
Cell2003,113:577-586.
24.
CrucsS,ChatterjeeS,GavisER:OverlappingbutdistinctRNAelementscontrolrepressionandactivationofnanostranslation.
Molecularcell2000,5(3):457-467.
25.
BerglundJA,RosbashM,SchultzSC:Crystalstructureofamodelbranchpoint-U2snRNAduplexcontainingbulgedadenosines.
RNA2001,7:682-691.
26.
FrankDN,AdamidiC,EhringerMA,PitulleC,PaceNR:Phyloge-netic-comparativeanalysisoftheeukaryalribonucleasePRNA.
RNA2000,6:1895-1904.
27.
HeinJ:AnAlgorithmforStatisticalAlignmentofSequencesRelatedbyaBinaryTree.
InInPacificSymposiumonBiocomputingEditedby:AltmanRB,DunkerAK,HunterL,LauderdaleK,KleinTE.
Singapore:WorldScientific;2001:179-190.
28.
PerriquetO,TouzetH,DauchetM:Findingthecommonstruc-turesharedbytwohomologousRNAs.
Bioinformatics2003,19:108-116.
29.
LunterG,HeinJ:Anucleotidesubstitutionmodelwithnear-est-neighbourinteractions.
Bioinformatics2004.
[Toappear]30.
SiepelA,HausslerD:Phylogeneticestimationofcontext-dependentsubstitutionratesbymaximumlikelihood.
Molec-ularBiologyandEvolution2004,21(3):468-488.
31.
BrunoWJ,HalpernAL:Topologicalbiasandinconsistencyofmaximumlikelihoodusingwrongmodels.
MolecularBiologyandEvolution1999,16:564-566.
32.
YangZ:Maximum-likelihoodestimationofphylogenyfromDNAsequenceswhensubstitutionratesdifferoversites.
MolecularBiologyandEvolution1993,10:1396-1401.
33.
KlostermanPS,TamuraM,HolbrookSR,BrennerSE:SCOR:astructuralclassificationofRNAdatabase.
NucleicAcidsResearch2002,30:392-394.
34.
KlostermanPS,HendrixDK,TamuraM,HolbrookSR,BrennerSE:Three-dimensionalmotifsfromtheSCOR,structuralclassi-ficationofRNAdatabase:extrudedstrands,basetriples,tetraloopsandU-turns.
NucleicAcidsResearch2004,32(8):2342-2352.
35.
VaraniG:RNA-proteinintermolecularrecognition.
Accountsofchemicalresearch1997,30(5):190-195.
36.
WuH,HenrasA,ChanfreauG,FeigonJ:Structuralbasisforrec-ognitionoftheAGNNtetraloopRNAfoldbythedouble-strandedRNA-bindingdomainofRnt1pRNaseIII.
ProceedingsoftheNationalAcademyofSciencesoftheUSA2004,101(22):8307-8312.
photonvps怎么样?photonvps现在针对旗下美国vps推出半价促销优惠活动,2.5美元/月起,免费10Gbps DDoS防御,Linux系统,机房可选美国洛杉矶、达拉斯、芝加哥、阿什本。以前觉得老牌商家PhotonVPS贵的朋友可以先入手一个月PhotonVPS美国Linux VPS试试了。PhotonVPS允许合法大人内容,支持支付宝、paypal和信用卡,30天退款保证。Photo...
Pia云这个商家的云服务器在前面也有介绍过几次,从价格上确实比较便宜。我们可以看到最低云服务器低至月付20元,服务器均采用KVM虚拟架构技术,数据中心包括美国洛杉矶、中国香港、俄罗斯和深圳地区,这次春节活动商家的活动力度比较大推出出全场6.66折,如果我们有需要可以体验。初次体验的记得月付方案,如果合适再续约。pia云春节活动优惠券:piayun-2022 Pia云服务商官方网站我们一起看看这次活...
legionbox怎么样?legionbox是一家来自于澳大利亚的主机销售商,成立时间在2014年,属于比较老牌商家。主要提供VPS和独立服务器产品,数据中心包括美国洛杉矶、瑞士、德国和俄罗斯。其中VPS采用KVM和Xen架构虚拟技术,硬盘分机械硬盘和固态硬盘,系统支持Windows。当前商家有几款大硬盘的独立服务器,可选美国、德国和瑞士机房,有兴趣的可以看一下,付款方式有PAYPAL、BTC等。...
33333333333为你推荐
苹果appstore宕机苹果appstore打不开怎么办搜狗360电脑自动安装360安全浏览器字节跳动回应TikTok易主#北京字节跳动科技有限公司#小说审核有三面么?我面试了两轮就叫我回家等消息了 要是刷下来了也该告文档下载怎么下载百度文档文档下载怎样在手机上建立word的文档? 需要下载什么软件?即时通民生银行即时通是什么?discuz伪静态DZ怎么开启全站伪静态团购程序团购的具体流程是什么?仿佛很简单便捷的样子?dezender如何破解Zend及ionCube加密的php文件dz论坛DZ论坛与PW论坛有什么区别?
景安vps 企业域名备案 腾讯云盘 bash漏洞 租空间 java空间 毫秒英文 大容量存储器 699美元 美国在线代理服务器 中国电信宽带测速网 鲁诺 上海联通宽带测速 东莞idc 路由跟踪 中国联通宽带测试 攻击服务器 双十二促销 最新优惠 傲盾代理 更多