SOFTWARE  Open Access

Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data

Joshua G. Dunn1,2,3,4* and Jonathan S. Weissman1,2,3,4

Abstract

Background: Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, many NGS assays encode information jointly within multiple properties of read alignments (for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length), so analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows.

Results: Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort.

Conclusions: Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io.

Keywords: Sequencing, Genomics, Bioinformatics, Python, Ribosome profiling

*Correspondence: joshua.g.dunn@gmail.com
1 California Institute of Quantitative Biosciences, San Francisco, USA
2 Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
Full list of author information is available at the end of the article

The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Dunn and Weissman BMC Genomics (2016) 17:958
DOI 10.1186/s12864-016-3278-x

Background

Next generation sequencing (NGS) has transformed biology. Beyond enabling the rapid sequencing of genomes, increasingly sophisticated NGS assays have empowered biologists to probe a wide array of biological processes with unprecedented precision and depth, provided that the desired information can be encoded within a nucleic acid sequence. Many NGS assays encode nucleotide-resolution information within multiple properties of sequencing reads (such as their alignment coordinates, lengths, or sites at which they mismatch a reference sequence) and thus require analytical tools that decode biological data from such properties. One such assay is ribosome profiling, in which the positions of the ribosomal P-sites are jointly encoded by the lengths and positions of aligned sequencing reads [1, 2]. Another example is bisulfite sequencing, in which the methylation status of cytosine residues is encoded in the genomic locations of C-to-T transitions within read alignments [3, 4].

Because decoding biological information from read alignments is not trivial, a wide array of software has been developed for this purpose. At one extreme are tools dedicated to specific, predefined analysis of data from a single assay, such as riboSeqR [5], RiboTools [6], and RiboGalaxy [7] for ribosome profiling, or PROTEOFORMER [8], ORFscore [9], or ORF-RATER [10] for de novo protein discovery and ORF annotation. Tools like these are user-friendly, but, as a consequence of their design, are difficult to adapt to novel purposes. At the other extreme are low-level, generalized tools, like SAMtools [11] and BEDtools [12], that are not designed for or limited to any specific assay or experimental setup. These tools are extremely powerful, but using them requires substantial expertise in programming and awareness of seemingly esoteric file formats. Between these extremes lie a number of user-friendly and general-purpose toolkits, such as HTSeq [13], Metaseq [14], bx-python [15], and Bioconductor [16]. But these, in their present forms, are limited in their abilities to decode information from raw read alignments, to manipulate (or, in some cases, even to represent) discontinuous genomic features such as multi-exon transcripts, or to perform nucleotide-resolution analysis. The situation is further complicated by the fact that many file formats have been invented to describe only a handful of data types in genomics (Table 1), and that even synonymous file types can be treated inconsistently within toolkits.

Here we introduce Plastid, a Python library for nucleotide-resolution analysis of genomics data. Plastid is designed to retain the user-friendliness of pipeline tools designed for specific NGS assays, like RiboGalaxy, without sacrificing the generality and power of low-level tools, like BEDtools. Given its goals, Plastid's design differs substantially from existing packages (Fig. 1):

First, Plastid's internal analysis pipeline introduces a stage in which mapping functions extract the relevant biological information from various properties of raw read alignments. Biological data are then exposed to users as vectors of information (such as the number of ribosomal P-sites or C-to-T mismatches found at each nucleotide position) rather than lists of read alignments or vectors of raw coverage. Because mapping functions can perform arbitrary transformations on properties of read alignments, they add substantial flexibility to Plastid's design: a mapping function suited to a given NGS assay tailors Plastid's tools to that assay (Fig. 2). Uniquely, Plastid's mapping functions are configurable and replaceable rather than hard-coded. Thus, Plastid has been used to analyze data from numerous types of experiments, including ribosome profiling, RNA-seq, DMS-seq, and bisulfite sequencing, and can be used for other assays (e.g. ChIP-seq, CAGE-seq, pseudouridine profiling) simply by choosing appropriate parameters for existing mapping functions, or by implementing new ones.

Second, Plastid introduces a novel data model, the SegmentChain, to describe multi-exon transcripts and other discontinuous features. SegmentChains are aware of their own discontinuity and use this awareness to encapsulate many nucleotide-wise operations that are absent from other toolkits, such as conversion of coordinates or vectorized data between genomic and transcript-centric spaces. SegmentChains automatically account for splicing and complementing, and thus reduce user error during many tasks common in position-wise analysis (Fig. 3).

Third, Plastid provides consistent representations and behaviors for the various categories of genomic data, regardless of underlying file formats. Plastid's tools thus enable users to focus on biological questions rather than data representation (Fig. 1, Table 1).

Finally, Plastid's intended audience includes bench scientists and novices as well as seasoned bioinformaticians. For this reason, Plastid defines a minimal set of data structures that, when possible, have human-readable names and are modeled on biological objects, such as spliced transcripts, rather than on more abstract notions. Users can thus leverage their biological knowledge when writing or reading code (Fig. 4).

Table 1 File formats used in genomics
(Each entry lists a format, then its implementation.)

Feature annotations (e.g. genes, transcripts, exons, origins of replication)
    BED, extended BED*       Plastid
    BigBed                   Plastid + kentUtils [46]
    GTF2*                    Plastid
    GFF3*                    Plastid
    PSL*                     Plastid
Read alignments
    bowtie                   Plastid
    BAM                      Plastid + Pysam [27]
Reduced count data
    bedGraph                 Plastid
    BigWig                   Plastid + kentUtils [46]
    wiggle (fixedStep)       Plastid
    wiggle (variableStep)    Plastid
Sequence
    FASTA                    via Biopython [20]
    twobit                   via twobitreader [21]

For each category of genomics data, many file formats exist. Plastid includes readers for each format that standardize the representation of data for each type, so that the meaning of each data type is separated from its format on disk.
*tabix compression for these formats is supported via Pysam [27]

In addition to tools for nucleotide-resolution exploratory data analysis (EDA), Plastid includes command-line scripts that automate analysis workflows used for a number of common NGS assays, such as RNA-seq and ribosome profiling.
Unlike similar implementations found in other toolkits, Plastid's scripts leverage mapping functions, so that even common tasks, such as export of browser tracks for visualization of data, may be tailored to a specific biological question: for example, depending on the mapping function in use, Plastid's make_wiggle script can export a browser track of mapped ribosomal P-sites, modified nucleotides, or any other data encoded within the read alignments, instead of simple read coverage. Like the rest of Plastid's tools, these scripts can be generalized to novel assays with the implementation of new mapping functions.

Together, Plastid's features enable novice and advanced users to develop analytical workflows that are both concrete and sophisticated, using familiar idioms and few lines of code. To support our users' efforts, we offer extensive documentation, step-by-step walkthroughs of various analysis tasks, and a demo dataset for those walkthroughs at https://plastid.readthedocs.io.

Implementation

Representation of quantitative data

Many NGS assays encode nucleotide resolution data, and effectively associate a quantitative value with each genomic or transcriptomic position. A natural representation for such data is a vector or array of values, each position in the array corresponding to a nucleotide within a region of interest. Plastid adopts this representation and represents quantitative data associated with genomic positions (such as the number of sequencing reads aligned to a given position, a phylogenetic conservation score, or local G/C content) using objects called GenomeArrays.

Within GenomeArrays, data are indexed by chromosome, nucleotide position, and strand, and may be accessed via a Python dictionary-like interface using a SegmentChain as a key. GenomeArrays return data in an array format (NumPy array; [17]) whose positions correspond to nucleotide positions in the given regions of interest. The use of NumPy arrays enables the data to be used by the vast library of scientific tools compatible with the SciPy (Scientific Python) stack [18], and thus creates a useful bridge between genomics data and existing scientific infrastructure in Python.

Plastid includes implementations of GenomeArrays tailored to a number of file formats, including bedGraph, BigWig, and fixed-step or variable-step wiggle (Table 1). With the aid of mapping functions, GenomeArrays can also import read alignments in BAM or bowtie formats, performing transformations at runtime (for BAM files), or upon import (for bowtie files).
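
The indexing scheme described above can be illustrated with a small, self-contained sketch. It is not Plastid's GenomeArray: the class name ToyGenomeArray, its add method, and the plain tuple used as a lookup key are all hypothetical, and plain Python lists stand in for NumPy arrays so that the example runs without dependencies.

```python
from collections import defaultdict


class ToyGenomeArray:
    """Hypothetical stand-in for a genome array; NOT Plastid's class."""

    def __init__(self):
        # (chromosome, strand) -> {0-based position: value}
        self._data = defaultdict(dict)

    def add(self, chrom, strand, position, value=1.0):
        """Add `value` to the number stored at one nucleotide position."""
        track = self._data[(chrom, strand)]
        track[position] = track.get(position, 0.0) + value

    def __getitem__(self, region):
        """Return one value per nucleotide of a half-open region.

        `region` is a (chrom, start, end, strand) tuple, an assumed
        simplification of a richer key such as a chain of segments.
        """
        chrom, start, end, strand = region
        track = self._data.get((chrom, strand), {})
        values = [track.get(pos, 0.0) for pos in range(start, end)]
        # Report minus-strand data 5' to 3' relative to the feature.
        return values[::-1] if strand == "-" else values


# Invented counts, for illustration only.
counts = ToyGenomeArray()
counts.add("chrI", "+", 102)
counts.add("chrI", "+", 102)
counts.add("chrI", "+", 104)
plus_values = counts[("chrI", 100, 106, "+")]
```

Positions with no data come back as zeros, so the returned vector always has one entry per nucleotide in the query region, which is the property that makes such vectors convenient to pass to numerical tools.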

Fig. 1 Uses of Plastid in analysis workflows. Plastid (yellow box) contains tools for both exploratory data analysis (blue, center) and command-line scripts for specific tasks (green, right). Plastid standardizes representation of data across the variety of file formats used to represent genomics data (left). Quantitative data are represented as arrays of data over the genome. Read alignments may be transformed into arrays using a mapping function appropriate to a given assay. Transcripts are represented as chains of segments that automatically account for their discontinuities during analysis. Plastid integrates directly with the SciPy stack (blue, center). For exploratory analysis in other environments (blue, above) or further processing in external programs (right, green), Plastid imports and exports data in standardized formats. [Figure labels: data in various formats (read alignments, quantitative data, genome sequence, feature annotations); Plastid (features and transcript models, f(alignment), arrays of data over genome); exploratory data analysis (genome browsers, R + Bioconductor, Matlab/Octave, Python + SciPy stack); scripts (gene expression, metagene analysis, create browser tracks, P-site estimation, phasing analysis); external programs, e.g. DESeq2, edgeR.]

Transformations of read alignments

Plastid's GenomeArrays are designed to perform transformations on read alignments transparently during analysis, in order to extract the relevant biology (such as a nucleotide modification or ribosomal P-site) from whichever read properties encode them.
These transformations are implemented in configurable mapping functions that determine the genomic position(s) at which the biology encoded in each alignment should ultimately be counted (Fig. 2a). Concretely, mapping functions are modular components of GenomeArrays that take as input a query region of the genome and a set of read alignments, and return as output an array of transformed data covering each nucleotide position in the query region. Because mapping functions can exploit any property of a read alignment (for example, its length or sequence) in addition to its aligned positions, they provide a high level of flexibility and enable reuse of Plastid's central tools with data from a large variety of NGS assays.

Mapping functions are particularly important to assays in which secondary properties of read alignments encode the biology of interest: for example, mapping functions for ribosome profiling assign counts to ribosomal P-sites, which occur at fixed offsets from the 5′ ends of read alignments, potentially varying as a function of read length [1]. P-site mapping reveals phenomena that are obscured by raw read density, such as peaks that occur at translation initiation sites, or the periodic stepping of the ribosome (Fig. 2b). In bisulfite sequencing, one might use a mapping function that selectively assigns counts to the genomic positions of C-to-T transitions within a read alignment, enabling CpG islands to be discerned (Fig. 2c). For DMS-seq assays, in which dimethyl sulfate (DMS) alkylates unpaired cytosine and adenine residues in RNA [19], one would use a mapping function that assigns counts to the alkylated residues, allowing inference of secondary RNA structure (Fig. 2d).
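
As a rough illustration of the input and output contract described above (a query region and a set of alignments in, one value per nucleotide out), the following sketch assigns each read to a single position at a read-length-dependent offset from its 5′ end. The function name psite_map, the tuple representation of reads, and the offset values are assumptions made for this example and are not Plastid's API; only plus-strand reads are handled, to keep the sketch short.

```python
def psite_map(reads, region, offsets, default_offset=None):
    """Count each read at one position offset from its 5' end.

    reads:   iterable of (five_prime_position, read_length) tuples,
             all assumed to lie on the plus strand
    region:  half-open (start, end) query interval, genomic coordinates
    offsets: dict mapping read length -> offset in nucleotides
    """
    start, end = region
    counts = [0] * (end - start)
    for five_prime, length in reads:
        offset = offsets.get(length, default_offset)
        if offset is None:
            continue  # no rule for this read length, so skip the read
        psite = five_prime + offset
        if start <= psite < end:
            counts[psite - start] += 1
    return counts


# Invented reads and offsets, for illustration only.
toy_reads = [(100, 28), (100, 28), (99, 29), (103, 28), (90, 35)]
toy_offsets = {28: 12, 29: 13}
psite_counts = psite_map(toy_reads, (110, 118), toy_offsets)
```

Here three reads of two different lengths converge on the same position once their length-specific offsets are applied, which is the kind of signal sharpening that raw coverage would smear across the full read length.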

Plastid includes configurable mapping functions applicable to RNA-seq, ribosome profiling, DMS-seq, and a number of other sequencing assays (Table 2). When a novel assay is developed, users can readily implement a mapping function tailored to the experiment. Plastid can then use the new mapping function as a plug-in, enabling immediate application of extant tools to the novel assay. Examples and instructions for writing mapping functions are included in the mapping rules tutorial at https://plastid.readthedocs.io.
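
To suggest what a mapping function for a different assay might look like, here is a minimal sketch that counts C-to-T mismatches rather than read positions, in the spirit of the bisulfite example. The name c_to_t_map, the (start, sequence) read representation, and the restriction to ungapped plus-strand alignments are assumptions for illustration; the signature Plastid actually expects is documented in its mapping rules tutorial and is not reproduced here.

```python
def c_to_t_map(reads, reference, region):
    """Count C-to-T mismatches at each position of a query region.

    reads:     iterable of (start, sequence) ungapped alignments,
               all assumed to lie on the plus strand
    reference: reference sequence as a string, 0-based
    region:    half-open (start, end) query interval
    """
    start, end = region
    counts = [0] * (end - start)
    for read_start, read_seq in reads:
        for i, base in enumerate(read_seq):
            pos = read_start + i
            in_region = start <= pos < end
            if in_region and reference[pos] == "C" and base == "T":
                counts[pos - start] += 1
    return counts


# Invented reference and reads, for illustration only.
toy_reference = "ACGTCCGA"
toy_alignments = [(0, "ATGT"), (3, "TCTG"), (4, "TTGA")]
mismatch_counts = c_to_t_map(toy_alignments, toy_reference, (0, 8))
```

The output has the same shape as any other per-nucleotide vector, so downstream steps do not need to know which property of the reads produced it.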

Encapsulation of discontinuous genomic features

A substantial shortcoming of many existing genomics toolkits is that discontinuous features, such as spliced transcripts, are represented as lists of independently behaving, continuous fragments. For many tasks, this design requires users to perform laborious and error-prone transformations to convert coordinates from the Nth position of a transcript, to the Ith position of the transcript's Jth exon, and eventually, to the Xth position in the corresponding genome. Alternatively, users can sacrifice positional information and align their sequencing data to a continuous transcriptome, in this case presuming a priori knowledge of which transcript isoforms are present.
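
The coordinate bookkeeping described above can be written out by hand as follows. This is a sketch of the conversion itself, not of any toolkit's implementation; the function name transcript_to_genome and the representation of exons as half-open (start, end) tuples are assumptions.

```python
def transcript_to_genome(exons, strand, n):
    """Map the nth (0-based) transcript position to a genomic coordinate.

    exons:  half-open (start, end) intervals sorted by genomic start
    strand: "+" or "-"
    """
    if n < 0:
        raise IndexError("transcript positions are non-negative")
    # On the minus strand the transcript starts in the last exon.
    ordered = exons if strand == "+" else exons[::-1]
    remaining = n
    for start, end in ordered:
        length = end - start
        if remaining < length:
            if strand == "+":
                return start + remaining
            return end - 1 - remaining
        remaining -= length
    raise IndexError("position lies beyond the end of the transcript")


# Invented two-exon transcript, for illustration only.
toy_exons = [(100, 103), (200, 204)]
plus_coord = transcript_to_genome(toy_exons, "+", 4)
minus_coord = transcript_to_genome(toy_exons, "-", 0)
```

Even in this small form, the exon order, the strand, and the half-open interval convention each offer an opportunity for an off-by-one error, which is the burden the text refers to.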

Fig. 2 Mapping functions extract biological data from read alignments. a. Mapping functions use various properties of a read alignment to determine the genomic position(s) at which it should be counted. b. Mapping functions for ribosome profiling use alignment coordinates and lengths to estimate ribosome positions, revealing features of translation, like a peak of density at the start codon (red circle) and three-nucleotide periodicity of ribosomal translocation (inset). c. For bisulfite sequencing, the fraction of C-to-T transitions at each cytosine is mapped, revealing a CpG island. d. A mapping function for DMS-seq differentiates structured from unstructured regions of a selenocysteine insertion element in the 3′ UTR of human SEPP1. DMS reactivity (blue bars) matches A and C residues predicted to be unstructured (yellow).

A central difference between Plastid and other toolkits is that Plastid encapsulates transcripts and other discontinuous genomic features within single objects, called SegmentChains, that are aware of their own discontinuity (Fig. 3). This design obviates the need to separately track the potentially many exons that together constitute a transcript, and facilitates analysis of phenomena that are easily described in the context of a transcript, but discontinuous in the genome, such as a translational pause site in ribosome profiling data. Thus, users can take advantage of the additional information preserved by aligning reads to a genome, while retaining the convenience of aligning to a transcriptome.

SegmentChains are also useful for analyses that simultaneously consider transcript isoforms that share genomic coordinates, such as those implemented in ORF-RATER [10], a tool we have developed to identify and determine translation rates of potentially overlapping open reading frames from ribosome profiling data. For analyses specifically devoted to transcripts, a subclass of SegmentChain, called Transcript, is provided.
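
To make concrete what tracking exons separately involves, the sketch below assembles a transcript-ordered vector of values from genome-indexed data, skipping introns and reversing the result for minus-strand features. The function name spliced_values and the dictionary of per-position values are hypothetical and stand in for the richer objects the text describes.

```python
def spliced_values(genome_values, exons, strand):
    """Return one value per transcript position, ordered 5' to 3'.

    genome_values: dict mapping genomic position -> value
                   (positions absent from the dict count as 0)
    exons:         half-open (start, end) intervals sorted by start
    strand:        "+" or "-"
    """
    values = [
        genome_values.get(pos, 0)
        for start, end in exons
        for pos in range(start, end)
    ]
    return values[::-1] if strand == "-" else values


# Invented data, for illustration only; position 150 is intronic.
toy_values = {101: 5, 150: 9, 201: 2}
toy_chain = [(100, 103), (200, 202)]
plus_vector = spliced_values(toy_values, toy_chain, "+")
minus_vector = spliced_values(toy_values, toy_chain, "-")
```

The intronic value never appears in either vector, and the two strands yield mirror-image results from the same genomic data.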

SegmentChains and Transcripts provide tools for many common operations, including:

- mapping coordinates between various transcript isoforms and the genome (Fig. 3a)
- fetching spliced arrays of genomic sequence, read alignments, or count data at any or each nucleotide position in the SegmentChain or Transcript (Fig. 3b)
- fetching sub-regions of the chain, preserving their discontinuity (Fig. 3c)
- masking sub-regions of the chain, such as repetitive regions, from analysis (Fig. 3d)
- testing for equality, overlap, containment, or coverage of other SegmentChains
- accessing and storing descriptive data, like gene names or IDs, GO terms, database cross references, or notes
- exporting to BED, GTF2, or GFF3 formats, for use with other software packages or within a genome browser

Fig. 3 SegmentChains automate many common tasks. a. SegmentChain and Transcript objects automatically convert coordinates between genomic and transcript-relative spaces. b. SegmentChains and Transcripts can therefore convert read alignments or quantitative data aligned to the genome to arrays of values at each position in the chain. c. Subsections (green, pink) of chains can be fetched using start and end points relative to the parental chains. SegmentChains automatically generate the corresponding genomic coordinates. d. Regions of a chain can be masked from computations without altering the chain coordinates. [Figure labels: transcript position; genome position; values at each genomic position; values at each position in transcript; subchains covering transcript positions 1-12; genomic coordinates of subchains; regions to mask; excluded from analysis.]

Simplified access to genomic data

In genomics, there are primarily four categories of data: sequence data, feature annotations (e.g. transcript models, coding regions, origins of replication), quantitative values associated with genomic positions (such as conservation scores), and read alignments. Yet numerous file formats have been developed to represent each of these data types.
Furthermore, many existing packages treat data of a given type in a manner that depends upon the type of file in which it is stored. Becoming familiar with the diverse idiosyncrasies of these file types can be time-consuming and a significant impediment to research. For example, transcripts may be represented one-exon-per-line and must subsequently be linked by probing their IDs (GTF2, GFF3 files), or they may be captured wholly within single lines (BED, BigBed, PSL).

Plastid provides a minimal set of consistently behaved object types for each category of data, and readers for commonly used file formats in each category (Table 1), allowing investigators to focus on their data rather than its representation on disk (Fig. 1). In particular, Plastid provides readers that parse feature annotations in BED, extended BED, BigBed, GTF2, GFF3 and PSL formats into SegmentChains or Transcripts, optionally reconstructing transcripts from their components in GTF2 or GFF3 formats; quantitative data in bedGraph, wiggle, or BigWig formats into GenomeArrays; and read alignments in BAM or Bowtie's legacy format into GenomeArrays, using mapping functions to transform the data. Because a number of excellent packages already exist for parsing nucleotide sequence, Plastid does not implement new readers for sequence data.
However, its tools are compatible with any sequence reader that returns dictionary-like objects, such as those in Biopython (for data in FASTA, GenBank, EMBL, and many other formats; [20]) and twobitreader (for 2bit files; [21]).

Fig. 4 Plastid streamlines analysis. a. The quality of a ribosome profiling dataset may be assayed by comparing the numbers of read counts in the first versus second half of each coding region. Plastid makes it possible to implement such analyses with few lines of easily readable code. b. Plastid readily integrates with the tools in the SciPy stack. Here, first- and second-half counts from (a) are plotted against each other using matplotlib, and a Pearson correlation coefficient calculated using SciPy. [Panel b labels: consistency of counts in each half of CDS; ribosome counts in second half.]

Command-line scripts

In addition to the library it provides for EDA, Plastid includes a number of command-line scripts that implement sequencing workflows commonly used in genomics and NGS analysis (Table 3). While similar implementations exist in other toolkits, Plastid's scripts are distinct in their use of mapping functions, which allows them to generalize to many types of data and metrics. For example, Plastid's make_wiggle script generates genome browser tracks from sequencing alignments, and, depending upon the mapping function in use, could export a track of ribosomal P-sites, modified nucleotides, unstructured regions of RNA, 5′ ends of read alignments, or whatever type of biology is accessed by the mapping function.

In addition, Plastid introduces algorithms and scripts for a number of tasks that are not implemented or are handled substantially differently elsewhere. We highlight a few of these below:

Maximal spanning windows

Many nucleotide-resolution analyses require prior knowledge of which transcript isoforms are present, but such knowledge is frequently unavailable.
For this circumstance, Plastid introduces the use of maximal spanning windows (Fig. 5) as an approach to isoform-independent analysis. Briefly, a maximal spanning window is defined as a span of nucleotides surrounding a landmark (e.g. a start codon), in which each position relative to the landmark maps to the same genomic coordinate across every member of a group of transcripts (or other features). Thus, a gene's maximal spanning window captures the range of feature positions whose distances to each other and to a landmark are independent of whatever transcript isoform(s) may be expressed.

The use of maximal spanning windows provides a number of advantages over other strategies when isoform distributions are uncertain. A commonly used alternative strategy is to choose a single, "canonical" transcript isoform from each gene to include in analysis. This approximation is appropriate in some circumstances, but is variably inaccurate when comparing across cell lines or culture conditions. Another strategy is to treat all transcript isoforms as independent entities. But, in the absence of corrections downstream, this practice can yield double-counting of read alignments and regions when multiple isoforms overlap. Restricting analysis to each gene's maximal spanning window minimizes the problems inherent in both of these strategies insofar as the quality of a given genome annotation allows.

Plastid contains tools that generate a maximal spanning window surrounding a landmark of interest (such as a start codon) for each gene (or, more generally, any user-specified group of features) in a genome annotation. To do so, Plastid makes use of landmark functions that identify a landmark of interest, if present, within a single transcript. The landmark function is applied to each of a gene's transcripts. If the genomic positions of their landmarks are identical (e.g. all start codons match the same genomic coordinate, even if at different coordinates within each transcript), then Plastid's window-generating toolkit bidirectionally examines each position on each transcript at increasing distance from the landmark until corresponding positions on all transcripts no longer map to the same genomic position. If the transcripts from a given gene do not all share the same genomic landmark coordinate (e.g. they contain different start codons), then the maximal spanning window surrounding that landmark is of zero length, and is excluded from analysis.

Plastid includes landmark functions that identify start and stop codons, and includes instructions for writing functions to programmatically identify other landmarks, such as peaks in sequencing data or nucleotide motifs within a region of interest. Plastid can use maximal spanning windows for estimation of gene expression or for metagene analysis (described below) for any type of sequencing data, and, in the case of ribosome profiling, additionally uses maximal spanning windows for estimation of P-site offsets and sub-codon phasing.
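
The bidirectional search described above can be sketched in a few lines. This is an illustrative reimplementation under simplifying assumptions, not Plastid's code: each transcript is given as the list of genomic coordinates of its positions in 5′-to-3′ order, the landmark is given as an index into that list (or None when a transcript lacks it), and the function name maximal_spanning_window is hypothetical.

```python
def maximal_spanning_window(transcripts, landmarks):
    """Return (upstream, downstream) extents shared by all transcripts.

    transcripts: list of lists; each inner list holds the genomic
                 coordinate of every transcript position, 5' to 3'
    landmarks:   index of the landmark within each transcript, or None
    Returns None when the transcripts do not share one landmark.
    """
    if any(idx is None for idx in landmarks):
        return None
    anchors = {tx[idx] for tx, idx in zip(transcripts, landmarks)}
    if len(anchors) != 1:
        return None  # landmarks disagree, so there is no window

    def shared(offset):
        """True if every transcript maps this offset to one coordinate."""
        coords = set()
        for tx, idx in zip(transcripts, landmarks):
            pos = idx + offset
            if pos < 0 or pos >= len(tx):
                return False
            coords.add(tx[pos])
        return len(coords) == 1

    upstream = 0
    while shared(-(upstream + 1)):
        upstream += 1
    downstream = 0
    while shared(downstream + 1):
        downstream += 1
    return upstream, downstream


# Two invented isoforms sharing a landmark at genomic coordinate 100.
tx_a = list(range(95, 105)) + list(range(200, 205))
tx_b = list(range(98, 103)) + list(range(300, 305))
window = maximal_spanning_window([tx_a, tx_b], [5, 2])
```

In this toy case the window extends two nucleotides on either side of the landmark: upstream it is limited by the shorter 5′ end of the second isoform, and downstream by the point at which the two isoforms splice to different exons.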

Table 2 Plastid includes configurable mapping functions that cover many use cases in sequencing analysis

Fiveprime
    Map reads: at a fixed offset from their 5′ ends
    Sample use: ribosome profiling with RNase I (e.g. yeast, human), RNA-seq
Threeprime
    Map reads: at a fixed offset from their 3′ ends
    Sample use: ribosome profiling with RNase I, RNA-seq
Fiveprime, variable
    Map reads: at an offset from 5′ end determined by read length
    Sample use: ribosome profiling with RNase I, RNA-seq
Fiveprime, variable and stratified by read length
    Map reads: at an offset from 5′ end determined by read length, partitioning reads of each length into separate arrays
    Sample use: ORF annotation from ribosome profiling data
Center-weighted
    Map reads: fractionally over entire length, optionally trimming a fixed number of nucleotides from the 5′ and 3′ ends
    Sample use: ribosome profiling with MNase (e.g. E. coli & D. melanogaster), RNA-seq

Metagene analysis

Noise can obscure important biological signals within individual samples, but such signals frequently appear in population averages. For nucleotide-resolution analysis of NGS data, one particularly useful average is a metagene profile, in which arrays of quantitative data, corresponding to each position of a gene or region of interest, are aligned at some landmark (such as a start codon [1], or the beginning of a region encoding a signal peptide [22]) and a position-wise average is taken over the aligned arrays (Fig. 6). Metagene profiles have been used to reveal numerous biological signals, such as peaks of ribosome density at start or stop codons [1], ribosomal pauses over polybasic signals [23], and sites of engagement of hydrophobic nascent chains by the signal recognition particle [22].

Table 3 Plastid's command-line scripts automate common analysis tasks

Analysis of count and alignment data
    counts_in_region: Count the number of read alignments covering arbitrary regions of interest in the genome, and calculate read densities (in reads per nucleotide and in RPKM) over these regions
    cs: Count the number of read alignments and calculate read densities (in RPKM) specifically for genes and sub-regions (5′ UTR, CDS, 3′ UTR), correcting gene and sub-region boundaries for overlapping genes
    get_count_vectors: Fetch vectors of counts at each nucleotide position in one or more regions of interest, saving each vector as its own line-delimited text file
    make_wiggle: Create wiggle or bedGraph files from alignment files after applying a read mapping rule (e.g. to map ribosome-protected footprints at their P-sites), for visualization in a genome browser
    metagene: Compute a metagene profile of read alignments, counts, or quantitative data over one or more regions of interest
    phase_by_size: Estimate sub-codon phasing in ribosome profiling data
    psite: Estimate position of ribosomal P-site within ribosome profiling read alignments as a function of read length
Manipulation of genomic features
    crossmap: Empirically annotate multimapping regions of a genome, given alignment criteria
    gff_parent_types: Determine parent-child relationships of features in a GFF3 file
    reformat_transcripts: Convert transcripts between BED, BigBed, GTF2, GFF3, and PSL formats
    findjuncs: Find all unique splice junctions in one or more transcript annotations, and optionally export these in Tophat's .juncs format
    slidejuncs: Compare a set of splice junctions to a reference set, and, if possible with equal sequence support, slide discovered junctions to compatible known junctions

Fig. 5 Maximal spanning windows enable isoform-independent analysis. A maximal spanning window over a set of transcripts (or other genomic features) is defined as the largest possible window surrounding a shared landmark (in this example, a start codon; vertical line), over which the Nth nucleotide from the landmark in each transcript corresponds to the same genomic position. Maximal spanning windows thus enable position-wise analysis over fractions of genes when isoform distributions are unknown. Plastid uses maximal spanning windows for metagene analysis, measuring sub-codon phasing in ribosome profiling, and estimating ribosomal P-site offsets. [Figure labels: shared start codon; maximal spanning window about start codon over transcript set; Nth nucleotide from start codon maps to identical genomic position for all transcripts.]

Fig. 6 Metagene profiles reveal genomic signals. Schematic of metagene analysis. Normalized arrays of quantitative data (e.g. ribosomal P-sites; top) are taken at each position in the maximal spanning windows of multiple genes. These arrays are aligned at a landmark of interest (here, a start codon), and the median value of each column (nucleotide position) is taken to be the average (bottom). [Figure labels: aligned, normalized arrays over maximal spanning windows for each gene; columnwise median.]

Plastid's metagene toolkit is unique in its use of maximal spanning windows to obtain isoform-independent arrays of data for each individual gene. These arrays are then aligned at the position corresponding to the landmark and a column-wise median is taken at each position. Because users can modify or define both landmark functions and mapping functions, Plastid's tools can be used to obtain position-wise averages of arbitrary types of data, surrounding virtually any landmark, in arbitrarily grouped sets of regions.
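
The averaging step can be sketched as follows, assuming the per-gene arrays have already been cut to equal length and aligned at the landmark. The function name metagene_profile is hypothetical, and both the normalization (dividing each array by its own total) and the min_total threshold are assumptions chosen for illustration; the text specifies only that arrays are normalized and that a column-wise median is taken.

```python
from statistics import median


def metagene_profile(arrays, min_total=1):
    """Normalize each gene's array, then take a column-wise median.

    arrays:    equal-length lists of counts, already aligned so that
               the landmark sits in the same column for every gene
    min_total: genes with fewer total counts are left out (assumed rule)
    """
    normalized = []
    for values in arrays:
        total = sum(values)
        if total < min_total:
            continue  # too few counts to normalize meaningfully
        normalized.append([v / total for v in values])
    if not normalized:
        return []
    # zip(*rows) iterates over columns, i.e. nucleotide positions.
    return [median(column) for column in zip(*normalized)]


# Invented counts for four genes, for illustration only.
toy_arrays = [
    [0, 2, 6, 2],
    [1, 1, 2, 0],
    [0, 0, 0, 0],
    [0, 5, 10, 5],
]
profile = metagene_profile(toy_arrays)
```

Using the median rather than the mean keeps a single highly expressed or noisy gene from dominating any column of the profile.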

Multimapping regions of the genome

Specific regions of the genome (such as transposable elements, pseudogenes, and paralogous coding regions) can yield sequencing reads that multimap, or align equally well to multiple regions of the genome. It is frequently desirable to exclude such regions from analysis, as these introduce ambiguity into sequencing data. However, because a read's ability to multimap is a function of both its length and the number of mismatches tolerated during alignment, specific experimental regimes require custom annotation of multimapping regions in the genome.

Plastid includes a script called crossmap that empirically determines which regions of the genome yield multimapping reads of a given length at a permitted number of mismatches. Elaborating an approach developed in [1], crossmap conceptually divides the genome into all possible sequencing reads of length k, and then aligns these back to the genome allowing n mismatches, where k and n are given by the user. When a read aligns equally well to multiple regions of the genome under these criteria, its point of origin is flagged as multimapping. crossmap exports all multimapping regions as a BED file, which can be subsequently used to mask such regions of the genome from analysis in any of Plastid's command-line scripts or interactive tools.
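
The idea can be illustrated in a deliberately reduced form: the sketch below handles only the special case of n = 0 mismatches on a single strand of a single toy sequence, where "aligning back to the genome" reduces to counting exact k-mer occurrences. It does not call an aligner and is not how crossmap itself is implemented; the function names are hypothetical.

```python
from collections import Counter


def multimapping_starts(genome, k):
    """Return start positions whose k-mer occurs more than once.

    Assumes zero mismatches and considers one strand only.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    occurrences = Counter(kmers)
    return [i for i, kmer in enumerate(kmers) if occurrences[kmer] > 1]


def merge_starts(starts):
    """Merge consecutive start positions into half-open (start, end) runs."""
    runs = []
    for pos in starts:
        if runs and runs[-1][1] == pos:
            runs[-1] = (runs[-1][0], pos + 1)
        else:
            runs.append((pos, pos + 1))
    return runs


# Invented sequence in which "ACGT" occurs twice.
toy_genome = "ACGTACGTTTGCA"
flagged = multimapping_starts(toy_genome, 4)
regions = merge_starts(flagged)
```

The merged intervals mark read origins rather than covered bases, mirroring the text's statement that a read's point of origin is what gets flagged. Allowing mismatches or both strands would require a real alignment step.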

Results and discussion

Manipulation of data at nucleotide resolution

In its earliest days, next-generation sequencing was used principally for reconstruction of genomes, and, with the advent of RNA-seq, for estimation of gene expression levels. In the first case, the sequences of reads captured the relevant biology, and in the second, the scalar number of read alignments covering an exon or transcript satisfied most experimental needs.

At present, many NGS assays explore biological questions with nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Plastid introduces several data models tailored specifically to this way of working: First, mapping functions convert the relevant properties of read alignments into arrays of decoded information, and thus create an important bridge between NGS assays and the analytical tools offered by the SciPy stack [18]. Second, SegmentChains and Transcripts enable users to manipulate quantitative data and feature annotations with nucleotide precision, in genomic or transcript-centric coordinates. Thus, patterns in data can easily be used to annotate new features, and features can be arbitrarily sub-divided, joined, or exported in standard formats, enabling their use in other pipelines and visualization in genome browsers. Finally, maximal spanning windows offer a novel and rigorous approach to uncertainties created when multiple transcript isoforms might be present, a common circumstance when studying higher eukaryotes.

Ease of use

One of Plastid's design goals is to lower the barrier to entry for genomic analysis. To this end, Plastid's design focuses on simplicity and, when possible, use of biological analogies. Plastid therefore introduces only a minimal set of classes, and otherwise favors existing and commonly used data structures (such as NumPy arrays) and file formats (e.g. BED and GTF2) whenever possible. Data that cannot be captured in standard formats are formatted as tab-delimited tables, which can readily be manipulated in Python (using Pandas [24]), R, or even Excel.

To facilitate reading, re-reading, or writing code, Plastid's classes, methods, and functions are modeled upon biological idioms and, when possible, given human-readable names. This design enables users to leverage knowledge of biology when familiarizing themselves with Plastid, and also to write code that, using the concrete language of biology, is more easily interpreted by others.

Finally, to support users, we have written extensive documentation with tutorials and walkthroughs of various types of analysis, as well as a test dataset tailored to those walkthroughs. These are available at https://plastid.readthedocs.io.
Dunn and Weissman BMC Genomics (2016) 17:958 Page 9 of 12
Extensibility
Plastid is designed to be both modular and easily extended, and includes well-defined and documented APIs. In addition, Plastid includes entry points to register new mapping functions and their command-line arguments with Plastid's command-line scripts, enabling advanced users to share their extensions with others.
Plastid also includes script-writing tools for implementing new workflows. These include argument parsers that read data in supported file formats into Plastid's standard objects, enabling developers, like users, to remain agnostic of file formats. Plastid also includes extensions to Python's warning control system that give developers more finely grained control over how to group and limit warning displays, which can be numerous when operating on large genomics datasets.
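The idea of grouping and limiting warnings can be sketched with Python's standard `warnings` module. This is not Plastid's implementation; the `LimitedWarner` helper and the `AnnotationWarning` category are hypothetical names introduced only to show one way of capping how many warnings of a given category are displayed.

```python
import warnings
from collections import Counter


class AnnotationWarning(UserWarning):
    """Hypothetical category for problems found in annotation records."""


class LimitedWarner:
    """Issue at most `limit` warnings per category and count the rest."""

    def __init__(self, limit=1):
        self.limit = limit
        self.seen = Counter()

    def warn(self, message, category):
        self.seen[category] += 1
        if self.seen[category] <= self.limit:
            warnings.warn(message, category, stacklevel=2)

    def summary(self):
        """Return {category name: number of suppressed warnings}."""
        return {
            category.__name__: count - self.limit
            for category, count in self.seen.items()
            if count > self.limit
        }


def check_records(records, warner):
    """Return records whose end exceeds their start, warning about the others."""
    valid = []
    for name, start, end in records:
        if end <= start:
            warner.warn("record %s has non-positive length" % name, AnnotationWarning)
            continue
        valid.append((name, start, end))
    return valid


records = [("a", 0, 10), ("b", 5, 5), ("c", 9, 3), ("d", 20, 30)]
warner = LimitedWarner(limit=1)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning that is actually issued
    kept = check_records(records, warner)
```

Two records are malformed, but only the first triggers a displayed warning; the second is tallied in `warner.summary()` so that a single summary line can be reported at the end of a long run.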
Conclusions
Plastid is a genomics and NGS analysis toolkit that offers unique tools for decoding information from read alignments and manipulating data at nucleotide resolution. Plastid's design enables it to retain generality and flexibility across assays while remaining user friendly. Thus, we and others have used Plastid to analyze data from numerous NGS assays, including ribosome profiling, RNA-seq, DMS-seq, and bisulfite sequencing.
Plastid's utility derives not only from the introduction of mapping functions, SegmentChains, and maximal spanning windows, but also from a design intent that focuses on simplicity, consistency, and integration with other packages: biological data are represented through unified interfaces regardless of the underlying file format; these interfaces are modeled on biological idioms; and, importantly, these interfaces integrate seamlessly with the SciPy stack. Thus, both novice users and experienced bioinformaticians have found Plastid useful. Versions of Plastid have been used in a number of publications [10, 25] and manuscripts in progress (personal communications from C. A. Gross, M. Schuldiner, and N. Belletier & E. A. Gavis), and Plastid is the genomic engine of our ORF annotation software, ORF-RATER [10].
Availability and requirements
Source code
Plastid is released under the BSD 3-Clause license. Official releases are available in the Python Package Index at http://pypi.python.org/pypi/plastid. Development versions are available at the project's homepage, https://github.com/joshuagryphon/plastid. Examples, user documentation, and technical information are available at http://plastid.readthedocs.io. The version discussed in this article is Plastid 0.4.6.
Computing requirements
Plastid is platform-independent and runs on Python 2.7 and Python 3.3 or greater. It depends on Cython [26], numpy [17], and Pysam [27] for compilation, and additionally SciPy [18], matplotlib [28], pandas [24], Biopython [20], twobitreader [21], and termcolor [29] for runtime.
Plastid runs well on laptops, but system requirements scale with the complexity of the genome annotation and the number of read alignments in a dataset. The minimum amount of RAM we recommend for S. cerevisiae and other small genomes is 1 GB; for mid-sized genomes like D. melanogaster, 4 GB; and 8 GB for vertebrate or plant genomes. Runtimes and memory usage for worst-case scenarios under a variety of scripts included in Plastid are shown in Table 4.
External datasets and software used in this study
Sequencing datasets supporting the conclusions of this article are available in the SRA [30] under accession numbers SRR1562907 (ribosome profiling, [22]); SRR019600-20 and SRR20276-20282 (bisulfite sequencing, [31]); and SRR1057939 (DMS-seq, [19]). Data were visualized in the Integrative Genomics Viewer [32] and modified in Adobe Illustrator CS6. Code syntax was highlighted using Pygments version 2.2 [33].
For Fig. 2, ribosome profiling dataset SRR1562907 [22] was stripped of 3′ cloning adaptors (CTGTAGGCACCATCAAT), and aligned to the yeast reference genome (SGD R64.1.1) using Tophat 2.1.0 [34]. Ribosomal P-sites were assigned to be 15 nucleotides from the 3′ end of 25-35mers.
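The P-site assignment described above can be sketched as a small mapping function. This is an illustration rather than Plastid's code: reads are represented as hypothetical `(start, end, strand)` tuples with 0-based, half-open coordinates, spliced reads are ignored, and an offset of 0 is assumed to mean the 3′-terminal nucleotide itself.

```python
from collections import Counter


def map_threeprime(alignments, offset=15, min_length=25, max_length=35):
    """Count reads at the position `offset` nucleotides from each read's 3' end.

    Each alignment is a (start, end, strand) tuple describing an unspliced read
    as a 0-based, half-open genomic interval; strand is '+' or '-'.
    Returns a Counter keyed by (position, strand).
    """
    counts = Counter()
    for start, end, strand in alignments:
        length = end - start
        if not min_length <= length <= max_length:
            continue  # keep only reads within the accepted length range
        if strand == "+":
            # the 3' end is the last aligned base, at end - 1
            position = end - 1 - offset
        else:
            # on the minus strand the 3' end is the leftmost aligned base
            position = start + offset
        counts[(position, strand)] += 1
    return counts


# Hypothetical reads: two 28-mers on opposite strands and one read that is too short
reads = [(100, 128, "+"), (100, 128, "-"), (0, 20, "+")]
psites = map_threeprime(reads)
```

Because the offset is measured from the 3′ end, the two 28-mers that cover the same genomic interval are assigned to different positions depending on strand, and the 20-mer is discarded by the length filter.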
Bisulfite sequencing data were pooled from SRA datasets SRR019600-20 and SRR20276-20282 [31], stripped of 3′ cloning adaptors (AGATCGGAAGAGC) and aligned to the human reference genome (UCSC hg38p3; [35]) using Bismark 0.14.4 [36]. Methylation was determined from Bismark calls by parsing the XM flag of each alignment following the specification in [36].
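Parsing the XM string can be sketched as follows. In Bismark's encoding, each character corresponds to one read position: uppercase Z, X, H, and U mark methylated cytosines in CpG, CHG, CHH, and unknown contexts, the lowercase letters mark unmethylated cytosines, and '.' marks any other base. The function below is a simplified illustration that assumes an ungapped alignment, so that read index and reference position differ only by the alignment start; the function name and the example string are hypothetical.

```python
# Lowercase symbol -> sequence context, following Bismark's XM encoding
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def parse_xm(xm, ref_start):
    """Return (position, context, is_methylated) for each cytosine call.

    `xm` is the XM tag string of one alignment and `ref_start` the 0-based
    reference coordinate of its first base. Assumes the alignment has no
    insertions or deletions.
    """
    calls = []
    for index, symbol in enumerate(xm):
        context = CONTEXTS.get(symbol.lower())
        if context is None:
            continue  # '.' or any other symbol: not a cytosine call
        calls.append((ref_start + index, context, symbol.isupper()))
    return calls


# Hypothetical XM string for a 10-nucleotide read starting at position 100
calls = parse_xm("..Z.x..h.H", 100)
methylated_cpg = [pos for pos, context, meth in calls if context == "CpG" and meth]
```

A full implementation would walk the alignment's CIGAR string to account for gaps before assigning reference coordinates.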
DMS-seq dataset SRR1057939 [19] was downloaded and aligned to human genome sequence (Ensembl GrCh38.78; [37]) using Tophat [34]. Counts were assigned to alkylated residues, estimated to be 1 base 5′ of the read alignment, in the direction of the alignment. SECIS elements and their structure predictions were identified using SeciSearch 2.19 [38].

Table 4 Computing requirements for genomes and datasets of varying size
Test                    Organism  Runtime (hh:mm:ss)   Peak memory usage (MB)
Read counting           Yeast     00:01:18 ± 00:00:01  255 ± 0
Read counting           Fly       00:36:34 ± 00:00:03  1138 ± 7
Read counting           Human     00:19:56 ± 00:00:01  1053 ± 2
Manipulate annotations  Yeast     00:00:27 ± 00:00:02  467 ± 0
Manipulate annotations  Fly       00:03:37 ± 00:00:03  2620 ± 1
Manipulate annotations  Human     00:18:42 ± 00:01:49  4419 ± 1
Export browser track    Yeast     00:00:58 ± 00:00:00  281 ± 1
Export browser track    Fly       00:09:05 ± 00:00:40  2452 ± 7
Export browser track    Human     00:06:11 ± 00:00:03  537 ± 0
Build crossmap          Yeast     00:00:35 ± 00:00:00  100 ± 0
Build crossmap          Fly       00:10:44 ± 00:00:10  328 ± 7
Build crossmap          Human     04:11:51 ± 00:06:32  130 ± 1
Four command-line scripts were executed on yeast, fly, and human datasets. Runtimes and peak memory usage are given as the mean ± standard deviation of three replicates. See methods for details.
For Table 4, all tests were run on a single 2.7 GHz Intel Core i7-5700 CPU on an MSI Apache Pro QE2 laptop, in a virtual machine running Ubuntu 14.04 with 10 GB of RAM, except for Build crossmap, which used two cores. Runtimes and memory usage were monitored using Memory Profiler version 0.32 [39].
For tests on yeast, we used the annotation and genome assembly from SGD R64.1.1 [40], 5′ and 3′ UTR definitions from [41] and [42], and ribosome profiling dataset SRR1562907. For tests on the fly genome, we used the annotation and genome assembly from FlyBase r5.54 [43] and merged ribosome profiling datasets from [25] (SRA numbers SRR942868-77). For tests on the human genome, we used all APPRIS-scored [44] transcripts from Ensembl annotation GrCh38.81 [37], the hg38 genome assembly from UCSC [35], and ribosome profiling dataset SRR1976443. All genome annotation files were converted to GTF2 format. Sequence was in FASTA format with the exception of hg38, which was kept as a 2bit file. Alignments of all sequencing reads were kept in BAM format.
For tests that used read alignments, alignments were mapped as follows for each organism: 15 nucleotides from the 3′ end of the read for S. cerevisiae (modified from [1]), center-weighted mapping for D. melanogaster [25], and using a variable offset for H. sapiens [2].
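The center-weighted and variable-offset strategies named above can be sketched as follows. Both functions are illustrations under stated assumptions, not Plastid's implementation: reads are hypothetical `(start, end, strand)` tuples with 0-based, half-open coordinates; center-weighted mapping is assumed to trim a fixed number of nucleotides (`nibble`) from each end and spread one count evenly over the remainder; and the variable offset is assumed to be looked up by read length and measured from the 5′ end.

```python
from collections import defaultdict


def map_center(alignments, nibble=12):
    """Spread each read's single count evenly over its trimmed interior."""
    counts = defaultdict(float)
    for start, end, strand in alignments:
        inner_start, inner_end = start + nibble, end - nibble
        width = inner_end - inner_start
        if width <= 0:
            continue  # read too short to trim `nibble` bases from both ends
        for position in range(inner_start, inner_end):
            counts[(position, strand)] += 1.0 / width
    return counts


def map_fiveprime_variable(alignments, offsets):
    """Map each read at a 5' offset chosen by its length.

    `offsets` maps read length -> offset from the 5' end; reads whose length
    has no entry are skipped.
    """
    counts = defaultdict(float)
    for start, end, strand in alignments:
        offset = offsets.get(end - start)
        if offset is None:
            continue
        if strand == "+":
            position = start + offset  # 5' end is the leftmost base
        else:
            position = end - 1 - offset  # 5' end is the rightmost base
        counts[(position, strand)] += 1.0
    return counts


# Hypothetical reads and a hypothetical length-to-offset table
center_counts = map_center([(100, 130, "+")], nibble=12)
variable_counts = map_fiveprime_variable(
    [(100, 130, "+"), (100, 129, "-"), (0, 10, "+")],
    {30: 13, 29: 12},
)
```

Center-weighted mapping preserves the total count per read while spreading it over several positions, whereas the variable-offset rule assigns each read to a single nucleotide.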
For each organism dataset, a series of tests were conducted. In Manipulate annotations, all transcripts, genes, exons, and coding regions within a chromosome were compared and modified in multiple ways using Plastid's cs script, executed as cs generate /tmp/foo --annotation_file gtf_file.gtf --sorted. In Read counting, read counts and densities were tabulated for all transcripts in a genome annotation using the counts_in_region script, executed as counts_in_region /tmp/foo --count_files bam_file.bam --annotation_files gtf_file.gtf --sorted [--threeprime --offset 15 for yeast | --fiveprime_variable p_off.txt for human | --center --nibble 12 for fly]. In Build crossmap, an annotation of the regions in a given genome that give rise to multimapping reads was determined empirically by slicing the genome sequence into k-mers and counting the number of times each k-mer aligned to the genome using Plastid's crossmap script, which internally used Bowtie version 1.1.2 [45]. The crossmap script was executed as crossmap -k 26 --mismatches 0 -p 2 --sequence_file file.[fa|2bit] --sequence_format [FASTA|2bit] /path/to/bowtie/index /tmp/foo.
Abbreviations
CAGE-seq: Cap-analysis gene expression, for identification of 5′ ends of eukaryotic messenger RNAs; ChIP-seq: Chromatin immunoprecipitation sequencing, for probing sites of DNA-protein interaction; DMS-seq: Dimethyl sulfate sequencing, for probing RNA structure; EDA: Exploratory data analysis; GB: Gigabyte; hh:mm:ss: Time expressed as hours:minutes:seconds; MB: Megabyte; NGS: Next-generation sequencing; UTR: Untranslated region

Acknowledgements
We are particularly grateful to Elizabeth Costa, Natalie Baggett, Naama Aviram, Edwin Rodriguez, and the members of the Weissman lab for testing and criticism of the software and documentation; to Alex Fields and John Hawkins for useful discussion of algorithms; and to Gloria Brar and all mentioned above for helpful comments on the manuscript.

Funding
JGD's stipend and JSW's laboratory were funded by an NSF graduate research fellowship, NIH/NIA grant P01AG010770, NIH/NIGMS grant P50GM102706, and the Howard Hughes Medical Institute. These organizations did not directly participate in design, implementation, or writing of the software or this manuscript.

Authors' contributions
JGD designed, implemented, and tested the software. JGD and JSW wrote the manuscript. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Author details
1 California Institute of Quantitative Biosciences, San Francisco, USA. 2 Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA. 3 Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA. 4 Center for RNA Systems Biology, Berkeley, CA, USA.
Received: 21 June 2016 Accepted: 9 November 2016

References
1. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324:218–23.
2. Ingolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011;147:789–802.
3. Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci. 1992;89:1827–31.
4. Booth MJ, Branco MR, Ficz G, Oxley D, Krueger F, Reik W, et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science. 2012;336:934–7.
5. Hardcastle TJ. riboSeqR: Analysis of sequencing data from ribosome profiling experiments. 2014; Available from: http://bioconductor.org/packages/release/bioc/html/riboSeqR.html. Accessed 13 Nov 2016.
6. Legendre R, Baudin-Baillieu A, Hatin I, Namy O. RiboTools: a Galaxy toolbox for qualitative ribosome profiling analysis. Bioinformatics. 2015;31:2586–8.
7. Michel AM, Mullan JPA, Velayudhan V, O'Connor PBF, Donohue CA, Baranov PV. RiboGalaxy: a browser based platform for the alignment, analysis and visualization of ribosome profiling data. RNA Biol. 2016;13(3):316–9. doi:10.1080/15476286.2016.1141862.
8. Crappé J, Ndah E, Koch A, Steyaert S, Gawron D, De Keulenaer S, et al. PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration. Nucleic Acids Res. 2015;43:e29.
9. Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014;33:981–93.
10. Fields AP, Rodriguez EH, Jovanovic M, Stern-Ginossar N, Haas BJ, Mertins P, et al. A regression-based analysis of ribosome-profiling data reveals a conserved complexity to mammalian translation. Mol Cell. 2015;60:816–27.
11. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
12. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
13. Anders S, others. HTSeq: Analysing high-throughput sequencing data with Python [Internet]. 2010. Available from: http://www-huber.embl.de/HTSeq/doc/overview.html. Accessed 13 Nov 2016.
14. Dale RK, Matzat LH, Lei EP. Metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA. Nucleic Acids Res. 2014;42:9158–70.
15. bxlab/bx-python [Internet]. GitHub. [cited 2016 Sep 21]. Available from: https://github.com/bxlab/bx-python
16. Gentleman RC, Carey VJ, Bates DM. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.
17. van der Walt S, Colbert SC, Varoquaux G. The NumPy array: a structure for efficient numerical computation. Comput Sci Eng. 2011;13:22–30.
18. Jones E, Oliphant T, Peterson P, et al. SciPy: open source scientific tools for Python [Internet]. 2001. Available from: http://www.scipy.org/. Accessed 13 Nov 2016.
19. Rouskin S, Zubradt M, Washietl S, Kellis M, Weissman JS. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature. 2014;505:701–5.
20. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–3.
21. Schiller BJ, contributors. twobitreader: a fast python package for reading .2bit files [Internet]. twobitreader. [cited 2015 Oct 26]. Available from: https://pythonhosted.org/twobitreader/
22. Jan CH, Williams CC, Weissman JS. Principles of ER cotranslational translocation revealed by proximity-specific ribosome profiling. Science. 2014;346:1257521.
23. Brandman O, Stewart-Ornstein J, Wong D, Larson A, Williams CC, Li G-W, et al. A ribosome-bound quality control complex triggers degradation of nascent peptides and signals translation stress. Cell. 2012;151:1042–54.
24. McKinney W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference. 2010;51–6.
25. Dunn JG, Foo CK, Belletier NG, Gavis ER, Weissman JS. Ribosome profiling reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster. Elife. 2013;2:e01179.
26. Behnel S, Bradshaw R, Citro C, Dalcin L, Seljebotn DS, Smith K. Cython: The Best of Both Worlds. Computing in Science and Engineering. 2011;13:31–9.
27. Heger A, contributors. pysam: htslib interface for python [Internet]. [cited 2015 Oct 26]. Available from: https://github.com/pysam-developers/pysam
28. Hunter JD. Matplotlib: A 2D graphics environment. Computing in Science & Engineering. 2007;9:90–5.
29. Lepa, Konstantin. termcolor 1.1.0: ANSI Color formatting for output in terminal [Internet]. [cited 2016 Apr 26]. Available from: https://pypi.python.org/pypi/termcolor.
30. Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39:D19–21.
31. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–22.
32. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92.
33. Brandl, Georg, Ronacher, Armin, Hatch, Timothy, the Pocoo team. Pygments: Python syntax highlighter [Internet]. [cited 2016 Apr 26]. Available from: http://pygments.org/
34. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36.
35. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
36. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27:1571–2.
37. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2015. Nucleic Acids Res. 2015;43:D662–9.
38. Kryukov GV, Castellano S, Novoselov SV, Lobanov AV, Zehtab O, Guigó R, et al. Characterization of mammalian selenoproteomes. Science. 2003;300:1439–43.
39. Pedregosa, Fabian. Memory Profiler: a module for monitoring memory usage of a Python program [Internet]. [cited 2016 Apr 26]. Available from: https://pypi.python.org/pypi/memory_profiler/
40. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, et al. Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res. 2012;40:D700–5.
41. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–9.
42. Yassour M, Kaplan T, Fraser HB, Levin JZ, Pfiffner J, Adiconis X, et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci USA. 2009;106:3264–9.
43. Attrill H, Falls K, Goodman JL, Millburn GH, Antonazzo G, Rey AJ, et al. FlyBase: establishing a Gene Group resource for Drosophila melanogaster. Nucleic Acids Res. 2016;44:D786–92.
44. Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink J-J, Lopez G, et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 2013;41:D110–7.
45. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
46. Kent, Jim, ENCODE DCC. kentUtils: Jim Kent command line bioinformatic utilities [Internet]. GitHub. [cited 2016 Apr 26]. Available from: https://github.com/ENCODE-DCC/kentUtils
