SOFTWARE  Open Access

NanoDJ: a Dockerized Jupyter notebook for interactive Oxford Nanopore MinION sequence manipulation and genome assembly

Héctor Rodríguez-Pérez1, Tamara Hernández-Beeftink1, José M. Lorenzo-Salazar2, José L. Roda-García3, Carlos J. Pérez-González4, Marcos Colebrook3* and Carlos Flores1,2,5*

Abstract

Background: The Oxford Nanopore Technologies (ONT) MinION portable sequencer makes it possible to use cutting-edge genomic technologies in the field and the academic classroom.

Results: We present NanoDJ, a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices. It integrates basecalling, read trimming and quality control, simulation and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly.

Conclusions: With the use of Jupyter-facilitated access to self-explanatory contents of applications and the interactive visualization of results, as well as by its distribution in a Docker software container, NanoDJ aims to simplify ONT DNA sequence analysis and make it more reproducible. The NanoDJ package code, documentation and installation instructions are freely available at https://github.com/genomicsITER/NanoDJ.
Keywords: Genome analysis, Nanopore sequencing, Jupyter, Docker

*Correspondence: mcolesan@ull.edu.es; cflores@ull.edu.es
Héctor Rodríguez-Pérez and Tamara Hernández-Beeftink contributed equally to this work.
3 Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
1 Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
Full list of author information is available at the end of the article

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Rodríguez-Pérez et al. BMC Bioinformatics (2019) 20:234
https://doi.org/10.1186/s12859-019-2860-z

Background

It has never before been so easy and affordable to access and utilize the genetic variation of any organism for any purpose. This has been motivated by the continuous development of high-throughput DNA sequencing technologies, most commonly known as Next Generation Sequencing (NGS). A key improvement is the possibility of obtaining long single-molecule sequences with the fast and cost-efficient technology released by Oxford Nanopore Technologies (ONT) and the marketing in 2014 of the MinION, a portable, pocket-size, nanopore-based NGS platform [1]. Since then, several algorithms and software tools have flourished specifically for ONT sequence data. Despite its size, the MinION provides multi-kilobase reads with a throughput comparable to other benchtop sequencers in the market (1–10 Gbases by 2017), and therefore still requires efficient and integrated bioinformatics tools to facilitate the widespread use of the technology.

While MinION has shown promise in distinct applications [2], because of the low cost, laptop operability, and the USB-powered compact design of MinION, cutting-edge NGS technology is no longer necessarily linked to the established idea of a large, high-cost machine that must be located in centralized sequencing centers or on a laboratory bench. As a consequence, the utility of MinION in field experiments to move from sample to answer on site has been demonstrated with infectious disease studies [3, 4], off-Earth genome sequencing [5], and species identification in extreme environments [6–8], among others. Leveraging MinION capabilities in the academic classroom is a natural extension of these field studies to facilitate education of genomics in undergraduate and graduate students [9].

To date, there is no specific software solution aimed at facilitating ONT sequence analyses by integrating capabilities for data manipulation, sequence comparison and assembly in field experiments or for educational purposes to help facilitate learning of genomics [9]. We have developed NanoDJ, an interactive collection of Jupyter notebooks to integrate a variety of software, advanced computer code, and plain contextual explanations. In addition, NanoDJ is distributed as a Docker software container to simplify installation of dependencies and improve the reproducibility of results.
Implementation

NanoDJ is distributed as a Docker container built underneath Jupyter notebooks, which are increasingly popular in life sciences because they significantly facilitate the interactive exploration of data [10], and which have recently been integrated in the widely used Galaxy portal [11]. The Docker container allows NanoDJ to run in an isolated, self-contained package that can be executed seamlessly across a wide range of computing platforms [12], with a negligible impact on execution performance [13]. NanoDJ integrates diverse applications (Additional file 1: Table S1) organized into 12 notebooks grouped in three sections (Fig. 1; Table 1). Main results are presented as embedded objects. In addition, one of the notebooks was conceived for educational purposes by setting a particularly simple problem and including low-level explanations. To facilitate the use of the educational notebook and bypass the installation of Docker and NanoDJ, a lightweight version of this notebook and small sets of ONT reads can be used from a web browser using Binder (https://mybinder.org) in the NanoDJ GitHub repository. In addition, as part of the CyVerse project (https://www.cyverse.org/), NanoDJ has been incorporated into VICE, a visual and interactive computing environment that facilitates training of ONT data analysis. We illustrate the versatility of NanoDJ in distinct scenarios by providing results from four case studies (Additional file 1: Text S1).
Input, basecalling, and simulations

Input data can be a list of FAST5 files from previous basecalled runs (e.g. a Metrichor output) or event-level signal data to be basecalled using the latest ONT caller. The user can also simulate reads with NanoSim and pre-computed model parameters. This possibility is important in different scenarios, such as helping to design an experiment or bypassing technical difficulties in academic setups [9].
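As a rough illustration of what read simulation involves, the following Python sketch samples reads from a reference sequence and introduces substitution errors at a fixed rate. It is not NanoSim and does not use its pre-computed error models; the function name `simulate_reads`, its parameters, and the exponential read-length distribution are assumptions made only for this example.

```python
import random


def simulate_reads(reference, n_reads, mean_len, error_rate, seed=0):
    """Toy simulator (hypothetical, not NanoSim): sample substrings of
    `reference` and replace bases at random with probability `error_rate`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    reads = []
    for _ in range(n_reads):
        # Exponentially distributed length, clamped to [1, len(reference)].
        length = int(rng.expovariate(1.0 / mean_len)) + 1
        length = min(length, len(reference))
        start = rng.randint(0, len(reference) - length)
        read = list(reference[start:start + length])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitution only; real ONT reads also carry indels.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


if __name__ == "__main__":
    reference = "ACGTTGCAAGCT" * 20
    for read in simulate_reads(reference, n_reads=3, mean_len=40, error_rate=0.1):
        print(len(read), read)
```

A sketch like this can help when planning an experiment, for instance to check how many reads of a given length are needed for a target coverage, but realistic error profiles require a dedicated simulator.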
Summary, quality control and filtering

Either for a simulated or an empirical run, the user will obtain summary data and plots informing of read length distribution, GC content vs. length, and read length vs. quality score (when available). If barcodes were used in the experiment, Porechop can be used for demultiplexing, barcode trimming and to filter out reads.
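The per-read quantities behind such plots can be sketched in a few lines of Python. This is a minimal example, not the code used in the NanoDJ notebooks: it assumes a four-line-per-record FASTQ file with Phred+33 quality encoding, and the function names `read_fastq` and `summarize` are hypothetical. The quality value is a plain arithmetic mean of the Phred scores, which is simpler than averaging error probabilities.

```python
import io
import statistics


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file (or a blank line) stops the iteration
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def summarize(handle):
    """Return one dict per read with length, GC fraction and mean Phred score."""
    rows = []
    for name, seq, qual in read_fastq(handle):
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        # Phred+33 encoding: score = ASCII code - 33.
        mean_q = statistics.mean(ord(c) - 33 for c in qual) if qual else 0.0
        rows.append({"name": name, "length": len(seq), "gc": gc, "mean_q": mean_q})
    return rows


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n")
    for row in summarize(example):
        print(row)
```

The resulting rows can be passed to any plotting library to draw length histograms or GC content against length.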
Genome assembly and comparison

Depending on the application, sequence data can be aligned against reference sequences or used for genome assembly using diverse methods. Alignment is performed either against one (BWA and Rebaler) or multiple (BLAST) reference sequences, yielding BAM files for downstream applications (e.g., variant identification) or information on species composition. Alternatively, the user may opt for a de novo assembly. NanoDJ allows the use of some of the best-performing algorithms (Canu, Flye, and Miniasm), or the combination of ONT reads with others obtained with second-generation NGS platforms for a hybrid assembly (Unicycler and MaSuRCA). The latter provides more effective assemblies and a reduced error rate compared to assemblies based only on ONT reads [14].

Fig. 1 Simplified scheme of all NanoDJ functionalities

NanoDJ includes the possibility of contig correction (Racon, Nanopolish, and Pilon). Assemblies can be evaluated with the embedded version of QUAST, and represented with Bandage.
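One of the contiguity statistics commonly reported when evaluating assemblies is N50. The Python sketch below shows how it can be computed from a list of contig sequences. It is only an illustration of the metric and is not taken from QUAST or from the NanoDJ notebooks; the function names `n50` and `assembly_summary` are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L of the contig at which the cumulative
    sum of lengths, taken from longest to shortest, first reaches half of
    the total assembly length. Returns 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_summary(contigs):
    """Basic contiguity statistics for a list of contig sequences."""
    lengths = [len(c) for c in contigs]
    return {
        "n_contigs": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    contigs = ["A" * 800, "C" * 400, "G" * 300, "T" * 100]
    print(assembly_summary(contigs))
```

Comparing such summaries across assemblers, or before and after polishing, gives a quick first impression of assembly contiguity, although it says nothing about base-level accuracy.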
Limitations and future directions

For non-expert users, it would have been better if NanoDJ had been envisaged as an online application to facilitate its use. However, our main objective was to integrate major tools for the analysis of ONT sequences in an interactive software environment to facilitate learning the basics behind ONT sequence analysis while providing a useful tool for professionals. Providing it as a Dockerized solution simply bolsters the focus on the use of the tool, reducing the burden on the user of installing all dependencies. At the moment, NanoDJ is set up for the analysis of small genomes and targeted NGS studies, with a focus on primary and secondary analysis of DNA sequences. The integration of tools for variant identification and tertiary analysis (annotation of variants or sequence elements, interpretation, etc.) [15, 16], as well as for epigenetics [17] and direct RNA sequencing [18], will be the focus of further developments of NanoDJ.
Conclusions

We present NanoDJ as an integrated Jupyter-based toolbox distributed as a Docker software container to facilitate ONT sequence analysis. NanoDJ is best suited for the analyses of small genomes and targeted NGS studies. We anticipate that the Jupyter notebook-based structure will simplify further developments in other applications.
Availability and requirements

Project name: NanoDJ
Project home page: https://github.com/genomicsITER/NanoDJ
Operating system(s): Windows, Linux, Mac OS
Programming language: Bash/Python
Other requirements: Docker installation
License: GPL
Any restrictions to use by non-academics: None

Additional file

Additional file 1: Table S1. Applications integrated in NanoDJ. Text S1. Testing on case study datasets. Table S2. Datasets for illustrative uses of NanoDJ. Table S3. Comparison of de novo assemblies using different inputs or with an assembly corrector. Table S4. Comparison of three de novo assemblers in a high-coverage ONT dataset. Table S5. Comparison of results from two hybrid de novo assemblers. Figure S1. Human mitochondrial DNA variant representation against the reference sequence. Table S6. Source of mitochondrial DNA genomes, simulations and classification results. (DOCX 1544 kb)

Acknowledgements

Not applicable

Funding

This research was funded by the Instituto de Salud Carlos III (grants PI14/00844 and PI17/00610), the Spanish Ministry of Science, Innovation and Universities (grant RTC-2017-6471-1; MINECO/AEI/FEDER, UE), the Spanish Ministry of Economy and Competitiveness (grant MTM2016–74877-P), which were co-financed by the European Regional Development Funds 'A way of making Europe' from the European Union, Area Tenerife 2030 from Cabildo de Tenerife (CGIEU0000219140), and by the agreement OA17/008 with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology. The funding entities had no role in the design of the study, analysis, interpretation of data or in manuscript writing.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files. Raw reads from MinION and Illumina are available from the SRA database (accession numbers PRJNA451111, PRJNA451107).

Authors' contributions

HRP scripted and tested the software, and contributed to data analysis; THB was involved in data analysis and interpretation; JLS was involved in data analysis; JRG revised and tested the software and revised the manuscript; CPG was involved in visualization, data analysis and revised the manuscript; MC conceived the project, revised and tested the software, and revised the manuscript; CF conceived the project, designed the software, interpreted the data, and critically revised the manuscript. All authors have read and approved the final manuscript.
Table 1 Summary of NanoDJ notebooks

Name                             Functionality
0.0_QualityControl.ipynb         Evaluate the quality control and sequence handling
1.0_Basecalling.ipynb            Translates the events or the raw electrical signal from an ONT sequencer (FAST5 format) to a DNA sequence to obtain a FASTA or a FASTQ file
1.1_Trim+Demux.ipynb             Perform sequence trimming and demultiplexing
2.0_DeNovo_Canu-Miniasm.ipynb    De novo assembly with Canu or Miniasm, and polish with Racon and Pilon
3.0_DeNovo_Canu+polish.ipynb     Nanopolish modules to improve the Canu assembly
4.0_DeNovo_Flye.ipynb            De novo assembly with Flye software
5.0_DeNovo_Hybrid.ipynb          Perform de novo assembly of Nanopore reads in conjunction with Illumina reads using MaSuRCA and/or Unicycler software
6.0_AssemblyCompare.ipynb        Compare distinct assembly results based on QUAST software
7.0_SimulateReads.ipynb          Obtain simulated reads made with Nanosim software and the Nanosim-h fork with precomputed models
8.0_Alignment.ipynb              Reference-based assembly using either BWA, BLAST or Rebaler software
9.0_AssemblyGraph.ipynb          Assembly graph visualization
Educational.ipynb                Performs basecalling (with Albacore), quality control steps, and a BLAST-based classification of the reads (for educational purposes)

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Author details

1 Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
2 Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain.
3 Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
4 Departamento de Matemáticas, Estadística e Investigación Operativa, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
5 CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain.
Received: 22 June 2018 Accepted: 29 April 2019

References

1. Brown CG, Clarke J. Nanopore development at Oxford Nanopore. Nat Biotechnol. 2016;34:810–1.
2. Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17:239.
3. Quick J, Loman NJ, Duraffour S, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature. 2016;530:228–32.
4. Faria NR, Quick J, Claro IM, Thézé J, de Jesus JG, Giovanetti M, Kraemer MUG, Hill SC, Black A, da Costa AC, Franco LC, Silva SP, Wu C-H, Raghwani J, Cauchemez S, du Plessis L, Verotti MP, de Oliveira WK, Carmo EH, Coelho GE, Santelli ACFS, Vinhal LC, Henriques CM, Simpson JT, Loose M, Andersen KG, Grubaugh ND, Somasekar S, Chiu CY, Muñoz-Medina JE, Gonzalez-Bonilla CR, Arias CF, Lewis-Ximenez LL, Baylis SA, Chieppe AO, Aguiar SF, Fernandes CA, Lemos PS, Nascimento BLS, Monteiro HAO, Siqueira IC, de Queiroz MG, de Souza TR, Bezerra JF, Lemos MR, Pereira GF, Loudal D, Moura LC, Dhalia R, França RF, Magalhães T, Marques ET Jr, Jaenisch T, Wallau GL, de Lima MC, Nascimento V, de Cerqueira EM, de Lima MM, Mascarenhas DL, Neto JPM, Levin AS, Tozetto-Mendoza TR, Fonseca SN, Mendes-Correa MC, Milagres FP, Segurado A, Holmes EC, Rambaut A, Bedford T, Nunes MRT, Sabino EC, Alcantara LCJ, Loman NJ, Pybus OG. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature. 2017;546:406–10.
5. Castro-Wallace SL, Chiu CY, John KK, Stahl SE, Rubins KH, McIntyre ABR, Dworkin JP, Lupisella ML, Smith DJ, Botkin DJ, Stephenson TA, Juul S, Turner DJ, Izquierdo F, Federman S, Stryke D, Somasekar S, Alexander N, Yu G, Mason CE, Burton AS. Nanopore DNA sequencing and genome assembly on the International Space Station. Sci Rep. 2017;7:18022.
6. Johnson SS, Zaikova E, Goerlitz DS, Bai Y, Tighe SW. Real-time DNA sequencing in the Antarctic dry valleys using the Oxford Nanopore sequencer. J Biomol Tech. 2017;28(1):2–7.
7. Pomerantz A, Peñafiel N, Arteaga A, Bustamante L, Pichardo F, Coloma LA, Barrio-Amoros CL, Salazar-Valenzuela D, Prost S. Real-time DNA barcoding in a remote rainforest using nanopore sequencing. Gigascience. 2018;7(4):giy033.
8. Menegon M, Cantaloni C, Rodriguez-Prieto A, Centomo C, Abdelfattah A, Rossato M, Bernardi M, Xumerle L, Loader S, Delledonne M. On site DNA barcoding by nanopore sequencing. PLoS One. 2017;12:e0184741.
9. Zaaijer S, Columbia University Ubiquitous Genomics 2015 class, Erlich Y. Using mobile sequencers in an academic classroom. Elife. 2016;5:e14258.
10. Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc. 2018;25:4–12.
11. Grüning BA, Rasche E, Rebolledo-Jaramillo B, Eberhard C, Houwaart T, Chilton J, Coraor N, Backofen R, Taylor J, Nekrutenko A. Jupyter and Galaxy: easing entry barriers into complex data analyses for biomedical researchers. PLoS Comput Biol. 2017;13:e1005425.
12. Boettiger C. An introduction to Docker for reproducible research. Oper Syst Rev. 2015;49:71–9.
13. Di Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. The impact of Docker containers on the performance of genomic pipelines. PeerJ. 2015;3:e1273.
14. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 2017;3:e000132.
15. Cook DE, Valle-Inclan JE, Pajoro A, Rovenich H, Thomma B, Faino L. Long-read annotation: automated eukaryotic genome annotation based on long-read cDNA sequencing. Plant Physiol. 2019;179:38–54.
16. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018;15:461–8.
17. Stoiber MH, Quick JF, Egan R, Lee JE, Celniker SE, Neely RK, Loman NJ, Pennacchio LA, Brown JO. De novo identification of DNA modifications enabled by genome-guided Nanopore signal processing. bioRxiv. https://doi.org/10.1101/094672.
18. Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, Pantic N, Admassu T, James P, Warland A, Jordan M, Ciccone J, Serra S, Keenan J, Martin S, McNeill L, Wallace EJ, Jayasinghe L, Wright C, Blasco J, Young S, Brocklebank D, Juul S, Clarke J, Heron AJ, Turner DJ. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods. 2018;15:201–6.