Washington University School of Medicine
Digital Commons@Becker
Open Access Publications, 2012

A case study for large-scale human microbiome analysis using JCVI's Metagenomics Reports (METAREP)

Johannes Goll, The J. Craig Venter Institute, Rockville, Maryland
Mathangi Thiagarajan, The J. Craig Venter Institute, Rockville, Maryland
Sahar Abubucker, Washington University School of Medicine in St. Louis
Curtis Huttenhower, Harvard University
Shibu Yooseph, The J. Craig Venter Institute, San Diego, California

Follow this and additional works at: https://digitalcommons.wustl.edu/open_access_pubs
Recommended Citation: Goll, Johannes; Thiagarajan, Mathangi; Abubucker, Sahar; Huttenhower, Curtis; Yooseph, Shibu; and Methe, Barbara A., "A case study for large-scale human microbiome analysis using JCVI's Metagenomics Reports (METAREP)." PLoS One. e29044. (2012). https://digitalcommons.wustl.edu/open_access_pubs/1170

A Case Study for Large-Scale Human Microbiome Analysis Using JCVI's Metagenomics Reports (METAREP)

Johannes Goll1, Mathangi Thiagarajan1, Sahar Abubucker3, Curtis Huttenhower4, Shibu Yooseph2, Barbara A. Methe1*

1 The J. Craig Venter Institute, Rockville, Maryland, United States of America, 2 The J. Craig Venter Institute, San Diego, California, United States of America, 3 The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America, 4 Harvard School of Public Health, Boston, Massachusetts, United States of America

Abstract

As metagenomic studies continue to increase in their number, sequence volume and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor to meaningful data interpretation.
To address this issue, we have developed JCVI Metagenomics Reports (METAREP) as an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software including the implementation of a dynamic weighting of taxonomic and functional annotation, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation is demonstrated through the application of multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by its application to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations, including biological pathway and individual enzyme abundances from hundreds of community samples, is demonstrated by providing scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies provide users with a reference of how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals. Over one thousand HMP WGS datasets and the latest open source code are available at http://www.jcvi.org/hmp-metarep.
Citation: Goll J, Thiagarajan M, Abubucker S, Huttenhower C, Yooseph S, et al. (2012) A Case Study for Large-Scale Human Microbiome Analysis Using JCVI's Metagenomics Reports (METAREP). PLoS ONE 7(6): e29044. doi:10.1371/journal.pone.0029044

Editor: Michael Edward Zwick, Emory University School of Medicine, United States of America

Received July 14, 2011; Accepted November 16, 2011; Published June 13, 2012

Copyright: 2012 Goll et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was funded by National Institutes of Health (http://www.nih.gov/) contracts and grants: J. Craig Venter Institute (contract #N01AI30071 and award #U54-AI084844); Genome Institute at the Washington University School of Medicine (award #U54HG004968); Harvard School of Public Health (award #1R01HG005969). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: bmethe@jcvi.org

Introduction

Several large-scale metagenomic studies have been completed or are underway to investigate the genetic composition of microbes in their natural environment. Prominent efforts include the Global Ocean Sampling [1–3], interrogations of a variety of diverse environments [4–6] and more recently the human microbiome [7,8]. Increasingly such work is planned and carried out as part of larger consortia and funding efforts. Examples include MetaHIT [7], the Earth Microbiome Project [9], http://www.terragenome.org, and the HMP [10]. The HMP represents an effort to characterize the microbial communities associated with multiple habitats across the human body, and is an excellent example of the complexity, scale and nature of such projects and consortia. With its focus on the resident bacteria of so-called normal donors, this project provides a critical baseline for future metagenomic studies of the human microbiome including their associations with human health and disease.
As a multi-faceted community resource, the HMP includes taxonomic marker studies of 16S rRNA gene sequences [11] as well as a whole genome shotgun (WGS) data survey [10,12–15]. This WGS metagenomic data survey has examined the taxonomy and functional potential of microbial communities from 741 samples taken from up to fifteen body habitats of 108 healthy adult men and women, generating in total approximately 38 billion short read sequences (3.5 Tbp), of which over 14 billion sequences were processed and analyzed as a part of this study. This information is complementary to 16S rRNA gene based organismal identifications and other taxonomic marker sequences; however, the task of annotating and characterizing large collections of such data is similarly challenging. To identify taxonomic and functional signatures, WGS metagenomic data are curated either by directly annotating short reads [16,17] or, as would be performed for the sequenced genome of a single organism, by annotating after assembly to take advantage of the larger contigs [18]. Annotation of these data is a computationally intensive activity, which requires extensive BLAST-like homology searches that can be difficult both to perform and to store.
Fortunately, billions of short sequence reads can be most usefully analyzed after condensing the data to taxonomic, enzymatic, and/or pathway abundances, which can subsequently be studied more efficiently. To provide a computational framework within which to perform such tasks, we have developed JCVI Metagenomics Reports (METAREP), an open source tool for high-performance comparative metagenomics [19]. The software utilizes a scalable data warehouse solution that allows effective storage and dynamic querying of annotation data that can be produced by various annotation methods. The data model of METAREP version 1.3.1, presented in this report, has been expanded to allow the direct importation and analysis of results produced by two annotation pipelines used in the HMP: (1) JCVI's Prokaryotic Metagenomics Annotation Pipeline (JPMAP) [18] used for the annotation of open reading frames from assemblies and (2) HUMAnN [16] to annotate short reads. In addition, frequencies of functional and taxonomic attributes can be adjusted using custom annotation weights. The scalability of such weighted frequency calculations has been improved by utilizing distributed searches.
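To make the notion of a weighted frequency concrete, the following minimal Python sketch sums annotation weights per attribute value instead of counting annotation entries. The record layout (the `phylum` and `weight` keys) and the function name `weighted_frequencies` are assumptions made purely for illustration; they are not METAREP's data model, and the sketch does not reproduce the dynamic weighting algorithm described in the Methods section.

```python
from collections import defaultdict


def weighted_frequencies(annotations, attribute):
    """Sum annotation weights per value of the chosen attribute.

    annotations: iterable of dicts; each may carry a 'weight' (default 1.0).
    attribute:   hypothetical field name, e.g. 'phylum' or 'ec'.
    """
    totals = defaultdict(float)
    for entry in annotations:
        value = entry.get(attribute)
        if value is None:
            continue  # skip entries that lack the attribute
        totals[value] += entry.get("weight", 1.0)
    return dict(totals)


# Toy records, invented for illustration only.
toy = [
    {"phylum": "Firmicutes", "weight": 2.5},
    {"phylum": "Firmicutes", "weight": 0.5},
    {"phylum": "Bacteroidetes", "weight": 1.0},
    {"phylum": "Actinobacteria"},  # no weight given, counted as 1.0
]
counts = weighted_frequencies(toy, "phylum")
print(counts)
```

With all weights equal to 1.0 the function reduces to plain counting, which is why unweighted and weighted annotations can share one code path in a sketch like this.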
In this study, we present advancements to the METAREP software focusing on the implementation of an extended data model, improved scalability and analytical features which have facilitated biological comparisons and interpretation of human microbiome metagenomic data generated by the HMP across multiple samples, body habitats and individuals. In particular, we introduce several biological scenarios and hypotheses along with appropriate analytical strategies designed to investigate these questions as well as demonstrate important downstream analytical features of METAREP including: how to filter the data for enzymatic markers, visualize marker composition across organisms and human habitats, conduct hierarchical clustering analysis of individual samples, and carry out non-parametric statistical analyses to detect differentially abundant taxa and pathways in oral habitats. The results of these scenarios provide templates of analytical strategies for future users of METAREP that can be applied to similar data. Further, the results of the current scenarios have revealed new insights into the taxonomic and functional relationships between multiple body habitats and individuals of the human microbiome. Finally, we also provide specific descriptions of software architecture improvements and results of tests designed to benchmark performance response time of the software. Overall, this work introduces an important software tool and strategies for comparative analysis of large-scale metagenomic data generated from complex experimental designs.
Results

Human Microbiome Case Study

For this case study, we have established a dedicated instance of our software (version 1.3.1) to host HMP WGS annotations at http://www.jcvi.org/hmp-metarep. The HMP METAREP instance currently allows interactive data analysis of over 400 million weighted gene annotations predicted from 14 billion short reads by HUMAnN as well as ORF-based annotations predicted from over 700 assemblies by JPMAP (Table 1). Each annotation entry may possess multiple attributes. Supported attributes range from organismal information (NCBI taxonomy) to functional description, Enzyme Classification (EC), Gene Ontology (GO) [20] or KEGG Orthology (KO) [21], as well as KEGG and MetaCyc [22] pathway assignments. In addition, each annotation may be given a weight to adjust its overall abundance (see Methods section for dynamic weighting algorithm). Although outside the scope of the software, for completeness a brief description of the HMP WGS sequence generation, preprocessing, and annotation is summarized in the Methods section.
After successful installation of the software, annotations can be imported and analyzed. In this report, we focus on analytical functions available through the METAREP Compare page, which allows users to filter and compare multiple datasets and visualize differences using advanced visualization tools (Figure 1). Compare options include the generation of absolute and relative count summaries, hierarchical clustering, heatmaps, and multi-dimensional scaling plots, as well as the execution of statistical tests. Plots can be exported as publication ready PDF files while counts, distance matrices, and statistical results can be exported as text files. In the following, we describe three biological scenarios to highlight how these compare functions can be used for exploratory analysis of the weighted HUMAnN read based annotations.

Table 1. Summary of available datasets by body habitat sorted by the number of WGS reads.

Habitat | # HUMAnN Datasets | # Reads [million] | # Weighted Annotations [million] | Sum of Annotation Weights [million] | # Assembly Datasets
Stool | 68 | 6262 | 78 | 1563.8 | 151
Supragingival plaque | 89 | 4192 | 112 | 1538.0 | 118
Buccal mucosa | 116 | 1449 | 104 | 731.8 | 107
Tongue dorsum | 23 | 1182 | 28 | 501.8 | 129
Right retroauricular crease | 17 | 412 | 13.0 | 168.8 | 17
Posterior fornix | 55 | 297 | 15 | 110.0 | 53
Anterior nares | 91 | 164 | 38 | 38.5 | 87
Subgingival plaque | 7 | 150 | 9 | 48.6 | 7
Left retroauricular crease | 8 | 145 | 6 | 63.7 | 9
Palatine tonsils | 6 | 135 | 6 | 54.7 | 6
Throat | 6 | 129 | 6 | 53.1 | 7
Keratinized gingiva | 2 | 45 | 2 | 23.9 | 6
Saliva | 5 | 43 | 5 | 16.8 | 3
Vaginal introitus | 3 | 6 | 1 | 1.8 | 3
Mid vagina | 2 | 2 | 1 | 0.7 | 2
Total | 498 | 14613 | 424 | 4916.0 | 705

Columns 2–5 refer to the HUMAnN datasets.
doi:10.1371/journal.pone.0029044.t001
Scenario 1: Enzymatic Markers Contrasted Across Body Habitats and Taxa

Scenario 1 Introduction. Pyruvate is a key organic carbon intermediate centrally positioned at the intersection of assimilatory and dissimilatory pathways, and respiratory and fermentative metabolism [23]. As such, it can be expected to be important in the metabolism of the human microbiome. However, the specific enzymatic processes used for its metabolism and their taxonomic membership are likely to vary across body habitats. To evaluate this hypothesis, three major enzymes of pyruvate metabolism, 1) pyruvate dehydrogenase complex (PDHC) [24], 2) pyruvate:ferredoxin oxidoreductase (PFOR) [25] and 3) pyruvate formate lyase (PFL) [26], have been examined for their relative abundance by taxonomic profiles and compared across multiple body habitats.
A common route of pyruvate metabolism is oxidative decarboxylation catalyzed by PDHC to yield the central intermediate acetyl-coenzyme A (CoA), which can be further oxidized through the TCA cycle, or used in anabolic pathways for synthesis of essential cell components, or carbon and energy storage compounds. The PDHC belongs to the family of 2-oxo acid dehydrogenases, which consists of multi-subunit complexes responsible for the irreversible conversion of 2-oxo acids to their corresponding acyl-CoA derivatives. The PDHC is composed of three subunits: component E1, pyruvate dehydrogenase (EC 1.2.4.1); component E2, dihydrolipoyl transacetylase (EC 2.3.1.12); and component E3, dihydrolipoyl dehydrogenase (EC 1.8.1.4) [27]. A key enzymatic counterpart to the PDHC in energy metabolism under anaerobic conditions is PFOR (EC 1.2.7.1), which catalyzes a reversible, CoA-dependent oxidative decarboxylation of pyruvate yielding acetyl-CoA and CO2. Because the reaction is reversible, this enzyme also mediates the main CO2 fixing reaction for methanogens and a variety of photosynthetic organisms [28].
In contrast, some bacteria are capable of fermentation in which organic intermediates of metabolism, such as pyruvate, serve as electron acceptors in the maintenance of overall redox balance, while ATP needed for cell growth is derived from substrate-level phosphorylation. PFL (EC 2.3.1.54), a homodimer, catalyzes the reversible reaction of pyruvate and CoA into acetyl-CoA and formate [29].

Figure 1. Screenshot of the METAREP Compare Page. The Compare page allows users to filter, compare and visualize annotation attributes across multiple datasets. As illustrated in the upper panel, the user can find and select datasets of interest (here pooled body habitats were selected). The middle panel illustrates filter and compare options (here datasets were filtered for the pyruvate dehydrogenase complex and the heatmap plot option was selected). The bottom panel shows the compare results and allows users to switch between annotation attributes and specify their level of granularity (here the taxonomy attribute and phylum level were selected).
doi:10.1371/journal.pone.0029044.g001
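The three markers introduced above are each identified by one or more EC numbers, so selecting them amounts to an EC-based filter followed by a per-phylum abundance summary. The Python sketch below illustrates that idea on toy records. The EC numbers are those given in the text; the record keys (`ec`, `phylum`, `weight`) and the function names are hypothetical, and the sketch is not the METAREP filter query syntax, which is documented in the Methods section.

```python
# EC numbers as listed in the text for each marker.
MARKER_ECS = {
    "PDHC": {"1.2.4.1", "2.3.1.12", "1.8.1.4"},
    "PFOR": {"1.2.7.1"},
    "PFL": {"2.3.1.54"},
}


def filter_by_marker(annotations, marker):
    """Keep only annotations whose EC number belongs to the marker."""
    wanted = MARKER_ECS[marker]
    return [a for a in annotations if a.get("ec") in wanted]


def relative_abundance(annotations, attribute="phylum"):
    """Return each attribute value's share of the summed weights."""
    totals = {}
    for a in annotations:
        key = a[attribute]
        totals[key] = totals.get(key, 0.0) + a.get("weight", 1.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {key: value / grand_total for key, value in totals.items()}


# Toy records, invented for illustration only.
toy = [
    {"ec": "1.2.7.1", "phylum": "Firmicutes", "weight": 3.0},
    {"ec": "1.2.7.1", "phylum": "Euryarchaeota", "weight": 1.0},
    {"ec": "2.3.1.54", "phylum": "Firmicutes", "weight": 2.0},
    {"ec": "1.2.4.1", "phylum": "Actinobacteria", "weight": 4.0},
]
pfor = filter_by_marker(toy, "PFOR")
shares = relative_abundance(pfor)
print(shares)
```

Treating PDHC as the union of its three subunit EC numbers is a simplification chosen for the sketch; a stricter filter could require all three components to be present in a dataset.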
Scenario 1 METAREP Analytical Methods. To undertake a comparison of the distribution of these three enzymes across body habitats, analytical functions available through the METAREP Compare page (Figure 1) were employed. To compare the distribution of pyruvate metabolism by taxonomy, we filtered pooled datasets from 13 body habitats (n = 493 HUMAnN datasets; 97 donors) for the three pyruvate metabolism enzymes (PDHC, PFOR, PFL) and compared their abundance across taxonomy at the phylum level (see Methods section for details of the filter queries) with multiple distance metrics (Euclidean, Bray-Curtis and Morisita-Horn) to examine the subsequent cluster topologies for consistency. The absolute weighted count matrices (phyla versus body habitats) for each marker enzyme can be found in Table S1. For all of the distance metrics used, consistent dendrogram topologies were recovered for all 13 PFOR, 12 PDHC and 10 PFL filtered body habitats (Figure S1). Heatmap plots with dendrograms using the Morisita-Horn distance metric are shown in Figure 2.
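For readers unfamiliar with the three distance metrics named above, the following Python sketch implements their standard textbook definitions for two abundance vectors over the same set of phyla. It is offered as an illustration under that assumption; the exact variant, implementation and any normalization applied by METAREP are not stated in this section.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def morisita_horn(x, y):
    """Morisita-Horn dissimilarity (1 minus the overlap index)."""
    x_total, y_total = sum(x), sum(y)
    if x_total == 0 or y_total == 0:
        raise ValueError("both profiles need a non-zero total")
    dx = sum(a * a for a in x) / (x_total * x_total)
    dy = sum(b * b for b in y) / (y_total * y_total)
    cross = sum(a * b for a, b in zip(x, y))
    overlap = 2.0 * cross / ((dx + dy) * x_total * y_total)
    return 1.0 - overlap


# Toy phylum profiles (weighted counts), invented for illustration.
habitat_a = [10.0, 20.0, 30.0]
habitat_b = [20.0, 40.0, 60.0]  # same proportions, twice the depth
print(euclidean(habitat_a, habitat_b))
print(bray_curtis(habitat_a, habitat_b))
print(morisita_horn(habitat_a, habitat_b))
```

In the toy example the two profiles have identical proportions, so the Morisita-Horn dissimilarity is zero up to rounding, whereas the Euclidean and Bray-Curtis values are not. This insensitivity to total abundance is one reason an index of this kind suits datasets of very different sequencing depth, such as those in Table 1.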
Scenario 1 Results. Results of the PDHC analysis recovered a total of 39 phyla, indicating the broad taxonomic distribution of this enzyme complex across prokaryotes and eukaryotes. However, the vast majority of the total abundance (94%) was contributed by five phyla: Actinobacteria (29%), Firmicutes (27%), Proteobacteria (24%), Bacteroidetes (12%) and Fusobacteria (2%). The remaining 6% of the total abundance was contributed by the remaining 34 phyla, with each classification contributing <1% towards the total abundance. The majority of oral habitats, especially the saliva, palatine tonsils, and throat, along with the tongue dorsum, keratinized gingivae, and buccal mucosa, clustered together with the posterior fornix to form a cluster driven by high relative abundances of Firmicutes (range 29%–66%) and Proteobacteria (range 12%–47%) (Figure 2a). The anterior nares was positioned most closely to the right and left retroauricular crease in a cluster with high abundance of Actinobacteria (range 59%–84%). The subgingival and supragingival plaque formed a separate cluster that was placed most closely to the anterior nares and skin cluster due to variation in the abundance of several phyla, while stool was the most distantly related habitat due to the high abundance of Bacteroidetes (57%).
The analysis of PFL recovered 15 phyla in total, of which 97% of the total abundance was contributed by six phyla: Firmicutes (50%), Proteobacteria (26%), Bacteroidetes (10%), Actinobacteria (7%), Fusobacteria (2%) and Cyanobacteria (2%). The remaining 3% of the total abundance was contributed by the remaining nine phyla, with each classification contributing <1% towards the total abundance. A cluster of oral cavity habitats including palatine tonsils, saliva, throat, tongue dorsum, supragingival and subgingival plaque was recovered in which approximately 50% of the abundance from the body habitat in question was attributed to Firmicutes (range 43%–59%) and approximately one-third to Proteobacteria (range 31%–35%) (Figure 2c). The remaining oral cavity habitats (keratinized gingivae and buccal mucosa) clustered most closely with the right and left retroauricular crease based largely on increased abundance of Firmicutes in these habitats (range 70%–88%). The posterior fornix clustered closest to the skin based in part on a relatively high and similar abundance of Firmicutes (60%), while the anterior nares and stool were the most distantly related body habitats. Although they exhibited similar abundances of Firmicutes (45% anterior nares, 46% stool), they were separated from one another and from the remaining body habitats based on the relatively high abundance of Actinobacteria for anterior nares (32%, highest of all body habitats) and Bacteroidetes in stool (26%, highest of all body habitats) along with variation in other phyla.
In contrast to PDHC and PFL, the analysis of the PFOR recovered a more variable clustering of body habitats within major body regions and very different taxonomic patterns (Figure 2b). In this analysis, 14 phyla were recovered in total, of which 95% of the total abundance was contributed by seven phyla: Firmicutes (27%), Euryarchaeota (25%), Crenarchaeota (20%), Proteobacteria (10%), Thermotogae (9%), Actinobacteria (2%) and Dictyoglomi (2%). The remaining 5% of the total abundance was contributed by the remaining seven phyla, with each classification contributing <1% towards the total abundance. The majority of the oral cavity sites (saliva, palatine tonsil, throat, buccal mucosa and supragingival plaque) along with stool formed one cluster with the highest abundance from Firmicutes and higher abundances of Thermotogae (range 7%–10%) relative to the remaining body habitats. The remaining body habitats revealed the highest abundances in Euryarchaeota and to a lesser extent Crenarchaeota. The left and right retroauricular crease samples were most distantly related to all other body habitats and were dominated by members of the Crenarchaeota (81% and 73%, respectively).
Scenario 1 Discussion. The abundances and taxonomic distributions recovered between these three enzymes varied across body habitats; however, certain habitats were more likely to be found clustered together, and this result was consistent regardless of distance metric used, suggesting closer taxonomic and functional relationships between them. The palatine tonsils and throat, which are in close physical proximity within the oral cavity, along with saliva, which contacts the entire oral cavity [15], were most consistently clustered together (i.e., had the shortest distances between them) and were most consistently clustered with other habitats from the oral cavity. The subgingival and supragingival plaque, which are both biofilms associated with teeth, and the right and left retroauricular crease, which are physically disparate from one another but represent the same skin type [15], were also clustered closest to one another (with the exception of the plaque samples in the PFOR analysis). However, their topological positions relative to other habitats from the same body region (oral cavity and anterior nares, respectively) were not consistent regardless of distance metric or clustering algorithm used. The remaining oral cavity habitats (keratinized gingivae, buccal mucosa, tongue dorsum), stool, anterior nares and posterior fornix exhibited the most variable placement in terms of cluster topology.

Taken together, these results suggest that metabolic function can vary across regions of the body and that physical proximity (whether close or separated by relatively greater distances) is not necessarily the most important indicator of taxonomic profile similarity based on the use of functional gene abundance as a biomarker. Instead, different habitats can exist within and between body regions that exhibit variable community structure. The exploration of more refined definitions of habitat may be necessary to improve our understanding of microbial biogeography in humans.
In all cases, the taxonomic profiles revealed that the majority of the relative abundance was recovered within a few phyla (5–6). Conversely, more lineages were recovered with low abundance, including at least one phylum from the Domain Eukaryota in each example presented. This finding, when using functional genes as biomarkers, has to our knowledge not been established previously in investigations of the human microbiome. An unusual finding from the examination of the PFOR profiles was the relatively high abundance of Crenarchaeota and Euryarchaeota recovered from the skin habitats, as there are few reports of archaea associated with the skin and, to our knowledge, this has not been reported previously using a metabolic marker. The PFOR profile also revealed the presence of lineages less well studied in terms of their associations with humans, such as the Thermotogae.

Collectively, these results suggest important new biological insights including: a) the clustering patterns of taxonomic abundance derived from functional genes are not always consistent even when body habitats from similar regions of the body are considered, b) the presence of relatively high abundance of Crenarchaeota and Euryarchaeota associated with skin as determined by a metabolic marker (PFOR) and c) that although many lineages (e.g. Thermotogae and the Archaea) may be less prevalent in terms of total abundance within the human microbiome, they nonetheless represent an important reservoir of genetic diversity.
Scenario 2: Sample Variation of Body Habitats and Individuals Across Taxa and Pathways Over Time

Scenario 2 Introduction. The nature and extent of variation within and between individuals and body habitats over time is an important topic of study in human microbiome research [30]. Previous studies based on 16S rRNA gene based taxonomic surveys have suggested that microbial community taxonomic profiles were determined largely by body habitat; however, interpersonal variability was high within body habitats [31]. More recently, a metagenomic survey of the human gut microbiome suggested that individuals can be grouped based primarily on taxonomic composition and, to a lesser extent, on functional profiles [7]. Datasets produced by the HMP provide an important opportunity to continue these investigations. Here we hypothesize that although taxonomic and functional composition is expected to vary between samples from different individuals and over time, those samples taken from the same body habitats will be more similar to one another.
Scenario 2 METAREP Analytical Methods. The software provides several options to quantify and visualize sample variation that can be used to test this hypothesis. To examine this question and to highlight the hierarchical clustering functionality of METAREP, the variation of taxonomic and pathway composition within and across body habitats and individual donors over two time points was investigated. Datasets from 37 donors (24 males, 13 females) over two sampling time points and 15 body habitats were investigated (84 first and second visit sample pairs, n = 168). A full complement of first and second visits from all 37 individuals across all body habitats was not available. Therefore, certain body sites have a greater contribution of donors with two visits. A breakdown per body site can be found in Table S2. Datasets were clustered based on taxonomy at the family level, and function using KEGG pathways. Dendrograms were produced using the Morisita-Horn distance metric in combination with the average linkage clustering algorithm. Initially all samples were clustered in order to visualize the overall patterns produced from these datasets (Figures S2 and S3). For easier visualization of the significant trends determined from the larger dataset, a subset of samples (24 first and second visit sample pairs, n = 48) was also clustered (Figure 3). This subset consisted of two sampling time points from 12 males and 12 females taken from five body habitats (Table S2).
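The average linkage procedure named above can be sketched in a few lines of Python: starting from one cluster per sample, the two clusters with the smallest mean pairwise distance are merged repeatedly until a single cluster remains. The function name, the toy sample labels and the toy distance matrix are assumptions for illustration; the sketch is a naive implementation for small inputs and not the routine METAREP itself uses.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage (UPGMA).

    labels: list of sample names.
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Returns the merge history as (members_a, members_b, height) tuples.
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)

    def mean_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        ids = sorted(clusters)
        height, a, b = min(
            (mean_distance(a, b), a, b)
            for k, a in enumerate(ids)
            for b in ids[k + 1:]
        )
        merges.append((
            sorted(labels[i] for i in clusters[a]),
            sorted(labels[i] for i in clusters[b]),
            height,
        ))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Toy distance matrix for four hypothetical samples.
names = ["stool-1", "stool-2", "plaque-1", "plaque-2"]
matrix = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]
history = average_linkage(names, matrix)
for left, right, height in history:
    print(left, right, round(height, 4))
```

Cutting the resulting tree at a chosen height, as indicated by the dotted line in Figure 3, corresponds to keeping only the merges whose height falls below that threshold.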
Scenario 2 Results. The resulting dendrograms (Figure 3, Figures S2 and S3) showed that the majority of samples cluster together based on body habitat using both the taxonomy and functional datasets. The dendrogram topology by taxonomy (Figure 3a, Figure S2) was relatively more consistent in grouping samples from identical or similar body habitats compared to that recovered by function (Figure 3b, Figure S3), in that oral sites were closest to one another followed by samples from the anterior nares, skin and finally vagina and stool. In contrast, the stool, anterior nares and posterior fornix samples produced more variable clustering by function (Figure 3b, Figure S3).

However, exceptions to consistent clustering of samples by body habitat were found within both the taxonomic and functional analyses. For example, the oral cavity sites are dominated by two large clusters, one for supragingival plaque (Figure 3a SP Cluster 1, Figure 3b SP Cluster 2, Figures S2 and S3) and a second for buccal mucosa (Figure 3a BM Cluster 1, Figure 3b BM Cluster 2, Figures S2 and S3). However, in both conditions, there are examples of buccal mucosa samples which cluster with the supragingival plaque and vice versa (Figure 3a SP Cluster 1, Figure 3b SP Cluster 2, Figure 3a BM Cluster 1, Figure 3b BM Cluster 2, Figures S2 and S3). In the dendrogram by function, the anterior nares samples were placed in several locations, including clusters closest to supragingival plaque (Figure 3b AN Cluster 2), stool (Figure 3b AN Cluster 3) and posterior fornix (Figure 3b AN Cluster 4). Stool samples were broken into three clusters (Figure 3b, ST Cluster 3, ST Cluster 4, ST Cluster 5) with the majority in clusters closest to the anterior nares and posterior fornix (Figure 3b, ST Cluster 4, ST Cluster 5), while two of the samples (Figure 3b ST Cluster 3) were placed closest to anterior nares and buccal mucosa (Figure 3b AN Cluster 3, BM Cluster 2). For the oral cavity body sites with low representation (throat, palatine tonsils, saliva, subgingival plaques, and tongue dorsum), it was in general more difficult to determine the robustness of sample placement within the oral cavity. However, the subgingival plaque samples were always clustered with supragingival plaque in both the taxonomy and functional dendrograms (Figures S2 and S3).
Examination of the temporal component in the dendrograms revealed that for both taxonomy and function, in the majority of instances, the first and second time point from a particular individual and body site were not the closest samples to one another. However, these samples were generally found within the same cluster. Data based on the 48 pairwise Morisita-Horn differences strongly supported that differences between first and second time points were significantly lower when compared to all pairwise distances (one sided Wilcoxon rank-sum test p-value < 0.00001). Nevertheless, there were notable exceptions. For example, in both the taxonomy and function dendrogram, the placement of the posterior fornix sample from the first time point from individual 159227541 (Figure 3a PF Cluster 1, Figure 3b PF Cluster 3) is closest to stool samples (Figure 3a ST Cluster 1, ST Cluster 2, Figure 3b ST Cluster 3), while the second time point from this habitat and individual is closest to other posterior fornix samples (Figure 3a PF Cluster 2, Figure 3b PF Cluster 4, PF Cluster 5). In both the taxonomy and function dendrogram, the first and second time points from the anterior nares from individual 765560005 are not placed closest to one another. In fact, while the first time point is grouped with other anterior nares samples in the function dendrogram (Figure 3a AN Cluster 1), the second time point is closer to posterior fornix and stool samples (Figure 3b AN Cluster 4). In contrast, by taxonomy, although the two samples were not closest to one another, they were placed in a cluster of anterior nares samples (Figure 3a AN Cluster 1).

Figure 2. Heatmap plots of three enzymatic markers. Marker abundance is contrasted across phyla (columns) and body habitats (rows) using Morisita-Horn distances in combination with the average linkage clustering method. Colors encode the relative abundance of the selected feature-dataset combination (dark red 0% to white 100%) while the dendrograms at the top and left show annotation feature and dataset differences, respectively.
doi:10.1371/journal.pone.0029044.g002
Figure 3. Hierarchical cluster plots of 48 samples taken from 12 females and 12 males at two different time points. Hierarchical clustering analysis of a random subset of human microbiome samples taken from five human body regions clustered by NCBI taxonomy at the family level (a) and by KEGG pathways (b). Clusters were generated by the average linkage clustering method using the Morisita-Horn index to generate a distance matrix (shown on the x-axis). Dataset labels encode the following information: [donor ID]-[habitat]-[gender]-[timepoint]-[sample ID]-[annotation-type]. For example, the dataset label 159814214-an-m-2-SRS047225-mtr encodes a sample from a male donor (ID 159814214) taken from the anterior nares site at time point 2 with sample ID (SRS047225) annotated by the metabolic reconstruction (HUMAnN) pipeline (mtr). The dotted line represents the level at which the tree was cut for analysis. The resulting clusters are labeled as follows: AN (anterior nares), BM (buccal mucosa), SP (supragingival plaque), ST (stool), and PF (posterior fornix).
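The dataset label format given in the caption can be split mechanically into its six fields, which is convenient when grouping exported results by donor, habitat or time point. The Python sketch below assumes that no field contains a dash, which holds for the example label in the caption but is not guaranteed by the text; the class and function names are hypothetical.

```python
from typing import NamedTuple


class DatasetLabel(NamedTuple):
    donor_id: str
    habitat: str
    gender: str
    timepoint: int
    sample_id: str
    annotation_type: str


def parse_dataset_label(label):
    """Split a label of the form
    [donor ID]-[habitat]-[gender]-[timepoint]-[sample ID]-[annotation-type].
    """
    parts = label.split("-")
    if len(parts) != 6:
        raise ValueError(
            f"expected 6 dash-separated fields, got {len(parts)}: {label!r}"
        )
    donor_id, habitat, gender, timepoint, sample_id, annotation_type = parts
    return DatasetLabel(
        donor_id, habitat, gender, int(timepoint), sample_id, annotation_type
    )


# Example label taken from the Figure 3 caption.
parsed = parse_dataset_label("159814214-an-m-2-SRS047225-mtr")
print(parsed)
```

Donor IDs are kept as strings rather than integers so that any leading zeros or non-numeric identifiers survive parsing unchanged.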
doi:10.1371/journal.pone.0029044.g003

Scenario 2 Discussion. In this scenario, taxonomic and functional compositions varied across individuals, body habitats and time, although variation within a body habitat was generally less than between habitats, as evidenced by the generally consistent clustering of samples by habitat. Exceptions were found which suggest that groups of individuals may exist in which microbiome compositions are more similar to one another and that discrete groups of such individuals could be recovered with taxonomic or functional data. This finding requires more investigation but has important implications concerning the ability to use taxonomic and functional profiles to group individuals. The topology recovered was more variable for function compared to taxonomy. Further, these results suggest that, with some notable exceptions, there is generally modest variation in both taxonomy and function in the microbiome within an individual over time. These results could be influenced by technical factors such as some differences in sample coverage, or the relatively greater difficulty of accurately assigning ORFs to pathways compared to taxonomic classifications. Collectively, these results suggest important new biological insights including a) that taxonomy and function are not necessarily coupled, b) that the microbiome can vary across individuals, habitats and time and c) although variation between individuals tends to be higher than between body habitats, it may be possible to use taxonomic and functional profiles to group individuals.
Scenario 3: Detection of Differentially Abundant Taxa and Function between Three Oral Habitats

Scenario 3 Introduction. The human oral cavity consists of a variety of surfaces and environments which are colonized by distinct communities of microbial organisms [32]. In the HMP, body habitats sampled from the oral cavity include the buccal mucosa, which is the epithelial lining of the cheek and lips; the tongue dorsum, or papillated surface of the tongue; and supragingival plaque, which is a biofilm on the tooth surface above the dentogingival junction [10,15]. Surveys of diversity based on 16S rRNA gene based taxonomic profiles have indicated that over 600 taxa at the species level are found extensively in the human microbiome [33]. Metagenomic data from the HMP now provide an opportunity to extend these analyses beyond 16S rRNA gene based surveys to examinations of taxonomic and functional profiles of distinct habitats from the normal oral cavity. As suggested in previous studies of the oral cavity, and results from Scenarios 1 and 2 in this study, we hypothesize that statistically significant differences in microbial diversity and function are present in HMP metagenomic data from the oral cavity.
Scenario 3 METAREP Analytical Methods. To test this hypothesis, an analysis was undertaken to determine statistically significant differences in pathways and their associated taxonomic distributions. Specifically, three oral habitats were investigated: 1) buccal mucosa (n=116), 2) tongue dorsum (n=23) and 3) supragingival plaque (n=89). These three oral body habitats were selected since they have the greatest representation of WGS datasets in the oral cavity and together constitute more than one fourth of all HMP metabolic reconstruction datasets (Table 1). The METAREP Compare page (Figure 1) offers two nonparametric tests, the Wilcoxon rank-sum test and Metastats, a modified nonparametric t-test [34]. Both tests can be used to identify significant differences for a certain annotation attribute between two sample populations. For this scenario, all possible pairwise comparisons between habitats were carried out based on taxonomic designations at the phylum level and metabolic functions at the pathway level (Figure 4). A filtering step was applied to this analysis in which ORFs classified as Chordata were removed to eliminate ORFs most likely associated with the human host. This amounted to 0.5% of the three pooled oral habitat datasets. The ability to easily filter using a variety of dataset variables demonstrates one of the strengths of METAREP. The statistically significant phyla and pathways determined from both tests were exported as text files (Bonferroni corrected (adj.) p-value < 0.05, 10000 Metastats permutations, Table S3 and S4).
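The kind of test described above can be illustrated with a minimal sketch. It is not METAREP's implementation (METAREP delegates its statistics to R); it assumes a normal approximation to the Wilcoxon rank-sum statistic with a tie-corrected variance and no continuity correction, followed by a plain Bonferroni adjustment. The function names and the abundance values are hypothetical and chosen only for illustration.

```python
import math


def rank_sum_p_value(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Tied values receive the average of their ranks and the variance is
    corrected for ties. No continuity correction is applied. Both samples
    are assumed to be non-empty.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    pooled = list(sample_a) + list(sample_b)
    # Indices of the pooled values in ascending order of value.
    order = sorted(range(n), key=pooled.__getitem__)
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[order[end + 1]] == pooled[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        group_size = end - start + 1
        tie_term += group_size ** 3 - group_size
        start = end + 1
    rank_sum_a = sum(ranks[:n_a])
    expected = n_a * (n + 1) / 2
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all pooled values are identical
    z = (rank_sum_a - expected) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical relative abundances (%) of one pathway in two habitats.
habitat_a = [0.29, 0.31, 0.27, 0.33, 0.30, 0.28]
habitat_b = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16]

# The second and third p-values are placeholders standing in for other pathways.
raw = [rank_sum_p_value(habitat_a, habitat_b), 0.04, 0.60]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

With many pathways tested at once, the Bonferroni factor grows with the number of tests, which is one reason a feature can pass the raw threshold and still drop out after adjustment.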
Scenario 3 Results. Pairwise comparisons of the three oral habitats revealed two significant trends in taxonomic profiles supported by both statistical tests (adj. p-value < 0.01). Significant differences were determined in the abundances of the Firmicutes, with this phylum being most abundant in the buccal mucosa, followed by the tongue dorsum, and least abundant in the supragingival plaque habitats. The second significant trend could be seen in the abundance of Actinobacteria. The data supported a decrease in the abundance of Actinobacteria from its highest value in the supragingival plaque, followed by tongue dorsum, to its lowest value in the buccal mucosa (Wilcoxon adj. p-value < 0.01). This trend was also supported at the same level of significance by Metastats except for the comparison of buccal mucosa versus tongue dorsum (Metastats adj. p-value = 0.113). In addition to these trends, both tests indicated that Bacteroidetes were significantly less abundant (adj. p-value < 0.01) in buccal mucosa when compared to the other habitats. No significant difference in the abundance of Bacteroidetes could be observed between the tongue dorsum and supragingival plaque, however (Wilcoxon adj. p-value = 0.733; Metastats adj. p-value = 0.097).

Pairwise habitat comparisons of pathway attributes revealed differences in their distribution and abundance. In general, fewer pathways showed statistically significant differences in abundance with Metastats than with the Wilcoxon rank-sum test. Supragingival plaque had the highest overall number of enriched pathways (Table 2). Among the key differences supported by both statistical tests were those determined in the abundance of metabolic functions related to antibiotic biosynthesis, pathogenesis and N-glycan biosynthesis (Table 3). For example, KEGG pathways related to tetracycline biosynthesis (ko00253), penicillin and cephalosporin biosynthesis (ko00311), and butirosin and neomycin biosynthesis (ko00524) were enriched in buccal mucosa relative to supragingival plaque. Conversely, biosynthetic pathways related to vancomycin group antibiotics (ko01055), streptomycin biosynthesis (ko00521) and novobiocin were elevated in the supragingival plaque relative to the buccal mucosa. Several differences in pathways related to pathogenesis were also revealed. For example, the pathway describing Staphylococcus aureus infection (ko05150) was found to be significantly enriched in the buccal mucosa relative to the supragingival plaque, while epithelial cell signaling in Helicobacter pylori infection (ko05120) was elevated in the tongue dorsum relative to supragingival plaque. N-linked protein glycosylation biosynthesis (ko00510) was enriched in supragingival plaque versus buccal mucosa.
Scenario 3 Discussion. Based on the number of pathways which differ in abundance, results from this investigation suggest that the metabolic potentials of the buccal mucosa and tongue dorsum are more similar to one another than either is to that of the supragingival plaque. These differences are also generally consistent with the significant trends determined in the pairwise comparisons of taxonomic profiles in which changes in abundance in Firmicutes and Actinobacteria were greatest between buccal mucosa and tongue dorsum relative to supragingival plaque.
These findings may in part be due to the differences in body habitat. For example, the buccal mucosa and tongue dorsum represent microbial communities associated with epithelial cells which are shed over time from soft tissue while the supragingival plaque represents a biofilm adhered to a non-shedding hard surface [35]; however, this is a result which warrants further investigation.

Figure 4. Screenshots of METAREP statistical result panels. List of phyla and pathways that are differentially abundant between the buccal mucosa (n=116) and supragingival plaque (n=89) habitats. Taxonomic differences reported by Metastats with confidence intervals ($\hat{\mu} \pm \mathrm{s.e.}(\hat{\mu})$) are shown in (a); differences in KEGG pathway abundance detected by the Wilcoxon rank-sum test are shown in (b). doi:10.1371/journal.pone.0029044.g004

Table 2. Number of pathways that are differentially abundant for each statistical test and oral habitat combination.

Wilcoxon     Buccal Mucosa   Tongue Dorsum   Supragingival Plaque   Total (redundant)
buccal       0               39              52                     91
tongue       122             0               22                     144
plaque       193             123             0                      316

Metastats    Buccal Mucosa   Tongue Dorsum   Supragingival Plaque   Total (redundant)
buccal       0               41              62                     103
tongue       54              0               28                     82
plaque       133             113             0                      246

Rows indicate the habitat in which pathways were significantly overrepresented. Columns indicate the habitat in which pathways were significantly underrepresented. For example, the Wilcoxon rank-sum test found 39 pathways to be enriched in buccal mucosa when compared with tongue dorsum. doi:10.1371/journal.pone.0029044.t002
The differences in pathway distribution determined in this analysis further provide new insights into additional biological drivers related to host-microbial interactions that may play a role in the functional and taxonomic profiles of the microbiome recovered within and between these habitats. First, these results suggest that the ability to synthesize a variety of antibiotics is a function present in the oral microbiome; however, this pattern differs between the body habitats examined. The interplay between antibiotic synthesis and resistance in microbial communities has been described as biological warfare where specific antibiotic activity is opposed by resistance determinants and the state of microbial metabolism plays a role in antibiotic susceptibility [36]. Thus, antibiotic production is an important control factor of the colonization and maintenance of microbial community membership, and metabolic function. Next, these results further suggest that even in the oral cavity of normal adult individuals as examined in the HMP, pathways associated with pathogenesis are present in a range of abundances by habitat. These pathways generally share functions such as surface attachment and invasion of epithelial cells, indicating the possible presence of opportunistic pathogens and more generally the presence of microbial mechanisms for colonization of habitats in the human host. Finally, the glycosylation of proteins is an important, conserved posttranslational modification in eukaryotic organisms including secretory and membrane proteins [37]. Although protein glycosylation was originally described as exclusive to eukaryotes, recent studies have determined its presence in all domains of life [38]. In this study, this pathway revealed a wide taxonomic distribution across eukaryotes (14 phyla) and prokaryotes (19 phyla); however, the vast majority (87%) of the abundance of this pathway was determined to be prokaryotic in origin. In bacteria these pathways have been best studied in pathogens where it has been suggested that they are involved in adherence and invasion of eukaryotic cells [39]. The mechanism of N-linked glycosylation is known to occur largely on surface exposed glycoproteins; therefore, other functions suggested for these proteins include protection against proteolytic cleavage, enhancement of protein stability or signals for cellular sorting [38]. The presence of differentially abundant pathways of antibiotic production, pathogenesis and N-linked protein glycosylation biosynthesis, as determined in this scenario, reveals potentially important control factors of colonization and maintenance of microbial community membership, metabolic function and host interaction in oral habitats.
Collectively, the features described here may in part act as drivers of microbiome community structure and as such contribute to differences in the taxonomy and function between body habitats.

Table 3. Selection of KEGG pathways found to be differentially abundant in three oral habitats sorted by the ratio of the median abundances.

KO ID   Pathway                                                      % Median A  % Median B  Median Ratio A/B  Wilcoxon adj. p-value  Metastats adj. p-value

A = buccal mucosa (n=116), B = supragingival plaque (n=89)
05150   Staphylococcus aureus infection                              0.2851      0.1232      2.314             <0.000001              0.0282
00311   Penicillin and cephalosporin biosynthesis                    0.063       0.036       1.75              <0.000001              0.0282
00253   Tetracycline biosynthesis                                    0.2699      0.1888      1.43              <0.000001              0.029
00524   Butirosin and neomycin biosynthesis                          0.0647      0.0532      1.216             <0.000001              0.0284
00521   Streptomycin biosynthesis                                    0.3746      0.4651      0.805             0.000299               0.0284
05120   Epithelial cell signaling in Helicobacter pylori infection   0.1034      0.1338      0.773             <0.000001              >0.05
01055   Biosynthesis of vancomycin group antibiotics                 0.0873      0.1276      0.684             <0.000001              0.029
00510   N-Glycan biosynthesis                                        0.0179      0.0637      0.281             <0.000001              0.0299

A = buccal mucosa (n=116), B = tongue dorsum (n=23)
05150   Staphylococcus aureus infection                              0.2851      0.1232      2.314             <0.000001              0.028
00311   Penicillin and cephalosporin biosynthesis                    0.063       0.036       1.75              <0.000001              0.0282
00253   Tetracycline biosynthesis                                    0.2699      0.2083      1.296             <0.000001              0.0289
01055   Biosynthesis of vancomycin group antibiotics                 0.0873      0.0993      0.879             0.00598                0.0289
00521   Streptomycin biosynthesis                                    0.3746      0.4651      0.805             0.000299               0.0282
05120   Epithelial cell signaling in Helicobacter pylori infection   0.1034      0.1338      0.773             <0.000001              0.028
00510   N-Glycan biosynthesis                                        0.0179      0.0362      0.494             <0.000001              0.0299

A = supragingival plaque (n=89), B = tongue dorsum (n=23)
00510   N-Glycan biosynthesis                                        0.0637      0.0362      1.76              <0.000001              0.0299
01055   Biosynthesis of vancomycin group antibiotics                 0.1276      0.0993      1.285             <0.000001              0.029
00521   Streptomycin biosynthesis                                    0.5414      0.4651      1.164             <0.000001              0.0284
05120   Epithelial cell signaling in Helicobacter pylori infection   0.0969      0.1338      0.724             <0.000001              0.0283
05150   Staphylococcus aureus infection                              0.0676      0.1232      0.549             0.000004               >0.05

doi:10.1371/journal.pone.0029044.t003
Software Architecture & Improvements

The software integrates several open source tools and database systems to facilitate the analysis of large volumes of metagenomic annotation data via a web interface [19] (Figure 5). METAREP 1.3.1 adds a programmatic interface to access locally stored data, supports weighted annotations and distributed weighted searches, and adds functionality to import HUMAnN and JPMAP annotations (see Methods section). New analysis features include browsing, searching and comparing KEGG pathways based on KOs, enhanced clustering via multiple distance matrix options (Morisita-Horn, Jaccard, Bray-Curtis and Euclidean) for generating and visualizing hierarchical clustering, heatmap, and multi-dimensional scaling outputs, and integration of GO slims to summarize GO annotations.

The data interface currently supports three annotation formats (Figure 5). The most generic format is a tab-delimited file with rows representing annotation entities (read, transcript, contig, gene, etc.) and columns representing 17 predefined categorical and quantitative annotation attributes (Table 4). The new version allows users to specify KEGG orthologs and a weight for each annotation to integrate quantitative information.
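As a rough illustration of the generic import format, the sketch below writes one annotation as a 17-column tab-delimited line using the field names listed in Table 4. The helper name `format_row` is hypothetical, and leaving absent attributes empty is an assumption; the example files shipped with METAREP are the authoritative reference for how missing values are encoded.

```python
# Field names in the column order given in Table 4.
FIELDS = [
    "peptide_id", "library_id", "com_name", "com_name_src",
    "go_id", "go_src", "ec_id", "ec_src", "hmm_id",
    "blast_tree", "blast_evalue", "blast_pid", "blast_cov",
    "filter", "ko_id", "ko_src", "weight",
]


def format_row(annotation):
    """Return one tab-delimited line with all 17 columns.

    Attributes missing from the dictionary are written as empty strings
    (an assumption made for this sketch).
    """
    return "\t".join(str(annotation.get(name, "")) for name in FIELDS)


# Values loosely based on the HUMAnN example column of Table 4.
example = {
    "peptide_id": "ptr:453118",
    "library_id": "SRS011061",
    "ec_id": "3.4.22.34",
    "blast_tree": 9598,
    "ko_id": "K01369",
    "ko_src": "ptr:453118",
    "weight": 43.23,
}

line = format_row(example)
print(line.split("\t"))
```

Keeping the column order in a single list makes it harder to shift a value into the wrong column, which matters here because the KEGG ortholog and weight fields sit at the end of the record.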
The data backend consists of a MySQL relational database and a non-relational Solr/Lucene full-text search server. The relational database is used to store hierarchical data (NCBI taxonomy, GO, KEGG/MetaCyc pathways, enzyme classification), dataset and project metadata as well as user account information. The Solr/Lucene platform provides fast access to imported annotation data and is used to summarize annotation attribute frequencies. Its faceting functionality is used for unweighted annotations while the statistical component is used for weighted annotations (see Methods section). The R statistical package supports statistical tests and the generation of high resolution PDF plots. The web interface logic is implemented in PHP using the CakePHP framework to separate the data access layer from the data representation layer via controller logic (Model View Controller paradigm). Web 2.0 elements are implemented in JavaScript using the jQuery and jQuery UI libraries. Data communication between the PHP controller logic and the Solr/Lucene backend utilizes the light-weight JSON data-interchange format for optimal data transfer. The software includes Perl modules that allow users to automatically download up-to-date versions of the hierarchical data, import annotations, and programmatically access data stored in the Solr/Lucene index files. The latest open source code, licensed under the MIT license, is available at https://github.com/jcvi/METAREP.
Scalability

To measure the impact of weighting annotations, the query response time performance was benchmarked using datasets from the buccal mucosa habitat, a collection of 100 samples, each having 1 million entries. Two alternative weighted search approaches were considered, one to search the pooled dataset and one to search the individual datasets in parallel. Thus the overall search volume was kept consistent at 100 million entries. For each search approach, the weighted query response times were recorded for 10 queries that return between 1 and 100 million entries using 10 replicates each. As a baseline, unweighted search times were recorded as well and linear regression analyses were carried out (Figure 6, Table S5). The benchmark was carried out using the hardware as specified in the Methods section. Under the constraints of the hardware and test data, we observed the following. While unweighted searches resulted in response times of less than 52 milliseconds for increasing number of matching entries with slopes not significantly different from 0, the weighted search response time was proportional to the number of matching entries, with response times of up to 62 seconds, indicating a time complexity of O(n) (blue and red dotted regression lines, R-squared 0.999). However, weighted results show that given our hardware the distributed search resulted in a 7.9 fold reduction of query response time for any query when compared to the undistributed search (based on the proportion of the two slopes) with a maximum response of 8 seconds.
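The slope-based comparison described above can be sketched as follows. The timings are synthetic values invented for this illustration, not the measurements reported in Table S5, and `fit_line` is a hypothetical helper implementing ordinary least squares for a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic benchmark: millions of matching entries vs. response time (s).
matching_entries = [10, 20, 40, 80, 100]
pooled_seconds = [0.60 * x + 0.1 for x in matching_entries]       # made-up values
distributed_seconds = [0.08 * x + 0.1 for x in matching_entries]  # made-up values

pooled_slope, _ = fit_line(matching_entries, pooled_seconds)
distributed_slope, _ = fit_line(matching_entries, distributed_seconds)

# Fold reduction expressed as the proportion of the two slopes.
fold_reduction = pooled_slope / distributed_slope
print(round(fold_reduction, 2))
```

Comparing slopes rather than individual timings gives one number that holds across query sizes, provided both fits are close to linear.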
Discussion

As sequencing technologies progress, computational methods are constantly being developed or improved to cope with increased throughput and to accommodate changes in the nature of the data. Short read annotation is a special challenge that was accurately addressed by the HUMAnN methodology as part of the HMP [16]. Given the volume of the data, exploratory analysis and visualization are similarly challenging. In the present study, we show that the current version of our software has been adapted to handle weighted annotations and can be used to simultaneously search, compare, cluster, and functionally characterize hundreds of metagenomic samples comprising annotations derived from billions of WGS sequence reads. Other scenarios for integrating weighted annotation schemes include weighting annotations by the number of assembled reads per ORF predicted from assemblies, or quantifying molecules in metatranscriptomics or metaproteomics studies. Our benchmarks indicate that response time increases linearly with an increasing number of weighted entries (we observed an increase of 6.0 seconds per 10 million additional entries). However, for this release of the software we have added functionality to support weighted distributed searches which can significantly improve scalability on multi-core server systems. For our hardware configuration, we observed an increase of 0.8 seconds per 10 million additional entries with distributed searches.

To highlight key functionality of the software, we presented several scenarios designed to analyze the human microbiome. We showed how to analyze a selection of functional markers across taxonomic classifications and body habitats, how to cluster multiple datasets by functional and taxonomic attributes, and how to identify differentially abundant features using statistical tests. We point out that METAREP further possesses many additional features that are not discussed in depth here but provide increased capability to analyze and compare large and complex metagenomic datasets. For example, the Browse Pathways page allows users to visualize enzyme or KEGG ortholog abundances on top of KEGG pathway maps and restrict the results to certain taxa or functions. Statistical tests can be applied to a subset of the data, such as enzymatic markers, and can also be used to compare other annotation attributes including MetaCyc, enzyme or the newly implemented GO slim classifications [13]. All analysis features and data including 700 assembly datasets not analyzed in this article are available as a community resource at http://www.jcvi.org/hmp-metarep.

The scenarios presented in this study, while not exhaustive in their scope, have nonetheless highlighted important insights into the human microbiome and generated additional hypotheses for further investigation.
Among the key insights we have identified is that first, examination of enzyme profiles by taxonomy provides a mechanism to identify differential abundance of an enzymatic function coupled with the microorganisms contributing the function in question. In this study (Scenario 1), we used this strategy to identify microorganisms that are not abundant across the human microbiome in total, such as the Crenarchaeota and Euryarchaeota, that nonetheless revealed an association with skin. These results further suggest that the low abundant taxonomic classifications serve as an important reservoir of genetic diversity in the human microbiome. Next, the variation of taxonomic and functional profiles within body sites is generally less than variation between body sites. Relatively speaking, greater variation occurs across body habitat, individual and time although it may be possible to use taxonomic and functional profiles to group individuals. Further, taxonomic and functional profiles are not always coupled between body habitats. This finding is illustrated particularly in the comparisons of dendrograms based on taxonomic and functional profiles of the PFOR enzyme (Scenario 1) and the examination of metabolic pathways across HMP donor samples (Scenario 2). Finally, examination of the differential abundance of metabolic pathways across three contrasting oral body habitats (Scenario 3) suggests that there are pathways, including many that participate in central intermediary metabolism, which reveal no statistically significant difference between them. This finding implies that there may be common pathways central to the metabolic potential of the oral microbiome. However, there are differences between oral body habitats including antibiotic biosynthesis, pathogenesis and protein glycosylation as identified in this study which may be biological drivers important in the oral microbiome colonization and maintenance, and that contribute to alterations in taxonomic and functional profiles.

Collectively, the results of this study indicate the challenge of studying metagenomic data from the human microbiome as it can be influenced by technical artifacts related to sampling, sequencing, and annotation biases. However, the application of sophisticated tools for data filtering, analysis and visualization as presented in the METAREP software fundamentally enhances our ability to explore, characterize and interpret these complex datasets.

Figure 5. Software architecture overview. The METAREP software integrates several open source tools to import, store and analyze metagenomics annotations. Users can analyze stored data using a variety of web based tools. A subset of the web functionality is available via a programmatic access module which allows data retrieval directly from the MySQL database and Lucene index files. doi:10.1371/journal.pone.0029044.g005

Table 4. Column descriptions of the METAREP tab delimited import format.

Column  Field Name    Description                                JPMAP                                                     HUMAnN
1       peptide_id    unique entry ID                            JCVI_PEP_1234123                                          ptr:453118
2       library_id    dataset ID                                 SRS011061                                                 SRS011061
3       com_name      functional description                     sugar ABC transporter, periplasmic sugar-binding protein  LGMN; legumain; K01369 legumain [EC:3.4.22.34]
4       com_name_src  functional description source              Uniref100_A23521                                          ptr:453118 description assignment
5       go_id         Gene Ontology ID                           GO:0009265                                                GO:0001509
6       go_src        Gene Ontology source                       PF02511                                                   K01369 assignment
7       ec_id         Enzyme Commission ID                       2.1.1.148                                                 3.4.22.34
8       ec_src        Enzyme Commission source                   PRIAM                                                     ptr:453118
9       hmm_id        HMM ID                                     PF02511                                                   NA
10      blast_tree    NCBI taxonomy ID                           246194                                                    9598
11      blast_evalue  BLAST E-Value                              1.78E-20                                                  median
12      blast_pid     BLAST percent identity                     0.93                                                      median
13      blast_cov     BLAST sequence coverage                    0.82                                                      N/A
14      filter        filter tag                                 repeat                                                    N/A
15      ko_id         KEGG Ortholog ID                           N/A                                                       K01369
16      ko_src        KEGG Ortholog Source                       N/A                                                       ptr:453118
17      weight        Weight to adjust abundance of assignments  1                                                         43.23

doi:10.1371/journal.pone.0029044.t004

Figure 6. Comparison of query response time for two weighted search approaches. Each data point marks the query response time (y axis) for a query that returned x number of entries (x axis). The blue line indicates the linear fit for the weighted search approach while the red line indicates the linear fit for the distributed weighted search approach. Parameter estimations for the linear regression models are given in the boxes above the fitted lines. doi:10.1371/journal.pone.0029044.g006
Methods

Ethics Statement

As a part of a multi-institutional collaboration, the Human Microbiome Project human subjects study was reviewed by the Institutional Review Boards at Baylor College of Medicine under IRB Protocol H-22895, the Washington University School of Medicine under protocol number HMP-07-001 (IRB ID# 201105198) and at the J. Craig Venter Institute under IRB Protocol Number 2008-084. All study participants gave their written informed consent before sampling and the study was conducted using the Human Microbiome Project Core Sampling Protocol A. Each IRB has a federal wide assurance and follows the regulations established at 45 CFR Part 46. The study was conducted in accordance with the ethical principles expressed in the Declaration of Helsinki and the requirements of applicable federal regulations.
Sequence Generation, Preprocessing, and Annotation

DNA was extracted from 108 samples followed by Illumina and 454 sequencing [13]. Low quality regions at the beginning and end of each read were trimmed followed by the removal of sequencing artifacts and human contaminated sequences [13]. Next, preprocessed sequences were assembled by the SOAPdenovo assembler into three distinct types of assemblies, Pretty Good Assemblies (HMP Build 1.0 HMASM), Hybrid Assemblies (HMP Build 1.0 HMHASM), and body habitat specific assemblies. ORFs were identified by MetaGeneMark and annotated using JPMAP [18]. For each predicted peptide, the pipeline ranks and chooses the best evidence obtained by several homology searches including a BLASTP search against UniRef100 [40] and a HMMER3 search against a collection of TIGRFAM and PFAM Hidden Markov models (HMM) [41]. The open source code is available at https://github.com/jcvi/JCVI_HMP_metagenomic_pipeline. The pipeline can be run within Ergatis [42], a workflow tool that supports compute grid executions. Unassembled reads were annotated by the HUMAnN pipeline [16] which characterizes short reads using an accelerated version of the BLASTX algorithm against a collection of functionally annotated protein databases including KEGG [21] and MetaCyc [22], among others. The software is available at http://huttenhower.sph.harvard.edu/humann.
METAREP 1.3.1 Installation

To use the software, the METAREP source code and dependent software have to be installed on a Linux based operating system. We recommend that users start with a minimal CentOS 5.5 installation and use the CentOS YUM package installer to install the 3rd party tools. A complete list of YUM packages and detailed information on the installation process can be found on the METAREP wiki page at https://github.com/jcvi/METAREP/wiki/installation-guide-v-1.3.1. Users can download the METAREP 1.3.1 source via GitHub at https://github.com/jcvi/METAREP/zipball/1.3.1-beta and configure the METAREP instance by editing the application and database configuration files. After successful configuration of the software, users can import annotation data. Import and update scripts can be found under the scripts/perl directory. Example annotations can be found in the data directory.
Importation of HUMAnN Annotations

We downloaded MBLASTX results against KEGG for 498 datasets from the HMP Data Analysis and Coordination Center (DACC, http://www.hmpdacc.org) and ran HUMAnN v0.8 using its METAREP output format option. The files contain a KEGG gene ID, its median BLAST E-value over all reads, median BLAST percent identity, median read length, and a weight indicating the gene's relative abundance in the sample. All medians are calculated per gene over all BLAST hits matching it, and weights represent normalized read counts adjusted for individual alignment quality and gene length (comparable to Reads per Kilobase per Million (RPKM) for RNA-seq [43]). We observed a Spearman correlation of 0.94 between the number of reads and sum of the weights for pooled body habitat datasets. For details of the weighting and normalization process mapping reads to genes and orthologous families, see [16]. Example output files can be found in the METAREP installation under the data/humann directory. Next, 498 HUMAnN output files were imported into METAREP using the import script metarep_loader.pl. As part of the HUMAnN indexing process, additional KEGG annotation attributes including species name, functional description, KO, EC and GO assignments are fetched from a SQLite database. The database can be created based on downloaded KEGG FTP data (license is required) using the metarep_update_database.pl script.
Importation of JPMAP Annotations

We downloaded JPMAP annotations for 15 hybrid and 690 pretty good assemblies from the DACC. Next, we loaded the annotations into the HMP METAREP instance using the import script metarep_loader.pl. Example JPMAP output files can be found under the data/jpmap directory.

Importation of Generic Annotations

To import annotations from other pipelines, data needs to be formatted according to the METAREP tab delimited format specified in Table 4. Examples of tab delimited annotation files can be found under the data/tab directory. Files can be imported using the annotation import script metarep_loader.pl.
Dynamic Weighting of Annotations

If annotation weights are supplied, absolute frequencies are calculated as the sum of weights of annotation entries that contain a certain annotation attribute. This is accomplished by applying the Solr/Lucene StatsComponent using the weight field as the stats field parameter. Relative frequencies are calculated as the sum of weights of annotation entries that contain a certain feature divided by the sum of all annotation weights.
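The calculation just described can be sketched in a few lines. This is an illustration only, not METAREP code (METAREP obtains the sums from the Solr/Lucene StatsComponent); the function name and the in-memory list of (KEGG ortholog, weight) pairs are assumptions, and the numbers are those of the worked example given in this section.

```python
def relative_frequency(entries, feature, weighted=True):
    """Share of `feature` among all entries.

    `entries` is a list of (ko_id, weight) pairs. With weighted=False every
    entry counts as 1, which reproduces the unweighted frequency.
    """
    def contribution(weight):
        return weight if weighted else 1

    total = sum(contribution(w) for _, w in entries)
    matched = sum(contribution(w) for ko, w in entries if ko == feature)
    return matched / total


# 70 K00849 entries with weight 8, 10 with weight 4, and 20 K00856 entries
# with weight 20 (100 entries in total).
entries = [("K00849", 8)] * 70 + [("K00849", 4)] * 10 + [("K00856", 20)] * 20

print(relative_frequency(entries, "K00849", weighted=False))  # 0.8
print(relative_frequency(entries, "K00849"))                  # 0.6
```

The same entries yield 80% when simply counted and 60% when summed by weight, because the 20 high-confidence K00856 entries contribute 400 of the 1000 total weight.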
For example, let us assume there are 100 entries in total with weights encoding annotation quality. Of these, 80 entries have the KEGG ortholog field (column 15) set to 'K00849' (galactokinase). 70 entries out of the 80 have the weight field (column 17) set to '8' while the remaining 10 entries have it set to '4'. In addition, there are 20 entries for 'K00856' (adenosine kinase), another KEGG ortholog, with the weight field set to 20 (high annotation confidence). The relative frequency for feature 'K00849' would be 80% if the weights were all equal. Using the new weighting feature, the relative frequency is dynamically adjusted to 60%:

$$p(\mathrm{K00849}) = \frac{\sum \mathrm{weight}_{\mathrm{K00849}}}{\sum \mathrm{weight}_{\mathrm{total}}} = \frac{8 \times 70 + 10 \times 4}{8 \times 70 + 4 \times 10 + 20 \times 20} = \frac{600}{1000} = 0.60$$

Scenario Filter Queries

METAREP allows users to filter annotations using the Lucene query language. A query element is specified by the field name followed by the value, separated by a colon. For example, the query 'ec_id:1.2.7.1' retrieves pyruvate synthase entries. Supported search fields are given in column 2 of Table 4. In scenario 1, to filter pooled body habitats for the pyruvate dehydrogenase complex, we searched for 'ec_id:1.2.4.1 OR ec_id:2.3.1.12 OR ec_id:1.8.1.4'. Alternatively, the KO attribute can be used to filter for the enzyme as well: 'ko_id:K00161 OR ko_id:K00162 OR ko_id:K00163 OR ko_id:K00627 OR ko_id:K00382'. Filter queries for pyruvate-ferredoxin oxidoreductase and pyruvate-formate lyase were 'ec_id:1.2.7.1' and 'ec_id:2.3.1.54' respectively. For scenario 3, we filtered the pooled oral habitats for the NCBI taxon Chordata using 'NOT blast_tree:7711'.
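Filter strings of this form are easy to assemble programmatically. The helpers below are hypothetical and not part of METAREP; they only concatenate field:value terms in the Lucene syntax shown above, using the field names from Table 4 and the identifiers from the scenarios.

```python
def any_of(field, values):
    """Join field:value terms with OR, e.g. 'ec_id:1.2.4.1 OR ec_id:2.3.1.12'."""
    return " OR ".join(f"{field}:{value}" for value in values)


def exclude(field, value):
    """Negated term, e.g. 'NOT blast_tree:7711'."""
    return f"NOT {field}:{value}"


# Pyruvate dehydrogenase complex by EC number and by KEGG ortholog (Scenario 1).
pdhc_by_ec = any_of("ec_id", ["1.2.4.1", "2.3.1.12", "1.8.1.4"])
pdhc_by_ko = any_of("ko_id", ["K00161", "K00162", "K00163", "K00627", "K00382"])

# Remove entries assigned to the NCBI taxon Chordata (Scenario 3).
no_chordata = exclude("blast_tree", 7711)

print(pdhc_by_ec)
print(pdhc_by_ko)
print(no_chordata)
```

Building the string from a list keeps the enzyme definition in one place, so the EC-based and KO-based variants of a marker can be maintained side by side.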
Hardware

The HMP METAREP instance runs on a single server with two multi-threaded Xeon X7560 2.26 GHz processors with a total of 16 cores (32 threads), 256 GB RAM, and 4 terabytes of disk space.
Supporting Information

Figure S1 Impact of distance matrix selection on enzymatic marker based body habitat clustering. Marker abundance for PDHC (a-c), PFOR (d-f), and PFL (g-i) is contrasted across phyla (columns) and body habitats (rows) using Morisita-Horn, Bray-Curtis and Euclidean distance matrices in combination with the average linkage clustering method. (PDF)

Figure S2 Hierarchical cluster plot of 84 first and second visit sample pairs clustered by NCBI taxonomy. Hierarchical clustering analysis of human microbiome samples with first and second visits (n=168) taken from 15 human body habitats clustered by NCBI taxonomy at the Family level. Clusters were generated by the average linkage clustering method using the Morisita-Horn index to generate a distance matrix (shown on the x-axis). Dataset labels encode the following information: [donor ID]-[habitat]-[gender]-[time point]-[sample ID]-[annotation-type]. (PDF)

Figure S3 Hierarchical cluster plot of 84 first and second visit sample pairs clustered by KEGG pathways. Hierarchical clustering analysis of human microbiome samples with first and second visits (n=168) taken from 15 human body regions clustered by KEGG pathway. Clusters were generated by the average linkage clustering method using the Morisita-Horn index to generate a distance matrix (shown on the x-axis). Dataset labels encode the following information: [donor ID]-[habitat]-[gender]-[time point]-[sample ID]-[annotation-type]. (PDF)

Table S1 Enzymatic marker counts across phyla and body habitats. (XLS)

Table S2 Body habitat and gender statistic for 168 samples with 1st and 2nd visits. (XLS)

Table S3 Differentially abundant phyla (buccal mucosa vs. tongue dorsum). (XLS)

Table S4 Differentially abundant pathways (buccal mucosa vs. tongue dorsum). (XLS)

Table S5 Query response benchmark statistics. (XLS)
(XLS)AcknowledgmentsTheauthorsgratefullythankthemembersoftheHumanMicrobiomeProjectConsortium'sMetabolicReconstructionworkinggroup.
TheauthorsthankJoshuaOrvisformigratingJPMAPtobeusedwithErgatisandprovidingitasanopensourceresourceviaGitHub.
TheauthorsacknowledgeDr.
DouglasRuschforhisuserandtechnicalfeedback.
WearealsogratefultotheentireHumanMicrobiomeProjectConsortiumandassociatedinvestigatorsfrommanyadditionalinstitutions,andtheNIHOfficeoftheDirectorofRoadmapInitiative,formakingtheHMPpossible.
AuthorContributionsAnalyzedthedata:JGBAMMT.
Contributedreagents/materials/analysistools:JGCHSA.
Wrotethepaper:BAMJGCHMTSY.
References

1. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66–74.
2. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5: e16.
3. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 5: e77.
4. Cardenas E, Wu WM, Leigh MB, Carley J, Carroll S, et al. (2010) Significant association between sulfate-reducing bacteria and uranium-reducing microbial communities as revealed by a combined massively parallel sequencing-indicator species approach. Appl Environ Microbiol 76: 6778–6786.
5. Bertin PN, Heinrich-Salmeron A, Pelletier E, Goulhen-Chollet F, Arsène-Ploetze F, et al. (2011) Metabolic diversity among main microorganisms inside an arsenic-rich ecosystem revealed by meta- and proteo-genomics. ISME J.
6. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, et al. (2011) Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331: 463–467.
7. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, et al. (2011) Enterotypes of the human gut microbiome. Nature 473: 174–80.
8. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464: 59–65.
9. Gilbert JA, Meyer F, Antonopoulos D, Balaji P, Brown CT, et al. (2010) Meeting report: the terabase metagenomics workshop and the vision of an earth microbiome project. Stand Genomic Sci 3: 243–8.
10. NIH HMP Working Group, Peterson J, Garges S, Giovanni M, McInnes P, et al. (2009) The NIH Human Microbiome Project. Genome Res 19: 2317–2323.
11. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. High throughput methods for 16S sequencing in human metagenomics. In review.
12. Human Microbiome Jumpstart Reference Strains Consortium, Nelson KE, Weinstock GM, Highlander SK, Worley KC, et al. (2010) A catalog of reference genomes from the human microbiome. Science 328: 994–999.
13. The Human Microbiome Consortium (2012) A Framework for Human Microbiome Research. Nature: doi:10.1038/nature11209.
14. The Human Microbiome Consortium (2012) Structure, Function and Diversity of Human Microbiome in an Adult Reference Population. Nature: doi:10.1038/nature11234.
15. Aagaard, Petrosino K, Keitel J, Watson W, Katancik M, et al. A comprehensive strategy for sampling the human microbiome. In review.
16. Abubucker S, Segata N, Goll J, Schubert AM, Izard J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. In review.
17. Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F (2010) Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc 2010: pdb.prot5368.
18. Tanenbaum DM, Goll J, Murphy S, Kumar P, Zafar N, et al. (2010) The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data. Stand Genomic Sci 2: 229–237.
19. Goll J, Rusch DB, Tanenbaum DM, Thiagarajan M, Li K, et al. (2010) METAREP: JCVI metagenomics reports – an open source tool for high-performance comparative metagenomics. Bioinformatics 26: 2631–2632.
20. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–9.
21. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M (2010) KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38: D355–D360.
22. Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, et al. (2011) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res.
23. Fell DA, Wagner A (2000) The small world of metabolism. Nat Biotechnol 18: 1121–2.
24. Datta A (1991) Characterization of the inhibition of Escherichia coli pyruvate dehydrogenase complex by pyruvate. Biochem Biophys Res Commun 176: 517–521.
25. Buckel W, Golding BT (2006) Radical enzymes in anaerobes. Annu Rev Microbiol 60: 27–49.
26. Arnau J, Jørgensen F, Madsen SM, Vrang A, Israelsen H (1997) Cloning, expression, and characterization of the Lactococcus lactis pfl gene, encoding pyruvate formate-lyase. J Bacteriol 179: 5884–5891.
27. Hassan BH, Cronan JE (2011) Protein-protein interactions in assembly of lipoic acid on the 2-oxoacid dehydrogenases of aerobic metabolism. J Biol Chem 286: 8263–8276.
28. Feng X, Tang KH, Blankenship RE, Tang YJ (2010) Metabolic flux analysis of the mixotrophic metabolisms in the green sulfur bacterium Chlorobaculum tepidum. J Biol Chem 285: 39544–39550.
29. Leibig M, Liebeke M, Mader D, Lalk M, Peschel A, et al. (2011) Pyruvate formate lyase acts as a formate supplier for metabolic processes during anaerobiosis in Staphylococcus aureus. J Bacteriol 193: 952–962.
30. Caporaso JG, Lauber CL, Costello EK, Berg-Lyons D, Gonzalez A, et al. (2011) Moving pictures of the human microbiome. Genome Biol 12: R50.
31. Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, et al. (2009) Bacterial community variation in human body habitats across space and time. Science 326: 1694–1697.
32. Aas JA, Paster BJ, Stokes LN, Olsen I, Dewhirst FE (2005) Defining the normal bacterial flora of the oral cavity. J Clin Microbiol 43: 5721–5732.
33. Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner ACR, et al. (2010) The human oral microbiome. J Bacteriol 192: 5002–5017.
34. White JR, Nagarajan N, Pop M (2009) Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5: e1000352.
35. Mager DL, Ximenez-Fyvie LA, Haffajee AD, Socransky SS (2003) Distribution of selected bacterial species on intraoral surfaces. J Clin Periodontol 30: 644–54.
36. Martínez JL, Rojo F (2011) Metabolic regulation of antibiotic resistance. FEMS Microbiol Rev doi:10.1111/j.1574-6976.2011.00282.x.
37. Helenius A, Aebi M (2004) Roles of N-linked glycans in the endoplasmic reticulum. Annu Rev Biochem 73: 1019–49.
38. Szymanski CM, Wren BW (2005) Protein glycosylation in bacterial mucosal pathogens. Nat Rev Microbiol 3: 225–37.
39. Szymanski CM, Burr DH, Guerry P (2002) Campylobacter protein glycosylation affects host cell interactions. Infect Immun 70: 2242–4.
40. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23: 1282–1288.
41. Eddy SR (2009) A new generation of homology search tools based on probabilistic inference. Genome Inform 23: 205–211.
42. Orvis J, Crabtree J, Galens K, Gussman A, Inman JM, et al. (2010) Ergatis: a web interface and scalable software system for bioinformatics workflows. Bioinformatics 26: 1488–1492.
43. Pepke S, Wold B, Mortazavi A (2009) Computation for ChIP-seq and RNA-seq studies. Nat Methods 6: S22–S32.