HAL Id: hal-01395715
https://hal.inria.fr/hal-01395715
Submitted on 11 Nov 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Managing Hot Metadata for Scientific Workflows on Multisite Clouds

Luis Pineda-Morales, Ji Liu, Alexandru Costan, Esther Pacitti, Gabriel Antoniu§, Patrick Valduriez§, and Marta Mattoso

To cite this version: Luis Pineda-Morales, Ji Liu, Alexandru Costan, Esther Pacitti, Gabriel Antoniu, et al. Managing Hot Metadata for Scientific Workflows on Multisite Clouds. BigData, Dec 2016, Washington, DC, United States. pp. 390-397, 10.1109/BigData.2016.7840628. hal-01395715

Microsoft Research-Inria Joint Centre, France; §Inria, France; IRISA/INSA Rennes, France; LIRMM, France; COPPE/UFRJ, Brazil

{luis.pineda-morales, ji.liu}@inria.fr, {gabriel.antoniu, patrick.valduriez}@inria.fr, alexandru.costan@irisa.fr, esther.pacitti@lirmm.fr, marta@cos.ufrj.br

Abstract: Large-scale scientific applications are often expressed as workflows that help define data dependencies between their different components. Several such workflows have huge storage and computation requirements, and so they need to be processed in multiple (cloud-federated) datacenters. It has been shown that efficient metadata handling plays a key role in the performance of computing systems. However, most of this evidence concerns only single-site HPC systems to date. In this paper, we present a hybrid decentralized/distributed model for handling hot metadata (frequently accessed metadata) in multisite architectures. We couple our model with a scientific workflow management system (SWfMS) to validate and tune its applicability to different real-life scientific scenarios. We show that efficient management of hot metadata improves the performance of SWfMS, reducing the workflow execution time by up to 50% for highly parallel jobs and avoiding unnecessary cold metadata operations.

Index Terms: hot metadata, metadata management, multisite clouds, scientific workflows, geo-distributed applications.

I. INTRODUCTION

Many large-scale scientific applications now process amounts of data reaching the order of petabytes; as the size of the data increases, so do the requirements for computing resources. Clouds stand out as convenient infrastructures for handling such applications, for they offer the possibility to lease resources at a large scale and relatively low cost. Very often, the requirements of data-intensive scientific applications exceed the capabilities of a single cloud datacenter (site), either because the site imposes usage limits for fairness and security [1], or simply because the dataset is too large. Also, the application data are often physically stored in different geographic locations, because they are sourced from different experiments, sensing devices or laboratories (e.g. the well-known ALICE LHC Collaboration spans over 37 countries [2]). Hence, multiple datacenters are needed in order to guarantee both that enough resources are available and that data are processed as close to their source as possible. All popular public clouds today comprise a range of geo-distributed datacenters, e.g. Microsoft Azure [3], Amazon EC2 [4], and Google Cloud [5].

A large number of data-intensive distributed applications are expressed as Scientific Workflows (SWf). A SWf is an assembly of scientific data processing activities with data dependencies between them [6]. The application is modeled as a graph, in which vertices represent processing jobs and edges their dependencies. Such a structure provides a clear view of the application flow and facilitates the execution of the application in a geo-distributed environment. Currently, many Scientific Workflow Management Systems (SWfMS) are publicly available, e.g. Pegasus [7] and Taverna [8]; some of them already support multisite execution [9].

Metadata have a critical impact on the efficiency of SWfMS; they provide a global view of data location and enable task tracking during the execution. Some SWf metadata even need to be persisted to allow traceability and reproducibility of the workflow's jobs; these are part of the so-called provenance data. Most notably, we assert that some metadata are more frequently accessed than others (e.g. the status of tasks in execution in a multisite SWf is queried more often than a job's creation date). We denote such metadata by hot metadata and argue that it should be handled in a specific, more quickly accessible way than the rest of the metadata. While it has been proven that efficient metadata handling plays a key role in performance [10], [11], little research has targeted this issue in multisite clouds.

On multisite infrastructures, inter-site network latency is much higher than intra-site latency. This aspect must stay at the core of the design of a multisite metadata management system. As we explain in Section III, several design principles have to be taken into account. Moreover, in most data processing systems (even distributed ones), metadata are typically stored, managed and queried at some centralized server (or set of servers) located at a specific site [7], [12], [13]. However, in a multisite setting, with high-latency inter-site networks and large amounts of concurrent metadata operations, centralized approaches are not an optimal solution.

This paper presents the following contributions:
• Based on the notion of hot metadata, we introduce an architecture for optimizing the access and ensuring the availability of hot metadata in a multisite cloud environment (Section IV).
• We develop a prototype by coupling our proposed scheme with a state-of-the-art multisite workflow execution engine, namely Chiron [14] (Section V).
• We demonstrate that efficient management of hot metadata improves the performance of SWfMS, reducing the execution time of a workflow by 1) enabling timely data provisioning and 2) avoiding unnecessary cold metadata handling (Section VI).

II. THE CORE OF OUR APPROACH: HOT METADATA

Metadata management significantly impacts the performance of computing systems dealing with thousands or millions of individual files. This is recurrently the case of SWfs.

A. Why Centralized Metadata Management is an Issue

Workflow management systems handle more than file-specific metadata; running the workflow itself generates a significant amount of execution-specific metadata, e.g. scheduling metadata (i.e. which task is executed where) and data-to-task mappings.

Most of today's SWfMS handle metadata in a centralized way. File-specific metadata is stored in a centralized server, either own-managed or through an underlying file system, while execution-specific metadata is normally kept in the execution's master entity.

Controlling and combining all these sorts of metadata translates into a critical workload as scientific datasets get larger. The CyberShake workflow, for instance, runs more than 800,000 tasks, handling an equal number of individual data pieces, processing and aggregating over 80,000 input files (which translates into 200 TB of data read), and requiring all of these files to be tracked and annotated with metadata [15], [16]. With many tasks' runtime in the order of milliseconds, the load of parallel metadata operations becomes very heavy, and handling it in a centralized fashion represents a serious performance bottleneck.

B. Multisite Clouds: How to Scale

Often enough, scientific data are so huge and widespread that they cannot be processed/stored in a single cloud datacenter. On the one hand, the data size or the computing requirements might exceed the capacity of the site or the limits imposed by a cloud provider. On the other hand, data might be widely distributed, and due to their size it is more efficient to process them closer to where they reside than to bring them together; for instance, the US Earthquake Hazard Program monitors more than 7,000 sensor systems across the country reporting to the minute [17]. In either case, multisite clouds are progressively being used for executing large-scale scientific workflows.

Managing metadata in a centralized way for such scenarios is not appropriate. On top of the congestion generated by concurrent metadata operations, remote inter-site operations cause severe delays in the execution. To address this issue, some approaches propose the use of decentralized metadata servers [10]. In our previous work [18], we also implemented a decentralized management architecture that proved to handle metadata up to twice as fast as a centralized solution.

In this paper we go one step further. Our focus is on the metadata access frequency, particularly on identifying fractions of metadata that do not require multiple updates. The goal is to enable a more efficient decentralized metadata management, reducing the number of inter-site metadata operations by favoring the operations on frequently accessed metadata, which we call hot metadata.

C. What is "Hot" Metadata

The term hot data refers to data that need to be frequently accessed [19]. Hot data are usually critical for the application and must be placed in a fast and easy-to-query storage [20]. We apply this concept to the context of SWf management and we define hot metadata as the metadata that is frequently accessed during the execution of a workflow. Conversely, less frequently accessed metadata will be denoted cold metadata. We distinguish two types of hot metadata: task metadata and file metadata.

Task metadata is the metadata for the execution of tasks, which is composed of the command, parameters, start time, end time, status and execution site. Hot job metadata enables the SWfMS to search and generate executable tasks. During the execution, the status and the execution site of the tasks are queried many times by each site to search for new tasks to execute and to determine if a job is finished. In addition, the status of a task may be updated several times. As a result, it is important to get this metadata quickly at each site.

File metadata that we consider as "hot" for a workflow execution are those relative to the size, location and possible replicas of a given piece of data. Knowledge of file hot metadata allows the SWfMS to place the data close to the corresponding task, or vice versa. This is especially relevant in multisite settings: timely availability of the file metadata makes it possible to move data before they are needed, hence reducing the impact of low-speed inter-site networks. In general, other metadata such as file ownership or permissions are not critical for the execution and are thus regarded as cold metadata.

D. What are the Challenges for Hot Metadata Management

Effectively applying the concept of hot metadata to real systems raises a number of challenges. At this stage of our research, we apply simple yet efficient solutions to these challenges.

How to decide which metadata are hot. We have empirically chosen the aforementioned task and file metadata as hot, since they have statistically proven to be more frequently accessed by the SWfMS we use. A sample execution of a 1-degree Montage workflow (Fig. 1) as described in Section VI-B, running 820 jobs and 57K metadata operations, reveals that in a centralized execution, 32.6% of them are file metadata operations (storeFile, getFile) and 32.4% are task metadata operations (loadTask, storeTask); whereas in a distributed run, up to 67% are file operations, and task operations represent 11%. The rest correspond mostly to monitoring and node/site related operations.

Fig. 1: Relative frequency of metadata operations in Montage.

However, a particular SWf might actually use other metadata more often. Since workflows are typically defined in structured formats (e.g. XML files), another way to account for user-defined hot metadata would be to add a property to each job definition where the user could specify which metadata they consider as hot. The next item in our research agenda is to implement an environment that will allow for both user-defined and dynamically identified hot metadata (by running training executions).
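
To make the idea of dynamically identified hot metadata concrete, the following minimal sketch counts metadata operations from a training run and marks a category as hot when its share of operations reaches a threshold. It is an illustration under stated assumptions, not the authors' implementation: the operation names storeFile, getFile, loadTask and storeTask come from the text above, while the sample log, the monitorNode operation, the category mapping and the 25% threshold are hypothetical.

```python
from collections import Counter

# Hypothetical operation log from a training execution. Only the four
# file/task operation names appear in the paper; "monitorNode" and all
# counts are invented for illustration.
SAMPLE_LOG = (
    ["storeFile"] * 20 + ["getFile"] * 13
    + ["loadTask"] * 17 + ["storeTask"] * 15
    + ["monitorNode"] * 35
)

# Assumed mapping from operation name to metadata category.
CATEGORY = {
    "storeFile": "file", "getFile": "file",
    "loadTask": "task", "storeTask": "task",
}


def relative_frequency(log):
    """Return {category: share of all operations} for a list of operation names."""
    counts = Counter(CATEGORY.get(op, "other") for op in log)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def hot_categories(log, threshold=0.25):
    """Return the named categories whose share reaches the assumed threshold.

    The catch-all "other" bucket is skipped because it mixes unrelated
    operations and cannot be placed as a unit.
    """
    freq = relative_frequency(log)
    return sorted(cat for cat, share in freq.items()
                  if cat != "other" and share >= threshold)
```

With the sample log, file and task operations account for 33% and 32% of the operations, so both categories would be marked hot. A real system would also need to choose the threshold per workflow, which the paper leaves as future work.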

How to assess that such a choice of hot metadata is right. Evaluating the efficacy of choosing hot metadata is not trivial. Metadata is much smaller than the application's data, and handling it over networks with fluctuating throughput may produce inconsistent results in terms of execution time. Nevertheless, an indicator of the improvement brought by an adequate choice of hot metadata, and which is not time-bounded, is the number of metadata operations performed. In our experimental evaluation (Section VI) we present results in terms of both execution time and number of tasks performing such operations.

The next section describes how the concept of hot metadata translates into architectural design choices for efficient multisite workflow processing.

III. DESIGN PRINCIPLES

Three key choices set up the foundation of our architecture:

Two-Layer Multisite Workflow Management. We propose to use a two-layer multisite system: (1) The lower intra-site layer operates as current single-site SWfMS: a site is composed of several computing nodes and a common file system, and one of these nodes acts as master and coordinates communication and task execution. (2) An additional higher inter-site layer coordinates the interactions at site level through a master/slave architecture (one site being the master site). The master node in each site is in charge of synchronization and data transfers. In Section IV we provide a detailed description of such a system architecture.

Adaptive Placement for Hot Metadata. Job dependencies in a workflow form common structures (e.g. pipeline, data distribution and data aggregation) [21]. SWfMS usually take into account these dependencies to schedule the job execution in a convenient way to minimize data movements (e.g. job co-location). Accordingly, different workflows will yield different scheduling patterns. In order to take advantage of these scheduling optimizations, we must also adapt the workflow's metadata storage scheme. However, maintaining an updated version of all metadata across a multisite environment consumes a significant amount of communication time, incurring also monetary costs. To reduce this impact, we will evaluate different storage strategies for hot metadata during the workflow's execution, while keeping cold metadata stored locally and synchronizing such cold metadata only during the execution of the job. In the next section we recall our decentralized adaptive strategies.

Eventual Consistency for High-latency Communication. While cloud datacenters are normally interconnected by high-speed infrastructure, the latency is ultimately bounded by the physical distance between sites and communication time might reach the order of seconds [22]. Under these circumstances it is unreasonable to aim for a system with a fully consistent state in all of its components at a given moment without strongly compromising the performance of the application. Workflow semantics allow us the flexibility to opt for an eventually consistent system: a workflow execution unit (task) processes one or several specific pieces of data; such a unit will begin its execution only when all the pieces it needs are available in the metadata storage; however, the rest of the units continue executing independently. Thus, with a reasonable delay due to the higher latency propagation, the system is guaranteed to be eventually consistent.
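
The readiness rule described above (a task starts only once metadata for all of its inputs is visible, while other tasks proceed) can be illustrated with a short sketch. The data layout below, a dictionary from task id to the set of input keys and a set of keys visible at the local site, is an assumption made for this example and is not taken from DMM-Chiron.

```python
def ready_tasks(tasks, visible_metadata):
    """Return the ids of tasks whose inputs are all visible in the local view.

    tasks: dict mapping task id -> set of input data keys (hypothetical layout).
    visible_metadata: set of data keys whose metadata has already propagated
    to this site's view of the metadata storage.
    """
    return sorted(tid for tid, inputs in tasks.items()
                  if inputs.issubset(visible_metadata))


# Three tasks; the metadata for "b" has not reached this site yet.
tasks = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"c"}}
view = {"a", "c"}

first = ready_tasks(tasks, view)   # t2 must wait; t1 and t3 proceed
view.add("b")                      # the delayed update finally arrives
later = ready_tasks(tasks, view)   # now every task is runnable
```

The point of the sketch is that a stale view only delays the affected task: no task ever runs on missing inputs, and unrelated tasks are never blocked, which is why eventual consistency is sufficient here.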

IV. ARCHITECTURE

In previous work we explored different strategies for workflow-driven multisite metadata management, with a focus on file metadata [18]. Our study indicated that a hybrid approach combining decentralized metadata and replication better suits the needs of large-scale multisite workflow execution. It also showed that the right strategy to apply depends on the workflow structure. In this section, we build on these observations along two fundamental lines. (1) We present an architecture for multisite cloud workflow processing which features decentralized metadata management. (2) We enrich this architecture with a component specifically dedicated to the management of hot metadata across multiple sites.

Two-level Multisite Architecture. In accordance with our design principles, the basis for our workflow engine is a 2-level multisite architecture, as shown in Figure 2.

1) At the inter-site level, all communication and synchronization is handled through a set of master nodes (M), one per site. One site acts as a global coordinator (master site) and is in charge of scheduling jobs/tasks to each site. Every master node holds a metadata store which is part of the global metadata storage and is directly accessible to all other master nodes.

2) At the intra-site level, our system preserves the typical master/slave scheme widely used today on single-site SWfMS: the master node schedules and coordinates a group of slave nodes which execute the workflow tasks. All nodes within a site are connected to a shared file system to access data resources. Metadata updates are propagated to other sites through the master node, which classifies hot and cold metadata as explained below.

Fig. 2: Multisite SWf execution architecture with decentralized metadata. Dotted lines represent inter-site interactions.

Separate Management of Hot and Cold Metadata. Following our characterization of hot metadata from Section II-C, we incorporate an intermediate component which filters out cold metadata operations. This model ensures that: a) hot metadata operations are managed with high priority over the network, and b) cold metadata updates are propagated only during periods of low network congestion.

The filter is located in the master node of each site (Figure 3). It separates hot and cold metadata, favoring the propagation of hot metadata and thus alleviating congestion during metadata-intensive periods. The storage location of the hot metadata is then selected based on some metadata management strategy, as developed below.
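
One possible way to realize the two guarantees of the filter (hot updates first, cold updates only when the network is not congested) is a priority queue, sketched below. This is a simplified illustration and not the DMM-Chiron filter: the class name, the field names in HOT_FIELDS and the boolean congestion flag are all assumptions, chosen to mirror the hot task and file metadata listed in Section II-C.

```python
import heapq
from itertools import count

# Assumed set of hot fields, mirroring the task and file metadata named in
# Section II-C; the exact field names are illustrative.
HOT_FIELDS = {"status", "execution_site", "size", "location", "replicas"}

HOT, COLD = 0, 1  # lower value = higher priority


class MetadataFilter:
    """Queue metadata updates so that hot ones are always propagated first."""

    def __init__(self):
        self._queue = []
        self._order = count()  # tie-breaker keeps FIFO order within a class

    def submit(self, field, value):
        """Classify an update as hot or cold and enqueue it."""
        priority = HOT if field in HOT_FIELDS else COLD
        heapq.heappush(self._queue, (priority, next(self._order), field, value))

    def next_update(self, congested):
        """Pop the next update to send; cold updates wait while congested."""
        if not self._queue:
            return None
        priority, _, field, value = self._queue[0]
        if priority == COLD and congested:
            return None
        heapq.heappop(self._queue)
        return field, value
```

For example, if a cold "owner" update is submitted before a hot "status" update, next_update still returns the status update first, and holds back the owner update until it is called with congested set to False. How congestion is actually detected is outside the scope of this sketch.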

Fig. 3: The hot metadata filtering component.

Decentralized Hot Metadata Management Strategies. We consider three different alternatives for decentralized metadata management (explored in previous work [18]). Here, we study their application to hot metadata. They all include a metadata server in each of the datacenters where execution nodes are deployed. They differ in the way hot metadata is stored and replicated. We briefly recall their specificities below.

Local without replication (LOC). Every new hot metadata entry is stored at the site where it has been created. For read operations, metadata is queried at each site and the site that stores the data will give the response. If no reply is received within a time threshold, the request is resent. This strategy will typically benefit pipeline-like workflow structures, where consecutive tasks are usually co-located at the same site.

Hashed without replication (DHT). Hot metadata is queried and updated following the principle of a distributed hash table (DHT). The site location of a metadata entry will be determined by a simple hash function applied to its key attribute: file-name in the case of file metadata, and task-id for task metadata. We assume that the impact of inter-site updates will be compensated by the linear complexity of read operations.

Hashed with local replication (REP). We combine the two previous strategies by keeping both a local record of the hot metadata and a hashed copy. Intuitively, this would reduce the number of inter-site reading requests. We expect this hybrid approach to highlight the trade-offs between metadata locality and DHT linear operations.
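
The difference between the three strategies comes down to which site or sites store a given entry. The sketch below expresses that choice as a single function. It is an assumption-laden illustration: the paper only says "a simple hash function", so the use of SHA-256 modulo the number of sites, the function names and the example key are choices made here, and the site names are borrowed from the experimental setup in Section VI-A.

```python
import hashlib

SITES = ["WEU", "NEU", "CUS"]  # the three datacenters of Section VI-A


def hashed_site(key, sites=SITES):
    """Map a key (file name or task id) to a site with a stable hash.

    SHA-256 is an arbitrary choice for this sketch; any hash that is
    identical on every site would serve the same purpose.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return sites[int.from_bytes(digest[:8], "big") % len(sites)]


def placement(strategy, key, local_site, sites=SITES):
    """Return the set of sites that store the hot metadata entry for key."""
    if strategy == "LOC":
        return {local_site}                           # stored where created
    if strategy == "DHT":
        return {hashed_site(key, sites)}              # stored at the hashed site
    if strategy == "REP":
        return {local_site, hashed_site(key, sites)}  # local copy + hashed copy
    raise ValueError(f"unknown strategy: {strategy}")
```

Under REP the result holds a single site whenever the hashed site happens to be the local one, so the replica costs nothing extra in that case. Python's built-in hash() is deliberately avoided because it is randomized per process for strings, which would make sites disagree on the location of an entry.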

V. IMPLEMENTATION: DMM-CHIRON

In order to validate our architecture, we have developed a prototype multisite SWfMS that implements hot metadata handling. It provides support for decentralized metadata management, with a distinction between hot and cold metadata. We denote our prototype by Decentralized-Metadata Multisite Chiron (DMM-Chiron).

A. Baseline: Multisite Chiron

This work builds on Multisite Chiron [9], a SWfMS specifically designed for multisite clouds. Its layered architecture is presented in Figure 4; it is composed of nine modules. Multisite Chiron exploits a textual UI to interact with users. The SWf is analyzed by the Job Manager to identify executable activities, i.e. unexecuted jobs for which the input data is ready. The same module generates the executable tasks. Scheduling is done in two phases: the Multisite Task Scheduler at the coordinator site schedules each task to a site, following the random OLB (Opportunistic Load Balancing) algorithm used in [9], while the Single Site Task Scheduler applies the default dynamic FAF (First Activity First) approach used by Chiron [14] to schedule tasks to computing nodes. It is worth clarifying that optimizations to the scheduling algorithms are out of the scope of this paper. Afterwards, it is the Task Executor at each computing node which runs the tasks. Along the execution, metadata is handled by the Metadata Manager at the master site. Since the metadata structure is well defined, we use a relational database, namely PostgreSQL, to store it. All data (input, intermediate and output) are stored in a Shared File System at each site. The file transfer between two different sites is performed by the Multisite File Transfer module. The Multisite Message Communication module of the master node at each site is in charge of synchronization through a master/slave architecture, while the Multisite File Transfer module exploits a peer-to-peer model for data transfers.

Fig. 4: Layered architecture of Multisite Chiron [9].

B. Combining Multisite and Hot Metadata Management

To implement and evaluate our approach to decentralized metadata management, we further extended Multisite Chiron by adding multisite metadata protocols. We mainly modified two modules, as described in the next sections: the Job Manager and the Metadata Manager.

From Single- to Multisite Job Manager. The Job Manager is the process that verifies if the execution of a job is finished, in order to launch the next jobs. Originally, this verification was done on the metadata stored at the coordinator site. In DMM-Chiron we implement an optimization for each of the hot metadata management strategies (Section IV): for LOC, the local DMM-Chiron instance verifies only the tasks scheduled at that site, and the coordinator site confirms that the execution of a job is finished when all the sites finish their corresponding tasks. For DHT and REP, the master DMM-Chiron instance of the coordinator site checks each task of the job.
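
The LOC variant of this check splits the work in two steps: each site inspects only its own tasks, and the coordinator combines the per-site answers. A minimal sketch of that division of labour follows; the function names, the "FINISHED" status string and the dictionary layouts are hypothetical and are not taken from the DMM-Chiron code base.

```python
def site_finished(local_tasks):
    """Local check: True when every task scheduled at this site is finished.

    local_tasks: dict mapping task id -> status string (assumed layout).
    A site with no tasks for the job trivially counts as finished.
    """
    return all(status == "FINISHED" for status in local_tasks.values())


def job_finished(reports):
    """Coordinator check: the job is done when all sites report completion.

    reports: dict mapping site name -> result of site_finished at that site.
    """
    return all(reports.values())


# Example: CUS still has one running task, so the job is not finished yet.
reports = {
    "WEU": site_finished({"t1": "FINISHED", "t2": "FINISHED"}),
    "NEU": site_finished({"t3": "FINISHED"}),
    "CUS": site_finished({"t4": "RUNNING"}),
}
done = job_finished(reports)
```

The coordinator then receives one boolean per site instead of one status per task, which is the saving in inter-site traffic that the LOC optimization aims for.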

Introducing Protocols for Multisite Hot Metadata. The following protocols illustrate our system's metadata operations. We recall that metadata operations are triggered by the slave nodes at each site, which are the actual executors of the workflow tasks.

Fig. 5: Metadata protocols: (a) Write, (b) Read.

Metadata Write. As shown in Figure 5a, a new metadata record is passed on from the slave to the master node at each site (1). Upon reception, the master filters the record as either hot or cold (2). The hot metadata is assigned by the master node to the metadata storage pool at the corresponding site(s) according to one metadata strategy, cf. Section IV (3a). Created cold metadata is kept locally and propagated asynchronously to the coordinator site during the execution of the job (3b).

Metadata Read. Each master node has access to the entire pool of metadata stores so that it can get hot metadata from any site. Figure 5b shows the process. When a read operation is requested by a slave (1), a master node sends a request to each metadata store (2) and it processes the response that comes first (3), provided such response is not an empty set. This mechanism ensures that the master node gets the required metadata in the shortest time. During the execution, DMM-Chiron gathers all the task metadata stored at each site to verify if the execution of a job is finished.
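
The read protocol (ask every store at once, keep the first non-empty answer) can be sketched with Python's standard thread pool. This is a toy model for illustration only: the MetadataStore class, its artificial latency and the use of None to represent an empty response are assumptions, and a real deployment would issue network requests rather than local calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class MetadataStore:
    """Toy in-memory store with an artificial latency (both are assumptions)."""

    def __init__(self, records, latency):
        self.records = records
        self.latency = latency

    def get(self, key):
        time.sleep(self.latency)      # stands in for network delay
        return self.records.get(key)  # None plays the role of the empty set


def read_hot_metadata(stores, key):
    """Query every store in parallel; return the first non-empty response."""
    if not stores:
        return None
    pool = ThreadPoolExecutor(max_workers=len(stores))
    try:
        futures = [pool.submit(store.get, key) for store in stores]
        for future in as_completed(futures):  # yields in completion order
            result = future.result()
            if result is not None:            # skip empty responses
                return result
        return None                           # no store holds the key
    finally:
        pool.shutdown(wait=False)             # do not wait for slower stores
```

For instance, with one fast store that does not hold the key and one slower store that does, the function skips the fast empty answer and returns the record from the slower store. Shutting the pool down without waiting is what lets the caller continue as soon as one useful answer arrives.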

VI. EXPERIMENTAL EVALUATION

Throughout the following experiments we compare our results to a multisite SWfMS with centralized metadata management, which we recall is the state-of-the-art configuration. We use Multisite Chiron as an example of such an architecture.

A. Experimental Setup

DMM-Chiron was deployed on the Microsoft Azure cloud [3] using a total of 27 nodes of A4 standard virtual machines (8 cores, 14 GB memory). The VMs were evenly distributed among three datacenters: West Europe (WEU, Netherlands), North Europe (NEU, Ireland) and Central US (CUS, Iowa). Control messages between master nodes are delivered through the Azure Bus [23].

B. Use Cases

Montage is a toolkit created by the NASA/IPAC Infrared Science Archive and used to generate custom mosaics of the sky from a set of images [24]. Additional input for the workflow includes the desired region of the sky, as well as the size of the mosaic in terms of square degrees. We model the Montage SWf using the proposal of Juve et al. [15].

BuzzFlow is a data-intensive SWf that searches for trends and measures correlations in scientific publications [25]. It analyses data collected from bibliography databases such as DBLP or PubMed. Buzz is composed of thirteen jobs.

C. Different Strategies for Different Workflow Structures

Our hypothesis is that no single decentralized strategy can fit all workflow structures: a highly parallel task would exhibit different metadata access patterns than a concurrent data gathering task. Thus, the improvements brought to one type of workflow by either of the strategies might turn out to be detrimental for another. To evaluate this hypothesis, we ran several combinations of our strategies with the featured workflows.

Figure 6 shows the average execution time for the Montage workflow generating 0.5-, 1-, and 2-degree mosaics of the sky, using in all the cases a 5.5 GB image database distributed across the three datacenters. With a larger degree, a larger volume of intermediate data is handled and a mosaic of higher resolution is produced.

Fig. 6: Montage execution time for different strategies and degrees. Avg. intermediate data shown in parentheses.

In the chart we note in the first place a clear time gain of up to 28% by using a local distribution strategy instead of a centralized one, for all the degrees. This result was expected since the hot metadata is now managed in parallel by three instances instead of one, and it is only the cold metadata that is forwarded to the coordinator site for scheduling purposes (and used at most one time).

We observe that for mosaics of degree 1 and under, the use of distributed hashed storage also outperforms the centralized version. However, we note a performance degradation in the hashed strategies, starting at 1-degree and getting more evident at 2-degree. We attribute this to the fact that there is a larger number of long-distance hot metadata operations compared to the centralized approach: with hashed strategies, 1 out of 3 operations is carried out on average between CUS and NEU. In the centralized approach, NEU only performs operations in the WEU site, thus such long-latency operations are reduced. We also associate this performance drop with the size of intermediate data being handled by the system: while we try to minimize inter-site data transfers, with larger volumes of data such transfers affect the execution time up to a certain degree and independently of the metadata management scheme. We conclude that while the DHT method might seem efficient due to linear read and write operations, it is not well suited for geo-distributed executions, which favor locality and penalize remote operations.

In a similar experiment, we validated DMM-Chiron using the Buzz workflow, which is rather data intensive, with two DBLP database dumps of 60 MB and 1.2 GB. The results are shown in Figure 7; note that the left and right Y-axes differ by one order of magnitude. We observe again that DMM-Chiron brings a general improvement in the completion time with respect to the centralized implementation: 10% for LOC in the 60 MB dataset and 6% for 1.2 GB, while for DHT and REP the time improvement was less than 5%.

Fig. 7: Buzz workflow execution time. Left Y-axis scale corresponds to 60 MB execution, right Y-axis to 1.2 GB.

In order to better understand the performance improvements brought by DMM-Chiron, and also to identify the reason for the low runtime gain for the Buzz workflow, we evaluated Montage and Buzz at a per-job granularity. The results are presented in the next section. Although the time gains perceived in the experiments might not seem significant at first glance, two important aspects must be taken into consideration:

Optimization at no cost. Our proposed solutions are implemented using exactly the same number of resources as their counterpart centralized approaches: the decentralized metadata stores are deployed within the master nodes of each site and the control messages are sent through the same existing channels. This means that such gains (even if small) come at no additional cost for the user.

Actual monetary savings. Our longest experiment (Buzz 1.2 GB) runs in the order of hundreds of minutes. With today's scientific experiments running at this scale and beyond, a gain of 10% actually implies savings of hours of cloud computing resources.

D. Zoom on Multi-task Jobs

We call a job multi-task when its execution consists of more than a single task. In DMM-Chiron, the various tasks of such jobs are evenly distributed to the available sites and thus can be executed in parallel. We argue that it is precisely in these kinds of jobs that DMM-Chiron yields its best performance.

Figure 8 shows a breakdown of Buzz and Montage workflows with the proportional size of each of their jobs from two different perspectives: task count and average execution time. Our goal is to characterize the most relevant jobs in each workflow by number of tasks and confirm their relevance by looking at their relative execution time. In Buzz, we notice that both metrics are highly dominated by three jobs: Buzz (676 tasks), BuzzHistory (2134) and HistogramCreator (2134), while the rest are so small that they are barely noticeable. FileSplit comes fourth in terms of execution time and it is indeed the only remaining multi-task job (3 tasks). Likewise, we identify for Montage the only four multi-task jobs: mProject (45 tasks), prepare (45), mDiff (107) and mBackground (45).

Fig. 8: Workflow per-job breakdown: (a) Buzz, (b) Montage. Very small jobs are enhanced for visibility.

In Figures 9 and 10 we look into the execution time of the multi-task jobs of Buzz and Montage, respectively. Figure 9 corresponds to the Buzz SWf with 60 MB input data. We observe that except for one case, namely the Buzz job with REP, the decentralized strategies considerably outperform the baseline (up to 20.3% for LOC, 16.2% for DHT and 14.4% for REP). In the case of FileSplit, we argue that the execution time is too short and the number of tasks too small to reveal a clear improvement. However, the other three jobs confirm that DMM-Chiron performs better for highly parallel jobs. It is important to note that these gains are much larger than those of the overall completion time (Figure 7) since there are still a number of workloads executed sequentially, which have not been optimized by the current release of DMM-Chiron.

Fig. 9: Execution time of multi-task jobs on the Buzz workflow with 60 MB input data.

Correspondingly, Figure 10 shows the execution of each multi-task job for the Montage SWf of 0.5 degree. The figure reveals that, on average, hot metadata distribution substantially improves on centralized management in most cases (up to 39.5% for LOC, 52.8% for DHT and 64.1% for REP). However, we notice some unexpected peaks and drops specifically in the hashed approaches. After a number of executions, we believe that such cases are due to common network latency variations of the cloud environment, added to the fact that the execution time for the jobs is rather short (in the order of seconds).

Fig. 10: Execution time of multi-task jobs on the Montage workflow of 0.5 degree.

VII. RELATED WORK

Centralized approaches. Metadata is usually handled by means of centralized registries implemented on top of relational databases, that only hold static information about data locations. Systems like Taverna [8], Pegasus [7] or Chiron [14] leverage such schemes, typically involving a single server that processes all the requests. In case of increased client concurrency or high I/O pressure, however, the single metadata server can quickly become a performance bottleneck. Also, the workloads involving many small files, which translate into heavy metadata accesses, are penalized by the overheads from transactions and locking [26], [27]. A lightweight alternative to databases is indexing the metadata, although most indexing techniques [28], [29] are designed for data rather than metadata. Even the dedicated index-based metadata schemes [30] use a centralized index and are not adequate for large-scale workflows, nor can they scale to multisite deployments.

Distributed approaches. Some workflow systems opt to rely on distributed file systems that partition the metadata and store it at each node (e.g. [31], [32]), in a shared-nothing architecture, as a first step towards complete geographical distribution. Hashing is the most common technique for uniform partitioning: it consists of assigning metadata to nodes based on a hash of a file identifier. Giraffa [33] uses full pathnames as key in the underlying HBase [34] store. Lustre [35] hashes the tail of the file name and the ID of the parent directory. Similar hashing schemes are used by [36], [37], [38] with a low memory footprint, granting access to data in almost constant time. FusionFS [39] implements a distributed metadata management based on DHTs as well. Chiron itself has a version with distributed control using an in-memory distributed DBMS [40]. All these systems are well suited for single-cluster deployments or workflows that run on supercomputers. However, they are unable to meet the practical requirements of workflows executed on clouds. Similarly to us, CalvinFS [10] uses hash-partitioned key-value metadata across geo-distributed datacenters to handle small files, yet it does not account for workflow semantics.

Hybrid approaches. More recently, Zhao et al. [41] proposed using both a distributed hash table (FusionFS [39]) and a centralized database (SPADE [42]) to manage the metadata. Similarly to us, their metadata model includes both file operations and provenance information. However, they do not make the distinction between hot and cold metadata, and they mainly target single-site clusters.

VIII. CONCLUSION

In this paper we introduced the concept of hot metadata for scientific workflows running in large, geographically distributed and highly dynamic environments. Based on it, we designed a hybrid decentralized and distributed model for handling metadata in multisite clouds. Our proposal is able to optimize the access to and ensure the availability of hot metadata, while effectively hiding the inter-site network latencies and remaining non-intrusive and easy to deploy. Coupled with a scientific workflow engine, our strategies showed an improvement of up to 28% for the whole workflow's completion time and 50% for specific highly parallel jobs, compared to state-of-the-art centralized solutions, at no additional cost.

Encouraged by these results, we plan to broaden the scope of our work and consider the impact of heterogeneous multisite environments on the hot metadata strategies. We are also looking at the possibility of adding data location awareness in order to minimize the impact of large intermediate data transfers. Another interesting direction to explore is integrating real-time monitoring information about the executed jobs in order to dynamically balance the hot metadata load according to each site's live capacity and performance.

ACKNOWLEDGMENT

This work is supported by the MSR-Inria Joint Centre, the ANR OverFlow project and partially performed in the context of the Computational Biology Institute. The experiments were carried out using the Azure infrastructure provided by Microsoft in the framework of the Z-CloudFlow project. Luis is partially funded by CONACyT, Mexico. Ji is partially funded by EU H2020 Programme, MCTI/RNP-Brazil, CNPq, FAPERJ, and Inria (MUSIC project).