HPC-Aware VM Placement in Infrastructure Clouds

Abhishek Gupta, Laxmikant V. Kale
University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
(gupta59, kale)@illinois.edu

Dejan Milojicic, Paolo Faraboschi
HP Labs, Palo Alto, CA, USA
firstname.lastname@hp.com

Susanne M. Balle
HP Cloud Services, Hudson, NH, USA
susanne.balle@hp.com
Abstract—Cloud offerings are increasingly serving workloads with a large variability in terms of compute, storage and networking resources. Computing requirements (all the way to High Performance Computing or HPC), criticality, communication intensity, memory requirements, and scale can vary widely. Virtual Machine (VM) placement and consolidation for effective utilization of a common pool of resources for efficient execution of such a diverse class of applications in the cloud is challenging, resulting in higher cost and missed Service Level Agreements (SLAs). For HPC, current cloud providers either offer a dedicated cloud with dedicated nodes, losing out on the consolidation benefits of virtualization, or use HPC-agnostic cloud scheduling, resulting in poor HPC performance. In this work, we address application-aware allocation of n VM instances (comprising a single job request) to physical hosts from a single pool. We design and implement an HPC-aware scheduler on top of OpenStack Compute (Nova) and also incorporate it in a simulator (CloudSim). Through various optimizations, specifically topology- and hardware-awareness, cross-VM interference accounting, and application-aware consolidation, we demonstrate enhanced VM placements which achieve up to 45% improvement in HPC performance and/or 32% increase in job throughput while limiting the effect of jitter (or noise) to 8%.

Keywords: Cloud; High Performance Computing; Scheduling; Placement
I. INTRODUCTION

Cloud computing is increasingly being explored as a cost-effective alternative (and addition) to supercomputers for some HPC applications [1]–[4]. Cloud provides the benefits of economy of scale, elasticity, flexibility, and customization (through virtualization) to the HPC community. It is attracting several users who cannot afford to deploy their own dedicated HPC infrastructure due to up-front investment or sporadic demands. Despite these benefits, today's HPC is not cloud-aware, and today's clouds are not HPC-aware. As a consequence, only embarrassingly parallel or small scale HPC applications are good candidates today to run in cloud. The cloud commodity interconnects (or better, the absence of low-latency interconnects), the performance overhead introduced by virtualization, and the application-agnostic cloud schedulers are the biggest obstacles for efficient execution of HPC applications in cloud [2], [3].

Past research [1]–[4] on HPC in cloud has primarily focused on evaluation of scientific parallel applications (such as those using MPI [5]) and has reached pessimistic conclusions. HPC applications are usually composed of tightly coupled processes performing frequent inter-process communication and synchronizations, and pose significant challenges to cloud schedulers. There have been few efforts on researching VM scheduling algorithms which take into account the nature of HPC applications, and they have shown promising results [6]–[9].

In this paper, we postulate that the placement of VMs to physical machines can have significant impact on performance. With this as motivation, the primary questions that we address are the following: Can we improve HPC application performance in cloud through VM placement strategies tailored to application characteristics? Is there a cost-saving potential through increased resource utilization achieved by application-aware consolidation? What are the performance-cost tradeoffs in using VM consolidation for HPC?

We address the problem of how to effectively utilize a common pool of resources for efficient execution of very diverse classes of applications in the cloud.
For that purpose, we solve the problem of simultaneously allocating multiple VM instances comprising a single job request to physical hosts taken from a pool. We do this while meeting Service Level Agreement (SLA) requirements (expressed in terms of compute, memory, homogeneity, and topology), and while attempting to improve the utilization of hardware resources. Existing VM scheduling mechanisms rely on user inputs and static partitioning of clusters into availability zones for different application types (such as HPC and non-HPC).

The problem is particularly challenging because, in general, a large-scale HPC application would ideally require a dedicated allocation of cloud resources (compute and network), since its performance is quite sensitive to variability (caused by noise, or jitter). The per-hour charge that a cloud provider would have to establish for dedicating the resources would quickly make the proposition uneconomical for customers. To overcome this problem, our technique identifies suitable application combinations whose execution profiles complement each other well and that can be consolidated on the same hardware resources without compromising the overall HPC performance. This enables us to better utilize the hardware and lower the cost for HPC applications while maintaining performance and profitability, hence greatly enhancing the business value of the solution.

The methodology used in this paper consists of a two-step process: 1) characterizing applications based on their use of shared resources in a multi-core node (with focus on shared cache) and their tightly coupledness, and 2) using an application-aware scheduler to identify groups of applications that have complementary profiles.
The key contributions of this paper are:
- We identify the opportunities and challenges of VM consolidation for HPC in cloud. In addition, we develop scheduling algorithms which optimize resource allocation while being HPC-aware. We achieve this by applying Multi-dimensional Online Bin Packing (MDOBP) heuristics while ensuring that cross-application interference is kept within bounds. (§II, §III)
- We optimize the performance for HPC in cloud through intelligent HPC-aware VM placement, specifically topology awareness and homogeneity, showing performance gains up to 25% compared to HPC-agnostic scheduling. (§III, §VI)
- We implement the proposed algorithm in the OpenStack Nova scheduler to enable intelligent application-aware VM scheduling. Through experimental measurements, we show that compared to dedicated execution, our techniques can result in up to 45% better performance while limiting jitter to 8%. (§IV, §VI)
- We modify CloudSim [10] to make it suitable for simulation of HPC in cloud. To our knowledge, our work is the first effort towards simulation of HPC job scheduling algorithms in cloud. Simulation results show that our techniques can result in up to 32% increased throughput compared to default scheduling algorithms. (§VII)
II. VM CONSOLIDATION FOR HPC IN CLOUD: SCOPE AND CHALLENGES

There are two advantages associated with the ability to mix HPC and other applications on a common platform. First, better system utilization, since the machines can be used for running non-HPC applications when there is a low incoming flux of HPC applications. Secondly, placing different types of VM instances on the same physical node can result in advantages arising from resource packing.
To quantify the potential cost savings that can be achieved through consolidation, we performed an approximate calculation using the pricing of Amazon EC2 instances [11]. Amazon EC2 offers a dedicated pool of resources for HPC applications known as Cluster Compute. We consider the two instance types shown in Table I and, as a concrete example, Table II shows the distribution of actually executed jobs calculated from the METACENTRUM-02.swf logs obtained from the Parallel Workload Archive [12]. It is clear that there is a wide distribution: some HPC applications have a small memory footprint while some need large memory. Also, according to the US DoE, there is a technology trend towards decreasing memory per core for exascale supercomputers, indicating that memory will be an even more crucial resource in the future [13].

TABLE I: Amazon EC2 instance types and pricing

Resource        | High-Memory Double Extra Large | Cluster Compute Eight Extra Large
API name        | m2.2xlarge                     | cc2.8xlarge
EC2 Comp. Units | 13                             | 88
Memory          | 34.2 GB                        | 60.5 GB
Storage         | 850 GB                         | 3370 GB
I/O Perf.       | High                           | Very High
Price ($/hour)  | 0.9                            | 2.4

TABLE II: Distribution of jobs' memory requirements

Memory per core | Number of Jobs
> 8 GB          | 33 (0.03%)
Total           | 103,656 (100%)

Fig. 1: Tradeoff: Resource packing vs. HPC-awareness

If the memory left unused by some applications in the Cluster Compute instance can be used by placing a High-Memory instance on the same node, trading 13 EC2 Compute Units and 34.2 GB memory (still leaving 60.5 - 34.2 = 26.3 GB), then from Table I pricing, for every 2.4$ one can get an additional 0.9$. However, the price of the Cluster Compute instance needs to be reduced by a factor of 2.4/(88/13), since that instance will have 13 EC2 units less. Hence, through better resource packing, we can get a % benefit of ([2.4 - 2.4/(88/13) + 0.9] - 2.4) / 2.4 = 23%.
However, traditionally, HPC applications are executed on dedicated nodes to prevent any interference arising from co-located applications. This is because the performance of many HPC applications strongly depends on the slowest node (for example, when they synchronize through MPI barriers). Figure 1 illustrates this tradeoff between resource packing and optimized HPC performance with an example. Here we have two incoming VM provisioning requests: the first for 4 instances, each with 1 core and 512 MB memory, and the second for 2 instances, each with 1 core and 3 GB memory. There are two available physical servers, each with 4 cores and 4 GB memory. The figure shows two ways of placing these VMs on the physical servers. The boxes represent the 2 dimensions, the x dimension being cores and the y dimension being memory. Both requests are satisfied in the right figure, but not in the left figure, since there is not enough memory on an individual server to meet the 3 GB requirement, although there is enough memory in the system as a whole. Hence, the right figure is a better strategy since it is performing 2-dimensional bin packing. Now consider that the 1-core 512 MB VMs (green) are meant for HPC. In that case, the left figure can result in better HPC performance compared to the right one because of two reasons: a) no interference from applications of other users running on the same server, and b) all inter-process communication is within a node. This tradeoff between better HPC performance vs. better resource utilization makes VM scheduling for HPC a challenging problem.

Fig. 2: Application performance in shared node execution (2 cores for each application on a node) normalized with respect to dedicated execution (using all 4 cores of a node for the same application). (a) Total cores (and VMs) per application = 4; (b) Total cores (and VMs) per application = 16. Physical cores per node = 4.
A. Cross-Application Interference

Even though there are potential benefits of using consolidation of VMs for HPC, it is still unclear whether (and to what extent) we can achieve increased resource utilization at an acceptable performance penalty. For HPC applications, the degradation due to interference accumulates because of the synchronous and tightly coupled nature of many HPC applications. We compared the performance of a set of HPC applications in a co-located vs. dedicated execution. Figure 2 demonstrates the effect of running two different applications while sharing a multi-core node (4-core, 8 GB, 3 GHz Open Cirrus [14] node). Each VM needs 1 vcpu and 2 GB memory, and uses the KVM hypervisor and a CPU-pinned configuration. Applications used here are NPB [15] (EP = Embarrassingly Parallel, LU = LU factorization, IS = Integer Sort), problem size class B and C, and ChaNGa [16] = Cosmology. More details of the testbed and applications are discussed in Section V.

In this experiment, we first ran each application using all 4 cores of a node. We then ran VMs from 2 different applications on each node (2 VMs of each application on a node). Next, we normalized the performance for both applications in the second case (shared node) with respect to the first case (dedicated node), and plotted them as shown in Figures 2a (4 VMs each application) and 2b (16 VMs each application) for different application combinations. In the figures, the x-label shows the application combination; the first bar shows normalized performance for the first application in the x-label, and similarly the second bar shows that of the second application in the x-label. We can observe that some application combinations have normalized performance close to one for both applications, e.g. EP.B-ChaNGa. For some applications, co-location has a significant detrimental impact on performance on at least one application, e.g. combinations involving IS.B.

The other facet of the interference problem is positive interference. Through experimental data, we notice that we can achieve significant performance improvement for some application combinations; e.g. LU.C-ChaNGa for the 4 cores case shows almost 120% better performance for LU.C, with ChaNGa's normalized performance close to 1. The positive impact on performance when co-locating different HPC applications presents another opportunity for optimizing VM placement. What needs to be explored is why some co-locations perform well while others do not (Section III).
B. Topology Awareness

The second challenge to VM consolidation for HPC in cloud is the applications' sensitivity to network topology. Since many parallel processes constituting an HPC application communicate frequently, time spent in communication forms a significant fraction of total execution time. The impact of cluster topology has been widely researched by HPC researchers, but in the context of cloud, it is up to the cloud provider to use VM placement algorithms which map the multiple VMs of an HPC application in a topology-aware manner to minimize inter-VM communication overhead. The importance of topology awareness can be understood by a practical example: the Open Cirrus HP Labs cluster has 32 nodes (4 cores each) in a rack, and all nodes in a rack are connected by a 1 Gbps link to a switch. The racks are connected using a 10 Gbps link to a top-level switch. Hence, the 10 Gbps link is shared by 32 nodes, with an effective bandwidth of 10 Gbps / 32 = 0.312 Gbps between two nodes in different racks for all-to-all communication. However, the point-to-point bandwidth between two nodes in the same rack is 1 Gbps. Thus, packing VMs to nodes in the same rack will be beneficial compared to a random placement policy, which can potentially distribute them all over the cluster. However, topology-aware placement can conflict with the goal of achieving better resource utilization, as demonstrated by Figure 1.
C. Hardware Awareness

Another characteristic of HPC applications is that they are generally iterative and bulk synchronous, with a computation phase followed by a barrier synchronization phase. Since all processes must finish the previous iteration before the next iteration can be started, a single slow process can slow down the entire application. Since clouds evolve over time and demand, they consist of heterogeneous servers. Furthermore, the underlying hardware is not visible to the user, who expects all VMs to achieve identical performance.

Fig. 3: (a, b) Application performance using 2 cores per node normalized with respect to dedicated execution using all 4 cores of a 4-core node for the same application. The first bar for each application shows the case of leaving 2 cores idle (no co-located applications); the remaining bars for each application show the case of co-location with other applications (combinations of Figure 2). (c) Average per-core last level cache misses: using only 2 cores vs. using all 4 cores of a node.

The commonly used approach to address heterogeneity in cloud is to create a new compute unit (e.g. Amazon EC2 Compute Unit) and allocate hardware based on this unit. This allows allocation of a CPU core to multiple VMs using shares (e.g. an 80-20 CPU share). However, it is impractical for HPC applications, since the VMs comprising a single application will quickly get out of sync when sharing CPUs with other VMs, resulting in much worse performance. To overcome these problems, Amazon EC2 uses a dedicated cluster for HPC. However, the disadvantage is lower utilization, which results in higher price. Hence, the third challenge for VM placement for HPC is to ensure homogeneity. VM placement needs to be hardware-aware to ensure that all k VMs of a user request are allocated the same type of processors.
III. METHODOLOGY

Having identified the opportunities for HPC-aware VM consolidation in cloud, we discuss our methodology for addressing the challenges discussed in Section II. We formulate the problem as an initial VM placement problem: map k VMs (v1, v2, ..., vk), each with the same, fixed resource requirements (CPU, memory, disk, etc.), to n physical servers P1, P2, ..., Pn, which are unoccupied or partially occupied, while meeting resource demands. Moreover, we focus on providing the user an HPC-optimized VM placement. Our solution consists of a) one-time application characterization, and b) application-aware scheduling. Next, we discuss these two components.
A. Application Characterization

Our goal is to identify what characteristics of applications affect their performance when they are co-located with other applications on a node. To get more insight into the performance observed in Figure 2, we plot the performance of each application obtained when running alone (but using 2 VMs on a 4-core node, leaving 2 cores idle) normalized with respect to the performance obtained when using all 4 cores for the same application (see Figures 3a and 3b, first bar for each application). We can see that LU benefits most when run in the 2-core-per-node case, EP and ChaNGa achieve almost the same performance, and IS suffers. This indicates that the contention of shared resources in multi-core nodes is a critical factor for these applications. To confirm our hypothesis, we measured the number of last level cache (LLC) misses per second for each application using hardware performance counters and the Linux tool oprofile. Figure 3c shows LLC misses/sec for our application set, and demonstrates that LU suffers a huge number of misses, indicative of a larger working set size (or cache-intensiveness). In our terminology, cache-intensive refers to a larger working set. Correlating Figures 3a and 3c, we see that applications which are more cache-intensive (that is, suffer more LLC misses per second) are the ones that benefit most in the 2-core-per-node case, whereas applications which are low to moderately cache-intensive (e.g. EP and ChaNGa) are mostly not affected by the use of 2 or 4 cores per node. One exception to this is IS.B.4, because this application is highly communication-intensive and hence suffers because of the inter-node communication happening in the 2-core-per-node case. Barring this exception, one fairly intuitive conclusion that can be drawn from this experiment is that it is indeed beneficial to co-locate cache-intensive applications (such as LU) and applications with less cache usage (such as EP) on the same node. This is confirmed by more closely examining Figure 2.
HPC applications introduce another dimension to the problem of accounting for cross-application interference. In general, the effect of noise/interference gets amplified in applications which are bulk synchronous. For synchronous HPC applications, even if only one VM suffers a performance penalty, all the remaining VMs would have to wait for it to reach the synchronization point. Even though the interference suffered by individual processes may be small over a period of time, the overall effect on application performance can be significant due to the accumulation of noise over all processes. Hence, we characterize applications along two dimensions:

1) Cache-intensiveness – We assign each application a cache score (1 unit = 100K LLC misses/sec), representative of the pressure it puts on the shared cache and memory controller subsystem. We acknowledge that one can use working set size as a metric, but we chose LLC misses/sec since it can be experimentally measured using hardware performance counters.

2) Parallel Synchronization and Network Sensitivity – We map applications to four different application classes, which can be specified by a user when requesting VMs:
- ExtremeHPC: Extremely tightly coupled or topology-sensitive applications, for which the best option is to provide dedicated nodes; example – IS.
- SyncHPC: Sensitive to interference, but less so than ExtremeHPC, and can sustain a small degree of interference to get consolidation benefits; examples – LU, ChaNGa.
- AsyncHPC: Asynchronous (and less communication sensitive) and can sustain more interference than SyncHPC; examples – EP, MapReduce applications.
- NonHPC: Do not perform any communication, can sustain more interference, and can be placed on heterogeneous hardware; example – Web applications.
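As an illustration, this characterization can travel with the provisioning request as two extra fields: an application class and a cache score. The sketch below uses hypothetical field names rather than the exact request_spec schema of our Nova prototype, and the example values are illustrative.

# Sketch: application characterization attached to a VM provisioning request.
APP_CLASSES = ("ExtremeHPC", "SyncHPC", "AsyncHPC", "NonHPC")

def cache_score(llc_misses_per_sec):
    # 1 cache-score unit = 100K LLC misses/sec (measured, e.g., with oprofile).
    return llc_misses_per_sec / 100_000.0

request_spec = {
    "instance_type": "m1.small",        # 1 core, 2 GB memory
    "num_instances": 16,
    "app_name": "LU.C.16",
    "app_class": "SyncHPC",             # one of APP_CLASSES
    "cache_score": cache_score(1.6e6),  # = 16 units
}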
B. Application-aware Scheduling

With this characterization, we devise an application-characteristics-aware VM placement algorithm which is a combination of HPC-awareness (topology and homogeneity awareness), multi-dimensional online bin packing, and interference minimization through cache-sensitivity awareness. We discuss the details of this scheduler in the next section.
IV. AN HPC-AWARE SCHEDULER

Next, we discuss the design and implementation of the proposed techniques on top of the OpenStack Nova scheduler [17].

A. Background: OpenStack Nova Scheduler

OpenStack [17] is open source software, developed by a collaboration of multiple inter-related projects, for large-scale deployment and management of private and public clouds from large pools of infrastructure resources (compute, storage, and networking). In this work, we focus on the compute component of OpenStack, known as Nova. The Nova scheduler performs the task of selecting the physical nodes where a VM will be provisioned. Since OpenStack is a popular cloud management system, we implemented our scheduling techniques on top of the existing Nova scheduler (Diablo 2011.3).

The default scheduler makes VM placement decisions based on the VM provisioning request (request_spec) and the existing state and occupancy of physical hosts (capability data). request_spec specifies the number and type of requested instances (VMs); the instance type maps to resource requirements such as number of virtual cores, amount of memory, and amount of disk space. Host capability data contains the current capabilities (such as free CPUs, free memory) of the physical servers (hosts) in the cloud.
Using request_spec and the capabilities data, the scheduler performs a 2-step algorithm: 1) Filtering – excludes hosts incapable of fulfilling the request (e.g. free cores less than the requested cores), and 2) Weighing – ranks the remaining hosts and selects the best candidate. The default scheduler has two limitations for HPC: it ignores processor heterogeneity and network topology, and it considers the k VMs requested by an HPC user as k separate placement problems, so there is no correlation between the placement of VMs of a single request. There has been recent and ongoing work on adapting the Nova scheduler to make it architecture- and HPC-aware [8], [9].

Algorithm 1: HPC-aware VM placement (excerpt)
 5: rackList = new set
 6: hostCapacity, rackCapacity, filteredHostList ← CalculateHostAndRackCapacity(request_spec, capabilities)
 7: if (request_spec.class == ExtremeHPC) || (request_spec.class == SyncHPC) then
 8:   sortedHostList ← sort filteredHostList by decreasing order of hostCapacity[j], where j ∈ filteredHostList
 9:   PrelimBuildPlan ← stable-sort sortedHostList by decreasing order of rackCapacity[capability[j].rackid], where j ∈ filteredHostList
10: else
11:   PreBuildPlan = filteredHostList
12: end if
13: if request_spec.class == ExtremeHPC then
14:   buildPlan = new vector[int]
15:   for i = 1 to ...
     ...
37: totalCacheScore = Σ i.cacheScore, such that i is an instance currently running on node
38: if (totalCacheScore + request_spec.cacheScore) > α then
39:   return false
40: end if
41: if i.class == SyncHPC for any i, an instance currently running on node, then
42:   if (totalCacheScore + request_spec.cacheScore) > β then
43:     return false
44:   end if
45: end if
46: return true
B. Design and Implementation

Algorithm 1 describes our scheduling algorithm using OpenStack terminology. The VM provisioning request (request_spec) now contains the application class and name in addition to the existing parameters. The algorithm proceeds by calculating the current host and rack free capacity, that is, the number of additional VMs of the requested specification that can be placed at a particular host and rack (line 6). While doing so, it sets the capacity of all hosts which have a running VM to zero if the requested VM type is ExtremeHPC, to ensure that only dedicated nodes are used for ExtremeHPC. Next, if the class of the requested VM is ExtremeHPC or SyncHPC, the scheduler creates a preliminary build plan, which is a list of hosts ordered by the rackCapacity of the rack to which a host belongs, and by hostCapacity for hosts of the same rack. The goal is to allocate VMs to the same host and same rack to the extent possible to minimize inter-VM communication overhead for these application classes. For ExtremeHPC, this PreBuildPlan is used for provisioning VMs, whereas for the remaining classes, the algorithm performs multi-dimensional online bin packing to fit VMs of different characteristics together on the same host (line 21).
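The ordering step for the preliminary build plan (Algorithm 1, lines 7-11) can be sketched as below; the data structures are simplified to plain dictionaries instead of Nova's capability objects, so this is an illustration rather than the exact prototype code.

# Sketch: order filtered hosts so that VMs of one request land on as few
# hosts and racks as possible (ExtremeHPC and SyncHPC requests only).
def preliminary_build_plan(filtered_hosts, host_capacity, rack_of, app_class):
    # host_capacity: host -> number of requested-size VMs that still fit on it
    # rack_of:       host -> rack id taken from the host's capability data
    if app_class not in ("ExtremeHPC", "SyncHPC"):
        return list(filtered_hosts)
    rack_capacity = {}
    for h in filtered_hosts:
        rack_capacity[rack_of[h]] = rack_capacity.get(rack_of[h], 0) + host_capacity[h]
    # Sort by host capacity, then stable-sort by rack capacity (both decreasing),
    # mirroring the sort and stable-sort of Algorithm 1, lines 8-9.
    plan = sorted(filtered_hosts, key=lambda h: host_capacity[h], reverse=True)
    plan.sort(key=lambda h: rack_capacity[rack_of[h]], reverse=True)
    return plan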
Procedure MDOBP uses a bin packing heuristic for selecting a host from the available choices (line 28). We use a dimension-aware heuristic: select the host for which the vector of requested resources aligns the most with the vector of remaining capacities. The key intuition can be understood by revisiting the example of 2-dimensional bin packing in Figure 1. For best utilization of the capacity in both dimensions, it is desirable that the final sum of all the VM vectors on a host is close to the top right corner of the host rectangle. Hence, we select the host such that placing the requested VM on it would move the vector representing its occupied resources towards the top right corner. Our heuristic is similar to those studied by Lee et al. [18]. Formally, consider the remaining or residual capacities (CPURes, MemRes) of a host, i.e. subtract from the capacity (total CPU, total memory) the total demand of all the items (VM cores, VM memory) currently assigned to it. Also consider the requested VM: (CPUReq (= 1), MemReq). This heuristic selects the host with the minimum θ, where cos(θ) is calculated using the dot product of the two vectors and is given by:

cos(θ) = (CPUReq * CPURes + MemReq * MemRes) / (sqrt(CPURes^2 + MemRes^2) * sqrt(CPUReq^2 + MemReq^2)),

with CPURes >= CPUReq and MemRes >= MemReq.
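A minimal sketch of this best-fit choice is shown below; host bookkeeping is reduced to a tuple of residual CPU cores and memory, so it illustrates the heuristic rather than reproducing the choose_best_fit_host code in our prototype.

from math import sqrt

# Pick the host whose residual-capacity vector aligns best with the request vector,
# i.e. the host with maximum cos(theta) (equivalently, minimum theta).
def choose_best_fit_host(hosts, cpu_req, mem_req):
    # hosts: iterable of (host_id, cpu_res, mem_res) residual capacities
    best_host, best_cos = None, -1.0
    for host_id, cpu_res, mem_res in hosts:
        if cpu_res < cpu_req or mem_res < mem_req:
            continue  # the requested VM does not fit on this host at all
        cos_theta = (cpu_req * cpu_res + mem_req * mem_res) / (
            sqrt(cpu_res**2 + mem_res**2) * sqrt(cpu_req**2 + mem_req**2))
        if cos_theta > best_cos:
            best_host, best_cos = host_id, cos_theta
    return best_host

For example, for a request of (1 core, 3 GB), a host with residual capacity (1 core, 3.5 GB) is preferred over one with residual capacity (3 cores, 3.5 GB), since the former leaves less stranded capacity in the CPU dimension.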
Next, the selected host is checked to ensure that placing the requested VM on it does not violate the interference criteria (line 29). We use the following criterion: the sum of the cache scores of the requested VM and all the VMs running on a host should not exceed a threshold, which needs to be determined through experimental analysis. This threshold is different if the requested VM or one or more VMs running on that host is of class SyncHPC, since applications of this class can tolerate less interference (line 44). In addition, we maintain a database of interference indices to record interference between those applications which suffer a large performance penalty when sharing hosts. This information is used to avoid co-locations which are definitely not beneficial.
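The interference check can be sketched as follows. The α and β defaults below are placeholders (suitable thresholds are determined experimentally, as discussed in Section VI), and the interference-index database is reduced to a set of application pairs already known to co-locate badly.

# Sketch: decide whether adding the requested VM to a host stays within the
# interference bounds (mirrors the structure of Algorithm 1, lines 37-46).
def meet_interference_criteria(running, request_spec, alpha=100, beta=60,
                               bad_pairs=frozenset()):
    for inst in running:
        pair = (inst["app_name"], request_spec["app_name"])
        if pair in bad_pairs or pair[::-1] in bad_pairs:
            return False  # recorded as a definitely-not-beneficial co-location
    total = sum(inst["cache_score"] for inst in running)
    if total + request_spec["cache_score"] > alpha:
        return False
    # SyncHPC tolerates less interference, so apply the tighter threshold beta.
    sync_involved = (request_spec["app_class"] == "SyncHPC" or
                     any(inst["app_class"] == "SyncHPC" for inst in running))
    if sync_involved and total + request_spec["cache_score"] > beta:
        return False
    return True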
The output of Algorithm 1 is buildPlan, which is the list of hosts where the VMs should be provisioned. To ensure homogeneity, hosts are grouped into different lists based on their processor type, and the algorithm operates on these groups. Currently, we use CPU frequency as the distinction criterion between processor types. For a more accurate distinction, additional factors such as MIPS can be considered.
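A sketch of this grouping is given below; the capability key cpu_frequency_mhz is an assumed field name, not necessarily what Nova reports.

from collections import defaultdict

# Sketch: partition hosts into homogeneous pools before placement, so that all
# k VMs of a request are drawn from hosts with the same processor type.
def group_hosts_by_processor(capabilities):
    groups = defaultdict(list)
    for host_id, caps in capabilities.items():
        groups[caps["cpu_frequency_mhz"]].append(host_id)  # assumed capability field
    return groups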
Fig. 4: Implementation details and control flow of a provisioning request

Figure 4 shows the overall control flow for a VM provisioning request, highlighting the additional features and changes that we introduced while implementing the HPC-aware scheduling algorithm in OpenStack Nova. We modified both euca-tools and the OpenStack EC2 API to allow additional parameters to be passed along with a VM provisioning request. Further, we store these additional properties (such as application class and cache score) of running VMs in the Nova database. Also, we create and maintain additional information, interference indices, which are a record of the interference suffered by each application with other applications during a characterization run. New tables were added to the Nova DB: app_running, with columns that store the hostname and the applications running on it, and app_interferences, which stores the interference between any of them. The Nova DB API was modified to include a function to read this database. We extended the existing abstract_scheduler.py in Nova to create HPCinCloud_scheduler.py, which contains the additional functions _scheduleMDOBP, choose_best_fit_host, and meet_interference_criteria.
V. EVALUATION METHODOLOGY

In this section, we describe our cloud setup and the applications which we used.

A. Experimental Testbed

We evaluated our techniques on a cloud setup using OpenStack on the Open Cirrus testbed at the HP Labs site [14]. This cloud has 3 types of servers:
- Intel Xeon E5450 (12M Cache, 3.00 GHz)
- Intel Xeon X3370 (12M Cache, 3.00 GHz)
- Intel Xeon X3210 (8M Cache, 2.13 GHz)

The cluster topology is as described in Section II-B. For virtualization, we chose KVM [19], since prior research has indicated that KVM is a good choice for virtualization for HPC clouds [20]. For network virtualization, we experimented with different network drivers such as rtl8139, e1000, and virtio-net, and settled on virtio-net because of its better network performance (also shown in [6]). We used VMs of type m1.small (1 core, 2 GB memory, 20 GB disk). However, these choices do not influence the generality of our conclusions.
B. Benchmarks and Applications

We used the NAS Parallel Benchmarks (NPB) [15], problem size class B and C (the MPI version, NPB 3.3-MPI), which are widely used by the HPC community for performance benchmarking and provide good coverage of computation, communication, and memory characteristics. We also used three larger HPC applications:
- NAMD [21] – A highly scalable molecular dynamics application used ubiquitously on supercomputers. We used the ApoA1 input (92k atoms) for our experiments.
- ChaNGa [16] – A cosmology application which performs collisionless N-body simulation using a Barnes-Hut tree for force calculation. We used a 300,000 particle system.
- Jacobi2D – A 5-point stencil computation kernel which averages values in a 2-D grid, and is used in scientific simulations, numerical algebra, and image processing.

These applications are written in Charm++ [22], which is an object-oriented parallel programming language. We used the net-linux-x86-64 machine layer of Charm++ with the -O3 optimization level.
VI. EXPERIMENTAL RESULTS

Next, we evaluate the benefits of HPC-aware VM placement and the effect of jitter arising from VM consolidation.

A. HPC-Aware Placement

To demonstrate the impact of topology awareness and homogeneity, we compared the performance obtained by the HPC-aware scheduler with random VM placement. In these experiments, we did not perform VM consolidation. Figure 5 shows the performance obtained by our VM placement (Homo) compared to the case when two VMs are mapped to slower processors and the rest to the faster processors (Hetero). We calculated % improvement = (T_Hetero - T_Homo) / T_Hetero. We can see that the improvement achieved depends on the nature of the application and the scale at which it is run. Also, the improvement is not equal to the ratio of sequential execution time on the slower processor to that on the faster processor. This can be attributed to the communication time and parallel overhead, which is not necessarily dependent on the processor speeds. For these applications, we achieved up to 20% improvement in parallel execution time, which means we save 20% of time * N CPU-hours, where N is the number of processors used.

Fig. 5: % improvement achieved using HPC awareness (homogeneity) compared to the case where 2 VMs were on slower processors and the rest on faster processors

Fig. 6: CPU timelines of 8 VMs running Jacobi2D. (a) Heterogeneous: first two VMs on slower processors. (b) Homogeneous: all 8 VMs on the same type of processors.
We analyzed the performance bottleneck using the Projections [23] tool. Figure 6 shows the CPU (VM) timelines for an 8-core Jacobi2D experiment; the x-axis is time, the y-axis is the (virtual) core number, the white portion shows idle time, and the colored portions represent application functions. In Figure 6a, there is a lot more idle time on VMs 3-7 compared to the first 2 VMs (running on slower processors), since VMs 3-7 have to wait for VMs 0-1 to reach the synchronization point. The small idle time in Figure 6b is due to the communication time.

Next, we compared the performance obtained when using the VM placement provided by the HPC-optimized algorithm vs. the default VM placement vs. without virtualization on the same testbed (see Figure 7). The default placement selects the host with the least free CPU cores (or PEs), agnostic of its topology and hardware. In this experiment, the first host in the cloud had the slower processor type. Figure 7 shows that even communication-intensive applications such as NAMD and ChaNGa scale well for the Cloud-opt case, and achieve performance close to that obtained on the physical platform. Benefits up to 25% are achieved compared to the default scheduler. However, performance achieved on the physical platform itself is up to 4X worse compared to ideal scaling at 64 cores, likely due to the absence of an HPC-optimized network. A detailed analysis of the communication performance of this cloud (with different virtualization drivers) was done in [6].

Fig. 7: Runtime results: execution time vs. number of cores/VMs for different applications. (a) NAMD (Molecular Dynamics); (b) ChaNGa (Cosmology); (c) Jacobi2D – 4K by 4K matrix.
Fig. 8: Table of applications, and figure showing percentage improvement achieved using application-aware scheduling compared to the case when applications were run in a dedicated manner

Application | Class      | Cache Score | Time ded. run | Placement (β=100) | Placement (β=40)                | Placement (β=60)
IS.B.4      | ExtremeHPC | 47          | 89.5          | 4*N1              | 4*N1                            | 4*N1
LU.C.16     | SyncHPC    | 16          | 180.18        | 4*(N2-N5)         | 2*(N1-N8) after App1, App3-App5 | 3*(N2-N6)+1*N7
LU.B.4      | SyncHPC    | 29          | 147           | 3*N6+1*N7         | 1*(N2-N5)                       | 2*N1+2*N8 after App1
ChaNGa.4    | SyncHPC    | 7.5         | 100.42        | 1*N6+3*N7         | 1*(N2-N5)                       | 1*(N2-N5)
EP.B.4      | AsyncHPC   | 2.5         | 101.5         | 4*N8              | 1*(N2-N5)                       | 1*N6+3*N7
B. Case Study of Application-Aware Scheduling

Here, we consider 8 nodes (32 cores) of our experimental testbed, and perform VM placement for the application stream shown in Figure 8 using the application-aware scheduler. The application suffix is the number of requested VMs. Figure 8 shows the characteristics of these applications and the output of the scheduler with three different cache thresholds (β). The output (Placement) is presented in the form of the nodes (and cores per node) to which the scheduler mapped the application. This figure also shows the achieved performance for these cases compared to the dedicated execution using all 4 cores per node. When the cache threshold is too large, there is less performance improvement due to aggressive packing of cache-intensive applications on the same node. On the contrary, a very small threshold results in unnecessary wastage of some CPU cores if there are few applications with very small cache scores. This is illustrated by the placement shown in Figure 8, where the execution of some applications was deferred because the interference criteria were not satisfied due to the small cache threshold. Moreover, there is an additional penalty (communication overhead) associated with not using all cores of a node for running an HPC application. Hence, the cache threshold needs to be chosen carefully through extensive experimentation. In this case, we see that the threshold of 60 works best. For this threshold and our application set, we achieve performance gains up to 45% for a single application while limiting the negative impact of interference to 8%.

We also measured the overhead of our scheduling algorithm by measuring its execution time. The average time to handle a request for 1 and 16 instances was 1.58 s and 1.80 s respectively with our scheduler, compared to 1.54 s and 1.67 s for the default scheduler.
VII. SIMULATION

CloudSim is a simulation tool modeling a cloud computing environment in a datacenter, and is widely used for evaluation of resource provisioning algorithms [10]. In this work, we modified it to enable the simulation of High Performance Computing jobs in cloud. HPC machines have a massive number of processors, whereas CloudSim is designed and implemented for a cloud computing environment and works mainly with jobs which need a single processor. Hence, for simulating HPC in cloud, the primary modification we performed was to improve the handling of multi-core jobs. We extended the existing vmAllocationPolicySimple class to create a vmAllocationPolicyHPC which can handle a user request comprising multiple VM instances and performs application-aware scheduling (discussed in Algorithm 1). At the start of the simulation, a fixed number of VMs (of different specified types) are created, and jobs (cloudlets) are submitted to the datacenter broker, which maps a job to a VM. When there are no pending jobs, all the VMs are terminated and the simulation completes. Since our focus was on the mapping of VMs to physical hosts, we created a one-to-one mapping between cloudlets and VMs. Moreover, we implemented VM termination during the simulation to ensure complete simulation. Without dynamic VM creation and termination, the initial set of VMs runs till the end of the simulation, leading to indefinitely blocked jobs in the system, since the VMs where they can run never get scheduled because of the limited datacenter capacity.
We simulated the execution of jobs from the logs obtained from the Parallel Workload Archive [12]. We used the METACENTRUM-02.swf logs, since these logs contain information about a job's memory consumption. For each job record (n cores, m MB memory, execution time) in the log file, we create n VMs, each with 1 core and m/n MB memory. We simulated the execution of the first 1500 jobs from the log file on 1024 cores, and measured the number of completed jobs after 100 seconds.
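The job-to-VM expansion described above can be sketched as follows; the SWF parsing itself is omitted, and m is the job's total memory in MB as in the description.

# Sketch: expand one job record from the workload log into n single-core VM requests.
def job_to_vm_requests(n_cores, mem_mb_total, runtime_s):
    vm = {
        "cores": 1,
        "mem_mb": mem_mb_total / n_cores,  # m/n MB per VM
        "runtime_s": runtime_s,
    }
    return [dict(vm) for _ in range(n_cores)]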
Fig. 9: Simulation results: number of completed jobs vs. time for different scheduling techniques, 1024 cores. (a) MDOBP vs. default "Least Free PE" heuristic for different amounts of RAM per node (ppn = processors (cores) per node). (b) Interference-aware MDOBP for different cache thresholds (β), with adjusted execution times (accounting for better performance with cache awareness).

Figure 9a shows that the number of completed jobs after 100 seconds increased by around 109/801 = 13.6% when using MDOBP instead of the default heuristic (selecting the node with the least free PEs) for the constrained memory case (2 GB per node), whereas there was no improvement for this job set when the nodes had large memory per core. This is attributed to the fact that this job set has very few applications with large memory requirements. However, with the trend towards big memory applications, also true for the next generation of exascale applications, we expect to see significant gains for architectures with large memory per node as well.

We also simulated our HPC-aware scheduler (including cache-awareness) by assigning each job a cache score from (0-30) using a uniform distribution random number generator. We used two different values of the cache threshold (see Figure 9b, IF MDOBP). We simulated the jobs with the execution times of all jobs modified by -10% and -20% to account for the improvement in performance resulting from cache-awareness, as seen from the results in Section VI. The number of completed jobs after 100 seconds further increased to 1060 for the cache threshold of 60 and adjustment of -10%, which is a reasonable choice based on the results obtained in Section VI-B. Hence, overall we get an improvement in throughput of 259/801 = 32.3% compared to the default scheduler. Also, we can see that a small cache threshold (β=40) can actually degrade overall throughput, because some cores will be left unused to ensure that the interference requirements are obeyed.
VIII. RELATED WORK

Previous studies on HPC applications in cloud have concluded that cloud cannot compete with supercomputers based on the metric $/GFLOPS for large scale HPC applications because of bottlenecks such as interconnect and I/O performance [1]–[4]. However, clouds can be cost-effective for some applications, specifically those with less communication and at low scale [3], [4]. In this work, we explore VM placement techniques to make HPC in cloud more economical through improved performance and resource utilization.

Work on scheduling in cloud can be classified into three areas: 1) Initial VM Placement – where the problem is to map a (set of) VM(s) of a single user request to an available pool of resources. 2) Offline VM Consolidation – where the problem is to map VM(s) from different user requests, hence with different resource requirements, to physical resources to minimize the number of active servers to save energy. 3) Live Migration – where remapping decisions are made for live VMs. Our focus is on the first problem, since our research is directed towards infrastructure clouds (IaaS) such as Amazon EC2, where VM allocation and mapping happen as and when VM requests arrive. Offline VM consolidation has been extensively researched [24], [25], but is not applicable to IaaS. Also, live migration has associated costs and introduces further noise.

For initial VM placement, existing cloud management systems such as OpenStack [17], Eucalyptus [26], and OpenNebula [27] use Round Robin (next available server), First Fit (first available server), or Greedy Ranking based (best fit according to certain criteria, e.g. least free RAM) strategies, which operate in one dimension (CPU or memory). Other researchers have proposed genetic algorithms [28]. A detailed description and validation of VM consolidation heuristics is provided in [18]. However, these techniques ignore the intrinsic nature of HPC VMs – tightly coupledness.
Fan et al. discuss topology-aware deployment for scientific applications in cloud, and map the communication topology of a parallel application to the VM physical topology [7]. Recently, the OpenStack community has been working on making the scheduler architecture-aware and suitable for HPC [8], [9]. Amazon EC2 has a Cluster Compute instance which allows placement groups such that all instances within a placement group are expected to get low latency and full bisection 10 Gbps bandwidth [29]. It is not known how strictly those guarantees are met and what techniques are used to meet them.

In this work, we extend our previous work on an HPC-aware scheduler [6] in multiple ways. First, we use multi-dimensional online bin packing (MDOBP) to consider resources along all dimensions (such as CPU and memory). MDOBP algorithms have been explored in offline VM consolidation research, but we apply them to the initial VM placement problem and in the context of HPC in cloud. Second, we leverage additional knowledge about the application characteristics, such as HPC or non-HPC, synchronization, communication, and cache characteristics, to limit cross-application interference. We got insights from studies which have explored the effects of a shared multi-core node on cross-VM interference, both in HPC and non-HPC domains [24], [30], [31].

There are many tools for scheduling HPC jobs on clusters, such as Oracle Grid Engine, ALPS, OpenPBS, SLURM, TORQUE, and Condor. They are all job schedulers or resource management systems for cluster or grid environments, and aim to utilize system resources in an efficient manner. They differ from scheduling on cloud since they work with physical, not virtual, machines, and hence cannot benefit from the traits of virtualization such as consolidation. Nodes are typically allotted to a single user and not shared with other users.
IX. LESSONS, CONCLUSIONS AND FUTURE WORK

We summarize the lessons learned through this research:
- Although it may be counterintuitive, HPC can benefit greatly from consolidating VMs using smart co-locations.
- A cloud management system such as OpenStack would greatly benefit from a scheduler which is aware of application characteristics such as cache, synchronization and communication behavior, and HPC vs. non-HPC.
- Careful VM placement and execution of HPC and other workloads can result in better resource utilization, cost reduction, and hence broader acceptance of HPC clouds.

Through experimental research, we explored the opportunities and challenges of VM consolidation for HPC in cloud. We designed and implemented an HPC-aware scheduling algorithm for VM placement which achieves better resource utilization and limits cross-application interference through careful co-location. Through experimental and simulation results, we demonstrated benefits of up to a 32% increase in job throughput and performance improvement up to 45% while limiting the effect of jitter to 8%. In future, we plan to consider other factors which can affect the performance of a VM in a shared multi-core node, such as I/O (network and disk). Another direction of research is to address other challenges for the adoption of cloud by the HPC community.
ACKNOWLEDGMENTS

We thank Arpita Kundu, Ivan Nithin, Chaitra Padmanabhan, Jiban J. Sarma and R Suryaprakash for setting up the cloud environment and helping with the scheduler implementation. We thank Alex Zhang for the discussions on server consolidation. The first author was supported by HP Labs' 2012 IRP award.
REFERENCES

[1] E. Walker, "Benchmarking Amazon EC2 for High-Performance Scientific Computing," LOGIN, pp. 18–23, 2008.
[2] "Magellan Final Report," U.S. Department of Energy (DOE), Tech. Rep., 2011, http://science.energy.gov//media/ascr/pdf/program-documents/docs/MagellanFinalReport.pdf.
[3] A. Gupta and D. Milojicic, "Evaluation of HPC Applications on Cloud," in Open Cirrus Summit (Best Student Paper), Atlanta, GA, Oct. 2011, pp. 22–26. [Online]. Available: http://dx.doi.org/10.1109/OCS.2011.10
[4] A. Gupta et al., "Exploring the Performance and Mapping of HPC Applications to Platforms in the Cloud," in HPDC '12. New York, NY, USA: ACM, 2012, pp. 121–122.
[5] "MPI: A Message Passing Interface Standard," in M.P.I. Forum, 1994.
[6] A. Gupta, D. Milojicic, and L. Kale, "Optimizing VM Placement for HPC in Cloud," in Workshop on Cloud Services, Federation and the 8th Open Cirrus Summit, San Jose, CA, 2012.
[7] P. Fan, Z. Chen, J. Wang, Z. Zheng, and M. R. Lyu, "Topology-Aware Deployment of Scientific Applications in Cloud Computing," Cloud Computing, IEEE International Conference on, vol. 0, 2012.
[8] "Nova Scheduling Adaptations," http://xlcloud.org/bin/download/Download/Presentations/Workshop26072012Scheduler.pdf.
[9] "Heterogeneous Architecture Scheduler," http://wiki.openstack.org/HeterogeneousArchitectureScheduler.
[10] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms," Softw. Pract. Exper., vol. 41, no. 1, pp. 23–50, Jan. 2011.
[11] "Amazon Elastic Compute Cloud," http://aws.amazon.com/ec2.
[12] "Parallel Workloads Archive." [Online]. Available: http://www.cs.huji.ac.il/labs/parallel/workload/
[13] "Exascale Challenges," http://science.energy.gov/ascr/research/scidac/exascale-challenges.
[14] A. Avetisyan et al., "Open Cirrus: A Global Cloud Computing Testbed," IEEE Computer, vol. 43, pp. 35–43, April 2010.
[15] "NAS Parallel Benchmarks," http://www.nas.nasa.gov/Resources/Software/npb.html.
[16] P. Jetley, F. Gioachin, C. Mendes, L. V. Kale, and T. R. Quinn, "Massively Parallel Cosmological Simulations with ChaNGa," in IPDPS, 2008, pp. 1–12.
[17] "OpenStack Cloud Computing Software," http://openstack.org.
[18] S. Lee, R. Panigrahy, V. Prabhakaran, V. Ramasubramanian, K. Talwar, L. Uyeda, and U. Wieder, "Validating Heuristics for Virtual Machines Consolidation," Microsoft Research, Tech. Rep., 2011.
[19] "KVM – Kernel-based Virtual Machine," Redhat, Inc., Tech. Rep., 2009.
[20] A. J. Younge, R. Henschel, J. T. Brown, G. von Laszewski, J. Qiu, and G. C. Fox, "Analysis of Virtualization Technologies for High Performance Computing Environments," Cloud Computing, IEEE Intl. Conf. on, vol. 0, pp. 9–16, 2011.
[21] A. Bhatele, S. Kumar, C. Mei, J. C. Phillips, G. Zheng, and L. V. Kale, "Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms," in IPDPS 2008, April 2008, pp. 1–12.
[22] L. Kale, "The Chare Kernel Parallel Programming Language and System," in Proceedings of the International Conference on Parallel Processing, vol. II, Aug. 1990, pp. 17–25.
[23] L. Kale and A. Sinha, "Projections: A Scalable Performance Tool," in Parallel Systems Fair, International Parallel Processing Symposium, Apr. 1993.
[24] A. Verma, P. Ahuja, and A. Neogi, "Power-aware Dynamic Placement of HPC Applications," ser. ICS '08. New York, NY, USA: ACM, 2008, pp. 175–184.
[25] S. K. Garg, C. S. Yeo, A. Anandasivam, and R. Buyya, "Energy-Efficient Scheduling of HPC Applications in Cloud Computing Environments," CoRR, vol. abs/0909.1146, 2009.
[26] D. Nurmi et al., "The Eucalyptus Open-source Cloud-computing System," in Proceedings of Cloud Computing and Its Applications, 2008.
[27] "The Cloud Data Center Management Solution," http://opennebula.org.
[28] J. Xu and J. A. B. Fortes, "Multi-Objective Virtual Machine Placement in Virtualized Data Center Environments," ser. GREENCOM-CPSCOM '10. Washington, DC, USA: IEEE Computer Society, pp. 179–188.
[29] "High Performance Computing (HPC) on AWS," http://aws.amazon.com/hpc-applications.
[30] J. Mars et al., "Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations," ser. MICRO-44 '11. New York, NY, USA: ACM, 2011, pp. 248–259.
[31] J. Han, J. Ahn, C. Kim, Y. Kwon, Y.-R. Choi, and J. Huh, "The Effect of Multi-core on HPC Applications in Virtualized Systems," ser. Euro-Par 2010. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 615–623.