Measuring Energy and Power with PAPI

Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore
Innovative Computing Laboratory, University of Tennessee
{vweaver1,mrj,kirankk,ralph,luszczek,terpstra,shirley}@eecs.utk.edu

Abstract: Energy and power consumption are becoming critical metrics in the design and usage of high performance systems. We have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance counters to also measure power and energy. Higher level tools that build on PAPI will automatically gain support for power and energy readings when used with the newest version of PAPI. We describe in detail the types of energy and power readings available through PAPI. We support external power meters, as well as values provided internally by recent CPUs and GPUs. Measurements are provided directly to the instrumented process, allowing immediate code analysis in real time. We provide examples showing results that can be obtained with our infrastructure.
Index Terms: energy measurement; power measurement; performance analysis

I. INTRODUCTION

The Performance API (PAPI) [1] framework has traditionally provided low-level cross-platform access to the hardware performance counters available on most modern CPUs. With the advent of component PAPI (PAPI-C) [2], PAPI has been extended to provide a wider variety of performance data from various sources. Recently a number of new components have been added that provide the ability to measure a system's energy and power usage.

Energy and power have become increasingly important components of overall system behavior in high-performance computing (HPC). Power and energy concerns were once primarily of interest to embedded developers. Now that HPC machines have hundreds of thousands of cores [3], the ability to reduce consumption by just a few Watts per CPU quickly adds up to major power, cooling, and monetary savings. There has been a lot of HPC interest in this area recently, including the Green500 [4] list of energy-efficient supercomputers.

PAPI's ability to be extended by components allows adding support for energy and power measurements without any changes needed to the core infrastructure. Existing code that is already instrumented for measuring performance counters can be re-used; the new power and energy events will show up in event listings just like other performance events, and can be measured with the same existing PAPI API. This will allow current users of PAPI on HPC systems to analyze power and energy with little additional effort.

There are many existing tools that provide access to power and energy measurements (often these come with the power measuring hardware). PAPI's advantage is that it allows measuring a diverse set of hardware with one common interface. Users only instrument their code once, and then can use it with minimal changes as their code is moved between different machines with different hardware. Without PAPI the instrumented code would have to be re-written depending on what power measurement hardware it is running on.

Another benefit of PAPI is that in addition to measuring energy and power, it also provides access to other values, such as CPU performance counters, GPU counters, network, and I/O. All of these can be measured at the same time, providing for a richer analysis environment. Many of the other advanced PAPI features, such as sampling and profiling, can potentially be used in conjunction with these new power and energy events.

Higher-level tools that build on top of PAPI (such as TAU [5], HPCToolkit [6], or Vampir [7]) automatically get support for these new measurements as soon as they are paired with an updated PAPI version.

We will describe in detail the various types of power and energy measurements that will be available in the PAPI 5.0 release, as well as showing examples of the data that can be gathered.
II. RELATED WORK

There are various existing tools that provide access to power and energy values. In general these tools do not have a cross-platform API like PAPI, nor are they deployed as widely. PAPI has the benefit of allowing energy measurements at the same time as CPU and other performance counter measurements, allowing analysis of low-level energy behavior at the source code level. PAPI can also act as an abstraction library, so most of the tools listed below could be given PAPI component interfaces.

The tool that provides the most similar functionality to PAPI is the Intel Energy Checker SDK [8]. It provides an API for instrumenting code and gathering energy information from a variety of external power meters and system counters. It provides support for various operating systems, but is limited to Intel architectures.

PowerPack [9] provides an interface for measuring power from a variety of external power sources. The API provides routines for starting and stopping the gathering of data on the remote machine. Unlike PAPI, the measurements are gathered out-of-band (on a separate machine) and thus cannot be directly provided to the running process in real time.

Appeared in the 2012 PASA Workshop

IBM PowerExecutive [10] allows monitoring power and energy on IBM blade servers. As with PowerPack, the data is gathered and analyzed by a tool (in this case IBM Director) running on a separate machine.

Shin et al. [11] construct a power board for an ARM system that estimates power and communicates with a front-end tool via PCI. Various tools are described that use the gathered information, but there is not a generic API for accessing it.

The Linux Energy Attribution and Accounting Platform (LEA2P) [12] acquires data on a system with hardware custom-modified to provide power readings via a data acquisition board. These values are passed into the Linux kernel and made available via the /proc filesystem and can be read in-band.

PowerScope [13] uses a digital multimeter to perform off-line analysis using statistical sampling. It provides a kernel-level interface (via system calls) to start and stop measurements; this requires modifying the operating system. The benefit of this system is that power information is kept in the process table, allowing one to map energy usage in a detailed per-process way.

The Energy Endoscope [14] is an embedded wireless sensor network that provides detailed real-time energy measurements via a custom-designed helper chip. The Linux kernel is modified to report energy in /proc/stat along with other processor stats.

Isci and Martonosi [15] combine external power meter measurements with performance counter results to generate power readings with a modeled CPU. The readings are gathered on an external machine.

Bellosa [16] proposes Joule Watcher, an infrastructure that uses hardware performance counters to estimate power and provide this information to the kernel for scheduling decisions. He proposes a generic API to provide this information to users.
III. BACKGROUND

PAPI users have recently become more concerned with energy and power measurements. Part of this is due to the addition of embedded system support (including ARM and MIPS processors) and part is from the current interest in energy efficiency in PAPI's traditional HPC environment.

With PAPI-C (component PAPI) it is straightforward to add extra PAPI "components" that report values outside of the usual hardware performance counters that were long the mainstay of PAPI. The PAPI API returns unsigned 64-bit integers; as long as a power or energy value can fit that constraint, no changes at all need to be made to existing PAPI code.

A. New PAPI Interfaces

The existing PAPI interface is sufficient for providing power and energy values, but the recent PAPI 5.0 release adds many features that improve the collection of this information.

The most important new feature is enhanced event information support. The user can query an event and obtain far richer details than were available previously. The new interface allows specifying units for a returned value, allowing a user to know if the values they are getting are in "Watts", "Joules" or perhaps even "nano-Joules" without having to look in the system documentation.

Another new feature is the ability to return values other than unsigned integers, including floating point. This allows returning power values in human-friendly amounts such as 96.45 Watts rather than 96450 milliwatts.

Additional event information is provided that will help external tools analyze the results, especially when trying to correlate power results with other measurements. PAPI now provides the frequency with which the value is updated and whether the value returned is instantaneous (like an average power reading) or cumulative (total energy).
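To illustrate why units and the instantaneous/cumulative distinction matter to a tool consuming these values, here is a minimal Python sketch. The `EventInfo` class, its field names, and the event names are hypothetical stand-ins chosen for this example; they are not PAPI's actual event information structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInfo:
    """Hypothetical event metadata; NOT PAPI's actual structure."""
    name: str
    units: str        # e.g. "W", "mW", "J", "nJ"
    cumulative: bool  # True for running totals (energy), False for instantaneous values


# Scale factors that convert a reading into base units (Watts or Joules).
_SCALE = {"W": 1.0, "mW": 1e-3, "J": 1.0, "nJ": 1e-9}


def to_base_units(info: EventInfo, raw: float) -> float:
    """Convert a raw reading to Watts or Joules using the declared units."""
    return raw * _SCALE[info.units]


def interval_value(info: EventInfo, previous: float, current: float) -> float:
    """Value attributable to the interval between two consecutive reads.

    A cumulative event needs the difference of two reads; an instantaneous
    event is meaningful on its own, so only the latest read is used.
    """
    if info.cumulative:
        return to_base_units(info, current - previous)
    return to_base_units(info, current)
```

With this sketch, a reading of 96450 on an event declared in "mW" converts to 96.45 Watts, while two reads of a cumulative "nJ" event are subtracted before scaling.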
B. Limitations

There are some limitations when measuring power and energy using PAPI. Typically these readings are system-wide: it is not possible to map the results exactly to the user's code, especially on multi-core systems.

Often a user is interested in knowing where the power usage comes from: power supply inefficiencies, the CPU, network card, memory, etc. With external power meters it is not possible to break down the full-system power measurements into per-component values. Since power optimization for various hardware components requires different strategies, having only total system power might not provide enough information to allow optimization.

Ideally one could correlate power and energy with CPU and other PAPI measurements. This can be done; values can be measured at the same time (although in separate event sets). However, due to the nature of the measurements it is hard to get an exact correlation.

Another issue is that of measurement overhead. Since PAPI has to run on the system gathering the results, it contributes to the overall power budget of the system. Tools that measure power externally do not have this problem.
IV. PAPI ENERGY AND POWER COMPONENTS

The new PAPI 5.0 release adds support for various power and energy components.

PAPI components measure power and energy in-band: a program is instrumented with PAPI calls and can read measurement data into the running process. The data can be stored to disk for later offline analysis, but by default it is available for immediate action. This contrasts with other tools that only support out-of-band measurements: they can only analyze code at a later time, and the program being profiled is not aware of its current power or energy status.

We use linear algebra routines that perform one-sided factorization of dense matrices to compare various methods of measuring energy. In particular, we test Cholesky factorization from PLASMA [17] on the processor side and LU factorization on the GPU using MAGMA [18]. Both of these are computationally bound and thus show variable power draw by the computing device: either CPU or GPU. Our tests also show memory effects by including memory bound operations such as filling the matrices with initial values.

[Figure 1 plot: Power (Watts) versus Time (seconds); series: CPU, Memory, Motherboard, Fan.]
Fig. 1. PLASMA Cholesky power usage gathered by PowerPack (not PAPI). Results were gathered out-of-band; PAPI can gather similar data in-band.

For comparison purposes, Figure 1 shows PLASMA Cholesky results gathered with PowerPack [9] (not PAPI) on a machine custom-wired for power measurement. Results are gathered on an unrelated machine (which has the advantage of not including the overhead of the measurement in the power readings). We show that PAPI can generate similar results from a variety of power measurement devices.
A. External Measurement

The most common type of power measurement infrastructure is one where an external power meter is used. For PAPI to access the data, the values have to be passed back to the machine being measured. This is usually done via a serial or USB connection. The easiest type of equipment to use in this case is one where a power pass-through is used; this device looks like a power strip, and allows measuring the power consumption of anything plugged into the device. More intrusive full-system instrumentation can be done, where wires are hooked into power supplies, disks, processor sockets, and DIMM sockets. This enables fine-grained power measurement but usually requires extensive installation costs.

1) Watt's Up Pro Power Meter: The Watt's Up Pro power meter is an external measurement device that a system plugs into instead of a wall outlet; it provides various measurements via a USB serial connection. The metrics collected include average power, voltage, current, and various others. Energy can be derived based on the average power and time. The results are system-wide and low resolution, with updates only once a second.
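Deriving energy from average power and time amounts to summing power multiplied by the sampling interval. A minimal sketch follows, assuming each sample is the average power in Watts over the preceding interval; the function name and the sample values in the usage note are illustrative only.

```python
def energy_from_power_samples(samples_watts, interval_s=1.0):
    """Approximate energy in Joules as the sum of (average power x interval).

    Assumes each sample is the average power over one interval of
    `interval_s` seconds, as with a meter that updates once a second.
    """
    return sum(p * interval_s for p in samples_watts)
```

For example, three one-second samples of 40 W, 50 W, and 55 W give an estimate of 145 J. With one-second updates, any change in power shorter than a second is averaged away, which is why the derived energy is only as detailed as the meter's update rate.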
Writing a PAPI driver for this device is nontrivial, as the results become available every second whether requested or not. Data can be lost if the on-board logging memory is full and a read does not happen in the one-second time window. Since PAPI users cannot be expected to have their code interrupt itself once a second to measure data, the PAPI component forks a helper thread that reads the data on a regular basis, and then returns overall values when an instrumented program requests them.
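The helper-thread pattern can be sketched in a few lines of Python. This is an illustration of the general idea only, not the code of the PAPI component: the class name is hypothetical, and `read_power_watts` stands in for whatever routine actually talks to the meter.

```python
import threading
import time


class BackgroundPowerSampler:
    """Polls a power source on a helper thread and keeps running totals."""

    def __init__(self, read_power_watts, interval_s=1.0):
        self._read = read_power_watts  # assumed: callable returning Watts
        self._interval = interval_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._energy_j = 0.0
        self._samples = 0

    def _run(self):
        last = time.monotonic()
        # Event.wait() is an interruptible sleep; it returns True after stop().
        while not self._stop.wait(self._interval):
            watts = self._read()
            now = time.monotonic()
            with self._lock:
                self._energy_j += watts * (now - last)
                self._samples += 1
            last = now

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def totals(self):
        """Return (energy in Joules, number of samples) gathered so far."""
        with self._lock:
            return self._energy_j, self._samples
```

The instrumented program only calls `start()`, `totals()`, and `stop()`; the polling happens independently, so no reading is missed because the application was busy.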
Some data gathered from a Watt's Up Pro device are shown in Figure 2. The results are coarse due to the one-second sampling interval of the device. This can be good enough for doing validation and global investigations, but probably not detailed enough when tuning code for energy efficiency. However, the general trends in power consumption for the code in question (Cholesky factorization from PLASMA [17]) are similar to the much finer-grain graph in Figure 1. In Figure 2 the initial spike in power consumption to about 50 W (two seconds into the run) represents data generation (creation of a random matrix) and corresponds to a flat ledge at about 130 W in Figure 1. Four seconds into the run, both figures indicate a fluctuation around the maximum power level for the whole run. The fluctuations are much more accurately portrayed in Figure 1, indicating the need for granularity substantially finer than the 1 second available for the Watt's Up Pro device.

2) PowerMon2: The PowerMon2 [19] card sits between a system's power supply and its various components. It measures voltage and current on 8 different lines, monitoring most of the power going into the computer. Measurements happen at a frequency of up to 3 kHz; this is multiplexed across a user-selected subset of the 8 channels. We are working on a PAPI component for this device, but support is currently not available. We foresee using this device to provide energy results at a detail not available with other external power meters.
B. Internal Measurement

Recent computer hardware includes support for measuring energy and power consumption internally. This allows fine-grained power analysis without having to custom-instrument the hardware.

[Figure 2 plot: Average Power (Watts) versus Time (seconds); PLASMA Cholesky Factorization, N=10,000, threads=2.]
Fig. 2. PLASMA Cholesky power gathered with a Watt's Up Pro device on an Intel Core2 laptop. Coarse results due to one-second sampling frequency.

Access to the measurements usually requires direct low-level hardware reads, although sometimes the operating system or a library will do this for you.

1) Intel RAPL: Recent Intel Sandy Bridge chips include the "Running Average Power Limit" (RAPL) interface, which is described in the Intel Software Developer's Manual [20]. RAPL's overall design goal is to provide an infrastructure for keeping processors inside of a given user-specified power envelope. The internal circuitry can estimate current energy usage based on a model driven by hardware counters, temperature, and leakage models. The results of this model are available to the user via a model specific register (MSR), with an update frequency on the order of milliseconds. The power model has been validated by Intel [21] to closely follow actual energy being used. PAPI provides access to the values returned by the power model.

Accessing MSRs requires ring-0 access to the hardware; typically only the operating system kernel can do this. This means accessing the RAPL values requires a kernel driver. Currently Linux does not provide such a driver; one has been proposed [22] but it is unlikely it will be merged into the main kernel tree anytime soon. To get around this problem, we use the Linux "MSR driver" that exports MSR access to userspace via a special device driver. If the MSR driver is enabled and given proper read-only permissions then PAPI can access these registers directly without needing kernel support.
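As a rough illustration of this access path, the Linux MSR driver exposes one device file per CPU (conventionally `/dev/cpu/<n>/msr`), and an 8-byte read at a file offset equal to the register number returns that register's value. The sketch below assumes this convention; the function name and the `path_template` parameter are illustrative, and the register numbers themselves should be taken from the manual [20].

```python
import os
import struct


def read_msr(register, cpu=0, path_template="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit MSR; the register number is used as the file offset.

    Requires the msr driver to be loaded and read permission on the
    device file. `path_template` is a parameter only so the function can
    be exercised against an ordinary file.
    """
    fd = os.open(path_template.format(cpu=cpu), os.O_RDONLY)
    try:
        data = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError("short read from MSR device")
    # MSR values are returned as a little-endian unsigned 64-bit integer.
    return struct.unpack("<Q", data)[0]
```

Because the read goes through an ordinary file descriptor, restricting the device file to read-only access is enough to let an unprivileged measurement tool read counters without being able to modify any register.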
There are some limitations to accessing RAPL this way. The results are system-wide values and cannot easily be attributed to individual threads. This is not worse than measurements of any shared resource; on modern Intel chips last level caches and the uncore events share this limitation.

RAPL reports various energy readings. This includes the energy usage for the total processor package and the total combined energy used by all the cores (referred to as Power-Plane 0 (PP0)). PP0 also includes all of the processor caches. Some versions of Sandy Bridge chips also report power usage by the on-board GPU (Power-Plane 1 (PP1)). Sandy Bridge EP chips do not support the GPU measurement, but instead report energy readings for the DRAM interface.
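The raw RAPL energy readings are counters in hardware-defined units, so a consumer has to scale them and take differences between reads. The sketch below reflects our understanding of the register layout (an energy unit of 1/2^ESU Joules with ESU in bits 12:8 of the power-unit register, and 32-bit energy counters that wrap); these details are assumptions for illustration and should be checked against the manual [20]. The function names are hypothetical.

```python
ENERGY_COUNTER_BITS = 32  # assumption: energy status counters are 32 bits wide


def energy_unit_joules(power_unit_msr):
    """Joules per counter increment: 1 / 2**ESU, ESU assumed in bits 12:8."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(prev_raw, cur_raw, unit_j):
    """Energy used between two raw reads, tolerating one counter wraparound."""
    mask = (1 << ENERGY_COUNTER_BITS) - 1
    return ((cur_raw - prev_raw) & mask) * unit_j


def average_power_watts(prev_raw, cur_raw, unit_j, elapsed_s):
    """Average power over the interval: energy difference divided by time."""
    return energy_delta_joules(prev_raw, cur_raw, unit_j) / elapsed_s
```

The same three functions apply to each energy domain (package, PP0, PP1, or DRAM); only the register that supplies the raw counter differs. Reads must be frequent enough that the counter cannot wrap more than once between them.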
While the RAPL values can be measured in-band and consumed by the program, since RAPL is system-wide a separate process may be used to measure energy and power. In this way the running code does not need to be instrumented and some of the PAPI overhead can be avoided. We use this method to gather the results presented.

We take measurements on a Sandy Bridge EP machine. It has 2 CPU packages, each with 8 cores, and each core with 2 threads. Figure 3 shows some average power measurements gathered while doing Cholesky factorization using the PLASMA library. Notice that the energy usage by each package varies, despite all of the cores doing similar work. Part of this is likely due to variations in the cores at the silicon level, as noticed by Rountree et al. [23]. Figure 4 shows the same measurements using the Intel MKL library [24].

Figure 5 shows some energy measurements comparing the same Cholesky factorization using both PLASMA and Intel MKL on the same hardware. The PAPI results show that for this case, PLASMA uses energy more quickly, but finishes faster and uses less total energy for the calculation.
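The reasoning behind this observation is that total energy is average power multiplied by run time, so a run that draws more power can still use less energy if it finishes sooner. The sketch below makes that arithmetic explicit; the numbers in the usage note are invented for illustration and are not the measured values from Figure 5.

```python
def total_energy_joules(average_power_w, runtime_s):
    """Total energy of a run: average power multiplied by run time."""
    return average_power_w * runtime_s


def lower_energy_run(runs):
    """Return the name of the run with the lowest total energy.

    `runs` maps a run name to an (average power in W, run time in s) pair.
    """
    return min(runs, key=lambda name: total_energy_joules(*runs[name]))
```

With made-up inputs of 150 W for 30 s versus 130 W for 40 s, the first run draws more power but totals 4500 J against 5200 J for the second, so it is the more energy-efficient of the two.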
2) AMD Application Power Management: Recent AMD Family 15h processors can report "Current Power In Watts" [25] via the "Processor Power in TDP" MSR. We are investigating PAPI support for this and hope to deploy a component similar in nature and scope to the Intel RAPL component.

[Figure 3 plot: Average Power (Watts) versus Time (seconds); PLASMA Cholesky Factorization, N=30,000, threads=16; series: DRAM Package 0, DRAM Package 1, PP0 Package 0, PP0 Package 1, Total Package 0, Total Package 1.]
Fig. 3. PLASMA Cholesky power usage measured with RAPL on Sandy Bridge EP. Power Plane 0 (PP0) is total usage for all 8 cores in a package.

[Figure 4 plot: Average Power (Watts) versus Time (seconds); MKL Cholesky Factorization, N=30,000, threads=16; series: DRAM Package 0, DRAM Package 1, PP0 Package 0, PP0 Package 1, Total Package 0, Total Package 1.]
Fig. 4. Intel MKL Cholesky power usage measured with RAPL on Sandy Bridge. Power Plane 0 (PP0) is total usage for all 8 cores in a package.

[Figure 5 plot: Total Energy (Joules) versus Time (seconds); Cholesky Factorization, N=30,000, threads=16; series: PLASMA Package 0, PLASMA Package 1, MKL Package 0, MKL Package 1.]
Fig. 5. Energy usage of two different implementations (PLASMA and MKL) of Cholesky on Sandy Bridge EP measured with RAPL.

[Figure 6 plot: Average Power (Watts) versus Time (seconds).]
Fig. 6. MAGMA LU with size 10,000 power measurement on an Nvidia Fermi C2075, gathered with NVML.
3) NVIDIA Management Library: Recent NVIDIA GPUs can report power usage via the NVIDIA Management Library (NVML) [26]. The nvmlDeviceGetPowerUsage() routine exports the current power; on Fermi C2075 GPUs it has milliwatt resolution within ±5 W and is updated at roughly 60 Hz. The power reported is that for the entire board, including GPU and memory.

Gathering detailed performance information from a GPU is difficult: once you dispatch code to a GPU the running CPU has no control over it until the GPU returns upon completion. This means that it is not generally possible to determine which GPU code corresponds to which power readings.

Nvidia provides a high-level utility called nvidia-smi which can be used to measure power, but its sampling interval is too long to obtain useful measurements. In order to provide better power measurements we have constructed an NVML component [27] for PAPI and have validated the results using a "Kill-A-Watt" power meter.

Figure 6 shows data gathered on an Nvidia Fermi C2075 card running a MAGMA [28] kernel using the LU algorithm [29] with a matrix size of 10k.

The MAGMA LU factorization is a compute bound algorithm (expressed in terms of GEMMs); it uses a hybridization methodology to split the computation between the CPU host and GPU. The split aims to match LU's algorithmic requirements to the architectural strengths of the GPU and the CPU. In the case of LU, this translates into having all matrix-matrix (GEMM) multiplication done on the GPU, and the panel factorizations on the CPU. For big enough matrices, the design of the algorithm allows the CPU work to be totally overlapped with the large matrix-matrix multiplications on the GPU. As a result, the MAGMA LU algorithm runs at the speed of performing GEMMs on the GPU.

Our experiments have shown that MAGMA GEMM operations on the GPU utilize it completely, maximizing the power consumption. This explains why the hybrid LU factorization also maximizes the GPU power consumption, which reduces the time taken so that the overall energy consumption is minimized.
C. Estimated Power

Various researchers have proposed using hardware performance counters to model energy and power consumption [15], [30], [31], [32], [33], [16], [34], [35], [36]. Goel et al. [36] have shown that power can be modeled to within 10% using just four hardware performance counters.

Using the PAPI user-defined events infrastructure [37] an event can be created that derives an estimated power value from the hardware counters. This can be used to measure power on systems that do not have hardware power measurement available.
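One common shape for such a derived estimate is a linear model: a constant idle power plus a weighted sum of event rates. The sketch below shows that generic form only. It is not the model of Goel et al. [36] or of any other cited work, and the coefficients, counter choices, and function names are hypothetical; real coefficients would have to be fitted against measured power on the target machine.

```python
def rates_from_counts(prev_counts, cur_counts, elapsed_s):
    """Convert two snapshots of counter values into events per second."""
    return [(c - p) / elapsed_s for p, c in zip(prev_counts, cur_counts)]


def estimate_power_watts(counter_rates, coefficients, idle_watts):
    """Generic linear model: idle power plus a weighted sum of event rates.

    `coefficients` (Watts per event-per-second) and `idle_watts` are
    hypothetical inputs that must be calibrated for a specific system.
    """
    if len(counter_rates) != len(coefficients):
        raise ValueError("one coefficient is needed per counter rate")
    weighted = sum(c * r for c, r in zip(coefficients, counter_rates))
    return idle_watts + weighted
```

The estimate is only as good as its calibration, so it complements rather than replaces the hardware measurements described above.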
V. CONCLUSION

The PAPI library can now provide transparent access to power and energy measurements via existing interfaces. Existing programs that already have instrumentation for PAPI for CPU performance measurements can quickly be adapted to measure power, and existing tools will gain access to the new power events with a simple PAPI upgrade.

With larger and larger clusters being built, energy consumption has become one of the defining constraints. PAPI has been continually extended to provide support for the most up-to-date performance measurements on modern systems. The addition of power and energy measurements allows PAPI users to stay on top of this increasingly important area in the always rapidly changing HPC environment.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 0910899 and the U.S. Department of Energy Office of Science under contract DE-FC02-06ER25761.
REFERENCES

[1] S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, "A portable programming interface for performance evaluation on modern processors," International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189–204, 2000.
[2] D. Terpstra, H. Jagode, H. You, and J. Dongarra, "Collecting performance data with PAPI-C," in 3rd Parallel Tools Workshop, 2009, pp. 157–173.
[3] "Top 500 supercomputing sites," http://www.top500.org/.
[4] "Top green500 list :: Environmentally responsible supercomputing," http://www.green500.org/.
[5] S. Shende and A. Malony, "The Tau parallel performance system," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[6] L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent, "HPCToolkit: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 685–701, 2010.
[7] W. Nagel, A. Arnold, M. Weber, H.-C. Hoppe, and K. Solchenbach, "VAMPIR: Visualization and analysis of MPI resources," Supercomputer, vol. 12, no. 1, pp. 69–80, 1996.
[8] Intel, Intel Energy Checker: Software Developer Kit User Guide, 2010.
[9] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. Cameron, "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, May 2010.
[10] P. Popa, "Managing server energy consumption using IBM PowerExecutive," IBM Systems and Technology Group, Tech. Rep., 2006.
[11] D. Shin, H. Shim, Y. Joo, H.-S. Yun, J. Kim, and N. Chang, "Energy-monitoring tool for low-power embedded programs," IEEE Design & Test of Computers, vol. 19, no. 4, pp. 7–17, July/August 2002.
[12] S. Ryffel, "LEA2P: The linux energy attribution and accounting platform," Master's thesis, Swiss Federal Institute of Technology, Jan. 2009.
[13] J. Flinn and M. Satyanarayanan, "PowerScope: a tool for profiling the energy usage of mobile applications," in Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 2–10.
[14] T. Stathopoulos, D. McIntire, and W. Kaiser, "The energy endoscope: Real-time detailed energy accounting for wireless sensor nodes," in Proc. of the International Conference on Information Processing in Sensor Networks, Apr. 2008, pp. 383–394.
[15] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in Proc. IEEE/ACM 36th Annual International Symposium on Microarchitecture, Dec. 2003.
[16] F. Bellosa, "The benefits of event: driven energy accounting in power-sensitive systems," in Proceedings of the 9th workshop on ACM SIGOPS European workshop, 2000.
[17] PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, University of Tennessee Knoxville, Nov. 2010.
[18] S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense linear algebra solvers for multicore with GPU accelerators," in Proc. 24th IEEE/ACM International Parallel and Distributed Processing Symposium, Apr. 2010.
[19] D. Bedard, R. Fowler, M. Linn, and A. Porterfield, "PowerMon2: Fine-grained, integrated power measurement," Renaissance Computing Institute, Tech. Rep. TR-09-04, 2009.
[20] Intel, Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2009.
[21] E. Rotem, A. Naveh, D. Rajwan, A. Anathakrishnan, and E. Weissmann, "Power-management architecture of the Intel microarchitecture code-named Sandy Bridge," IEEE Micro, vol. 32, no. 2, pp. 20–27, 2012.
[22] Z. Rui. (2011, May) [patch 2/3] introduce intel rapl driver. linux-kernel mailing list. [Online]. Available: http://thread.gmane.org/gmane.linux.kernel/1145973
[23] B. Rountree, D. Ahn, B. de Supinski, D. Lowenthal, and M. Schulz, "Beyond DVFS: A first look at performance under a hardware-enforced power bound," in Proc. of 8th Workshop on High-Performance, Power-Aware Computing, May 2012.
[24] Intel, Intel Math Kernel Library (MKL), http://www.intel.com/software/products/mkl/.
[25] AMD, AMD Family 15h Processor BIOS and Kernel Developer Guide, 2011.
[26] NVML Reference Manual, NVIDIA, 2012.
[27] K. Kasichayanula, "Power aware computing on GPUs," Master's thesis, University of Tennessee, Knoxville, May 2012.
[28] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, "Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs," LAPACK Working Note 230.
[29] S. Yamazaki, S. Tomov, and J. Dongarra, "One-sided dense matrix factorizations on a multicore with multiple GPU accelerators," in Proc. of the 2012 International Conference on Computational Science, Jun. 2012.
[30] K. Singh, M. Bhadauria, and S. McKee, "Real time power estimation of multi-cores via performance counters," Proc. Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Nov. 2008.
[31] I. Kadayif, T. Chinoda, M. Kandemir, N. Vijaykirsnan, M. Irwin, and A. Sivasubramaniam, "vEC: virtual energy counters," in Proc. of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, Jun. 2001.
[32] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: a first step towards software power minimization," IEEE Transactions on VLSI, vol. 3, no. 4, pp. 437–445, 1994.
[33] J. Russell and M. Jacome, "Software power estimation and optimization for high performance, 32-bit embedded processors," in Proc. IEEE International Conference on Computer Design, Oct. 1998, pp. 328–333.
[34] R. Joseph and M. Martonosi, "Run-time power estimation in high-performance microprocessors," in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2001, pp. 135–140.
[35] J. Haid, G. Kaefer, C. Steger, and R. Weiss, "Run-time energy estimation in system-on-a-chip designs," in Proc. of the Asia and South Pacific Design Automation Conference, Jan. 2003, pp. 595–599.
[36] B. Goel, S. McKee, R. Gioiosa, K. Singh, M. Bhadauria, and M. Cesati, "Portable, scalable, per-core power estimation for intelligent resource management," in First International Green Computing Conference, Aug. 2010.
[37] S. Moore and J. Ralph, "User-defined events for hardware performance monitoring," in Proc. 11th Workshop on Tools for Program Development and Analysis in Computational Science, Jun. 2011.