Measuring Energy and Power with PAPI

Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore
Innovative Computing Laboratory, University of Tennessee
{vweaver1,mrj,kirankk,ralph,luszczek,terpstra,shirley}@eecs.utk.edu

Abstract—Energy and power consumption are becoming critical metrics in the design and usage of high-performance systems. We have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance counters to also measure power and energy. Higher-level tools that build on PAPI will automatically gain support for power and energy readings when used with the newest version of PAPI. We describe in detail the types of energy and power readings available through PAPI. We support external power meters, as well as values provided internally by recent CPUs and GPUs. Measurements are provided directly to the instrumented process, allowing immediate code analysis in real time. We provide examples showing results that can be obtained with our infrastructure.
Index Terms—energy measurement; power measurement; performance analysis

I. INTRODUCTION

The Performance API (PAPI) [1] framework has traditionally provided low-level cross-platform access to the hardware performance counters available on most modern CPUs. With the advent of component PAPI (PAPI-C) [2], PAPI has been extended to provide a wider variety of performance data from various sources. Recently a number of new components have been added that provide the ability to measure a system's energy and power usage.

Energy and power have become increasingly important components of overall system behavior in high-performance computing (HPC). Power and energy concerns were once primarily of interest to embedded developers. Now that HPC machines have hundreds of thousands of cores [3], the ability to reduce consumption by just a few Watts per CPU quickly adds up to major power, cooling, and monetary savings. There has been much recent HPC interest in this area, including the Green500 [4] list of energy-efficient supercomputers.

PAPI's ability to be extended by components allows adding support for energy and power measurements without any changes to the core infrastructure. Existing code that is already instrumented for measuring performance counters can be reused; the new power and energy events show up in event listings just like other performance events, and can be measured with the same existing PAPI API. This allows current users of PAPI on HPC systems to analyze power and energy with little additional effort.
There are many existing tools that provide access to power and energy measurements (often these come with the power-measuring hardware). PAPI's advantage is that it allows measuring a diverse set of hardware with one common interface. Users instrument their code only once, and can then use it with minimal changes as the code is moved between machines with different hardware. Without PAPI, the instrumented code would have to be rewritten for each type of power measurement hardware it runs on.

Another benefit of PAPI is that in addition to measuring energy and power, it also provides access to other values, such as CPU performance counters, GPU counters, network, and I/O. All of these can be measured at the same time, providing a richer analysis environment. Many of the other advanced PAPI features, such as sampling and profiling, can potentially be used in conjunction with these new power and energy events. Higher-level tools that build on top of PAPI (such as TAU [5], HPCToolkit [6], or Vampir [7]) automatically get support for these new measurements as soon as they are paired with an updated PAPI version.

We describe in detail the various types of power and energy measurements that will be available in the PAPI 5.0 release, and show examples of the data that can be gathered.
II. RELATED WORK

There are various existing tools that provide access to power and energy values. In general these tools do not have a cross-platform API like PAPI, nor are they deployed as widely. PAPI has the benefit of allowing energy measurements at the same time as CPU and other performance counter measurements, allowing analysis of low-level energy behavior at the source-code level. PAPI can also act as an abstraction library, so most of the tools listed below could be given PAPI component interfaces.
The tool that provides the most similar functionality to PAPI is the Intel Energy Checker SDK [8]. It provides an API for instrumenting code and gathering energy information from a variety of external power meters and system counters. It provides support for various operating systems, but is limited to Intel architectures.
PowerPack [9] provides an interface for measuring power from a variety of external power sources. The API provides routines for starting and stopping the gathering of data on the remote machine. Unlike PAPI, the measurements are gathered out-of-band (on a separate machine) and thus cannot be directly provided to the running process in real time.
Appeared in the 2012 PASA Workshop

IBM PowerExecutive [10] allows monitoring power and energy on IBM blade servers. As with PowerPack, the data is gathered and analyzed by a tool (in this case IBM Director) running on a separate machine.
Shin et al. [11] construct a power board for an ARM system that estimates power and communicates with a front-end tool via PCI. Various tools are described that use the gathered information, but there is no generic API for accessing it.
The Linux Energy Attribution and Accounting Platform (LEA2P) [12] acquires data on a system with hardware custom-modified to provide power readings via a data acquisition board. These values are passed into the Linux kernel and made available via the /proc file system, and can be read in-band.
PowerScope [13] uses a digital multimeter to perform off-line analysis using statistical sampling. It provides a kernel-level interface (via system calls) to start and stop measurements; this requires modifying the operating system. The benefit of this system is that power information is kept in the process table, allowing one to map energy usage in a detailed per-process way.
The Energy Endoscope [14] is an embedded wireless sensor network that provides detailed real-time energy measurements via a custom-designed helper chip. The Linux kernel is modified to report energy in /proc/stat along with other processor stats.
Isci and Martonosi [15] combine external power meter measurements with performance counter results to generate power readings with a modeled CPU. The readings are gathered on an external machine.
Bellosa [16] proposes JouleWatcher, an infrastructure that uses hardware performance counters to estimate power and provides this information to the kernel for scheduling decisions. He proposes a generic API to provide this information to users.
III. BACKGROUND

PAPI users have recently become more concerned with energy and power measurements. Part of this is due to the addition of embedded system support (including ARM and MIPS processors) and part is from the current interest in energy efficiency in PAPI's traditional HPC environment. With PAPI-C (component PAPI) it is straightforward to add extra PAPI "components" that report values outside of the usual hardware performance counters that were long the mainstay of PAPI. The PAPI API returns unsigned 64-bit integers; as long as a power or energy value can fit that constraint, no changes at all need to be made to existing PAPI code.
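To make the 64-bit integer constraint concrete: a component that receives fractional readings has to pick an integer unit before storing them. A minimal sketch, assuming readings arrive as doubles in Watts and that milliwatts is the chosen unit; `watts_to_mw` and `mw_to_watts` are hypothetical helpers, not PAPI functions.

```c
#include <stdint.h>

/* Hypothetical helpers: scale a fractional reading into an integer unit
 * so that it fits a 64-bit integer counter slot. */

/* Convert Watts to milliwatts, rounding to the nearest integer.
 * Non-positive readings are clamped to 0 because the slot is unsigned. */
static uint64_t watts_to_mw(double watts)
{
    if (watts <= 0.0)
        return 0;
    return (uint64_t)(watts * 1000.0 + 0.5);
}

/* Convert back for display; precision below 1 mW has been lost. */
static double mw_to_watts(uint64_t mw)
{
    return (double)mw / 1000.0;
}
```

The cost of this approach is that the integer alone does not say which unit it is in; the unit information added in PAPI 5.0 addresses that gap.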
A. New PAPI Interfaces

The existing PAPI interface is sufficient for providing power and energy values, but the recent PAPI 5.0 release adds many features that improve the collection of this information.
The most important new feature is enhanced event information support. The user can query an event and obtain far richer details than were available previously. The new interface allows specifying units for a returned value, allowing a user to know whether the values they are getting are in "Watts", "Joules", or perhaps even "nano-Joules" without having to look in the system documentation.
Another new feature is the ability to return values other than unsigned integers, including floating point. This allows returning power values in human-friendly amounts such as 96.45 Watts rather than 96450 milliwatts.
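One general way to carry a floating-point value through storage that is a 64-bit integer is to copy the bit pattern of the double into the integer and reinterpret it on the consumer side. The sketch below shows that technique in isolation; it is an assumption for illustration, not a description of how PAPI 5.0 does it internally, and `pack_double`/`unpack_double` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The technique only works if a double is exactly 64 bits wide. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Store the IEEE-754 bit pattern of a double in a 64-bit integer.
 * memcpy is the portable way to reinterpret bits without undefined behavior. */
static uint64_t pack_double(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Recover the double. The consumer must know (e.g., from per-event
 * metadata) that the slot holds a double, otherwise the bits are meaningless. */
static double unpack_double(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The appeal of this design is that code paths which merely copy 64-bit counter values keep working unchanged; only the final consumer needs the data-type information.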
Additional event information is provided that will help external tools analyze the results, especially when trying to correlate power results with other measurements. PAPI now provides the frequency with which the value is updated and whether the value returned is instantaneous (like an average power reading) or cumulative (like total energy).
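The instantaneous/cumulative distinction matters because the two kinds of values are post-processed differently: an instantaneous power value can be used as-is, while a cumulative energy counter has to be differenced over an interval, giving average power as (E1 - E0) / (t1 - t0). A minimal sketch, assuming two cumulative readings in nanojoules taken at known times in seconds; the function name is hypothetical.

```c
#include <stdint.h>

/* Average power (Watts) over an interval, from two readings of a
 * cumulative energy counter expressed in nanojoules.
 * Returns a negative value if the interval is empty or time went backwards. */
static double avg_power_watts(uint64_t energy0_nj, uint64_t energy1_nj,
                              double t0_s, double t1_s)
{
    double dt = t1_s - t0_s;
    if (dt <= 0.0)
        return -1.0;
    /* Assumes the counter did not wrap between the two readings. */
    double joules = (double)(energy1_nj - energy0_nj) * 1e-9;
    return joules / dt;
}
```

For example, 192.9 J accumulated over 2 s corresponds to an average of 96.45 W.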
B. Limitations

There are some limitations when measuring power and energy using PAPI. Typically these readings are system-wide: it is not possible to map the results exactly to the user's code, especially on multi-core systems.
Often a user is interested in knowing where the power usage comes from: power supply inefficiencies, the CPU, network card, memory, etc. With external power meters it is not possible to break down the full-system power measurements into per-component values. Since power optimization for various hardware components requires different strategies, having only total system power might not provide enough information to allow optimization.
Ideally one could correlate power and energy with CPU and other PAPI measurements. This can be done; values can be measured at the same time (although in separate event sets). However, due to the nature of the measurements it is hard to get an exact correlation.
Another issue is that of measurement overhead. Since PAPI has to run on the system gathering the results, it contributes to the overall power budget of the system. Tools that measure power externally do not have this problem.
IV. PAPI ENERGY AND POWER COMPONENTS

The new PAPI 5.0 release adds support for various power and energy components.
PAPI components measure power and energy in-band: a program is instrumented with PAPI calls and can read measurement data into the running process. The data can be stored to disk for later offline analysis, but by default it is available for immediate action. This contrasts with other tools that only support out-of-band measurements: they can only analyze code at a later time, and the program being profiled is not aware of its current power or energy status.
We use linear algebra routines that perform one-sided factorization of dense matrices to compare various methods of measuring energy. In particular, we test Cholesky factorization from PLASMA [17] on the processor side and LU factorization on the GPU using MAGMA [18]. Both of these are computationally bound and thus show variable power draw by the computing device: either CPU or GPU. Our tests also show memory effects by including memory-bound operations such as filling the matrices with initial values.
[Figure 1 plot: Power (Watts) vs. Time (seconds); series: CPU, Memory, Motherboard, Fan.]
Fig. 1. PLASMA Cholesky power usage gathered by PowerPack (not PAPI). Results were gathered out-of-band; PAPI can gather similar data in-band.
For comparison purposes, Figure 1 shows PLASMA Cholesky results gathered with PowerPack [9] (not PAPI) on a machine custom-wired for power measurement. Results are gathered on a separate machine (which has the advantage of not including the overhead of the measurement in the power readings). We show that PAPI can generate similar results from a variety of power measurement devices.
A. External Measurement

The most common type of power measurement infrastructure is one where an external power meter is used. For PAPI to access the data, the values have to be passed back to the machine being measured. This is usually done via a serial or USB connection. The easiest type of equipment to use in this case is a power pass-through device; it looks like a power strip, and allows measuring the power consumption of anything plugged into it. More intrusive full-system instrumentation can be done, where wires are hooked into power supplies, disks, processor sockets, and DIMM sockets. This enables fine-grained power measurement but usually requires a costly installation.
1) Watt's Up Pro Power Meter: The Watt's Up Pro power meter is an external measurement device that a system plugs into instead of a wall outlet; it provides various measurements via a USB serial connection. The metrics collected include average power, voltage, current, and various others. Energy can be derived from the average power and time. The results are system-wide and low resolution, with updates only once a second. Writing a PAPI driver for this device is nontrivial, as the results become available every second whether requested or not. Data can be lost if the on-board logging memory is full and a read does not happen within the one-second time window. Since PAPI users cannot be expected to have their code interrupt itself once a second to read data, the PAPI component forks a helper thread that reads the data on a regular basis, and then returns overall values when an instrumented program requests them.
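The helper-thread design can be sketched as follows. This is a minimal illustration, not the PAPI component's actual code: it assumes a blocking `read_sample` callback that returns one average-power reading per second (the device's update rate) and a non-zero value when no more data are available. Energy is accumulated as the sum of average power times the sample interval. All struct and function names, including the replay source used for offline testing, are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state shared by the sampling thread and the program. */
struct meter {
    pthread_mutex_t lock;
    double energy_j;                    /* accumulated energy, Joules */
    double last_watts;                  /* most recent average-power sample */
    unsigned long samples;              /* number of samples read */
    double interval_s;                  /* seconds covered by one sample */
    int (*read_sample)(double *watts);  /* blocks until the next sample */
};

/* Thread body: drain samples as they arrive so none are lost,
 * accumulating energy = average power x sample interval. */
static void *meter_thread(void *arg)
{
    struct meter *m = arg;
    double w;
    while (m->read_sample(&w) == 0) {
        pthread_mutex_lock(&m->lock);
        m->energy_j += w * m->interval_s;
        m->last_watts = w;
        m->samples++;
        pthread_mutex_unlock(&m->lock);
    }
    return NULL;
}

/* Called from the instrumented program: copy out consistent totals. */
static void meter_snapshot(struct meter *m, double *energy_j, double *watts)
{
    pthread_mutex_lock(&m->lock);
    *energy_j = m->energy_j;
    *watts = m->last_watts;
    pthread_mutex_unlock(&m->lock);
}

/* Replay source for offline testing: returns recorded samples in order. */
static const double *replay_data;
static unsigned long replay_len, replay_pos;

static int replay_read(double *watts)
{
    if (replay_pos >= replay_len)
        return -1;
    *watts = replay_data[replay_pos++];
    return 0;
}
```

In use, the thread would be started once with `pthread_create(&tid, NULL, meter_thread, &m)`, and the instrumented code would call `meter_snapshot` whenever it wants a reading, without ever having to service the device itself.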
Some data gathered from a Watt's Up Pro device are shown in Figure 2. The results are coarse due to the one-second sampling frequency of the device. This can be good enough for validation and global investigations, but is probably not detailed enough when tuning code for energy efficiency. However, the general trends in power consumption for the code in question (Cholesky factorization from PLASMA [17]) are similar to the much finer-grained graph in Figure 1. In Figure 2 the initial spike in power consumption to about 50 W (two seconds into the run) represents data generation (creation of a random matrix) and corresponds to a flat ledge at about 130 W in Figure 1. Four seconds into the run, both figures indicate a fluctuation around the maximum power level for the whole run. The fluctuations are much more accurately portrayed in Figure 1, indicating the need for a sampling interval substantially shorter than the 1 second available from the Watt's Up Pro device.
2) PowerMon2: The PowerMon2 [19] card sits between a system's power supply and its various components. It measures voltage and current on 8 different lines, monitoring most of the power going into the computer. Measurements happen at a frequency of up to 3 kHz; this is multiplexed across a user-selected subset of the 8 channels. We are working on a PAPI component for this device, but support is currently not available. We foresee using this device to provide energy results at a level of detail not available with other external power meters.
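If the 3 kHz aggregate rate is shared evenly among the selected channels (an assumption; the exact multiplexing schedule is not described here), the effective per-channel sampling rate for k selected channels works out as:

```latex
f_{\text{channel}} \le \frac{f_{\text{total}}}{k} = \frac{3000\,\text{Hz}}{k},
\qquad k = 8:\ \frac{3000\,\text{Hz}}{8} = 375\,\text{Hz},
\qquad k = 1:\ 3000\,\text{Hz}.
```

Even with all eight channels selected, this is more than two orders of magnitude finer than the once-per-second updates of the Watt's Up Pro.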
B. Internal Measurement

Recent computer hardware includes support for measuring energy and power consumption internally. This allows fine-grained power analysis without having to custom-instrument the hardware.
[Figure 2 plot: Average Power (Watts) vs. Time (seconds); panel title: PLASMA Cholesky Factorization, N=10,000, threads=2.]
Fig. 2. PLASMA Cholesky power gathered with a Watt's Up Pro device on an Intel Core2 laptop. Coarse results due to one-second sampling frequency.
Access to the measurements usually requires direct low-level hardware reads, although sometimes the operating system or a library will do this for you.
1) Intel RAPL: Recent Intel Sandy Bridge chips include the "Running Average Power Limit" (RAPL) interface, which is described in the Intel Software Developer's Manual [20]. RAPL's overall design goal is to provide an infrastructure for keeping processors inside of a given user-specified power envelope. The internal circuitry can estimate current energy usage based on a model driven by hardware counters, temperature, and leakage models. The results of this model are available to the user via a model-specific register (MSR), with an update frequency on the order of milliseconds. The power model has been validated by Intel [21] to closely follow actual energy being used. PAPI provides access to the values returned by the power model.
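The raw RAPL energy values are counts in a processor-specific unit. Per the Intel Software Developer's Manual, bits 12:8 of MSR_RAPL_POWER_UNIT (0x606) hold the energy status unit (ESU), one count equals 1/2^ESU Joules, and the energy status registers are 32-bit counters that wrap around. A minimal decoding sketch, assuming the raw register values have already been read; the function names are hypothetical.

```c
#include <stdint.h>

/* Joules per count, from the raw value of MSR_RAPL_POWER_UNIT (0x606):
 * bits 12:8 hold the energy status unit (ESU); one count = 1/2^ESU J. */
static double rapl_joules_per_count(uint64_t power_unit_msr)
{
    unsigned esu = (unsigned)((power_unit_msr >> 8) & 0x1f);
    return 1.0 / (double)(1ULL << esu);
}

/* Energy consumed between two raw readings of a 32-bit energy status
 * register. Unsigned 32-bit subtraction absorbs a single wraparound. */
static double rapl_delta_joules(uint64_t before, uint64_t after,
                                double joules_per_count)
{
    uint32_t b = (uint32_t)before, a = (uint32_t)after;
    uint32_t counts = a - b;
    return (double)counts * joules_per_count;
}
```

As a rough guide, with a unit of 1/65536 J (ESU = 16) the 32-bit counter covers 2^32 / 65536 = 65536 J, so a package drawing about 100 W wraps it after roughly 655 seconds; samples must be taken more often than that for the single-wrap correction to be valid.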
Accessing MSRs requires ring-0 access to the hardware; typically only the operating system kernel can do this. This means accessing the RAPL values requires a kernel driver. Currently Linux does not provide such a driver; one has been proposed [22] but it is unlikely to be merged into the main kernel tree any time soon. To get around this problem, we use the Linux "MSR driver" that exports MSR access to user space via a special device driver. If the MSR driver is enabled and given proper read-only permissions, PAPI can access these registers directly without needing a dedicated RAPL kernel driver.
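With the Linux MSR driver loaded, each CPU's registers appear as the device file /dev/cpu/N/msr, and an 8-byte pread() at a file offset equal to the MSR address returns that register. A minimal sketch of this access path (not PAPI's component code); `read_msr` is a hypothetical helper, and reading typically requires root or suitably relaxed permissions on the device file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one 64-bit MSR on a given CPU through the Linux msr driver.
 * Returns 0 on success, -1 if the device cannot be opened or read
 * (driver not loaded, no permission, or no such CPU). */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset selects the register; each read is 8 bytes. */
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

For example, `read_msr(0, 0x611, &raw)` would fetch the package energy status register for the package containing CPU 0. Opening the file on every read keeps the sketch short; a sampler reading at millisecond rates would keep the descriptor open.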
There are some limitations to accessing RAPL this way. The results are system-wide values and cannot easily be attributed to individual threads. This is no worse than measurements of any shared resource; on modern Intel chips, last-level caches and the uncore events share this limitation.
RAPL reports various energy readings. These include the energy usage for the total processor package and the total combined energy used by all the cores (referred to as Power Plane 0 (PP0)). PP0 also includes all of the processor caches. Some versions of Sandy Bridge chips also report power usage by the on-board GPU (Power Plane 1 (PP1)). Sandy Bridge EP chips do not support the GPU measurement, but instead report energy readings for the DRAM interface.
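The domains listed above correspond to separate energy status MSRs. The addresses in this sketch are those given in the Intel Software Developer's Manual for Sandy Bridge; whether PP1 or DRAM is present depends on the model, as described above. The table and lookup function are a hypothetical illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Energy status MSR for each RAPL domain (Intel SDM, Sandy Bridge). */
struct rapl_domain {
    const char *name;
    uint32_t msr;
    const char *covers;
};

static const struct rapl_domain rapl_domains[] = {
    { "PKG",  0x611, "whole processor package" },
    { "PP0",  0x639, "all cores, including their caches" },
    { "PP1",  0x641, "on-chip GPU (some non-EP models only)" },
    { "DRAM", 0x619, "DRAM interface (Sandy Bridge EP)" },
};

/* Look up a domain's MSR address by name; returns 0 if unknown. */
static uint32_t rapl_domain_msr(const char *name)
{
    for (size_t i = 0; i < sizeof rapl_domains / sizeof rapl_domains[0]; i++)
        if (strcmp(rapl_domains[i].name, name) == 0)
            return rapl_domains[i].msr;
    return 0;
}
```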
While the RAPL values can be measured in-band and consumed by the program, since RAPL is system-wide a separate process may be used to measure energy and power. In this way the running code does not need to be instrumented and some of the PAPI overhead can be avoided. We use this method to gather the results presented.
We take measurements on a Sandy Bridge EP machine. It has 2 CPU packages, each with 8 cores, and each core with 2 threads. Figure 3 shows some average power measurements gathered while doing Cholesky factorization using the PLASMA library. Notice that the energy usage by each package varies, despite all of the cores doing similar work. Part of this is likely due to variations in the cores at the silicon level, as noticed by Rountree et al. [23]. Figure 4 shows the same measurements using the Intel MKL library [24]. Figure 5 shows some energy measurements comparing the same Cholesky factorization using both PLASMA and Intel MKL on the same hardware. The PAPI results show that for this case, PLASMA uses energy more quickly, but finishes faster and uses less total energy for the calculation.
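The PLASMA/MKL result follows from total energy being average power multiplied by run time. Writing E = P̄ t for each implementation, a higher average power still yields lower total energy whenever the run-time ratio is smaller than the inverse power ratio:

```latex
E_{\text{PLASMA}} < E_{\text{MKL}}
\iff \bar{P}_{\text{PLASMA}}\, t_{\text{PLASMA}} < \bar{P}_{\text{MKL}}\, t_{\text{MKL}}
\iff \frac{t_{\text{PLASMA}}}{t_{\text{MKL}}} < \frac{\bar{P}_{\text{MKL}}}{\bar{P}_{\text{PLASMA}}}.
```

Since the PLASMA average power is the higher of the two, the right-hand side is below 1, so PLASMA has to be faster by a correspondingly larger factor; the RAPL measurements indicate that it is.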
2) AMD Application Power Management: Recent AMD Family 15h processors can report "Current Power In Watts" [25] via the "Processor Power in TDP" MSR. We are investigating PAPI support for this and hope to deploy a component similar in nature and scope to the Intel RAPL component.
[Figure 3 plot: Average Power (Watts) vs. Time (seconds); panel title: PLASMA Cholesky Factorization, N=30,000, threads=16; series: DRAM Package 0, DRAM Package 1, PP0 Package 0, PP0 Package 1, Total Package 0, Total Package 1.]
Fig. 3. PLASMA Cholesky power usage measured with RAPL on Sandy Bridge EP. Power Plane 0 (PP0) is total usage for all 8 cores in a package.
[Figure 4 plot: Average Power (Watts) vs. Time (seconds); panel title: MKL Cholesky Factorization, N=30,000, threads=16; series: DRAM Package 0, DRAM Package 1, PP0 Package 0, PP0 Package 1, Total Package 0, Total Package 1.]
Fig. 4. Intel MKL Cholesky power usage measured with RAPL on Sandy Bridge. Power Plane 0 (PP0) is total usage for all 8 cores in a package.
[Figure 5 plot: Total Energy (Joules) vs. Time (seconds); panel title: Cholesky Factorization, N=30,000, threads=16; series: PLASMA Package 0, PLASMA Package 1, MKL Package 0, MKL Package 1.]
Fig. 5. Energy usage of two different implementations (PLASMA and MKL) of Cholesky on Sandy Bridge EP measured with RAPL.
[Figure 6 plot: Average Power (Watts) vs. Time (seconds).]
Fig. 6. MAGMA LU with size 10,000 power measurement on an Nvidia Fermi C2075, gathered with NVML.
3) NVIDIA Management Library: Recent NVIDIA GPUs can report power usage via the NVIDIA Management Library (NVML) [26]. The nvmlDeviceGetPowerUsage() routine exports the current power; on Fermi C2075 GPUs it has milliwatt resolution within ±5 W and is updated at roughly 60 Hz. The power reported is that for the entire board, including GPU and memory.
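Because nvmlDeviceGetPowerUsage() reports instantaneous board power in milliwatts (as an unsigned int) rather than cumulative energy, energy has to be obtained by integrating a series of timestamped samples. A minimal sketch using the trapezoidal rule, assuming the samples have already been collected, for instance at about the 60 Hz update rate; `energy_from_mw_samples` is a hypothetical helper, not an NVML call.

```c
#include <stddef.h>

/* Integrate power samples (milliwatts, as NVML reports them) taken at
 * increasing times t_s (seconds) into energy in Joules with the
 * trapezoidal rule. Returns 0 for fewer than two samples. */
static double energy_from_mw_samples(const unsigned int *mw,
                                     const double *t_s, size_t n)
{
    double joules = 0.0;
    for (size_t i = 1; i < n; i++) {
        double dt = t_s[i] - t_s[i - 1];
        double avg_w = 0.5 * ((double)mw[i] + (double)mw[i - 1]) / 1000.0;
        joules += avg_w * dt;
    }
    return joules;
}
```

Note that the ±5 W bound applies to each sample, so the error in the integrated energy is bounded by roughly 5 W multiplied by the measurement duration.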
Gathering detailed performance information from a GPU is difficult: once you dispatch code to a GPU, the running CPU has no control over it until the GPU returns upon completion. This means that it is not generally possible to attribute which GPU code corresponds to which power readings. Nvidia provides a high-level utility called nvidia-smi which can be used to measure power, but its sampling interval is too long to obtain useful measurements.
In order to provide better power measurements we have constructed an NVML component [27] for PAPI and have validated the results using a "Kill-A-Watt" power meter. Figure 6 shows data gathered on an Nvidia Fermi C2075 card running a MAGMA [28] kernel using the LU algorithm [29] with a matrix size of 10k.
The MAGMA LU factorization is a compute-bound algorithm (expressed in terms of GEMMs); it uses a hybridization methodology to split the computation between the CPU host and the GPU. The split aims to match LU's algorithmic requirements to the architectural strengths of the GPU and the CPU. In the case of LU, this translates into having all matrix-matrix multiplications (GEMMs) done on the GPU, and the panel factorizations on the CPU. For large enough matrices, the design of the algorithm allows the CPU work to be totally overlapped with the large matrix-matrix multiplications on the GPU. As a result, the MAGMA LU algorithm runs at the speed of performing GEMMs on the GPU. Our experiments have shown that MAGMA GEMM operations fully utilize the GPU, maximizing its power consumption. This explains why the hybrid LU factorization also maximizes GPU power consumption; because this reduces the time taken, the overall energy consumption is minimized.
C. Estimated Power

Various researchers have proposed using hardware performance counters to model energy and power consumption [15], [30], [31], [32], [33], [16], [34], [35], [36]. Goel et al. [36] have shown that power can be modeled to within 10% using just four hardware performance counters. Using the PAPI user-defined events infrastructure [37], an event can be created that derives an estimated power value from the hardware counters. This can be used to measure power on systems that do not have hardware power measurement available.
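A common shape for such estimates is a constant (idle) term plus a weighted sum of per-second event rates. The sketch below evaluates a model of that form from counter deltas, for instance differences between two successive values returned by PAPI_read(). The linear form, the coefficients, and the function name are illustrative assumptions rather than the model of any particular cited paper; in practice the coefficients would be fitted against measured power on the target machine.

```c
#include <stddef.h>

/* Estimate power as P = idle + sum_i coef[i] * rate_i, where rate_i is
 * the i-th counter's delta divided by the interval length (events/s).
 * Returns just the idle term if the interval is not positive. */
static double estimate_power_watts(double idle_w, const double *coef,
                                   const long long *deltas, size_t n,
                                   double interval_s)
{
    double p = idle_w;
    if (interval_s <= 0.0)
        return p;
    for (size_t i = 0; i < n; i++)
        p += coef[i] * ((double)deltas[i] / interval_s);
    return p;
}
```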
V. CONCLUSION

The PAPI library can now provide transparent access to power and energy measurements via existing interfaces. Existing programs that already have PAPI instrumentation for CPU performance measurements can quickly be adapted to measure power, and existing tools will gain access to the new power events with a simple PAPI upgrade.

With larger and larger clusters being built, energy consumption has become one of the defining constraints. PAPI has been continually extended to provide support for the most up-to-date performance measurements on modern systems. The addition of power and energy measurements allows PAPI users to stay on top of this increasingly important area in the always rapidly changing HPC environment.
ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 0910899 and the U.S. Department of Energy Office of Science under contract DE-FC02-06ER25761.
REFERENCES

[1] S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, "A portable programming interface for performance evaluation on modern processors," International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189–204, 2000.
[2] D. Terpstra, H. Jagode, H. You, and J. Dongarra, "Collecting performance data with PAPI-C," in 3rd Parallel Tools Workshop, 2009, pp. 157–173.
[3] "Top500 supercomputing sites," http://www.top500.org/.
[4] "Top green500 list :: Environmentally responsible supercomputing," http://www.green500.org/.
[5] S. Shende and A. Malony, "The Tau parallel performance system," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[6] L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent, "HPCToolkit: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 685–701, 2010.
[7] W. Nagel, A. Arnold, M. Weber, H.-C. Hoppe, and K. Solchenbach, "VAMPIR: Visualization and analysis of MPI resources," Supercomputer, vol. 12, no. 1, pp. 69–80, 1996.
[8] Intel, Intel Energy Checker: Software Developer Kit User Guide, 2010.
[9] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. Cameron, "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, May 2010.
[10] P. Popa, "Managing server energy consumption using IBM PowerExecutive," IBM Systems and Technology Group, Tech. Rep., 2006.
[11] D. Shin, H. Shim, Y. Joo, H.-S. Yun, J. Kim, and N. Chang, "Energy-monitoring tool for low-power embedded programs," IEEE Design & Test of Computers, vol. 19, no. 4, pp. 7–17, July/August 2002.
[12] S. Ryffel, "LEA2P: The Linux energy attribution and accounting platform," Master's thesis, Swiss Federal Institute of Technology, Jan. 2009.
[13] J. Flinn and M. Satyanarayanan, "PowerScope: a tool for profiling the energy usage of mobile applications," in Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 2–10.
[14] T. Stathopoulos, D. McIntire, and W. Kaiser, "The energy endoscope: Real-time detailed energy accounting for wireless sensor nodes," in Proc. of the International Conference on Information Processing in Sensor Networks, Apr. 2008, pp. 383–394.
[15] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in Proc. IEEE/ACM 36th Annual International Symposium on Microarchitecture, Dec. 2003.
[16] F. Bellosa, "The benefits of event-driven energy accounting in power-sensitive systems," in Proceedings of the 9th Workshop on ACM SIGOPS European Workshop, 2000.
[17] PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, University of Tennessee Knoxville, Nov. 2010.
[18] S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense linear algebra solvers for multicore with GPU accelerators," in Proc. 24th IEEE/ACM International Parallel and Distributed Processing Symposium, Apr. 2010.
[19] D. Bedard, R. Fowler, M. Linn, and A. Porterfield, "PowerMon2: Fine-grained, integrated power measurement," Renaissance Computing Institute, Tech. Rep. TR-09-04, 2009.
[20] Intel, Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2009.
[21] E. Rotem, A. Naveh, D. Rajwan, A. Anathakrishnan, and E. Weissmann, "Power-management architecture of the Intel microarchitecture code-named Sandy Bridge," IEEE Micro, vol. 32, no. 2, pp. 20–27, 2012.
[22] Z. Rui. (2011, May) [patch 2/3] introduce intel rapl driver. linux-kernel mailing list. [Online]. Available: http://thread.gmane.org/gmane.linux.kernel/1145973
[23] B. Rountree, D. Ahn, B. de Supinski, D. Lowenthal, and M. Schulz, "Beyond DVFS: A first look at performance under a hardware-enforced power bound," in Proc. of 8th Workshop on High-Performance, Power-Aware Computing, May 2012.
[24] Intel, Intel Math Kernel Library (MKL), http://www.intel.com/software/products/mkl/.
[25] AMD, AMD Family 15h Processor BIOS and Kernel Developer Guide, 2011.
[26] NVML Reference Manual, NVIDIA, 2012.
[27] K. Kasichayanula, "Power aware computing on GPUs," Master's thesis, University of Tennessee, Knoxville, May 2012.
[28] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, "Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs," LAPACK Working Note 230.
[29] S. Yamazaki, S. Tomov, and J. Dongarra, "One-sided dense matrix factorizations on a multicore with multiple GPU accelerators," in Proc. of the 2012 International Conference on Computational Science, Jun. 2012.
[30] K. Singh, M. Bhadauria, and S. McKee, "Real time power estimation of multi-cores via performance counters," Proc. Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Nov. 2008.
[31] I. Kadayif, T. Chinoda, M. Kandemir, N. Vijaykirsnan, M. Irwin, and A. Sivasubramaniam, "vEC: virtual energy counters," in Proc. of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, Jun. 2001.
[32] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: a first step towards software power minimization," IEEE Transactions on VLSI, vol. 3, no. 4, pp. 437–445, 1994.
[33] J. Russell and M. Jacome, "Software power estimation and optimization for high performance, 32-bit embedded processors," in Proc. IEEE International Conference on Computer Design, Oct. 1998, pp. 328–333.
[34] R. Joseph and M. Martonosi, "Run-time power estimation in high-performance microprocessors," in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2001, pp. 135–140.
[35] J. Haid, G. Kaefer, C. Steger, and R. Weiss, "Run-time energy estimation in system-on-a-chip designs," in Proc. of the Asia and South Pacific Design Automation Conference, Jan. 2003, pp. 595–599.
[36] B. Goel, S. McKee, R. Gioiosa, K. Singh, M. Bhadauria, and M. Cesati, "Portable, scalable, per-core power estimation for intelligent resource management," in First International Green Computing Conference, Aug. 2010.
[37] S. Moore and J. Ralph, "User-defined events for hardware performance monitoring," in Proc. 11th Workshop on Tools for Program Development and Analysis in Computational Science, Jun. 2011.