Measuring Energy and Power with PAPI
Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore
Innovative Computing Laboratory, University of Tennessee
{vweaver1,mrj,kirankk,ralph,luszczek,terpstra,shirley}@eecs.utk.edu
Abstract: Energy and power consumption are becoming critical metrics in the design and usage of high performance systems. We have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance counters to also measure power and energy. Higher level tools that build on PAPI will automatically gain support for power and energy readings when used with the newest version of PAPI. We describe in detail the types of energy and power readings available through PAPI. We support external power meters, as well as values provided internally by recent CPUs and GPUs. Measurements are provided directly to the instrumented process, allowing immediate code analysis in real time. We provide examples showing results that can be obtained with our infrastructure.
Index Terms: energy measurement; power measurement; performance analysis
I. INTRODUCTION
The Performance API (PAPI) [1] framework has traditionally provided low-level cross-platform access to the hardware performance counters available on most modern CPUs. With the advent of component PAPI (PAPI-C) [2], PAPI has been extended to provide a wider variety of performance data from various sources. Recently a number of new components have been added that provide the ability to measure a system's energy and power usage.
Energy and power have become increasingly important components of overall system behavior in high-performance computing (HPC). Power and energy concerns were once primarily of interest to embedded developers. Now that HPC machines have hundreds of thousands of cores [3], the ability to reduce consumption by just a few Watts per CPU quickly adds up to major power, cooling, and monetary savings. There has been a lot of HPC interest in this area recently, including the Green500 [4] list of energy-efficient supercomputers.
PAPI's ability to be extended by components allows adding support for energy and power measurements without any changes to the core infrastructure. Existing code that is already instrumented for measuring performance counters can be re-used; the new power and energy events will show up in event listings just like other performance events, and can be measured with the same existing PAPI API. This will allow current users of PAPI on HPC systems to analyze power and energy with little additional effort.
There are many existing tools that provide access to power and energy measurements (often these come with the power measuring hardware). PAPI's advantage is that it allows measuring a diverse set of hardware with one common interface. Users only instrument their code once, and then can use it with minimal changes as their code is moved between different machines with different hardware. Without PAPI the instrumented code would have to be rewritten depending on what power measurement hardware it is running on.
Another benefit of PAPI is that in addition to measuring energy and power, it also provides access to other values, such as CPU performance counters, GPU counters, network, and I/O. All of these can be measured at the same time, providing a richer analysis environment. Many of the other advanced PAPI features, such as sampling and profiling, can potentially be used in conjunction with these new power and energy events. Higher-level tools that build on top of PAPI (such as TAU [5], HPCToolkit [6], or Vampir [7]) automatically get support for these new measurements as soon as they are paired with an updated PAPI version.
We describe in detail the various types of power and energy measurements that will be available in the PAPI 5.0 release, and show examples of the data that can be gathered.
II. RELATED WORK
There are various existing tools that provide access to power and energy values. In general these tools do not have a cross-platform API like PAPI, nor are they deployed as widely. PAPI has the benefit of allowing energy measurements at the same time as CPU and other performance counter measurements, allowing analysis of low-level energy behavior at the source code level. PAPI can also act as an abstraction library, so most of the tools listed below could be given PAPI component interfaces.
The tool that provides the most similar functionality to PAPI is the Intel Energy Checker SDK [8]. It provides an API for instrumenting code and gathering energy information from a variety of external power meters and system counters. It provides support for various operating systems, but is limited to Intel architectures.
PowerPack [9] provides an interface for measuring power from a variety of external power sources. The API provides routines for starting and stopping the gathering of data on the remote machine. Unlike PAPI, the measurements are gathered out-of-band (on a separate machine) and thus cannot be directly provided to the running process in real time.
Appeared in the 2012 PASA Workshop
IBM PowerExecutive [10] allows monitoring power and energy on IBM blade servers. As with PowerPack, the data is gathered and analyzed by a tool (in this case IBM Director) running on a separate machine.
Shin et al. [11] construct a power board for an ARM system that estimates power and communicates with a front-end tool via PCI. Various tools are described that use the gathered information, but there is not a generic API for accessing it.
The Linux Energy Attribution and Accounting Platform (LEA2P) [12] acquires data on a system with hardware custom-modified to provide power readings via a data acquisition board. These values are passed into the Linux kernel, made available via the /proc filesystem, and can be read in-band.
PowerScope [13] uses a digital multimeter to perform off-line analysis using statistical sampling. It provides a kernel-level interface (via system calls) to start and stop measurements; this requires modifying the operating system. The benefit of this system is that power information is kept in the process table, allowing one to map energy usage in a detailed per-process way.
The Energy Endoscope [14] is an embedded wireless sensor network that provides detailed real-time energy measurements via a custom-designed helper chip. The Linux kernel is modified to report energy in /proc/stat along with other processor stats.
Isci and Martonosi [15] combine external power meter measurements with performance counter results to generate power readings with a modeled CPU. The readings are gathered on an external machine.
Bellosa [16] proposes JouleWatcher, an infrastructure that uses hardware performance counters to estimate power and provide this information to the kernel for scheduling decisions. He proposes a generic API to provide this information to users.
III. BACKGROUND
PAPI users have recently become more concerned with energy and power measurements. Part of this is due to the addition of embedded system support (including ARM and MIPS processors) and part is from the current interest in energy efficiency in PAPI's traditional HPC environment.
With PAPI-C (component PAPI) it is straightforward to add extra PAPI "components" that report values outside of the usual hardware performance counters that were long the mainstay of PAPI. The PAPI API returns unsigned 64-bit integers; as long as a power or energy value can fit that constraint, no changes at all need to be made to existing PAPI code.
A. New PAPI Interfaces
The existing PAPI interface is sufficient for providing power and energy values, but the recent PAPI 5.0 release adds many features that improve the collection of this information.
The most important new feature is enhanced event information support. The user can query an event and obtain far richer details than were available previously. The new interface allows specifying units for a returned value, allowing a user to know if the values they are getting are in "Watts", "Joules" or perhaps even "nano-Joules" without having to look in the system documentation. Another new feature is the ability to return values other than unsigned integers, including floating point. This allows returning power values in human-friendly amounts such as 96.45 Watts rather than 96450 milliwatts.
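An integer reading in milliwatts and a floating-point reading in Watts differ only by a scale factor named by the unit, which is why attaching a unit to each event lets a tool present either form. The Python sketch below shows that conversion and the unsigned 64-bit range check; the unit strings and the `to_base_units` and `to_fixed` helpers are illustrative assumptions, not PAPI's own API or unit vocabulary.

```python
# Divisors mapping a (hypothetical) unit string to the base unit, Watts or Joules.
UNIT_DIVISORS = {"W": 1, "mW": 1_000, "J": 1, "mJ": 1_000, "nJ": 1_000_000_000}
U64_LIMIT = 2**64  # values must fit in an unsigned 64-bit integer


def _divisor(unit: str) -> int:
    try:
        return UNIT_DIVISORS[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


def to_base_units(raw: int, unit: str) -> float:
    """Integer reading expressed in `unit` -> floating-point Watts or Joules."""
    if not 0 <= raw < U64_LIMIT:
        raise ValueError("value does not fit in an unsigned 64-bit integer")
    # Dividing by an exact integer (rather than multiplying by 1e-3) keeps
    # 96450 mW -> 96.45 W equal to the float literal 96.45.
    return raw / _divisor(unit)


def to_fixed(value: float, unit: str) -> int:
    """Floating-point Watts or Joules -> integer reading expressed in `unit`."""
    raw = round(value * _divisor(unit))
    if not 0 <= raw < U64_LIMIT:
        raise ValueError("value does not fit in an unsigned 64-bit integer")
    return raw
```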
Additional event information is provided that will help external tools analyze the results, especially when trying to correlate power results with other measurements. PAPI now provides the frequency with which the value is updated and whether the value returned is instantaneous (like an average power reading) or cumulative (total energy).
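The instantaneous/cumulative flag matters when a tool reduces the readings taken across a code region to a single number: a cumulative energy value is differenced, while instantaneous power readings are averaged. A minimal Python sketch of that reduction follows; the `Reading` type and `region_value` function are hypothetical, and the plain arithmetic mean assumes evenly spaced samples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Reading:
    time_s: float  # when the value was read
    value: float   # Joules for cumulative events, Watts for instantaneous ones


def region_value(readings: Sequence[Reading], cumulative: bool) -> float:
    """Reduce the readings taken across a code region to one number.

    Cumulative events (e.g. total energy) are differenced: end minus start.
    Instantaneous events (e.g. average power) are averaged, assuming the
    readings were taken at evenly spaced times.
    """
    if len(readings) < 2:
        raise ValueError("need at least a start and an end reading")
    if cumulative:
        return readings[-1].value - readings[0].value
    return sum(r.value for r in readings) / len(readings)
```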
B. Limitations
There are some limitations when measuring power and energy using PAPI. Typically these readings are system-wide: it is not possible to map the results exactly to the user's code, especially on multi-core systems. Often a user is interested in knowing where the power usage comes from: power supply inefficiencies, the CPU, network card, memory, etc. With external power meters it is not possible to break down the full-system power measurements into per-component values. Since optimizing power for different hardware components requires different strategies, having only total system power might not provide enough information to allow optimization.
Ideally one could correlate power and energy with CPU and other PAPI measurements. This can be done; values can be measured at the same time (although in separate event sets). However, due to the nature of the measurements it is hard to get an exact correlation.
Another issue is that of measurement overhead. Since PAPI has to run on the system gathering the results, it contributes to the overall power budget of the system. Tools that measure power externally do not have this problem.
IV. PAPI ENERGY AND POWER COMPONENTS
The new PAPI 5.0 release adds support for various power and energy components. PAPI components measure power and energy in-band: a program is instrumented with PAPI calls and can read measurement data into the running process. The data can be stored to disk for later offline analysis, but by default it is available for immediate action. This contrasts with other tools that only support out-of-band measurements: they can only analyze code at a later time, and the program being profiled is not aware of its current power or energy status.
We use linear algebra routines that perform one-sided factorization of dense matrices to compare various methods of measuring energy. In particular, we test Cholesky factorization from PLASMA [17] on the processor side and LU factorization on the GPU using MAGMA [18]. Both of these are computationally bound and thus show variable power draw by the computing device: either CPU or GPU. Our tests also show memory effects by including memory-bound operations such as filling the matrices with initial values.
[Figure 1: power (Watts) versus time (seconds), with traces for CPU, memory, motherboard, and fan.]
Fig. 1. PLASMA Cholesky power usage gathered by PowerPack (not PAPI). Results were gathered out-of-band; PAPI can gather similar data in-band.
For comparison purposes, Figure 1 shows PLASMA Cholesky results gathered with PowerPack [9] (not PAPI) on a machine custom-wired for power measurement. Results are gathered on a separate machine (which has the advantage of not including the overhead of the measurement in the power readings). We show that PAPI can generate similar results from a variety of power measurement devices.
A. External Measurement
The most common type of power measurement infrastructure is one where an external power meter is used. For PAPI to access the data, the values have to be passed back to the machine being measured. This is usually done via a serial or USB connection. The easiest type of equipment to use in this case is a power pass-through device, which looks like a power strip and allows measuring the power consumption of anything plugged into it. More intrusive full-system instrumentation can be done, where wires are hooked into power supplies, disks, processor sockets, and DIMM sockets. This enables fine-grained power measurement but usually involves extensive installation costs.
1) Watt's Up Pro Power Meter: The Watt's Up Pro power meter is an external measurement device that a system plugs into instead of a wall outlet; it provides various measurements via a USB serial connection. The metrics collected include average power, voltage, current, and various others. Energy can be derived based on the average power and time. The results are system-wide and low resolution, with updates only once a second.
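Because the meter reports one average-power value per one-second interval, energy is the sum of each average power times the length of its interval. The Python sketch below assumes each sample is a `(timestamp, average_watts)` pair whose average covers the interval ending at that timestamp; the function name and that convention are assumptions for illustration.

```python
from typing import Iterable, Tuple


def energy_from_average_power(samples: Iterable[Tuple[float, float]],
                              start_time: float) -> float:
    """Integrate (timestamp_s, average_watts) samples into Joules.

    Each sample's average power is assumed to apply to the interval between
    the previous timestamp (or start_time, for the first sample) and its own.
    """
    energy = 0.0
    previous = start_time
    for timestamp, watts in samples:
        if timestamp < previous:
            raise ValueError("samples must be in time order")
        energy += watts * (timestamp - previous)
        previous = timestamp
    return energy
```

At one sample per second the resulting energy curve can change only once per second, whatever the code is doing in between.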
Writing a PAPI driver for this device is nontrivial, as the results become available every second whether requested or not. Data can be lost if the on-board logging memory is full and a read does not happen within the one-second time window. Since PAPI users cannot be expected to have their code interrupt itself once a second to read the data, the PAPI component forks a helper thread that reads the data on a regular basis, and then returns overall values when an instrumented program requests them.
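The helper-thread design can be sketched in Python as follows. The `read_sample` callable stands in for a blocking read from the meter's USB serial port and is a hypothetical placeholder; here it returns `(timestamp, watts)` pairs, or `None` when the stream ends. The thread drains samples as they arrive so the device buffer does not overflow, and the instrumented program only asks for running totals. This is an illustration of the idea, not the PAPI component's code.

```python
import threading
from typing import Callable, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, average power in Watts)


class MeterPoller:
    """Background thread that drains a 1 Hz power meter and keeps totals."""

    def __init__(self, read_sample: Callable[[], Optional[Sample]]):
        self._read_sample = read_sample  # blocks until the next sample arrives
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._samples = 0
        self._energy_j = 0.0
        self._last_time: Optional[float] = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            sample = self._read_sample()
            if sample is None:  # device closed or stream exhausted
                break
            timestamp, watts = sample
            with self._lock:
                # The first sample only sets the time base; each later average
                # covers the interval since the previous sample.
                if self._last_time is not None:
                    self._energy_j += watts * (timestamp - self._last_time)
                self._last_time = timestamp
                self._samples += 1

    def totals(self) -> Tuple[int, float]:
        """Return (samples seen, Joules accumulated); safe to call at any time."""
        with self._lock:
            return self._samples, self._energy_j

    def wait(self, timeout: Optional[float] = None) -> None:
        """Wait for the sample stream to end (useful with a finite source)."""
        self._thread.join(timeout)

    def stop(self) -> None:
        """Ask the thread to exit once its current blocking read returns."""
        self._stop.set()
        self._thread.join()
```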
Some data gathered from a Watt's Up Pro device are shown in Figure 2. The results are coarse due to the one-second sampling frequency of the device. This can be good enough for validation and global investigations, but is probably not detailed enough when tuning code for energy efficiency. However, the general trends in power consumption for the code in question (Cholesky factorization from PLASMA [17]) are similar to the much finer-grained graph in Figure 1. In Figure 2 the initial spike in power consumption to about 50 W (two seconds into the run) represents data generation (creation of a random matrix) and corresponds to a flat ledge at about 130 W in Figure 1. Four seconds into the run, both figures indicate a fluctuation around the maximum power level for the whole run. The fluctuations are much more accurately portrayed in Figure 1, indicating the need for a granularity substantially finer than the one-second resolution of the Watt's Up Pro device.
2) PowerMon2: The PowerMon2 [19] card sits between a system's power supply and its various components. It measures voltage and current on 8 different lines, monitoring most of the power going into the computer. Measurements happen at a frequency of up to 3 kHz, multiplexed across a user-selected subset of the 8 channels. We are working on a PAPI component for this device, but support is currently not available. We foresee using this device to provide energy results at a level of detail not available with other external power meters.
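If the aggregate sampling rate is shared round-robin among the selected channels, each channel sees the aggregate rate divided by the number of selected channels, so selecting fewer channels gives each one finer time resolution. The Python sketch below builds such a schedule; the even round-robin split is an assumption made for illustration, not a description of the PowerMon2 firmware.

```python
from typing import List, Sequence, Tuple


def _check(channels: Sequence[int]) -> None:
    if not 1 <= len(channels) <= 8 or any(not 0 <= c < 8 for c in channels):
        raise ValueError("select between one and eight channels numbered 0-7")


def round_robin_schedule(channels: Sequence[int], aggregate_hz: float,
                         ticks: int) -> List[Tuple[float, int]]:
    """Return (time_s, channel) pairs for `ticks` multiplexed samples.

    Assumes one sample per tick at `aggregate_hz`, cycling through the
    selected channels in order.
    """
    _check(channels)
    period = 1.0 / aggregate_hz
    return [(i * period, channels[i % len(channels)]) for i in range(ticks)]


def per_channel_rate(channels: Sequence[int], aggregate_hz: float) -> float:
    """Effective sampling rate seen by each selected channel."""
    _check(channels)
    return aggregate_hz / len(channels)
```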
B. Internal Measurement
Recent computer hardware includes support for measuring energy and power consumption internally. This allows fine-grained power analysis without having to custom-instrument the hardware.
[Figure 2: average power (Watts) versus time (seconds); PLASMA Cholesky factorization, N=10,000, threads=2.]
Fig. 2. PLASMA Cholesky power gathered with a Watt's Up Pro device on an Intel Core2 laptop. Coarse results due to one-second sampling frequency.
Access to the measurements usually requires direct low-level hardware reads, although sometimes the operating system or a library will do this for you.
1) Intel RAPL: Recent Intel Sandy Bridge chips include the "Running Average Power Limit" (RAPL) interface, which is described in the Intel Software Developer's Manual [20]. RAPL's overall design goal is to provide an infrastructure for keeping processors inside of a given user-specified power envelope. The internal circuitry can estimate current energy usage based on a model driven by hardware counters, temperature, and leakage models. The results of this model are available to the user via a model-specific register (MSR), with an update frequency on the order of milliseconds. The power model has been validated by Intel [21] to closely follow actual energy being used.
PAPI provides access to the values returned by the power model. Accessing MSRs requires ring-0 access to the hardware; typically only the operating system kernel can do this. This means accessing the RAPL values requires a kernel driver. Currently Linux does not provide such a driver; one has been proposed [22] but it is unlikely it will be merged into the main kernel tree any time soon. To get around this problem, we use the Linux "MSR driver" that exports MSR access to userspace via a special device driver. If the MSR driver is enabled and given proper read-only permissions, then PAPI can access these registers directly without needing a dedicated RAPL driver in the kernel.
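Through the MSR driver, reading a register amounts to an eight-byte `pread` at a file offset equal to the register address in `/dev/cpu/N/msr`. The Python sketch below does that and decodes the RAPL unit register. The addresses and bit fields used (`MSR_RAPL_POWER_UNIT` at 0x606 with energy-status units in bits 12:8, `MSR_PKG_ENERGY_STATUS` at 0x611 holding a 32-bit counter) are our reading of Intel's documentation and should be checked against the manual [20] for a given processor; the function names are hypothetical, and reading the device normally requires the `msr` kernel module plus root-level permissions.

```python
import os
import struct
from typing import Tuple

MSR_RAPL_POWER_UNIT = 0x606    # power, energy, and time units
MSR_PKG_ENERGY_STATUS = 0x611  # package energy counter (32 bits, wraps)


def read_msr(cpu: int, address: int) -> int:
    """Read one 64-bit MSR through the Linux MSR driver."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, address)  # the file offset selects the register
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def decode_rapl_units(raw: int) -> Tuple[float, float, float]:
    """Return (Watts, Joules, seconds) per count from MSR_RAPL_POWER_UNIT."""
    power_unit = 1.0 / (1 << (raw & 0xF))           # bits 3:0
    energy_unit = 1.0 / (1 << ((raw >> 8) & 0x1F))  # bits 12:8
    time_unit = 1.0 / (1 << ((raw >> 16) & 0xF))    # bits 19:16
    return power_unit, energy_unit, time_unit


def package_energy_joules(cpu: int = 0) -> float:
    """Package energy counter in Joules; only differences are meaningful."""
    _, energy_unit, _ = decode_rapl_units(read_msr(cpu, MSR_RAPL_POWER_UNIT))
    return (read_msr(cpu, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF) * energy_unit
```

For example, a unit-register value of 0xA1003 decodes to 1/8 W, 2^-16 J (about 15.3 µJ), and 2^-10 s per count.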
There are some limitations to accessing RAPL this way. The results are system-wide values and cannot easily be attributed to individual threads. This is no worse than measurements of any shared resource; on modern Intel chips the last-level caches and the uncore events share this limitation.
RAPL reports various energy readings. These include the energy usage for the total processor package and the total combined energy used by all the cores (referred to as Power Plane 0 (PP0)). PP0 also includes all of the processor caches. Some versions of Sandy Bridge chips also report power usage by the on-board GPU (Power Plane 1 (PP1)). Sandy Bridge EP chips do not support the GPU measurement, but instead report energy readings for the DRAM interface.
While the RAPL values can be measured in-band and consumed by the program, since RAPL is system-wide a separate process may be used to measure energy and power. In this way the running code does not need to be instrumented, and some of the PAPI overhead can be avoided. We use this method to gather the results presented.
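A separate monitoring process only needs periodic raw counter readings for each domain; the average power over each interval is the energy difference divided by the elapsed time. Because the energy-status counters are 32 bits wide, the difference has to be taken modulo 2^32: with 2^-16 J units the counter wraps after 65,536 J, roughly 11 minutes at 100 W. The Python sketch below applies this arithmetic to already-collected samples; the domain-to-register table is our reading of Intel's documentation, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# RAPL energy-status registers per domain (verify against the Intel SDM).
ENERGY_STATUS_MSR: Dict[str, int] = {
    "PACKAGE": 0x611,
    "PP0": 0x639,   # all cores in the package
    "PP1": 0x641,   # on-board GPU on some client Sandy Bridge parts
    "DRAM": 0x619,  # reported by Sandy Bridge EP instead of PP1
}

COUNTER_MASK = 0xFFFFFFFF  # energy-status counters are 32 bits wide


def energy_delta_joules(previous_raw: int, current_raw: int,
                        energy_unit: float) -> float:
    """Energy consumed between two raw readings, tolerating one wraparound."""
    return ((current_raw - previous_raw) & COUNTER_MASK) * energy_unit


def average_power(samples: Sequence[Tuple[float, int]],
                  energy_unit: float) -> List[Tuple[float, float]]:
    """Turn (time_s, raw_counter) samples into (interval_end_s, Watts) pairs."""
    result = []
    for (t0, raw0), (t1, raw1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        joules = energy_delta_joules(raw0, raw1, energy_unit)
        result.append((t1, joules / (t1 - t0)))
    return result
```

The sampling interval must be short enough that no counter wraps more than once between readings, since a double wrap is indistinguishable from a single one.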
We take measurements on a Sandy Bridge EP machine. It has 2 CPU packages, each with 8 cores, and each core with 2 threads. Figure 3 shows some average power measurements gathered while doing Cholesky factorization using the PLASMA library. Notice that the energy usage by each package varies, despite all of the cores doing similar work. Part of this is likely due to variations in the cores at the silicon level, as noticed by Rountree et al. [23]. Figure 4 shows the same measurements using the Intel MKL library [24].
Figure 5 shows some energy measurements comparing the same Cholesky factorization using both PLASMA and Intel MKL on the same hardware. The PAPI results show that for this case, PLASMA uses energy more quickly, but finishes faster and uses less total energy for the calculation.
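The apparent paradox of higher power but lower energy follows from energy being the time integral of power. The relation below is a standard identity, not a result derived from these measurements, and no numeric values from Figure 5 are implied. Writing \bar{P} = (1/t)\int_0^t P(\tau)\,d\tau for the average power and t for the run time of each implementation:

```latex
E = \int_0^{t} P(\tau)\,d\tau = \bar{P}\,t,
\qquad
E_{\mathrm{PLASMA}} < E_{\mathrm{MKL}}
\iff
\frac{\bar{P}_{\mathrm{PLASMA}}}{\bar{P}_{\mathrm{MKL}}}
< \frac{t_{\mathrm{MKL}}}{t_{\mathrm{PLASMA}}}
```

A faster implementation therefore saves energy whenever its speedup exceeds its relative increase in average power.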
2) AMD Application Power Management: Recent AMD Family 15h processors can report "Current Power In Watts" [25] via the "Processor Power in TDP" MSR. We are investigating PAPI support for this and hope to deploy a component similar in nature and scope to the Intel RAPL component.
[Figure 3: average power (Watts) versus time (seconds); PLASMA Cholesky factorization, N=30,000, threads=16; traces for DRAM, PP0, and total power of Package 0 and Package 1.]
Fig. 3. PLASMA Cholesky power usage measured with RAPL on Sandy Bridge EP. Power Plane 0 (PP0) is total usage for all 8 cores in a package.
[Figure 4: average power (Watts) versus time (seconds); MKL Cholesky factorization, N=30,000, threads=16; traces for DRAM, PP0, and total power of Package 0 and Package 1.]
Fig. 4. Intel MKL Cholesky power usage measured with RAPL on Sandy Bridge. Power Plane 0 (PP0) is total usage for all 8 cores in a package.
[Figure 5: total energy (Joules) versus time (seconds); Cholesky factorization, N=30,000, threads=16; traces for PLASMA and MKL on Package 0 and Package 1.]
Fig. 5. Energy usage of two different implementations (PLASMA and MKL) of Cholesky on Sandy Bridge EP measured with RAPL.
[Figure 6: average power (Watts) versus time (seconds).]
Fig. 6. MAGMA LU with size 10,000 power measurement on an Nvidia Fermi C2075, gathered with NVML.
3) NVIDIA Management Library: Recent NVIDIA GPUs can report power usage via the NVIDIA Management Library (NVML) [26]. The nvmlDeviceGetPowerUsage() routine exports the current power; on Fermi C2075 GPUs it has milliwatt resolution, an accuracy of ±5 W, and is updated at roughly 60 Hz. The power reported is that for the entire board, including GPU and memory.
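Since NVML reports instantaneous board power in milliwatts, polling it at roughly the 60 Hz update rate and integrating the samples gives an energy estimate. The Python sketch below keeps device access behind a `read_mw` callable so the arithmetic can be checked without a GPU; `nvml_power_reader` assumes the `pynvml` Python bindings (`nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetPowerUsage`) are installed, which wrap the same C library but are not part of PAPI. Trapezoidal integration is our choice for instantaneous readings, and all function names are hypothetical.

```python
import time
from typing import Callable, List, Tuple


def sample_power_mw(read_mw: Callable[[], int], count: int,
                    interval_s: float = 1.0 / 60,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep,
                    ) -> List[Tuple[float, int]]:
    """Poll an instantaneous power source `count` times; returns (time_s, mW)."""
    samples = []
    for _ in range(count):
        samples.append((clock(), read_mw()))
        sleep(interval_s)
    return samples


def energy_joules(samples: List[Tuple[float, int]]) -> float:
    """Trapezoidal integral of (time_s, milliwatt) samples, in Joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) / 1000.0 * (t1 - t0)
    return total


def nvml_power_reader(index: int = 0) -> Callable[[], int]:
    """Callable returning board power (mW) of GPU `index` via pynvml.

    The caller is responsible for calling pynvml.nvmlShutdown() when done.
    """
    import pynvml  # optional dependency, imported only when a GPU is used

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle)
```

With a GPU present, `energy_joules(sample_power_mw(nvml_power_reader(), 600))` would estimate board energy over about ten seconds; the ±5 W accuracy of each reading is not reduced by integration if the error is a systematic offset.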
Gathering detailed performance information from a GPU is difficult: once you dispatch code to a GPU, the running CPU has no control over it until the GPU returns upon completion. This means that it is not generally possible to attribute which GPU code corresponds to which power readings. Nvidia provides a high-level utility called nvidia-smi which can be used to measure power, but its sampling interval is too long to obtain useful measurements. In order to provide better power measurements we have constructed an NVML component [27] for PAPI and have validated the results using a "Kill-A-Watt" power meter.
Figure 6 shows data gathered on an Nvidia Fermi C2075 card running a MAGMA [28] kernel using the LU algorithm [29] with a matrix size of 10k. The MAGMA LU factorization is a compute-bound algorithm (expressed in terms of GEMMs); it uses a hybridization methodology to split the computation between the CPU host and GPU. The split aims to match LU's algorithmic requirements to the architectural strengths of the GPU and the CPU. In the case of LU, this translates into having all matrix-matrix multiplications (GEMMs) done on the GPU, and the panel factorizations on the CPU. For large enough matrices, the design of the algorithm allows the CPU work to be totally overlapped with the large matrix-matrix multiplications on the GPU. As a result, the MAGMA LU algorithm runs at the speed of performing GEMMs on the GPU. Our experiments have shown that MAGMA GEMM operations fully utilize the GPU, maximizing its power consumption. This explains why the hybrid LU factorization also maximizes the GPU power consumption; doing so reduces the time taken, so the overall energy consumption is minimized.
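The CPU/GPU split mirrors the structure of a right-looking blocked LU factorization: each step factors a narrow panel, solves a small triangular system, and then applies a large matrix-matrix update to the trailing submatrix, which dominates the arithmetic. The pure-Python sketch below is a simplification for illustration only: it omits the partial pivoting that production LU codes (MAGMA included) perform, so it is safe only for matrices such as diagonally dominant ones, and it runs every step on the CPU, with comments marking where MAGMA would place each step. The function name and the block-size parameter `nb` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def blocked_lu(a: Matrix, nb: int) -> Matrix:
    """In-place right-looking blocked LU without pivoting (A = L U).

    On return the strict lower triangle holds L (unit diagonal implied) and
    the upper triangle holds U. A zero pivot raises an error.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Panel factorization of columns k0..k1-1 (MAGMA: on the CPU).
        for j in range(k0, k1):
            pivot = a[j][j]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(j + 1, n):
                a[i][j] /= pivot
                for c in range(j + 1, k1):
                    a[i][c] -= a[i][j] * a[j][c]
        # 2. Triangular solve U12 = L11^-1 A12 for the block row.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3. Trailing update A22 -= L21 U12: the GEMM (MAGMA: on the GPU).
        for i in range(k1, n):
            for c in range(k1, n):
                a[i][c] -= sum(a[i][p] * a[p][c] for p in range(k0, k1))
    return a
```

Step 3 touches O((n - k1)^2 nb) elements per block while step 1 touches only O(n nb^2), which is why offloading the GEMM and overlapping the panel work lets the whole factorization run at GEMM speed.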
C. Estimated Power
Various researchers have proposed using hardware performance counters to model energy and power consumption [15], [30], [31], [32], [33], [16], [34], [35], [36]. Goel et al. [36] have shown that power can be modeled to within 10% using just four hardware performance counters. Using the PAPI user-defined events infrastructure [37], an event can be created that derives an estimated power value from the hardware counters. This can be used to measure power on systems that do not have hardware power measurement available.
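A derived power event of this kind is commonly a constant idle term plus a weighted sum of counter rates. The Python sketch below evaluates such a linear model from counter deltas over an interval; the event names are real PAPI preset names chosen only for illustration, and the weights are hypothetical values that would have to be fitted by regression against a real power meter on each machine. It is a generic linear model, not the specific model of Goel et al. [36] and not the syntax of PAPI's user-defined events.

```python
from typing import Mapping


def estimated_power_watts(idle_watts: float,
                          weights: Mapping[str, float],
                          counts: Mapping[str, int],
                          interval_s: float) -> float:
    """Linear power model: idle power plus weighted per-second event rates.

    `weights` maps an event name to Joules per event (hypothetical values,
    to be fitted per machine); `counts` holds each event's delta over the
    measurement interval.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    dynamic = sum(w * counts[name] / interval_s for name, w in weights.items())
    return idle_watts + dynamic
```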
V. CONCLUSION
The PAPI library can now provide transparent access to power and energy measurements via existing interfaces. Existing programs that already have PAPI instrumentation for CPU performance measurements can quickly be adapted to measure power, and existing tools will gain access to the new power events with a simple PAPI upgrade.
With larger and larger clusters being built, energy consumption has become one of the defining constraints. PAPI has been continually extended to provide support for the most up-to-date performance measurements on modern systems. The addition of power and energy measurements allows PAPI users to stay on top of this increasingly important area in the always rapidly changing HPC environment.
ACKNOWLEDGMENT
This material is based upon work supported by the National Science Foundation under Grant No. 0910899 and the U.S. Department of Energy Office of Science under contract DE-FC02-06ER25761.
REFERENCES
[1] S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, "A portable programming interface for performance evaluation on modern processors," International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189–204, 2000.
[2] D. Terpstra, H. Jagode, H. You, and J. Dongarra, "Collecting performance data with PAPI-C," in 3rd Parallel Tools Workshop, 2009, pp. 157–173.
[3] "Top500 supercomputing sites," http://www.top500.org/.
[4] "Top green500 list :: Environmentally responsible supercomputing," http://www.green500.org/.
[5] S. Shende and A. Malony, "The Tau parallel performance system," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[6] L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent, "HPCToolkit: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 685–701, 2010.
[7] W. Nagel, A. Arnold, M. Weber, H.-C. Hoppe, and K. Solchenbach, "VAMPIR: Visualization and analysis of MPI resources," Supercomputer, vol. 12, no. 1, pp. 69–80, 1996.
[8] Intel, Intel Energy Checker: Software Developer Kit User Guide, 2010.
[9] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. Cameron, "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, May 2010.
[10] P. Popa, "Managing server energy consumption using IBM PowerExecutive," IBM Systems and Technology Group, Tech. Rep., 2006.
[11] D. Shin, H. Shim, Y. Joo, H.-S. Yun, J. Kim, and N. Chang, "Energy-monitoring tool for low-power embedded programs," IEEE Design & Test of Computers, vol. 19, no. 4, pp. 7–17, July/August 2002.
[12] S. Ryffel, "LEA2P: The Linux energy attribution and accounting platform," Master's thesis, Swiss Federal Institute of Technology, Jan. 2009.
[13] J. Flinn and M. Satyanarayanan, "PowerScope: a tool for profiling the energy usage of mobile applications," in Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 2–10.
[14] T. Stathopoulos, D. McIntire, and W. Kaiser, "The energy endoscope: Real-time detailed energy accounting for wireless sensor nodes," in Proc. of the International Conference on Information Processing in Sensor Networks, Apr. 2008, pp. 383–394.
[15] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in Proc. IEEE/ACM 36th Annual International Symposium on Microarchitecture, Dec. 2003.
[16] F. Bellosa, "The benefits of event: driven energy accounting in power-sensitive systems," in Proceedings of the 9th Workshop on ACM SIGOPS European Workshop, 2000.
[17] PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, University of Tennessee Knoxville, Nov. 2010.
[18] S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense linear algebra solvers for multicore with GPU accelerators," in Proc. 24th IEEE/ACM International Parallel and Distributed Processing Symposium, Apr. 2010.
[19] D. Bedard, R. Fowler, M. Linn, and A. Porterfield, "PowerMon2: Fine-grained, integrated power measurement," Renaissance Computing Institute, Tech. Rep. TR-09-04, 2009.
[20] Intel, Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2009.
[21] E. Rotem, A. Naveh, D. Rajwan, A. Anathakrishnan, and E. Weissmann, "Power-management architecture of the Intel microarchitecture code-named Sandy Bridge," IEEE Micro, vol. 32, no. 2, pp. 20–27, 2012.
[22] Z. Rui. (2011, May) [patch 2/3] introduce intel rapl driver. linux-kernel mailing list. [Online]. Available: http://thread.gmane.org/gmane.linux.kernel/1145973
[23] B. Rountree, D. Ahn, B. de Supinski, D. Lowenthal, and M. Schulz, "Beyond DVFS: A first look at performance under a hardware-enforced power bound," in Proc. of 8th Workshop on High-Performance, Power-Aware Computing, May 2012.
[24] Intel, Intel Math Kernel Library (MKL), http://www.intel.com/software/products/mkl/.
[25] AMD, AMD Family 15h Processor BIOS and Kernel Developer Guide, 2011.
[26] NVML Reference Manual, NVIDIA, 2012.
[27] K. Kasichayanula, "Power aware computing on GPUs," Master's thesis, University of Tennessee, Knoxville, May 2012.
[28] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, "Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs," LAPACK Working Note 230.
[29] S. Yamazaki, S. Tomov, and J. Dongarra, "One-sided dense matrix factorizations on a multicore with multiple GPU accelerators," in Proc. of the 2012 International Conference on Computational Science, Jun. 2012.
[30] K. Singh, M. Bhadauria, and S. McKee, "Real time power estimation of multi-cores via performance counters," Proc. Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Nov. 2008.
[31] I. Kadayif, T. Chinoda, M. Kandemir, N. Vijaykirsnan, M. Irwin, and A. Sivasubramaniam, "vEC: virtual energy counters," in Proc. of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, Jun. 2001.
[32] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: a first step towards software power minimization," IEEE Transactions on VLSI, vol. 2, no. 4, pp. 437–445, 1994.
[33] J. Russell and M. Jacome, "Software power estimation and optimization for high performance, 32-bit embedded processors," in Proc. IEEE International Conference on Computer Design, Oct. 1998, pp. 328–333.
[34] R. Joseph and M. Martonosi, "Run-time power estimation in high-performance microprocessors," in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2001, pp. 135–140.
[35] J. Haid, G. Kaefer, C. Steger, and R. Weiss, "Run-time energy estimation in system-on-a-chip designs," in Proc. of the Asia and South Pacific Design Automation Conference, Jan. 2003, pp. 595–599.
[36] B. Goel, S. McKee, R. Gioiosa, K. Singh, M. Bhadauria, and M. Cesati, "Portable, scalable, per-core power estimation for intelligent resource management," in First International Green Computing Conference, Aug. 2010.
[37] S. Moore and J. Ralph, "User-defined events for hardware performance monitoring," in Proc. 11th Workshop on Tools for Program Development and Analysis in Computational Science, Jun. 2011.