Measuring Energy and Power with PAPI

Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore
Innovative Computing Laboratory, University of Tennessee
{vweaver1,mrj,kirankk,ralph,luszczek,terpstra,shirley}@eecs.utk.edu

Abstract: Energy and power consumption are becoming critical metrics in the design and usage of high performance systems. We have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance counters to also measure power and energy. Higher level tools that build on PAPI will automatically gain support for power and energy readings when used with the newest version of PAPI. We describe in detail the types of energy and power readings available through PAPI. We support external power meters, as well as values provided internally by recent CPUs and GPUs. Measurements are provided directly to the instrumented process, allowing immediate code analysis in real time. We provide examples showing results that can be obtained with our infrastructure.

Index Terms: energy measurement; power measurement; performance analysis

I. INTRODUCTION

The Performance API (PAPI) [1] framework has traditionally provided low-level cross-platform access to the hardware performance counters available on most modern CPUs. With the advent of component PAPI (PAPI-C) [2], PAPI has been extended to provide a wider variety of performance data from various sources. Recently a number of new components have been added that provide the ability to measure a system's energy and power usage.

Energy and power have become increasingly important components of overall system behavior in high-performance computing (HPC). Power and energy concerns were once primarily of interest to embedded developers. Now that HPC machines have hundreds of thousands of cores [3], the ability to reduce consumption by just a few Watts per CPU quickly adds up to major power, cooling, and monetary savings. There has been a lot of HPC interest in this area recently, including the Green500 [4] list of energy-efficient supercomputers.

PAPI's ability to be extended by components allows adding support for energy and power measurements without any changes needed to the core infrastructure. Existing code that is already instrumented for measuring performance counters can be re-used; the new power and energy events will show up in event listings just like other performance events, and can be measured with the same existing PAPI API. This will allow current users of PAPI on HPC systems to analyze power and energy with little additional effort.

There are many existing tools that provide access to power and energy measurements (often these come with the power measuring hardware). PAPI's advantage is that it allows measuring a diverse set of hardware with one common interface. Users only instrument their code once, and then can use it with minimal changes as their code is moved between different machines with different hardware. Without PAPI the instrumented code would have to be re-written depending on what power measurement hardware it is running on.

Another benefit of PAPI is that in addition to measuring energy and power, it also provides access to other values, such as CPU performance counters, GPU counters, network, and I/O. All of these can be measured at the same time, providing for a richer analysis environment. Many of the other advanced PAPI features, such as sampling and profiling, can potentially be used in conjunction with these new power and energy events. Higher-level tools that build on top of PAPI (such as TAU [5], HPCToolkit [6], or Vampir [7]) automatically get support for these new measurements as soon as they are paired with an updated PAPI version.

We will describe in detail the various types of power and energy measurements that will be available in the PAPI 5.0 release, as well as showing examples of the data that can be gathered.

II. RELATED WORK

There are various existing tools that provide access to power and energy values. In general these tools do not have a cross-platform API like PAPI, nor are they deployed as widely. PAPI has the benefit of allowing energy measurements at the same time as CPU and other performance counter measurements, allowing analysis of low-level energy behavior at the source code level. PAPI can also act as an abstraction library, so most of the tools listed below could be given PAPI component interfaces.

The tool that provides the most similar functionality to PAPI is the Intel Energy Checker SDK [8]. It provides an API for instrumenting code and gathering energy information from a variety of external power meters and system counters. It provides support for various operating systems, but is limited to Intel architectures.

PowerPack [9] provides an interface for measuring power from a variety of external power sources. The API provides routines for starting and stopping the gathering of data on the remote machine. Unlike PAPI, the measurements are gathered out-of-band (on a separate machine) and thus cannot be directly provided to the running process in real time.

Appeared in the 2012 PASA Workshop

IBM PowerExecutive [10] allows monitoring power and energy on IBM blade servers. As with PowerPack, the data is gathered and analyzed by a tool (in this case IBM Director) running on a separate machine.

Shin et al. [11] construct a power board for an ARM system that estimates power and communicates with a front-end tool via PCI. Various tools are described that use the gathered information, but there is not a generic API for accessing it.

The Linux Energy Attribution and Accounting Platform (LEA2P) [12] acquires data on a system with hardware custom-modified to provide power readings via a data acquisition board. These values are passed into the Linux kernel and made available via the /proc filesystem and can be read in-band.

PowerScope [13] uses a digital multimeter to perform offline analysis using statistical sampling. It provides a kernel-level interface (via system calls) to start and stop measurements; this requires modifying the operating system. The benefit of this system is that power information is kept in the process table, allowing one to map energy usage in a detailed per-process way.

The Energy Endoscope [14] is an embedded wireless sensor network that provides detailed real-time energy measurements via a custom-designed helper chip. The Linux kernel is modified to report energy in /proc/stat along with other processor stats.

Isci and Martonosi [15] combine external power meter measurements with performance counter results to generate power readings with a modeled CPU. The readings are gathered on an external machine.

Bellosa [16] proposes Joule Watcher, an infrastructure that uses hardware performance counters to estimate power and provide this information to the kernel for scheduling decisions. He proposes a generic API to provide this information to users.

III. BACKGROUND

PAPI users have recently become more concerned with energy and power measurements. Part of this is due to the addition of embedded system support (including ARM and MIPS processors) and part is from the current interest in energy efficiency in PAPI's traditional HPC environment.

With PAPI-C (component PAPI) it is straightforward to add extra PAPI "components" that report values outside of the usual hardware performance counters that were long the mainstay of PAPI. The PAPI API returns unsigned 64-bit integers; as long as a power or energy value can fit that constraint, no changes at all need to be made to existing PAPI code.

A. New PAPI Interfaces

The existing PAPI interface is sufficient for providing power and energy values, but the recent PAPI 5.0 release adds many features that improve the collection of this information.

The most important new feature is enhanced event information support. The user can query an event and obtain far richer details than were available previously. The new interface allows specifying units for a returned value, allowing a user to know if the values they are getting are in "Watts", "Joules" or perhaps even "nano-Joules" without having to look in the system documentation.

Another new feature is the ability to return values other than unsigned integers, including floating point. This allows returning power values in human-friendly amounts such as 96.45 Watts rather than 96450 milliwatts.

Additional event information is provided that will help external tools analyze the results, especially when trying to correlate power results with other measurements. PAPI now provides the frequency with which the value is updated and whether the value returned is instantaneous (like an average power reading) or cumulative (total energy).
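
As a rough illustration of how a tool might consume this richer event information, the sketch below models an event descriptor that carries units, a value type, an update frequency, and an instantaneous/cumulative flag. The `EventInfo` class, its field names, and the unit strings are hypothetical and are not PAPI's actual structures; only the 96450 milliwatt and 96.45 Watt figures come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInfo:
    """Hypothetical event descriptor; not PAPI's real event-info layout."""
    name: str
    units: str         # unit string reported with the event, e.g. "W", "mW", "J", "nJ"
    is_float: bool     # True if the event returns floating point values
    cumulative: bool   # True for running totals (energy), False for instantaneous readings
    update_hz: float   # how often the underlying value is refreshed


# Scale factors to Watts or Joules for the unit strings this sketch understands.
_SCALE = {"W": 1.0, "mW": 1e-3, "J": 1.0, "nJ": 1e-9}


def to_base_units(info, raw):
    """Convert a raw reading to Watts or Joules using the event's unit string."""
    return raw * _SCALE[info.units]


def describe(info, raw):
    """Format a reading so that a user never has to guess its units."""
    base = "W" if info.units.endswith("W") else "J"
    kind = "cumulative" if info.cumulative else "instantaneous"
    return f"{info.name}: {to_base_units(info, raw):.2f} {base} ({kind})"
```

With such a descriptor, a tool could print `describe(EventInfo("board_power", "mW", False, False, 1.0), 96450)` and show the value in Watts without consulting system documentation.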

B. Limitations

There are some limitations when measuring power and energy using PAPI. Typically these readings are system-wide: it is not possible to map the results exactly to the user's code, especially on multi-core systems. Often a user is interested in knowing where the power usage comes from: power supply inefficiencies, the CPU, network card, memory, etc. With external power meters it is not possible to break down the full-system power measurements into per-component values. Since power optimization for various hardware components requires different strategies, having only total system power might not provide enough information to allow optimization.

Ideally one could correlate power and energy with CPU and other PAPI measurements. This can be done; values can be measured at the same time (although in separate event sets). However, due to the nature of the measurements it is hard to get an exact correlation.

Another issue is that of measurement overhead. Since PAPI has to run on the system gathering the results, it contributes to the overall power budget of the system. Tools that measure power externally do not have this problem.

IV. PAPI ENERGY AND POWER COMPONENTS

The new PAPI 5.0 release adds support for various power and energy components. PAPI components measure power and energy in-band: a program is instrumented with PAPI calls and can read measurement data into the running process. The data can be stored to disk for later offline analysis, but by default it is available for immediate action. This contrasts with other tools that only support out-of-band measurements: they can only analyze code at a later time, and the program being profiled is not aware of its current power or energy status.

We use linear algebra routines that perform one-sided factorization of dense matrices to compare various methods of measuring energy. In particular, we test Cholesky factorization from PLASMA [17] on the processor side and LU factorization on the GPU using MAGMA [18]. Both of these are computationally bound and thus show variable power draw by the computing device: either CPU or GPU. Our tests also show memory effects by including memory bound operations such as filling the matrices with initial values.

[Figure 1: plot of Power (Watts) against Time (seconds) for CPU, Memory, Motherboard, and Fan]
Fig. 1. PLASMA Cholesky power usage gathered by PowerPack (not PAPI). Results were gathered out-of-band; PAPI can gather similar data in-band.

For comparison purposes, Figure 1 shows PLASMA Cholesky results gathered with PowerPack [9] (not PAPI) on a machine custom-wired for power measurement. Results are gathered on a separate machine (which has the advantage of not including the overhead of the measurement in the power readings). We show that PAPI can generate similar results from a variety of power measurement devices.

A. External Measurement

The most common type of power measurement infrastructure is one where an external power meter is used. For PAPI to access the data, the values have to be passed back to the machine being measured. This is usually done via a serial or USB connection. The easiest type of equipment to use in this case is one where a power pass-through is used; this device looks like a power strip, and allows measuring the power consumption of anything plugged into the device. More intrusive full-system instrumentation can be done, where wires are hooked into power supplies, disks, processor sockets, and DIMM sockets. This enables fine-grained power measurement but usually requires extensive installation costs.

1) Watt's Up Pro Power Meter: The Watt's Up Pro power meter is an external measurement device that a system plugs into instead of a wall outlet; it provides various measurements via a USB serial connection. The metrics collected include average power, voltage, current, and various others. Energy can be derived based on the average power and time. The results are system-wide and low resolution, with updates only once a second.
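
The derivation of energy from average power and time can be sketched as follows. This is a minimal illustration rather than the PAPI component's code: the function names are hypothetical, and it assumes each reading is the average power over one fixed reporting interval (one second for this meter).

```python
def energy_joules(avg_power_watts, interval_s=1.0):
    """Total energy, assuming each sample is the average power over one interval."""
    return sum(p * interval_s for p in avg_power_watts)


def mean_power_watts(avg_power_watts):
    """Mean of the per-interval average power readings."""
    return sum(avg_power_watts) / len(avg_power_watts)
```

For example, three hypothetical one-second readings of 40, 50, and 45 Watts sum to 135 Joules, with a mean power of 45 Watts.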

Writing a PAPI driver for this device is nontrivial, as the results become available every second whether requested or not. Data can potentially be lost if the on-board logging memory is full and a read does not happen in the one-second time window. Since PAPI users cannot be expected to have their code interrupt itself once a second to measure data, the PAPI component forks a helper thread that reads the data on a regular basis, and then returns overall values when an instrumented program requests it.
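
The helper-thread pattern described above might look roughly like the following sketch. It is not the PAPI component's implementation: the `MeterPoller` class and the `read_sample` callable (which stands in for reading one average-power value in Watts from the meter, returning `None` when no more data is available) are assumptions made for illustration.

```python
import threading


class MeterPoller:
    """Background reader that accumulates energy from periodic power samples."""

    def __init__(self, read_sample, interval_s=1.0):
        self._read_sample = read_sample      # hypothetical meter-read callable
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._energy_j = 0.0
        self._samples = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        """Ask the helper thread to finish and wait for it."""
        self._stop.set()
        self._thread.join()

    def wait(self):
        """Block until the helper thread exits on its own (source exhausted)."""
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            watts = self._read_sample()
            if watts is None:                # no more data from the meter
                break
            with self._lock:
                self._energy_j += watts * self._interval_s
                self._samples += 1
            self._stop.wait(self._interval_s)

    def totals(self):
        """Return (accumulated energy in Joules, number of samples read)."""
        with self._lock:
            return self._energy_j, self._samples
```

The instrumented program calls `start()` once and later calls `totals()` whenever it wants the overall values, so it never has to interrupt itself every second.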

Some data gathered from a Watt's Up Pro device are shown in Figure 2. The results are coarse due to the one-second sampling frequency of the device. This can be good enough for doing validation and global investigations, but is probably not detailed enough when tuning code for energy efficiency. However, the general trends in power consumption for the code in question (Cholesky factorization from PLASMA [17]) are similar to the much finer-grain graph in Figure 1. In Figure 2 the initial spike in power consumption to about 50 W (two seconds into the run) represents data generation (creation of a random matrix) and corresponds to a flat ledge at about 130 W in Figure 1. Four seconds into the run, both figures indicate a fluctuation around the maximum power level for the whole run. The fluctuations are much more accurately portrayed in Figure 1, indicating the need for granularity substantially finer than the 1 second available from the Watt's Up Pro device.

2) PowerMon 2: The PowerMon 2 [19] card sits between a system's power supply and its various components. It measures voltage and current on 8 different lines, monitoring most of the power going into the computer. Measurements happen at a frequency of up to 3 kHz; this is multiplexed across a user-selected subset of the 8 channels. We are working on a PAPI component for this device, but support is currently not available. We foresee using this device to provide energy results at a detail not available with other external power meters.

B. Internal Measurement

Recent computer hardware includes support for measuring energy and power consumption internally. This allows fine-grained power analysis without having to custom-instrument the hardware.

[Figure 2: PLASMA Cholesky Factorization, N=10,000, threads=2; plot of Average Power (Watts) against Time (seconds)]
Fig. 2. PLASMA Cholesky power gathered with a Watt's Up Pro device on an Intel Core2 laptop. Coarse results due to one-second sampling frequency.

Access to the measurements usually requires direct low-level hardware reads, although sometimes the operating system or a library will do this for you.

1) Intel RAPL: Recent Intel Sandy Bridge chips include the "Running Average Power Limit" (RAPL) interface, which is described in the Intel Software Developer's Manual [20]. RAPL's overall design goal is to provide an infrastructure for keeping processors inside of a given user-specified power envelope. The internal circuitry can estimate current energy usage based on a model driven by hardware counters, temperature, and leakage models. The results of this model are available to the user via a model specific register (MSR), with an update frequency on the order of milliseconds. The power model has been validated by Intel [21] to closely follow actual energy being used. PAPI provides access to the values returned by the power model.

Accessing MSRs requires ring-0 access to the hardware; typically only the operating system kernel can do this. This means accessing the RAPL values requires a kernel driver. Currently Linux does not provide such a driver; one has been proposed [22] but it is unlikely it will be merged into the main kernel tree anytime soon. To get around this problem, we use the Linux "MSR driver" that exports MSR access to userspace via a special device driver. If the MSR driver is enabled and given proper read-only permissions then PAPI can access these registers directly without needing kernel support.
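
A minimal sketch of reading RAPL energy through the MSR driver is shown below. It is an illustration, not PAPI's component code. The register addresses and the energy-unit bit field are taken from the Intel Software Developer's Manual rather than from this paper, and the `/dev/cpu/N/msr` device path assumes the Linux `msr` module is loaded and readable by the calling user.

```python
import os
import struct

# Register addresses as documented in the Intel SDM (not stated in this paper).
MSR_RAPL_POWER_UNIT = 0x606
MSR_PKG_ENERGY_STATUS = 0x611
MSR_PP0_ENERGY_STATUS = 0x639


def read_msr(cpu, reg):
    """Read one 64-bit MSR via the Linux msr driver; the file offset is the register number."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_msr):
    """Bits 12:8 of the power-unit register give ESU; one count is 1/2**ESU Joules."""
    return 0.5 ** ((power_unit_msr >> 8) & 0x1F)


def package_energy_joules(cpu=0):
    """Cumulative package energy in Joules (the raw counter is 32 bits wide)."""
    unit = energy_unit_joules(read_msr(cpu, MSR_RAPL_POWER_UNIT))
    raw = read_msr(cpu, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF
    return raw * unit
```

Only `package_energy_joules` touches hardware; the decoding helper is a pure function and can be checked without MSR access.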

There are some limitations to accessing RAPL this way. The results are system-wide values and cannot easily be attributed to individual threads. This is not worse than measurements of any shared resource; on modern Intel chips last level caches and the uncore events share this limitation.

RAPL reports various energy readings. This includes the energy usage for the total processor package and the total combined energy used by all the cores (referred to as Power-Plane 0 (PP0)). PP0 also includes all of the processor caches. Some versions of Sandy Bridge chips also report power usage by the on-board GPU (Power-Plane 1 (PP1)). Sandy Bridge EP chips do not support the GPU measurement, but instead report energy readings for the DRAM interface.

While the RAPL values can be measured in-band and consumed by the program, since RAPL is system-wide a separate process may be used to measure energy and power. In this way the running code does not need to be instrumented and some of the PAPI overhead can be avoided. We use this method to gather the results presented.
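
A separate monitoring process typically turns two cumulative energy readings into an average power figure. The sketch below shows one way to do that; the function names are hypothetical, `unit_j` stands for the energy represented by one counter increment, and the 32-bit counter width is an assumption based on Intel's documentation. It handles at most one counter wraparound between readings.

```python
def energy_delta_joules(raw_before, raw_after, unit_j, counter_bits=32):
    """Energy consumed between two raw counter reads, tolerating a single wraparound."""
    mask = (1 << counter_bits) - 1
    return ((raw_after - raw_before) & mask) * unit_j


def average_power_watts(raw_before, raw_after, unit_j, elapsed_s):
    """Average power over the sampling window: energy delta divided by elapsed time."""
    return energy_delta_joules(raw_before, raw_after, unit_j) / elapsed_s
```

Sampling often enough that the counter cannot wrap twice between reads keeps the delta unambiguous.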

We take measurements on a Sandy Bridge EP machine. It has 2 CPU packages, each with 8 cores, and each core with 2 threads. Figure 3 shows some average power measurements gathered while doing Cholesky factorization using the PLASMA library. Notice that the energy usage by each package varies, despite all of the cores doing similar work. Part of this is likely due to variations in the cores at the silicon level, as noticed by Rountree et al. [23]. Figure 4 shows the same measurements using the Intel MKL library [24]. Figure 5 shows some energy measurements comparing the same Cholesky factorization using both PLASMA and Intel MKL on the same hardware. The PAPI results show that for this case, PLASMA uses energy more quickly, but finishes faster and uses less total energy for the calculation.

2) AMD Application Power Management: Recent AMD Family 15h processors can report "Current Power In Watts" [25] via the "Processor Power in TDP" MSR. We are investigating PAPI support for this and hope to deploy a component similar in nature and scope to the Intel RAPL component.

[Figure 3: PLASMA Cholesky Factorization, N=30,000, threads=16; plot of Average Power (Watts) against Time (seconds) for DRAM, PP0, and Total on Package 0 and Package 1]
Fig. 3. PLASMA Cholesky power usage measured with RAPL on Sandy Bridge EP. Power Plane 0 (PP0) is total usage for all 8 cores in a package.

[Figure 4: MKL Cholesky Factorization, N=30,000, threads=16; plot of Average Power (Watts) against Time (seconds) for DRAM, PP0, and Total on Package 0 and Package 1]
Fig. 4. Intel MKL Cholesky power usage measured with RAPL on Sandy Bridge. Power Plane 0 (PP0) is total usage for all 8 cores in a package.

[Figure 5: Cholesky Factorization, N=30,000, threads=16; plot of Total Energy (Joules) against Time (seconds) for PLASMA and MKL on Package 0 and Package 1]
Fig. 5. Energy usage of two different implementations (PLASMA and MKL) of Cholesky on Sandy Bridge EP measured with RAPL.

[Figure 6: plot of Average Power (Watts) against Time (seconds)]
Fig. 6. MAGMA LU with size 10,000 power measurement on an Nvidia Fermi C2075, gathered with NVML.

3) NVIDIA Management Library: Recent NVIDIA GPUs can report power usage via the NVIDIA Management Library (NVML) [26]. The nvmlDeviceGetPowerUsage() routine exports the current power; on Fermi C2075 GPUs it has milliwatt resolution within ±5 W and is updated at roughly 60 Hz. The power reported is that for the entire board, including GPU and memory.

Gathering detailed performance information from a GPU is difficult: once you dispatch code to a GPU the running CPU has no control over it until the GPU returns upon completion. This means that it is not generally possible to attribute what GPU code corresponds to what power readings.

Nvidia provides a high-level utility called nvidia-smi which can be used to measure power, but its sampling interval is too long to obtain useful measurements. In order to provide better power measurements we have constructed an NVML component [27] for PAPI and have validated the results using a "Kill-A-Watt" power meter.
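
To give a feel for polling board power at a higher rate than a command-line utility allows, the sketch below samples a milliwatt reading at a fixed period and integrates it into energy. It is not the PAPI NVML component. The optional reader assumes the third-party `pynvml` Python bindings are installed; the sampling and integration helpers are hypothetical names and work with any callable that returns milliwatts.

```python
import time


def nvml_power_reader(index=0):
    """Return a callable giving board power in milliwatts, or None without pynvml."""
    try:
        import pynvml  # third-party NVML bindings; an assumption, not part of PAPI
    except ImportError:
        return None
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle)


def sample_power(read_milliwatts, n_samples, period_s,
                 clock=time.monotonic, sleep=time.sleep):
    """Collect (timestamp in seconds, power in Watts) pairs at a fixed period."""
    samples = []
    for _ in range(n_samples):
        samples.append((clock(), read_milliwatts() / 1000.0))
        sleep(period_s)
    return samples


def trapezoid_energy_joules(samples):
    """Integrate power over time with the trapezoid rule."""
    return sum(0.5 * (p0 + p1) * (t1 - t0)
               for (t0, p0), (t1, p1) in zip(samples, samples[1:]))
```

A period of about 1/60 second would match the roughly 60 Hz update rate quoted above; sampling faster than that only repeats the same reading.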

Figure 6 shows data gathered on an Nvidia Fermi C2075 card running a MAGMA [28] kernel using the LU algorithm [29] with a matrix size of 10k.

The MAGMA LU factorization is a compute bound algorithm (expressed in terms of GEMMs); it uses a hybridization methodology to split the computation between the CPU host and GPU. The split aims to match LU's algorithmic requirements to the architectural strengths of the GPU and the CPU. In the case of LU, this translates into having all matrix-matrix (GEMM) multiplication done on the GPU, and the panel factorizations on the CPU. The design of the algorithm allows for big enough matrices to totally overlap the CPU work with the large matrix-matrix multiplications on the GPU. As a result, the MAGMA LU algorithm runs at the speed of performing GEMMs on the GPU.

Our experiments have shown that MAGMA GEMM operations on the GPU completely utilize it, maximizing the power consumption. This explains why the hybrid LU factorization also maximizes the GPU power consumption; because this reduces the time taken, the overall energy consumption is minimized.

C. Estimated Power

Various researchers have proposed using hardware performance counters to model energy and power consumption [15], [30], [31], [32], [33], [16], [34], [35], [36]. Goel et al. [36] have shown that power can be modeled to within 10% using just four hardware performance counters. Using the PAPI user-defined events infrastructure [37] an event can be created that derives an estimated power value from the hardware counters. This can be used to measure power on systems that do not have hardware power measurement available.
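
As an illustration of deriving a power estimate from counters, the sketch below evaluates a simple linear model: an idle baseline plus a weighted sum of counter rates. The linear form, the function name, and any coefficients passed to it are assumptions for illustration only; this is not the model of Goel et al. [36], nor the syntax of PAPI user-defined events.

```python
def estimate_power_watts(event_rates, weights, idle_watts):
    """Linear power estimate: idle baseline plus weighted hardware-counter rates."""
    if len(event_rates) != len(weights):
        raise ValueError("need exactly one weight per counter rate")
    return idle_watts + sum(w * r for w, r in zip(weights, event_rates))
```

In practice the weights would be fitted against readings from a real power meter on the target system before the estimate is trusted.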

V. CONCLUSION

The PAPI library can now provide transparent access to power and energy measurements via existing interfaces. Existing programs that already have instrumentation for PAPI for CPU performance measurements can quickly be adapted to measure power, and existing tools will gain access to the new power events with a simple PAPI upgrade.

With larger and larger clusters being built, energy consumption has become one of the defining constraints. PAPI has been continually extended to provide support for the most up-to-date performance measurements on modern systems. The addition of power and energy measurements allows PAPI users to stay on top of this increasingly important area in the always rapidly changing HPC environment.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. 0910899 and the U.S. Department of Energy Office of Science under contract DE-FC02-06ER25761.

REFERENCES

[1] S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, "A portable programming interface for performance evaluation on modern processors," International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189–204, 2000.
[2] D. Terpstra, H. Jagode, H. You, and J. Dongarra, "Collecting performance data with PAPI-C," in 3rd Parallel Tools Workshop, 2009, pp. 157–173.
[3] "Top 500 supercomputing sites," http://www.top500.org/.
[4] "Top green500 list :: Environmentally responsible supercomputing," http://www.green500.org/.
[5] S. Shende and A. Malony, "The Tau parallel performance system," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, 2006.
[6] L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. Tallent, "HPCToolkit: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 685–701, 2010.
[7] W. Nagel, A. Arnold, M. Weber, H.-C. Hoppe, and K. Solchenbach, "VAMPIR: Visualization and analysis of MPI resources," Supercomputer, vol. 12, no. 1, pp. 69–80, 1996.
[8] Intel, Intel Energy Checker: Software Developer Kit User Guide, 2010.
[9] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. Cameron, "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, May 2010.
[10] P. Popa, "Managing server energy consumption using IBM PowerExecutive," IBM Systems and Technology Group, Tech. Rep., 2006.
[11] D. Shin, H. Shim, Y. Joo, H.-S. Yun, J. Kim, and N. Chang, "Energy-monitoring tool for low-power embedded programs," IEEE Design & Test of Computers, vol. 19, no. 4, pp. 7–17, July/August 2002.
[12] S. Ryffel, "LEA2P: The linux energy attribution and accounting platform," Master's thesis, Swiss Federal Institute of Technology, Jan. 2009.
[13] J. Flinn and M. Satyanarayanan, "PowerScope: a tool for profiling the energy usage of mobile applications," in Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 2–10.
[14] T. Stathopoulos, D. McIntire, and W. Kaiser, "The energy endoscope: Real-time detailed energy accounting for wireless sensor nodes," in Proc. of the International Conference on Information Processing in Sensor Networks, Apr. 2008, pp. 383–394.
[15] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in Proc. IEEE/ACM 36th Annual International Symposium on Microarchitecture, Dec. 2003.
[16] F. Bellosa, "The benefits of event-driven energy accounting in power-sensitive systems," in Proceedings of the 9th workshop on ACM SIGOPS European workshop, 2000.
[17] PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, University of Tennessee Knoxville, Nov. 2010.
[18] S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense linear algebra solvers for multicore with GPU accelerators," in Proc. 24th IEEE/ACM International Parallel and Distributed Processing Symposium, Apr. 2010.
[19] D. Bedard, R. Fowler, M. Linn, and A. Porterfield, "PowerMon 2: Fine-grained, integrated power measurement," Renaissance Computing Institute, Tech. Rep. TR-09-04, 2009.
[20] Intel, Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2009.
[21] E. Rotem, A. Naveh, D. Rajwan, A. Anathakrishnan, and E. Weissmann, "Power-management architecture of the Intel microarchitecture code-named Sandy Bridge," IEEE Micro, vol. 32, no. 2, pp. 20–27, 2012.
[22] Z. Rui. (2011, May) [patch 2/3] introduce intel_rapl driver. linux-kernel mailing list. [Online]. Available: http://thread.gmane.org/gmane.linux.kernel/1145973
[23] B. Rountree, D. Ahn, B. de Supinski, D. Lowenthal, and M. Schulz, "Beyond DVFS: A first look at performance under a hardware-enforced power bound," in Proc. of 8th Workshop on High-Performance, Power-Aware Computing, May 2012.
[24] Intel, Intel Math Kernel Library (MKL), http://www.intel.com/software/products/mkl/.
[25] AMD, AMD Family 15h Processor BIOS and Kernel Developer Guide, 2011.
[26] NVML Reference Manual, NVIDIA, 2012.
[27] K. Kasichayanula, "Power aware computing on GPUs," Master's thesis, University of Tennessee, Knoxville, May 2012.
[28] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, "Faster, cheaper, better - a hybridization methodology to develop linear algebra software for GPUs," LAPACK Working Note 230.
[29] S. Yamazaki, S. Tomov, and J. Dongarra, "One-sided dense matrix factorizations on a multicore with multiple GPU accelerators," in Proc. of the 2012 International Conference on Computational Science, Jun. 2012.
[30] K. Singh, M. Bhadauria, and S. McKee, "Real time power estimation of multi-cores via performance counters," Proc. Workshop on Design, Architecture and Simulation of Chip Multi-Processors, Nov. 2008.
[31] I. Kadayif, T. Chinoda, M. Kandemir, N. Vijaykirsnan, M. Irwin, and A. Sivasubramaniam, "vEC: virtual energy counters," in Proc. of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, Jun. 2001.
[32] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: a first step towards software power minimization," IEEE Transactions on VLSI, vol. 3, no. 4, pp. 437–445, 1994.
[33] J. Russell and M. Jacome, "Software power estimation and optimization for high performance, 32-bit embedded processors," in Proc. IEEE International Conference on Computer Design, Oct. 1998, pp. 328–333.
[34] R. Joseph and M. Martonosi, "Run-time power estimation in high-performance microprocessors," in Proc. IEEE/ACM International Symposium on Low Power Electronics and Design, Aug. 2001, pp. 135–140.
[35] J. Haid, G. Kaefer, C. Steger, and R. Weiss, "Run-time energy estimation in system-on-a-chip designs," in Proc. of the Asia and South Pacific Design Automation Conference, Jan. 2003, pp. 595–599.
[36] B. Goel, S. McKee, R. Gioiosa, K. Singh, M. Bhadauria, and M. Cesati, "Portable, scalable, per-core power estimation for intelligent resource management," in First International Green Computing Conference, Aug. 2010.
[37] S. Moore and J. Ralph, "User-defined events for hardware performance monitoring," in Proc. 11th Workshop on Tools for Program Development and Analysis in Computational Science, Jun. 2011.
