PACUE: Processor Allocator Considering User Experience

Tetsuro Horikawa¹, Michio Honda¹, Jin Nakazawa², Kazunori Takashio², and Hideyuki Tokuda²,³

¹ Graduate School of Media and Governance, Keio University
² Faculty of Environment and Information Studies, Keio University, 5322, Endo, Fujisawa, Kanagawa 252-8520, Japan
³ JST-CREST, Japan
{techi,jin,kaz,hxt}@ht.sfc.keio.ac.jp, micchie@sfc.wide.ad.jp

Abstract.
GPU-accelerated applications, including GPGPU ones, are commonly seen on modern PCs. If many applications compete for the same GPU, performance decreases significantly. Some applications have a large impact on user experience; for such applications, we have to limit GPU utilization by the other applications. It might seem straightforward to modify applications so that they switch compute devices dynamically for intelligent resource allocation. Unfortunately, we cannot do so because of software distribution policies or other reasons. In this paper, we propose PACUE, which allows the end system to allocate compute devices to applications arbitrarily. In addition, PACUE guesses the optimal compute device for each application according to user preference. We implemented the dynamic compute device redirector of PACUE, including OpenCL API hooking and device camouflaging features. We also implemented the basic framework of the resource manager of PACUE. We demonstrate that PACUE achieves dynamic compute device redirection on one out of two real applications and on all of 20 sample codes.

Keywords: Resource management, OpenCL, binary compatibility, GPU, GPGPU, PC, user experience.

1 Introduction

Graphics Processing Unit (GPU) use has been extended to a wider range of computing purposes on the PC platform. GPU utilization on PCs can be classified into four purposes. The first is 3D graphics computation, such as 3D games and 3D-graphics-based GUI shells (e.g., Windows Aero). The second is 2D graphics acceleration, such as font rendering in modern web browsers. The third is video decoding and encoding acceleration. Video player applications use the video decoding acceleration function of the GPU to reduce CPU load and to increase video quality. Also, some GPUs have video encoding acceleration units on the GPU die. The last purpose is general-purpose computing, called General-Purpose computing on GPU (GPGPU). On PCs, GPGPU is often used by video encoding applications and physics simulation applications, including 3D games.¹

¹ Some 3D games utilize the GPU for general-purpose computing besides 3D graphics rendering.
M. Alexander et al. (Eds.): Euro-Par 2011 Workshops, Part II, LNCS 7156, pp. 335–344, 2012. © Springer-Verlag Berlin Heidelberg 2012

In today's PCs, GPUs are utilized efficiently because only a few applications are accelerated at the same time; these applications do not compete with each other for the same GPU. Applications thus choose compute devices statically, for example by user selection in the application's configuration menu. However, we envisage that more and more applications will utilize GPUs. For example, the Open Computing Language (OpenCL) [2] allows applications to explicitly select the compute device that executes some parts of the application. Therefore, efficient load balancing between compute devices consisting of CPUs and GPUs is essential for future consumer PCs.

There are three technical challenges in achieving efficient compute device assignment for heterogeneous processors in PCs. First, GPU acceleration is utilized for various purposes on PCs, whereas GPUs are utilized mainly for general-purpose computing in supercomputers. In addition, some tasks running on PCs strongly require specific processors. For example, 3D rendering is normally processed by GPUs, and some 3D graphics transactions cannot be processed by CPUs, whereas some applications can be processed by both CPUs and GPUs. When the GPU load is high, we could explicitly run the latter applications on CPUs.

Second, we must not modify applications. Typically, most applications installed on major OSes such as Windows and Mac OS cannot be modified by a third party because of their software distribution policies. Application vendors may not be willing to modify their applications either, because doing so does not benefit them directly. For these reasons, existing runtime libraries or libraries that distribute tasks between compute devices [6,10,7], proposed for HPC, are not deployable on consumer PCs.

Third, the performance metric for consumer PCs is complicated, because user preference is one of the most important metrics for assigning compute devices to applications. This is clearly different from general HPC metrics, whose task distribution policy is usually static, such as maximizing task transaction speed or maximizing performance per watt. On PCs, task distribution policies and merits easily change depending on the use. For example, when the user would like to play a 3D game smoothly, other GPGPU tasks should not be assigned to the GPU. On the other hand, sometimes the user might prefer to transcode videos quickly rather than play a trifling game smoothly. The compute device selection method must recognize user preferences to decide the proper compute device to assign. However, this is hard, so user preference recognition cannot be automated. Therefore, the resource manager has to infer how the PC is being utilized, and users have to be able to tell it how they are using the PC at that time.

In this paper, we propose PACUE, which allocates compute devices to applications efficiently. PACUE has two features: dynamic compute device redirection and system-wide optimal device selection. We strongly focus on solving real problems that will occur when we distribute our system worldwide via the web. Therefore, we prefer a politically safer method to a technically better one. Thus, the first advantage of PACUE is its deployability. The second advantage is that PACUE is designed to maximize the PC user's experience. We thereby bring a new metric for using accelerators, which will also be beneficial for other computers such as smartphones or game consoles.

Our experimental results show that PACUE can switch compute devices in 1 out of 2 applications and in all of 20 sample codes built with OpenCL.

The remainder of this paper is organized as follows. In Sec. 2, we describe the design of PACUE, consisting of dynamic compute device redirection and the system resource manager. In Sec. 3, we evaluate our prototype implementation. The paper concludes with Sec. 4.

2 Designing PACUE

PACUE consists of two components: the Dynamic Compute Device Redirector and the Resource Manager. We focus on applications built with OpenCL, a widely used framework that supports many types of compute devices such as CPUs and GPUs.

2.1 Dynamic Compute Device Redirection

We design the Dynamic Compute Device Redirection (DCDR) method to meet the "no application modification" requirement. DCDR implements OpenCL API hooking, which conceals the actual compute devices from applications, and avoids errors caused by inconsistent device information.

OpenCL API Hooking. OpenCL abstracts compute devices and the memory hierarchy so that heterogeneous processors can be utilized within its programming model. To utilize a compute device, applications call OpenCL APIs and specify a compute device. The assignment process is as follows: First, obtain the list of available devices. Secondly, select possible devices and create an OpenCL context. Thirdly, select one device to use and create a command queue. Lastly, put tasks into the queue created above. In the second and third steps, the application specifies a concrete device because the OpenCL APIs take a device ID as a parameter, which makes system-wide optimal device selection impossible. For optimal device selection, we remove the restriction that applications must choose the device by themselves, because decisions made by applications or users are rarely optimal (see Sec. 2.2). PACUE hooks the part of the OpenCL APIs that concerns device selection and implements an inquiry function that asks which device to utilize.
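
The hooking idea above can be modeled in a few lines of Python. This is only an illustrative sketch, not PACUE's implementation: `original_create_command_queue`, `ask_resource_manager`, the device names, and the fixed redirection policy are all hypothetical stand-ins for the real OpenCL entry point and PACUE's inquiry function.

```python
# Illustrative model of API hooking for device selection (not PACUE's code).
# All names below are hypothetical stand-ins.

def original_create_command_queue(context, device_id):
    """Stand-in for the real API: binds a queue to the given device."""
    return {"context": context, "device": device_id, "tasks": []}


def ask_resource_manager(requested_device):
    """Stand-in for the inquiry function: returns the device to use.

    Here a fixed, assumed policy redirects GPU requests to the CPU.
    """
    return "cpu0" if requested_device == "gpu0" else requested_device


def hooked_create_command_queue(context, device_id):
    """Hook: consult the resource manager, then call the original API."""
    chosen = ask_resource_manager(device_id)
    return original_create_command_queue(context, chosen)


# The application still asks for "gpu0", but the queue is bound to "cpu0".
queue = hooked_create_command_queue(context="ctx", device_id="gpu0")
```

The application code is unchanged in this model; only the function it ends up calling differs, which is the property the "no application modification" requirement asks for.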

There are several methods to hook APIs in Windows 7, where PACUE is implemented. The first possibility is creating a thread in the target application by calling the Windows API CreateRemoteThread() [12]. With this method, we implement an application that creates a thread in other applications and maps an external DLL containing the overridden target APIs. However, such applications and DLLs are hard to implement because the procedures are complicated. This method also risks being treated as malware by anti-malware software. The second possibility is a global hook, in which a user application hooks specific APIs of all applications by calling the Windows API SetWindowsHookEx() [13]. This method is unsafe, because it risks hooking unknown applications and causing unexpected effects on them.
The third possibility is making a wrapper DLL, which is a DLL with the same file name as the original DLL and which has all the APIs of the original DLL. A wrapper DLL is almost a shell of the original DLL, because most of its APIs simply call the original DLL's APIs, except for the APIs that actually need to perform a different transaction from the original. This method has the highest chance of hooking APIs, because by default a wrapper DLL located in the application directory is always loaded before other DLLs, such as those located in system directories. In addition, a wrapper DLL placed in the directory where the target EXE is located only affects applications whose binaries are located in the same directory. Therefore, this is a safe way to hook APIs. The last possibility is the use of API hook libraries, such as [14]. These libraries are easy to use; however, they are less likely to succeed in hooking APIs than a wrapper DLL. They also risk being treated as malware. From this comparison, we adopt the wrapper DLL method. Fig. 1 illustrates the architecture for hooking OpenCL APIs with this method.

Fig. 1. Dynamic Compute Device Switching by OpenCL API Hooking
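
The "shell of the original DLL" structure can be modeled as a forwarding proxy. The following Python sketch is an analogy only and does not involve real DLL loading: `OriginalLibrary`, `WrapperLibrary`, and their functions are hypothetical names chosen for illustration.

```python
# Illustrative model of a wrapper library (not PACUE's code).
# OriginalLibrary and its functions are hypothetical stand-ins.

class OriginalLibrary:
    """Stand-in for the original DLL with two exported functions."""

    def get_version(self):
        return "1.1"

    def select_device(self, device_id):
        return device_id


class WrapperLibrary:
    """Exposes the same functions as the original library.

    Most calls are forwarded unchanged; only the functions that must
    behave differently are overridden.
    """

    def __init__(self, original, forced_device):
        self._original = original
        self._forced_device = forced_device

    def __getattr__(self, name):
        # Called only for attributes not defined on the wrapper itself,
        # so every non-overridden function falls through to the original.
        return getattr(self._original, name)

    def select_device(self, device_id):
        # Overridden: ignore the requested device and use the forced one.
        return self._original.select_device(self._forced_device)


lib = WrapperLibrary(OriginalLibrary(), forced_device="cpu0")
```

In this model, `lib.get_version()` is answered by the original object, while `lib.select_device("gpu0")` goes through the override, mirroring how only the device-selection APIs need different behavior.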

Other major PC OSes such as Mac OS or Linux do not provide a mechanism like wrapper DLLs; still, we can implement a similar system by using the API hooking functions those OSes offer.

Another method to switch devices is making a virtual device [5]. With this method, applications specify the virtual device and the resource management system chooses a real device. This method has the significant advantage that it can switch real devices at any time; however, it may conflict with the Installable Client Driver (ICD) system of OpenCL. Installers of OpenCL runtime libraries distributed by hardware vendors sometimes overwrite the "OpenCL.dll" file, so installing a virtual device, or showing applications only the virtual device, is difficult on PCs.

Device Information Camouflaging. When part of an application's tasks is assigned to the OpenCL device selected by PACUE, some applications show errors. This is because the device information differs from what the application intended, so some applications recognize it as an unusual event. To avoid these errors, PACUE camouflages OpenCL device details when the desired OpenCL device has been changed dynamically.

However, camouflaging OpenCL device details is risky, because devices have different specifications at the lower level. The first risk is application stability. The memory size at each level of the hierarchy is device dependent, hence an unexpected memory size can result in an application crash or error. The second risk is execution speed. If an application implements per-device optimization, a mismatch between the intended device and the assigned device can result in unexpected performance degradation. For these reasons, we should camouflage device details only when necessary. To minimize the risks, PACUE camouflages devices at the following levels.

1. Device type level camouflage
When an application tries to acquire an OpenCL device list, PACUE overwrites the cl_device_type value. As far as possible, PACUE changes this value to CL_DEVICE_TYPE_ALL. Showing all devices instead of devices of a specific type is a reasonable choice, because it avoids forcing the application to use an unknown device. Occasionally, applications cannot execute their OpenCL code on some device types. In this case, PACUE sets the cl_device_type value to the desired type, such as CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_GPU.

2. Context level camouflage
When an OpenCL context is created, PACUE overrides the cl_device_id value and forces the OpenCL framework to build OpenCL binaries for each compute device. If PACUE recognizes that the target application supports only a specific type of compute device, PACUE overwrites the cl_device_id value and limits the device types for the context. In addition, PACUE overrides the cl_device_id value when an application requests detailed device information. Therefore, the application sees the information of the device PACUE selected. This contributes to application stability, because the acquired device information, such as the memory size, corresponds to that of the device that will actually be used.

3. Command queue level camouflage
The application's call to the clCreateCommandQueue() API is the last chance to change the device. Because of the stability issue described above, PACUE tries not to change the device at this point, but if necessary, PACUE changes the cl_device_id in the arguments of this API. In this situation, the device is camouflaged completely, so the application recognizes the camouflaged device as the device it specified. This is a very dangerous way to change the device, yet it improves application compatibility. It is risky in terms of device-dependent characteristics, such as the memory size; however, we can switch the processor in more applications with this method. Hence, this method is our ace in the hole.

As shown in Table 1, there are several ways to override the device assignment by combining these steps. Because they involve a trade-off between application compatibility and application stability, we have to make a rule for applying these methods; some hints are worked out in Sec. 3.

Table 1. Comparison of Device Camouflaging Methods
(Columns 2 to 4 give the overridden device type/ID.)

Method | Specified type when getting device list | Specified ID when creating a context | Specified ID when creating a command queue | Crash/Error risk | Compatibility
A. Device type level | CPUs or GPUs | All CPUs or all GPUs | \ | Low | Most applications
B. Context level | \ | CPUs or GPUs | | Low | Low
C. Command queue level | \ | ALL | One CPU or one GPU | High | Most applications
D. A+C | ALL | ALL | One CPU or one GPU | Normal | High
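
The three camouflage levels can be pictured as three points where a requested value is replaced before the call proceeds. The Python sketch below is a simplified model under stated assumptions, not PACUE's code: the device records and the functions `list_devices`, `create_context`, and `create_command_queue` are hypothetical and only mimic the shape of the corresponding OpenCL steps.

```python
# Illustrative model of the three camouflage levels (not PACUE's code).
# Device records and function names are hypothetical.

DEVICES = [
    {"id": "cpu0", "type": "CPU"},
    {"id": "gpu0", "type": "GPU"},
]


def list_devices(requested_type, override_type=None):
    """Level 1: replace the requested device type before listing."""
    wanted = override_type or requested_type
    return [d["id"] for d in DEVICES if wanted == "ALL" or d["type"] == wanted]


def create_context(requested_ids, override_ids=None):
    """Level 2: replace the device IDs the context is created for."""
    return {"devices": list(override_ids or requested_ids)}


def create_command_queue(context, requested_id, override_id=None):
    """Level 3: replace the device ID at the last possible moment."""
    device = override_id or requested_id
    if device not in context["devices"]:
        raise ValueError("device is not part of the context")
    return {"device": device}


# Method D from Table 1: show ALL devices, then pick one device per queue.
ids = list_devices("GPU", override_type="ALL")
ctx = create_context(["gpu0"], override_ids=ids)
queue = create_command_queue(ctx, "gpu0", override_id="cpu0")
```

The check inside `create_command_queue` hints at why method D widens the context first: in this model, a queue can only be redirected to a device the context already contains.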

2.2 System Resource Management

We need a system-wide resource manager for heterogeneous processors, because average PC users cannot choose the proper compute device for each application, and it is inconvenient for them to select a compute device every time an application runs. Some advanced PC users can choose the proper compute device manually; however, doing so is very inconvenient. Besides, many PC users do not know the detailed configuration of the PC they are using. These users cannot accurately choose the compute device that satisfies their preference, even if the application allows the user to select the compute device in its GUI configuration menu. To achieve a high-quality user experience, the resource manager should select a compute device automatically according to the user's preferences.

There are many studies in the HPC area that build a resource manager to select compute devices automatically [7,8]. They present task distribution algorithms for heterogeneous processor environments that are optimized for specific purposes, such as maximizing performance or maximizing performance per watt. However, they cannot be applied to resource management on PCs, because the requirements differ between PCs and HPC. Another approach to differentiating tasks, such as a device-driver-level approach [9], would be a possibility for our goal. However, we still need a system-wide resource manager that considers heterogeneous processors and applications. The following are three requirements for a resource manager, especially for PCs.

– Considering user preference
A PC user's preferences often change, and they are not simple objectives such as maximizing performance. In addition, it is difficult to recognize which application is really important, because users rarely specify process priority explicitly. Therefore, we have to build a resource manager that infers the user's preference by collecting PC utilization status and chooses a compute device for each application so as to satisfy the user's preference accurately.

– Supporting various hardware configurations
There are plenty of PC hardware components and applications, so the combinations of hardware components and applications are innumerable. In addition, the specifications of components depend on technology trends. For instance, some new GPU virtualization technologies for PCs, such as Virtu GPU virtualization [11], seamlessly use a discrete GPU when specific APIs are called. Thus, we have to build a resource manager that supports various hardware configurations.

– Supporting various runtime versions
The runtime libraries for parallel computing installed on PCs may vary. Application execution speed depends not only on hardware but also on runtime libraries such as OpenCL frameworks. Thus, a compute device selection algorithm optimized for a specific runtime version, such as one designed for HPC, may not show good results on newer versions of the runtime libraries. We have to build compute device selection algorithms that do not depend on a specific runtime version.

This resource manager has three features for satisfying the requirements explained above. The first feature is information gathering. PACUE collects information about how the PC is utilized, such as whether an AC adapter is connected, the temperatures and voltages of components, and processor utilization such as processor loads and the list of running applications. The second feature is user preference inference. The user describes their requirements by creating several requirement patterns. PACUE infers which pattern is best for the present situation by using the information acquired in the first step. The third feature is compute device selection, which decides the OpenCL device to be assigned to each application. We plan to implement a few compute device selection algorithms for several user preference patterns. PACUE will assign compute devices to each application based on the algorithm that matches the inferred pattern of user preference.
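
One simple way to picture the pattern inference step is rule matching over the collected status. The paper does not specify the inference algorithm, so the following Python sketch is purely an assumption for illustration: the pattern names, the condition fields, the device choices, and the "most conditions wins" tie-break are all hypothetical.

```python
# Illustrative sketch of requirement-pattern matching (not PACUE's code).
# Pattern names, conditions, and status fields are hypothetical.

PATTERNS = [
    {
        "name": "smooth-gaming",
        "conditions": {"ac_connected": True, "game_running": True},
        "gpgpu_device": "CPU",   # keep the GPU free for the game
    },
    {
        "name": "fast-transcoding",
        "conditions": {"ac_connected": True, "game_running": False},
        "gpgpu_device": "GPU",
    },
    {
        "name": "battery-saving",
        "conditions": {"ac_connected": False},
        "gpgpu_device": "CPU",
    },
]


def infer_pattern(status):
    """Return the pattern whose conditions best match the PC status.

    A pattern is eligible only if all of its conditions hold; among the
    eligible ones, the pattern with the most conditions wins.
    """
    eligible = [
        p for p in PATTERNS
        if all(status.get(k) == v for k, v in p["conditions"].items())
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda p: len(p["conditions"]))


status = {"ac_connected": True, "game_running": True, "gpu_load": 0.9}
pattern = infer_pattern(status)
```

With the sample status above, the sketch selects the hypothetical "smooth-gaming" pattern, so GPGPU tasks would be steered to the CPU while the game keeps the GPU.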

The resource manager works in cycles of these steps:
1. Collect PC utilization information.
2. Guess which profile is best for the present condition.
3. Wait for an inquiry from an application and answer which device should be used.

For evaluation purposes, we built a basic resource manager that has a communication function to order applications to utilize a specific compute device. Because it lacks user-preference-based compute device selection algorithms, the current PACUE can select a compute device only by manual selection in the resource manager GUI. Still, it can receive a compute device selection inquiry and answer with a compute device to utilize.
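
The behavior of this basic resource manager, manual selection plus answering inquiries, can be sketched as follows. The class, its methods, the default device, and the application names are hypothetical; the real communication between the hooked application and the manager is not modeled here.

```python
# Illustrative sketch of the basic resource manager (not PACUE's code).
# Class, method, and application names are hypothetical.

class BasicResourceManager:
    """Answers device inquiries using manually chosen assignments."""

    def __init__(self, default_device):
        self._default_device = default_device
        self._manual_choice = {}   # application name -> device

    def select_manually(self, app_name, device):
        """Models a manual selection made in the resource manager GUI."""
        self._manual_choice[app_name] = device

    def answer_inquiry(self, app_name):
        """Models the reply sent when a hooked application asks for a device."""
        return self._manual_choice.get(app_name, self._default_device)


manager = BasicResourceManager(default_device="GPU")
manager.select_manually("encoder.exe", "CPU")
```

Here `manager.answer_inquiry("encoder.exe")` yields the manually chosen device, and any application without a manual choice receives the assumed default.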

3 Evaluation

In this section, we confirm that PACUE provides compute device redirection capability for widely used applications without modifying them. We first state the evaluation policy, then show and analyze the results.

3.1 Evaluation Policy

We evaluate PACUE on a PC with an Intel Core i7-920 CPU and an AMD RADEON HD 4850 GPU. As the OpenCL framework, we adopt the x86 binary of ATI Stream SDK 2.2 [4]. This framework supports both CPUs and AMD RADEON GPUs as OpenCL devices. As test applications, we chose the following. They are publicly released and widely used for benchmarking, and thus suit our purpose.

– DirectCompute & OpenCL Benchmark [1]
– SiSoftware Sandra 2011 [15]
– Sample code of the "OpenCL Introduction" book [3]

We switch the device used by these applications and compare the device switching methods for each of them.

3.2 Results

DirectCompute & OpenCL Benchmark. Table 2 shows the results. PACUE can redirect the compute device perfectly on DirectCompute & OpenCL Benchmark, but only with method D.

Table 2. Result of DirectCompute & OpenCL Benchmark

Override method | A-1 | A-2 | B-1 | B-2 | C-1 | C-2 | D-1 | D-2
Specified device type | CPU | GPU | \ | \ | \ | \ | ALL | ALL
Specified device ID for context | \ | \ | CPUs | GPUs | ALL | ALL | ALL | ALL
Specified device ID for command queue | \ | \ | \ | \ | CPU | GPU | CPU | GPU
Application-recognized devices | CPU*2 | GPU*2 | CPU*1 | GPU*1 | CPU*1 | CPU*1 | CPU*1+GPU*1 | CPU*1+GPU*1
Dynamic device switching | Impossible | Impossible | Static | Static | Static | Static | Dynamic | Dynamic

SiSoftware Sandra 2011. Device switching failed. When PACUE tried to switch the device, Sandra 2011 exhibited strange behavior, such as showing the same device twice in the GUI. Because Sandra 2011 is an information and diagnostic utility for PCs, it gathers device information through various APIs. Thus, the failure may be caused by inconsistency between the device information gathered through the OpenCL API hooked by PACUE and the information gathered through other APIs. However, PACUE did not make Sandra crash.

Sample Codes of the "OpenCL Introduction" Book. These codes are a set of 20 sample applications of OpenCL APIs. Device switching succeeded for all of them. However, one sample uses device memory information to optimize the array size, so the result might depend on the device. The completely camouflaged device information might thus be incompatible with the information the sample expects. This can cause the application to crash or produce errors; however, it appeared to work correctly during the experiment.
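
The stability concern for such a sample can be illustrated with hypothetical numbers. The sketch below assumes an application that sizes its array from the reported device memory; the memory sizes and the one-quarter rule are invented for illustration and are not measurements from the experiment.

```python
# Illustrative sketch of the memory-size mismatch risk (hypothetical numbers).

MIB = 1024 * 1024


def optimized_array_bytes(reported_mem_bytes):
    """Hypothetical application rule: use a quarter of the reported memory."""
    return reported_mem_bytes // 4


def fits_on_device(array_bytes, actual_mem_bytes):
    """True if the array fits into the memory of the device actually used."""
    return array_bytes <= actual_mem_bytes


reported = 8192 * MIB   # memory of the device the application believes it uses
actual = 512 * MIB      # memory of the device the work really runs on

array_bytes = optimized_array_bytes(reported)    # 2048 MiB
ok = fits_on_device(array_bytes, actual)         # False: 2048 MiB > 512 MiB
```

If the application instead sees the memory size of the device actually used, the same rule produces an array that fits, which is the motivation for preferring context level camouflage over complete camouflage.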

3.3 Analysis

The results show that PACUE can switch compute devices on real applications. However, it fails for device-dependent applications. These use detailed information about a particular device, such as the device memory size. Thus, they may crash or behave strangely because of the information camouflaged by PACUE.

Among the combinations of device information overriding, we found a proper order in which to apply them to applications. As shown in Table 1, these methods involve a trade-off between application stability and application compatibility. In our evaluation, we found that the complete camouflaging method significantly increases application compatibility for real applications, such as DirectCompute & OpenCL Benchmark. However, it is realized by giving the application the information of the device it specified, instead of the information of the device actually in use. The original application creator is the only one who knows whether the application works correctly under the complete camouflaging method, so we should avoid using this risky method if possible.

In general, we suggest applying the methods in the following order:
1. Override the device type with ALL and override the device ID when creating the context. (Table 1 B)
2. Override the device type with ALL and override the device ID when creating the command queue. (Table 1 D)
3. Keep the original device type and override the device ID when creating the command queue. (Table 1 C)
4. Override the device type with CPU or GPU when the application requests the list of available devices. (Table 1 A)

The first to the third methods all realize dynamic device selection. The earlier methods are safer; the later ones offer more compatibility. Applications that cannot switch devices with the first method should use the second or the third method. The last one has the highest compatibility, but it provides only static and restrictive device switching. Thus, this method should be applied when all other methods fail.
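
The suggested order amounts to a fallback chain: try the safest method first and move on only when it does not work for the application. A minimal Python sketch follows; how PACUE would actually detect that a method works is not described in the paper, so the per-application sets of working methods and the application names are hypothetical inputs.

```python
# Illustrative sketch of the suggested application order (not PACUE's code).
# The compatibility sets and application names are hypothetical.

# Methods in the suggested order: safest first, most compatible last.
METHOD_ORDER = ["B", "D", "C", "A"]


def choose_method(works_with):
    """Return the first method in the suggested order that the app tolerates.

    works_with is the set of Table 1 methods known to work for the
    application; None is returned when no method is usable.
    """
    for method in METHOD_ORDER:
        if method in works_with:
            return method
    return None


# Hypothetical applications and the methods that work for them.
picked = {
    "app1": choose_method({"A", "B", "C", "D"}),
    "app2": choose_method({"A", "D"}),
    "app3": choose_method(set()),
}
```

In this sketch an application that tolerates every method gets method B, one that only tolerates A and D gets D, and one that tolerates nothing is left unredirected.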

4 Conclusions and Future Work

In this paper we presented PACUE. First, PACUE switches compute devices dynamically for applications on PCs with heterogeneous processors. Second, PACUE chooses the compute devices assigned to applications to meet the user's requirements. We conducted experiments with our implementation and demonstrated that 1 out of 2 real OpenCL applications and all of 20 sample programs can change the compute device dynamically with the dynamic compute device redirector. In addition, we showed that a few device information camouflaging methods significantly increase application compatibility. From the above work, we demonstrated the potential of dynamic compute device redirection without application modification.

However, PACUE has two technical disadvantages. The first is that PACUE can switch devices only when a command queue is created. This is because OpenCL has no support for dynamic device switching, so the chances to switch devices are limited. We will investigate other methods to expand the chances for switching devices, and we will also investigate how frequently device switching opportunities arise in other APIs. The second disadvantage concerns OpenCL kernel optimization. Because of device information camouflaging, kernels designed for other devices may be executed. This may decrease performance significantly, so we should avoid such situations. One answer is to cache every type of kernel source code through API hooking and to switch among them according to the device actually in use. Another answer is to apply a just-in-time OpenCL code optimization technique to improve performance. However, both of them may conflict with copyright law or the licenses of the applications. Therefore, they may be difficult to apply to PC applications. For this reason, we will continue improving the camouflage methods and will avoid showing different device information as far as we can.

For our research goals, we have the following ongoing work:

Increase Compatibility for Applications. We will address the problem that PACUE cannot switch compute devices in some applications. We will also conduct stability tests on applications.

Evaluate in Many Hardware Environments. We will conduct experiments on more hardware configurations, such as Virtu, and improve the hardware support of PACUE.

Implement the User Preferences Handler in the Resource Manager. We assume that there are several patterns describing user-predefined requirements (e.g., playing an important game with the AC adaptor connected, or hasty file compression together with unremarkable video encoding). PACUE infers the matching pattern from the user's activity and resource utilization.

Implement Compute Device Selecting Algorithm. With user requirement recognition, we select compute devices that follow user preference accurately. We will implement some algorithms and parameter sets for each user requirement pattern. Also, we will explore the performance impact of redirecting the compute device in real applications and take measures against heavy performance degradation. Showing applications no OpenCL device by overriding OpenCL APIs can be one of the answers. In this case, applications will use internal optimized assembly code to execute their transactions, which is often much faster than executing OpenCL code on CPUs. However, this has the disadvantage that the compute device cannot be changed until the application restarts, because the application will never call OpenCL APIs again. Therefore, we will investigate each application's behavior concretely to decide how to let the application use CPUs.

Support for Other Parallel Computing Frameworks. We plan to implement modules for other APIs such as the Fusion System Architecture Intermediate Layer Language (FSAIL).

References

1. DirectCompute & OpenCL Benchmark, http://www.ngohq.com/graphic-cards/16920-directcompute-and-opencl-benchmark.html (accessed on August 21, 2011)
2. OpenCL 1.1 Specification, http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf
3. Fixstars Corporation: OpenCL Introduction - Parallel Programming for Multicore CPUs and GPUs. Impress Japan (January 2010) (in Japanese)
4. AMD: ATI Stream Technology, http://www.amd.com/US/PRODUCTS/TECHNOLOGIES/STREAM-TECHNOLOGY/Pages/stream-technology.aspx (accessed on August 21, 2011)
5. Aoki, R., Oikawa, S., Tsuchiyama, R., Nakamura, T.: Hybrid opencl: Connecting different opencl implementations over network. In: Proc. IEEE CIT 2010, pp. 2729–2735 (2010)
6. Brodman, J.C., Fraguela, B.B., Garzaran, M.J., Padua, D.: New abstractions for data parallel programming. In: Proc. USENIX HotPar, p. 16 (2009)
7. Diamos, G.F., Yalamanchili, S.: Harmony: an execution model and runtime for heterogeneous many core systems. In: Proc. ACM HPDC, pp. 197–200 (2008)
8. Gupta, V., Schwan, K., Tolia, N., Talwar, V., Ranganathan, P.: Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems. In: Proc. USENIX ATC, pp. 31–44 (2011)
9. Kato, S., Lakshmanan, K., Rajkumar, R., Ishikawa, Y.: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments. In: Proc. USENIX ATC, pp. 17–30 (2011)
10. Liu, W., Lewis, B., Zhou, X., Chen, H., Gao, Y., Yan, S., Luo, S., Saha, B.: A balanced programming model for emerging heterogeneous multicore systems. In: Proc. USENIX HotPar, p. 3 (2010)
11. Lucidlogix: Lucidlogix virtu, http://www.lucidlogix.com/product-virtu.html (accessed on August 21, 2011)
12. Microsoft: CreateRemoteThread Function (Windows), http://msdn.microsoft.com/en-us/library/ms682437.aspx (accessed on August 21, 2011)
13. Microsoft: SetWindowsHookEx Function (Windows), http://msdn.microsoft.com/en-us/library/ms644990.aspx (accessed on August 21, 2011)
14. Microsoft Research: Detours - microsoft research, http://research.microsoft.com/en-us/projects/detours/ (accessed on August 21, 2011)
15. SiSoftware: Sisoftware zone, http://www.sisoftware.net/ (accessed on August 21, 2011)