supportssisoftwaresandra

sisoftwaresandra  时间:2021-04-01  阅读:()
PACUE:ProcessorAllocatorConsideringUserExperienceTetsuroHorikawa1,MichioHonda1,JinNakazawa2,KazunoriTakashio2,andHideyukiTokuda2,31GraduateSchoolofMediaandGovernance,KeioUniversity2FacultyofEnvironmentandInformationStudies,KeioUniversity,5322,Endo,Fujisawa,Kanagawa252-8520,Japan3JST-CREST,Japan{techi,jin,kaz,hxt}@ht.
sfc.
keio.
ac.
jp,micchie@sfc.
wide.
ad.
jpAbstract.
GPUacceleratedapplicationsincludingGPGPUonesarecommonlyseeninmodernPCs.
IfmanyapplicationscompeteonthesameGPU,theperfor-mancewilldecreasesignicantly.
Someapplicationshavealargeimpactonuserexperience.
Therefore,forsuchapplications,wehavetolimitGPUutilizationbytheotherapplications.
Itmightbestraightforwardtomodifyapplicationstoswitchcomputedevicedynamicallyforintelligentresourcesallocation.
Unfortu-nately,wecannotdosoduetosoftwaredistributionpolicyortheotherreasons.
Inthispaper,weproposePACUE,whichallowstheendsystemtoallocatecomputedevicesarbitrarytoapplications.
Inaddition,PACUEguessesoptimalcomputedeviceforeachapplicationaccordingtouserpreference.
WeimplementedthedynamiccomputedeviceredirectorofPACUEincludingOpenCLAPIhookinganddevicecamouagingfeatures.
WealsoimplementedtheframeoftheresourcemanagerofPACUE.
WedemonstratePACUEachievesdynamiccomputedeviceredirectingononeoutoftworealapplicationsandonallof20samplecodes.
Keywords:Resourcemanagement,OpenCL,binarycompatibility,GPU,GPGPU,PC,userexperience.
1IntroductionGraphicsProcessingUnit(GPU)usehasbeenextendedtoawiderrangeofcomput-ingpurposesonthePCplatform.
GPUutilizationpurposesonPCscanbeclassiedintofourpurposes.
Therstis3Dgraphicscomputation,suchas3Dgamesand3D-graphics-basedGUIshell(e.
g.
,WindowsAero).
Thesecondis2Dgraphicsaccelera-tion,suchasfontrenderinginmodernwebbrowsers.
Thethirdisvideodecodingandencodingacceleration.
VideoplayerapplicationsusethevideodecodingaccelerationfunctionoftheGPUtoreduceCPUloadandtoincreasethevideoquality.
Also,someofGPUshavevideoencodingaccelerationunitsonthedieoftheGPU.
Thelastpurposeisgeneral-purposecomputing,calledGeneral-PurposecomputingonGPU(GPGPU).
OnPCs,GPGPUisoftenusedbyvideoencodingapplicationsandphysicssimulationapplicationsincluding3Dgames.
11Some3DgamesutilizeGPUforgeneral-purposecomputingbesides3Dgraphicsrendering.
M.
Alexanderetal.
(Eds.
):Euro-Par2011Workshops,PartII,LNCS7156,pp.
335–344,2012.
cSpringer-VerlagBerlinHeidelberg2012336T.
Horikawaetal.
Intoday'sPCsGPUsareutilizedefciently,becauseonlyafewoftheapplicationsareacceleratedatthesametime;theseapplicationsdonotcompeteeachotheronthesameGPU.
Applicationsthuschoosecomputedevicesstatically,suchasbyuserselec-tionintheapplicationcongurationmenuoftheGUIinterface.
However,weenvisagethatmoreandmoreapplicationsutilizeGPUs.
Forexample,OpenComputingLanguage(OpenCL)[2]allowsapplicationstoselectthecomputedeviceexplicitlytoexecutesomepartsoftheapplication.
Therefore,efcientloadbal-ancingbetweencomputedevicesconsistingofCPUsandGPUsisessentialforfutureconsumerPCs.
TherearethreetechnicalchallengestoachieveefcientcomputedeviceassignmentofheterogeneousprocessorsinPCs.
First,GPUaccelerationisutilizedforvariouspur-poses,whileGPUsareutilizedmainlyforgeneral-purposecomputinginsupercomput-ers.
Inaddition,someoftasksrunninginPCsstronglyrequirespecicprocessors.
Forexample,3DrenderingisnormallyprocessedbyGPUs,andsomeof3Dgraphicstrans-actionscannotbeprocessedbyCPUs,whereassomeapplicationscanbeprocessedbybothCPUsandGPUs.
WhentheGPUloadishigh,wecouldrunthelatterapplicationsexplicitlyonCPUs.
Second,wemustnotmodifyapplications.
Typically,mostofapplicationsinstalledinmajorOSessuchasWindowsandMacOScannotbemodiedbyathirdperson,duetotheirsoftwaredistributionpolicies.
Applicationvendorsmaynotbewillingtomodifytheirapplicationseither,becauseitwillnotbenetthemstraightforwardly.
Forthesereasons,existingruntimelibrariesorlibrariestodistributetasksbetweencomputedevices[6,10,7]proposedforHPCarenotdeployableonconsumerPCs.
Third,performancemetricforconsumerPCsiscomplicated,becauseuserpreferenceisoneofthemostimportantmetricsforassigningcomputedevicestoapplications.
ItisclearlydifferentfromgeneralHPC'smetricswhosetaskdistributingpolicyisusuallystatic,suchasmaximizingtasktransactionspeedormaximizingperformanceperwatt.
InPCs,taskdistributingpoliciesandmeritseasilychangedependingontheuse.
Forexample,whentheuserwouldliketoplaythe3Dgamesmoothly,theotherGPGPUtasksshouldnotbeassignedtotheGPU.
Ontheotherhand,sometimestheusermightbewillingtotranscodevideosquicklyratherthanplayingthetriinggamesmoothly.
Thecomputedeviceselectingmethodmustrecognizeuserpreferencestodecidethepropercomputedevicetoassign.
Howeverthisishard,thususerpreferencerecognizingcannotautomate.
Therefore,theresourcemanagementhastoinferPCutilizationandtheusershavetobeabletotellhowtheyareusingPCatthattime.
Inthispaper,weproposePACUEwhichallocatescomputedevicestoapplicationsefciently.
PACUEhastwofeatures,oneisdynamiccomputedeviceredirectingfeatureandtheotherissystem-wideoptimaldeviceselectingfeature.
Westronglyfocusonsolvingrealproblemswhichwilloccurwhenwedistributeoursystemovertheworldviaweb.
Therefore,wepreferchoosingpoliticallysafermethodratherthantechnicallybettermethod.
Thus,rstadvantageofPACUEisthepossibilityofthedeployment.
ThesecondadvantageofPACUEisdesignedtomaximizePCusers'experience.
Thus,webringanewmetricforusingaccelerators,anditwillbealsobenecialforothercomputerssuchassmartphonesorgameconsoles.
PACUE:ProcessorAllocatorConsideringUserExperience337OurexperimentalresultsshowthatPACUEcanswitchcomputedevicesin1outof2applications,andallof20samplecodesbuiltwithOpenCL.
Thereminderofthispaperisorganizedasfollows:InSec.
2,wedescribethedesignofPACUEconsistingofthedynamiccomputedeviceredirectingandthesystemresourcemanager.
InSec.
3,weevaluateourprototypeimplementation.
ThepaperconcludeswithSec.
4.
2DesigningPACUEPACUEisconstructedbytwocomponents;DynamicComputeDeviceRedirectorandResourceManager.
WefocusonapplicationsbuiltwithOpenCL,awidelyusedframe-workwhichsupportsmanytypesofcomputedevicessuchasCPUsandGPUs.
2.
1DynamicComputeDeviceRedirectionWedesigntheDynamicComputeDeviceRedirection(DCDR)methodtomeetthe"noapplicationmodication"requirement.
DCDRimplementsOpenCLAPIhookingthatconcealsactualcomputedevicesfromapplications,andavoidserrorcausedbyinconsistentinformationofdevices.
OpenCLAPIHooking.
OpenCLabstractscomputedevicesandmemoryhierarchytoutilizeheterogeneousprocessorswithinitsprogrammingmodel.
Toutilizeacom-putedevice,applicationscallOpenCLAPIsandspecifyacomputedevice.
Assigningprocessarefollowing:Secondly,selectpossibledevicesandcreateanOpenCLcon-text.
Thirdly,selectonedevicetouseandcreateacommandqueue.
Lastly,puttaskstothequeuecreatedabove.
Inthesecondandthethirdsteps,theapplicationspeciesaconcretedevicebecauseOpenCLAPIsneedsdeviceIDasitsparameter,whichmakessystem-wideoptimaldeviceselectionimpossible.
Foroptimaldeviceselection,were-movetherestrictionthattheapplicationsneedtochoosethedevicebyitselfbecausethedecisionishardforapplicationsandusers.
However,decisionsbyapplicationsorusersarerarelyoptimal(SeeSec.
2.
2).
PACUEhooksapartofOpenCLAPIswhichconcerndeviceselecting,andimplementsaskingfunctionthataskswhichdevicetoutilize.
ThereareseveralmethodstohookAPIsinWindows7wherePACUEisimple-mented.
TherstpossibilityismakingathreadinthetargetapplicationbycallingaWindowsAPICreateRemoteThread()[12].
Withthismethodweimplementanapplica-tionwhichmakeathreadinotherapplicationsandmapexternalDLLcontainingover-riddentargetAPIs.
However,theseapplicationsandDLLsarehardtoimplementduetocomplicatedprocedures.
Ithasariskbeingtreatedasmalwarebytheanti-malwaresoft-ware.
ThesecondpossibilityisGlobalHook,theuserapplicationhooksspecicAPIsofallapplicationbycallingWindowsAPISetWindowsHookEx()[13].
Thismethodisunsafe,becauseithasariskofhookingunknownapplicationsandcausingunexpectedaffecttothem.
ThethirdpossibilityismakingWrapperDLL,whichisaDLLwiththesamelenameoforiginalDLLandhasallAPIsoforiginalDLL.
WrapperDLLisalmostshelloforiginalDLL,becausemostAPIsaresimplycallsoriginalDLLAPIsexceptAPIswhichactuallyneedtododifferenttransactionfromoriginal.
ThismethodhasthemostchanceofhookingAPIs,becausewrapperDLLlocatedintheapplica-tiondirectoryisalwaysloadedpriortotheotherones,suchasDLLslocatedinsystem338T.
Horikawaetal.
Fig.
1.
DynamicComputeDeviceSwitchingbyOpenCLAPIHookingdirectoriesbydefault.
Inaddition,whenlocatingwrapperDLLinthedirectorywhichtargetEXElocated,onlyaffectsapplicationswhosebinaryislocatedinthesamedi-rectory.
Therefore,thisisreallysafewaytohookAPIs.
ThelastpossibilityistheuseofAPIhooklibraries,suchas[14].
Theselibrariesareeasytouse,howeverithaslessprobabilitytosuccesstohookAPIsthanWrapperDLL.
Italsohasarisktobetreatedasmalware.
Fromthiscomparison,weadopttheWrapperDLLmethod.
Fig.
1illus-tratesthearchitecturetohookOpenCLAPIswiththismethod.
OthermajorPCOSessuchasMacOSorLinuxdonotprovideanyfunctionlikewrapperDLLs,stillwecanimplementasimilarsystembyusingAPIhookingfunctionsofferedbyotherOSes.
Anothermethodtoswitchdevicesismakingavirtualdevice.
[5]Onthismethod,ap-plicationswillassignthevirtualdeviceandtheresourcemanagementsystemchoosearealdevice.
Thismethodhasasignicantadvantagethatitcanswitchrealdevicesatanytime,howeveritmayconictwithInstallableClientDriver(ICD)systemofOpenCL.
InstallerofOpenCLruntimelibrariesdistributedbyhardwarevendorssometimesover-write"OpenCL.
dll"le,thusinstallingavirtualdeviceorshowingapplicationsonlythevirtualdeviceisdifcultonPCs.
DeviceInformationCamouaging.
Whenapartofapplications'tasksareassignedtoPACUEselectedOpenCLdevice,someapplicationsshowerrors.
Thisisbecausedeviceinformationisdifferentfromtheapplication'sintendedone,thussomeapplicationsrecognizeitasanunusualevent.
Toavoidtheseerrors,PACUEcamouagesOpenCLdevicedetailswhenthedesiredOpenCLdevicehasbeenchangeddynamically.
However,camouagingOpenCLdevicedetailsisrisky,becausedeviceshavediffer-entspecicationsinthelowerlevel.
Therstriskisapplicationstability.
Thememorysizeofeachhierarchyisdevicedependent,hencetheunexpectedmemorysizecanre-sultinapplicationcrashorerror.
Thesecondriskisexecutionspeed.
Ifanapplicationimplementsper-deviceoptimization,mismatchbetweentheintendeddeviceandtheas-signeddevicecanresultinunexpectedperformancedegradation.
Fromthesereasons,weshouldcamouagesdevicedetailsonlywhenitisnecessary.
Tominimizetherisks,PACUEcamouagesdevicesinfollowinglevels.
1.
DevicetypelevelcamouageWhenanapplicationtriestoacquireanOpenCLdevicelist,PACUEwillover-writethecldevicetypevalue.
Asfaraspossible,PACUEwillchangethisvalueforCLDEVICETYPEALL.
Showingalldevicesinsteadofthespecictypede-vicesisareasonablechoice,becauseitavoidsforcingapplicationusingunknownPACUE:ProcessorAllocatorConsideringUserExperience339Table1.
ComparisonofDeviceCamouagingMethodsOverriddendevicetype/IDSpeciedTypewhengettingdevicelistSpeciedIDwhencreatingaContextSpeciedIDwhencreatingaCommandqueuecreationCrash/ErrorRiskCompatibilityA.
DevicetypelevelCPUsorGPUsAllCPUsorallGPUs\LowMostapplica-tionsB.
Contextlevel\CPUsorGPUsLowLowC.
Commandqueuelevel\ALLOneCPUoroneGPUHighMostapplica-tionsD.
A+CALLALLOneCPUoroneGPUNormalHighdevice.
Occasionally,applicationscannotexecutetheirOpenCLcodeonsomede-vicetypes.
Inthiscase,PACUEsetsthecldevicetypevaluetothedesiredtype,suchasCLDEVICETYPECPUorCLDEVICETYPEGPU.
2.
ContextlevelcamouageWhencreatinganOpenCLcontext,PACUEoverridesthecldeviceidvalueandforceOpenCLframeworktobuildOpenCLbinariesforeachcomputedevice.
IfPACUErecognizethatthetargetapplicationsupportonlyspecictypeofcomputedevices,PACUEwilloverwritethecldeviceidvalueandlimitdevicetypesforcontext.
Inaddition,PACUEoverridesthecldeviceidvaluewhenapplicationsrequestsdetaileddeviceinformation.
Therefore,applicationwillseeinformationofthedevicePACUEselected.
Thiscontributestoapplication'sstability,becauseacquireddeviceinformation,suchasthememorysizecorrespondstothatofthedeviceactuallywillbeused.
3.
CommandqueuelevelcamouageWhentheapplicationcallsclCreateCommandQueue()API,thisisthelastchancetochangethedevice.
Becauseofthestabilityissuedescribedabove,PACUEtriesnottochangedevicethistiming,butifnecessary,PACUEchangescldeviceidinargumentsofthisAPI.
Inthissituation,thedeviceiscamouagedcompletely,thustheapplicationrecognizesthecamouageddeviceasthedeviceapplicationspeci-ed.
Thisisaterriblydangerouswaytochangedevice,stillitimprovesapplicationcompatibility.
Thisisriskyintermsofdevicedependentcharacteristics,suchasthememorysize,however,wecanswitchtheprocessorinmoreapplicationswiththismethod.
Hence,thismethodisaceinthehole.
AsshowninTable1,thereareseveraldeviceassignmentoverridingwaysbythecom-binationofthesesteps.
Becausetheyhaveatrade-offbetweenapplicationcompatibilityandapplicationstability,wehavetomakearuleforapplyingthesemethods,andsomehintsareguredoutinSec.
3.
2.
2SystemResourceManagementWeneedasystem-wideresourcemanagerforheterogeneousprocessors,becauseav-eragePCuserscannotchoosepropercomputedeviceforeachapplication,anditis340T.
Horikawaetal.
inconvenientthattheyselectcomputedeviceeverytimetheapplicationruns.
Somead-vancedPCuserscanchoosepropercomputedevicemanually,howeveritisterriblyinconvenient.
Besides,manyPCusersdonotknowdetailedconstructionofthePCtheyareusing.
Theseuserscannotchoosethepropercomputedevicewhichsatisestheirpreferenceaccurately,eveniftheapplicationallowstheusertoselectthecomputede-viceonitsGUIcongurationmenu.
Forachievinghighuser-experience,theresourcemanagershouldselectacomputedeviceautomaticallyaccordingtouser'spreferences.
TherearemanystudiesinHPCareathatbuildaresourcemanagertoselectcomputedeviceautomatically[7,8].
Theyshowtaskdistributingalgorithmforheterogeneousprocessorsenvironmentthatoptimizedforsomespecicpurposes,suchasmaximizingperformanceormaximizingperformance-per-watt.
However,theycannotbeappliedtoresourcemanagementonPCbecausetherequirementsaredifferentbetweenPCandHPC.
Theotherapproachtodifferentiatetasks,suchasdevice-driverlevelapproach[9]wouldbeapossibilityforourgoal.
However,westillneedasystemwideresourcemanagertoconsiderheterogeneousprocessorsandapplications.
Thesearethreere-quirementsoftheresourcemanagerespeciallyforPCs.
–ConsideringuserpreferenceAPCuser'spreferenceoftenchangesandtheyarenotsimpleobjectssuchasmax-imizingperformance.
Inaddition,itisdifculttorecognizewhichapplicationisreallyimportant,becausewerarelyspecifypriorityoftheprocessexplicitly.
There-fore,wehavetobuildaresourcemanager,whichinfersuser'spreferencebycol-lectingPCutilizationstatusandchoosescomputedevicesforeachapplicationtoachieveuserpreferenceaccurately.
–SupportingvarioushardwarecongurationsThereareplentyofPChardwarecomponentsandapplications.
Becauseofthisreason,combinationofhardwarecomponentsandapplicationsareinnumerable.
Inaddition,thespecicationsofcomponentsdependontechnologytrends.
Forinstance,somenewGPUvirtualizationtechnologiesforPCsuchasVirtuGPUvirtualization[11]seamlesslyusediscreteGPUwhenspecicAPIscalled.
Thus,wehavetobuildresourcemanagerthatsupportsvarioushardwarecongurations.
–SupportingvariousruntimeversionsInstalledruntimelibrariesforparallelcomputingmayvaryinPCs.
Applicationexecutionspeedsarenotonlydependsonhardware,butalsodependsonruntimelibrarieslikeOpenCLframeworks.
Thus,acomputedeviceselectingalgorithmop-timizedforspecicruntimeversion,suchasdesignedforHPC,maynotshowgoodresultsonthenewerversionruntimelibraries.
Wehavetobuildcomputedevicese-lectingalgorithmsthatdonotdependonaspecicruntimeversion.
Thisresourcemanagerhasthreefeaturesforsatisfyingtherequirementsexplainedabove.
Therstfeatureisinformationgathering.
PACUEcollectsinformationabouthowPCisutilized,suchaswhetheranACadapterisconnected,temperaturesandvolt-agesofcomponents,andprocessorutilizationlevelsuchasprocessorloadsandtherunningapplicationslist.
Thesecondfeatureistheuserpreferenceinferringfeature.
Theuserdescribestheirrequirementsbycreatingseveralrequirementpatterns.
PACUEinferswhichpatternisthebestforthepresentsituationbyusinginformationacquiredinPACUE:ProcessorAllocatorConsideringUserExperience341therststep.
Thethirdfeatureiscomputedeviceselection,whichdecidestheOpenCLdevicetobeassignedtoeachapplication.
Weplantoimplementafewcomputedeviceselectingalgorithmsforseveraluserpreferencepatterns.
PACUEwillassigncomputedevicestoeachapplicationbasedonthealgorithmwhichmatchestheinferredpatternofuserpreference.
Theresourcemanagerworksascyclesofthesesteps:1.
CollectPCutilizationinformation.
2.
Guesswhichproleisthebestforthepresentcondition.
3.
Waitaninquiryofapplicationandanswerwhichdeviceshouldbeused.
Forevaluationpurpose,webuiltabasicresourcemanagerwhichhascommunicationfunctiontoorderapplicationstoutilizespeciccomputedevice.
Becauseoflackofuserpreferencebasedcomputedeviceselectingalgorithms,recentPACUEcanonlyselectcomputedevicebymanualselectionintheresourcemanagerGUI.
Still,itcanreceiveaninquiryofcomputedeviceselectionandansweracomputedevicetoutilize.
3EvaluationInthissectionweconrmPACUEprovidescomputedevicesredirectioncapabilityforapplicationswithoutmodicationonwidelyusedapplications.
Werststatethepolicyoftheevaluation,thenshowandanalyzetheresults.
3.
1EvaluationPolicyWeevaluatePACUEinaPCwithIntelCorei7-920CPUandAMDRADEONHD4850GPU.
AsOpenCLframework,weadoptx86binaryofATIStreamSDK2.
2[4].
ThisframeworksupportsbothCPUsandAMDRADEONGPUsasOpenCLdevices.
Astestingapplications,wechosethefollowings.
Theyarepubliclyreleasedandwidelyusedforbenchmarking,thussuitesourpurpose.
–DirectCompute&OpenCLBenchmark[1]–SiSoftwareSandra2011[15]–Samplecodeof"OpenCLIntrodouction"book[3]Weswitchthedevicetoutilizefortheseapplications,andcomparethemethodsfordeviceswitchingforeachoftheseapplications.
3.
2ResultsDirectCompute&OpenCLBenchmark.
Table2showstheresults.
PACUEcanredirectcomputedeviceperfectlyonDirectCompute&OpenCLBenchmark,butonlywithmethodD.
SiSoftwareSandra2011.
Deviceswitchingfailed.
WhenPACUEtriedtoswitchthedevice,Sandra2011exhibitedstrangebehavior,suchasshowingthesamedevicetwiceintheGUI.
BecauseSandra2011isaninformation&diagnosticutilityforPC,itgathersdeviceinformationbyvariousAPIs.
Thus,thefailuremaybecausedbythelackofintegritybetweendeviceinformationgatheredbyPACUEhookedOpenCLAPIandinformationgatheredbyotherAPIs.
However,PACUEdonotmakeSandracrashed.
342T.
Horikawaetal.
Table2.
ResultofDirectCompute&OpenCLBenchmarkOverrideMethodA-1A-2B-1B-2C-1C-2D-1D-2SpeciedDeviceTypeCPUGPU\\\\ALLALLSpeciedDeviceIDforContext\\CPUsGPUsALLALLALLALLSp.
Dev.
IDforCommandQueue\\\\CPUGPUCPUGPUApplicationRecognizedDevicesCPU*2GPU*2CPU*1GPU*1CPU*1CPU*1CPU*1+GPU*1CPU*1+GPU*1DynamicDeviceSwitchingImpossibleImpossibleStaticStaticStaticStaticDynamicDynamicSampleCodesof"OpenCLIntroduction"Book.
Thesecodesareasetof20sampleapplicationsofOpenCLAPIs.
Thedeviceswitchingsucceededforallapplicationsinthem.
However,1sampleusesdevicememoryinformationfortheoptimizedarraysize,thustheresultmightdependonthedevice.
Thecompletecamouagingdeviceinfor-mationmightthusbeincompatiblewiththeinformationexpectedbythesample.
Thiscancausetheapplicationcrashingorerrors,howeveritseemedtobeworkingcorrectlywhiletheexperiment.
3.
3AnalysisTheresultsshowthatPACUEcanswitchthecomputedevicesonrealapplications.
However,itfailsfordevicedependentapplications.
Theyusedetailedinformationoftheparticulardevice,suchasdevicememorysize.
Thus,theymaycrashorbehavestrangelybecauseoftheinformationcamouagedbyPACUE.
Amongcombinationsofthedeviceinformationoverriding,wefoundtheproperor-dertoapplyonapplications.
ShowninTable1,thesemethodshaveatrade-offbetweenapplicationstabilityandapplicationcompatibility.
Inourevaluation,wefoundthatthecompletecamouagingmethodsignicantlyincreaseapplicationcompatibilityforrealapplications,suchasDirectCompute&OpenCLBenchmark.
However,itisrealizedbygivingapplicationstheinformationofthedevicetheapplicationspecied,insteadofgivingthedeviceinformationactuallyusing.
Originalapplicationcreatoristheonlyonewhoknowsiftheapplicationworkscorrectlywhenusingthecompletecamouag-ingmethod,thusweshouldavoidusingthisriskymethodifpossible.
Ingeneral,wesuggestthefollowingmethodapplyingorder;1.
OverridedevicetypeALLandoverridedeviceidwhencreatingcontext.
(Table1B)2.
OverridedevicetypeALLandoverridedeviceidwhencreatingcommandqueue.
(Table1D)3.
Keeporiginaldevicetypeandoverridedeviceidwhencreatingcommandqueue.
(Table1C)4.
OverridedevicetypeCPUorGPUwhenapplicationrequestslistofavailablede-vices.
(Table1A)Thersttothethirdmethodssimilarlyrealizedynamicdeviceselection.
Theupperissafer,thelowerhasmorecompatibility.
Applicationsthatcannotswitchdeviceswiththerstmethodshouldusethesecondorthethirdmethod.
Thelastonehasthehigh-estcompatibilitybutitonlyprovidesstaticandrestrictivedeviceswitching.
Thus,thismethodshouldbeappliedwhenallothermethodsfail.
PACUE:ProcessorAllocatorConsideringUserExperience3434ConclusionsandFutureWorkInthispaperwepresentedPACUE.
First,PACUEswitchesthecomputedevicesdynam-icallyforapplicationsonPCswithheterogeneousprocessors.
Second,PACUEchoosescomputedevicesassignedtoapplicationstomeettheuser'srequirement.
Weconductedexperimentsofourimplementation,anddemonstratedthat1outof2realOpenCLap-plications,andallof20sampleprogramscanchangethecomputedevicedynamicallywiththedynamiccomputedeviceredirector.
Inaddition,weshowedthatafewde-viceinformationcamouagingmethodssignicantlyincreaseapplicationcompatibil-ity.
Fromabovework,wedemonstratedpotentialavailabilityofthedynamiccomputedeviceredirectingwithoutapplicationmodied.
However,thereare2technicaldisad-vantagesinPACUE.
TherstdisadvantageisthatPACUEcanswitchdevicesonlywhencreatingcommandqueue.
Thisisbecausethereisnosupportfordynamicdeviceswitch-inginOpenCL,thusthechancesforswitchingdevicesarelimited.
Wewillinvestigateothermethodstoexpandthechancesforswitchingdevices,alsowewillinvestigatethefrequenciesofthedeviceswitchingtimingonotherAPIs.
TheseconddisadvantageisOpenCLkerneloptimization.
Becauseofdeviceinformationcamouaging,thereisapossibilityofexecutingkernelsdesignedforotherdevices.
Thismaydecreasetheper-formancesignicantly,thusweshouldavoidmakingsituationslikethat.
OneansweriscachingeverytypeofkernelsourcecodesbyAPIhooking,andswitchitaccordingtothedeviceactuallyusing.
Anotheranswerisapplyingjust-in-timeOpenCLcodeopti-mizationtechniquetoimproveperformance.
However,bothofthemcaninterferethecopyrightlaworlicensesoftheapplications.
Therefore,itmaybedifculttoapplyitforPCapplications.
Becauseofthisreason,wecontinueimprovingcamouagemethodsandwewillavoidshowingdifferentdevicesinformationaspossibleaswecan.
Forourresearchgoals,wehavetheseongoingworks:IncreaseCompatibilityforApplications.
WewilladdresstheproblemthatPACUEcannotswitchcomputedevicesinsomeapplications.
Alsowewillexperimentapplica-tionstabilitytestsonapplications.
EvaluateinManyHardwareEnvironment.
WewillconductexperimentsonmorehardwarecongurationsuchasVirtu,andimprovehardwaresupportofPACUE.
ImplementtheUserPreferencesHandlerintheResourceManager.
Weassumethatthereareseveralpatternsdescribinguserpredenedrequirements(e.
g.
,playingimportantgamewiththeACadaptor,andhastylecompressionwithunremarkablevideoencoding).
PACUEinfersmatchingpatternfromtheuser'sactivityandresourceutilization.
ImplementComputeDeviceSelectingAlgorithm.
Withuserrequirementrecogni-tion,weselectcomputedevicestofollowuserpreferenceaccurately.
Wewillimple-mentsomealgorithmsandparametersetsforeachuserrequirementpattern.
Also,wewillexploreperformanceimpactwhileredirectingcomputedeviceinrealapplicationsandtakemeasureagainstheavyperformancedegradation.
ShowingapplicationsnoOpenCLdevicebyoverridingOpenCLAPIscanbeoneoftheanswers.
Inthiscase,344T.
Horikawaetal.
applicationswilluseinternaloptimizedassemblytoexecuteitstransactionanditisoftenmuchfasterthanexecutingOpenCLcodeonCPUs.
However,ithasadisadvan-tagethatcomputedevicecannotchangeuntilrestartingtheapplication,becausetheapplicationwillnevercallOpenCLAPIsagain.
Therefore,wewillinvestigateeachapplication'sbehaviorconcretelytodecidehowtoletapplicationtouseCPUs.
SupportforOtherParallelComputingFrameworks.
Weplantoimplementmod-ulesforotherAPIssuchasFusionSystemArchitectureIntermediateLayerLanguage(FSAIL).
References1.
DirectCompute&OpenCLBenchmark,http://www.
ngohq.
com/graphic-cards/16920-directcompute-and-opencl-benchmark.
html(accessedonAugust21,2011)2.
OpenCL1.
1Specication,http://www.
khronos.
org/registry/cl/specs/opencl-1.
1.
pdf3.
FixtarsCorporation:OpenCLIntroduction-ParallelProgrammingforMulticoreCPUsandGPUs.
ImpressJapan(January2010)(inJapanese)4.
AMD.
ATIStreamTechnology,http://www.
amd.
com/US/PRODUCTS/TECHNOLOGIES/STREAM-TECHNOLOGY/Pages/stream-technology.
aspx(accessedonAu-gust21,2011)5.
Aoki,R.
,Oikawa,S.
,Tsuchiyama,R.
,Nakamura,T.
:Hybridopencl:Connectingdifferentopenclimplementationsovernetwork.
In:Proc.
IEEECIT2010,pp.
2729–2735(2010)6.
Brodman,J.
C.
,Fraguela,B.
B.
,Garzaran,M.
J.
,Padua,D.
:Newabstractionsfordataparallelprogramming.
In:Proc.
USENIXHotPar,p.
16(2009)7.
Diamos,G.
F.
,Yalamanchili,S.
:Harmony:anexecutionmodelandruntimeforheteroge-neousmanycoresystems.
In:Proc.
ACMHPDC,pp.
197–200(2008)8.
Gupta,V.
,Schwan,K.
,Tolia,N.
,Talwar,V.
,Ranganathan,P.
:Pegasus:CoordinatedSchedul-ingforVirtualizedAccelerator-basedSystems.
In:Proc.
USENIXATC,pp.
31–44(2011)9.
Kato,S.
,Lakshmanan,K.
,Rajkumar,R.
,Ishikawa,Y.
:TimeGraph:GPUSchedulingforReal-TimeMulti-TaskingEnvironments.
In:Proc.
USENIXATC,pp.
17–30(2011)10.
Liu,W.
,Lewis,B.
,Zhou,X.
,Chen,H.
,Gao,Y.
,Yan,S.
,Luo,S.
,Saha,B.
:Abalancedpro-grammingmodelforemergingheterogeneousmulticoresystems.
In:Proc.
USENIXHotPar,p.
3(2010)11.
Lucidlogix.
Lucidlogixvirtu,http://www.
lucidlogix.
com/product-virtu.
html(accessedonAugust21,2011)12.
Microsoft.
CreateRemoteThreadFunction(Windows),http://msdn.
microsoft.
com/en-us/library/ms682437.
aspx(accessedonAugust21,2011)13.
Microsoft.
SetWindowsHookExFunction(Windows),http://msdn.
microsoft.
com/en-us/library/ms644990.
aspx(accessedonAugust21,2011)14.
MicrosoftResearch.
Detours-microsoftresearch,http://research.
microsoft.
com/en-us/projects/detours/(accessedonAugust21,2011)15.
SiSoftware.
Sisoftwarezone,http://www.
sisoftware.
net/(accessedonAugust21,2011)

imidc:$88/月,e3-1230/16G内存/512gSSD/30M直连带宽/13个IPv4日本多IP

imidc对日本独立服务器在搞特别促销,原价159美元的机器现在只需要88美元,而且给13个独立IPv4,30Mbps直连带宽,不限制流量。注意,本次促销只有一个链接,有2个不同的优惠码,你用不同的优惠码就对应着不同的配置,价格也不一样。88美元的机器,下单后默认不管就给512G SSD,要指定用HDD那就发工单,如果需要多加一个/28(13个)IPv4,每个月32美元...官方网站:https:...

萤光云(20元/月),香港CN2国庆特惠

可以看到这次国庆萤光云搞了一个不错的折扣,香港CN2产品6.5折促销,还送50的国庆红包。萤光云是2002年创立的商家,本次国庆活动主推的是香港CN2优化的机器,其另外还有国内BGP和高防服务器。本次活动力度较大,CN2优化套餐低至20/月(需买三个月,用上折扣+代金券组合),有需求的可以看看。官方网站:https://www.lightnode.cn/地区CPU内存SSDIP带宽/流量价格备注购...

Dynadot COM特价新注册48元

想必我们有一些朋友应该陆续收到国内和国外的域名注册商关于域名即将涨价的信息。大概的意思是说从9月1日开始,.COM域名会涨价一点点,大约需要单个9.99美元左右一个。其实对于大部分用户来说也没多大的影响,毕竟如今什么都涨价,域名涨一点点也不要紧。如果是域名较多的话,确实增加续费成本和注册成本。今天整理看到Dynadot有发布新的八月份域名优惠活动,.COM首年注册依然是仅需48元,本次优惠活动截止...

sisoftwaresandra为你推荐
外挂购买什么外挂网好点硬盘工作原理数据存储的原理是什么mathplayer如何学好理科杰景新特萨克斯吉普特500是台湾原产的吗丑福晋大福晋比正福晋大么丑福晋八阿哥胤禩有几个福晋 都叫啥名儿呀www.haole012.com012.qq.com是真的吗qq530.com求教:如何下载http://www.qq530.com/ 上的音乐avtt4.comwww.5c5c.com怎么进入avtt4.comCOM1/COM3/COM4是什么意思??/
虚拟主机试用30天 游戏服务器租用 个人域名备案 域名备案只选云聚达 免费申请域名和空间 美国主机排名 t牌 2017年万圣节 mysql主机 eq2 长沙服务器 美国十次啦服务器 天互数据 idc资讯 hinet 东莞idc 路由跟踪 游戏服务器出租 徐州电信 测试网速命令 更多