
AMD layoffs  Date: 2021-04-02
AMD APP SDK v2.8 FAQ

1 General Questions

1. Do I need to use additional software with the SDK?

To run an OpenCL application, you must have an OpenCL runtime on your system. If your system includes a recent AMD discrete GPU, or an APU, you also should install the latest Catalyst drivers, which can be downloaded from AMD.com. Information on supported devices can be found at developer.amd.com/appsdk. If your system does not include a recent AMD discrete GPU, or APU, the SDK installs a CPU-only OpenCL run-time. Also, we recommend using the debugging, profiling, and analysis tools contained in the AMD CodeXL heterogeneous compute tools suite.

2. Which versions of the OpenCL standard does this SDK support?

AMD APP SDK 2.8 supports development of applications using the OpenCL Specification v1.2. As all OpenCL 1.1 APIs are supported within OpenCL 1.2, you also can develop OpenCL 1.1-compliant applications.

3. Will applications developed to execute on OpenCL 1.1 still operate in an OpenCL 1.2 environment?

OpenCL is designed to be backwards compatible. The OpenCL 1.2 run-time delivered with the AMD Catalyst drivers runs any OpenCL 1.1-compliant application. However, an OpenCL 1.2-compliant application will not execute on an OpenCL 1.1 run-time if it uses APIs that are supported only by OpenCL 1.2.

4. Does AMD provide any additional OpenCL samples, other than those contained within the SDK?

The most recent versions of all of the samples contained within the SDK are also available for individual download from the developer.amd.com/appsdk "Samples & Demos" page. This page also contains additional samples that either were too large to include in the SDK, or which have been developed since the most recent SDK release. Check this web page for new, updated, or large samples.

5. How often can I expect to get AMD APP SDK updates?

Developers can expect that the AMD APP SDK may be updated two to three times a year. Actual release intervals may vary depending on available new features and product updates. AMD is committed to providing developers with regular updates to allow them to take advantage of the latest developments in AMD APP technology.

6. What is the difference between the CPU and GPU components of OpenCL that are bundled with the AMD APP SDK?

The CPU component uses the compatible CPU cores in your system to accelerate your OpenCL compute kernels; the GPU component uses the compatible GPU cores in your system to accelerate your OpenCL compute kernels.

7. What CPUs does the AMD APP SDK v2.8 with OpenCL 1.2 support work on?

The CPU component of OpenCL bundled with the AMD APP SDK works with any x86 CPU with SSE3 or later, as well as SSE2.x or later. AMD CPUs have supported SSE3 (and later) since 2005. Some examples of AMD CPUs that support SSE3 (or later) are the AMD Athlon 64 (starting with the Venice/San Diego steppings), AMD Athlon 64 X2, AMD Athlon 64 FX (starting with San Diego stepping), AMD Opteron (starting with E4 stepping), AMD Sempron (starting with Palermo stepping), AMD Phenom, AMD Turion 64, and AMD Turion 64 X2.

8. What APUs and GPUs does the AMD APP SDK v2.8 with OpenCL 1.2 support work on?

For the list of supported APUs and GPUs, see the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk

9. Can my OpenCL code run on GPUs from other vendors?

At this time, AMD does not plan to have the AMD APP SDK support GPU products from other vendors; however, since OpenCL is an industry standard programming interface, programs written in OpenCL 1.2 can be recompiled and run with any OpenCL-compliant compiler and runtime.

10. What version of MS Visual Studio is supported?

The AMD APP SDK v2.8 with OpenCL 1.2 supports Microsoft Visual Studio 2008 Professional Edition, Microsoft Visual Studio 2010 Professional Edition, and Microsoft Visual Studio 2012.

11. Is it possible to run multiple AMD APP applications (compute and graphics) concurrently?

Multiple AMD APP applications can be run concurrently, as long as they do not access the same GPU at the same time. AMD APP applications that attempt to access the same GPU at the same time are automatically serialized by the runtime system.

12. Which graphics driver is required for the current AMD APP SDK v2.8 with OpenCL 1.2 CPU support?

For the minimum required graphics driver, see the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk. In general, it is advised that you update your system to use the most recent graphics drivers that are available for it.

13. How does OpenCL compare to other APIs and programming platforms for parallel computing, such as OpenMP and MPI? Which one should I use?

OpenCL is designed to target parallelism within a single system and provide portability to multiple different types of devices (GPUs, multi-core CPUs, etc.). OpenMP targets multi-core CPUs and SMP systems. MPI is a message passing protocol most often used for communication between nodes; it is a popular parallel programming model for clusters of machines. Each programming model has its advantages. It is anticipated that developers mix APIs, for example programming a cluster of machines with GPUs with MPI and OpenCL.

14. If I write my code on the CPU version, does it work on the GPU version, or do I have to make changes?

Assuming the size limitations for CPUs are considered, the code works on both the CPU and GPU components. Performance tuning, however, is different for each.

15. What is the precision of mathematical operations?

See Chapter 7, "OpenCL Numerical Compliance," of the OpenCL 1.2 Specification for exact mathematical operations precision requirements. http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=88

16. Are byte-addressable stores supported?

Byte-addressable stores are supported.

17. Are long integers supported?

Yes, 64-bit integers are supported.

18. Are operations on vectors supported?

Yes, operations on vectors are supported.

19. Is swizzling supported?

Yes, swizzling (the rearranging of elements in a vector) is supported.

20. How does one verify if OpenCL has been installed correctly on the system?

Run clinfo from the command-line interface. The clinfo tool shows the available OpenCL devices on a system.

21. How do I know if I have installed the latest version of the AMD APP SDK?

On installation of the SDK, if an internet connection is available, the installer states whether or not a newer SDK is available in the file VersionInfo.txt in directory C:\Program Files (x86)\AMD APP\docs\. Alternatively, you can check for the latest available version of the AMD APP SDK at http://developer.amd.com/appsdk.

2 Optimizations

22. How do I use constants in a kernel for best performance?

For performance using constants, highest to lowest performance is achieved using:
- Literal values.
- Constant pointer with compile time constants indexing.
- Constant pointer with runtime constant indexing that is the same for all threads.
- Constant pointer with linear access indexing that is the same for all threads.
- Constant pointer with linear access indexing that is different between threads.
- Constant pointer with random access indexing.

23. Why are literal values the fastest way to use constants?

Up to 96 bits of literal values are embedded in the instruction; thus, in theory, there is no limit on the number of usable literals. In practice, the limit is 16K unique literals in a compilation unit.

24. Why does a*b+c not generate a mad instruction?

Depending on the hardware and the floating point precision, the compiler may not generate a mad instruction for the computation of a*b+c due to the floating-point precision requirements in the OpenCL specification. Here, developers who want to exploit the mad instruction for performance can replace that computation with the mad() built-in function in OpenCL.
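
The rounding issue behind this answer can be reproduced on the host. The following C++ sketch is an illustration only, not AMD compiler output: it uses std::fma from <cmath> as a stand-in for a fused multiply-add, and the helper names and input values are made up for this example. The result of OpenCL's mad() is implementation-dependent, so the sketch shows only that a fused and an unfused evaluation of a*b+c can disagree, which is why a compiler bound by precision rules cannot silently swap one for the other.

```cpp
#include <cmath>

// Unfused: the product is rounded to float before the addition.
// The volatile temporary keeps the host compiler from contracting
// the expression into a fused instruction on its own.
float multiply_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused: std::fma forms a*b+c exactly and rounds only once.
float fused_multiply_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Hypothetical inputs where the two evaluations disagree:
// (1 + 2^-12)^2 = 1 + 2^-11 + 2^-24, and the 2^-24 term is lost
// when the product is rounded to float.
float example_a() { return 1.0f + std::ldexp(1.0f, -12); }
float example_c() { return -(1.0f + std::ldexp(1.0f, -11)); }
```

With a = b = example_a() and c = example_c(), the unfused version returns zero while the fused version keeps the 2^-24 residue.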

25. What is the preferred work-group size?

The preferred work-group size on the AMD platform is a multiple of 64.
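
One common way to honor this is to pad the global work size up to the next multiple of 64 on the host. The helper below is a minimal sketch; the function and constant names are illustrative and not part of the SDK, and it assumes the kernel itself skips the padding work-items.

```cpp
#include <cstddef>

// Multiple taken from the answer above.
constexpr std::size_t kPreferredMultiple = 64;

// Round a requested work-item count up to the next multiple of 64.
// The kernel must then ignore the padding items (global id >= requested).
constexpr std::size_t padded_global_size(std::size_t requested) {
    return ((requested + kPreferredMultiple - 1) / kPreferredMultiple)
           * kPreferredMultiple;
}
```

For example, a problem with 1000 elements would be launched with padded_global_size(1000) work-items.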

3 OpenCL Questions

26. What is OpenCL?

OpenCL (Open Computing Language) is the first truly open and royalty-free programming standard for general-purpose computations on heterogeneous systems. OpenCL lets programmers preserve their expensive source code investment and easily target both multi-core CPUs and the latest GPUs, such as those from AMD. Developed in an open standards committee with representatives from major industry vendors, OpenCL gives users a cross-vendor, non-proprietary solution for accelerating their applications on their CPU and GPU cores.

27. How much does the AMD OpenCL development platform cost?

AMD bundles support for OpenCL as part of its AMD APP SDK product offering. The AMD APP SDK is offered to developers and users free of charge.

28. What operating systems does the AMD APP SDK v2.8 with OpenCL 1.2 support?

AMD APP SDK v2.8 runs on 32-bit and 64-bit versions of Windows and Linux. For the exact list of supported operating systems, see the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk

29. Can I write an OpenCL application that works on both CPU and GPU?

Applications that program to the core OpenCL 1.2 API and kernel language should be able to target both CPUs and GPUs. At runtime, the appropriate device (CPU or GPU) must be selected by the application.

30. Does the AMD OpenCL compiler automatically vectorize for SSE on the CPU?

The CPU component of OpenCL that is bundled with the AMD APP SDK takes advantage of SSE3 instructions on the CPU. It also takes advantage of the AVX instructions where supported. In addition to AVX, OpenCL math library functions also leverage XOP and FMA4 capabilities on CPUs that support them.

31. Does the AMD APP SDK v2.8 with OpenCL 1.2 support work on multiple GPUs (ATI CrossFire)?

OpenCL applications can explicitly invoke separate compute kernels on multiple compatible GPUs in a single system. The partitioning of the algorithm to multiple parallel compute kernels must be done by the developer. It is recommended that ATI CrossFire be turned off in most system configurations so that AMD APP applications can access all available GPUs in the system. ATI CrossFire technology allows multiple AMD GPUs to work together on a single graphics-rendering task. This method does not apply to AMD APP computational tasks because it is not compatible with the compute model used for AMD APP applications.

32. Can I ship pre-compiled OpenCL application binaries that work on either CPU or GPU?

By using OpenCL runtime APIs, developers can write OpenCL applications that can detect the available compatible CPUs and GPUs in the system. This lets developers pre-compile applications into binaries that dynamically work on either CPUs or GPUs that execute on targeted devices. Including LLVM IR in the binary provides a means for the binary to support devices for which the application was not explicitly pre-compiled.

33. Is the OpenCL double precision optional extension supported?

The Khronos and AMD double precision extensions are supported on certain devices. Your application can use the OpenCL API to query if this functionality is supported on the device in use.

34. Is it possible to write OpenCL code that scales transparently over multiple devices?

For OpenCL programs that target only multi-core CPUs, scaling can be done transparently; however, scaling across multiple GPUs requires the developer to explicitly partition the algorithm into multiple compute kernels, as well as explicitly launch the compute kernels onto each compatible GPU.
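
The explicit partitioning step can be as simple as splitting the global range into one contiguous slice per GPU. The sketch below shows only that host-side arithmetic; Slice and partition_range are hypothetical names, no OpenCL calls are made, and an even split assumes the GPUs are equally fast, which may not hold on a real system.

```cpp
#include <cstddef>
#include <vector>

// One contiguous slice of the global range, assigned to one device.
struct Slice {
    std::size_t offset;
    std::size_t count;
};

// Split total_items as evenly as possible across device_count devices.
// The first (total_items % device_count) devices get one extra item.
std::vector<Slice> partition_range(std::size_t total_items,
                                   std::size_t device_count) {
    std::vector<Slice> slices;
    if (device_count == 0) return slices;
    const std::size_t base = total_items / device_count;
    const std::size_t extra = total_items % device_count;
    std::size_t offset = 0;
    for (std::size_t d = 0; d < device_count; ++d) {
        const std::size_t count = base + (d < extra ? 1 : 0);
        slices.push_back({offset, count});
        offset += count;
    }
    return slices;
}
```

Each slice would then be launched as a separate kernel invocation on the command queue of one GPU.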

35. What should I do if I get wrong results on the Apple platform with AMD devices?

Apple handles support for the Apple platform; please contact them.

36. Is it possible to dynamically index into a vector?

No, this is not possible because a vector is not an array, but a representation of a hardware register.

37. What is the difference between local int a[4] and int a[4]?

local int a[4] uses hardware local memory, which is a small, low-latency, high-bandwidth memory; int a[4] uses per-thread hardware scratch memory, which is located in uncached global memory.

38. Why does using a barrier cause the max kernel work-group size to drop to 64 on HD4XXX chips?

The supported HD4XXX chips do not have a hardware barrier, so the OpenCL runtime cannot execute more than a single wavefront per group to satisfy the OpenCL memory consistency model. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.

39. How come my program runs slower in OpenCL than in CUDA/Brook+/IL?

When comparing performance, it is better to compare code optimized for our OpenCL platform against code optimized for another vendor's OpenCL platform. By comparing the same toolchain on different vendors, you can find out which vendor's hardware works the best for your problem set.

40. Why can I not use texture on RV7XX devices in OpenCL?

RV7XX devices do not support all of the texture modes and precision requirements that OpenCL requires. Since textures are mapped to images in OpenCL, and image support is an "all or nothing" approach, we do not support images on RV7XX devices; thus, there is no access to textures. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.

41. Why do read-write images not exist in OpenCL?

OpenCL has a memory consistency model that requires certain constraints (see the OpenCL Specification for more information). Since images are special functional hardware units, they are different for reading and writing. This is different from pointers, which for the most part use the same hardware units and can guarantee the consistency that OpenCL requires.

42. Does prefetching work on the GPU?

Prefetch is not needed on the GPU because the hardware has a built-in mechanism to hide latency when many work-groups are running. The LDS can be used as a software-controlled cache.

43. How do you determine the max number of concurrent work-groups?

The maximum number of concurrent work-groups is determined by resource usage. This includes number of registers, amount of LDS space, and number of threads per work group. There is no way to directly specify the number of registers used by a kernel. Each SIMD has a 64-wide register file, with each column consisting of 256 x 32 x 4 registers.
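
The "scarcest resource wins" rule in this answer can be written down as a small calculation. This is a rough sketch under stated assumptions: the struct and function names are hypothetical, all budgets are supplied by the caller rather than taken from AMD documentation, and real hardware applies further limits that are not modeled here.

```cpp
#include <algorithm>
#include <cstddef>

// Per-compute-unit budgets. All values are supplied by the caller;
// none are taken from AMD documentation.
struct ResourceBudget {
    std::size_t registers;
    std::size_t lds_bytes;
    std::size_t threads;
};

// What one work-group of the kernel consumes.
struct WorkGroupUsage {
    std::size_t registers_per_thread;
    std::size_t lds_bytes;
    std::size_t threads;
};

// The scarcest resource decides how many work-groups fit at once.
std::size_t max_concurrent_work_groups(const ResourceBudget& budget,
                                       const WorkGroupUsage& usage) {
    if (usage.threads == 0) return 0;
    std::size_t limit = budget.threads / usage.threads;
    const std::size_t regs_per_group = usage.registers_per_thread * usage.threads;
    if (regs_per_group > 0) {
        limit = std::min(limit, budget.registers / regs_per_group);
    }
    if (usage.lds_bytes > 0) {
        limit = std::min(limit, budget.lds_bytes / usage.lds_bytes);
    }
    return limit;
}
```

In practice the register count per thread comes from the compiler, so the usual tuning levers are LDS usage and work-group size.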

44. Is it possible to tell OpenCL not to use all CPU cores?

Yes, use the device fission extension.

4 OpenCL Optimizations

45. What is more efficient, the ternary operator ?: or the select function?

The select function compiles to the single cycle instruction, cmov_logical; in most cases, ?: also compiles to the same instruction. In some cases, when memory is in one of the operands, the ?: operator is compiled down to an IF/ELSE block. An IF/ELSE block takes more than a single instruction to execute.

46. What is the difference between 24-bit and 32-bit integer operations?

24-bit operations are faster because they use floating point hardware and can execute on all compute units. Many 32-bit integer operations also run on all stream processors, but if both a 24-bit and a 32-bit version exist for the same instruction, the 32-bit instruction executes only one per cycle.

5 Hardware Information

47. How are 8/16-bit operations handled in hardware?

The 8/16-bit operations are emulated with 32-bit registers.

48. Do 24-bit integers exist in hardware?

No, there are 24-bit instructions, such as MUL24/MAD24, but the smallest integer in hardware registers is 32 bits.
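
Because the values still live in 32-bit registers, it is up to the program to make sure operands really fit in 24 bits before relying on a 24-bit multiply. The host-side check below is a sketch with hypothetical helper names; the signed range used, [-2^23, 2^23 - 1], follows the OpenCL C description of the mul24 built-in and should be confirmed against the specification for the OpenCL version in use.

```cpp
#include <cstdint>

// True when x fits in a signed 24-bit integer, i.e. [-2^23, 2^23 - 1].
constexpr bool fits_signed_24(std::int32_t x) {
    return x >= -(INT32_C(1) << 23) && x <= (INT32_C(1) << 23) - 1;
}

// A 24-bit multiply is only a safe substitute for a 32-bit one when
// both operands are in range.
constexpr bool can_use_mul24(std::int32_t x, std::int32_t y) {
    return fits_signed_24(x) && fits_signed_24(y);
}
```

Index arithmetic on small buffers typically passes this check, while hashes or packed values often do not.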

49. What are the benefits of using 8/16-bit types over 32-bit integers?

8/16-bit types take less memory than a 32-bit integer type, increasing the amount of data you are able to load with a single instruction. The OpenCL compiler up-converts to 32 bits on load and down-converts to the correct type on store.

50. What is the difference between a GPR and a shared register?

Although they are physically equivalent, the difference is whether the register offset in the hardware is absolute to the register file or relative to the wavefront ID.

51. How often are wavefronts created?

Wavefronts are created by the hardware to execute as long as resources are available. If they are created but cannot execute immediately, they are put in a wait queue where they stay until currently running wavefronts are finished.

52. What is the maximum number of wavefronts?

The maximum number of wavefronts is determined by which resource limits the number of wavefronts that can be spawned. This can be the number of registers, amount of local memory, required stack size, or other factors. Compute shader with local memory usage has a hard cap at 16 wavefronts.

53. Why do I get blue or black screens when executing longer running kernels?

The GPU is not a preemptable device. If you are running the GPU as your display device, ensure that a compute program does not use the GPU past a certain time limit set by Windows. Exceeding the time limit causes the watchdog timer to trigger; this can result in undefined program results.

54. What is the cost of a clause switch?

In general, the latency of a clause switch is around 40 cycles. Note that this is relevant only for Evergreen and Northern Islands devices.

55. How can I hide clause switch latency?

By executing multiple wavefronts in parallel. Note that this is relevant only for Evergreen and Northern Islands devices.

56. How can I reduce clause switches?

Clause switches are almost directly related to source program control flow. By reducing source program control flow, clause switches can also be reduced. This is only relevant for Evergreen and Northern Islands devices.

57. How does the hardware execute a loop on a wavefront?

The loop only ends execution for a wavefront once every thread in the wavefront breaks out of the loop. Once a thread breaks out of the loop, all of its execution results are masked, but the execution still occurs.

58. How does flow control work with wavefronts?

There are no flow control units for each individual thread, so the whole wavefront must execute the branch if any thread in the wavefront executes the branch. If the condition is false, the results are not written to memory, but the execution still occurs.
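
The cost of this lock-step behavior can be estimated with a small host-side model. The sketch below is a simplified simulation, not a description of the actual hardware scheduler; LoopCost and simulate_wavefront_loop are hypothetical names, and each entry of trip_counts is the number of iterations one thread needs before it breaks out of the loop.

```cpp
#include <cstddef>
#include <vector>

struct LoopCost {
    std::size_t iterations_executed;  // trips the whole wavefront makes
    std::size_t masked_lane_slots;    // lane-iterations whose results are discarded
};

// The wavefront keeps iterating until every thread has finished;
// threads that finished earlier still occupy their lane, masked.
LoopCost simulate_wavefront_loop(const std::vector<std::size_t>& trip_counts) {
    LoopCost cost{0, 0};
    while (true) {
        std::size_t active = 0;
        for (std::size_t needed : trip_counts) {
            if (needed > cost.iterations_executed) ++active;
        }
        if (active == 0) break;  // every thread has left the loop
        ++cost.iterations_executed;
        cost.masked_lane_slots += trip_counts.size() - active;
    }
    return cost;
}
```

With per-thread trip counts of 1, 2, 8, and 3, the wavefront runs 8 iterations and 18 of the 32 lane-iterations are masked, so uniform trip counts within a wavefront waste the least work.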

59. What is the constant buffer size on GPU hardware?

64 kB.

60. What happens with out-of-bound memory operations?

Writes are dropped, and reads return a pre-defined value.

61. For 7XX devices, why does 64x1 give bad performance in OpenCL kernels?

One of the reasons is because of how the caches are set up on RV7XX devices. The caches are optimized to work in a tiled mode, not in linear mode (which is the mode OpenCL kernels use). To get optimal cache re-use from the texture in compute shader mode on RV7XX devices, reblock your thread IDs. A 16x4, 8x8, or 4x16 should give you good enough blocking to get similar cache performance as your pixel shader kernel. This is because a cache line can be thought of as a 4x2 block of data coming in at once. So, for pixel shaders, 64 threads are blocked in an 8x8 block that uses exactly eight cache lines. For OpenCL kernels, your 64x1 block pattern uses 16 cache lines, but only uses half the data in each cache line. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.
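
The cache-line counts quoted in this answer follow from simple tiling arithmetic. The sketch below reproduces them under two assumptions that are not stated in the FAQ: each thread reads one element, and the thread block starts on a cache-line boundary. The function names are illustrative.

```cpp
#include <cstddef>

// Cache-line footprint from the answer above: 4 elements wide, 2 high.
constexpr std::size_t kLineWidth = 4;
constexpr std::size_t kLineHeight = 2;

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Number of cache lines touched by a block_w x block_h tile of threads,
// assuming the tile starts on a cache-line boundary.
constexpr std::size_t cache_lines_touched(std::size_t block_w, std::size_t block_h) {
    return ceil_div(block_w, kLineWidth) * ceil_div(block_h, kLineHeight);
}

// Share of the fetched data that the tile actually uses, in percent.
constexpr std::size_t utilization_percent(std::size_t block_w, std::size_t block_h) {
    return 100 * block_w * block_h /
           (cache_lines_touched(block_w, block_h) * kLineWidth * kLineHeight);
}
```

An 8x8 block touches 8 lines at full utilization, while a 64x1 block touches 16 lines and uses half of each, matching the figures in the answer.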

62. What is unique about the LDS in HD4XXX devices, and what are its performance characteristics?

The LDS in the HD4XXX devices is an owner's write model with limited applications. When used correctly, it has very similar performance characteristics to the L1 cache, but the user gains control over what data exists in the memory. The LDS_Transpose sample in the SDK uses the LDS in the HD4XXX devices very efficiently. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.

6 Microsoft Visual Studio

63. Can I use the SDK with Microsoft Visual Studio 2012 Express?

Due to limitations in Microsoft Visual Studio Express, it is only possible to use build files for individual samples. Microsoft Visual Studio Express does not support building of all of the samples at the same time. The project files that build all of the samples are only supported by full versions of Microsoft Visual Studio 2008, 2010, or 2012.

7 Bolt

64. What GPUs is Bolt optimized for?

Bolt is optimized for the AMD "Southern Islands" family of GPUs.

65. Which GPU compute technologies are supported by Bolt?

The preview version of Bolt uses OpenCL as its compute back-end; however, future releases should support both C++ AMP and OpenCL compute back-ends.

66. Are the Bolt APIs final?

The APIs in the Bolt preview are not final. The Bolt API should be stable and should use formal deprecation procedures from Bolt release 1.0 onwards. Please help us to improve Bolt; we encourage you to provide suggestions on how we can improve the Bolt API.

67. Does Bolt require linking to a library?

Yes, Bolt includes a small, static library that must be linked into a user program to properly resolve symbols. Since Bolt is a template library, almost all functionality is written in header files, but OpenCL follows an online compilation model, which it makes sense to include in a library and not in header files.

68. Does Bolt depend on any other libraries?

Yes, Bolt contains dependencies on Boost. All dependent header files and pre-compiled libraries are available in the Bolt SDK.

69. What algorithms does Bolt currently support?

Transform, reduce, transform_reduce, count, count_if, inclusive_scan, exclusive_scan, and sort.

70. What version of OpenCL does Bolt currently require?

Bolt uses features available in OpenCL v1.2.

71. Does Bolt require the AMD OpenCL runtime?

Yes, Bolt requires the use of templates in kernel code. At the time of this writing, AMD is the only vendor to provide this support.

72. Which Catalyst package should I use with Bolt?

Generally speaking, the latest Catalyst package contains the most recent OpenCL runtime, so download and install that package. As of the time of this writing, the recommended Catalyst package is 12.10.

73. Which license is Bolt licensed under?

The Apache License, Version 2.0.

74. When should I use device_vector vs regular host memory?

bolt::cl::device_vector is used to manage device-local memory and can deliver higher performance on discrete GPU systems. However, the host memory interfaces eliminate the need to create and manage device_vectors. If memory is re-used across multiple Bolt calls or is referenced by other kernels, using device_vector delivers higher performance.

75. How do I measure the performance of OpenCL Bolt library calls?

The first time that a Bolt library routine is called, the runtime calls the OpenCL compiler to compile the kernel. Each unique template instantiation reuses the same OpenCL compilation, so two Bolt calls with the same functors are compiled only one time. When measuring the performance of Bolt, it is important to exclude the first call (with the compilation overhead) from the timing measurement. Here are two ways to time Bolt function calls:

The first example pulls the call outside the timer loop.

    std::vector<...> a, z;
    bolt::cl::transform(a.begin(), a.end(), z.begin(), bolt::cl::negate<...>());
    startTimer(...);
    for (int i = 0; i < ...; i++) {
        bolt::cl::transform(a.begin(), a.end(), z.begin(), bolt::cl::negate<...>());
    }
    stopTimer(...);

A second alternative is to explicitly exclude the first compilation from the timing region:

    // Explicitly start timer after first iteration:
    std::vector<...> a, z;
    for (int i = 0; i < ...; i++) {
        if (i == 1) {
            startTimer(...);
        }
        bolt::cl::transform(a.begin(), a.end(), z.begin(), bolt::cl::negate<...>());
    }
    stopTimer(...);

76. The Bolt library APIs accept host data structures such as std::vector(). Are these copied to the GPU or kept in host memory?

For host data structures, Bolt creates an OpenCL buffer with the CL_MEM_USE_HOST_PTR flag and uses the OpenCL map and unmap APIs to manage the host and device access. On the AMD OpenCL implementation, this results in data being kept in host memory when the host memory is aligned on a 256-byte boundary. In other cases, the runtime copies the data as needed to the GPU. See section 4.5.2 of the AMD Accelerated Parallel Processing OpenCL Programming Guide for more information on memory optimization. Higher performance may be obtained by aligning the host data structures to 256-byte boundaries or by using the device_vector class for finer control over the allocation properties.
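
Whether a host buffer meets the 256-byte alignment mentioned above can be checked before it is handed to Bolt. The following is a minimal sketch using only standard C++; is_256_byte_aligned and AlignedBlock are hypothetical names, and the buffer size is arbitrary. Note that the default allocator used by std::vector typically does not guarantee this alignment, so a check or a dedicated aligned type may be needed.

```cpp
#include <cstddef>
#include <cstdint>

// Alignment quoted in the answer above.
constexpr std::size_t kZeroCopyAlignment = 256;

// True when ptr sits on a 256-byte boundary.
bool is_256_byte_aligned(const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % kZeroCopyAlignment == 0;
}

// A host buffer type that meets the alignment by construction.
struct alignas(256) AlignedBlock {
    float values[1024];
};
```

A statically allocated AlignedBlock passes the check, while a pointer a few bytes into it does not.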

77. Bolt APIs also accept device_vector inputs. When should I use device_vectors for my inputs?

The bolt::cl::device_vector interface is a thin wrapper around the cl::Buffer class; it provides an interface similar to std::vector. Also, the device_vector optionally accepts a flags argument that is passed to the OpenCL buffer creation. The following two flags can be useful:

CL_MEM_ALLOC_HOST_PTR: Forces memory to be allocated in host memory, even on discrete GPUs. GPU access to the data is slower (over the PCIe bus), but the allocation avoids the overhead of copying data to, and from, the GPU. This can be useful when the data is used in only a single Bolt call before being processed again on the CPU.

CL_MEM_USE_PERSISTENT_MEM_AMD: Provides device-resident zero copy memory. Use this for data that is write-only by the CPU.

See section 4.5.2 of the AMD Accelerated Parallel Processing OpenCL Programming Guide for more information.

8 C++ AMP

78. What is C++ AMP?

C++ Accelerated Massive Parallelism (C++ AMP) accelerates execution of C++ code by taking advantage of data-parallel hardware, such as the compute units on an APU. C++ AMP is an open specification that extends regular C++ through the use of the restrict keyword to denote portions of code to be executed on the accelerated device.

79. What AMD devices support C++ AMP?

Any AMD APU or GPU that supports Microsoft DirectX 11 Feature Level 11.0 and higher.

80. Can you run a C++ AMP program on a CPU?

You can run a C++ AMP program on a CPU, but its usage is not recommended in a production environment. For more information, see: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/10/cpu-accelerator-in-c-amp.aspx

81. Where can I get support for C++ AMP?

C++ AMP is supported by the C++ compiler in Microsoft Visual Studio 2012.

82. Is C++ AMP supported on Linux?

At this time, C++ AMP is not supported on Linux; however, as C++ AMP is defined as an open standard, Microsoft has opened the door for implementations on operating systems other than Windows.

83. Where can I learn more about C++ AMP?

There is an MSDN page on C++ AMP. Also, the book C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++, by Kate Gregory and Ade Miller, may be useful.

9 Aparapi

84. What is Aparapi?

Aparapi is an API for expressing data parallel algorithms in Java and a runtime component capable of converting Java bytecode to OpenCL for execution on the GPU. If Aparapi cannot execute on the GPU at runtime, it executes the developer's algorithm in a Java thread pool. For appropriate workloads, this extends Java's 'Write Once Run Anywhere' to include GPU devices.

85. What AMD devices support Aparapi?

Any AMD APU, CPU, or GPU that supports OpenCL 1.1 and higher will work. See the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk.

86. Where can I get support for Aparapi?

See: http://code.google.com/p/aparapi/

87. Is Aparapi supported on Linux?

Yes.

88. Which JDK/JRE is recommended for Aparapi?

The Oracle JDK/JRE, although not a requirement for Aparapi, is highly recommended for performance and for stability reasons.

89. Where can I learn more about Aparapi?

You can find out more about Aparapi from: http://code.google.com/p/aparapi/

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

Copyright and Trademarks

© 2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, Radeon, FireStream, and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other names are for informational purposes only and may be trademarks of their respective owners.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

Contact
Advanced Micro Devices, Inc.
One AMD Place
P.O. Box 3453
Sunnyvale, CA, 94088-3453
Phone: +1.408.749.4000

For AMD Accelerated Parallel Processing:
URL: developer.amd.com/appsdk
Developing: developer.amd.com/
Forum: developer.amd.com/openclforum
