AMD APP SDK v2.8 FAQ
1 General Questions
1. Do I need to use additional software with the SDK?
To run an OpenCL application, you must have an OpenCL runtime on your system. If your system includes a recent AMD discrete GPU, or an APU, you also should install the latest Catalyst drivers, which can be downloaded from AMD.com. Information on supported devices can be found at developer.amd.com/appsdk. If your system does not include a recent AMD discrete GPU, or APU, the SDK installs a CPU-only OpenCL run-time. Also, we recommend using the debugging, profiling, and analysis tools contained in the AMD CodeXL heterogeneous compute tools suite.
2. Which versions of the OpenCL standard does this SDK support?
AMD APP SDK 2.8 supports development of applications using the OpenCL Specification v1.2. As all OpenCL 1.1 APIs are supported within OpenCL 1.2, you also can develop OpenCL 1.1-compliant applications.
3. Will applications developed to execute on OpenCL 1.1 still operate in an OpenCL 1.2 environment?
OpenCL is designed to be backwards compatible. The OpenCL 1.2 run-time delivered with the AMD Catalyst drivers runs any OpenCL 1.1-compliant application. However, an OpenCL 1.2-compliant application will not execute on an OpenCL 1.1 run-time if APIs only supported by OpenCL 1.2 are used.
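An application that wants to run on both run-times can check the device version before calling 1.2-only APIs. The OpenCL specification defines the CL_DEVICE_VERSION string (returned by clGetDeviceInfo) as "OpenCL <major>.<minor> <vendor-specific information>". The C++ sketch below covers only parsing that string, so it runs without an OpenCL installation; the helper names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parses "OpenCL <major>.<minor> <vendor info>", the form
// CL_DEVICE_VERSION takes. Returns false if the string does not match.
bool parseOpenCLVersion(const std::string& version, int& major, int& minor) {
    return std::sscanf(version.c_str(), "OpenCL %d.%d", &major, &minor) == 2;
}

// True if the device reports OpenCL 1.2 or later, so 1.2-only APIs may be used.
bool atLeastOpenCL12(const std::string& version) {
    int major = 0, minor = 0;
    if (!parseOpenCLVersion(version, major, minor)) return false;
    return major > 1 || (major == 1 && minor >= 2);
}
```

On a 1.1 device the application would fall back to the 1.1 API subset instead of failing at run time.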
4. Does AMD provide any additional OpenCL samples, other than those contained within the SDK?
The most recent versions of all of the samples contained within the SDK are also available for individual download from the developer.amd.com/appsdk "Samples & Demos" page. This page also contains additional samples that either were too large to include in the SDK, or which have been developed since the most recent SDK release. Check this web page for new, updated, or large samples.
5. How often can I expect to get AMD APP SDK updates?
Developers can expect that the AMD APP SDK may be updated two to three times a year. Actual release intervals may vary depending on available new features and product updates. AMD is committed to providing developers with regular updates to allow them to take advantage of the latest developments in AMD APP technology.
6. What is the difference between the CPU and GPU components of OpenCL that are bundled with the AMD APP SDK?
The CPU component uses the compatible CPU cores in your system to accelerate your OpenCL compute kernels; the GPU component uses the compatible GPU cores in your system to accelerate your OpenCL compute kernels.
7. What CPUs does the AMD APP SDK v2.8 with OpenCL 1.2 support work on?
The CPU component of OpenCL bundled with the AMD APP SDK works with any x86 CPU with SSE3 or later, as well as SSE2.x or later. AMD CPUs have supported SSE3 (and later) since 2005. Some examples of AMD CPUs that support SSE3 (or later) are the AMD Athlon 64 (starting with the Venice/San Diego steppings), AMD Athlon 64 X2, AMD Athlon 64 FX (starting with San Diego stepping), AMD Opteron (starting with E4 stepping), AMD Sempron (starting with Palermo stepping), AMD Phenom, AMD Turion 64, and AMD Turion 64 X2.
8. What APUs and GPUs does the AMD APP SDK v2.8 with OpenCL 1.2 support work on?
For the list of supported APUs and GPUs, see the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk
9. Can my OpenCL code run on GPUs from other vendors?
At this time, AMD does not plan to have the AMD APP SDK support GPU products from other vendors; however, since OpenCL is an industry standard programming interface, programs written in OpenCL 1.2 can be recompiled and run with any OpenCL-compliant compiler and runtime.
10. What version of MS Visual Studio is supported?
The AMD APP SDK v2.8 with OpenCL 1.2 supports Microsoft Visual Studio 2008 Professional Edition, Microsoft Visual Studio 2010 Professional Edition, and Microsoft Visual Studio 2012.
11. Is it possible to run multiple AMD APP applications (compute and graphics) concurrently?
Multiple AMD APP applications can be run concurrently, as long as they do not access the same GPU at the same time. AMD APP applications that attempt to access the same GPU at the same time are automatically serialized by the runtime system.
12. Which graphics driver is required for the current AMD APP SDK v2.8 with OpenCL 1.2 CPU support?
For the minimum required graphics driver, see the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk. In general, it is advised that you update your system to use the most recent graphics drivers that are available for it.
13. How does OpenCL compare to other APIs and programming platforms for parallel computing, such as OpenMP and MPI? Which one should I use?
OpenCL is designed to target parallelism within a single system and provide portability to multiple different types of devices (GPUs, multi-core CPUs, etc.). OpenMP targets multi-core CPUs and SMP systems. MPI is a message passing protocol most often used for communication between nodes; it is a popular parallel programming model for clusters of machines. Each programming model has its advantages. Developers are expected to mix APIs, for example programming a cluster of machines with GPUs using MPI and OpenCL.
14. If I write my code on the CPU version, does it work on the GPU version, or do I have to make changes?
Assuming the size limitations for CPUs are considered, the code works on both the CPU and GPU components. Performance tuning, however, is different for each.
15. What is the precision of mathematical operations?
See Chapter 7, "OpenCL Numerical Compliance," of the OpenCL 1.2 Specification for exact mathematical operations precision requirements. http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=88
16. Are byte-addressable stores supported?
Byte-addressable stores are supported.
17. Are long integers supported?
Yes, 64-bit integers are supported.
18. Are operations on vectors supported?
Yes, operations on vectors are supported.
19. Is swizzling supported?
Yes, swizzling (the rearranging of elements in a vector) is supported.
20. How does one verify if OpenCL has been installed correctly on the system?
Run clinfo from the command-line interface. The clinfo tool shows the available OpenCL devices on a system.
21. How do I know if I have installed the latest version of the AMD APP SDK?
On installation of the SDK, if an internet connection is available, the installer states whether or not a newer SDK is available in the file VersionInfo.txt in directory C:\Program Files (x86)\AMD APP\docs\. Alternatively, you can check for the latest available version of the AMD APP SDK at http://developer.amd.com/appsdk.
2 Optimizations
22. How do I use constants in a kernel for best performance?
For performance using constants, highest to lowest performance is achieved using:
- Literal values.
- Constant pointer with compile time constant indexing.
- Constant pointer with runtime constant indexing that is the same for all threads.
- Constant pointer with linear access indexing that is the same for all threads.
- Constant pointer with linear access indexing that is different between threads.
- Constant pointer with random access indexing.
23. Why are literal values the fastest way to use constants?
Up to 96 bits of literal values are embedded in the instruction; thus, in theory, there is no limit on the number of usable literals. In practice, the limit is 16K unique literals in a compilation unit.
24. Why does a*b+c not generate a mad instruction?
Depending on the hardware and the floating-point precision, the compiler may not generate a mad instruction for the computation of a*b+c due to the floating-point precision requirements in the OpenCL specification. In such cases, developers who want to exploit the mad instruction for performance can replace that computation with the mad() built-in function in OpenCL.
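The reason the compiler cannot silently fuse a*b+c is that a fused operation rounds once while a separate multiply and add round twice, so the two can give different answers. The standalone C++ sketch below uses std::fma as a stand-in for a fused operation; it is not OpenCL's mad(), which the specification allows to trade accuracy for speed. The input values are chosen only to make the difference visible.

```cpp
#include <cmath>

struct MulAddResults {
    double unfused;  // a*b rounded to double, then + c rounded: two roundings
    double fused;    // std::fma(a, b, c): a*b + c with a single rounding
};

MulAddResults compareMulAdd(double a, double b, double c) {
    // Storing through a volatile forces the product to be rounded to double
    // before the add, so the compiler cannot contract this into an FMA.
    volatile double product = a * b;
    double unfused = product + c;
    double fused = std::fma(a, b, c);
    return {unfused, fused};
}
```

With a = 1 + 2^-30, b = 1 - 2^-30, and c = -1, the exact product is 1 - 2^-60, which rounds to 1.0 in double precision, so the unfused result is 0.0 while the fused result is -2^-60.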
25. What is the preferred work-group size?
The preferred work-group size on the AMD platform is a multiple of 64.
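In OpenCL 1.x the global work size in each dimension must be evenly divisible by the work-group size, so using a work-group size of 64 usually means padding the global size and having the extra work-items return early in the kernel. A minimal host-side C++ sketch of that padding follows; the helper names are hypothetical, and 64 is the value given in this FAQ.

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of `multiple` that is >= n.
std::size_t roundUpToMultiple(std::size_t n, std::size_t multiple) {
    return ((n + multiple - 1) / multiple) * multiple;
}

// Preferred work-group size on the AMD platform, per this FAQ.
constexpr std::size_t kWorkGroupSize = 64;

// Global size to enqueue for `problemSize` work-items. In the kernel,
// work-items with get_global_id(0) >= problemSize should do nothing.
std::size_t paddedGlobalSize(std::size_t problemSize) {
    return roundUpToMultiple(problemSize, kWorkGroupSize);
}
```

For example, a problem of 1000 elements is enqueued with a global size of 1024 and a local size of 64, and the last 24 work-items exit immediately.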
3 OpenCL Questions
26. What is OpenCL?
OpenCL (Open Computing Language) is the first truly open and royalty-free programming standard for general-purpose computations on heterogeneous systems. OpenCL lets programmers preserve their expensive source code investment and easily target both multi-core CPUs and the latest GPUs, such as those from AMD. Developed in an open standards committee with representatives from major industry vendors, OpenCL gives users a cross-vendor, non-proprietary solution for accelerating their applications on their CPU and GPU cores.
27. How much does the AMD OpenCL development platform cost?
AMD bundles support for OpenCL as part of its AMD APP SDK product offering. The AMD APP SDK is offered to developers and users free of charge.
28. What operating systems does the AMD APP SDK v2.8 with OpenCL 1.2 support?
AMD APP SDK v2.8 runs on 32-bit and 64-bit versions of Windows and Linux. For the exact list of supported operating systems, see the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk
29. Can I write an OpenCL application that works on both CPU and GPU?
Applications that program to the core OpenCL 1.2 API and kernel language should be able to target both CPUs and GPUs. At run time, the appropriate device (CPU or GPU) must be selected by the application.
30. Does the AMD OpenCL compiler automatically vectorize for SSE on the CPU?
The CPU component of OpenCL that is bundled with the AMD APP SDK takes advantage of SSE3 instructions on the CPU. It also takes advantage of the AVX instructions where supported. In addition to AVX, OpenCL math library functions also leverage XOP and FMA4 capabilities on CPUs that support them.
31. Does the AMD APP SDK v2.8 with OpenCL 1.2 support work on multiple GPUs (ATI CrossFire)?
OpenCL applications can explicitly invoke separate compute kernels on multiple compatible GPUs in a single system. The partitioning of the algorithm to multiple parallel compute kernels must be done by the developer. It is recommended that ATI CrossFire be turned off in most system configurations so that AMD APP applications can access all available GPUs in the system. ATI CrossFire technology allows multiple AMD GPUs to work together on a single graphics-rendering task. This method does not apply to AMD APP computational tasks because it is not compatible with the compute model used for AMD APP applications.
32. Can I ship pre-compiled OpenCL application binaries that work on either CPU or GPU?
By using OpenCL runtime APIs, developers can write OpenCL applications that can detect the available compatible CPUs and GPUs in the system. This lets developers pre-compile applications into binaries that work dynamically on whichever of the targeted CPUs or GPUs is present. Including LLVM IR in the binary provides a means for the binary to support devices for which the application was not explicitly pre-compiled.
33. Is the OpenCL double precision optional extension supported?
The Khronos and AMD double precision extensions are supported on certain devices. Your application can use the OpenCL API to query if this functionality is supported on the device in use.
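One way to make that query is to read the device's extension string with clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...) and look for cl_khr_fp64 (the Khronos extension) or cl_amd_fp64 (the AMD extension). The C++ sketch below covers only the string check, so it runs without an OpenCL runtime; the function names are hypothetical. Matching whole space-separated tokens avoids false positives from extension names that merely share a prefix.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true if the space-separated CL_DEVICE_EXTENSIONS
// string lists `name` as a whole token.
bool hasExtension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;
    }
    return false;
}

// Double precision is available if either the Khronos or the AMD extension is listed.
bool supportsDoublePrecision(const std::string& extensions) {
    return hasExtension(extensions, "cl_khr_fp64") ||
           hasExtension(extensions, "cl_amd_fp64");
}
```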
34. Is it possible to write OpenCL code that scales transparently over multiple devices?
For OpenCL programs that target only multi-core CPUs, scaling can be done transparently; however, scaling across multiple GPUs requires the developer to explicitly partition the algorithm into multiple compute kernels, as well as explicitly launch the compute kernels onto each compatible GPU.
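The explicit partitioning step can be as simple as splitting the global index range into contiguous chunks, one per GPU, and enqueuing the kernel on each device's command queue with that chunk's offset and size (clEnqueueNDRangeKernel accepts a global work offset from OpenCL 1.1 on). The C++ sketch below only computes the chunks; the struct and function names are hypothetical, and an even split assumes the GPUs have similar throughput.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of the work assigned to one device.
struct DeviceChunk {
    std::size_t offset;  // first global index handled by this device
    std::size_t size;    // number of global indices handled by this device
};

// Splits [0, total) into `deviceCount` contiguous chunks whose sizes differ by at most 1.
std::vector<DeviceChunk> partitionRange(std::size_t total, std::size_t deviceCount) {
    std::vector<DeviceChunk> chunks;
    if (deviceCount == 0) return chunks;
    std::size_t base = total / deviceCount;
    std::size_t remainder = total % deviceCount;
    std::size_t offset = 0;
    for (std::size_t d = 0; d < deviceCount; ++d) {
        std::size_t size = base + (d < remainder ? 1 : 0);
        chunks.push_back({offset, size});
        offset += size;
    }
    return chunks;
}
```

In practice each chunk would also be padded to a multiple of the work-group size before enqueuing, because in OpenCL 1.x the global size must divide evenly by the local size.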
35. What should I do if I get wrong results on the Apple platform with AMD devices?
Apple handles support for the Apple platform; please contact them.
36. Is it possible to dynamically index into a vector?
No, this is not possible because a vector is not an array, but a representation of a hardware register.
37. What is the difference between local int a[4] and int a[4]?
local int a[4] uses hardware local memory, which is a small, low-latency, high-bandwidth memory; int a[4] uses per-thread hardware scratch memory, which is located in uncached global memory.
38. Why does using a barrier cause the max kernel work-group size to drop to 64 on HD4XXX chips?
The supported HD4XXX chips do not have a hardware barrier, so the OpenCL runtime cannot execute more than a single wavefront per group to satisfy the OpenCL memory consistency model. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.
39. How come my program runs slower in OpenCL than in CUDA/Brook+/IL?
When comparing performance, it is better to compare code optimized for our OpenCL platform against code optimized for another vendor's OpenCL platform. By comparing the same toolchain on different vendors, you can find out which vendor's hardware works best for your problem set.
40. Why can I not use textures on RV7XX devices in OpenCL?
RV7XX devices do not support all of the texture modes and precision requirements that OpenCL requires. Since textures are mapped to images in OpenCL, and image support is an "all or nothing" feature, we do not support images on RV7XX devices; thus, there is no access to textures. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.
41. Why do read-write images not exist in OpenCL?
OpenCL has a memory consistency model that requires certain constraints (see the OpenCL Specification for more information). Since images are special functional hardware units, they are different for reading and writing. This is different from pointers, which for the most part use the same hardware units and can guarantee the consistency that OpenCL requires.
42. Does prefetching work on the GPU?
Prefetch is not needed on the GPU because the hardware has a built-in mechanism to hide latency when many work-groups are running. The LDS can be used as a software-controlled cache.
43. How do you determine the max number of concurrent work-groups?
The maximum number of concurrent work-groups is determined by resource usage. This includes number of registers, amount of LDS space, and number of threads per work-group. There is no way to directly specify the number of registers used by a kernel. Each SIMD has a 64-wide register file, with each column consisting of 256 x 32 x 4 registers.
44. Is it possible to tell OpenCL not to use all CPU cores?
Yes, use the device fission extension.
4 OpenCL Optimizations
45. What is more efficient, the ternary operator ?: or the select function?
The select function compiles to the single cycle instruction, cmov_logical; in most cases, ?: also compiles to the same instruction. In some cases, when memory is in one of the operands, the ?: operator is compiled down to an IF/ELSE block. An IF/ELSE block takes more than a single instruction to execute.
46. What is the difference between 24-bit and 32-bit integer operations?
24-bit operations are faster because they use floating point hardware and can execute on all compute units. Many 32-bit integer operations also run on all stream processors, but if both a 24-bit and a 32-bit version exist for the same instruction, the 32-bit instruction executes only one per cycle.
5 Hardware Information
47. How are 8/16-bit operations handled in hardware?
The 8/16-bit operations are emulated with 32-bit registers.
48. Do 24-bit integers exist in hardware?
No, there are 24-bit instructions, such as MUL24/MAD24, but the smallest integer in hardware registers is 32 bits.
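OpenCL C exposes these instructions through the mul24() and mad24() built-ins, which are only defined for operands that fit in 24 bits: [-2^23, 2^23 - 1] for signed and [0, 2^24 - 1] for unsigned integers; outside that range the result is implementation-defined. The host-side C++ sketch below shows the range check a developer might apply before choosing the 24-bit path; the function names are hypothetical.

```cpp
#include <cstdint>

// True if a signed value is within [-2^23, 2^23 - 1], the range mul24()/mad24() accept.
bool fitsSigned24(std::int32_t v) {
    return v >= -(1 << 23) && v <= (1 << 23) - 1;
}

// True if an unsigned value is within [0, 2^24 - 1].
bool fitsUnsigned24(std::uint32_t v) {
    return v <= (1u << 24) - 1u;
}

// Hypothetical check: both signed operands are valid inputs for mul24().
bool canUseMul24(std::int32_t x, std::int32_t y) {
    return fitsSigned24(x) && fitsSigned24(y);
}
```

Typical candidates are work-item indices and image or buffer dimensions, which are usually known to stay well below 2^23.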
49. What are the benefits of using 8/16-bit types over 32-bit integers?
8/16-bit types take less memory than a 32-bit integer type, increasing the amount of data you are able to load with a single instruction. The OpenCL compiler up-converts to 32 bits on load and down-converts to the correct type on store.
50. What is the difference between a GPR and a shared register?
Although they are physically equivalent, the difference is whether the register offset in the hardware is absolute to the register file or relative to the wavefront ID.
51. How often are wavefronts created?
Wavefronts are created by the hardware to execute as long as resources are available. If they are created but cannot execute immediately, they are put in a wait queue where they stay until currently running wavefronts are finished.
52. What is the maximum number of wavefronts?
The maximum number of wavefronts is determined by whichever resource limits the number of wavefronts that can be spawned. This can be the number of registers, amount of local memory, required stack size, or other factors. A compute shader that uses local memory has a hard cap of 16 wavefronts.
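The "whichever resource runs out first" rule amounts to taking a minimum over per-resource limits. The C++ sketch below expresses that for registers and local memory only (stack size and the other factors are omitted); the resource totals and per-wavefront usage are hypothetical inputs supplied by the caller, and the only figure built in is the 16-wavefront cap for compute shaders that use local memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical resource totals and per-wavefront requirements.
struct WavefrontResources {
    std::size_t registersAvailable;
    std::size_t registersPerWavefront;
    std::size_t localMemAvailable;     // bytes
    std::size_t localMemPerWavefront;  // bytes; 0 if the kernel uses no local memory
};

// Hard cap for compute shaders that use local memory, per this FAQ.
constexpr std::size_t kLocalMemWavefrontCap = 16;

// The most constraining resource determines how many wavefronts can be resident.
std::size_t maxWavefronts(const WavefrontResources& r) {
    std::size_t limit = r.registersPerWavefront == 0
                            ? SIZE_MAX
                            : r.registersAvailable / r.registersPerWavefront;
    if (r.localMemPerWavefront > 0) {
        limit = std::min(limit, r.localMemAvailable / r.localMemPerWavefront);
        limit = std::min(limit, kLocalMemWavefrontCap);
    }
    return limit;
}
```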
53. Why do I get blue or black screens when executing longer-running kernels?
The GPU is not a preemptible device. If you are running the GPU as your display device, ensure that a compute program does not use the GPU past a certain time limit set by Windows. Exceeding the time limit causes the watchdog timer to trigger; this can result in undefined program results.
54. What is the cost of a clause switch?
In general, the latency of a clause switch is around 40 cycles. Note that this is relevant only for Evergreen and Northern Islands devices.
55. How can I hide clause switch latency?
By executing multiple wavefronts in parallel. Note that this is relevant only for Evergreen and Northern Islands devices.
56. How can I reduce clause switches?
Clause switches are almost directly related to source program control flow. By reducing source program control flow, clause switches can also be reduced. This is only relevant for Evergreen and Northern Islands devices.
57. How does the hardware execute a loop on a wavefront?
The loop only ends execution for a wavefront once every thread in the wavefront breaks out of the loop. Once a thread breaks out of the loop, all of its execution results are masked, but the execution still occurs.
58. How does flow control work with wavefronts?
There are no flow control units for each individual thread, so the whole wavefront must execute the branch if any thread in the wavefront executes the branch. If the condition is false, the results are not written to memory, but the execution still occurs.
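The cost of this behavior for loops can be estimated with a simple model: the wavefront keeps iterating until its longest-running work-item finishes, and lanes that have already exited still occupy their slots with their results masked. The C++ sketch below counts the iterations the wavefront executes and the fraction of lane-iterations whose results are kept. It is an illustrative model, not a description of the actual hardware scheduler, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct LoopCost {
    std::size_t wavefrontIterations;  // iterations the whole wavefront executes
    double usefulFraction;            // share of lane-iterations whose results are kept
};

// tripCounts[i] = number of loop iterations work-item (lane) i actually needs.
LoopCost simulateDivergentLoop(const std::vector<std::size_t>& tripCounts) {
    if (tripCounts.empty()) return {0, 1.0};
    // The loop ends for the wavefront only when every lane has broken out.
    std::size_t maxTrips = *std::max_element(tripCounts.begin(), tripCounts.end());
    std::size_t useful = 0;
    for (std::size_t t : tripCounts) useful += t;  // unmasked lane-iterations
    double total = static_cast<double>(maxTrips) * tripCounts.size();
    return {maxTrips, total == 0.0 ? 1.0 : useful / total};
}
```

For a 64-wide wavefront in which one work-item needs 100 iterations and the other 63 need one, the wavefront runs 100 iterations and keeps only 163 of 6400 lane-iterations, which is why keeping per-work-item trip counts similar within a wavefront matters.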
59. What is the constant buffer size on GPU hardware?
64 kB.
60. What happens with out-of-bound memory operations?
Writes are dropped, and reads return a pre-defined value.
61. For 7XX devices, why does 64x1 give bad performance in OpenCL kernels?
One reason is how the caches are set up on RV7XX devices. The caches are optimized to work in a tiled mode, not in linear mode (which is the mode OpenCL kernels use). To get optimal cache re-use from the texture in compute shader mode on RV7XX devices, reblock your thread IDs. A 16x4, 8x8, or 4x16 block should give you good enough blocking to get cache performance similar to your pixel shader kernel. This is because a cache line can be thought of as a 4x2 block of data coming in at once. So, for pixel shaders, 64 threads are blocked in an 8x8 block that uses exactly eight cache lines. For OpenCL kernels, your 64x1 block pattern uses 16 cache lines, but only uses half the data in each cache line. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.
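The cache-line arithmetic in the previous answer can be checked with a small C++ model. It assumes each work-item reads one element at its own (x, y) position and that a cache line covers an aligned 4x2 block of elements, as the answer describes; the function names are hypothetical, and the 8x8 remapping is one possible way to reblock thread IDs, not a prescribed scheme.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Distinct 4x2 cache lines touched by a blockW x blockH tile of work-items.
std::size_t cacheLinesTouched(std::size_t blockW, std::size_t blockH) {
    std::set<std::pair<std::size_t, std::size_t>> lines;
    for (std::size_t y = 0; y < blockH; ++y)
        for (std::size_t x = 0; x < blockW; ++x)
            lines.insert({x / 4, y / 2});  // cache line containing element (x, y)
    return lines.size();
}

// Reblocking: map a linear local ID (0..63) onto an 8x8 tile instead of a 64x1 row.
std::pair<std::size_t, std::size_t> reblock8x8(std::size_t localId) {
    return {localId % 8, localId / 8};
}
```

The model reproduces the figures above: a 64x1 row touches 16 lines, while 8x8, 16x4, and 4x16 tiles each touch 8.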
62. What is unique about the LDS in HD4XXX devices, and what are its performance characteristics?
The LDS in the HD4XXX devices is an owner's write model with limited applications. When used correctly, it has very similar performance characteristics to the L1 cache, but the user gains control over what data exists in the memory. The LDS_Transpose sample in the SDK uses the LDS in the HD4XXX devices very efficiently. Note that HD4XXX device support is EOL. Catalyst drivers no longer include support for these devices. See the OpenCL SDK driver and compatibility page for more details.
6 Microsoft Visual Studio
63. Can I use the SDK with Microsoft Visual Studio 2012 Express?
Due to limitations in Microsoft Visual Studio Express, it is only possible to use build files for individual samples. Microsoft Visual Studio Express does not support building all of the samples at the same time. The project files that build all of the samples are only supported by full versions of Microsoft Visual Studio 2008, 2010, or 2012.
7 Bolt
64. What GPUs is Bolt optimized for?
Bolt is optimized for the AMD "Southern Islands" family of GPUs.
65. Which GPU compute technologies are supported by Bolt?
The preview version of Bolt uses OpenCL as its compute back-end; however, future releases should support both C++ AMP and OpenCL compute back-ends.
66. Are the Bolt APIs final?
The APIs in the Bolt preview are not final. The Bolt API should be stable and should use formal deprecation procedures from Bolt release 1.0 onwards. Please help us to improve Bolt; we encourage you to provide suggestions on how we can improve the Bolt API.
67. Does Bolt require linking to a library?
Yes, Bolt includes a small, static library that must be linked into a user program to properly resolve symbols. Since Bolt is a template library, almost all functionality is written in header files, but OpenCL follows an online (run-time) compilation model, and the code that supports it is better placed in a library than in header files.
68. Does Bolt depend on any other libraries?
Yes, Bolt contains dependencies on Boost. All dependent header files and pre-compiled libraries are available in the Bolt SDK.
69. What algorithms does Bolt currently support?
Transform, reduce, transform_reduce, count, count_if, inclusive_scan, exclusive_scan, and sort.
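Because these names mirror the C++ standard library, the corresponding std:: algorithms (C++17 for reduce, transform_reduce, and the scans) can serve as a CPU reference when validating Bolt results. The sketch below uses only the standard library and does not call Bolt; the struct and function names are hypothetical, and which algorithm and functor to compare against depends on the Bolt call being checked.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// CPU reference results for several of the algorithms listed above,
// computed with the C++17 standard library (not with Bolt).
struct ReferenceResults {
    std::vector<int> negated;    // transform with std::negate
    int sum;                     // reduce with std::plus
    int sumOfSquares;            // transform_reduce
    long evenCount;              // count_if
    std::vector<int> inclusive;  // inclusive_scan
    std::vector<int> exclusive;  // exclusive_scan with initial value 0
    std::vector<int> sorted;     // sort
};

ReferenceResults computeReference(const std::vector<int>& in) {
    ReferenceResults r;
    r.negated.resize(in.size());
    std::transform(in.begin(), in.end(), r.negated.begin(), std::negate<int>());
    r.sum = std::reduce(in.begin(), in.end(), 0, std::plus<int>());
    r.sumOfSquares = std::transform_reduce(in.begin(), in.end(), 0, std::plus<int>(),
                                           [](int x) { return x * x; });
    r.evenCount = std::count_if(in.begin(), in.end(), [](int x) { return x % 2 == 0; });
    r.inclusive.resize(in.size());
    std::inclusive_scan(in.begin(), in.end(), r.inclusive.begin());
    r.exclusive.resize(in.size());
    std::exclusive_scan(in.begin(), in.end(), r.exclusive.begin(), 0);
    r.sorted = in;
    std::sort(r.sorted.begin(), r.sorted.end());
    return r;
}
```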
70. What version of OpenCL does Bolt currently require?
Bolt uses features available in OpenCL v1.2.
71. Does Bolt require the AMD OpenCL runtime?
Yes, Bolt requires the use of templates in kernel code. At the time of this writing, AMD is the only vendor to provide this support.
72. Which Catalyst package should I use with Bolt?
Generally speaking, downloading and installing the latest Catalyst package provides the most recent OpenCL runtime. As of the time of this writing, the recommended Catalyst package is 12.10.
73. Which license is Bolt licensed under?
The Apache License, Version 2.0.
74. When should I use device_vector vs regular host memory?
bolt::cl::device_vector is used to manage device-local memory and can deliver higher performance on discrete GPU systems. However, the host memory interfaces eliminate the need to create and manage device_vectors. If memory is re-used across multiple Bolt calls or is referenced by other kernels, using device_vector delivers higher performance.
75. How do I measure the performance of OpenCL Bolt library calls?
The first time that a Bolt library routine is called, the runtime calls the OpenCL compiler to compile the kernel. Each unique template instantiation reuses the same OpenCL compilation, so two Bolt calls with the same functors are compiled only one time. When measuring the performance of Bolt, it is important to exclude the first call (with the compilation overhead) from the timing measurement.
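Here are two ways to time Bolt function calls. The first example pulls the first call outside the timer loop:
    std::vector<int> a, z;  // element type and vector sizes are placeholders
    bolt::cl::transform(a.begin(), a.end(), z.begin(), bolt::cl::negate<int>());  // first call compiles the kernel
    startTimer(...);
    for (int i = 0; i < numIterations; i++) {  // numIterations: placeholder loop bound
        bolt::cl::transform(a.begin(), a.end(), z.begin(), bolt::cl::negate<int>());
    }
    stopTimer(...);
A second alternative is to explicitly exclude the first compilation from the timing region:
    // Explicitly start timer after first iteration:
    std::vector<int> a, z;
    for (int i = 0; i < numIterations; i++) {
        if (i == 1) startTimer(...);  // iteration 0 includes the kernel compilation
        bolt::cl::transform(a.begin(), a.end(), z.begin(), bolt::cl::negate<int>());
    }
    stopTimer(...);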
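The same warm-up-then-measure pattern applies to any routine with a one-time setup cost. The standard-library C++ sketch below is an illustration rather than a Bolt utility: timeExcludingFirstCall is a hypothetical helper that makes one untimed call (where a Bolt routine would pay its OpenCL compilation cost) and then returns the mean duration of the remaining calls.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls f once untimed (warm-up, e.g. to absorb one-time
// kernel compilation), then returns the mean duration of `iterations` timed
// calls in microseconds. Returns 0.0 if iterations is 0.
template <typename F>
double timeExcludingFirstCall(F&& f, std::size_t iterations) {
    f();  // warm-up call, excluded from the measurement
    if (iterations == 0) return 0.0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) f();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

If the timed work is enqueued asynchronously, f must also wait for it to complete (for example with clFinish on the command queue) for the measurement to be meaningful.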
76. The Bolt library APIs accept host data structures such as std::vector. Are these copied to the GPU or kept in host memory?
For host data structures, Bolt creates an OpenCL buffer with the CL_MEM_USE_HOST_PTR flag and uses the OpenCL map and unmap APIs to manage the host and device access. On the AMD OpenCL implementation, this results in data being kept in host memory when the host memory is aligned on a 256-byte boundary. In other cases, the runtime copies the data as needed to the GPU. See section 4.5.2 of the AMD Accelerated Parallel Processing OpenCL Programming Guide for more information on memory optimization. Higher performance may be obtained by aligning the host data structures to 256-byte boundaries or by using the device_vector class for finer control over the allocation properties.
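A minimal way to obtain a 256-byte-aligned host array in standard C++17 is std::aligned_alloc, sketched below. The helper names are hypothetical; the sketch covers only the allocation, not how the memory is then handed to Bolt (for example through a std::vector with a custom aligned allocator), and whether the runtime actually avoids a copy still depends on the implementation, as described above. MSVC does not provide std::aligned_alloc; _aligned_malloc and _aligned_free are the Windows equivalents.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

constexpr std::size_t kHostAlignment = 256;  // boundary named in this FAQ

// Releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(float* p) const { std::free(p); }
};

// Hypothetical helper: allocates `count` floats on a 256-byte boundary.
// std::aligned_alloc requires the size to be a multiple of the alignment,
// so the byte count is rounded up.
std::unique_ptr<float[], FreeDeleter> allocateAlignedFloats(std::size_t count) {
    std::size_t bytes = count * sizeof(float);
    bytes = ((bytes + kHostAlignment - 1) / kHostAlignment) * kHostAlignment;
    if (bytes == 0) bytes = kHostAlignment;
    return std::unique_ptr<float[], FreeDeleter>(
        static_cast<float*>(std::aligned_alloc(kHostAlignment, bytes)));
}

// True if p lies on an `alignment`-byte boundary.
bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```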
77. Bolt APIs also accept device_vector inputs. When should I use device_vectors for my inputs?
The bolt::cl::device_vector interface is a thin wrapper around the cl::Buffer class; it provides an interface similar to std::vector. Also, the device_vector optionally accepts a flags argument that is passed to the OpenCL buffer creation. The following two flags can be useful:
- CL_MEM_ALLOC_HOST_PTR: Forces memory to be allocated in host memory, even on discrete GPUs. GPU access to the data is slower (over the PCIe bus), but the allocation avoids the overhead of copying data to, and from, the GPU. This can be useful when the data is used in only a single Bolt call before being processed again on the CPU.
- CL_MEM_USE_PERSISTENT_MEM_AMD: Provides device-resident zero copy memory. Use this for data that is write-only by the CPU.
See section 4.5.2 of the AMD Accelerated Parallel Processing OpenCL Programming Guide for more information.
8 C++ AMP
78. What is C++ AMP?
C++ Accelerated Massive Parallelism (C++ AMP) accelerates execution of C++ code by taking advantage of data-parallel hardware, such as the compute units on an APU. C++ AMP is an open specification that extends regular C++ through the use of the restrict keyword to denote portions of code to be executed on the accelerated device.
79. What AMD devices support C++ AMP?
Any AMD APU or GPU that supports Microsoft DirectX 11 Feature Level 11.0 and higher.
80. Can you run a C++ AMP program on a CPU?
You can run a C++ AMP program on a CPU, but its usage is not recommended in a production environment. For more information, see: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/10/cpu-accelerator-in-c-amp.aspx
81. Where can I get support for C++ AMP?
C++ AMP is supported by the C++ compiler in Microsoft Visual Studio 2012.
82. Is C++ AMP supported on Linux?
At this time, C++ AMP is not supported on Linux; however, as C++ AMP is defined as an open standard, Microsoft has opened the door for implementations on operating systems other than Windows.
83. Where can I learn more about C++ AMP?
There is an MSDN page on C++ AMP. Also, the book C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++, by Kate Gregory and Ade Miller, may be useful.
9 Aparapi
84. What is Aparapi?
Aparapi is an API for expressing data parallel algorithms in Java and a runtime component capable of converting Java bytecode to OpenCL for execution on the GPU. If Aparapi cannot execute on the GPU at runtime, it executes the developer's algorithm in a Java thread pool. For appropriate workloads, this extends Java's "Write Once Run Anywhere" to include GPU devices.
85. What AMD devices support Aparapi?
Any AMD APU, CPU, or GPU that supports OpenCL 1.1 and higher will work. See the AMD APP SDK v2.8 System Requirements list at: http://developer.amd.com/appsdk.
86. Where can I get support for Aparapi?
See: http://code.google.com/p/aparapi/
87. Is Aparapi supported on Linux?
Yes.
88. Which JDK/JRE is recommended for Aparapi?
The Oracle JDK/JRE, although not a requirement for Aparapi, is highly recommended for performance and for stability reasons.
89. Where can I learn more about Aparapi?
You can find out more about Aparapi from: http://code.google.com/p/aparapi/
AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.
Copyright and Trademarks
© 2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, Radeon, FireStream, and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other names are for informational purposes only and may be trademarks of their respective owners.
The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.
Contact
Advanced Micro Devices, Inc.
One AMD Place
P.O. Box 3453
Sunnyvale, CA, 94088-3453
Phone: +1.408.749.4000
For AMD Accelerated Parallel Processing:
URL: developer.amd.com/appsdk
Developing: developer.amd.com/
Forum: developer.amd.com/openclforum