GPU Technology Conference, May 14-17, 2012
McEnery Convention Center, San Jose, California
www.gputechconf.com

Sessions on Computational Physics (subject to change)

IMPORTANT: Visit http://www.gputechconf.com/page/sessions.html for the most up-to-date schedule.

S0268 - Virtual Process Engineering - Realtime Simulation of Multiphase Systems
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
Day: Tuesday, 05/15 | Time: 9:00 am - 9:50 am
Topic Areas: Computational Fluid Dynamics; Molecular Dynamics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Advanced

Realtime simulation and virtual reality with quantitatively correct physics for industrial processes with multi-scale and multiphase systems was once a remote dream for process engineering, but is now becoming reality with CPU-GPU hybrid supercomputing. Numerical and visualization methods for such simulations on thousands of GPUs will be reported with applications in chemical and energy industries.

S0258 - Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python
Michal Januszewski (University of Silesia in Katowice; Google Switzerland)
Day: Tuesday, 05/15 | Time: 9:30 am - 9:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Development Tools & Libraries
Session Level: Intermediate

Learn how Run-Time Code Generation (RTCG) techniques allowed for fast development of a lattice Boltzmann (LB) fluid dynamics solver called Sailfish. Sailfish is completely open source, supports a wide variety of LB models (single and multiple relaxation times, the entropic model; single and binary fluids) and can take advantage of multiple GPUs. Even though the project is written predominantly in Python, no performance compromises are made. This talk will introduce the basic design principles of Sailfish and illustrate how RTCG makes it possible to exploit the power of GPUs with minimal programmer effort.

S0031 - Unstructured Grid Numbering Schemes for GPU Coalescing Requirements
Andrew Corrigan (Naval Research Laboratory), Johann Dahm (University of Michigan)
Day: Tuesday, 05/15 | Time: 10:00 am - 10:25 am
Topic Areas: Computational Fluid Dynamics; Algorithms & Numerical Techniques; Computational Physics
Session Level: Advanced

Learn how to achieve high performance for computational fluid dynamics (CFD) solvers over unstructured grids using numbering schemes tailored for GPU coalescing requirements. Using these techniques, unstructured grid CFD solvers can make more effective use of memory bandwidth, which is an otherwise significant performance bottleneck that has so far led to relatively limited performance gains on GPUs in comparison to structured grid CFD solvers. Performance benchmarks will be shown using the Jet Engine Noise Reduction (JENRE) code.

S0321 - GPU-Based Monte Carlo Ray Tracing Simulation for Solar Power Plants
Claus Nilsson (Tietronix Software, Inc.), Michel Izygon (Tietronix Software, Inc.)
Day: Tuesday, 05/15 | Time: 2:00 pm - 2:25 pm
Topic Areas: Energy Exploration; Computational Physics; Ray Tracing
Session Level: Beginner

Learn about realtime simulations of Concentrating Thermal Solar Power using GPU technology to enable performance optimization of these utility scale plants. By leveraging the power of GPUs and the parallel aspect of the field of thousands of sun-tracking mirrors, we have been successful in cutting the computation time by orders of magnitude compared with the minutes to hours of runtime previously required. We will present an overview of the problem domain and describe how we used the GPU to derive a Monte Carlo physics ray tracing method to simulate the flux reflected by the mirrors onto the solar receiver.

S0046 - Application of the GPU to a Two-Part Computational Electromagnetic Algorithm
Eric Dunn (SAIC)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Ray Tracing
Session Level: Beginner

The shooting and bouncing ray (SBR) method is one way to simulate electromagnetic field radiation. Like all methods, there are certain problems where it does not yield accurate results. In this presentation, we will explain one such case that consists of an antenna resonating between two metal plates. We will discuss how we used the graphics processing unit (GPU) to separate the problem into two parts. Each part is simulated individually with SBR, producing an improved result. Such a GPU-accelerated, two-part approach can be applied to other more general hybrid simulations.

S0379 - GPU-based High-Performance Simulations for Spintronics
Jan Jacob (University of Hamburg - Institute of Applied Physics and Microstructure Research Center)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: General Interest; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

The joint utilization of the electron's charge and spin in "spintronics" represents a promising technology for data processing and storage in nanostructures. The complex quantum effects like the spin-Hall effect in these devices require demanding numerical simulations, which provide a convenient link between idealized analytical models and often very complex results from measurements. The simulations involve multiplications and inversions of large matrices, and so provide an ideal showcase for the performance gained by employing GPGPUs to execute the algebraic routines on these matrices in computing environments with shared execution of algorithms on multiple nodes with multiple GPGPUs and CPU cores.

S0036 - Multiparticle Collision Dynamics on GPUs
Elmar Westphal (Forschungszentrum Juelich)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Computational Fluid Dynamics; Molecular Dynamics
Session Level: Intermediate

See how we employ GPUs to simulate the interaction of millions of solvent and solute particles of a fluid system. Often the domain of large cluster systems, the most time consuming part of our simulations can now be done on desktop PCs in reasonable time. This contribution shows how GPUs can effectively be used to accelerate existing programs and how techniques like streaming and increased data locality significantly enhance calculation throughput. It also shows how a GPU-optimized program structure yields usually expensive additional functionality "almost free". Furthermore, a well-scaling single-node/multi-GPU implementation of the program is presented.

S0067 - PIConGPU - Bringing large-scale Laser Plasma Simulations to GPU Supercomputing
Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf), Guido Juckeland (Center for Information Services and High Performance Computing, Technical University Dresden)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Application Design & Porting Techniques; Supercomputing
Session Level: Advanced

With powerful lasers breaking the Petawatt barrier, applications for laser-accelerated particle beams are gaining more interest than ever. Ion beams accelerated by intense laser pulses foster new ways of treating cancer and make them available to more people than ever before. Laser-generated electron beams can drive new compact x-ray sources to create snapshots of ultrafast processes in materials. With PIConGPU, laser-driven particle acceleration can be computed in hours compared to weeks on standard CPU clusters. We present the techniques behind PIConGPU, detailed performance analysis and the benefits of PIConGPU for real-world physics cases.

S0221 - 1024 Bit Parallel Rational Arithmetic Operators for the GPU
Robert Zigon (Beckman Coulter)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:50 pm
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to create a set of rational arithmetic operators that manipulate 1024 bit operands on a Tesla C2050. These operators are used to create a numerically stable implementation for Bessel functions. Naive implementations of the Bessel functions produce unreliable results when they are used to solve Maxwell's equations by way of Mie theory. Maxwell's equations are used to model the scattering of light by small particles. Light scatter is used in Particle Characterization to measure the quality of materials like cocoa, cement and pharmaceuticals.

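As a rough CPU-side illustration of why rational operands help with numerical stability, the sketch below compares floating-point accumulation with exact rational accumulation using Python's standard `fractions` module. It is not the speaker's GPU implementation; the helper `fits` and the constant `OPERAND_BITS` are hypothetical names that only borrow the 1024-bit operand width from the abstract.

```python
from fractions import Fraction

OPERAND_BITS = 1024  # operand width mentioned in the abstract (assumed per component)


def fits(q, bits=OPERAND_BITS):
    """Return True if numerator and denominator each fit in `bits` bits."""
    return (abs(q.numerator).bit_length() <= bits
            and q.denominator.bit_length() <= bits)


# Binary floating point cannot represent 1/10 exactly, so rounding error accumulates.
float_sum = sum([0.1] * 10)

# Rational arithmetic keeps an exact numerator/denominator pair instead.
exact_sum = sum([Fraction(1, 10)] * 10)

print(float_sum == 1.0)   # rounding error makes this comparison fail
print(exact_sum == 1)     # the rational sum is exact
print(fits(exact_sum))    # the result still fits the assumed operand width
```

A fixed-width design such as the one described would additionally need a policy for results that outgrow the operand width; this sketch only reports whether they fit.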
S0245 - Porting Legacy Plasma Codes to GPU
Peng Wang (NVIDIA)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Physics
Session Level: Intermediate

Learn how to port legacy Fortran plasma codes to GPU. Many legacy plasma codes are written in Fortran and have many lines of code. We will discuss techniques for porting such legacy codes easily and efficiently to CUDA C/C++. Performance analysis of major algorithmic patterns in plasma codes will be discussed. The discussion will use the GTC and GeFi plasma codes as realistic examples.

S0058 - Advancing GPU Molecular Dynamics: Rigid Bodies in HOOMD-blue
Joshua Anderson (University of Michigan), Trung Dac Nguyen (University of Michigan)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:50 am
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Learn how rigid body dynamics are implemented in HOOMD-blue. Previous releases were capable of executing classical molecular dynamics, where free particles interact via smooth potentials and their motion through time is computed using Newton's laws. The latest version allows particles to be grouped into bodies that move as rigid units. Users can now simulate materials made of cubes, rods, bent rods, jacks, plates, patchy particles, buckyballs, or any other arbitrary shapes. This talk covers how these algorithms are implemented on the GPU, tuned to perform well for bodies of any size, and discusses several use-cases relevant to research.

S0125 - Memory Efficient Reverse Time Migration in 3D
Chris Leader (Stanford Exploration Project)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Energy Exploration; Computational Physics
Session Level: Intermediate

Learn how we can image the interior of the Earth in three dimensions using Reverse Time Migration. We discuss how GPUs accelerate this method using parallel wave propagation kernels, texture memories and minimal device to host transfers. Further, we discuss how the progression to 3D presents a multitude of new problems, particularly memory-based ones, causing the system to be I/O limited. By manipulating boundary positions and values to a pseudo-random form, we show how many of these memory restrictions can be diminished and how detailed subsurface images can be fully constructed using GPUs.

S0236 - Advanced Optimization Techniques On a CUDA Implementation of Conjugate Gradient Solvers
Eri Rubin (OptiTex)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

Linear systems are at the heart of a lot of compute problems. For large sparse systems, there are two distinct approaches: direct and iterative solvers. After many years of researching and testing both approaches on CPU and GPU, we have implemented a highly efficient CG solver on the GPU using a combination of unique techniques. In this talk we will go over these techniques and the improved performance they bring.

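For readers unfamiliar with the iterative approach named in this abstract, here is a minimal CPU sketch of the textbook conjugate gradient (CG) iteration for a symmetric positive-definite sparse system. It is not the speaker's GPU solver and contains none of the optimizations the talk covers; the row-list sparse format and the names `matvec`, `dot` and `conjugate_gradient` are assumptions made for illustration.

```python
def matvec(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(rows, p)
        alpha = rs_old / dot(p, Ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small example: A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
A = [[(0, 4.0), (1, 1.0)], [(0, 1.0), (1, 3.0)]]
solution = conjugate_gradient(A, [1.0, 2.0])
print(solution)
```

Each iteration is dominated by one sparse matrix-vector product and a few vector reductions, which is why the method maps naturally onto data-parallel hardware.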
S0312 - GPU Implementation for Rapid Iterative Image Reconstruction in Nuclear Medicine
Jakub Pietrzak (University of Warsaw)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Medical Imaging & Visualization; Computational Physics; Computer Graphics
Session Level: Intermediate

GPU implementation can greatly accelerate iterative techniques of 3D image reconstruction in nuclear medicine imaging. Single Photon Emission Computed Tomography (SPECT) is a functional imaging modality widely used in clinical diagnosis. To obtain high quality images within reduced scanning times, high sensitivity collimators need to be used and their response function modeled in the reconstruction. This is in general very computationally intensive and unfeasible with CPU and algorithm implementations. Our software is able to perform the reconstruction of patient data within clinically acceptable times using relatively low cost and widely available hardware.

S0352 - GPU-Accelerated Parallel Computing for Simulation of Seismic Wave Propagation
Taro Okamoto (Department of Earth and Planetary Sciences, Tokyo Institute of Technology)
Day: Wednesday, 05/16 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Physics; General Interest
Session Level: Advanced

We adopted GPUs to accelerate large-scale, parallel finite-difference (FDTD) simulation of seismic wave propagation. Effective parallel implementation is needed because the memory of a single GPU is too small for real applications. We therefore describe the memory optimization, the three-dimensional domain decomposition, and the overlapping of communication and computation adopted in our program. So far we have achieved a high performance (single-precision) of about 61 TFlops using 1200 GPUs of TSUBAME-2.0, the GPU supercomputer at Tokyo Institute of Technology, Japan. As an important application, we show the results of the simulation of the 2011 Tohoku-Oki mega-quake.

S0269 - Accelerating 3D-RISM Calculations using GPUs
Yutaka Maruyama (Institute for Molecular Science), Fumio Hirata (Institute for Molecular Science)
Day: Wednesday, 05/16 | Time: 3:00 pm - 3:25 pm
Topic Areas: Life Sciences; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

The three-dimensional reference interaction site model (3D-RISM) theory is a powerful tool to investigate biomolecular processes in solution. Unfortunately, 3D-RISM calculations are often both memory intensive and time-consuming. We sought to accelerate these calculations using GPUs. To work around the problem of limited memory size in GPUs, we modified the less memory-intensive Anderson method for faster convergence of 3D-RISM calculations. Using this method on a C2070, we reduced the computational time by a factor of eight compared to an Intel Xeon (8 cores, 3.33 GHz) with the conventional method.

S0055 - Particle Dynamics with MBD and FEA using CUDA
Graham Sanborn (FunctionBay)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Structural Mechanics; Computational Physics; Computational Fluid Dynamics
Session Level: Intermediate

Many sphere particles are solved with DEM (Discrete Element Method) and simulated with GPU technology. A fast algorithm is applied to calculate Hertzian contact forces between many sphere particles (from 100,000 to 1,000,000) and NVIDIA's CUDA is used to accelerate the calculation. Many sphere particles and MBD and FEA entities are simulated within the commercial software RecurDyn. Many models are built and simulated: a forklift with sand model, an oil in oil tank model, an oil filled engine system and a water filled washing machine model. All models are simulated with NVIDIA GPUs and the results are shown.

S0363 - Efficient Molecular Dynamics on Heterogeneous GPU Architectures in GROMACS
Szilárd Páll (KTH Royal Institute of Technology), Berk Hess (KTH Royal Institute of Technology)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Molecular Dynamics; Computational Physics; Life Sciences
Session Level: Intermediate

Molecular Dynamics is an important application for GPU acceleration, but many algorithmic optimizations and features still rely on code that prefers traditional CPUs. It is only with the latest hardware and software that we have been able to realize a heterogeneous GPU/CPU implementation and reach performance significantly beyond the state-of-the-art of hand-tuned CPU code in our GROMACS program. The sub-millisecond iteration time poses challenges on all levels of parallelization. Come and learn about our new atom-cluster pair interaction approach for non-bonded force evaluation that achieves 60% work-efficiency and other innovative solutions for heterogeneous GPU systems.

S0139 - GPU-Based Molecular Dynamics Simulations of Protein and RNA Assembly
Samuel Cho (Wake Forest University)
Day: Wednesday, 05/16 | Time: 5:00 pm - 5:25 pm
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Protein and RNA biomolecular folding and assembly problems have important applications because misfolding is associated with diseases like Alzheimer's and Parkinson's. However, simulating complex biomolecules on the same timescales as experiments is an extraordinary challenge due to a bottleneck in the force calculations. To overcome these hurdles, we perform coarse-grained molecular dynamics simulations where biomolecules are reduced into simpler components. Furthermore, our GPU-based simulations have a significant performance improvement over CPU-based simulations, which are limited to systems of 50-150 residues/nucleotides. The GPU-based code can simulate protein/RNA systems of 400-10,000+ residues/nucleotides, and we present ribosome assembly simulations.

S0129 - A Monte Carlo Thermal Radiation Solver in GPU/CPU Hybrid Architecture
Gaofeng Wang (Laboratoire E.M2.C, Ecole Centrale Paris), Oliver Gicquel (Laboratoire E.M2.C, Ecole Centrale Paris)
Day: Thursday, 05/17 | Time: 9:00 am - 9:25 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Ray Tracing
Session Level: Intermediate

A Monte Carlo ray-tracing code is developed to predict radiative heat transfer behaviors in CFD simulation of combustion phenomena. Using the emission-reciprocal method, each random ray casting of each node can be conducted independently, which suits parallel computation. The code is efficiently implemented on hybrid GPU/CPU HPC resources using a dedicated dynamic load balancing strategy. Linear speedup scaling on hybrid HPC resources has been shown in a demonstration calculation of radiative heat transfer in a helicopter engine's combustion chamber, where adding one GPU to the HPC resource pool is equivalent to adding nine CPU cores.

S0508 - Faster Finite Elements for Wave Propagation Codes
Max Rietmann (Institute for Computational Science / USI Lugano, Switzerland)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to develop faster and better finite-element codes for wave propagation using GPUs and MPI combined with overlapping techniques to hide the cost of communications and of host/device memory copies. Different options based on mesh coloring or on atomic operations will be presented. The difficulty of defining speedup will also be discussed (speedup versus what, using what definition of "cost"). Examples will be given using SPECFEM3D, a highly optimized spectral finite-element code that has won the Gordon Bell Supercomputing award and the BULL Joseph Fourier award, and that can run on CPU or GPU clusters.

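The mesh coloring option mentioned in this abstract can be illustrated with a small greedy sketch: elements that share a mesh node receive different colors, so all elements of one color can add their contributions to shared nodes concurrently without atomic operations. This is a generic illustration, not code from SPECFEM3D; the function name `color_elements` and the tuple-of-node-ids element format are assumptions.

```python
def color_elements(elements):
    """Greedy coloring: elements that share a node never get the same color.

    `elements` is a list of tuples of node ids, one tuple per element.
    """
    # Map each node to the elements that touch it.
    node_to_elems = {}
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems.setdefault(n, []).append(e)

    colors = [-1] * len(elements)  # -1 means "not colored yet"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbours sharing at least one node.
        used = {colors[o] for n in nodes for o in node_to_elems[n]
                if colors[o] != -1}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


# A 1D chain of four two-node elements: neighbours alternate between two colors.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
chain_colors = color_elements(chain)
print(chain_colors)  # [0, 1, 0, 1]
```

Processing one color at a time trades some parallelism per launch for race-free updates, whereas the atomic-operation alternative processes all elements at once and resolves conflicts in hardware.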
S0039 - Data-Driven GPGPU Ideology Extension
Alexandr Kosenkov (University of Geneva), Bela Bauer (Microsoft Research)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Application Design & Porting Techniques; Computational Physics; Parallel Programming Languages & Compilers; Development Tools & Libraries
Session Level: Advanced

In this session we will demonstrate how the GPGPU ideology can be extended so that it can be used at the scale of an InfiniBand hybrid system. The approach that we are presenting combines delayed execution, scheduling techniques and, most importantly, casts the CPU multi-core ideology down to that of the streaming multiprocessor, enforcing a full-fledged "GPGPU as a co-processor" way of programming for large-scale MPI hybrid applications. Staying compatible with modern CPU/GPGPU libraries, it provides more than fine-grained control over resources (more than you wanted, that is).

S0217 - Efficient Implementation of CFD Algorithms on GPU Accelerated Supercomputers
Ali Khajeh-Saeed (University of Massachusetts, Amherst), Blair Perot (University of Massachusetts, Amherst)
Day: Thursday, 05/17 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Supercomputing; Application Design & Porting Techniques
Session Level: Intermediate

The goal of this session is to introduce the concepts necessary to solve large computational fluid dynamics (CFD) problems on collections of many GPUs. Communication and computation overlapping schemes become even more critical when using fast compute engines such as GPUs that are connected via a relatively slow interconnect (such as MPI on InfiniBand). The algorithms presented are validated on unsteady CFD simulations of turbulence using 192 graphics processors to update half-a-billion unknowns per computational timestep. The performance results from three different GPU accelerated supercomputers (Lincoln, Forge, and Keeneland) are compared with a large CPU based supercomputer (Ranger).

S0378 - VASP Accelerated with GPUs
Maxwell Hutchinson (University of Chicago)
Day: Thursday, 05/17 | Time: 2:00 pm - 2:50 pm
Topic Areas: Quantum Chemistry; Application Design & Porting Techniques; Computational Physics
Session Level: Intermediate

This session will detail the performance and capabilities of GPU-accelerated VASP, explain design decisions made in porting VASP to CUDA, and present a roadmap for GPU accelerated VASP development. We've achieved performance improvements up to around 20x on systems of around 100 ions and have implemented exact-exchange. We are working on ports of more conventional functionality.

S0071 - The High-Level Linear Algebra Library ViennaCL And Its Applications
Karl Rupp (TU Wien)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Development Tools & Libraries; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Get to know ViennaCL, an OpenCL high-level linear algebra software, which lets you get the speed of GPU computing at the convenience level of the C++ Boost libraries. Decrease the development and execution time of applications by utilizing our well-tested and widely used library, instead of spending days on learning details of GPU architectures and debugging. We provide examples that demonstrate not only how quickly existing applications are ported efficiently from single-threaded execution to fully utilizing multi-threaded environments, but also how to utilize the rich set of functionalities ranging from common BLAS routines to iterative solvers.

S0087 - GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Astronomy & Astrophysics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Intermediate

Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data dependent decision making and unavoidable non-contiguous memory accesses. However, we adopt various parallelization strategies and utilize the high computing power of the GPU to obtain substantial near-linear speedups which cannot be easily achieved on a CPU-based system. This acceleration allows exploration of physical regimes which were out of reach of current simulations.

S0368 - Unraveling the Mysteries of Quarks with Hundreds of GPUs
Ronald Babich (NVIDIA)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Application Design & Porting Techniques; Algorithms & Numerical Techniques; Supercomputing
Session Level: Intermediate

Dive into the world of quarks and gluons, and hear how GPU computing is revolutionizing the way many calculations in lattice quantum chromodynamics (lattice QCD) are performed. The main computational challenge in such calculations is to repeatedly solve large systems of linear equations arising from a four-dimensional finite-difference problem. In this session, we'll discuss strategies for parallelizing such a solver across hundreds of GPUs. These include techniques and algorithms for reducing memory traffic and inter-GPU communication. The net result is an implementation that achieves better than 20 Tflops on 256 GPUs, realized in the open-source "QUDA" library.

S0091 - Sustainable Hybrid Parallelization of an Unstructured Hydrodynamic Code
Raphaël Poncet (Commissariat à l'Energie Atomique et aux Energies Alternatives)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Application Design & Porting Techniques; Algorithms & Numerical Techniques; Computational Fluid Dynamics; Computational Physics
Session Level: Advanced

The goal of this presentation is to share our methodology for porting a numerical code to hybrid supercomputing architectures using MPI coupled with directive-based languages (OpenMP for multicore CPUs, and HMPP for GPUs). Our code, VOLNA, is an unstructured partial differential equation hydrodynamic solver developed for the simulation of tsunamis. Our results demonstrate that using directive-based languages such as HMPP for GPU programming, one can retain good performance (e.g. speedup of 15 compared to 1 CPU core, 3 compared to 8 CPU cores) with minimal modifications of the original CPU source code (about 30 lines of directives in our case).

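The two speedup figures quoted in this abstract can be combined into a quick consistency check. The sketch below assumes both figures refer to the same GPU run on the same problem, which the abstract does not state explicitly; the function name `implied_cpu_scaling` is hypothetical.

```python
def implied_cpu_scaling(speedup_vs_one_core, speedup_vs_n_cores, n_cores):
    """Derive how the multicore CPU run compares with the single-core run.

    Assumes both GPU speedups were measured against the same GPU timing.
    """
    # T(1 core) / T(n cores) = (T(1 core) / T(GPU)) / (T(n cores) / T(GPU))
    cpu_speedup = speedup_vs_one_core / speedup_vs_n_cores
    # Fraction of ideal linear scaling reached by the n-core CPU run.
    efficiency = cpu_speedup / n_cores
    return cpu_speedup, efficiency


cpu_speedup, efficiency = implied_cpu_scaling(15, 3, 8)
print(cpu_speedup)  # 5.0: the 8-core run is five times faster than one core
print(efficiency)   # 0.625: about 62% parallel efficiency on the CPU side
```

Under that assumption, the choice of baseline changes the headline number by a factor of five, which is why reporting the comparison against a full multicore CPU is the more conservative figure.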
S0334 - The Fast Multipole Method on CPU and GPU Processors
Eric Darve (Stanford)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Computational Physics; Molecular Dynamics; Algorithms & Numerical Techniques
Session Level: Advanced

The fast multipole method (FMM) is a widely used numerical algorithm in computational engineering. Accelerating the FMM on CUDA-enabled GPUs is challenging because the FMM has a complicated data access pattern, mostly during the so-called multipole-to-local (M2L) operation. We have created several schemes to optimize the M2L and have attained a performance of over 350 (resp. 160) Gflop/s for single (resp. double) precision arithmetic. The optimal algorithm was incorporated into a complete FMM code, which can accept any smooth kernel as specified by the user, making it very flexible. We have also developed a highly efficient CPU version.

S0282 - Leveraging NVIDIA GPUDirect on APEnet+ 3D Torus Cluster Interconnect
Davide Rossetti (Italian National Institute for Nuclear Physics)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Supercomputing; Computational Physics
Session Level: Intermediate

APEnet+ is a novel cluster interconnect, based on a custom PCI card which features a PCI Express Gen2 X8 link and a re-configurable HW component (FPGA). It supports a 3D Torus topology and has special acceleration features specifically developed for NVIDIA Fermi GPUs. An introduction to the basic features and the programming model of APEnet+ will be followed by a description of its performance on some numerical simulations, e.g. High Energy Physics simulations.

S0218 - ASI Parallel Fortran: A General-Purpose Fortran to GPU Translator
Rainald Lohner (George Mason University)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Development Tools & Libraries; Computational Fluid Dynamics; Computational Physics; Parallel Programming Languages & Compilers
Session Level: Advanced

Over the last 3 years we have developed a general-purpose Fortran to GPU translator: ASI Parallel Fortran. The talk will detail its purpose, design layout and capabilities, and show how it is used and implemented. The use of ASI Parallel Fortran will be shown for large-scale CFD/CEM codes as well as other general purpose Fortran codes.
