GPU Technology Conference, May 14-17, 2012
McEnery Convention Center, San Jose, California
www.gputechconf.com

Sessions on Computational Physics (subject to change)

IMPORTANT: Visit http://www.gputechconf.com/page/sessions.html for the most up-to-date schedule.

S0268 - Virtual Process Engineering - Realtime Simulation of Multiphase Systems
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
Day: Tuesday, 05/15 | Time: 9:00 am - 9:50 am
Topic Areas: Computational Fluid Dynamics; Molecular Dynamics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Advanced

Real-time simulation and virtual reality with quantitatively correct physics for industrial processes involving multi-scale, multiphase systems was once a remote dream for process engineering, but it is now becoming reality with CPU-GPU hybrid supercomputing. Numerical and visualization methods for such simulations on thousands of GPUs will be reported, with applications in the chemical and energy industries.

S0258 - Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python
Michal Januszewski (University of Silesia in Katowice; Google Switzerland)
Day: Tuesday, 05/15 | Time: 9:30 am - 9:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Development Tools & Libraries
Session Level: Intermediate

Learn how Run-Time Code Generation (RTCG) techniques allowed for fast development of a lattice Boltzmann (LB) fluid dynamics solver called Sailfish. Sailfish is completely open source, supports a wide variety of LB models (single and multiple relaxation times, the entropic model; single and binary fluids) and can take advantage of multiple GPUs. Even though the project is written predominantly in Python, no performance compromises are made. This talk will introduce the basic design principles of Sailfish and illustrate how RTCG makes it possible to exploit the power of GPUs with minimal programmer effort.

S0031 - Unstructured Grid Numbering Schemes for GPU Coalescing Requirements
Andrew Corrigan (Naval Research Laboratory), Johann Dahm (University of Michigan)
Day: Tuesday, 05/15 | Time: 10:00 am - 10:25 am
Topic Areas: Computational Fluid Dynamics; Algorithms & Numerical Techniques; Computational Physics
Session Level: Advanced

Learn how to achieve high performance for computational fluid dynamics (CFD) solvers over unstructured grids using numbering schemes tailored for GPU coalescing requirements. Using these techniques, unstructured grid CFD solvers can make more effective use of memory bandwidth, which is an otherwise significant performance bottleneck that has so far led to relatively limited performance gains on GPUs in comparison to structured grid CFD solvers. Performance benchmarks will be shown using the Jet Engine Noise Reduction (JENRE) code.

S0321 - GPU-Based Monte Carlo Ray Tracing Simulation for Solar Power Plants
Claus Nilsson (Tietronix Software, Inc.), Michel Izygon (Tietronix Software, Inc.)
Day: Tuesday, 05/15 | Time: 2:00 pm - 2:25 pm
Topic Areas: Energy Exploration; Computational Physics; Ray Tracing
Session Level: Beginner

Learn about real-time simulations of Concentrating Thermal Solar Power using GPU technology to enable performance optimization of these utility-scale plants. By leveraging the power of GPUs and the parallel nature of a field of thousands of sun-tracking mirrors, we have succeeded in cutting the computation time by orders of magnitude compared with the minutes and hours of run time previously required. We will present an overview of the problem domain and describe how we used the GPU to derive a Monte Carlo physics ray tracing method to simulate the flux reflected by the mirrors onto the solar receiver.

S0046 - Application of the GPU to a Two-Part Computational Electromagnetic Algorithm
Eric Dunn (SAIC)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Ray Tracing
Session Level: Beginner

The shooting and bouncing ray (SBR) method is one way to simulate electromagnetic field radiation. As with all methods, there are certain problems for which it does not yield accurate results. In this presentation, we will explain one such case that consists of an antenna resonating between two metal plates. We will discuss how we used the graphics processing unit (GPU) to separate the problem into two parts. Each part is simulated individually with SBR, producing an improved result. Such a GPU-accelerated, two-part approach can be applied to other more general hybrid simulations.

S0379 - GPU-based High-Performance Simulations for Spintronics
Jan Jacob (University of Hamburg - Institute of Applied Physics and Microstructure Research Center)
Day: Tuesday, 05/15 | Time: 2:30 pm - 2:55 pm
Topic Areas: General Interest; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

The joint utilization of the electron's charge and spin in "spintronics" represents a promising technology for data processing and storage in nanostructures. Complex quantum effects in these devices, such as the spin-Hall effect, require demanding numerical simulations that provide a convenient link between idealized analytical models and the often very complex results of measurements. These simulations involve multiplications and inversions of large matrices, and so provide an ideal showcase for the performance gained by running the algebraic routines on GPGPUs in computing environments where algorithms execute across multiple nodes with multiple GPGPUs and CPU cores.

S0036 - Multiparticle Collision Dynamics on GPUs
Elmar Westphal (Forschungszentrum Juelich)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Computational Fluid Dynamics; Molecular Dynamics
Session Level: Intermediate

See how we employ GPUs to simulate the interaction of millions of solvent and solute particles of a fluid system. Often the domain of large cluster systems, the most time-consuming part of our simulations can now be done on desktop PCs in reasonable time. This contribution shows how GPUs can effectively be used to accelerate existing programs and how techniques like streaming and increased data locality significantly enhance calculation throughput. It also shows how a GPU-optimized program structure provides additional functionality that is usually expensive "almost free". Furthermore, a well-scaling single-node/multi-GPU implementation of the program is presented.

S0067 - PIConGPU - Bringing large-scale Laser Plasma Simulations to GPU Supercomputing
Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf), Guido Juckeland (Center for Information Services and High Performance Computing, Technical University Dresden)
Day: Tuesday, 05/15 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Algorithms & Numerical Techniques; Application Design & Porting Techniques; Supercomputing
Session Level: Advanced

With powerful lasers breaking the Petawatt barrier, applications for laser-accelerated particle beams are gaining more interest than ever. Ion beams accelerated by intense laser pulses foster new ways of treating cancer and make them available to more people than ever before. Laser-generated electron beams can drive new compact x-ray sources to create snapshots of ultrafast processes in materials. With PIConGPU laser-driven particle acceleration can be computed in hours compared to weeks on standard CPU clusters. We present the techniques behind PIConGPU, detailed performance analysis and the benefits of PIConGPU for real-world physics cases.

S0221 - 1024 Bit Parallel Rational Arithmetic Operators for the GPU
Robert Zigon (Beckman Coulter)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:50 pm
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to create a set of rational arithmetic operators that manipulate 1024 bit operands on a Tesla C2050. These operators are used to create a numerically stable implementation for Bessel functions. Naive implementations of the Bessel functions produce unreliable results when they are used to solve Maxwell's equations by way of Mie theory. Maxwell's equations are used to model the scattering of light by small particles. Light scatter is used in Particle Characterization to measure the quality of materials like cocoa, cement and pharmaceuticals.

S0245 - Porting Legacy Plasma Codes to GPU
Peng Wang (NVIDIA)
Day: Tuesday, 05/15 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Physics
Session Level: Intermediate

Learn how to port legacy Fortran plasma codes to GPU. Many legacy plasma codes are written in Fortran and contain many lines of code. We will discuss techniques for porting such legacy codes easily and efficiently to CUDA C/C++. Performance analysis of major algorithmic patterns in plasma codes will be discussed. The discussion will use the GTC and GeFi plasma codes as realistic examples.

S0058 - Advancing GPU Molecular Dynamics: Rigid Bodies in HOOMD-blue
Joshua Anderson (University of Michigan), Trung Dac Nguyen (University of Michigan)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:50 am
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Learn how rigid body dynamics are implemented in HOOMD-blue. Previous releases were capable of executing classical molecular dynamics, where free particles interact via smooth potentials and their motion through time is computed using Newton's laws. The latest version allows particles to be grouped into bodies that move as rigid units. Users can now simulate materials made of cubes, rods, bent rods, jacks, plates, patchy particles, buckyballs, or any other arbitrary shapes. This talk covers how these algorithms are implemented on the GPU and tuned to perform well for bodies of any size, and discusses several use cases relevant to research.

S0125 - Memory Efficient Reverse Time Migration in 3D
Chris Leader (Stanford Exploration Project)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Energy Exploration; Computational Physics
Session Level: Intermediate

Learn how we can image the interior of the Earth in three dimensions using Reverse Time Migration. We discuss how GPUs accelerate this method using parallel wave propagation kernels, texture memories and minimal device-to-host transfers. We further discuss how the progression to 3D presents a multitude of new problems, particularly memory-based ones that cause the system to be I/O limited. By manipulating boundary positions and values to a pseudo-random form, we show how many of these memory restrictions can be diminished and how detailed subsurface images can be fully constructed using GPUs.

S0236 - Advanced Optimization Techniques On a CUDA Implementation of Conjugate Gradient Solvers
Eri Rubin (OptiTex)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics; Application Design & Porting Techniques
Session Level: Intermediate

Linear systems are at the heart of a lot of compute problems. For large sparse systems, there are two distinct approaches: direct and iterative solvers. After many years of researching and testing both approaches on CPU and GPU, we have implemented a highly efficient CG solver on the GPU using a combination of unique techniques. In this talk we will go over these techniques and the improved performance they bring.

S0312 - GPU Implementation for Rapid Iterative Image Reconstruction in Nuclear Medicine
Jakub Pietrzak (University of Warsaw)
Day: Wednesday, 05/16 | Time: 10:00 am - 10:25 am
Topic Areas: Medical Imaging & Visualization; Computational Physics; Computer Graphics
Session Level: Intermediate

GPU implementation can greatly accelerate iterative techniques of 3D image reconstruction in nuclear medicine imaging. Single Photon Emission Computed Tomography (SPECT) is a functional imaging modality widely used in clinical diagnosis. To obtain high-quality images within reduced scanning times, high-sensitivity collimators need to be used and their response function modeled in the reconstruction. This is in general very computationally intensive and unfeasible with CPU-based algorithm implementations. Our software is able to perform the reconstruction of patient data within clinically acceptable times using relatively low-cost and widely available hardware.

S0352 - GPU-Accelerated Parallel Computing for Simulation of Seismic Wave Propagation
Taro Okamoto (Department of Earth and Planetary Sciences, Tokyo Institute of Technology)
Day: Wednesday, 05/16 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Physics; General Interest
Session Level: Advanced

We adopted GPUs to accelerate large-scale, parallel finite-difference (FDTD) simulation of seismic wave propagation. Effective parallel implementation is needed because the memory of a single GPU is too small for real applications. We therefore describe the memory optimization, the three-dimensional domain decomposition, and the overlapping of communication and computation adopted in our program. We have so far achieved a high (single-precision) performance of about 61 TFlops using 1200 GPUs of TSUBAME-2.0, the GPU supercomputer at Tokyo Institute of Technology, Japan. As an important application, we show the results of the simulation of the 2011 Tohoku-Oki mega-quake.

S0269 - Accelerating 3D-RISM Calculations using GPUs
Yutaka Maruyama (Institute for Molecular Science), Fumio Hirata (Institute for Molecular Science)
Day: Wednesday, 05/16 | Time: 3:00 pm - 3:25 pm
Topic Areas: Life Sciences; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

The three-dimensional reference interaction site model (3D-RISM) theory is a powerful tool to investigate biomolecular processes in solution. Unfortunately, 3D-RISM calculations are often both memory intensive and time-consuming. We sought to accelerate these calculations using GPUs. To work around the problem of limited memory size in GPUs, we modified the less memory-intensive Anderson method for faster convergence of 3D-RISM calculations. Using this method on a C2070, we reduced the computational time by a factor of eight compared to an Intel Xeon (8 cores, 3.33 GHz) with the conventional method.

S0055 - Particle Dynamics with MBD and FEA using CUDA
Graham Sanborn (FunctionBay)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Computational Structural Mechanics; Computational Physics; Computational Fluid Dynamics
Session Level: Intermediate

Systems of many spherical particles are solved with the Discrete Element Method (DEM) and simulated with GPU technology. A fast algorithm is applied to calculate Hertzian contact forces between many spherical particles (from 100,000 to 1,000,000), and NVIDIA's CUDA is used to accelerate the calculation. The particles are simulated together with MBD and FEA entities within the commercial software RecurDyn. Several models are built and simulated: a forklift with sand, oil in an oil tank, an oil-filled engine system and a water-filled washing machine. All models are simulated on NVIDIA GPUs and the results are shown.

S0363 - Efficient Molecular Dynamics on Heterogeneous GPU Architectures in GROMACS
Szilárd Páll (KTH Royal Institute of Technology), Berk Hess (KTH Royal Institute of Technology)
Day: Wednesday, 05/16 | Time: 4:00 pm - 4:25 pm
Topic Areas: Molecular Dynamics; Computational Physics; Life Sciences
Session Level: Intermediate

Molecular Dynamics is an important application for GPU acceleration, but many algorithmic optimizations and features still rely on code that prefers traditional CPUs. It is only with the latest hardware and software that we have been able to realize a heterogeneous GPU/CPU implementation and reach performance significantly beyond the state of the art of hand-tuned CPU code in our GROMACS program. The sub-millisecond iteration time poses challenges at all levels of parallelization. Come and learn about our new atom-cluster pair interaction approach for non-bonded force evaluation that achieves 60% work-efficiency and other innovative solutions for heterogeneous GPU systems.

S0139 - GPU-Based Molecular Dynamics Simulations of Protein and RNA Assembly
Samuel Cho (Wake Forest University)
Day: Wednesday, 05/16 | Time: 5:00 pm - 5:25 pm
Topic Areas: Molecular Dynamics; Computational Physics
Session Level: Intermediate

Protein and RNA biomolecular folding and assembly problems have important applications because misfolding is associated with diseases like Alzheimer's and Parkinson's. However, simulating complex biomolecules on the same timescales as experiments is an extraordinary challenge due to a bottleneck in the force calculations. To overcome these hurdles, we perform coarse-grained molecular dynamics simulations where biomolecules are reduced into simpler components. Furthermore, our GPU-based simulations show a significant performance improvement over CPU-based simulations, which are limited to systems of 50-150 residues/nucleotides. The GPU-based code can simulate protein/RNA systems of 400-10,000+ residues/nucleotides, and we present ribosome assembly simulations.

S0129 - A Monte Carlo Thermal Radiation Solver in GPU/CPU Hybrid Architecture
Gaofeng Wang (Laboratoire E.M2.C, Ecole Centrale Paris), Oliver Gicquel (Laboratoire E.M2.C, Ecole Centrale Paris)
Day: Thursday, 05/17 | Time: 9:00 am - 9:25 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Ray Tracing
Session Level: Intermediate

A Monte Carlo ray-tracing code is developed to predict radiative heat transfer behaviors in CFD simulation of combustion phenomena. Using the emission-reciprocal method, each random ray cast from each node can be carried out independently, which suits parallel computation. The code is implemented efficiently on hybrid GPU/CPU HPC resources using a dedicated dynamic load-balancing strategy. Linear speedup scaling on hybrid HPC resources is shown in a demonstration calculation of radiative heat transfer in a helicopter engine's combustion chamber, where adding one GPU to the HPC resource pool is equivalent to adding nine CPU cores.

S0508 - Faster Finite Elements for Wave Propagation Codes
Max Rietmann (Institute for Computational Science / USI Lugano, Switzerland)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Learn how to develop faster and better finite-element codes for wave propagation using GPUs and MPI combined with overlapping techniques to hide the cost of communications and of host/device memory copies. Different options based on mesh coloring or on atomic operations will be presented. The difficulty of defining speedup will also be discussed (speedup versus what, using what definition of "cost"?). Examples will be given using SPECFEM3D, a highly optimized spectral finite-element code that has won the Gordon Bell Supercomputing award and the BULL Joseph Fourier award, and that can run on CPU or GPU clusters.

S0039 - Data-Driven GPGPU Ideology Extension
Alexandr Kosenkov (University of Geneva), Bela Bauer (Microsoft Research)
Day: Thursday, 05/17 | Time: 10:00 am - 10:25 am
Topic Areas: Application Design & Porting Techniques; Computational Physics; Parallel Programming Languages & Compilers; Development Tools & Libraries
Session Level: Advanced

In this session we will demonstrate how the GPGPU ideology can be extended so that it can be used on the scale of an InfiniBand hybrid system. The approach that we are presenting combines delayed execution and scheduling techniques and, most importantly, casts the CPU multi-core ideology down to that of the streaming multiprocessor, enforcing a full-fledged "GPGPU as a co-processor" way of programming for large-scale MPI hybrid applications. While staying compatible with modern CPU/GPGPU libraries, it provides more than fine-grained control over resources - more than you wanted, that is.

S0217 - Efficient Implementation of CFD Algorithms on GPU Accelerated Supercomputers
Ali Khajeh-Saeed (University of Massachusetts, Amherst), Blair Perot (University of Massachusetts, Amherst)
Day: Thursday, 05/17 | Time: 10:30 am - 10:55 am
Topic Areas: Computational Fluid Dynamics; Computational Physics; Supercomputing; Application Design & Porting Techniques
Session Level: Intermediate

The goal of this session is to introduce the concepts necessary to solve large computational fluid dynamics (CFD) problems on collections of many GPUs. Communication and computation overlapping schemes become even more critical when using fast compute engines such as GPUs that are connected via a relatively slow interconnect (such as MPI on InfiniBand). The algorithms presented are validated on unsteady CFD simulations of turbulence using 192 graphics processors to update half a billion unknowns per computational timestep. The performance results from three different GPU accelerated supercomputers (Lincoln, Forge, and Keeneland) are compared with a large CPU based supercomputer (Ranger).

S0378 - VASP Accelerated with GPUs
Maxwell Hutchinson (University of Chicago)
Day: Thursday, 05/17 | Time: 2:00 pm - 2:50 pm
Topic Areas: Quantum Chemistry; Application Design & Porting Techniques; Computational Physics
Session Level: Intermediate

This session will detail the performance and capabilities of GPU-accelerated VASP, explain design decisions made in porting VASP to CUDA, and present a roadmap for GPU accelerated VASP development. We've achieved performance improvements up to around 20x on systems of around 100 ions and have implemented exact-exchange. We are working on ports of more conventional functionality.

S0071 - The High-Level Linear Algebra Library ViennaCL And Its Applications
Karl Rupp (TU Wien)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Development Tools & Libraries; Algorithms & Numerical Techniques; Computational Physics
Session Level: Intermediate

Get to know ViennaCL, a high-level OpenCL linear algebra library that delivers the speed of GPU computing at the convenience level of the C++ Boost libraries. Decrease the development and execution time of applications by utilizing our well-tested and widely used library, instead of spending days on learning details of GPU architectures and debugging. We provide examples that demonstrate not only how quickly existing applications can be ported efficiently from single-threaded execution to fully utilizing multi-threaded environments, but also how to utilize the rich set of functionalities ranging from common BLAS routines to iterative solvers.

S0087 - GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Astronomy & Astrophysics; Computational Physics; Algorithms & Numerical Techniques
Session Level: Intermediate

Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data-dependent decision making and unavoidable non-contiguous memory accesses. However, we adopt various parallelization strategies and utilize the high computing power of the GPU to obtain substantial near-linear speedups which cannot be easily achieved on a CPU-based system. This acceleration makes it possible to explore physical regimes that were out of reach of current simulations.

S0368 - Unraveling the Mysteries of Quarks with Hundreds of GPUs
Ronald Babich (NVIDIA)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:50 pm
Topic Areas: Computational Physics; Application Design & Porting Techniques; Algorithms & Numerical Techniques; Supercomputing
Session Level: Intermediate

Dive into the world of quarks and gluons, and hear how GPU computing is revolutionizing the way many calculations in lattice quantum chromodynamics (lattice QCD) are performed. The main computational challenge in such calculations is to repeatedly solve large systems of linear equations arising from a four-dimensional finite-difference problem. In this session, we'll discuss strategies for parallelizing such a solver across hundreds of GPUs. These include techniques and algorithms for reducing memory traffic and inter-GPU communication. The net result is an implementation that achieves better than 20 Tflops on 256 GPUs, realized in the open-source "QUDA" library.

S0091 - Sustainable Hybrid Parallelization of an Unstructured Hydrodynamic Code
Raphaël Poncet (Commissariat à l'Energie Atomique et aux Energies Alternatives)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Application Design & Porting Techniques; Algorithms & Numerical Techniques; Computational Fluid Dynamics; Computational Physics
Session Level: Advanced

The goal of this presentation is to share our methodology for porting a numerical code to hybrid supercomputing architectures using MPI coupled with directive-based languages (OpenMP for multicore CPUs, and HMPP for GPUs). Our code, VOLNA, is an unstructured partial differential equation hydrodynamic solver developed for the simulation of tsunamis. Our results demonstrate that using directive-based languages such as HMPP for GPU programming, one can retain good performance (e.g. speedup of 15 compared to 1 CPU core, 3 compared to 8 CPU cores) with minimal modifications of the original CPU source code (about 30 lines of directives in our case).

S0334 - The Fast Multipole Method on CPU and GPU Processors
Eric Darve (Stanford)
Day: Thursday, 05/17 | Time: 3:00 pm - 3:25 pm
Topic Areas: Computational Physics; Molecular Dynamics; Algorithms & Numerical Techniques
Session Level: Advanced

The fast multipole method (FMM) is a widely used numerical algorithm in computational engineering. Accelerating the FMM on CUDA-enabled GPUs is challenging because the FMM has a complicated data access pattern, mostly during the so-called multipole-to-local (M2L) operation. We have created several schemes to optimize the M2L and have attained a performance of over 350 (resp. 160) Gflop/s for single (resp. double) precision arithmetic. The optimal algorithm was incorporated into a complete FMM code, which can accept any smooth kernel as specified by the user, making it very flexible. We have also developed a highly efficient CPU version.

S0282 - Leveraging NVIDIA GPUDirect on APEnet+ 3D Torus Cluster Interconnect
Davide Rossetti (Italian National Institute for Nuclear Physics)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Supercomputing; Computational Physics
Session Level: Intermediate

APEnet+ is a novel cluster interconnect, based on a custom PCI card which features a PCI Express Gen2 X8 link and a re-configurable HW component (FPGA). It supports a 3D Torus topology and has special acceleration features specifically developed for NVIDIA Fermi GPUs. An introduction to the basic features and the programming model of APEnet+ will be followed by a description of its performance on some numerical simulations, e.g. High Energy Physics simulations.

S0218 - ASI Parallel Fortran: A General-Purpose Fortran to GPU Translator
Rainald Lohner (George Mason University)
Day: Thursday, 05/17 | Time: 4:30 pm - 4:55 pm
Topic Areas: Development Tools & Libraries; Computational Fluid Dynamics; Computational Physics; Parallel Programming Languages & Compilers
Session Level: Advanced

Over the last 3 years we have developed a general-purpose Fortran to GPU translator: ASI Parallel Fortran. The talk will detail its purpose, design layout and capabilities, and show how it is used and implemented. The use of ASI Parallel Fortran will be shown for large-scale CFD/CEM codes as well as other general purpose Fortran codes.