LS-DYNA Performance Benchmark and Profiling
October 2017

Note
The following research was performed under the HPC Advisory Council activities
  – Participating vendors: LSTC, Huawei, Mellanox
  – Compute resource: HPC Advisory Council Cluster Center
The following was done to provide best practices
  – LS-DYNA performance overview
  – Understanding LS-DYNA communication patterns
  – Ways to increase LS-DYNA productivity
  – MPI libraries comparisons
For more info please refer to
  – http://www.lstc.com
  – http://www.huawei.com
  – http://www.mellanox.com

LS-DYNA
LS-DYNA
  – A general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems
  – Developed by the Livermore Software Technology Corporation (LSTC)
LS-DYNA is used by
  – Automobile
  – Aerospace
  – Construction
  – Military
  – Manufacturing
  – Bioengineering

Objectives
The presented research was done to provide best practices
  – LS-DYNA performance benchmarking
      – MPI library performance comparison
      – Interconnect performance comparison
      – Compilers comparison
      – Optimization tuning
The presented results will demonstrate
  – The scalability of the compute environment/application
  – Considerations for higher productivity and efficiency

Test Cluster Configuration
Huawei FusionServer E9000 with FusionServer CH121 V5 16-node (640-core) "Skylake" cluster
  – Dual-socket 20-core Intel Xeon Gold 6138 @ 2.00 GHz CPUs
  – Memory: 192 GB, DDR4 2666 MHz RDIMMs per node
  – OS: RHEL 7.2, MLNX_OFED_LINUX-4.1-1.0.2.0 InfiniBand SW stack
Mellanox ConnectX-5 EDR 100 Gb/s InfiniBand adapters
Mellanox Switch-IB SB7800 36-port EDR 100 Gb/s InfiniBand switch
Compilers: Intel Parallel Studio XE 2018
MPI: Intel MPI 2018, Mellanox HPC-X MPI Toolkit v1.9.7, Platform MPI 9.1.4.3
Application: MPP LS-DYNA R9.1.0, build 113698, single precision
MPI profiler: IPM (from Mellanox HPC-X)
Benchmarks: TopCrunch benchmarks
  – Neon Refined Revised (neon_refined_revised), Three Vehicle Collision (3cars), NCAC Minivan Model (Caravan2m-ver10), odb10m (NCAC Taurus model)

Introducing Huawei FusionServer E9000 (CH121) V5
High-performance 2-socket blade unlocks supreme computing power
Full-series Intel Xeon Scalable processors, 24 DDR4 DIMMs, AEP memory supported, 1 PCIe slot, high-performance storage (2 SFF / 2 NVMe SSDs / 4 M.2 SSDs), multi-plane network, LOM supported

LS-DYNA Performance – CPU SKUs and Generation
LS-DYNA gains performance from larger core counts and better memory throughput
  – The "Gold 6140" demonstrates a 50% performance gain (29% more cores) vs E5-2680 v4
  – The "Gold 6148" demonstrates a 61% performance gain (42% more cores) vs E5-2680 v4
  – Base clocks are the same on E5-2680 v4 and Gold 6148, while Gold 6140 runs slightly slower
  – Skylake supports 6 memory channels and faster DIMMs, both of which improve memory performance
[Chart: single-node performance; higher is better; labeled gains: 61%, 50%]

LS-DYNA Performance – Memory Speed
Memory speed provides some benefit to LS-DYNA performance
  – The Skylake platform supports DIMM speeds up to 2666 MHz
  – 2666 MHz DIMMs are theoretically ~11% faster than 2400 MHz DIMMs
  – LS-DYNA shows only about 2-3% improvement on a single node
  – It appears that only part of the speed difference translates into LS-DYNA performance gain
[Chart: 40 MPI processes per node; higher is better]

LS-DYNA Performance – Sub-NUMA Clustering
Enabling SNC provides some benefit for LS-DYNA
  – Sub-NUMA Clustering (SNC) is similar to cluster-on-die (COD) in the Haswell/Broadwell generation
  – CPU cores and memory are split into 2 separate NUMA domains when SNC is enabled
  – SNC should generally benefit applications that require good NUMA locality
  – SNC demonstrates a performance gain of ~2-3% on a single-node basis
[Chart: 40 MPI processes per node; higher is better]

LS-DYNA Performance – CPU Instructions
AVX2 outperforms both AVX-512 and SSE2 executables on the Skylake CPU
  – Performance gain of 17% by using AVX2 over AVX-512 executables
  – AVX-512 performs worse than AVX2, despite improved vectorization
  – AVX-512 instructions run at a lower clock frequency than AVX2 and non-AVX code
  – The benefit of AVX2 appears to be larger on bigger datasets (such as car2car)
[Chart: 40 MPI processes per node; higher is better; labeled gains: 17%, 8%, 4%, 3%]

LS-DYNA Performance – CPU Instruction Sets
Some variance in performance among different LS-DYNA versions/executables
  – AVX2 LS-DYNA executables perform better than SSE2 executables
  – Small variance in performance among different LS-DYNA releases
  – R7.1.3 appeared to perform better on larger datasets
[Chart: 40 MPI processes per node; higher is better; labeled gain: 20%]

LS-DYNA Performance – MPI Libraries
All three MPI implementations show decent performance at scale
  – Platform MPI and HPC-X perform similarly, while Intel MPI shows a drop on the small dataset at scale
[Chart: 40 MPI processes per node; higher is better]

LS-DYNA Performance – System Generations
The current Skylake system configuration outperforms prior system generations
  – The Skylake platform outperformed Broadwell by 21%, Haswell by 51%, Ivy Bridge by 89%, Sandy Bridge by 132%, Westmere by 222%, Nehalem by 425%
  – Skylake performs 41% better than Broadwell for the 3cars model on a single-node basis
  – System components used:
      – Skylake: 2-socket 20-core Xeon Gold 6138 2.0 GHz, 2666 MHz DIMMs, ConnectX-5 EDR InfiniBand
      – Broadwell: 2-socket 14-core Xeon E5-2690 v4 2.6 GHz, 2400 MHz DIMMs, ConnectX-4 EDR InfiniBand
      – Haswell: 2-socket 14-core Xeon E5-2697 v3 2.6 GHz, 2133 MHz DIMMs, ConnectX-4 EDR InfiniBand
      – Ivy Bridge: 2-socket 10-core Xeon E5-2680 v2 2.8 GHz, 1600 MHz DIMMs, Connect-IB FDR InfiniBand
      – Sandy Bridge: 2-socket 8-core Xeon E5-2680 2.7 GHz, 1600 MHz DIMMs, ConnectX-3 FDR InfiniBand
      – Westmere: 2-socket 6-core Xeon X5670 2.93 GHz, 1333 MHz DIMMs, ConnectX-2 QDR InfiniBand
      – Nehalem: 2-socket 4-core Xeon X5570 2.93 GHz, 1333 MHz DIMMs, ConnectX-2 QDR InfiniBand
[Chart: best results shown; higher is better; labeled gain: 41%]

LS-DYNA Summary
LS-DYNA is a multi-purpose explicit and implicit finite element program
  – Relies on compute, memory, and network communication for performance
Effect of MPI on performance
  – Platform MPI and HPC-X perform similarly; Intel MPI shows a drop on the small dataset
Effect of Skylake generation on performance
  – Provides a substantial performance gain due to the larger core count and support for 6 memory channels
  – Faster 2666 MHz DIMMs (compared to 2400 MHz) translate to a 2-3% performance increase
Effect of CPU instructions on performance
  – AVX-512 performs worse than AVX2, despite the improved vectorization
  – AVX-512 instructions run at a lower clock frequency than AVX2 and non-AVX code
Effect of SNC on performance
  – Enabling Sub-NUMA Clustering provides a small advantage (~2-3%) on a single node
Effect of LS-DYNA version on performance
  – Small variance in performance among different LS-DYNA releases; the best appeared to be R7.1.3

Thank You
HPC Advisory Council
All trademarks are property of their respective owners.
All information is provided "As-Is" without any kind of warranty.
The HPC Advisory Council makes no representation to the accuracy and completeness of the information contained herein.
HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.
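
The percentages quoted in the Memory Speed and System Generations slides can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only and is not part of the original deck. The function names are hypothetical, and it assumes that "Skylake outperformed X by p%" means Skylake's performance is (1 + p/100) times that of X on the same benchmark. Pairwise ratios between older generations derived this way are rough estimates, since the slides report best results per platform rather than a controlled comparison.

```python
# Gains quoted in the System Generations slide: Skylake vs. each older platform.
SKYLAKE_GAIN_PCT = {
    "Broadwell": 21,
    "Haswell": 51,
    "Ivy Bridge": 89,
    "Sandy Bridge": 132,
    "Westmere": 222,
    "Nehalem": 425,
}


def dimm_speed_gain(new_mhz: float, old_mhz: float) -> float:
    """Theoretical transfer-rate gain of faster DIMMs, as a fraction."""
    return new_mhz / old_mhz - 1.0


def realized_fraction(observed_gain: float, theoretical_gain: float) -> float:
    """Share of the theoretical gain that shows up in application performance."""
    return observed_gain / theoretical_gain


def relative_performance(gains_pct: dict) -> dict:
    """Performance of each platform normalized to Skylake = 1.0.

    Assumes 'Skylake outperformed X by p%' means perf(Skylake) / perf(X) = 1 + p/100.
    """
    perf = {"Skylake": 1.0}
    for platform, pct in gains_pct.items():
        perf[platform] = 1.0 / (1.0 + pct / 100.0)
    return perf


def speedup(perf: dict, newer: str, older: str) -> float:
    """Estimated performance ratio between two platforms."""
    return perf[newer] / perf[older]


if __name__ == "__main__":
    theoretical = dimm_speed_gain(2666, 2400)
    print(f"2666 vs 2400 MHz DIMMs: {theoretical:.1%} theoretical gain")
    for observed in (0.02, 0.03):
        share = realized_fraction(observed, theoretical)
        print(f"  observed {observed:.0%} gain is {share:.0%} of the theoretical gain")

    perf = relative_performance(SKYLAKE_GAIN_PCT)
    ratio = speedup(perf, "Broadwell", "Haswell")
    print(f"Estimated Broadwell vs Haswell ratio: {ratio:.2f}x")
```

Normalizing every platform to Skylake keeps the quoted figures as the only inputs, so any derived ratio can be traced back to two numbers from the slide.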