1.0ivybridge

ivybridge  Date: 2021-03-28
LS-DYNA Performance Benchmark and Profiling
October 2017

Note

The following research was performed under the HPC Advisory Council activities
– Participating vendors: LSTC, Huawei, Mellanox
– Compute resource - HPC Advisory Council Cluster Center
The following was done to provide best practices
– LS-DYNA performance overview
– Understanding LS-DYNA communication patterns
– Ways to increase LS-DYNA productivity
– MPI libraries comparisons
For more info please refer to
– http://www.lstc.com
– http://www.huawei.com
– http://www.mellanox.com

LS-DYNA

LS-DYNA
– A general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems
– Developed by the Livermore Software Technology Corporation (LSTC)
LS-DYNA is used by
– Automobile
– Aerospace
– Construction
– Military
– Manufacturing
– Bioengineering

Objectives

The presented research was done to provide best practices
– LS-DYNA performance benchmarking
    – MPI Library performance comparison
    – Interconnect performance comparison
    – Compilers comparison
    – Optimization tuning
The presented results will demonstrate
– The scalability of the compute environment/application
– Considerations for higher productivity and efficiency

Test Cluster Configuration

Huawei FusionServer E9000 with FusionServer CH121 V5 16-node (640-core) "Skylake" cluster
– Dual-Socket 20-Core Intel Xeon Gold 6138 @ 2.00GHz CPUs
– Memory: 192GB memory, DDR4 2666 MHz RDIMMs per node
– OS: RHEL 7.2, MLNX_OFED_LINUX-4.1-1.0.2.0 InfiniBand SW stack
Mellanox ConnectX-5 EDR 100Gb/s InfiniBand Adapters
Mellanox Switch-IB SB7800 36-port EDR 100Gb/s InfiniBand Switch
Compilers: Intel Parallel Studio XE 2018
MPI: Intel MPI 2018, Mellanox HPC-X MPI Toolkit v1.9.7, Platform MPI 9.1.4.3
Application: MPP LS-DYNA R9.1.0, build 113698, single precision
MPI Profiler: IPM (from Mellanox HPC-X)
Benchmarks: TopCrunch benchmarks
– Neon Refined Revised (neon_refined_revised), Three Vehicle Collision (3cars), NCAC Minivan Model (Caravan2m-ver10), odb10m (NCAC Taurus model)

Introducing Huawei FusionServer E9000 (CH121) V5

High-Performance 2-Socket Blade Unlocks Supreme Computing Power
Full-series Intel Xeon Scalable Processors, 24 DDR4 DIMMs, AEP memory supported, 1 PCIe slot, 2 SFF / 2 NVMe SSDs / 4 M.2 SSDs high-performance storage, multi-plane network, LOM supported

LS-DYNA Performance – CPU SKUs and Generation

LS-DYNA gains performance from larger core counts and better memory throughput
– The "Gold 6140" demonstrates a 50% performance gain (29% more cores) vs the E5-2680 v4
– The "Gold 6148" demonstrates a 61% performance gain (42% more cores) vs the E5-2680 v4
– Base clocks are the same on the E5-2680 v4 and Gold 6148, while the Gold 6140 runs slightly slower
– Skylake supports 6 memory channels and faster DIMMs, both of which affect memory performance
[Chart: Single Node Performance; higher is better; annotated gains: 61%, 50%]

LS-DYNA Performance – Memory Speed

Memory speed provides some benefit to LS-DYNA performance
– The Skylake platform supports DIMM speeds up to 2666MHz
– 2666MHz DIMMs are theoretically ~11% faster than 2400MHz DIMMs
– LS-DYNA shows only about 2-3% improvement on a single node
– It appears that only part of the speed difference translates into LS-DYNA performance gain
[Chart: 40 MPI Processes/Node; higher is better]

LS-DYNA Performance – Sub-NUMA Clustering

Enabling SNC provides some benefit for LS-DYNA
– Sub-NUMA Clustering (SNC) is similar to cluster-on-die (COD) in the Haswell/Broadwell generation
– CPU cores and memory are split into 2 separate NUMA domains when SNC is enabled
– SNC should generally benefit applications that require good NUMA locality
– SNC demonstrates a performance gain of ~2-3% on a single-node basis
[Chart: 40 MPI Processes/Node; higher is better]

LS-DYNA Performance – CPU Instructions

AVX2 outperforms both AVX-512 and SSE2 executables on the Skylake CPU
– Performance gain of 17% by using AVX2 over AVX-512 executables
– AVX-512 performs worse than AVX2, despite improved vectorization
– AVX-512 instructions run at a reduced clock frequency compared with AVX2 and normal clocks
– The benefit of AVX2 appears to be larger on bigger datasets (such as car2car)
[Chart: 40 MPI Processes/Node; higher is better; annotated gains: 17%, 8%, 4%, 3%]

LS-DYNA Performance – CPU Instruction Sets

Some variance in performance among different LS-DYNA versions/executables
– AVX2 LS-DYNA executables perform better than SSE2 executables
– Small variance in performance among different LS-DYNA releases
– R7.1.3 appeared to perform better on larger datasets
[Chart: 40 MPI Processes/Node; higher is better; annotated gain: 20%]

LS-DYNA Performance – MPI Libraries

All three MPI implementations show decent performance at scale
– Platform MPI and HPC-X perform similarly, while Intel MPI shows a drop on the small dataset at scale
[Chart: 40 MPI Processes/Node; higher is better]

LS-DYNA Performance – System Generations

The current Skylake system configuration outperforms prior system generations
– The Skylake platform outperformed Broadwell by 21%, Haswell by 51%, Ivy Bridge by 89%, Sandy Bridge by 132%, Westmere by 222%, Nehalem by 425%
– Skylake performs 41% better than Broadwell for the 3cars model on a single-node basis
– System components used:
    – Skylake: 2-socket 20-core Xeon Gold 6138 2.0GHz, 2666MHz DIMMs, ConnectX-5 EDR InfiniBand
    – Broadwell: 2-socket 14-core Xeon E5-2690 v4 2.6GHz, 2400MHz DIMMs, ConnectX-4 EDR InfiniBand
    – Haswell: 2-socket 14-core Xeon E5-2697 v3 2.6GHz, 2133MHz DIMMs, ConnectX-4 EDR InfiniBand
    – Ivy Bridge: 2-socket 10-core Xeon E5-2680 v2 2.8GHz, 1600MHz DIMMs, Connect-IB FDR InfiniBand
    – Sandy Bridge: 2-socket 8-core Xeon E5-2680 2.7GHz, 1600MHz DIMMs, ConnectX-3 FDR InfiniBand
    – Westmere: 2-socket 6-core Xeon X5670 2.93GHz, 1333MHz DIMMs, ConnectX-2 QDR InfiniBand
    – Nehalem: 2-socket 4-core Xeon X5570 2.93GHz, 1333MHz DIMMs, ConnectX-2 QDR InfiniBand
[Chart: best results shown; higher is better; annotated gain: 41%]

LS-DYNA Summary

LS-DYNA is a multi-purpose explicit and implicit finite element program
– Utilizes compute, memory and network communications for performance
Effect of MPI on performance
– Platform MPI and HPC-X perform similarly; Intel MPI shows a drop on the small dataset
Effect of Skylake generation on performance
– Provides a substantial performance gain due to the larger core count and support for 6 memory channels
– Faster 2666MHz DIMMs (compared to 2400MHz) translate to a 2-3% performance increase
Effect of CPU instructions on performance
– AVX-512 performs worse than AVX2, despite the improved vectorization
– AVX-512 instructions run at a reduced clock frequency compared with AVX2 and normal clocks
Effect of SNC on performance
– Enabling Sub-NUMA Clustering provides a small advantage (~2-3%) on a single node
Effect of LS-DYNA version on performance
– Small variance in performance among different LS-DYNA releases; the best appeared to be R7.1.3

Thank You
HPC Advisory Council

All trademarks are property of their respective owners.
Allinformationisprovided"As-Is"withoutanykindofwarranty.
TheHPCAdvisoryCouncilmakesnorepresentationtotheaccuracyandcompletenessoftheinformationcontainedherein.
HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.
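
As a rough cross-check of the percentages quoted in the CPU SKU and memory speed slides, the short Python sketch below recomputes them. The helper names are hypothetical. The per-socket core counts (14, 18, 20) are an assumption taken from Intel's public specifications rather than from the slides, and the per-core figure is a derived illustration, not a measured result.

```python
"""Cross-check of percentages quoted in the slides (illustrative sketch)."""


def percent_gain(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


def per_core_gain(perf_gain_pct: float, core_gain_pct: float) -> float:
    """Performance gain left after dividing out the extra cores, in percent.

    Derived quantity for illustration only; it is not reported in the slides.
    """
    perf_ratio = 1.0 + perf_gain_pct / 100.0
    core_ratio = 1.0 + core_gain_pct / 100.0
    return (perf_ratio / core_ratio - 1.0) * 100.0


# Memory speed slide: 2666 MHz vs 2400 MHz DIMMs (values from the slides).
dimm_gain = percent_gain(2666, 2400)

# The slides report a 2-3% LS-DYNA gain, so only a fraction of the
# theoretical DIMM advantage shows up in application performance.
realized_low = 2.0 / dimm_gain
realized_high = 3.0 / dimm_gain

# CPU SKU slide. Cores per socket are an assumption from Intel's public
# specifications: E5-2680 v4 = 14, Gold 6140 = 18, Gold 6148 = 20.
cores_6140 = percent_gain(18, 14)
cores_6148 = percent_gain(20, 14)

if __name__ == "__main__":
    print(f"DIMM transfer-rate advantage: {dimm_gain:.1f}%")
    print(f"Share realized by LS-DYNA: {realized_low:.0%} to {realized_high:.0%}")
    print(f"Gold 6140 extra cores: {cores_6140:.1f}%")
    print(f"Gold 6148 extra cores: {cores_6148:.1f}%")
    # 50% and 61% are the single-node gains quoted in the slides.
    print(f"Gold 6140 per-core gain: {per_core_gain(50, cores_6140):.1f}%")
    print(f"Gold 6148 per-core gain: {per_core_gain(61, cores_6148):.1f}%")
```

Under these assumed core counts, the Gold 6148 ratio comes out closer to 43% than the 42% quoted in the slides, which may simply reflect truncation. The per-core figures suggest that part of the Skylake gain comes from something other than core count, which is consistent with the slides' point about memory channels and DIMM speed.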
