1.0ivybridge

ivybridge  Date: 2021-03-28
LS-DYNA Performance Benchmark and Profiling
October 2017

Note
The following research was performed under the HPC Advisory Council activities
  - Participating vendors: LSTC, Huawei, Mellanox
  - Compute resource: HPC Advisory Council Cluster Center
The following was done to provide best practices
  - LS-DYNA performance overview
  - Understanding LS-DYNA communication patterns
  - Ways to increase LS-DYNA productivity
  - MPI libraries comparisons
For more info please refer to
  - http://www.lstc.com
  - http://www.huawei.com
  - http://www.mellanox.com

LS-DYNA
LS-DYNA
  - A general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems
  - Developed by the Livermore Software Technology Corporation (LSTC)
LS-DYNA used by
  - Automobile
  - Aerospace
  - Construction
  - Military
  - Manufacturing
  - Bioengineering

Objectives
The presented research was done to provide best practices
  - LS-DYNA performance benchmarking
      MPI library performance comparison
      Interconnect performance comparison
      Compilers comparison
      Optimization tuning
The presented results will demonstrate
  - The scalability of the compute environment/application
  - Considerations for higher productivity and efficiency

Test Cluster Configuration
Huawei FusionServer E9000 with FusionServer CH121 V5 16-node (640-core) "Skylake" cluster
  - Dual-socket 20-core Intel Xeon Gold 6138 @ 2.00 GHz CPUs
  - Memory: 192 GB, DDR4 2666 MHz RDIMMs per node
  - OS: RHEL 7.2, MLNX_OFED_LINUX-4.1-1.0.2.0 InfiniBand SW stack
Mellanox ConnectX-5 EDR 100 Gb/s InfiniBand adapters
Mellanox Switch-IB SB7800 36-port EDR 100 Gb/s InfiniBand switch
Compilers: Intel Parallel Studio XE 2018
MPI: Intel MPI 2018, Mellanox HPC-X MPI Toolkit v1.9.7, Platform MPI 9.1.4.3
Application: MPP LS-DYNA R9.1.0, build 113698, single precision
MPI profiler: IPM (from Mellanox HPC-X)
Benchmarks: TopCrunch benchmarks
  - Neon Refined Revised (neon_refined_revised), Three Vehicle Collision (3cars), NCAC Minivan Model (Caravan2m-ver10), odb10m (NCAC Taurus model)

Introducing Huawei FusionServer E9000 (CH121) V5
High-performance 2-socket blade unlocks supreme computing power
  - Full-series Intel Xeon Scalable processors, 24 DDR4 DIMMs, AEP memory supported, 1 PCIe slot
  - 2 SFF / 2 NVMe SSDs / 4 M.2 SSDs high-performance storage, multi-plane network, LOM supported

LS-DYNA Performance: CPU SKUs and Generation
LS-DYNA gains performance from larger core counts and better memory throughput
  - The "Gold 6140" demonstrates a 50% performance gain (29% more cores) vs. the E5-2680 v4
  - The "Gold 6148" demonstrates a 61% performance gain (42% more cores) vs. the E5-2680 v4
  - The base clock is the same on the E5-2680 v4 and the Gold 6148, while the Gold 6140 runs slightly slower
  - Skylake supports 6 memory channels and faster DIMMs, both of which improve memory performance
[Chart: single-node performance, higher is better; gain labels 61% and 50%]

LS-DYNA Performance: Memory Speed
Memory speed provides some benefit to LS-DYNA performance
  - The Skylake platform supports DIMM speeds up to 2666 MHz
  - 2666 MHz DIMMs are theoretically ~11% faster than 2400 MHz DIMMs
  - LS-DYNA shows only about a 2-3% improvement on a single node
  - It appears that only part of the speed difference translates into an LS-DYNA performance gain
[Chart: 40 MPI processes per node, higher is better]

LS-DYNA Performance: Sub-NUMA Clustering
Enabling SNC provides some benefit for LS-DYNA
  - Sub-NUMA Clustering (SNC) is similar to cluster-on-die (COD) in the Haswell/Broadwell generation
  - CPU cores and memory are split into 2 separate NUMA domains when SNC is enabled
  - SNC should generally benefit applications that require good NUMA locality
  - SNC demonstrates a performance gain of ~2-3% on a single-node basis
[Chart: 40 MPI processes per node, higher is better]

LS-DYNA Performance: CPU Instructions
AVX2 outperforms both AVX-512 and SSE2 executables on the Skylake CPU
  - Performance gain of 17% by using AVX2 over AVX-512 executables
  - AVX-512 performs worse than AVX2, despite improved vectorization
  - AVX-512 instructions run at a reduced clock frequency compared with AVX2 and normal clocks
  - The benefit of AVX2 appears to be larger on bigger datasets (such as car2car)
[Chart: 40 MPI processes per node, higher is better; gain labels 17%, 8%, 4%, 3%]

LS-DYNA Performance: CPU Instruction Sets
Some variance in performance among different LS-DYNA versions/executables
  - AVX2 LS-DYNA executables perform better than SSE2 executables
  - Small variance in performance among different LS-DYNA releases
  - R7.1.3 appeared to perform better on larger datasets
[Chart: 40 MPI processes per node, higher is better; gain label 20%]

LS-DYNA Performance: MPI Libraries
All three MPI implementations show decent performance at scale
  - Platform MPI and HPC-X perform similarly, while Intel MPI shows a drop on the small dataset at scale
[Chart: 40 MPI processes per node, higher is better]

LS-DYNA Performance: System Generations
The current Skylake system configuration outperforms prior system generations
  - The Skylake platform outperformed Broadwell by 21%, Haswell by 51%, Ivy Bridge by 89%, Sandy Bridge by 132%, Westmere by 222%, Nehalem by 425%
  - Skylake performs 41% better than Broadwell for the 3cars model on a single-node basis
  - System components used:
      Skylake: 2-socket 20-core Xeon Gold 6138 2.0 GHz, 2666 MHz DIMMs, ConnectX-5 EDR InfiniBand
      Broadwell: 2-socket 14-core Xeon E5-2690 v4 2.6 GHz, 2400 MHz DIMMs, ConnectX-4 EDR InfiniBand
      Haswell: 2-socket 14-core Xeon E5-2697 v3 2.6 GHz, 2133 MHz DIMMs, ConnectX-4 EDR InfiniBand
      Ivy Bridge: 2-socket 10-core Xeon E5-2680 v2 2.8 GHz, 1600 MHz DIMMs, Connect-IB FDR InfiniBand
      Sandy Bridge: 2-socket 8-core Xeon E5-2680 2.7 GHz, 1600 MHz DIMMs, ConnectX-3 FDR InfiniBand
      Westmere: 2-socket 6-core Xeon X5670 2.93 GHz, 1333 MHz DIMMs, ConnectX-2 QDR InfiniBand
      Nehalem: 2-socket 4-core Xeon X5570 2.93 GHz, 1333 MHz DIMMs, ConnectX-2 QDR InfiniBand
[Chart: best results shown, higher is better; gain label 41%]

LS-DYNA Summary
LS-DYNA is a multi-purpose explicit and implicit finite element program
  - Utilizes compute, memory, and network communications for performance
Effect of MPI on performance
  - Platform MPI and HPC-X perform similarly; Intel MPI shows a drop on the small dataset
Effect of Skylake generation on performance
  - Provides a substantial performance gain due to the larger core count and support for more memory channels
  - Faster 2666 MHz DIMMs (compared to 2400 MHz) translate to a 2-3% performance increase
Effect of CPU instructions on performance
  - AVX-512 performs worse than AVX2, despite the improved vectorization
  - AVX-512 instructions run at a reduced clock frequency compared with AVX2 and normal clocks
Effect of SNC on performance
  - Enabling Sub-NUMA Clustering provides a small advantage (~2-3%) on a single node
Effect of LS-DYNA version on performance
  - Small variance in performance among different LS-DYNA releases; the best appeared to be R7.1.3

Thank You
HPC Advisory Council
All trademarks are property of their respective owners.
All information is provided "As-Is" without any kind of warranty.
The HPC Advisory Council makes no representation to the accuracy and completeness of the information contained herein.
HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.
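
The percentages quoted in the slides (extra cores, theoretical DIMM speed-up, and Skylake's lead over each earlier generation) can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original deck. The per-socket core counts for the E5-2680 v4, Gold 6140, and Gold 6148 are assumptions taken from Intel's public specifications (the slides give only the percentages), and the helper names `pct_gain`, `per_core_gain`, and `relative_to_nehalem` are hypothetical.

```python
def pct_gain(new: float, old: float) -> float:
    """Return the percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


# ASSUMPTION: cores per socket from Intel's public specs, not from the slides.
CORES = {"E5-2680 v4": 14, "Gold 6140": 18, "Gold 6148": 20}

# Single-node gains over the E5-2680 v4, as reported on the CPU SKU slide.
REPORTED_GAIN_PCT = {"Gold 6140": 50.0, "Gold 6148": 61.0}

# Skylake's lead over each prior platform, as reported on the
# system-generations slide.
SKYLAKE_LEAD_PCT = {
    "Broadwell": 21.0,
    "Haswell": 51.0,
    "Ivy Bridge": 89.0,
    "Sandy Bridge": 132.0,
    "Westmere": 222.0,
    "Nehalem": 425.0,
}


def per_core_gain(sku: str, baseline: str = "E5-2680 v4") -> float:
    """Portion of the reported gain (in %) not explained by core count alone."""
    speedup = 1.0 + REPORTED_GAIN_PCT[sku] / 100.0
    core_ratio = CORES[sku] / CORES[baseline]
    return (speedup / core_ratio - 1.0) * 100.0


def relative_to_nehalem(generation: str) -> float:
    """Performance of `generation` as a multiple of Nehalem's performance."""
    skylake = 1.0 + SKYLAKE_LEAD_PCT["Nehalem"] / 100.0
    if generation == "Skylake":
        return skylake
    return skylake / (1.0 + SKYLAKE_LEAD_PCT[generation] / 100.0)


if __name__ == "__main__":
    print(f"2666 vs 2400 MHz DIMMs: {pct_gain(2666, 2400):.1f}% faster in theory")
    for sku in REPORTED_GAIN_PCT:
        extra = pct_gain(CORES[sku], CORES["E5-2680 v4"])
        print(f"{sku}: {extra:.0f}% more cores, "
              f"{per_core_gain(sku):.1f}% gain per core")
    for gen in ["Skylake", *SKYLAKE_LEAD_PCT]:
        print(f"{gen}: {relative_to_nehalem(gen):.2f}x Nehalem")
```

Under the assumed core counts, the core-count ratios agree with the slide's 29% and roughly 42% figures, and the DIMM ratio agrees with the quoted ~11%. The per-core figure is only a rough indicator, since clock speed, memory channels, and DIMM speed also differ between the platforms.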
