2014 LENOVO. ALL RIGHTS RESERVED.

Outline
Parallel system description: p775, p460 and dx360 M4, hardware and software.
Compiler options and libraries used.
WRF tunable parameters for scaling runs: nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group.
WRF I/O: serial and parallel netcdf libraries for data I/O.
WRF runtime parameters for scaling runs: SMT, MPI task grids, OpenMP threads, quilting tasks.
WRF performance components for scaling runs: computation, communication, I/O, load imbalance.
WRF scaling run results on p775, p460 and dx360 M4.
Conclusions.
16TH ECMWF HPC WORKSHOP

IBM POWER7 p775 system (P7-IH)
[Figure: p775 rack layout with a management and network rack (HMC 1 to 3, EMS-1 and EMS-2 p750 servers, Ethernet switches) and compute frames A, B and C (racks 001, 002 and 003).]
Similar to the old ECMWF system, water cooled.
Rack: 3 supernodes; Supernode: 4 drawers; Drawer: 32 POWER7 chips; Chip: 8 POWER7 cores.
32 cores/node @ 3.86 GHz, 256 GB/node, 8 GB/core, 1024 cores/supernode (4 drawers of 256 cores), 3072 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; dual memory channel.
Diskless node, Torrent POWER7 chip, GigE adaptors on service nodes.
HFI fibre network, 4x storage (GPFS) nodes, 3x disk enclosures.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler.
MPI profile library, Floating Point Monitor, Vector and Scalar MASS libraries.

IBM POWER7 p460 Pureflex system (Firebird)
CMA system partition, air cooled, rear door heat exchangers.
Rack: 4 chassis; Chassis: 7 nodes; Node: 4 POWER7 chips; Chip: 8 POWER7 cores.
32 cores/node @ 3.55 GHz, 128 GB/node, 4 GB/core, 224 cores/chassis, 896 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; single memory channel.
2 HD per node, 2 QDR dual-port InfiniBand adaptors, GigE.
Dual-rail fat-tree InfiniBand network, 10x p740 storage (GPFS) nodes, 8x DCS3700 devices.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler.
MPI profile library, Vector and Scalar MASS libraries.
[Figure: p460 rack layout. Compute Racks 1 to 10: 4 chassis, 28 nodes each. Storage Rack 1: 10x p740 nodes. Storage Rack 2: 8x DCS3700 + QDR. InfiniBand Rack 1: 4 switches/management.]

IBM iDataPlex dx360 M4 system (Sandy Bridge)
NCEP WCOSS system partition, air cooled, rear door heat exchangers.
Rack: 72 nodes; Node: 2 Intel dx360 M4 sockets; Socket: 8 Intel E5-2670 Sandy Bridge cores.
16 cores/node @ 2.6 GHz, 32 GB/node, 2 GB/core, 1152 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 20 MB shared per 8 cores.
1 HD per node, Mellanox ConnectX FDR, 10 GigE ports.
Mellanox IB 4X FDR full-bisection fat-tree network, 10x 3650 M4 storage nodes, 20x DCS3700.
RHEL 6.2, Intel Fortran 13, C 11, GPFS, IBM Parallel Environment, Platform LSF.
MPI profile library.
[Figure: dx360 M4 rack layout. Compute Racks 1 to 8: 72 nodes each. InfiniBand Rack 1: FDR14. I/O Rack 2: 10 nodes, 12 DCS3700.]

Tested Systems Summary
Property              p775                            p460                                 dx360 M4
Core frequency (GHz)  3.86                            3.55                                 2.6
Memory (GB/core)      8 (all DIMMs occupied)          4 (all DIMMs occupied)               2 (all DIMMs occupied)
Cores tested          6144                            8192                                 6144
Vector                VSX only for intrinsics         VSX only for intrinsics              AVX everywhere
SMT/HT settings       SMT4 (SMT2 to 2048 cores)       SMT4 (SMT2 to 512 cores)             HT (not used)
IC fabric             HFI                             QDR dual-rail InfiniBand fat tree    FDR single-rail InfiniBand fat tree
OS                    AIX 7.1                         AIX 7.1                              RHEL 6.2
Storage               GPFS, 4 NSD, 3 disk enclosures  GPFS, 10 NSD, 8 DCS3700              GPFS, 10 NSD, 20 DCS3700
Compilers             IBM XL compilers (AIX v14)      IBM XL compilers (V14)               Intel Studio 13 compilers
Parallel environment  IBM PE for AIX                  IBM PE for AIX                       IBM PE for Linux
Queueing system       LoadLeveler                     LoadLeveler                          LSF
Libraries             MPI profile, MASS,              MPI profile, MASS                    MPI profile
                      Hardware Performance Monitor

WRF v3.3 Compiler Options
Identical WRF code was used.

p775
  FCOPTIM    = -O3 -qhot -qarch=pwr7 -qtune=pwr7
  FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt
  Libraries: netcdf, pnetcdf, massp7_simd, massvp7, mpihpm
p460
  FCOPTIM    = -O3 -qhot -qarch=pwr7 -qtune=pwr7
  FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt
  Libraries: netcdf, pnetcdf, massp7_simd, massvp7, mpitrace
dx360 M4
  FCOPTIM    = -O3 -xAVX -fp-model fast=2 -ip
  FCBASEOPTS = -ip -fno-alias -w -ftz -no-prec-div -no-prec-sqrt -align all -openmp
  Libraries: netcdf, pnetcdf, mpitrace

WRF tunables for scaling runs
WRF run specifics as defined in namelist.input:
  5 km horizontal resolution, 6 sec time step, 12-hour forecast.
  2200 x 1200 x 28 grid points.
  One output file per forecasting hour.
  Four boundary reads, one every three forecast hours.
The same WRF tunables were used for every system, based on selections that yielded optimal performance on the p775 system.
nproc_x: logical MPI task partition in the x-direction.
nproc_y: logical MPI task partition in the y-direction.
numtiles: number of tiles that can be used in OpenMP.
nproc_x x nproc_y = number of MPI tasks. Critical in comparing communication characteristics.
SMT was used only on the IBM POWER systems.
SMT was employed on POWER systems by using two OpenMP threads per MPI task.
SMT2 was used for runs with up to 2048 cores on p775 (best performance).
SMT2 was used for runs with up to 512 cores on p460 (best performance).
Hyper-threading was not beneficial on the dx360 M4 system (not used).
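
As a rough illustration of where these tunables live, the sketch below renders them as namelist text. It assumes the usual WRF layout, with nproc_x, nproc_y and numtiles in the &domains group and nio_groups and nio_tasks_per_group in the &namelist_quilt group. The helper names render_namelist_group and wrf_tunables are hypothetical, and the values passed in are illustrative only.

```python
def render_namelist_group(name, settings):
    """Render one Fortran namelist group (&name ... /) as text."""
    lines = [f"&{name}"]
    for key, value in settings.items():
        lines.append(f" {key:<24}= {value},")
    lines.append("/")
    return "\n".join(lines)


def wrf_tunables(nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group):
    """Return namelist text holding the five tunables named in the deck."""
    # Assumption: group placement follows the usual WRF namelist.input layout.
    domains = {"nproc_x": nproc_x, "nproc_y": nproc_y, "numtiles": numtiles}
    quilt = {
        "nio_tasks_per_group": nio_tasks_per_group,
        "nio_groups": nio_groups,
    }
    return "\n\n".join([
        render_namelist_group("domains", domains),
        render_namelist_group("namelist_quilt", quilt),
    ])


if __name__ == "__main__":
    # Illustrative values only: a 4 x 31 task grid with one group of 4 I/O tasks.
    print(wrf_tunables(nproc_x=4, nproc_y=31, numtiles=4,
                       nio_groups=1, nio_tasks_per_group=4))
```

Generating the fragment from one function makes it harder for two systems to end up with different decompositions for the same core count, which is the property the comparison relies on.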

WRF I/O
I/O reading and writing of WRF variables:
  13 I/O write steps, each writing a 7.5 GB file.
  1 read for the initial conditions (6.74 GB file).
  4 reads for the boundary conditions (1.47 GB each).
Data ingests cannot be done asynchronously.
Parallel netcdf was used for data ingests (MPI-IO) for all scaling runs.
Read option 11 for initial and boundary data.
I/O netcdf quilting was used to write data files.
The same I/O tasks and groups were assigned on all three systems for each of the scaling runs.
The last I/O step is done synchronously, since WRF computations terminate.
Quilting I/O times on WRF timers report I/O synchronization time only.
I/O is done by quilting tasks on the I/O subsystem while compute tasks compute.
I/O parallel netcdf quilting was not used.
It can further improve I/O writing steps, especially the last I/O step.
An early WRF version had problems with IBM Parallel Environment and parallel netcdf.
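
The I/O counts and volumes above follow from the run specifics, and the arithmetic can be sketched as below. The function name io_plan is hypothetical, and the "+ 1" for an hour-0 output file is an assumption made to reproduce the 13 write steps quoted in the deck.

```python
def io_plan(forecast_hours=12, output_interval_h=1, boundary_interval_h=3,
            write_file_gb=7.5, initial_file_gb=6.74, boundary_file_gb=1.47):
    """Count I/O steps and total data volume for one forecast run."""
    # Assumption: hourly output includes the initial (hour 0) state.
    n_writes = forecast_hours // output_interval_h + 1
    # One boundary read per three-hour interval of the forecast.
    n_boundary_reads = forecast_hours // boundary_interval_h
    return {
        "n_writes": n_writes,
        "n_boundary_reads": n_boundary_reads,
        "written_gb": n_writes * write_file_gb,
        "read_gb": initial_file_gb + n_boundary_reads * boundary_file_gb,
    }


if __name__ == "__main__":
    for key, value in io_plan().items():
        print(f"{key}: {value}")
```

With the file sizes quoted in the deck this gives roughly 97.5 GB written and 12.6 GB read per run, so the write path dominates the I/O volume.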

WRF uniform variables tunables
Unchanged variables on all systems for the same number of physical cores: nproc_x, nproc_y, nio_groups, nio_tasks_per_group, numtiles.
Performance on POWER systems was found to be always better when:
  nproc_x < nproc_y.
  numtiles > 1, even for runs with a single OpenMP thread.
numtiles > 1 acts as a cache block mechanism (like NPROMA).
p775 and p460 are favored against dx360 M4.
For the testing scenarios, if nproc_x [...] 2 did not have an effect on performance.
Choice of nproc_x, nproc_y, numtiles:
  Was based on best performance on the p775.
  numtiles = 4 was set as an advantage on dx360 M4.

WRF runtime parameters
Task affinity (binding) was used in all test runs.
SMT/HT: ON, green background (one extra OpenMP thread); OFF, yellow background.
Variables: OpenMP threads, numtiles, MPI tasks, nproc_x, nproc_y, nio_groups, nio_tasks_per_group.
Physical/Logical cores = OpenMP_threads * (nproc_x * nproc_y + nio_groups * nio_tasks_per_group)
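
The core-count relation can be checked with a few lines of code. This is a minimal sketch: the RunConfig class and its method names are hypothetical, and the example values correspond to the 128-core and 6144-core run configurations of the deck. When SMT is on, the relation yields logical cores, so dividing by the number of SMT threads per core recovers physical cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    omp_threads: int
    nproc_x: int
    nproc_y: int
    nio_groups: int
    nio_tasks_per_group: int

    @property
    def mpi_tasks(self):
        """Compute tasks plus quilting (I/O) tasks."""
        return (self.nproc_x * self.nproc_y
                + self.nio_groups * self.nio_tasks_per_group)

    @property
    def cores(self):
        """Physical cores without SMT, logical cores with SMT."""
        return self.omp_threads * self.mpi_tasks

    def physical_cores(self, smt_ways=1):
        """Physical cores when smt_ways hardware threads share a core."""
        return self.cores // smt_ways


if __name__ == "__main__":
    smt_run = RunConfig(omp_threads=2, nproc_x=4, nproc_y=31,
                        nio_groups=1, nio_tasks_per_group=4)
    print(smt_run.mpi_tasks, smt_run.cores, smt_run.physical_cores(smt_ways=2))
```

For the 4 x 31 grid with 1 x 4 quilting tasks, two OpenMP threads per task give 256 logical cores, which is 128 physical cores under SMT2.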
Run configurations (threads = OpenMP threads):

p775    p775      p460    p460      dx360   dx360     numtiles  Cores  MPI    nproc_x x  nio_groups x
nodes   threads   nodes   threads   nodes   threads                    tasks  nproc_y    nio_tasks_per_group
4       2         4       2         8       1         4         128    124    4x31       1x4
8       2         8       2         16      1         4         256    252    6x42       1x4
16      2         16      2         32      1         4         512    504    8x63       1x8
32      4         32      2         64      2         4         1024   506    11x46      1x6
48      4         48      2         96      2         4         1536   760    19x40      1x8
64      4         64      2         128     2         4         2048   1020   20x51      1x4
96      4         96      4         192     4         4         3072   760    19x40      1x8
128     4         128     4         256     4         4         4096   1020   17x60      1x4
192     4         192     4         384     4         4         6144   1530   18x85      1x6

WRF Run statistics
Runs on p775 were done with the MPI HPM library:
  Collect hardware performance monitor data (small overhead).
  Scale HPM data to p460 and dx360 M4 systems by frequency ratios.
  Estimate sustained GFLOP rates on all systems.
  System peak rate = (number of cores x 8 x core frequency).
  Collect MPI communication statistics.
Runs on p460 and dx360 M4 were done with the MPITRACE library:
  Collect MPI communication statistics.
MPI communication from trace libraries can help estimate:
  Communication: (minimum communication among all MPI tasks involved).
  Load imbalance: (median communication - minimum communication).
Accumulation of internal WRF timers can help estimate:
  Read I/O times (initial file read time + boundary read times).
  Write I/O times (I/O write quilting time from synchronization + last I/O write time step).
  Last I/O write step: ~ (total elapsed time - total time from internal timers).
  Total computation (pure computation + communication + load imbalance).
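
The estimates above reduce to a few arithmetic steps over per-task MPI times. The sketch below assumes a plain list of per-task communication times in seconds. The function names and sample timings are hypothetical and are not output from MPI HPM or MPITRACE.

```python
from statistics import median


def split_mpi_time(comm_seconds_per_task):
    """Return (communication, load_imbalance) from per-task MPI times.

    Communication is the minimum over all tasks; load imbalance is the
    median minus that minimum, as defined in the deck.
    """
    if not comm_seconds_per_task:
        raise ValueError("need at least one MPI task")
    communication = min(comm_seconds_per_task)
    load_imbalance = median(comm_seconds_per_task) - communication
    return communication, load_imbalance


def pure_computation(total_computation, communication, load_imbalance):
    """Total computation = pure computation + communication + load imbalance."""
    return total_computation - communication - load_imbalance


def last_write_step(total_elapsed, total_from_internal_timers):
    """Approximate the final synchronous write as the unaccounted time."""
    return total_elapsed - total_from_internal_timers


if __name__ == "__main__":
    # Hypothetical per-task MPI times in seconds.
    comm, imbalance = split_mpi_time([40.0, 42.0, 55.0, 48.0, 61.0])
    print(comm, imbalance, pure_computation(1000.0, comm, imbalance))
```

Taking the minimum as the true communication cost treats any extra time a task spends inside MPI as waiting for slower tasks, which is why the remainder is attributed to load imbalance.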

WRF Scaling Results
[Figure: p775 total elapsed time (seconds, axis 0 to 20000) versus number of p775 cores (128, 256, 512, 1024, 1536, 2048, 3072, 4096, 6144), broken down into Write I/O, Read I/O, Initialization + termination, Load imbalance, Communication and Computation.]

WRF Scaling Results
[Figure: p460 total elapsed time (seconds, axis 0 to 20000) versus number of p460 cores (128 to 6144), with the same breakdown.]

WRF Scaling Results
[Figure: dx360 M4 total elapsed time (seconds, axis 0 to 20000) versus number of dx360 M4 cores (128 to 6144), with the same breakdown.]

WRF Run statistics
[Figure, four panels versus number of cores (128 to 6144) for p775, dx360 M4 and p460: total elapsed time (sec); total communication time (sec); total pure computation time (sec); total load imbalance time (sec).]

WRF Run statistics
[Figure, three panels versus number of cores (128 to 6144) for p775, dx360 M4 and p460: average compute time per time step (seconds); average read time per read time step (seconds); average write time per write time step (seconds).]

WRF GFLOP Rates
[Figure, two panels versus number of cores (128 to 6144) for p775, dx360 M4 and p460: percent (%) sustained of peak performance; GFLOPS sustained.]
Peak GFLOPS = number of cores x 8 x core frequency.
p775 sustained GFLOPS = (10^-9 / p775_run_time) * (PM_VSU_1FLOP + 2*PM_VSU_2FLOP + 4*PM_VSU_4FLOP + 8*PM_VSU_8FLOP)
p460 sustained GFLOPS = p775 sustained GFLOPS * (p775_run_time / p460_run_time)
dx360 M4 sustained GFLOPS = p775 sustained GFLOPS * (p775_run_time / dx360M4_run_time)

Conclusions
WRF scales and performs well on all tested systems.
Quilting I/O with netcdf works very well on all systems.
Parallel netcdf for data ingest improves data read times.
WRF is a popular single-precision code.
It runs very well on the dx360 M4 system:
  -xAVX works very well.
  Intel compilers do a great job producing optimal and fast binaries.
  numtiles ~ cache block parameter for additional performance.
  Hyper-threading gives no benefit towards overall performance.
  Near-neighbor communication is handled effectively by FDR IB.
It runs OK on the p775 and p460 systems:
  VSX does not work well. Code crashes if compiled with -qsimd.
  IBM XL compilers do OK with -O3 -qhot (Vector MASS library).
  Thin rectangular decompositions work OK (caching and vector MASS).
  SMT works well on p775, due to available memory-to-core bandwidth.
  Near-neighbor communication: overkill for p775, but OK for p460.
Performance odds were stacked against dx360 M4.
Runs with 6144 cores and different nproc_x, nproc_y yield even better performance.
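
The rate definitions from the WRF GFLOP Rates slide can be written out as a short sketch. The function names are hypothetical, and the counter totals in the example are made-up numbers rather than measured PM_VSU_* values. Only the core counts and clock frequencies come from the deck.

```python
FLOPS_PER_CYCLE_PER_CORE = 8  # factor used in the deck's peak-rate rule


def peak_gflops(cores, core_ghz):
    """Peak GFLOPS = number of cores x 8 x core frequency (GHz)."""
    return cores * FLOPS_PER_CYCLE_PER_CORE * core_ghz


def sustained_gflops_from_vsu(run_time_s, vsu_1flop, vsu_2flop,
                              vsu_4flop, vsu_8flop):
    """Sustained GFLOPS on p775 from the four VSU counter totals."""
    flops = vsu_1flop + 2 * vsu_2flop + 4 * vsu_4flop + 8 * vsu_8flop
    return 1e-9 * flops / run_time_s


def scaled_sustained_gflops(p775_gflops, p775_run_time_s, other_run_time_s):
    """Estimate another system's rate by the ratio of run times."""
    return p775_gflops * p775_run_time_s / other_run_time_s


def percent_of_peak(sustained, peak):
    return 100.0 * sustained / peak


if __name__ == "__main__":
    print(peak_gflops(6144, 3.86))  # p775 at 6144 cores
    print(peak_gflops(6144, 2.6))   # dx360 M4 at 6144 cores
    # Made-up counter totals for a 100-second run.
    rate = sustained_gflops_from_vsu(100.0, 1e12, 2e12, 0.5e12, 0.0)
    print(rate, scaled_sustained_gflops(rate, 100.0, 200.0))
```

Because the other two systems are scaled from the p775 counters, their sustained rates assume the same floating-point operation count for the identical WRF code, so only the run time differs.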
