2014 LENOVO. ALL RIGHTS RESERVED.
Outline
Parallel system description.
p775, p460 and dx360 M4: hardware and software; compiler options and libraries used.
WRF tunable parameters for scaling runs.
nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group.
WRF I/O: serial and parallel netCDF libraries for data I/O.
WRF runtime parameters for scaling runs.
SMT, MPI task grids, OpenMP threads, quilting tasks.
WRF performance components for scaling runs.
Computation, communication, I/O, load imbalance.
WRF scaling run results on the p775, p460 and dx360 M4.
Conclusions.
2014 LENOVO. ALL RIGHTS RESERVED. 16TH ECMWF HPC WORKSHOP.
IBM POWER7 p775 system (P7-IH)
[Figure: p775 rack layout: a management and network rack (HMCs, EMS servers, disk enclosures) and compute frames A, B and C (racks 001 to 003).]
Similar to the old ECMWF system. Water cooled.
Rack: 3 supernodes; supernode: 4 drawers; drawer: 32 POWER7 chips; chip: 8 POWER7 cores.
32 cores/node @ 3.86 GHz, 256 GB/node, 8 GB/core, 1024 cores/supernode, 3072 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; dual memory channel.
Diskless nodes, Torrent POWER7 chip, GigE adapters on service nodes, HFI fibre network, 4 storage (GPFS) nodes, 3 disk enclosures.
Software: AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler, MPI profile library, Floating Point Monitor, vector and scalar MASS libraries.
IBM POWER7 p460 PureFlex system (Firebird)
CMA system partition, air cooled, rear door heat exchangers.
Rack: 4 chassis; chassis: 7 nodes; node: 4 POWER7 chips; chip: 8 POWER7 cores.
32 cores/node @ 3.55 GHz, 128 GB/node, 4 GB/core, 224 cores/chassis, 896 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; single memory channel.
2 HDs per node, 2 QDR dual-port InfiniBand adapters, GigE; dual-rail fat-tree InfiniBand network, 10 p740 storage (GPFS) nodes, 8 DCS3700 devices.
Software: AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler, MPI profile library, vector and scalar MASS libraries.
[Figure: p460 system layout: ten compute racks (4 chassis, 28 nodes each), storage rack 1 (10 p740 nodes), storage rack 2 (8 DCS3700 + QDR InfiniBand) and an InfiniBand rack (4 switches/management).]
IBM iDataPlex dx360 M4 system (Sandy Bridge)
NCEP WCOSS system partition, air cooled, rear door heat exchangers.
Rack: 72 nodes; node: 2 Intel sockets (dx360 M4); socket: 8 Intel E5-2670 Sandy Bridge cores.
16 cores/node @ 2.6 GHz, 32 GB/node, 2 GB/core, 1152 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 20 MB shared per 8 cores.
1 HD per node, Mellanox ConnectX FDR, 10 GigE ports; Mellanox IB 4X FDR full-bisection fat-tree network, 10 x3650 M4 storage nodes, 20 DCS3700.
Software: RHEL 6.2, Intel Fortran 13, C 11, GPFS, IBM Parallel Environment, Platform LSF, MPI profile library.
[Figure: dx360 M4 system layout: eight compute racks (72 nodes each), an FDR InfiniBand rack and I/O racks with storage nodes and DCS3700 enclosures.]
Tested Systems Summary
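p775: core frequency 3.86 GHz; memory 8 GB/core (all DIMMs occupied); 6144 cores tested; vector: VSX only for intrinsics; SMT/HT: SMT4 (SMT2 up to 2048 cores); interconnect: HFI; OS: AIX 7.1; storage: GPFS, 4 NSD, 3 disk enclosures; compilers: IBM XL (AIX v14); parallel environment: IBM PE for AIX; queueing system: LoadLeveler; libraries: MPI profile, MASS, Hardware Performance Monitor.
p460: core frequency 3.55 GHz; memory 4 GB/core (all DIMMs occupied); 8192 cores tested; vector: VSX only for intrinsics; SMT/HT: SMT4 (SMT2 up to 512 cores); interconnect: QDR dual-rail InfiniBand fat tree; OS: AIX 7.1; storage: GPFS, 10 NSD, 8 DCS3700; compilers: IBM XL (v14); parallel environment: IBM PE for AIX; queueing system: LoadLeveler; libraries: MPI profile, MASS.
dx360 M4: core frequency 2.6 GHz; memory 2 GB/core (all DIMMs occupied); 6144 cores tested; vector: AVX everywhere; SMT/HT: HT (not used); interconnect: FDR single-rail InfiniBand fat tree; OS: RHEL 6.2; storage: GPFS, 10 NSD, 20 DCS3700; compilers: Intel Studio 13; parallel environment: IBM PE for Linux; queueing system: LSF; libraries: MPI profile.
WRF v3.3 Compiler Options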
Identical WRF code was used on all systems.
p775: FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt; libraries: netcdf, pnetcdf, massp7_simd, massvp7, mpihpm.
p460: FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt; libraries: netcdf, pnetcdf, massp7_simd, massvp7, mpitrace.
dx360 M4: FCOPTIM = -O3 -xAVX -fp-model fast=2 -ip; FCBASEOPTS = -ip -fno-alias -w -ftz -no-prec-div -no-prec-sqrt -align all -openmp; libraries: netcdf, pnetcdf, mpitrace.
WRF tunables for scaling runs
WRF run specifics, as defined in namelist.input: 5 km horizontal resolution, 6 s time step, 12-hour forecast.
2200 x 1200 x 28 grid points.
One output file per forecast hour.
Four boundary reads, one every three forecast hours.
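These run specifics fix how many time steps, output writes and boundary reads a forecast contains; the 13 write steps and 4 boundary reads quoted for WRF I/O follow from them. A minimal sketch, assuming output is also written at hour 0 and boundary data are read at hours 0, 3, 6 and 9 (none is needed at the final hour); `run_event_counts` is a hypothetical helper, not part of WRF.

```python
# Sketch: event counts implied by the run specifics.
# Assumptions: output includes hour 0; boundary reads happen strictly
# before the end of the forecast (hours 0, 3, 6, 9 for a 12-hour run).

def run_event_counts(forecast_hours: int, time_step_s: int,
                     output_interval_h: int, boundary_interval_h: int) -> dict:
    total_seconds = forecast_hours * 3600
    if total_seconds % time_step_s:
        raise ValueError("forecast length must be a multiple of the time step")
    return {
        "time_steps": total_seconds // time_step_s,
        # hours 0, 1, ..., forecast_hours inclusive
        "output_writes": forecast_hours // output_interval_h + 1,
        # hours 0, 3, ... up to (but not including) the final hour
        "boundary_reads": len(range(0, forecast_hours, boundary_interval_h)),
    }


counts = run_event_counts(12, 6, 1, 3)
print(counts)
```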
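The same WRF tunables were used on every system, based on the selections that yielded optimal performance on the p775.
nproc_x: logical MPI task partition in the x-direction.
nproc_y: logical MPI task partition in the y-direction.
numtiles: number of tiles per patch that can be distributed over OpenMP threads.
nproc_x × nproc_y = number of compute MPI tasks (excluding I/O quilting tasks).
Keeping these identical is critical when comparing communication characteristics across systems.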
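To see what a given MPI task grid means for each task's share of the 2200 x 1200 horizontal grid, the patch sizes can be computed directly. This sketch assumes x is the 2200-point direction and that points are divided as evenly as possible, with the remainder handed to the first tasks; that is a simplification of WRF's own patch assignment. The helper names are hypothetical; the task grids are the ones listed in the runtime-parameters table.

```python
# Sketch: per-task patch sizes for an nproc_x x nproc_y decomposition.
# Assumption: even split with remainder on the first tasks (not WRF's exact rule).

def split(n_points: int, n_parts: int) -> list[int]:
    """Divide n_points as evenly as possible into n_parts contiguous chunks."""
    base, extra = divmod(n_points, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


def patch_shapes(nx: int, ny: int, nproc_x: int, nproc_y: int):
    """Return (min width, max width, min height, max height) of the task patches."""
    widths = split(nx, nproc_x)
    heights = split(ny, nproc_y)
    return min(widths), max(widths), min(heights), max(heights)


# Task grids used in the scaling runs (compute tasks only, no quilt servers)
for nproc_x, nproc_y in [(4, 31), (20, 51), (18, 85)]:
    wmin, wmax, hmin, hmax = patch_shapes(2200, 1200, nproc_x, nproc_y)
    print(f"{nproc_x} x {nproc_y} = {nproc_x * nproc_y} tasks: "
          f"patches {wmin}-{wmax} wide, {hmin}-{hmax} tall")
```

For the 4 x 31 grid this gives patches 550 points wide but only 38 to 39 points tall, the kind of thin rectangular decomposition the conclusions refer to.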
SMT was used only on the IBM POWER systems.
SMT was employed on the POWER systems by using two OpenMP threads per MPI task.
SMT2 was used for runs with up to 2048 cores on the p775 (best performance).
SMT2 was used for runs with up to 512 cores on the p460 (best performance).
Hyper-threading was not beneficial on the dx360 M4 system, so it was not used.
WRF I/O
I/O consists of reading and writing WRF variables.
13 I/O write steps, each writing a 7.5 GB file.
1 read for the initial conditions (6.74 GB file).
4 reads for the boundary conditions (1.47 GB each).
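Adding up these file sizes gives the data volume each 12-hour forecast moves. The constant names are illustrative; the numbers are the ones quoted above.

```python
# Sketch: total I/O volume per forecast, from the quoted file sizes.
WRITE_STEPS = 13          # one output file per forecast hour, hours 0..12
WRITE_FILE_GB = 7.5
INITIAL_READ_GB = 6.74    # initial conditions
BOUNDARY_READS = 4
BOUNDARY_FILE_GB = 1.47   # per boundary read

written_gb = WRITE_STEPS * WRITE_FILE_GB
read_gb = INITIAL_READ_GB + BOUNDARY_READS * BOUNDARY_FILE_GB
print(f"written: {written_gb:.1f} GB, read: {read_gb:.2f} GB")
```

Output volume (about 97.5 GB) is roughly eight times the input volume (about 12.6 GB), which is why the write path is handled by dedicated quilting tasks.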
Data ingest cannot be done asynchronously.
Parallel netCDF (MPI-IO) was used for data ingest in all scaling runs: read option 11 for both the initial and boundary data.
netCDF I/O quilting was used to write the data files.
The same I/O tasks and groups were assigned on all three systems for each of the scaling runs.
The last I/O step is done synchronously, because the WRF computation has finished and there is nothing left to overlap it with.
Quilting I/O times in the WRF timers report I/O synchronization time only.
I/O is done by the quilting tasks on the I/O subsystem while the compute tasks compute.
Parallel netCDF quilting was not used for output.
It could further improve the I/O write steps, especially the last one.
Early WRF versions had problems with IBM Parallel Environment and parallel netCDF.
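The I/O and decomposition choices above are set in namelist.input. The sketch below renders the relevant entries for one configuration from the runtime-parameters table (128 cores: 4 x 31 compute tasks, one quilt group of 4 tasks, numtiles = 4). It assumes the standard WRF groups (`&time_control`, `&domains`, `&namelist_quilt`), io_form 11 for parallel netCDF input and boundary data, io_form 2 (serial netCDF) for the quilted history output, and that the 2200 x 1200 x 28 grid maps directly onto e_we, e_sn and e_vert; these are assumptions to check against the WRF v3.3 documentation, and `wrf_namelist` is a hypothetical helper.

```python
# Sketch: render namelist.input entries matching the described setup.
# Assumptions (not taken from the original runs): grid -> e_we/e_sn/e_vert,
# io_form_history = 2 (serial netCDF via quilting), io_form 11 (pnetcdf) for input.

def wrf_namelist(nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group):
    groups = {
        "time_control": {
            "run_hours": 12,
            "history_interval": 60,     # minutes: one output per forecast hour
            "interval_seconds": 10800,  # boundary data every 3 hours
            "io_form_history": 2,
            "io_form_input": 11,
            "io_form_boundary": 11,
        },
        "domains": {
            "time_step": 6,             # seconds
            "e_we": 2200, "e_sn": 1200, "e_vert": 28,
            "dx": 5000, "dy": 5000,     # metres (5 km)
            "nproc_x": nproc_x, "nproc_y": nproc_y, "numtiles": numtiles,
        },
        "namelist_quilt": {
            "nio_tasks_per_group": nio_tasks_per_group,
            "nio_groups": nio_groups,
        },
    }
    lines = []
    for name, entries in groups.items():
        lines.append(f"&{name}")
        lines.extend(f" {key} = {value}," for key, value in entries.items())
        lines.append("/")               # end of namelist group
    return "\n".join(lines)


print(wrf_namelist(4, 31, 4, 1, 4))
```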
WRF uniform variable tunables
Variables unchanged on all systems for the same number of physical cores: nproc_x, nproc_y, nio_groups, nio_tasks_per_group, numtiles.
Performance on the POWER systems was found to be always better when nproc_x [?] 1, even for runs with a single OpenMP thread.
numtiles > 1 acts as a cache-blocking mechanism (like NPROMA).
The p775 and p460 are favored over the dx360 M4 by these choices.
For the testing scenarios, nproc_x [?] 2 did not have an effect on performance.
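The cache-blocking role of numtiles can be illustrated with a toy loop nest: a patch is cut into tiles and the work is done tile by tile, so each pass touches a smaller working set; with OpenMP, different tiles can go to different threads. This sketch processes the tiles serially, cuts them along the y (row) direction as an assumption, and uses a 3-point average as a stand-in for a physics kernel. It shows the loop structure only, not WRF's tiling code or its performance; the function names are hypothetical.

```python
# Sketch of tile-by-tile processing (cache blocking), NOT WRF's tiling code.
# Assumption: tiles are cut along rows (y); tiles run serially here.

def tile_bounds(j_start: int, j_end: int, numtiles: int) -> list[tuple[int, int]]:
    """Inclusive (first, last) row ranges for numtiles tiles covering j_start..j_end."""
    n_rows = j_end - j_start + 1
    base, extra = divmod(n_rows, numtiles)
    bounds, j = [], j_start
    for t in range(numtiles):
        size = base + (1 if t < extra else 0)
        bounds.append((j, j + size - 1))
        j += size
    return bounds


def smooth_by_tiles(field: list[list[float]], numtiles: int) -> list[list[float]]:
    """3-point average along x, computed one tile of rows at a time."""
    out = [row[:] for row in field]
    for first, last in tile_bounds(0, len(field) - 1, numtiles):
        for j in range(first, last + 1):          # only this tile's rows
            row = field[j]
            for i in range(1, len(row) - 1):
                out[j][i] = (row[i - 1] + row[i] + row[i + 1]) / 3.0
    return out


# A 39 x 550 patch, like one task's share in the 4 x 31 decomposition
patch = [[float((i * j) % 7) for i in range(550)] for j in range(39)]
# Tiling changes only the traversal order, never the result.
assert smooth_by_tiles(patch, 4) == smooth_by_tiles(patch, 1)
print(tile_bounds(0, 38, 4))
```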
The choice of nproc_x, nproc_y and numtiles was based on the best performance on the p775; numtiles = 4 was set as an advantage on the dx360 M4.
WRF runtime parameters
Task affinity (binding) was used in all test runs.
SMT/HT: ON (one extra OpenMP thread) or OFF.
Variables: OpenMP threads, numtiles, MPI tasks, nproc_x, nproc_y, nio_groups, nio_tasks_per_group.
Physical (or logical) cores = OpenMP_threads × (nproc_x × nproc_y + nio_groups × nio_tasks_per_group).
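The core-count formula can be checked against the runtime-parameters table, and comparing its result with the physical core count shows which runs used SMT. A short sketch; the helper names are hypothetical and the rows are copied from the table.

```python
# Sketch: check OpenMP_threads * (nproc_x*nproc_y + nio_groups*nio_tasks_per_group)
# against the physical core count to see whether SMT was in use.

def cores_used(openmp_threads, nproc_x, nproc_y, nio_groups, nio_tasks_per_group):
    """OpenMP_threads * (compute tasks + I/O quilting tasks)."""
    return openmp_threads * (nproc_x * nproc_y + nio_groups * nio_tasks_per_group)


def smt_mode(physical_cores, openmp_threads, nproc_x, nproc_y,
             nio_groups, nio_tasks_per_group):
    """'off' for one thread per physical core, 'SMTn' for n threads per core."""
    logical = cores_used(openmp_threads, nproc_x, nproc_y,
                         nio_groups, nio_tasks_per_group)
    ratio, remainder = divmod(logical, physical_cores)
    if remainder:
        raise ValueError("threads do not map evenly onto the physical cores")
    return "off" if ratio == 1 else f"SMT{ratio}"


# (system, physical cores, threads, nproc_x, nproc_y, nio_groups, nio_tasks_per_group)
runs = [
    ("p775", 128, 2, 4, 31, 1, 4),
    ("dx360 M4", 128, 1, 4, 31, 1, 4),
    ("p460", 1024, 2, 11, 46, 1, 6),
    ("p775", 2048, 4, 20, 51, 1, 4),
    ("p775", 6144, 4, 18, 85, 1, 6),
]
for system, cores, *params in runs:
    print(f"{system:8s} {cores:5d} cores: {smt_mode(cores, *params)}")
```

The results match the SMT settings stated earlier: SMT2 on the p775 up to 2048 cores and on the p460 up to 512 cores, no SMT elsewhere.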
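Runtime parameters by number of physical cores (OpenMP threads per MPI task in parentheses):
Cores | p775 nodes (threads) | p460 nodes (threads) | dx360 M4 nodes (threads) | numtiles | MPI tasks | nproc_x x nproc_y | nio_groups x nio_tasks_per_group
128 | 4 (2) | 4 (2) | 8 (1) | 4 | 124 | 4 x 31 | 1 x 4
256 | 8 (2) | 8 (2) | 16 (1) | 4 | 252 | 6 x 42 | 1 x 4
512 | 16 (2) | 16 (2) | 32 (1) | 4 | 504 | 8 x 63 | 1 x 8
1024 | 32 (4) | 32 (2) | 64 (2) | 4 | 506 | 11 x 46 | 1 x 6
1536 | 48 (4) | 48 (2) | 96 (2) | 4 | 760 | 19 x 40 | 1 x 8
2048 | 64 (4) | 64 (2) | 128 (2) | 4 | 1020 | 20 x 51 | 1 x 4
3072 | 96 (4) | 96 (4) | 192 (4) | 4 | 760 | 19 x 40 | 1 x 8
4096 | 128 (4) | 128 (4) | 256 (4) | 4 | 1020 | 17 x 60 | 1 x 4
6144 | 192 (4) | 192 (4) | 384 (4) | 4 | 1530 | 18 x 85 | 1 x 6
WRF Run Statistics
Runs on the p775 were done with the MPI HPM library:
Collect hardware performance monitor data (small overhead).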
Scale the HPM data to the p460 and dx360 M4 systems (by frequency ratios).
Estimate sustained GFLOP rates on all systems.
System peak rate = number of cores × 8 × core frequency.
Collect MPI communication statistics.
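The peak-rate formula and the HPM-based sustained-rate formulas (given with the GFLOP-rate charts) can be written as small functions. A sketch: PM_VSU_1FLOP to PM_VSU_8FLOP are the POWER7 counter names used in those formulas, the function names are hypothetical, and the counter values and run times in the example are made-up placeholders, not measurements from these runs.

```python
# Sketch of the peak and sustained GFLOPS formulas used in this study.

def peak_gflops(cores: int, freq_ghz: float, flops_per_cycle: int = 8) -> float:
    """System peak rate = number of cores x 8 x core frequency (GHz)."""
    return cores * flops_per_cycle * freq_ghz


def p775_sustained_gflops(run_time_s, vsu_1flop, vsu_2flop, vsu_4flop, vsu_8flop):
    """Weight each PM_VSU_nFLOP count by n flops, then convert to GFLOP/s."""
    flops = vsu_1flop + 2 * vsu_2flop + 4 * vsu_4flop + 8 * vsu_8flop
    return 1e-9 * flops / run_time_s


def scaled_sustained_gflops(p775_gflops, p775_run_time_s, other_run_time_s):
    """Same flop count on another system: scale by the run-time ratio."""
    return p775_gflops * (p775_run_time_s / other_run_time_s)


def percent_of_peak(sustained, peak):
    return 100.0 * sustained / peak


for name, freq in [("p775", 3.86), ("p460", 3.55), ("dx360 M4", 2.6)]:
    print(f"{name}: {peak_gflops(6144, freq):,.1f} GFLOPS peak at 6144 cores")

# Placeholder counter values and run times, for illustration only.
p775 = p775_sustained_gflops(1000.0, 4e14, 2e14, 1e14, 0.0)
other = scaled_sustained_gflops(p775, 1000.0, 1250.0)
print(f"p775 {p775:.1f} GFLOPS, other system {other:.1f} GFLOPS, "
      f"{percent_of_peak(p775, peak_gflops(6144, 3.86)):.2f}% of p775 peak")
```

Scaling by run time assumes the same number of floating-point operations on every system, which holds here because identical WRF code and inputs were used.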
Runs on the p460 and dx360 M4 were done with the MPITRACE library, to collect MPI communication statistics.
MPI communication data from the trace libraries can help estimate:
Communication: the minimum communication time among all MPI tasks involved.
Load imbalance: median communication time minus minimum communication time.
Accumulating the internal WRF timers can help estimate:
Read I/O times: initial file read time + boundary read times.
Write I/O times: I/O write quilting time from synchronization + the last I/O write step.
Last I/O write step: approximately total elapsed time minus total time from the internal timers.
Total computation: pure computation + communication + load imbalance.
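The estimates listed above reduce to a few subtractions. A minimal sketch: the per-task communication times and timer totals below are made-up illustrative values and the function names are hypothetical; in practice the inputs would come from the MPI trace output and the accumulated WRF timers.

```python
from statistics import median

# Sketch of the time decomposition described above (all times in seconds).


def split_mpi_time(comm_times_per_task: list[float]) -> dict:
    """Communication = minimum over tasks; load imbalance = median - minimum."""
    t_min = min(comm_times_per_task)
    return {"communication": t_min,
            "load_imbalance": median(comm_times_per_task) - t_min}


def last_write_step(total_elapsed: float, internal_timer_total: float) -> float:
    """Last I/O write step ~ total elapsed time - total time from internal timers."""
    return total_elapsed - internal_timer_total


def pure_computation(total_computation: float, communication: float,
                     load_imbalance: float) -> float:
    """Total computation = pure computation + communication + load imbalance."""
    return total_computation - communication - load_imbalance


# Illustrative numbers only
mpi = split_mpi_time([10.0, 12.0, 11.0, 15.0])
print(mpi)
print(pure_computation(200.0, mpi["communication"], mpi["load_imbalance"]))
print(last_write_step(260.0, 250.0))
```

Using the minimum as "communication" treats the fastest task as waiting only on genuine message traffic; anything the typical (median) task waits beyond that is attributed to load imbalance.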
WRF Scaling Results
[Figure: total elapsed time (seconds) vs. number of p775 cores (128 to 6144), broken down into computation, communication, load imbalance, initialization + termination, read I/O and write I/O.]
[Figure: the same breakdown vs. number of p460 cores.]
[Figure: the same breakdown vs. number of dx360 M4 cores.]
WRF Run Statistics
[Figures: total elapsed time, total communication time, total pure computation time and total load imbalance time (seconds) vs. number of cores (128 to 6144) for the p775, dx360 M4 and p460.]
[Figures: average compute time per time step, average read time per read step and average write time per write step (seconds) vs. number of cores for the p775, dx360 M4 and p460.]
WRF GFLOP Rates
[Figures: percent of peak performance sustained, and sustained GFLOPS, vs. number of cores (128 to 6144) for the p775, dx360 M4 and p460.]
Peak GFLOPS = number of cores × 8 × core frequency (GHz).
p775 sustained GFLOPS = (10^-9 / p775_run_time) × (PM_VSU_1FLOP + 2 × PM_VSU_2FLOP + 4 × PM_VSU_4FLOP + 8 × PM_VSU_8FLOP)
p460 sustained GFLOPS = p775 sustained GFLOPS × (p775_run_time / p460_run_time)
dx360 M4 sustained GFLOPS = p775 sustained GFLOPS × (p775_run_time / dx360M4_run_time)
Conclusions
WRF scales and performs well on all tested systems.
Quilting I/O with netCDF works very well on all systems.
Parallel netCDF for data ingest improves data read times.
WRF is a popular single-precision code.
It runs very well on the dx360 M4 system:
-xAVX works very well.
The Intel compilers do a great job producing optimal, fast binaries.
numtiles acts as a cache-block parameter for additional performance.
Hyper-threading gives no benefit to overall performance.
Near-neighbor communication is handled effectively by FDR InfiniBand.
It runs OK on the p775 and p460 systems:
VSX does not work well.
The code crashes if compiled with -qsimd.
The IBM XL compilers do OK with -O3 -qhot (vector MASS library).
Thin rectangular decompositions work OK (caching and vector MASS).
SMT works well on the p775, due to the available memory-to-core bandwidth.
Near-neighbor communication: overkill for the p775, but OK for the p460.
Performance odds were stacked against the dx360 M4.
Runs with 6144 cores and different nproc_x, nproc_y yield even better performance.