E5 Sandy Bridge

2014 LENOVO. ALL RIGHTS RESERVED.
Outline
Parallel system description.
  p775, p460 and dx360 M4; hardware and software.
Compiler options and libraries used.
WRF tunable parameters for scaling runs.
  nproc_x, nproc_y, numtiles, nio_groups, nio_tasks_per_group
WRF I/O.
  Serial and parallel netcdf libraries for data I/O.
WRF runtime parameters for scaling runs.
  SMT, MPI task grids, OpenMP threads, quilting tasks.
WRF performance components for scaling runs.
  Computation, communication, I/O, load imbalance.
WRF scaling run results on p775, p460 and dx360 M4.
Conclusions.
2014 LENOVO. ALL RIGHTS RESERVED. 16TH ECMWF HPC WORKSHOP.

IBM POWER7 p775 system (P7-IH)
[Figure: rack layout (2012 IBM Corporation). Management and network rack with HMCs and EMS servers; Compute Frame A (Rack 001), Compute Frame B (Rack 002), Compute Frame C (Rack 003).]
Similar to the old ECMWF system; water cooled.
Rack: 3 supernodes; supernode: 4 drawers; drawer: 32 POWER7 chips; chip: 8 POWER7 cores.
32 cores/node @ 3.86 GHz, 256 GB/node, 8 GB/core, 1024 cores/supernode, 3072 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; dual memory channel.
Diskless nodes, Torrent POWER7 chip, GigE adaptors on service nodes.
HFI fibre network, 4x storage (GPFS) nodes, 3x disk enclosures.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler.
MPI profile library, Floating Point Monitor, vector and scalar MASS libraries.

IBM POWER7 p460 PureFlex system (Firebird)
CMA system partition, air cooled, rear door heat exchangers.
Rack: 4 chassis; chassis: 7 nodes; node: 4 POWER7 chips; chip: 8 POWER7 cores.
32 cores/node @ 3.55 GHz, 128 GB/node, 4 GB/core, 224 cores/chassis, 896 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 4 MB/core; single memory channel.
2 HD per node, 2 QDR dual-port InfiniBand adaptors, GigE.
Dual-rail fat-tree InfiniBand network, 10x p740 storage (GPFS) nodes, 8x DSC3700 devices.
AIX 7.1, XLF 14, VAC 12, GPFS, IBM Parallel Environment, LoadLeveler.
MPI profile library, vector and scalar MASS libraries.
[Figure: rack layout. Compute Racks 1 to 10: 4 chassis, 28 nodes each; Storage Rack 1: 10x p740 nodes; Storage Rack 2: 8x DSC3700; QDR InfiniBand Rack 1: 4 switches/management.]

IBM iDataPlex dx360 M4 system (Sandy Bridge)
NCEP WCOSS system partition, air cooled, rear door heat exchangers.
Rack: 72 nodes; node: 2 Intel dx360 M4 sockets; socket: 8 Intel E5-2670 Sandy Bridge cores.
16 cores/node @ 2.6 GHz, 32 GB/node, 2 GB/core, 1152 cores/rack.
L1: 32 KB instruction / 32 KB data; L2: 256 KB/core; L3: 20 MB shared per 8 cores.
1 HD per node, Mellanox ConnectX FDR, 10 GigE ports.
Mellanox IB 4X FDR full-bisection fat-tree network, 10x x3650 M4 storage nodes, 20x DSC3700.
RHEL 6.2, Intel FORTRAN 13, C 11, GPFS, IBM Parallel Environment, Platform LSF.
MPI profile library.
[Figure: rack layout. Compute Racks 1 to 8: 72 nodes each; InfiniBand Rack 1: FDR14; I/O Rack 2: 10 nodes; I/O Rack 2: 12 DSC3700.]

Tested Systems Summary
Attribute | p775 | p460 | dx360 M4
Core frequency (GHz) | 3.86 | 3.55 | 2.6
Memory (GB/core) | 8 (all DIMMs occupied) | 4 (all DIMMs occupied) | 2 (all DIMMs occupied)
Cores tested | 6144 | 8192 | 6144
Vector | VSX only for intrinsics | VSX only for intrinsics | AVX everywhere
SMT/HT settings | SMT4 (SMT2 to 2048 cores) | SMT4 (SMT2 to 512 cores) | HT (not used)
IC fabric | HFI | QDR dual-rail InfiniBand fat tree | FDR single-rail InfiniBand fat tree
OS | AIX 7.1 | AIX 7.1 | RHEL 6.2
Storage | GPFS, 4 NSD, 3 disk enclosures | GPFS, 10 NSD, 8 DSC3700 | GPFS, 10 NSD, 20 DSC3700
Compilers | IBM XL compilers (AIX v14) | IBM XL compilers (v14) | Intel Studio 13 compilers
Parallel environment | IBM PE for AIX | IBM PE for AIX | IBM PE for Linux
Queueing system | LoadLeveler | LoadLeveler | LSF
Libraries | MPI Profile, MASS, Hardware Performance Monitor | MPI Profile, MASS | MPI Profile

WRF v3.3 Compiler Options
Identical WRF code was used.
System | Compiler options | Libraries
p775 | FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt | netcdf, pnetcdf, massp7_simd, massvp7, mpihpm
p460 | FCOPTIM = -O3 -qhot -qarch=pwr7 -qtune=pwr7; FCBASEOPTS = -qsmp=omp -qcache=auto -qfloat=rsqrt | netcdf, pnetcdf, massp7_simd, massvp7, mpitrace
dx360 M4 | FCOPTIM = -O3 -xAVX -fp-model fast=2 -ip; FCBASEOPTS = -ip -fno-alias -w -ftz -no-prec-div -no-prec-sqrt -align all -openmp | netcdf, pnetcdf, mpitrace

WRF tunables for scaling runs
WRF run specifics as defined in namelist.input: 5 km horizontal resolution, 6 sec time step, 12-hour forecast.
2200 x 1200 x 28 grid points.
One output file per forecast hour.
Four boundary reads, one every three forecast hours.
The same WRF tunables were used for every system, based on selections that yielded optimal performance on the p775 system.
  nproc_x: logical MPI task partition in the x-direction.
  nproc_y: logical MPI task partition in the y-direction.
  numtiles: number of tiles that can be used in OpenMP.
  nproc_x x nproc_y = number of MPI tasks.
  Critical in comparing communication characteristics.
SMT was used only on the IBM POWER systems.
  SMT was employed on POWER systems by using two OpenMP threads per MPI task.
  SMT2 was used for runs with up to 2048 cores on p775 (best performance).
  SMT2 was used for runs with up to 512 cores on p460 (best performance).
  Hyper-threading was not beneficial on the dx360 M4 system (not used).
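
As an illustration of how nproc_x and nproc_y partition the 2200 x 1200 horizontal grid, the sketch below splits each direction as evenly as possible and reports the resulting patch sizes. The even-split rule and the helper names (split_evenly, patch_summary) are assumptions made for illustration, not WRF's internal decomposition code; the 4 x 31 and 18 x 85 task grids are the ones this deck lists for its 128-core and 6144-core runs.

```python
def split_evenly(n_points, n_parts):
    """Split n_points into n_parts chunks whose sizes differ by at most one."""
    base, extra = divmod(n_points, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


def patch_summary(nx, ny, nproc_x, nproc_y):
    """Return the MPI task count and the (min, max) patch extent per direction."""
    xs = split_evenly(nx, nproc_x)
    ys = split_evenly(ny, nproc_y)
    return {
        "mpi_tasks": nproc_x * nproc_y,
        "patch_x": (min(xs), max(xs)),
        "patch_y": (min(ys), max(ys)),
    }


if __name__ == "__main__":
    # Horizontal grid from the deck: 2200 x 1200 points.
    for nproc_x, nproc_y in [(4, 31), (18, 85)]:
        print(nproc_x, nproc_y, patch_summary(2200, 1200, nproc_x, nproc_y))
```

Because nproc_x is smaller than nproc_y in both cases, each patch is long in x and short in y, which is the thin rectangular shape the deck's conclusions mention.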

WRF I/O
I/O reading and writing of WRF variables.
  13 I/O write steps, each writing a 7.5 GB file.
  1 read for the initial conditions (6.74 GB file).
  4 reads for the boundary conditions (1.47 GB each).
Data ingests cannot be done asynchronously.
  Parallel netcdf was used for data ingests (MPI-IO) for all scaling runs.
  Read option 11 for initial and boundary data.
I/O netcdf quilting was used to write data files.
  The same I/O tasks and groups were assigned on all three systems for each of the scaling runs.
  The last I/O step is done synchronously, since WRF computations terminate.
  Quilting I/O times on WRF timers report I/O synchronization time only.
  I/O is done by quilting tasks on the I/O subsystem while compute tasks compute.
I/O parallel netcdf quilting was not used.
  Can further improve I/O writing steps, especially the last I/O step.
  Early WRF version had problems with IBM Parallel Environment and parallel netcdf.
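
To put the I/O figures above in perspective, the short sketch below adds up the data volume moved in one 12-hour forecast. It uses only the file counts and sizes quoted on this slide; the function and constant names are illustrative.

```python
WRITE_STEPS = 13          # I/O write steps per run
WRITE_FILE_GB = 7.5       # size of each output file
INIT_READ_GB = 6.74       # initial-conditions file
BOUNDARY_READS = 4        # boundary-condition reads per run
BOUNDARY_FILE_GB = 1.47   # size of each boundary read


def io_volumes_gb():
    """Return the total GB written, read, and moved in one forecast run."""
    written = WRITE_STEPS * WRITE_FILE_GB
    read = INIT_READ_GB + BOUNDARY_READS * BOUNDARY_FILE_GB
    return {"written_gb": written, "read_gb": read, "total_gb": written + read}


if __name__ == "__main__":
    for name, value in io_volumes_gb().items():
        print(f"{name}: {value:.2f}")
```

Writes dominate the volume by a wide margin, which is consistent with the slide's focus on quilting for the write path.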

WRF uniform variables tunables
Unchanged variables on all systems for the same number of physical cores:
  nproc_x, nproc_y, nio_groups, nio_tasks_per_group, numtiles
Performance on POWER systems was found to be always better when:
  nproc_x < nproc_y
  numtiles > 1, even for runs with a single OpenMP thread.
  numtiles > 1 acts as a cache block mechanism (like NPROMA).
p775 and p460 are favored against dx360 M4.
  For the testing scenarios, nproc_x [...] 2 did not have an effect on performance.
Choice of nproc_x, nproc_y, numtiles:
  Was based on best performance on the p775.
  numtiles = 4 was set as an advantage on dx360 M4.

WRF runtime parameters
Task affinity (binding) was used in all test runs.
SMT/HT ON: green background (one extra OpenMP thread); OFF: yellow background.
Variables: OpenMP threads, numtiles, MPI tasks, nproc_x, nproc_y, nio_groups, nio_tasks_per_group
Physical/logical cores = OpenMP_threads * (nproc_x * nproc_y + nio_groups * nio_tasks_per_group)
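
The core-count formula above can be checked with a few lines of code. This is a minimal sketch; the function name cores_required is illustrative, and the sample arguments are taken from the 128-core and 6144-core rows of the deck's run-configuration table.

```python
def cores_required(omp_threads, nproc_x, nproc_y, nio_groups, nio_tasks_per_group):
    """Cores occupied by a run: threads times (compute tasks + quilting I/O tasks)."""
    compute_tasks = nproc_x * nproc_y
    io_tasks = nio_groups * nio_tasks_per_group
    return omp_threads * (compute_tasks + io_tasks)


if __name__ == "__main__":
    # 128-core run without SMT/HT: 1 thread, 4 x 31 compute tasks, 1 x 4 I/O tasks.
    print(cores_required(1, 4, 31, 1, 4))
    # Same task layout with SMT on: 2 threads per task, counted as logical cores.
    print(cores_required(2, 4, 31, 1, 4))
    # 6144-core run: 4 threads, 18 x 85 compute tasks, 1 x 6 I/O tasks.
    print(cores_required(4, 18, 85, 1, 6))
```

With SMT on, the extra OpenMP thread per task means the result counts logical rather than physical cores.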
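p775 nodes | p775 OpenMP threads | p460 nodes | p460 OpenMP threads | dx360 nodes | dx360 OpenMP threads | numtiles | Number of cores | MPI tasks | nproc_x x nproc_y | nio_groups x nio_tasks_per_group
4 | 2 | 4 | 2 | 8 | 1 | 4 | 128 | 124 | 4x31 | 1x4
8 | 2 | 8 | 2 | 16 | 1 | 4 | 256 | 252 | 6x42 | 1x4
16 | 2 | 16 | 2 | 32 | 1 | 4 | 512 | 504 | 8x63 | 1x8
32 | 4 | 32 | 2 | 64 | 2 | 4 | 1024 | 506 | 11x46 | 1x6
48 | 4 | 48 | 2 | 96 | 2 | 4 | 1536 | 760 | 19x40 | 1x8
64 | 4 | 64 | 2 | 128 | 2 | 4 | 2048 | 1020 | 20x51 | 1x4
96 | 4 | 96 | 4 | 192 | 4 | 4 | 3072 | 760 | 19x40 | 1x8
128 | 4 | 128 | 4 | 256 | 4 | 4 | 4096 | 1020 | 17x60 | 1x4
192 | 4 | 192 | 4 | 384 | 4 | 4 | 6144 | 1530 | 18x85 | 1x6

WRF Run statistics
Runs on p775 were done with the MPI HPM library:
  Collect hardware performance monitor data (small overhead).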
  Scale HPM data to p460 and dx360 M4 systems (by frequency ratios).
  Estimate sustained GFLOP rates on all systems.
  System peak rate = (number of cores x 8 x core frequency).
  Collect MPI communication statistics.
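
The rate estimates listed above can be sketched as follows. The peak formula is the one on this slide; the counter-based sustained rate and the run-time scaling follow the formulas on the deck's GFLOP Rates slide. Function names are illustrative, and the counter values in the example are made up purely to exercise the code, not measured data.

```python
FLOPS_PER_CYCLE = 8  # factor used in the deck's peak-rate formula


def peak_gflops(cores, core_ghz):
    """Peak rate = number of cores x 8 x core frequency (GHz gives GFLOPS)."""
    return cores * FLOPS_PER_CYCLE * core_ghz


def sustained_gflops(run_time_s, flop_counters):
    """Sustained rate from HPM counters.

    flop_counters maps flops per instruction (1, 2, 4, 8) to the count of
    the matching PM_VSU_nFLOP counter.
    """
    total_flops = sum(width * count for width, count in flop_counters.items())
    return 1e-9 * total_flops / run_time_s


def scale_by_run_time(reference_gflops, reference_run_time_s, other_run_time_s):
    """Estimate another system's sustained rate from the reference run."""
    return reference_gflops * reference_run_time_s / other_run_time_s


if __name__ == "__main__":
    print(peak_gflops(6144, 3.86))  # p775 peak at 6144 cores
    # Hypothetical counter values, for illustration only.
    counters = {1: 1.0e12, 2: 5.0e11, 4: 0.0, 8: 0.0}
    ref = sustained_gflops(1000.0, counters)
    print(ref, scale_by_run_time(ref, 1000.0, 500.0))
```

Scaling by run time assumes the same code executes the same number of floating-point operations on every system, which is why only the p775 runs need hardware counters.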
Runs on p460 and dx360 M4 were done with the MPITRACE library:
  Collect MPI communication statistics.
MPI communication from trace libraries can help estimate:
  Communication: (minimum communication among all MPI tasks involved).
  Load imbalance: (median communication - minimum communication).
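
The two estimates above reduce to a minimum and a median over the per-task MPI times reported by the trace library. A minimal sketch, assuming the per-task times are already available as a list of seconds (the sample values are invented):

```python
from statistics import median


def comm_and_imbalance(per_task_mpi_seconds):
    """Split per-task MPI time into communication and load-imbalance estimates."""
    t_min = min(per_task_mpi_seconds)
    t_med = median(per_task_mpi_seconds)
    return {"communication": t_min, "load_imbalance": t_med - t_min}


if __name__ == "__main__":
    # Hypothetical MPI times (seconds) for five tasks.
    print(comm_and_imbalance([10.0, 12.0, 15.0, 11.0, 30.0]))
```

The reasoning is that the task with the least MPI time spent the least time waiting for others, so its MPI time approximates the cost of communication itself; whatever the typical task spends beyond that is attributed to waiting.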
Accumulation of internal WRF timers can help estimate:
  Read I/O times (initial file read time + boundary read times).
  Write I/O times (I/O write quilting time from synchronization + last I/O write time step).
  Last I/O write step: ~ (total elapsed time - total time from internal timers).
  Total computation (pure computation + communication + load imbalance).
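
Two of the relations above can be rearranged to recover quantities that are not timed directly. The sketch below does only that; the function name, argument names, and sample numbers are illustrative, and it assumes the sum of the internal timers is supplied as a single value.

```python
def time_breakdown(total_elapsed_s, timers_total_s, total_computation_s,
                   communication_s, load_imbalance_s):
    """Derive pure computation and the last write step from the measured totals."""
    return {
        # Total computation = pure computation + communication + load imbalance.
        "pure_computation": total_computation_s - communication_s - load_imbalance_s,
        # Last I/O write step ~ total elapsed time - total time from internal timers.
        "last_write_step": total_elapsed_s - timers_total_s,
    }


if __name__ == "__main__":
    # Hypothetical timings in seconds.
    print(time_breakdown(1000.0, 950.0, 800.0, 50.0, 30.0))
```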

WRF Scaling Results
[Figure: p775 total elapsed time (seconds, axis 0 to 20000) versus number of p775 cores (128, 256, 512, 1024, 1536, 2048, 3072, 4096, 6144), broken down into write I/O, read I/O, initialization + termination, load imbalance, communication, and computation.]

WRF Scaling Results
[Figure: p460 total elapsed time (seconds, axis 0 to 20000) versus number of p460 cores (128 to 6144), with the same component breakdown.]

WRF Scaling Results
[Figure: dx360 M4 total elapsed time (seconds, axis 0 to 20000) versus number of dx360 M4 cores (128 to 6144), with the same component breakdown.]

WRF Run statistics
[Figure: total elapsed time (sec, axis 0 to 25000) versus number of cores (128 to 6144) for p775, dx360 M4, and p460.]
[Figure: total communication time (sec, axis 0 to 1000) versus number of cores for p775, dx360 M4, and p460.]
[Figure: total pure computation time (sec, axis 0 to 20000) versus number of cores for p775, dx360 M4, and p460.]
[Figure: total load imbalance time (sec, axis 0 to 2500) versus number of cores for p775, dx360 M4, and p460.]

WRF Run statistics
[Figure: average compute time per time step (seconds, axis 0 to 3) versus number of cores for p775, dx360 M4, and p460.]
[Figure: average read time per read time step (seconds, axis 0 to 25) versus number of cores for p775, dx360 M4, and p460.]
[Figure: average write time per write time step (seconds, axis 0 to 25) versus number of cores for p775, dx360 M4, and p460.]

WRF GFLOP Rates
[Figure: percent (%) sustained of peak performance (axis 0 to 16) versus number of cores (128 to 6144) for p775, dx360 M4, and p460.]
[Figure: sustained GFLOPS (axis 0 to 9000) versus number of cores for p775, dx360 M4, and p460.]
Peak GFLOPS = number of cores x 8 x core frequency.
p775 sustained GFLOPS = (10^-9 / p775_run_time) * (PM_VSU_1FLOP + 2*PM_VSU_2FLOP + 4*PM_VSU_4FLOP + 8*PM_VSU_8FLOP)
p460 sustained GFLOPS = p775 sustained GFLOPS * (p775_run_time / p460_run_time)
dx360 M4 sustained GFLOPS = p775 sustained GFLOPS * (p775_run_time / dx360M4_run_time)

Conclusions
WRF scales and performs well on all tested systems.
  Quilting I/O with netcdf works very well on all systems.
  Parallel netcdf for data ingest improves data read times.
WRF is a popular single precision code.
It runs very well on the dx360 M4 system.
  -xAVX works very well.
  Intel compilers do a great job producing optimal and fast binaries.
  numtiles ~ cache block parameter for additional performance.
  Hyperthreading gives no benefit towards overall performance.
  Near neighbor communication is handled effectively by FDR IB.
It runs OK on p775 and p460 systems.
  VSX does not work well.
  Code crashes if compiled with -qsimd.
  IBM XL compilers do OK with -O3 -qhot (vector MASS library).
  Thin rectangular decompositions work OK (caching and vector MASS).
  SMT works well on p775, due to available memory-to-core BW.
  Near neighbor communication: an overkill for p775, but OK for p460.
Performance odds were stacked against dx360 M4.
  Runs with 6144 cores and different nproc_x, nproc_y yield even better performance.