Introduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz's Equation on the CMA GRAPES Global and Regional models.
Peng Hong Bo (IBM), Zaphiris Christidis (Lenovo) and Zhiyan Jin (CMA)
IBM Technical Computing, 2014 IBM
ECMWF 16th HPC Workshop, October 2014

Outline
Introduction.
– Helmholtz's Equation.
The CMA GRAPES models and the Generalized Conjugate Residual Method (GCR).
– GCR implementation on GRAPES-GLOBAL and GRAPES-MESO models.
– GRAPES profiles.
Introduction of Biconjugate Gradient Stabilized Method (BiCGSTAB) on GRAPES.
– Properties, implementation and profile information in both GLOBAL and MESO models.
– Performance of BiCGSTAB on GRAPES-GLOBAL and GRAPES-MESO models.
Accuracy verification and statistics.
– Verification challenges of the 10-day forecast of GRAPES-GLOBAL.
– Accuracy behavior on introduced code changes as a function of forecast days.
Area averaged errors and correlation coefficients of optimized vs base results.
– Chaotic behavior in the verification of results for more than 7 forecast days.
Conclusions.

Helmholtz or Pressure Equation.
Helmholtz's equation is commonly used in Numerical Weather Prediction (NWP) models.
$\nabla^2 \phi + k^2 \phi = 0$
– $\nabla^2$ is the Laplacian operator, $\phi$ is a 3D pressure function and $k$ is a positive function.
Using finite differences, the above equation is reduced to a system of linear equations: $A x - b = 0$
– $A$ is an $MN \times NM$ block tridiagonal matrix, for a grid of $M \times N$ horizontal points.
– The approximate solution of the linear equations is $\tilde{x}$; the residual is $r = b - A\tilde{x}$.
– When a preconditioner $L$ is used, the discretized Helmholtz equation is formulated as $L^{-1} A x = L^{-1} b$.
– Large horizontal grids in NWP models call for efficient iterative methods for solutions.
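To make the block tridiagonal structure and the residual concrete, here is a minimal sketch in Python. It is not GRAPES code: it assumes a plain 2D five-point stencil on an M x N grid, a constant k^2, unit grid spacing and zero values outside the grid. The names `helmholtz_matvec`, `residual` and `norm2` are hypothetical.

```python
import math

def helmholtz_matvec(x, m, n, k2, h=1.0):
    """Return A @ x for the 5-point discretization of (Laplacian + k^2).

    x is a flat, row-major list of length m*n. Values outside the grid are
    taken as zero (a Dirichlet boundary, assumed only for this sketch).
    """
    inv_h2 = 1.0 / (h * h)
    y = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            p = i * n + j
            acc = (k2 - 4.0 * inv_h2) * x[p]   # diagonal entry
            if j > 0:
                acc += inv_h2 * x[p - 1]       # same grid row: diagonal block
            if j < n - 1:
                acc += inv_h2 * x[p + 1]
            if i > 0:
                acc += inv_h2 * x[p - n]       # previous grid row: sub-diagonal block
            if i < m - 1:
                acc += inv_h2 * x[p + n]       # next grid row: super-diagonal block
            y[p] = acc
    return y

def residual(b, x, m, n, k2):
    """r = b - A x for a candidate solution x."""
    ax = helmholtz_matvec(x, m, n, k2)
    return [bi - ai for bi, ai in zip(b, ax)]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

# Build a right-hand side from a known solution, then check two residuals.
m, n, k2 = 4, 5, 0.5
x_exact = [math.sin(0.7 * p) for p in range(m * n)]
b = helmholtz_matvec(x_exact, m, n, k2)
r_exact = norm2(residual(b, x_exact, m, n, k2))         # residual of the true solution
r_zero = norm2(residual(b, [0.0] * (m * n), m, n, k2))  # residual of a zero first guess
print(r_exact, r_zero)
```

Each grid row couples only to itself and its two neighbouring rows, which is where the block tridiagonal shape comes from. An iterative solver never forms A explicitly; it only needs this kind of matrix-vector product.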
Helmholtz Equation in GRAPES
GRAPES (Global/Regional Assimilation Prediction System).
– It is a Numerical Weather Prediction system developed by the China Meteorological Administration (CMA).
– It includes a Global and a Regional weather model as well as data assimilation systems for them.
Dynamic core features in GRAPES
– Fully compressible equations.
– Height-based terrain-following coordinates.
– Option for hydrostatic and non-hydrostatic schemes.
– Arakawa "C" staggered lat-lon horizontal grid.
– Charney-Phillips vertical scheme for prognostic variables.
– Polar Filter and Mass Fixing scheme.
– 2-time-level Semi-Implicit Semi-Lagrangian time-stepping.
– GCR solver for Helmholtz Equation.
Generalized Conjugate Residual (GCR) algorithm.
Uses an Incomplete sparse Lower and Upper triangular (ILU) matrix factorization as a pre-conditioner.
B1, B2, …, B19 represent the coefficient matrix of Helmholtz's equation, which is discretized into a large sparse matrix.

GRAPES-GLOBAL Profile, GCR
(gprof call graph entry for solver_grapes: first row is the parent, second row the routine itself, remaining rows its children)
% time    self  descendents  called/total  name
          2.31       811.41       384/384  __module_integrate_NMOD_integrate [4]
  48.6    2.31       811.41       384      solver_grapes
          0.31       236.34       384/384  pbl_driver
          0.00       166.60       384/384  *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
          0.03       151.48       384/384  radiation_driver
          0.00        78.31       384/384  microphysics_driver
          0.00        69.13       384/384  *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
          0.00        52.03       384/384  *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
          0.00        19.46       384/384  cumulus_driver
          0.00        12.03       384/384  *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes
Min communication time: MPI task 649
Max communication time: MPI task 939

GRAPES-MESO Profile, GCR
(same layout: parent, routine, children)
% time    self  descendents  called/total  name
          1.68       504.70     1080/1080  *__module_integrate_NMOD_solver_grapes_stub_in___module_integrate_NMOD_solve_interface [5]
          1.68       504.70     1080       __module_integrate_NMOD_solver_grapes [6]
          0.00       221.07     1080/1080  __module_gcr_NMOD_solve_helmholts [8]
          0.06        67.78     1080/1080  __module_semi_lag_NMOD_semi_lag_interp [9]
          0.51        32.82     1079/1079  __module_semi_lag_NMOD_upstream_interp_phy [18]
         33.27         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_y_xiao [19]
         30.93         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_x_xiao [21]
          0.00        28.63     1080/1080  microphysics_driver [22]
         23.44         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_z_xiao [27]
Min communication time: MPI task 0
Max communication time: MPI task 1080

Convergence of Bi-conjugate Gradient Stabilized algorithm
Convergence of the BiCGSTAB and GCR algorithms for 1 and 25 steps of GRAPES.
– BiCGSTAB(2) converges in fewer iterations than GCR, but is more computationally intensive.
– The introduction of BiCGSTAB improved overall performance in the GRAPES models.
When BiCGSTAB is used as a precursor to the GCR algorithm (as an extra pre-conditioner), the number of iterations GCR needs to converge decreased significantly, GRAPES executed much faster (with the help of VSX primitives in coding), and accuracy was the same as or even better than with the original GCR algorithm.
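The two-stage idea (a loose BiCGSTAB solve whose result becomes the first guess for GCR) can be sketched on a toy problem. This is an illustration under stated assumptions, not the GRAPES routines: it uses plain BiCGSTAB rather than BiCGSTAB(2), no ILU preconditioner, and a small hypothetical nonsymmetric tridiagonal operator `matvec`. The tolerances `1e-6` and `1e-10` are arbitrary choices for the sketch.

```python
import math

def matvec(x):
    """Hypothetical test operator: nonsymmetric, diagonally dominant, tridiagonal."""
    n = len(x)
    y = [4.0 * x[i] for i in range(n)]
    for i in range(n):
        if i > 0:
            y[i] -= 1.5 * x[i - 1]
        if i < n - 1:
            y[i] -= 0.5 * x[i + 1]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def norm(v):
    return math.sqrt(dot(v, v))

def bicgstab(b, x, tol, max_iter=200):
    """Unpreconditioned BiCGSTAB; returns (solution, iterations)."""
    r = axpy(-1.0, matvec(x), b)                # r = b - A x
    if norm(r) <= tol:
        return x, 0
    r_hat = r[:]                                # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * len(b)
    p = [0.0] * len(b)
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)   # p = r + beta*(p - omega*v)
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)
        if norm(s) <= tol:                      # converged after the half step
            return axpy(alpha, p, x), it
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = axpy(alpha, p, axpy(omega, s, x))
        r = axpy(-omega, t, s)
        rho = rho_new
        if norm(r) <= tol:
            return x, it
    return x, max_iter

def gcr(b, x, tol, max_iter=200):
    """Unrestarted GCR; returns (solution, iterations)."""
    r = axpy(-1.0, matvec(x), b)
    ps, aps = [], []
    it = 0
    while norm(r) > tol and it < max_iter:
        p, ap = r[:], matvec(r)
        for pj, apj in zip(ps, aps):            # make A*p orthogonal to earlier A*p_j
            beta = dot(ap, apj) / dot(apj, apj)
            p = axpy(-beta, pj, p)
            ap = axpy(-beta, apj, ap)
        alpha = dot(r, ap) / dot(ap, ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, ap, r)
        ps.append(p)
        aps.append(ap)
        it += 1
    return x, it

n = 30
x_true = [math.sin(0.3 * i) for i in range(n)]
b = matvec(x_true)
x0 = [0.0] * n

x_gcr, it_gcr_only = gcr(b, x0, 1e-10)          # baseline: GCR alone
x_pre, it_bicg = bicgstab(b, x0, 1e-6)          # stage 1: loose BiCGSTAB solve
x_two, it_gcr_after = gcr(b, x_pre, 1e-10)      # stage 2: GCR from that guess
print(it_gcr_only, it_bicg, it_gcr_after)
```

Because stage 2 starts from a residual already near the loose tolerance, GCR has far less work to do than from a zero first guess. Whether the combination is faster overall depends on the relative cost of one iteration of each method, which this toy problem does not model.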
Updated Helmholtz Solver implementation

GRAPES-GLOBAL
#ifdef BCGSL
      ! Stage 1: BiCGSTAB(l) precursor solve, tolerance no tighter than 1.D-10
      ep = max(1.D-10, DBLE(grid%ep))
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, pi, &
                 idep, jdep, ids, ide, jds, jde, kds, kde, &
                 ims, ime, jms, jme, kms, kme, &
                 its, ite, jts, jte, kts, kte)
#else
      ! Stage 1: BiCGSTAB precursor solve, tolerance no tighter than 1.D-8
      ep = max(1.D-8, DBLE(grid%ep))
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, pi, &
                 idep, jdep, ids, ide, jds, jde, kds, &
                 kde, ims, ime, jms, jme, kms, kme, &
                 its, ite, jts, jte, kts, kte)
#endif
      ! Stage 2: GCR continues from the precursor result, to the model tolerance
      ep = grid%ep
      d = 1.0d0
      CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                 iter_max, pi, d, idep, jdep, ids, ide, &
                 jds, jde, kds, kde, ims, ime, jms, jme, &
                 kms, kme, its, ite, jts, jte, kts, kte)

GRAPES-MESO
#ifdef BCGSL
      ! Stage 1: BiCGSTAB(l) precursor solve
      ep = 1.D-8
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, &
                 pi, ids, ide, jds, jde, kds, kde, &
                 ims, ime, jms, jme, kms, kme, &
                 its, ite, jts, jte, kts, kte)
#else
      ! Stage 1: BiCGSTAB precursor solve
      ep = 1D-8
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, &
                 pi, ids, ide, jds, jde, kds, kde, &
                 ims, ime, jms, jme, kms, kme, &
                 its, ite, jts, jte, kts, kte)
#endif
      ! Stage 2: GCR continues from the precursor result
      ep = 1.D-19
      CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                 iter_max, pi, d, ids, ide, jds, jde, &
                 kds, kde, ims, ime, jms, jme, kms, kme, &
                 its, ite, jts, jte, kts, kte)

Convergence of BiCGSTAB in GRAPES-GLOBAL

Un-optimized Code
begin of gcr 0.328934647159688379E-03
RES of gcr 0.951769473740471055E-09 in 54 iterations
Timing for processing for step 1: 105.43999 elapsed seconds.
begin of gcr 0.307738677760282797E-01
RES of gcr 0.985465629245594409E-09 in 64 iterations
Timing for processing for step 2: 3.56000 elapsed seconds.
begin of gcr 0.466354355510276777E-01
RES of gcr 0.987319218430061550E-09 in 55 iterations
Timing for processing for step 3: 3.54000 elapsed seconds.
begin of gcr 0.419494279764634215E-01
RES of gcr 0.952816344175419192E-09 in 45 iterations
Timing for processing for step 4: 3.39000 elapsed seconds.
begin of gcr 0.298146267204818100E-01
RES of gcr 0.955547301333094658E-09 in 49 iterations
Timing for processing for step 5: 3.44000 elapsed seconds.

Optimized Code
begin of bcgsl 0.328934356968701958E-03
RES of bcgsl 0.698006138227474393E-09 in 16 iterations
begin of gcr 0.102067544683602406E-08
RES of gcr 0.969841675518509429E-09 in 1 iterations
Timing for processing for step 1: 108.25000 elapsed seconds.
begin of bcgsl 0.307101071999445543E-01
RES of bcgsl 0.998788656259226276E-09 in 11 iterations
begin of gcr 0.131913191092197407E-08
RES of gcr 0.889851041683508861E-09 in 2 iterations
Timing for processing for step 2: 2.50000 elapsed seconds.
begin of bcgsl 0.370215569337918604E-01
RES of bcgsl 0.728471243819791556E-09 in 12 iterations
begin of gcr 0.104455860894560670E-08
RES of gcr 0.948845550215151657E-09 in 1 iterations
Timing for processing for step 3: 2.50000 elapsed seconds.
begin of bcgsl 0.348878083179526982E-01
RES of bcgsl 0.829610442476401725E-09 in 12 iterations
begin of gcr 0.114433762484590935E-08
RES of gcr 0.635845995011923888E-09 in 2 iterations
Timing for processing for step 4: 2.50000 elapsed seconds.
begin of bcgsl 0.266947703233833440E-01
RES of bcgsl 0.688385709819754403E-09 in 12 iterations
begin of gcr 0.100135435371643626E-08
RES of gcr 0.875385663076386664E-09 in 1 iterations
Timing for processing for step 5: 2.46000 elapsed seconds.
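The per-step numbers in the GRAPES-GLOBAL logs can be condensed into two figures: total solver iterations and the elapsed-time ratio. The short Python sketch below only does arithmetic on values transcribed from those logs. Step 1 is left out of the timing ratio on the assumption that its roughly 105 seconds are dominated by start-up work rather than by the solver.

```python
# Values transcribed from the GRAPES-GLOBAL convergence logs.
gcr_only = {        # step: (GCR iterations, elapsed seconds)
    1: (54, 105.43999),
    2: (64, 3.56),
    3: (55, 3.54),
    4: (45, 3.39),
    5: (49, 3.44),
}
two_stage = {       # step: (BCGSL iterations, GCR iterations, elapsed seconds)
    1: (16, 1, 108.25),
    2: (11, 2, 2.50),
    3: (12, 1, 2.50),
    4: (12, 2, 2.50),
    5: (12, 1, 2.46),
}

steady_steps = [2, 3, 4, 5]   # assumption: step 1 includes start-up cost

base_iters = sum(iters for iters, _ in gcr_only.values())
opt_iters = sum(bcgsl + gcr for bcgsl, gcr, _ in two_stage.values())

base_time = sum(gcr_only[s][1] for s in steady_steps)
opt_time = sum(two_stage[s][2] for s in steady_steps)
speedup = base_time / opt_time

print(f"iterations: {base_iters} (GCR only) vs {opt_iters} (BCGSL + GCR)")
print(f"elapsed, steps 2-5: {base_time:.2f} s vs {opt_time:.2f} s, ratio {speedup:.2f}")
```

The iteration counts are not directly comparable in cost, since one BCGSL iteration does more work than one GCR iteration. The elapsed-time ratio is the more meaningful figure.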
GRAPES-GLOBAL Profile Comparison
(gprof call graph entry for solver_grapes: first row is the parent, second row the routine itself, remaining rows its children)
% time    self  descendents  called/total  name
          2.09       682.41       384/384  __module_integrate_NMOD_integrate [4]
  52.3    2.09       682.41       384      solver_grapes
          0.24       214.12       384/384  pbl_driver
          0.04       157.94       384/384  radiation_driver
          0.00        83.80       384/384  *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
          0.00        67.05       384/384  *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
          0.00        54.97       384/384  microphysics_driver
          0.01        50.33       384/384  *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
          0.00        17.91       384/384  cumulus_driver
          0.00        11.72       384/384  *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes
Min communication time: MPI task 649

Convergence of BiCGSTAB in GRAPES-MESO

Un-optimized Code
0: begin of gcr 0.118096356906410122E-03
0: RES of gcr 0.785681906255938855E-19 in 49 iterations
0: Timing for processing for step 1: 18.15000 elapsed seconds.
0: Timing for processing for step 1: 14.52999 cpu seconds.
0: begin of gcr 0.180227130734546867E-03
0: RES of gcr 0.690132004197575959E-19 in 49 iterations
0: Timing for processing for step 2: 0.90000 elapsed seconds.
0: Timing for processing for step 2: 0.75000 cpu seconds.
0: begin of gcr 0.712260919191608395E-04
0: RES of gcr 0.966563876032326532E-19 in 48 iterations
0: Timing for processing for step 3: 0.68000 elapsed seconds.
0: Timing for processing for step 3: 0.57000 cpu seconds.
0: begin of gcr 0.337160794746152708E-04
0: RES of gcr 0.877018965782972674E-19 in 47 iterations
0: Timing for processing for step 4: 0.67000 elapsed seconds.
0: Timing for processing for step 4: 0.57000 cpu seconds.
0: begin of gcr 0.196107554793862609E-04
0: RES of gcr 0.635560985222081976E-19 in 47 iterations
0: Timing for processing for step 5: 0.71000 elapsed seconds.
0: Timing for processing for step 5: 0.60000 cpu seconds.

Optimized Code
0: begin of bicgstab 0.118096453737757547E-03
0: RES of bicgstab 0.380226254620264712E-08 in 3 iterations
0: begin of gcr 0.394720884628083064E-08
0: RES of gcr 0.746418612263664838E-19 in 16 iterations
0: Timing for processing for step 1: 18.99000 elapsed seconds.
0: Timing for processing for step 1: 18.69000 cpu seconds.
0: begin of bicgstab 0.168370346746922749E-03
0: RES of bicgstab 0.166872655366664435E-08 in 3 iterations
0: begin of gcr 0.181367330318505421E-08
0: RES of gcr 0.465501345880251435E-19 in 16 iterations
0: Timing for processing for step 2: 0.67000 elapsed seconds.
0: Timing for processing for step 2: 0.68000 cpu seconds.
0: begin of bicgstab 0.696717378252718038E-04
0: RES of bicgstab 0.137254158106719979E-08 in 3 iterations
0: begin of gcr 0.151730006467615455E-08
0: RES of gcr 0.322109698287421177E-19 in 16 iterations
0: Timing for processing for step 3: 0.45000 elapsed seconds.
0: Timing for processing for step 3: 0.44000 cpu seconds.
0: begin of bicgstab 0.320771797557436878E-04
0: RES of bicgstab 0.950087839437367948E-09 in 3 iterations
0: begin of gcr 0.109450945243131875E-08
0: RES of gcr 0.881479429351996220E-19 in 15 iterations
0: Timing for processing for step 4: 0.50000 elapsed seconds.
0: Timing for processing for step 4: 0.50000 cpu seconds.
0: begin of bicgstab 0.193261775264966473E-04
0: RES of bicgstab 0.985010942067601368E-08 in 2 iterations
0: begin of gcr 0.996454289745865310E-08
0: RES of gcr 0.365415647279281880E-19 in 17 iterations
0: Timing for processing for step 5: 0.48000 elapsed seconds.
0: Timing for processing for step 5: 0.49000 cpu seconds.
GRAPES-MESO Profile Comparison

Optimization Verification.
Accuracy of the computations.
How does one check the accuracy of computations in optimized codes?
– GRAPES MESO accuracy verification was set for a 48-hour forecast.
– GRAPES GLOBAL accuracy verification was set for a 10-day forecast.
Major changes were introduced into both the GRAPES GLOBAL and MESO codes.
– Helmholtz's equation solution algorithm, Vector MASS in Microphysics routines.
Qualitative and quantitative verification methods.
– Visual inspection of the GRAPES GLOBAL and MESO generated results.
– Apply statistics, and define limits for acceptable results.
Proceed slowly with caution.
Correlation coefficients (ρ) between base (C) and optimized results (I).
Area averaged normalized differences (σ) between base (C) and optimized results (I).
500 mb Geopotential Height (Φ) fields and Surface Precipitation are good candidates.
KMA range for σ … 0.98, all models.
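The two statistics can be sketched as follows. The slides do not give their formulas, so the definitions here are assumptions: ρ is taken as the area-weighted Pearson correlation between the base field C and the optimized field I, and σ as the area-weighted RMS of (I - C) divided by the area-weighted RMS of C, with cos(latitude) weights on a regular lat-lon grid. The synthetic fields and all function names are hypothetical.

```python
import math

def weighted_mean(field, weights):
    """Area-weighted mean of a lat-lon field (list of rows, one weight per row)."""
    total = sum(w * sum(row) for row, w in zip(field, weights))
    return total / sum(w * len(row) for row, w in zip(field, weights))

def correlation(c, i, weights):
    """Area-weighted Pearson correlation between base C and optimized I."""
    mc, mi = weighted_mean(c, weights), weighted_mean(i, weights)
    cov = var_c = var_i = 0.0
    for row_c, row_i, w in zip(c, i, weights):
        for vc, vi in zip(row_c, row_i):
            cov += w * (vc - mc) * (vi - mi)
            var_c += w * (vc - mc) ** 2
            var_i += w * (vi - mi) ** 2
    return cov / math.sqrt(var_c * var_i)

def normalized_difference(c, i, weights):
    """Assumed definition: weighted RMS of (I - C) over weighted RMS of C."""
    num = den = 0.0
    for row_c, row_i, w in zip(c, i, weights):
        for vc, vi in zip(row_c, row_i):
            num += w * (vi - vc) ** 2
            den += w * vc ** 2
    return math.sqrt(num / den)

# Synthetic 500 mb height-like field on a coarse regular lat-lon grid.
lats = list(range(-80, 81, 20))
lons = list(range(0, 360, 10))
weights = [math.cos(math.radians(lat)) for lat in lats]
base = [[5500.0 + 100.0 * math.cos(math.radians(lat)) * math.sin(math.radians(lon))
         for lon in lons] for lat in lats]
# "Optimized" run: the base field plus a small wavenumber-3 perturbation.
optimized = [[v + 1.0 * math.sin(math.radians(3 * lon)) for v, lon in zip(row, lons)]
             for row in base]

rho_same = correlation(base, base, weights)
sigma_same = normalized_difference(base, base, weights)
rho = correlation(base, optimized, weights)
sigma = normalized_difference(base, optimized, weights)
print(rho, sigma)
```

Identical fields give ρ = 1 and σ = 0. An acceptance test then amounts to comparing ρ and σ against agreed limits, such as the ρ > 0.98 threshold quoted in these slides.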
GRAPES-MESO Verification
Base: 42-hour forecast. Optimized: 42-hour forecast.
500 mb Geopotential Height: σ and ρ are within acceptable range.
Surface Precipitation: σ and ρ are within acceptable range.

GRAPES-GLOBAL Verification
Global models for 10-day forecasts are impossible to verify.
– http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2013/06/25/new-weather-service-supercomputer-faces-chaos/
– GFS 7-day forecast differences between POWER6 and Intel systems at NCEP.
– Even a small change in compiler version, node count, system architecture, algorithmic change, or bit losses by using less accurate representations (Vector MASS) can cause a global weather model to diverge from base results beyond 7 forecast days.
– Global weather model verification beyond 7 days for ρ > 0.98 is hopeless.
– GRAPES-GLOBAL verification was examined from 1-10 days of forecast.
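The sensitivity described above can be reproduced with a toy chaotic system. The sketch below is an analogy, not a weather model: it integrates the Lorenz-63 equations (standard parameters, RK4, step 0.01) from two initial states that differ by 1e-12 in one component, roughly the size of a rounding-level change, and records how far apart the two trajectories drift. All names are hypothetical.

```python
import math

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz_rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def separation_history(eps, steps, dt=0.01):
    """Distance between a base run and a run perturbed by eps, after each step."""
    base = (1.0, 1.0, 20.0)
    pert = (1.0 + eps, 1.0, 20.0)
    history = []
    for _ in range(steps):
        base = rk4_step(base, dt)
        pert = rk4_step(pert, dt)
        history.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(base, pert))))
    return history

history = separation_history(eps=1e-12, steps=6000)
early = history[9]             # separation at t = 0.1
late = max(history[5000:])     # largest separation for t between 50 and 60
print(early, late)
```

Early on, the two runs agree to many digits; later they are as different as two unrelated states of the system. In this picture, a compiler change or a less accurate intrinsic plays the role of the 1e-12 perturbation, which is why bitwise or high-correlation agreement cannot be expected at long forecast ranges.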
10-Day GRAPES-GLOBAL verification.
Correlation coefficients and Area Averaged Differences are used to compare runs.
– 192-core unmodified code runs were used as base for comparisons.
– 10-day forecasts of the 500 mb Geopotential Heights for 2048 cores, unmodified.
– 10-day forecasts of the 500 mb Geopotential Heights for 4096 cores, modified.
– Microphysics (WSM6), BiCGSTAB, and a combination of both were tested.
– VSX intrinsic calls were introduced and tested in the BiCGSTAB routine.
– Vector MASS in WSM6 drives the forecast in a slightly different direction.
GRAPES-GLOBAL: 10-DAY Geopotential Heights Forecast.
10-day 500 mb Geopotential Heights Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD).
Unoptimized Run: 2048 Cores, 500 mb Geopotential Heights.
Optimized Run: 4096 Cores, 500 mb Geopotential Heights.

GRAPES-GLOBAL: 10-DAY Surface Precipitation Forecast.
10-day Surface Precipitation Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD).
Unoptimized Run: 2048 Cores, Surface Precipitation.
Optimized Run: 4096 Cores, Surface Precipitation.
Summary and Conclusions.
The GRAPES-GLOBAL and GRAPES-MESO models were optimized for performance.
– Both models used the Generalized Conjugate Residual (GCR) iterative solver.
GCR: very efficient code, moderate convergence rates.
– The Bi-conjugate Gradient Stabilized (BiCGSTAB) iterative solver was introduced.
BiCGSTAB: less efficient code, but fast convergence rates.
– Stand-alone BiCGSTAB solver did not improve performance.
When BiCGSTAB was used ahead of GCR, significant improvements were realized.
Increased accuracy, as seen from convergence residuals.
Fewer total iterations to achieve convergence, better overall performance.
– Vector MASS intrinsic functions were applied in the microphysics routines.
Accuracy verification was a challenge for GRAPES-GLOBAL for up to 10 days.
– GRAPES-MESO verified successfully for 7 days, unlike WSM6.
– VSX primitives (single precision) in BiCGSTAB were not critical for either performance or accuracy.
