Introduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz's Equation on the CMA GRAPES Global and Regional models.
Peng Hong Bo (IBM), Zaphiris Christidis (Lenovo) and Zhiyan Jin (CMA)
IBM Technical Computing
ECMWF 16th HPC Workshop, October 2014

Outline
Introduction.
– Helmholtz's Equation.
The CMA GRAPES models and the Generalized Conjugate Residual Method (GCR).
– GCR implementation on GRAPES-GLOBAL and GRAPES-MESO models.
– GRAPES profiles.
Introduction of Biconjugate Gradient Stabilized Method (BiCGSTAB) on GRAPES.
– Properties, implementation and profile information in both GLOBAL and MESO models.
– Performance of BiCGSTAB on GRAPES-GLOBAL and GRAPES-MESO models.
Accuracy verification and statistics.
– Verification challenges of the 10-day forecast of GRAPES-GLOBAL.
– Accuracy behavior on introduced code changes as a function of forecast days.
   Area averaged errors and correlation coefficients of optimized vs base results.
– Chaotic behavior in the verification of results for more than 7 forecast days.
Conclusions

Helmholtz or Pressure Equation.
Helmholtz's equation is commonly used in Numerical Weather Prediction (NWP) models.
$\nabla^2 \phi + k^2 \phi = 0$
– $\nabla^2$ is the Laplacian operator, $\phi$ is a 3D pressure function and $k$ is a positive function.
Using finite differences, the above equation is reduced to a system of linear equations: $A x - b = 0$
– $A$ is an $MN \times NM$ block tridiagonal matrix, for a grid of $M \times N$ horizontal points.
– The approximate solution of the linear equations is $\tilde{x}$; the residual is $r = b - A\tilde{x}$.
– When a preconditioner $L$ is used, the discretized Helmholtz equation is formulated as $L^{-1} A x = L^{-1} b$.
– Large horizontal grids in NWP models call for efficient iterative methods for solutions.
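
The reduction to a block tridiagonal linear system can be illustrated with a small sketch. This is not GRAPES code: it assumes a 2D grid of M x N points with unit spacing, a 5-point Laplacian stencil, a constant coefficient `k2`, and zero values outside the grid (GRAPES itself uses a 3D stencil). All function names are hypothetical. With row-major ordering of the grid points, the matrix implied by the stencil is block tridiagonal, and the residual of a trial solution is the right-hand side minus the operator applied to that trial solution.

```python
def apply_operator(x, M, N, k2):
    """Return A @ x for an assumed 5-point discretization of (Laplacian + k2)."""
    y = [0.0] * (M * N)
    for i in range(M):
        for j in range(N):
            p = i * N + j                # row-major index of grid point (i, j)
            acc = (k2 - 4.0) * x[p]      # diagonal entry
            if i > 0:
                acc += x[p - N]          # neighbour in the previous grid row
            if i < M - 1:
                acc += x[p + N]          # neighbour in the next grid row
            if j > 0:
                acc += x[p - 1]          # left neighbour, same grid row
            if j < N - 1:
                acc += x[p + 1]          # right neighbour, same grid row
            y[p] = acc
    return y


def residual(b, x, M, N, k2):
    """Residual r = b - A x of a trial solution x."""
    return [bi - ai for bi, ai in zip(b, apply_operator(x, M, N, k2))]


def dense_matrix(M, N, k2):
    """Assemble A explicitly by applying the operator to unit vectors."""
    n = M * N
    cols = []
    for q in range(n):
        e = [0.0] * n
        e[q] = 1.0
        cols.append(apply_operator(e, M, N, k2))   # column q of A
    return [[cols[q][p] for q in range(n)] for p in range(n)]


def is_block_tridiagonal(A, N):
    """True if every nonzero A[p][q] sits in a block with |p//N - q//N| <= 1."""
    return all(abs(p // N - q // N) <= 1
               for p, row in enumerate(A)
               for q, a in enumerate(row) if a != 0.0)


M, N, k2 = 3, 4, 0.5
A = dense_matrix(M, N, k2)
print(len(A), is_block_tridiagonal(A, N))
```

The operator is applied matrix-free here because production solvers on large horizontal grids rarely store the full matrix; the dense assembly is only used to inspect the block structure on a tiny grid.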

Helmholtz Equation in GRAPES
GRAPES (Global/Regional Assimilation Prediction System).
– It is a Numerical Weather Prediction system developed by the China Meteorological Administration (CMA).
– It includes a Global and a Regional weather model as well as data assimilation systems for them.
Dynamic core features in GRAPES
– Fully compressible equations.
– Height-based terrain-following coordinates.
– Option for hydrostatic and non-hydrostatic schemes.
– Arakawa "C" staggered lat-lon horizontal grid.
– Charney-Phillips vertical scheme for prognostic variables.
– Polar Filter and Mass Fixing scheme.
– 2-time-level Semi-Implicit Semi-Lagrangian time-stepping.
– GCR solver for Helmholtz Equation.
Generalized Conjugate Residual (GCR) algorithm.
– Uses an Incomplete sparse Lower and Upper triangular (ILU) matrix factorization as a pre-conditioner.
– B1, B2, …, B19 represent the coefficient matrix of Helmholtz's equation, which is discretized into a large sparse matrix.

GRAPES-GLOBAL Profile, GCR
%time    self  descendents  called/total  name [index]
         2.31       811.41       384/384  __module_integrate_NMOD_integrate [4]
 48.6    2.31       811.41       384      solver_grapes
         0.31       236.34       384/384  pbl_driver
         0.00       166.60       384/384  *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
         0.03       151.48       384/384  radiation_driver
         0.00        78.31       384/384  microphysics_driver
         0.00        69.13       384/384  *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
         0.00        52.03       384/384  *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
         0.00        19.46       384/384  cumulus_driver
         0.00        12.03       384/384  *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes
Min communication time: MPI task 649
Max communication time: MPI task 939

GRAPES-MESO Profile, GCR
%time    self  descendents  called/total  name [index]
         1.68       504.70     1080/1080  *__module_integrate_NMOD_solver_grapes_stub_in___module_integrate_NMOD_solve_interface [5]
         1.68       504.70     1080       __module_integrate_NMOD_solver_grapes [6]
         0.00       221.07     1080/1080  __module_gcr_NMOD_solve_helmholts [8]
         0.06        67.78     1080/1080  __module_semi_lag_NMOD_semi_lag_interp [9]
         0.51        32.82     1079/1079  __module_semi_lag_NMOD_upstream_interp_phy [18]
        33.27         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_y_xiao [19]
        30.93         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_x_xiao [21]
         0.00        28.63     1080/1080  microphysics_driver [22]
        23.44         0.00     1080/1080  __module_prm_wangmh_NMOD_prm_z_xiao [27]
Min communication time: MPI task 0
Max communication time: MPI task 1080

Convergence of Bi-conjugate Gradient Stabilized algorithm
Convergence of the BiCGSTAB and GCR algorithms for 1 and 25 steps of GRAPES.
– BiCGSTAB(2) converges in fewer iterations than GCR, but is more computationally intensive.
– The introduction of BiCGSTAB improved overall performance in the GRAPES models.
   Used as a pre-cursor to the application of the GCR algorithm (extra pre-conditioner).
   The number of iterations required for the convergence of GCR decreased significantly.
   GRAPES executed much faster (with the help of VSX primitives in coding).
   The same or even better accuracy than the original GCR algorithm.
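
The two-stage idea, BiCGSTAB run first to a loose tolerance and GCR then started from its result with a tighter tolerance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the GRAPES implementation: both solvers are unpreconditioned textbook variants (no ILU, plain BiCGSTAB rather than BiCGSTAB(2)), GCR keeps all previous directions, and the test matrix is a small hypothetical nonsymmetric tridiagonal system rather than a discretized Helmholtz operator. All function names are made up for the example.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def bicgstab(A, b, x, tol, max_iter=200):
    """Unpreconditioned BiCGSTAB; returns (x, iterations)."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    r_hat = r[:]                       # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * len(b)
    p = [0.0] * len(b)
    for it in range(1, max_iter + 1):
        if norm(r) <= tol:
            return x, it - 1
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pk - omega * vk) for ri, pk, vk in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vk for ri, vk in zip(r, v)]
        if norm(s) <= tol:             # converged after the first half step
            return [xi + alpha * pk for xi, pk in zip(x, p)], it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pk + omega * sk for xi, pk, sk in zip(x, p, s)]
        r = [sk - omega * tk for sk, tk in zip(s, t)]
        rho = rho_new
    return x, max_iter


def gcr(A, b, x, tol, max_iter=200):
    """Full (non-restarted) GCR; returns (x, iterations)."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    dirs = []                          # pairs (p, A p) with mutually orthogonal A p
    for it in range(1, max_iter + 1):
        if norm(r) <= tol:
            return x, it - 1
        p, Ap = r[:], matvec(A, r)
        for pj, Apj in dirs:           # orthogonalize A p against earlier A p_j
            beta = -dot(Ap, Apj) / dot(Apj, Apj)
            p = [pk + beta * q for pk, q in zip(p, pj)]
            Ap = [ak + beta * q for ak, q in zip(Ap, Apj)]
        alpha = dot(r, Ap) / dot(Ap, Ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * ak for ri, ak in zip(r, Ap)]
        dirs.append((p, Ap))
    return x, max_iter


# Hypothetical test system: nonsymmetric, diagonally dominant, tridiagonal.
n = 20
A = [[4.0 if i == j else -1.0 if j == i - 1 else -2.0 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x0 = [0.0] * n

x_cold, cold_iters = gcr(A, b, x0, tol=1e-12)        # GCR alone
x_warm, bicg_iters = bicgstab(A, b, x0, tol=1e-8)    # stage 1: loose tolerance
x_final, warm_iters = gcr(A, b, x_warm, tol=1e-12)   # stage 2: GCR from the BiCGSTAB result
print("GCR alone:", cold_iters, "BiCGSTAB:", bicg_iters, "GCR after BiCGSTAB:", warm_iters)
```

The iteration counts printed for this toy system say nothing about GRAPES itself; the point is only the structure, in which the second solver starts from an already small residual.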

Updated Helmholtz Solver implementation

GRAPES-GLOBAL
#ifdef BCGSL
      ! Stage 1: BiCGSTAB(L) variant run ahead of GCR
      ep = max(1.D-10, DBLE(grid%ep))
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, pi, &
                             idep, jdep, ids, ide, jds, jde, kds, kde, &
                             ims, ime, jms, jme, kms, kme, &
                             its, ite, jts, jte, kts, kte)
#else
      ! Stage 1: BiCGSTAB run ahead of GCR
      ep = max(1.D-8, DBLE(grid%ep))
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, pi, &
                                idep, jdep, ids, ide, jds, jde, kds, &
                                kde, ims, ime, jms, jme, kms, kme, &
                                its, ite, jts, jte, kts, kte)
#endif
      ! Stage 2: GCR continues from the solution left in pi
      ep = grid%ep
      d = 1.0d0
      CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                           iter_max, pi, d, idep, jdep, ids, ide, &
                           jds, jde, kds, kde, ims, ime, jms, jme, &
                           kms, kme, its, ite, jts, jte, kts, kte)

GRAPES-MESO
#ifdef BCGSL
      ep = 1.D-8
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, &
                             pi, ids, ide, jds, jde, kds, kde, &
                             ims, ime, jms, jme, kms, kme, &
                             its, ite, jts, jte, kts, kte)
#else
      ep = 1.D-8
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, &
                                pi, ids, ide, jds, jde, kds, kde, &
                                ims, ime, jms, jme, kms, kme, &
                                its, ite, jts, jte, kts, kte)
#endif
      ! GCR then runs to the tighter tolerance
      ep = 1.D-19
      CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                           iter_max, pi, d, ids, ide, jds, jde, &
                           kds, kde, ims, ime, jms, jme, kms, kme, &
                           its, ite, jts, jte, kts, kte)

Convergence of BiCGSTAB in GRAPES-GLOBAL

Un-optimized Code
begin of gcr 0.328934647159688379E-03
RES of gcr 0.951769473740471055E-09 in 54 iterations
Timing for processing for step 1: 105.43999 elapsed seconds.
begin of gcr 0.307738677760282797E-01
RES of gcr 0.985465629245594409E-09 in 64 iterations
Timing for processing for step 2: 3.56000 elapsed seconds.
begin of gcr 0.466354355510276777E-01
RES of gcr 0.987319218430061550E-09 in 55 iterations
Timing for processing for step 3: 3.54000 elapsed seconds.
begin of gcr 0.419494279764634215E-01
RES of gcr 0.952816344175419192E-09 in 45 iterations
Timing for processing for step 4: 3.39000 elapsed seconds.
begin of gcr 0.298146267204818100E-01
RES of gcr 0.955547301333094658E-09 in 49 iterations
Timing for processing for step 5: 3.44000 elapsed seconds.

Optimized Code
begin of bcgsl 0.328934356968701958E-03
RES of bcgsl 0.698006138227474393E-09 in 16 iterations
begin of gcr 0.102067544683602406E-08
RES of gcr 0.969841675518509429E-09 in 1 iterations
Timing for processing for step 1: 108.25000 elapsed seconds.
begin of bcgsl 0.307101071999445543E-01
RES of bcgsl 0.998788656259226276E-09 in 11 iterations
begin of gcr 0.131913191092197407E-08
RES of gcr 0.889851041683508861E-09 in 2 iterations
Timing for processing for step 2: 2.50000 elapsed seconds.
begin of bcgsl 0.370215569337918604E-01
RES of bcgsl 0.728471243819791556E-09 in 12 iterations
begin of gcr 0.104455860894560670E-08
RES of gcr 0.948845550215151657E-09 in 1 iterations
Timing for processing for step 3: 2.50000 elapsed seconds.
begin of bcgsl 0.348878083179526982E-01
RES of bcgsl 0.829610442476401725E-09 in 12 iterations
begin of gcr 0.114433762484590935E-08
RES of gcr 0.635845995011923888E-09 in 2 iterations
Timing for processing for step 4: 2.50000 elapsed seconds.
begin of bcgsl 0.266947703233833440E-01
RES of bcgsl 0.688385709819754403E-09 in 12 iterations
begin of gcr 0.100135435371643626E-08
RES of gcr 0.875385663076386664E-09 in 1 iterations
Timing for processing for step 5: 2.46000 elapsed seconds.

GRAPES-GLOBAL Profile Comparison
%time    self  descendents  called/total  name [index]
         2.09       682.41       384/384  __module_integrate_NMOD_integrate [4]
 52.3    2.09       682.41       384      solver_grapes
         0.24       214.12       384/384  pbl_driver
         0.04       157.94       384/384  radiation_driver
         0.00        83.80       384/384  *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
         0.00        67.05       384/384  *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
         0.00        54.97       384/384  microphysics_driver
         0.01        50.33       384/384  *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
         0.00        17.91       384/384  cumulus_driver
         0.00        11.72       384/384  *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes
Min communication time: MPI task 649

Convergence of BiCGSTAB in GRAPES-MESO

Un-optimized Code
0: begin of gcr 0.118096356906410122E-03
0: RES of gcr 0.785681906255938855E-19 in 49 iterations
0: Timing for processing for step 1: 18.15000 elapsed seconds.
0: Timing for processing for step 1: 14.52999 cpu seconds.
0: begin of gcr 0.180227130734546867E-03
0: RES of gcr 0.690132004197575959E-19 in 49 iterations
0: Timing for processing for step 2: 0.90000 elapsed seconds.
0: Timing for processing for step 2: 0.75000 cpu seconds.
0: begin of gcr 0.712260919191608395E-04
0: RES of gcr 0.966563876032326532E-19 in 48 iterations
0: Timing for processing for step 3: 0.68000 elapsed seconds.
0: Timing for processing for step 3: 0.57000 cpu seconds.
0: begin of gcr 0.337160794746152708E-04
0: RES of gcr 0.877018965782972674E-19 in 47 iterations
0: Timing for processing for step 4: 0.67000 elapsed seconds.
0: Timing for processing for step 4: 0.57000 cpu seconds.
0: begin of gcr 0.196107554793862609E-04
0: RES of gcr 0.635560985222081976E-19 in 47 iterations
0: Timing for processing for step 5: 0.71000 elapsed seconds.
0: Timing for processing for step 5: 0.60000 cpu seconds.

Optimized Code
0: begin of bicgstab 0.118096453737757547E-03
0: RES of bicgstab 0.380226254620264712E-08 in 3 iterations
0: begin of gcr 0.394720884628083064E-08
0: RES of gcr 0.746418612263664838E-19 in 16 iterations
0: Timing for processing for step 1: 18.99000 elapsed seconds.
0: Timing for processing for step 1: 18.69000 cpu seconds.
0: begin of bicgstab 0.168370346746922749E-03
0: RES of bicgstab 0.166872655366664435E-08 in 3 iterations
0: begin of gcr 0.181367330318505421E-08
0: RES of gcr 0.465501345880251435E-19 in 16 iterations
0: Timing for processing for step 2: 0.67000 elapsed seconds.
0: Timing for processing for step 2: 0.68000 cpu seconds.
0: begin of bicgstab 0.696717378252718038E-04
0: RES of bicgstab 0.137254158106719979E-08 in 3 iterations
0: begin of gcr 0.151730006467615455E-08
0: RES of gcr 0.322109698287421177E-19 in 16 iterations
0: Timing for processing for step 3: 0.45000 elapsed seconds.
0: Timing for processing for step 3: 0.44000 cpu seconds.
0: begin of bicgstab 0.320771797557436878E-04
0: RES of bicgstab 0.950087839437367948E-09 in 3 iterations
0: begin of gcr 0.109450945243131875E-08
0: RES of gcr 0.881479429351996220E-19 in 15 iterations
0: Timing for processing for step 4: 0.50000 elapsed seconds.
0: Timing for processing for step 4: 0.50000 cpu seconds.
0: begin of bicgstab 0.193261775264966473E-04
0: RES of bicgstab 0.985010942067601368E-08 in 2 iterations
0: begin of gcr 0.996454289745865310E-08
0: RES of gcr 0.365415647279281880E-19 in 17 iterations
0: Timing for processing for step 5: 0.48000 elapsed seconds.
0: Timing for processing for step 5: 0.49000 cpu seconds.

GRAPES-MESO Profile Comparison

Optimization Verification.
Accuracy of the computations.
– How does one check the accuracy of the computations in optimized codes?
– GRAPES-MESO accuracy verification was set for a 48-hour forecast.
– GRAPES-GLOBAL accuracy verification was set for a 10-day forecast.
Major changes were introduced into both the GRAPES-GLOBAL and GRAPES-MESO codes.
– Helmholtz's equation solution algorithm, Vector MASS in Microphysics routines.
Qualitative and quantitative verification methods.
– Visual inspection of the GRAPES-GLOBAL and GRAPES-MESO generated results.
– Apply statistics, and define limits for acceptable results.
Proceed slowly with caution.
– Correlation coefficients (ρ) between base (C) and optimized results (I).
– Area averaged normalized differences (σ) between base (C) and optimized results (I).
– 500 mb Geopotential Height (Φ) fields and Surface Precipitation are good candidates.
– KMA range for σ […] 0.98, all models.
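
The two statistics named above can be sketched as follows. The slides do not give the formulas, so the definitions here are assumptions: ρ is taken to be an area-weighted Pearson correlation, σ is taken to be the area-weighted sum of absolute differences divided by the area-weighted sum of absolute base values, and the area weights are cos(latitude) on a regular lat-lon grid. The function names and the sample values are hypothetical.

```python
import math


def area_weights(lats_deg, nlon):
    """cos(latitude) weight for every point of a regular lat-lon grid (row-major)."""
    return [math.cos(math.radians(lat)) for lat in lats_deg for _ in range(nlon)]


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def correlation(base, opt, weights):
    """Assumed definition of rho: area-weighted Pearson correlation of C and I."""
    mb = weighted_mean(base, weights)
    mo = weighted_mean(opt, weights)
    cov = sum(w * (c - mb) * (i - mo) for c, i, w in zip(base, opt, weights))
    var_b = sum(w * (c - mb) ** 2 for c, w in zip(base, weights))
    var_o = sum(w * (i - mo) ** 2 for i, w in zip(opt, weights))
    return cov / math.sqrt(var_b * var_o)


def normalized_difference(base, opt, weights):
    """Assumed definition of sigma: weighted sum |I - C| over weighted sum |C|."""
    num = sum(w * abs(i - c) for c, i, w in zip(base, opt, weights))
    den = sum(w * abs(c) for c, w in zip(base, weights))
    return num / den


# Made-up 3 x 2 field of 500 mb heights (metres), flattened row by row.
lats = [-45.0, 0.0, 45.0]
weights = area_weights(lats, nlon=2)
base = [5400.0, 5420.0, 5700.0, 5720.0, 5500.0, 5480.0]
opt = [5401.0, 5419.0, 5700.5, 5719.0, 5500.0, 5481.0]

print(correlation(base, opt, weights), normalized_difference(base, opt, weights))
```

Checking both numbers against agreed limits gives a pass or fail decision per field and per forecast time, which is the "define limits for acceptable results" step.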

GRAPES-MESO Verification
Base: 42-hour forecast. Optimized: 42-hour forecast.
– 500 mb Geopotential Height: σ and ρ are within acceptable range.
– Surface Precipitation: σ and ρ are within acceptable range.

GRAPES-GLOBAL Verification
Global models for 10-day forecasts are impossible to verify.
– http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2013/06/25/new-weather-service-supercomputer-faces-chaos/
– GFS 7-day forecast differences between POWER6 and Intel systems at NCEP.
– Even a small change in compiler version, node count, system architecture or algorithm, or bit losses from using less accurate representations (Vector MASS), can cause a global weather model to diverge from base results beyond 7 forecast days.
– Global weather model verification beyond 7 days for ρ > 0.98 is hopeless.
– GRAPES-GLOBAL verification was examined from 1-10 days of forecast.
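
The sensitivity described on this slide can be reproduced with a toy system. The sketch below is not a weather model and not part of GRAPES: it integrates the classic three-variable Lorenz (1963) system twice with a fourth-order Runge-Kutta step, with one initial value perturbed by 1e-10 as a stand-in for a last-bit difference from a compiler or library change. The parameter values are the standard textbook ones, and all names are hypothetical.

```python
import math


def lorenz_rhs(state, prandtl=10.0, rayleigh=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (prandtl * (y - x), x * (rayleigh - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(k1, dt / 2.0))
    k3 = lorenz_rhs(shifted(k2, dt / 2.0))
    k4 = lorenz_rhs(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def separation_history(perturbation=1e-10, dt=0.01, steps=6000):
    """Distance between a base run and a slightly perturbed run after each step."""
    base = (1.0, 1.0, 20.0)
    other = (1.0 + perturbation, 1.0, 20.0)
    history = []
    for _ in range(steps):
        base, other = rk4_step(base, dt), rk4_step(other, dt)
        history.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(base, other))))
    return history


history = separation_history()
print("after first step:", history[0], "largest separation:", max(history))
```

The two runs stay indistinguishable at first and then separate until the difference is as large as the signal itself, which is why a pointwise comparison against a base run stops being meaningful beyond some forecast length.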

10-Day GRAPES-GLOBAL verification.
Correlation coefficients and Area Averaged Differences are used to compare runs.
– 192-core unmodified code runs were used as base for comparisons.
– 10-day forecasts of the 500 mb Geopotential Heights for 2048-cores unmodified.
– 10-day forecasts of the 500 mb Geopotential Heights for 4096-cores modified.
– Microphysics (WSM6), BiCGSTAB, and a combination of both were tested.
– VSX intrinsic calls were introduced and tested in the BiCGSTAB routine.
– Vector MASS in WSM6 drives the forecast in a slightly different direction.

GRAPES-GLOBAL: 10-DAY Geopotential Heights Forecast.
10-day 500 mb Geopotential Heights Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD)
Unoptimized Run: 2048 Cores, 500 mb Geopotential Heights.
Optimized Run: 4096 Cores, 500 mb Geopotential Heights.

GRAPES-GLOBAL: 10-DAY Surface Precipitation Forecast.
10-day Surface Precipitation Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD)
Unoptimized Run: 2048 Cores, Surface Precipitation.
Optimized Run: 4096 Cores, Surface Precipitation.

Summary and Conclusions.
The GRAPES-GLOBAL and GRAPES-MESO models were optimized for performance.
– Both models used the Generalized Conjugate Residual (GCR) iterative solver.
   GCR: very efficient code, moderate convergence rates.
– The Bi-conjugate Gradient Stabilized (BiCGSTAB) iterative solver was introduced.
   BiCGSTAB: less efficient code, but fast convergence rates.
– The stand-alone BiCGSTAB solver did not improve performance.
   When BiCGSTAB was used ahead of GCR, significant improvements were realized.
   Increased accuracy, as seen from convergence residuals.
   Fewer total iterations to achieve convergence, better overall performance.
– Vector MASS intrinsic functions were applied in the microphysics routines.
Accuracy verification was a challenge for GRAPES-GLOBAL for up to 10 days.
– GRAPES-MESO verified successfully for 7 days, unlike WSM6.
– VSX primitives (single precision) in BiCGSTAB were not critical to either performance or accuracy.