Introduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz's Equation on the CMA GRAPES Global and Regional models
Peng Hong Bo (IBM), Zaphiris Christidis (Lenovo) and Zhiyan Jin (CMA)
IBM Technical Computing, ECMWF 16th HPC Workshop, October 2014

Outline
Introduction.
– Helmholtz's Equation.
The CMA GRAPES models and the Generalized Conjugate Residual Method (GCR).
– GCR implementation on GRAPES-GLOBAL and GRAPES-MESO models.
– GRAPES profiles.
Introduction of the Biconjugate Gradient Stabilized Method (BiCGSTAB) on GRAPES.
– Properties, implementation and profile information in both GLOBAL and MESO models.
– Performance of BiCGSTAB on GRAPES-GLOBAL and GRAPES-MESO models.
Accuracy verification and statistics.
– Verification challenges of the 10-day forecast of GRAPES-GLOBAL.
– Accuracy behavior on introduced code changes as a function of forecast days.
Area averaged errors and correlation coefficients of optimized vs base results.
– Chaotic behavior in the verification of results for more than 7 forecast days.
Conclusions

Helmholtz or Pressure Equation.
Helmholtz's equation is commonly used in Numerical Weather Prediction (NWP) models:
$\nabla^2 \phi + k^2 \phi = 0$
– $\nabla^2$ is the Laplacian operator, $\phi$ is a 3D pressure function and $k$ is a positive function.
Using finite differences, the above equation is reduced to a system of linear equations: $Ax - b = 0$
– $A$ is an MN x NM block tridiagonal matrix, for a grid of M x N horizontal points.
– The approximate solution of the linear equations is $\tilde{x}$; the residual is $r = b - A\tilde{x}$.
– When a preconditioner $L$ is used, the discretized Helmholtz equation is formulated as $L^{-1}Ax = L^{-1}b$.
– Large horizontal grids in NWP models call for efficient iterative methods for solutions.
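
The reduction to a linear system can be illustrated with a minimal sketch. It assumes a 2D five-point finite-difference stencil with zero boundary values, a constant coefficient `k2`, and row-major ordering of an m x n grid; none of these choices come from GRAPES, and the function names are hypothetical. It shows the block tridiagonal structure of the matrix and how the residual of an approximate solution is formed.

```python
def helmholtz_matrix(m, n, k2, h=1.0):
    """Dense matrix for (Laplacian + k2) on an m x n grid (illustrative only).

    Assumptions: five-point stencil, zero boundary values, constant k2,
    grid point (i, j) stored at row-major index p = i * n + j.
    """
    size = m * n
    a = [[0.0] * size for _ in range(size)]
    inv_h2 = 1.0 / (h * h)
    for i in range(m):
        for j in range(n):
            p = i * n + j
            a[p][p] = -4.0 * inv_h2 + k2
            if j > 0:
                a[p][p - 1] = inv_h2      # west neighbour, same block
            if j < n - 1:
                a[p][p + 1] = inv_h2      # east neighbour, same block
            if i > 0:
                a[p][p - n] = inv_h2      # south neighbour, previous block
            if i < m - 1:
                a[p][p + n] = inv_h2      # north neighbour, next block
    return a


def matvec(a, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def residual(a, x, b):
    """Residual r = b - A x of an approximate solution x."""
    return [bi - axi for bi, axi in zip(b, matvec(a, x))]


M, N = 3, 4
A = helmholtz_matrix(M, N, k2=0.5)
x_true = [float(p + 1) for p in range(M * N)]
b = matvec(A, x_true)          # right-hand side consistent with x_true
r = residual(A, x_true, b)     # zero for the exact solution
```

Each of the m diagonal blocks is an n x n tridiagonal matrix, and the off-diagonal blocks couple neighbouring grid rows, which is why the assembled matrix is block tridiagonal.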

Helmholtz Equation in GRAPES
GRAPES (Global/Regional Assimilation Prediction System).
– It is a Numerical Weather Prediction system developed by the China Meteorological Administration (CMA).
– It includes a Global and a Regional weather model as well as data assimilation systems for them.
Dynamic core features in GRAPES
– Fully compressible equations.
– Height-based terrain-following coordinates.
– Option for hydrostatic and non-hydrostatic schemes.
– Arakawa "C" staggered lat-lon horizontal grid.
– Charney-Phillips vertical scheme for prognostic variables.
– Polar Filter and Mass Fixing scheme.
– 2-time-level Semi-Implicit Semi-Lagrangian time-stepping.
– GCR solver for the Helmholtz Equation.
Generalized Conjugate Residual (GCR) algorithm.
Uses an Incomplete sparse Lower and Upper triangular (ILU) matrix factorization as a pre-conditioner.
B1, B2, …, B19 represent the coefficient matrix of Helmholtz's equation, which is discretized into a large sparse matrix.

GRAPES-GLOBAL Profile, GCR

%time    self   descendents   called     name
         2.31      811.41     384/384    __module_integrate_NMOD_integrate [4]
 48.6    2.31      811.41     384        solver_grapes
         0.31      236.34     384/384    pbl_driver
         0.00      166.60     384/384    *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
         0.03      151.48     384/384    radiation_driver
         0.00       78.31     384/384    microphysics_driver
         0.00       69.13     384/384    *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
         0.00       52.03     384/384    *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
         0.00       19.46     384/384    cumulus_driver
         0.00       12.03     384/384    *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes

Min communication time: MPI task 649
Max communication time: MPI task 939

GRAPES-MESO Profile, GCR

 self   descendents   called       name
 1.68      504.70     1080/1080    *__module_integrate_NMOD_solver_grapes_stub_in___module_integrate_NMOD_solve_interface [5]
 1.68      504.70     1080         __module_integrate_NMOD_solver_grapes [6]
 0.00      221.07     1080/1080    __module_gcr_NMOD_solve_helmholts [8]
 0.06       67.78     1080/1080    __module_semi_lag_NMOD_semi_lag_interp [9]
 0.51       32.82     1079/1079    __module_semi_lag_NMOD_upstream_interp_phy [18]
33.27        0.00     1080/1080    __module_prm_wangmh_NMOD_prm_y_xiao [19]
30.93        0.00     1080/1080    __module_prm_wangmh_NMOD_prm_x_xiao [21]
 0.00       28.63     1080/1080    microphysics_driver [22]
23.44        0.00     1080/1080    __module_prm_wangmh_NMOD_prm_z_xiao [27]

Min communication time: MPI task 0
Max communication time: MPI task 1080

Convergence of Bi-conjugate Gradient Stabilized algorithm
Convergence of the BiCGSTAB and GCR algorithms for 1 and 25 steps of GRAPES.
– BiCGSTAB(2) converges in fewer iterations than GCR, but is more computationally intensive.
– The introduction of BiCGSTAB improved overall performance in the GRAPES models.
When BiCGSTAB was used as a precursor to the GCR algorithm (an extra pre-conditioner):
– the number of iterations required for GCR to converge decreased significantly,
– GRAPES executed much faster (with the help of VSX primitives in the code),
– accuracy was the same as, or even better than, that of the original GCR algorithm.
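
The precursor idea can be sketched as a two-stage solve: run BiCGSTAB to a loose tolerance, then hand its result to GCR as the starting guess for a tight tolerance. The sketch below is an illustration under stated assumptions, not the GRAPES implementation: it uses plain unpreconditioned BiCGSTAB (not the BiCGSTAB(2) variant named on the slide), full unrestarted GCR without ILU, absolute residual tolerances, and a small nonsymmetric tridiagonal test matrix chosen only for the demonstration. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def norm(u):
    return math.sqrt(dot(u, u))


def bicgstab(matvec, b, x, tol, max_iter=200):
    """Plain BiCGSTAB; stops when the recursive residual norm is <= tol."""
    r = axpy(-1.0, matvec(x), b)
    r_hat = r[:]                      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * len(b)
    p = [0.0] * len(b)
    for it in range(max_iter):
        if norm(r) <= tol:
            return x, it
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)   # p = r + beta * (p - omega * v)
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)
        if norm(s) <= tol:                      # converged after the half step
            return axpy(alpha, p, x), it + 1
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = axpy(omega, s, axpy(alpha, p, x))
        r = axpy(-omega, t, s)
        rho = rho_new
    return x, max_iter


def gcr(matvec, b, x, tol, max_iter=200):
    """Full GCR: each new A*p is orthogonalized against all previous ones."""
    r = axpy(-1.0, matvec(x), b)
    directions = []                   # pairs (p, A p)
    for it in range(max_iter):
        if norm(r) <= tol:
            return x, it
        p, ap = r[:], matvec(r)
        for pj, apj in directions:
            beta = dot(ap, apj) / dot(apj, apj)
            p, ap = axpy(-beta, pj, p), axpy(-beta, apj, ap)
        alpha = dot(r, ap) / dot(ap, ap)
        x, r = axpy(alpha, p, x), axpy(-alpha, ap, r)
        directions.append((p, ap))
    return x, max_iter


def tridiag_matvec(x, lower=-1.2, diag=4.0, upper=-0.8):
    """Assumed test operator: nonsymmetric, diagonally dominant tridiagonal."""
    n = len(x)
    return [(lower * x[i - 1] if i > 0 else 0.0) + diag * x[i]
            + (upper * x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


b = [1.0] * 50
x0 = [0.0] * 50
x_pre, it_bicg = bicgstab(tridiag_matvec, b, x0, tol=1e-6)   # loose first stage
x_fin, it_gcr = gcr(tridiag_matvec, b, x_pre, tol=1e-12)     # tight second stage
print("BiCGSTAB iterations:", it_bicg, "GCR iterations:", it_gcr)
```

Because GCR starts from a residual that BiCGSTAB has already reduced, it only has to close the remaining gap to the tight tolerance, which is the mechanism the slide credits for the lower GCR iteration counts.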

Updated Helmholtz Solver implementation

GRAPES-GLOBAL

#ifdef BCGSL
      ep = max(1.D-10, DBLE(grid%ep))
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, pi,       &
                             idep, jdep, ids, ide, jds, jde, kds, kde, &
                             ims, ime, jms, jme, kms, kme,             &
                             its, ite, jts, jte, kts, kte)
#else
      ep = max(1.D-8, DBLE(grid%ep))
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, pi,    &
                                idep, jdep, ids, ide, jds, jde, kds,  &
                                kde, ims, ime, jms, jme, kms, kme,    &
                                its, ite, jts, jte, kts, kte)
#endif
      ep = grid%ep
      d = 1.0d0
      CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm,             &
                           iter_max, pi, d, idep, jdep, ids, ide,     &
                           jds, jde, kds, kde, ims, ime, jms, jme,    &
                           kms, kme, its, ite, jts, jte, kts, kte)

GRAPES-MESO

#ifdef BCGSL
      ep = 1.D-8
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm,           &
                             pi, ids, ide, jds, jde, kds, kde,        &
                             ims, ime, jms, jme, kms, kme,            &
                             its, ite, jts, jte, kts, kte)
#else
      ep = 1D-8
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm,        &
                                pi, ids, ide, jds, jde, kds, kde,     &
                                ims, ime, jms, jme, kms, kme,         &
                                its, ite, jts, jte, kts, kte)
#endif
      ep = 1.D-19
      CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm,             &
                           iter_max, pi, d, ids, ide, jds, jde,       &
                           kds, kde, ims, ime, jms, jme, kms, kme,    &
                           its, ite, jts, jte, kts, kte)

Convergence of BiCGSTAB in GRAPES-GLOBAL

Un-optimized Code

begin of gcr 0.328934647159688379E-03
RES of gcr 0.951769473740471055E-09 in 54 iterations
Timing for processing for step 1: 105.43999 elapsed seconds.
begin of gcr 0.307738677760282797E-01
RES of gcr 0.985465629245594409E-09 in 64 iterations
Timing for processing for step 2: 3.56000 elapsed seconds.
begin of gcr 0.466354355510276777E-01
RES of gcr 0.987319218430061550E-09 in 55 iterations
Timing for processing for step 3: 3.54000 elapsed seconds.
begin of gcr 0.419494279764634215E-01
RES of gcr 0.952816344175419192E-09 in 45 iterations
Timing for processing for step 4: 3.39000 elapsed seconds.
begin of gcr 0.298146267204818100E-01
RES of gcr 0.955547301333094658E-09 in 49 iterations
Timing for processing for step 5: 3.44000 elapsed seconds.

Optimized Code

begin of bcgsl 0.328934356968701958E-03
RES of bcgsl 0.698006138227474393E-09 in 16 iterations
begin of gcr 0.102067544683602406E-08
RES of gcr 0.969841675518509429E-09 in 1 iterations
Timing for processing for step 1: 108.25000 elapsed seconds.
begin of bcgsl 0.307101071999445543E-01
RES of bcgsl 0.998788656259226276E-09 in 11 iterations
begin of gcr 0.131913191092197407E-08
RES of gcr 0.889851041683508861E-09 in 2 iterations
Timing for processing for step 2: 2.50000 elapsed seconds.
begin of bcgsl 0.370215569337918604E-01
RES of bcgsl 0.728471243819791556E-09 in 12 iterations
begin of gcr 0.104455860894560670E-08
RES of gcr 0.948845550215151657E-09 in 1 iterations
Timing for processing for step 3: 2.50000 elapsed seconds.
begin of bcgsl 0.348878083179526982E-01
RES of bcgsl 0.829610442476401725E-09 in 12 iterations
begin of gcr 0.114433762484590935E-08
RES of gcr 0.635845995011923888E-09 in 2 iterations
Timing for processing for step 4: 2.50000 elapsed seconds.
begin of bcgsl 0.266947703233833440E-01
RES of bcgsl 0.688385709819754403E-09 in 12 iterations
begin of gcr 0.100135435371643626E-08
RES of gcr 0.875385663076386664E-09 in 1 iterations
Timing for processing for step 5: 2.46000 elapsed seconds.

GRAPES-GLOBAL Profile Comparison

%time    self   descendents   called     name
         2.09      682.41     384/384    __module_integrate_NMOD_integrate [4]
 52.3    2.09      682.41     384        solver_grapes
         0.24      214.12     384/384    pbl_driver
         0.04      157.94     384/384    radiation_driver
         0.00       83.80     384/384    *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
         0.00       67.05     384/384    *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
         0.00       54.97     384/384    microphysics_driver
         0.01       50.33     384/384    *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
         0.00       17.91     384/384    cumulus_driver
         0.00       11.72     384/384    *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes

Min communication time: MPI task 649

Convergence of BiCGSTAB in GRAPES-MESO

Un-optimized Code

0: begin of gcr 0.118096356906410122E-03
0: RES of gcr 0.785681906255938855E-19 in 49 iterations
0: Timing for processing for step 1: 18.15000 elapsed seconds.
0: Timing for processing for step 1: 14.52999 cpu seconds.
0: begin of gcr 0.180227130734546867E-03
0: RES of gcr 0.690132004197575959E-19 in 49 iterations
0: Timing for processing for step 2: 0.90000 elapsed seconds.
0: Timing for processing for step 2: 0.75000 cpu seconds.
0: begin of gcr 0.712260919191608395E-04
0: RES of gcr 0.966563876032326532E-19 in 48 iterations
0: Timing for processing for step 3: 0.68000 elapsed seconds.
0: Timing for processing for step 3: 0.57000 cpu seconds.
0: begin of gcr 0.337160794746152708E-04
0: RES of gcr 0.877018965782972674E-19 in 47 iterations
0: Timing for processing for step 4: 0.67000 elapsed seconds.
0: Timing for processing for step 4: 0.57000 cpu seconds.
0: begin of gcr 0.196107554793862609E-04
0: RES of gcr 0.635560985222081976E-19 in 47 iterations
0: Timing for processing for step 5: 0.71000 elapsed seconds.
0: Timing for processing for step 5: 0.60000 cpu seconds.

Optimized Code

0: begin of bicgstab 0.118096453737757547E-03
0: RES of bicgstab 0.380226254620264712E-08 in 3 iterations
0: begin of gcr 0.394720884628083064E-08
0: RES of gcr 0.746418612263664838E-19 in 16 iterations
0: Timing for processing for step 1: 18.99000 elapsed seconds.
0: Timing for processing for step 1: 18.69000 cpu seconds.
0: begin of bicgstab 0.168370346746922749E-03
0: RES of bicgstab 0.166872655366664435E-08 in 3 iterations
0: begin of gcr 0.181367330318505421E-08
0: RES of gcr 0.465501345880251435E-19 in 16 iterations
0: Timing for processing for step 2: 0.67000 elapsed seconds.
0: Timing for processing for step 2: 0.68000 cpu seconds.
0: begin of bicgstab 0.696717378252718038E-04
0: RES of bicgstab 0.137254158106719979E-08 in 3 iterations
0: begin of gcr 0.151730006467615455E-08
0: RES of gcr 0.322109698287421177E-19 in 16 iterations
0: Timing for processing for step 3: 0.45000 elapsed seconds.
0: Timing for processing for step 3: 0.44000 cpu seconds.
0: begin of bicgstab 0.320771797557436878E-04
0: RES of bicgstab 0.950087839437367948E-09 in 3 iterations
0: begin of gcr 0.109450945243131875E-08
0: RES of gcr 0.881479429351996220E-19 in 15 iterations
0: Timing for processing for step 4: 0.50000 elapsed seconds.
0: Timing for processing for step 4: 0.50000 cpu seconds.
0: begin of bicgstab 0.193261775264966473E-04
0: RES of bicgstab 0.985010942067601368E-08 in 2 iterations
0: begin of gcr 0.996454289745865310E-08
0: RES of gcr 0.365415647279281880E-19 in 17 iterations
0: Timing for processing for step 5: 0.48000 elapsed seconds.
0: Timing for processing for step 5: 0.49000 cpu seconds.

GRAPES-MESO Profile Comparison

Optimization Verification.
Accuracy of the computations.
How does one check the accuracy of computations in optimized codes?
– GRAPES MESO accuracy verification was set for a 48-hour forecast.
– GRAPES GLOBAL accuracy verification was set for a 10-day forecast.
Major changes were introduced into both the GRAPES GLOBAL and MESO codes.
– Helmholtz's equation solution algorithm, Vector MASS in Microphysics routines.
Qualitative and quantitative verification methods.
– Visual inspection of the GRAPES GLOBAL and MESO generated results.
– Apply statistics, and define limits for acceptable results.
Proceed slowly with caution.
Correlation coefficients (ρ) between base (C) and optimized results (I).
Area averaged normalized differences (σ) between base (C) and optimized results (I).
500 mb Geopotential Height (Φ) fields and Surface Precipitation are good candidates.
KMA range for σ 0.98 all models.
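
The two statistics named above can be sketched as follows. The slides do not give their formulas, so this is an illustration under assumptions: ρ is taken as the area-weighted Pearson correlation between the base field C and the optimized field I, σ is assumed to be the area mean of |I - C| divided by the area mean of |C|, and area weighting on the lat-lon grid is assumed to be proportional to cos(latitude). The synthetic fields and all function names are hypothetical.

```python
import math


def area_weights(lats_deg, n_lon):
    """cos(latitude) weights per grid point, normalized to sum to 1 (assumed)."""
    rows = [[math.cos(math.radians(lat))] * n_lon for lat in lats_deg]
    total = sum(sum(row) for row in rows)
    return [[w / total for w in row] for row in rows]


def area_mean(field, weights):
    """Area-weighted mean of a 2D field stored as a list of latitude rows."""
    return sum(f * w for frow, wrow in zip(field, weights)
               for f, w in zip(frow, wrow))


def correlation(base, opt, weights):
    """Area-weighted Pearson correlation between base (C) and optimized (I)."""
    mean_c, mean_i = area_mean(base, weights), area_mean(opt, weights)
    dev_c = [[c - mean_c for c in row] for row in base]
    dev_i = [[v - mean_i for v in row] for row in opt]
    cov = area_mean([[a * b for a, b in zip(ra, rb)]
                     for ra, rb in zip(dev_c, dev_i)], weights)
    var_c = area_mean([[a * a for a in row] for row in dev_c], weights)
    var_i = area_mean([[b * b for b in row] for row in dev_i], weights)
    return cov / math.sqrt(var_c * var_i)


def normalized_difference(base, opt, weights):
    """Assumed sigma: area mean of |I - C| over area mean of |C|."""
    diff = [[abs(v - c) for c, v in zip(rc, ri)] for rc, ri in zip(base, opt)]
    scale = [[abs(c) for c in row] for row in base]
    return area_mean(diff, weights) / area_mean(scale, weights)


# Synthetic stand-ins for a 500 mb geopotential height field (not model data).
lats = list(range(-80, 81, 20))
lons = [2.0 * math.pi * j / 16 for j in range(16)]
weights = area_weights(lats, len(lons))
base = [[5500.0 + 100.0 * math.cos(math.radians(lat)) * math.sin(lon)
         for lon in lons] for lat in lats]
opt = [[c + math.sin(3.0 * lon) for c, lon in zip(row, lons)] for row in base]

rho = correlation(base, opt, weights)
sigma = normalized_difference(base, opt, weights)
print(f"rho = {rho:.6f}, sigma = {sigma:.2e}, rho above 0.98: {rho > 0.98}")
```

A run would count as verified when ρ stays above the chosen threshold (0.98 is the value quoted in these slides) and σ stays within the agreed limit for every forecast time examined.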

GRAPES-MESO Verification
Base: 42-hour forecast. Optimized: 42-hour forecast.
500 mb Geopotential Height: σ and ρ are within acceptable range.
Surface Precipitation: σ and ρ are within acceptable range.

GRAPES-GLOBAL Verification
Global models for 10-day forecasts are impossible to verify.
– http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2013/06/25/new-weather-service-supercomputer-faces-chaos/
– GFS 7-day forecast differences between POWER6 and Intel systems at NCEP.
– Even a small change in compiler version, node count, system architecture or algorithm, or the loss of a few bits from using less accurate representations (Vector MASS), can cause a global weather model to diverge from base results beyond 7 forecast days.
– Global weather model verification beyond 7 days for ρ > 0.98 is hopeless.
– GRAPES-GLOBAL verification was examined from 1 to 10 days of forecast.

10-Day GRAPES-GLOBAL verification.
Correlation coefficients and Area Averaged Differences are used to compare runs.
– 192-core unmodified code runs were used as base for comparisons.
– 10-day forecasts of the 500 mb Geopotential Heights for 2048 cores, unmodified.
– 10-day forecasts of the 500 mb Geopotential Heights for 4096 cores, modified.
– Microphysics (WSM6), BiCGSTAB, and a combination of both were tested.
– VSX intrinsic calls were introduced and tested in the BiCGSTAB routine.
– Vector MASS in WSM6 drives the forecast in a slightly different direction.

GRAPES-GLOBAL: 10-Day Geopotential Heights Forecast.
10-day 500 mb Geopotential Heights Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD).
Unoptimized Run: 2048 Cores, 500 mb Geopotential Heights.
Optimized Run: 4096 Cores, 500 mb Geopotential Heights.

GRAPES-GLOBAL: 10-Day Surface Precipitation Forecast.
10-day Surface Precipitation Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD).
Unoptimized Run: 2048 Cores, Surface Precipitation.
Optimized Run: 4096 Cores, Surface Precipitation.

Summary and Conclusions.
The GRAPES-GLOBAL and GRAPES-MESO models were optimized for performance.
– Both models used the Generalized Conjugate Residual (GCR) iterative solver.
GCR: very efficient code, moderate convergence rates.
– The Bi-conjugate Gradient Stabilized (BiCGSTAB) iterative solver was introduced.
BiCGSTAB: less efficient code, but fast convergence rates.
– A stand-alone BiCGSTAB solver did not improve performance.
When BiCGSTAB was used ahead of GCR, significant improvements were realized.
Increased accuracy, as seen from convergence residuals.
Fewer total iterations to achieve convergence, better overall performance.
– Vector MASS intrinsic functions were applied in the microphysics routines.
Accuracy verification was a challenge for GRAPES-GLOBAL for up to 10 days.
– GRAPES-MESO verified successfully for 7 days, unlike WSM6.
– VSX primitives (single precision) in BiCGSTAB were not critical to either performance or accuracy.