IBM Technical Computing, 2011 IBM

Introduction of a Stabilized Bi-Conjugate Gradient iterative solver for Helmholtz's Equation on the CMA GRAPES Global and Regional models.

Peng Hong Bo (IBM), Zaphiris Christidis (Lenovo) and Zhiyan Jin (CMA)

2014 IBM, IBM Technical Computing
ECMWF 16th HPC Workshop, October 2014

Outline

Introduction.
– Helmholtz's Equation.
The CMA GRAPES models and the Generalized Conjugate Residual Method (GCR).
– GCR implementation on GRAPES-GLOBAL and GRAPES-MESO models.
– GRAPES profiles.
Introduction of the Biconjugate Gradient Stabilized Method (BiCGSTAB) on GRAPES.
– Properties, implementation and profile information in both GLOBAL and MESO models.
– Performance of BiCGSTAB on GRAPES-GLOBAL and GRAPES-MESO models.
Accuracy verification and statistics.
– Verification challenges of the 10-day forecast of GRAPES-GLOBAL.
– Accuracy behavior on introduced code changes as a function of forecast days.
  Area averaged errors and correlation coefficients of optimized vs base results.
– Chaotic behavior in the verification of results for more than 7 forecast days.
Conclusions.

Helmholtz or Pressure Equation.
Helmholtz's equation is commonly used in Numerical Weather Prediction (NWP) models.
$\Delta p + k^2 p = 0$
– $\Delta$ is the Laplacian operator, $p$ is a 3D pressure function and $k$ is a positive function.
Using finite differences, the above equation is reduced to a system of linear equations: $A x - b = 0$
– $A$ is an $MN \times MN$ block tridiagonal matrix, for a grid of $M \times N$ horizontal points.
– The approximate solution of the linear equations is $\tilde{x}$; the residual is $r = b - A\tilde{x}$.
– When a preconditioner $L$ is used, the discretized Helmholtz equation is formulated as $L^{-1} A x = L^{-1} b$.
– Large horizontal grids in NWP models call for efficient iterative methods for solutions.
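
The residual and its preconditioned form can be illustrated on a toy grid. The following is a minimal sketch, not the GRAPES discretization: the 5-point stencil, unit grid spacing, zero values outside the grid, the constant `k2`, the sign convention, and the diagonal (Jacobi) matrix standing in for the preconditioner L are all assumptions made for illustration.

```python
# Matrix-free 5-point operator on an M x N grid, the residual r = b - A x,
# and a preconditioned residual L^{-1} r.
# Assumptions (not from the slides): unit spacing, zero values outside the
# grid, constant k2, A = (4 + k2) on the diagonal and -1 for each neighbour,
# and L = diag(A) as a stand-in preconditioner.

def apply_A(x, M, N, k2):
    """Return A x; x is a flat list of length M*N, ordered row by row."""
    y = [0.0] * (M * N)
    for i in range(M):
        for j in range(N):
            idx = i * N + j
            acc = (4.0 + k2) * x[idx]
            if i > 0:
                acc -= x[idx - N]      # neighbour in the previous grid row
            if i < M - 1:
                acc -= x[idx + N]      # neighbour in the next grid row
            if j > 0:
                acc -= x[idx - 1]      # left neighbour
            if j < N - 1:
                acc -= x[idx + 1]      # right neighbour
            y[idx] = acc
    return y

def residual(b, x, M, N, k2):
    """r = b - A x for an approximate solution x."""
    return [bi - ai for bi, ai in zip(b, apply_A(x, M, N, k2))]

def norm2(v):
    return sum(vi * vi for vi in v) ** 0.5

M, N, k2 = 4, 5, 0.5
x_true = [float(i % 3) for i in range(M * N)]   # made-up reference solution
b = apply_A(x_true, M, N, k2)                   # right-hand side consistent with it

x0 = [0.0] * (M * N)                            # initial guess
r0 = residual(b, x0, M, N, k2)                  # residual of the initial guess
z0 = [ri / (4.0 + k2) for ri in r0]             # L^{-1} r0 with L = diag(A)
r_exact = residual(b, x_true, M, N, k2)         # residual of the exact solution

print(norm2(r0), norm2(z0), norm2(r_exact))
```

With the row-by-row ordering used here, the couplings to the previous and next grid rows form the off-diagonal blocks, which is what gives the matrix its block tridiagonal shape.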

Helmholtz Equation in GRAPES

GRAPES (Global/Regional Assimilation Prediction System).
– It is a Numerical Weather Prediction system developed by the China Meteorological Administration (CMA).
– It includes a Global and a Regional weather model as well as data assimilation systems for them.
Dynamic core features in GRAPES
– Fully compressible equations.
– Height-based terrain-following coordinates.
– Option for hydrostatic and non-hydrostatic schemes.
– Arakawa "C" staggered lat-lon horizontal grid.
– Charney-Phillips vertical scheme for prognostic variables.
– Polar Filter and Mass Fixing scheme.
– 2-time-level Semi-Implicit Semi-Lagrangian time-stepping.
– GCR solver for the Helmholtz Equation.
Generalized Conjugate Residual (GCR) algorithm.
Uses an Incomplete sparse Lower and Upper triangular (ILU) matrix factorization as a pre-conditioner.
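
As an illustration of the GCR iteration, here is a minimal sketch of a preconditioned GCR loop with full orthogonalization. It is not the GRAPES routine: the 5-point test operator, the constant `k2`, and the Jacobi (diagonal) preconditioner used in place of ILU are assumptions chosen to keep the example short.

```python
# Preconditioned GCR sketch. Each new search direction p is built from the
# preconditioned residual and made orthogonal, in the sense (A p_i, A p_j) = 0,
# to all previous directions, so every step minimizes the residual norm.
# Assumptions (not from the slides): toy 5-point operator, Jacobi
# preconditioner instead of ILU, no restart or truncation.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply_A(x, M, N, k2):
    """Toy operator: (4 + k2) on the diagonal, -1 for each grid neighbour."""
    y = [0.0] * (M * N)
    for i in range(M):
        for j in range(N):
            idx = i * N + j
            acc = (4.0 + k2) * x[idx]
            if i > 0:
                acc -= x[idx - N]
            if i < M - 1:
                acc -= x[idx + N]
            if j > 0:
                acc -= x[idx - 1]
            if j < N - 1:
                acc -= x[idx + 1]
            y[idx] = acc
    return y

def gcr(matvec, precond, b, tol=1e-10, max_iter=200):
    """Return (x, iterations) with ||b - A x|| <= tol * ||b|| on success."""
    x = [0.0] * len(b)
    r = list(b)                              # residual of the zero initial guess
    target = tol * dot(b, b) ** 0.5
    history = []                             # (p, A p, (A p, A p)) per step
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= target:
            return x, it
        p = precond(r)
        q = matvec(p)                        # q = A p
        for p_old, q_old, qq_old in history:
            beta = dot(q, q_old) / qq_old
            p = [pi - beta * po for pi, po in zip(p, p_old)]
            q = [qi - beta * qo for qi, qo in zip(q, q_old)]
        qq = dot(q, q)
        alpha = dot(r, q) / qq               # step length minimizing ||r - alpha q||
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        history.append((p, q, qq))
    return x, max_iter

M, N, k2 = 6, 7, 0.5
x_true = [float(i % 3) for i in range(M * N)]
b = apply_A(x_true, M, N, k2)

x, iters = gcr(lambda v: apply_A(v, M, N, k2),
               lambda r: [ri / (4.0 + k2) for ri in r],   # Jacobi stand-in for ILU
               b)
max_err = max(abs(a - e) for a, e in zip(x, x_true))
print(iters, max_err)
```

Storing every previous direction costs memory and work that grow with the iteration count, which is why production GCR codes usually restart or truncate the history; that refinement is left out here.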
B1, B2, …, B19 represent the coefficient matrix of Helmholtz's equation, which is discretized into a large sparse matrix.

GRAPES-GLOBAL Profile, GCR

                                  called/total      parents
index  %time    self descendents  called+self   name                index
                                  called/total      children

                2.31      811.41    384/384     __module_integrate_NMOD_integrate [4]
        48.6    2.31      811.41    384         solver_grapes
                0.31      236.34    384/384     pbl_driver
                0.00      166.60    384/384     *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
                0.03      151.48    384/384     radiation_driver
                0.00       78.31    384/384     microphysics_driver
                0.00       69.13    384/384     *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
                0.00       52.03    384/384     *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
                0.00       19.46    384/384     cumulus_driver
                0.00       12.03    384/384     *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes

Min communication time: MPI task 649
Max communication time: MPI task 939

GRAPES-MESO Profile, GCR

                                  called/total      parents
index  %time    self descendents  called+self   name                index
                                  called/total      children

                1.68      504.70   1080/1080    *__module_integrate_NMOD_solver_grapes_stub_in___module_integrate_NMOD_solve_interface [5]
                1.68      504.70   1080         __module_integrate_NMOD_solver_grapes [6]
                0.00      221.07   1080/1080    __module_gcr_NMOD_solve_helmholts [8]
                0.06       67.78   1080/1080    __module_semi_lag_NMOD_semi_lag_interp [9]
                0.51       32.82   1079/1079    __module_semi_lag_NMOD_upstream_interp_phy [18]
               33.27        0.00   1080/1080    __module_prm_wangmh_NMOD_prm_y_xiao [19]
               30.93        0.00   1080/1080    __module_prm_wangmh_NMOD_prm_x_xiao [21]
                0.00       28.63   1080/1080    microphysics_driver [22]
               23.44        0.00   1080/1080    __module_prm_wangmh_NMOD_prm_z_xiao [27]

Min communication time: MPI task 0
Max communication time: MPI task 1080

Convergence of Bi-conjugate Gradient Stabilized algorithm

Convergence of the BiCGSTAB and GCR algorithms for 1 and 25 steps of GRAPES.
– BiCGSTAB(2) converges in fewer iterations than GCR, but is more computationally intensive.
– The introduction of BiCGSTAB improved overall performance in the GRAPES models.
  Used as a precursor to the GCR algorithm (an extra pre-conditioner):
  the number of iterations required for GCR to converge decreased significantly;
  GRAPES executed much faster (with the help of VSX primitives in coding);
  accuracy is the same as, or even better than, that of the original GCR algorithm.
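
For orientation, the basic BiCGSTAB recurrence can be sketched as follows. This is the plain unpreconditioned BiCGSTAB of van der Vorst, not the BiCGSTAB(2) variant mentioned above and not the GRAPES routines; the small nonsymmetric tridiagonal test matrix is a hypothetical stand-in for the Helmholtz operator.

```python
# Plain (unpreconditioned) BiCGSTAB sketch.
# Assumptions (not from the slides): zero initial guess, shadow residual equal
# to the initial residual, and a made-up diagonally dominant tridiagonal matrix.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha*x + y as a new list."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def matvec(x):
    """Nonsymmetric test matrix: 4 on the diagonal, -0.5 below, -1.5 above."""
    n = len(x)
    y = []
    for i in range(n):
        acc = 4.0 * x[i]
        if i > 0:
            acc -= 0.5 * x[i - 1]
        if i < n - 1:
            acc -= 1.5 * x[i + 1]
        y.append(acc)
    return y

def bicgstab(matvec, b, tol=1e-10, max_iter=200):
    """Return (x, iterations) with ||b - A x|| <= tol * ||b|| on success."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual of the zero initial guess
    target = tol * dot(b, b) ** 0.5
    if dot(r, r) ** 0.5 <= target:
        return x, 0
    r_hat = list(r)                  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)      # p = r + beta*(p - omega*v)
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)                     # residual after the BiCG half step
        if dot(s, s) ** 0.5 <= target:
            return axpy(alpha, p, x), it
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)              # stabilizing one-step minimization
        x = axpy(omega, s, axpy(alpha, p, x))
        r = axpy(-omega, t, s)
        rho = rho_new
        if dot(r, r) ** 0.5 <= target:
            return x, it
    return x, max_iter

n = 30
x_true = [float(i % 4) - 1.5 for i in range(n)]
b = matvec(x_true)
x, iters = bicgstab(matvec, b)
max_err = max(abs(a - e) for a, e in zip(x, x_true))
print(iters, max_err)
```

Each iteration needs two matrix-vector products, against one per iteration for GCR, which is consistent with the slide's remark that the method is more computationally intensive per iteration.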

Updated Helmholtz Solver implementation

GRAPES-GLOBAL

    ! Precursor solve (BCGSL or BiCGSTAB) with convergence tolerance ep
    #ifdef BCGSL
      ep = max(1.D-10, DBLE(grid%ep))
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, pi, &
                             idep, jdep, ids, ide, jds, jde, kds, kde, &
                             ims, ime, jms, jme, kms, kme, &
                             its, ite, jts, jte, kts, kte)
    #else
      ep = max(1.D-8, DBLE(grid%ep))
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, pi, &
                                idep, jdep, ids, ide, jds, jde, kds, &
                                kde, ims, ime, jms, jme, kms, kme, &
                                its, ite, jts, jte, kts, kte)
    #endif
    ! Final GCR solve, starting from the precursor result in pi
    ep = grid%ep
    d = 1.0d0
    CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                         iter_max, pi, d, idep, jdep, ids, ide, &
                         jds, jde, kds, kde, ims, ime, jms, jme, &
                         kms, kme, its, ite, jts, jte, kts, kte)

GRAPES-MESO

    ! Precursor solve (BCGSL or BiCGSTAB) with convergence tolerance ep
    #ifdef BCGSL
      ep = 1.D-8
      CALL psolve_bcgsl_main(grid, gcr, ep, a_helm, b_helm, &
                             pi, ids, ide, jds, jde, kds, kde, &
                             ims, ime, jms, jme, kms, kme, &
                             its, ite, jts, jte, kts, kte)
    #else
      ep = 1D-8
      CALL psolve_bicgstab_main(grid, gcr, ep, a_helm, b_helm, &
                                pi, ids, ide, jds, jde, kds, kde, &
                                ims, ime, jms, jme, kms, kme, &
                                its, ite, jts, jte, kts, kte)
    #endif
    ! Final GCR solve with the much tighter tolerance
    ep = 1.D-19
    CALL psolve_gcr_main(grid, gcr, ep, a_helm, b_helm, &
                         iter_max, pi, d, ids, ide, jds, jde, &
                         kds, kde, ims, ime, jms, jme, kms, kme, &
                         its, ite, jts, jte, kts, kte)

Convergence of BiCGSTAB in GRAPES-GLOBAL

Un-optimized Code

    begin of gcr 0.328934647159688379E-03
    RES of gcr 0.951769473740471055E-09 in 54 iterations
    Timing for processing for step 1: 105.43999 elapsed seconds.
    begin of gcr 0.307738677760282797E-01
    RES of gcr 0.985465629245594409E-09 in 64 iterations
    Timing for processing for step 2: 3.56000 elapsed seconds.
    begin of gcr 0.466354355510276777E-01
    RES of gcr 0.987319218430061550E-09 in 55 iterations
    Timing for processing for step 3: 3.54000 elapsed seconds.
    begin of gcr 0.419494279764634215E-01
    RES of gcr 0.952816344175419192E-09 in 45 iterations
    Timing for processing for step 4: 3.39000 elapsed seconds.
    begin of gcr 0.298146267204818100E-01
    RES of gcr 0.955547301333094658E-09 in 49 iterations
    Timing for processing for step 5: 3.44000 elapsed seconds.

Optimized Code

    begin of bcgsl 0.328934356968701958E-03
    RES of bcgsl 0.698006138227474393E-09 in 16 iterations
    begin of gcr 0.102067544683602406E-08
    RES of gcr 0.969841675518509429E-09 in 1 iterations
    Timing for processing for step 1: 108.25000 elapsed seconds.
    begin of bcgsl 0.307101071999445543E-01
    RES of bcgsl 0.998788656259226276E-09 in 11 iterations
    begin of gcr 0.131913191092197407E-08
    RES of gcr 0.889851041683508861E-09 in 2 iterations
    Timing for processing for step 2: 2.50000 elapsed seconds.
    begin of bcgsl 0.370215569337918604E-01
    RES of bcgsl 0.728471243819791556E-09 in 12 iterations
    begin of gcr 0.104455860894560670E-08
    RES of gcr 0.948845550215151657E-09 in 1 iterations
    Timing for processing for step 3: 2.50000 elapsed seconds.
    begin of bcgsl 0.348878083179526982E-01
    RES of bcgsl 0.829610442476401725E-09 in 12 iterations
    begin of gcr 0.114433762484590935E-08
    RES of gcr 0.635845995011923888E-09 in 2 iterations
    Timing for processing for step 4: 2.50000 elapsed seconds.
    begin of bcgsl 0.266947703233833440E-01
    RES of bcgsl 0.688385709819754403E-09 in 12 iterations
    begin of gcr 0.100135435371643626E-08
    RES of gcr 0.875385663076386664E-09 in 1 iterations
    Timing for processing for step 5: 2.46000 elapsed seconds.

GRAPES-GLOBAL Profile Comparison

                                  called/total      parents
index  %time    self descendents  called+self   name                index
                                  called/total      children

                2.09      682.41    384/384     __module_integrate_NMOD_integrate [4]
        52.3    2.09      682.41    384         solver_grapes
                0.24      214.12    384/384     pbl_driver
                0.04      157.94    384/384     radiation_driver
                0.00       83.80    384/384     *__module_gcr_NMOD_solve_helmholts_stub_in_solver_grapes
                0.00       67.05    384/384     *__module_semi_lag_NMOD_semi_lag_interp_stub_in_solver_grapes
                0.00       54.97    384/384     microphysics_driver
                0.01       50.33    384/384     *__module_semi_lag_NMOD_upstream_interp_jin_stub_in_solver_grapes
                0.00       17.91    384/384     cumulus_driver
                0.00       11.72    384/384     *__module_semi_lag_NMOD_semi_get_upstream_jin_stub_in_solver_grapes

Min communication time: MPI task 649

Convergence of BiCGSTAB in GRAPES-MESO

Un-optimized Code

    0: begin of gcr 0.118096356906410122E-03
    0: RES of gcr 0.785681906255938855E-19 in 49 iterations
    0: Timing for processing for step 1: 18.15000 elapsed seconds.
    0: Timing for processing for step 1: 14.52999 cpu seconds.
    0: begin of gcr 0.180227130734546867E-03
    0: RES of gcr 0.690132004197575959E-19 in 49 iterations
    0: Timing for processing for step 2: 0.90000 elapsed seconds.
    0: Timing for processing for step 2: 0.75000 cpu seconds.
    0: begin of gcr 0.712260919191608395E-04
    0: RES of gcr 0.966563876032326532E-19 in 48 iterations
    0: Timing for processing for step 3: 0.68000 elapsed seconds.
    0: Timing for processing for step 3: 0.57000 cpu seconds.
    0: begin of gcr 0.337160794746152708E-04
    0: RES of gcr 0.877018965782972674E-19 in 47 iterations
    0: Timing for processing for step 4: 0.67000 elapsed seconds.
    0: Timing for processing for step 4: 0.57000 cpu seconds.
    0: begin of gcr 0.196107554793862609E-04
    0: RES of gcr 0.635560985222081976E-19 in 47 iterations
    0: Timing for processing for step 5: 0.71000 elapsed seconds.
    0: Timing for processing for step 5: 0.60000 cpu seconds.

Optimized Code

    0: begin of bicgstab 0.118096453737757547E-03
    0: RES of bicgstab 0.380226254620264712E-08 in 3 iterations
    0: begin of gcr 0.394720884628083064E-08
    0: RES of gcr 0.746418612263664838E-19 in 16 iterations
    0: Timing for processing for step 1: 18.99000 elapsed seconds.
    0: Timing for processing for step 1: 18.69000 cpu seconds.
    0: begin of bicgstab 0.168370346746922749E-03
    0: RES of bicgstab 0.166872655366664435E-08 in 3 iterations
    0: begin of gcr 0.181367330318505421E-08
    0: RES of gcr 0.465501345880251435E-19 in 16 iterations
    0: Timing for processing for step 2: 0.67000 elapsed seconds.
    0: Timing for processing for step 2: 0.68000 cpu seconds.
    0: begin of bicgstab 0.696717378252718038E-04
    0: RES of bicgstab 0.137254158106719979E-08 in 3 iterations
    0: begin of gcr 0.151730006467615455E-08
    0: RES of gcr 0.322109698287421177E-19 in 16 iterations
    0: Timing for processing for step 3: 0.45000 elapsed seconds.
    0: Timing for processing for step 3: 0.44000 cpu seconds.
    0: begin of bicgstab 0.320771797557436878E-04
    0: RES of bicgstab 0.950087839437367948E-09 in 3 iterations
    0: begin of gcr 0.109450945243131875E-08
    0: RES of gcr 0.881479429351996220E-19 in 15 iterations
    0: Timing for processing for step 4: 0.50000 elapsed seconds.
    0: Timing for processing for step 4: 0.50000 cpu seconds.
    0: begin of bicgstab 0.193261775264966473E-04
    0: RES of bicgstab 0.985010942067601368E-08 in 2 iterations
    0: begin of gcr 0.996454289745865310E-08
    0: RES of gcr 0.365415647279281880E-19 in 17 iterations
    0: Timing for processing for step 5: 0.48000 elapsed seconds.
    0: Timing for processing for step 5: 0.49000 cpu seconds.
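
The per-step figures in the GRAPES-GLOBAL and GRAPES-MESO logs can be condensed with a few lines of arithmetic. The sketch below copies the elapsed times and iteration counts from the logs. Leaving step 1 out of the timing average is an assumption, made because its much larger time presumably includes one-time start-up work.

```python
# Summarize the logged timings and iteration counts.
# Data are copied from the slide logs; excluding step 1 from the timing
# average is an assumption (it appears to include start-up cost).

elapsed = {  # elapsed seconds for steps 2-5
    "GRAPES-GLOBAL": {"base": [3.56, 3.54, 3.39, 3.44],
                      "optimized": [2.50, 2.50, 2.50, 2.46]},
    "GRAPES-MESO": {"base": [0.90, 0.68, 0.67, 0.71],
                    "optimized": [0.67, 0.45, 0.50, 0.48]},
}

iterations = {  # steps 1-5; optimized runs list (precursor, GCR) pairs
    "GRAPES-GLOBAL": {"base": [54, 64, 55, 45, 49],
                      "optimized": [(16, 1), (11, 2), (12, 1), (12, 2), (12, 1)]},
    "GRAPES-MESO": {"base": [49, 49, 48, 47, 47],
                    "optimized": [(3, 16), (3, 16), (3, 16), (3, 15), (2, 17)]},
}

def mean(values):
    return sum(values) / len(values)

summary = {}
for model in elapsed:
    base_time = mean(elapsed[model]["base"])
    optimized_time = mean(elapsed[model]["optimized"])
    summary[model] = {
        "speedup": base_time / optimized_time,
        "base_iterations": sum(iterations[model]["base"]),
        "optimized_iterations": sum(pre + gcr for pre, gcr
                                    in iterations[model]["optimized"]),
    }

for model, row in summary.items():
    print(f"{model}: per-step speedup {row['speedup']:.2f}, "
          f"iterations {row['base_iterations']} -> {row['optimized_iterations']}")
```

An iteration of the precursor solver and an iteration of GCR do not cost the same, so the iteration totals are only a rough indicator; the elapsed times are the more direct measure.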

GRAPES-MESO Profile Comparison

Optimization Verification.

Accuracy of the computations.
How does one check the accuracy of computations in optimized codes?
– GRAPES-MESO accuracy verification was set for a 48-hour forecast.
– GRAPES-GLOBAL accuracy verification was set for a 10-day forecast.
Major changes were introduced into both the GRAPES-GLOBAL and GRAPES-MESO codes.
– Helmholtz's equation solution algorithm, Vector MASS in Microphysics routines.
Qualitative and quantitative verification methods.
– Visual inspection of the GRAPES-GLOBAL and GRAPES-MESO generated results.
– Apply statistics, and define limits for acceptable results.
Proceed slowly with caution.
– Correlation coefficients (ρ) between base (C) and optimized results (I).
– Area averaged normalized differences (σ) between base (C) and optimized results (I).
– 500 mb Geopotential Height (Φ) fields and Surface Precipitation are good candidates.
– KMA range for σ and ρ (ρ > 0.98), all models.
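
The two statistics can be sketched on a synthetic lat-lon field. The slides do not give the exact formulas, so the definitions below are assumptions: ρ is taken as an area-weighted Pearson correlation, σ as the area-weighted mean absolute difference divided by the area-weighted mean absolute base value, and the area weights as cos(latitude). The fields themselves are made up.

```python
import math

# Area-weighted correlation (rho) and normalized difference (sigma) between a
# base field C and an optimized field I on a regular lat-lon grid.
# Assumptions (not from the slides): cos(latitude) area weights, the sigma
# definition below, and the synthetic test fields.

def area_weights(lats_deg):
    return [math.cos(math.radians(lat)) for lat in lats_deg]

def weighted_mean(field, w):
    total = sum(wi * sum(row) for wi, row in zip(w, field))
    return total / (sum(w) * len(field[0]))

def correlation(C, I, w):
    mC, mI = weighted_mean(C, w), weighted_mean(I, w)
    cov = varC = varI = 0.0
    for wi, rowC, rowI in zip(w, C, I):
        for c, i in zip(rowC, rowI):
            cov += wi * (c - mC) * (i - mI)
            varC += wi * (c - mC) ** 2
            varI += wi * (i - mI) ** 2
    return cov / math.sqrt(varC * varI)

def normalized_difference(C, I, w):
    num = den = 0.0
    for wi, rowC, rowI in zip(w, C, I):
        for c, i in zip(rowC, rowI):
            num += wi * abs(i - c)
            den += wi * abs(c)
    return num / den

lats = [-75.0 + 15.0 * k for k in range(11)]      # -75 ... 75 degrees
lons = [10.0 * k for k in range(36)]              # 0 ... 350 degrees
w = area_weights(lats)

# Synthetic "base" heights and a slightly perturbed "optimized" field.
C = [[5500.0 + 100.0 * math.cos(math.radians(lat)) * math.sin(math.radians(lon))
      for lon in lons] for lat in lats]
I = [[c + 0.5 * math.cos(math.radians(2.0 * lon)) for c, lon in zip(row, lons)]
     for row in C]

rho = correlation(C, I, w)
sigma = normalized_difference(C, I, w)
accepted = rho > 0.98          # the rho threshold quoted on the slides
print(rho, sigma, accepted)
```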

GRAPES-MESO Verification

Base: 42-hour forecast. Optimized: 42-hour forecast.
500 mb Geopotential Height: σ and ρ are within acceptable range.
Surface Precipitation: σ and ρ are within acceptable range.

GRAPES-GLOBAL Verification

Global models for 10-day forecasts are impossible to verify.
– http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2013/06/25/new-weather-service-supercomputer-faces-chaos/
– GFS 7-day forecast differences between POWER6 and Intel systems at NCEP.
– Even a small change in compiler version, node count, system architecture or algorithm, or bit losses from using less accurate representations (Vector MASS), can cause a global weather model to diverge from base results beyond 7 forecast days.
– Global weather model verification beyond 7 days for ρ > 0.98 is hopeless.
– GRAPES-GLOBAL verification was examined from 1-10 days of forecast.
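
The sensitivity described above can be reproduced in miniature. The sketch below uses the Lorenz 1963 system as a stand-in for a weather model; it is not GRAPES or GFS. Two runs start 1e-10 apart, roughly the scale of a rounding-level change, and their separation is tracked over time. The parameter values, step size and initial state are conventional choices, not taken from the slides.

```python
# Lorenz-63 toy model: a rounding-sized difference in the initial state grows
# until the two runs are unrelated.
# Assumptions (not from the slides): classic parameters (10, 28, 8/3),
# fourth-order Runge-Kutta with dt = 0.01, initial state (1, 1, 1).

def lorenz(state, prandtl=10.0, rayleigh=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (prandtl * (y - x), x * (rayleigh - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    k1 = lorenz(state)
    k2 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def separation_history(eps, dt=0.01, steps=5000):
    """Distance between a base run and a run perturbed by eps, per step."""
    base = (1.0, 1.0, 1.0)
    perturbed = (1.0 + eps, 1.0, 1.0)
    history = []
    for _ in range(steps):
        base = rk4_step(base, dt)
        perturbed = rk4_step(perturbed, dt)
        history.append(sum((a - b) ** 2 for a, b in zip(base, perturbed)) ** 0.5)
    return history

seps = separation_history(1e-10)
print("after 50 steps:", seps[49])
print("largest separation:", max(seps))
```

Early in the run the two trajectories are indistinguishable; later the separation reaches the size of the attractor itself, which is the toy analogue of two global forecasts that agree for the first days and then differ.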

10-Day GRAPES-GLOBAL verification.

Correlation coefficients and Area Averaged Differences are used to compare runs.
– 192-core unmodified code runs were used as base for comparisons.
– 10-day forecasts of the 500 mb Geopotential Heights for 2048 cores, unmodified.
– 10-day forecasts of the 500 mb Geopotential Heights for 4096 cores, modified.
– Microphysics (WSM6), BiCGSTAB, and a combination of both were tested.
– VSX intrinsic calls were introduced and tested in the BiCGSTAB routine.
– Vector MASS in WSM6 drives the forecast in a slightly different direction.

GRAPES-GLOBAL: 10-DAY Geopotential Heights Forecast.

10-day 500 mb Geopotential Heights Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD)
Figure panels: Unoptimized Run, 2048 cores, 500 mb Geopotential Heights; Optimized Run, 4096 cores, 500 mb Geopotential Heights.

GRAPES-GLOBAL: 10-DAY Surface Precipitation Forecast.

10-day Surface Precipitation Forecast.
– 2048-core unmodified code, 4096-core optimized code (WSM6, BiCGSTAB_SIMD)
Figure panels: Unoptimized Run, 2048 cores, Surface Precipitation; Optimized Run, 4096 cores, Surface Precipitation.

Summary and Conclusions.

The GRAPES-GLOBAL and GRAPES-MESO models were optimized for performance.
– Both models used the Generalized Conjugate Residual (GCR) iterative solver.
  GCR: very efficient code, moderate convergence rates.
– The Bi-conjugate Gradient Stabilized (BiCGSTAB) iterative solver was introduced.
  BiCGSTAB: less efficient code, but fast convergence rates.
– The stand-alone BiCGSTAB solver did not improve performance.
  When BiCGSTAB was used ahead of GCR, significant improvements were realized.
  Increased accuracy, as seen from convergence residuals.
  Fewer total iterations to achieve convergence, better overall performance.
– Vector MASS intrinsic functions were applied in the microphysics routines.
Accuracy verification was a challenge for GRAPES-GLOBAL for forecasts of up to 10 days.
– GRAPES-MESO verified successfully for 7 days, unlike WSM6.
– VSX primitives (single precision) in BiCGSTAB were not critical to either performance or accuracy.
