sufficientopteron
opteron 时间:2021-03-27 阅读:(
)
MellanoxTechnologiesInc.
2900StenderWay,SantaClara,CA95054Tel:408-970-3400Fax:408-970-3403http://www.
mellanox.
comRealApplicationPerformanceandBeyondWhitePaper:RealApplicationPerformanceandBeyond2006MellanoxTechnologiesInc.
2Scientists,engineersandanalystsinvirtuallyeveryfieldareturningtohighperformancecomputingtosolvetoday'svitalandcomplexproblems.
Simulationsareincreasinglyreplacingexpensivephysicaltesting,asmorecomplexenvironmentscanbemodeledandinsomecases,fullysimulated.
High-performancecomputingencompassesadvancedcomputationoverparallelprocessing,enablingfasterexecutionofhighlycomputeintensivetaskssuchasclimateresearch,molecularmodeling,physicalsimulations,cryptanalysis,geophysicalmodeling,automotiveandaerospacedesign,financialmodeling,dataminingandmore.
HPCclustershavebecomethemostcommonbuildingblocksforhigh-performancecomputing,notonlybecausetheyareaffordable,butbecausetheyprovidetheneededflexibilityanddeliversuperiorprice/performancecomparedtoproprietarysymmetricmultiprocessing(SMP)systems,withthesimplicityandvalueofindustrystandardcomputing.
MauiHighPerformanceComputingCenter1280servers,MellanoxInfiniBandinterconnect,42.
3TFlopsReal-worldapplicationperformancedependsontheperformanceofthevariouscluster'skeyelements–theprocessor,thememory,andtheinterconnect.
Theinterconnectcontrolsthedatatransferbetweenservers,andhasahighinfluenceontheCPUefficiencyandmemoryutilization.
Transportoffloadinterconnectarchitectures,unlikethe"on-loading"ones,eliminatetheneedofdealingwiththeprotocolprocessingwithintheCPUandthereforeincreasethenumberofcyclesavailableforcomputationaltasks.
IftheCPUisbusymovingdataandhandlingnetworkprotocolprocessing,itisunabletoperformcomputationalwork,andtheoverallproductivityofthesystemisseverelydegraded.
Thememorycopyoverheadincludestheresourcesrequiredtocopydatabuffersfromthenetworkdevicetothekernelmemoryandthenfromthekernelmemorytotheapplicationmemory.
Thisapproachrequiresmultiplememoryaccessesbeforethedataisplacedinitsfinaldestination.
Whileitisnotamajorproblemforsmalldatatransfers,itisabigproblemforlargerdatatransfers.
Thisiswheretheinterconnectzero-copycapabilitieseliminatesthememorybandwidthbottleneckwithoutinvolvingtheCPUinthenetworkdatatransfer.
WhitePaper:RealApplicationPerformanceandBeyond2006MellanoxTechnologiesInc.
3SandiaNationalLab4500servers,MellanoxInfiniBandinterconnect53TFlops,84.
66%LinpackefficiencyTheinterconnectbandwidthandlatencyhavetraditionallybeenusedastwometricsforassessingtheperformanceofthesystem'sinterconnectfabric.
However,thesetwometricsaretypicallynotsufficienttodeterminetheperformanceofrealworldapplications.
Typicalreal-worldapplicationssendmessagesrangingfrom64Byteto4Megabyteusingnotonlypoint-to-pointcommunicationbutadiversemixtureofcommunicationpatterns,includingcollectiveandreductionpatternsinthecaseofMPI.
Insomecases,interconnectvendorscreateartificialbenchmarks,suchasmessagerate,andapplybombasticmarketingsloganstothesebenchmarks–suchas"Hypermessaging".
Messagerateisyetanothersinglepointinthepoint-to-pointbandwidthgraph.
Ifthetraditionalinterconnectbandwidthindicatesthemaximumavailablebandwidth(singlepoint),messagerateindicatesthebandwidthformessagesizeofzeroor2bytes.
Thesinglepointsofdata,givesomeindicationfortheinterconnectperformance,butarefarfromdescribingtherealworldapplicationperformance.
Theinteractivecombinationofthosepoints,togetherwithothers(CPUoverhead,zerocopyetc.
),willdeterminetheoverallabilityoftheconnectivitysolution.
Thedifferencebetweentheoreticalpowerandwhatisactuallydeliveredismeasuredasprocessorefficiency.
ThemoreCPUcyclesusedtogetthedataoutthedoorby"fillingthewire"duetoprotocolanddatatransferinefficiencies,thelesscyclesareavailablefortheapplication.
Whencomparinglatenciesofdifferentinterconnects,oneneedstopayattentiontotheinterconnectarchitecture.
1useclatency"on-loading"interconnectversus2useclatency"off-load"solutionissimilartoacasewhenoneneedstodecidebetweentwocarsthatshowthesamehorsepower(i.
e.
CPU).
Bothenginesarecapableof200milesperhour,butthefirstcar,dueto"on-loading",limitstheactualenginepowerto75milesperhour(theenginepowermustbeusedforothertasks).
TheSecondcarhasnolimitationsontheengine,butitswheelscantolerateonly150milesWhitePaper:RealApplicationPerformanceandBeyond2006MellanoxTechnologiesInc.
4perhour.
Theknowledgeonthewheelstolerance(i.
e.
latency),asasinglepointofdata,isdefinitelymisleading.
Thereareattemptstoproviderealworldapplicationperformancewhilecomparingdifferentinterconnects,butinmostcasesthe"comparison"isbiasedandbyusingdifferentsystemsand/orconditions,whichmakesatruecomparisondifficult.
Therehavebeenrecentcasescomparing10-GigabitEthernettoInfiniBand.
WhileInfiniBandadaptersweretestedwithPCIex4(thatislimitedto~700MByte/secbandwidth(duetolimitationsincurrentavailablesystems),the10GigabitEthernetcardswerePCI-X,thatiscapabletohigherbandwidth(~850-900MByte/s).
OthercasescompareInfiniBandPCIex4tootherinterconnectswithPCIex8hostinterface(theonlyvalidconclusiononecanmakeisthatPCIex8hasmorelanesthanPCIex4).
AnotherpapercomparedQLogicInfiniPathonIntel3GHzCPUbasedsystemtoMellanoxInfiniBandon2.
2GHzOpteronbasedsystem.
Anyattempttocomparedifferentinterconnectsinthosemannersisdeceptive.
RealapplicationperformanceInfiniBandisaproveninterconnectforclusteredserversolutions,andoneoftheleadingconnectivitysolutionforhigh-performancecomputing.
InfiniBandwasdesignedasageneralI/Oandinpracticeprovideslow-latencyandthehighestlinkspeed.
ComputationalFluidDynamics(CFD)isoneofthebranchesoffluidmechanicsthatusesnumericalmethodsandalgorithmstosolveandanalyzeproblemsthatinvolvefluidflows.
ANSYS/FLUENTisaleadingcommercialsoftwareproviderforsolvingfluidflowproblems.
ThebroadphysicalmodelingcapabilitiesofFLUENThavebeenappliedtoindustrialapplicationsrangingfromairflowoveranaircraftwingtocombustioninafurnace,frombubblecolumnstoglassproduction,frombloodflowtosemiconductormanufacturing,fromcleanroomdesigntowastewatertreatmentplants.
Theabilityofthesoftwaretomodelin-cylinderengines,aeroacoustics,turbomachinery,andmultiphasesystemshasservedtobroadenitsreach.
AtthecoreofanyCFDcalculationisacomputationalgrid,usedtodividethesolutiondomainintothousandsormillionsofelementswheretheproblemvariablesarecomputedandstored.
InFLUENT,unstructuredgridtechnologyisused,whichmeansthatthegridcanconsistofelementsinavarietyofshapes:quadrilateralsandtrianglesfor2Dsimulations,andhexahedral,tetrahedral,prisms,andpyramidsfor3Dsimulations.
Theseelementsformaninterlockingnetworkthroughoutthevolumewherethefluidflowanalysistakesplace.
TheperformanceofaCFDcodedependsonseveralfactors,includingsizeandtopologyofthemesh,physicalmodels,numericsandparallelization,compilersandoptimization,inadditiontoperformancecharacteristicsofthehardwarewherethesimulationisperformed.
FLUENTprovidesasetofbenchmarkproblemswhichrepresenttypicalcurrentusageandcoveringawiderangeofmeshsizesandphysicalmodels.
Theproblemsselectedrepresentarangeofsimulationstypicalofthosewhichmightbefoundinindustry.
TheprincipalobjectiveofthisbenchmarksuiteistoprovidecomprehensiveandfaircomparativeinformationoftheperformanceofFLUENTonavailablehardwareplatforms.
ThefollowingchartscomparesMellanoxInfiniBandandQLogicInfiniPathinterconnectsonthesameplatform–dualcore,dualsocket,IntelXeon3GHz5100series(codenameWoodcrest)servers,usingFLUENTbenchmarks.
Whentestingrealworldapplications,theentirearchitecturemakesthedifference.
TheMellanoxarchitectureisafulltransport-offloadone,withhardwarecapabilitiesofRDMA,whileQLogicisafull"on-loading"architecture.
WhitePaper:RealApplicationPerformanceandBeyond2006MellanoxTechnologiesInc.
5InFluentFL5L3benchmark,aTurbulentflowofairthroughaductiscomputed.
Thecross-sectionalplanesoftheducttransitionfromacircleattheinlettoarectangleattheoutflowboundary.
TheReynolds-StressModelisusedforcomputingturbulence(numberofcells:9,792,512,celltypehexahedral,modelsRSMturbulence,solversegregatedimplicit).
FLUENTFL5L2benchmarkrepresentsthecomputationoftheexteriorflowfieldaroundasimplifiedmodelofapassengersedan.
ThesimulationgeometrywasusedfortheJapanExternalAerodynamicscompetition.
Aviscous-hybridgridwithprismaticcellsisusedtoadequatelyFluent6.
3,FL5L3case0200400600800100012001400160018002000020406080100120140CPUcoresRating(performance)QlogicMellanoxFluent6.
3,FL5L2case02000400060008000020406080CPUcoresRating(performance)QlogicMellanoxWhitePaper:RealApplicationPerformanceandBeyond2006MellanoxTechnologiesInc.
6modeltheboundarylayerregions(numberofcells3,618,080,celltypehybrid,modelsk-epsilonturbulence,solversegregatedimplicit).
ChoosingtherightinterconnectInbothcasesofFLUENTbenchmarks,MellanoxInfiniBandshowshigherperformanceandbettersuper-linearscalingcomparingtoQLogicInfiniPath.
FLUENT'sCFDapplicationisalatency-sensitiveapplication,andtheresultsshownherearegoodexamplesonhowpurelatencybenchmarkscanbemisleadingwhenchoosingtherightinterconnect.
Inordertodeterminethesystem'sperformance,oneshouldtakeintoconsiderationtheentireinterconnectarchitecture(suchasoff-loadingversuson-loading)andtheabilityofscaling,ratherthanjustsinglepointsofdata.
Inordertoprovidebetterapplicationssight,MellanoxhascreatedtheMellanoxClusterCenter.
TheMellanoxClusterCenteroffersanenvironmentfordeveloping,testing,benchmarkingandoptimizingproductsbasedonInfiniBandtechnology.
Thecenter,locatedinSantaClara,California,provideson-sitetechnicalsupportandenablessecuresessionsonsiteorremotely.
MoredetailscanbeachievedthroughMellanoxwebsite.
主机参考最新消息:JustHost怎么样?JustHost服务器好不好?JustHost好不好?JustHost是一家成立于2006年的俄罗斯服务器提供商,支持支付宝付款,服务器价格便宜,200Mbps大带宽不限流量,支持免费更换5次IP,支持控制面板自由切换机房,目前JustHost有俄罗斯5个机房可以自由切换选择,最重要的还是价格真的特别便宜,最低只需要87卢布/月,约8.5元/月起!just...
菠萝云国人商家,今天分享一下菠萝云的广州移动机房的套餐,广州移动机房分为NAT套餐和VDS套餐,NAT就是只给端口,共享IP,VDS有自己的独立IP,可做站,商家给的带宽起步为200M,最高给到800M,目前有一个8折的优惠,另外VDS有一个下单立减100元的活动,有需要的朋友可以看看。菠萝云优惠套餐:广州移动NAT套餐,开放100个TCP+UDP固定端口,共享IP,8折优惠码:gzydnat-8...
Webhosting24宣布自7月1日起开始对日本机房的VPS进行NVMe和流量大升级,几乎是翻倍了硬盘和流量,价格依旧不变。目前来看,日本VPS国内过去走的是NTT直连,服务器托管机房应该是CDN77*(也就是datapacket.com),加上高性能平台(AMD Ryzen 9 3900X+NVMe),还是有相当大的性价比的。此外在6月30日,又新增了洛杉矶机房,CPU为AMD Ryzen 9...
opteron为你推荐
硬盘的工作原理硬盘的工作原理?是怎样存取数据的?bbs.99nets.com怎么打造完美SF月神谭求男变女类的变身小说网站检测如何进行网站全面诊断javmoo.com找下载JAV软件格式的网站m.kan84.net经常使用http://www.feikan.cc看电影的进来帮我下啊邯郸纠风网邯郸媒体曝光电话多少莱姿蔓蕊姿蔓是什么样的牌子来的苗惟妮和空姐一起的日子23集全集在线看下载电视剧分集剧情介绍两朝太岁冲犯太岁什么意思
过期域名 tk域名注册 韩国vps 联通vps 腾讯云数据库 国外bt realvnc parseerror 搜狗抢票助手 搜狗12306抢票助手 卡巴斯基永久免费版 怎么测试下载速度 seednet 已备案删除域名 me空间社区 域名接入 联通网站 架设邮件服务器 带宽租赁 全能空间 更多