Intel: Accelerating the Path to Exascale
Kirk Skaugen
Vice President, Intel Architecture Group
General Manager, Data Center Group

An Insatiable Need For Computing
Exascale Problems Cannot Be Solved Using the Computing Power Available Today
[Chart: performance from 100 MFlops to 1 ZFlops over the years 1993 to 2029, with forecast; workloads marked: Weather Prediction, Medical Imaging, Genomics Research. Source: www.top500.org]

Exascale Answers Mankind's Challenges In…
Weather/Climate
Healthcare
New Forms of Energy

We've Helped Transform Industries
[Charts: Performance, $/GFLOP and Annual Server Processor Shipments, 1995 to 2015; labels: Supercomputing in 1997, Supercomputing in 2010; callouts: ~1 TFLOP, ~$55K/GFLOP, 500 TFLOPS]

Intel Commitment To Exascale
Programming Parallelism
Efficient Performance
Extreme Scalability
Intel Exascale Commitment: >100X Performance Of Today At Only 2X The Power of Today's #1 System
Scaling Today's Software Model

Exascale Requirements
Petascale Machine of 2010: TFLOP of Compute
Estimation based on Petascale machine requirements circa 2010.
Compute: 40x
Memory: 75x
Comms: 20x
Disk/Storage: 33x
Other: 900x
Visceral Focus on System Power Efficiency Improvement

Scaling Programmability
One Programming Model Democratizes Usage… Avoid Costly Detours

Process Technology Leadership
2003, 90 nm: Invented SiGe Strained Silicon
2005, 65 nm: 2nd Gen. SiGe Strained Silicon
2007, 45 nm: Invented Gate-Last High-k Metal Gate
2009, 32 nm: 2nd Gen. Gate-Last High-k Metal Gate
2011, 22 nm: First to Implement Tri-Gate
[Timeline bands: STRAINED SILICON, HIGH-k METAL GATE, TRI-GATE]

22 nm: A Revolutionary Leap in Process Technology
37% Performance Gain at Low Voltage*
>50% Active Power Reduction at Constant Performance*
Process Technology Leadership: The foundation for all computing
Source: Intel
*Compared to Intel 32 nm Technology

Intel Labs & HPC
Strong Research Partnerships: Universities, Government, Industry
World Class Research in HPC
*Other names, logos and brands may be claimed as the property of others.

Delivering Breakthrough Technologies to Fuel Innovation
Powerful. Intelligent.
Efficient I/O: Integrated PCIe reduces latency and power
Growing Performance: Up to 8 cores per socket; 2X FLOPS with Intel Advanced Vector Extensions
Continuing The Journey: Next Intel Xeon Processor Codenamed Sandy Bridge-EP
The Foundation of the Innovation in Science and Technology

Highly Parallel Performance
Intel Many Integrated Core (Intel MIC) Architecture
Launching on 22 nm with >50 cores to provide outstanding performance for HPC users

A Step Forward In Dealing With Efficient Performance & Programmability
The many benefits of broad Intel CPU programming models, techniques, and familiar x86 developer tools
Delivered Performance
The compute density associated with specialty accelerators for parallel workloads
[Diagram labels: Programmability, Performance Density]

Evaluating the Intel MIC Architecture
Arndt Bode
Leibniz Supercomputing Centre, Germany
with input from Iris Christadler, Alexander Heinecke and Volker Weinberg
June 2011, ISC, Hamburg
Evaluating the Intel MIC Architecture, Prof. A. Bode, LRZ, June 2011

Preface
Programming models are the key to harnessing the computational power of massively parallel devices.
Obviously, Intel has realized this trend and substantially supports open standards and invests in innovative programming models.
LRZ and TUM have been using Intel hardware and software for many years and know the toolchain by heart.
We expect: A hardware product that delivers good performance (and energy efficiency) without losing programmability.

Advantages of the MIC Architecture
Is a standard x86 architecture!
Allows many different parallel programming models like OpenMP, MPI and Intel Cilk!
Offers standard math libraries like Intel MKL!
Supports whole Intel toolchain, e.g. Compiler & Debugger!
Writing MIC-accelerated code with minimal effort and great performance

Workloads under Investigation
Euroben Kernels (7 dwarfs of HPC)
Data Mining
TifaMMy – Matrix Operations (Demo here at ISC'11!)
Further Linear Algebra and Simulation Codes

Euroben Kernels
Selected micro-benchmarks used in PRACE for the evaluation of accelerator hardware & new languages: http://www.prace-project.eu/documents/public-deliverables/d6-6.pdf
– Example: mod2am: dense matrix-matrix multiplication (MxM)
Performance evaluation of mod2am on KNF with 30 cores @ 1050 MHz using Intel's Offload Compiler, single precision, data transfer times excluded

Data Mining with Adaptive Sparse Grids
Machine learning algorithm
Learning a function from a training dataset
Important workload for classification and regression of huge datasets
MIC execution: straightforward
First version within a few hours
Optimized version took 2 days
[Bar chart, GFlops/s, axis 0 to 450; categories: WSM-EP X5670, KNF 32/1200 (incl. offload); data labels: 150, 420]
Test workload: Learning 5d checkerboard with 262144 instances and classification accuracy of 92%

TifaMMy – Idea and Application
TifaMMy: self-adaptive and cache-oblivious framework for matrix operations optimized on fat x86 cores
This is done by nested recursions and vectorized kernels
– On MIC only the kernels were changed; MIC's x86 cores are able to tackle nested recursions!
The parallelization scheme employing OpenMP can be reused
With SSE kernels in place, bringing code to MIC is nearly for free

TifaMMy – Performance Matrix Multiplication
[Line chart: GFLOPS (axis 0 to 700) versus Matrix Size (32 to 7648); series: Max]
Test workload: TifaMMy
Executed on KNF with 32 cores @ 1200 MHz

Advantages of the MIC Architecture
Is a standard x86 architecture!
Allows many different parallel programming models like OpenMP, MPI and Intel Cilk!
Offers standard math libraries like Intel MKL!
Supports whole Intel toolchain, e.g. Compiler & Debugger!
Pre-release MIC-accelerated code for a typical scientific workload (e.g. Data Mining, TifaMMy) can reach up to 50% of peak performance!
Visit demo here at ISC'11!

"SGI understands the significance of inter-processor communications, power, density and usability when architecting for exascale. Intel has made the leap towards exaflop computing with the introduction of Intel Many Integrated Core (MIC) architecture. Future Intel MIC products will satisfy all four of these priorities, especially with their expected ten times increase in compute density coupled with their familiar X86 programming environment."
Dr. Eng Lim Goh, SGI CTO

Intel MIC Architecture: Needed for Exascale
Exaflop by 2018
125x compute power
25x: Moore's Law
5x: remains

Intel MIC Architecture: Familiar x86 Programming
    #include [header name lost in extraction]
    #include [header name lost in extraction]
    #define N 1000000000LL
    main() {
        double pi = 0.0f; long i;
        #pragma offload target(mic)
        #pragma omp parallel for reduction(+:pi)
        for (i = 0; i [remainder of listing lost in extraction]

[...] >100X Performance Of Today At Only 2X The Power Of Today's #1
Scaling Today's Software Model

System Configuration: 7 TFLOPS SGEMM in a node
HW specifications: 8x KNF D0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.6 GT/s
Host: Colfax CXT8000: 2 socket platform with 2 Intel Xeon processor X5690 (3.46 GHz, 6 cores, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, Dual Intel 5520 IOH, OS RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– Colfax Model: CXT8000 Server w/ Intel 5520 chipset and 4 PLX PEX8647 Gen2 PCIe switches
– Intel Alpha level software (Intel Compilers, drivers etc.)

System Configuration: Hybrid Computing with Intel MKL
HW specifications: 1x KNF D0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.6 GT/s
Host: Shady Cove 2 socket platform with 2 Intel Xeon processor X5680 (3.33 GHz, 6 cores, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, single Intel 5520 IOH, OS: RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Intel Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– Knights Ferry Software Development Platform (Shady Cove)
– Intel Alpha level software (Intel Compilers, Intel MKL, drivers etc.)
SW specifications: MKL4KNF: MKL KNF.b2 build 20110518; MKL 10.3.3

System Configuration: Hybrid Computing LU Factorization
HW specifications: 1x KNF D0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.6 GT/s
Host: Shady Cove 2 socket platform with 2 Intel Xeon processor X5680 (3.33 GHz, 6 cores, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, single Intel 5520 IOH, OS: RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Intel Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– Knights Ferry Software Development Platform (Shady Cove)
– Intel Alpha level software (Intel Compilers, drivers etc.)

System Configuration: KISTI Molecular Dynamics
HW specifications: 1x KNF C0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.0 GT/s
Host: Dell Precision Workstation 1 socket platform with 1 Intel Xeon processor X5620 (4 cores, 2.4 GHz, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, single Intel 5520 IOH, OS: RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Intel Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– Dell Precision Workstation
– Intel Alpha level software (Intel Compilers, drivers etc.)

System Configuration: CERN openlab: Core Scaling of Intel MIC Architecture
HW specifications: 1x KNF C0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.0 GT/s
Host: SGI H4002 2 socket platform with 2 Intel Xeon processor X5690 (6 cores, 3.46 GHz, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, single Intel 5520 IOH, OS: RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Intel Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– SGI H4002 System
– Intel Alpha level software (Intel Compilers, drivers etc.)

System Configuration: LRZ: TifaMMy Matrix Multiplication
HW specifications: 1x KNF C0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.0 GT/s
Host: Shady Cove 2 socket platform with 2 Intel Xeon processor X5680 (3.33 GHz, 6 cores, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, single Intel 5520 IOH, OS: RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Intel Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– Knights Ferry Software Development Platform (Shady Cove)
– Intel Alpha level software (Intel Compilers, drivers etc.)

System Configuration: FZ Jülich: SMMP Protein Folding
HW specifications: 1x KNF C0 Si @ 1.2 GHz, 2 GB GDDR5 @ 3.0 GT/s
Host: Shady Cove 2 socket platform with 2 Intel Xeon processor X5680 (3.33 GHz, 6 cores, 12 MB L3 cache) with 24 GB DDR3 @ 1333 MHz, single Intel 5520 IOH, OS: RHEL 6.0
KNF SW Stack: Larrabee kernel driver ver. 1.6.197; Flash Image/uOS: 1.0.0.1137/1.0.0.1137-EXT-HPC; Offload compiler (w/ data xfer): Intel Composer XE for MIC 0.043; Native compiler (w/o data xfer): Version Alpha Build 20110518
– Knights Ferry Software Development Platform (Shady Cove)
– Intel Alpha level software (Intel Compilers, drivers etc.)