1Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresPresenter:AoMaAjointworkwithFredDouglis,GuanlinLu,DarrenSawyerSurendarChandra,WindsorHsu2Copyright2015EMCCorporation.
Allrightsreserved.
DiskfailuresarecommonplaceWhole-diskfailurePartialfailureRAIDiswidelydeployedProtectdataagainstfailureswithredundancyPervasiveRAIDProtection3Copyright2015EMCCorporation.
Allrightsreserved.
StoragesystemisevolvingEscalateduseoflessreliabledrivescausesmorewhole-diskfailuresIncreasingdiskcapacityresultsinmoresectorerrorsSolutionAddextraredundancy(RAID5,RAID6,…)–EnsuredatareliabilityatthecostofstorageefficiencyRAIDOverviewIsaddingextraredundancyanefficientsolution4Copyright2015EMCCorporation.
Allrightsreserved.
Analyzed1millionSATAdisksandrevealedFailuremodesdegradingRAIDreliabilityReallocatedsectorsreflectdiskreliabilitydeteriorationDiskfailureispredictableBuiltRAIDSHIELD,anactivedefensemechanismReconstructfailingdiskbeforeit'stoolate!
PLATE:single-diskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroupsWhatWeDid5Copyright2015EMCCorporation.
Allrightsreserved.
BackgroundDiskfailureanalysisRAIDSHIELD:IdentifyfailureindicatorReallocatedSector(RS)characterizationSinglediskproactiveprotectionDiskgroupproactiveprotection5Outline6Copyright2015EMCCorporation.
Allrightsreserved.
Diskfailuredoesnotfollowafail-stopmodelTheproductionsystemsstudieddefinefailureasConnectionislostAnoperationexceedsthetimeoutthresholdWritefailsWhole-diskFailureDefinition7Copyright2015EMCCorporation.
Allrightsreserved.
EachdiskdrivemodelisdenotedasRelativesizeswithinafamilyareorderedbythecapacitynumber–E.
g.
A-2islargerthanA-1DiskModelPopulation(Thousands)FirstDeploymentLogLength(Months)A-13406/200860A-216511/200860B-110006/200848C-19310/201036C-225312/201036D-138409/201121DiskDataCollection8Copyright2015EMCCorporation.
Allrightsreserved.
WhatDoRealDiskFailuresLookLike9Copyright2015EMCCorporation.
Allrightsreserved.
0001424462031010203040500-612-1824-3036-4248-54A-2MonthDistributionofLifetimeofFailedDrives001241534291130102030400-612-1824-3036-4248-54A-1MonthAlargefractionoffaileddrivesarefoundatasimilaragePercentage(%)Percentage(%)10Copyright2015EMCCorporation.
Allrightsreserved.
ThenumberofaffecteddiskskeepgrowingAbout10%ofdisksgetsectorerrorsatthe3rdyearSectorerrornumbersincreasescontinuouslyAverageerrorcountincreases25%to300%yearoveryearIncreasingFrequencyofSectorErrors11Copyright2015EMCCorporation.
Allrightsreserved.
DrivefailingatasimilarageFailurerateisnotconstantAhighriskofmultiplesimultaneousfailuresIncreasingfrequencyofsectorerrorsExacerbateriskofreconstructionfailuresPassiveRedundancyisInefficientEnsuringreliabilityintheworstcaserequiresaddingconsiderableextraredundancy,makingitunattractivefromacostperspective12Copyright2015EMCCorporation.
Allrightsreserved.
MotivationEnsuredatasafetywithminimalredundancyProactivelyrecognizeimpendingfailuresandmigratevulnerabledatainadvanceMethodologyIdentifyindicatorofimpendingfailureIndicatorcharacterizationProactiveprotectionRAIDSHIELD,TheProactiveProtection13Copyright2015EMCCorporation.
Allrightsreserved.
PotentialindicatorsVariousdiskerrorsCriteriaofagoodindicatorIthappensmuchmorefrequentlyonfaileddisksratherthanworkingdisksApproachQuantifythediscriminationbetweenerrorvalueonfaileddisksandworkingones–DecilescomparisonisusedIdentifyFailureIndicator14Copyright2015EMCCorporation.
Allrightsreserved.
FaileddiskshavemoremediaerrorsthanworkingonesThediscriminationisnotsignificantenoughMediaErrorComparison2359152232478611123471330020406080100123456789faileddiskworkingdiskDecilesA-2MediaErrorCount15Copyright2015EMCCorporation.
Allrightsreserved.
2238718732752281212422025000001262905001000150020002500123456789faileddiskworkingdiskA-2DecilesReallocatedSectorCountA-2DecilesRSisstronglycorrelatedwithdiskfailuresReallocatedSector(RS)Comparison16Copyright2015EMCCorporation.
Allrightsreserved.
MostfaileddrivestendtohavealargernumberofRSthanworkingonesRSisstronglycorrelatedwithwhole-diskfailures,followedbymediaerrors,pendingsectorerrorsanduncorrectablesectorerrorsCorrelationBetweenSectorErrorsAndWhole-diskFailureRSisastrongindicatorofimpendingdiskfailure17Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScountimplieshigherfailurerateintwo-monthwindowDiskFailureRateGivenDifferentRSCount1.
767758083858689909091929393949502040608010004080120160200240280320360400440480520560600DiskFailureRate(%)RScountA-2RSCharacterization(1)18Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScount,fastertofail10%25%median75%90%RSCountTimeMargin(Days)DiskFailureTimeGivenDifferentRSCountRSCharacterization(2)19Copyright2015EMCCorporation.
Allrightsreserved.
RScountindicatesthedegreeofdiskreliabilitydeteriorationUsetheRScounttopredictimpendingdiskfailureinadvancePLATE:SingleDiskProactiveProtection20Copyright2015EMCCorporation.
Allrightsreserved.
70.
166.
66461.
859.
952.
14742.
63936.
94.
52.
82.
11.
71.
40.
80.
70.
40.
30.
27010203040506070809010020406080100200300400500600failurespredictedfalsepositivePercentage(%)BoththepredictedfailureandfalsepositiveratesdecreaseasthethresholdincreasesSimulationResult:FailuresCapturedRateGivenDifferentRSThresholdRSthreshold21Copyright2015EMCCorporation.
Allrightsreserved.
551515801070020406080100WithoutProactiveProtectionWithProactiveProtectionHardwareFailuresOthersTripleFailuresEliminatedTripleFailuresSingleproactiveprotectionreducesabout70%ofRAIDfailures,equivalentto88%ofthetriple-diskfailuresPLATEDeploymentResult:CausesofRecoveryIncidentsPercentage(%)22Copyright2015EMCCorporation.
Allrightsreserved.
10%remainingtriplefailuresPLATEmissesRAIDfailurescausedbymultiplelessreliabledrives,whoseRScountshaven'texceedthethresholdTriagePrioritizediskgroupswithhighestriskMotivationofARMOR:TheRAIDGroupProactiveProtection23Copyright2015EMCCorporation.
Allrightsreserved.
1-11-21-31-42-12-22-32-43-13-23-33-44-14-24-34-4XXXXThreatofFailureImminentFailureGoodDiskHealthyDG1ImminentFailureofDG2ProtectedDG3PossibleFailureofDG4Singlediskprotection:Replace2-3,2-4,3-4(PLATE)Can'tidentifyDG4northedifferencebetweenDG2andDG3Groupprotection:ReplaceDG4orincreaseredundancy(ARMOR)ProtectDG4andrecognizethedifferencebetweenDG2andDG3DiskGroupProtectionExample24Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityConditionalprobabilitythroughBayesTheoremCalculatetheprobabilityofavulnerableRAIDCombinationofthosesinglediskprobabilitiesthroughjointprobabilityARMORMethodology25Copyright2015EMCCorporation.
Allrightsreserved.
ThediscriminationshowsARMORiseffectivetorecognizeendangeredDGsInpractice,itidentifiesmostDGfailuresthatarenotpredictedbyPLATEProbabilityDecilesdistributionEvaluation0.
250.
330.
390.
440.
460.
50.
630.
730.
930.
150.
20.
230.
250.
270.
280.
30.
310.
3200.
20.
40.
60.
81123456789GroupswithmorethanonefailureGroupswithoutfailure26Copyright2015EMCCorporation.
Allrightsreserved.
GooglereportsSMARTmetricssuchasreallocatedsectorstronglysuggestanimpendingfailure,buttheyalsodeterminethathalfofthefaileddisksshownosucherrors[Pinheiro'07]DifferentworkloadandRAIDrewriteDiskfailurepredictionAveragemaximumlatency[Goldszmidt'12]SMARTfailureprediction[Murray'05,Hughes'02]RelatedWork27Copyright2015EMCCorporation.
Allrightsreserved.
Weanalyzed1millionSATAdrivesObservefailuremodesdegradingRAIDreliabilityRevealRScountreflectsthediskreliabilitydeteriorationDiskfailureispredictableWebuiltRAIDSHIELD,anactivedefensemechanismPLATE:singlediskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroups–HopetodeployinfutureIsaddingextraredundancyanefficientsolutionUseasmuchredundancyasneededtoensureavailabilityProactivereplacementshoulddecreasethelevelneededSummary28Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresQuestionsAcknowledgementAndreaArpaci-DusseauandRemziArpaci-DusseauDataDomainengineerteam,membersofADandCTOoffice,StephenManley29Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityCalculatetheprobabilityofavulnerableRAID
A400互联怎么样?A400互联是一家成立于2020年的商家,A400互联是云服务器网(yuntue.com)首次发布的云主机商家。本次A400互联给大家带来的是,全新上线的香港节点,cmi+cn2线路,全场香港产品7折优惠,优惠码0711,A400互联,只为给你提供更快,更稳,更实惠的套餐,香港节点上线cn2+cmi线路云服务器,37.8元/季/1H/1G/10M/300G,云上日子,你我共享。...
HaBangNet支持支付宝和微信支付,只是价格偏贵,之前国内用户并不多。这次HaBangNet推出三个特价套餐,其中美国机房和德国机房价格也还可以,但是香港机房虽然是双向CN2 GIA线路,但是还是贵的惊人,需要美国和德国机房的可以参考下。HaBangNet是一家成立于2014年的香港IDC商家,中文译名:哈邦网络公司,主营中国香港、新加坡、澳大利亚、荷兰、美国、德国机房的虚拟主机、vps、专用...
官方网站:点击访问青果云官方网站活动方案:—————————–活动规则—————————1、选购活动产品并下单(先不要支付)2、联系我司在线客服修改价格或领取赠送时间3、确认价格已按活动政策修改正确后,支付订单,到此产品开设成功4、本活动产品可以升级,升级所需费用按产品原价计算若发生退款,按资源实际使用情况折算为产品原价再退还剩余余额! 美国洛杉矶CN2_GIACPU内存系统盘流量宽带i...
阵列卡为你推荐
neworiental上海新东方有几个校区,分别是那几个?access数据库access数据库主要学什么www.33xj.compro/engineer 在哪里下载,为什么找不到下载网站?www.03ggg.comwww.tvb33.com这里好像有中国性戏观看吧??www.idanmu.com腾讯有qqsk.zik.mu这个网站吗?www.idanmu.com万通奇迹,www.wcm77.HK 是传销么?www.ca800.comPLC好学吗www.175qq.com这表情是什么?梦遗姐男人梦遗,女人会吗?hao.rising.cnIE主页被瑞星绑架http://hao.rising.cn//?b=84主页明明设置的是百度但打开后是瑞星导航,
南通服务器租用 联通vps 域名商 美国主机论坛 灵动鬼影 京东商城0元抢购 七夕促销 183是联通还是移动 域名和空间 phpmyadmin配置 常州联通宽带 视频服务器是什么 便宜空间 浙江服务器 中国联通宽带测试 阿里dns 卡巴斯基官网下载 带宽测试 七牛云存储 存储服务器 更多