1Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresPresenter:AoMaAjointworkwithFredDouglis,GuanlinLu,DarrenSawyerSurendarChandra,WindsorHsu2Copyright2015EMCCorporation.
Allrightsreserved.
DiskfailuresarecommonplaceWhole-diskfailurePartialfailureRAIDiswidelydeployedProtectdataagainstfailureswithredundancyPervasiveRAIDProtection3Copyright2015EMCCorporation.
Allrightsreserved.
StoragesystemisevolvingEscalateduseoflessreliabledrivescausesmorewhole-diskfailuresIncreasingdiskcapacityresultsinmoresectorerrorsSolutionAddextraredundancy(RAID5,RAID6,…)–EnsuredatareliabilityatthecostofstorageefficiencyRAIDOverviewIsaddingextraredundancyanefficientsolution4Copyright2015EMCCorporation.
Allrightsreserved.
Analyzed1millionSATAdisksandrevealedFailuremodesdegradingRAIDreliabilityReallocatedsectorsreflectdiskreliabilitydeteriorationDiskfailureispredictableBuiltRAIDSHIELD,anactivedefensemechanismReconstructfailingdiskbeforeit'stoolate!
PLATE:single-diskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroupsWhatWeDid5Copyright2015EMCCorporation.
Allrightsreserved.
BackgroundDiskfailureanalysisRAIDSHIELD:IdentifyfailureindicatorReallocatedSector(RS)characterizationSinglediskproactiveprotectionDiskgroupproactiveprotection5Outline6Copyright2015EMCCorporation.
Allrightsreserved.
Diskfailuredoesnotfollowafail-stopmodelTheproductionsystemsstudieddefinefailureasConnectionislostAnoperationexceedsthetimeoutthresholdWritefailsWhole-diskFailureDefinition7Copyright2015EMCCorporation.
Allrightsreserved.
EachdiskdrivemodelisdenotedasRelativesizeswithinafamilyareorderedbythecapacitynumber–E.
g.
A-2islargerthanA-1DiskModelPopulation(Thousands)FirstDeploymentLogLength(Months)A-13406/200860A-216511/200860B-110006/200848C-19310/201036C-225312/201036D-138409/201121DiskDataCollection8Copyright2015EMCCorporation.
Allrightsreserved.
WhatDoRealDiskFailuresLookLike9Copyright2015EMCCorporation.
Allrightsreserved.
0001424462031010203040500-612-1824-3036-4248-54A-2MonthDistributionofLifetimeofFailedDrives001241534291130102030400-612-1824-3036-4248-54A-1MonthAlargefractionoffaileddrivesarefoundatasimilaragePercentage(%)Percentage(%)10Copyright2015EMCCorporation.
Allrightsreserved.
ThenumberofaffecteddiskskeepgrowingAbout10%ofdisksgetsectorerrorsatthe3rdyearSectorerrornumbersincreasescontinuouslyAverageerrorcountincreases25%to300%yearoveryearIncreasingFrequencyofSectorErrors11Copyright2015EMCCorporation.
Allrightsreserved.
DrivefailingatasimilarageFailurerateisnotconstantAhighriskofmultiplesimultaneousfailuresIncreasingfrequencyofsectorerrorsExacerbateriskofreconstructionfailuresPassiveRedundancyisInefficientEnsuringreliabilityintheworstcaserequiresaddingconsiderableextraredundancy,makingitunattractivefromacostperspective12Copyright2015EMCCorporation.
Allrightsreserved.
MotivationEnsuredatasafetywithminimalredundancyProactivelyrecognizeimpendingfailuresandmigratevulnerabledatainadvanceMethodologyIdentifyindicatorofimpendingfailureIndicatorcharacterizationProactiveprotectionRAIDSHIELD,TheProactiveProtection13Copyright2015EMCCorporation.
Allrightsreserved.
PotentialindicatorsVariousdiskerrorsCriteriaofagoodindicatorIthappensmuchmorefrequentlyonfaileddisksratherthanworkingdisksApproachQuantifythediscriminationbetweenerrorvalueonfaileddisksandworkingones–DecilescomparisonisusedIdentifyFailureIndicator14Copyright2015EMCCorporation.
Allrightsreserved.
FaileddiskshavemoremediaerrorsthanworkingonesThediscriminationisnotsignificantenoughMediaErrorComparison2359152232478611123471330020406080100123456789faileddiskworkingdiskDecilesA-2MediaErrorCount15Copyright2015EMCCorporation.
Allrightsreserved.
2238718732752281212422025000001262905001000150020002500123456789faileddiskworkingdiskA-2DecilesReallocatedSectorCountA-2DecilesRSisstronglycorrelatedwithdiskfailuresReallocatedSector(RS)Comparison16Copyright2015EMCCorporation.
Allrightsreserved.
MostfaileddrivestendtohavealargernumberofRSthanworkingonesRSisstronglycorrelatedwithwhole-diskfailures,followedbymediaerrors,pendingsectorerrorsanduncorrectablesectorerrorsCorrelationBetweenSectorErrorsAndWhole-diskFailureRSisastrongindicatorofimpendingdiskfailure17Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScountimplieshigherfailurerateintwo-monthwindowDiskFailureRateGivenDifferentRSCount1.
767758083858689909091929393949502040608010004080120160200240280320360400440480520560600DiskFailureRate(%)RScountA-2RSCharacterization(1)18Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScount,fastertofail10%25%median75%90%RSCountTimeMargin(Days)DiskFailureTimeGivenDifferentRSCountRSCharacterization(2)19Copyright2015EMCCorporation.
Allrightsreserved.
RScountindicatesthedegreeofdiskreliabilitydeteriorationUsetheRScounttopredictimpendingdiskfailureinadvancePLATE:SingleDiskProactiveProtection20Copyright2015EMCCorporation.
Allrightsreserved.
70.
166.
66461.
859.
952.
14742.
63936.
94.
52.
82.
11.
71.
40.
80.
70.
40.
30.
27010203040506070809010020406080100200300400500600failurespredictedfalsepositivePercentage(%)BoththepredictedfailureandfalsepositiveratesdecreaseasthethresholdincreasesSimulationResult:FailuresCapturedRateGivenDifferentRSThresholdRSthreshold21Copyright2015EMCCorporation.
Allrightsreserved.
551515801070020406080100WithoutProactiveProtectionWithProactiveProtectionHardwareFailuresOthersTripleFailuresEliminatedTripleFailuresSingleproactiveprotectionreducesabout70%ofRAIDfailures,equivalentto88%ofthetriple-diskfailuresPLATEDeploymentResult:CausesofRecoveryIncidentsPercentage(%)22Copyright2015EMCCorporation.
Allrightsreserved.
10%remainingtriplefailuresPLATEmissesRAIDfailurescausedbymultiplelessreliabledrives,whoseRScountshaven'texceedthethresholdTriagePrioritizediskgroupswithhighestriskMotivationofARMOR:TheRAIDGroupProactiveProtection23Copyright2015EMCCorporation.
Allrightsreserved.
1-11-21-31-42-12-22-32-43-13-23-33-44-14-24-34-4XXXXThreatofFailureImminentFailureGoodDiskHealthyDG1ImminentFailureofDG2ProtectedDG3PossibleFailureofDG4Singlediskprotection:Replace2-3,2-4,3-4(PLATE)Can'tidentifyDG4northedifferencebetweenDG2andDG3Groupprotection:ReplaceDG4orincreaseredundancy(ARMOR)ProtectDG4andrecognizethedifferencebetweenDG2andDG3DiskGroupProtectionExample24Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityConditionalprobabilitythroughBayesTheoremCalculatetheprobabilityofavulnerableRAIDCombinationofthosesinglediskprobabilitiesthroughjointprobabilityARMORMethodology25Copyright2015EMCCorporation.
Allrightsreserved.
ThediscriminationshowsARMORiseffectivetorecognizeendangeredDGsInpractice,itidentifiesmostDGfailuresthatarenotpredictedbyPLATEProbabilityDecilesdistributionEvaluation0.
250.
330.
390.
440.
460.
50.
630.
730.
930.
150.
20.
230.
250.
270.
280.
30.
310.
3200.
20.
40.
60.
81123456789GroupswithmorethanonefailureGroupswithoutfailure26Copyright2015EMCCorporation.
Allrightsreserved.
GooglereportsSMARTmetricssuchasreallocatedsectorstronglysuggestanimpendingfailure,buttheyalsodeterminethathalfofthefaileddisksshownosucherrors[Pinheiro'07]DifferentworkloadandRAIDrewriteDiskfailurepredictionAveragemaximumlatency[Goldszmidt'12]SMARTfailureprediction[Murray'05,Hughes'02]RelatedWork27Copyright2015EMCCorporation.
Allrightsreserved.
Weanalyzed1millionSATAdrivesObservefailuremodesdegradingRAIDreliabilityRevealRScountreflectsthediskreliabilitydeteriorationDiskfailureispredictableWebuiltRAIDSHIELD,anactivedefensemechanismPLATE:singlediskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroups–HopetodeployinfutureIsaddingextraredundancyanefficientsolutionUseasmuchredundancyasneededtoensureavailabilityProactivereplacementshoulddecreasethelevelneededSummary28Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresQuestionsAcknowledgementAndreaArpaci-DusseauandRemziArpaci-DusseauDataDomainengineerteam,membersofADandCTOoffice,StephenManley29Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityCalculatetheprobabilityofavulnerableRAID
LightNode官网LightNode是一家位于香港的VPS服务商.提供基于KVM虚拟化技术的VPS.在提供全球常见节点的同时,还具备东南亚地区、中国香港等边缘节点.满足开发者建站,游戏应用,外贸电商等应用场景的需求。为用户带来高性能服务器以及优质的服务的同时还提供丰厚的促销活动,新用户注册最高送$20。注册用户带新客即可得10%返佣。商家支持PayPal,支付宝等支付方式。官网:https:/...
这几天有几个网友询问到是否有Windows VPS主机便宜的VPS主机商。原本他们是在Linode、Vultr主机商挂载DD安装Windows系统的,有的商家支持自定义WIN镜像,但是这些操作起来特别效率低下,每次安装一个Windows系统需要一两个小时,所以如果能找到比较合适的自带Windows系统的服务器那最好不过。这不看到PacificRack商家有提供夏季促销活动,其中包括年付便宜套餐的P...
pacificrack发布了7月最新vps优惠,新款促销便宜vps采用的是魔方管理,也就是PR-M系列。提一下有意思的是这次支持Windows server 2003、2008R2、2012R2、2016、2019、Windows 7、Windows 10,当然啦,常规Linux系统是必不可少的!1Gbps带宽、KVM虚拟、纯SSD raid10、自家QN机房洛杉矶数据中心...支持PayPal、...
阵列卡为你推荐
psbc.comwww.psbc.com怎样注册rawtoolsRAW是什么衣服牌子haole018.com为什么www.haole008.com在我这里打不开啊,是不是haole008换新的地址了?haokandianyingwang谁有好看电影网站啊、要无毒播放速度快的、在线等mole.61.com谁知道摩尔庄园的网址啊125xx.com115xx.com是什么意思百度指数词百度指数我创建的新词www.5any.com我想去重庆上大学www.5any.comwww.qbo5.com 这个网站要安装播放器www.bbb551.com100bbb网站怎样上不去了
国外主机 xfce evssl qingyun 有益网络 宁波服务器 isp服务商 流媒体加速 银盘服务 空间登录首页 西安服务器托管 贵阳电信测速 华为k3 lamp是什么意思 金主 美国迈阿密 杭州电信 cdn服务 小夜博客 免 更多