1Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresPresenter:AoMaAjointworkwithFredDouglis,GuanlinLu,DarrenSawyerSurendarChandra,WindsorHsu2Copyright2015EMCCorporation.
Allrightsreserved.
DiskfailuresarecommonplaceWhole-diskfailurePartialfailureRAIDiswidelydeployedProtectdataagainstfailureswithredundancyPervasiveRAIDProtection3Copyright2015EMCCorporation.
Allrightsreserved.
StoragesystemisevolvingEscalateduseoflessreliabledrivescausesmorewhole-diskfailuresIncreasingdiskcapacityresultsinmoresectorerrorsSolutionAddextraredundancy(RAID5,RAID6,…)–EnsuredatareliabilityatthecostofstorageefficiencyRAIDOverviewIsaddingextraredundancyanefficientsolution4Copyright2015EMCCorporation.
Allrightsreserved.
Analyzed1millionSATAdisksandrevealedFailuremodesdegradingRAIDreliabilityReallocatedsectorsreflectdiskreliabilitydeteriorationDiskfailureispredictableBuiltRAIDSHIELD,anactivedefensemechanismReconstructfailingdiskbeforeit'stoolate!
PLATE:single-diskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroupsWhatWeDid5Copyright2015EMCCorporation.
Allrightsreserved.
BackgroundDiskfailureanalysisRAIDSHIELD:IdentifyfailureindicatorReallocatedSector(RS)characterizationSinglediskproactiveprotectionDiskgroupproactiveprotection5Outline6Copyright2015EMCCorporation.
Allrightsreserved.
Diskfailuredoesnotfollowafail-stopmodelTheproductionsystemsstudieddefinefailureasConnectionislostAnoperationexceedsthetimeoutthresholdWritefailsWhole-diskFailureDefinition7Copyright2015EMCCorporation.
Allrightsreserved.
EachdiskdrivemodelisdenotedasRelativesizeswithinafamilyareorderedbythecapacitynumber–E.
g.
A-2islargerthanA-1DiskModelPopulation(Thousands)FirstDeploymentLogLength(Months)A-13406/200860A-216511/200860B-110006/200848C-19310/201036C-225312/201036D-138409/201121DiskDataCollection8Copyright2015EMCCorporation.
Allrightsreserved.
WhatDoRealDiskFailuresLookLike9Copyright2015EMCCorporation.
Allrightsreserved.
0001424462031010203040500-612-1824-3036-4248-54A-2MonthDistributionofLifetimeofFailedDrives001241534291130102030400-612-1824-3036-4248-54A-1MonthAlargefractionoffaileddrivesarefoundatasimilaragePercentage(%)Percentage(%)10Copyright2015EMCCorporation.
Allrightsreserved.
ThenumberofaffecteddiskskeepgrowingAbout10%ofdisksgetsectorerrorsatthe3rdyearSectorerrornumbersincreasescontinuouslyAverageerrorcountincreases25%to300%yearoveryearIncreasingFrequencyofSectorErrors11Copyright2015EMCCorporation.
Allrightsreserved.
DrivefailingatasimilarageFailurerateisnotconstantAhighriskofmultiplesimultaneousfailuresIncreasingfrequencyofsectorerrorsExacerbateriskofreconstructionfailuresPassiveRedundancyisInefficientEnsuringreliabilityintheworstcaserequiresaddingconsiderableextraredundancy,makingitunattractivefromacostperspective12Copyright2015EMCCorporation.
Allrightsreserved.
MotivationEnsuredatasafetywithminimalredundancyProactivelyrecognizeimpendingfailuresandmigratevulnerabledatainadvanceMethodologyIdentifyindicatorofimpendingfailureIndicatorcharacterizationProactiveprotectionRAIDSHIELD,TheProactiveProtection13Copyright2015EMCCorporation.
Allrightsreserved.
PotentialindicatorsVariousdiskerrorsCriteriaofagoodindicatorIthappensmuchmorefrequentlyonfaileddisksratherthanworkingdisksApproachQuantifythediscriminationbetweenerrorvalueonfaileddisksandworkingones–DecilescomparisonisusedIdentifyFailureIndicator14Copyright2015EMCCorporation.
Allrightsreserved.
FaileddiskshavemoremediaerrorsthanworkingonesThediscriminationisnotsignificantenoughMediaErrorComparison2359152232478611123471330020406080100123456789faileddiskworkingdiskDecilesA-2MediaErrorCount15Copyright2015EMCCorporation.
Allrightsreserved.
2238718732752281212422025000001262905001000150020002500123456789faileddiskworkingdiskA-2DecilesReallocatedSectorCountA-2DecilesRSisstronglycorrelatedwithdiskfailuresReallocatedSector(RS)Comparison16Copyright2015EMCCorporation.
Allrightsreserved.
MostfaileddrivestendtohavealargernumberofRSthanworkingonesRSisstronglycorrelatedwithwhole-diskfailures,followedbymediaerrors,pendingsectorerrorsanduncorrectablesectorerrorsCorrelationBetweenSectorErrorsAndWhole-diskFailureRSisastrongindicatorofimpendingdiskfailure17Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScountimplieshigherfailurerateintwo-monthwindowDiskFailureRateGivenDifferentRSCount1.
767758083858689909091929393949502040608010004080120160200240280320360400440480520560600DiskFailureRate(%)RScountA-2RSCharacterization(1)18Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScount,fastertofail10%25%median75%90%RSCountTimeMargin(Days)DiskFailureTimeGivenDifferentRSCountRSCharacterization(2)19Copyright2015EMCCorporation.
Allrightsreserved.
RScountindicatesthedegreeofdiskreliabilitydeteriorationUsetheRScounttopredictimpendingdiskfailureinadvancePLATE:SingleDiskProactiveProtection20Copyright2015EMCCorporation.
Allrightsreserved.
70.
166.
66461.
859.
952.
14742.
63936.
94.
52.
82.
11.
71.
40.
80.
70.
40.
30.
27010203040506070809010020406080100200300400500600failurespredictedfalsepositivePercentage(%)BoththepredictedfailureandfalsepositiveratesdecreaseasthethresholdincreasesSimulationResult:FailuresCapturedRateGivenDifferentRSThresholdRSthreshold21Copyright2015EMCCorporation.
Allrightsreserved.
551515801070020406080100WithoutProactiveProtectionWithProactiveProtectionHardwareFailuresOthersTripleFailuresEliminatedTripleFailuresSingleproactiveprotectionreducesabout70%ofRAIDfailures,equivalentto88%ofthetriple-diskfailuresPLATEDeploymentResult:CausesofRecoveryIncidentsPercentage(%)22Copyright2015EMCCorporation.
Allrightsreserved.
10%remainingtriplefailuresPLATEmissesRAIDfailurescausedbymultiplelessreliabledrives,whoseRScountshaven'texceedthethresholdTriagePrioritizediskgroupswithhighestriskMotivationofARMOR:TheRAIDGroupProactiveProtection23Copyright2015EMCCorporation.
Allrightsreserved.
1-11-21-31-42-12-22-32-43-13-23-33-44-14-24-34-4XXXXThreatofFailureImminentFailureGoodDiskHealthyDG1ImminentFailureofDG2ProtectedDG3PossibleFailureofDG4Singlediskprotection:Replace2-3,2-4,3-4(PLATE)Can'tidentifyDG4northedifferencebetweenDG2andDG3Groupprotection:ReplaceDG4orincreaseredundancy(ARMOR)ProtectDG4andrecognizethedifferencebetweenDG2andDG3DiskGroupProtectionExample24Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityConditionalprobabilitythroughBayesTheoremCalculatetheprobabilityofavulnerableRAIDCombinationofthosesinglediskprobabilitiesthroughjointprobabilityARMORMethodology25Copyright2015EMCCorporation.
Allrightsreserved.
ThediscriminationshowsARMORiseffectivetorecognizeendangeredDGsInpractice,itidentifiesmostDGfailuresthatarenotpredictedbyPLATEProbabilityDecilesdistributionEvaluation0.
250.
330.
390.
440.
460.
50.
630.
730.
930.
150.
20.
230.
250.
270.
280.
30.
310.
3200.
20.
40.
60.
81123456789GroupswithmorethanonefailureGroupswithoutfailure26Copyright2015EMCCorporation.
Allrightsreserved.
GooglereportsSMARTmetricssuchasreallocatedsectorstronglysuggestanimpendingfailure,buttheyalsodeterminethathalfofthefaileddisksshownosucherrors[Pinheiro'07]DifferentworkloadandRAIDrewriteDiskfailurepredictionAveragemaximumlatency[Goldszmidt'12]SMARTfailureprediction[Murray'05,Hughes'02]RelatedWork27Copyright2015EMCCorporation.
Allrightsreserved.
Weanalyzed1millionSATAdrivesObservefailuremodesdegradingRAIDreliabilityRevealRScountreflectsthediskreliabilitydeteriorationDiskfailureispredictableWebuiltRAIDSHIELD,anactivedefensemechanismPLATE:singlediskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroups–HopetodeployinfutureIsaddingextraredundancyanefficientsolutionUseasmuchredundancyasneededtoensureavailabilityProactivereplacementshoulddecreasethelevelneededSummary28Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresQuestionsAcknowledgementAndreaArpaci-DusseauandRemziArpaci-DusseauDataDomainengineerteam,membersofADandCTOoffice,StephenManley29Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityCalculatetheprobabilityofavulnerableRAID
最近主机参考拿到了一台恒创科技的美国VPS云服务器测试机器,那具体恒创科技美国云服务器性能到底怎么样呢?主机参考进行了一番VPS测评,大家可以参考一下,总体来说还是非常不错的,是值得购买的。非常适用于稳定建站业务需求。恒创科技服务器怎么样?恒创科技服务器好不好?henghost怎么样?henghost值不值得购买?SonderCloud服务器好不好?恒创科技henghost值不值得购买?恒创科技是...
DMIT,最近动作频繁,前几天刚刚上架了日本lite版VPS,正在酝酿上线日本高级网络VPS,又差不多在同一时间推出了美国cn2 gia线路不限流量的美国云服务器,不过价格太过昂贵。丐版只有30M带宽,月付179.99 美元 !!目前美国云服务器已经有个4个套餐,分别是,Premium(cn2 gia线路)、Lite(普通直连)、Premium Secure(带高防的cn2 gia线路),Prem...
atcloud怎么样?atcloud刚刚发布了最新的8折优惠码,该商家主要提供常规cloud(VPS)和storage(大硬盘存储)系列VPS,其数据中心分布在美国(俄勒冈、弗吉尼亚)、加拿大、英国、法国、德国、新加坡,所有VPS默认提供480Gbps的超高DDoS防御。Atcloud高防VPS。atcloud.net,2020年成立,主要提供基于KVM虚拟架构的VPS、只能DNS解析、域名、SS...
阵列卡为你推荐
ip购买买一个电信的固定IP多少钱啊?leavemealone女孩说leave me alone还有戏吗?求帮助甲骨文不满赔偿劳动法员工工作不满一个月辞退赔偿标准百花百游百花蛇草的作用百度关键词分析如何正确分析关键词?www.baitu.com韩国片爱人.欲望的观看地址www.vtigu.com初三了,为什么考试的数学题都那么难,我最多也就135,最后一道选择,填空啊根本没法做,最后几道大题倒www.vtigu.com如图,已知四边形ABCD是平行四边形,下列条件:①AC=BD,②AB=AD,③∠1=∠2④AB⊥BC中,能说明平行四边形16668.com香港最快开奖现场直播今晚开kb123.net股市里的STAQ、NET市场是什么?
域名备案中心 256m内存 站群服务器 nerd 好玩的桌面 阿里云代金券 帽子云 国外免费全能空间 美国免费空间 qq金券 免费个人网页 google搜索打不开 贵州电信 hosts文件修改 达拉斯 大硬盘补丁 瓦工招聘 竞彩论坛空间 网络带宽测试 19寸机柜尺寸 更多