1Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresPresenter:AoMaAjointworkwithFredDouglis,GuanlinLu,DarrenSawyerSurendarChandra,WindsorHsu2Copyright2015EMCCorporation.
Allrightsreserved.
DiskfailuresarecommonplaceWhole-diskfailurePartialfailureRAIDiswidelydeployedProtectdataagainstfailureswithredundancyPervasiveRAIDProtection3Copyright2015EMCCorporation.
Allrightsreserved.
StoragesystemisevolvingEscalateduseoflessreliabledrivescausesmorewhole-diskfailuresIncreasingdiskcapacityresultsinmoresectorerrorsSolutionAddextraredundancy(RAID5,RAID6,…)–EnsuredatareliabilityatthecostofstorageefficiencyRAIDOverviewIsaddingextraredundancyanefficientsolution4Copyright2015EMCCorporation.
Allrightsreserved.
Analyzed1millionSATAdisksandrevealedFailuremodesdegradingRAIDreliabilityReallocatedsectorsreflectdiskreliabilitydeteriorationDiskfailureispredictableBuiltRAIDSHIELD,anactivedefensemechanismReconstructfailingdiskbeforeit'stoolate!
PLATE:single-diskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroupsWhatWeDid5Copyright2015EMCCorporation.
Allrightsreserved.
BackgroundDiskfailureanalysisRAIDSHIELD:IdentifyfailureindicatorReallocatedSector(RS)characterizationSinglediskproactiveprotectionDiskgroupproactiveprotection5Outline6Copyright2015EMCCorporation.
Allrightsreserved.
Diskfailuredoesnotfollowafail-stopmodelTheproductionsystemsstudieddefinefailureasConnectionislostAnoperationexceedsthetimeoutthresholdWritefailsWhole-diskFailureDefinition7Copyright2015EMCCorporation.
Allrightsreserved.
EachdiskdrivemodelisdenotedasRelativesizeswithinafamilyareorderedbythecapacitynumber–E.
g.
A-2islargerthanA-1DiskModelPopulation(Thousands)FirstDeploymentLogLength(Months)A-13406/200860A-216511/200860B-110006/200848C-19310/201036C-225312/201036D-138409/201121DiskDataCollection8Copyright2015EMCCorporation.
Allrightsreserved.
WhatDoRealDiskFailuresLookLike9Copyright2015EMCCorporation.
Allrightsreserved.
0001424462031010203040500-612-1824-3036-4248-54A-2MonthDistributionofLifetimeofFailedDrives001241534291130102030400-612-1824-3036-4248-54A-1MonthAlargefractionoffaileddrivesarefoundatasimilaragePercentage(%)Percentage(%)10Copyright2015EMCCorporation.
Allrightsreserved.
ThenumberofaffecteddiskskeepgrowingAbout10%ofdisksgetsectorerrorsatthe3rdyearSectorerrornumbersincreasescontinuouslyAverageerrorcountincreases25%to300%yearoveryearIncreasingFrequencyofSectorErrors11Copyright2015EMCCorporation.
Allrightsreserved.
DrivefailingatasimilarageFailurerateisnotconstantAhighriskofmultiplesimultaneousfailuresIncreasingfrequencyofsectorerrorsExacerbateriskofreconstructionfailuresPassiveRedundancyisInefficientEnsuringreliabilityintheworstcaserequiresaddingconsiderableextraredundancy,makingitunattractivefromacostperspective12Copyright2015EMCCorporation.
Allrightsreserved.
MotivationEnsuredatasafetywithminimalredundancyProactivelyrecognizeimpendingfailuresandmigratevulnerabledatainadvanceMethodologyIdentifyindicatorofimpendingfailureIndicatorcharacterizationProactiveprotectionRAIDSHIELD,TheProactiveProtection13Copyright2015EMCCorporation.
Allrightsreserved.
PotentialindicatorsVariousdiskerrorsCriteriaofagoodindicatorIthappensmuchmorefrequentlyonfaileddisksratherthanworkingdisksApproachQuantifythediscriminationbetweenerrorvalueonfaileddisksandworkingones–DecilescomparisonisusedIdentifyFailureIndicator14Copyright2015EMCCorporation.
Allrightsreserved.
FaileddiskshavemoremediaerrorsthanworkingonesThediscriminationisnotsignificantenoughMediaErrorComparison2359152232478611123471330020406080100123456789faileddiskworkingdiskDecilesA-2MediaErrorCount15Copyright2015EMCCorporation.
Allrightsreserved.
2238718732752281212422025000001262905001000150020002500123456789faileddiskworkingdiskA-2DecilesReallocatedSectorCountA-2DecilesRSisstronglycorrelatedwithdiskfailuresReallocatedSector(RS)Comparison16Copyright2015EMCCorporation.
Allrightsreserved.
MostfaileddrivestendtohavealargernumberofRSthanworkingonesRSisstronglycorrelatedwithwhole-diskfailures,followedbymediaerrors,pendingsectorerrorsanduncorrectablesectorerrorsCorrelationBetweenSectorErrorsAndWhole-diskFailureRSisastrongindicatorofimpendingdiskfailure17Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScountimplieshigherfailurerateintwo-monthwindowDiskFailureRateGivenDifferentRSCount1.
767758083858689909091929393949502040608010004080120160200240280320360400440480520560600DiskFailureRate(%)RScountA-2RSCharacterization(1)18Copyright2015EMCCorporation.
Allrightsreserved.
LargerRScount,fastertofail10%25%median75%90%RSCountTimeMargin(Days)DiskFailureTimeGivenDifferentRSCountRSCharacterization(2)19Copyright2015EMCCorporation.
Allrightsreserved.
RScountindicatesthedegreeofdiskreliabilitydeteriorationUsetheRScounttopredictimpendingdiskfailureinadvancePLATE:SingleDiskProactiveProtection20Copyright2015EMCCorporation.
Allrightsreserved.
70.
166.
66461.
859.
952.
14742.
63936.
94.
52.
82.
11.
71.
40.
80.
70.
40.
30.
27010203040506070809010020406080100200300400500600failurespredictedfalsepositivePercentage(%)BoththepredictedfailureandfalsepositiveratesdecreaseasthethresholdincreasesSimulationResult:FailuresCapturedRateGivenDifferentRSThresholdRSthreshold21Copyright2015EMCCorporation.
Allrightsreserved.
551515801070020406080100WithoutProactiveProtectionWithProactiveProtectionHardwareFailuresOthersTripleFailuresEliminatedTripleFailuresSingleproactiveprotectionreducesabout70%ofRAIDfailures,equivalentto88%ofthetriple-diskfailuresPLATEDeploymentResult:CausesofRecoveryIncidentsPercentage(%)22Copyright2015EMCCorporation.
Allrightsreserved.
10%remainingtriplefailuresPLATEmissesRAIDfailurescausedbymultiplelessreliabledrives,whoseRScountshaven'texceedthethresholdTriagePrioritizediskgroupswithhighestriskMotivationofARMOR:TheRAIDGroupProactiveProtection23Copyright2015EMCCorporation.
Allrightsreserved.
1-11-21-31-42-12-22-32-43-13-23-33-44-14-24-34-4XXXXThreatofFailureImminentFailureGoodDiskHealthyDG1ImminentFailureofDG2ProtectedDG3PossibleFailureofDG4Singlediskprotection:Replace2-3,2-4,3-4(PLATE)Can'tidentifyDG4northedifferencebetweenDG2andDG3Groupprotection:ReplaceDG4orincreaseredundancy(ARMOR)ProtectDG4andrecognizethedifferencebetweenDG2andDG3DiskGroupProtectionExample24Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityConditionalprobabilitythroughBayesTheoremCalculatetheprobabilityofavulnerableRAIDCombinationofthosesinglediskprobabilitiesthroughjointprobabilityARMORMethodology25Copyright2015EMCCorporation.
Allrightsreserved.
ThediscriminationshowsARMORiseffectivetorecognizeendangeredDGsInpractice,itidentifiesmostDGfailuresthatarenotpredictedbyPLATEProbabilityDecilesdistributionEvaluation0.
250.
330.
390.
440.
460.
50.
630.
730.
930.
150.
20.
230.
250.
270.
280.
30.
310.
3200.
20.
40.
60.
81123456789GroupswithmorethanonefailureGroupswithoutfailure26Copyright2015EMCCorporation.
Allrightsreserved.
GooglereportsSMARTmetricssuchasreallocatedsectorstronglysuggestanimpendingfailure,buttheyalsodeterminethathalfofthefaileddisksshownosucherrors[Pinheiro'07]DifferentworkloadandRAIDrewriteDiskfailurepredictionAveragemaximumlatency[Goldszmidt'12]SMARTfailureprediction[Murray'05,Hughes'02]RelatedWork27Copyright2015EMCCorporation.
Allrightsreserved.
Weanalyzed1millionSATAdrivesObservefailuremodesdegradingRAIDreliabilityRevealRScountreflectsthediskreliabilitydeteriorationDiskfailureispredictableWebuiltRAIDSHIELD,anactivedefensemechanismPLATE:singlediskproactiveprotection–Deploymenteliminates70%ofRAIDfailuresARMOR:diskgroupproactiveprotection–RecognizevulnerableRAIDgroups–HopetodeployinfutureIsaddingextraredundancyanefficientsolutionUseasmuchredundancyasneededtoensureavailabilityProactivereplacementshoulddecreasethelevelneededSummary28Copyright2015EMCCorporation.
Allrightsreserved.
RAIDShield:Characterizing,Monitoring,andProactivelyProtectingAgainstDiskFailuresQuestionsAcknowledgementAndreaArpaci-DusseauandRemziArpaci-DusseauDataDomainengineerteam,membersofADandCTOoffice,StephenManley29Copyright2015EMCCorporation.
Allrightsreserved.
CalculatethesinglediskfailureprobabilityCalculatetheprobabilityofavulnerableRAID
Hostodo是一家成立于2014年的国外VPS主机商,现在主要提供基于KVM架构的VPS主机,美国三个地区机房:拉斯维加斯、迈阿密和斯波坎,采用NVMe或者SSD磁盘,支持支付宝、PayPal、加密货币等付款方式。商家最近对于上架不久的斯波坎机房SSD硬盘VPS主机提供66折优惠码,适用于1GB或者以上内存套餐年付,最低每年12美元起。下面列出几款套餐配置信息。CPU:1core内存:256MB...
公司成立于2021年,专注为用户提供低价高性能云计算产品,致力于云计算应用的易用性开发,面向全球客户提供基于云计算的IT解决方案与客户服务,拥有丰富的国内BGP、三线高防、香港等优质的IDC资源。公司一直秉承”以人为本、客户为尊、永续创新”的价值观,坚持”以微笑收获友善, 以尊重收获理解,以责任收获支持,以谦卑收获成长”的行为观向客户提供全面优质的互...
RAKsmart发布了9月份优惠促销活动,从9月1日~9月30日期间,爆款美国服务器每日限量抢购最低$30.62-$46/月起,洛杉矶/圣何塞/香港/日本站群大量补货特价销售,美国1-10Gbps大带宽不限流量服务器低价热卖等。RAKsmart是一家华人运营的国外主机商,提供的产品包括独立服务器租用和VPS等,可选数据中心包括美国加州圣何塞、洛杉矶、中国香港、韩国、日本、荷兰等国家和地区数据中心(...
阵列卡为你推荐
小度商城小度智能屏Air哪里可以买?大家都怎么入手的?openeuler手机里的安全性open.wpapsk分别是什么意思www.hao360.cn搜狗360导航网址是什么www.20ren.com有什么好看的电影吗?来几个…lunwenjiancepaperfree论文检测安全吗haokandianyingwang谁给个好看的电影网站看看。789se.com莫非现在的789mmm珍的com不管了百度指数词百度指数我创建的新词抓站工具抓鸡要什么工具?广告法中华人民共和国广告法中,有哪些广告不得发布?
二级域名查询 krypt 鲜果阅读 远程登陆工具 服务器架设 网站挂马检测工具 铁通流量查询 上海域名 合租空间 移动服务器托管 东莞主机托管 广州虚拟主机 万网空间 服务器硬件配置 电信主机托管 谷歌搜索打不开 九零网络 ncp 免费网站加速 防盗链 更多