tirersync
rsync 时间:2021-01-12 阅读:(
)
TAPER:TieredApproachforEliminatingRedundancyinReplicaSynchronizationNavenduJain,MikeDahlin,andRenuTewari§DepartmentofComputerSciences,UniversityofTexasatAustin,Austin,TX,78712§IBMAlmadenResearchCenter,650HarryRoad,SanJose,CA,95111{nav,dahlin}@cs.
utexas.
edu,tewarir@us.
ibm.
comAbstractWepresentTAPER,ascalabledatareplicationprotocolthatsynchronizesalargecollectionofdataacrossmulti-plegeographicallydistributedreplicalocations.
TAPERcanbeappliedtoabroadrangeofsystems,suchassoft-waredistributionmirrors,contentdistributionnetworks,backupandrecovery,andfederatedlesystems.
TA-PERisdesignedtobebandwidthefcient,scalableandcontent-based,anditdoesnotrequirepriorknowledgeofthereplicastate.
Toachievetheseproperties,TA-PERprovides:i)fourpluggableredundancyeliminationphasesthatbalancethetrade-offbetweenbandwidthsav-ingsandcomputationoverheads,ii)ahierarchicalhashtreebaseddirectorypruningphasethatquicklymatchesidenticaldatafromthegranularityofdirectorytreestoindividualles,iii)acontent-basedsimilaritydetectiontechniqueusingBloomlterstoidentifysimilarles,andiv)acombinationofcoarse-grainedchunkmatchingwithner-grainedblockmatchestoachievebandwidthefciency.
Throughextensiveexperimentsonvariousdatasets,weobservethatincomparisonwithrsync,awidely-useddirectorysynchronizationtool,TAPERre-ducesbandwidthby15%to71%,performsfastermatch-ing,andscalestoalargernumberofreplicas.
1IntroductionInthispaperwedescribeTAPER,aredundancyelimina-tionprotocolforreplicasynchronization.
OurmotivationforTAPERarosefrombuildingafederatedlesystemusingNFSv4servers,eachsharingacommonsystem-widenamespace[25].
Inthissystem,dataisreplicatedfromamasterservertoacollectionofservers,updatedatthemaster,periodicallysynchronizedtotheotherservers,andreadfromanyserverviaNFSv4clients.
Synchronizationinthisenvironmentrequiresaproto-colthatminimizesbothnetworkbandwidthconsumptionandend-hostoverheads.
Numerousapplicationshavesimilarrequirements:theyrequirereplicatingandsyn-chronizingalargecollectionofdataacrossmultiplesites,possiblyoverlow-bandwidthlinks.
Forexample,soft-waredistributionmirrorsites,synchronizingpersonalsystemswitharemoteserver,backupandrestoresys-tems,versioningsystems,contentdistributionnetworks(CDN),andfederatedlesystemsallrelyonsynchro-nizingthecurrentdataatthesourcewitholderversionsofthedataatremotetargetsitesandcouldmakeuseofTAPER.
Unfortunately,existingapproachesdonotsuitsuchenvironments.
Ononehand,protocolssuchasdeltacompression(e.
g.
,vcdiff[14])andsnapshotdifferencing(e.
g.
,WAFL[11])canefcientlyupdateonesitefromanother,buttheyrequireaprioriknowledgeofwhichversionsarestoredateachsiteandwhatchangesoc-curredbetweentheversions.
But,ourenvironmentre-quiresauniversaldatasynchronizationprotocolthatin-teroperateswithmulti-vendorNFSimplementationsondifferentoperatingsystemswithoutanyknowledgeoftheirinternalstatetodeterminetheversionofthedataatthereplica.
Ontheotherhand,hash-baseddifferentialcompressionprotocolssuchasrsync[2]andLBFS[20]donotrequireaprioriknowledgeofreplicastate,buttheyareinefcient.
Forexample,rsyncreliesonpathnamestoidentifysimilarlesandthereforetransferslargeamountsofdatawhenaleordirectoryisrenamedorcopied,andLBFS'ssingle-granularitychunkingcom-promisesefciency(a)bytransferringextrametadatawhenredundancyspanningmultiplechunksexistsand(b)bymissingsimilarityongranularitiessmallerthanthechunksize.
TheTAPERdesignfocusesonprovidingfourkeypropertiesinordertoprovidespeed,scalability,band-widthefciency,andcomputationalefciency:P1:Low,re-usablecomputationatthesourceP2:FastmatchingatthetargetP3:FindmaximalcommondatabetweenthesourceandthetargetP4:MinimizetotalmetadataexchangedP1isnecessaryforascalablesolutionthatcansimulta-neouslysynchronizemultipletargetswithasource.
Sim-ilarly,P2isnecessarytoreducethematchingtimeand,therefore,thetotalresponsetimeforsynchronization.
TosupportP2,thematchingatthetargetshouldbebasedonFAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation281FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesindexingtoidentifythematchingcomponentsinO(1)time.
Thelasttwo,P3andP4,arebothindicatorsofbandwidthefciencyastheydeterminethetotalamountofdataandthetotalmetadatainformation(hashesetc.
)thataretransferred.
BalancingP3andP4isthekeyre-quirementinordertominimizethemetadataoverheadforthedatatransfersavings.
ObservethatinrealizingP3,thesourceandtargetshouldndcommondataacrossalllesandnotjustcomparelepairsbasedonname.
Toprovidealloftheseproperties,TAPERisamulti-phase,hierarchicalprotocol.
Eachphaseoperatesoverdecreasingdatagranularity,startingwithdirectoriesandles,thenlargechunks,thensmallerblocks,and-nallybytes.
ThephasesofTAPERbalancetheband-widthefciencyofsmaller-sizematchingwiththere-ducedcomputationaloverheadoflesserunmatcheddata.
TherstphaseofTAPEReliminatesallcommonlesandquicklyprunesdirectoriesusingacontent-basedhi-erarchicalhashtreedatastructure.
Thenextphaseelim-inatesallcommoncontent-denedchunks(CDC)acrossallles.
ThethirdphaseoperatesonblockswithintheremainingunmatchedchunksbyapplyingasimilaritydetectiontechniquebasedonBloomlters.
Finally,thematchedandunmatchedblocksremainingatthesourcearefurtherdeltaencodedtoeliminatecommonbytes.
Ourmaincontributionsinthispaperare:i)designofanewhierarchicalhashtreedatastructureforfastprun-ingofdirectorytrees,ii)designandanalysisofasimi-laritydetectiontechniqueusingCDCandBloomltersthatcompactlyrepresentthecontentofale,iii)designofacombinedCDCandslidingblocktechniqueforbothcoarse-grainedandne-grainedmatching,iv)integratingandimplementingalltheabovetechniquesinTAPER,amulti-phase,multi-grainprotocol,thatisengineeredaspluggableunits.
ThephasesofTAPERarepluggableinthateachphaseusesadifferentmechanismcorrespond-ingtodatagranularity,andaphasecanbedroppedallto-gethertotradebandwidthsavingsforcomputationcosts.
And,v)acompleteprototypeimplementationandper-formanceevaluationofoursystem.
Throughextensiveexperimentsonvariousdatasets,weobservethatTAPERreducesbandwidthby15%to71%overrsyncfordiffer-entworkloads.
Therestofthepaperisorganizedasfollows.
Sec-tion2providesanoverviewoftheworkingofslidingblockandCDC.
TheseoperationsformthebasisofboththesecondandthirdphasesthatlieatthecoreofTA-PER.
TheoverallTAPERprotocolisdescribedindetailinSection3.
SimilaritydetectionusingCDCandBloomltersisdescribedandanalyzedinSection4.
Section5evaluatesandcomparesTAPERfordifferentworkloads.
Finally,Section6coversrelatedworkandweconcludewithSection7.
2BackgroundFigure1:DirectoryTreeSynchronizationProblem:Thesourcetreeisshownontheleftandthetargettreewithmul-tipleupdates,additions,andrenames,isontheright.
Insynchronizingadirectorytreebetweenasourceandtarget(Figure1),anyapproachshouldefcientlyhandleallthecommonupdateoperationsonlesystems.
Theseinclude:i)adding,deleting,ormodifyinglesanddirec-tories,ii)movinglesordirectoriestootherpartsofthetree,iii)renaminglesanddirectories,andiv)archivingalargecollectionoflesanddirectoriesintoasinglele(e.
g.
,tar,lib).
Althoughnumeroustoolsandutilitiesexistfordi-rectorysynchronizationwithnodataversioninginfor-mation,theunderlyingtechniquesareeitherbasedonmatching:i)blockhashesorii)hashesofcontent-denedchunks.
Wendthatslidingblockhashes(Section2.
1)arewellsuitedtorelativelyne-grainedmatchingbe-tweensimilarles,andthatCDCmatching(Section2.
2)issuitableformorecoarse-grained,globalmatchingacrossallles.
2.
1FixedandSlidingBlocksInblock-basedprotocols,axed-blockapproachcom-putesthesignature(e.
g.
,SHA-1,MD5,orMD4hash)ofaxed-sizeblockatboththesourceandtargetandsimplyindexesthesignaturesforaquickmatch.
Fixed-blockmatchingperformspoorlybecausesmallmodi-cationschangeallsubsequentblockboundariesinaleandeliminateanyremainingmatches.
Instead,asliding-blockapproachisusedinprotocolslikersyncforabettermatch.
Here,thetarget,T,dividesalefintonon-overlapping,contiguous,xed-sizeblocksandsendsitssignatures,4-byteMD4alongwitha2-byterollingchecksum(rsync'simplementationusesfull16byteMD4and4byterollingchecksumsper-blockforlargeles),tothesourceS.
IfanexistingleatS,sayf,hasthesamenameasf,eachblocksignatureoffiscomparedwithasliding-blocksignatureofeveryover-lappingxed-sizeblockinf.
Thereareseveralvariantsofthebasicsliding-blockapproach,whichwediscussinSection6,butallofthemcomputeaseparatemulti-bytechecksumforeachbyteofdatatobetransferred.
Becausethischecksuminformationislargecomparedtothedatabeingstored,itwouldbetoocostlytostoreallchecksumsforalloffsetsofalllesinasystem,sotheseUSENIXAssociation282Figure2:EffectofSprinkledChangesinCDC.
Thex-axisistheexpectedchunksize.
Thelefty-axis,usedforthebargraphs,showsthenumberofmatchingchunks.
Therighty-axis,forthelineplot,showsthetotaldatatransferred.
Figure3:EffectofSprinkledChangesinRsync.
Thex-axisisthexedblocksize.
Thelefty-axis,usedforthebargraphs,showsthenumberofmatchingblocks.
Therighty-axis,forthelineplot,showsthetotaldatatransferred.
systemsmustdomatchingonaner(e.
g.
,per-le)gran-ularity.
Asaresult,thesesystemshavethreefundamentalproblems.
First,matchingrequiresknowingwhichlefatthesourceshouldbematchedwiththelefatthetar-get.
Rsyncsimplyreliesonlenamesbeingthesame.
Thisapproachmakesrsyncvulnerabletonamechanges(i.
e.
,arenameoramoveofadirectorytreewillresultinnomatches,violatingpropertyP3).
Second,scala-bilitywiththenumberofreplicasislimitedbecausethesourcemachinerecomputestheslidingblockmatchforeveryleandforeverytargetmachineandcannotre-useanyhashcomputation(propertyP1).
Finally,thematch-ingtimeishighasthereisnoindexingsupportforthehashes:todetermineifablockmatchestakestimeoftheorderofnumberofbytesinaleastherollinghashhastobecomputedovertheentireleuntilamatchoccurs(propertyP2).
Observethatrsync[2]thusviolatesprop-ertiesP1,P2,andP3.
Althoughrsyncisawidelyusedprotocolforsynchronizingasingleclientandserver,itisnotdesignedforlargescalereplicasynchronization.
Tohighlighttheproblemofname-basedmatchinginrsync,consider,forexample,thesourcedirectoryofGNUEmacs-20.
7consistingof2086leswithtotalsizeof54.
67MB.
Supposewerenameonlythetoplevelsub-directoriesinEmacs-20.
7(ormovethemtoanotherpartoftheparenttree).
Althoughnodatahaschanged,rsyncwouldhavesenttheentire54.
67MBofdatawithanad-ditional41.
04KBofhashmetadata(usingthedefaultblocksizeof700bytes),acrossthenetwork.
Incon-trast,aswedescribeinSection3.
1.
1,TAPERalleviatesthisproblembyperformingcontent-basedpruningusingahierarchicalhashtree.
2.
2Content-denedChunksContent-denedchunkingbalancesthefast-matchingofaxed-blockapproachwiththenerdatamatchingabil-ityofsliding-blocks.
CDChasbeenusedinLBFS[20],Venti[21]andothersystemsthatwediscussinSec-tion6.
Achunkisavariable-sizedblockwhosebound-ariesaredeterminedbyitsRabinngerprintmatchingapre-determinedmarkervalue[22].
ThenumberofbitsintheRabinngerprintthatareusedtomatchthemarkerdeterminetheexpectedchunksize.
Forexam-ple,givenamarker0x78andanexpectedchunksizeof2k,arolling(overlappingsequence)48-bytenger-printiscomputed.
Ifthelowerkbitsofthengerprintequal0x78,anewchunkboundaryisset.
Sincethechunkboundariesarecontent-based,alemodicationshouldaffectonlyneighboringchunksandnottheen-tirele.
Formatching,theSHA-1hashofthechunkisused.
MatchingachunkusingCDCisasimplehashta-blelookup.
Clearly,theexpectedchunksizeiscriticaltotheper-formanceofCDCanddependsonthedegreeoflesim-ilarityandthelocationsofthelemodications.
Thechunksizeisatrade-offbetweenthedegreeofmatch-ingandthesizeofthemetadata(hashvalues).
Largerchunksreducethesizeofmetadatabutalsoreducethenumberofmatches.
Thus,foranygivenchunksize,theCDCapproachviolatespropertiesP3,P4,orboth.
Fur-thermore,asminormodicationscanaffectneighboringchunks,changessprinkledacrossalecanresultinfewmatchingchunks.
TheexpectedchunksizeismanuallysetinLBFS(8KBdefault).
Similarly,thexedblocksizeismanuallyselectedinrsync(700bytedefault).
Toillustratetheeffectofsmallchangesrandomlydis-tributedinale,consider,forexample,ale(say'bar')with100KBofdatathatisupdatedwith100changesof10byteseach(i.
e.
,a1%change).
Figures2and3showthevariationsduetosprinkledchangesinthematcheddataforCDCandrsync,respectively.
ObservethatwhilersyncndsmorematchingdatathanCDCforsmallblocksizes,CDCperformsbetterforlargechunksizes.
Forablockandexpectedchunksizeof768bytes,FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation283FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesrsyncmatched51blocks,transmittingatotalof62KB,whileCDCmatched31chunks,transmittingatotalof86KB.
Foralargerblocksizeof2KB,however,rsyncfoundnomatches,whileCDCmatched12chunksandtransmitted91KB.
IndesigningTAPER,weusethisob-servationtoapplyCDCintheearlierphasewithrela-tivelylargerchunksizes.
3TAPERAlgorithmInthissection,werstpresenttheoverallarchitectureoftheTAPERprotocolandthendescribeeachofthefourTAPERphasesindetail.
3.
1TAPERProtocolOverviewTAPERisadirectorytreesynchronizationprotocolbe-tweenasourceandatargetnodethataimsatminimizingthetransmissionofanycommondatathatalreadyexistsatthetarget.
TheTAPERprotocoldoesnotassumeanyknowledgeofthestateortheversionofthedataatthetarget.
It,therefore,buildsonhash-basedtechniquesfordatasynchronization.
Ingeneral,foranyhash-basedsynchronizationproto-col,thesmallerthematchinggranularitythebetterthematchandlowerthenumberofbytestransfered.
How-ever,ne-grainedmatchingincreasesthemetadatatrans-fer(hashvaluesperblock)andthecomputationover-head.
Whilesystemswithlowbandwidthnetworkswilloptimizeonthetotaldatatransferred,thosewithslowerserverswilloptimizethecomputationoverhead.
TheintuitionbehindTAPERistoworkinphases(Fig-ure4)whereeachphasemovesfromalargertoanermatchinggranularity.
Theprotocolworksinfourphases:startingfromadirectorytree,movingontolargechunks,thentosmallerblocks,andnallytobytes.
EachphaseinTAPERusesthebestmatchingtechniqueforthatsize,doesthenecessarytransformations,anddeterminesthesetofdataoverwhichthematchesoccur.
Specically,thersttwophasesperformcoarsegrainedmatchingatthelevelofdirectorytreesandlargeCDCchunks(4KBexpectedchunksize).
Sincetheinitialmatchingisperformedatahighgranularity,thecorrespondinghashinformationconstitutesonlyasmallfractionofthetotaldata.
TheSHA-1hashescomputedinthersttwophasescanthereforebepre-computedonceandstoredinaglobalandpersistentdatabaseatthesource.
Theglobaldatabasemaximizesmatchingbyallowinganydirectory,le,orchunkthatthesourcewantstotransmittobematchedagainstanydirectory,le,orchunkthatthetargetstores.
Andthepersistentdatabaseenhancescomputationalefciencybyallowingthesourcetore-usehashcomputationsacrossmultipletargets.
Conversely,thelasttwophasesperformmatch-ingatthelevelofsmallerblocks(e.
g.
,700bytes),soprecomputingandstoringallhashesofallsmallblockswouldbeexpensive.
Instead,thesephasesuselocalmatchinginwhichtheyidentifysimilarlesorblocksandcomputeandtemporarilystoresummarymetadataaboutthespeciclesorblockscurrentlybeingexam-ined.
Akeybuildingblockforthesephasesisefcientsimilaritydetection,whichweassumeasaprimitiveinthissectionanddiscussindetailinSection4.
Figure4:ThebuildingblocksofTAPER3.
1.
1DirectoryMatchingTherstphase,directorymatching,eliminatesidenticalportionsofthedirectorytreethatarecommonincontentandstructure(butmayhavedifferentnames)betweenthesourceandthetarget.
Wedeneahierarchicalhashtree(HHT)forthispurposetoquicklyndalltheexactmatchingsubtreesprogressingdownfromthelargesttothesmallestdirectorymatchandnallymatchingidenti-calindividualles.
Figure5:PhaseI:HierarchicalHashTreeTheHHTrepresentationencodesthedirectorystruc-tureandcontentsofadirectorytreeasalistofhashval-uesforeverynodeinthetree.
Thenodesconsistoftherootofthedirectorytree,alltheinternalsub-directories,leafdirectories,andnallyalltheles.
TheHHTstruc-tureisrecursivelycomputedasfollows.
First,thehashvalueofalenodefiisobtainedusingastandardcryp-tographichashalgorithm(SHA-1)ofthecontentsofthele.
Second,foraleafdirectoryDL,thehashvalueh(DL)isthehashofallthekconstituentlehashes,i.
e.
,h(DL)=h(h(f1)h(f2).
.
.
h(fk)).
Notethattheorderofconcatenatinghashesofleswithinthesamedirectoryisbasedonthehashvaluesandnotonthelenames.
Third,foranon-leafsub-directory,thehashvaluecap-turesnotonlythecontentasinMerkletreesbutalsoUSENIXAssociation284Figure6:Emacs-20.
7CDCDistribution(Mean=2KB,Max=64KB).
Thelefty-axis(logscale)correspondstothehistogramofchunksizes,andtherighty-axisshowsthecumulativedistribution.
Figure7:Emacs-20.
7CDCDistribution(Mean=2KB,Max=4KB).
Thelefty-axis(logscale)correspondstothehistogramofchunksizes,andtherighty-axisshowsthecumulativedistribution.
thestructureofthetree.
AsillustratedinFigure5,thehashofasub-directoryDSiscomputedbyanin-ordertraversalofallitsimmediatechildren.
Forexample,ifDS={DS1,DS2}thenh(DS)=h(h(DN)h(DS1)h(UP)h(DN)h(DS2)h(UP))where"UP"and"DN"aretwoliteralsrepresentingthetraversaloftheupanddownlinksinthetreerespec-tively.
Finally,thehashoftherootnode,DR,ofthedirectorytreeiscomputedsimilartothatofasubtreede-nedabove.
TheHHTalgorithm,thus,outputsalistofthehashvaluesofallthenodes,inthedirectorytreei.
e.
,h(DR),h(DA),h(DS)h(DL)h(f1)NotethatourHHTtechniqueprovidesahierarchicalen-codingofboththelecontentandthedirectorystruc-ture.
Thisprovesbenecialineliminatingdirectorytreesidenticalincontentandstructureatthehighestlevel.
Thetarget,inturn,computestheHHThashvaluesofitsdirectorytreeandstoreseachelementinahashtable.
EachelementoftheHHTsentbythesource—startingattherootnodeofthedirectorytreeandifnecessarypro-gressingdownwardtothelenodes—isusedtoindexintothetarget'shashtabletoseeifthenodematchesanynodeatthetarget.
Thus,HHTndsthemaximalcom-mondirectorymatchandenablesfastdirectorypruningsinceamatchatanynodeimpliesthatallthedescendantnodesmatchaswell.
Forexample,iftheroothashval-uesmatch,thennofurthermatchingisdoneasthetreesareidenticalinbothcontentandstructure.
Attheendofthisphase,allexactlymatchingdirectorytreesandleswouldhavebeenpruned.
ToillustratetheadvantageofHHT,consider,forex-ample,arenameupdateoftherootdirectoryofLinuxKernel2.
4.
26sourcetree.
Eventhoughnocontentwaschanged,rsyncfoundnomatchingdataandsenttheen-tiretreeofsize161.
7MBwithanadditional1.
03MBofmetadata(usingthedefaultblock-sizeof700bytes).
Incontrast,theHHTphaseofTAPERsent291KBoftheHHTmetadataanddetermined,afterasinglehashmatchoftherootnode,thattheentiredatawasidentical.
ThemainadvantagesofusingHHTfordirectoryprun-ingarethatitcan:i)quickly(inO(1)time)ndthemaxi-malexactmatch,ii)handleexactmatchesfromtheentiretreetoindividualles,iii)matchbothstructureandcon-tent,andiv)handleleordirectoryrenamesandmoves.
3.
1.
2MatchingChunksOnceallthecommonlesanddirectorieshavebeeneliminated,weareleftwithasetofunmatchedlesatthesourceandthetarget.
InPhaseII,tocapturethedatacommonalityacrossalllesandfurtherreducetheunmatcheddata,werelyoncontent-denedchunking(whichwediscussedinSection2).
Duringthisphase,thetargetsendstheSHA-1hashvaluesoftheunique(toremovelocalredundancy)CDCsofalltheremain-inglestothesource.
SinceCDChashescanbeindexedforfastmatching,thesourcecanquicklyeliminateallthematchingchunksacrossallthelesbetweenthesourceandtarget.
ThesourcestorestheCDChasheslocallyforre-usewhensynchronizingwithmultipletargets.
WhenusingCDCs,twoparameters—theexpectedchunksizeandthemaximumchunksize—havetobeselectedforagivenworkload.
LBFS[20]usedanex-pectedchunksizeof8KBwithamaximumof64KB.
Thechunksizes,however,couldhavealargevariancearoundthemean.
Figure6showsthefrequencyandcu-mulativedistributionofchunksizesfortheEmacs-20.
7sourcetreeusinganexpectedchunksizevalueof2KBwithnolimitationonthechunksizeexceptfortheabso-lutemaximumof64KB.
Ascanbeseenfromthegure,thechunksizeshavealargevariance,rangingfrom256bytesto12KBwitharelativelylongtail.
Themaximumchunksizelimitsthisvariancebyforc-ingachunktobecreatedifthesizeexceedsthemaxi-FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation285FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesmumvalue.
However,aforcedsplitatxedsizevaluesmakesthealgorithmbehavemorelikexed-sizeblockmatchingwithpoorresiliencetoupdates.
Figure7showsthedistributionofchunksizesforthesameworkloadandexpectedchunksizevalueof2KBwithamaximumvaluenowsetto4KB.
Approximately17%ofthechunkswerecreatedduetothislimitation.
Moreover,asanupdateaffectstheneighboringchunks,CDCsarenotsuitedforne-grainedmatcheswhentherearesmall-sizedupdatessprinkledthroughoutthedata.
AsweobservedinFigures2and3inSection2,CDCperformedbetterthansliding-blockforlargersizedchunks,whilersyncwasbetterforner-grainedmatches.
We,therefore,usearelativelylargeexpectedchunksize(4KB)inthisphasetodofast,coarse-grainedmatchingofdataacrossalltheremainingles.
Attheendofthechunkmatchingphase,thesourcehasasetofleseachwithasequenceofmatchedandunmatchedregions.
Inthenextphase,doingner-grainedblockmatches,wetrytoreducethesizeoftheseunmatchedregions.
3.
1.
3MatchingBlocksAfterthecompletionofthesecondphase,eachleatthesourcewouldbeintheformofaseriesofmatchedandunmatchedregions.
Thecontiguousunmatchedchunkslyingin-betweentwomatchedchunksofthesamelearemergedtogetherandarecalledholes.
Toreducethesizeoftheholes,inthisphase,weperformner-grainedblockmatching.
Thesliding-blockmatch,however,canbeappliedonlytoapairofles.
We,therefore,needtodeterminetheconstituentlestomatchapairofholes,i.
e.
,weneedtodeterminewhichpairoflesatthesourceandtargetaresimilar.
Thetechniqueweuseforsimi-laritydetectionisneededinmultiplephases,hence,wediscussitindetailinSection4.
Onceweidentifythepairofsimilarlestocompare,blockmatchingisap-pliedtotheholesoftheleatthesource.
Wesplittheunmatchedholesofale,f,atthesourceusingrelativelysmallerxed-sizeblocks(700bytes)andsendtheblocksignatures(Rabinngerprintforweakrollingchecksum;SHA-1forstrongchecksum)tothetarget.
Atthetarget,asliding-blockmatchisusedtocompareagainsttheholesinthecorrespondingle.
Thetargetthenrequeststhesetofunmatchedblocksfromthesource.
Toenableaner-grainedmatch,inthisphase,thematchingsizeof700bytesisselectedtobeafractionoftheexpectedchunksizeof4KB.
Theextracostofsmallerblocksisoffsetbythefactthatwehavemuchlessdata(holesinsteadofles)toworkwith.
3.
1.
4MatchingBytesThisnalphasefurtherreducesthebytestobesent.
Af-terthethirdphase,thesourcehasasetofunmatchedblocksremaining.
Thesourcealsohasthesetofmatchedchunks,blocksandlesthatmatchedintherstthreephases.
Tofurtherreducethebytestobesent,theblocksintheunmatchedsetaredeltaencodedwithrespecttoasimilarblockinthematchedset.
Thetargetcanthenreconstructtheblockbyapplyingthedelta-bytestothematchedblock.
Observethatunlikeredundancyelimi-nationtechniquesforstorage,thesourcedoesnothavethedataatthetarget.
Todeterminewhichmatchedandunmatchedblocksaresimilar,weapplythesimilarityde-tectiontechniqueatthesource.
Finally,theremainingunmatchedblocksandthedelta-bytesarefurthercompressedusingstandardcompressionalgorithms(e.
g.
,gzip)andsenttothetarget.
Thedataatthetargetisvalidatedintheendbysendinganadditionalchecksumperletoavoidanyinconsistencies.
3.
1.
5DiscussionInessence,TAPERcombinesthefastermatchingofcontent-denedchunksandthenermatchingoftheslidingblockapproach.
CDChelpsinndingcommondataacrossallles,whilesliding-blockcanndsmallrandomchangesbetweenapairofles.
Someoftheis-suesinimplementingTAPERrequirefurtherdiscussion:Phasedrenement:ThemultiplephasesofTAPERresultinbetterdifferentialcompression.
Byusingacoarsegranularityforalargerdatasetwereducethemetadataoverhead.
Sincethedatasetsizereducesineachphase,itbalancesthecomputationandmeta-dataoverheadofnergranularitymatching.
TheTAPERphasesarenotjustrecursiveapplicationofthesamealgorithmtosmallerblocksizes.
Instead,theyusethebestapproachforaparticularsize.
Re-usingHashcomputation:Unlikersyncwherethesourcedoesthesliding-blockmatch,TAPERstoresthehashvaluesatthesourcebothinthedirectorymatchingandthechunkmatchingphase.
Theseval-uesneednotberecomputedfordifferenttargets,thereby,increasingthescalabilityofTAPER.
Thehashvaluesarecomputedeitherwhenthesourcelesystemisquiescedoroveraconsistentcopyofthelesystem,andarestoredinalocaldatabase.
Pluggable:TheTAPERphasesarepluggableinthatsomecanbedroppedifthedesiredlevelofdatare-ductionhasbeenachieved.
Forexample,PhaseIcanbedirectlycombinedwithPhaseIIIandsimi-laritydetectiongivingusanrsync++.
Anotherpos-sibilityisjustdroppingphasesIIIandIV.
Round-triplatency:EachphaseofTAPERrequiresametadataexchangebetweentheserverandthetar-getcorrespondingtoonelogicalround-trip.
Thisadditionalround-triplatencyperphaseisbalancedbythefactthatamountofdataandmetadatatrans-ferredissufcientlyreduced.
USENIXAssociation286Hashcollisions:Inanyhash-baseddifferentialcom-pressiontechniquethereistheextremelylowbutnon-zeroprobabilityofahashcollision[10].
Insys-temsthatusehash-basedtechniquestocompresslo-caldata,acollisionmaycorruptthesourcelesys-tem.
TAPERisusedforreplicasynchronizationandhenceonlyaffectsthetargetdata.
Secondly,dataisvalidatedbyasecondcryptographicchecksumovertheentirele.
Theprobabilityoftwohashcolli-sionsoverthesamedataisquadraticallylowerandweignorethatpossibility.
TherecentattackontheSHA-1hashfunc-tion[26]raisesthechallengeofanattackerdelib-eratelycreatingtwoleswiththesamecontent[1].
Thisattackcanbeaddressedbyprependingasecret,knownonlytotherootatthesourceandtarget,toeachchunkbeforecomputingthehashvalue.
4SimilarityDetectionAswediscussedinSection3,thelasttwophasesoftheTAPERprotocolrelyonamechanismforsimilarityde-tection.
Forblockandbytematching,TAPERneedstodeterminewhichtwolesorchunksaresimilar.
Simi-laritydetectionforleshasbeenextensivelystudiedintheWWWdomainandreliesonshingling[22]andsuperngerprintsdiscussedlaterinSection4.
3.
InTAPER,weexploretheapplicationofBloomltersforlesimilaritydetection.
Bloomlterscompactlyrep-resentasetofelementsusinganitenumberofbitsandareusedtoanswerapproximatesetmembershipqueries.
GiventhatBloomlterscompactlyrepresentaset,theycanalsobeusedtoapproximatelymatchtwosets.
Bloomlters,however,cannotbeusedforexactmatchingastheyhaveanitefalse-matchprobability,buttheyarenaturallysuitedforsimilaritymatching.
WerstgiveabriefoverviewofBloomlters,andlaterpresentandan-alyzethesimilaritydetectiontechnique.
4.
1BloomFiltersOverviewABloomlterisaspace-efcientrepresentationofaset.
GivenasetU,theBloomlterofUisimplementedasanarrayofmbits,initializedto0[4].
Eachelementu(u∈U)ofthesetishashedusingkindependenthashfunctionsh1,hk.
Eachhashfunctionhi(u)for1≤i≤kreturnsavaluebetween1andmthenwhenanelementisaddedtotheset,itsetskbits,eachbitcorrespondingtoahashfunctionoutput,intheBloomlterarrayto1.
Ifabitwasalreadysetitstays1.
Forsetmembershipqueries,Bloomltersmayyieldafalsepositive,whereitmayappearthatanelementvisinUeventhoughitisnot.
Fromtheanalysisinthesurveypa-perbyBroderandMitzenmacher[8],givenn=|U|andtheBloomltersizem,theoptimalvalueofkthatmin-imizesthefalsepositiveprobability,pk,wherepdenotesthatprobabilitythatagivenbitissetintheBloomlter,isk=mnln2.
4.
2BloomFiltersforSimilarityTestingObservethatwecanvieweachletobeasetinBloomlterparlancewhoseelementsaretheCDCsthatitiscomposedof.
FileswiththesamesetofCDCshavethesameBloomlterrepresentation.
Correspondingly,lesthataresimilarhavealargenumberof1scommonamongtheirBloomlters.
Formultisets,wemakeeachCDCuniquebeforeBloomltergenerationtodifferenti-atemultiplecopiesofthesameCDC.
ThisisachievedbyattachinganindexvalueofeachCDCchunktoitsSHA-1hash.
Theindexrangesfrom1tolnr,whereristhemultiplicityofthegivenchunkinthele.
Forndingsimilarles,wecomparetheBlooml-terofagivenleatthesourcewiththatofallthelesatthereplica.
Thelesharingthehighestnumberof1's(bit-wiseAND)withthesourceleandaboveacer-tainthreshold(say70%)ismarkedasthematchingle.
Inthiscase,thebitwiseANDcanalsobeperceivedasthedotproductofthetwobitvectors.
Ifthe1bitsintheBloomlterofaleareacompletesubsetofthatofanotherlterthenitishighlyprobablethattheleisincludedintheother.
Bloomlterwhenappliedtosimilaritydetectionhaveseveraladvantages.
First,thecompactnessofBloomltersisveryattractiveforremotereplication(storageandtransmission)systemswherewewanttominimizethemetadataoverheads.
Second,Bloomltersenablefastcomparisonasmatchingisabitwise-ANDoperation.
Third,sinceBloomltersareacompleterepresentationofasetratherthanadeterministicsample(e.
g.
,shin-gling),theycandetermineinclusionseffectivelye.
g.
,tarlesandlibraries.
Finally,astheyhavealowmetadataoverheadtheycouldbecombinedfurtherwitheitherslid-ingblockorCDCfornarrowingthematchspace.
TodemonstratetheeffectivenessofBloomltersforsimilaritydetection,consider,forexample,theleChangeLogintheEmacs-20.
7sourcedistributionwhichwecompareagainstalltheremaining1967lesintheEmacs-20.
1sourcetree.
119identicallesoutofatotal2086leswereremovedintheHHTphase.
TheCDCsoftheleswerecomputedusinganexpectedandmax-imumchunksizeof1KBand2KBrespectively.
Fig-ure8showsthatthecorrespondingChangeLogleintheEmacs-20.
1treematchedthemostwithabout90%ofthebitsmatching.
Asanotherexample,considerthelent/cong.
ntinEmacs-20.
7(Figure9)whichwecompareagainstthelesofEmacs-20.
1.
Surprisingly,thelethatmatchedmostwas–src/cong.
in—alewithadifferentnameinadifferentdirectorytree.
TheCDCexpectedandmax-imumchunksizeswere512bytesand1KBrespec-FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation287FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesFigure8:BloomlterComparisonofthele'Emacs-20.
7/ChangeLog'withles'Emacs-20.
1/*'Figure9:BloomlterComparisonofthele'Emacs-20.
7/nt/cong.
nt'withles'Emacs-20.
1/*'Figure10:BloomlterComparisonofle'foo'withlaterversions'foo.
1','foo.
2',.
.
.
'foo.
10'tively.
Figure9showsthatwhilethelewiththesamenament/cong.
ntmatchedin57%ofthebits,thelesrc/cong.
inmatchedin66%.
Wefurtherveriedthisbycomputingthecorrespondingdiffoutputof1481and1172bytes,respectively.
Thisexperimentfurtherem-phasizestheneedforcontent-basedsimilaritydetection.
TofurtherillustratethatBloomlterscandifferenti-atebetweenmultiplesimilarles,weextractedatechni-caldocumentationle'foo'(say)(ofsize175KB)in-crementallyfromaCVSarchive,generating10differ-entversions,with'foo'beingtheoriginal,'foo.
1'be-ingtherstversion(withachangeof4154bytesfrom'foo')and'foo.
10'beingthelast.
TheCDCchunksizeswerechosenasintheChangeLogleexampleabove.
AsshowninFigure10,theBloomlterfor'foo'matchedthemost(98%)withtheclosestversion'foo.
1'andtheleast(58%)withthelatestversion'foo.
10'.
4.
2.
1AnalysisThemainconsiderationwhenusingBloomltersforsimilaritydetectionisthefalsematchprobabilityoftheabovealgorithmasafunctionofsimilaritybetweenthesourceandacandidatele.
Extendingtheanalysisformembershiptesting[4]tosimilaritydetection,weproceedtodeterminetheexpectednumberofinferredmatchesbetweenthetwosets.
LetAandBbethetwosetsbeingcomparedforsimilarity.
Letmdenotethenumberofbits(size)intheBloomlter.
Forsimplicity,assumethatbothsetshavethesamenumberofelements.
LetndenotethenumberofelementsinbothsetsAandBi.
e.
,|A|=|B|=n.
Asbefore,kdenotesthenumberofhashfunctions.
Theprobabilitythatabitissetbyahashfunctionhifor1≤i≤kis1m.
Abitcanbesetbyanyofthekhashfunctionsforeachofthenelements.
Therefore,theprobabilitythatabitisnotsetbyanyhashfunctionforanyelementis(11m)nk.
Thus,theproba-bility,p,thatagivenbitissetintheBloomlterofAisgivenby:p=111mnk≈1enkm(1)Foranelementtobeconsideredamemberoftheset,allthecorrespondingkbitsshouldbeset.
Thus,theprobabilityofafalsematch,i.
e.
,anoutsideelementisinferredasbeinginsetA,ispk.
LetCdenotetheinter-sectionofsetsAandBandcdenoteitscardinality,i.
e.
,C=A∩Band|C|=c.
Forsimilaritycomparison,letustakeeachelementinsetBandcheckifitbelongstotheBloomlterofthegivensetA.
Weshouldndthattheccommonelementswilldenitelymatchandafewoftheother(nc)mayalsomatchduetothefalsematchprobability.
ByLinear-ityofExpectation,theexpectednumberofelementsofBinferredtohavematchedwithAisE[#ofinferredmatches]=(c)+(nc)pkTominimizethefalsematches,thisexpectednumbershouldbeasclosetocaspossible.
Forthat(nc)pkshouldbecloseto0,i.
e.
,pkshouldapproach0.
Thishappenstobethesameasminimizingtheprobabilityofafalsepositive.
Expandingpandunderasymptoticanalysis,itreducestominimizing(1enkm)k.
Us-ingthesameanalysisforminimizingthefalsepositiverate[8],theminimaobtainedafterdifferentiationiswhenk=mnln2.
Thus,theexpectednumberofinferredmatchesforthisvalueofkbecomesE[#ofinferredmatches]=c+(nc)(0.
6185)mnThus,theexpectednumberofbitssetcorrespondingtoinferredmatchesisE[#ofmatchedbits]=m111mkc+(nc)(0.
6185)mnUndertheassumptionofperfectlyrandomhashfunc-tions,theexpectednumberoftotalbitssetintheBloomlterofthesourcesetA,ismp.
Theratio,then,oftheex-pectednumberofmatchedbitscorrespondingtoinferredmatchesinA∩BtotheexpectedtotalnumberofbitssetintheBloomlterofAis:E[#ofmatchedbits]E[#totalbitsset]=1ekm(c+(nc)(0.
6185)mn)1enkmUSENIXAssociation288Figure11:CDCcomparisonofthele'Emacs-20.
7/ChangeLog'withles'Emacs-20.
1/*'Figure12:CDCcomparisonofthele'Emacs-20.
7/nt/cong.
nt'withles'Emacs-20.
1/*'Figure13:CDCComparisonofle'foo'withlaterversions'foo.
1','foo.
2',.
.
.
'foo.
10'Observethatthisratioequals1whenalltheelementsmatch,i.
e.
,c=n.
Iftherearenomatchingelements,i.
e.
,c=0,theratio=2(1(0.
5)(0.
6185)mn).
Form=n,thisevaluatesto0.
6973,i.
e.
,69%ofmatchingbitsmaybefalse.
Forlargervalues,m=2n,4n,8n,10n,11n,thecorrespondingratiosare0.
4658,0.
1929,0.
0295,0.
0113,0.
0070respectively.
Thus,form=11n,onanaverage,lessthan1%ofthebitssetmaymatchincorrectly.
Theexpectedratioofmatchingbitsishighlycorrelatedtotheexpectedratioofmatchingelements.
Thus,ifalargefractionofthebitsmatch,thenit'shighlylikelythatalargefractionoftheelementsarecommon.
Althoughtheaboveanalysiswasdonebasedonex-pectedvalues,weshowinanextendedtechnicalre-port[13]thatundertheassumptionthatthedifferencebetweenpand(1enkm)isverysmall,theactualnum-berofmatchedbitsishighlyconcentratedaroundtheex-pectednumberofmatchedbitswithsmallvariance[18].
GiventhatthenumberofbitsintheBloomltershouldbelargerthanthenumberofelementsinthesetweneedlargeltersforlargeles.
Oneapproachistoselectanewltersizewhenthelesizedoublesandonlycomparethelesrepresentedwiththesameltersize.
Tosupportsubsetmatching,however,theltersizeforallthelesshouldbeidenticalandthereforealllesneedtohavealtersizeequaltosizerequiredforthelargestle.
4.
2.
2SizeoftheBloomFilterAsdiscussedintheanalysis,thefractionofbitsmatchingincorrectlydependsonthesizeoftheBloomlter.
Fora97%accuratematch,thenumberofbitsintheBloomltershouldbe8xthenumberofelements(chunks)intheset(le).
Foraleofsize128KB,anexpectedandmaximumchunksizeof4KBand64KB,respectivelyresultsinaround32chunks.
TheBloomlterissettobe8xthisvaluei.
e.
,256bits.
Forsmallles,wecansettheexpectedchunksizeto256bytes.
Therefore,theBloomltersizeissetto8xtheexpectednumberofchunks(32for8KBle)i.
e.
,256bits,whichisa0.
39%and0.
02%overheadforlesizeof8KBand128KB,respectively.
4.
3ComparisonwithShinglingPreviousworkonlesimilarityhasmostlybeenbasedonshinglingorsuperngerprints.
Usingthismethod,foreachobject,allthekconsecutivewordsofale(calledk-shingles)arehashedusingRabinngerprint[22]tocreateasetofngerprints(alsocalledfeaturesorpre-images).
Thesengerprintsarethensampledtocom-puteasuper-ngerprintofthele.
Manyvariantshavebeenproposedthatusedifferenttechniquesonhowtheshinglengerprintsaresampled(min-hashing,Modm,Mins,etc.
)andmatched[5–7].
WhileModmselectsallngerprintswhosevaluemodulomiszero;Minsselectsthesetofsngerprintswiththesmallestvalue.
Themin-hashingapproachfurtherrenesthesamplingtobetheminvaluesofsay84randommin-wiseinde-pendentpermutations(orhashes)ofthesetofallshin-glengerprints.
Thisresultsinaxedsizesampleof84ngerprintsthatistheresultingfeaturevector.
Tofurthersimplifymatching,these84ngerprintscanbegroupedas6"super-shingles"byconcatenating14ad-jacentngerprints[9].
InREBL[15]thesearecalledsuper-ngerprints.
Apairofobjectsarethenconsideredsimilarifeitheralloralargefractionofthevaluesinthesuper-ngerprintsmatch.
OurBloomlterbasedsimilaritydetectiondiffersfromtheshinglingtechniqueinseveralways.
Itshouldbenoted,however,thatthevariantsofshinglingdis-cussedaboveimproveupontheoriginalapproachandweprovideacomparisonofourtechniquewiththesevariantswhereverapplicable.
First,shingling(Modm,Mins)computeslesimilarityusingtheintersectionofthetwofeaturesets.
Inourapproach,itrequiresonlythebit-wiseANDofthetwoBloomlters(e.
g.
,two128bitvectors).
Next,shinglinghasahighercomputationaloverheadasitrstsegmentstheleintok-wordshin-gles(k=5in[9])resultinginshinglesetsizeofaboutSk+1,whereSisthelesize.
Later,itcomputestheimage(value)ofeachshinglebyapplyingset(sayH)ofmin-wiseindependenthashfunctions(|H|=84[9])andthenforeachfunction,selectingtheshinglecorre-FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation289FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesspondingtotheminimumimage.
Ontheotherhand,weapplyasetofindependenthashfunctions(typicallylessthan8)tothechunksetofsizeonaverageScwherecistheexpectedchunksize(e.
g.
,c=256bytesforS=8KBle).
Third,thesizeofthefeatureset(numberofshin-gles)dependsonthesamplingtechniqueinshingling.
Forexample,inModm,evensomelargelesmighthaveveryfewfeatureswhereassmalllesmighthavezerofeatures.
Someshinglingvariants(e.
g.
,Mins,Mod2i)aimtoselectroughlyaconstantnumberoffeatures.
OurCDCbasedapproachonlyvariesthechunksizec,tode-terminethenumberofchunksasatrade-offbetweenper-formanceandne-grainedmatching.
Weleavetheem-piricalcomparisonwithshinglingasfuturework.
Ingen-eral,acompactBloomlteriseasiertoattachasaletagandiscomparedsimplybymatchingthebits.
4.
4DirectChunkMatchingforSimilarityThechunk-basedmatchinginthesecondphase,canbedirectlyusedtosimultaneouslydetectsimilarlesbe-tweenthesourceandtarget.
Whenmatchingthechunkhashesbelongingtoale,wecreatealistofcandi-datelesthathaveacommonchunkwiththele.
Thelewiththemaximumnumberofmatchingchunksismarkedasthesimilarle.
ThusthematchingcomplexityofdirectchunkmatchingisO(NumberofChunks).
Thisdirectmatchingtechniquecanalsobeusedincon-junctionwithothersimilaritydetectiontechniquesforvalidation.
WhiletheBloomltertechniqueisgen-eralandcanbeappliedevenwhenadatabaseofalllechunksisnotmaintained,directmatchingisasimpleex-tensionofthechunkmatchingphase.
ToevaluatetheeffectivenessofsimilaritydetectionusingCDC,weperformthesamesetofexperimentsasdiscussedinSection4.
2forBloomlters.
There-sults,asexpected,wereidenticaltotheBloomlterap-proach.
Figures11,12,and13showthecorrespondingplotsformatchingtheles'ChangeLog','nt/cong.
nt'and'foo',respectively.
Directmatchingismoreex-actasthereisnoprobabilityoffalsematching.
TheEmacs-20.
1/ChangeLoglematchedwiththeEmacs-20.
7/ChangeLoglein112outof128CDCs(88%).
Similarly,theEmacs-20.
7/nt/cong.
ntlehadanon-zeromatchwithonlythreeEmacs-20.
1/*leswith8(46%),9(53%),5(29%)matchesoutof17correspond-ingtothelesnt/cong.
nt,src/cong.
inandnt/cong.
h,resp.
Thele'foo'matched'foo.
1'in99%oftheCDCs.
5ExperimentalEvaluationInthissection,weevaluateTAPERusingseveralwork-loads,analyzethebehaviorofthevariousphasesoftheprotocolandcomparethebandwidthefciency,compu-tationoverhead,andresponsetimeswithtar+gzip,rsync,andCDC.
5.
1MethodologyWehaveimplementedaprototypeofTAPERinCandPerl.
ThechunkmatchinginPhaseIIusescodefromtheCDCimplementationofLBFS[20]andusestheSleep-yCatsoftware'sBerkeleyDBdatabasepackageforpro-vidinghashbasedindexing.
Thedelta-compressionofPhaseIVwasimplementedusingvcdiff[14].
Theex-perimentaltestbedusedtwo933MHzIntelPentiumIIIworkstationswith512MBofRAMrunningLinuxker-nel2.
4.
22connectedbyfull-duplex100MbitEthernet.
SoftwareSources(SizeKB)WorkloadNo.
ofFilesTotalSizelinux-src(2.
4.
26)13235161,973AIX-src(5.
3)36007874,579emacs(20.
7)208654,667gcc(3.
4.
1)22834172,310rsync(2.
6.
2)2507,479ObjectBinaries(SizeMB)linux-bin(Fedora)383871,339AIX-bin(5.
3)615273,704WebData(SizeMB)CNN13534247Yahoo12167208IBM9223248GoogleGroups16284251Table1:CharacteristicsofthedifferentDatasetsForouranalysis,weusedthreedifferentkindsofwork-loads:i)softwaredistributionsources,ii)operatingsys-temobjectbinaries,andiii)webcontent.
Table1detailsthedifferentworkloadcharacteristicsgivingthetotalun-compressedsizeandthenumberoflesforthenewerversionofthedataatthesource.
Workloadlinux-srcAIX-srcemacsgccVersions2.
4.
26-2.
4.
225.
3-5.
220.
7-20.
13.
4.
1-3.
3.
1SizeKB161,973874,57954,667172,310PhaseI62,804809,51447,954153,649PhaseII24,321302,52930,71898,428PhaseIII20,689212,35127,89582,952PhaseIV18,127189,52826,12673,263DiffOutput10,260158,46314,36260,215Table2:EvaluationofTAPERPhases.
ThenumbersdenotetheunmatcheddatainKBremainingattheendofaphase.
SoftwaredistributionssourcesForthesoftwaredis-tributionworkload,weconsiderthesourcetreesofthegcccompiler,theemacseditor,rsync,theLinuxkernel,andtheAIXkernel.
ThedatainthesourcetreesconsistsofonlyASCIItextles.
ThegccworkloadrepresentsthesourcetreeforGNUgccversions3.
3.
1atthetargetsandversion3.
4.
1atthesource.
Theemacsdatasetcon-USENIXAssociation290sistsofthesourcecodeforGNUEmacsversions20.
1and20.
7.
Similarly,thersyncdatasetdenotesthesourcecodeforthersyncsoftwareversions2.
5.
1and2.
6.
2,withtheadditionthat2.
6.
2alsoincludestheobjectcodebina-riesofthesource.
Thetwokernelworkloads,linux-srcandAIX-src,comprisethesourcetreeoftheLinuxker-nelversions2.
4.
22and2.
4.
26andthesourcetreefortheAIXoperatingsystemversions5.
2and5.
3,respectively.
ObjectbinariesAnothertypeofdatawidelyupgradedandreplicatediscodebinaries.
Binaryleshavediffer-entcharacteristicscomparedtoASCIIles.
Tocaptureatreeofcodebinaries,weusedtheoperatingsystembi-nariesofLinuxandAIX.
Wescannedtheentirecontentsofthedirectorytrees/usr/bin,/usr/X11R6and/usr/libinRedHat7.
3andRedHatFedoraCoreIdistributions,de-notedbylinux-bindataset.
Themajorityofdatainthesetreescomprisesofsystembinariesandsoftwarelibrariescontainingmanyobjectles.
TheAIX-bindatasetcon-sistsofobjectbinariesandlibrariesin/usr,/etc,/var,and/sbindirectoriesofAIXversions5.
2and5.
3.
WebcontentWebdataisarichcollectionoftext,im-ages,video,binaries,andvariousotherdocumentfor-mats.
TogetarepresentativesampleofwebcontentthatcanbereplicatedatmirrorsitesandCDNs,weusedawebcrawlertocrawlanumberoflargewebservers.
Forthis,weusedthewget1.
8.
2crawlertoretrievethewebpagesandalltheleslinkedfromthem,recursivelyforanunlimiteddepth.
However,welimitedthesizeofthedownloadedcontenttobe250MBandrestrictedthecrawlertoremainwithinthewebsite'sdomain.
Thefourdatasets,CNN,Yahoo,IBMandGoogleGroups,denotethecontentofwww.
cnn.
com,www.
yahoo.
com,www.
ibm.
com,andgroups.
google.
comwebsitesthatwasdownloadedeverydayfrom15Sep.
to10Oct.
,2004.
CNNisanewsandcurrentaffairssitewhereinthetop-levelwebpageschangesignicantlyoveraperiodofaboutaday.
Yahoo,apopularportalontheInternet,representsmultiplepageswhichhavesmallchangescorrespondingtodailyupdates.
IBMisthecompany'scorporatehomepageprovidinginformationaboutitsproductsandservices.
Here,againthetop-levelpageschangewithannounce-mentsofproductlaunchesandtechnologyevents,whiletheothersrelatingtotechnicalspecicationsareunchanged.
FortheGoogleGroupsdataset,mostpageshavenumerouschangesduetonewuserpostingsandupdatescorrespondingtofeedbackandreplies.
5.
2EvaluatingTAPERPhasesAswedescribedearlier,TAPERisamulti-phaseproto-colwhereeachphaseoperatesatadifferentgranularity.
Inthissection,weevaluatethebehaviorofeachphaseondifferentworkloads.
Foreachdataset,weupgradetheWorkloadlinux-srcAIX-srcemacsgccPhaseI29179246502PhaseII3173,968241762PhaseIII2973,6813811,204Table3:UncompressedMetadataoverheadinKBoftherstthreeTAPERphases.
Figure14:Normalizedtransmitteddatavolume(uncom-pressed)byRsync,HHT+CDC,TAPERonSoftwaredistribu-tionandObjectbinaries.
Theresultsarenormalizedagainstthetotalsizeofthedataset.
olderversiontothenewerversion,e.
g.
,Linuxversion2.
4.
22to2.
4.
26.
Foreachphase,wemeasurethetotalsizeofunmatcheddatathatremainsforthenextphaseandthetotalmetadatathatwasexchangedbetweenthesourceandthetarget.
TheparametersusedforexpectedandmaxchunksizeinPhaseIIwas4KBand64KB,respectively.
ForPhaseIII,theblocksizeparameterwas700bytes.
ThedataforPhaseIVrepresentsthenalun-matcheddatathatincludesthedelta-bytes.
Inpractice,thisdatawouldthenbecompressedusinggzipandsenttothetarget.
Wedonotpresentthenalcompressednum-bershereaswewanttofocusonthecontributionofTA-PERandnotgzip.
Forcomparison,weshowthesizeoftheoutputof"diff-r".
Table2showsthetotalunmatcheddatathatremainsafterthecompletionofaphasefortheworkloadslinux-src,AIX-src,emacsandgcc.
Addition-ally,Table3showsthemetadatathatwastransmittedforeachphaseforthesameworkloads.
Thetableshowsthatthedatareductionintermsofuncompressedbytestrans-mittedrangefrom88.
8%forthelinux-srcand78.
3%fortheAIX-srcto52.
2%foremacsand58%forgcc.
Ontheotherhand,theoverhead(comparedtotheoriginaldata)ofmetadatatransmissionrangedfrom0.
5%forlinux-srcand0.
9%forAIX-srcto1.
2%foremacsand1.
4%forgcc.
ObservethatthemetadatainPhaseIIandIIIisinthesameballparkalthoughthematchinggranularityisreducedbyanorderofmagnitude.
Thisisduetotheunmatcheddatareductionperphase.
Themetadataover-headofPhaseIisrelativelyhigh.
Thisispartlyduetothestrong20-bytehashSHA-1hashthatisused.
NotethattheunmatcheddataattheendofPhaseIVisintheFAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation291FAST'05:4thUSENIXConferenceonFileandStorageTechnologiesFigure15:Rsync,TAPERComparisononCNNwebdatasetFigure16:Rsync,TAPERComparisononYahoowebdatasetsameballparkasthediffoutputbetweenthenewandolddataversionbutthatcomputingthelatterrequiresanodetohaveacopyofbothversionsandsoisnotaviablesolutiontoourproblem.
5.
3ComparingBandwidthEfciencyInthissection,wecomparethebandwidthefciencyofTAPER(intermsoftotaldataandmetadatatrans-ferred)withtar+gzip,rsync,andHHT+CDC.
Todiffer-entiatebandwidthsavingsduetoTAPERfromdatacom-pression(gzip),werstillustrateTAPER'scontributiontobandwidthsavingswithoutgzipforsoftwaresourcesandobjectbinariesworkloads.
Figure14showsthenor-malizedtransmitteddatavolumebyTAPER,rsync,andHHT+CDCforthegivendatasets.
Thetransmitteddatavolumeisnormalizedagainstthetotalsizeofthedataset.
Forthegcc,AIX-src,andlinux-bindatasets,rsynctrans-mittedabout102MB,332MB,and1.
17GB,respec-tively.
Incomparison,TAPERsentabout73MB,189MB,and896MBcorrespondingtobandwidthsavingsof29%,43%and24%,respectivelyforthesethreedatasets.
Overall,weobservethatTAPER'simprovementoverrsyncrangedfrom15%to43%forsoftwaresourcesand24%to58%forobjectbinariesworkload.
Usinggzipcompression,wecompareTAPERandrsyncwiththebaselinetechniqueoftar+gzip.
Forthelinux-srcandAIX-bindata-sets,thecompressedtarball(tar+gzip)ofthedirectorytrees,Linux2.
4.
26andAIX5.
3,areabout38MBand1.
26GB,respectively.
TA-PER(withcompressioninthelastphase)sentabout5MBand542MBofdifferencedata,i.
e.
,aperformancegainof86%and57%respectivelyoverthecompressedtaroutput.
Comparedtorsync,TAPER'simprovementrangedfrom18%to25%forsoftwaresourcesand32%to39%forobjectbinariesdatasets.
Forwebdatasets,wemarkedthedatacrawledonSep.
15,2004asthebasesetandthesixadditionalversionscorrespondingtothedatagatheredafter1,5,10,15,20and25days.
Weexaminedthebandwidthcostofupdat-ingthebasesettoeachoftheupdatedversionswithoutcompression.
Figures15,16,17,18showthetotaldatatransmitted(withoutcompression)byTAPERandrsynctoupdatethebaseversionforthewebdatasets.
FortheCNNworkload,thedatatransmittedbyTAPERacrossthedifferentdaysrangedfrom14MBto67MB,whilethatbyrsyncrangedfrom44MBto133MB.
Forthisdataset,TAPERimprovedoverrsyncfrom54%to71%withoutcompressionand21%to43%withcompression.
Similarly,fortheYahoo,IBMandGooglegroupswork-load,TAPER'simprovementoverrsyncwithoutcom-pressionranged44-62%,26-56%,and10-32%,respec-tively.
Withcompression,thecorrespondingbandwidthsavingsbyTAPERforthesethreeworkloadsranged31-57%,23-38%,and12-19%,respectively.
5.
4ComparingComputationalOverheadInthissection,weevaluatetheoverallcomputationoverheadatthesourcemachine.
Micro-benchmarkex-perimentstoanalyzetheperformanceoftheindividualphasesaregiveninSection5.
5.
Intuitively,ahighercom-putationalloadatthesourcewouldlimititsscalability.
Fortheemacsdataset,thecompressedtarballtakes10.
4sofuserand0.
64sofsystemCPUtime.
Thecor-respondingCPUtimesforrsyncare14.
32sand1.
51s.
RecallthatthersttwophasesofTAPERneedonlytobecomputedonceandstored.
ThetotalCPUtimesforthersttwophasesare13.
66s(user)and0.
88s(system).
Thecorrespondingtotaltimesforallfourphasesare23.
64sand4.
31s.
Thus,thetargetspeciccomputationonlyrequiresroughly13.
5swhichisroughlysameasrsync.
Duetospaceconstraints,weomittheseresultsfortheotherdatasets,butthecomparisonsbetweenrsyncandTAPERarequalitativelysimilarforallexperiments.
5.
5AnalyzingResponseTimesInthissection,weanalyzetheresponsetimesforthevariousphasesofTAPER.
SincethephasesofTAPERUSENIXAssociation292Figure17:Rsync,TAPERComparisononIBMwebdatasetFigure18:Rsync,TAPERComparisononGoogleGroupswebdatasetChunkSizes256Bytes512Bytes2KB8KBFileSize(ms)(ms)(ms)(ms)100KB43321MB2927262410MB405321267259Table4:CDChashcomputationtimefordifferentlesandexpectedchunksizesincludesliding-blockandCDC,thesameanalysisholdsforrsyncandanyCDC-basedsystem.
Thetotalresponsetimeincludesthetimefori)hash-computation,ii)match-ing,iii)metadataexchange,andiv)naldatatransmis-sion.
Inthepreviousdiscussiononbandwidthefciency,thetotalmetadataexchangeanddatatransmissionbytevaluesareagoodindicatorofthetimespentinthesetwocomponents.
Theothertwocomponentsofhash-computationandmatchingarewhatwecomparenext.
Thehash-computationtimeforasingleblock,usedinthesliding-blockphase,tocomputea2-bytechecksumanda4-byteMD4hashforblocksizesof512bytes,2KB,and8KB,are5.
37s,19.
77s,and77.
71s,re-spectively.
Eachvalueisanaverageof1000runsoftheexperiment.
ForCDC,thehash-computationtimeincludesdetectingthechunkboundary,computingthe20-byteSHA-1signatureandpopulatingthedatabaseforindexing.
Table4showstheCDCcomputationtimesfordifferentlesizesof100KB,1MB,and10MB,usingdifferentexpectedchunksizesof256bytes,512bytes,2KB,and8KB,respectively.
TheBloomltergenerationtimefora100KBle(309CDCs)takes118ms,120ms,and126msfor2,4,and8hashfunctions,respectively.
Figure19showsthematchtimeforsliding-blockandCDCforthe3lesizes(10KB,1MBand10MB)and3blocksizes(512bytes,2KB,8KB).
Althoughthexed-blockhashgenerationis2to4timesfasterthanCDCchunkhash-computation,thetimeforCDCmatch-ingis10toa100timesfaster.
Thehash-computationtimecanbeamortizedovermultipletargetsastheresultsarestoredinadatabaseandre-used.
SincethematchingtimeismuchfasterforCDCweuseitinPhaseIIwhereitisusedtomatchallthechunksoveralltheles.
Figure19:MatchingtimesforCDCandsliding-block(SLB).
6RelatedWorkOurworkiscloselyrelatedtotwoprevioushash-basedtechniques:slidingblockusedinrsync[2],andCDCin-troducedinLBFS[20].
AsdiscussedinSection2,thesliding-blocktechniqueworkswellonlyundercertainconditions:smalllecontentupdatesbutnodirectorystructurechanges(renames,moves,etc.
).
Rsyncusesslidingblockonlyandthusperformspoorlyinname-resilience,scalability,andmatchingtime.
TAPER,how-ever,usesslidingblockinthethirdphasewhentheseconditionshold.
TheCDCapproach,inturn,issen-sitivetothechunksizeparameter:smallsizeleadstone-grainedmatchingbuthighmetadatawhereaslargechunksizeresultsinlowermetadatabutfewermatches.
Somerecentstudieshaveproposedmultiresolutionpar-titioningofdatablockstoaddresstheproblemoftheoptimalblock-sizebothinthecontextofrsync[16]andCDC[12].
Thisresultsinatrade-offbetweenbandwidthFAST'05:4thUSENIXConferenceonFileandStorageTechnologiesUSENIXAssociation293FAST'05:4thUSENIXConferenceonFileandStorageTechnologiessavingsandthenumberofnetworkround-trips.
Previouseffortshavealsoexploredhash-basedschemesbasedonslidingblockandCDCforduplicatedatasuppressionindifferentcontexts.
Moguletal.
useMD5checksumsoverwebpayloadstoeliminateredun-dantdatatransfersoverHTTPlinks[19].
Rheaetal.
describeaCDCbasedtechniquethatremovesduplicatepayloadtransfersatnergranularities[23]comparedtoMogul'sapproach.
Venti[21]usescryptographichashesonCDCstoreduceduplicationinanarchivalstoragesys-tem.
Farsite[3],asecure,scalabledistributedlesystem,employslelevelhashingtoreclaimstoragespacefromduplicateles.
Youetal.
examinewhole-lehashingandCDCstosuppressduplicatedataintheDeepstorearchivalstoragesystem[27].
Sapuntzakisetal.
computeSHA-1hashesoflestoreducedatatransferredduringthemigrationofappliancestatesbetweenmachines[24].
Forsimilaritydetection,Manber[17]originallypro-posedtheshinglingtechniquetondsimilarlesinalargelesystem.
BroderrenedManber'stechniquebyrstusingadeterministicsampleofthehashvalues(e.
g.
,min-hashing)andthencoalescingmultiplesampledn-gerprintsintosuper-ngerprints[5–7].
Incontrast,TA-PERusesBloomlters[4]whichcompactlyencodetheCDCsofagivenletosavebandwidthandperformsfastbit-wiseANDofBloomltersforsimilaritydetection.
Bloomltershavebeenproposedtoestimatethecardi-nalityofsetintersection[8]buthaveneverbeenappliedfornear-duplicateeliminationinlesystems.
FurtherimprovementsonBloomlterscanbeachievedbyusingcompressedBloomlters[18],whichreducethenumberofbitstransmittedoverthenetworkatthecostofincreas-ingstorageandcomputationcosts.
7ConclusionInthispaperwepresentTAPER,ascalabledatarepli-cationprotocolforreplicasynchronizationthatprovidesfourredundancyeliminationphasestobalancethetrade-offbetweenbandwidthsavingsandcomputationover-heads.
Experimentalresultsshowthatincomparisonwithrsync,TAPERreducesbandwidthsavingsby15%to71%,performsfastermatching,andscalestoalargernumberofreplicas.
Infuturework,insteadofsynchro-nizingdataonaper-clientbasis,TAPERcan(a)usemulticasttotransferthecommonupdatestomajorityoftheclients,andlater(b)usecooperativesynchronizationwhereclientsexchangesmallupdatesamongthemselvesfortheremainingindividualdifferences.
8AcknowledgmentsWethankRezaulChowdhury,GregPlaxton,SridharRa-jagopalan,MadhukarKorupolu,andtheanonymousre-viewersfortheirinsightfulcomments.
ThisworkwasdonewhiletherstauthorwasaninternatIBMAlmadenResearchCenter.
ThisworkwassupportedinpartbytheNSF(CNS-0411026),theTexasAdvancedTechnologyProgram(003658-0503-2003),andanIBMFacultyRe-searchAward.
References[1]http://th.
informatik.
uni-mannheim.
de/people/lucks/hashcollisions/.
[2]Rsynchttp://rsync.
samba.
org.
[3]A.
Adya,W.
J.
Bolosky,M.
Castro,G.
Cermak,R.
Chaiken,J.
R.
Douceur,J.
Howell,J.
R.
Lorch,M.
Theimer,andR.
P.
Watten-hofer.
FARSITE:Federated,available,andreliablestorageforanincompletelytrustedenvironment.
InOSDI,Dec.
2002.
[4]B.
H.
Bloom.
Space/timetrade-offsinhashcodingwithallowableerrors.
Commun.
ACM,13(7):422–426,1970.
[5]A.
Z.
Broder.
Ontheresemblanceandcontainmentofdocuments.
InSEQUENCES,1997.
[6]A.
Z.
Broder.
Identifyingandlteringnear-duplicatedocuments.
InCOM,pages1–10,2000.
[7]A.
Z.
Broder,S.
C.
Glassman,M.
S.
Manasse,andG.
Zweig.
Syntacticclusteringoftheweb.
InWWW,1997.
[8]A.
Z.
BroderandM.
Mitzenmacher.
NetworkapplicationsofBloomlters:Asurvey.
InAllerton,2002.
[9]D.
Fetterly,M.
Manasse,andM.
Najork.
Ontheevolutionofclustersofnear-duplicatewebpages.
InLA-WEB,2003.
[10]V.
Henson.
Ananalysisofcompare-by-hash.
InHotOSIX,2003.
[11]D.
Hitz,J.
Lau,andM.
Malcolm.
FilesystemdesignforanNFSleserverappliance.
TechnicalReportTR-3002,NetworkAppli-anceInc.
[12]U.
IrmakandT.
Suel.
Hierarchicalsubstringcachingforefcientcontentdistributiontolow-bandwidthclients.
InWWW,2005.
[13]N.
Jain,M.
Dahlin,andR.
Tewari.
TAPER:Tieredapproachforeliminatingredundancyinreplicasynchronization.
TechnicalReportTR-05-42,Dept.
ofComp.
Sc.
,Univ.
ofTexasatAustin.
[14]D.
G.
KornandK.
-P.
Vo.
Engineeringadifferencingandcom-pressiondataformat.
InUSENIXAnnualTechnicalConference,GeneralTrack,pages219–228,2002.
[15]P.
Kulkarni,F.
Douglis,J.
D.
LaVoie,andJ.
M.
Tracey.
Redun-dancyeliminationwithinlargecollectionsofles.
InUSENIXAnnualTechnicalConference,GeneralTrack,pages59–72,2004.
[16]J.
Langford.
Multiroundrsync.
Unpublishedmanuscript.
http://www-2.
cs.
cmu.
edu/jcl/research/mrsync/mrsync.
ps.
[17]U.
Manber.
Findingsimilarlesinalargelesystem.
InUSENIXWinterTechnicalConference,1994.
[18]M.
Mitzenmacher.
CompressedBloomlters.
IEEE/ACMTrans.
Netw.
,10(5):604–612,2002.
[19]J.
C.
Mogul,Y.
-M.
Chan,andT.
Kelly.
Design,implementation,andevaluationofduplicatetransferdetectioninHTTP.
InNSDI,2004.
[20]A.
Muthitacharoen,B.
Chen,andD.
Mazieres.
Alow-bandwidthnetworklesystem.
InSOSP,2001.
[21]S.
QuinlanandS.
Dorward.
Venti:anewapproachtoarchivalstorage.
InFAST,2002.
[22]M.
O.
Rabin.
Fingerprintingbyrandompolynomials.
TechnicalReportTR-15-81,HarvardUniversity,1981.
[23]S.
C.
Rhea,K.
Liang,andE.
Brewer.
Value-basedwebcaching.
InWWW,pages619–628,2003.
[24]C.
P.
Sapuntzakis,R.
Chandra,B.
Pfaff,J.
Chow,M.
S.
Lam,andM.
Rosenblum.
Optimizingthemigrationofvirtualcomputers.
InOSDI,Dec.
2002.
[25]R.
Thurlow.
Aserver-to-serverreplication/migrationprotocol.
IETFDraftMay2003.
[26]X.
Wang,Y.
Yin,andH.
Yu.
FindingcollisionsinthefullSHA1.
InCrypto,2005.
[27]L.
You,K.
Pollack,andD.
D.
E.
Long.
Deepstore:anarchivalstoragesystemarchitecture.
InICDE,pages804–815,2005.
USENIXAssociation294
horain怎么样?horain cloud是一家2019年成立的国人主机商家,隶属于北京辰帆科技有限公司,horain持有增值电信业务经营许可证(B1-20203595),与中国电信天翼云、腾讯云、华为云、UCloud、AWS等签署渠道合作协议,主要提企业和个人提供云服务器,目前商家推出了几款特价物理机,都是在内地,性价比不错,其中有目前性能比较强悍的AMD+NVMe系列。点击进入:horain...
今天遇到一个网友,他在一个服务器中搭建有十几个网站,但是他之前都是采集站点数据很大,但是现在他删除数据之后希望设置可能有索引的文章给予404跳转页面。虽然他程序有默认的404页面,但是达不到他引流的目的,他希望设置统一的404页面。实际上设置还是很简单的,我们找到他是Nginx还是Apache,直接在引擎配置文件中设置即可。这里有看到他采用的是宝塔面板,直接在他的Nginx中设置。这里我们找到当前...
hosthatch在做美国独立日促销,可能你会说这操作是不是晚了一个月?对,为了准备资源等,他们拖延到现在才有空,这次是针对自己全球14个数据中心的VPS。提前示警:各个数据中心的网络没有一个是针对中国直连的,都会绕道而且ping值比较高,想买的考虑清楚再说!官方网站:https://hosthatch.com所有VPS都基于KVM虚拟,支持PayPal在内的多种付款方式!芝加哥(大硬盘)VPS5...
rsync为你推荐
域名空间注册免费空间域名注册?ip代理地址ip代理有什么用?有图片..查询ip如何查IP网址免费vps服务器如何免费搭建自己的vps服务器域名购买为什么要购买域名,域名是干嘛用的?虚拟主机申请在哪里可以申请到虚拟主机呢台湾vps台湾服务器租用托管那里好php虚拟空间虚拟空间怎么修改php.ini配置美国网站空间美国空间做什么网站好?美国网站空间论坛选择空间可以选美国网站空间吗?
免费二级域名申请 贝锐花生壳域名 阿云浏览器 腾讯云盘 seovip ubuntu更新源 嘉洲服务器 我爱水煮鱼 帽子云 刀片式服务器 中国电信测速网 免费高速空间 免费网页空间 吉林铁通 新世界服务器 免费外链相册 中国电信测速网站 lamp兄弟连 免费个人主页 金主 更多