caches服务器社区

服务器社区  时间:2021-04-15  阅读:()
MakeitFresh,MakeitQuick—SearchingaNetworkofPersonalWebserversMayankBawaComputerScienceDept.
StanfordUniversityStanford,CA94305bawa@db.
stanford.
eduRobertoJ.
BayardoJr.
IBMAlmadenResearchCenter650HarryRoadSanJose,CA95120bayardo@alum.
mit.
eduSridharRajagopalanIBMAlmadenResearchCenter650HarryRoadSanJose,CA95120sridhar@almaden.
ibm.
comEugeneJ.
ShekitaIBMAlmadenResearchCenter650HarryRoadSanJose,CA95120shekita@almaden.
ibm.
comABSTRACTPersonalwebservershaveproventobeapopularmeansofsharinglesandpeercollaboration.
Unfortunately,thetransientavailabil-ityandrapidlyevolvingcontentonsuchhostsrendercentralized,crawl-basedsearchindicesstaleandincomplete.
Toaddressthisproblem,weproposeYouSearch,adistributedsearchapplicationforpersonalwebserversoperatingwithinasharedcontext(e.
g.
,acorporateintranet).
WithYouSearch,searchresultsarealwaysfast,freshandcomplete—propertiesweshowarisefromanarchitec-turethatexploitsboththeextensivedistributedresourcesavailableatthepeerwebserversinadditiontoacentralizedrepositoryofsummarizednetworkstate.
YouSearchextendstheconceptofasharedcontextwithinwebcommunitiesbyenablingpeerstoag-gregateintogroupsanduserstosearchoverspecicgroups.
Inthispaper,wedescribethechallenges,design,implementationandexperienceswithasuccessfulintranetdeploymentofYouSearch.
CategoriesandSubjectDescriptorsH.
3.
4[SystemsandSoftware]:Distributedsystems,informationnetworks,Performanceevaluation(efciencyandeffectiveness);H.
4.
1[OfceAutomation]:Groupware;H.
5.
4[Hypertext/Hyper-media]:Architectures,UserissuesGeneralTermsAlgorithms,Performance,HumanFactorsKeywordsWebsearch,Intranetsearch,peer-to-peer(P2P),decentralizedsys-tems,informationcommunities1.
INTRODUCTIONThesimplicityofWWWprotocols(e.
g.
,HTTP)andthematuri-tyofthoseoftheInternet(e.
g.
,DNS)havecombinedtomakewebtoolsthepreferredmethodofsharinginformationonbothpublicandprivatenetworks.
ThecommoditizationofpersonalcomputerCopyrightisheldbytheauthor/owner(s).
WWW2003,May20–24,2003,Budapest,Hungary.
ACM1-58113-680-3/03/0005.
hardwareandtheubiquityoftheweb-browserinterfacehavemadeessentiallyuniversalcontentaccesspossible.
Thesefactshavedriv-entheadoptionofpersonalwebserversforsharinglesandpeercollaboration.
Asanexample,almost,peoplewithinIBMusetheYouServ[14]personalwebservingsystemeveryweek.
Criticaltogroupcollaborationistheabilitytonddesiredcon-tent.
WhiletheWeboffersURLsforaccessingcontentfromanamedlocation,thepopularmethodforlocatingcontentonthewebisthroughsearch.
Webelievepersonalwebserversareusedinamannerwhichmakessearchevenmorepreferable:¤Contentonpersonalwebservershashighlytransientavailabili-ty(eveninsystemssuchasYouServwhichoffersimplemeansofimprovingavailabilitythroughsitereplication).
Onewouldbeill-advisedtorelyonlocatingcontentbyURLwhentheref-erencedlocationmaybeinaccessible.
¤Contentonpersonalwebserversistypicallypoorlyarranged,renderinglocationbynavigationineffective.
¤Contentonpersonalwebserversconsistsofamuchlargerfrac-tionofnon-HTMLdocumentsthanthetypicalweb.
Thesedoc-umentstypicallyarenotcross-linkedlikeHTML,furtherre-ducingtheeffectivenessoflocationbynavigation.
InadequacyofWebSearchToolsDespitetheimportanceofsearchovercontenthostedonpersonalwebservers,searchesofferedbytheWeb'sexistingsearchtoolsareinadequate.
Anysearchindexthatreliesonwebcrawlingwillomitanyhostthatisunavailableatthetimeofcrawling.
Also,hoststhatdomanagetogetindexedmaybedownatthepointofquerying,leadingtodeadresults.
Finally,acommonusageofpersonalweb-serversistoshareadocumentduringauthoringsothatfeedbackcanbegatheredandsuggestedchangesincorporatedintothedocu-ment.
Thiscausesdocumentstohaveintentionallysmalllife-cycleswhichcannotbecapturedbycrawling.
Resultsarethereforeboundtobestaleandincomplete.
AddressingtheChallengesInthispaper,weproposeanddescribeYouSearch,aneasytouse,lowcostapplicationthatseamlesslyintegrateswithapersonalweb-server.
AYouSearch-enabledwebserverprovidesatrueweb-basedcontentsharingsystembyallowingfresh,fastandcompletesearch-esoverpersonalcontent.
Motivatedbythedesiderataoffreshness,weoptedforaPeer-to-Peer(P2P)paradigmindesigningYouSearch.
Inthisarchitecture,eachwebserverisenhancedwithasearchcomponentconsistingofacontentindexerandaqueryevaluator.
Theindexerregularlymonitorssharedles,immediatelyupdatingitslocalindexwhenchangesaredetected.
Queriesareforwardedtolocalindexesforevaluationatruntime.
Onlyclientsthatareavailableatquerytimerespond,ensuringliveresults.
TheneedforspeedandcompletenessledustowardsahybridarchitectureinwhichtheP2Pnetworkisaugmentedwithalight-weightcentralizedcomponent.
Peersmaintaincompactsitesum-maries(intheformaBloomlter[16])whichareaggregatedatacentralizedregistrar.
Thesesummariesarequeriedsothatsearch-estargetonlytherelevantmachines.
YouSearchdoesnotrelyonqueryoodingorotherrouting-basedschemesthataresubjecttonetworkfragmentationandlimitedsearchhorizons.
Peershelpre-ducequeryloadonthesystembycachingandsharingqueryresults.
Theyalsocooperatetomaintainfreshnessofthesummaryaggrega-tion.
Thisminimizestheroleofcentralizedresourcesforlowcostandgracefulscaling.
EnhancingtheSharedContextofUsersUsersinaprivatenetworkoftenorganizethemselvesintogroupswithasharedgoal(e.
g.
,projectteams)orasharedinterest(e.
g.
,music).
YouSearchexploitsthissharedcontexttoenhancesearchthroughinter-usercooperation.
Webhostscanbeaggregatedin-tooverlapping(user-dened)groups.
Searchescanberestrictedtocontentsharedbymembersofagroup.
Searchresultswithingroupscanbemadepersistentandhandeditedbyusers.
Whenaqueryisre-runwithinagroup,thegroupspecicpersistentversionofresultsisreturned.
Thus,cooperationamongusersisusedtoimproveboththeefciencyandaccuracyofYouSearch.
DeploymentExperienceYouSearchwasarchitectedtobeextremelysimpletousetoen-hanceitsacceptancebyusers.
YouSearchwasreleasedwithintheIBMintranetinmid-September2002.
Sinceitdoesnotrequirethedeploymentofexpensiveserverhardware,rollouthasbeenprac-ticallycost-free.
Atthetimeofwriting,inlessthanmonths,thesystemhasbeenadoptedbynearly,usersandthenumberissteadilyincreasing.
Ourexperiencesinsuchareal-lifeactiveusagescenarioshowthatYouSearchisfastandefcient,andmostimpor-tantly,satisesusers'needforsearchonpersonalwebservers.
PaperOverviewWebeginwithasurveyofrelatedworkinSection2.
Section3describesYouSearchfromanenduser'sperspective.
Section4providesYouSearch'stechnicaldetails.
Section5evaluatesperfor-manceforYouSearchintheIBMcorporateintranet.
Asaninter-estingaside,wediscussourattemptsateasingperformancetuningofadistributedsysteminSection6.
WeconcludeinSection7.
2.
RELATEDWORKYouSearchisadeployedapplicationthatcombinesseveralideastoprovideaneffectiveandcoherentsolutionforwebsearchingoverpeer-hostedcontent.
Thissectiondiscussesthemostcloselyrelatedsystemsandproposals.
P2PFile-sharingSystemsLikeYouSearch,currentP2Ple-sharingnetworks(Gnutella[3],KaZaa[7])supportsearchovertransient,peer-hostedcontent.
Weeliminatedtheseschemesaspotentialsolutionstoourproblemsincetheysufferfrominefcienciesthatlimittheirscalabilityinprovid-inganegrainedsearchcapability.
Gnutella[3]performssearch-esbyoodingeachquerytoallthehostsinthenetworkreach-ablewithinaxedhorizonimposedbyatime-to-liveparameteronquerypackets.
Sincesearchesrarelyreacheverymachinewhenthenetworkislarge,queryresultsareoftenincomplete.
Inaddition,itsbandwidthandcomputerequirementsareexcessive.
AnothercauseofsearchincompletenessinGnutellaisduetotopartition-ingoftheoverlaytopologyaspeersjoinandleavethenetwork.
KaZaa[7]improvesonthisbasicschemebydelegatingmostoftheworktosuperpeers–machinesinthenetworkwithhighbandwidthandlargecomputeresourcesattheirdisposal.
Eachsuper-peerinKaZaahastoassumequeryprocessingresponsibilitiesforalargenumberofordinarypeers.
Thoughreduced,problemsoffragmen-tationandlimitedhorizonarenotentirelysolvedbythisapproach.
WebelievethatthesuccessofGnutellaandKaZaainspiteoftheseproblemsisduetotheiruseformusicandvideosharing.
Insuchnetworks,themostpopularsongsandlesarebothwidelyreplicatedandthetargetofmostqueries.
Documentsincorporateenvironmentsdonotfollowsuchreplicationorquerypatterns.
Insuchsettings,webelievethattheirinherentlimitsonperformancewillproveastumblingblock.
Napster[11]providedsearchoverpeer-hostedmusiclesbyadoptingahybridschemethatissimilartoYouSearch.
Napsterrequiredtheentiresonglistandsongmetadatafromeachpeertobecentralizedforindexing.
ThisoverheadforcedfragmentationofNapsterintoislandsofserverswhosepeerscouldnotcommu-nicate.
Suchcentralizationofthetermindexbecomesevenmoreinfeasiblewhentermsarisefrombothlenamesaswellasdocu-mentcontents,asinYouSearch.
Perhapsmoreimportantthanthetechnicaldetailsisthephiloso-phyofusebehindcurrentP2Psystems.
Thesesystemsformclosedcommunitiesofusers.
Evenpeoplewhowishtomerelyaccessorsearchsharedcontentmustinstallspecialpurposesoftwarethatus-esaproprietaryprotocol.
Asaresult,eachsuchprotocolcreatesapartitionofshareddatathatisinaccessibletousersoftheothers.
YouSearch,incontrast,isatitscoreweb-compatible,andcloselymimicstheexistinguserexperienceofconductingawebsearch.
P2PResearchProposalsTherehavebeenmanyrecentproposalsforP2Psystemsandarchi-tecturesthathaveyettoprogressintodeploymentandusebyasig-nicantnumberofusers.
Onesuchclassofsystemsincludesstruc-turedoverlaynetworkscalledDistributedHashTables(DHTs)[24,19,26].
DHTsprovideefcientkey-valuelookupsacrosspeers,butDHTsalonedonotsupportcontentsearch.
PlanetP[18]isaresearchprojectthatproposesanothersearch-ableP2Pnetwork.
AsinYouSearch,eachpeerconstructsbloomlterstosummarizeitscontent.
PlanetPpeersthengossipwitheachothertoachievealooselyconsistentdirectoryofeverypeerinthenetwork.
Incontrast,YouSearchpeersexploitadesignatedlightweightregistrarasa"blackboard"forstoringnetworkstate.
Theadvantageisthatregistrar-maintainedstateismuchmorecon-sistentandcanbemoreefcientlymaintained.
Theregistraral-soperformsotherusefulfunctionslikelocatingresultcachesthatwouldbeimpracticaltoprovideviaagossipingscheme.
WebservingToolsSeveralorganizationsprovidewebserversoftwarethatuserscanin-stallontheirmachinestoservetheirownwebsites[1,10,8].
How-ever,mostdonotprovideanintegratedsearchfacility,letaloneonecapableofspanningmanytransientsites.
Solutionsforendow-ingsuchwebserverswithasearchfacilityincludeht://Dig[6]andSWISH-E[12].
Theformerisunsuitableinthepresenceoftran-siencesinceitiscrawl-based,thelatterprovidesforindexingandsearchingofcontentoverjustthelocalsiterunningthewebserver.
OnewebserversupportingsearchacrosstransientpeersisBad-Blue[2].
BadBluecanbeusedtosearchovera(private)networkofhostsrunningBadBlueorother-Gnutellacompliantsoftware.
Inre-lyingontheGnutellaprotocolforsearchingthenetwork,BadBluesuffersfromtheproblemsalreadyoutlinedabove.
CollaborativeApplicationsCollaborativeworktoolssuchasXDegrees[13]andGroove[5]providesecurelesharingamongknowledge-workers.
Groovedoesnotenablesearchoverlessharedbyalargenumberofpeers.
ApressreleaseattheXDegreeswebsitementionstheabilitytosearchoversharedcontentbutitstechnicaldetailsarenotavail-able.
Bharatin[15]proposedSearchPadasanextensiontothesearchinterfaceataclienttokeeptrackofchosen"interesting"queryre-sults.
Theresultsarerememberedandusedinisolationbyeachindividualclient.
Thetoolwasfoundtobehelpfulforusersinastudyconductedinacorporatesetting.
YouSearchenablesuserstorecordandshareresultswithothermembersintheirgroups.
Webelievethatsuchasharedcontextamonggroupmembersmakesthisfunctionalityevenmoreuseful.
CachingofResourcesatPeersTherehavebeenseveralproposalsforcachingresourcesatpeers.
CoopNet[23],Backslash[27]andPROOFS[28]areproposalsforaP2Pcachingschemetoaddressashcrowdsatahost.
Squir-rel[21]andMangoSoft'sCacheLinkproduct[9]proposeorga-nizingaP2Pcacheforwebobjects.
YouSearchexploitspeerre-sourcestocachequeryresults.
ThedistributedsearchinterfaceofYouSearchprovidesanaturalmeansfordistributingcaching.
Itsuseofacentralizedregistrarmakesthediscoveryofcachelocationssimpleandeffective.
3.
END-USERPERSPECTIVEInthissectionwediscusstheexperienceofatypicalendus-eroftheYouSearchapplicationasitiscurrentlydeployedwithintheIBMintranet.
WehighlighthowtheusagecharacteristicsofYouSearchmotivateourdesignchoices.
TheWebSearchMetaphorTheprimaryinterfacetotheworldwidewebformanyusersisthesearchformofferedbyanyofanumberofwebsearchengines.
Be-causeofitsubiquity,expectationsoffunctionforsuchaninterfaceoverpersonal-webserverhostedcontentissetbywhatisavailableonthewebatlarge.
TheYouSearchinterfacepreservestheseex-pectationsbycloselymimickingfamiliarwebsearchinterfacesinbothappearanceandfunctionaswediscussbelow.
TheSearchInterfaceEachYouSearch-enabledpeerprovidesasearchformbothinitsstandard"folderlisting"templateaswellaswithinadedicatedsearchpage(Figure1).
Fromthededicatedsearchform,theus-ercanspecifywhetherthesearchistogoacrossallhostsorberestrictedtothelocalhost.
Inkeepingwithuserexpectation,thesemanticsofthesearchformismodeledafterGoogle[4],withphrasematch(bywayofenclosingphrasesinquotes),anex-clusion(-)operator,asiteoperatorforrestrictingsearchestoadesignatedhost,andsoon.
Noticethatunlikewebsearchengines,YouSearchoffersnosin-glecentralizedqueryform.
Instead,eachparticipatinghostoffersitsownweb-accessiblesearchinterface.
Thus,oursolutionisanamalgamofaP2Pexecutioninfrastructureandtheuserexpecta-tionofa"universalsearchform.
"Figure1:[YouSearchinterface]AuserbrowsingaYouSearchenabledwebsitecanissuequeriesusingtheaboveform-basedsearchinterfaces.
SharingSearchableContentTraditionalwebsearchenginesindexonlycontentthatisreachableviaURLs.
Mostwebserversallow"hiding"theURL'sofsomeweb-accessiblelesbyplacingan"indexle"(index.
html)inthedirectorycontainingthem,sincetheindexlewillbeservedinplaceofadirectorylisting.
Suchhiddenlesarethusnotindexedbywebsearchengines,andmanypeopleexpectsuchlestoremainhiddentoallbutthosewhoknowtheexactURL.
YouSearchpeersbydefaultwillpreservethisexpectationbyindexingonlypubliclyshared,non-hiddencontent.
Filesaresearchablenotonlyviathekeywordsthatappearintheirlenames/URLs,butalsothroughthecontentoftheles.
Whileaccesscontrolledcontentisnotcurrentlysearchable,weplantoallowsearchesbyauthenticatedusersoversuchcontentinthefuture.
Peersupdatetheirlocalindexeverytwominutestoensurefreshness.
Forpeerswhichsharehugeamountsofcontent,thisupdateintervalcanbeincreasedtoreduceprocess-ingload.
ViewingSearchResultsAfterexecutingasearch,aresultspage(Figure2)isdisplayedassoonasenoughresultsareobtainedfromtheunderlyingnetwork.
Whiletheuserviewstheavailablesearchresults,thesystemcon-tinuestogatheradditionalresultsinthebackground.
Thisprovidesquickresponsetimeswhilepreservingtheweborientedinterface.
Whileasearchisunderway,theresultspagewillindicatethetotalnumberofresultsfounduptothatpoint.
Theprecisenumberofresultsisdisplayedoncethesearchrunstocompletion.
Thehitsfromeachindividualhostareappropriatelyrankedusingstandardtextsearchmetrics.
Thoughnotcurrentlyimplemented,Figure2:[YouSearchresultsdisplay]Theresultstoaqueryaredisplayedastheyarebeinggathered.
Noticethe"ofatleast"messageatthetoprightcorneroftheresultspage.
backgroundrankaggregationofinter-hostresults[20]couldcausethemostrelevant(unviewed)resultstopercolateupwardsintherankingevenifdiscoveredlater.
Therankingofalready-viewedresults,however,shouldbefrozentopreservebackandforwardsemanticsofwebbrowsing.
SearchingwithinaSharedContextUserswhosharesomecontextoftenknow"where"informationis,butnotwhichofanumberofdocumentsitisin.
YouSearchlever-agesthisintwoways.
First,byallowingsearchrestrictedtolo-calhosts,andsecond,bysupportinguserdened"groups.
"Grouprestrictionisenabledviaagroupoperatorthatdesignatesaus-erdenedsubsetofhoststhataretobeconsidered(orexcludedifprecededby"-")whensearching.
Forexample,thequerypdfgroup:YouSearchTeamwillidentifyallpdfles(andlescontain-ingthetermpdf)thataresharedbypeersintheYouSearchTeamgroup.
Peersarefreetodeneandmanagegroupsoftheirchoice.
ByleveragingtheexistingIBMcorporatesystemformanagingus-ergroupmemberships,usershaveimmediateaccesstomanygroupdenitionsthathavealreadybeendenedforotherapplications.
UsageCharacteristicsYouSearch'slowbarriertoconductingsearchesmightbeseenasproblematicbyusersofotherP2Pcontentsharingsystems,whereuserswhodonotsharecontentareoftenbarredfromconduct-ingsearchesand/ordownloadstoencouragesharingbyallpartici-pants.
YouServ,however,differssignicantlyinitsusagecharac-teristicsthantheseothernetworks.
InYouServ,usersareidentiedintheURLofeveryletheyshare.
Thisaccountability,particu-larlywhendeployedwithinaworkenvironment,discouragestheuseofYouServforillicitpurposessuchascopyrightinfringement.
YouServisinsteadregularlyusedforconstructivepurposes,suchassharingofwork-relateddocumentsandwebcontent,sharingfamilyphotos,creatinginformativeweb-hostsforvariousworktasks,andsoon.
WithYouServ,userstypicallywantotherstousetheirhostsbothtoexecutesearchesandtodownloadtheircontent,whetherornottheseothersareactivelysharingcontentthemselves.
DeploymentStatusAtthetimeofthiswriting,YouSearchhasbeendeployedforalmostmonths.
Overthelastweek,nearly,peoplehavemadetheircontentavailabletothesearchsystem.
Thefullsetoffunctionalitydescribedinthispaperisbeinggraduallyrolledout.
Forexample,theinitialdeploymentofYouSearchprovidedconjunctivequeries,exclusionandsiteoperators.
Amonthlaterweprovideduserswiththegroupoperator.
Thefeatureallowingsharingofuser-recommendedresultsamonggroupmembersisimplementedbutnotcurrentlydeployed.
Weintendtowaitforuserstofamiliarizethemselveswiththegroupoperatorbeforeitsrelease.
4.
INTERNALSOFYOUSEARCHInthissectionwedetailtheYouSearcharchitectureanditsim-plementation.
Weoverviewthevariouscomponentsandthendis-cusshowtheyinteracttoprovidethefeaturesdiscussedinthepre-vioussection.
Weconcludewithadiscussionofthecommunity'sroleinmaintainingnetworkconsistency.
4.
1SystemOverviewTheparticipantsinYouSearchinclude(1)peernodeswhorunYouSearch-enabledclients,(2)browserswhosearchYouSearch-enabledcontentthroughtheirwebbrowsers,and(3)aRegistrarwhichisacentralizedlight-weightservicethatactslikea"black-board"onwhichpeernodesstoreandlookup(summarized)net-workstate.
ThesearchsysteminYouSearchfunctionsasfollows.
Eachpeernodecloselymonitorsitsowncontenttomaintainafreshlocalin-dex.
Abloomltercontentsummaryiscreatedbyeachpeerandpushedtotheregistrar.
Whenabrowserissuesasearchqueryatapeer,thepeerrstqueriesthesummariesattheregistrartoobtainasetofpeersinthenetworkthatarehostingrelevantdoc-uments.
ThepeersinarethendirectlycontactedbywiththequerytoobtaintheURLsforitsresults.
Toquicklysatisfyanysubsequentlyissuedquerieswithidenticalterms,theresultsfromeachqueryissuedatapeerarecachedforalimitedtimeat.
Thepeernotiestheregistrarofanycacheentryitmaintains.
Thisallowspeersotherthanthathappentoreceivethesamequerytolocateandreturnthecachedresultsinsteadofexecutingthequeryfromscratch.
PeersinYouSearchusethecentralizedregistrarasablackboardbypostingthestateoftheirnodecontentforotherpeerstoquery.
Thisway,theregistrarcanbeusedtoavoidcostlyandofteninef-fectiveoodingofqueriestoirrelevantpeers,andalsoforefcientlocationofrelevantresultcaches.
Thoughitsroleisimportant,theregistrarremainsamostlypassive,andhencelight-weightentity.
Thepeernodesperformalmostalloftheheavyworkinsearching.
4.
2IndexingSharedContentWenowdescribetheindexingprocessindetail.
Anindexingprocess(Figure3)isperiodicallyexecutedateverypeernode.
TheprocessstartsoffwithanInspectorthatexaminessharedlesinaccordancewithauserspeciedindexaccesspolicy.
Eachsharedleisinspectedforitslastmodicationdateandtime.
Iftheleisneworthelehaschanged,theleispassedtotheIndexer.
TheIndexermaintainsadisk-basedinverted-indexoverthesharedcon-tent.
Thenameandpathinformationoftheleareindexedaswell.
Ourimplementationusestheenginedescribedin[17]forindex-inglocalnodecontent,whichprovidessub-secondqueryresponsetimes,efcientindexbuilds,andexcellentresultsranking.
TheSummarizerobtainsalistoftermsfromtheIndexerandcreatesabloomlter[16]fromtheminthefollowingway.
Abitvectoroflength¤iscreatedwitheachbitsetto.
Aspeciedhashfunctionwithrange¨§§¤isusedtohasheachterminandthebitatpositioninissetto.
Noticethatbloomltersareprecisesummariesofcontentatapeer.
Supposewewanttodetermineifatermoccursatapeerwithabloomlter.
Weinspectthebitposition!
in.
Ifthebitis,thenitisguaranteedthatadocumentwithtermdoesnotappearat.
Ifthebitis,thenthetermmightormightnotoccuratsincemultipletermsmighthashtothesamevaluethussettingthesamebitin.
Theprobabilitythatsuchconictsoccurcanbereducedbyincreasingthelength¤.
IndexerSummarizerBitIP-AddressSummaryManager(a)(b)(c)(d)PEERNODE(Alice)InspectorREGISTRARFigure3:[Indexingsharedcontent]Peerprocessesarerunpe-riodicallyatintervalsthatcanbesetbytheenduser.
Conictscanalsobereducedbyusing"independenthashfunc-tions$#§&%§''(§&)andinspectingthatbitpositions$#!
§0%¨!
§''(§)areallsetto.
Conventionally,the"hashfunctionsareusedtosetbitsinasinglebitvector.
However,inYouSearchweconstruct"differentbloomlters,oneforeachhashfunction.
Itcanbeshownthatsuchadesignincursaslightincreaseinfalsepositives.
Nevertheless,weoptformultiplebloomlterstoenablefurtherdecentralizationofregistrar.
Asthenetworkgrowsandtheloadonregistrarincreases,wecaneasilydistributeloadbyplacingthe"bloomltersacrossmultipleservers.
The"bloomltersaresenttotheregistrarwhenapeerbecomesavailable,andwheneverchangesinitscontentaredetected.
TheSummaryManagerattheregistraraggregatesthesebloomltersintoastructurethatmapseachbitpositiontoasetofpeerswhosebloomltershavethecorrespondingbitset.
4.
3QueryingIndexedContentWenowdescribethequeryingprocessindetail.
Anybrowser(Bob)visitingaYouSearchenabledwebsite(Alice'speer)hasaqueryinterfacefromwhichhecansearchthenetwork.
4.
3.
1QueryingGlobalContentSupposeBobwishestosearchallofYouSearchenabledcontent(Figure4).
Bob'squeryisreceivedbyAlice'speernodeviaawebinterfaceandisforwardedtoaCanonicalTransformerthatcon-vertsthequeryintoacanonicalformconsistingofsetsoftermslabeledwiththeassociatedmodier.
Forexample,aqueryofpdfgroup:YouSearchTeamwillbeconvertedto213"54687@9BABCED§!
FCHGI¨P,1Q8AB9¨RS§UTV9BRIWX4UY8AB`'a5@4Y8bcBP!
.
Thecanonicalqueryisforward-edtotheResultGatherer.
TheResultGatherersendsthecanonicalquerytotheregistrarwheretheQueryManagercomputesthehashofkeywordstode-terminethecorrespondingbitsforeachofthe"bloomlters.
TheregistrarlooksupitsbitpositiontoIPaddressmappinganddeter-minestheintersectionofpeerIPaddresssets.
Thesetisthenreturnedtothequeryingpeer(Alice).
TheResultGathereratAlice'speerobtains.
Ifthequerycon-tainedspecialmodiers(e.
g.
,site,group),isfurtherl-teredtocontainonlypeersthatsatisfythemodier.
ItthencontactseachofthepeersinandobtainsalistofURLsdformatchingdocuments.
TheresultsarethenpassedtoResultDisplaywhichthenappropriatelyformatsanddisplaysd.
InordertoreducethelatencyperceivedbyBob,ResultDisplayshowsitsresultsevenasResultGathereriscollectingthem.
Thedecisiontorestrictqueryprocessingattheregistrartokey-wordswasmotivatedbythedesiretoreduceprocessingloadattheWebInterfaceResultDisplayCanonicalTxResultGathererQueryManager(a)PEERNODE(Alice)(b)http://alice.
youserv.
com/yousearchCLIENTBROWSER(Bob)(c)(d)(e)(f)(g)(h)QueryQueryBitHostPeerSearchallofYouServSearchAlice'scontentSearchpersistentresultsPeerResultsCachingPeerHostpeersREGISTRARFigure4:[Queryingglobalcontent]Theregistrarusesitsglobalsummarytodetermineasetofpeersthatcouldhavedocumentsmatchingthequery.
Alice'speerthencontactseachofthesepeerstoobtaintheresults.
registrar.
Thetrade-offinvolvesincreasedbandwidthusageascouldbefurtherlteredbyquerymodiers.
However,thegroupmodierinparticularisanexpensivelterasitinvolvesaremoteprocedurecalltotheintranetgroupserver.
Bydelegatingmodi-erprocessingtoqueryingpeers,theincurredcostisdistributedacrossthecommunity.
Moreover,wehavenowallowedapossibleextensionofhavinggroupsdenedlocallyateachpeer.
Thereg-istrarcanthenbemadetransparenttomaintenancecostsasgroupsevolveovertime.
Notethattheuseofbloomltersassummariesofsharedcontentimpliesthatisguaranteedtocontainallpeersthathaveamatch-ingdocument(nofalsenegatives).
Thisensuresresultcomplete-ness.
However,canhavefalsepositivesinwhichcaseAlicewillreceiveanswersfromsomeofthepeersitcontacts.
Thechancesofsuchfalsepositivescanbereducedbyincreasingthelength¤ofabloomlterorthenumber"ofbloomltersused.
4.
3.
2QueryingLocalContentSupposeBobwishestolimithissearchtoAlice'snodealone.
ThequeryisreceivedbythewebinterfaceatAlice'speer,trans-formedintoacanonicalformandforwardedtoResultGathererasbefore.
TheResultGathererrecognizesthequerytobealocalqueryandlooksupitslocalindextonddocumentsthatmatchthequery.
TheindexreturnsalistofrankedURLsdforthematch-ingdocuments.
TherankedlistissenttoResultsDisplaywhichformatsanddisplaysresultsasbefore.
4.
4CachingQueryResultsAsthenetworkmatures,asignicantfractionofthequerieswillberepeated.
Indeed,ithasbeenwidelyreportedthatquerieshaveazipandistributionandindividualqueriesaretemporallyclustered[29].
Cachingsearchresultsenablesasearchsolutiontoreducecostsbyreusingthesearcheffort.
SinceYouSearchhasabundantresources(thecomputingandstor-ageresourcesofallpeersinthenetwork)atitsdisposal,itisex-tremelyaggressiveinitsuseofcaching.
Everytimeaglobalqueryisansweredthatreturnsnon-zeroresults,thequeryingpeer(Al-ice'speernode)cachestheresultsetofURLsd.
Thepeertheninformstheregistrarofthefact.
TheregistraraddsamappingfromthequerytotheIP-addressofthecachingpeer(Alice)initscachetable.
Eachcacheentryisassociatedwitha(small)lifetimethatismon-itoredbythecachingpeernode.
Thecachingpeeritselfmonitorsandexpiresentriesinitscache,andinformstheregistrarofanysuchchanges.
SupposeBobasksaglobalqueryatAlice'speernodethathasbeencachedatotherpeernodesinthenetwork.
TheResultGather-eratAlice'snodesendsthequerytotheQueryManageratthereg-istrar.
TheQueryManagerattheregistrarlooksupitscachemap-pingforthequeryanddeterminesthesetofpeersthatarecachingthequery.
Theregistrarpicksoneofthem(Ted)atrandomandaskstheResultGathereratAlice'snodetoobtainthecachedre-sultsfromTed'speernode.
4.
4.
1QueryingRecommendedResultsSupposeBobsearchedforaquerythatheexpectsothersinhisgrouptobeinterestedin(e.
g.
,howtoinstallaprinter,locationofweeklymeetings,etc.
).
Sincepeersinagroupshareacontext,Bobexpectsthatresultstheyareinterestedinwillbesimilartohisowninterests.
ItwouldbeconvenientforthemifBobcouldpersisthiseffortsininspectingtheresultstodeterminethemostrelevantanswertohisownquery.
YouSearchprovidesamechanismbywhichsuchinformationcanbesharedbymembersinthegroup.
EachglobalqueryresultisdisplayedwithacheckboxthatallowsBobtoindicateifhefoundtheresultrelevantandwouldliketorecommendit.
IfBoboptstodoso,thequeryandtheselectedresultaresenttotheregistrarwhomaintainssuchmappingsfromquerytotherecommendedURL.
IfBobissignedin,theresultisstoredasarecommendationofBob,otherwiseitisrecordedasananonymousrecommendation.
Bobcanalsoquerytherecommendedinformation.
ThequeryisevaluatedasbeforeexceptthattheResultGathererobtainstheresultsetofURLsdirectlyfromtheregistrar.
TheresultsdisplayedbyResultDisplayaregroupedintothreecategories:recommend-edbyBobhimself,byanonymoususers,andbyidentiedusers.
BobcanalsodeletearesultfromtherecommendedinformationthatwassetbyBobhimselforbyananonymoususer.
Thusthein-formationpooliseditable,allowingpeerstomodifyandaugmentsharedinformation.
4.
5RoleoftheCommunityPeerstakeitonthemselvestokeepthenetworkstateconsistent0100020003000400050006000700080009000100001002003004005006007008009001000NumberofsessionsTime(minutes)PEERSESSIONDURATIONS(AUG---NOV2002)MEDIANAVERAGEFigure5:DistributionofsessiondurationsofYouServpeers.
byabsolvingtheregistrarfromactivelyhavingtomaintainitsstoreofbloomindexandcachemappings.
Eachpeerinformstheregis-trarofanychangesinitsindexorcacheentries.
Wheneverapeerleavesthenetwork,itaskstheregistrartoremoveitsentriesfromtheindexandcachetables.
Ifamisbehavingpeer(Carol)neglectstoinformtheregistrarofitschanges,themappingsattheregistrarbecomeinconsistent.
Insuchacase,theregistrarwouldincludeCarol'speernodeinitssetofhintsofpeerstocontactforaparticularquery.
Thepeersarede-signedtohandleandreportsuchinconsistencies.
Aqueryingpeer(Alice)willtrytoretrieveanswersfromCarolwhoisnolongeravailablecausingtherequesttotimeout.
AlicetheninformstheregistrarthatCarolisunreachable.
Theregistrarveriestheinfor-mationbycontactingCarolandsubsequentlyremovesentriesforCarolfromitsmappings.
Thus,theregistrarinYouSearchservesjustasarepositoryofstate,changingthestateattheexplicitdirec-tionofthepeersthemselves.
5.
PERFORMANCEONAREAL-WORLDDEPLOYMENTYouSearchhasbeendeployedasacomponentoftheYouServpersonalweb-hostingapplicationwithintheIBMcorporateintranetsinceSeptember16,2002,withalimitedbetareleaseprecedingitaweekbeforeonSeptember9,2002.
ThissectionprovidesalookattheusagetrendsofYouSearchcompiledtodate.
WeshowhowthesetrendssupportYouSearch'sdesignprinciplesandourassertionregardingtheimportanceofsearchwithinsuchanetwork.
5.
1PeerLifetimesYouServwasoriginallydeployedwithintheIBMintranetinJu-ly2001.
ApeerenablescontentsharingbystartingtheYouServclient,anddisablesitwhentheclientstops.
TheintervalbetweenstartandstopofaYouServclientisdenedasaYouServsessionforthatpeer.
Figure5showsthedistributionofsessiondurationsobservedamongtheYouServpeersthatwereactivebetweenJuly1,2002andNovember9,2002.
Ascanbeseen,mostofthesessionsareshort,withhalfofthesessionsbeinglessthanhourslong.
Thedottedarrowinthegurepointstotheaveragesessiondurationof¤minutes.
YouServofferstheabilityforreplicatingcontentacrosstrustedpeernodessothatpeertransienceneednotnecessarilytranslatein-tocontentavailabilitytransience.
Theideaisthataslongassomepeerhostingthecontentisonline,thecontentremainsavailable,andthroughthesameURLs.
Althoughusersareawareofthisfea-ture,onlyasmallfraction(§oftheofthepeersmadeuseofit.
Thus,resultfreshnessremainsanimportantconcern.
0100200300400500600700800900USUKDECANLFRINCHAUIEDKSEITATESNumberofpeersGEOGRAPHICDISTRIBUTIONOFPEERSFigure6:Thetopof¤countrieswithYouSearchusers.
Al-thoughUSAconstitutesthedominantfraction,theYouSearchenabledwebisdistributedacrosstheworld.
13401360138014001420144014601480150015201540-42-28-140142842NumberofactivepeersDayssinceYouSearchreleaseYouServWEEKLYUSAGE(AUG---NOV2002)Figure7:ThenumberofYouServpeersthatareactiveduringaweekhasbeenincreasingrapidlysinceYouSearchwasreleased.
5.
2GeographicalLayoutofPeersFigure6showsthedistributionofYouSearch-enabledpeerswith-intheIBMintranet.
AlthoughthedominantfractionofpeersislocatedwithintheUSA,YouSearchisbeingusedactivelyin¤countriesacrosstheworld.
ThedifferenttimezonesofitsuserscombinedwithshortsessiondurationscausestheYouServwebtobeinconstantchurn,renderingcentralizedcrawlingimpractical.
5.
3PerceivedValue-AddofYouSearchTheproliferationofYouServpeersontheIBMintranetshowsthatpeoplewantasimpleandeffectivemeansofsharingcontentusingtheweb.
Recentusagestatisticsindicatethevalue-addofYouSearch.
Figure7showsthenumberofuniqueuserswhoac-tivelyusedYouServsometimeduringtheweekbeforeeachplottedpoint.
NotethatasignicantgrowthinthisusagemetriccoincideswiththedayYouSearchwasreleased.
Anothersignofdemandforsearchisthefactthatthebetareleasealone(interval¨8§§inFig-ure7)wasdownloadedandusedbynearlyonehundredusers.
5.
4CharacteristicsofBloomFiltersInourIBMintranetdeployment,thelengthofeachbloomlteris¤¤Kbitsandthenumberofbloomltersissetto".
Thethreehashfunctions#§%§arecomputedasfollows.
AnMD5[25]hashofeachtermatapeerisdetermined.
AnMD5hashisbyteslongfromwhichthreebithashvaluesareextractedandusedaskeysinthebloomlter.
NotethattheuseofMD5en-suresthatthehashisstronglyrandomandthattheresultingbloomltersareindependent.
AsmentionedinSection3,onlypubliclysharedlesateachpeer00.
20.
40.
60.
810100200300400FractionofbitssetRankofpeer(a)BITSPERPEER00.
10.
20.
30.
40.
50.
60.
70.
80.
912^{4}FractionofpeerswithbitsetRankofbit(b)PEERSPERBITFigure8:Characteristicsofbloomltersfromapproximately¤peers.
01002003004005006000200400600800100012001400UnitsRankofqueryTIMETOGATHERALLRESULTSMEDIAN(8.
18s)MAXIMUM(354.
20s)Time(sec)Peers(count)Figure9:Timetakentogatherallresults.
Notethatthere-sponsetimeseenbyabrowserisasmallfractionofthistime.
areindexed.
Userscanalsoexplicitlydeclarespecicdirectoriestobeindexableornot.
Foreachindexablele,theIndexerindexeskeywordsthatappearinitsURL.
Inaddition,iftheleisoftypeHTMLortext,theIndexerindexeskeywordswithintheleitself.
Figure8(a)showsthatonlyafewpeershavealarge(two-thirds)fractionoftheirbitsset.
Inadditiontoincreasingnetworktrafc,thesepeerscouldfacehighqueryprocessingloads.
Arelativelysimplesolutiontothisproblemistocreatepartitionsofcontentatsuchpeers,withdistinctbloomlterssummarizingeachpartitioninsteadofeachnode.
Figure8(b)showsthatmostofthebitsinthebloomindexarehighlyselective,thoughafewbitsthatcorrespondtothemostfrequentlyoccurringwords(about)aresetbyalmostofpeers.
YouSearchcouldbemadetolterstopwordstoreducethiseffect.
5.
5TimetoanswerqueriesToevaluateoverallqueryperformance,weloggedstatisticsforglobalqueriesaskedataYouSearchpeer.
Weformedaquerysetcomprisedoftherst§globalqueriesloggedduringthemon-itoringperiod(September9,2002toNovember9,2002).
Thequeriesweresortedbasedonthetimetakentogatheranswersatthequeryingpeer.
Figure9showsthatmorethanhalfthequerieswereansweredinlessthanseconds.
Nearlyofthequeriestookmorethanaminutetobeanswered.
Notethatthesetimesarenottheresponsetimesseenbythebrowser.
AsdiscussedinSec-tion3,theresultsaredisplayedtotheuserwhiletheyarebeinggathered.
Wealsoplottedthenumberofpeersthatwerecontactedforan-swerstothecorrespondingquery.
Notsurprisingly,thecurvefornumberofpeerscontactedfollowsthetimecurveclosely.
Thejitter01020304050607080102030TimetogatherresultsRankofqueryGAINSFROMCACHINGNetworkCachedFigure10:Timetogatherresultswithandwithoutcaching.
inthiscurvecanbeattributedtothegeographicdistancebetweenpeersandthefactthatevenontheIBMintranet,nodeshavevastlydifferentbandwidthandlatencycharacteristics(someconnectviadialupVPN,forexample).
Wenotethatthecurrentimplementationprobespeerssequentially.
Parallelizingsuchprobesbycontactingnodessimultaneouslywillresultinproportionalimprovementsingathertimes.
Afactorofteninimprovementiseasilyfeasibleandwillbringthemediangathertimetoasub-secondrange.
Notsurprisingly,thelongestquerieswerealsoobservedtohavelargeanswersets.
Forthesequeries,thecollectionofresultswereinfactcollectedfasterthanthespeedatwhichauserislikelytoinspectthem.
5.
6QueryCharacteristicsWeexaminedqueriesinthequerysetdiscussedearlierinSec-tion5.
5.
Alargefraction(¤¤¤)ofqueriesweresimplekeywordquerieswithanaveragelengthofkeywordsandstandardde-viation.
Theremaining(ofthe¨§)involvedanadvancedfeaturelikesiteorgroupsearch.
Aboutofthequerieshadatleastoneanswerwithanaverageof¤¤answersperqueryobtainedfromanaverageofpeers.
Figure9plotsthenumberofpeers(sizeof)thatwereprobedtoobtainanswers.
Webelievethattheusersarestilladjustingtotheavailabilityofsearch,withasignicantamountofcontentremainingunindexedduetoYouSearch'sdefaultbehaviorofleavingcontentunindexedshoulditbehiddenbehindindex.
htmlles.
AsusersbecomemorefamiliarwithYouSearch,moredatawillbemadeavailableforsearching,andthefractionofsuccessfulquerieswillincrease.
Weobservedthat§2ofthequerieshadafalse-positivepeerinitsresultset.
Theaveragenumberoffalse-positivepeerswas¤whichcorrespondstoofanaverageresultsetofsizepeers.
MostofthesefalsepositivesareduetothefewpeersinFigure8(a)thathavealargefractionoftheirbitsset.
Ofthesuccessfulqueries,§¤wereservedfrompeercaches.
Thisvaluewillincreaseasthesystemgrowsduetohigherqueryloads.
Additionally,wehavebeenveryaggressiveinclearingcaches:thedefaultcachelifetimeissettominutes.
Increasingthisdefaultparameterwillleadtoimprovedcachehitrates,thoughwithaslightpenaltyinresultfreshness.
Indeed,nearlyathird()ofallqueriesinoursamplewereaskedmorethanonce.
Tobetterquantifytheeffectofcachingonperformance,weis-suedasampleofqueriesatonepeer,andthenrepeatedthesesamequeriesatdifferentpeersinthenetwork.
Thesecondtimeaquerywasexecuted,resultsweregatheredfromacacheinsteadofgatheredfromscratch.
Figure10showsthetimestakeninthetwoinvocations.
Clearlycachingimprovesperformance,oftenbyanorderofmagnitude.
5.
7LoadonRegistrarRecallthatYouSearchutilizesacentralizedregistrarforprovid-inganaggregatedbloomlterindex.
Inthissectionweanalyzethecommunicationandprocessingdemandsrequiredofthiscom-ponent.
Supposetherearepeersparticipatinginthenetwork.
Eachofthepeerswillsendtheir"bloomltersofsizebitseverysecondsiftheircontenthaschanged,whereistheperiodatwhichcrawlsoccurateachpeer.
LetGbethefractionofpeerswhosecontentchangesinanintervalofseconds.
TheregistrarthushasanaverageinboundtrafcofGbitseveryseconds.
InthecurrentYouSearchdeployment,",andseconds.
Weconservativelysetthefrequencyofsitechanges,G,to.
Withsuchsettings,assumingtheregistrarhasaT1-linebandwidthof¨§¤¨§bitspersecond,ofwhichisconsumedbynetworkingoverhead,theregistrarcouldsupport)'()109,856peers.
Assumingacorporatepri-vatenetworkwithaT3linecapacityof¤¤2§43HD,theregistrarcouldsupport286,310peers.
Theseareratherlooseupperboundsgivenourconservativesettingofsite-modicationfrequen-cy.
Bandwidthcouldbefurtherreducedbyhavingpeerssendonlychangedbits[22]insteadofre-sendingtheentirebloomlterwitheachchange.
ThecurrentdesignneverthelesseasilysupportsthecurrentandprojecteduserbaseswithintheIBMdeployment.
Letusnowconsidertheprocessingcostsattheregistrar.
Foranyreasonablenumberofpeers,theregistrarcaneasilymaintainthethreemappingsofSection4inmainmemory.
Foreachquery,theregistrarperformsasmallnumberofsimplelookupsonthesedatastructureswhichamounttoeasily-optimizedbit-vectoroperations.
Thisdesignpermitsevenmodesthardwaretoscaletotensifnothundredsofthousandsofusersandtheirqueries.
6.
TUNINGPEERDEPLOYMENTSWerealizedthattuningpeerdeploymentsforoptimalperfor-mancewouldbeadifcultjob.
Thedifcultystemsfromtheenduser'sreluctanceindownloadingnewreleasesofsoftwarethatisalreadyprovidingadesiredlevelofsatisfaction.
Inthissectionwepresentasimplesolutionthatwedesignedinresponse.
6.
1ChallengesUniquetoP2PNetworksWhileeveryclient-sidesoftwaredeploymentsuffersfromsimi-larend-userinertia,theproblemisespeciallysevereforP2Pnet-works.
AP2Pnetworkdrawsitsbenetsfromhavinglargenum-bersofpeersparticipatetoformasinglecommunityofusers.
Thelargenumbersofpeersdirectlytranslatestoalargenumberofin-dividualsoftwaredeploymentsthatneedtobetuned.
Moreimpor-tantly,thesinglecommunityconstrainsthatsuchtuningbesimul-taneousacrossallpeers:thesuccessofthecommunityreliesonallpeersusingthesameprotocol.
EXAMPLE6.
1.
SupposethatwereleasedYouSearchwiththesizeofindividualbloomltersas¤53.
Wemighthavefoundthatasignicantfractionofthecommunitysetsmostofthebits.
EachquerywouldthenbemappedtoallthepeersinthenetworkcausingtheResultGathererphasetodegenerateintoabroadcastofthequeryovertheentirenetwork.
Thecorrectthingtodowouldbetoincreasethesizeofthebloomltersto¤bits.
However,theproblemisnotsolvedbyjustrollingoutanewversionofthesoftware.
Ifthenewandtheoldversionsco-existed,thesametermwouldbehashedtoadifferentbitpositioninboththeseversions.
Thusthesamequerywouldberepresenteddifferentlyatdifferentpeers.
WhiletheproblemcouldbetemporarilysolvedbymaintainingtwodifferentltersattheRegistrar(oneeachforbitsandfor¤bits),thecomplexityofthecodewouldincrease.
Fur-ther,eachsuchtweakwouldresultinmoreltersbeingcreatedandmaintainedforbackwardcompatibility.
SimilartothesizeofbloomltersdiscussedinExample6.
1,thereareseveralparametersinthecodethatneedtobetweakedbasedontheusagepatternsofpeers(e.
g.
,thenumberofbloomlterstocreateateachpeer,thefrequencyofsendingbloomlterstotheRegistrarfromeachpeer,thedurationoftime-outswhilecontactingapeertogatherresults,etc.
)Welabelsuchparametersthatarisefromanimplementationoftheconceptualdesigntunableparameters.
6.
2AddressingtheChallengesWeprogrammedYouSearchtoallowdistributedtuning.
EachYouSearchpeerisenabledwithaTuningManager.
TheTuningManagerworkswithacentralizedAdministratortoreceiveandaf-fectthechangespushedoutbytheAdministrator.
TheManagercreatesalocalstateleondiskatthepeerduringtheinstallationprocess.
SignedmessagesreceivedfromtheAdministratorarein-terpreted,acteduponandpersistedinthestatelebythemanager.
TherestoftheapplicationreadsthemaintenancestatetorespondtothechangespushedoutbytheAdministrator.
Thevariousparametersthatneedtobetunedinthecodeareidentiedandsettovaluesreadfromthemaintenancestateleatapplicationlaunch.
Changestothevaluesofthesetunableparame-terscanbesenttotheMaintenanceManagerbytheAdministrator.
TheManagermerelyoverwritesthevaluesfortheseparametersinthestateletoaffectthechanges.
Thus,wecansimultaneouslychangethesettingsfortunableparametersofasmanypeersaswelike,allowingustoexperimentandtunethenetworkasitevolves.
7.
CONCLUSIONSANDFUTUREWORKInthispaper,weaddressedthechallengeofprovidingfresh,fastandcompletesearchoverpersonalwebserver-hostedcontent.
Be-causeofthetransientavailabilityofpersonalwebserversandtheirrapidlyevolvingcontent,anycrawl-basedsearchsolutionsuffersfromstaleandincompleteresults.
Oursolution,YouSearch,isin-steadahybridpeer-to-peersystemthatreliesprimarilyontheweb-serversthemselvestokeeppacewithchangingnetworkstate.
Itscalesgracefullyandcostslittlesinceitscentralizedresourcere-quirementsaresmall.
YouSearchalsoenhancesthesharedcontextamongitsusers.
Personalwebserverscanbeaggregatedintooverlapping,userspec-iedgroups,andthesegroupssearchedjustasindividualnodes.
Anygroupmembercanpersistresultrecommendationssothatoth-erscandrawuponitsknowledge.
Withintwomonthsofitsdeployment,YouSearchhasalreadybeenadoptedbynearly,users.
Ourstudyofitsusageinthisreal-lifesettingshowedthatYouSearchperformswelland,mostimportantly,satisesuserneeds.
Futureworkmightconsiderallowingauthenticatedpeerstosearchsecuredinadditiontopubliccontent.
Otherusefulextensionswouldincludehavingapeergeneratesnippetsofmatchingbodytextforits(cached)searchresults,exploitsocialnetworks(denedbyex-istinggroupdenitions)forpersonalizedinter-hostranking,andevenactivelymaintaincachedresultsforthemostpopularqueries(insteadofsimplytimingthemouttoavoidstaleness).
Unlikepure-lycentralizedsearcharchitectures,theplethoraofcompute,stor-age,andbandwidthavailabletothesetofYouSearchpeersasawholeputsfewconstraintsonfurtherenhancement.
8.
ACKNOWLEDGEMENTSWethankRakeshAgrawalformanyinsightfulcommentsonthisdraftandYouSearchingeneral.
WethankDanGruhlforde-signingtheYouSearchlogo.
Finally,weacknowledgethemanyusersofourinternalYouSearchdeploymentfortheirvaluablefeed-back.
MayankalsothanksAmitSomaniforintroducinghimtotheYouServproject.
9.
REFERENCES[1]Apache'sHTTPServerProject.
http://httpd.
apache.
org/.
[2]BadBlue—TheP2PFileSharingWebServer.
http://www.
badblue.
com/.
[3]TheGnutellaNetwork.
http://www.
gnutella.
com/.
[4]Google—TheWebSearchEngine.
http://www.
google.
com/.
[5]GrooveNetworks,Inc.
DesktopCollaborationSoftware.
http://www.
groove.
net/.
[6]ht://Dig–InternetSearchEngineSoftware.
http://htdig.
org/.
[7]TheKaZaaMediaNetwork.
http://www.
kazaa.
com/.
[8]MacOSX:PersonalWebSharing.
http://www.
mac3d.
com/.
[9]LAN-Basedwebcachingforacceleratedwebaccess.
http://www.
mangosoft.
com/products/cachelink.
[10]Microsoft'sPersonalWebServerandPeerWebServices.
http://www.
microsoft.
com/.
[11]TheNapsterCompany.
http://www.
napster.
com/.
[12]SWISH-E:SimpleWebIndexingSystemforHumans-Enhanced.
http://swish-e.
org/.
[13]TheXDegreesCompany.
http://www.
xdegrees.
com/.
[14]R.
J.
BayardoJr.
,A.
Somani,D.
Gruhl,andR.
Agrawal.
YouServ:AWebHostingandContentSharingToolfortheMasses.
InProc.
11thIntl.
WorldWideWebConf.
(WWW),2002.
[15]K.
Bharat.
SearchPad:ExplicitCaptureofSearchContexttoSupportWebSearch.
InProc.
9thIntl.
WorldWideWeb(WWW)Conference,2000.
[16]B.
Bloom.
Space/timeTrade-offsinHashCodingwithAllowableErrors.
InCommunicationsofACM,volume13(7),pages422–426,1970.
[17]D.
Carmel,E.
Amitay,M.
Herscovici,Y.
Maarek,Y.
Petruschka,andA.
Soffer.
JuruatTREC10-ExperimentswithIndexPruning.
InProc.
10thTextREtrievalConference(TREC),2001.
[18]F.
M.
Cuenca-AcunaandT.
D.
Nguyen.
Text-BasedContentSearchandRetrievalinAdHocP2PCommunities.
InProc.
InternationalWorkshopinPeer-to-PeerComputing,2002.
[19]F.
Dabek,E.
Brunskill,M.
F.
Kaashoek,D.
Karger,R.
Morris,I.
Stoica,andH.
Balakrishnan.
BuildingPeer-to-PeerSystemswithChord,aDistributedLookupService.
InProc.
8thWorkshoponHotTopicsinOperatingSystems(HotOS),2001.
[20]C.
Dwork,R.
Kumar,M.
Naor,andD.
Sivakumar.
RankAggregationMethodsfortheWeb.
InProc.
10thIntl.
WorldWideWebConf.
(WWW),pages613–622,2001.
[21]S.
Iyer,A.
Rowstron,andP.
Druschel.
Squirrel:Adecentralizedpeer-to-peerwebcache.
InProc.
21stACMSIGACT-SIGOPSSymposiumonPrinciplesofDistributedComputing(PODC),2002.
[22]M.
Mitzenmacher.
CompressedBloomFilters.
InProc.
20thACMSIGACT-SIGOPSSymposiumonPrinciplesofDistributedComputing(PODC),pages144–150,2001.
[23]V.
N.
PadmanabhanandK.
Sripanidkulchai.
TheCaseforCooperativeNetworking.
InProc.
1stIntl.
Peer-to-PeerSystems(IPTPS)Workshop,2002.
[24]S.
Ratnasamy,P.
Francis,M.
Handley,andR.
Karp.
AScalableContent-AddressableNetwork(CAN).
Proc.
ofACMSIGCOMM,2001.
[25]R.
Rivest.
RFC1321:TheMD5Message-DigestAlgorithm.
Technicalreport,NetworkWorkingGroup,1992.
[26]A.
RowstronandP.
Druschel.
Pastry:Scalable,DistributedObjectLocationandRoutingforLarge-scalePeer-to-PeerSystems.
InProc.
IFIP/ACMIntl.
Conf.
onDistributedSystemsPlatforms(Middleware),2001.
[27]T.
Stading,P.
Maniatis,andM.
Baker.
Peer-to-peerCachingSchemestoAddressFlashCrowds.
InProc.
1stIntl.
Peer-to-PeerSystems(IPTPS)Workshop,2002.
[28]A.
Stavrou,D.
Rubenstein,andS.
Sahu.
ALightweight,RobustP2PSystemtoHandleFlashCrowds.
InProc.
IEEEIntl.
Conf.
onNetworkProtocols(ICNP),2002.
[29]Y.
XieandD.
O'Hallaron.
LocalityinSearchEngineQueriesanditsImplicationsforCaching.
InProc.
19thIEEEInfocom,2000.

香港ceranetworks(69元/月) 2核2G 50G硬盘 20M 50M 100M 不限流量

香港ceranetworks提速啦是成立于2012年的十分老牌的一个商家这次给大家评测的是 香港ceranetworks 8核16G 100M 这款产品 提速啦老板真的是豪气每次都给高配我测试 不像别的商家每次就给1核1G,废话不多说开始跑脚本。香港ceranetworks 2核2G 50G硬盘20M 69元/月30M 99元/月50M 219元/月100M 519元/月香港ceranetwork...

HostKvm($4.25/月),俄罗斯CN2带宽大升级,俄罗斯/香港高防限量5折优惠进行中

HostKvm是一家成立于2013年的国外VPS服务商,产品基于KVM架构,数据中心包括日本、新加坡、韩国、美国、俄罗斯、中国香港等多个地区机房,均为国内直连或优化线路,延迟较低,适合建站或者远程办公等。本月,商家旗下俄罗斯、新加坡、美国、香港等节点带宽进行了大幅度升级,俄罗斯机房国内电信/联通直连,CN2线路,150Mbps(原来30Mbps)带宽起,目前俄罗斯和香港高防节点5折骨折码继续优惠中...

iON Cloud:新加坡cn2 gia vps/1核/2G内存/25G SSD/250G流量/10M带宽,$35/月

iON Cloud怎么样?iON Cloud升级了新加坡CN2 VPS的带宽和流量最低配的原先带宽5M现在升级为10M,流量也从原先的150G升级为250G。注意,流量也仅计算出站方向。iON Cloud是Krypt旗下的云服务器品牌,成立于2019年,是美国老牌机房(1998~)krypt旗下的VPS云服务器品牌,主打国外VPS云服务器业务,均采用KVM架构,整体性能配置较高,云服务器产品质量靠...

服务器社区为你推荐
aspweb服务器asp网站挂上服务器,详细步骤sqlserver数据库SQL SERVER数据库是可以做什么用的?字节跳动回应TikTok易主贾斯汀比伯的confident他在mv女主说了什么,大神回复,采纳支付宝账户是什么好评返现 要支付宝帐号 支付宝帐号是什么啊netshwinsockreset电脑开机老是出现wwbizsrv.exe 应用程序错误 怎么处理filezilla_server如何用FileZilla Server新增FTP帐号更新internal大飞资讯单仁资讯的黄功夫是何许人?地址栏图标网站在地址栏显示的图标,是怎么显示出来的drupal主题Drupal比DEDE等国内CMS好在哪里?
tk域名注册 宿迁服务器租用 阿里云邮箱登陆首页 vpsio java主机 php探针 mysql主机 panel1 徐正曦 太原联通测速 shuang12 lamp怎么读 镇江高防 广东服务器托管 时间服务器 优惠服务器 comodo hosts文件 iptables 网易轻博客 更多