union11kk99

11kk99 com  时间:2021-03-03  阅读:()
ComputerNetworks31(1999)1467–1479FindingrelatedpagesintheWorldWideWebJeffreyDean,1,MonikaR.
Henzinger2CompaqSystemsResearchCenter,130LyttonAve.
,PaloAlto,CA94301,USAAbstractWhenusingtraditionalsearchengines,usershavetoformulatequeriestodescribetheirinformationneed.
ThispaperdiscussesadifferentapproachtoWebsearchingwheretheinputtothesearchprocessisnotasetofqueryterms,butinsteadistheURLofapage,andtheoutputisasetofrelatedWebpages.
ArelatedWebpageisonethataddressesthesametopicastheoriginalpage.
Forexample,www.
washingtonpost.
comisapagerelatedtowww.
nytimes.
com,sincebothareonlinenewspapers.
WedescribetwoalgorithmstoidentifyrelatedWebpages.
ThesealgorithmsuseonlytheconnectivityinformationintheWeb(i.
e.
,thelinksbetweenpages)andnotthecontentofpagesorusageinformation.
Wehaveimple-mentedbothalgorithmsandmeasuredtheirruntimeperformance.
Toevaluatetheeffectivenessofouralgorithms,weperformedauserstudycomparingouralgorithmswithNetscape's'What'sRelated'service(http://home.
netscape.
com/escapes/related/).
Ourstudyshowedthattheprecisionat10forourtwoalgorithmsare73%betterand51%betterthanthatofNetscape,despitethefactthatNetscapeusesbothcontentandusagepatterninformationinadditiontoconnectivityinformation.
1999PublishedbyElsevierScienceB.
V.
Allrightsreserved.
Keywords:Searchengines;Relatedpages;Searchingparadigms1.
IntroductionTraditionalWebsearchenginestakeaqueryasinputandproduceasetof(hopefully)relevantpagesthatmatchthequeryterms.
Whileusefulinmanycircumstances,searchengineshavethedisadvantagethatusershavetoformulatequeriesthatspecifytheirinformationneed,whichispronetoerrors.
ThispaperdiscusseshowtondrelatedWebpages,adifferentapproachtoWebsearching.
InourapproachCorrespondingauthor.
Presentaddress:mySimon,Inc.
,SantaClara,CA,USA.
E-mail:jdean@mysimon.
com1ThisworkwasdonewhiletheauthorwasattheCompaqWesternResearchLaboratory.
2E-mail:monika@pa.
dec.
comtheinputtothesearchprocessisnotasetofqueryterms,buttheURLofapage,andtheoutputisasetofrelatedWebpages.
ArelatedWebpageisonethataddressesthesametopicastheoriginalpage,butisnotnecessarilysemanticallyidentical.
Forexample,givenwww.
nytimes.
com,thetoolshouldndothernewspapersandnewsorganizationsontheWeb.
Ofcourse,incontrasttosearchengines,ourapproachrequiresthattheuserhasalreadyfoundapageofinterest.
RecentworkininformationretrievalontheWebhasrecognizedthatthehyperlinkstructurecanbeveryvaluableforlocatinginformation[1,3,4,8,12,16,17,20,22,23].
Thisassumesthatifthereisalinkfrompagevandw,thentheauthorofvrecommendspagew,andlinksoftenconnect1389-1286/99/$–seefrontmatter1999PublishedbyElsevierScienceB.
V.
Allrightsreserved.
PII:S1389-1286(99)00022-51468J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479Table1ExampleresultsproducedbytheCompanionalgorithmInput:www.
nytimes.
comwww.
usatoday.
comUSATodaynewspaperwww.
washingtonpost.
comWashingtonPostnewspaperwww.
cnn.
comCableNewsNetworkwww.
latimes.
comLosAngelesTimesnewspaperwww.
wsj.
comWallStreetJournalnewspaperwww.
msnbc.
comMSNBCcablenewsstationwww.
sjmercury.
comSanJoseMercuryNewsnewspaperwww.
chicago.
tribune.
comChicagoTribunenewspaperwww.
nando.
netNandoTimeson-linenewsservicewww.
the-times.
co.
ukTheTimesnewspaperrelatedpages.
Inthispaper,wedescribetheCom-panionandCocitationalgorithms,twoalgorithmswhichuseonlythehyperlinkstructureoftheWebtoidentifyrelatedWebpages.
Forexample,Table1showstheoutputoftheCompanionalgorithmwhengivenwww.
nytimes.
comasinput(inthiscase,theresultsfortheCocitationalgorithmareidenticalandtheresultsforNetscapeareverysimilar,althoughthisisnotalwaystrue).
Oneofourgoalswastodesignalgorithmswithhighprecisionthatareveryfastandthatdonotrequirealargenumberofdifferentkindsofinputdata.
SincewehaveatoolthatgivesusaccesstothehyperlinkstructureoftheWeb(theConnectivityServer[2]),wefocusedonalgorithmsthatonlyuseconnectivityinformationtoidentifyrelatedpages.
Ouralgorithmsuseonlytheinformationaboutthelinksthatappearoneachpageandtheorderinwhichthelinksappear.
Theyneitherexaminethecontentofpages,nordotheyexaminepatternsofhowuserstendtonavigateamongpages.
OurCompanionalgorithmisderivedfromtheHITSalgorithmproposedbyKleinbergforrankingsearchenginequeries[12].
KleinbergsuggestedthattheHITSalgorithmcouldbeusedforndingrelatedpagesaswell,andprovidedanecdotalevidencethatitmightworkwell.
Inthispaper,weextendthealgorithmtoexploitnotonlylinksbutalsotheirorderonapage(seeSection2.
1.
1)andpresenttheresultsofauser-studyshowingthattheresultingalgorithmworksverywell.
TheCocitationalgorithmndspagesthatarefrequentlycocitedwiththeinputURLu(thatis,itndsotherpagesthatarepointedtobymanyotherpagesthatallalsopointtou).
NetscapeCommunicatorVersion4.
06introducedarelatedpagesservicethatisbuiltintothebrowser[15](seeSection2.
3foramoredetaileddiscus-sion).
Onthebrowserscreen,thereisa'What'sRe-lated'button,whichpresentsamenuofrelatedWebpagesinsomecases.
The'What'sRelated'algo-rithminNetscapeisbasedontechnologydevelopedbyAlexa,Inc.
,andcomputesitsanswersbasedonconnectivityinformation,contentinformation,andusageinformation[14].
Tocomparetheperformanceofourtwoalgo-rithmsandNetscape'salgorithm,weperformedauserstudyon59URLschosenby18volunteers.
Ourstudyresultsshowthattheprecisionat10com-putedoverall59URLsofourtwoalgorithmsare73%betterand51%betterthanNetscape's.
NotallalgorithmsgaveanswersforallURLsinourstudy.
Ifwerestrictthecomparisontoonlythe37URLsforwhichallthreealgorithmsreturnedanswers,thentheprecisionat10ofourtwoalgorithmsare40%and22%betterthanNetscape'salgorithm.
Thisissurprisingsinceouralgorithmsarebasedonlyonconnectivityinformation.
Netscape'salgorithmgivesanswersforabout17millionURLs[14],whileouralgorithmscangiveanswersforamuchlargersetofURLs(wehavecon-nectivityinformationon180millionURLs).
ThisisimportantbecauseitmeansthatwecangiverelatedURLinformationformoreURLs.
Ouralgorithmsarealsofast:inourenvironment,bothaveragelessthan200msecofcomputationperinputURL.
TheexampleshowninTable1isforaURLwithaveryhighlevelofconnectivity(www.
nytimes.
comcontains47,751inlinksinourConnectivityServer),andallthreealgorithmsgenerallyperformquitewellforwell-connectedURLs.
Ouralgorithmscanalsoworkwellwhenthereismuchlessconnectivity,asshownbytheexampleinTable2.
ThistableshowstheanswersfortheCompanionandNetscapealgo-rithmsforlinas.
org/linux/corba.
html,oneoftheinputURLschosenbyauseraspartofouruserstudy.
Alongsideeachansweristheuser'srat-ingforeachanswer,witha'1'meaningthattheuserconsideredthepagerelated,'0'meaningthattheJ.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791469Table2ComparisonoftheresultsoftheCompanionandNetscapealgorithmsInput:linas.
org/linux/corba.
htmlCompanionNetscape1www.
cs.
wustl.
edu/schmidt/TAO.
html0labserver.
kuntrynet.
com/linux1dsg.
harvard.
edu/public/arachne0web.
singnet.
com.
sg/siuyin1jorba.
castle.
net.
au/–www.
clark.
net/pub/srokicki/linux1www-b0.
fnal.
gov:8000/ROBIN–www.
earth.
demon.
co.
uk/linux/uk1www.
paragon-software.
com/products/oak0www.
emry.
net/webwatcher1www.
tatanka.
com/orb1.
htm0www.
interl.
net/jasoneng/NLL/lwr1www.
oot.
co.
uk/dome-index.
html0www.
jnpcs.
com/mkb/linux0www.
intellisoft.
com/mark1www.
linuxresources.
com/1www.
dstc.
edu.
au/AU/staff/mart.
.
.
0www.
liszt.
com/1www.
inf.
fu-berlin.
de/brose/jacorb0www.
local.
net/jgo/linuxhelp.
htmlA'1'meansthatthepagewasvaluable,a'0'meansthatthepagewasnotvaluable,a'–'meansthatthepagecouldnotbeaccessed.
userconsideredthepageunrelated,and'–'meaningthattheuserwasunabletoaccessthepageatall.
TheoriginalpagewasaboutCORBAimplementa-tionsforLinux,andtherewere123pagespointingtothispageinourConnectivityServer.
NineofthetenanswersgivenbytheCompanionalgorithmweredeemedrelatedbyouruser,whileonlyonepagefromNetscape'ssetofanswerswasdeemedrelated.
MostofNetscape'sanswerswereaboutthemuchbroadertopicofLinux,ratherthanspecicallyaboutCORBAimplementationsonLinux.
Section2presentsouralgorithmsindetailandde-scribesNetscape'sservice,whileSection3discussestheimplementationofouralgorithms.
Section4de-scribestheuserstudyweperformedandpresentsitsresults,andalsoprovidesabriefperformanceeval-uationofouralgorithms.
Finally,Section6presentsourconclusions.
2.
RelatedpagealgorithmsInthissectionwedescribeourtwoalgorithms(theCompanionalgorithmandtheCocitationalgorithm),aswellasNetscape'salgorithm.
UnlikeNetscape'salgorithm,bothofouralgorithmsexploitonlythehyperlink-structure(i.
e.
graphconnectivity)oftheWebanddonotexaminetheinformationaboutthecontentorusageofpages.
Netscape'salgorithmusesallthreekindsofinformationtoarriveatitsresults.
Inthesectionsbelow,weusethetermsparentandchild.
Ifthereisahyperlinkfrompagewtopagev,wesaythatwisaparentofvandthatvisachildofw.
2.
1.
CompanionalgorithmTheCompanionalgorithmtakesasinputastartingURLuandconsistsoffoursteps:(1)Buildavicinitygraphforu.
(2)Contractduplicatesandnear-duplicatesinthisgraph.
(3)Computeedgeweightsbasedonhosttohostconnections.
(4)Computeahubscoreandanauthorityscoreforeachnodeinthegraphandreturnthetoprankedauthoritynodes(ourimplementationreturnsthetop10).
Thisphaseofthealgorithmusesamod-iedversionoftheHITSalgorithmoriginallyproposedbyKleinberg[12].
Thesestepsaredescribedinmoredetailinthesubsectionsbelow.
Onlystep1exploitstheorderoflinksonapage.
2.
1.
1.
Step1:buildingthevicinitygraphGivenaqueryURLuwebuildadirectedgraphofnodesthatarenearbytouintheWebgraph.
GraphnodescorrespondtoWebpagesandgraphedgescorrespondtohyperlinks.
Thegraphconsistsofthefollowingnodes(andtheedgesbetweenthesenodes):1470J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479(1)u,(2)uptoBparentsofu,andforeachparentuptoBFofitschildrendifferentfromu,and(3)uptoFchildrenofu,andforeachchilduptoFBofitsparentsdifferentfromu.
Hereishowwechoosethesenodesindetail:ThereisastoplistSTOPofURLsthatareunrelatedtomostqueriesandthathaveveryhighindegree.
Ourstoplistwasdevelopedbyexperimentation,andcurrentlycontains21URLs.
Examplesofnodesthatappearonourstoplistarewww.
yahoo.
comandwww.
microsoft.
com/ie/download.
html.
IfthestartingURLuisnotoneofthenodesonourstoplist,thenweignorealltheURLsonthestoplistwhenformingthevicinitygraph.
Ifudoesappearonthestoplist,however,thenwedisablethestoplist(i.
e.
setSTOPtotheemptylist)andfreelyincludeanynodesinthevicinitygraph.
WedisablethestoplistwhentheinputURLappearsonthestoplistbecausemanynodesonthestoplistarepopularsearchenginesandportals,andwewanttopermitthesenodestobeconsideredwhentheinputURLisanothersuchpopularsite.
Goback(B),andback-forward(BF):IfuhasmorethanBparents,addBrandomparentsnotonSTOPtothegraph;otherwiseaddallofu'sparents.
IfaparentxofuhasmorethanBFC1children,adduptoBF=2childrenpointedtobytheBF=2linksonximmediatelyprecedingthelinktouanduptoBF=2childrenpointedtobytheBF=2linksonximmediatelysucceedingthelinktou(ignoringduplicatelinks).
IfpagexhasfewerthanBFchildren,weaddallofitschildrentothegraph.
Notethatthisexploitstheorderoflinksonpagex.
Goforward(F),andforward-back(FB):IfuhasmorethanFchildren,addthechildrenpointedtobytherstFlinksofu;otherwise,addallofu'schil-dren.
IfachildofuhasmorethanBFparents,addtheBFparentsnotonSTOPwithhighestindegree;otherwise,addallofthechild'sparents.
Ifthereisahyperlinkfromapagerepresentedbynodevinthegraphtoapagerepresentedbynodew,andvandwdonotbelongtothesamehost,thenthereisadirectededgefromvtowinthegraph(weomitedgesbetweennodesonthesamehost).
Inourexperience,wehavefoundthatusingalargevalueforB(2000)andasmallvalueforBF(8)worksbetterinpracticethanusingmoderatevaluesforeach(say,50and50).
Wehaveobservedthatlinkstopagesonasimilartopictendtobeclusteredtogether,whilelinksthatarefartherapartonapagearelesslikelytobeonthesametopic(forexample,mosthotlistsaregroupedintocategories).
Thishasalsobeenobservedbyotherresearchers[6].
UsingalargervalueforBalsomeansthatthelikelihoodofthecomputationbeingdominatedbyasingleparentpageisreduced.
2.
1.
2.
Step2:duplicateeliminationAfterbuildingthisgraphwecombinenear-dupli-cates.
Wesaytwonodesarenear-duplicatesif(a)theyeachhavemorethan10linksand(b)theyhaveatleast95%oftheirlinksincommon.
Tocombinetwonear-duplicateswereplacetheirtwonodesbyanodewhoselinksaretheunionofthelinksofthetwonear-duplicates.
Thisduplicateeliminationphaseisimportantbecausemanypagesareduplicatedacrosshosts(e.
g.
mirrorsites,differentaliasesforthesamepage),andwehaveobservedthatallowingthemtoremainseparatecangreatlydistorttheresults.
2.
1.
3.
Step3:assignedgeweightsInthisstage,weassignaweighttoeachedge,usingtheedgeweightingschemeofBharatandHen-zinger[3]whichwerepeathereforcompleteness.
Anedgebetweentwonodesonthesamehost3hasweight0.
Iftherearekedgesfromdocumentsonarsthosttoasingledocumentonasecondhostwegiveeachedgeanauthorityweightof1=k.
Thisweightisusedwhencomputingtheauthorityscoreofthedocumentonthesecondhost.
Ifthereareledgesfromasingledocumentonarsthosttoasetofdocumentsonasecondhost,wegiveeachedgeahubweightof1=l.
Weperformthisscalingtopreventasinglehostfromhavingtoomuchinuenceonthecomputation.
Wecalltheresultingweightedgraphthevicinitygraphofu.
2.
1.
4.
Step4:computehubandauthorityscoresInthisstep,weruntheimpalgorithm[3]onthegraphtocomputehubandauthorityscores.
Theimp3Weassumethroughoutthepaperthatthehostcanbedeter-minedfromtheURL-string.
J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791471algorithmisastraightforwardextensionoftheHITSalgorithmwithedgeweights.
TheintuitionbehindtheHITSalgorithmisthatadocumentthatpointstomanyothersisagoodhub,andadocumentthatmanydocumentspointtoisagoodauthority.
Transitively,adocumentthatpointstomanygoodauthoritiesisanevenbetterhub,andsimilarlyadocumentpointedtobymanygoodhubsisanevenbetterauthority.
TheHITScomputationrepeatedlyupdateshubandauthorityscoressothatdocumentswithhighauthorityscoresareexpectedtohaverelevantcontent,whereasdocumentswithhighhubscoresareexpectedtocontainlinkstorelevantcontent.
Thecomputationofhubandauthorityscoresisdoneasfollows:InitializeallelementsofthehubvectorHto1.
0.
InitializeallelementsoftheauthorityvectorAto1.
0.
WhilethevectorsHandAhavenotconverged:ForallnodesninthevicinitygraphN,A[n]:DP.
n0;n/2edges.
N/H[n0]authorityweight.
n0;n/ForallninNH[n]:DP.
n;n0/2edges.
N/A[n0]hubweight.
n;n0/NormalizetheHandAvectors.
Notethatthealgorithmdoesnotclaimtondallrelevantpages,sincetheremaybesomethathavegoodcontentbuthavenotbeenlinkedtobymanyauthors.
TheCompanionalgorithmthenreturnsthenodeswiththetenhighestauthorityscores(excludinguitself)asthepagesthataremostrelatedtothestartpageu.
2.
2.
CocitationalgorithmAnalternativeapproachforndingrelatedpagesistoexaminethesiblingsofastartingnodeuintheWebgraph.
Twonodesarecocitediftheyhaveacommonparent.
Thenumberofcommonparentsoftwonodesistheirdegreeofcocitation.
AsanalternativetotheCompanionalgorithm,wehavedevelopedaverysimplealgorithmthatdeterminesrelatedpagesbylookingforsiblingnodeswiththehighestdegreeofcocitation.
TheCocitationalgo-rithmrstchoosesuptoBarbitraryparentsofu.
Foreachoftheseparentsp,itaddstoasetSuptoBFchildrenofpthatsurroundthelinkfromptou.
TheelementsofSaresiblingsofu.
ForeachnodesinS,wedeterminethedegreeofcocitationofswithu.
Finally,thealgorithmreturnsthe10mostfrequentlycocitednodesinSastherelatedpages.
Insomecasesthereisaninsufcientlevelofcoc-itationwithutoprovidemeaningfulresults.
Inourimplementation,iftherearenotatleast15nodesinSthatarecocitedwithuatleasttwice,thenwerestartthealgorithmusingthenodecorrespondingtou'sURLwithonepathelementremoved.
Forexam-ple,ifu'sURLisa.
com/X/Y/ZandaninsufcientnumberofcocitednodesexistforthisURL,thenwerestartthealgorithmwiththeURLa.
com/X/Y(iftheresultingURLisinvalid,wecontinuetochopelementsuntilweareleftwithjustahostname,orwendavalidURL).
Inourimplementation,wechoseBtobe2000andBFtobe8(thesameparametervaluesweusedforourimplementationoftheCompanionalgorithm).
OnewayoflookingattheCocitationalgorithmisthatitnds'maximal'n2bipartitesubgraphsinthevicinitygraph.
2.
3.
Netscape'sapproachNetscapeintroducedanew'What'sRelated'fea-tureinversion4.
06oftheNetscapeCommunicatorbrowser.
Detailsabouttheapproachusedtoidentifyrelatedpagesintheiralgorithmaresketchy.
How-ever,theWhat'sRelatedFAQpageindicatesthatthealgorithmusesconnectivityinformation,usageinformation,andcontentanalysisofthepagestodeterminerelationships.
Wequotefromthe'What'sRelated'FAQpage:TheWhat'sRelateddataiscreatedbyAlexaInternet.
Alexausescrawling,archiving,categorizinganddataminingtechniquestobuildtherelatedsiteslistsformillionsofWebURLs.
Forexample,Alexauseslinksonthecrawledpagestondrelatedsites.
Theday-to-dayuseofWhat'sRelatedalsohelpsbuildandrenethedata.
Astheserviceisused,therequestedURLsarelogged.
Bylookingathigh-leveltrends,NetscapeandAlexacandeducerelationshipsbetweenWebsites.
Forexample,ifthousandsofusersgodirectlyfromsiteAtositeB,thetwositesarelikelytoberelated.
1472J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479Next,AlexachecksalltheURLstomakesuretheyarelivelinks.
Thisprocessremoveslinksthatwouldtrytoreturnpagesthatdon'texist(404errors),aswellasanylinkstoserversthatarenotavailabletothegeneralInternetpopulation,suchasserversthatarenolongeractiveorarebehindrewalls.
Finally,oncealloftherelationshipsareestablishedandthelinksarechecked,thetoptenrelatedsitesforeachURLarechosenbylookingatthestrengthoftherelationshipbetweenthesites.
Eachmonth,AlexarecrawlstheWebandrebuildsthedatatopullinnewsitesandtorenetherelationshipsbetweentheexistingsites.
Newsiteswithstrongre-lationshipstoasitewillautomaticallyappearintheWhat'sRelatedlistforthatsitebydisplacinganysiteswithweakerrelationships.
Notethatsincetherelationshipsbetweensitesarebasedonstrength,What'sRelatedlistsarenotnec-essarilybalanced.
SiteAmayappearinthelistforSiteB,butSiteBmaynotbeinthelistforSiteA.
Generally,thishappenswhenthenumberofsiteswithstrongrelationshipsisgreaterthanten,orwhensitesdonothavesimilarenoughcontent.
3.
ImplementationInexperimentingwiththesealgorithms,wewerefortunatetohaveaccesstoCompaq'sConnectivityServer[2].
TheConnectivityServerprovideshighspeedaccesstothegraphstructureof180millionURLs(nodes)andthelinks(edges)thatconnectthem.
TheentiregraphstructureiskeptinmemoryonaCompaqAlphaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.
Therandomaccesspatternsengenderedbytheconnec-tivity-basedalgorithmsdescribedinthispapermeanthatitisimportantformostorallofthegraphtotinmainmemorytopreventhighlevelsofpagingactivity.
Weimplementedamulti-threadedserverthatac-ceptsaURLanduseseithertheCocitationalgorithmortheCompanionalgorithmtondpagesrelatedtothegivenURL.
Ourserverimplementationconsistsofapproximately5500linesofcommentedCcode,ofwhichapproximately1000linesimplementtheCompanionalgorithm,400linesimplementtheCoc-itationalgorithm,andtheremainderaresharedcodetoperformtaskssuchasparsingHTTPqueryre-quests,printingresults,andloggingstatusmessages.
WelinkourservercodedirectlywiththeCon-nectivityServerlibrary,andaccesstheconnectivityinformationbymmappingthegraphinformationintotheaddressspaceofourserver.
OurimplementationoftheCompanionalgorithmhasbeensubjectedtoamoderateamountofper-formancetuning,mostlyindesigningtheneighbor-hoodgraphdatastructurestoimprovedata-cacheperformance.
TheimplementationoftheCocitationalgorithmhasnotbeentunedextensively,althoughitdoesshareafairamountofcodewiththeCompan-ionalgorithm,andthissharedcodehasbeentunedsomewhat.
4.
EvaluationInthissectionwedescribetheevaluationweperformedforthealgorithms.
Section4.
1describesouruserstudy,whileSection4.
2discussestheresultsofthestudy.
Section4.
3evaluatestheruntimeperformanceofouralgorithms.
4.
1.
ExperimentalsetupTocomparethedifferentapproaches,weper-formedauserstudy.
Weasked18volunteerstosupplyuswithatleasttwoURLsforwhichtheywantedtondrelatedpages.
Ourvolunteersincluded14computerscienceprofessionals(mostlyourcol-leaguesatCompaq'sresearchlaboratories),aswellas4peoplewithotherprofessionalcareers.
Were-ceived69URLsandusedeachofthealgorithmstodeterminethetop10answersforeachURL.
Weputtheanswersinrandomorderandreturnedthemtothevolunteersforrating.
Thevolunteerswereinstructedasfollows:Table3SummaryofallanswersforthealgorithmsAlgorithmNo.
ofURLsNo.
ofNo.
ofwithanswersanswersdeadlinksCompanion5049842Cocitation5858062Netscape4036429J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791473Wewanttomeasurehowwelleachalgorithmper-forms.
TomeasureperformancewewanttoknowthepercentageofvaluableURLsreturnedbyeachalgo-rithm.
TobevaluabletheURLmustbebothrelevanttothetopicyouareinterestedinandahighqualitypage.
Forexample,ifyourURLwaswww.
audi.
comandyougetbackanewsgrouppostingwheresomebodytalksabouthisnewAudicar,thenthepagewasontopic,butprobablynothighquality.
Ontheotherhand,ifyougetwww.
jaguar.
comasananswer,thenitisuptoyoutodecidewhetherthisanswerisontopicornot.
Scoring:0:Pagewasnotvaluable=useful1:Pagewasvaluable=useful–:Pagecouldnotbeaccessed(i.
e.
didnotexist,orserverwasdown)Pleaseignoretheorderinwhichthepagesarereturned.
Soifalaterpagecontainssimilarcontenttoanearlierpagepleaseratethelatterpageasifyouhadnotseentheearlierpage.
Thiswillimplythatwedonotmeasurehow'happy'youarewithasetofanswersreturnedbyanalgorithm.
Insteadwemeasurehowmanyvaluableanswerseachalgorithmgives.
"4.
2.
UserstudyresultsWereceivedresponsesratingtheanswerURLsfor59oftheinputURLs.
These59inputURLsformthebasisofourstudy.
Table3showshowmanyofthesequeriesthealgorithmsansweredandhowmanyanswerURLstheyreturned.
Inmanycases,thealgorithmsreturnedlinksthatourusersratedasinaccessible.
ThecolumnlabeledNo.
ofdeadlinksshowsthenumberofinaccessiblepagesamongalltheanswersforeachalgorithm.
Forthepurposesofourevaluation,wetreataninaccessiblelinkasascoreof'0',sinceinaccessiblepagesarenotvaluable=useful.
TheCocitationalgorithmreturnedresultsforallbutoneoftheURLs.
ThereasonwhyitreturnedresultsforalmostallinputURLsisthatwheninsuf-cientconnectivitywasfoundsurroundinganinputURL(e.
g.
a.
com/X/Y),theCocitationalgorithmusedachoppedURLasinput(e.
g.
a.
com/X).
Al-thoughwedidnotincludethischoppingfeatureinourimplementationoftheCompanionalgorithm,itisdirectlyapplicableandwouldenabletheCompan-ionalgorithmtoreturnanswersformoreURLs.
WehaveempiricallyobservedthatNetscape'salgorithmalsoappliesasimilarchoppingheuristicinsomecases.
Table4containsalistingofthe59URLsinourstudy.
ForeachURL,thethreecolumnslabeledCp,Ct,andNshowtheURLsforwhichtheCompanion,Cocitation,andNetscapealgorithmsreturnedresults,respectively.
ThetablealsoshowsthenumberofhyperlinkspointingtotheURLintheConnectivityServer(Inlinks).
FortheCompanionalgorithm,itshowsthenumberofnodesandedgesinthevicinitygraph,aswellasthewallclocktimeinmillisecondstakentocomputethesetofrelatedpages(computedbysurroundingthecomputationwithgettimeof-daysystemcalls).
Forthecocitationalgorithm,itshowsthenumberofsiblingsfound,thenumberofsiblingscocitedatleasttwice(Cocited),andthewallclocktimetakentocomputetheanswers.
Thethreealgorithmsreturnanswersfordiffer-entsubsetsofour59URLs.
Tocomparethesealgorithms,wecansubdividetheURLsintosev-eralgroups.
TheintersectiongroupconsistsofthoseURLswhereallalgorithmsreturnedatleastoneanswer.
Therewere37URLsinthisgroup.
Thenon-NetscapegroupconsistsoftheURLswhereNetscape'sapproachdidnotreturnanyanswers.
Itconsistsof19URLs.
Toquantifytheperformanceofthethreealgo-rithms,wenowdenetwometrics.
Theprecisionatrforagivenalgorithmisthetotalnumberofanswersreceivinga'1'scorewithintherstranswers,di-videdbyrtimesthenumberofqueryURLs.
NoticethatwhenanalgorithmdoesnotgiveanyanswersforaURL,thisisasifitgaveallnon-relevantanswersforthatURL.
ForagivenURLu,theaverageprecisionforuofanalgorithmisthesumoftheprecisionateachrankwheretheanswerofthealgorithmforureceiveda'1'scoredividedbythetotalnumberoftheanswersofthealgorithmforureceivinga'1'score.
Ifthealgorithmdoesnotreturnanyanswersforu,itsaverageprecisionforuis0.
TheoverallaverageprecisionforanalgorithmisthesumofalltheaverageprecisionsforallthequeryURLsdividedbythetotalnumberofqueryURLs.
ForeachofthethreegroupsofURLs(all,in-tersection,andnon-Netscape),Table5showstheaverageprecisionandtheprecisionat10foreachalgorithm.
Fig.
1showstheprecisionatrforeachof1474J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479Table4User-providedURLsforevaluation(andstatistics)URLCpCtNInCompanionalgorithmCocitationalgorithmlinksNodesEdgesTimeSiblingsCocitedTime(ms)(ms)1.
babelsh.
altavista.
digital.
com/cgi.
.
.
ppp29284638210305269660110795732.
developer.
intel.
com/design/strong/t.
.
.
ppp19333479128539173.
english-server.
hss.
cmu.
edu/ppp377441978263215663020635844.
hack.
box.
sk/ppp1666796314090488609914935155.
ieor.
berkeley.
edu/hochbaum/html/bo.
.
.
pp5647344915176.
linas.
org/linux/corba.
htmlppp1232028422277444138487.
members.
tripod.
com/src-fall-regatt.
.
.
p0000680015237368.
metroactive.
com/music/ppp132013026716858319.
travelwithkids.
miningco.
com/pp9974410512745912847110.
www-db.
stanford.
edu/wienerpp21718381630110011.
www.
acf.
dhhs.
gov/ppp18225522149673943659140142412.
www.
adventurewest.
com/pub/NASTC/ppp136592452161413.
www.
amazon.
com/exec/obidos/cache/br.
.
.
pp0000114941126214.
www.
anglia.
ac.
uk/systimk/Humour/Hi.
.
.
p000010327915.
www.
ayso26.
org/p5142169800016.
www.
babynames.
com/ppp52826954781110182649616717.
www.
braintumor.
org/ppp2017791763355512126818.
www.
bris.
ac.
uk/Depts/Union/BTS/Scri.
.
.
ppp13150510841623791527219.
www.
carrier.
com/ppp96427555672123183758016520.
www.
chesschampions.
com/kasparov.
htm.
.
.
ppp11579035352205521.
www.
cl.
cam.
ac.
uk/users/rja14/tamper.
.
.
ppp1197411256274521553622.
www.
davecentral.
com/ppp69094703113912634349114847723.
www.
duofold.
com/stepout/ski-wsa.
htmp0000185662524.
www.
ebay.
com/ppp4658438986601995951131449325.
www.
etoys.
com/ppp2579229461531113351100542726.
www.
fa.
com/ppp128156105123603496452145252227.
www.
focus.
de/ppp76624881140393614208130152528.
www.
geocities.
com/Paris/Metro/1324/pp1192138739153029.
www.
geocities.
com/TheTropics/2442/d.
.
.
pp6445068519184652530.
www.
harappa.
com/har/har0.
htmlppp3823531886202372331.
www.
harmony-central.
com/MIDI/Doc/tu.
.
.
ppp311842497153272132.
www.
hotelres.
com/ppp70830726036148238088919633.
www.
innovation.
ch/java/HTTPClient/ppp7334163913215682134.
www.
inquizit.
com/pp126295454181535.
www.
israelidance.
com/ppp402103239173401636.
www.
jewishmusic.
com/ppp3991617311166123239713437.
www.
joh.
cam.
ac.
uk/pp1625571027234091143738.
www.
levenger.
com/ppp259136720835111162748739.
www.
mdl.
sandia.
gov/micromachine/gal.
.
.
p0000143611540.
www.
midiweb.
com/ppp19675304155493403370114140841.
www.
minimed.
com/ppp2177781663335732236942.
www.
mit.
edu/people/mkgray/net/ppp25811382124448962877243.
www.
mot-sps.
com/ppp3918831896384992065444.
www.
movielink.
com/ppp122746205115893375622120751445.
www.
netusa1.
net/spost/bench.
htmlpp19939442298646.
www.
nsc.
com/catalog/sg708.
htmlpp00002774114933347.
www.
odci.
gov/cia/publications/factb.
.
.
ppp3765435283222216708196261348.
www.
paccc.
com/ppp18409625156421749.
www.
perl.
com/perl/index.
htmlppp14539243918550.
www.
pianospot.
com/1700305.
htmp0000392385J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791475Table4(continued)URLCpCtNInCompanionalgorithmCocitationalgorithmlinksNodesEdgesTimeSiblingsCocitedTime(ms)(ms)51.
www.
rei-outlet.
com/pp20105163681161652.
www.
sddt.
com/les/library/94headli.
.
.
ppp9316633647126038653.
www.
sultry.
arts.
su.
edu.
au/links/lin.
.
.
pp5425846111210742154.
www.
telemarque.
com/articles/andrnch.
.
.
pp00006072416155.
www.
trane.
com/ppp90029656201140209668418156.
www.
traxxx.
de/pp12194607105603043355118447857.
www.
us-soccer.
com/ppp117534588515201234486523058.
www.
wisdom.
weizmann.
ac.
il/naor/puz.
.
.
pp84965396161059.
www.
wwa.
com/android7/pilot/index.
h.
.
.
p00001846709214thesegroupsofURLsingraphsa,bandc.
Fig.
1aandbillustratethattheCompanionandCocitationalgorithmssubstantiallyoutperformNetscape'sal-gorithmatallranks,andtheCompanionalgorithmalmostalwaysoutperformstheCocitationalgorithm.
Theintersectiongroupisthemostinterestingcomparison,sinceitavoidspenalizinganalgorithmfornotreturningatleastoneanswer.
Fortheinter-sectiongroup,Netscape'salgorithmachievesapreci-sionat10of0.
357,whiletheCompanionalgorithmachievesaprecisionat10of0.
501(40%better),andtheCocitationalgorithmachievesaprecisionat10of0.
435(22%better).
Theaverageprecisionintheintersectiongroupdoesnotpenalizeanalgorithmforreturningfewerthan10answers.
Underthismet-ric,theCompanionalgorithmis32%betterthanNetscape'salgorithm,whiletheCocitationalgorithmis20%betterthanNetscape'salgorithm.
InthegroupthatincludesallURLs,allthreeal-gorithmshaddropsintheirprecisionat10values.
Therearetworeasonsforthis.
Therstisthatalgo-rithmsweregivenaprecisionof0foragivenURLiftheydidnotreturnanyanswers.
ThismostlyaffectedtheNetscapeandCompanionalgorithms.
Thesec-Table5PrecisionmetricsforeachalgorithmforthreegroupsofURLsAlgorithmAllIntersectionNon-NetscapeAverageprecisionPrecisionat10AverageprecisionPrecisionat10AverageprecisionPrecisionat10Companion0.
5410.
4170.
6660.
5010.
5400.
401Cocitation0.
5180.
3630.
6050.
4350.
4340.
325Netscape0.
3430.
2410.
5020.
357n=an=aondreasonisthatfortheURLsinthenon-Netscapeset,boththeCompanionandCocitationalgorithmsdidnotperformaswellastheydidforURLsintheIntersectionset.
Despitethesedropsinabsoluteaverageprecision,theaverageprecisionoftheCom-panionalgorithmis57%betterthanthatofNetscape,andtheaverageprecisionoftheCocitationalgorithmis51%betterthanthatofNetscape.
Similarresultsholdwhenexaminingaverageprecisionratherthanprecisionat10.
Toevaluatethestatisticalsignicanceofourre-sults,wecomputedthesigntestandtheWilcoxsonsumsofrankstestforeachpairofalgorithms[18].
TheseresultsareshowninTable6andshowthatthedifferencebetweentheCompanionandNetscapealgorithmsandbetweentheCocitationandNetscapealgorithmsarestatisticallysignicant.
WealsowantedtoevaluatewhetherornotthealgorithmsweregenerallyreturningthesameresultsforagivenURLorwhethertheywerereturninglargelydisjointsetsofURLs.
Table7showstheamountofoverlapintheanswersreturnedbyeachpairofalgorithms.
Thepercentageinparenthesesistheoverlapdividedbythetotalnumberofan-1476J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479110r0.
00.
20.
40.
60.
81.
0Precision(a)AllCompanionCocitationNetscape110r0.
00.
20.
40.
60.
81.
0(b)Intersection110r0.
00.
20.
40.
60.
81.
0(c)Non-NetscapeFig.
1.
PrecisionatrforthethreegroupsofURLs.
Table6SigntestandWilcoxonsumofrankstestforalgorithmpairsAlgorithmAllIntersectionNon-NetscapeSignRanksumSignRanksumSignRanksumCompanionbetterthanNetscapeCompanionbetterthanCocitation0.
19220.
38980.
07930.
26280.
26430.
4180Table7OverlapbetweenanswersreturnedbyalgorithmsAlgorithmCompanionCocitationNetscapeCompanion253(51%)55(11%)Cocitation253(44%)56(10%)Netscape55(15%)56(15%)swersreturnedbythealgorithminthatrow.
Asthetableshows,thereisalargeoverlapbetweentheanswersreturnedbytheCompanionandCocitationalgorithms.
Thisisnotsurprising,sincethetwoalgorithmsarebothbasedonconnectivityinforma-tionsurroundingtheinputURLandsincebothusesimilarparameterstochoosethesurroundingnodes.
ThereisrelativelylittleoverlapbetweentheanswersreturnedbyNetscapeandtheothertwoalgorithms.
4.
3.
Run-timeperformanceInthissection,wepresentdataabouttherun-timeperformanceoftheCompanionandCocitationalgorithms.
SincewedonothavedirectaccesstoNetscape'salgorithmandonlyaccessitthroughthepublicWebinterface,weareunabletopresentper-formanceinformationforNetscape'salgorithm.
AllmeasurementswereperformedonaCompaqAl-phaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.
ThemeasuredrunningtimesarewallclocktimesfromthetimetheinputURLisgiventotheserveruntilthetimetheanswersarereturned.
ThesetimesdonotincludethetimetakentoformattheresultsasanHTMLpage,sincethatwasdonebyaserverprocessrunningonanothermachine(andthetimetodothiswasnegligible).
TheaveragerunningtimefortheCompanionalgorithmonthe50URLsforwhichitreturnedanswerswas109msec,whiletheaveragerunningtimefortheCocitationalgorithmonthe58URLsforwhichitprovidedanswerswas195msec.
Theperformanceofboththesealgorithmsissufcientlyfastthateitheronecouldhandlealargeamountoftrafc(closeto800,000requestsperdayfortheCompanionalgorithm).
Furthermore,theaverageperformancecouldprobablybeimprovedbycachinganswersforfrequentlyrequestedURLs.
Althoughwedidnotexplicitlyincludethisfactorinouruserstudy,wehaveinformallyobservedthatthesubjectivequalityofanswersreturnedforboththeCompanionandtheCocitationalgorithmsdoesJ.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791477050001000015000Graphedges0100200300400Companiontime(msec)(a)0200040006000#ofsiblings0200400600Cocitationtime(msec)(b)Fig.
2.
GraphsizeversusrunningtimeofCompanionandCocitationalgs.
notdecreasewhenwesomewhatdecreasetheparam-eterB(thenumberofinlinksconsidered)duringthebuildingofthevicinitygraph.
Thisisimportantforon-lineservicesbecauseitmeansthatthegraphsizecouldbereducedduringtimesofhighload,therebyreducingtheamountoftimetakentoserviceeachrequest.
Underconditionsoflowload,thegraphsizecouldbeincreased.
TheCompanionalgorithmgenerallyconvergesonitsanswerswithinafewiterations(typicallylessthan10iterations),butthenumberofiterationsincreaseswiththegraphsize.
Eachiterationtakestimethatislinearinthenumberofedgesinthevicinitygraph.
WeplottherunningtimeversusthenumberofgraphedgesinFig.
2a.
TherunningtimeoftheCocitationalgorithmisO.
nlogn/,wherenisthenumberofsiblingsexaminedforcocitation,sinceitsortsthesiblingsbythedegreeofcocitation.
ThiseffectisillustratedinFig.
2b.
Inourexperience,therunningtimesforthecocitationandcompanionalgorithmsaregenerallycorrelated,sinceURLswhichhavealargenumberofsiblingstoconsiderinthecocitationalgorithmalsogenerallyproducealargeneighborhoodgraphforprocessinginthecompanionalgorithm.
5.
RelatedworkManyresearchershaveproposedschemesforus-ingthehyperlinkstructureoftheWeb[1,3,4,8,12,16,17,20,22,23].
Forthemostpart,thisworkdoesnotdiscussthendingofrelatedpages,withfourexceptionsdiscussedbelow.
Weknowofonlyonepreviousworkthatexploitstheorderoflinks:Chakrabartietal.
[6]usethelinksandtheirordertocategorizeWebpagesandtheyshowthatthelinksthatarenearagivenlinkinpageorderfrequentlypointtopagesonthesametopic.
PreviousauthorshavesuggestedusingcocitationandotherformsofconnectivitytoidentifyrelatedWebpages.
Spertusobservedthatcocitationcanin-dicatethattwopagesarerelated[20].
Thatis,ifpageApointstobothpagesBandC,thenBandCmightberelated.
Variousresearchersintheeldofbibliometricshavealsoobservedthis[9–11,19],andthisobservationformsthebasisofourCocitationalgorithm.
Thenotionofcollaborativeltering,al-thoughitisbasedonuser'srecommendationsratherthanhyperlinks,alsoreliesonthisobservation[21].
PitkowandPirolli[16]clusterWebpagesbasedoncocitationanalysis.
TerveenandHill[22]usetheconnectivitystructureoftheWebtondgroupsofrelatedWebsites.
OurcompanionalgorithmdescendedfromtheHITSalgorithmdevelopedbyKleinberg[12].
TheHITSalgorithmwasoriginallyproposedbyKlein-bergasawayofusingconnectivitystructuretoiden-tifythemostauthoritativesourcesofinformationonaparticulartopic,wherethetopicwasdenedbythecombinedlinkstructureofalargenumberofstartingnodesonthetopic.
Kleinbergalsoproposedthatthe1478J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479HITSalgorithmcouldbeusedtondrelatedpageswhenthetopicwasdenedbyjustasinglenode.
TheCompanionalgorithmusedHITSalgorithmasastartingpointandextendedandmodieditinfourmainways:(1)Kleinbergsuggestedusingthefollowinggraphtondrelatedpages:Takeaxednumber(say200)ofparentsofthegivenURLandcallthesetconsistingoftheURLandtheseparentsthestartset.
Nowbuildthegraphconsistingofallnodespointingtoanodeinthestartsetorpointedtobyanodeinthestartset.
Thismeansthat'grandparents'ofuareincludedinthegraph,whilenodesthatshareachildwithuarenotincludedinthegraph.
Webelievethatthelatternodesaremorelikelytoberelatedtouthanarethe'grandparent'nodes.
Thereforeourvicinitygraphisstructuredtoexcludegrandparentnodesbuttoincludenodesthatshareachildwithu.
(2)Weexploittheorderofthelinksonapagetodeterminewhich'siblings'ofutoinclude.
Whenweaddedthisfeature,theprecisionofouralgorithmimprovednoticably.
(3)TheoriginalHITSalgorithmdidnothaveedgeweights.
Weuseedgeweightstoreducetheinuenceofpagesthatallresideononehost,sinceBharatandHenzingerhaveshownthatedgeweightsimprovetheprecision[3].
(4)Wealsomergenodesinourvicinitygraphthathavealargenumberofduplicatelinks.
Dupli-catenodesarenotsuchaseriousproblemwhenusingtheHITSorimpalgorithmstorankqueryresults,sincethestartsetconsistsofalargenumberofURLs.
However,whenformingthevicinitygraphstartingwithjustasingleURL,theinuenceofduplicatenodesisincreasedbe-causeduplicatenodeswithalargenumberofoutlinkswillquicklydominatethehubandauthoritycomputation.
KleinbergalsoshowedthatHITScomputestheprincipaleigenvectorofthematrixAAT,whereAistheadjacencymatrixoftheabovegraph,andsug-gestedusingnon-principaleigenvectorsforndingrelatedpages.
Finally,hegaveanecdotalevidencethatHITSmightworkwell.
Consecutively,asequenceofpapers[5,7]pre-sentedimprovementsonHITSandusedittopopu-lateagivenhierarchyofcategories.
6.
ConclusionWehavepresentedtwodifferentalgorithmsforndingrelatedpagesintheWorldWideWeb.
TheysignicantlyoutperformNetscape'salgorithmforndingrelatedpages.
Thealgorithmscanbeimple-mentedefcientlyandaresuitableforusewithinalargescaleWebserviceprovidingarelatedpagesfeature.
OurtwoalgorithmscanbeextendedtohandlemorethanoneinputURL.
Inthiscase,thealgo-rithmswouldcomputepagesthatarerelatedtoallinputURLs.
Wearecurrentlyexploringtheseexten-sions.
AcknowledgementsThisworkhasbenetedgreatlyfromdiscussionswehavehadwithKrishnaBharat,AndreiBroder,PuneetKumar,andHannesMarais.
WearealsoindebtedtoPuneetforhisworkontheConnec-tivityServer.
Assomeoftheearliestusersoftheserver,Puneetansweredourmanyquestionsandimplementedmanyimprovementstotheserverinresponsetooursuggestions.
WearealsogratefultoHannesMaraisfordevelopingWebL,aWebscript-inglanguage[13].
UsingWebL,wewereabletoquicklydevelopaprototypeuserinterfaceforourrelatedpagesserver.
KrishnaBharat,AllanHeydon,andHannesMaraisprovidedusefulfeedbackonear-lierdraftsofthispaper.
Finally,wewouldliketothankalltheparticipantsinouruserstudy.
References[1]G.
O.
Arocena,A.
O.
MendelzonandG.
A.
Mihaila,Appli-cationsofawebquerylanguage,in:Proc.
oftheSixthInternationalWorldWideWebConference,pp.
587–595,SantaClara,CA,April,1997.
[2]K.
Bharat,A.
Z.
Broder,M.
Henzinger,P.
KumarandS.
Venkatasubramanian,Theconnectivityserver:fastaccesstolinkageinformationontheWeb,in:Proc.
ofthe7thInternationalWorldWideWebConference,pp.
469–477,Brisbane,Qld,April1998.
[3]K.
BharatandM.
Henzinger,Improvedalgorithmsfortopicdistillationinhyperlinkedenvironments,in:Proc.
ofthe21stInternationalACMSIGIRConferenceonResearchJ.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791479andDevelopmentinInformationRetrieval(SIGIR'98),pp.
104–111,1998.
[4]S.
BrinandL.
Page,Theanatomyofalarge-scalehyper-textualWebsearchengine,in:Proc.
ofthe7thInternationalWorldWideWebConference,pp.
107–117,Brisbane,Qld.
,April1998.
[5]S.
Chakrabarti,B.
Dom,D.
Gibson,S.
R.
Kumar,P.
Ragha-van,S.
RajagopalanandA.
Tomkins,Experimentsintopicdistillation,in:ACM–SIGIR'98Post-ConferenceWorkshoponHypertextInformationRetrievalfortheWeb,1998.
[6]S.
Chakrabarti,B.
DomandP.
Indyk,Enhancedhypertextcategorizationusinghyperlinks,in:Proc.
oftheACMSIG-MODInternationalConferenceonManagementofData,pp.
307–318,1998.
[7]S.
Chakrabarti,B.
Dom,P.
Raghavan,S.
Rajagopalan,D.
GibsonandJ.
Kleinberg,Automaticresourcecompilationbyanalyzinghyperlinkstructureandassociatedtext,in:Proc.
oftheSixthInternationalWorldWideWebConfer-ence,pp.
65–74,SantaClara,CA,April1997.
[8]J.
CarriereandR.
Kazman,WebQuery:Searchingandvisu-alizingthewebthroughconnectivity,in:Proc.
oftheSixthInternationalWorldWideWebConference,pp.
701–711,SantaClara,CA,April1998.
[9]E.
Gareld,Citationanalysisasatoolinjournalevaluation,Science178(1972).
[10]E.
Gareld,CitationIndexing,ISIPress,Philadelphia,PA,1979.
[11]M.
M.
Kessler,Bibliographiccouplingbetweenscienticpapers,AmericanDocumentation14(1963).
[12]J.
Kleinberg,Authoritativesourcesinahyperlinkedenvi-ronment,in:Proc.
ofthe9thAnnualACM–SIAMSympo-siumonDiscreteAlgorithms,pp.
668–677,January1998.
[13]T.
KistlerandH.
Marais,WebL—Aprogramminglan-guagefortheWeb,in:Proc.
ofthe7thInternationalWorldWideWebConference,pp.
259–270,Brisbane,Qld.
,April1998.
[14]NetscapeCommunicationsCorporation,'What'sRelatedFAQ'webpage,http://home.
netscape.
com/escapes/related/faq.
html[15]NetscapeCommunicationsCorporation,'What'sRelated'webpage,http://home.
netscape.
com/escapes/related/[16]J.
PitkowandP.
Pirolli,Life,death,andlawfulnessontheelectronicfrontier,in:Proc.
oftheConferenceonHumanFactorsinComputingSystems(CHI97),pp.
383–390,March1997.
[17]P.
Pirolli,J.
PitkowandR.
Rao,Silkfromasow'sear:ExtractingusablestructuresfromtheWeb,in:Proc.
oftheConferenceonHumanFactorsinComputingSystems(CHI96),pp.
118–125,April1996.
[18]S.
M.
Ross,IntroductoryStatistics,McGraw-Hill,NewYork,1996.
[19]H.
Small,Co-citationinthescienticliterature:anewmea-sureoftherelationshipbetweentwodocuments,JournalAmericanSocietyInformationScience24(1973).
[20]E.
Spertus,ParaSite:MiningstructuralinformationontheWeb,in:Proc.
oftheSixthInternationalWorldWideWebConference,pp.
587–595,SantaClara,CA,April1997.
[21]U.
ShardanandandP.
Maes,Socialinformationltering:Algorithmsforautomating'WordofMouth',in:Proc.
ofthe1995ConferenceonHumanFactorsinComputingSystems(CHI'95),1995.
[22]L.
TerveenandW.
Hill,Findingandvisualizinginter-siteclangraphs,in:Proc.
oftheConferenceonHumanFactorsinComputingSystems(CHI-98):MakingtheImpossiblePossible,pp.
448–455,ACMPress,NewYork,April18–23,1998.
[23]L.
TerveenandW.
Hill,Evaluatingemergentcollabora-tionontheWeb,in:Proc.
ofACMCSCW'98ConferenceonComputer-SupportedCooperativeWork,pp.
355–362,SocialFiltering,SocialInuences,1998.
JeffreyDeanreceivedhisPh.
D.
fromtheUniversityofWashing-tonin1996,workingonefcientimplementationtechniquesforobject-orientedlanguagesunderProfessorCraigChambers.
HejoinedCompaq'sWesternResearchLaboratoryin1996,whereheworkedonprolingtechniques,performancemonitoringhard-ware,compileralgorithmsandinformationretrieval.
InFebruary,1999,hejoinedmySimon,Inc.
,whereheiscurrentlyworkingonscalablecomparisonshoppingsystemsfortheWorldWideWeb.
HiscurrentresearchinterestsincludeinformationretrievalandthedevelopmentofscalablesystemsfortheWorldWideWeb.
Heistwocontinentsshyofhisgoalofplayingbasketballoneverycontinent.
MonikaR.
HenzingerreceivedherPh.
D.
fromPrincetonUni-versityin1993underthesupervisionofRobertE.
Tarjan.
Af-terwards,shewasanassistantprofessorinComputerScienceatCornellUniversity.
ShejoinedtheDigitalSystemsResearchCenter(nowCompaqComputerCorporation'sSystemsResearchCenter)in1996.
HercurrentresearchinterestsareinformationretrievalontheWorldWideWebandalgorithmicproblemsarisinginthiscontext.

无忧云-河南洛阳BGP,CEPH集群分布式存储,数据安全可靠,活动期间月付大优惠!

 无忧云怎么样?无忧云服务器好不好?无忧云值不值得购买?无忧云是一家成立于2017年的老牌商家旗下的服务器销售品牌,现由深圳市云上无忧网络科技有限公司运营,是正规持证IDC/ISP/IRCS商家,主要销售国内、中国香港、国外服务器产品,线路有腾讯云国外线路、自营香港CN2线路等,都是中国大陆直连线路,非常适合免备案建站业务需求和各种负载较高的项目,同时国内服务器也有多个BGP以及高防节点...

ZJI-全场八折优惠,香港服务器 600元起,还有日本/美国/韩国服务器

ZJI怎么样?ZJI是一家成立于2011年的商家,原名维翔主机,主要从事独立服务器产品销售,目前主打中国香港、日本、美国独立服务器产品,是一个稳定、靠谱的老牌商家。详情如下:月付/年付优惠码:zji??下物理服务器/VDS/虚拟主机空间订单八折终身优惠(长期有效)一、ZJI官网点击直达香港葵湾特惠B型 CPU:E5-2650L核心:6核12线程内存:16GB硬盘:480GB SSD带宽:5Mbps...

ZoeCloud:香港BGP云服务器,1GB内存/20GB SSD空间/2TB流量/500Mbps/KVM,32元/月

zoecloud怎么样?zoecloud是一家国人商家,5月成立,暂时主要提供香港BGP KVM VPS,线路为AS41378,并有首发永久8折优惠:HKBGP20OFF。目前,解锁香港区 Netflix、Youtube Premium ,但不保证一直解锁,谢绝以不是原生 IP 理由退款。不保证中国大陆连接速度,建议移动中转使用,配合广州移动食用效果更佳。点击进入:zoecloud官方网站地址zo...

11kk99 com为你推荐
优酷路由宝怎么赚钱优酷路由宝是如何赚钱的?支付宝查询余额支付宝钱包怎么查余额?二叉树遍历怎么正确理解二叉树的遍历网站联盟网站联盟的运作流程今日热点怎么删除如何彻底删除今日热点网易公开课怎么下载如何下载网易公开课迅雷云点播账号求百度云或者迅雷云播账号密码2012年正月十五山西省太原市2012年正月十五活动的相关情况iphone6上市时间iphone6什么时候上市,价格是多少?安全漏洞如何发现系统安全漏洞
域名注册中心 网游服务器租用 域名服务器是什么 狗爹 themeforest 2014年感恩节 ev证书 卡巴斯基永久免费版 最好的空间 ca4249 韩国名字大全 阿里校园 免费申请个人网站 免费cdn cloudlink 免费asp空间 iki 贵阳电信 畅行云 cdn网站加速 更多