union11kk99

11kk99 com  时间:2021-03-03  阅读:()
ComputerNetworks31(1999)1467–1479FindingrelatedpagesintheWorldWideWebJeffreyDean,1,MonikaR.
Henzinger2CompaqSystemsResearchCenter,130LyttonAve.
,PaloAlto,CA94301,USAAbstractWhenusingtraditionalsearchengines,usershavetoformulatequeriestodescribetheirinformationneed.
ThispaperdiscussesadifferentapproachtoWebsearchingwheretheinputtothesearchprocessisnotasetofqueryterms,butinsteadistheURLofapage,andtheoutputisasetofrelatedWebpages.
ArelatedWebpageisonethataddressesthesametopicastheoriginalpage.
Forexample,www.
washingtonpost.
comisapagerelatedtowww.
nytimes.
com,sincebothareonlinenewspapers.
WedescribetwoalgorithmstoidentifyrelatedWebpages.
ThesealgorithmsuseonlytheconnectivityinformationintheWeb(i.
e.
,thelinksbetweenpages)andnotthecontentofpagesorusageinformation.
Wehaveimple-mentedbothalgorithmsandmeasuredtheirruntimeperformance.
Toevaluatetheeffectivenessofouralgorithms,weperformedauserstudycomparingouralgorithmswithNetscape's'What'sRelated'service(http://home.
netscape.
com/escapes/related/).
Ourstudyshowedthattheprecisionat10forourtwoalgorithmsare73%betterand51%betterthanthatofNetscape,despitethefactthatNetscapeusesbothcontentandusagepatterninformationinadditiontoconnectivityinformation.
1999PublishedbyElsevierScienceB.
V.
Allrightsreserved.
Keywords:Searchengines;Relatedpages;Searchingparadigms1.
IntroductionTraditionalWebsearchenginestakeaqueryasinputandproduceasetof(hopefully)relevantpagesthatmatchthequeryterms.
Whileusefulinmanycircumstances,searchengineshavethedisadvantagethatusershavetoformulatequeriesthatspecifytheirinformationneed,whichispronetoerrors.
ThispaperdiscusseshowtondrelatedWebpages,adifferentapproachtoWebsearching.
InourapproachCorrespondingauthor.
Presentaddress:mySimon,Inc.
,SantaClara,CA,USA.
E-mail:jdean@mysimon.
com1ThisworkwasdonewhiletheauthorwasattheCompaqWesternResearchLaboratory.
2E-mail:monika@pa.
dec.
comtheinputtothesearchprocessisnotasetofqueryterms,buttheURLofapage,andtheoutputisasetofrelatedWebpages.
ArelatedWebpageisonethataddressesthesametopicastheoriginalpage,butisnotnecessarilysemanticallyidentical.
Forexample,givenwww.
nytimes.
com,thetoolshouldndothernewspapersandnewsorganizationsontheWeb.
Ofcourse,incontrasttosearchengines,ourapproachrequiresthattheuserhasalreadyfoundapageofinterest.
RecentworkininformationretrievalontheWebhasrecognizedthatthehyperlinkstructurecanbeveryvaluableforlocatinginformation[1,3,4,8,12,16,17,20,22,23].
Thisassumesthatifthereisalinkfrompagevandw,thentheauthorofvrecommendspagew,andlinksoftenconnect1389-1286/99/$–seefrontmatter1999PublishedbyElsevierScienceB.
V.
Allrightsreserved.
PII:S1389-1286(99)00022-51468J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479Table1ExampleresultsproducedbytheCompanionalgorithmInput:www.
nytimes.
comwww.
usatoday.
comUSATodaynewspaperwww.
washingtonpost.
comWashingtonPostnewspaperwww.
cnn.
comCableNewsNetworkwww.
latimes.
comLosAngelesTimesnewspaperwww.
wsj.
comWallStreetJournalnewspaperwww.
msnbc.
comMSNBCcablenewsstationwww.
sjmercury.
comSanJoseMercuryNewsnewspaperwww.
chicago.
tribune.
comChicagoTribunenewspaperwww.
nando.
netNandoTimeson-linenewsservicewww.
the-times.
co.
ukTheTimesnewspaperrelatedpages.
Inthispaper,wedescribetheCom-panionandCocitationalgorithms,twoalgorithmswhichuseonlythehyperlinkstructureoftheWebtoidentifyrelatedWebpages.
Forexample,Table1showstheoutputoftheCompanionalgorithmwhengivenwww.
nytimes.
comasinput(inthiscase,theresultsfortheCocitationalgorithmareidenticalandtheresultsforNetscapeareverysimilar,althoughthisisnotalwaystrue).
Oneofourgoalswastodesignalgorithmswithhighprecisionthatareveryfastandthatdonotrequirealargenumberofdifferentkindsofinputdata.
SincewehaveatoolthatgivesusaccesstothehyperlinkstructureoftheWeb(theConnectivityServer[2]),wefocusedonalgorithmsthatonlyuseconnectivityinformationtoidentifyrelatedpages.
Ouralgorithmsuseonlytheinformationaboutthelinksthatappearoneachpageandtheorderinwhichthelinksappear.
Theyneitherexaminethecontentofpages,nordotheyexaminepatternsofhowuserstendtonavigateamongpages.
OurCompanionalgorithmisderivedfromtheHITSalgorithmproposedbyKleinbergforrankingsearchenginequeries[12].
KleinbergsuggestedthattheHITSalgorithmcouldbeusedforndingrelatedpagesaswell,andprovidedanecdotalevidencethatitmightworkwell.
Inthispaper,weextendthealgorithmtoexploitnotonlylinksbutalsotheirorderonapage(seeSection2.
1.
1)andpresenttheresultsofauser-studyshowingthattheresultingalgorithmworksverywell.
TheCocitationalgorithmndspagesthatarefrequentlycocitedwiththeinputURLu(thatis,itndsotherpagesthatarepointedtobymanyotherpagesthatallalsopointtou).
NetscapeCommunicatorVersion4.
06introducedarelatedpagesservicethatisbuiltintothebrowser[15](seeSection2.
3foramoredetaileddiscus-sion).
Onthebrowserscreen,thereisa'What'sRe-lated'button,whichpresentsamenuofrelatedWebpagesinsomecases.
The'What'sRelated'algo-rithminNetscapeisbasedontechnologydevelopedbyAlexa,Inc.
,andcomputesitsanswersbasedonconnectivityinformation,contentinformation,andusageinformation[14].
Tocomparetheperformanceofourtwoalgo-rithmsandNetscape'salgorithm,weperformedauserstudyon59URLschosenby18volunteers.
Ourstudyresultsshowthattheprecisionat10com-putedoverall59URLsofourtwoalgorithmsare73%betterand51%betterthanNetscape's.
NotallalgorithmsgaveanswersforallURLsinourstudy.
Ifwerestrictthecomparisontoonlythe37URLsforwhichallthreealgorithmsreturnedanswers,thentheprecisionat10ofourtwoalgorithmsare40%and22%betterthanNetscape'salgorithm.
Thisissurprisingsinceouralgorithmsarebasedonlyonconnectivityinformation.
Netscape'salgorithmgivesanswersforabout17millionURLs[14],whileouralgorithmscangiveanswersforamuchlargersetofURLs(wehavecon-nectivityinformationon180millionURLs).
ThisisimportantbecauseitmeansthatwecangiverelatedURLinformationformoreURLs.
Ouralgorithmsarealsofast:inourenvironment,bothaveragelessthan200msecofcomputationperinputURL.
TheexampleshowninTable1isforaURLwithaveryhighlevelofconnectivity(www.
nytimes.
comcontains47,751inlinksinourConnectivityServer),andallthreealgorithmsgenerallyperformquitewellforwell-connectedURLs.
Ouralgorithmscanalsoworkwellwhenthereismuchlessconnectivity,asshownbytheexampleinTable2.
ThistableshowstheanswersfortheCompanionandNetscapealgo-rithmsforlinas.
org/linux/corba.
html,oneoftheinputURLschosenbyauseraspartofouruserstudy.
Alongsideeachansweristheuser'srat-ingforeachanswer,witha'1'meaningthattheuserconsideredthepagerelated,'0'meaningthattheJ.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791469Table2ComparisonoftheresultsoftheCompanionandNetscapealgorithmsInput:linas.
org/linux/corba.
htmlCompanionNetscape1www.
cs.
wustl.
edu/schmidt/TAO.
html0labserver.
kuntrynet.
com/linux1dsg.
harvard.
edu/public/arachne0web.
singnet.
com.
sg/siuyin1jorba.
castle.
net.
au/–www.
clark.
net/pub/srokicki/linux1www-b0.
fnal.
gov:8000/ROBIN–www.
earth.
demon.
co.
uk/linux/uk1www.
paragon-software.
com/products/oak0www.
emry.
net/webwatcher1www.
tatanka.
com/orb1.
htm0www.
interl.
net/jasoneng/NLL/lwr1www.
oot.
co.
uk/dome-index.
html0www.
jnpcs.
com/mkb/linux0www.
intellisoft.
com/mark1www.
linuxresources.
com/1www.
dstc.
edu.
au/AU/staff/mart.
.
.
0www.
liszt.
com/1www.
inf.
fu-berlin.
de/brose/jacorb0www.
local.
net/jgo/linuxhelp.
htmlA'1'meansthatthepagewasvaluable,a'0'meansthatthepagewasnotvaluable,a'–'meansthatthepagecouldnotbeaccessed.
userconsideredthepageunrelated,and'–'meaningthattheuserwasunabletoaccessthepageatall.
TheoriginalpagewasaboutCORBAimplementa-tionsforLinux,andtherewere123pagespointingtothispageinourConnectivityServer.
NineofthetenanswersgivenbytheCompanionalgorithmweredeemedrelatedbyouruser,whileonlyonepagefromNetscape'ssetofanswerswasdeemedrelated.
MostofNetscape'sanswerswereaboutthemuchbroadertopicofLinux,ratherthanspecicallyaboutCORBAimplementationsonLinux.
Section2presentsouralgorithmsindetailandde-scribesNetscape'sservice,whileSection3discussestheimplementationofouralgorithms.
Section4de-scribestheuserstudyweperformedandpresentsitsresults,andalsoprovidesabriefperformanceeval-uationofouralgorithms.
Finally,Section6presentsourconclusions.
2.
RelatedpagealgorithmsInthissectionwedescribeourtwoalgorithms(theCompanionalgorithmandtheCocitationalgorithm),aswellasNetscape'salgorithm.
UnlikeNetscape'salgorithm,bothofouralgorithmsexploitonlythehyperlink-structure(i.
e.
graphconnectivity)oftheWebanddonotexaminetheinformationaboutthecontentorusageofpages.
Netscape'salgorithmusesallthreekindsofinformationtoarriveatitsresults.
Inthesectionsbelow,weusethetermsparentandchild.
Ifthereisahyperlinkfrompagewtopagev,wesaythatwisaparentofvandthatvisachildofw.
2.
1.
CompanionalgorithmTheCompanionalgorithmtakesasinputastartingURLuandconsistsoffoursteps:(1)Buildavicinitygraphforu.
(2)Contractduplicatesandnear-duplicatesinthisgraph.
(3)Computeedgeweightsbasedonhosttohostconnections.
(4)Computeahubscoreandanauthorityscoreforeachnodeinthegraphandreturnthetoprankedauthoritynodes(ourimplementationreturnsthetop10).
Thisphaseofthealgorithmusesamod-iedversionoftheHITSalgorithmoriginallyproposedbyKleinberg[12].
Thesestepsaredescribedinmoredetailinthesubsectionsbelow.
Onlystep1exploitstheorderoflinksonapage.
2.
1.
1.
Step1:buildingthevicinitygraphGivenaqueryURLuwebuildadirectedgraphofnodesthatarenearbytouintheWebgraph.
GraphnodescorrespondtoWebpagesandgraphedgescorrespondtohyperlinks.
Thegraphconsistsofthefollowingnodes(andtheedgesbetweenthesenodes):1470J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479(1)u,(2)uptoBparentsofu,andforeachparentuptoBFofitschildrendifferentfromu,and(3)uptoFchildrenofu,andforeachchilduptoFBofitsparentsdifferentfromu.
Hereishowwechoosethesenodesindetail:ThereisastoplistSTOPofURLsthatareunrelatedtomostqueriesandthathaveveryhighindegree.
Ourstoplistwasdevelopedbyexperimentation,andcurrentlycontains21URLs.
Examplesofnodesthatappearonourstoplistarewww.
yahoo.
comandwww.
microsoft.
com/ie/download.
html.
IfthestartingURLuisnotoneofthenodesonourstoplist,thenweignorealltheURLsonthestoplistwhenformingthevicinitygraph.
Ifudoesappearonthestoplist,however,thenwedisablethestoplist(i.
e.
setSTOPtotheemptylist)andfreelyincludeanynodesinthevicinitygraph.
WedisablethestoplistwhentheinputURLappearsonthestoplistbecausemanynodesonthestoplistarepopularsearchenginesandportals,andwewanttopermitthesenodestobeconsideredwhentheinputURLisanothersuchpopularsite.
Goback(B),andback-forward(BF):IfuhasmorethanBparents,addBrandomparentsnotonSTOPtothegraph;otherwiseaddallofu'sparents.
IfaparentxofuhasmorethanBFC1children,adduptoBF=2childrenpointedtobytheBF=2linksonximmediatelyprecedingthelinktouanduptoBF=2childrenpointedtobytheBF=2linksonximmediatelysucceedingthelinktou(ignoringduplicatelinks).
IfpagexhasfewerthanBFchildren,weaddallofitschildrentothegraph.
Notethatthisexploitstheorderoflinksonpagex.
Goforward(F),andforward-back(FB):IfuhasmorethanFchildren,addthechildrenpointedtobytherstFlinksofu;otherwise,addallofu'schil-dren.
IfachildofuhasmorethanBFparents,addtheBFparentsnotonSTOPwithhighestindegree;otherwise,addallofthechild'sparents.
Ifthereisahyperlinkfromapagerepresentedbynodevinthegraphtoapagerepresentedbynodew,andvandwdonotbelongtothesamehost,thenthereisadirectededgefromvtowinthegraph(weomitedgesbetweennodesonthesamehost).
Inourexperience,wehavefoundthatusingalargevalueforB(2000)andasmallvalueforBF(8)worksbetterinpracticethanusingmoderatevaluesforeach(say,50and50).
Wehaveobservedthatlinkstopagesonasimilartopictendtobeclusteredtogether,whilelinksthatarefartherapartonapagearelesslikelytobeonthesametopic(forexample,mosthotlistsaregroupedintocategories).
Thishasalsobeenobservedbyotherresearchers[6].
UsingalargervalueforBalsomeansthatthelikelihoodofthecomputationbeingdominatedbyasingleparentpageisreduced.
2.
1.
2.
Step2:duplicateeliminationAfterbuildingthisgraphwecombinenear-dupli-cates.
Wesaytwonodesarenear-duplicatesif(a)theyeachhavemorethan10linksand(b)theyhaveatleast95%oftheirlinksincommon.
Tocombinetwonear-duplicateswereplacetheirtwonodesbyanodewhoselinksaretheunionofthelinksofthetwonear-duplicates.
Thisduplicateeliminationphaseisimportantbecausemanypagesareduplicatedacrosshosts(e.
g.
mirrorsites,differentaliasesforthesamepage),andwehaveobservedthatallowingthemtoremainseparatecangreatlydistorttheresults.
2.
1.
3.
Step3:assignedgeweightsInthisstage,weassignaweighttoeachedge,usingtheedgeweightingschemeofBharatandHen-zinger[3]whichwerepeathereforcompleteness.
Anedgebetweentwonodesonthesamehost3hasweight0.
Iftherearekedgesfromdocumentsonarsthosttoasingledocumentonasecondhostwegiveeachedgeanauthorityweightof1=k.
Thisweightisusedwhencomputingtheauthorityscoreofthedocumentonthesecondhost.
Ifthereareledgesfromasingledocumentonarsthosttoasetofdocumentsonasecondhost,wegiveeachedgeahubweightof1=l.
Weperformthisscalingtopreventasinglehostfromhavingtoomuchinuenceonthecomputation.
Wecalltheresultingweightedgraphthevicinitygraphofu.
2.
1.
4.
Step4:computehubandauthorityscoresInthisstep,weruntheimpalgorithm[3]onthegraphtocomputehubandauthorityscores.
Theimp3Weassumethroughoutthepaperthatthehostcanbedeter-minedfromtheURL-string.
J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791471algorithmisastraightforwardextensionoftheHITSalgorithmwithedgeweights.
TheintuitionbehindtheHITSalgorithmisthatadocumentthatpointstomanyothersisagoodhub,andadocumentthatmanydocumentspointtoisagoodauthority.
Transitively,adocumentthatpointstomanygoodauthoritiesisanevenbetterhub,andsimilarlyadocumentpointedtobymanygoodhubsisanevenbetterauthority.
TheHITScomputationrepeatedlyupdateshubandauthorityscoressothatdocumentswithhighauthorityscoresareexpectedtohaverelevantcontent,whereasdocumentswithhighhubscoresareexpectedtocontainlinkstorelevantcontent.
Thecomputationofhubandauthorityscoresisdoneasfollows:InitializeallelementsofthehubvectorHto1.
0.
InitializeallelementsoftheauthorityvectorAto1.
0.
WhilethevectorsHandAhavenotconverged:ForallnodesninthevicinitygraphN,A[n]:DP.
n0;n/2edges.
N/H[n0]authorityweight.
n0;n/ForallninNH[n]:DP.
n;n0/2edges.
N/A[n0]hubweight.
n;n0/NormalizetheHandAvectors.
Notethatthealgorithmdoesnotclaimtondallrelevantpages,sincetheremaybesomethathavegoodcontentbuthavenotbeenlinkedtobymanyauthors.
TheCompanionalgorithmthenreturnsthenodeswiththetenhighestauthorityscores(excludinguitself)asthepagesthataremostrelatedtothestartpageu.
2.
2.
CocitationalgorithmAnalternativeapproachforndingrelatedpagesistoexaminethesiblingsofastartingnodeuintheWebgraph.
Twonodesarecocitediftheyhaveacommonparent.
Thenumberofcommonparentsoftwonodesistheirdegreeofcocitation.
AsanalternativetotheCompanionalgorithm,wehavedevelopedaverysimplealgorithmthatdeterminesrelatedpagesbylookingforsiblingnodeswiththehighestdegreeofcocitation.
TheCocitationalgo-rithmrstchoosesuptoBarbitraryparentsofu.
Foreachoftheseparentsp,itaddstoasetSuptoBFchildrenofpthatsurroundthelinkfromptou.
TheelementsofSaresiblingsofu.
ForeachnodesinS,wedeterminethedegreeofcocitationofswithu.
Finally,thealgorithmreturnsthe10mostfrequentlycocitednodesinSastherelatedpages.
Insomecasesthereisaninsufcientlevelofcoc-itationwithutoprovidemeaningfulresults.
Inourimplementation,iftherearenotatleast15nodesinSthatarecocitedwithuatleasttwice,thenwerestartthealgorithmusingthenodecorrespondingtou'sURLwithonepathelementremoved.
Forexam-ple,ifu'sURLisa.
com/X/Y/ZandaninsufcientnumberofcocitednodesexistforthisURL,thenwerestartthealgorithmwiththeURLa.
com/X/Y(iftheresultingURLisinvalid,wecontinuetochopelementsuntilweareleftwithjustahostname,orwendavalidURL).
Inourimplementation,wechoseBtobe2000andBFtobe8(thesameparametervaluesweusedforourimplementationoftheCompanionalgorithm).
OnewayoflookingattheCocitationalgorithmisthatitnds'maximal'n2bipartitesubgraphsinthevicinitygraph.
2.
3.
Netscape'sapproachNetscapeintroducedanew'What'sRelated'fea-tureinversion4.
06oftheNetscapeCommunicatorbrowser.
Detailsabouttheapproachusedtoidentifyrelatedpagesintheiralgorithmaresketchy.
How-ever,theWhat'sRelatedFAQpageindicatesthatthealgorithmusesconnectivityinformation,usageinformation,andcontentanalysisofthepagestodeterminerelationships.
Wequotefromthe'What'sRelated'FAQpage:TheWhat'sRelateddataiscreatedbyAlexaInternet.
Alexausescrawling,archiving,categorizinganddataminingtechniquestobuildtherelatedsiteslistsformillionsofWebURLs.
Forexample,Alexauseslinksonthecrawledpagestondrelatedsites.
Theday-to-dayuseofWhat'sRelatedalsohelpsbuildandrenethedata.
Astheserviceisused,therequestedURLsarelogged.
Bylookingathigh-leveltrends,NetscapeandAlexacandeducerelationshipsbetweenWebsites.
Forexample,ifthousandsofusersgodirectlyfromsiteAtositeB,thetwositesarelikelytoberelated.
1472J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479Next,AlexachecksalltheURLstomakesuretheyarelivelinks.
Thisprocessremoveslinksthatwouldtrytoreturnpagesthatdon'texist(404errors),aswellasanylinkstoserversthatarenotavailabletothegeneralInternetpopulation,suchasserversthatarenolongeractiveorarebehindrewalls.
Finally,oncealloftherelationshipsareestablishedandthelinksarechecked,thetoptenrelatedsitesforeachURLarechosenbylookingatthestrengthoftherelationshipbetweenthesites.
Eachmonth,AlexarecrawlstheWebandrebuildsthedatatopullinnewsitesandtorenetherelationshipsbetweentheexistingsites.
Newsiteswithstrongre-lationshipstoasitewillautomaticallyappearintheWhat'sRelatedlistforthatsitebydisplacinganysiteswithweakerrelationships.
Notethatsincetherelationshipsbetweensitesarebasedonstrength,What'sRelatedlistsarenotnec-essarilybalanced.
SiteAmayappearinthelistforSiteB,butSiteBmaynotbeinthelistforSiteA.
Generally,thishappenswhenthenumberofsiteswithstrongrelationshipsisgreaterthanten,orwhensitesdonothavesimilarenoughcontent.
3.
ImplementationInexperimentingwiththesealgorithms,wewerefortunatetohaveaccesstoCompaq'sConnectivityServer[2].
TheConnectivityServerprovideshighspeedaccesstothegraphstructureof180millionURLs(nodes)andthelinks(edges)thatconnectthem.
TheentiregraphstructureiskeptinmemoryonaCompaqAlphaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.
Therandomaccesspatternsengenderedbytheconnec-tivity-basedalgorithmsdescribedinthispapermeanthatitisimportantformostorallofthegraphtotinmainmemorytopreventhighlevelsofpagingactivity.
Weimplementedamulti-threadedserverthatac-ceptsaURLanduseseithertheCocitationalgorithmortheCompanionalgorithmtondpagesrelatedtothegivenURL.
Ourserverimplementationconsistsofapproximately5500linesofcommentedCcode,ofwhichapproximately1000linesimplementtheCompanionalgorithm,400linesimplementtheCoc-itationalgorithm,andtheremainderaresharedcodetoperformtaskssuchasparsingHTTPqueryre-quests,printingresults,andloggingstatusmessages.
WelinkourservercodedirectlywiththeCon-nectivityServerlibrary,andaccesstheconnectivityinformationbymmappingthegraphinformationintotheaddressspaceofourserver.
OurimplementationoftheCompanionalgorithmhasbeensubjectedtoamoderateamountofper-formancetuning,mostlyindesigningtheneighbor-hoodgraphdatastructurestoimprovedata-cacheperformance.
TheimplementationoftheCocitationalgorithmhasnotbeentunedextensively,althoughitdoesshareafairamountofcodewiththeCompan-ionalgorithm,andthissharedcodehasbeentunedsomewhat.
4.
EvaluationInthissectionwedescribetheevaluationweperformedforthealgorithms.
Section4.
1describesouruserstudy,whileSection4.
2discussestheresultsofthestudy.
Section4.
3evaluatestheruntimeperformanceofouralgorithms.
4.
1.
ExperimentalsetupTocomparethedifferentapproaches,weper-formedauserstudy.
Weasked18volunteerstosupplyuswithatleasttwoURLsforwhichtheywantedtondrelatedpages.
Ourvolunteersincluded14computerscienceprofessionals(mostlyourcol-leaguesatCompaq'sresearchlaboratories),aswellas4peoplewithotherprofessionalcareers.
Were-ceived69URLsandusedeachofthealgorithmstodeterminethetop10answersforeachURL.
Weputtheanswersinrandomorderandreturnedthemtothevolunteersforrating.
Thevolunteerswereinstructedasfollows:Table3SummaryofallanswersforthealgorithmsAlgorithmNo.
ofURLsNo.
ofNo.
ofwithanswersanswersdeadlinksCompanion5049842Cocitation5858062Netscape4036429J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791473Wewanttomeasurehowwelleachalgorithmper-forms.
TomeasureperformancewewanttoknowthepercentageofvaluableURLsreturnedbyeachalgo-rithm.
TobevaluabletheURLmustbebothrelevanttothetopicyouareinterestedinandahighqualitypage.
Forexample,ifyourURLwaswww.
audi.
comandyougetbackanewsgrouppostingwheresomebodytalksabouthisnewAudicar,thenthepagewasontopic,butprobablynothighquality.
Ontheotherhand,ifyougetwww.
jaguar.
comasananswer,thenitisuptoyoutodecidewhetherthisanswerisontopicornot.
Scoring:0:Pagewasnotvaluable=useful1:Pagewasvaluable=useful–:Pagecouldnotbeaccessed(i.
e.
didnotexist,orserverwasdown)Pleaseignoretheorderinwhichthepagesarereturned.
Soifalaterpagecontainssimilarcontenttoanearlierpagepleaseratethelatterpageasifyouhadnotseentheearlierpage.
Thiswillimplythatwedonotmeasurehow'happy'youarewithasetofanswersreturnedbyanalgorithm.
Insteadwemeasurehowmanyvaluableanswerseachalgorithmgives.
"4.
2.
UserstudyresultsWereceivedresponsesratingtheanswerURLsfor59oftheinputURLs.
These59inputURLsformthebasisofourstudy.
Table3showshowmanyofthesequeriesthealgorithmsansweredandhowmanyanswerURLstheyreturned.
Inmanycases,thealgorithmsreturnedlinksthatourusersratedasinaccessible.
ThecolumnlabeledNo.
ofdeadlinksshowsthenumberofinaccessiblepagesamongalltheanswersforeachalgorithm.
Forthepurposesofourevaluation,wetreataninaccessiblelinkasascoreof'0',sinceinaccessiblepagesarenotvaluable=useful.
TheCocitationalgorithmreturnedresultsforallbutoneoftheURLs.
ThereasonwhyitreturnedresultsforalmostallinputURLsisthatwheninsuf-cientconnectivitywasfoundsurroundinganinputURL(e.
g.
a.
com/X/Y),theCocitationalgorithmusedachoppedURLasinput(e.
g.
a.
com/X).
Al-thoughwedidnotincludethischoppingfeatureinourimplementationoftheCompanionalgorithm,itisdirectlyapplicableandwouldenabletheCompan-ionalgorithmtoreturnanswersformoreURLs.
WehaveempiricallyobservedthatNetscape'salgorithmalsoappliesasimilarchoppingheuristicinsomecases.
Table4containsalistingofthe59URLsinourstudy.
ForeachURL,thethreecolumnslabeledCp,Ct,andNshowtheURLsforwhichtheCompanion,Cocitation,andNetscapealgorithmsreturnedresults,respectively.
ThetablealsoshowsthenumberofhyperlinkspointingtotheURLintheConnectivityServer(Inlinks).
FortheCompanionalgorithm,itshowsthenumberofnodesandedgesinthevicinitygraph,aswellasthewallclocktimeinmillisecondstakentocomputethesetofrelatedpages(computedbysurroundingthecomputationwithgettimeof-daysystemcalls).
Forthecocitationalgorithm,itshowsthenumberofsiblingsfound,thenumberofsiblingscocitedatleasttwice(Cocited),andthewallclocktimetakentocomputetheanswers.
Thethreealgorithmsreturnanswersfordiffer-entsubsetsofour59URLs.
Tocomparethesealgorithms,wecansubdividetheURLsintosev-eralgroups.
TheintersectiongroupconsistsofthoseURLswhereallalgorithmsreturnedatleastoneanswer.
Therewere37URLsinthisgroup.
Thenon-NetscapegroupconsistsoftheURLswhereNetscape'sapproachdidnotreturnanyanswers.
Itconsistsof19URLs.
Toquantifytheperformanceofthethreealgo-rithms,wenowdenetwometrics.
Theprecisionatrforagivenalgorithmisthetotalnumberofanswersreceivinga'1'scorewithintherstranswers,di-videdbyrtimesthenumberofqueryURLs.
NoticethatwhenanalgorithmdoesnotgiveanyanswersforaURL,thisisasifitgaveallnon-relevantanswersforthatURL.
ForagivenURLu,theaverageprecisionforuofanalgorithmisthesumoftheprecisionateachrankwheretheanswerofthealgorithmforureceiveda'1'scoredividedbythetotalnumberoftheanswersofthealgorithmforureceivinga'1'score.
Ifthealgorithmdoesnotreturnanyanswersforu,itsaverageprecisionforuis0.
TheoverallaverageprecisionforanalgorithmisthesumofalltheaverageprecisionsforallthequeryURLsdividedbythetotalnumberofqueryURLs.
ForeachofthethreegroupsofURLs(all,in-tersection,andnon-Netscape),Table5showstheaverageprecisionandtheprecisionat10foreachalgorithm.
Fig.
1showstheprecisionatrforeachof1474J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479Table4User-providedURLsforevaluation(andstatistics)URLCpCtNInCompanionalgorithmCocitationalgorithmlinksNodesEdgesTimeSiblingsCocitedTime(ms)(ms)1.
babelsh.
altavista.
digital.
com/cgi.
.
.
ppp29284638210305269660110795732.
developer.
intel.
com/design/strong/t.
.
.
ppp19333479128539173.
english-server.
hss.
cmu.
edu/ppp377441978263215663020635844.
hack.
box.
sk/ppp1666796314090488609914935155.
ieor.
berkeley.
edu/hochbaum/html/bo.
.
.
pp5647344915176.
linas.
org/linux/corba.
htmlppp1232028422277444138487.
members.
tripod.
com/src-fall-regatt.
.
.
p0000680015237368.
metroactive.
com/music/ppp132013026716858319.
travelwithkids.
miningco.
com/pp9974410512745912847110.
www-db.
stanford.
edu/wienerpp21718381630110011.
www.
acf.
dhhs.
gov/ppp18225522149673943659140142412.
www.
adventurewest.
com/pub/NASTC/ppp136592452161413.
www.
amazon.
com/exec/obidos/cache/br.
.
.
pp0000114941126214.
www.
anglia.
ac.
uk/systimk/Humour/Hi.
.
.
p000010327915.
www.
ayso26.
org/p5142169800016.
www.
babynames.
com/ppp52826954781110182649616717.
www.
braintumor.
org/ppp2017791763355512126818.
www.
bris.
ac.
uk/Depts/Union/BTS/Scri.
.
.
ppp13150510841623791527219.
www.
carrier.
com/ppp96427555672123183758016520.
www.
chesschampions.
com/kasparov.
htm.
.
.
ppp11579035352205521.
www.
cl.
cam.
ac.
uk/users/rja14/tamper.
.
.
ppp1197411256274521553622.
www.
davecentral.
com/ppp69094703113912634349114847723.
www.
duofold.
com/stepout/ski-wsa.
htmp0000185662524.
www.
ebay.
com/ppp4658438986601995951131449325.
www.
etoys.
com/ppp2579229461531113351100542726.
www.
fa.
com/ppp128156105123603496452145252227.
www.
focus.
de/ppp76624881140393614208130152528.
www.
geocities.
com/Paris/Metro/1324/pp1192138739153029.
www.
geocities.
com/TheTropics/2442/d.
.
.
pp6445068519184652530.
www.
harappa.
com/har/har0.
htmlppp3823531886202372331.
www.
harmony-central.
com/MIDI/Doc/tu.
.
.
ppp311842497153272132.
www.
hotelres.
com/ppp70830726036148238088919633.
www.
innovation.
ch/java/HTTPClient/ppp7334163913215682134.
www.
inquizit.
com/pp126295454181535.
www.
israelidance.
com/ppp402103239173401636.
www.
jewishmusic.
com/ppp3991617311166123239713437.
www.
joh.
cam.
ac.
uk/pp1625571027234091143738.
www.
levenger.
com/ppp259136720835111162748739.
www.
mdl.
sandia.
gov/micromachine/gal.
.
.
p0000143611540.
www.
midiweb.
com/ppp19675304155493403370114140841.
www.
minimed.
com/ppp2177781663335732236942.
www.
mit.
edu/people/mkgray/net/ppp25811382124448962877243.
www.
mot-sps.
com/ppp3918831896384992065444.
www.
movielink.
com/ppp122746205115893375622120751445.
www.
netusa1.
net/spost/bench.
htmlpp19939442298646.
www.
nsc.
com/catalog/sg708.
htmlpp00002774114933347.
www.
odci.
gov/cia/publications/factb.
.
.
ppp3765435283222216708196261348.
www.
paccc.
com/ppp18409625156421749.
www.
perl.
com/perl/index.
htmlppp14539243918550.
www.
pianospot.
com/1700305.
htmp0000392385J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791475Table4(continued)URLCpCtNInCompanionalgorithmCocitationalgorithmlinksNodesEdgesTimeSiblingsCocitedTime(ms)(ms)51.
www.
rei-outlet.
com/pp20105163681161652.
www.
sddt.
com/les/library/94headli.
.
.
ppp9316633647126038653.
www.
sultry.
arts.
su.
edu.
au/links/lin.
.
.
pp5425846111210742154.
www.
telemarque.
com/articles/andrnch.
.
.
pp00006072416155.
www.
trane.
com/ppp90029656201140209668418156.
www.
traxxx.
de/pp12194607105603043355118447857.
www.
us-soccer.
com/ppp117534588515201234486523058.
www.
wisdom.
weizmann.
ac.
il/naor/puz.
.
.
pp84965396161059.
www.
wwa.
com/android7/pilot/index.
h.
.
.
p00001846709214thesegroupsofURLsingraphsa,bandc.
Fig.
1aandbillustratethattheCompanionandCocitationalgorithmssubstantiallyoutperformNetscape'sal-gorithmatallranks,andtheCompanionalgorithmalmostalwaysoutperformstheCocitationalgorithm.
Theintersectiongroupisthemostinterestingcomparison,sinceitavoidspenalizinganalgorithmfornotreturningatleastoneanswer.
Fortheinter-sectiongroup,Netscape'salgorithmachievesapreci-sionat10of0.
357,whiletheCompanionalgorithmachievesaprecisionat10of0.
501(40%better),andtheCocitationalgorithmachievesaprecisionat10of0.
435(22%better).
Theaverageprecisionintheintersectiongroupdoesnotpenalizeanalgorithmforreturningfewerthan10answers.
Underthismet-ric,theCompanionalgorithmis32%betterthanNetscape'salgorithm,whiletheCocitationalgorithmis20%betterthanNetscape'salgorithm.
InthegroupthatincludesallURLs,allthreeal-gorithmshaddropsintheirprecisionat10values.
Therearetworeasonsforthis.
Therstisthatalgo-rithmsweregivenaprecisionof0foragivenURLiftheydidnotreturnanyanswers.
ThismostlyaffectedtheNetscapeandCompanionalgorithms.
Thesec-Table5PrecisionmetricsforeachalgorithmforthreegroupsofURLsAlgorithmAllIntersectionNon-NetscapeAverageprecisionPrecisionat10AverageprecisionPrecisionat10AverageprecisionPrecisionat10Companion0.
5410.
4170.
6660.
5010.
5400.
401Cocitation0.
5180.
3630.
6050.
4350.
4340.
325Netscape0.
3430.
2410.
5020.
357n=an=aondreasonisthatfortheURLsinthenon-Netscapeset,boththeCompanionandCocitationalgorithmsdidnotperformaswellastheydidforURLsintheIntersectionset.
Despitethesedropsinabsoluteaverageprecision,theaverageprecisionoftheCom-panionalgorithmis57%betterthanthatofNetscape,andtheaverageprecisionoftheCocitationalgorithmis51%betterthanthatofNetscape.
Similarresultsholdwhenexaminingaverageprecisionratherthanprecisionat10.
Toevaluatethestatisticalsignicanceofourre-sults,wecomputedthesigntestandtheWilcoxsonsumsofrankstestforeachpairofalgorithms[18].
TheseresultsareshowninTable6andshowthatthedifferencebetweentheCompanionandNetscapealgorithmsandbetweentheCocitationandNetscapealgorithmsarestatisticallysignicant.
WealsowantedtoevaluatewhetherornotthealgorithmsweregenerallyreturningthesameresultsforagivenURLorwhethertheywerereturninglargelydisjointsetsofURLs.
Table7showstheamountofoverlapintheanswersreturnedbyeachpairofalgorithms.
Thepercentageinparenthesesistheoverlapdividedbythetotalnumberofan-1476J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479110r0.
00.
20.
40.
60.
81.
0Precision(a)AllCompanionCocitationNetscape110r0.
00.
20.
40.
60.
81.
0(b)Intersection110r0.
00.
20.
40.
60.
81.
0(c)Non-NetscapeFig.
1.
PrecisionatrforthethreegroupsofURLs.
Table6SigntestandWilcoxonsumofrankstestforalgorithmpairsAlgorithmAllIntersectionNon-NetscapeSignRanksumSignRanksumSignRanksumCompanionbetterthanNetscapeCompanionbetterthanCocitation0.
19220.
38980.
07930.
26280.
26430.
4180Table7OverlapbetweenanswersreturnedbyalgorithmsAlgorithmCompanionCocitationNetscapeCompanion253(51%)55(11%)Cocitation253(44%)56(10%)Netscape55(15%)56(15%)swersreturnedbythealgorithminthatrow.
Asthetableshows,thereisalargeoverlapbetweentheanswersreturnedbytheCompanionandCocitationalgorithms.
Thisisnotsurprising,sincethetwoalgorithmsarebothbasedonconnectivityinforma-tionsurroundingtheinputURLandsincebothusesimilarparameterstochoosethesurroundingnodes.
ThereisrelativelylittleoverlapbetweentheanswersreturnedbyNetscapeandtheothertwoalgorithms.
4.
3.
Run-timeperformanceInthissection,wepresentdataabouttherun-timeperformanceoftheCompanionandCocitationalgorithms.
SincewedonothavedirectaccesstoNetscape'salgorithmandonlyaccessitthroughthepublicWebinterface,weareunabletopresentper-formanceinformationforNetscape'salgorithm.
AllmeasurementswereperformedonaCompaqAl-phaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.
ThemeasuredrunningtimesarewallclocktimesfromthetimetheinputURLisgiventotheserveruntilthetimetheanswersarereturned.
ThesetimesdonotincludethetimetakentoformattheresultsasanHTMLpage,sincethatwasdonebyaserverprocessrunningonanothermachine(andthetimetodothiswasnegligible).
TheaveragerunningtimefortheCompanionalgorithmonthe50URLsforwhichitreturnedanswerswas109msec,whiletheaveragerunningtimefortheCocitationalgorithmonthe58URLsforwhichitprovidedanswerswas195msec.
Theperformanceofboththesealgorithmsissufcientlyfastthateitheronecouldhandlealargeamountoftrafc(closeto800,000requestsperdayfortheCompanionalgorithm).
Furthermore,theaverageperformancecouldprobablybeimprovedbycachinganswersforfrequentlyrequestedURLs.
Althoughwedidnotexplicitlyincludethisfactorinouruserstudy,wehaveinformallyobservedthatthesubjectivequalityofanswersreturnedforboththeCompanionandtheCocitationalgorithmsdoesJ.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791477050001000015000Graphedges0100200300400Companiontime(msec)(a)0200040006000#ofsiblings0200400600Cocitationtime(msec)(b)Fig.
2.
GraphsizeversusrunningtimeofCompanionandCocitationalgs.
notdecreasewhenwesomewhatdecreasetheparam-eterB(thenumberofinlinksconsidered)duringthebuildingofthevicinitygraph.
Thisisimportantforon-lineservicesbecauseitmeansthatthegraphsizecouldbereducedduringtimesofhighload,therebyreducingtheamountoftimetakentoserviceeachrequest.
Underconditionsoflowload,thegraphsizecouldbeincreased.
TheCompanionalgorithmgenerallyconvergesonitsanswerswithinafewiterations(typicallylessthan10iterations),butthenumberofiterationsincreaseswiththegraphsize.
Eachiterationtakestimethatislinearinthenumberofedgesinthevicinitygraph.
WeplottherunningtimeversusthenumberofgraphedgesinFig.
2a.
TherunningtimeoftheCocitationalgorithmisO.
nlogn/,wherenisthenumberofsiblingsexaminedforcocitation,sinceitsortsthesiblingsbythedegreeofcocitation.
ThiseffectisillustratedinFig.
2b.
Inourexperience,therunningtimesforthecocitationandcompanionalgorithmsaregenerallycorrelated,sinceURLswhichhavealargenumberofsiblingstoconsiderinthecocitationalgorithmalsogenerallyproducealargeneighborhoodgraphforprocessinginthecompanionalgorithm.
5.
RelatedworkManyresearchershaveproposedschemesforus-ingthehyperlinkstructureoftheWeb[1,3,4,8,12,16,17,20,22,23].
Forthemostpart,thisworkdoesnotdiscussthendingofrelatedpages,withfourexceptionsdiscussedbelow.
Weknowofonlyonepreviousworkthatexploitstheorderoflinks:Chakrabartietal.
[6]usethelinksandtheirordertocategorizeWebpagesandtheyshowthatthelinksthatarenearagivenlinkinpageorderfrequentlypointtopagesonthesametopic.
PreviousauthorshavesuggestedusingcocitationandotherformsofconnectivitytoidentifyrelatedWebpages.
Spertusobservedthatcocitationcanin-dicatethattwopagesarerelated[20].
Thatis,ifpageApointstobothpagesBandC,thenBandCmightberelated.
Variousresearchersintheeldofbibliometricshavealsoobservedthis[9–11,19],andthisobservationformsthebasisofourCocitationalgorithm.
Thenotionofcollaborativeltering,al-thoughitisbasedonuser'srecommendationsratherthanhyperlinks,alsoreliesonthisobservation[21].
PitkowandPirolli[16]clusterWebpagesbasedoncocitationanalysis.
TerveenandHill[22]usetheconnectivitystructureoftheWebtondgroupsofrelatedWebsites.
OurcompanionalgorithmdescendedfromtheHITSalgorithmdevelopedbyKleinberg[12].
TheHITSalgorithmwasoriginallyproposedbyKlein-bergasawayofusingconnectivitystructuretoiden-tifythemostauthoritativesourcesofinformationonaparticulartopic,wherethetopicwasdenedbythecombinedlinkstructureofalargenumberofstartingnodesonthetopic.
Kleinbergalsoproposedthatthe1478J.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–1479HITSalgorithmcouldbeusedtondrelatedpageswhenthetopicwasdenedbyjustasinglenode.
TheCompanionalgorithmusedHITSalgorithmasastartingpointandextendedandmodieditinfourmainways:(1)Kleinbergsuggestedusingthefollowinggraphtondrelatedpages:Takeaxednumber(say200)ofparentsofthegivenURLandcallthesetconsistingoftheURLandtheseparentsthestartset.
Nowbuildthegraphconsistingofallnodespointingtoanodeinthestartsetorpointedtobyanodeinthestartset.
Thismeansthat'grandparents'ofuareincludedinthegraph,whilenodesthatshareachildwithuarenotincludedinthegraph.
Webelievethatthelatternodesaremorelikelytoberelatedtouthanarethe'grandparent'nodes.
Thereforeourvicinitygraphisstructuredtoexcludegrandparentnodesbuttoincludenodesthatshareachildwithu.
(2)Weexploittheorderofthelinksonapagetodeterminewhich'siblings'ofutoinclude.
Whenweaddedthisfeature,theprecisionofouralgorithmimprovednoticably.
(3)TheoriginalHITSalgorithmdidnothaveedgeweights.
Weuseedgeweightstoreducetheinuenceofpagesthatallresideononehost,sinceBharatandHenzingerhaveshownthatedgeweightsimprovetheprecision[3].
(4)Wealsomergenodesinourvicinitygraphthathavealargenumberofduplicatelinks.
Dupli-catenodesarenotsuchaseriousproblemwhenusingtheHITSorimpalgorithmstorankqueryresults,sincethestartsetconsistsofalargenumberofURLs.
However,whenformingthevicinitygraphstartingwithjustasingleURL,theinuenceofduplicatenodesisincreasedbe-causeduplicatenodeswithalargenumberofoutlinkswillquicklydominatethehubandauthoritycomputation.
KleinbergalsoshowedthatHITScomputestheprincipaleigenvectorofthematrixAAT,whereAistheadjacencymatrixoftheabovegraph,andsug-gestedusingnon-principaleigenvectorsforndingrelatedpages.
Finally,hegaveanecdotalevidencethatHITSmightworkwell.
Consecutively,asequenceofpapers[5,7]pre-sentedimprovementsonHITSandusedittopopu-lateagivenhierarchyofcategories.
6.
ConclusionWehavepresentedtwodifferentalgorithmsforndingrelatedpagesintheWorldWideWeb.
TheysignicantlyoutperformNetscape'salgorithmforndingrelatedpages.
Thealgorithmscanbeimple-mentedefcientlyandaresuitableforusewithinalargescaleWebserviceprovidingarelatedpagesfeature.
OurtwoalgorithmscanbeextendedtohandlemorethanoneinputURL.
Inthiscase,thealgo-rithmswouldcomputepagesthatarerelatedtoallinputURLs.
Wearecurrentlyexploringtheseexten-sions.
AcknowledgementsThisworkhasbenetedgreatlyfromdiscussionswehavehadwithKrishnaBharat,AndreiBroder,PuneetKumar,andHannesMarais.
WearealsoindebtedtoPuneetforhisworkontheConnec-tivityServer.
Assomeoftheearliestusersoftheserver,Puneetansweredourmanyquestionsandimplementedmanyimprovementstotheserverinresponsetooursuggestions.
WearealsogratefultoHannesMaraisfordevelopingWebL,aWebscript-inglanguage[13].
UsingWebL,wewereabletoquicklydevelopaprototypeuserinterfaceforourrelatedpagesserver.
KrishnaBharat,AllanHeydon,andHannesMaraisprovidedusefulfeedbackonear-lierdraftsofthispaper.
Finally,wewouldliketothankalltheparticipantsinouruserstudy.
References[1]G.
O.
Arocena,A.
O.
MendelzonandG.
A.
Mihaila,Appli-cationsofawebquerylanguage,in:Proc.
oftheSixthInternationalWorldWideWebConference,pp.
587–595,SantaClara,CA,April,1997.
[2]K.
Bharat,A.
Z.
Broder,M.
Henzinger,P.
KumarandS.
Venkatasubramanian,Theconnectivityserver:fastaccesstolinkageinformationontheWeb,in:Proc.
ofthe7thInternationalWorldWideWebConference,pp.
469–477,Brisbane,Qld,April1998.
[3]K.
BharatandM.
Henzinger,Improvedalgorithmsfortopicdistillationinhyperlinkedenvironments,in:Proc.
ofthe21stInternationalACMSIGIRConferenceonResearchJ.
Dean,M.
R.
Henzinger/ComputerNetworks31(1999)1467–14791479andDevelopmentinInformationRetrieval(SIGIR'98),pp.
104–111,1998.
[4]S.
BrinandL.
Page,Theanatomyofalarge-scalehyper-textualWebsearchengine,in:Proc.
ofthe7thInternationalWorldWideWebConference,pp.
107–117,Brisbane,Qld.
,April1998.
[5]S.
Chakrabarti,B.
Dom,D.
Gibson,S.
R.
Kumar,P.
Ragha-van,S.
RajagopalanandA.
Tomkins,Experimentsintopicdistillation,in:ACM–SIGIR'98Post-ConferenceWorkshoponHypertextInformationRetrievalfortheWeb,1998.
[6]S.
Chakrabarti,B.
DomandP.
Indyk,Enhancedhypertextcategorizationusinghyperlinks,in:Proc.
oftheACMSIG-MODInternationalConferenceonManagementofData,pp.
307–318,1998.
[7]S.
Chakrabarti,B.
Dom,P.
Raghavan,S.
Rajagopalan,D.
GibsonandJ.
Kleinberg,Automaticresourcecompilationbyanalyzinghyperlinkstructureandassociatedtext,in:Proc.
oftheSixthInternationalWorldWideWebConfer-ence,pp.
65–74,SantaClara,CA,April1997.
[8]J.
CarriereandR.
Kazman,WebQuery:Searchingandvisu-alizingthewebthroughconnectivity,in:Proc.
oftheSixthInternationalWorldWideWebConference,pp.
701–711,SantaClara,CA,April1998.
[9]E.
Gareld,Citationanalysisasatoolinjournalevaluation,Science178(1972).
[10]E.
Gareld,CitationIndexing,ISIPress,Philadelphia,PA,1979.
[11]M.
M.
Kessler,Bibliographiccouplingbetweenscienticpapers,AmericanDocumentation14(1963).
[12]J.
Kleinberg,Authoritativesourcesinahyperlinkedenvi-ronment,in:Proc.
ofthe9thAnnualACM–SIAMSympo-siumonDiscreteAlgorithms,pp.
668–677,January1998.
[13]T.
KistlerandH.
Marais,WebL—Aprogramminglan-guagefortheWeb,in:Proc.
ofthe7thInternationalWorldWideWebConference,pp.
259–270,Brisbane,Qld.
,April1998.
[14]NetscapeCommunicationsCorporation,'What'sRelatedFAQ'webpage,http://home.
netscape.
com/escapes/related/faq.
html[15]NetscapeCommunicationsCorporation,'What'sRelated'webpage,http://home.
netscape.
com/escapes/related/[16]J.
PitkowandP.
Pirolli,Life,death,andlawfulnessontheelectronicfrontier,in:Proc.
oftheConferenceonHumanFactorsinComputingSystems(CHI97),pp.
383–390,March1997.
[17]P.
Pirolli,J.
PitkowandR.
Rao,Silkfromasow'sear:ExtractingusablestructuresfromtheWeb,in:Proc.
oftheConferenceonHumanFactorsinComputingSystems(CHI96),pp.
118–125,April1996.
[18]S.
M.
Ross,IntroductoryStatistics,McGraw-Hill,NewYork,1996.
[19]H.
Small,Co-citationinthescienticliterature:anewmea-sureoftherelationshipbetweentwodocuments,JournalAmericanSocietyInformationScience24(1973).
[20]E.
Spertus,ParaSite:MiningstructuralinformationontheWeb,in:Proc.
oftheSixthInternationalWorldWideWebConference,pp.
587–595,SantaClara,CA,April1997.
[21]U.
ShardanandandP.
Maes,Socialinformationltering:Algorithmsforautomating'WordofMouth',in:Proc.
ofthe1995ConferenceonHumanFactorsinComputingSystems(CHI'95),1995.
[22]L.
TerveenandW.
Hill,Findingandvisualizinginter-siteclangraphs,in:Proc.
oftheConferenceonHumanFactorsinComputingSystems(CHI-98):MakingtheImpossiblePossible,pp.
448–455,ACMPress,NewYork,April18–23,1998.
[23]L.
TerveenandW.
Hill,Evaluatingemergentcollabora-tionontheWeb,in:Proc.
ofACMCSCW'98ConferenceonComputer-SupportedCooperativeWork,pp.
355–362,SocialFiltering,SocialInuences,1998.
JeffreyDeanreceivedhisPh.
D.
fromtheUniversityofWashing-tonin1996,workingonefcientimplementationtechniquesforobject-orientedlanguagesunderProfessorCraigChambers.
HejoinedCompaq'sWesternResearchLaboratoryin1996,whereheworkedonprolingtechniques,performancemonitoringhard-ware,compileralgorithmsandinformationretrieval.
InFebruary,1999,hejoinedmySimon,Inc.
,whereheiscurrentlyworkingonscalablecomparisonshoppingsystemsfortheWorldWideWeb.
HiscurrentresearchinterestsincludeinformationretrievalandthedevelopmentofscalablesystemsfortheWorldWideWeb.
Heistwocontinentsshyofhisgoalofplayingbasketballoneverycontinent.
MonikaR.
HenzingerreceivedherPh.
D.
fromPrincetonUni-versityin1993underthesupervisionofRobertE.
Tarjan.
Af-terwards,shewasanassistantprofessorinComputerScienceatCornellUniversity.
ShejoinedtheDigitalSystemsResearchCenter(nowCompaqComputerCorporation'sSystemsResearchCenter)in1996.
HercurrentresearchinterestsareinformationretrievalontheWorldWideWebandalgorithmicproblemsarisinginthiscontext.

PacificRack 下架旧款方案 续费涨价 谨慎自动续费

前几天看到网友反馈到PacificRack商家关于处理问题的工单速度慢,于是也有后台提交个工单问问,没有得到答复导致工单自动停止,不清楚商家最近在调整什么。而且看到有网友反馈到,PacificRack 商家的之前年付低价套餐全部下架,而且如果到期续费的话账单中的产品价格会涨价不少。所以,如果我们有需要续费产品的话,谨慎选择。1、特价产品下架我们看到他们的所有原来发布的特价方案均已下架。如果我们已有...

[6.18]IMIDC:香港/台湾服务器月付30美元起,日本/俄罗斯服务器月付49美元起

IMIDC发布了6.18大促销活动,针对香港、台湾、日本和莫斯科独立服务器提供特别优惠价格最低月付30美元起。IMIDC名为彩虹数据(Rainbow Cloud),是一家香港本土运营商,全线产品自营,自有IP网络资源等,提供的产品包括VPS主机、独立服务器、站群独立服务器等,数据中心区域包括香港、日本、台湾、美国和南非等地机房,CN2网络直连到中国大陆。香港服务器   $39/...

Puaex:香港vds,wtt套餐,G口带宽不限流量;可解流媒体,限量补货

puaex怎么样?puaex是一家去年成立的国人商家,本站也分享过几次,他家主要销售香港商宽的套餐,给的全部为G口带宽,而且是不限流量的,目前有WTT和HKBN两种线路的方面,虽然商家的价格比较贵,但是每次补一些货,就会被抢空,之前一直都是断货的状态,目前商家进行了补货,有需要这种类型机器的朋友可以入手。点击进入:puaex商家官方网站Puaex香港vds套餐:全部为KVM虚拟架构,G口的带宽,可...

11kk99 com为你推荐
如何建立自己的网站如何建立自己的网站雅虎天盾高手进来看看我该怎么办 新装的ie8 内存使用率达到100%了lockdowndios8.1怎么激活内置卡贴iphone6上市时间苹果6什么时候在中国大陆上市如何快速收录谁知道怎么快速被搜索引擎快速收录啊?小米手柄小米手柄怎么用?系统分析员系统分析员的工作内容qq新闻弹窗如何关闭QQ新闻弹窗怎么把网页的字变大如何使网页的字体变大?recovery教程如何强制刷入recovery
西安服务器租用 域名服务器的作用 亚洲大于500m burstnet rackspace 香港cdn 512m 魔兽世界台湾服务器 本网站服务器在美国 有益网络 帽子云 能外链的相册 登陆空间 便宜空间 免费的asp空间 日本代理ip 谷歌台湾 photobucket wordpress中文主题 网页加速 更多