estimatepagerank

pagerank  Date: 2021-04-19
A Framework for Web Page Rank Prediction

Elli Voudigari¹, John Pavlopoulos¹, and Michalis Vazirgiannis¹,²

¹ Department of Informatics, Athens University of Economics and Business, Greece
elliv@aueb.gr, annis.pavlo@gmail.com, mvazirg@aueb.gr
² Institut Telecom, Ecole de Telecom ParisTech, Departement Informatique et Reseaux, Paris, France

Abstract. We propose a framework for predicting the ranking position of a Web page based on previous rankings. Assuming a set of successive top-k rankings, we learn predictors based on different methodologies. The prediction quality is quantified as the similarity between the predicted and the actual rankings. Extensive experiments were performed on real world large scale datasets for global and query-based top-k rankings, using a variety of existing similarity measures for comparing top-k ranked lists, including a novel and more strict measure introduced in this paper. The predictions are highly accurate and robust for all experimental setups and similarity measures.

Keywords: Rank Prediction, Data Mining, Web Mining, Artificial Intelligence.

1 Introduction

The World Wide Web is a highly dynamic structure that changes continuously, as Web pages and hyperlinks are created, deleted or modified. Ranking of the results is a cornerstone process enabling users to effectively retrieve relevant and important information. Given the huge size of the Web graph, computing rankings of Web pages requires enormous resources: computations on matrices whose size is of the order of the Web size ($10^9$ nodes). On the other hand, the owner of an individual Web page can see its ranking within the Web graph only by submitting queries to the owner of the graph (i.e. a search engine).

Given a series of time-ordered rankings of the nodes of a graph, where each node bears its ranking for each time stamp, we develop learning mechanisms that enable predictions of the nodes' ranking in future times. The predictions require only local feature knowledge, while no global data are necessary. Specifically, an individual node can predict its ranking knowing only the values of its own ranking. In such a case the node could plan actions for optimizing its ranking in the future.

In this paper we present an integrated effort for a framework towards Web page rank prediction considering different learning algorithms. We consider i) variable order Markov Models (MMs), ii) regression models and iii) an EM based approach with Bayesian learning. The final purpose is to represent the trends and predict future rankings of Web pages. All the models are learned from time series datasets where each training set corresponds to pre-processed rank values of Web pages observed over time.

For all methods, prediction quality is evaluated based on the similarity between the predicted and actual ranked lists, while we focus on the top-k elements of the Web page ranked lists, as top pages are usually more important in Web search.

Preliminary work on this topic was presented in [13] and [15]. The current work significantly differs from and advances previous works of the authors in the following ways:
a) Refined and careful re-engineering of the MMs' parameter learning procedure by using cross validation,
b) Integration and elaboration of the results of [15] in order to validate the performance comparison between regression (boosted with clustering) and MM predictors, in large scale real world datasets,
c) Namely we adopt: Linear Regression, random 1st/2nd/3rd order Markov models, proving the robustness of the model,
d) A new top-k list similarity measure (RSim) is introduced and used for the evaluation of predictors and, more importantly,
e) Additional, extensive and robust experiments took place using query-based top-k lists from the Yahoo! and Google search engines.

Partially supported by the DIGITEO Chair grant LEVETONE in France and the Research Centre of the Athens University of Economics and Business, Greece.

L. Iliadis et al. (Eds.): EANN/AIAI 2011, Part II, IFIP AICT 364, pp. 240–249, 2011. © IFIP International Federation for Information Processing 2011

2 Related Work

The ranking of query results in a Web search engine is an important problem and has attracted significant attention in the research community.

The problem of predicting PageRank is partly addressed in [9]. It focuses on Web page classification based on URL features. Based on this, the authors perform experiments trying to make PageRank predictions using the extracted features. For this purpose, they use linear regression; however, the complexity of this approach grows linearly in proportion to the number of features. The experimental results show that PageRank prediction based on URL features does not perform very well, probably because even though they correlate very well with the subject of pages, they do not influence a page's authority in the same way.

A recent approach towards page ranking prediction is presented in [13], generating Markov Models from historical ranked lists and using them for predictions. An approach that aims at approximating PageRank values without the need of performing the computations over the entire graph is [6]. The authors propose an algorithm to incrementally compute approximations to PageRank, based on the evolution of the link structure of the Web graph (a set of link changes). Their experiments demonstrate that the algorithm performs well both in speed and quality and is robust to various types of link modifications. However, this requires continuous monitoring of the Web graph in order to track any link modifications. There has also been work in adaptive computation of PageRank ([8], [11]) or even estimation of PageRank scores [7].

In [10] a method called predictive ranking is proposed, aiming at estimating the Web structure based on the intuition that the crawling and consequently the ranking results are inaccurate (due to inadequate data and dangling pages). In this work, the authors do not make future rank predictions. Instead, they estimate the missing data in order to achieve more accurate rankings. In [14] the authors suggest a new measure for ranking scientific articles, based on future citations. Based on publication time and author's name, they predict future citations and suggest a better model.

3 Prediction Methods

In this section, we present a framework that aims to predict the future rank position of Web pages based on the trends they have shown in the past. Our goal is to find patterns in the ranking evolution of Web pages. Given a set of successive Web graph snapshots, for each page we generate a sequence of rank change rates that indicates the trends of this page among the previous snapshots. We use these sequences from previous snapshots of the Web graph as a training set and try to predict the trends of a Web page based on its previous ones.

The remainder of this section is organized as follows: In Sect. 3.1 we train MMs of various orders and try to predict the trends of a Web page. Section 3.2 discusses an approach that uses a separate linear regression model for each Web page, while Sect. 3.3 combines linear regression with clustering based on an EM probabilistic framework.

Rank Change Rate. In order to predict future rankings of Web pages, we need a measure suitable for measuring page rank dynamics. We use the one introduced in [12] and briefly present its design.

Let $G_{t_i}$ be the snapshot of the Web graph created by a crawl and $n_{t_i} = |G_{t_i}|$ the number of Web pages at time $t_i$. Then, $rank(p, t_i)$ is a function providing the ranking of a Web page $p \in G_{t_i}$ according to some criterion (i.e. PageRank values). Intuitively, an appropriate measure for Web page trends is the rank change rate between two snapshots, but as the size of the Web graph constantly increases, the trend measure should be comparable across different graph sizes. Thus, we utilize the normalized rank (nrank) of a Web page, as it was defined in [12]. For a page $p$ ranked at position $rank(p, t_i)$:

\[ nrank(p, t_i) = \frac{2 \cdot rank(p, t_i)}{n_{t_i}^{2}}, \]

which ranges between $2 n_{t_i}^{-2}$ and $2 n_{t_i}^{-1}$. Then, using the normalized ranks, the Rank Change Rate (Racer) is given by

\[ racer(p, t_i) = 1 - \frac{nrank(p, t_{i+1})}{nrank(p, t_i)}. \]
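
The two definitions above can be illustrated with a minimal Python sketch. The function names and the convention of passing the rank position and graph size as plain integers are assumptions made for this illustration, not part of the original framework.

```python
def nrank(rank: int, n: int) -> float:
    """Normalized rank 2 * rank / n**2 of a page at position `rank` among n pages."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie in 1..n")
    return 2.0 * rank / (n * n)


def racer(rank_now: int, n_now: int, rank_next: int, n_next: int) -> float:
    """Rank change rate between two consecutive snapshots: 1 - nrank_next / nrank_now."""
    return 1.0 - nrank(rank_next, n_next) / nrank(rank_now, n_now)


# A page keeping its position in a graph of unchanged size has racer 0.
print(racer(10, 1000, 10, 1000))
# Moving up from position 10 to position 5 (same graph size) gives a positive racer.
print(racer(10, 1000, 5, 1000))
```

Under these definitions a positive racer means the page moved towards the top of the list, and a negative one means it dropped.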

3.1 Markov Model Learning

Markov Models (MMs) [1] have been widely used for studying and understanding stochastic processes and behave very well on modeling and predicting values in various applications. Their fundamental assumption is that the future value depends on a number $m$ of previous values, where $m$ is the order of the MM. They are defined based on a set of states $S = \{s_1, s_2, \ldots, s_n\}$ and a matrix $T$ of transition probabilities $t_i$, each of which represents the probability that a state $s_i$ occurs after a sequence of states.

Our goal is to represent the Web pages' ranking trends across different Web graph snapshots. We use the racer values to describe the rank change of a Web page between two snapshots and we utilize racer sequences to learn MMs. Obviously, stable ranking across time is represented by a zero racer value, while all other trends are represented by real numbers, generating a huge space of discrete values. As expected (intuitively, most pages are expected to remain stable for some time irrespective of their rank at the time), the zero value has a disproportionately high frequency compared to all other values, which means that all states besides the zero one should be formed by ranges of values instead of a single discrete value. In order to ensure equal probability for transition between any pair of states, we guaranteed equiprobable states by forming ranges with equal cumulative frequencies (of racer values falling within the range).

In order to calculate the number of states for our MMs, we computed the relative cumulative frequency of the zero racer state, $RF_{Racer=0}$, and used this to find the optimum number of states $n_s = 1 / RF_{Racer=0}$. Next, we formed $n_s$ equiprobable partitions and used the ranges' mean values as states to train our model. We should note that the significantly high frequency of the zero racer values also includes pages that were initially within the top-k list and then fell (and remained) out. We remove this bias from $RF_{Racer=0}$ by excluding any values not corresponding to stable rank, obtaining $RF_{Racer=0} \approx 0.1$, which in turn suggested 10 equiprobable states.
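
The state construction can be sketched as follows. This is only an illustration under stated assumptions: the zero racer value is kept as its own state, the remaining values are split into $n_s - 1$ equal-frequency ranges, and $n_s$ is rounded to the nearest integer. The helper names `num_states` and `equiprobable_states` are hypothetical.

```python
def num_states(racers):
    """n_s = 1 / RF(racer = 0), rounded to the nearest integer (rounding is an assumption)."""
    rf_zero = sum(1 for r in racers if r == 0) / len(racers)
    if rf_zero == 0:
        raise ValueError("no zero racer values observed")
    return round(1 / rf_zero)


def equiprobable_states(racers, n_states):
    """Return state values: 0.0 plus the means of (n_states - 1) equal-frequency ranges."""
    nonzero = sorted(r for r in racers if r != 0)
    bins = n_states - 1
    states = [0.0]  # assumed: the stable (zero) racer value forms its own state
    for j in range(bins):
        lo = j * len(nonzero) // bins
        hi = (j + 1) * len(nonzero) // bins
        chunk = nonzero[lo:hi]
        if chunk:
            states.append(sum(chunk) / len(chunk))
    return sorted(states)


# Toy data: 2 zeros among 20 values, i.e. a relative frequency of 0.1.
racers = [i / 10 for i in range(-9, 10)] + [0.0]
n_s = num_states(racers)
print(n_s, equiprobable_states(racers, n_s))
```

Each observed racer value would then be mapped to the state whose range contains it before training the Markov Model.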

Predictions with Racer. Based on the set of states mentioned above and formed to represent Web page trends, we are able to train MMs and predict the trend of a Web page in the future according to past trends. By assuming $m+1$ temporally successive crawls, resulting in respective snapshots, a sequence of $m$ states (representative of racer values) is constructed for each Web page. These are used to construct an $m$-order MM. Note that the memory $m$ is an inherent feature of the model. After computing transition probabilities for every path, using the generated states, the future states can be predicted by using the chain rule [1]. Thus, for an $m$-order Markov Model, the path probability of a state sequence is

\[ P(s_1 \to \ldots \to s_m) = P(s_1) \cdot \prod_{i=2}^{m} P(s_i \mid s_{i-m}, \ldots, s_{i-1}), \]

where each $s_i$ ($i \in \{1, 2, \ldots, n\}$) for any time interval may vary over all the possible states (ranges of racer values). Then, predicting the future trend of a page is performed by computing the most likely next state given the state path so far. Specifically, assuming $m$ time intervals, the next most probable state $X$ is computed as

\[ X = \arg\max_{X} P(s_1 \to \ldots \to s_{m-1} \to X). \]

Using that, we predict future states for each page. As each state is the mean of a racer range, we compute back the future nrank. Therefore, we are able to predict the future top-k ranking by sorting the racer of Web pages in ascending order.
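
A minimal sketch of this prediction step is given below, assuming states are encoded as small integers and transition probabilities are estimated by simple counting (the source does not spell out the estimator). Because the observed path prefix is fixed, maximizing the path probability over $X$ amounts to picking the state that most often follows the last `order` states. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(sequences, order=1):
    """Count how often each state follows each context of `order` previous states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            counts[context][seq[i]] += 1
    return counts


def predict_next(counts, history, order=1):
    """Most frequent next state after the last `order` states, or None for an unseen context."""
    context = tuple(history[-order:])
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy state sequences for three pages (state indices are illustrative).
pages = [[0, 0, 1, 2], [0, 1, 2, 2], [1, 2, 2, 0]]
model = train_markov(pages, order=1)
print(predict_next(model, [0, 1], order=1))
```

Returning `None` for unseen contexts is a design choice of this sketch; a practical predictor might instead fall back to a lower-order model.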

3.2 Regression Models

Assume a set of $N$ Web pages and observations of nrank values at $m$ time steps. Let $x_i = (x_{i1}, \ldots, x_{im})$ be the nrank values for Web page $i$ at the time points $t = (t_1, \ldots, t_m)$, where the $(N \times m)$ design matrix $X$ stores all the observed nrank values so that each row corresponds to a Web page and each column to a time point. Given these values we wish to predict the nrank value $x_{i*}$ for each Web page at some time $t_*$, which typically corresponds to a future time point ($t_* > t_k$, $k = 1, \ldots, m$). Next, we discuss a simple prediction method based on linear regression where the input variable corresponds to time and the output to the nrank value.

For a certain Web page $i$ we assume a linear regression model of the form

\[ x_{ik} = a_i t_k + b_i + \epsilon_k, \quad k = 1, \ldots, m, \]

where $\epsilon_k$ denotes zero-mean Gaussian noise. Note that the parameters $(a_i, b_i)$ are Web page-specific and their values are calculated using least squares. In other words, the above formulation defines a separate linear regression model for each Web page, so the pages are treated independently. This can be restrictive since possible existing similarities and dependencies between different Web pages are not taken into account.
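
The per-page model can be sketched with the closed-form least-squares solution for a single input variable. The function names and the toy nrank values below are illustrative assumptions.

```python
def fit_line(times, values):
    """Ordinary least squares for x = a * t + b; returns (a, b)."""
    m = len(times)
    t_mean = sum(times) / m
    x_mean = sum(values) / m
    var_t = sum((t - t_mean) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("need at least two distinct time points")
    cov = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, values))
    a = cov / var_t
    b = x_mean - a * t_mean
    return a, b


def predict_nrank(times, values, t_future):
    """Extrapolate the fitted line of one page to a future time point."""
    a, b = fit_line(times, values)
    return a * t_future + b


# One page observed at four time points, extrapolated to t = 5.
print(predict_nrank([1, 2, 3, 4], [0.10, 0.12, 0.14, 0.16], 5))
```

Each page gets its own call to `fit_line`, which mirrors the independence between pages noted above.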
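
3.3 Clustering Using EM

We assume that the nrank values of each Web page fall into one of $J$ different clusters. Clustering can be viewed as training a mixture probability model. To generate the nrank values $x_i$ for Web page $i$, we first select the cluster type $j$ with probability $\pi_j$ (where $\pi_j \ge 0$ and $\sum_{j=1}^{J} \pi_j = 1$) and then produce the values $x_i$ according to a linear regression model

\[ x_{ik} = a_j t_k + b_j + \epsilon_k, \quad k = 1, \ldots, m, \]

where $\epsilon_k$ is independent Gaussian noise with zero mean and variance $\sigma_j^2$. This implies that, given the cluster type $j$, the nrank values are drawn from the product of Gaussians

\[ p(x_i \mid j) = \prod_{k=1}^{m} \mathcal{N}(x_{ik} \mid a_j t_k + b_j, \sigma_j^2). \]

The cluster type that generated the nrank values of a certain Web page is an unobserved variable and thus, after marginalization, we obtain the unconditional mixture density $p(x_i) = \sum_{j=1}^{J} \pi_j \, p(x_i \mid j)$ for the observation vector $x_i$. To train the mixture model and estimate the parameters $\theta = (\pi_j, \sigma_j^2, a_j, b_j)_{j=1,\ldots,J}$, we can maximize the log likelihood of the data, $L(\theta) = \log \prod_{i=1}^{N} p(x_i)$, by using the EM algorithm [2]. Given an initial state for the parameters, EM optimizes over $\theta$ by iterating between E and M steps.

The E step computes the posterior probabilities

\[ R_{ij} = \frac{\pi_j \, p(x_i \mid j)}{\sum_{\rho=1}^{J} \pi_\rho \, p(x_i \mid \rho)}, \quad j = 1, \ldots, J, \; i = 1, \ldots, N, \]

where $N$ is the total number of Web pages. The M step updates the parameters according to

\[ \pi_j = \frac{1}{N} \sum_{i=1}^{N} R_{ij}, \qquad \sigma_j^2 = \frac{\sum_{i=1}^{N} R_{ij} \sum_{k=1}^{m} (x_{ik} - a_j t_k - b_j)^2}{N m \pi_j}, \]

\[ \begin{pmatrix} a_j \\ b_j \end{pmatrix} = \frac{1}{N_j} \begin{pmatrix} t^T t & t^T \mathbf{1} \\ t^T \mathbf{1} & m \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{N} R_{ij} \, x_i^T t \\ \sum_{i=1}^{N} R_{ij} \, x_i^T \mathbf{1} \end{pmatrix}, \quad j = 1, \ldots, J, \]

where $N_j = \sum_{i=1}^{N} R_{ij} = N \pi_j$, $t$ is the vector of all time points and $\mathbf{1}$ is the $m$-dimensional vector of ones.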
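
The E and M steps can be sketched in plain Python as below. This is a hedged illustration, not the authors' implementation: the random soft initialisation, the fixed iteration count, the small numerical floors and all function names are assumptions. The helper `predict_mean` applies the posterior-mean rule $\sum_j R_{ij}(a_j t_* + b_j)$ to a trained model.

```python
import math
import random


def weighted_line(t, X, w):
    """Weighted least-squares line x = a*t + b shared by all pages; page i has weight w[i]."""
    m, n_j = len(t), sum(w)
    stt, st1 = sum(v * v for v in t), sum(t)
    vt = sum(wi * sum(x * v for x, v in zip(xi, t)) for wi, xi in zip(w, X))
    v1 = sum(wi * sum(xi) for wi, xi in zip(w, X))
    det = stt * m - st1 * st1  # determinant of [[t't, t'1], [t'1, m]]
    a = (m * vt - st1 * v1) / (det * n_j)
    b = (stt * v1 - st1 * vt) / (det * n_j)
    return a, b


def sq_residual(xi, t, a, b):
    """Sum over time points of (x_ik - a*t_k - b)^2 for one page."""
    return sum((x - a * v - b) ** 2 for x, v in zip(xi, t))


def em_fit(t, X, J, iters=50, seed=0):
    """Return (params, R): params[j] = (pi_j, a_j, b_j, var_j), R[i][j] = responsibility."""
    rng = random.Random(seed)
    N, m = len(X), len(t)
    # Assumed initialisation: random soft responsibilities (not specified in the text).
    R = [[rng.random() + 1e-3 for _ in range(J)] for _ in range(N)]
    R = [[r / sum(row) for r in row] for row in R]
    params = []
    for _ in range(iters):
        # M step: update pi_j, (a_j, b_j) and sigma_j^2 from the responsibilities.
        params = []
        for j in range(J):
            w = [max(R[i][j], 1e-12) for i in range(N)]  # floor keeps N_j > 0
            n_j = sum(w)
            a, b = weighted_line(t, X, w)
            sq = sum(wi * sq_residual(xi, t, a, b) for wi, xi in zip(w, X))
            params.append((n_j / N, a, b, max(sq / (m * n_j), 1e-12)))
        # E step: posterior cluster probabilities, computed in log space for stability.
        for i in range(N):
            logs = [math.log(pi) - 0.5 * m * math.log(2 * math.pi * var)
                    - sq_residual(X[i], t, a, b) / (2 * var)
                    for pi, a, b, var in params]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            R[i] = [v / sum(weights) for v in weights]
    return params, R


def predict_mean(params, r_i, t_star):
    """Posterior-mean prediction for one page with responsibilities r_i."""
    return sum(r * (a * t_star + b) for r, (_, a, b, _) in zip(r_i, params))
```

As with any EM procedure, the result depends on the initialisation, so in practice several restarts with different seeds may be worth comparing by log likelihood.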

Once we have obtained suitable values for the parameters, we can use the mixture model for prediction. In particular, to predict the nrank value $x_{i*}$ of Web page $i$ at time $t_*$ given the observed values $x_i = (x_{i1}, \ldots, x_{im})$ at previous times, we express the posterior distribution $p(x_{i*} \mid x_i)$ using the Bayes rule:

\[ p(x_{i*} \mid x_i) = \sum_{j=1}^{J} R_{ij} \, \mathcal{N}(x_{i*} \mid a_j t_* + b_j, \sigma_j^2), \]

where $R_{ij}$ is computed according to the E step. To obtain a specific predictive value for $x_{i*}$, we can use the mean value of the above posterior distribution, $x_{i*} = \sum_{j=1}^{J} R_{ij} (a_j t_* + b_j)$, or the median estimate $x_{i*} = a_j t_* + b_j$, where $j = \arg\max_{\rho} R_{i\rho}$, which considers a hard assignment of the Web page into one of the $J$ clusters.

4 Top-k List Similarity Measures

In order to evaluate the quality of predictions, we need to measure the similarity of the predicted to the actual top-k ranking. For this purpose, we examine measures commonly used for comparing rankings, point out the shortcomings of the existing ones and define a new similarity measure for top-k rankings, denoted as RSim.

4.1 Existing Similarity Measures

The first one, denoted as $OSim(A, B)$ [4], indicates the degree of overlap between the top-k elements of two sets $A$ and $B$ (each one of size $k$):

\[ OSim(A, B) = \frac{|A \cap B|}{k}. \]

The second, $KSim(A, B)$ [4], is based on Kendall's distance measure [3] and indicates the degree to which the relative orderings of two top-k lists are in agreement:

\[ KSim(A, B) = \frac{|\{(u, v) : A', B' \text{ agree in order}\}|}{|A \cup B| \, (|A \cup B| - 1)}, \]

where $A'$ is an extension of $A$ resulting from appending at its tail the elements of $B - A$, so that $A'$ covers $A \cup (B - A)$, and $B'$ is defined analogously.
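
Both measures can be sketched in a few lines of Python. Two details are assumptions of this illustration rather than statements from the text: pairs $(u, v)$ are counted as ordered pairs of distinct elements (so identical lists score 1), and the appended elements keep the order they have in the other list.

```python
from itertools import permutations


def osim(a, b):
    """Overlap of two top-k lists of equal length k: |A intersect B| / k."""
    return len(set(a) & set(b)) / len(a)


def ksim(a, b):
    """Fraction of ordered pairs on which the extended lists A' and B' agree."""
    set_a, set_b = set(a), set(b)
    a_ext = list(a) + [x for x in b if x not in set_a]  # A' = A followed by B - A
    b_ext = list(b) + [x for x in a if x not in set_b]  # B' = B followed by A - B
    pos_a = {x: i for i, x in enumerate(a_ext)}
    pos_b = {x: i for i, x in enumerate(b_ext)}
    n = len(a_ext)
    if n < 2:
        return 1.0
    agree = sum(1 for u, v in permutations(a_ext, 2)
                if (pos_a[u] < pos_a[v]) == (pos_b[u] < pos_b[v]))
    return agree / (n * (n - 1))


actual = ["a", "b", "c"]
predicted = ["a", "c", "d"]
print(osim(actual, predicted), ksim(actual, predicted))
```

The example lists share two of three elements, and their extended versions agree on four of the six unordered pairs.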

Another interesting measure introduced in Information Retrieval for evaluating the accumulated relevance of a top-k document list to a query is the (Normalized) Discounted Cumulative Gain ((N)DCG) [5]. This measure assumes a top-k list, where each document is featured with a relevance score accumulated by scanning the list from top to bottom. Although DCG could be used for the evaluation of our predictions, since it takes into account the relevance of a top-k list to another, it exhibits some basic features that prevented us from using it in our experiments. It penalizes errors by maintaining an increasing value of cumulative relevance. While this is based on the rank of each document, the size $k$ of the list is not taken into account; thus the length of the list is irrelevant in DCG. Errors in top ranks of a top-k list should be considered more important than errors in low-ranked positions. Both the DCG and NDCG measures lack this important feature. Moreover, the DCG value for each rank in the top-k list is computed taking into account the previous values in the list.

Next, we introduce Spearman's Rank Correlation Coefficient, which was used during the experimental evaluation. It is a non-parametric (distribution-free) rank statistic proposed by Spearman (1904) that measures the strength of association between two variables and is often symbolized by $\rho$. It estimates how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values of these variables (as in a ranking problem), a perfect Spearman correlation of +1 or -1 exists if each variable is a perfect monotone function of the other. It is often confused with the Pearson correlation coefficient between ranked variables. However, the procedure used to calculate $\rho$ is much simpler. If $X$ and $Y$ are two variables with corresponding ranks $x_i$ and $y_i$, and $d_i = x_i - y_i$, $i = 1, \ldots, n$, are the differences between the ranks of each observation on the two variables, then it is given by

\[ \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}. \]
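
As a small worked illustration of the formula, the following Python sketch computes $\rho$ from two lists of ranks without ties. The function name is hypothetical.

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman's rho for two equally long rank lists without ties."""
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two rank lists of equal length >= 2")
    d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Identical rankings give +1, fully reversed rankings give -1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

For the reversed lists the squared differences sum to 40, so $\rho = 1 - 240/120 = -1$.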

4.2 RSim Quality Measure

The similarity measures discussed above do not sufficiently cover the fine-grained requirements that arise when comparing top-k rankings in the Web search context. So we need a new similarity metric taking into consideration: a) the absolute difference between the predicted and actual position for each Web page, as a large difference indicates a less accurate prediction, and b) the actual ranking position of a Web page, because failing to predict a highly ranked Web page is more important than failing to predict a low-ranked one. Based on these observations, we introduce a new measure, named RSim. Every inaccurate prediction made incurs a certain penalty depending on the two noted factors. If a prediction is 100% accurate (same predicted and actual rank), the penalty is equal to zero. Let $B_i$ be the predicted rank position for page $i$ and $A_i$ the actual one. The Cumulative Penalty Score (CPS) is computed as

\[ CPS(A, B) = \sum_{i=1}^{k} |A_i - B_i| \cdot (k + 1 - A_i). \]

The proposed penalty score CPS represents the overall error (difference) between the involved top-k lists $A$ and $B$ and is proportional to $|A_i - B_i|$. The term $(k + 1 - A_i)$ increases when $A_i$ becomes smaller, so errors in highly ranked Web pages are penalized more. In the best case, rank predictions for all Web pages are completely accurate (CPS = 0), since $A_i = B_i$ for any value of $i$. In the worst case, the rank predictions for all Web pages not only are inaccurate, but also bear the greatest CPS penalty possible. In such a scenario, all the Web pages predicted to be in the top-k list actually hold the position $k+1$ (or worse). Assuming that we want to compare two rankings of length $k$, the maximum CPS for both even and odd values of $k$ is equal to

\[ CPS_{max} = \frac{2k^3 + 3k^2 + k}{6}. \]

The proof for the final form of $CPS_{max}$ is omitted due to space limitations. Based on the above we define a new similarity measure, RSim, to compare the similarity between top-k rank lists as follows:

\[ RSim(A_i, B_i) = 1 - \frac{CPS(A_i, B_i)}{CPS_{max}(A_i, B_i)}. \qquad (1) \]

In the best-case prediction scenario, RSim is equal to one, while in the worst case RSim is equal to zero. So the closer the value of RSim is to one, the better and more accurate the rank predictions are.
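
The measure can be sketched as follows, assuming the two lists are given as parallel sequences of rank positions for the same $k$ pages, with `actual[i]` playing the role of $A_i$ and `predicted[i]` the role of $B_i$. Capping out-of-list positions at $k + 1$ in the example is an assumption of this illustration.

```python
def cps(actual, predicted):
    """Cumulative Penalty Score: sum of |A_i - B_i| * (k + 1 - A_i)."""
    k = len(actual)
    return sum(abs(a - b) * (k + 1 - a) for a, b in zip(actual, predicted))


def cps_max(k):
    """Closed form (2k^3 + 3k^2 + k) / 6, which equals 1^2 + 2^2 + ... + k^2."""
    return (2 * k ** 3 + 3 * k ** 2 + k) // 6


def rsim(actual, predicted):
    """RSim = 1 - CPS / CPS_max for two rank lists of length k."""
    return 1 - cps(actual, predicted) / cps_max(len(actual))


k = 5
actual = list(range(1, k + 1))
print(rsim(actual, actual))           # perfect prediction
print(rsim(actual, [k + 1] * k))      # every page placed at position k + 1
print(rsim([1, 2, 3], [2, 1, 3]))     # top two pages swapped
```

With every page placed at position $k + 1$, the penalty for actual position $i$ is $(k + 1 - i)^2$, and summing these squares reproduces the closed form of $CPS_{max}$, so RSim drops to zero.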

[Figure 1: five panels plotting similarity against top-k list length (0 to 300) for MM1, MM2, MM3, LinReg and BayesMod: (a) OSim, (b) KSim, (c) RSim, (d) NDCG, (e) Spearman Correlation]

Fig. 1. Prediction accuracy vs Top-k list length - Yahoo dataset

5 Experimental Evaluation

In order to evaluate the effectiveness of our methods we performed experiments on two different real world datasets. These consist of collections of top-k ranked lists for 22 queries over a period of 11 days, as returned by the Yahoo!¹ and the Google search engines, produced in the same way. In our experiments, we evaluate the prediction quality in terms of similarities between the predicted and the actual top-k ranked lists using OSim, KSim, NDCG, Spearman correlation and the novel similarity measure RSim.

5.1 Datasets and Query Selection

For each dataset (Yahoo and Google) a wealth of snapshots were available, ensuring we have enough evolution to test our approach. A concise description of each dataset and of the query-based approach follows. The Yahoo and Google datasets consist of 11 consecutive daily top-1000 ranked lists computed using the Yahoo Search Web Services² and the Google Search engine respectively. The query sets were picked from popular: a) queries that appeared in Google Trends³ and b) current queries (i.e. euro 2008 or Olympic games 2008).

5.2 Experimental Methodology

We compared all predictions among the various approaches and we next describe the steps followed for both datasets. At first, we computed PageRank scores for each snapshot of our datasets and obtained the top-k rankings using the scoring function mentioned. Having computed the scores, we calculated the nrank (racer values for MMs) for each pair of consecutive graph snapshots and stored them in a matrix nrank (racer) × time. Then, assuming an m-path of consecutive snapshots, we predict the (m+1)-th state. For each page p, we predict a ranking and compare it to the actual one by a 10-fold cross validation process (training on 90% of the dataset and testing on the remaining 10%). In the case of the EM approach, we tested the quality of clustering results for cluster cardinalities between 2 and 10 for each query and chose the one that maximized the overall quality of clustering. This was defined as a monotone combination of the within-cluster variation wc (sum of squared distances from each point to the center of the cluster it belongs to) and the between-cluster variation bc (distance between cluster centers). As score function of clustering, we considered the ratio bc/wc.

¹ http://search.yahoo.com
² http://developer.yahoo.com/search/
³ http://www.google.com/trends

[Figure 2: five panels plotting similarity against top-k list length (0 to 300) for MM1, MM2, MM3, LinReg and BayesMod: (a) OSim, (b) KSim, (c) RSim, (d) NDCG, (e) Spearman Correlation]

Fig. 2. Prediction accuracy vs Top-k list length - Google dataset

5.3 Experimental Results

Regarding the results of the experimental evaluation on the Google and Yahoo! datasets, one can see that the MMs achieve very accurate results. Regression based techniques (LinReg) reach and exceed the MMs' performance as the length of the top-k list increases, proving their robustness. In both datasets the experiments show the superiority of the EM approach (BayesMod), whose performance is very satisfying for all similarity measures. The MMs come next in the evaluation ranking, where the smaller the order, the better the prediction accuracy, though one would expect the contrary.

As the figures show, the proposed framework offers highly accurate predictions, ranging systematically between 70% and 100%, which is very encouraging and provides a tool for effective predictions.

6 Conclusions

We have described predictor learning algorithms for Web page rank prediction based on a framework of learning techniques (MMs, LinReg, BayesMod), and the experimental study showed that they can achieve overall very good prediction performance. Further work will focus on the following issues: a) Multi-feature prediction: we intend to deal with the internal mechanism that produces the ranking of pages (not only rank values) based on multiple features, b) Combination of such methods with dimensionality reduction techniques.

References

1. Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Princeton (1963)
2. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Royal Statistical Society 39, 1–38 (1977)
3. Kendall, M.G., Gibbons, J.D.: Rank Correlation Methods. Charles Griffin, UK (1990)
4. Haveliwala, T.H.: Topic-Sensitive PageRank. In: Proc. WWW (2002)
5. Jarvelin, K., Kekalainen, J.: Cumulated gain-based evaluation of IR techniques. TOIS 20, 422–446 (2002)
6. Chien, S., Dwork, C., Kumar, R., Simon, D.R., Sivakumar, D.: Link Evolution: Analysis and Algorithms. Internet Mathematics 1, 277–304 (2003)
7. Chen, Y.-Y., Gan, Q., Suel, T.: Local Methods for Estimating PageRank Values. In: Proc. of CIKM (2004)
8. Langville, A.N., Meyer, C.D.: Updating PageRank with iterative aggregation. In: Proc. of the 13th International World Wide Web Conference on Alternate Track Papers and Posters, pp. 392–393 (2004)
9. Kan, M.-Y., Thi, H.O.: Fast webpage classification using URL features. In: Conference on Information and Knowledge Management, pp. 325–326. ACM, New York (2005)
10. Yang, H., King, I., Lu, M.R.: Predictive Ranking: A Novel Page Ranking Approach by Estimating the Web Structure. In: Proc. of the 14th International WWW Conference, pp. 1825–1832 (2005)
11. Broder, A.Z., Lempel, R., Maghoul, F., Pedersen, J.: Efficient PageRank approximation via graph aggregation. Inf. Retrieval 9, 123–138 (2006)
12. Vlachou, A., Berberich, K., Vazirgiannis, M.: Representing and quantifying rank-change for the Web graph. In: Aiello, W., Broder, A., Janssen, J., Milios, E.E. (eds.) WAW 2006. LNCS, vol. 4936, pp. 157–165. Springer, Heidelberg (2008)
13. Vazirgiannis, M., Drosos, D., Senellart, P., Vlachou, A.: Web Page Rank Prediction with Markov Models. WWW poster (2008)
14. Sayyadi, H., Getoor, L.: FutureRank: Ranking Scientific Articles by Predicting their Future PageRank. In: SIAM Intern. Confer. on Data Mining, pp. 533–544 (2009)
15. Zacharouli, P., Titsias, M., Vazirgiannis, M.: Web page rank prediction with PCA and EM clustering. In: Avrachenkov, K., Donato, D., Litvak, N. (eds.) WAW 2009. LNCS, vol. 5427, pp. 104–115. Springer, Heidelberg (2009)
