estimatepagerank

pagerank  时间:2021-04-19  阅读:()
AFrameworkforWebPageRankPredictionElliVoudigari1,JohnPavlopoulos1,andMichalisVazirgiannis1,2,1DepartmentofInformatics,AthensUniversityofEconomicsandBusiness,Greeceelliv@aueb.
gr,annis.
pavlo@gmail.
com,mvazirg@aueb.
gr2InstitutTelecom,EcoledeTelecomParisTech,DepartementInformatiqueetReseaux,Paris,FranceAbstract.
WeproposeaframeworkforpredictingtherankingpositionofaWebpagebasedonpreviousrankings.
Assumingasetofsuccessivetop-krankings,welearnpredictorsbasedondierentmethodologies.
Thepredictionqualityisquantiedasthesimilaritybetweenthepre-dictedandtheactualrankings.
Extensiveexperimentswereperformedonrealworldlargescaledatasetsforglobalandquery-basedtop-krank-ings,usingavarietyofexistingsimilaritymeasuresforcomparingtop-krankedlists,includinganovelandmorestrictmeasureintroducedinthispaper.
Thepredictionsarehighlyaccurateandrobustforallexperimen-talsetupsandsimilaritymeasures.
Keywords:RankPrediction,DataMining,WebMining,ArticialIntelligence.
1IntroductionTheWorldWideWebisahighlydynamicstructurecontinuouslychanging,asWebpagesandhyperlinksarecreated,deletedormodied.
Rankingoftheresultsisacornerstoneprocessenablinguserstoeectivelyretrieverelevantandimportantinformation.
GiventhehugesizeoftheWebgraph,computingrankingsofWebpagesrequiresawesomeresources-computationsonmatriceswhosesizeisoftheorderofWebsize(109nodes).
Ontheotherhandtheowneroftheindividualwebpagecanseeitsrankingonlyinthecaseofthewebgraphbysubmittingqueriestotheownerofthegraph(i.
e.
asearchengine).
Givenaseriesoftime-orderedrankingsofthenodesofagraphwhereeachbearsitsrankingforeachtimestamp,wedeveloplearn-ingmechanismsthatenablepredictionsofthenodesrankinginfuturetimes.
Thepredictionsrequireonlylocalfeatureknowledgewhilenoglobaldataarenecessary.
Specically,anindividualnodecanpredictitsrankingonlyknowingthevaluesofitsownranking.
Insuchacasethenodecouldplanactionsforoptimizingitsrankinginfuture.
InthispaperwepresentanintegratedeortforaframeworktowardsWebpagerankpredictionconsideringdierentlearningalgorithms.
WeconsiderPartiallysupportedbytheDIGITEOChairgrantLEVETONEinFranceandtheResearchCentreoftheAthensUniversityofEconomicsandBusiness,Greece.
L.
Iliadisetal.
(Eds.
):EANN/AIAI2011,PartII,IFIPAICT364,pp.
240–249,2011.
cIFIPInternationalFederationforInformationProcessing2011AFrameworkforWebPageRankPrediction241i)variableorderMarkovModels(MMs),ii)regressionmodelsandiii)anEMbasedapproachwithBayesianlearning.
ThenalpurposeistorepresentthetrendsandpredictfuturerankingsofWebpages.
Allthemodelsarelearnedfromtimeseriesdatasetswhereeachtrainingsetcorrespondstopre-processedrankvaluesofWebpagesobservedovertime.
Forallmethods,predictionqualityisevaluatedbasedonthesimilaritybe-tweenthepredictedandactualrankedlists,whilewefocusonthetop-kelementsoftheWebpagerankedlists,astoppagesareusuallymoreimportantinWebsearch.
Preliminaryworkonthistopicwaspresentedin[13]and[15].
Thecurrentworksignicantlydiersandadvancespreviousworksoftheauthorsinthefollowingways:a)Renedandcarefulre-engineeringoftheMMs'parameterlearningpro-cedurebyusingcrossvalidation,b)Integrationandelaborationoftheresultsof[15]inordertovalidatetheperformancecomparisonbetweenregression(boostedwithclustering)andMMpredictors,inlargescalerealworlddatasets,c)Namelyweadopt:LinearRegression,random1st/2nd/3rdorderMarkovmodelsprovingtherobustnessofthemodel,d)Anewtop-klistsimilaritymeasure(RSim)isintroducedandusedfortheevaluationofpredictorsandmoreimportantly,e)Additional,extensiveandrobustexperimentstookplaceusingquerybasedontop-klistsfromYahoo!
andGoogleSearchengine.
2RelatedWorkTherankingofqueryresultsinaWebsearch-engineisanimportantproblemandhasattractedsignicantattentionintheresearchcommunity.
TheproblemofpredictingPageRankispartlyaddressedin[9].
ItfocusesonWebpageclassicationbasedonURLfeatures.
Basedonthis,theauthorsperformexperimentstryingtomakePageRankpredictionsusingtheextractedfeatures.
Forthispurpose,theyuselinearregression;however,thecomplexityofthisapproachgrowslinearlyinproportiontothenumberoffeatures.
TheexperimentalresultsshowthatPageRankpredictionbasedonURLfeaturesdoesnotperformverywell,probablybecauseeventhoughtheycorrelateverywellwiththesubjectofpages,theydonotinuencepage'sauthorityinthesameway.
Arecentapproachtowardspagerankingpredictionispresentedin[13]gener-atingMarkovModelsfromhistoricalrankedlistsandusingthemforpredictions.
AnapproachthataimsatapproximatingPageRankvalueswithouttheneedofperformingthecomputationsovertheentiregraphis[6].
TheauthorsproposeanalgorithmtoincrementallycomputeapproximationstoPageRank,basedonevolutionofthelinkstructureofWebgraph(asetoflinkchanges).
Theirexper-imentsdemonstratethatthealgorithmperformswellbothinspeedandqualityandisrobusttovarioustypesoflinkmodications.
However,thisrequirescon-tinuousmonitoringoftheWebgraphinordertotrackanylinkmodications.
TherehasalsobeenworkinadaptivecomputationofPageRank([8],[11])orevenestimationofPageRankscores[7].
242E.
Voudigarietal.
In[10]amethodcalledpredictiverankingisproposed,aimingatestimatingtheWebstructurebasedontheintuitionthatthecrawlingandconsequentlytherankingresultsareinaccurate(duetoinadequatedataanddanglingpages).
Inthiswork,theauthorsdonotmakefuturerankpredictions.
Instead,theyestimatethemissingdatainordertoachievemoreaccuraterankings.
In[14]theauthorssuggestanewmeasureforrankingscienticarticles,basedonfuturecitations.
Basedonpublicationtimeandauthor'sname,theypredictfuturecitationsandsuggestabettermodel.
3PredictionMethodsInthissection,wepresentaframeworkthataimstopredictthefuturerankpositionofWebpagesbasedontheirtrendsshownthepast.
OurgoalistondpatternsinrankingevolutionofWebpages.
GivenasetofsuccessiveWebgraphsnapshots,foreachpagewegenerateasequenceofrankchangeratesthatindicatesthetrendsofthispageamongtheprevioussnapshots.
WeusethesesequencesofprevioussnapshotsoftheWebgraphasatrainingsetandtrytopredictthetrendsofaWebpagebasedonprevious.
Theremainingofthissectionisorganizedasfollows:InSect.
3.
1wetrainMMsofvariousordersandtrytopredictthetrendsofaWebpage.
Section3.
2discussesanapproachthatusesaseparatelinearregressionmodelforeachwebpage,whileSect.
3.
3combineslinearregressionwithclusteringbasedonanEMprobabilisticframework.
RankChangeRate.
InordertopredictfuturerankingsofWebpages,weneedtodeneameasureintroducedin[12]suitableformeasuringpagerankdynamics.
Webrieypresentitsdesign.
LetGtibethesnapshotoftheWebgraphcreatedbyacrawlandnti=|Gti|thenumberofWebpagesattimeti.
Then,rank(p,ti)isafunctionprovidingtherankingofaWebpagep∈Gti,accordingtosomecriterion(i.
e.
PageRankvalues).
Intuitively,anappropriatemeasureforWebpagestrendsistherankchangeratebetweentwosnapshots,butasthesizeoftheWebgraphconstantlyincreasesthetrendmeasureshouldbecomparableacrossdierentgraphsizes.
Thus,weutilizethenormalizedrank(nrank)ofaWebpage,asitwasdenedin[12].
Forapageprankedatpositionrank(p,ti):nrank(p,ti)=2·rank(p,ti)n2ti,whichrangesbetween2n2tiand2n1ti.
Then,usingthenormalizedranks,theRankChangeRate(Racer)isgivenbyracer(p,ti)=1nrank(p,ti+1)nrank(p,ti).
3.
1MarkovModelLearningMarkovModels(MMs)[1]havebeenwidelyusedforstudyingandunderstandingstochasticprocessesandbehaveverywellonmodelingandpredictingvaluesinvariousapplications.
Theirfundamentalassumptionisthatthefuturevaluedependsonanumberofmpreviousvalues,wheremistheorderoftheMM.
AFrameworkforWebPageRankPrediction243TheyaredenedbasedonasetofstatesS={s1,s2,sn}andamatrixToftransitionprobabilitiestieachofwhichrepresentstheprobabilitythatastatesioccursafterasequenceofstates.
OurgoalistorepresenttheWebpagesrankingtrendsacrossdierentwebgraphsnapshots.
WeusetheracervaluestodescribetherankchangeofaWebpagebetweentwosnapshotsandweutilizeracersequencestolearnMMs.
Ob-viously,stablerankingacrosstimeisrepresentedbyazeroracervalue,whileallothertrendsbyrealnumbersgeneratingahugespaceofdiscretevalues.
Asexpected(intuitivelymostpagesareexpectedtoremainstableforsometimeirrespectivetotheirrankatthetime),thezerovaluehasanunreasonablyhighfrequencycomparedtoallothervalueswhichmeansthatallstatesbesidesthezerooneshouldbeformedbyinherentrangesofvaluesinsteadofasingledis-crete.
Inordertoensureequalprobabilityfortransitionbetweenanypairofstates,weguaranteedequiprobablestatesbyformingrangeswithequalcumu-lativefrequencies(showingracervaluewithintherange)witheachother.
InordertocalculatethestatenumberforourMMs,wecomputedtherelativecumulativefrequencyofthezeroracerstateRFRacer=0andusedthistondtheoptimumnumberofstatesns=lRFRacer=0.
Next,weformednsequiprobablepartitionsandusedtheranges'meanaveragevaluesasstatestotrainourmodel.
Weshouldnotethatwithinthesignicantlyhighfrequencyofthezeroracervalues,arealsoconsideredpagesinitiallyobtainedwithinthetop-klistandthenfell(andremained)out.
WeremoveanybiasfromRFRacer=0,excludinganyvaluesnotcorrespondingtostablerankandobtainingRFRacer=0≈0.
1whichinturnsuggested10equiprobablestates.
PredictionswithRacer.
BasedonthesetofstatesmentionedaboveandformedtorepresentWebpagetrends,weareabletotrainMMsandpredictthetrendofaWebpageinthefutureaccordingtopasttrends.
Byassumingm+1temporallysuccessivecrawls,resultinginrespectivesnapshots,asequenceofmstates(representativeofracervalues)areconstructedforeachWebpage.
Theseareusedtoconstructanm-orderMM.
Notethatthememorymisaninherentfeatureofthemodel.
Aftercomputingtransitionprobabilitiesforeverypath,usingthegeneratedstates,thefuturestatescanbepredictedbyusingthechainrule[1].
Thus,foranm-orderMarkovModel,thepathprobabilityofastatesequenceisP(s1→.
.
.
→sm)=P(s1)·mi=2P(si|sim,.
.
,si1),whereeachsi(i∈{1,2,n})foranytimeintervalmayvaryoverallthepossiblestates(rangesofracervalues).
Then,predictingthefuturetrendofapageisperformedbycomputingthemostlikelynextstategiventhesofarstatepath.
Inspecic,assumingmtimeintervals,thenextmostprobablestateXiscomputedas:X=argmaxXP(s1→.
.
.
→sm1→X).
Usingthat,wepredictfuturestatesforeachpage.
AseachstateisthemeanofaRacerrange,wecomputebackthefuturenrank.
Therefore,weareabletopredictfuturetop-krankingbysortingtheracerofWebpagesinascendingorder.
244E.
Voudigarietal.
3.
2RegressionModelsAssumeasetofNWebpagesandobservationsofnrankvaluesatmtimesteps.
Letxi=(xi1,xim)bethenrankvaluesforWebpageiatthetimepointst=(t1,tm),wherethe(N*m)designmatrixXstoresalltheobservednrankvaluessothateachrowcorrespondstoaWebpageandeachcolumntoatimepoint.
GiventhesevalueswewishtopredictthenrankvaluexiforeachWebpageatsometimetwhichtypicallycorrespondstoafuturetimepoint(t>ti,i=1,m).
Next,wediscussasimplepredictionmethodbasedonlinearregressionwheretheinputvariablecorrespondstotimeandtheoutputtothenrankvalue.
ForacertainWebpageiweassumealinearregressionmodelhavingtheformxik=aitk+bi+k,k=1,m(kdenotesazero-meanGaussiannoise).
Notethattheparameters(ai,bi)areWebpage-specicandtheirvaluesarecalculatedusingleastsquares.
Inotherwords,theaboveformulationdenesaseparatelinearregressionmodelforeachWebpagethustheytreatindependently.
ThiscanberestrictivesincepossibleexistingsimilaritiesanddependenciesbetweendierentWebpagesarenottakenintoaccount.
3.
3ClusteringUsingEMWeassumethatthenrankvaluesofeachWebpagefallintooneofJdierentclusters.
Clusteringcanbeviewedastrainingamixtureprobabilitymodel.
TogeneratethenrankvaluesxiforWebpagei,werstselecttheclustertypejwithprobabilityπj(whereπj≥0andJj=1πj=1)andthenproducethevaluesxiaccordingtoalinearregressionmodelxik=aitk+bi+k,k=1,m,wherekisindependentGaussiannoisewithzeromeanandvarianceσ2j.
ThisimpliesthatgiventheclustertypejthenrankvaluesaredrawnfromtheproductofGaussiansp(xi|j)=mk=1N(xik|ajtk+bj,σ2j).
TheclustertypethatgeneratedthenrankvaluesofacertainWebpageisanunobservedvariableandthusaftermarginalizationweobtainamixtureuncon-ditionaldensityp(xi)=Jj=1πjp(xi|j)fortheobservationvectorxi.
Totrainthemixturemodelandestimatetheparametersθ=(πj,σ2j,aj,bj)j=1,.
.
.
,J,wecanmaximizetheloglikelihoodofthedataL(θ)=logNi=1p(xi)byusingtheEMalgorithm[2].
Givenaninitialstatefortheparameters,EMoptimizesoverθbyiteratingbetweenEandMsteps:TheEstepcomputestheposteriorprobabilitiesRij=πjp(xi|j)Jρ=1πρp(xi|ρ),forj=1,Jandi=1,N,(Nisthetotalnumberofwebpages).
TheMstepupdatestheparametersaccordingto:πj=1NNi=1Rij,σ2j=Ni=1Rijmk=1(xikajtkbj)2πjandajbj=1NjtTttT1tT1m1Ni=1RijxTitNi=1RijxTi1,j=1,J,tisthevectorofalltimepointsand1isthem-dimensionalvectorofones.
Oncewehaveobtainedsuitablevaluesfortheparameters,wecanusethemixturemodelforprediction.
Particularly,topredictthenrankvaluexiofWebAFrameworkforWebPageRankPrediction245pageiattgiventheobservedvaluesxi=(xi1,xim)atprevioustimes,weexpresstheposteriordistributionp(xi|xi)usingtheBayesrulep(xi|xi)=Jj=1RijNxiajt+bj,s2j,whereRijiscomputedaccordingtoE-step.
Toobtainaspecicpredictivevalueforxi,wecanusethemeanvalueoftheaboveposteriordistributionxi=Jj=1Rij(ajt+bj)orthemedianestimatexi=ajt+bj,wherej=argmaxρRiρthatconsidersahardassignmentoftheWebpageintooneoftheJclusters.
4Top-kListSimilarityMeasuresInordertoevaluatethequalityofpredictions,weneedtomeasurethesimilar-ityofthepredictedtotheactualtop-kranking.
Forthispurpose,weexaminemeasurescommonlyusedforcomparingrankings,pointouttheshortcomingsofexistinganddeneanewsimilaritymeasurefortop-krankings,denotedasRSim.
4.
1ExistingSimilarityMeasuresTherstone,denotedasOSim(A,B)[4]indicatesthedegreeofoverlapbetweenthetop-kelementsoftwosetsAandB(eachoneofsizek):OSim(A,B)=|A∩B|k.
Thesecond,KSim(A,B)[4],isbasedonKendall'sdistancemeasure[3]andindicatesthedegreethattherelativeorderingsoftwotop-klistsareinagreement:KSim(A,B)=|(u,v):A,B,agreeinorder||A∪B|(|A∪B|1),whereAisanextensionofAresultingfromappendingatitstailtheelementsx∈A∪(BA)andBisdenedanalogously.
AnotherinterestingmeasureintroducedinInformationRetrievalforevaluat-ingtheaccumulatedrelevanceofatop-kdocumentlisttoaqueryisthe(Nor-malized)DiscountedCumulativeGain(N(DCG))[5].
Thismeasureassumesatop-klist,whereeachdocumentisfeaturedwitharelevancescoreaccumulatedbyscanningthelistfromtoptobottom.
AlthoughDCGcouldbeusedfortheevaluationofourpredictions,sinceittakesintoaccounttherelevanceofatop-klisttoanother,itexhibitssomebasicfeaturesthatpreventedusfromusingitinourexperiments.
Itpenalizeserrorsbymaintaininganincreasingvalueofcumulativerelevance.
Whilethisisbasedontherankofeachdocument,thesizekofthelistisnottakenintoaccount–thusthelengthofthelistisirrelevantinDCG.
Errorsintopranksofatop-klistshouldbeconsideredmoreimpor-tantthanerrorsinlow-rankedpositions.
ThisimportantfeaturelacksfrombothDCGandNDCGmeasures.
Moreover,DCGvalueforeachrankinthetop-klistiscomputedtakingintoaccountthepreviousvaluesinthelist.
Next,weintroduceSpearman'sRankCorrelationCoecient,whichwasusedduringtheexperimentalevaluation,consistsanon-parametric(distribution-free)rankstatisticproposedbySpearman(1904)measuringthestrengthofassoci-ationsbetweentwovariablesandisoftensymbolizedbyρ.
Itestimateshowwelltherelationshipbetweentwovariablescanbedescribedusingamonotonic246E.
Voudigarietal.
function.
Iftherearenorepeateddatavaluesofthesevariables(likeinrankingproblem),aperfectSpearmancorrelationof+1or-1existsifeachvariableisaperfectmonotonefunctionoftheother.
ItisoftenconfusedwiththePearsoncorrelationcoecientbetweenrankedvariables.
However,theprocedureusedtocalculateρismuchsimpler.
IfXandYaretwovariableswithcorrespondingranksxiandyi,di=xiyi,i=1,n,betweentheranksofeachobservationonthetwovariables,thenitisgivenby:ρ=16·ni=1d2in(n21).
4.
2RSimQualityMeasureTheobservedsimilaritymeasuresdonotcoversucientlythenegrainedre-quirementsarising,comparingtop-krankingsintheWebsearchcontext.
Soweneedanewsimilaritymetrictakingintoconsideration:a)TheabsolutedierencebetweenthepredictedandactualpositionforeachWebpageaslargedierenceindicatesalessaccuratepredictionandb)TheactualrankingpositionofaWebpage,becausefailingtopredictahighlyrankedWebpageismoreimportantthanalow-ranked.
Basedontheseobservations,weintroduceanewmeasure,namedRSim.
Everyinaccuratepredictionmadeincursacertainpenaltydependingonthetwonotedfactors.
Ifpredictionis100%accurate(samepredictedandactualrank),thepenaltyisequaltozero.
LetBibethepredictedrankpositionforpageiandAitheactual.
TheCumulativePenaltyScore(CPS)iscomputedasCPS(A,B)=ki=1|AiBi|·(k+1Ai).
TheproposedpenaltyscoreCPSrepresentstheoverallerror(dierence)be-tweentheinvolvedtop-klistsAandBandisproportionalto|AiBi|.
Theterm(k+1Ai)increaseswhenAibecomessmallersoerrorsinhighlyrankedWebpagesarepenalizedmore.
Inthebestcase,rankpredictionsforallWebpagesarecompletelyaccurate(CPS=0),sinceAi=Biforanyvalueofi.
Intheworstcase,therankpredictionsforallWebpagesnotonlyareinaccurate,butalsobearthegreatestCPSpenaltypossible.
Insuchascenario,alltheWebpagespredictedtobeinthetop-klist,actuallyholdthepositionk+1(orworse).
Assumingthatwewanttocomparetworankingsoflengthk,thenthemax-imumCPSforevenandoddvaluesofkisequalto2k3+3k2+k6.
TheproofforCPSmaxnalformisomittedduetospacelimitations.
Basedontheabovewedeneanewsimilaritymeasure,RSim,tocomparethesimilaritybetweentop-kranklistsasfollows:RSim(Ai,Bi)=1CPS(Ai,Bi)CPSmax(Ai,Bi).
(1)Inthebest-casepredictionscenario,RSimisequaltoone,whileintheworst-caseRSimisequaltozero.
SothecloserthevalueofRSimistoone,thebetterandmoreaccuratetherankpredictionsare.
AFrameworkforWebPageRankPrediction2470501001502002503000.
50.
550.
60.
650.
70.
750.
80.
850.
90.
951topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
550.
60.
650.
70.
750.
80.
85topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
50.
550.
60.
650.
70.
75topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
650.
70.
750.
80.
850.
90.
951topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
1.
PredictionaccuracyvsTop-klistlength-Yahoodataset5ExperimentalEvaluationInordertoevaluatetheeectivenessofourmethodsweperformedexperimentsontwodierentrealworlddatasets.
Theseconsistcollectionsoftop-krankedlistsfor22queriesoveraperiodof11daysasresultedfromtheYahoo!
1andtheGooglesearchengines,producedinthesameway.
Inourexperiments,weevaluatethepredictionqualityintermsofsimilaritiesbetweenthepredictedandtheactualtop-krankedlistsusingOSim,KSim,NDCG,SpearmancorrelationandthenovelsimilaritymeasureRSim.
5.
1DatasetsandQuerySelectionForeachdataset(YahooandGoogle)awealthofsnapshotswereavailable,en-suringwehaveenoughevolutiontotestourapproach.
Aconcisedescriptionofeachdatasetandquery-basedapproachfollow.
TheYahooandGoogledatasetsconsistof11consecutivedailytop-1000rankedlistscomputedusingtheYa-hooSearchWebServices2andtheGoogleSearchenginerespectively.
Thesesetswerepickedfrompopular:a)queriesappearedinGoogleTrends3andb)currentqueries(i.
e.
euro2008orOlympicgames2008).
5.
2ExperimentalMethodologyWecomparedallpredictionsamongthevariousapproachesandwenextdescribethestepsassumedforbothdatasets.
Atrst,wecomputedPageRankscoresfor1http://search.
yahoo.
com2http://developer.
yahoo.
com/search/3http://www.
google.
com/trends248E.
Voudigarietal.
0501001502002503000.
20.
30.
40.
50.
60.
70.
80.
91topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
50.
550.
60.
650.
70.
750.
8topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
30.
40.
50.
60.
70.
8topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
40.
50.
60.
70.
80.
91topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
650.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
2.
PredictionaccuracyvsTop-klistlength-Googledataseteachsnapshotofourdatasetsandobtainedthetop-krankingsusingthescoringfunctionmentioned.
Havingcomputedthescores,wecalculatedthenrank(racervaluesforMMs)foreachpairofconsecutivegraphsnapshotsandstoredtheminamatrixnrank(racer)*time.
Then,assuminganm-pathofconsecutivesnapshots,wepredictthem+1state.
Foreachpagep,wepredictarankingcomparingittoactualbya10-foldcrossvalidationprocess(training90%ofdatasetandtestingontheremaining10%).
InthecaseoftheEMapproach,wetestedthequalityofclusteringresultsforclusterscardinalitybetween2and10foreachqueryandchosetheonethatmaximizedtheoverallqualityofclustering.
Thiswasdenedasamonotonecombinationofwithin-clusterwc(sumofsquareddistancesfromeachpointtothecenterofclusteritbelongsto)andbetween-clustervariationbc(distancebetweenclustercenters).
Asscorefunctionofclustering,weconsideredtheratiobc/wc.
5.
3ExperimentalResultsRegardingtheGoogleandYahoo!
datasetresultscomingoutoftheexperimen-talevaluation,onecanseethattheMMsprevailwithveryaccurateresults.
Regressionbasedtechniques(LinReg)reachandoutweighMMsperformanceasthelengthoftop-klistincreasesprovingtheirrobustness.
InbothdatasetsexperimentsprovethesuperiorityofEMapproach(BayesMod)whoseperformanceisverysatisfyingforallsimilaritymeasures.
TheMMscomenextintheevaluationranking,whereassmallertheorderisthebetteristhepredictionaccuracy,thoughonewouldthinkofthecontrary.
AFrameworkforWebPageRankPrediction249Obviously(gures)theproposedframeworkoersincrediblyhighaccuracypredictionsandisveryencouraging,asitrangessystematicallybetween70%and100%providingatoolforeectivepredictions.
6ConclusionsWehavedescribedpredictorlearningalgorithmsforWebpagerankpredictionbasedonaframeworkoflearningtechniques(MMs,LinReg,BayesMod)andexperimentalstudyshowedthattheycanachieveoverallverygoodpredictionperformance.
Furtherworkwillfocusinthefollowingissues:a)Multi-featureprediction:weintendtodealwiththeinternalmechanismthatproducestherankingofpages(notonlyrankvalues)basedonmultiplefeatures,b)Combi-nationofsuchmethodswithdimensionalityreductiontechniques.
References1.
Kemeny,J.
G.
,Snell,J.
L.
:FiniteMarkovChains.
Prinston(1963)2.
Dempster,A.
P.
,Laird,N.
M.
,Rubin,D.
B.
:MaximumLikelihoodfromIncompleteDataviatheEMAlgorithm.
RoyalStatisticalSociety39,1–38(1977)3.
Kendall,M.
G.
,Gibbons,J.
D.
:RankCorrelationMethods.
CharlesGrin,UK(1990)4.
Haveliwala,T.
H.
:Topic-SensitivePageRank.
In:Proc.
WWW(2002)5.
Jarvelin,K.
,Kekalainen,J.
:Cumulatedgain-basedevaluationofIRtechniques.
TOIS20,422–446(2002)6.
Chien,S.
,Dwork,C.
,Kumar,R.
,Simon,D.
R.
,Sivakumar,D.
:LinkEvolu-tion:AnalysisandAlgorithms.
InternetMathematics1,277–304(2003)7.
Chen,Y.
-Y.
,Gan,Q.
,Suel,T.
:LocalMethodsforEstimatingPageRankValues.
In:Proc.
ofCIKM(2004)8.
Langville,A.
N.
,Meyer,C.
D.
:UpdatingPageRankwithiterativeaggregation.
In:Proc.
ofthe13thInternationalWorldWideWebConferenceonAlternateTrackPapersandPosters,pp.
392–393(2004)9.
Kan,M.
-Y.
,Thi,H.
O.
:FastwebpageclassicationusingURLfeatures.
In:Confer-enceonInformationandKnowledgeManagement,pp.
325–326.
ACM,NewYork(2005)10.
Yang,H.
,King,I.
,Lu,M.
R.
:PredictiveRanking:ANovelPageRankingApproachbyEstimatingtheWebStructure.
In:Proc.
ofthe14thInternationalWWWCon-ference,pp.
1825–1832(2005)11.
Broder,A.
Z.
,Lempel,R.
,Maghoul,F.
,Pedersen,J.
:EcientPageRankapproxi-mationviagraphaggregation.
Inf.
Retrieval9,123–138(2006)12.
Vlachou,A.
,Berberich,K.
,Vazirgiannis,M.
:Representingandquantifyingrank-changefortheWebgraph.
In:Aiello,W.
,Broder,A.
,Janssen,J.
,Milios,E.
E.
(eds.
)WAW2006.
LNCS,vol.
4936,pp.
157–165.
Springer,Heidelberg(2008)13.
Vazirgiannis,M.
,Drosos,D.
,Senellart,P.
,Vlachou,A.
:WebPageRankPredictionwithMarkovModels.
WWWposter(2008)14.
Sayyadi,H.
,Getoor,L.
:FutureRank:RankingScienticArticlesbyPredictingtheirFuturePageRank.
In:SIAMIntern.
Confer.
onDataMining,pp.
533–544(2009)15.
Zacharouli,P.
,Titsias,M.
,Vazirgiannis,M.
:WebpagerankpredictionwithPCAandEMclustering.
In:Avrachenkov,K.
,Donato,D.
,Litvak,N.
(eds.
)WAW2009.
LNCS,vol.
5427,pp.
104–115.
Springer,Heidelberg(2009)

青云互联:香港安畅CN2弹性云限时首月五折,15元/月起,可选Windows/可自定义配置

青云互联怎么样?青云互联是一家成立于2020年的主机服务商,致力于为用户提供高性价比稳定快速的主机托管服务,目前提供有美国免费主机、香港主机、韩国服务器、香港服务器、美国云服务器,香港安畅cn2弹性云限时首月五折,15元/月起;可选Windows/可自定义配置,让您的网站高速、稳定运行。点击进入:青云互联官方网站地址青云互联优惠码:八折优惠码:ltY8sHMh (续费同价)青云互联香港云服务器活动...

限时新网有提供5+个免费域名

有在六月份的时候也有分享过新网域名注册商发布的域名促销活动(这里)。这不在九月份发布秋季域名促销活动,有提供年付16元的.COM域名,同时还有5个+的特殊后缀的域名是免费的。对于新网服务商是曾经非常老牌的域名注册商,早年也是有在他们家注册域名的。我们可以看到,如果有针对新用户的可以领到16元的.COM域名。包括还有首年免费的.XYZ、.SHOP、Space等等后缀的域名。除了.COM域名之外的其他...

PQS彼得巧 年中低至38折提供台湾彰化HiNet线路VPS主机 200M带宽

在六月初的时候有介绍过一次来自中国台湾的PQS彼得巧商家(在这里)。商家的特点是有提供台湾彰化HiNet线路VPS主机,起步带宽200M,从带宽速率看是不错的,不过价格也比较贵原价需要300多一个月,是不是很贵?当然懂的人可能会有需要。这次年中促销期间,商家也有提供一定的优惠。比如月付七折,年付达到38折,不过年付价格确实总价格比较高的。第一、商家优惠活动年付三八折优惠:PQS2021-618-C...

pagerank为你推荐
点击mediaphp计划任务windows系统下如何设置PHP定时任务360邮箱邮箱地址指的是什么?结点cuteftpduplicate500pintang俏品堂是干什么的?很多论坛都有他们的踪迹。即时通请问有没有人知道即时通是什么?怎样先可以开??三五互联股票三五互联是什么股票123456hdAPP上面带有HD是啥意思开源网店开源网店系统 独立网店系统 淘宝 有什么区别?
域名交易网 主机测评 lamp安装 云网数据 新加坡服务器 流媒体服务器 香港机房托管 监控宝 unsplash 新站长网 商务主机 股票老左 卡巴斯基是免费的吗 中国电信宽带测速器 厦门电信 web服务器是什么 空间登入 东莞服务器托管 秒杀品 中国联通宽带测试 更多