estimatepagerank
pagerank 时间:2021-04-19 阅读:(
)
AFrameworkforWebPageRankPredictionElliVoudigari1,JohnPavlopoulos1,andMichalisVazirgiannis1,2,1DepartmentofInformatics,AthensUniversityofEconomicsandBusiness,Greeceelliv@aueb.
gr,annis.
pavlo@gmail.
com,mvazirg@aueb.
gr2InstitutTelecom,EcoledeTelecomParisTech,DepartementInformatiqueetReseaux,Paris,FranceAbstract.
WeproposeaframeworkforpredictingtherankingpositionofaWebpagebasedonpreviousrankings.
Assumingasetofsuccessivetop-krankings,welearnpredictorsbasedondierentmethodologies.
Thepredictionqualityisquantiedasthesimilaritybetweenthepre-dictedandtheactualrankings.
Extensiveexperimentswereperformedonrealworldlargescaledatasetsforglobalandquery-basedtop-krank-ings,usingavarietyofexistingsimilaritymeasuresforcomparingtop-krankedlists,includinganovelandmorestrictmeasureintroducedinthispaper.
Thepredictionsarehighlyaccurateandrobustforallexperimen-talsetupsandsimilaritymeasures.
Keywords:RankPrediction,DataMining,WebMining,ArticialIntelligence.
1IntroductionTheWorldWideWebisahighlydynamicstructurecontinuouslychanging,asWebpagesandhyperlinksarecreated,deletedormodied.
Rankingoftheresultsisacornerstoneprocessenablinguserstoeectivelyretrieverelevantandimportantinformation.
GiventhehugesizeoftheWebgraph,computingrankingsofWebpagesrequiresawesomeresources-computationsonmatriceswhosesizeisoftheorderofWebsize(109nodes).
Ontheotherhandtheowneroftheindividualwebpagecanseeitsrankingonlyinthecaseofthewebgraphbysubmittingqueriestotheownerofthegraph(i.
e.
asearchengine).
Givenaseriesoftime-orderedrankingsofthenodesofagraphwhereeachbearsitsrankingforeachtimestamp,wedeveloplearn-ingmechanismsthatenablepredictionsofthenodesrankinginfuturetimes.
Thepredictionsrequireonlylocalfeatureknowledgewhilenoglobaldataarenecessary.
Specically,anindividualnodecanpredictitsrankingonlyknowingthevaluesofitsownranking.
Insuchacasethenodecouldplanactionsforoptimizingitsrankinginfuture.
InthispaperwepresentanintegratedeortforaframeworktowardsWebpagerankpredictionconsideringdierentlearningalgorithms.
WeconsiderPartiallysupportedbytheDIGITEOChairgrantLEVETONEinFranceandtheResearchCentreoftheAthensUniversityofEconomicsandBusiness,Greece.
L.
Iliadisetal.
(Eds.
):EANN/AIAI2011,PartII,IFIPAICT364,pp.
240–249,2011.
cIFIPInternationalFederationforInformationProcessing2011AFrameworkforWebPageRankPrediction241i)variableorderMarkovModels(MMs),ii)regressionmodelsandiii)anEMbasedapproachwithBayesianlearning.
ThenalpurposeistorepresentthetrendsandpredictfuturerankingsofWebpages.
Allthemodelsarelearnedfromtimeseriesdatasetswhereeachtrainingsetcorrespondstopre-processedrankvaluesofWebpagesobservedovertime.
Forallmethods,predictionqualityisevaluatedbasedonthesimilaritybe-tweenthepredictedandactualrankedlists,whilewefocusonthetop-kelementsoftheWebpagerankedlists,astoppagesareusuallymoreimportantinWebsearch.
Preliminaryworkonthistopicwaspresentedin[13]and[15].
Thecurrentworksignicantlydiersandadvancespreviousworksoftheauthorsinthefollowingways:a)Renedandcarefulre-engineeringoftheMMs'parameterlearningpro-cedurebyusingcrossvalidation,b)Integrationandelaborationoftheresultsof[15]inordertovalidatetheperformancecomparisonbetweenregression(boostedwithclustering)andMMpredictors,inlargescalerealworlddatasets,c)Namelyweadopt:LinearRegression,random1st/2nd/3rdorderMarkovmodelsprovingtherobustnessofthemodel,d)Anewtop-klistsimilaritymeasure(RSim)isintroducedandusedfortheevaluationofpredictorsandmoreimportantly,e)Additional,extensiveandrobustexperimentstookplaceusingquerybasedontop-klistsfromYahoo!
andGoogleSearchengine.
2RelatedWorkTherankingofqueryresultsinaWebsearch-engineisanimportantproblemandhasattractedsignicantattentionintheresearchcommunity.
TheproblemofpredictingPageRankispartlyaddressedin[9].
ItfocusesonWebpageclassicationbasedonURLfeatures.
Basedonthis,theauthorsperformexperimentstryingtomakePageRankpredictionsusingtheextractedfeatures.
Forthispurpose,theyuselinearregression;however,thecomplexityofthisapproachgrowslinearlyinproportiontothenumberoffeatures.
TheexperimentalresultsshowthatPageRankpredictionbasedonURLfeaturesdoesnotperformverywell,probablybecauseeventhoughtheycorrelateverywellwiththesubjectofpages,theydonotinuencepage'sauthorityinthesameway.
Arecentapproachtowardspagerankingpredictionispresentedin[13]gener-atingMarkovModelsfromhistoricalrankedlistsandusingthemforpredictions.
AnapproachthataimsatapproximatingPageRankvalueswithouttheneedofperformingthecomputationsovertheentiregraphis[6].
TheauthorsproposeanalgorithmtoincrementallycomputeapproximationstoPageRank,basedonevolutionofthelinkstructureofWebgraph(asetoflinkchanges).
Theirexper-imentsdemonstratethatthealgorithmperformswellbothinspeedandqualityandisrobusttovarioustypesoflinkmodications.
However,thisrequirescon-tinuousmonitoringoftheWebgraphinordertotrackanylinkmodications.
TherehasalsobeenworkinadaptivecomputationofPageRank([8],[11])orevenestimationofPageRankscores[7].
242E.
Voudigarietal.
In[10]amethodcalledpredictiverankingisproposed,aimingatestimatingtheWebstructurebasedontheintuitionthatthecrawlingandconsequentlytherankingresultsareinaccurate(duetoinadequatedataanddanglingpages).
Inthiswork,theauthorsdonotmakefuturerankpredictions.
Instead,theyestimatethemissingdatainordertoachievemoreaccuraterankings.
In[14]theauthorssuggestanewmeasureforrankingscienticarticles,basedonfuturecitations.
Basedonpublicationtimeandauthor'sname,theypredictfuturecitationsandsuggestabettermodel.
3PredictionMethodsInthissection,wepresentaframeworkthataimstopredictthefuturerankpositionofWebpagesbasedontheirtrendsshownthepast.
OurgoalistondpatternsinrankingevolutionofWebpages.
GivenasetofsuccessiveWebgraphsnapshots,foreachpagewegenerateasequenceofrankchangeratesthatindicatesthetrendsofthispageamongtheprevioussnapshots.
WeusethesesequencesofprevioussnapshotsoftheWebgraphasatrainingsetandtrytopredictthetrendsofaWebpagebasedonprevious.
Theremainingofthissectionisorganizedasfollows:InSect.
3.
1wetrainMMsofvariousordersandtrytopredictthetrendsofaWebpage.
Section3.
2discussesanapproachthatusesaseparatelinearregressionmodelforeachwebpage,whileSect.
3.
3combineslinearregressionwithclusteringbasedonanEMprobabilisticframework.
RankChangeRate.
InordertopredictfuturerankingsofWebpages,weneedtodeneameasureintroducedin[12]suitableformeasuringpagerankdynamics.
Webrieypresentitsdesign.
LetGtibethesnapshotoftheWebgraphcreatedbyacrawlandnti=|Gti|thenumberofWebpagesattimeti.
Then,rank(p,ti)isafunctionprovidingtherankingofaWebpagep∈Gti,accordingtosomecriterion(i.
e.
PageRankvalues).
Intuitively,anappropriatemeasureforWebpagestrendsistherankchangeratebetweentwosnapshots,butasthesizeoftheWebgraphconstantlyincreasesthetrendmeasureshouldbecomparableacrossdierentgraphsizes.
Thus,weutilizethenormalizedrank(nrank)ofaWebpage,asitwasdenedin[12].
Forapageprankedatpositionrank(p,ti):nrank(p,ti)=2·rank(p,ti)n2ti,whichrangesbetween2n2tiand2n1ti.
Then,usingthenormalizedranks,theRankChangeRate(Racer)isgivenbyracer(p,ti)=1nrank(p,ti+1)nrank(p,ti).
3.
1MarkovModelLearningMarkovModels(MMs)[1]havebeenwidelyusedforstudyingandunderstandingstochasticprocessesandbehaveverywellonmodelingandpredictingvaluesinvariousapplications.
Theirfundamentalassumptionisthatthefuturevaluedependsonanumberofmpreviousvalues,wheremistheorderoftheMM.
AFrameworkforWebPageRankPrediction243TheyaredenedbasedonasetofstatesS={s1,s2,sn}andamatrixToftransitionprobabilitiestieachofwhichrepresentstheprobabilitythatastatesioccursafterasequenceofstates.
OurgoalistorepresenttheWebpagesrankingtrendsacrossdierentwebgraphsnapshots.
WeusetheracervaluestodescribetherankchangeofaWebpagebetweentwosnapshotsandweutilizeracersequencestolearnMMs.
Ob-viously,stablerankingacrosstimeisrepresentedbyazeroracervalue,whileallothertrendsbyrealnumbersgeneratingahugespaceofdiscretevalues.
Asexpected(intuitivelymostpagesareexpectedtoremainstableforsometimeirrespectivetotheirrankatthetime),thezerovaluehasanunreasonablyhighfrequencycomparedtoallothervalueswhichmeansthatallstatesbesidesthezerooneshouldbeformedbyinherentrangesofvaluesinsteadofasingledis-crete.
Inordertoensureequalprobabilityfortransitionbetweenanypairofstates,weguaranteedequiprobablestatesbyformingrangeswithequalcumu-lativefrequencies(showingracervaluewithintherange)witheachother.
InordertocalculatethestatenumberforourMMs,wecomputedtherelativecumulativefrequencyofthezeroracerstateRFRacer=0andusedthistondtheoptimumnumberofstatesns=lRFRacer=0.
Next,weformednsequiprobablepartitionsandusedtheranges'meanaveragevaluesasstatestotrainourmodel.
Weshouldnotethatwithinthesignicantlyhighfrequencyofthezeroracervalues,arealsoconsideredpagesinitiallyobtainedwithinthetop-klistandthenfell(andremained)out.
WeremoveanybiasfromRFRacer=0,excludinganyvaluesnotcorrespondingtostablerankandobtainingRFRacer=0≈0.
1whichinturnsuggested10equiprobablestates.
PredictionswithRacer.
BasedonthesetofstatesmentionedaboveandformedtorepresentWebpagetrends,weareabletotrainMMsandpredictthetrendofaWebpageinthefutureaccordingtopasttrends.
Byassumingm+1temporallysuccessivecrawls,resultinginrespectivesnapshots,asequenceofmstates(representativeofracervalues)areconstructedforeachWebpage.
Theseareusedtoconstructanm-orderMM.
Notethatthememorymisaninherentfeatureofthemodel.
Aftercomputingtransitionprobabilitiesforeverypath,usingthegeneratedstates,thefuturestatescanbepredictedbyusingthechainrule[1].
Thus,foranm-orderMarkovModel,thepathprobabilityofastatesequenceisP(s1→.
.
.
→sm)=P(s1)·mi=2P(si|sim,.
.
,si1),whereeachsi(i∈{1,2,n})foranytimeintervalmayvaryoverallthepossiblestates(rangesofracervalues).
Then,predictingthefuturetrendofapageisperformedbycomputingthemostlikelynextstategiventhesofarstatepath.
Inspecic,assumingmtimeintervals,thenextmostprobablestateXiscomputedas:X=argmaxXP(s1→.
.
.
→sm1→X).
Usingthat,wepredictfuturestatesforeachpage.
AseachstateisthemeanofaRacerrange,wecomputebackthefuturenrank.
Therefore,weareabletopredictfuturetop-krankingbysortingtheracerofWebpagesinascendingorder.
244E.
Voudigarietal.
3.
2RegressionModelsAssumeasetofNWebpagesandobservationsofnrankvaluesatmtimesteps.
Letxi=(xi1,xim)bethenrankvaluesforWebpageiatthetimepointst=(t1,tm),wherethe(N*m)designmatrixXstoresalltheobservednrankvaluessothateachrowcorrespondstoaWebpageandeachcolumntoatimepoint.
GiventhesevalueswewishtopredictthenrankvaluexiforeachWebpageatsometimetwhichtypicallycorrespondstoafuturetimepoint(t>ti,i=1,m).
Next,wediscussasimplepredictionmethodbasedonlinearregressionwheretheinputvariablecorrespondstotimeandtheoutputtothenrankvalue.
ForacertainWebpageiweassumealinearregressionmodelhavingtheformxik=aitk+bi+k,k=1,m(kdenotesazero-meanGaussiannoise).
Notethattheparameters(ai,bi)areWebpage-specicandtheirvaluesarecalculatedusingleastsquares.
Inotherwords,theaboveformulationdenesaseparatelinearregressionmodelforeachWebpagethustheytreatindependently.
ThiscanberestrictivesincepossibleexistingsimilaritiesanddependenciesbetweendierentWebpagesarenottakenintoaccount.
3.
3ClusteringUsingEMWeassumethatthenrankvaluesofeachWebpagefallintooneofJdierentclusters.
Clusteringcanbeviewedastrainingamixtureprobabilitymodel.
TogeneratethenrankvaluesxiforWebpagei,werstselecttheclustertypejwithprobabilityπj(whereπj≥0andJj=1πj=1)andthenproducethevaluesxiaccordingtoalinearregressionmodelxik=aitk+bi+k,k=1,m,wherekisindependentGaussiannoisewithzeromeanandvarianceσ2j.
ThisimpliesthatgiventheclustertypejthenrankvaluesaredrawnfromtheproductofGaussiansp(xi|j)=mk=1N(xik|ajtk+bj,σ2j).
TheclustertypethatgeneratedthenrankvaluesofacertainWebpageisanunobservedvariableandthusaftermarginalizationweobtainamixtureuncon-ditionaldensityp(xi)=Jj=1πjp(xi|j)fortheobservationvectorxi.
Totrainthemixturemodelandestimatetheparametersθ=(πj,σ2j,aj,bj)j=1,.
.
.
,J,wecanmaximizetheloglikelihoodofthedataL(θ)=logNi=1p(xi)byusingtheEMalgorithm[2].
Givenaninitialstatefortheparameters,EMoptimizesoverθbyiteratingbetweenEandMsteps:TheEstepcomputestheposteriorprobabilitiesRij=πjp(xi|j)Jρ=1πρp(xi|ρ),forj=1,Jandi=1,N,(Nisthetotalnumberofwebpages).
TheMstepupdatestheparametersaccordingto:πj=1NNi=1Rij,σ2j=Ni=1Rijmk=1(xikajtkbj)2πjandajbj=1NjtTttT1tT1m1Ni=1RijxTitNi=1RijxTi1,j=1,J,tisthevectorofalltimepointsand1isthem-dimensionalvectorofones.
Oncewehaveobtainedsuitablevaluesfortheparameters,wecanusethemixturemodelforprediction.
Particularly,topredictthenrankvaluexiofWebAFrameworkforWebPageRankPrediction245pageiattgiventheobservedvaluesxi=(xi1,xim)atprevioustimes,weexpresstheposteriordistributionp(xi|xi)usingtheBayesrulep(xi|xi)=Jj=1RijNxiajt+bj,s2j,whereRijiscomputedaccordingtoE-step.
Toobtainaspecicpredictivevalueforxi,wecanusethemeanvalueoftheaboveposteriordistributionxi=Jj=1Rij(ajt+bj)orthemedianestimatexi=ajt+bj,wherej=argmaxρRiρthatconsidersahardassignmentoftheWebpageintooneoftheJclusters.
4Top-kListSimilarityMeasuresInordertoevaluatethequalityofpredictions,weneedtomeasurethesimilar-ityofthepredictedtotheactualtop-kranking.
Forthispurpose,weexaminemeasurescommonlyusedforcomparingrankings,pointouttheshortcomingsofexistinganddeneanewsimilaritymeasurefortop-krankings,denotedasRSim.
4.
1ExistingSimilarityMeasuresTherstone,denotedasOSim(A,B)[4]indicatesthedegreeofoverlapbetweenthetop-kelementsoftwosetsAandB(eachoneofsizek):OSim(A,B)=|A∩B|k.
Thesecond,KSim(A,B)[4],isbasedonKendall'sdistancemeasure[3]andindicatesthedegreethattherelativeorderingsoftwotop-klistsareinagreement:KSim(A,B)=|(u,v):A,B,agreeinorder||A∪B|(|A∪B|1),whereAisanextensionofAresultingfromappendingatitstailtheelementsx∈A∪(BA)andBisdenedanalogously.
AnotherinterestingmeasureintroducedinInformationRetrievalforevaluat-ingtheaccumulatedrelevanceofatop-kdocumentlisttoaqueryisthe(Nor-malized)DiscountedCumulativeGain(N(DCG))[5].
Thismeasureassumesatop-klist,whereeachdocumentisfeaturedwitharelevancescoreaccumulatedbyscanningthelistfromtoptobottom.
AlthoughDCGcouldbeusedfortheevaluationofourpredictions,sinceittakesintoaccounttherelevanceofatop-klisttoanother,itexhibitssomebasicfeaturesthatpreventedusfromusingitinourexperiments.
Itpenalizeserrorsbymaintaininganincreasingvalueofcumulativerelevance.
Whilethisisbasedontherankofeachdocument,thesizekofthelistisnottakenintoaccount–thusthelengthofthelistisirrelevantinDCG.
Errorsintopranksofatop-klistshouldbeconsideredmoreimpor-tantthanerrorsinlow-rankedpositions.
ThisimportantfeaturelacksfrombothDCGandNDCGmeasures.
Moreover,DCGvalueforeachrankinthetop-klistiscomputedtakingintoaccountthepreviousvaluesinthelist.
Next,weintroduceSpearman'sRankCorrelationCoecient,whichwasusedduringtheexperimentalevaluation,consistsanon-parametric(distribution-free)rankstatisticproposedbySpearman(1904)measuringthestrengthofassoci-ationsbetweentwovariablesandisoftensymbolizedbyρ.
Itestimateshowwelltherelationshipbetweentwovariablescanbedescribedusingamonotonic246E.
Voudigarietal.
function.
Iftherearenorepeateddatavaluesofthesevariables(likeinrankingproblem),aperfectSpearmancorrelationof+1or-1existsifeachvariableisaperfectmonotonefunctionoftheother.
ItisoftenconfusedwiththePearsoncorrelationcoecientbetweenrankedvariables.
However,theprocedureusedtocalculateρismuchsimpler.
IfXandYaretwovariableswithcorrespondingranksxiandyi,di=xiyi,i=1,n,betweentheranksofeachobservationonthetwovariables,thenitisgivenby:ρ=16·ni=1d2in(n21).
4.
2RSimQualityMeasureTheobservedsimilaritymeasuresdonotcoversucientlythenegrainedre-quirementsarising,comparingtop-krankingsintheWebsearchcontext.
Soweneedanewsimilaritymetrictakingintoconsideration:a)TheabsolutedierencebetweenthepredictedandactualpositionforeachWebpageaslargedierenceindicatesalessaccuratepredictionandb)TheactualrankingpositionofaWebpage,becausefailingtopredictahighlyrankedWebpageismoreimportantthanalow-ranked.
Basedontheseobservations,weintroduceanewmeasure,namedRSim.
Everyinaccuratepredictionmadeincursacertainpenaltydependingonthetwonotedfactors.
Ifpredictionis100%accurate(samepredictedandactualrank),thepenaltyisequaltozero.
LetBibethepredictedrankpositionforpageiandAitheactual.
TheCumulativePenaltyScore(CPS)iscomputedasCPS(A,B)=ki=1|AiBi|·(k+1Ai).
TheproposedpenaltyscoreCPSrepresentstheoverallerror(dierence)be-tweentheinvolvedtop-klistsAandBandisproportionalto|AiBi|.
Theterm(k+1Ai)increaseswhenAibecomessmallersoerrorsinhighlyrankedWebpagesarepenalizedmore.
Inthebestcase,rankpredictionsforallWebpagesarecompletelyaccurate(CPS=0),sinceAi=Biforanyvalueofi.
Intheworstcase,therankpredictionsforallWebpagesnotonlyareinaccurate,butalsobearthegreatestCPSpenaltypossible.
Insuchascenario,alltheWebpagespredictedtobeinthetop-klist,actuallyholdthepositionk+1(orworse).
Assumingthatwewanttocomparetworankingsoflengthk,thenthemax-imumCPSforevenandoddvaluesofkisequalto2k3+3k2+k6.
TheproofforCPSmaxnalformisomittedduetospacelimitations.
Basedontheabovewedeneanewsimilaritymeasure,RSim,tocomparethesimilaritybetweentop-kranklistsasfollows:RSim(Ai,Bi)=1CPS(Ai,Bi)CPSmax(Ai,Bi).
(1)Inthebest-casepredictionscenario,RSimisequaltoone,whileintheworst-caseRSimisequaltozero.
SothecloserthevalueofRSimistoone,thebetterandmoreaccuratetherankpredictionsare.
AFrameworkforWebPageRankPrediction2470501001502002503000.
50.
550.
60.
650.
70.
750.
80.
850.
90.
951topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
550.
60.
650.
70.
750.
80.
85topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
50.
550.
60.
650.
70.
75topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
650.
70.
750.
80.
850.
90.
951topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
1.
PredictionaccuracyvsTop-klistlength-Yahoodataset5ExperimentalEvaluationInordertoevaluatetheeectivenessofourmethodsweperformedexperimentsontwodierentrealworlddatasets.
Theseconsistcollectionsoftop-krankedlistsfor22queriesoveraperiodof11daysasresultedfromtheYahoo!
1andtheGooglesearchengines,producedinthesameway.
Inourexperiments,weevaluatethepredictionqualityintermsofsimilaritiesbetweenthepredictedandtheactualtop-krankedlistsusingOSim,KSim,NDCG,SpearmancorrelationandthenovelsimilaritymeasureRSim.
5.
1DatasetsandQuerySelectionForeachdataset(YahooandGoogle)awealthofsnapshotswereavailable,en-suringwehaveenoughevolutiontotestourapproach.
Aconcisedescriptionofeachdatasetandquery-basedapproachfollow.
TheYahooandGoogledatasetsconsistof11consecutivedailytop-1000rankedlistscomputedusingtheYa-hooSearchWebServices2andtheGoogleSearchenginerespectively.
Thesesetswerepickedfrompopular:a)queriesappearedinGoogleTrends3andb)currentqueries(i.
e.
euro2008orOlympicgames2008).
5.
2ExperimentalMethodologyWecomparedallpredictionsamongthevariousapproachesandwenextdescribethestepsassumedforbothdatasets.
Atrst,wecomputedPageRankscoresfor1http://search.
yahoo.
com2http://developer.
yahoo.
com/search/3http://www.
google.
com/trends248E.
Voudigarietal.
0501001502002503000.
20.
30.
40.
50.
60.
70.
80.
91topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
50.
550.
60.
650.
70.
750.
8topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
30.
40.
50.
60.
70.
8topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
40.
50.
60.
70.
80.
91topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
650.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
2.
PredictionaccuracyvsTop-klistlength-Googledataseteachsnapshotofourdatasetsandobtainedthetop-krankingsusingthescoringfunctionmentioned.
Havingcomputedthescores,wecalculatedthenrank(racervaluesforMMs)foreachpairofconsecutivegraphsnapshotsandstoredtheminamatrixnrank(racer)*time.
Then,assuminganm-pathofconsecutivesnapshots,wepredictthem+1state.
Foreachpagep,wepredictarankingcomparingittoactualbya10-foldcrossvalidationprocess(training90%ofdatasetandtestingontheremaining10%).
InthecaseoftheEMapproach,wetestedthequalityofclusteringresultsforclusterscardinalitybetween2and10foreachqueryandchosetheonethatmaximizedtheoverallqualityofclustering.
Thiswasdenedasamonotonecombinationofwithin-clusterwc(sumofsquareddistancesfromeachpointtothecenterofclusteritbelongsto)andbetween-clustervariationbc(distancebetweenclustercenters).
Asscorefunctionofclustering,weconsideredtheratiobc/wc.
5.
3ExperimentalResultsRegardingtheGoogleandYahoo!
datasetresultscomingoutoftheexperimen-talevaluation,onecanseethattheMMsprevailwithveryaccurateresults.
Regressionbasedtechniques(LinReg)reachandoutweighMMsperformanceasthelengthoftop-klistincreasesprovingtheirrobustness.
InbothdatasetsexperimentsprovethesuperiorityofEMapproach(BayesMod)whoseperformanceisverysatisfyingforallsimilaritymeasures.
TheMMscomenextintheevaluationranking,whereassmallertheorderisthebetteristhepredictionaccuracy,thoughonewouldthinkofthecontrary.
AFrameworkforWebPageRankPrediction249Obviously(gures)theproposedframeworkoersincrediblyhighaccuracypredictionsandisveryencouraging,asitrangessystematicallybetween70%and100%providingatoolforeectivepredictions.
6ConclusionsWehavedescribedpredictorlearningalgorithmsforWebpagerankpredictionbasedonaframeworkoflearningtechniques(MMs,LinReg,BayesMod)andexperimentalstudyshowedthattheycanachieveoverallverygoodpredictionperformance.
Furtherworkwillfocusinthefollowingissues:a)Multi-featureprediction:weintendtodealwiththeinternalmechanismthatproducestherankingofpages(notonlyrankvalues)basedonmultiplefeatures,b)Combi-nationofsuchmethodswithdimensionalityreductiontechniques.
References1.
Kemeny,J.
G.
,Snell,J.
L.
:FiniteMarkovChains.
Prinston(1963)2.
Dempster,A.
P.
,Laird,N.
M.
,Rubin,D.
B.
:MaximumLikelihoodfromIncompleteDataviatheEMAlgorithm.
RoyalStatisticalSociety39,1–38(1977)3.
Kendall,M.
G.
,Gibbons,J.
D.
:RankCorrelationMethods.
CharlesGrin,UK(1990)4.
Haveliwala,T.
H.
:Topic-SensitivePageRank.
In:Proc.
WWW(2002)5.
Jarvelin,K.
,Kekalainen,J.
:Cumulatedgain-basedevaluationofIRtechniques.
TOIS20,422–446(2002)6.
Chien,S.
,Dwork,C.
,Kumar,R.
,Simon,D.
R.
,Sivakumar,D.
:LinkEvolu-tion:AnalysisandAlgorithms.
InternetMathematics1,277–304(2003)7.
Chen,Y.
-Y.
,Gan,Q.
,Suel,T.
:LocalMethodsforEstimatingPageRankValues.
In:Proc.
ofCIKM(2004)8.
Langville,A.
N.
,Meyer,C.
D.
:UpdatingPageRankwithiterativeaggregation.
In:Proc.
ofthe13thInternationalWorldWideWebConferenceonAlternateTrackPapersandPosters,pp.
392–393(2004)9.
Kan,M.
-Y.
,Thi,H.
O.
:FastwebpageclassicationusingURLfeatures.
In:Confer-enceonInformationandKnowledgeManagement,pp.
325–326.
ACM,NewYork(2005)10.
Yang,H.
,King,I.
,Lu,M.
R.
:PredictiveRanking:ANovelPageRankingApproachbyEstimatingtheWebStructure.
In:Proc.
ofthe14thInternationalWWWCon-ference,pp.
1825–1832(2005)11.
Broder,A.
Z.
,Lempel,R.
,Maghoul,F.
,Pedersen,J.
:EcientPageRankapproxi-mationviagraphaggregation.
Inf.
Retrieval9,123–138(2006)12.
Vlachou,A.
,Berberich,K.
,Vazirgiannis,M.
:Representingandquantifyingrank-changefortheWebgraph.
In:Aiello,W.
,Broder,A.
,Janssen,J.
,Milios,E.
E.
(eds.
)WAW2006.
LNCS,vol.
4936,pp.
157–165.
Springer,Heidelberg(2008)13.
Vazirgiannis,M.
,Drosos,D.
,Senellart,P.
,Vlachou,A.
:WebPageRankPredictionwithMarkovModels.
WWWposter(2008)14.
Sayyadi,H.
,Getoor,L.
:FutureRank:RankingScienticArticlesbyPredictingtheirFuturePageRank.
In:SIAMIntern.
Confer.
onDataMining,pp.
533–544(2009)15.
Zacharouli,P.
,Titsias,M.
,Vazirgiannis,M.
:WebpagerankpredictionwithPCAandEMclustering.
In:Avrachenkov,K.
,Donato,D.
,Litvak,N.
(eds.
)WAW2009.
LNCS,vol.
5427,pp.
104–115.
Springer,Heidelberg(2009)
现在宝塔面板真的是越来越过分了,删除文件、删除数据库、删除站点等操作都需要做计算题!我今天升级到7.7版本,发现删除数据库竟然还加了几秒的延时等待,也无法跳过!宝塔的老板该不会是小学数学老师吧,那么喜欢让我们做计算题!因此我写了个js用于去除各种计算题以及延时等待,同时还去除了软件列表页面的bt企业版广告。只需要执行以下命令即可一键完成!复制以下命令在SSH界面执行:Layout_file="/w...
DogYun怎么样?DogYun是一家2019年成立的国人主机商,称为狗云,提供VPS及独立服务器租用,其中VPS分为经典云和动态云(支持小时计费及随时可删除),DogYun云服务器基于Kernel-based Virtual Machine(Kvm)硬件的完全虚拟化架构,您可以在弹性云中,随时调整CPU,内存,硬盘,网络,IPv4路线(如果该数据中心接入了多条路线)等。DogYun弹性云服务器优...
华纳云(HNCloud Limited)是一家专业的全球数据中心基础服务提供商,总部在香港,隶属于香港联合通讯国际有限公司,拥有香港政府颁发的商业登记证明,保证用户的安全性和合规性。 华纳云是APNIC 和 ARIN 会员单位。主要提供数据中心基础服务、互联网业务解决方案, 以及香港服务器租用、香港服务器托管、香港云服务器、美国云服务器,云计算、云安全技术研发等产品和服务。其中云服务器基于成熟的 ...
pagerank为你推荐
access数据库修复求救,ACCESS数据库破坏了,怎么修复?开启javascript电脑怎样开启javascript?????????要步骤!!!!!!?!ym.163.com免费企业邮箱重庆400年老树穿楼生长重庆海拔500左右的红沙土适合栽哪种果树支持http大飞资讯单仁资讯集团怎么样徐州商标徐州松木家具前十名香盛圆排第几12306.com注册12306邮箱地址怎么写curl扩展大神帮忙看下centos 7.2 系统 php7.0.12的 curl 扩展怎么开启,谢谢啦电子商务世界世界前十大电子商务企业名字
网站域名注册 互联网域名管理办法 hostgator blackfriday ix主机 美国便宜货网站 主机合租 国内php空间 上海域名 重庆双线服务器托管 酷番云 idc查询 数据库空间 杭州电信 mteam wordpress空间 zcloud 学生机 腾讯云平台 镇江高防服务器 更多