estimatepagerank

pagerank  时间:2021-04-19  阅读:()
AFrameworkforWebPageRankPredictionElliVoudigari1,JohnPavlopoulos1,andMichalisVazirgiannis1,2,1DepartmentofInformatics,AthensUniversityofEconomicsandBusiness,Greeceelliv@aueb.
gr,annis.
pavlo@gmail.
com,mvazirg@aueb.
gr2InstitutTelecom,EcoledeTelecomParisTech,DepartementInformatiqueetReseaux,Paris,FranceAbstract.
WeproposeaframeworkforpredictingtherankingpositionofaWebpagebasedonpreviousrankings.
Assumingasetofsuccessivetop-krankings,welearnpredictorsbasedondierentmethodologies.
Thepredictionqualityisquantiedasthesimilaritybetweenthepre-dictedandtheactualrankings.
Extensiveexperimentswereperformedonrealworldlargescaledatasetsforglobalandquery-basedtop-krank-ings,usingavarietyofexistingsimilaritymeasuresforcomparingtop-krankedlists,includinganovelandmorestrictmeasureintroducedinthispaper.
Thepredictionsarehighlyaccurateandrobustforallexperimen-talsetupsandsimilaritymeasures.
Keywords:RankPrediction,DataMining,WebMining,ArticialIntelligence.
1IntroductionTheWorldWideWebisahighlydynamicstructurecontinuouslychanging,asWebpagesandhyperlinksarecreated,deletedormodied.
Rankingoftheresultsisacornerstoneprocessenablinguserstoeectivelyretrieverelevantandimportantinformation.
GiventhehugesizeoftheWebgraph,computingrankingsofWebpagesrequiresawesomeresources-computationsonmatriceswhosesizeisoftheorderofWebsize(109nodes).
Ontheotherhandtheowneroftheindividualwebpagecanseeitsrankingonlyinthecaseofthewebgraphbysubmittingqueriestotheownerofthegraph(i.
e.
asearchengine).
Givenaseriesoftime-orderedrankingsofthenodesofagraphwhereeachbearsitsrankingforeachtimestamp,wedeveloplearn-ingmechanismsthatenablepredictionsofthenodesrankinginfuturetimes.
Thepredictionsrequireonlylocalfeatureknowledgewhilenoglobaldataarenecessary.
Specically,anindividualnodecanpredictitsrankingonlyknowingthevaluesofitsownranking.
Insuchacasethenodecouldplanactionsforoptimizingitsrankinginfuture.
InthispaperwepresentanintegratedeortforaframeworktowardsWebpagerankpredictionconsideringdierentlearningalgorithms.
WeconsiderPartiallysupportedbytheDIGITEOChairgrantLEVETONEinFranceandtheResearchCentreoftheAthensUniversityofEconomicsandBusiness,Greece.
L.
Iliadisetal.
(Eds.
):EANN/AIAI2011,PartII,IFIPAICT364,pp.
240–249,2011.
cIFIPInternationalFederationforInformationProcessing2011AFrameworkforWebPageRankPrediction241i)variableorderMarkovModels(MMs),ii)regressionmodelsandiii)anEMbasedapproachwithBayesianlearning.
ThenalpurposeistorepresentthetrendsandpredictfuturerankingsofWebpages.
Allthemodelsarelearnedfromtimeseriesdatasetswhereeachtrainingsetcorrespondstopre-processedrankvaluesofWebpagesobservedovertime.
Forallmethods,predictionqualityisevaluatedbasedonthesimilaritybe-tweenthepredictedandactualrankedlists,whilewefocusonthetop-kelementsoftheWebpagerankedlists,astoppagesareusuallymoreimportantinWebsearch.
Preliminaryworkonthistopicwaspresentedin[13]and[15].
Thecurrentworksignicantlydiersandadvancespreviousworksoftheauthorsinthefollowingways:a)Renedandcarefulre-engineeringoftheMMs'parameterlearningpro-cedurebyusingcrossvalidation,b)Integrationandelaborationoftheresultsof[15]inordertovalidatetheperformancecomparisonbetweenregression(boostedwithclustering)andMMpredictors,inlargescalerealworlddatasets,c)Namelyweadopt:LinearRegression,random1st/2nd/3rdorderMarkovmodelsprovingtherobustnessofthemodel,d)Anewtop-klistsimilaritymeasure(RSim)isintroducedandusedfortheevaluationofpredictorsandmoreimportantly,e)Additional,extensiveandrobustexperimentstookplaceusingquerybasedontop-klistsfromYahoo!
andGoogleSearchengine.
2RelatedWorkTherankingofqueryresultsinaWebsearch-engineisanimportantproblemandhasattractedsignicantattentionintheresearchcommunity.
TheproblemofpredictingPageRankispartlyaddressedin[9].
ItfocusesonWebpageclassicationbasedonURLfeatures.
Basedonthis,theauthorsperformexperimentstryingtomakePageRankpredictionsusingtheextractedfeatures.
Forthispurpose,theyuselinearregression;however,thecomplexityofthisapproachgrowslinearlyinproportiontothenumberoffeatures.
TheexperimentalresultsshowthatPageRankpredictionbasedonURLfeaturesdoesnotperformverywell,probablybecauseeventhoughtheycorrelateverywellwiththesubjectofpages,theydonotinuencepage'sauthorityinthesameway.
Arecentapproachtowardspagerankingpredictionispresentedin[13]gener-atingMarkovModelsfromhistoricalrankedlistsandusingthemforpredictions.
AnapproachthataimsatapproximatingPageRankvalueswithouttheneedofperformingthecomputationsovertheentiregraphis[6].
TheauthorsproposeanalgorithmtoincrementallycomputeapproximationstoPageRank,basedonevolutionofthelinkstructureofWebgraph(asetoflinkchanges).
Theirexper-imentsdemonstratethatthealgorithmperformswellbothinspeedandqualityandisrobusttovarioustypesoflinkmodications.
However,thisrequirescon-tinuousmonitoringoftheWebgraphinordertotrackanylinkmodications.
TherehasalsobeenworkinadaptivecomputationofPageRank([8],[11])orevenestimationofPageRankscores[7].
242E.
Voudigarietal.
In[10]amethodcalledpredictiverankingisproposed,aimingatestimatingtheWebstructurebasedontheintuitionthatthecrawlingandconsequentlytherankingresultsareinaccurate(duetoinadequatedataanddanglingpages).
Inthiswork,theauthorsdonotmakefuturerankpredictions.
Instead,theyestimatethemissingdatainordertoachievemoreaccuraterankings.
In[14]theauthorssuggestanewmeasureforrankingscienticarticles,basedonfuturecitations.
Basedonpublicationtimeandauthor'sname,theypredictfuturecitationsandsuggestabettermodel.
3PredictionMethodsInthissection,wepresentaframeworkthataimstopredictthefuturerankpositionofWebpagesbasedontheirtrendsshownthepast.
OurgoalistondpatternsinrankingevolutionofWebpages.
GivenasetofsuccessiveWebgraphsnapshots,foreachpagewegenerateasequenceofrankchangeratesthatindicatesthetrendsofthispageamongtheprevioussnapshots.
WeusethesesequencesofprevioussnapshotsoftheWebgraphasatrainingsetandtrytopredictthetrendsofaWebpagebasedonprevious.
Theremainingofthissectionisorganizedasfollows:InSect.
3.
1wetrainMMsofvariousordersandtrytopredictthetrendsofaWebpage.
Section3.
2discussesanapproachthatusesaseparatelinearregressionmodelforeachwebpage,whileSect.
3.
3combineslinearregressionwithclusteringbasedonanEMprobabilisticframework.
RankChangeRate.
InordertopredictfuturerankingsofWebpages,weneedtodeneameasureintroducedin[12]suitableformeasuringpagerankdynamics.
Webrieypresentitsdesign.
LetGtibethesnapshotoftheWebgraphcreatedbyacrawlandnti=|Gti|thenumberofWebpagesattimeti.
Then,rank(p,ti)isafunctionprovidingtherankingofaWebpagep∈Gti,accordingtosomecriterion(i.
e.
PageRankvalues).
Intuitively,anappropriatemeasureforWebpagestrendsistherankchangeratebetweentwosnapshots,butasthesizeoftheWebgraphconstantlyincreasesthetrendmeasureshouldbecomparableacrossdierentgraphsizes.
Thus,weutilizethenormalizedrank(nrank)ofaWebpage,asitwasdenedin[12].
Forapageprankedatpositionrank(p,ti):nrank(p,ti)=2·rank(p,ti)n2ti,whichrangesbetween2n2tiand2n1ti.
Then,usingthenormalizedranks,theRankChangeRate(Racer)isgivenbyracer(p,ti)=1nrank(p,ti+1)nrank(p,ti).
3.
1MarkovModelLearningMarkovModels(MMs)[1]havebeenwidelyusedforstudyingandunderstandingstochasticprocessesandbehaveverywellonmodelingandpredictingvaluesinvariousapplications.
Theirfundamentalassumptionisthatthefuturevaluedependsonanumberofmpreviousvalues,wheremistheorderoftheMM.
AFrameworkforWebPageRankPrediction243TheyaredenedbasedonasetofstatesS={s1,s2,sn}andamatrixToftransitionprobabilitiestieachofwhichrepresentstheprobabilitythatastatesioccursafterasequenceofstates.
OurgoalistorepresenttheWebpagesrankingtrendsacrossdierentwebgraphsnapshots.
WeusetheracervaluestodescribetherankchangeofaWebpagebetweentwosnapshotsandweutilizeracersequencestolearnMMs.
Ob-viously,stablerankingacrosstimeisrepresentedbyazeroracervalue,whileallothertrendsbyrealnumbersgeneratingahugespaceofdiscretevalues.
Asexpected(intuitivelymostpagesareexpectedtoremainstableforsometimeirrespectivetotheirrankatthetime),thezerovaluehasanunreasonablyhighfrequencycomparedtoallothervalueswhichmeansthatallstatesbesidesthezerooneshouldbeformedbyinherentrangesofvaluesinsteadofasingledis-crete.
Inordertoensureequalprobabilityfortransitionbetweenanypairofstates,weguaranteedequiprobablestatesbyformingrangeswithequalcumu-lativefrequencies(showingracervaluewithintherange)witheachother.
InordertocalculatethestatenumberforourMMs,wecomputedtherelativecumulativefrequencyofthezeroracerstateRFRacer=0andusedthistondtheoptimumnumberofstatesns=lRFRacer=0.
Next,weformednsequiprobablepartitionsandusedtheranges'meanaveragevaluesasstatestotrainourmodel.
Weshouldnotethatwithinthesignicantlyhighfrequencyofthezeroracervalues,arealsoconsideredpagesinitiallyobtainedwithinthetop-klistandthenfell(andremained)out.
WeremoveanybiasfromRFRacer=0,excludinganyvaluesnotcorrespondingtostablerankandobtainingRFRacer=0≈0.
1whichinturnsuggested10equiprobablestates.
PredictionswithRacer.
BasedonthesetofstatesmentionedaboveandformedtorepresentWebpagetrends,weareabletotrainMMsandpredictthetrendofaWebpageinthefutureaccordingtopasttrends.
Byassumingm+1temporallysuccessivecrawls,resultinginrespectivesnapshots,asequenceofmstates(representativeofracervalues)areconstructedforeachWebpage.
Theseareusedtoconstructanm-orderMM.
Notethatthememorymisaninherentfeatureofthemodel.
Aftercomputingtransitionprobabilitiesforeverypath,usingthegeneratedstates,thefuturestatescanbepredictedbyusingthechainrule[1].
Thus,foranm-orderMarkovModel,thepathprobabilityofastatesequenceisP(s1→.
.
.
→sm)=P(s1)·mi=2P(si|sim,.
.
,si1),whereeachsi(i∈{1,2,n})foranytimeintervalmayvaryoverallthepossiblestates(rangesofracervalues).
Then,predictingthefuturetrendofapageisperformedbycomputingthemostlikelynextstategiventhesofarstatepath.
Inspecic,assumingmtimeintervals,thenextmostprobablestateXiscomputedas:X=argmaxXP(s1→.
.
.
→sm1→X).
Usingthat,wepredictfuturestatesforeachpage.
AseachstateisthemeanofaRacerrange,wecomputebackthefuturenrank.
Therefore,weareabletopredictfuturetop-krankingbysortingtheracerofWebpagesinascendingorder.
244E.
Voudigarietal.
3.
2RegressionModelsAssumeasetofNWebpagesandobservationsofnrankvaluesatmtimesteps.
Letxi=(xi1,xim)bethenrankvaluesforWebpageiatthetimepointst=(t1,tm),wherethe(N*m)designmatrixXstoresalltheobservednrankvaluessothateachrowcorrespondstoaWebpageandeachcolumntoatimepoint.
GiventhesevalueswewishtopredictthenrankvaluexiforeachWebpageatsometimetwhichtypicallycorrespondstoafuturetimepoint(t>ti,i=1,m).
Next,wediscussasimplepredictionmethodbasedonlinearregressionwheretheinputvariablecorrespondstotimeandtheoutputtothenrankvalue.
ForacertainWebpageiweassumealinearregressionmodelhavingtheformxik=aitk+bi+k,k=1,m(kdenotesazero-meanGaussiannoise).
Notethattheparameters(ai,bi)areWebpage-specicandtheirvaluesarecalculatedusingleastsquares.
Inotherwords,theaboveformulationdenesaseparatelinearregressionmodelforeachWebpagethustheytreatindependently.
ThiscanberestrictivesincepossibleexistingsimilaritiesanddependenciesbetweendierentWebpagesarenottakenintoaccount.
3.
3ClusteringUsingEMWeassumethatthenrankvaluesofeachWebpagefallintooneofJdierentclusters.
Clusteringcanbeviewedastrainingamixtureprobabilitymodel.
TogeneratethenrankvaluesxiforWebpagei,werstselecttheclustertypejwithprobabilityπj(whereπj≥0andJj=1πj=1)andthenproducethevaluesxiaccordingtoalinearregressionmodelxik=aitk+bi+k,k=1,m,wherekisindependentGaussiannoisewithzeromeanandvarianceσ2j.
ThisimpliesthatgiventheclustertypejthenrankvaluesaredrawnfromtheproductofGaussiansp(xi|j)=mk=1N(xik|ajtk+bj,σ2j).
TheclustertypethatgeneratedthenrankvaluesofacertainWebpageisanunobservedvariableandthusaftermarginalizationweobtainamixtureuncon-ditionaldensityp(xi)=Jj=1πjp(xi|j)fortheobservationvectorxi.
Totrainthemixturemodelandestimatetheparametersθ=(πj,σ2j,aj,bj)j=1,.
.
.
,J,wecanmaximizetheloglikelihoodofthedataL(θ)=logNi=1p(xi)byusingtheEMalgorithm[2].
Givenaninitialstatefortheparameters,EMoptimizesoverθbyiteratingbetweenEandMsteps:TheEstepcomputestheposteriorprobabilitiesRij=πjp(xi|j)Jρ=1πρp(xi|ρ),forj=1,Jandi=1,N,(Nisthetotalnumberofwebpages).
TheMstepupdatestheparametersaccordingto:πj=1NNi=1Rij,σ2j=Ni=1Rijmk=1(xikajtkbj)2πjandajbj=1NjtTttT1tT1m1Ni=1RijxTitNi=1RijxTi1,j=1,J,tisthevectorofalltimepointsand1isthem-dimensionalvectorofones.
Oncewehaveobtainedsuitablevaluesfortheparameters,wecanusethemixturemodelforprediction.
Particularly,topredictthenrankvaluexiofWebAFrameworkforWebPageRankPrediction245pageiattgiventheobservedvaluesxi=(xi1,xim)atprevioustimes,weexpresstheposteriordistributionp(xi|xi)usingtheBayesrulep(xi|xi)=Jj=1RijNxiajt+bj,s2j,whereRijiscomputedaccordingtoE-step.
Toobtainaspecicpredictivevalueforxi,wecanusethemeanvalueoftheaboveposteriordistributionxi=Jj=1Rij(ajt+bj)orthemedianestimatexi=ajt+bj,wherej=argmaxρRiρthatconsidersahardassignmentoftheWebpageintooneoftheJclusters.
4Top-kListSimilarityMeasuresInordertoevaluatethequalityofpredictions,weneedtomeasurethesimilar-ityofthepredictedtotheactualtop-kranking.
Forthispurpose,weexaminemeasurescommonlyusedforcomparingrankings,pointouttheshortcomingsofexistinganddeneanewsimilaritymeasurefortop-krankings,denotedasRSim.
4.
1ExistingSimilarityMeasuresTherstone,denotedasOSim(A,B)[4]indicatesthedegreeofoverlapbetweenthetop-kelementsoftwosetsAandB(eachoneofsizek):OSim(A,B)=|A∩B|k.
Thesecond,KSim(A,B)[4],isbasedonKendall'sdistancemeasure[3]andindicatesthedegreethattherelativeorderingsoftwotop-klistsareinagreement:KSim(A,B)=|(u,v):A,B,agreeinorder||A∪B|(|A∪B|1),whereAisanextensionofAresultingfromappendingatitstailtheelementsx∈A∪(BA)andBisdenedanalogously.
AnotherinterestingmeasureintroducedinInformationRetrievalforevaluat-ingtheaccumulatedrelevanceofatop-kdocumentlisttoaqueryisthe(Nor-malized)DiscountedCumulativeGain(N(DCG))[5].
Thismeasureassumesatop-klist,whereeachdocumentisfeaturedwitharelevancescoreaccumulatedbyscanningthelistfromtoptobottom.
AlthoughDCGcouldbeusedfortheevaluationofourpredictions,sinceittakesintoaccounttherelevanceofatop-klisttoanother,itexhibitssomebasicfeaturesthatpreventedusfromusingitinourexperiments.
Itpenalizeserrorsbymaintaininganincreasingvalueofcumulativerelevance.
Whilethisisbasedontherankofeachdocument,thesizekofthelistisnottakenintoaccount–thusthelengthofthelistisirrelevantinDCG.
Errorsintopranksofatop-klistshouldbeconsideredmoreimpor-tantthanerrorsinlow-rankedpositions.
ThisimportantfeaturelacksfrombothDCGandNDCGmeasures.
Moreover,DCGvalueforeachrankinthetop-klistiscomputedtakingintoaccountthepreviousvaluesinthelist.
Next,weintroduceSpearman'sRankCorrelationCoecient,whichwasusedduringtheexperimentalevaluation,consistsanon-parametric(distribution-free)rankstatisticproposedbySpearman(1904)measuringthestrengthofassoci-ationsbetweentwovariablesandisoftensymbolizedbyρ.
Itestimateshowwelltherelationshipbetweentwovariablescanbedescribedusingamonotonic246E.
Voudigarietal.
function.
Iftherearenorepeateddatavaluesofthesevariables(likeinrankingproblem),aperfectSpearmancorrelationof+1or-1existsifeachvariableisaperfectmonotonefunctionoftheother.
ItisoftenconfusedwiththePearsoncorrelationcoecientbetweenrankedvariables.
However,theprocedureusedtocalculateρismuchsimpler.
IfXandYaretwovariableswithcorrespondingranksxiandyi,di=xiyi,i=1,n,betweentheranksofeachobservationonthetwovariables,thenitisgivenby:ρ=16·ni=1d2in(n21).
4.
2RSimQualityMeasureTheobservedsimilaritymeasuresdonotcoversucientlythenegrainedre-quirementsarising,comparingtop-krankingsintheWebsearchcontext.
Soweneedanewsimilaritymetrictakingintoconsideration:a)TheabsolutedierencebetweenthepredictedandactualpositionforeachWebpageaslargedierenceindicatesalessaccuratepredictionandb)TheactualrankingpositionofaWebpage,becausefailingtopredictahighlyrankedWebpageismoreimportantthanalow-ranked.
Basedontheseobservations,weintroduceanewmeasure,namedRSim.
Everyinaccuratepredictionmadeincursacertainpenaltydependingonthetwonotedfactors.
Ifpredictionis100%accurate(samepredictedandactualrank),thepenaltyisequaltozero.
LetBibethepredictedrankpositionforpageiandAitheactual.
TheCumulativePenaltyScore(CPS)iscomputedasCPS(A,B)=ki=1|AiBi|·(k+1Ai).
TheproposedpenaltyscoreCPSrepresentstheoverallerror(dierence)be-tweentheinvolvedtop-klistsAandBandisproportionalto|AiBi|.
Theterm(k+1Ai)increaseswhenAibecomessmallersoerrorsinhighlyrankedWebpagesarepenalizedmore.
Inthebestcase,rankpredictionsforallWebpagesarecompletelyaccurate(CPS=0),sinceAi=Biforanyvalueofi.
Intheworstcase,therankpredictionsforallWebpagesnotonlyareinaccurate,butalsobearthegreatestCPSpenaltypossible.
Insuchascenario,alltheWebpagespredictedtobeinthetop-klist,actuallyholdthepositionk+1(orworse).
Assumingthatwewanttocomparetworankingsoflengthk,thenthemax-imumCPSforevenandoddvaluesofkisequalto2k3+3k2+k6.
TheproofforCPSmaxnalformisomittedduetospacelimitations.
Basedontheabovewedeneanewsimilaritymeasure,RSim,tocomparethesimilaritybetweentop-kranklistsasfollows:RSim(Ai,Bi)=1CPS(Ai,Bi)CPSmax(Ai,Bi).
(1)Inthebest-casepredictionscenario,RSimisequaltoone,whileintheworst-caseRSimisequaltozero.
SothecloserthevalueofRSimistoone,thebetterandmoreaccuratetherankpredictionsare.
AFrameworkforWebPageRankPrediction2470501001502002503000.
50.
550.
60.
650.
70.
750.
80.
850.
90.
951topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
550.
60.
650.
70.
750.
80.
85topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
50.
550.
60.
650.
70.
75topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
650.
70.
750.
80.
850.
90.
951topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
1.
PredictionaccuracyvsTop-klistlength-Yahoodataset5ExperimentalEvaluationInordertoevaluatetheeectivenessofourmethodsweperformedexperimentsontwodierentrealworlddatasets.
Theseconsistcollectionsoftop-krankedlistsfor22queriesoveraperiodof11daysasresultedfromtheYahoo!
1andtheGooglesearchengines,producedinthesameway.
Inourexperiments,weevaluatethepredictionqualityintermsofsimilaritiesbetweenthepredictedandtheactualtop-krankedlistsusingOSim,KSim,NDCG,SpearmancorrelationandthenovelsimilaritymeasureRSim.
5.
1DatasetsandQuerySelectionForeachdataset(YahooandGoogle)awealthofsnapshotswereavailable,en-suringwehaveenoughevolutiontotestourapproach.
Aconcisedescriptionofeachdatasetandquery-basedapproachfollow.
TheYahooandGoogledatasetsconsistof11consecutivedailytop-1000rankedlistscomputedusingtheYa-hooSearchWebServices2andtheGoogleSearchenginerespectively.
Thesesetswerepickedfrompopular:a)queriesappearedinGoogleTrends3andb)currentqueries(i.
e.
euro2008orOlympicgames2008).
5.
2ExperimentalMethodologyWecomparedallpredictionsamongthevariousapproachesandwenextdescribethestepsassumedforbothdatasets.
Atrst,wecomputedPageRankscoresfor1http://search.
yahoo.
com2http://developer.
yahoo.
com/search/3http://www.
google.
com/trends248E.
Voudigarietal.
0501001502002503000.
20.
30.
40.
50.
60.
70.
80.
91topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
50.
550.
60.
650.
70.
750.
8topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
30.
40.
50.
60.
70.
8topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
40.
50.
60.
70.
80.
91topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
650.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
2.
PredictionaccuracyvsTop-klistlength-Googledataseteachsnapshotofourdatasetsandobtainedthetop-krankingsusingthescoringfunctionmentioned.
Havingcomputedthescores,wecalculatedthenrank(racervaluesforMMs)foreachpairofconsecutivegraphsnapshotsandstoredtheminamatrixnrank(racer)*time.
Then,assuminganm-pathofconsecutivesnapshots,wepredictthem+1state.
Foreachpagep,wepredictarankingcomparingittoactualbya10-foldcrossvalidationprocess(training90%ofdatasetandtestingontheremaining10%).
InthecaseoftheEMapproach,wetestedthequalityofclusteringresultsforclusterscardinalitybetween2and10foreachqueryandchosetheonethatmaximizedtheoverallqualityofclustering.
Thiswasdenedasamonotonecombinationofwithin-clusterwc(sumofsquareddistancesfromeachpointtothecenterofclusteritbelongsto)andbetween-clustervariationbc(distancebetweenclustercenters).
Asscorefunctionofclustering,weconsideredtheratiobc/wc.
5.
3ExperimentalResultsRegardingtheGoogleandYahoo!
datasetresultscomingoutoftheexperimen-talevaluation,onecanseethattheMMsprevailwithveryaccurateresults.
Regressionbasedtechniques(LinReg)reachandoutweighMMsperformanceasthelengthoftop-klistincreasesprovingtheirrobustness.
InbothdatasetsexperimentsprovethesuperiorityofEMapproach(BayesMod)whoseperformanceisverysatisfyingforallsimilaritymeasures.
TheMMscomenextintheevaluationranking,whereassmallertheorderisthebetteristhepredictionaccuracy,thoughonewouldthinkofthecontrary.
AFrameworkforWebPageRankPrediction249Obviously(gures)theproposedframeworkoersincrediblyhighaccuracypredictionsandisveryencouraging,asitrangessystematicallybetween70%and100%providingatoolforeectivepredictions.
6ConclusionsWehavedescribedpredictorlearningalgorithmsforWebpagerankpredictionbasedonaframeworkoflearningtechniques(MMs,LinReg,BayesMod)andexperimentalstudyshowedthattheycanachieveoverallverygoodpredictionperformance.
Furtherworkwillfocusinthefollowingissues:a)Multi-featureprediction:weintendtodealwiththeinternalmechanismthatproducestherankingofpages(notonlyrankvalues)basedonmultiplefeatures,b)Combi-nationofsuchmethodswithdimensionalityreductiontechniques.
References1.
Kemeny,J.
G.
,Snell,J.
L.
:FiniteMarkovChains.
Prinston(1963)2.
Dempster,A.
P.
,Laird,N.
M.
,Rubin,D.
B.
:MaximumLikelihoodfromIncompleteDataviatheEMAlgorithm.
RoyalStatisticalSociety39,1–38(1977)3.
Kendall,M.
G.
,Gibbons,J.
D.
:RankCorrelationMethods.
CharlesGrin,UK(1990)4.
Haveliwala,T.
H.
:Topic-SensitivePageRank.
In:Proc.
WWW(2002)5.
Jarvelin,K.
,Kekalainen,J.
:Cumulatedgain-basedevaluationofIRtechniques.
TOIS20,422–446(2002)6.
Chien,S.
,Dwork,C.
,Kumar,R.
,Simon,D.
R.
,Sivakumar,D.
:LinkEvolu-tion:AnalysisandAlgorithms.
InternetMathematics1,277–304(2003)7.
Chen,Y.
-Y.
,Gan,Q.
,Suel,T.
:LocalMethodsforEstimatingPageRankValues.
In:Proc.
ofCIKM(2004)8.
Langville,A.
N.
,Meyer,C.
D.
:UpdatingPageRankwithiterativeaggregation.
In:Proc.
ofthe13thInternationalWorldWideWebConferenceonAlternateTrackPapersandPosters,pp.
392–393(2004)9.
Kan,M.
-Y.
,Thi,H.
O.
:FastwebpageclassicationusingURLfeatures.
In:Confer-enceonInformationandKnowledgeManagement,pp.
325–326.
ACM,NewYork(2005)10.
Yang,H.
,King,I.
,Lu,M.
R.
:PredictiveRanking:ANovelPageRankingApproachbyEstimatingtheWebStructure.
In:Proc.
ofthe14thInternationalWWWCon-ference,pp.
1825–1832(2005)11.
Broder,A.
Z.
,Lempel,R.
,Maghoul,F.
,Pedersen,J.
:EcientPageRankapproxi-mationviagraphaggregation.
Inf.
Retrieval9,123–138(2006)12.
Vlachou,A.
,Berberich,K.
,Vazirgiannis,M.
:Representingandquantifyingrank-changefortheWebgraph.
In:Aiello,W.
,Broder,A.
,Janssen,J.
,Milios,E.
E.
(eds.
)WAW2006.
LNCS,vol.
4936,pp.
157–165.
Springer,Heidelberg(2008)13.
Vazirgiannis,M.
,Drosos,D.
,Senellart,P.
,Vlachou,A.
:WebPageRankPredictionwithMarkovModels.
WWWposter(2008)14.
Sayyadi,H.
,Getoor,L.
:FutureRank:RankingScienticArticlesbyPredictingtheirFuturePageRank.
In:SIAMIntern.
Confer.
onDataMining,pp.
533–544(2009)15.
Zacharouli,P.
,Titsias,M.
,Vazirgiannis,M.
:WebpagerankpredictionwithPCAandEMclustering.
In:Avrachenkov,K.
,Donato,D.
,Litvak,N.
(eds.
)WAW2009.
LNCS,vol.
5427,pp.
104–115.
Springer,Heidelberg(2009)

CloudCone月付$48,MC机房可小时付费

CloudCone商家在前面的文章中也有多次介绍,他们家的VPS主机还是蛮有特点的,和我们熟悉的DO、Linode、VuLTR商家很相似可以采用小时时间计费,如果我们不满意且不需要可以删除机器,这样就不扣费,如果希望用的时候再开通。唯独比较吐槽的就是他们家的产品太过于单一,一来是只有云服务器,而且是机房就唯一的MC机房。CloudCone 这次四周年促销活动期间,商家有新增独立服务器业务。同样的C...

百纵科技,美国独立服务器 E52670*1 32G 50M 200G防御 899元/月

百纵科技:美国高防服务器,洛杉矶C3机房 独家接入zenlayer清洗 带金盾硬防,CPU全系列E52670、E52680v3 DDR4内存 三星固态盘阵列!带宽接入了cn2/bgp线路,速度快,无需备案,非常适合国内外用户群体的外贸、搭建网站等用途。C3机房,双程CN2线路,默认200G高防,3+1(高防IP),不限流量,季付送带宽美国洛杉矶C3机房套餐处理器内存硬盘IP数带宽线路防御价格/月套...

QQ防红跳转短网址生成网站源码(91she完整源码)

使用此源码可以生成QQ自动跳转到浏览器的短链接,无视QQ报毒,任意网址均可生成。新版特色:全新界面,网站背景图采用Bing随机壁纸支持生成多种短链接兼容电脑和手机页面生成网址记录功能,域名黑名单功能网站后台可管理数据安装说明:由于此版本增加了记录和黑名单功能,所以用到了数据库。安装方法为修改config.php里面的数据库信息,导入install.sql到数据库。...

pagerank为你推荐
phpweb破解如何破解网络锁360邮箱免费注册360账号-电子邮箱怎么填写?govya资费标准中国电信套餐资费一览表2021三友网有了解唐山三友集团的吗?大学生待遇如何,工资收入,福利保障,工作环境等等即时通平台寻找娱乐高科技产品3g手机有哪些电信3g手机有哪些?dedecms自动采集织梦CMS系统的采集功能不知道怎么采集软件。长沙电话号码升位长沙的座机什么时候变成8位的,急!在线等答案,那如果之前的7位数是不是都会变啊?变成什么样了呢?财务单据出纳要用什么单据?
我的世界服务器租用 香港主机租用 GGC kdata gomezpeer 英语简历模板word 台湾谷歌网址 100m空间 nerds 域名和空间 Updog 宏讯 湖南idc photobucket 注册阿里云邮箱 实惠 免备案cdn加速 网络速度 双十二促销 小夜博客 更多