estimatepagerank

pagerank  时间:2021-04-19  阅读:()
AFrameworkforWebPageRankPredictionElliVoudigari1,JohnPavlopoulos1,andMichalisVazirgiannis1,2,1DepartmentofInformatics,AthensUniversityofEconomicsandBusiness,Greeceelliv@aueb.
gr,annis.
pavlo@gmail.
com,mvazirg@aueb.
gr2InstitutTelecom,EcoledeTelecomParisTech,DepartementInformatiqueetReseaux,Paris,FranceAbstract.
WeproposeaframeworkforpredictingtherankingpositionofaWebpagebasedonpreviousrankings.
Assumingasetofsuccessivetop-krankings,welearnpredictorsbasedondierentmethodologies.
Thepredictionqualityisquantiedasthesimilaritybetweenthepre-dictedandtheactualrankings.
Extensiveexperimentswereperformedonrealworldlargescaledatasetsforglobalandquery-basedtop-krank-ings,usingavarietyofexistingsimilaritymeasuresforcomparingtop-krankedlists,includinganovelandmorestrictmeasureintroducedinthispaper.
Thepredictionsarehighlyaccurateandrobustforallexperimen-talsetupsandsimilaritymeasures.
Keywords:RankPrediction,DataMining,WebMining,ArticialIntelligence.
1IntroductionTheWorldWideWebisahighlydynamicstructurecontinuouslychanging,asWebpagesandhyperlinksarecreated,deletedormodied.
Rankingoftheresultsisacornerstoneprocessenablinguserstoeectivelyretrieverelevantandimportantinformation.
GiventhehugesizeoftheWebgraph,computingrankingsofWebpagesrequiresawesomeresources-computationsonmatriceswhosesizeisoftheorderofWebsize(109nodes).
Ontheotherhandtheowneroftheindividualwebpagecanseeitsrankingonlyinthecaseofthewebgraphbysubmittingqueriestotheownerofthegraph(i.
e.
asearchengine).
Givenaseriesoftime-orderedrankingsofthenodesofagraphwhereeachbearsitsrankingforeachtimestamp,wedeveloplearn-ingmechanismsthatenablepredictionsofthenodesrankinginfuturetimes.
Thepredictionsrequireonlylocalfeatureknowledgewhilenoglobaldataarenecessary.
Specically,anindividualnodecanpredictitsrankingonlyknowingthevaluesofitsownranking.
Insuchacasethenodecouldplanactionsforoptimizingitsrankinginfuture.
InthispaperwepresentanintegratedeortforaframeworktowardsWebpagerankpredictionconsideringdierentlearningalgorithms.
WeconsiderPartiallysupportedbytheDIGITEOChairgrantLEVETONEinFranceandtheResearchCentreoftheAthensUniversityofEconomicsandBusiness,Greece.
L.
Iliadisetal.
(Eds.
):EANN/AIAI2011,PartII,IFIPAICT364,pp.
240–249,2011.
cIFIPInternationalFederationforInformationProcessing2011AFrameworkforWebPageRankPrediction241i)variableorderMarkovModels(MMs),ii)regressionmodelsandiii)anEMbasedapproachwithBayesianlearning.
ThenalpurposeistorepresentthetrendsandpredictfuturerankingsofWebpages.
Allthemodelsarelearnedfromtimeseriesdatasetswhereeachtrainingsetcorrespondstopre-processedrankvaluesofWebpagesobservedovertime.
Forallmethods,predictionqualityisevaluatedbasedonthesimilaritybe-tweenthepredictedandactualrankedlists,whilewefocusonthetop-kelementsoftheWebpagerankedlists,astoppagesareusuallymoreimportantinWebsearch.
Preliminaryworkonthistopicwaspresentedin[13]and[15].
Thecurrentworksignicantlydiersandadvancespreviousworksoftheauthorsinthefollowingways:a)Renedandcarefulre-engineeringoftheMMs'parameterlearningpro-cedurebyusingcrossvalidation,b)Integrationandelaborationoftheresultsof[15]inordertovalidatetheperformancecomparisonbetweenregression(boostedwithclustering)andMMpredictors,inlargescalerealworlddatasets,c)Namelyweadopt:LinearRegression,random1st/2nd/3rdorderMarkovmodelsprovingtherobustnessofthemodel,d)Anewtop-klistsimilaritymeasure(RSim)isintroducedandusedfortheevaluationofpredictorsandmoreimportantly,e)Additional,extensiveandrobustexperimentstookplaceusingquerybasedontop-klistsfromYahoo!
andGoogleSearchengine.
2RelatedWorkTherankingofqueryresultsinaWebsearch-engineisanimportantproblemandhasattractedsignicantattentionintheresearchcommunity.
TheproblemofpredictingPageRankispartlyaddressedin[9].
ItfocusesonWebpageclassicationbasedonURLfeatures.
Basedonthis,theauthorsperformexperimentstryingtomakePageRankpredictionsusingtheextractedfeatures.
Forthispurpose,theyuselinearregression;however,thecomplexityofthisapproachgrowslinearlyinproportiontothenumberoffeatures.
TheexperimentalresultsshowthatPageRankpredictionbasedonURLfeaturesdoesnotperformverywell,probablybecauseeventhoughtheycorrelateverywellwiththesubjectofpages,theydonotinuencepage'sauthorityinthesameway.
Arecentapproachtowardspagerankingpredictionispresentedin[13]gener-atingMarkovModelsfromhistoricalrankedlistsandusingthemforpredictions.
AnapproachthataimsatapproximatingPageRankvalueswithouttheneedofperformingthecomputationsovertheentiregraphis[6].
TheauthorsproposeanalgorithmtoincrementallycomputeapproximationstoPageRank,basedonevolutionofthelinkstructureofWebgraph(asetoflinkchanges).
Theirexper-imentsdemonstratethatthealgorithmperformswellbothinspeedandqualityandisrobusttovarioustypesoflinkmodications.
However,thisrequirescon-tinuousmonitoringoftheWebgraphinordertotrackanylinkmodications.
TherehasalsobeenworkinadaptivecomputationofPageRank([8],[11])orevenestimationofPageRankscores[7].
242E.
Voudigarietal.
In[10]amethodcalledpredictiverankingisproposed,aimingatestimatingtheWebstructurebasedontheintuitionthatthecrawlingandconsequentlytherankingresultsareinaccurate(duetoinadequatedataanddanglingpages).
Inthiswork,theauthorsdonotmakefuturerankpredictions.
Instead,theyestimatethemissingdatainordertoachievemoreaccuraterankings.
In[14]theauthorssuggestanewmeasureforrankingscienticarticles,basedonfuturecitations.
Basedonpublicationtimeandauthor'sname,theypredictfuturecitationsandsuggestabettermodel.
3PredictionMethodsInthissection,wepresentaframeworkthataimstopredictthefuturerankpositionofWebpagesbasedontheirtrendsshownthepast.
OurgoalistondpatternsinrankingevolutionofWebpages.
GivenasetofsuccessiveWebgraphsnapshots,foreachpagewegenerateasequenceofrankchangeratesthatindicatesthetrendsofthispageamongtheprevioussnapshots.
WeusethesesequencesofprevioussnapshotsoftheWebgraphasatrainingsetandtrytopredictthetrendsofaWebpagebasedonprevious.
Theremainingofthissectionisorganizedasfollows:InSect.
3.
1wetrainMMsofvariousordersandtrytopredictthetrendsofaWebpage.
Section3.
2discussesanapproachthatusesaseparatelinearregressionmodelforeachwebpage,whileSect.
3.
3combineslinearregressionwithclusteringbasedonanEMprobabilisticframework.
RankChangeRate.
InordertopredictfuturerankingsofWebpages,weneedtodeneameasureintroducedin[12]suitableformeasuringpagerankdynamics.
Webrieypresentitsdesign.
LetGtibethesnapshotoftheWebgraphcreatedbyacrawlandnti=|Gti|thenumberofWebpagesattimeti.
Then,rank(p,ti)isafunctionprovidingtherankingofaWebpagep∈Gti,accordingtosomecriterion(i.
e.
PageRankvalues).
Intuitively,anappropriatemeasureforWebpagestrendsistherankchangeratebetweentwosnapshots,butasthesizeoftheWebgraphconstantlyincreasesthetrendmeasureshouldbecomparableacrossdierentgraphsizes.
Thus,weutilizethenormalizedrank(nrank)ofaWebpage,asitwasdenedin[12].
Forapageprankedatpositionrank(p,ti):nrank(p,ti)=2·rank(p,ti)n2ti,whichrangesbetween2n2tiand2n1ti.
Then,usingthenormalizedranks,theRankChangeRate(Racer)isgivenbyracer(p,ti)=1nrank(p,ti+1)nrank(p,ti).
3.
1MarkovModelLearningMarkovModels(MMs)[1]havebeenwidelyusedforstudyingandunderstandingstochasticprocessesandbehaveverywellonmodelingandpredictingvaluesinvariousapplications.
Theirfundamentalassumptionisthatthefuturevaluedependsonanumberofmpreviousvalues,wheremistheorderoftheMM.
AFrameworkforWebPageRankPrediction243TheyaredenedbasedonasetofstatesS={s1,s2,sn}andamatrixToftransitionprobabilitiestieachofwhichrepresentstheprobabilitythatastatesioccursafterasequenceofstates.
OurgoalistorepresenttheWebpagesrankingtrendsacrossdierentwebgraphsnapshots.
WeusetheracervaluestodescribetherankchangeofaWebpagebetweentwosnapshotsandweutilizeracersequencestolearnMMs.
Ob-viously,stablerankingacrosstimeisrepresentedbyazeroracervalue,whileallothertrendsbyrealnumbersgeneratingahugespaceofdiscretevalues.
Asexpected(intuitivelymostpagesareexpectedtoremainstableforsometimeirrespectivetotheirrankatthetime),thezerovaluehasanunreasonablyhighfrequencycomparedtoallothervalueswhichmeansthatallstatesbesidesthezerooneshouldbeformedbyinherentrangesofvaluesinsteadofasingledis-crete.
Inordertoensureequalprobabilityfortransitionbetweenanypairofstates,weguaranteedequiprobablestatesbyformingrangeswithequalcumu-lativefrequencies(showingracervaluewithintherange)witheachother.
InordertocalculatethestatenumberforourMMs,wecomputedtherelativecumulativefrequencyofthezeroracerstateRFRacer=0andusedthistondtheoptimumnumberofstatesns=lRFRacer=0.
Next,weformednsequiprobablepartitionsandusedtheranges'meanaveragevaluesasstatestotrainourmodel.
Weshouldnotethatwithinthesignicantlyhighfrequencyofthezeroracervalues,arealsoconsideredpagesinitiallyobtainedwithinthetop-klistandthenfell(andremained)out.
WeremoveanybiasfromRFRacer=0,excludinganyvaluesnotcorrespondingtostablerankandobtainingRFRacer=0≈0.
1whichinturnsuggested10equiprobablestates.
PredictionswithRacer.
BasedonthesetofstatesmentionedaboveandformedtorepresentWebpagetrends,weareabletotrainMMsandpredictthetrendofaWebpageinthefutureaccordingtopasttrends.
Byassumingm+1temporallysuccessivecrawls,resultinginrespectivesnapshots,asequenceofmstates(representativeofracervalues)areconstructedforeachWebpage.
Theseareusedtoconstructanm-orderMM.
Notethatthememorymisaninherentfeatureofthemodel.
Aftercomputingtransitionprobabilitiesforeverypath,usingthegeneratedstates,thefuturestatescanbepredictedbyusingthechainrule[1].
Thus,foranm-orderMarkovModel,thepathprobabilityofastatesequenceisP(s1→.
.
.
→sm)=P(s1)·mi=2P(si|sim,.
.
,si1),whereeachsi(i∈{1,2,n})foranytimeintervalmayvaryoverallthepossiblestates(rangesofracervalues).
Then,predictingthefuturetrendofapageisperformedbycomputingthemostlikelynextstategiventhesofarstatepath.
Inspecic,assumingmtimeintervals,thenextmostprobablestateXiscomputedas:X=argmaxXP(s1→.
.
.
→sm1→X).
Usingthat,wepredictfuturestatesforeachpage.
AseachstateisthemeanofaRacerrange,wecomputebackthefuturenrank.
Therefore,weareabletopredictfuturetop-krankingbysortingtheracerofWebpagesinascendingorder.
244E.
Voudigarietal.
3.
2RegressionModelsAssumeasetofNWebpagesandobservationsofnrankvaluesatmtimesteps.
Letxi=(xi1,xim)bethenrankvaluesforWebpageiatthetimepointst=(t1,tm),wherethe(N*m)designmatrixXstoresalltheobservednrankvaluessothateachrowcorrespondstoaWebpageandeachcolumntoatimepoint.
GiventhesevalueswewishtopredictthenrankvaluexiforeachWebpageatsometimetwhichtypicallycorrespondstoafuturetimepoint(t>ti,i=1,m).
Next,wediscussasimplepredictionmethodbasedonlinearregressionwheretheinputvariablecorrespondstotimeandtheoutputtothenrankvalue.
ForacertainWebpageiweassumealinearregressionmodelhavingtheformxik=aitk+bi+k,k=1,m(kdenotesazero-meanGaussiannoise).
Notethattheparameters(ai,bi)areWebpage-specicandtheirvaluesarecalculatedusingleastsquares.
Inotherwords,theaboveformulationdenesaseparatelinearregressionmodelforeachWebpagethustheytreatindependently.
ThiscanberestrictivesincepossibleexistingsimilaritiesanddependenciesbetweendierentWebpagesarenottakenintoaccount.
3.
3ClusteringUsingEMWeassumethatthenrankvaluesofeachWebpagefallintooneofJdierentclusters.
Clusteringcanbeviewedastrainingamixtureprobabilitymodel.
TogeneratethenrankvaluesxiforWebpagei,werstselecttheclustertypejwithprobabilityπj(whereπj≥0andJj=1πj=1)andthenproducethevaluesxiaccordingtoalinearregressionmodelxik=aitk+bi+k,k=1,m,wherekisindependentGaussiannoisewithzeromeanandvarianceσ2j.
ThisimpliesthatgiventheclustertypejthenrankvaluesaredrawnfromtheproductofGaussiansp(xi|j)=mk=1N(xik|ajtk+bj,σ2j).
TheclustertypethatgeneratedthenrankvaluesofacertainWebpageisanunobservedvariableandthusaftermarginalizationweobtainamixtureuncon-ditionaldensityp(xi)=Jj=1πjp(xi|j)fortheobservationvectorxi.
Totrainthemixturemodelandestimatetheparametersθ=(πj,σ2j,aj,bj)j=1,.
.
.
,J,wecanmaximizetheloglikelihoodofthedataL(θ)=logNi=1p(xi)byusingtheEMalgorithm[2].
Givenaninitialstatefortheparameters,EMoptimizesoverθbyiteratingbetweenEandMsteps:TheEstepcomputestheposteriorprobabilitiesRij=πjp(xi|j)Jρ=1πρp(xi|ρ),forj=1,Jandi=1,N,(Nisthetotalnumberofwebpages).
TheMstepupdatestheparametersaccordingto:πj=1NNi=1Rij,σ2j=Ni=1Rijmk=1(xikajtkbj)2πjandajbj=1NjtTttT1tT1m1Ni=1RijxTitNi=1RijxTi1,j=1,J,tisthevectorofalltimepointsand1isthem-dimensionalvectorofones.
Oncewehaveobtainedsuitablevaluesfortheparameters,wecanusethemixturemodelforprediction.
Particularly,topredictthenrankvaluexiofWebAFrameworkforWebPageRankPrediction245pageiattgiventheobservedvaluesxi=(xi1,xim)atprevioustimes,weexpresstheposteriordistributionp(xi|xi)usingtheBayesrulep(xi|xi)=Jj=1RijNxiajt+bj,s2j,whereRijiscomputedaccordingtoE-step.
Toobtainaspecicpredictivevalueforxi,wecanusethemeanvalueoftheaboveposteriordistributionxi=Jj=1Rij(ajt+bj)orthemedianestimatexi=ajt+bj,wherej=argmaxρRiρthatconsidersahardassignmentoftheWebpageintooneoftheJclusters.
4Top-kListSimilarityMeasuresInordertoevaluatethequalityofpredictions,weneedtomeasurethesimilar-ityofthepredictedtotheactualtop-kranking.
Forthispurpose,weexaminemeasurescommonlyusedforcomparingrankings,pointouttheshortcomingsofexistinganddeneanewsimilaritymeasurefortop-krankings,denotedasRSim.
4.
1ExistingSimilarityMeasuresTherstone,denotedasOSim(A,B)[4]indicatesthedegreeofoverlapbetweenthetop-kelementsoftwosetsAandB(eachoneofsizek):OSim(A,B)=|A∩B|k.
Thesecond,KSim(A,B)[4],isbasedonKendall'sdistancemeasure[3]andindicatesthedegreethattherelativeorderingsoftwotop-klistsareinagreement:KSim(A,B)=|(u,v):A,B,agreeinorder||A∪B|(|A∪B|1),whereAisanextensionofAresultingfromappendingatitstailtheelementsx∈A∪(BA)andBisdenedanalogously.
AnotherinterestingmeasureintroducedinInformationRetrievalforevaluat-ingtheaccumulatedrelevanceofatop-kdocumentlisttoaqueryisthe(Nor-malized)DiscountedCumulativeGain(N(DCG))[5].
Thismeasureassumesatop-klist,whereeachdocumentisfeaturedwitharelevancescoreaccumulatedbyscanningthelistfromtoptobottom.
AlthoughDCGcouldbeusedfortheevaluationofourpredictions,sinceittakesintoaccounttherelevanceofatop-klisttoanother,itexhibitssomebasicfeaturesthatpreventedusfromusingitinourexperiments.
Itpenalizeserrorsbymaintaininganincreasingvalueofcumulativerelevance.
Whilethisisbasedontherankofeachdocument,thesizekofthelistisnottakenintoaccount–thusthelengthofthelistisirrelevantinDCG.
Errorsintopranksofatop-klistshouldbeconsideredmoreimpor-tantthanerrorsinlow-rankedpositions.
ThisimportantfeaturelacksfrombothDCGandNDCGmeasures.
Moreover,DCGvalueforeachrankinthetop-klistiscomputedtakingintoaccountthepreviousvaluesinthelist.
Next,weintroduceSpearman'sRankCorrelationCoecient,whichwasusedduringtheexperimentalevaluation,consistsanon-parametric(distribution-free)rankstatisticproposedbySpearman(1904)measuringthestrengthofassoci-ationsbetweentwovariablesandisoftensymbolizedbyρ.
Itestimateshowwelltherelationshipbetweentwovariablescanbedescribedusingamonotonic246E.
Voudigarietal.
function.
Iftherearenorepeateddatavaluesofthesevariables(likeinrankingproblem),aperfectSpearmancorrelationof+1or-1existsifeachvariableisaperfectmonotonefunctionoftheother.
ItisoftenconfusedwiththePearsoncorrelationcoecientbetweenrankedvariables.
However,theprocedureusedtocalculateρismuchsimpler.
IfXandYaretwovariableswithcorrespondingranksxiandyi,di=xiyi,i=1,n,betweentheranksofeachobservationonthetwovariables,thenitisgivenby:ρ=16·ni=1d2in(n21).
4.
2RSimQualityMeasureTheobservedsimilaritymeasuresdonotcoversucientlythenegrainedre-quirementsarising,comparingtop-krankingsintheWebsearchcontext.
Soweneedanewsimilaritymetrictakingintoconsideration:a)TheabsolutedierencebetweenthepredictedandactualpositionforeachWebpageaslargedierenceindicatesalessaccuratepredictionandb)TheactualrankingpositionofaWebpage,becausefailingtopredictahighlyrankedWebpageismoreimportantthanalow-ranked.
Basedontheseobservations,weintroduceanewmeasure,namedRSim.
Everyinaccuratepredictionmadeincursacertainpenaltydependingonthetwonotedfactors.
Ifpredictionis100%accurate(samepredictedandactualrank),thepenaltyisequaltozero.
LetBibethepredictedrankpositionforpageiandAitheactual.
TheCumulativePenaltyScore(CPS)iscomputedasCPS(A,B)=ki=1|AiBi|·(k+1Ai).
TheproposedpenaltyscoreCPSrepresentstheoverallerror(dierence)be-tweentheinvolvedtop-klistsAandBandisproportionalto|AiBi|.
Theterm(k+1Ai)increaseswhenAibecomessmallersoerrorsinhighlyrankedWebpagesarepenalizedmore.
Inthebestcase,rankpredictionsforallWebpagesarecompletelyaccurate(CPS=0),sinceAi=Biforanyvalueofi.
Intheworstcase,therankpredictionsforallWebpagesnotonlyareinaccurate,butalsobearthegreatestCPSpenaltypossible.
Insuchascenario,alltheWebpagespredictedtobeinthetop-klist,actuallyholdthepositionk+1(orworse).
Assumingthatwewanttocomparetworankingsoflengthk,thenthemax-imumCPSforevenandoddvaluesofkisequalto2k3+3k2+k6.
TheproofforCPSmaxnalformisomittedduetospacelimitations.
Basedontheabovewedeneanewsimilaritymeasure,RSim,tocomparethesimilaritybetweentop-kranklistsasfollows:RSim(Ai,Bi)=1CPS(Ai,Bi)CPSmax(Ai,Bi).
(1)Inthebest-casepredictionscenario,RSimisequaltoone,whileintheworst-caseRSimisequaltozero.
SothecloserthevalueofRSimistoone,thebetterandmoreaccuratetherankpredictionsare.
AFrameworkforWebPageRankPrediction2470501001502002503000.
50.
550.
60.
650.
70.
750.
80.
850.
90.
951topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
550.
60.
650.
70.
750.
80.
85topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
50.
550.
60.
650.
70.
75topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
650.
70.
750.
80.
850.
90.
951topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
1.
PredictionaccuracyvsTop-klistlength-Yahoodataset5ExperimentalEvaluationInordertoevaluatetheeectivenessofourmethodsweperformedexperimentsontwodierentrealworlddatasets.
Theseconsistcollectionsoftop-krankedlistsfor22queriesoveraperiodof11daysasresultedfromtheYahoo!
1andtheGooglesearchengines,producedinthesameway.
Inourexperiments,weevaluatethepredictionqualityintermsofsimilaritiesbetweenthepredictedandtheactualtop-krankedlistsusingOSim,KSim,NDCG,SpearmancorrelationandthenovelsimilaritymeasureRSim.
5.
1DatasetsandQuerySelectionForeachdataset(YahooandGoogle)awealthofsnapshotswereavailable,en-suringwehaveenoughevolutiontotestourapproach.
Aconcisedescriptionofeachdatasetandquery-basedapproachfollow.
TheYahooandGoogledatasetsconsistof11consecutivedailytop-1000rankedlistscomputedusingtheYa-hooSearchWebServices2andtheGoogleSearchenginerespectively.
Thesesetswerepickedfrompopular:a)queriesappearedinGoogleTrends3andb)currentqueries(i.
e.
euro2008orOlympicgames2008).
5.
2ExperimentalMethodologyWecomparedallpredictionsamongthevariousapproachesandwenextdescribethestepsassumedforbothdatasets.
Atrst,wecomputedPageRankscoresfor1http://search.
yahoo.
com2http://developer.
yahoo.
com/search/3http://www.
google.
com/trends248E.
Voudigarietal.
0501001502002503000.
20.
30.
40.
50.
60.
70.
80.
91topkOSimMM1MM2MM3LinRegBayesMod(a)OSim0501001502002503000.
50.
550.
60.
650.
70.
750.
8topkKSimMM1MM2MM3LinRegBayesMod(b)KSim0501001502002503000.
30.
40.
50.
60.
70.
8topkRSimMM1MM2MM3LinRegBayesMod(c)RSim0501001502002503000.
40.
50.
60.
70.
80.
91topknDCGMM1MM2MM3LinRegBayesMod(d)NDCG0501001502002503000.
650.
70.
750.
80.
850.
90.
951topkSpearmanCorrelationMM1MM2MM3LinRegBayesMod(e)SpearmanCorrelationFig.
2.
PredictionaccuracyvsTop-klistlength-Googledataseteachsnapshotofourdatasetsandobtainedthetop-krankingsusingthescoringfunctionmentioned.
Havingcomputedthescores,wecalculatedthenrank(racervaluesforMMs)foreachpairofconsecutivegraphsnapshotsandstoredtheminamatrixnrank(racer)*time.
Then,assuminganm-pathofconsecutivesnapshots,wepredictthem+1state.
Foreachpagep,wepredictarankingcomparingittoactualbya10-foldcrossvalidationprocess(training90%ofdatasetandtestingontheremaining10%).
InthecaseoftheEMapproach,wetestedthequalityofclusteringresultsforclusterscardinalitybetween2and10foreachqueryandchosetheonethatmaximizedtheoverallqualityofclustering.
Thiswasdenedasamonotonecombinationofwithin-clusterwc(sumofsquareddistancesfromeachpointtothecenterofclusteritbelongsto)andbetween-clustervariationbc(distancebetweenclustercenters).
Asscorefunctionofclustering,weconsideredtheratiobc/wc.
5.
3ExperimentalResultsRegardingtheGoogleandYahoo!
datasetresultscomingoutoftheexperimen-talevaluation,onecanseethattheMMsprevailwithveryaccurateresults.
Regressionbasedtechniques(LinReg)reachandoutweighMMsperformanceasthelengthoftop-klistincreasesprovingtheirrobustness.
InbothdatasetsexperimentsprovethesuperiorityofEMapproach(BayesMod)whoseperformanceisverysatisfyingforallsimilaritymeasures.
TheMMscomenextintheevaluationranking,whereassmallertheorderisthebetteristhepredictionaccuracy,thoughonewouldthinkofthecontrary.
AFrameworkforWebPageRankPrediction249Obviously(gures)theproposedframeworkoersincrediblyhighaccuracypredictionsandisveryencouraging,asitrangessystematicallybetween70%and100%providingatoolforeectivepredictions.
6ConclusionsWehavedescribedpredictorlearningalgorithmsforWebpagerankpredictionbasedonaframeworkoflearningtechniques(MMs,LinReg,BayesMod)andexperimentalstudyshowedthattheycanachieveoverallverygoodpredictionperformance.
Furtherworkwillfocusinthefollowingissues:a)Multi-featureprediction:weintendtodealwiththeinternalmechanismthatproducestherankingofpages(notonlyrankvalues)basedonmultiplefeatures,b)Combi-nationofsuchmethodswithdimensionalityreductiontechniques.
References1.
Kemeny,J.
G.
,Snell,J.
L.
:FiniteMarkovChains.
Prinston(1963)2.
Dempster,A.
P.
,Laird,N.
M.
,Rubin,D.
B.
:MaximumLikelihoodfromIncompleteDataviatheEMAlgorithm.
RoyalStatisticalSociety39,1–38(1977)3.
Kendall,M.
G.
,Gibbons,J.
D.
:RankCorrelationMethods.
CharlesGrin,UK(1990)4.
Haveliwala,T.
H.
:Topic-SensitivePageRank.
In:Proc.
WWW(2002)5.
Jarvelin,K.
,Kekalainen,J.
:Cumulatedgain-basedevaluationofIRtechniques.
TOIS20,422–446(2002)6.
Chien,S.
,Dwork,C.
,Kumar,R.
,Simon,D.
R.
,Sivakumar,D.
:LinkEvolu-tion:AnalysisandAlgorithms.
InternetMathematics1,277–304(2003)7.
Chen,Y.
-Y.
,Gan,Q.
,Suel,T.
:LocalMethodsforEstimatingPageRankValues.
In:Proc.
ofCIKM(2004)8.
Langville,A.
N.
,Meyer,C.
D.
:UpdatingPageRankwithiterativeaggregation.
In:Proc.
ofthe13thInternationalWorldWideWebConferenceonAlternateTrackPapersandPosters,pp.
392–393(2004)9.
Kan,M.
-Y.
,Thi,H.
O.
:FastwebpageclassicationusingURLfeatures.
In:Confer-enceonInformationandKnowledgeManagement,pp.
325–326.
ACM,NewYork(2005)10.
Yang,H.
,King,I.
,Lu,M.
R.
:PredictiveRanking:ANovelPageRankingApproachbyEstimatingtheWebStructure.
In:Proc.
ofthe14thInternationalWWWCon-ference,pp.
1825–1832(2005)11.
Broder,A.
Z.
,Lempel,R.
,Maghoul,F.
,Pedersen,J.
:EcientPageRankapproxi-mationviagraphaggregation.
Inf.
Retrieval9,123–138(2006)12.
Vlachou,A.
,Berberich,K.
,Vazirgiannis,M.
:Representingandquantifyingrank-changefortheWebgraph.
In:Aiello,W.
,Broder,A.
,Janssen,J.
,Milios,E.
E.
(eds.
)WAW2006.
LNCS,vol.
4936,pp.
157–165.
Springer,Heidelberg(2008)13.
Vazirgiannis,M.
,Drosos,D.
,Senellart,P.
,Vlachou,A.
:WebPageRankPredictionwithMarkovModels.
WWWposter(2008)14.
Sayyadi,H.
,Getoor,L.
:FutureRank:RankingScienticArticlesbyPredictingtheirFuturePageRank.
In:SIAMIntern.
Confer.
onDataMining,pp.
533–544(2009)15.
Zacharouli,P.
,Titsias,M.
,Vazirgiannis,M.
:WebpagerankpredictionwithPCAandEMclustering.
In:Avrachenkov,K.
,Donato,D.
,Litvak,N.
(eds.
)WAW2009.
LNCS,vol.
5427,pp.
104–115.
Springer,Heidelberg(2009)

金山云:618年中促销,企业云服务器2核4G仅401.28元/年,827.64元/3年

金山云618年中促销活动正在进行中!金山云针对企业级新用户优惠力度比普通个人用户优惠力度要大,所以我们也是推荐企业新用户身份购买金山云企业级云服务器,尽量购买3年配置的,而不是限时秒杀活动中1年的机型。企业级用户购买金山云服务器推荐企业专区:云服务器N3 2核4G云服务器,1-5M带宽,827.64元/3年,性价比高,性能稳定!点击进入:金山云618年中促销活动目前,金山云基础型E1云服务器2核4...

georgedatacenter39美元/月$20/年/洛杉矶独立服务器美国VPS/可选洛杉矶/芝加哥/纽约/达拉斯机房/

georgedatacenter这次其实是两个促销,一是促销一款特价洛杉矶E3-1220 V5独服,性价比其实最高;另外还促销三款特价vps,georgedatacenter是一家成立于2019年的美国VPS商家,主营美国洛杉矶、芝加哥、达拉斯、新泽西、西雅图机房的VPS、邮件服务器和托管独立服务器业务。georgedatacenter的VPS采用KVM和VMware虚拟化,可以选择windows...

美国云服务器 1核 1G 30M 50元/季 兆赫云

【双十二】兆赫云:全场vps季付六折优惠,低至50元/季,1H/1G/30M/20G数据盘/500G流量/洛杉矶联通9929商家简介:兆赫云是一家国人商家,成立2020年,主要业务是美西洛杉矶联通9929线路VPS,提供虚拟主机、VPS和独立服务器。VPS采用KVM虚拟架构,线路优质,延迟低,稳定性强。是不是觉得黑五折扣力度不够大?还在犹豫徘徊中?这次为了提前庆祝双十二,特价推出全场季付六折优惠。...

pagerank为你推荐
新iphone也将禁售现在2017年iPhone6s还有多久会被淘汰支付宝调整还款日支付宝还款日期可以更改吗?dell服务器bios设置dell R410服务器 bios设置参数如何恢复出厂设置?360防火墙在哪里电脑或电脑360有联网防火墙吗,在哪里设置ipad代理想买个ipad买几代性价比比较高刚刚网刚刚在网上认识了一个女孩子,不是很了解她,就跟她表白了。我爱试用网电信爱玩4G定向流量包开通需要交费吗如何发帖子网上怎么发帖子?最土团购程序团购网真实吗,流程是什么?帖子标题百度贴吧里帖子标题后面的“(共xxx贴)”和此张贴子的楼层数有何区别?两者的数值并不一样。
2019年感恩节 187邮箱 高防dns westhost raksmart ixwebhosting 轻博 lighttpd 镇江联通宽带 美国十次啦服务器 howfile 泉州电信 东莞数据中心 南通服务器 免费美国空间 免费申请个人网站 美国网站服务器 免费蓝钻 测速电信 免费网络空间 更多