The MultiRank Bootstrap Algorithm: Semi-Supervised Political Blog Classification and Ranking Using Semi-Supervised Link Classification

Frank Lin and William W. Cohen
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
{frank,wcohen}@cs.cmu.edu

Abstract

We present a new semi-supervised learning algorithm for classifying political blogs in a blog network and ranking them within predicted classes. We test our algorithm on two datasets and achieve classification accuracy of 81.9% and 84.6% using only 2 seed blogs.

Introduction

We propose a novel algorithm that both classifies political blogs and ranks the blogs within the predicted class. We see a link to a blog of a certain political faction as a link that endorses that faction. In predicting the link label, we exploit a linking property found in the political blogosphere: blogs with similar political leaning tend to link to each other (Adamic & Glance 2005). We bootstrap the classification of the blogs and the links, and the ranking of the blogs, by propagating political leaning from an initial set of known seed nodes. We show that our algorithm achieves high classification accuracy when applied to networks of liberal and conservative political blogs using very few seeds.

Proposed Algorithm

PageRank (Page et al. 1998) is widely used to determine the importance or authority of a web site. However, different communities of users might attach different degrees of authority to the same site. This suggests assessing authority with an extended version of PageRank, in which every web site (and every inter-site link) is associated with a community, and authority scores propagate only within a community. In the context of political blogs, each blog and each hyperlink would be assigned to a particular faction (e.g., liberal or conservative); below we describe a method for assigning blogs to factions given a small set of seeds.

Copyright (c) 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

To assess a faction-specific measure of authority, we define MultiRank as follows:

$$r_f = (1 - d)\,u + d\,W^f r_f \qquad (1)$$

where $W^f_{ij}$ is $W_{ij}$ if the edge from $i$ to $j$ is in $E_f$, and zero otherwise; $u$ is the uniform personalization vector with $u_i = 1/|V|$; and $d$ is a constant damping factor. In this equation, $r_f$ can be seen as the probability of a random walk on $G$ if we only follow edges that belong to faction $f$. In the context of a political blog network, we can see this as the probability of a liberal/conservative blog surfer randomly clicking on links pointing to liberal/conservative blogs.
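
As a rough illustration, Equation 1 can be iterated to a fixed point as in the Python sketch below. The sketch is not taken from the paper and rests on several assumptions: that $W$ spreads each blog's score evenly over all of its out-links in $E$, that score flows from the linking blog to the linked blog, and that score on links outside $E_f$ is simply dropped (so the entries of $r_f$ need not sum to one). The function name `multirank`, its parameters, and the toy network are hypothetical.

```python
def multirank(nodes, edges, faction_edges, d=0.85, tol=1e-12, max_iter=200):
    """Iterate r_f = (1 - d) * u + d * W^f r_f until it stops changing.

    Assumption: W spreads each blog's score evenly over ALL of its out-links
    in E, and W^f keeps only the links in E_f, so score that would flow over
    other factions' links is dropped.
    """
    out_degree = {v: 0 for v in nodes}
    for src, _dst in edges:
        out_degree[src] += 1
    u = 1.0 / len(nodes)                      # uniform personalization entry
    r = {v: u for v in nodes}
    for _ in range(max_iter):
        nxt = {v: (1.0 - d) * u for v in nodes}
        for src, dst in faction_edges:        # only edges labeled with faction f
            nxt[dst] += d * r[src] / out_degree[src]
        delta = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if delta < tol:
            break
    return r


# Hypothetical toy network: two liberal blogs, two conservative blogs,
# and one cross-faction link.
nodes = ["lib1", "lib2", "con1", "con2"]
edges = [("lib1", "lib2"), ("lib2", "lib1"),
         ("con1", "con2"), ("con2", "con1"), ("lib1", "con1")]
liberal_edges = [("lib1", "lib2"), ("lib2", "lib1")]

r_liberal = multirank(nodes, edges, liberal_edges)
for blog in sorted(nodes, key=r_liberal.get, reverse=True):
    print(blog, round(r_liberal[blog], 4))
```

In this toy network the conservative blogs receive no score over liberal edges, so their liberal rank stays at the baseline $(1 - d)/|V|$, while the two liberal blogs reinforce each other.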

In order to calculate $r_f$, we need $E_f$. We propose an iterative bootstrapping algorithm, shown in Figure 1, that gradually expands the set of edges $E_f$ from a set of initial seed nodes $S$ until every edge in the entire graph has been labeled.

Input: A graph G = (V, E); a set of seed nodes S; an edge expansion metric M(G, f) on the graph that returns a set of previously unlabeled edges and labels them f
Output: Ranking vectors r_f, f = 1...n, where each f corresponds to a faction
Algorithm:
    initialize E_f using S
    while |∪_{f=1...n} E_f| ≠ |E| do
        e ← infinity
        while e > 0
            r_f ← MultiRank(G, E_f)  ∀f
            label(v) ← argmax_f r_f(v)  ∀v ∈ V
            E'_f ← {e(x→v) ∈ E : label(v) = f}  ∀f
            e ← |E'_f − E_f|
            E_f ← E'_f  ∀f
        E_f ← E_f ∪ M(G, f)  ∀f

Figure 1: The MultiRank bootstrap algorithm (Exploratory Phase)

We tried two expansion metrics. The first metric simply labels all currently unlabeled edges neighboring currently labeled edges with the same label as the common endpoint. The second metric is the same, except that we control the expansion by limiting it to $n$ unlabeled edges incident to the nodes with the highest combined ranking $\sum_f r_f(v)$, where $n$ is the number of nodes incident to labeled edges. We refer to the first metric as infinite expansion and the second as controlled expansion.

After the algorithm converges, we can classify the edges according to $E_f$, rank the nodes within factions according to $r_f$, and classify the nodes according to $\arg\max_f r_f(v)$.
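
The exploratory phase with infinite expansion might be sketched in Python as follows. This is one reading of Figure 1, not the authors' implementation, and the paper leaves several details open that the sketch fills with labeled assumptions: $E_f$ is initialized with the links pointing at seeds of faction $f$; the inner loop relabels only edges that are already labeled; a node whose scores are tied across factions stays unlabeled; seeds keep their given label; and when both endpoints of an unlabeled edge qualify, the target's label is preferred. The names `multirank`, `bootstrap`, `max_inner`, and the toy network are hypothetical. Controlled expansion is not shown.

```python
def multirank(nodes, edges, faction_edges, d=0.85, iters=50):
    """Fixed-iteration version of Equation 1 (scores flow along E_f only)."""
    out_degree = {v: 0 for v in nodes}
    for src, _dst in edges:
        out_degree[src] += 1
    u = 1.0 / len(nodes)
    r = {v: u for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) * u for v in nodes}
        for src, dst in faction_edges:
            nxt[dst] += d * r[src] / out_degree[src]
        r = nxt
    return r


def bootstrap(nodes, edges, seeds, max_inner=20):
    """Exploratory phase, infinite expansion: returns (ranks, label, edge_label)."""
    factions = sorted(set(seeds.values()))
    # Assumption: E_f starts as the links that point at a seed of faction f.
    edge_label = {e: seeds[e[1]] for e in edges if e[1] in seeds}

    def rank_and_label():
        ranks = {f: multirank(nodes, edges,
                              [e for e, lab in edge_label.items() if lab == f])
                 for f in factions}
        label = {}
        for v in nodes:
            best = max(factions, key=lambda f: ranks[f][v])
            tied = sum(ranks[f][v] == ranks[best][v] for f in factions) > 1
            label[v] = None if tied else best   # assumption: ties stay unlabeled
        label.update(seeds)                     # assumption: seeds keep their label
        return ranks, label

    while True:
        # Inner loop of Figure 1: relabel until no labeled edge changes.
        for _ in range(max_inner):              # guard against cycling
            ranks, label = rank_and_label()
            changed = [e for e in edge_label
                       if label[e[1]] is not None and label[e[1]] != edge_label[e]]
            for e in changed:
                edge_label[e] = label[e[1]]
            if not changed:
                break
        # Infinite expansion: label unlabeled edges that share an endpoint
        # with a labeled edge, using that endpoint's label.
        touched = {v for e in edge_label for v in e}
        grown = {}
        for x, y in edges:
            if (x, y) in edge_label:
                continue
            if y in touched and label[y] is not None:
                grown[(x, y)] = label[y]        # assumption: prefer the target
            elif x in touched and label[x] is not None:
                grown[(x, y)] = label[x]
        if not grown:                           # nothing left to expand
            break
        edge_label.update(grown)
    ranks, label = rank_and_label()
    return ranks, label, edge_label


# Hypothetical toy network: two tightly linked triangles and one cross link.
nodes = ["l1", "l2", "l3", "c1", "c2", "c3"]
edges = [("l1", "l2"), ("l2", "l1"), ("l2", "l3"), ("l3", "l1"), ("l3", "l2"),
         ("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c1"), ("c3", "c2"),
         ("l3", "c3")]
seeds = {"l1": "liberal", "c1": "conservative"}

ranks, label, edge_label = bootstrap(nodes, edges, seeds)
print(label)
```

With one seed per faction, the labels spread outward one hop per expansion round until every edge in the toy network is labeled; the single cross link ends up with the label of the blog it points to.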

Kale dataset
         Infinite Expansion                 Controlled Expansion
         Exploratory      Settling          Exploratory      Settling
Seeds    Vertex  Edge     Vertex  Edge      Vertex  Edge     Vertex  Edge
2        0.641   0.763    0.819   0.968     0.787   0.898    0.804   0.952
4        0.698   0.876    0.804   0.952     0.770   0.912    0.819   0.968
8        0.703   0.894    0.804   0.952     0.785   0.949    0.819   0.968
12       0.700   0.893    0.804   0.952     0.827   0.953    0.804   0.952
16       0.728   0.917    0.804   0.952     0.824   0.953    0.804   0.952
20       0.757   0.952    0.807   0.966     0.780   0.959    0.804   0.965

Adamic dataset
         Infinite Expansion                 Controlled Expansion
         Exploratory      Settling          Exploratory      Settling
Seeds    Vertex  Edge     Vertex  Edge      Vertex  Edge     Vertex  Edge
2        0.700   0.835    0.846   0.978     0.593   0.776    0.845   0.977
4        0.744   0.888    0.849   0.978     0.614   0.770    0.848   0.978
6        0.745   0.892    0.849   0.978     0.797   0.887    0.854   0.978
10       0.736   0.880    0.849   0.978     0.727   0.872    0.849   0.978
20       0.731   0.889    0.847   0.977     0.743   0.916    0.849   0.978
40       0.708   0.909    0.846   0.977     0.760   0.945    0.849   0.978

Table 1: Blog (Vertex) and link (Edge) classification accuracy on the Kale and Adamic datasets

We also present a second, optional phase to the algorithm that may further improve the output of the first phase.
We will refer to the original algorithm shown in Figure 1 as the exploratory phase and to the second, extension algorithm as the settling phase. The settling phase again exploits the linking property found in political blog networks: blogs are more likely to link to blogs of the same political faction. First, we find all the nodes where the majority of the neighbors are of a different faction. We then change the labels of each such node's incoming edges to the majority neighbor faction and run the MultiRank algorithm on the modified graph. This is repeated until the algorithm converges, which happens when a) there are no more changes in edge labeling or b) the algorithm revisits an old state due to cycling changes.
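
A minimal Python sketch of the settling phase, under stated assumptions, is given below. It is not the authors' code. The assumptions are: a node's neighbors are the blogs it links to or is linked from, "majority" means a strict majority of those neighbors, node labels are recomputed with MultiRank at the start of every round, and ties in the node label go to the first faction in sorted order. The names `multirank`, `classify`, `settle`, and `max_rounds` are hypothetical, as is the toy network, in which one edge starts out mislabeled.

```python
from collections import Counter


def multirank(nodes, edges, faction_edges, d=0.85, iters=50):
    """Fixed-iteration version of Equation 1 (scores flow along E_f only)."""
    out_degree = {v: 0 for v in nodes}
    for src, _dst in edges:
        out_degree[src] += 1
    u = 1.0 / len(nodes)
    r = {v: u for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) * u for v in nodes}
        for src, dst in faction_edges:
            nxt[dst] += d * r[src] / out_degree[src]
        r = nxt
    return r


def classify(nodes, edges, edge_label, factions):
    """Label each node with argmax_f r_f(v); ties go to the first faction."""
    ranks = {f: multirank(nodes, edges, [e for e in edges if edge_label[e] == f])
             for f in factions}
    return {v: max(factions, key=lambda f: ranks[f][v]) for v in nodes}


def settle(nodes, edges, edge_label, max_rounds=100):
    """Relabel incoming edges of nodes that disagree with most neighbors."""
    edge_label = dict(edge_label)               # do not modify the caller's dict
    factions = sorted(set(edge_label.values()))
    neighbors = {v: set() for v in nodes}
    for x, y in edges:
        neighbors[x].add(y)
        neighbors[y].add(x)
    seen = set()
    for _ in range(max_rounds):
        state = tuple(sorted(edge_label.items()))
        if state in seen:                       # stop b): an old state recurs
            break
        seen.add(state)
        label = classify(nodes, edges, edge_label, factions)
        changed = False
        for v in nodes:
            votes = Counter(label[n] for n in neighbors[v])
            if not votes:
                continue
            top, count = votes.most_common(1)[0]
            if top != label[v] and 2 * count > len(neighbors[v]):
                for e in edges:
                    if e[1] == v and edge_label[e] != top:
                        edge_label[e] = top
                        changed = True
        if not changed:                         # stop a): labeling is stable
            break
    return edge_label, classify(nodes, edges, edge_label, factions)


# Hypothetical toy network; the edge l2 -> l3 starts out mislabeled.
nodes = ["l1", "l2", "l3", "c1", "c2", "c3"]
edges = [("l1", "l2"), ("l2", "l1"), ("l2", "l3"), ("l3", "l1"), ("l3", "l2"),
         ("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c1"), ("c3", "c2"),
         ("l3", "c3")]
initial = {e: ("liberal" if e[1].startswith("l") else "conservative") for e in edges}
initial[("l2", "l3")] = "conservative"

settled, final_label = settle(nodes, edges, initial)
print(settled[("l2", "l3")], final_label["l3"])
```

In the toy run, blog `l3` is first classified by its only incoming edge, but two of its three neighbors belong to the other faction, so that edge is relabeled and the next round finds nothing left to change.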

Experiments and Discussions

To assess the effectiveness of our algorithm, we tested it on two datasets. The first dataset is constructed in the same way as described in (Kale et al. 2007), where we ended up with a graph of 404 connected blogs. We will refer to this as the Kale dataset. The second dataset is constructed by simply creating a graph from (Adamic & Glance 2005) and taking the largest connected component. This dataset contains 1222 connected blogs, and we refer to it as the Adamic dataset. It should be pointed out that the dataset labeling is not 100% accurate, as noted in (Adamic & Glance 2005).

We run our algorithm on the two datasets varying three parameters: the number of seed nodes, the expansion metric, and the inclusion or exclusion of the optional "settling phase." In all our experiments, we pick seeds according to the top n PageRanked blogs, n/2 per faction. In all instances of the MultiRank algorithm the damping factor d is set to 0.85, a popular choice of damping factor which we borrowed without further tuning.
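
The seed selection rule can be read as "take the n/2 highest-PageRank blogs of each faction". A small Python sketch of that reading follows; it assumes two factions and precomputed PageRank scores, and the function name `pick_seeds` and the example scores are hypothetical rather than taken from either dataset.

```python
def pick_seeds(pagerank, faction, n):
    """Return {blog: faction} for the n // 2 top-PageRank blogs per faction."""
    per_faction = n // 2                        # assumes exactly two factions
    seeds = {}
    for f in sorted(set(faction.values())):
        members = [v for v in pagerank if faction[v] == f]
        members.sort(key=lambda v: pagerank[v], reverse=True)
        for v in members[:per_faction]:
            seeds[v] = f
    return seeds


# Hypothetical scores and labels, for illustration only.
pagerank = {"a": 0.30, "b": 0.20, "c": 0.25, "d": 0.15, "e": 0.10}
faction = {"a": "liberal", "b": "liberal",
           "c": "conservative", "d": "conservative", "e": "conservative"}

seeds = pick_seeds(pagerank, faction, n=2)
print(seeds)
```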

We point out some observations on the effect of the three variables. First, inclusion of the optional settling phase tends to improve upon the results of the exploratory phase, up to an almost constant point regardless of the number of seeds. The exception is controlled expansion with 12 and 16 seeds on the Kale dataset, where the settling phase actually hurt performance. Second, increasing the number of seeds improves the performance of the exploratory phase, but not with the addition of the settling phase, which works surprisingly well with only two seeds. Third, in general, controlling the expansion seems to help classification accuracy.

Another interesting property of this algorithm is that most classification errors are made on blogs with lower PageRank. If blogs are ordered by PageRank, the error rate on the top quartile of blogs is 0.05, while the error rate on the bottom quartile is 0.45 (data not shown due to space limitations).
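
A per-quartile error rate of this kind could be computed as in the following Python sketch. The function name `error_rate_by_quartile` and the eight-blog example are hypothetical and do not reproduce the paper's data; the sketch assumes at least four blogs so that no quartile is empty.

```python
def error_rate_by_quartile(pagerank, truth, predicted):
    """Error rate per PageRank quartile, from highest-ranked to lowest."""
    ordered = sorted(pagerank, key=pagerank.get, reverse=True)
    n = len(ordered)
    rates = []
    for q in range(4):
        chunk = ordered[q * n // 4:(q + 1) * n // 4]
        errors = sum(truth[v] != predicted[v] for v in chunk)
        rates.append(errors / len(chunk))
    return rates


# Hypothetical example: eight blogs, one misclassified low-PageRank blog.
blogs = ["b%d" % i for i in range(8)]
pagerank = {b: 1.0 / (i + 1) for i, b in enumerate(blogs)}
truth = {b: ("liberal" if i % 2 == 0 else "conservative")
         for i, b in enumerate(blogs)}
predicted = dict(truth)
predicted["b7"] = "liberal"                     # the single error

rates = error_rate_by_quartile(pagerank, truth, predicted)
print(rates)
```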

Conclusions

We have introduced a new semi-supervised algorithm for simultaneously classifying and ranking political blogs based on link structure. We showed that this algorithm requires very few initial seeds to achieve performance above 80% on two political blog datasets of different size and link structure. In terms of classification accuracy, this algorithm tends to favor more authoritative blogs.

References

Adamic, L., and Glance, N. 2005. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

Kale, A.; Karandikar, A.; Kolari, P.; Java, A.; Finin, T.; and Joshi, A. 2007. Modeling trust and influence in the blogosphere using link polarity. In ICWSM 2007.

Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.