The MultiRank Bootstrap Algorithm: Semi-Supervised Political Blog Classification and Ranking Using Semi-Supervised Link Classification

Frank Lin and William W. Cohen
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
{frank, wcohen}@cs.cmu.edu

Abstract

We present a new semi-supervised learning algorithm for classifying political blogs in a blog network and ranking them within predicted classes. We test our algorithm on two datasets and achieve classification accuracy of 81.9% and 84.6% using only 2 seed blogs.

Introduction

We propose a novel algorithm that both classifies political blogs and ranks the blogs within the predicted class. We see a link to a blog of a certain political faction as a link that endorses that faction. In predicting the link label, we exploit a linking property found in the political blogosphere: blogs with similar political leaning tend to link to each other (Adamic & Glance 2005). We bootstrap the classification of the blogs and the links and the ranking of the blogs by propagating political leaning from an initial set of known seed nodes. We show that our algorithm achieves high classification accuracy when applied to networks of liberal and conservative political blogs using very few seeds.

Proposed Algorithm

PageRank (Page et al. 1998) is widely used to determine the importance or authority of a website. However, different communities of users might attach different degrees of authority to the same site. This suggests assessing authority with an extended version of PageRank, in which every website (and every inter-site link) is associated with a different community, and authority scores propagate only within a community. In the context of political blogs, each blog and each hyperlink would be assigned to a particular faction (e.g., liberal or conservative); below we will describe a method for assigning blogs to factions given a small set of seeds.

To assess a faction-specific measure of authority, we define MultiRank as follows:

$$r_f = (1 - d)\,u + d\,W_f\,r_f \tag{1}$$

where $(W_f)_{ij}$ is $W_{ij}$ if the edge from $i$ to $j$ is in $E_f$, otherwise zero; $u$ is the uniform personalization vector with $u_i = 1/|V|$; and $d$ is a constant damping factor. In this equation, $r_f$ can be seen as the probability of a random walk on $G$ if we only follow edges that belong to faction $f$. In the context of a political blog network, we can see this as the probability of a liberal/conservative blog surfer randomly clicking on links pointing to liberal/conservative blogs.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In order to calculate $r_f$, we need $E_f$. We propose an iterative bootstrapping algorithm, shown in Figure 1, to gradually expand the set of edges $E_f$ from a set of initial seed nodes $S$ until every edge in the entire graph has been labeled.
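
Given an edge set $E_f$, Equation 1 can be approximated by fixed-point iteration. The Python sketch below is illustrative only. It assumes that $W_{ij} = 1/\mathrm{outdegree}(i)$ for each edge from $i$ to $j$ (with out-degree counted over the full edge set $E$) and that scores flow along the edge direction; the text above specifies neither. The name `multirank` and its parameters are hypothetical.

```python
def multirank(nodes, edges, faction_edges, d=0.85, iters=100, tol=1e-10):
    """Approximate r_f = (1 - d) u + d W_f r_f by fixed-point iteration.

    nodes:         list of hashable node ids (V)
    edges:         list of unique (src, dst) pairs (E)
    faction_edges: set of (src, dst) pairs, the subset E_f
    Assumption: W_ij = 1 / outdegree(i), with outdegree taken over E.
    """
    n = len(nodes)
    out_degree = {v: 0 for v in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    u = 1.0 / n                      # uniform personalization entry
    r = {v: u for v in nodes}        # initial scores
    for _ in range(iters):
        nxt = {v: (1.0 - d) * u for v in nodes}
        # Only edges in E_f carry score; all other entries of W_f are zero.
        for src, dst in faction_edges:
            nxt[dst] += d * r[src] / out_degree[src]
        delta = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if delta < tol:
            break
    return r
```

Because edges outside $E_f$ are zeroed rather than renormalized, the scores in this sketch generally do not sum to one; they are only compared across factions and across nodes.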

Input: A graph G = (V, E), a set of seed nodes S, and an edge expansion metric on the graph M(G, f) that returns a set of previously unlabeled edges and labels them f
Output: Ranking vectors r_f, f = 1..n, where each f corresponds to a faction
Algorithm:
  initialize E_f using S
  while |∪_{f=1..n} E_f| ≠ |E| do
    – e ← infinity
    – while e > 0
        r_f ← MultiRank(G, E_f)  ∀f
        label(v) ← argmax_f r_f(v)  ∀v ∈ V
        E'_f ← {e(x→v) ∈ E : label(v) = f}  ∀f
        e ← |E'_f \ E_f|
        E_f ← E'_f  ∀f
    – E_f ← E_f ∪ M(G, f)  ∀f

Figure 1: The MultiRank bootstrap algorithm (Exploratory Phase)

We tried two expansion metrics. The first metric simply labels all currently unlabeled edges neighboring currently labeled edges with the same label as the common endpoint.
The second metric is the same, except that we control the expansion by limiting it to $n$ unlabeled edges incident to the nodes with the highest combined ranking $\sum_f r_f(v)$, where $n$ is the number of nodes incident to labeled edges. We refer to the first metric as infinite expansion and the second as controlled expansion. After the algorithm converges, we can classify the edges according to $E_f$, rank the nodes within factions according to $r_f$, and classify the nodes according to $\arg\max_f r_f(v)$.
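
The two expansion metrics can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: when both endpoints of an unlabeled edge touch labeled edges, the sketch takes the label of the target endpoint, and ties in combined ranking are broken by input order. All function and parameter names are hypothetical; `node_label` and `combined_rank` are assumed to have been computed beforehand from the ranking vectors.

```python
def labeled_nodes(edge_label):
    """Return the nodes incident to at least one labeled edge."""
    return {v for edge in edge_label for v in edge}


def infinite_expansion(edges, edge_label, node_label):
    """Label every unlabeled edge that shares an endpoint with a labeled edge.

    The new label is the label of the common endpoint. Assumption: if both
    endpoints qualify, the target endpoint decides.
    """
    touched = labeled_nodes(edge_label)
    new_labels = {}
    for src, dst in edges:
        if (src, dst) in edge_label:
            continue  # already labeled
        if dst in touched:
            new_labels[(src, dst)] = node_label[dst]
        elif src in touched:
            new_labels[(src, dst)] = node_label[src]
    return new_labels


def controlled_expansion(edges, edge_label, node_label, combined_rank):
    """Same candidates as infinite expansion, limited to n edges, where n is
    the number of nodes incident to labeled edges. Candidates are ordered by
    the combined ranking (sum over factions of r_f) of their common endpoint.
    """
    touched = labeled_nodes(edge_label)
    candidates = infinite_expansion(edges, edge_label, node_label)

    def rank_of_common_endpoint(edge):
        src, dst = edge
        return combined_rank[dst if dst in touched else src]

    ranked = sorted(candidates, key=rank_of_common_endpoint, reverse=True)
    return {edge: candidates[edge] for edge in ranked[:len(touched)]}
```

Both functions return only the newly labeled edges, so a caller would merge the result into its existing edge labeling before the next round of ranking.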

Kale dataset
         Infinite Expansion                Controlled Expansion
         Exploratory      Settling         Exploratory      Settling
Seeds    Vertex  Edge     Vertex  Edge     Vertex  Edge     Vertex  Edge
2        0.641   0.763    0.819   0.968    0.787   0.898    0.804   0.952
4        0.698   0.876    0.804   0.952    0.770   0.912    0.819   0.968
8        0.703   0.894    0.804   0.952    0.785   0.949    0.819   0.968
12       0.700   0.893    0.804   0.952    0.827   0.953    0.804   0.952
16       0.728   0.917    0.804   0.952    0.824   0.953    0.804   0.952
20       0.757   0.952    0.807   0.966    0.780   0.959    0.804   0.965

Adamic dataset
         Infinite Expansion                Controlled Expansion
         Exploratory      Settling         Exploratory      Settling
Seeds    Vertex  Edge     Vertex  Edge     Vertex  Edge     Vertex  Edge
2        0.700   0.835    0.846   0.978    0.593   0.776    0.845   0.977
4        0.744   0.888    0.849   0.978    0.614   0.770    0.848   0.978
6        0.745   0.892    0.849   0.978    0.797   0.887    0.854   0.978
10       0.736   0.880    0.849   0.978    0.727   0.872    0.849   0.978
20       0.731   0.889    0.847   0.977    0.743   0.916    0.849   0.978
40       0.708   0.909    0.846   0.977    0.760   0.945    0.849   0.978

Table 1: Blog (Vertex) and link (Edge) classification accuracy on the Kale and Adamic datasets

We also present a second, optional phase to the algorithm that may further improve the output of the first phase.
We will refer to the original algorithm shown in Figure 1 as the exploratory phase and to the second, extension algorithm as the settling phase. The settling phase again exploits the linking property found in political blog networks: blogs are more likely to link to blogs of the same political faction. First, we find all the nodes where the majority of the neighbors are of a different faction; for each such node, we change the labeling of its incoming edges to the majority neighbor faction and run the MultiRank algorithm on the modified graph. This is repeated until the algorithm converges, which happens when (a) there are no more changes in edge labeling or (b) the algorithm revisits an old state due to cycling changes.
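
The settling phase described above might be sketched in Python as follows. Several details are assumptions made for illustration: "majority" is read as a strict majority of a node's distinct neighbors (in either link direction), all nodes are updated simultaneously within a pass, and `relabel_nodes` is a hypothetical callback standing in for "run MultiRank on the modified graph and take the argmax over factions".

```python
from collections import Counter


def settle_once(edges, edge_label, node_label):
    """Return a copy of edge_label after one settling pass."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    updated = dict(edge_label)
    for v, nbrs in neighbors.items():
        faction, votes = Counter(node_label[w] for w in nbrs).most_common(1)[0]
        # Strict majority of neighbors belongs to a different faction.
        if faction != node_label[v] and 2 * votes > len(nbrs):
            for src, dst in edges:
                if dst == v:  # relabel the incoming edges of v
                    updated[(src, dst)] = faction
    return updated


def settle(edges, edge_label, relabel_nodes, max_rounds=1000):
    """Repeat settling passes until nothing changes or a state repeats."""
    seen = set()
    current = dict(edge_label)
    for _ in range(max_rounds):
        state = frozenset(current.items())
        if state in seen:        # (b) an old state was revisited
            break
        seen.add(state)
        node_label = relabel_nodes(current)
        updated = settle_once(edges, current, node_label)
        if updated == current:   # (a) no more changes in edge labeling
            break
        current = updated
    return current
```

Storing each visited labeling as a `frozenset` makes the cycle check a set lookup; `max_rounds` is only a safety cap and is not part of the description above.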

Experiments and Discussions

To assess the effectiveness of our algorithm, we tested it on two datasets. The first dataset is constructed in the same way as described in (Kale et al. 2007), where we ended up with a graph of 404 connected blogs. We will refer to this as the Kale dataset. The second dataset is constructed by simply creating a graph from (Adamic & Glance 2005) and taking the largest connected component. This dataset contains 1222 connected blogs and we refer to it as the Adamic dataset. It should be pointed out that the dataset labeling is not 100% accurate, as noted in (Adamic & Glance 2005).

We run our algorithm on the two datasets varying three parameters: the number of seed nodes, the expansion metric, and the inclusion or exclusion of the optional "settling phase." In all our experiments, we pick seeds according to the top n PageRanked blogs, n/2 per faction. In all instances of the MultiRank algorithm the damping factor d is set to 0.85, a popular choice of damping factor which we borrowed without further tuning.
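
One plausible reading of the seed selection rule is sketched below in Python: walk down the blogs in decreasing PageRank order and keep the first n/2 blogs of each faction. The sketch assumes two factions and that PageRank scores and faction labels are already available as dictionaries; `pick_seeds` and its parameters are hypothetical names.

```python
def pick_seeds(pagerank, faction_of, n):
    """Pick the n/2 highest-PageRank blogs from each faction.

    pagerank:   dict mapping blog -> PageRank score
    faction_of: dict mapping blog -> known faction label
    n:          total number of seeds (assumes two factions)
    """
    per_faction = n // 2
    taken = {}   # faction -> number of seeds chosen so far
    seeds = {}   # blog -> faction
    for blog in sorted(pagerank, key=pagerank.get, reverse=True):
        faction = faction_of[blog]
        if taken.get(faction, 0) < per_faction:
            seeds[blog] = faction
            taken[faction] = taken.get(faction, 0) + 1
    return seeds
```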

We point out some observations on the effect of the three variables. First, inclusion of the optional settling phase tends to improve upon the results of the first, exploratory phase up to an almost constant point regardless of the number of seeds, with the exception of controlled expansion with 12 and 16 seeds on the Kale dataset, where the settling phase actually hurt performance. Second, increasing the number of seeds improves the performance of the exploratory phase, but not with the addition of the settling phase, which works surprisingly well with only two seeds. Third, in general, controlling the expansion seems to help classification accuracy.

Another interesting property of this algorithm is that most classification errors are made on blogs with lower PageRank. If blogs are ordered by PageRank, the error rate on the top quartile of blogs is 0.05, while the error rate on the bottom quartile is 0.45 (data not shown due to space limitations).
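
A per-quartile error rate of this kind could be computed as in the following Python sketch. It is an illustration only; the function name, its inputs, and the way quartile boundaries are rounded are assumptions rather than details from the experiments.

```python
def error_rate_by_quartile(pagerank, predicted, actual):
    """Return error rates for the four PageRank quartiles, top quartile first.

    pagerank:  dict mapping blog -> PageRank score
    predicted: dict mapping blog -> predicted faction
    actual:    dict mapping blog -> labeled faction
    """
    ranked = sorted(pagerank, key=pagerank.get, reverse=True)
    n = len(ranked)
    rates = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        errors = sum(predicted[b] != actual[b] for b in chunk)
        rates.append(errors / len(chunk) if chunk else float("nan"))
    return rates
```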

Conclusions

We have introduced a new semi-supervised algorithm for simultaneously classifying and ranking political blogs based on link structure. We showed that this algorithm requires very few initial seeds to achieve performance above 80% on two political blog datasets of different size and link structure. This algorithm tends to favor more authoritative blogs in terms of classification accuracy.

References

Adamic, L., and Glance, N. 2005. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
Kale, A.; Karandikar, A.; Kolari, P.; Java, A.; Finin, T.; and Joshi, A. 2007. Modeling trust and influence in the blogosphere using link polarity. In ICWSM 2007.
Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.