ICWSM'2007 Boulder, Colorado, USA
Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach
William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Department of Computing and Information Sciences, Kansas State University
234 Nichols Hall, Manhattan, KS 66506-2302, +1 785 532 6350
{bhsu|joseph|pmsr|weninger}@ksu.edu

Abstract
We consider the problems of predicting, classifying, and annotating friends relations in friends networks, based upon network structure and user profile data.
First, we document a data model for the blog service LiveJournal, and define a set of machine learning problems such as predicting existing links and estimating inter-pair distance.
Next, we explain how the problem of classifying a user pair in a social network, as directly connected or not, poses the problem of selecting and constructing relevant features.
We document feature analyzers for attributes that depend only on graph attributes, those that depend on individual user demographics and set-valued attributes (e.g., interests, communities, and educational institutions), and those that depend on candidate user pairs.
We then extend our data model using whole-network attributes and report machine learning experiments on learning the concept of a connected pair of friends from LiveJournal data.
Finally, we develop a theory of dependent types for deriving causal explanations and discuss how this can be used to scale statistical relational learning up to our full corpus, a recent crawl of over a million records from LiveJournal.

General Terms: Algorithms, Experimentation
Keywords: data mining, link analysis, machine learning, social network analysis, user profiling
1. Introduction
Analysis of friends networks provides a basis for understanding the web of influence [Ko01] in social media.
In particular, the problems of determining the existence of links and of classifying and annotating known links are first steps toward identifying potential relationships.
This inferred information can in turn be used to introduce new potential friends to one another, make basic recommendations such as community recruits or moderator candidates, or identify whole cliques and communities.
In this paper, we consider the problem of discovering links in an incomplete graph.
We present an approach to link prediction that is based on graph feature analysis and intrinsic attributes of entities (users and communities).
We report some promising preliminary results on radius-limited neighborhoods of the blogging service LiveJournal and discuss the results of exploratory experiments that point toward a need to differentiate the types of features in a friends network, namely:
1. those that depend on the demographics of the entire network;
2. those that are computable for each user or each pair of users;
3. those that depend on the existence of a reported, inferred, or suspected link.
We derive some such features and discuss the costs of computing, selecting, and recombining them.
Of particular interest in the domain of commercial weblogs and social media are demographic features relevant to collaborative recommendation of goods and formation of branding communities.
The structural dependence and context-specific dependence of features determine what new features are feasible to construct, both in terms of statistical sufficiency and computational complexity.
In conclusion, we examine some new features that were derived by hand, discuss the algorithms used to compute them, and relate these specific algorithms to a broader class of relational database queries that form the basis of a more powerful feature construction system.
2. Background
2.1 Friends Networks from User Profiles
Social network services such as MySpace and Facebook allow users to list interests and link to friends, sometimes annotating these links by designating trust levels or qualitative ratings for selected friends.
Some such services, such as Google's Orkut, are community-centric; others, such as the video blogging service YouTube and the photo service Flickr, emphasize social media; while some, such as Six Apart's LiveJournal and Vox, are organized around text-and-image weblogs.
LiveJournal and its derivative services, such as GreatestJournal, DeadJournal, and JournalFen, are based on the same open-source server code.
At the time of this writing, there are over 11.7 million LiveJournal accounts, 1.8 million of them active.
The friends network of LiveJournal, our topic of study, has two varieties of accounts: users and communities (we omit RSS feeds).
One advantageous property of its data model, stemming from a common schema for the two account types (which could originally be converted from user to community), is that it provides a simple, flexible representation for entities and relations.
Start       End         Link denotes
User        User        Trust or friendship
User        Community   Readership or subscribership
Community   User        Membership, posting access, maintainer
Community   Community   Obsolete
Table 1. Types of links in the blog service LiveJournal.
Table 1 shows the types of links in LiveJournal and their constituent attributes.
Friendship is an asymmetric relation between two accounts, each represented by a vertex in a directed graph.
The types of the start and end points define the relationship set attributes of the link.
For example, a user u who adds another user v to his or her friends list can specify v's membership in any of up to 30 groups.
These groups serve the dual purpose of blog aggregation (posts from each group's members are filtered into its aggregator page, which u can read or make public) and groups-based security (each group denotes a read/comment access control list).
Access control lists for communities are associated with memberships (community-to-user links), while content is controlled by posters or subscribers.
A user can "watch" a community in order to add all accessible posts to a main aggregator page or to custom groups.
The set of accessible posts consists of either public posts only, or public and restricted (members-only) posts.
The access control list is defined by the membership relation and individual posters' selections (whether to allow comments and whether to display them by default from no readers, all readers, non-anonymous readers, or community members).
Acquisition of privileges is a community property: of these privileges, only membership may be acquired solely by user action ("joining" a community), and only if the moderator has specified open membership.
Figure 1. LiveJournal access control list maintenance (community moderator interface).
Thus, a reciprocal link between a user and a community means that the user both subscribes to the community and is an approved member.
Links from user u to v are listed in the "Friends" list of u and in an optionally displayed "Friends Of" list of v.
The "Friends Of" list of a user u can be partitioned into reciprocal and non-reciprocal sublists:
Mutual Friends: $\{v \mid (v,u) \in E \wedge (u,v) \in E\}$
Also Friend Of: $\{v \mid (v,u) \in E \wedge (u,v) \notin E\}$
The community analogue of the "Friends Of" list is the "Watched By" (subscriber) list, whose members have the community name listed in the "Friends: Communities" sections of their individual user profile pages.
The community analogue of the "Friends" list is the "Members" list.
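The two set definitions translate directly into set comprehensions. A minimal sketch, assuming the friends network is held in memory as a set of directed (start, end) pairs; the function name `partition_friends_of` and the toy accounts are hypothetical.

```python
def partition_friends_of(u, edges):
    """Split the "Friends Of" list of u into (Mutual Friends, Also Friend Of).

    edges is a set of directed pairs (a, b), meaning account a lists b as a friend.
    """
    friends_of = {a for (a, b) in edges if b == u}        # v with (v, u) in E
    mutual = {v for v in friends_of if (u, v) in edges}   # ... and (u, v) in E
    also_friend_of = friends_of - mutual                  # ... and (u, v) not in E
    return mutual, also_friend_of


edges = {("alice", "bob"), ("bob", "alice"), ("carol", "alice"), ("alice", "dave")}
mutual, also = partition_friends_of("alice", edges)
print(sorted(mutual), sorted(also))
```

Note that "dave" appears in neither sublist: he is on alice's "Friends" list, not her "Friends Of" list.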
The friends network for LiveJournal consists of a very large central connected component and many small islands, most of which are singleton users.
There are a few source vertices, corresponding to accounts that link to others but have no reciprocated friendships; these are usually RSS or blog aggregator accounts owned by individuals.
Additionally, there are sink vertices corresponding to accounts watched by others, but which have named no friends.
Some of these are channels for announcement or dissemination of creative work.
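Sources, sinks, and singleton islands can be found from in-degree and out-degree counts alone. A sketch assuming the textbook definitions (a source has in-degree 0, a sink has out-degree 0) and an edge set of directed pairs; `classify_vertices` and the sample accounts are hypothetical.

```python
from collections import Counter


def classify_vertices(vertices, edges):
    """Return (sources, sinks, isolated) for a directed friends graph.

    Assumed definitions: a source has in-degree 0 and out-degree > 0,
    a sink has out-degree 0 and in-degree > 0, an isolated vertex has neither.
    """
    indeg = Counter(b for _, b in edges)
    outdeg = Counter(a for a, _ in edges)
    sources = {x for x in vertices if indeg[x] == 0 and outdeg[x] > 0}
    sinks = {x for x in vertices if outdeg[x] == 0 and indeg[x] > 0}
    isolated = {x for x in vertices if indeg[x] == 0 and outdeg[x] == 0}
    return sources, sinks, isolated


vertices = {"feed", "ann", "ben", "news", "solo"}
edges = {("feed", "ann"), ("ann", "ben"), ("ben", "ann"), ("ann", "news")}
sources, sinks, isolated = classify_vertices(vertices, edges)
print(sources, sinks, isolated)
```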
2.2 Link Identification
In previous work [HKP+06], we introduced a link prediction problem for LiveJournal: given a graph in which the existence of a candidate link is hidden (elided if it exists), classify it as present or absent given all other attributes of the graph and of the endpoints.
Our initial approach to link identification consisted of dividing friends network features into graph features and interest-based features.
Graph features could be computed simply by scanning the graph or, in the case of pair-distance metrics, by performing an all-pairs shortest path (APSP) search:
1. Indegree of u: popularity of the user
2. Indegree of v: popularity of the candidate
3. Outdegree of u: number of other friends besides the candidate; saturation of friends list
4. Outdegree of v: number of existing friends of the candidate besides the user; correlates loosely with likelihood of a reciprocal link
5. Number of mutual friends w such that $u \to w \wedge w \to v$
6. "Forward deleted distance": minimum alternative distance from u to v in the graph without the edge (u,v)
7. Backward distance from v to u in the graph
These were supplemented by interest-based features:
8. Number of mutual interests between u and v
9. Number of interests listed by u
10. Number of interests listed by v
11. Ratio of the number of mutual interests to the number listed by u
12. Ratio of the number of mutual interests to the number listed by v
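Features 1 through 5 need only adjacency sets. The sketch below is hedged: it assumes the graph is a set of directed (start, end) pairs, the function names are hypothetical, and it removes the candidate link (u,v) before counting, which is our reading of "besides the candidate" and of the hidden-link setup above.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Out- and in-adjacency sets from directed (start, end) friendship pairs."""
    out_adj, in_adj = defaultdict(set), defaultdict(set)
    for a, b in edges:
        out_adj[a].add(b)
        in_adj[b].add(a)
    return dict(out_adj), dict(in_adj)


def degree_features(u, v, out_adj, in_adj):
    """Graph features 1-5 for the candidate pair (u, v), with the link (u, v) hidden."""
    empty = frozenset()
    out_u = out_adj.get(u, empty) - {v}   # hide the candidate link
    out_v = out_adj.get(v, empty) - {u}   # friends of v besides the user
    in_u = in_adj.get(u, empty)
    in_v = in_adj.get(v, empty) - {u}     # hide the candidate link
    return {
        "indeg_u": len(in_u),                              # feature 1
        "indeg_v": len(in_v),                              # feature 2
        "outdeg_u": len(out_u),                            # feature 3
        "outdeg_v": len(out_v),                            # feature 4
        "mutual_friends": len((out_u & in_v) - {u, v}),    # feature 5: u -> w -> v
    }


edges = {("u", "w1"), ("w1", "v"), ("u", "w2"), ("w2", "v"),
         ("u", "v"), ("v", "x"), ("v", "u"), ("y", "v")}
out_adj, in_adj = build_adjacency(edges)
print(degree_features("u", "v", out_adj, in_adj))
```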
2.3 Efficient feature analysis
The degree attributes can be enumerated in time linear in the number of users, as can the mutual friends count for each pair of users.
Forward deleted distance measures the distance from u to v by alternate routes, after the edge (u,v) is elided.
The prediction task is thus to reconstruct the incomplete graph resulting from this erasure, to determine whether a particular link (u,v) existed.
Forward deleted distance can be precomputed exhaustively for the entire graph in $\Theta(|E|(|V|+|E|)) = \Theta(|E|^2)$ time (the equality uses $|E| \geq |V|$, which holds at LiveJournal's average degree) by erasing each edge in E and re-running a breadth-first search from the start vertex.
If a candidate edge is not stored in the resulting cache (that is, it is not an existing edge), its deleted distance is simply its distance found by BFS on the original graph, in $\Theta(|V|+|E|)$ time.
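Both distance features reduce to breadth-first search on an unweighted directed graph; the forward version simply treats one edge as erased. A sketch assuming an in-memory dict of out-neighbor sets; `bfs_distance`, `forward_deleted_cache`, and the toy graph are hypothetical, and `None` stands for an infinite distance.

```python
from collections import deque


def bfs_distance(out_adj, source, target, skip_edge=None):
    """Hop distance from source to target; skip_edge (a, b) is treated as erased.

    Returns None when target is unreachable.
    """
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in out_adj.get(x, ()):
            if (x, y) == skip_edge or y in dist:
                continue
            dist[y] = dist[x] + 1
            if y == target:          # first discovery is the shortest in BFS
                return dist[y]
            queue.append(y)
    return None


def forward_deleted_distance(out_adj, u, v):
    # Feature 6: shortest u -> v path that avoids the edge (u, v) itself.
    return bfs_distance(out_adj, u, v, skip_edge=(u, v))


def backward_distance(out_adj, u, v):
    # Feature 7: ordinary shortest v -> u path.
    return bfs_distance(out_adj, v, u)


def forward_deleted_cache(out_adj):
    # Exhaustive precomputation: one BFS per existing edge, Theta(|E|(|V|+|E|)).
    return {(u, v): forward_deleted_distance(out_adj, u, v)
            for u, vs in out_adj.items() for v in vs}


out_adj = {"u": {"v", "a"}, "a": {"b"}, "b": {"v"}, "v": {"c"}, "c": {"u"}}
print(forward_deleted_distance(out_adj, "u", "v"), backward_distance(out_adj, "u", "v"))
```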
In a graph (V,E), backward distance requires $\Theta(|V|+|E|)$ time using BFS for a particular candidate edge.
Since the expected size of the edge set is $\mathrm{E}[|E|] = k|V|$, with $k \approx 20$ on average across LiveJournal, the bottleneck computation is that of forward deleted distance: $\Theta(|E|^2) = \Theta(k^2|V|^2)$, or $\Theta(|V|^2)$ with a large constant.
Using a straightforward string pair enumeration and comparison algorithm, the mutual interest counts are stored in a matrix of $|V|^2$ elements, each requiring constant time to check (given a maximum of 150 interests).
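Features 8 through 12 follow from the two interest lists. This sketch swaps in hash-set intersection for the string-pair enumeration described above (the counts are identical); `interest_features`, `mutual_interest_matrix`, and the sample data are hypothetical.

```python
def interest_features(interests_u, interests_v):
    """Interest-based features 8-12 for a candidate pair (u, v)."""
    iu, iv = set(interests_u), set(interests_v)
    mutual = len(iu & iv)
    return {
        "mutual_interests": mutual,                      # feature 8
        "n_interests_u": len(iu),                        # feature 9
        "n_interests_v": len(iv),                        # feature 10
        "ratio_u": mutual / len(iu) if iu else 0.0,      # feature 11
        "ratio_v": mutual / len(iv) if iv else 0.0,      # feature 12
    }


def mutual_interest_matrix(interests):
    """All ordered pairs of distinct users -> mutual interest count (|V|^2 entries)."""
    users = sorted(interests)
    sets = {u: set(interests[u]) for u in users}
    return {(a, b): len(sets[a] & sets[b]) for a in users for b in users if a != b}


print(interest_features(["hiking", "jazz", "chess", "go"], ["jazz", "go"]))
```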
2.4 Methodologies for link mining
Getoor and Diehl [GD05] recently surveyed techniques for link mining, focusing on statistical relational learning approaches and emphasizing graphical model representations of link structure.
Ketkar et al. [KHC05] compare data mining techniques over graph-based representations of links to first-order and relational representations and learning techniques that are based upon inductive logic programming (ILP).
Sarkar and Moore [SM05] extend the analysis of social networks into the temporal dimension by modeling change in link structure across discrete time steps, using latent space models and multidimensional scaling.
One of the challenges in collecting time series data from LiveJournal is the slow rate of data acquisition, just as spatial annotation data (such as that found in LJ maps and the "plot your friends on a map" meme) is relatively incomplete.
2.5 Other applications using graph mining
Popescul and Ungar [PU03] learn a kind of entity-relational model from data in order to predict links.
Hill [Hi03] and Bhattacharya and Getoor [BG04] similarly use statistical relational learning from data in order to resolve identity uncertainty, particularly coreferences and other redundancies (also called deduplication).
Resig et al. [RDHT04] use a large (200,000-user) crawl of LiveJournal to annotate a social network of instant messaging users, and explore the approach of predicting online times as a function of friends graph degree.
There have been numerous recent applications of social network mining based on the text and headers of e-mail.
One notable research project by McCallum et al. [MCW05] uses the Enron e-mail corpus and infers roles and topic categories based on link analysis.

A primary goal of this work is to extend the graph mining approach beyond link prediction and recommendation towards link explanation and annotation.
It may be much more useful to explain why a group of friends in a blog service created accounts en masse or added one another as friends than to recommend relationship sets that are already extant or structured according to a preexistent social group.
For example, high school classmates often create accounts and encourage their peers to join the same service.
In a few cases, this is encouraged or facilitated by a teacher, for a class project.
Solving the problem of link prediction is not particularly useful in this case, because the user decisions have already been made or strongly constrained; however, it may be very useful to link other classmates not working on the same project to the same relationship set (perhaps they were encouraged to join the blog service by students who continued to use it after the class project).
Large groups such as webcomic subscriberships, community co-members, etc. are also somewhat identifiable, and relating members of a blog service to one another through relationship sets is a typical entity-relational data modeling operation that can be made more robust and efficient through graph feature extraction.
3. Experiment Design
3.1 LJCrawler v2
To acquire the graph structure and attributes described in the previous section, we developed an HTTP-based spider called LJCrawler to harvest user information from LiveJournal.
A multithreaded version of this program, which retrieves BML data published by Danga (the owners of LiveJournal), collects up to 15 records per second on average, traversing the social network depth-first and archiving the results in a master index file.
Because LiveJournal's functionality for looking up users by user number is only available to administrators, we decided to compile a list of seeds for a disjoint-set representation of the disconnected social network.
For purposes of this experiment, however, starting from just one seed (the first author's LiveJournal ID) and restricting the crawl to one connected component was sufficient.
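The text does not say how the disjoint-set representation is stored. A standard union-find structure (union by size with path compression) is one natural fit; the sketch below is an assumption rather than LJCrawler's code, and it treats each friendship link as undirected, so it groups accounts into weakly connected components. `DisjointSet`, `components`, and the sample seeds are hypothetical.

```python
class DisjointSet:
    """Union-find with union by size and path compression."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:            # lazily create a singleton set
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:       # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:   # attach the smaller tree
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def components(edges, seeds=()):
    """Group crawled accounts into weakly connected components."""
    ds = DisjointSet()
    for s in seeds:
        ds.find(s)                          # seeds with no links stay singletons
    for a, b in edges:
        ds.union(a, b)                      # link direction is ignored
    groups = {}
    for x in list(ds.parent):
        groups.setdefault(ds.find(x), set()).add(x)
    return list(groups.values())


comps = components([("a", "b"), ("b", "c"), ("d", "e")], seeds=["z"])
print(sorted(sorted(c) for c in comps))
```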
Using LJCrawler, we compiled an adjacency list and the following ground features for each user:
- Account type (user, community)
- Interest list
- School list
- Communities watched list
- Community membership list
- Friends Of list
- Friends list
3.2 Feature Analyzers
We define a single example to be a candidate edge (u,v) in the underlying directed graph of the social network, along with a set of descriptive features calculated from the annotated graph recorded by LJCrawler (the twelve features listed in Section 2.2).
Other features: Additional planned features for continuing experiments include dates (update frequencies when taken differentially), user options such as maximum friends count, and content descriptors of LiveJournal entries and comments (average post length, word frequency, etc.).
3.3 Graph Search Algorithms for Computing Features
Computing the minimum forward and backward distances can be done more efficiently by using breadth-first search.
Currently, a Java implementation of this algorithm requires under one minute on a 2 GHz AMD Opteron system to process a 2000-node graph.
However, enumerating all possible candidate pairs within a neighborhood of 2 nodes (1.6 million pairs for 4000 nodes) requires several hours on the same system.
We note that the amortized cost of running BFS to precompute all-pairs shortest paths (APSP) with the actual edge deleted (which is necessary to avoid knowing the prediction target in link prediction) is $\Theta(|E|(|V|+|E|))$.
This is prohibitively large even for our "mid-sized" subgraphs of 10-50K nodes; when |V| is about 11 million and |E| is a little over 200 million, enumerating APSP is completely infeasible.
However, we do not typically consider all of E, so the bottleneck is typically the first step plus a constant number k of calls to BFS, requiring running time in $\Theta(k(|V|+|E|))$.
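A back-of-envelope count makes the gap concrete. This sketch treats one BFS as exactly $|V|+|E|$ unit steps (the $\Theta$ notation hides constants, so these are step counts, not seconds), uses the round figures quoted above, and tries k = 100 and k = 1000, the range the text gives in Section 4.3; the constant names are hypothetical.

```python
def bfs_cost(n_vertices, n_edges):
    """Unit cost of one BFS, Theta(|V| + |E|), taken here as exactly |V| + |E|."""
    return n_vertices + n_edges


N_VERTICES = 11_000_000    # "about 11 million" accounts
N_EDGES = 200_000_000      # "a little over 200 million" links (a lower bound)

single = bfs_cost(N_VERTICES, N_EDGES)
exhaustive = N_EDGES * single                         # one BFS per erased edge
targeted = {k: k * single for k in (100, 1000)}       # k candidate edges only

print(f"one BFS          : {single:.2e}")
print(f"every edge erased: {exhaustive:.2e}")
for k, cost in targeted.items():
    print(f"k = {k:>4}         : {cost:.2e} ({exhaustive // cost:,}x fewer steps)")
```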
3.4 Generating Candidates
We considered several alternative ways to generate candidate edges (u,v):
The first technique is likely to be unscalable, as the number of candidates is $|V|^2$.
The second requires having a representatively large sample of the full LiveJournal social network, in order to fit the distribution parameters accurately.
The third was the most straightforward to implement.
Two calls to the all-pairs shortest path algorithm provided a cost matrix, and one pass at each radius up to a maximum of 10 yielded the data shown in Table 2.
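A distance histogram with frequency and cumulative columns, like Tables 2 and 3, can be sketched as follows. The text obtained its cost matrix from two APSP calls; this sketch swaps in one BFS per vertex, which yields the same hop distances on an unweighted graph. It also assumes ordered pairs (u, v) with u ≠ v, since friendship is directed, and pools pairs beyond the maximum radius with unreachable ones under ∞; neither choice is stated in the text. Function names are hypothetical.

```python
from collections import Counter, deque


def bfs_levels(out_adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in out_adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def distance_table(vertices, out_adj, max_radius=10):
    """Rows (d, frequency at d, cumulative up to d) over ordered pairs u != v."""
    inf = float("inf")
    freq = Counter()
    for u in vertices:
        dist = bfs_levels(out_adj, u)
        for v in vertices:
            if v == u:
                continue
            d = dist.get(v)
            freq[d if d is not None and d <= max_radius else inf] += 1
    rows, running = [], 0
    for d in list(range(1, max_radius + 1)) + [inf]:
        running += freq[d]
        rows.append((d, freq[d], running))
    return rows


out_adj = {"a": {"b"}, "b": {"c"}, "c": {"a"}}   # "d" has no links at all
for row in distance_table(["a", "b", "c", "d"], out_adj, max_radius=3):
    print(row)
```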
To simplify the initial experiments, we defined the classification problem to be classification of d(u,v) as 1 or 2.
This task is actually useful for social network recommender systems, because discriminating a direct friend from a "friend of a friend" (FOAF) is functionally similar to recommending FOAFs to link to directly.
There are more detailed classification targets, such as placement, promotion, and demotion of linked friends within strata of trust (setting, increasing, and decreasing the security level), but choosing a user's friends to begin with is the more fundamental decision.
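The d(u,v) ∈ {1, 2} target can be generated by listing each user's direct friends and their friends. A sketch assuming an in-memory dict of out-neighbor sets; `labeled_pairs` is hypothetical. Distance-1 examples keep their label even though, as the text requires, their features should then be computed with the link (u,v) hidden.

```python
def labeled_pairs(vertices, out_adj):
    """Examples (u, v, label): label 1 if u lists v directly, 2 if v is a FOAF of u."""
    examples = []
    for u in vertices:
        direct = set(out_adj.get(u, ())) - {u}
        two_hop = set()
        for w in direct:
            two_hop |= set(out_adj.get(w, ()))
        two_hop -= direct | {u}              # keep only pairs at distance exactly 2
        examples += [(u, v, 1) for v in sorted(direct)]
        examples += [(u, v, 2) for v in sorted(two_hop)]
    return examples


out_adj = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
print(labeled_pairs(["a", "b", "c"], out_adj))
```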
Table 2 and Table 3 report the distribution of inter-vertex distances in the friends network for two subnetworks induced by limiting the maximum number of nodes.

Distance d   Frequency (= d)   Cumulative (≤ d)
1            6204              6204
2            107307            113511
3            69896             183407
4            59926             243333
5            3400              246733
6            255               246988
7            16                247004
8            1                 247005
9            0                 0
10           0                 0
∞            9731              256735
Table 2. Number of candidate edges for the 1000-node LiveJournal graph.

Distance d   Frequency (= d)   Cumulative (≤ d)
1            19410             19410
2            370568            389978
3            403075            793053
4            520373            1313426
5            123747            1437173
6            18453             1455626
7            2657              1458283
8            339               1458622
9            29                1458651
10           0                 1458651
∞            174534            1633185
Table 3. Number of candidate edges for the 4000-node LiveJournal graph.
4. Results
4.1 Preliminary experiment: 941-node version
In a preliminary experiment, we constructed a 941-node subgraph, defining the concept IsFriendOf, and trained three types of inducers with:
1. all attributes (All);
2. all graph attributes excluding the forward and backward distances (NoDist);
3. the backward distances alone (BkDist);
4. the backward and forward distances alone (Dist);
5. interest-related attributes alone (Interest).
Table 4 and Table 5 show the results for three inducers: the J48 decision tree inducer, Holte's 1R inducer (a single-rule classifier based on a single attribute) [Ho93], and the logistic regression inducer.
All accuracy measures were collected over 10-fold cross-validated runs.
The J48 output with all features achieves a significant boost over the next highest (distance only).
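Holte's 1R and k-fold cross-validation are both short enough to sketch with the standard library. This is an illustration only, not the inducers the experiments used: it assumes already-discrete attribute values (Holte's method also bins numeric attributes, omitted here), uses unstratified folds, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_one_r(rows, labels):
    """Holte's 1R on discrete attributes: keep the single attribute whose
    one-level rule (attribute value -> majority class) makes the fewest
    training errors.
    """
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for j in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[j]][y] += 1
        rule = {val: c.most_common(1)[0][0] for val, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[val]] for val, c in by_value.items())
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    _, j, rule = best
    return lambda row: rule.get(row[j], default)   # unseen value -> majority class


def cross_val_accuracy(rows, labels, train, k=10, seed=0):
    """Mean accuracy over k folds (unstratified, for brevity)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [f for f in (idx[i::k] for i in range(k)) if f]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(model(rows[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Toy data: attribute 0 determines the class, attribute 1 is noise.
rows = [(i % 2, i % 3) for i in range(40)]
labels = [i % 2 for i in range(40)]
print(cross_val_accuracy(rows, labels, train_one_r))
```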
Inducer    All    NoDist   BkDist   Dist   Interest
J48        98.2   94.8     95.8     97.6   88.5
OneR       95.8   92.0     95.8     95.8   88.5
Logistic   91.6   90.9     88.3     88.9   88.4
Table 4. Percent accuracy for predicting all classes using the 941-node graph.
Inducer    All    NoDist   BkDist   Dist   Interest
J48        89.5   65.7     67.7     83.0   5.4
OneR       67.7   41.1     67.7     67.7   4.5
Logistic   38.3   33.3     0        4.5    4.5
Table 5. Precision (true positives as a fraction of all predicted positives) using the 941-node graph.
4.2 Experiments on restricted graphs
We developed an application, ljclipper, to restrict the overall friends graph to that induced by a subset of nodes of fixed number, found using breadth-first search starting from a given seed.
Using a 4000-node subgraph summarized in Table 3, we generated 1633185 candidate edges.
Note that all forward distances are greater than 1: when u and v are actually connected, we erase (u,v).
In preliminary experiments, we then computed the length of the shortest alternative path.
This is, however, a less scalable approach, because the asymptotic running time is dominated by the superlinear time required to compute …
The complete listing of all twelve features is given in Section 2.2.
The numerical types of all of the network features (both those describing the graph and those measuring interests and ratios) make the dataset amenable to logistic regression.
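The restriction that ljclipper performs (a fixed number of nodes reached by BFS from a seed, plus the subgraph they induce) can be sketched as below. This is not ljclipper's code: it assumes BFS follows outgoing friends links and visits neighbors in sorted order so that the clip is deterministic; `clip_graph` and the toy graph are hypothetical.

```python
from collections import deque


def clip_graph(out_adj, seed, max_nodes):
    """Keep the first max_nodes vertices reached by BFS from seed, plus every
    edge whose endpoints both survive (the induced subgraph)."""
    kept = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(kept) < max_nodes:
        x = queue.popleft()
        for y in sorted(out_adj.get(x, ())):   # sorted for a deterministic clip
            if y not in seen and len(kept) < max_nodes:
                seen.add(y)
                kept.append(y)
                queue.append(y)
    keep = set(kept)
    return {x: {y for y in out_adj.get(x, ()) if y in keep} for x in kept}


out_adj = {"s": {"a", "b"}, "a": {"c"}, "b": {"s"}, "c": {"d"}}
print(clip_graph(out_adj, "s", 3))
```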
Inducer   Accuracy   Precision   Recall
J48       99.9       97.5        96.1
OneR      99.6       91.7        91.8
Table 6. Percent accuracy, precision and recall using a 1000-node graph (10-fold CV).

Inducer   Accuracy   Precision   Recall
J48       99.8       95.8        92.0
OneR      99.7       91.1        89.9
Table 7. Percent accuracy, precision and recall using a 2000-node graph (10-fold CV).

Inducer   Accuracy   Precision   Recall
J48       99.8       94.5        88.3
OneR      99.7       88.2        84.3
Table 8. Percent accuracy, precision and recall using a 4000-node graph (10-fold CV).
Table 6 through Table 8 show the accuracy, precision, and recall for the 1000-, 2000-, and 4000-node friends graphs.
Two trends are observed: precision is generally higher than recall, and both precision and recall diminish as the network grows larger.
These trends are sustained for subsamples of size 10,000 and 100,000, though precision and recall also diminish slightly with sampling.
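Accuracy near 99.8% alongside noticeably lower precision and recall is what one expects when linked pairs are rare among candidates (in Table 3, only 19,410 of 1,633,185 candidates are at distance 1). A sketch of the three measures from confusion counts; `confusion_metrics` and the toy counts are hypothetical.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall, with `positive` as the linked class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # TP / all predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # TP / all actual positives
    return accuracy, precision, recall


# Hypothetical: 1000 candidates, 12 real links; 10 found, 2 missed, 2 false alarms.
y_true = [1] * 12 + [0] * 988
y_pred = [1] * 10 + [0] * 2 + [1] * 2 + [0] * 986
print(confusion_metrics(y_true, y_pred))
```

Here accuracy is 0.996 even though one link in six is missed and one prediction in six is wrong.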
4.3 Data acquisition and larger experiments
The crawler has been improved with several service-specific optimizations for fetching user info pages.
Presently these do not use LiveJournal's BML feed of user data, which is incomplete for our purposes (that is, not all ground attributes in our initial relations are provided).
At press time, this crawler processes about 20,000 user records per hour and thus would require over a week to crawl LiveJournal.
The current bottleneck is the $\Theta(|V|(|V|+|E|))$ step described in Section 3.3.
This is the dominant term because the constant k, here denoting the number of candidate edges, is usually much smaller than |V| (e.g., 100-1000), so that $\Theta(k(|V|+|E|))$ is not only in $\Theta(|V|+|E|)$ but actually just a few hundred times the cost of a single BFS.
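The crawl-time estimate can be checked with the account counts from Section 2.1. This sketch assumes a constant throughput of 20,000 records per hour and one record per account; the constant names are hypothetical.

```python
ACCOUNTS_TOTAL = 11_700_000    # "over 11.7 million" accounts (Section 2.1)
ACCOUNTS_ACTIVE = 1_800_000    # "1.8 million of them active"
RECORDS_PER_HOUR = 20_000      # crawler throughput at press time


def crawl_days(n_records, per_hour=RECORDS_PER_HOUR):
    """Wall-clock days to fetch n_records at a constant per-hour rate."""
    return n_records / per_hour / 24


print(f"all accounts   : {crawl_days(ACCOUNTS_TOTAL):.1f} days")
print(f"active accounts: {crawl_days(ACCOUNTS_ACTIVE):.1f} days")
```

At this rate a full crawl takes a little over three weeks, consistent with "over a week"; restricting it to active accounts would take under four days.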
4.4 Interpretation
Using mutual interests alone, even with normalization based on the number of interests in u and v, results in very poor prediction accuracy using all inducers with which we experimented.
Intermediate results are achieved using mutual friends count and degree (NoDist: 65.7% on predicting edges) and using forward deleted distance and backward distance (Dist: 67.7%).
Using all 12 computed graph and annotation features resulted in the highest precision (All: 89.5%) and accuracy (All: 98.2%).
We note that LiveJournal once used a variant of normalized mutual interests to produce a list of potential friends, arranged in decreasing order of match quality.
Although this was not the same type of recommender system as LJMiner supports, it shows that state-of-the-art user matching systems have a lot of room for improvement.
The results indicate that features produced by LJMiner, used with a good inducer, can generate collaborative and structural recommendations.
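For contrast with the learned approach, an interest-only recommender of the kind LiveJournal once offered can be sketched in a few lines. LiveJournal's actual normalization is not given; this sketch assumes Jaccard overlap of interest sets, and `rank_by_interest_overlap` and the sample users are hypothetical.

```python
def rank_by_interest_overlap(user, interests, top_n=5):
    """Rank other users by Jaccard overlap of interest sets, highest first.

    Jaccard similarity is an assumed normalization, not LiveJournal's formula.
    """
    mine = set(interests[user])
    scored = []
    for other, theirs in interests.items():
        if other == user:
            continue
        theirs = set(theirs)
        union = mine | theirs
        score = len(mine & theirs) / len(union) if union else 0.0
        scored.append((score, other))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # ties broken by name
    return [(other, score) for score, other in scored[:top_n]]


interests = {
    "amy": ["jazz", "chess", "go"],
    "ben": ["jazz", "chess"],
    "cat": ["go"],
    "dan": ["knitting"],
}
print(rank_by_interest_overlap("amy", interests))
```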
5. Continuing Work
Scaling up: Our current research focuses on scaling up to tens of thousands and eventually millions of users.
Crawling over 11-12 million records is at least technically feasible, but scaling up the graph analyzers is a challenge that may best be met with heuristic search.
Learning relational models: A promising area of research is the recovery of relational graphical models, including class-level (membership and reference slot) uncertainty [GFKT02].
LJMiner has yielded a ready source of semistructured data for both structure learning and distribution learning.
Another potentially useful approach is to organize users and communities into clusters using this relational model.
We have developed schemas for blog posts (entries, threads, comments) and for users and dynamic groups of users.
This is related to previous preliminary work on relational data mining for personalization of web portals, especially computational grid portals [HBJ03].
Much of the relational metadata in the bioinformatics domain comes from description languages for workflows and workflow components [Hs04].
The next step in our experimental plan is to use schemas such as our detailed ones for blog service users and bioinformatics information and computational grid users [Hs05] to learn a richer predictive model.
Finally, modeling relational data as it persists or changes across time is an important challenge.
Acknowledgements
We thank Todd Easton and Kirsten Hildrum for helpful discussions concerning algorithms and the LiveJournal data model.
We also thank Andrew King and Tejaswi Pydimarri for contributions to the original LJMiner system and Vikas Bahirwani for contributions to the second version.
References
[BG04] I. Bhattacharya & L. Getoor. Deduplication and group detection using links. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[CLRS02] T. H. Cormen, C. E. Leiserson, R. L. Rivest, & C. Stein. Introduction to Algorithms, Second Edition. Cambridge, MA: MIT Press, 2002.
[GD05] L. Getoor & C. P. Diehl. Link mining: a survey. SIGKDD Explorations, Special Issue on Link Mining, 7(2):3-12.
[GFKT02] L. Getoor, N. Friedman, D. Koller, & B. Taskar. Learning Probabilistic Models of Link Structure. Journal of Machine Learning Research, 2002.
[HBJ03] W. H. Hsu, P. Boddhireddy, & R. Joehanes. Using probabilistic relational models for collaborative filtering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August 2003.
[Hi03] S. Hill. Social network relational vectors for anonymous identity matching. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August 2003.
[Ho93] R. C. Holte. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11(1):63-90.
[Hs04] W. H. Hsu. Relational graphical models of computational workflows for data mining. In Proceedings of the International Conference on Semantics of a Networked World: Semantics for Grid Databases (ICSNW-2004), pp. 309-310, Paris, France, June 2004.
[Hs05] W. H. Hsu. Relational graphical models for collaborative filtering and recommendation of computational workflow components. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Multi-Agent Information Retrieval and Recommender Systems, Edinburgh, UK, July 31, 2005.
[HKP+06] W. H. Hsu, A. King, M. S. R. Paradesi, T. Pydimarri, & T. Weninger. Collaborative and Structural Recommendation of Friends using Weblog-based Social Network Analysis. In Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (CAAW 2006).
[KHC05] N. S. Ketkar, L. B. Holder, & D. J. Cook. Comparison of graph-based and logic-based multi-relational data mining. SIGKDD Explorations, Special Issue on Link Mining, 7(2):64-71.
[Ko01] D. Koller. Representation, Reasoning and Learning. IJCAI Computers and Thought Award Lecture, 2001.
[MCW05] A. McCallum, A. Corrada-Emmanuel, & X. Wang. Topic and role discovery in social networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, UK, August 2005.
[MH04] M. Mukherjee & L. B. Holder. Graph-based data mining on social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[PU03] A. Popescul & L. H. Ungar. Statistical relational learning for link prediction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, Mexico, August 2003.
[RDHT04] J. Resig, S. Dawara, C. M. Homan, & A. Teredesai. Extracting social networks from instant messaging populations. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.
[SM05] P. Sarkar & A. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations, Special Issue on Link Mining, 7(2):31-40.