ICWSM'2007 Boulder, Colorado, USA

Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach

William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Department of Computing and Information Sciences, Kansas State University
234 Nichols Hall, Manhattan, KS 66506-2302
+1 785 532 6350
{bhsu|joseph|pmsr|weninger}@ksu.edu

Abstract

We consider the problems of predicting, classifying, and annotating friends relations in friends networks, based upon network structure and user profile data. First, we document a data model for the blog service LiveJournal, and define a set of machine learning problems such as predicting existing links and estimating inter-pair distance. Next, we explain how the problem of classifying a user pair in a social network, as directly connected or not, poses the problem of selecting and constructing relevant features. We document feature analyzers for attributes that depend only on graph attributes, those that depend on individual user demographics and set-valued attributes (e.g., interests, communities, and educational institutions), and those that depend on candidate user pairs. We then extend our data model using whole-network attributes and report machine learning experiments on learning the concept of a connected pair of friends from LiveJournal data. Finally, we develop a theory of dependent types for deriving causal explanations and discuss how this can be used to scale statistical relational learning up to our full corpus, a recent crawl of over a million records from LiveJournal.

General Terms: Algorithms, Experimentation

Keywords: data mining, link analysis, machine learning, social network analysis, user profiling.
1. Introduction

Analysis of friends networks provides a basis for understanding the web of influence [Ko01] in social media. In particular, the problems of determining the existence of links and of classifying and annotating known links are first steps toward identifying potential relationships. This inferred information can in turn be used to introduce new potential friends to one another, make basic recommendations such as community recruits or moderator candidates, or identify whole cliques and communities.

In this paper, we consider the problem of discovering links in an incomplete graph. We present an approach to link prediction that is based on graph feature analysis and intrinsic attributes of entities (users and communities). We report some promising preliminary results on radius-limited neighborhoods of the blogging service LiveJournal and discuss the results of exploratory experiments that point toward a need to differentiate the types of features in a friends network, namely:

1. those that depend on the demographics of the entire network
2. those that are computable for each user or each pair of users
3. those that depend on the existence of a reported, inferred, or suspected link

We derive some such features and discuss the costs of computing, selecting, and recombining them. Of particular interest in the domain of commercial weblogs and social media are demographic features relevant to collaborative recommendation of goods and formation of branding communities. The structural dependence and context-specific dependence of features determine what new features are feasible to construct, both in terms of statistical sufficiency and computational complexity. In conclusion, we examine some new features that were derived by hand, discuss the algorithms used to compute them, and relate these specific algorithms to a broader class of relational database queries that form the basis of a more powerful feature construction system.
2. Background

2.1 Friends Networks from User Profiles

Social network services such as MySpace and Facebook allow users to list interests and link to friends, sometimes annotating these links by designating trust levels or qualitative ratings for selected friends. Some such services, such as Google's Orkut, are community-centric; others, such as the video blogging service YouTube and the photo service Flickr, emphasize social media; while some, such as Six Apart's LiveJournal and Vox, are organized around text-and-image weblogs. LiveJournal and its derivative services, such as GreatestJournal, DeadJournal, and JournalFen, are based on the same open-source server code. At the time of this writing, there are over 11.7 million LiveJournal accounts, 1.8 million of them active.

The friends network of LiveJournal, our topic of study, has two varieties of accounts: users and communities (we omit RSS feeds). One advantageous property of its data model, stemming from a common schema for the two account types (which could originally be converted from user to community), is that it provides a simple, flexible representation for entities and relations.
Start       End         Link Denotes
User        User        Trust or friendship
User        Community   Readership or subscribership
Community   User        Membership, posting access, maintainer
Community   Community   Obsolete

Table 1. Types of links in the blog service LiveJournal.
Table 1 shows the types of links in LiveJournal and their constituent attributes. Friendship is an asymmetric relation between two accounts, each represented by a vertex in a directed graph. The type of the start and end point defines the relationship set attributes of the link. For example, a user u who adds another user v to his or her friends list can specify the membership in any of up to 30 groups. These serve the dual purpose of blog aggregation (posts from each group's members are filtered into its aggregator page, which u can read or make public) and groups-based security (each group denotes a read/comment access control list).

Access control lists for communities are associated with memberships (community-to-user links), while content is controlled by posters or subscribers. A user can "watch" a community in order to add all accessible posts to a main aggregator page or to custom groups. The set of accessible posts consists of either public posts only, or public and restricted (members-only) posts. The access control list is defined by the membership relation and individual posters' selections (whether to allow comments and whether to display them by default from no readers, all readers, non-anonymous readers, or community members). Acquisition of privileges is a community property, of which only membership may be acquired solely by user action ("joining" a community), if the moderator has specified open membership.
Figure 1. LiveJournal access control list maintenance (community moderator interface).
Thus, a reciprocal link between a user and a community means that the user both subscribes to the community and is an approved member. Links from user u to v are listed in the "Friends" list of u and in an optionally displayed "Friends Of" list of v. This list can be partitioned into reciprocal and non-reciprocal sublists for a user u:

Mutual Friends: {v | (v,u) ∈ E ∧ (u,v) ∈ E}
Also Friend Of: {v | (v,u) ∈ E ∧ (u,v) ∉ E}

The community analogue of the "Friends Of" list is the "Watched By" (subscriber) list, whose members have the community name listed in the "Friends: Communities" sections of their individual user profile pages. The community analogue of the "Friends" list is the "Members" list.
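The partition of the "Friends Of" list can be illustrated with a minimal Python sketch. It assumes the edge set E is held in memory as a set of (start, end) pairs; the function name and the toy account names are hypothetical and not part of LiveJournal or of our software.

```python
def partition_friend_of(edges, u):
    """Split the "Friends Of" list of u into reciprocal and non-reciprocal parts.

    edges is a set of directed (start, end) pairs; (a, b) means that
    account a lists account b as a friend.
    """
    friend_of = {a for (a, b) in edges if b == u}        # all v with (v, u) in E
    mutual = {v for v in friend_of if (u, v) in edges}   # (u, v) also in E
    also_friend_of = friend_of - mutual                  # (u, v) not in E
    return mutual, also_friend_of


# Toy graph with made-up account names.
E = {("ann", "bob"), ("bob", "ann"), ("cat", "ann"), ("ann", "dan")}
mutual, also_friend_of = partition_friend_of(E, "ann")
```

Here "bob" is a mutual friend of "ann", while "cat" appears only in the non-reciprocal sublist.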
The friends network for LiveJournal consists of a very large central connected component and many small islands, most of which are singleton users. There are a few source vertices, corresponding to accounts that link to others but have no reciprocated friendships; these are usually RSS or blog aggregator accounts owned by individuals. Additionally, there are sink vertices corresponding to accounts watched by others, but which have named no friends. Some of these are channels for announcement or dissemination of creative work.
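The component structure described above can be measured with a short sketch such as the following. It assumes that "connected component" means a weakly connected component (edge direction ignored), which the text does not state explicitly; the function name and toy data are hypothetical.

```python
from collections import defaultdict, deque


def weak_component_sizes(vertices, edges):
    """Sizes of the weakly connected components, largest first."""
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)      # ignore direction: store both orientations
        nbrs[b].add(a)
    seen = set()
    sizes = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:        # breadth-first sweep of one component
            node = queue.popleft()
            size += 1
            for nxt in nbrs[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# One four-account component and two singleton islands.
sizes = weak_component_sizes(
    ["a", "b", "c", "d", "e", "f"],
    {("a", "b"), ("b", "c"), ("d", "a")},
)
```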
2.2 Link Identification

In previous work [HKP+06], we introduced a link prediction problem for LiveJournal: given a graph in which the existence of a candidate link is hidden (elided if it exists), classify it as present or absent given all other attributes of the graph and of the endpoints. Our initial approach to link identification consisted of dividing friends network features into graph features and interest-based features. Graph features could be computed simply by scanning the graph or, in the case of pair-distance metrics, by performing all-pairs shortest path (APSP) search:

1. Indegree of u: popularity of the user
2. Indegree of v: popularity of the candidate
3. Outdegree of u: number of other friends besides the candidate; saturation of friends list
4. Outdegree of v: number of existing friends of the candidate besides the user; correlates loosely with likelihood of a reciprocal link
5. Number of mutual friends w such that u → w ∧ w → v
6. "Forward deleted distance": minimum alternative distance from u to v in the graph without the edge (u, v)
7. Backward distance from v to u in the graph

These were supplemented by interest-based features:

8. Number of mutual interests between u and v
9. Number of interests listed by u
10. Number of interests listed by v
11. Ratio of the number of mutual interests to the number listed by u
12. Ratio of the number of mutual interests to the number listed by v

2.3 Efficient feature analysis

The degree attributes can be enumerated in time linear in the number of users, as can the mutual friends count for each pair of users.
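As an illustration of features 1 through 5, the following Python sketch computes the degree attributes and the mutual friends count for one candidate pair. It is not the LJMiner implementation. It assumes the graph is a set of (start, end) pairs and that the candidate edge (u, v) is hidden before any counting, which is one possible reading of "besides the candidate"; all names are hypothetical.

```python
from collections import defaultdict


def graph_features(edges, u, v):
    """Features 1-5 for candidate pair (u, v), with the edge (u, v) hidden."""
    succ = defaultdict(set)   # succ[a]: accounts that a lists as friends
    pred = defaultdict(set)   # pred[b]: accounts that list b as a friend
    for a, b in edges:
        if (a, b) == (u, v):
            continue          # do not let the prediction target leak in
        succ[a].add(b)
        pred[b].add(a)
    return {
        "indegree_u": len(pred[u]),
        "indegree_v": len(pred[v]),
        "outdegree_u": len(succ[u]),
        "outdegree_v": len(succ[v]),
        # mutual friends: w with u -> w and w -> v
        "mutual_friends": len(succ[u] & pred[v]),
    }


E = {("a", "b"), ("a", "c"), ("c", "b"), ("d", "a"), ("b", "a")}
features = graph_features(E, "a", "b")
```

For a single pair this scan is linear in the number of edges; computing the features for many pairs would reuse the two adjacency maps rather than rebuild them.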
Forward deleted distance measures the distance from u to v by alternate routes, after the edge (u, v) is elided. The prediction task is thus to reconstruct the incomplete graph resulting from this erasure, to determine whether a particular link (u, v) existed. Forward deleted distance can be precomputed exhaustively for the entire graph in Θ(|E|(|V| + |E|)) = Θ(|E|^2) time by erasing each edge in E and re-running a breadth-first search from the start vertex. If a candidate edge is not stored in the resulting cache, its deleted distance is that found by BFS on the original graph, in Θ(|V| + |E|) time.
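A minimal sketch of the two distance features (features 6 and 7 of Section 2.2) for a single candidate pair follows. It assumes an adjacency-list dictionary and treats an unreachable target as None; both choices, and all names, are assumptions made for illustration.

```python
from collections import deque


def bfs_distance(succ, source, target, skip_edge=None):
    """Hop count of a shortest directed path source -> target, or None.

    succ maps each vertex to an iterable of its successors; skip_edge,
    if given, is a (start, end) pair treated as absent from the graph.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in succ.get(node, ()):
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            if nxt == target:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def forward_deleted_distance(succ, u, v):
    """Shortest alternative route from u to v with the edge (u, v) erased."""
    return bfs_distance(succ, u, v, skip_edge=(u, v))


def backward_distance(succ, u, v):
    """Shortest route from v back to u."""
    return bfs_distance(succ, v, u)


graph = {"u": ["v", "w"], "w": ["x"], "x": ["v"], "v": ["u"]}
```

In this toy graph the direct edge (u, v) is ignored, so the forward deleted distance follows u, w, x, v.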
In a graph (V, E), backward distance requires Θ(|V| + |E|) time using BFS for a particular candidate edge. Since the expected size of the edge set is E[|E|] = k|V|, with k about 20 on average across LiveJournal, the bottleneck computation is that of forward deleted distance: Θ(|E|^2) = Θ(k^2 |V|^2), or Θ(|V|^2) with a large constant. Using a straightforward string pair enumeration and comparison algorithm, the mutual interest counts are stored in a matrix of |V|^2 elements, each requiring constant time to check (given a maximum of 150 interests).
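The interest-based features (features 8 through 12 of Section 2.2) for one pair can be sketched as follows. This sketch swaps the string pair enumeration for a set intersection, assumes exact string matching of interest names, and returns a ratio of 0.0 when a user lists no interests; these are assumptions for illustration, and the names are hypothetical.

```python
def interest_features(interests_u, interests_v):
    """Mutual interest count, list sizes, and the two normalized ratios."""
    iu, iv = set(interests_u), set(interests_v)
    mutual = len(iu & iv)
    return {
        "mutual_interests": mutual,
        "interests_u": len(iu),
        "interests_v": len(iv),
        # guard against users who list no interests at all
        "mutual_ratio_u": mutual / len(iu) if iu else 0.0,
        "mutual_ratio_v": mutual / len(iv) if iv else 0.0,
    }


pair = interest_features(
    ["anime", "cats", "python"],
    ["cats", "python", "jazz", "tea"],
)
```

Because each interest list is bounded (150 entries at most), each pair costs constant time; the |V|^2 pairs dominate.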
2.4 Methodologies for link mining

Getoor and Diehl [GD05] recently surveyed techniques for link mining, focusing on statistical relational learning approaches and emphasizing graphical model representations of link structure. Ketkar et al. [KHC05] compare data mining techniques over graph-based representations of links to first-order and relational representations and learning techniques that are based upon inductive logic programming (ILP). Sarkar and Moore [SM05] extend the analysis of social networks into the temporal dimension by modeling change in link structure across discrete time steps, using latent space models and multidimensional scaling. One of the challenges in collecting time series data from LiveJournal is the slow rate of data acquisition, just as spatial annotation data (such as that found in LJ maps and the "plot your friends on a map" meme) is relatively incomplete.
2.5 Other applications using graph mining

Popescul and Ungar [PU03] learn a kind of entity-relational model from data in order to predict links. Hill [Hi03] and Bhattacharya and Getoor [BG04] similarly use statistical relational learning from data in order to resolve identity uncertainty, particularly coreferences and other redundancies (also called deduplication). Resig et al. [RDHT04] use a large (200000-user) crawl of LiveJournal to annotate a social network of instant messaging users, and explore the approach of predicting online times as a function of friends graph degree. There have been numerous recent applications of social network mining based on the text and headers of e-mail. One notable research project by McCallum et al. [MCW05] uses the Enron e-mail corpus and infers roles and topic categories based on link analysis.

A primary goal of this work is to extend the graph mining approach beyond link prediction and recommendation towards link explanation and annotation. It may be much more useful to explain why a group of friends in a blog service created accounts en masse or added one another as friends than to recommend relationship sets that are already extant or structured according to a preexistent social group. For example, high school classmates often create accounts and encourage their peers to join the same service. In a few cases, this is encouraged or facilitated by a teacher, for a class project. Solving the problem of link prediction is not particularly useful in this case, because the user decisions have already been made or strongly constrained; however, it may be very useful to link other classmates not working on the same project to the same relationship set (perhaps they were encouraged to join the blog service by students who continued to use it after the class project). Large groups such as webcomic subscriberships, community co-members, etc. are also somewhat identifiable, and relating members of a blog service to one another through relationship sets is a typical entity-relational data modeling operation that can be made more robust and efficient through graph feature extraction.
3. Experiment Design

3.1 LJCrawler v2

To acquire the graph structure and attributes described in the previous section, we developed an HTTP-based spider called LJCrawler to harvest user information from LiveJournal. A multithreaded version of this program, which retrieves BML data published by Danga (the owners of LiveJournal), collects an average of up to 15 records per second, traversing the social network depth-first and archiving the results in a master index file. Because LiveJournal's functionality for looking up users by user number is only available to administrators, we decided to compile a list of seeds for a disjoint-set representation of the disconnected social network. For purposes of this experiment, however, starting from just one seed (the first author's LiveJournal ID) and restricting the crawl to one connected component was sufficient.

Using LJCrawler, we compiled an adjacency list and the following ground features for each user:

Account type (user, community)
Interest list
School list
Communities watched list
Community membership list
Friends of list
Friends list

3.2 Feature Analyzers

We define a single example to be a candidate edge (u, v) in the underlying directed graph of the social network, along with a set of descriptive features calculated from the annotated graph recorded by LJCrawler (the twelve features listed in Section 2.2).

Other features: Additional planned features for continuing experiments include dates (update frequencies when taken differentially), user options such as maximum friends count, and content descriptors of LiveJournal entries and comments (average post length, word frequency, etc.).
3.3 Graph Search Algorithms for Computing Features

Computing the minimum forward and backward distances can be done more efficiently by using breadth-first search. Currently, a Java implementation of this algorithm requires under one minute on a 2 GHz AMD Opteron system to process a 2000-node graph. However, enumerating all possible candidate pairs within a neighborhood of 2 nodes (1.6 million pairs for 4000 nodes) requires several hours on the same system.

We note that the amortized cost of running BFS to precompute all-pairs shortest paths (APSP) with the actual edge deleted (which is necessary to avoid knowing the prediction target in link prediction) is Θ(|E|(|V| + |E|)). This is prohibitively large even for our "mid-sized" subgraphs of 10-50K nodes; when |V| is about 11 million and |E| is a little over 200 million, enumerating APSP is completely infeasible. However, we do not typically consider all of E, so the bottleneck is typically the first step plus a constant number of calls to BFS, requiring running time in Θ(k(|V| + |E|)).
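The gap between the two costs can be made concrete with a back-of-the-envelope calculation. The sketch below counts abstract vertex and edge visits with all constant factors dropped, using the full-network sizes quoted above; the value k = 1000 is an illustrative assumption.

```python
def bfs_units(num_vertices, num_edges):
    """Vertex plus edge visits for one BFS, ignoring constant factors."""
    return num_vertices + num_edges


V = 11_000_000           # about 11 million accounts
E = 200_000_000          # a little over 200 million links
K = 1000                 # assumed number of candidate edges examined

one_bfs = bfs_units(V, E)
exhaustive = E * one_bfs     # one BFS per deleted edge
sampled = K * one_bfs        # one BFS per candidate edge
ratio = exhaustive // sampled
```

Under these assumptions the exhaustive precomputation needs about 4.2 x 10^16 visits, against about 2.1 x 10^11 for the candidate-driven approach, a factor of |E| / k = 200000.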
3.4 Generating Candidates

We considered several alternative ways to generate candidate edges (u, v). The first technique is likely to be unscalable, as the number of candidates is |V|^2. The second requires having a representatively large sample of the full LiveJournal social network, in order to fit the distribution parameters accurately. The third was the most straightforward to implement. Two calls to the all-pairs shortest path algorithm provided the cost matrix, and one pass at each radius up to a maximum of 10 yielded the data shown in Table 2.

To simplify the initial experiments, we defined the classification problem to be classification of d(u, v) as 1 or 2. This task is actually useful for social network recommender systems because discrimination of a direct friend from a "friend of a friend" (FOAF) is functionally similar to recommending FOAFs to link to directly. There are more detailed classification targets, such as placement, promotion, and demotion of linked friends within strata of trust (setting, increasing, and decreasing the security level), but choosing a user's friends to begin with is the more fundamental decision.
Table 2 and Table 3 report the distribution of inter-vertex distances in the friends network for two subnetworks induced by limiting the maximum number of nodes.

Distance d   Frequency (= d)   Cumulative (≤ d)
1            6204              6204
2            107307            113511
3            69896             183407
4            59926             243333
5            3400              246733
6            255               246988
7            16                247004
8            1                 247005
9            0                 0
10           0                 0
∞            9731              256735

Table 2. Number of candidate edges for the 1000-node LiveJournal graph.

Distance d   Frequency (= d)   Cumulative (≤ d)
1            19410             19410
2            370568            389978
3            403075            793053
4            520373            1313426
5            123747            1437173
6            18453             1455626
7            2657              1458283
8            339               1458622
9            29                1458651
10           0                 1458651
∞            174534            1633185

Table 3. Number of candidate edges for the 4000-node LiveJournal graph.
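A distance distribution of the kind tabulated above can be produced by a sketch along the following lines. It runs one radius-limited BFS per vertex rather than the all-pairs shortest path algorithm used in our experiments, and it assumes that the ∞ row counts ordered pairs with no path within the radius; the function name and toy graph are hypothetical.

```python
from collections import Counter, deque


def distance_histogram(succ, vertices, max_radius=10):
    """Count ordered pairs (u, v), u != v, by shortest-path distance.

    Returns (freq, unreachable): freq[d] is the number of pairs at
    distance d <= max_radius; unreachable counts all remaining pairs.
    Every successor in succ is assumed to appear in vertices.
    """
    freq = Counter()
    unreachable = 0
    for source in vertices:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == max_radius:
                continue          # do not expand beyond the radius
            for nxt in succ.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for d in dist.values():
            if d > 0:
                freq[d] += 1
        unreachable += len(vertices) - len(dist)
    return freq, unreachable


freq, unreachable = distance_histogram(
    {"a": ["b"], "b": ["c"], "c": []},
    ["a", "b", "c", "d"],
)
```

The frequencies and the unreachable count together cover all |V|(|V| - 1) ordered pairs; cumulative counts follow by a running sum over d.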
4. Results

4.1 Preliminary experiment: 941-node version

In a preliminary experiment, we constructed a 941-node subgraph, defined the concept IsFriendOf, and trained three types of inducers with:

1. all attributes
2. all graph attributes excluding the forward and backward distances
3. the backward distances alone
4. the backward and forward distances alone
5. interest-related attributes alone.

Table 4 and Table 5 show the results for three inducers: the J48 decision tree inducer, Holte's 1R inducer (a single-rule classifier based on a single attribute) [Ho93], and the Logistic regression inducer. All accuracy measures were collected over 10-fold cross-validated runs. The J48 output with all features achieves a significant boost over the next highest (distance only).
Inducer    All    NoDist   BkDist   Dist   Interest
J48        98.2   94.8     95.8     97.6   88.5
OneR       95.8   92.0     95.8     95.8   88.5
Logistic   91.6   90.9     88.3     88.9   88.4

Table 4. Percent accuracy for predicting all classes using the 941-node graph.
Inducer    All    NoDist   BkDist   Dist   Interest
J48        89.5   65.7     67.7     83.0   5.4
OneR       67.7   41.1     67.7     67.7   4.5
Logistic   38.3   33.3     0        4.5    4.5

Table 5. Precision (true positives to all positives) using the 941-node graph.
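The contrast between the accuracy and precision figures is easier to read with the definitions written out. The sketch below uses the standard definitions, taking "all positives" to mean all predicted positives, which is an assumption about the caption; the counts are made up for illustration and are not taken from our experiments.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0      # no actual positives
    return accuracy, precision, recall


# Made-up example: 1000 candidate edges, 100 of them actual links,
# and a classifier that predicts "absent" for every candidate.
always_absent = binary_metrics(tp=0, fp=0, tn=900, fn=100)
```

With most candidate edges absent, a classifier that never predicts a link still reaches 90% accuracy in this example while its precision and recall are zero, which is why accuracy alone says little about link prediction quality.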
4.2 Experiments on restricted graphs

We developed an application, ljclipper, to restrict the overall friends graph to that induced by a subset of nodes of fixed number, found using breadth-first search starting from a given seed. Using a 4000-node subgraph summarized in Table 3, we generated 1633185 candidate edges. Note that all forward distances are greater than 1: when u and v are actually connected, we erase (u, v). In preliminary experiments, we then computed the length of the shortest alternative path. This is, however, a less scalable approach, because the asymptotic running time is dominated by the superlinear time required to compute these alternative paths. The complete listing of all twelve features is given in Section 2.2. The numerical types of all of the network features, both the ones describing the graph and those measuring interests and ratios, make the data set amenable to logistic regression.
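The restriction step performed by ljclipper can be sketched as below. This is not the ljclipper source: it assumes that the breadth-first search follows outgoing friend links only and that vertices are kept in order of discovery, and all names are hypothetical.

```python
from collections import deque


def clip_graph(succ, seed, max_nodes):
    """Subgraph induced by the first max_nodes vertices found by BFS from seed."""
    kept = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(kept) < max_nodes:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                kept.append(nxt)
                queue.append(nxt)
                if len(kept) == max_nodes:
                    break         # node budget reached
    keep = set(kept)
    # keep only edges whose endpoints both survived the clipping
    return {a: [b for b in succ.get(a, ()) if b in keep] for a in kept}


full = {"s": ["a", "b"], "a": ["c"], "b": ["s"], "c": []}
clipped = clip_graph(full, "s", 3)
```

In the toy graph, clipping to three nodes keeps s, a, and b and drops the edge from a to c.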
Inducer   Accuracy   Precision   Recall
J48       99.9       97.5        96.1
OneR      99.6       91.7        91.8

Table 6. Percent accuracy, precision and recall using a 1000-node graph (10-fold CV).

Inducer   Accuracy   Precision   Recall
J48       99.8       95.8        92.0
OneR      99.7       91.1        89.9

Table 7. Percent accuracy, precision and recall using a 2000-node graph (10-fold CV).

Inducer   Accuracy   Precision   Recall
J48       99.8       94.5        88.3
OneR      99.7       88.2        84.3

Table 8. Percent accuracy, precision and recall using a 4000-node graph (10-fold CV).
Table 6 through Table 8 show the accuracy, precision, and recall for the 1000, 2000, and 4000-node friends graphs. Trends of higher precision than recall, and diminishing precision and recall as the network grows larger, are observed. These trends are sustained for subsamples of size 10000 and size 100000, though precision and recall also diminish slightly with sampling.
4.3 Data acquisition and larger experiments

The crawler has been improved with several service-specific optimizations for fetching user info pages. Presently these do not use LiveJournal's BML feed of user data, which is incomplete for our purposes (that is, not all ground attributes in our initial relations are provided). At press time, this crawler processes about 20000 user records per hour and thus would require over a week to crawl LiveJournal.

The current bottleneck is the Θ(|V|(|V| + |E|)) step described in Section 3.3. This is the dominant term, because the constant k denoting the number of candidate edges is usually much smaller than n, e.g., 100-1000, so that Θ(k(|V| + |E|)) is not only in Θ(|V| + |E|), but actually just a few hundred times the cost of a single BFS.
4.4 Interpretation

Using mutual interests alone, even with normalization based on the number of interests in u and v, results in very poor prediction accuracy using all inducers with which we experimented. Intermediate results are achieved using mutual friends count and degree (NoDist: 65.7% on predicting edges) and using forward deleted distance and backward distance (Dist: 67.7%). Using all 12 computed graph and annotation features resulted in the highest precision (All: 89.5%) and accuracy (All: 98.2%).

We note that LiveJournal once used a variant of normalized mutual interests to produce a list of potential friends, arranged in decreasing order of match quality. Although this was not the same type of recommender system as LJMiner supports, it shows that state-of-the-art user matching systems have a lot of room for improvement. The results indicate that features produced by LJMiner, used with a good inducer, can generate collaborative and structural recommendations.
5. Continuing Work

Scaling up: Our current research focuses on scaling up to tens of thousands and eventually millions of users. Crawling over 11-12 million records is at least technically feasible, but scaling up the graph analyzers is a challenge that may best be met with heuristic search.

Learning relational models: A promising area of research is the recovery of relational graphical models, including class-level (membership and reference slot) uncertainty [GFKT02]. LJMiner has yielded a ready source of semistructured data for both structure learning and distribution learning. Another potentially useful approach is to organize users and communities into clusters using this relational model. We have developed schemas for blog posts (entries, threads, comments) and for users and dynamic groups of users. This is related to previous preliminary work on relational data mining for personalization of web portals, especially computational grid portals [HBJ03]. Much of the relational metadata in the bioinformatics domain comes from description languages for workflows and workflow components [Hs04]. The next step in our experimental plan is to use schemas such as our detailed ones for blog service users and bioinformatics information and computational grid users [Hs05] to learn a richer predictive model. Finally, modeling relational data as it persists or changes across time is an important challenge.
Acknowledgements

We thank Todd Easton and Kirsten Hildrum for helpful discussions concerning algorithms and the LiveJournal data model. We also thank Andrew King and Tejaswi Pydimarri for contributions to the original LJMiner system and Vikas Bahirwani for contributions to the second version.
References

[BG04] I. Bhattacharya & L. Getoor. Deduplication and group detection using links. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.

[CLRS02] T. H. Cormen, C. E. Leiserson, R. L. Rivest, & C. Stein. Introduction to Algorithms, Second Edition. Cambridge, MA: MIT Press, 2002.

[GD05] L. Getoor & C. P. Diehl. Link mining: a survey. SIGKDD Explorations, Special Issue on Link Mining, 7(2):3-12.

[GFKT02] L. Getoor, N. Friedman, D. Koller, & B. Taskar. Learning Probabilistic Models of Link Structure. Journal of Machine Learning Research, 2002.

[HBJ03] W. H. Hsu, P. Boddhireddy, & R. Joehanes. Using probabilistic relational models for collaborative filtering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, MEXICO, August, 2003.

[Hi03] S. Hill. Social network relational vectors for anonymous identity matching. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, MEXICO, August, 2003.

[Ho93] R. C. Holte. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11(1):63-90.

[Hs04] W. H. Hsu. Relational graphical models of computational workflows for data mining. In Proceedings of the International Conference on Semantics of a Networked World: Semantics for Grid Databases (ICSNW-2004), p. 309-310, Paris, FRANCE, June, 2004.

[Hs05] W. H. Hsu. Relational graphical models for collaborative filtering and recommendation of computational workflow components. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Multi-Agent Information Retrieval and Recommender Systems, Edinburgh, UK, July 31, 2005.

[HKP+06] W. H. Hsu, A. King, M. S. R. Paradesi, T. Pydimarri, & T. Weninger. Collaborative and Structural Recommendation of Friends using Weblog-based Social Network Analysis. In Proceedings of the 2006 AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (CAAW 2006).

[KHC05] N. S. Ketkar, L. B. Holder, & D. J. Cook. Comparison of graph-based and logic-based multi-relational data mining. SIGKDD Explorations, Special Issue on Link Mining, 7(2):64-71.

[Ko01] D. Koller. Representation, Reasoning and Learning. IJCAI Computers and Thought Award Lecture, 2001.

[MCW05] A. McCallum, A. Corrada-Emmanuel, & X. Wang. Topic and role discovery in social networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, UK, August, 2005.

[MH04] M. Mukherjee & L. B. Holder. Graph-based data mining on social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.

[PU03] A. Popescul & L. H. Ungar. Statistical relational learning for link prediction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Statistical Learning of Relational Models (SRL), Acapulco, MEXICO, August, 2003.

[RDHT04] J. Resig, S. Dawara, C. M. Homan, & A. Teredesai. Extracting social networks from instant messaging populations. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) Workshop on Link Analysis and Group Detection (LinkKDD 2004), Seattle, WA, USA, August 22-25, 2004.

[SM05] P. Sarkar & A. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations, Special Issue on Link Mining, 7(2):31-40.
