Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?

Hany M. SalahEldeen and Michael L. Nelson
Old Dominion University, Department of Computer Science
Norfolk VA, 23529, USA
{hany,mln}@cs.odu.edu

Abstract. Social media content has grown exponentially in the recent years and the role of social media has evolved from just narrating life events to actually shaping them. In this paper we explore how many resources shared in social media are still available on the live web or in public web archives. By analyzing six different event-centric datasets of resources shared in social media in the period from June 2009 to March 2012, we found about 11% lost and 20% archived after just a year and an average of 27% lost and 41% archived after two and a half years. Furthermore, we found a nearly linear relationship between time of sharing of the resource and the percentage lost, with a slightly less linear relationship between time of sharing and archiving coverage of the resource. From this model we conclude that after the first year of publishing, nearly 11% of shared resources will be lost and after that we will continue to lose 0.02% per day.

Keywords: Web Archiving, Social Media, Digital Preservation

1 Introduction

With more than 845 million Facebook users at the end of 2011 [5] and over 140 million tweets sent daily in 2011 [16], users can share photos and videos, post their opinions, and report incidents as they happen. Many of the posts and tweets are about quotidian events and their preservation is debatable. However, some of the posts and tweets are about culturally important events whose preservation is less controversial. In this paper we shed light on the importance of archiving social media content about these events and estimate how much of this content is archived, still available, or lost with no possibility of recovery.

To emphasize the culturally important commentary and sharing, we collected data about six events in the time period of June 2009 to March 2012: the H1N1 virus outbreak, Michael Jackson's death, the Iranian elections and protests, Barack Obama's Nobel Peace Prize, the Egyptian revolution, and the Syrian uprising.

arXiv:1209.3026v1 [cs.DL] 13 Sep 2012

2 Related Work

To our knowledge, no prior study has analyzed the amount of shared resources in social media lost through time.
There have been many studies analyzing the behavior of users within a social network, how they interact, and what content they share [3, 19, 20, 23]. As for Twitter, Kwak et al. [6] studied its nature and its topological characteristics and found a deviation from known characteristics of human social networks that were analyzed by Newman and Park [10]. Lee et al. analyzed the reasons behind sharing news in social media and found that informativeness was the strongest motivation in predicting news sharing intention, followed by socializing and status seeking [4]. Also, shared content in social media like Twitter moves and diffuses relatively fast, as stated by Yang et al. [22].

Furthermore, many concerns were raised about the persistence of shared resources and web content in general. Nelson and Allen studied the persistence of objects in a digital library and found that, within just over a year, 3% of the sample they collected appeared to be no longer available [9]. Sanderson et al. analyzed the persistence and availability of web resources referenced from papers in scholarly repositories using Memento and found that 28% of these resources have been lost [14]. Memento [17] is a collection of HTTP extensions that enables uniform, inter-archive access. Ainsworth et al. [1] examined how much of the web is archived and found it ranges from 16% to 79%, depending on the starting seed URIs. McCown et al. examined the factors affecting reconstructing websites (using caches and archives) and found that PageRank, Age, and the number of hops from the top level of the site were most influential [8].

3 Data Gathering

We compiled a list of URIs that were shared in social media and correspond to specific culturally important events. In this section we describe the data acquisition and sampling process we performed to extract six different datasets which will be tested and analyzed in the following sections.

3.1 Stanford SNAP Project Dataset

The Stanford Large Network Dataset is a collection of about 50 large network datasets having millions of nodes, edges and tuples. It was collected as a part of the Stanford Network Analysis Platform (SNAP) project [15]. It includes social networks, web graphs, road networks, Internet networks, citation networks, collaboration networks, and communication networks. For the purpose of our investigation, we selected their Twitter posts dataset. This dataset was collected from June 1st, 2009 to December 31st, 2009 and contains nearly 476 million tweets posted by nearly 17 million users. The dataset is estimated to cover 20%-30% of all posts published on Twitter during that time frame [21].

To select which events will be covered in this study, we examined CNN's 2009 events timeline¹. We wanted to select a small number of events that were diverse, with limited overlap, and relatively important to a large number of people. Given that, we selected four events: the H1N1 virus outbreak, the Iranian protests and elections, Michael Jackson's death, and Barack Obama's Nobel Peace Prize award.

Preparation: A tweet is typically composed of text, hashtags, embedded resources or URIs and user tags all spanning a maximum of 140 characters. Here is an example of a tweet record in the SNAP dataset:

    T 2009-07-31 23:57:18
    U http://Twitter.com/nickgotch
    W RT @rockingjude: December 21, 2009 Depopulation by Food Will Begin http://is.gd/1WMZb WHOA.. BETTER WATCH RT plz #pwa #tcot

The line starting with the letter T indicates the date and time of the tweet creation, while the line starting with U shows a link to the user who authored this particular tweet. Finally, the line starting with W shows the entire tweet including all the user references "@rockingjude", the embedded URIs "http://is.gd/1WMZb", and hashtags "#pwa #tcot".
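
As an illustration of the record layout just described, the following is a minimal sketch of how one T/U/W record could be parsed. It assumes the field letter is separated from its value by whitespace; the names `Tweet` and `parse_record` and the regular expressions are hypothetical and are not taken from the SNAP tooling.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns (assumptions, not part of the SNAP dataset definition).
URI_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


@dataclass
class Tweet:
    timestamp: str
    user: str
    text: str
    uris: list = field(default_factory=list)
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)


def parse_record(lines):
    """Parse the three lines (T, U, W) of one record into a Tweet."""
    fields = {}
    for line in lines:
        # The first token is the field letter, the rest is the value.
        key, value = line.split(None, 1)
        fields[key] = value.strip()
    text = fields["W"]
    return Tweet(
        timestamp=fields["T"],
        user=fields["U"],
        text=text,
        uris=URI_RE.findall(text),
        hashtags=[tag.lower() for tag in HASHTAG_RE.findall(text)],
        mentions=MENTION_RE.findall(text),
    )
```

Only tweets whose `uris` list is non-empty would be of interest for the later steps, since the study concerns the embedded resources rather than the tweets themselves.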

Tag Expansion: We wanted to select tweets that we can say with high confidence are about a selected event. In this case, precision is more important than recall, as collecting every single tweet published about a certain event is less important than making sure that the selected tweets are definitely about that event. Several studies focused on estimating the aboutness of a certain web page or a resource in general [12, 18]. Fortunately, in Twitter, hashtags incorporated within a tweet can help us estimate its "aboutness". Users normally add certain hashtags to their tweets to ease search and discoverability when following a certain topic. These hashtags will be utilized in the event-centric filtration process.

For each event, we selected initial tags that describe it (Table 1). Those initial tags were derived empirically after examining some event-related tweets. Next we extracted all the hashtags that co-occurred with our initial set of hashtags. For example, in class H1N1 we extracted all the other hashtags that appeared along with #h1n1 within the same tweet and kept count of their frequency. Those extracted hashtags were sorted in descending order of the frequency of their appearance in tweets. We removed all the general scope tags like #cnn, #health, #death, #war and others. In regards to aboutness, removing general tags will indeed decrease recall but will increase precision. Finally we picked the top 8-10 hashtags to represent this event class and be utilized in the filtration process. Table 1 shows the final set of tags selected for each class.
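
The tag expansion step can be sketched as a co-occurrence count. This is a minimal illustration under stated assumptions: each tweet is represented as a list of its hashtags, the set `GENERAL_TAGS` contains only the four general tags named above (the full list used in the study is not given), and the function name `expand_tags` is hypothetical.

```python
from collections import Counter

# General-scope tags named in the text; the complete list is not given there.
GENERAL_TAGS = {"cnn", "health", "death", "war"}


def expand_tags(tweets, initial_tag, top_n=10, general_tags=GENERAL_TAGS):
    """Return up to top_n (hashtag, count) pairs co-occurring with initial_tag.

    tweets: iterable of hashtag lists, one list per tweet.
    Pairs are sorted by descending co-occurrence frequency.
    """
    counts = Counter()
    for hashtags in tweets:
        tags = {tag.lower() for tag in hashtags}
        if initial_tag in tags:
            # Count every other tag that shares a tweet with the initial tag.
            counts.update(tags - {initial_tag})
    for tag in general_tags:
        # Dropping general tags trades recall for precision.
        counts.pop(tag, None)
    return counts.most_common(top_n)
```

Calling `expand_tags(tweets, "h1n1")` on the filtered tweets would yield a ranked list comparable in shape to a row of Table 1.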

Tweet Filtration: In the previous step we extracted the tags that will help us classify and filter tweets in the dataset according to each event. This filtration process aims to extract a reasonably sized dataset of tweets for each event and to minimize the inter-event overlap.

Table 1. Twitter hashtags generated for filtering and their frequency of occurring

Event                  Initial hashtags              Top co-occurring hashtags
H1N1 Outbreak          'h1n1' = 61,351               'swine' = 61,829, 'swineflu' = 56,419, 'flu' = 8,436,
                                                     'pandemic' = 6,839, 'influenza' = 1,725, 'grippe' = 1,559,
                                                     'tamiflu' = 331
M. Jackson's Death     'michaeljackson' = 22,934     'michael' = 27,075, 'mj' = 18,584, 'thisisit' = 8,770,
                                                     'rip' = 3,559, 'jacko' = 3,325, 'kingofpop' = 2,888,
                                                     'jackson' = 2,559, 'thriller' = 1,357,
                                                     'thankyoumichael' = 1,050
Iranian Elections      'iranelection' = 911,808      'iran' = 949,641, 'gr88' = 197,113, 'tehran' = 109,006,
                                                     'freeiran' = 13,378, 'neda' = 191,067, 'mousavi' = 16,587,
                                                     'united4iran' = 9,198, 'iranrevolution' = 7,295
Obama's Nobel Prize    'obama' = 48,161 &            'nobel' = 2,261, 'obamanobel' = 14, 'nobelpeace' = 113,
                       'nobelprize'                  'peace' = 3,721, 'barack' = 1,292, 'nobelpeaceprize' = 107

¹ http://www.cnn.com/2009/US/12/16/year.timeline/index.html

Since the life and persistence of the tweet itself is not the focus of this study, but rather the associated resource that appears in the tweet (image, video, shortened URI or other embedded resource), we extract only the tweets that contain an embedded resource. This step resulted in 181 million tweets with embedded resources (http://is.gd/1WMZb in the prior example). These tweets were further filtered to keep only the tweets that have at least one of the expanded tags obtained from Table 1. The number of tweets after this phase reached 1.1 million.

Filtering the tweets based only on the occurrence of at least one of the hashtags is undesirable as it causes two problems. First, it introduces possible event overlap due to general tweets talking about two or more topics. Second, using only the single occurrence of these tags yields a huge number of tweets, and we need to reduce this to a more manageable size. Intuitively speaking, strongly related hashtags will co-occur often. For example, a tweet that has #h1n1 along with #swineflu and #pandemic is more likely to be about the H1N1 outbreak than a tweet having just the tag #flu or just #sick. Filtering with this co-occurrence solves both problems: by increasing relevance to a particular event, general tweets that talk about several events are filtered out, thus diminishing the overlap, and in turn the size of the dataset is reduced.

Next, we increase the precision of the tweets associated with each event from the set of 1.1 million tweets. In the first iteration we selected the tag that had the highest frequency of co-occurrence in the dataset with the initial tag and added it to a set we will call the selection set. After that we check the co-occurrence of all the remaining extracted tags with the tag in the selection set and record the frequencies of co-occurrence. After sorting the frequencies of co-occurrence with the tag from the selection set, we pick the highest one and add it to the selection set. We repeat this step of counting co-occurrences, but with all the hashtags added to the selection set in previous iterations.

To elaborate, for H1N1 assume that the hashtag '#h1n1' had the highest frequency of appearance in the dataset, so we add it to the selection set. In the next iteration we record how many times each tag in the list appeared along with '#h1n1' in the same tweet. If we selected '#swine' as the one with the highest frequency of occurrence with the initial tag '#h1n1', we add it to the selection set, and in the next iteration we record the frequency of occurrence of the remaining hashtags with both of the extracted tags '#h1n1' and '#swine'. We repeat this step, for each event, to the point where we have a dataset of manageable size whose 'aboutness' in relation to the event we are confident in.
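
The iterative procedure above can be read as a greedy selection loop. The sketch below is one possible interpretation, not the authors' implementation: tweets are represented as sets of lowercase hashtags, the stopping rule is a hypothetical `max_size` threshold standing in for "a manageable size", and ties are broken alphabetically for determinism.

```python
from collections import Counter


def filter_by_cooccurrence(tweets, seed_tag, candidate_tags, max_size):
    """Greedily grow a selection set of hashtags until few enough tweets match.

    tweets: list of sets of lowercase hashtags, one set per tweet.
    Returns (selection, matching), where matching holds the tweets that
    contain every tag in the selection.
    """
    selection = [seed_tag]
    remaining = set(candidate_tags) - {seed_tag}
    matching = [tags for tags in tweets if seed_tag in tags]
    while len(matching) > max_size and remaining:
        # Count how often each remaining tag co-occurs with ALL selected tags.
        counts = Counter()
        for tags in matching:
            counts.update(tags & remaining)
        if not counts:
            break  # no remaining tag co-occurs with the current selection
        best, _ = max(counts.items(), key=lambda item: (item[1], item[0]))
        selection.append(best)
        remaining.discard(best)
        matching = [tags for tags in matching if best in tags]
    return selection, matching
```

Each pass corresponds to one row of Table 2: the selection grows by one tag and the number of matching tweets shrinks.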

Table 2. Tweet filtration iterations and final tweet collections

Event   Hashtags selected for filtration                 Tweets extracted   Operation performed   Final tweets
MJ      michael                                          27,075
        michael & michaeljackson                         22,934             Sample 10%            2,293
Iran    iran                                             949,641
        iran & iranelection                              911,808
        iran & iranelection & gr88                       189,757
        iran & iranelection & gr88 & neda                91,815
        iran & iranelection & gr88 & neda & tehran       34,294             Sample 10%            3,429
H1N1    h1n1                                             61,351
        h1n1 & swine                                     44,972
        h1n1 & swine & swineflu                          42,574
        h1n1 & swine & swineflu & pandemic               5,517              Take all              5,517
Obama   obama                                            48,161
        obama & nobel                                    1,118              Take all              1,118

Two problems appeared from this approach with the Iran and Michael Jackson datasets. In the Iran dataset the number of tweets was in the hundreds of thousands, and even with 5 co-occurring tags it was still about 34K+ tweets. To solve this we performed a random sampling from those resulting tweets to take only 10% of them, resulting in a smaller, manageable dataset. The second problem arose with the Michael Jackson dataset: upon using 5 tags to decrease it to a manageable size, we realized there were few unique domains for the embedded resources. A closer look revealed that this combination of tags matched mostly borderline tweet spam (MJ ringtones). To solve this we used only the two top tags "#michael" and "#michaeljackson", and then we randomly sampled 10% of the resulting tweets to reach the desired dataset size (Table 2).

3.2 Egyptian Revolution Dataset

The one-year anniversary of this event was the original motivation for this study [13]. In this case, we started with an event and then tried to get social media content describing it. Despite its ubiquity, gathering social media for a past event is surprisingly hard. We picked the Egyptian revolution due to the role of social media in curating and driving the incidents that led to the resignation of the president. Several initiatives were commenced to collect and curate the social media content during the revolution, like R-Shief.org², which specializes in social content analysis of the issues in the Arab world by using aggregate data from Twitter and the Web. We are currently in the process of obtaining the millions of records related to the Arab Spring of 2011. Meanwhile, we decided to build our own dataset manually.

There are several sites that curate resources about the Egyptian Revolution and we want to investigate as many of them as possible. At the same time, we need to diversify our resources and the types of digital artifacts that are embedded in them. Tweets, videos, images, embedded links, entire web pages and books were included in our investigation. For the sake of consistency, we limited our analysis to resources created within the period from the 20th of January 2011 to the 1st of March 2011. In the next subsections we explain each of the resources we utilized in our data acquisition in detail.

² http://www.r-shief.org/

Storify: Storify is a website that enables users to create stories by creating collections of URIs (e.g., Tweets, images, videos, links) and arranging them temporally. These entries are posted by reference to their host websites. Thus, adding content to Storify does not necessarily mean it is archived. If a user added a video from YouTube and after a while the publisher of that video decided to remove it from YouTube, the user is left with a gap in their Storify entry. For this purpose we gathered all the Storify entries that were created between the 20th of January 2011 and the 1st of March 2011, resulting in 219 unique resources.

IAmJan25: Some entire websites were dedicated as a collection hub of media to curate the revolution. Based on public contributions, those websites collect different types of media, classify them, order them chronologically and publish them to the public. We picked a website named IAmJan25.com, as an example of these websites, to analyze and investigate. The administrators of the website received selected videos and images for notable events and actions that happened during the revolution. Those images and videos were selected by users, who vouched for them as being of some importance and sent the resource's URI to the website administrators. The website itself is divided into two collections: a video collection and an image collection. The video collection had 2387 unique URIs while the image collection had 3525 unique URIs.

Tweets From Tahrir: Several books were published in 2011 documenting the revolution and the Arab Spring. To bridge the gap between books and digital media we analyzed a book entitled Tweets from Tahrir [11], which was published on April 21st, 2011. As the name states, this book tells a story formed by tweets of people during the revolution and the clashes with the past regime. We analyzed this book as a collection of tweets that had the luxury of a paperback preservation and focused on the tweeted media, in this case images. The book had a total of 1118 tweets having 23 unique images.

3.3 Syria Dataset

This dataset has been selected to represent a current (March 2012) event. Using the Twitter search API, we followed the same pattern of data acquisition as in Section 3.1. We started with one hashtag, #Syria, and expanded it. Table 3 shows the tags produced from the tag expansion step. After that, each of those tags was input into a process utilizing the Twitter streaming API, which produced the first 1000 results matching each tag. From this set, we randomly sampled 10%. As a result, 1955 tweets were extracted, each having one or more embedded resources and tags from the expanded tags in Table 3.

Table 3. Twitter #Tags generated for filtering the Syrian uprising

Initial hashtags   Extracted hashtags
'Syria'            'Bashar', 'RiseDamascus', 'GenocideInSyria', 'STOPASSAD2012', 'AssadCrimes', 'Assad'

Table 4 shows the resources collected along with the top level domains that those resources belong to for each event.

Table 4. The top level domains found for each event, ordered descendingly by the number of resources.

Event   Top domains (number of resources found)
MJ      youtube (110), twitpic (45), latimes (43), cnn (30), amazon (30)
Iran    youtube (385), twitpic (36), blogspot (30), roozonline (29)
H1N1    rhizalabs (676), reuters (17), google (16), flutrackers (16), calgaryherald (11)
Obama   blogspot (16), nytimes (15), wordpress (12), youtube (11), cnn (10)
Egypt   youtube (2414), cloudfront (2303), yfrog (1255), twitpic (114), imageshack.us (20)
Syria   youtube (130), twitter (61), hostpic.biz (9), telegraph.co.uk (5)

4 Uniqueness and Existence

From the previous data gathering step we obtained six different datasets related to six different historic events. For each event we extracted a list of URIs that were shared in tweets or uploaded to sites like Storify or IAmJan25. To answer the question of how much of the social media content is missing, we first test those URIs for each dataset to eliminate URI aliases, in which several URIs identify the same resource. Upon obtaining those unique URIs we examine how many of them are still available on the live web and how many are available in public web archives.

4.1 Uniqueness

Some URIs, especially those that appear in Twitter, may be aliases for the same resource. For example "http://bit.ly/2EEjBl" and "http://goo.gl/2ViC" both resolve to "http://www.cnn.com". To solve this, we resolved all the URIs, following redirects to the final URI. The HTTP response of the last redirect has a field called Location that contains the original long URI of the resource. This step reduced the total number of URIs in the six datasets from 21,625 to 11,051. Table 5 shows the number of unique resources in every dataset.
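
Alias elimination of this kind can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the tooling used in the study: the helper names (`head_once`, `resolve_final_uri`, `group_aliases`), the use of HEAD requests, and the limit of 10 hops are all hypothetical choices. The request function is injectable so that the redirect logic can be exercised without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head_once(uri, timeout=10):
    """Issue a single HEAD request; return (status, Location header or None).

    Network errors (DNS failure, timeout) propagate to the caller.
    """
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def resolve_final_uri(uri, fetch=head_once, max_hops=10):
    """Follow 3xx redirects; return the final URI, or None on a loop or too many hops."""
    seen = set()
    for _ in range(max_hops):
        if uri in seen:
            return None  # redirect loop
        seen.add(uri)
        status, location = fetch(uri)
        if 300 <= status < 400 and location:
            # Location may be relative, so resolve it against the current URI.
            uri = urljoin(uri, location)
        else:
            return uri
    return None


def group_aliases(uris, fetch=head_once):
    """Map each final URI to the list of shared URIs that resolve to it."""
    groups = {}
    for uri in uris:
        final = resolve_final_uri(uri, fetch)
        if final is not None:
            groups.setdefault(final, []).append(uri)
    return groups
```

The number of keys returned by `group_aliases` corresponds to the count of unique resources for a dataset.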

4.2 Existence on the Live-Web

After obtaining the unique URIs from the previous step we resolve all of them and classify them as Success or Failure. The Success class includes all the resources that ultimately return a "200 OK" HTTP response. The Failure class includes all the resources that return a "4XX" family response like "404 Not Found", "403 Forbidden" and "410 Gone", the "30X" redirect family while having infinite loop redirects, and server errors with response "50X". To avoid transient errors we repeated the requests, on all datasets, several times for a week to resolve those errors. We also test for "Soft 404s", which are pages that return a "200 OK" response code but are not a representation of the resource, using a technique based on a heuristic for automatically discovering soft 404s from Bar-Yossef et al. [2]. We also include no response from the server, as well as DNS timeouts, as failures. Note that failure means that this resource is missing on the live web. Table 5 summarizes, for each dataset, the total percentages of the resources missing from the live web and the number of missing resources divided by the total number of unique resources.

Table 5. Percentages of unique resources from all the extracted ones we obtained per event and the percentages of presence of those unique resources on live web and in archives. All resources = 21,625, Unique resources = 11,051. Percentages in the last six columns are relative to the number of unique resources of each event.

Event   All     Unique           Available,       Available,       Missing,      Missing,       Total          Total
                (% of all)       archived         not archived     archived      not archived   missing        archived
MJ      2,293   1,187 (51.77%)   316 (26.62%)     474 (39.93%)     90 (7.58%)    307 (25.86%)   397 (33.45%)   406 (34.20%)
Iran    3,429   1,340 (39.08%)   415 (30.97%)     586 (43.73%)     101 (7.54%)   238 (17.76%)   339 (25.30%)   516 (38.51%)
H1N1    5,517   1,645 (29.82%)   595 (36.17%)     656 (39.88%)     98 (5.96%)    296 (17.99%)   394 (23.95%)   693 (42.12%)
Obama   1,118   370 (33.09%)     143 (38.65%)     135 (36.49%)     33 (8.92%)    59 (15.95%)    92 (24.86%)    176 (47.57%)
Egypt   7,313   6,154 (84.15%)   1,069 (17.37%)   4,440 (72.15%)   173 (2.81%)   472 (7.67%)    645 (10.48%)   1,242 (20.18%)
Syria   1,955   355 (18.16%)     19 (5.35%)       311 (87.61%)     0 (0%)        25 (7.04%)     25 (7.04%)     19 (5.35%)
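
The Success/Failure rules of Section 4.2 can be summarized in a few lines. This is a hedged sketch: the function names are hypothetical, the soft-404 decision is taken as an input flag rather than implemented (the heuristic of Bar-Yossef et al. [2] is not reproduced here), and treating a resource as missing only when every repeated attempt fails is our reading of the retry step.

```python
def classify(status, redirect_loop=False, soft_404=False):
    """Classify the final outcome of resolving one URI.

    status: final HTTP status code after following redirects, or None when
    there was no response from the server or a DNS timeout.
    """
    if status is None or redirect_loop or soft_404:
        return "Failure"
    # Only an ultimate "200 OK" counts as success; 4XX and 50X are failures.
    return "Success" if status == 200 else "Failure"


def is_missing(outcomes):
    """A resource is deemed missing only if every repeated attempt failed."""
    return bool(outcomes) and all(outcome == "Failure" for outcome in outcomes)


def percent(part, whole):
    """Percentage rounded to two decimals, as reported in Table 5."""
    return round(100.0 * part / whole, 2)
```

For instance, `percent(397, 1187)` reproduces the 33.45% of unique MJ resources reported as missing in Table 5.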

4.3 Existence in the Archives

In the previous step we tested the existence of the unique list of URIs for each event on the live web. Next, we evaluate how many URIs have been archived in public web archives. To check those archives we utilize the Memento framework. If there is a memento for the URI, we download its memento timemap and analyze it. The timemap is a datestamp-ordered list of all known archived versions (called "mementos") of a URI. Next, we parse this timemap and extract the number of mementos that point to versions of the resource in the public archives. We declare the resource to be archived if it has at least one memento. This step was also repeated several times to avoid the transient states of the archives before deeming a resource as unarchived. The results of this experiment along with the archive coverage percentage are presented in Table 5.
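
Counting mementos in a timemap can be sketched as follows. The sketch assumes the timemap is serialized in the link format that Memento uses (entries of the form `<URI>; rel="..."; datetime="..."`), which may differ in detail from the serialization available at the time of the study; the function names and regular expressions are hypothetical, and fetching the timemap itself is left out.

```python
import re

# One link-format entry: <uri> followed by ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def count_mementos(timemap_text):
    """Count entries whose rel attribute contains the token 'memento'."""
    count = 0
    for _uri, params in LINK_RE.findall(timemap_text):
        attrs = dict(PARAM_RE.findall(params))
        # rel may hold several tokens, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split():
            count += 1
    return count


def is_archived(timemap_text):
    """A resource is considered archived if it has at least one memento."""
    return count_mementos(timemap_text) >= 1
```

An empty or missing timemap yields a count of zero, so the resource would be treated as unarchived for that attempt.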

5 Existence as a Function of Time

Inspecting the results from the previous steps suggests that the number of missing shared resources in social media corresponding to an event is directly proportional to its age. To determine dates for each of the events, we extracted all the creation dates from all the tweet-based datasets and sorted them. For each event, we plotted a graph illustrating the number of tweets per day related to that event, as shown in Figure 1. Since the dataset is separated temporally into 3 partitions, and in order to display all the events on one graph, we reduced the size of the x-axis by removing the time periods not covered in our study.

Fig. 1. URIs shared per day corresponding to each event and showing the two peaks in the non-Syrian and non-Egyptian events

Upon examining the graph we found an interesting phenomenon in the non-Syrian and non-Egyptian events: each event has two peaks. Upon investigating history timelines we came to the conclusion that those peaks reflect a second wave of social media interaction as a result of a new incident within the same event after a period of time. For example, in the H1N1 dataset, the first peak illustrates the world-wide outbreak announcement while the second peak denotes the release of the vaccine. In the Iran dataset, the first peak shows the peak of the elections while the second peak pinpoints the Iranian trials. As for the MJ dataset, the first peak corresponds to his death and the second peak describes the rumors that Michael Jackson died of unnatural causes and a possible homicide. For the Obama dataset, the first peak reveals the announcement of his winning the prize while the second peak presents the award-giving ceremony in Oslo. For the Egyptian revolution, the resources are all within a small time slot of 2 weeks around the date 11th of February. As for the Syrian event, since the collection was very recent there were no obvious peaks.

The peaks we examined become the temporal centroids of the social content collections (the datasets): MJ (June 25th and July 10th 2009), Iran (June 13th and August 1st 2009), H1N1 (September 11th and October 5th 2009), and Obama (October 9th and December 10th 2009). Egypt had one centroid (February 11th 2011) and the Syria dataset also had one centroid, on March 27th 2012. We split each two-peak event according to its two centroids. Figure 1 shows those peaks and Table 6 shows the missing content and the archived content percentages corresponding to each centroid.

Table 6. The Split Dataset

Event   Centroid   % Missing   % Archived
MJ      first      36.24%      39.45%
MJ      second     31.62%      30.78%
Iran    first      26.98%      43.08%
Iran    second     24.47%      36.26%
H1N1    first      23.49%      41.65%
H1N1    second     25.64%      43.87%
Obama   first      24.59%      47.87%
Obama   second     26.15%      46.15%
Egypt   single     10.48%      20.18%
Syria   single     7.04%       5.35%

Fig. 2. Percentage of content missing and archived for the events as a function of time.

Figure 2 shows the missing and archived values from Table 6 as a function of time since shared. Equation 1 shows the modeled estimate for the percentage of shared resources lost, where Age is in days. While there is a less linear relationship between time and being archived, Equation 2 shows the modeled estimate for the percentage of shared resources archived in a public archive.

    Content Lost Percentage = 0.02 × (Age in days) + 4.20        (1)

    Content Archived Percentage = 0.04 × (Age in days) + 6.74    (2)

Given these observations and our curve fitting we estimate that after a year from publishing about 11% of content shared in social media will be gone. After this point, we are losing roughly 0.02% of this content per day.
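
As a worked check of Equations 1 and 2, the two linear models can be evaluated directly. The function names below are hypothetical; the coefficients are the ones stated in the equations, and the models are only meaningful within the age range covered by the datasets.

```python
def pct_lost(age_days):
    """Equation 1: estimated percentage of shared resources lost."""
    return 0.02 * age_days + 4.20


def pct_archived(age_days):
    """Equation 2: estimated percentage of shared resources archived."""
    return 0.04 * age_days + 6.74


if __name__ == "__main__":
    # One year after sharing.
    print(f"lost after 365 days:     {pct_lost(365):.2f}%")
    print(f"archived after 365 days: {pct_archived(365):.2f}%")
```

At 365 days the first model gives 0.02 × 365 + 4.20 = 11.5%, in line with the estimate of about 11% lost after one year, and each additional day adds the slope of 0.02 percentage points.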

6 Conclusions and Future Work

We can conclude that there is a nearly linear relationship between time of sharing in the social media and the percentage lost. Although not as linear, there is a similar relationship between the time of sharing and the expected percentage of coverage in the archives. To reach this conclusion, we extracted collections of tweets and other social media content that was posted and shared in relation to six different events that occurred in the time period from June 2009 to March 2012. Next we extracted the embedded resources within this social media content and tested their existence on the live web and in the archives. After analyzing the percentages lost and archived in relation to time and plotting them, we used a linear regression model to fit those points. Finally we presented two linear models that can estimate the existence of a resource, that was posted or shared at one point of time in the social media, on the live web and in the archives as a function of age in the social media.

In the next stage of our research we need to expand the datasets and import other similar datasets, especially in the uncovered temporal areas (e.g., the year of 2010 and before 2009). Examining more datasets across extended points in time could enable us to better model these two functions of time. Also, several other factors besides time will be analyzed to understand their effect on persistence on the live web and archiving coverage, such as publishing venue, rate of sharing, popularity of authors and the nature of the related event.

7 Acknowledgments

This work was supported in part by the Library of Congress and NSF IIS-1009392.

References

1. Ainsworth, Scott G. and Alsum, Ahmed and SalahEldeen, Hany and Weigle, Michele C. and Nelson, Michael L.: How Much of the Web Is Archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, (2011).
2. Bar-Yossef, Ziv and Broder, Andrei Z. and Kumar, Ravi and Tomkins, Andrew: Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay. In Proceedings of the 13th international conference on World Wide Web, WWW '04, pages 328-337, (2004).
3. F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida: Characterizing User Behavior in Online Social Networks. In Proc. of ACM SIGCOMM Internet Measurement Conference, SIGCOMM '09, pages 49-62, (2009).
4. Lee, Chei and Ma, Long and Goh, Dion: Why Do People Share News in Social Media? Active Media Technology, Springer Berlin/Heidelberg, pages 129-140, Volume 6890, (2011).
5. Facebook official fact sheet, http://newsroom.fb.com/content/default.aspx?NewsAreaId=22
6. Kwak, Haewoon and Lee, Changhyun and Park, Hosung and Moon, Sue: What is Twitter, a Social Network or a News Media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, (2010).
7. Gordon Mohr, Michele Kimpton, Michael Stack and Igor Ranitovic: Introduction to Heritrix, an Archival Quality Web Crawler. In 4th International Web Archiving Workshop, IWAW '04, (2004).
8. Frank McCown and Norou Diawara and Michael L. Nelson: Factors Affecting Website Reconstruction from the Web Infrastructure. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '07, pages 39-48, (2007).
9. Michael L. Nelson, B. Danette Allen: Object Persistence and Availability in Digital Libraries. D-Lib Magazine, Volume 8, Number 1, January (2002).
10. M. E. J. Newman and J. Park: Why social networks are different from other types of networks. Phys. Rev. E, 68(3):036122, September, (2003).
11. Alex Nunns and Nadia Idle: Tweets From Tahrir. ISBN-10: 1935928457.
12. T. A. Phelps and R. Wilensky: Robust Hyperlinks Cost Just Five Words Each. Technical Report, UCB/CSD-00-1091, EECS Department, University of California, Berkeley, (2000).
13. Hany M. SalahEldeen, Michael L. Nelson: Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
14. Robert Sanderson, Mark Phillips and Herbert Van de Sompel: Analyzing the Persistence of Referenced Web Resources with Memento. CoRR, arXiv:1105.3459, (2011).
15. Stanford SNAP Project Dataset, http://snap.stanford.edu/
16. Twitter numbers, http://blog.Twitter.com/2011/03/numbers.html
17. H. Van de Sompel, M. L. Nelson, R. Sanderson, L. L. Balakireva, S. Ainsworth, H. Shankar: Memento: Time Travel for the Web, Technical Report, arXiv:0911.1112, November, (2009).
18. Wan, X., Yang, J.: Wordrank-based Lexical Signatures for Finding Lost or Related Web Pages. In Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development, APWeb '06, pages 843-849, (2006).
19. C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao: User Interactions in Social Networks and their Implications. In Proceedings of the 4th ACM European conference on Computer systems, EuroSys '09, pages 205-218, (2009).
20. Wu, Shaomei and Hofman, Jake M. and Mason, Winter A. and Watts, Duncan J.: Who Says What to Whom on Twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, (2011).
21. Jaewon Yang and Jure Leskovec: Patterns of Temporal Variation in Online Media. In ACM International Conference on Web Search and Data Mining, WSDM '11, pages 177-186, (2011).
22. J. Yang and S. Counts: Predicting the Speed, Scale, and Range of Information Diffusion in Twitter. In 4th International AAAI Conference on Weblogs and Social Media, ICWSM '10, May, (2010).
23. D. Zhao and M. B. Rosson: How and Why People Twitter: The Role that Micro-blogging Plays in Informal Communication at Work. In Proceedings of the ACM 2009 international conference on Supporting group work, GROUP '09, pages 243-252, (2009).