Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?

Hany M. SalahEldeen and Michael L. Nelson

Old Dominion University, Department of Computer Science
Norfolk VA, 23529, USA
{hany,mln}@cs.odu.edu

Abstract.
Social media content has grown exponentially in the recent years and the role of social media has evolved from just narrating life events to actually shaping them. In this paper we explore how many resources shared in social media are still available on the live web or in public web archives. By analyzing six different event-centric datasets of resources shared in social media in the period from June 2009 to March 2012, we found about 11% lost and 20% archived after just a year and an average of 27% lost and 41% archived after two and a half years. Furthermore, we found a nearly linear relationship between time of sharing of the resource and the percentage lost, with a slightly less linear relationship between time of sharing and archiving coverage of the resource. From this model we conclude that after the first year of publishing, nearly 11% of shared resources will be lost and after that we will continue to lose 0.02% per day.

Keywords: Web Archiving, Social Media, Digital Preservation

1 Introduction

With more than 845 million Facebook users at the end of 2011 [5] and over 140 million tweets sent daily in 2011 [16], users can share photos and videos, post their opinions, and report incidents as they happen. Many of the posts and tweets are about quotidian events and their preservation is debatable. However, some of the posts and tweets are about culturally important events whose preservation is less controversial. In this paper we shed light on the importance of archiving social media content about these events and estimate how much of this content is archived, still available, or lost with no possibility of recovery.

To emphasize the culturally important commentary and sharing, we collected data about six events in the time period of June 2009 to March 2012: the H1N1 virus outbreak, Michael Jackson's death, the Iranian elections and protests, Barack Obama's Nobel Peace Prize, the Egyptian revolution, and the Syrian uprising.

arXiv:1209.3026v1 [cs.DL] 13 Sep 2012

2 Related Work

To our knowledge, no prior study has analyzed the amount of shared resources in social media lost through time.
There have been many studies analyzing the behavior of users within a social network, how they interact, and what content they share [3, 19, 20, 23]. As for Twitter, Kwak et al. [6] studied its nature and its topological characteristics and found a deviation from known characteristics of human social networks that were analyzed by Newman and Park [10]. Lee et al. analyzed the reasons behind sharing news in social media and found that informativeness was the strongest motivation in predicting news sharing intention, followed by socializing and status seeking [4]. Also, content shared in social media like Twitter moves and diffuses relatively fast, as stated by Yang et al. [22].

Furthermore, many concerns were raised about the persistence of shared resources and web content in general. Nelson and Allen studied the persistence of objects in a digital library and found that, within just over a year, 3% of the sample they collected appeared to no longer be available [9]. Sanderson et al. analyzed the persistence and availability of web resources referenced from papers in scholarly repositories using Memento and found that 28% of these resources have been lost [14]. Memento [17] is a collection of HTTP extensions that enables uniform, inter-archive access. Ainsworth et al. [1] examined how much of the web is archived and found it ranges from 16% to 79%, depending on the starting seed URIs. McCown et al. examined the factors affecting reconstructing websites (using caches and archives) and found that PageRank, Age, and the number of hops from the top-level of the site were most influential [8].

3 Data Gathering

We compiled a list of URIs that were shared in social media and correspond to specific culturally important events. In this section we describe the data acquisition and sampling process we performed to extract six different datasets which will be tested and analyzed in the following sections.

3.1 Stanford SNAP Project Dataset

The Stanford Large Network Dataset is a collection of about 50 large network datasets having millions of nodes, edges and tuples. It was collected as a part of the Stanford Network Analysis Platform (SNAP) project [15]. It includes social networks, web graphs, road networks, Internet networks, citation networks, collaboration networks, and communication networks. For the purpose of our investigation, we selected their Twitter posts dataset. This dataset was collected from June 1st, 2009 to December 31st, 2009 and contains nearly 476 million tweets posted by nearly 17 million users. The dataset is estimated to cover 20%-30% of all posts published on Twitter during that time frame [21]. To select which events will be covered in this study, we examined CNN's 2009 events timeline¹. We wanted to select a small number of events that were diverse, with limited overlap, and relatively important to a large number of people. Given that, we selected four events: the H1N1 virus outbreak, the Iranian protests and elections, Michael Jackson's death, and Barack Obama's Nobel Peace Prize award.

Preparation: A tweet is typically composed of text, hashtags, embedded resources or URIs and user tags, all spanning a maximum of 140 characters. Here is an example of a tweet record in the SNAP dataset:

    T 2009-07-31 23:57:18
    U http://Twitter.com/nickgotch
    W RT @rockingjude: December 21, 2009 Depopulation by Food Will Begin http://is.gd/1WMZb WHOA..BETTER WATCH RT plz #pwa #tcot

The line starting with the letter T indicates the date and time of the tweet creation, while the line starting with U shows a link to the user who authored this particular tweet. Finally, the line starting with W shows the entire tweet including all the user-references "@rockingjude", the embedded URIs "http://is.gd/1WMZb", and hashtags "#pwa #tcot".
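
The record layout described above can be read with a few lines of Python. This is a minimal sketch, not the tooling used in the study: the function name `parse_record` and the three regular expressions are illustrative assumptions, and the marker letter is assumed to be separated from its value by whitespace.

```python
import re

# Illustrative patterns (assumptions, not taken from the paper).
HASHTAG_RE = re.compile(r"#(\w+)")
URI_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")


def parse_record(lines):
    """Parse the T/U/W lines of one tweet record into a dictionary."""
    fields = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # marker letter, then the value
        if len(parts) == 2 and parts[0] in ("T", "U", "W"):
            fields[parts[0]] = parts[1]
    text = fields.get("W", "")
    return {
        "time": fields.get("T"),
        "user": fields.get("U"),
        "text": text,
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "uris": URI_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }


example = [
    "T 2009-07-31 23:57:18",
    "U http://Twitter.com/nickgotch",
    "W RT @rockingjude: December 21, 2009 Depopulation by Food Will Begin "
    "http://is.gd/1WMZb WHOA..BETTER WATCH RT plz #pwa #tcot",
]
record = parse_record(example)
```

Only records whose `uris` list is non-empty would be of interest for the later steps, since the study tracks the embedded resource rather than the tweet itself.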

Tag Expansion: We wanted to select tweets that we can say with high confidence are about a selected event. In this case, precision is more important than recall, as collecting every single tweet published about a certain event is less important than making sure that the selected tweets are definitely about that event. Several studies focused on estimating the aboutness of a certain web page or a resource in general [12, 18]. Fortunately in Twitter, hashtags incorporated within a tweet can help us estimate their "aboutness". Users normally add certain hashtags to their tweets to ease the search and discoverability in following a certain topic. These hashtags will be utilized in the event-centric filtration process.

For each event, we selected initial tags that describe it (Table 1). Those initial tags were derived empirically after examining some event-related tweets. Next we extracted all the hashtags that co-occurred with our initial set of hashtags. For example, in class H1N1 we extracted all the other hashtags that appeared along with #h1n1 within the same tweet and kept count of their frequency. Those extracted hashtags were sorted in descending order of the frequency of their appearance in tweets. We removed all the general scope tags like #cnn, #health, #death, #war and others. In regards to aboutness, removing general tags will indeed decrease recall but will increase precision. Finally we picked the top 8-10 hashtags to represent this event-class and be utilized in the filtration process. Table 1 shows the final set of tags selected for each class.
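
The tag expansion step can be sketched as a co-occurrence count. The following is an illustration under stated assumptions: tweets are represented as lists of lowercase hashtags without the leading "#", and the function name `cooccurring_tags`, the `general` stop-list and the toy data are hypothetical, not taken from the study.

```python
from collections import Counter


def cooccurring_tags(tweets, seed, general=frozenset()):
    """Count hashtags appearing in the same tweet as `seed`.

    tweets  : iterable of hashtag lists, one list per tweet
    seed    : the initial hashtag for the event
    general : general-scope tags to drop (e.g. 'cnn', 'health')
    Returns (tag, count) pairs sorted by descending count.
    """
    counts = Counter()
    for tags in tweets:
        tagset = {tag.lower() for tag in tags}  # count each tag once per tweet
        if seed in tagset:
            counts.update(tagset - {seed} - set(general))
    return counts.most_common()


# Toy data, made up for illustration only.
toy_tweets = [
    ["h1n1", "swine", "swineflu"],
    ["h1n1", "swine", "cnn"],
    ["h1n1", "swine", "swineflu", "health"],
    ["h1n1", "swine"],
    ["flu", "sick"],
]
ranked = cooccurring_tags(toy_tweets, "h1n1", general={"cnn", "health"})
```

Taking the first 8 to 10 entries of `ranked` would correspond to the final selection step described above.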

Tweet Filtration: In the previous step we extracted the tags that will help us classify and filter tweets in the dataset according to each event. This filtration process aims to extract a reasonable sized dataset of tweets for each event and to minimize the inter-event overlap. Since the life and persistence of the tweet itself is not the focus of this study but rather the associated resource that appears in the tweet (image, video, shortened URI or other embedded resource), we will extract only the tweets that contain an embedded resource. This step resulted in 181 million tweets with embedded resources (http://is.gd/1WMZb in the prior example). These tweets were further filtered to keep only the tweets that have at least one of the expanded tags obtained from Table 1. The number of tweets after this phase reached 1.1 million tweets.

Event                Initial Hashtags                 Top Co-occurring Hashtags
H1N1 Outbreak        'h1n1' = 61,351                  'swine' = 61,829, 'swineflu' = 56,419, 'flu' = 8,436, 'pandemic' = 6,839,
                                                      'influenza' = 1,725, 'grippe' = 1,559, 'tamiflu' = 331
M. Jackson's Death   'michaeljackson' = 22,934        'michael' = 27,075, 'mj' = 18,584, 'thisisit' = 8,770, 'rip' = 3,559, 'jacko' = 3,325,
                                                      'kingofpop' = 2,888, 'jackson' = 2,559, 'thriller' = 1,357, 'thankyoumichael' = 1,050
Iranian Elections    'iranelection' = 911,808         'iran' = 949,641, 'gr88' = 197,113, 'tehran' = 109,006, 'freeiran' = 13,378,
                                                      'neda' = 191,067, 'mousavi' = 16,587, 'united4iran' = 9,198, 'iranrevolution' = 7,295
Obama's Nobel Prize  'obama' = 48,161 & 'nobelprize'  'nobel' = 2,261, 'obamanobel' = 14, 'nobelpeace' = 113,
                                                      'peace' = 3,721, 'barack' = 1,292, 'nobelpeaceprize' = 107

Table 1. Twitter hashtags generated for filtering and their frequency of occurring

¹ http://www.cnn.com/2009/US/12/16/year.timeline/index.html

Filtering the tweets based only on the occurrence of at least one of the hashtags is undesirable as it will cause two problems. First, it will introduce possible event overlap due to general tweets talking about two or more topics. Second, using only the single occurrence of these tags will yield a huge number of tweets, and we need to reduce this to a more manageable size. Intuitively speaking, strongly related hashtags will co-occur often. For example, a tweet that has #h1n1 along with #swineflu and #pandemic is more likely to be about the H1N1 outbreak than a tweet having just the tag #flu or just #sick. Filtering with this co-occurrence will in turn solve both problems: by increasing relevance to a particular event, general tweets that talk about several events will be filtered out, thus diminishing the overlap, and in turn it will reduce the size of the dataset.

Next, we increase the precision of the tweets associated with each event from the set of 1.1 million tweets.
In the first iteration we selected the tag that had the highest frequency of co-occurrence in the dataset with the initial tag and added it to a set we will call the selection set. After that we check the co-occurrence of all the remaining extracted tags with the tag in the selection set and record the frequencies of co-occurrence. After sorting the frequencies of co-occurrence with the tag from the selection set, we pick the highest one and add it to the selection set. We repeat this step of counting co-occurrences, but with all the previously extracted hashtags in the selection set from previous iterations. To elaborate, for H1N1 assume that the hashtag '#h1n1' had the highest frequency of appearance in the dataset, so we add it to the selection set. In the next iteration we record how many times each tag in the list appeared along with '#h1n1' in the same tweet. If we selected '#swine' as the one with the highest frequency of occurrence with the initial tag '#h1n1', we add it to the selection set and in the next iteration we record the frequency of occurrence of the remaining hashtags with both of the extracted tags '#h1n1' and '#swine'. We repeat this step, for each event, to the point where we have a manageable size dataset whose 'aboutness' in relation to the event we are confident in.
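
One possible reading of this iterative procedure is a greedy loop that keeps narrowing the tweets to those containing every tag in the selection set. The sketch below is an interpretation, not the authors' code: the names `refine_selection` and `max_size`, the alphabetical tie-break and the toy data are all assumptions introduced for illustration.

```python
from collections import Counter


def refine_selection(tweets, initial_tag, candidates, max_size):
    """Greedily grow the selection set until few enough tweets remain.

    tweets      : list of sets of hashtags, one set per tweet
    initial_tag : first tag placed in the selection set
    candidates  : expanded tags that may be added
    max_size    : stop once at most this many tweets match (assumed threshold)
    Returns (selection, matching_tweets).
    """
    selection = [initial_tag]
    matching = [t for t in tweets if initial_tag in t]
    remaining = set(candidates) - {initial_tag}
    while len(matching) > max_size and remaining:
        # Frequency of each remaining tag among tweets holding all selected tags.
        counts = Counter(tag for t in matching for tag in t if tag in remaining)
        if not counts:
            break
        # Highest count wins; ties are broken alphabetically (an assumption).
        best = max(counts, key=lambda tag: (counts[tag], tag))
        selection.append(best)
        remaining.discard(best)
        matching = [t for t in matching if best in t]
    return selection, matching


# Toy data, made up for illustration only.
toy = [
    {"h1n1", "swine", "swineflu", "pandemic"},
    {"h1n1", "swine", "swineflu"},
    {"h1n1", "swine"},
    {"h1n1", "flu"},
    {"swine"},
]
selection, kept = refine_selection(
    toy, "h1n1", {"swine", "swineflu", "pandemic", "flu"}, max_size=2
)
```

On the toy data the loop adds 'swine' and then 'swineflu', which mirrors the shrinking tweet counts reported per iteration in Table 2.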

Event   Hashtags selected for filtration               Tweets Extracted   Operation Performed   Final Tweets
MJ      michael                                        27,075
        michael & michaeljackson                       22,934             Sample 10%            2,293
Iran    iran                                           949,641
        iran & iranelection                            911,808
        iran & iranelection & gr88                     189,757
        iran & iranelection & gr88 & neda              91,815
        iran & iranelection & gr88 & neda & tehran     34,294             Sample 10%            3,429
H1N1    h1n1                                           61,351
        h1n1 & swine                                   44,972
        h1n1 & swine & swineflu                        42,574
        h1n1 & swine & swineflu & pandemic             5,517              Take All              5,517
Obama   obama                                          48,161
        obama & nobel                                  1,118              Take All              1,118

Table 2. Tweet Filtration iterations and final tweet collections

Two problems appeared from this approach with the Iran and Michael Jackson datasets. In the Iran dataset the number of tweets was in the hundreds of thousands, and even with 5 tags co-occurring it was still about 34K+ tweets. To solve this we performed a random sampling from those resulting tweets to take only 10% of them, resulting in a smaller manageable dataset. The second problem was with the Michael Jackson dataset: upon using 5 tags to decrease it to a manageable size, we realized there were few unique domains for the embedded resources. A closer look revealed this combination of tags was mostly border-line tweet spam (MJ ringtones). To solve this we used only the two top tags "#michael" and "#michaeljackson", and then we randomly sampled 10% of the resulting tweets to reach the desired dataset size (Table 2).

3.2 Egyptian Revolution Dataset

The one year anniversary of this event was the original motivation for this study [13]. In this case, we started with an event and then tried to get social media content describing it. Despite its ubiquity, gathering social media for a past event is surprisingly hard. We picked the Egyptian revolution due to the role of the social media in curating and driving the incidents that led to the resignation of the president. Several initiatives were commenced to collect and curate the social media content during the revolution, like R-Shief.org², which specializes in social content analysis of the issues in the Arab world by using aggregate data from Twitter and the Web. We are currently in the process of obtaining the millions of records related to the Arab Spring of 2011. Meanwhile, we decided to build our own dataset manually.

There are several sites that curate resources about the Egyptian Revolution and we want to investigate as many of them as possible. At the same time, we need to diversify our resources and the types of digital artifacts that are embedded in them. Tweets, videos, images, embedded links, entire web pages and books were included in our investigation. For the sake of consistency, we limited our analysis to resources created within the period from the 20th of January 2011 to the 1st of March 2011. In the next subsections we explain each of the resources we utilized in our data acquisition in detail.

² http://www.r-shief.org/

Storify: Storify is a website that enables users to create stories by creating collections of URIs (e.g., Tweets, images, videos, links) and arranging them temporally. These entries are posted by reference to their host websites. Thus, adding content to Storify does not necessarily mean it is archived. If a user added a video from YouTube and after a while the publisher of that video decided to remove it from YouTube, the user is left with a gap in their Storify entry. For this purpose we gathered all the Storify entries that were created between the 20th of January 2011 and the 1st of March 2011, resulting in 219 unique resources.

IAmJan25: Some entire websites were dedicated as a collection hub of media to curate the revolution. Based on public contributions, those websites collect different types of media, classify them, order them chronologically and publish them to the public. We picked a website named IAmJan25.com, as an example of these websites, to analyze and investigate. The administrators of the website received selected videos and images for notable events and actions that happened during the revolution. Those images and videos were selected by users, who vouched for their importance and sent each resource's URI to the website administrators. The website itself is divided into two collections: a video collection and an image collection. The video collection had 2387 unique URIs while the image collection had 3525 unique URIs.

Tweets From Tahrir: Several books were published in 2011 documenting the revolution and the Arab Spring. To bridge the gap between books and digital media we analyzed a book entitled Tweets from Tahrir [11], which was published on April 21st, 2011. As the name states, this book tells a story formed by tweets of people during the revolution and the clashes with the past regime. We analyzed this book as a collection of tweets that had the luxury of a paperback preservation and focused on the tweeted media, in this case images. The book had a total of 1118 tweets having 23 unique images.

3.3 Syria Dataset

This dataset has been selected to represent a current (March 2012) event. Using the Twitter search API, we followed the same pattern of data acquisition as in section 3.1. We started with one hashtag, #Syria, and expanded it. Table 3 shows the tags produced from the tag expansion step. After that, each of those tags was input into a process utilizing the Twitter streaming API, which produced the first 1000 results matching each tag. From this set, we randomly sampled 10%. As a result, 1955 tweets were extracted, each having one or more embedded resources and tags from the expanded tags in Table 3.

Initial Hashtags   Extracted Hashtags
'Syria'            'Bashar', 'RiseDamascus', 'GenocideInSyria', 'STOPASSAD2012', 'AssadCrimes', 'Assad'

Table 3. Twitter #Tags generated for filtering the Syrian uprising

Table 4 shows the resources collected along with the top level domains that those resources belong to for each event.

Event   Top Domains (number of resources found)
MJ      youtube (110), twitpic (45), latimes (43), cnn (30), amazon (30)
Iran    youtube (385), twitpic (36), blogspot (30), roozonline (29)
H1N1    rhizalabs (676), reuters (17), google (16), flutrackers (16), calgaryherald (11)
Obama   blogspot (16), nytimes (15), wordpress (12), youtube (11), cnn (10)
Egypt   youtube (2414), cloudfront (2303), yfrog (1255), twitpic (114), imageshack.us (20)
Syria   youtube (130), twitter (61), hostpic.biz (9), telegraph.co.uk (5)

Table 4. The top level domains found for each event, ordered descendingly by the number of resources.

4 Uniqueness and Existence

From the previous data gathering step we obtained six different datasets related to six different historic events. For each event we extracted a list of URIs that were shared in tweets or uploaded to sites like Storify or IAmJan25. To answer the question of how much of the social media content is missing, we first test those URIs for each dataset to eliminate URI aliases, in which several URIs identify the same resource. Upon obtaining those unique URIs we examine how many of them are still available on the live web and how many are available in public web archives.

4.1 Uniqueness

Some URIs, especially those that appear in Twitter, may be aliases for the same resource. For example "http://bit.ly/2EEjBl" and "http://goo.gl/2ViC" both resolve to "http://www.cnn.com". To solve this, we resolved all the URIs following redirects to the final URI. The HTTP response of the last redirect has a field called location that contains the original long URI of the resource. This step reduced the total number of URIs in the six datasets from 21,625 to 11,051. Table 5 shows the number of unique resources in every dataset.
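
The de-aliasing step can be illustrated without touching the network by injecting the redirect lookup. In this sketch `next_hop` is a hypothetical callable that returns the redirect target of a URI, or `None` when the URI does not redirect; in practice it would issue an HTTP request and read the Location header. The names `resolve`, `dealias` and `max_hops` are assumptions for illustration.

```python
def resolve(uri, next_hop, max_hops=10):
    """Follow redirects to the final URI.

    Returns None when a redirect loop is detected or max_hops is exceeded.
    """
    seen = set()
    while uri not in seen and len(seen) < max_hops:
        seen.add(uri)
        target = next_hop(uri)
        if target is None:  # no further redirect: this is the final URI
            return uri
        uri = target
    return None


def dealias(uris, next_hop):
    """Group shared URIs by the final URI they resolve to."""
    groups = {}
    for uri in uris:
        final = resolve(uri, next_hop)
        key = final if final is not None else uri  # unresolved URIs stay as-is
        groups.setdefault(key, []).append(uri)
    return groups


# Redirect table built from the aliases quoted in the text above.
redirects = {
    "http://bit.ly/2EEjBl": "http://www.cnn.com",
    "http://goo.gl/2ViC": "http://www.cnn.com",
}
groups = dealias(
    ["http://bit.ly/2EEjBl", "http://goo.gl/2ViC", "http://www.cnn.com"],
    redirects.get,
)
```

The number of keys in `groups` is the number of unique resources; here three shared URIs collapse to one.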

4.2 Existence on the Live-Web

After obtaining the unique URIs from the previous step we resolve all of them and classify them as Success or Failure. The Success class includes all the resources that ultimately return a "200 OK" HTTP response. The Failure class includes all the resources that return a "4XX" family response like "404 Not Found", "403 Forbidden" and "410 Gone", the "30X" redirect family while having infinite loop redirects, and server errors with response "50X". To avoid transient errors we repeated the requests, on all datasets, several times for a week to resolve those errors. We also test for "Soft 404s", which are pages that return a "200 OK" response code but are not a representation of the resource, using a technique based on a heuristic for automatically discovering soft 404s from Bar-Yossef et al. [2]. We also include no response from the server, as well as DNS timeouts, as failures. Note that failure means that this resource is missing on the live web. Table 5 summarizes, for each dataset, the total percentages of the resources missing from the live web and the number of missing resources divided by the total number of unique resources.

Event   All     Unique           Available,       Available,       Missing,       Missing,        Missing        Archived
                                 Archived         Not Archived     Archived       Not Archived    (total)        (total)
MJ      2,293   1,187 = 51.77%   316 = 26.62%     474 = 39.93%     90 = 7.58%     307 = 25.86%    397 = 33.45%   406 = 34.20%
Iran    3,429   1,340 = 39.08%   415 = 30.97%     586 = 43.73%     101 = 7.54%    238 = 17.76%    339 = 25.30%   516 = 38.51%
H1N1    5,517   1,645 = 29.82%   595 = 36.17%     656 = 39.88%     98 = 5.96%     296 = 17.99%    394 = 23.95%   693 = 42.12%
Obama   1,118   370 = 33.09%     143 = 38.65%     135 = 36.49%     33 = 8.92%     59 = 15.95%     92 = 24.86%    176 = 47.57%
Egypt   7,313   6,154 = 84.15%   1,069 = 17.37%   4,440 = 72.15%   173 = 2.81%    472 = 7.67%     645 = 10.48%   1,242 = 20.18%
Syria   1,955   355 = 18.16%     19 = 5.35%       311 = 87.61%     0 = 0%         25 = 7.04%      25 = 7.04%     19 = 5.35%

The Unique percentage is relative to All; every other percentage is relative to the number of unique resources for that event.

Table 5. Percentages of unique resources from all the extracted ones we obtained per event and the percentages of presence of those unique resources on live web and in archives. All resources = 21,625, Unique resources = 11,051
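
The Success and Failure classes described above can be written down as a small decision function. This is a hedged sketch: the function names and flags are hypothetical, status codes the text does not mention (for example other 2XX codes) are treated as failures here by assumption, and the rule that a resource is missing only if every repeated attempt failed is one reading of the retry procedure.

```python
def classify(status, redirect_loop=False, soft_404=False):
    """Classify the final outcome of resolving one URI.

    status        : final HTTP status code, or None for no response / DNS timeout
    redirect_loop : True if the 30X chain never terminated
    soft_404      : True if a "200 OK" page was judged not to represent the resource
    """
    if status is None or redirect_loop or soft_404:
        return "Failure"
    # Only an ultimate "200 OK" counts as Success; 4XX, 50X and anything
    # else is treated as Failure (the "anything else" part is an assumption).
    return "Success" if status == 200 else "Failure"


def is_missing(attempts):
    """A resource is missing only if every repeated attempt failed."""
    return all(classify(status) == "Failure" for status in attempts)
```

For example, a URI that answered 503 once and 200 on a later retry would not be counted as missing, which is the purpose of repeating the requests over a week.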

4.3 Existence in the Archives

In the previous step we tested the existence of the unique list of URIs for each event on the live web. Next, we evaluate how many URIs have been archived in public web archives. To check those archives we utilize the Memento framework. If there is a memento for the URI, we download its memento timemap and analyze it. The timemap is a datestamp ordered list of all known archived versions (called "mementos") of a URI. Next, we parse this timemap and extract the number of mementos that point to versions of the resource in the public archives. We declare the resource to be archived if it has at least one memento. This step was also repeated several times to avoid the transient states of the archives before deeming a resource as unarchived. The results of this experiment along with the archive coverage percentage are presented in Table 5.
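
Counting mementos in a timemap can be sketched as follows, assuming the timemap is serialized in the link format used by Memento, where each entry looks like `<URI>;rel="...";datetime="..."`. The regular expressions, function names and the `example.org` / `example.net` hosts are illustrative assumptions, not the parser used in the study.

```python
import re

# One link-value: a URI in angle brackets followed by its parameters.
LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
REL_RE = re.compile(r'rel="([^"]*)"')


def memento_uris(timemap_text):
    """Return the URIs of all entries whose rel value includes 'memento'."""
    uris = []
    for uri, params in LINK_RE.findall(timemap_text):
        match = REL_RE.search(params)
        # rel may hold several tokens, e.g. "first memento".
        if match and "memento" in match.group(1).split():
            uris.append(uri)
    return uris


def is_archived(timemap_text):
    """A resource counts as archived if it has at least one memento."""
    return len(memento_uris(timemap_text)) >= 1


# Hypothetical timemap for illustration only.
sample = (
    '<http://a.example.org/>;rel="original",\n'
    '<http://archive.example.net/timegate/http://a.example.org/>;rel="timegate",\n'
    '<http://archive.example.net/20100620180259/http://a.example.org/>'
    ';rel="first memento";datetime="Sun, 20 Jun 2010 18:02:59 GMT",\n'
    '<http://archive.example.net/20111027200054/http://a.example.org/>'
    ';rel="last memento";datetime="Thu, 27 Oct 2011 20:00:54 GMT"\n'
)
mementos = memento_uris(sample)
```

Splitting on the angle brackets rather than on commas matters here because the datetime values themselves contain commas.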

5 Existence as a Function of Time

Inspecting the results from the previous steps suggests that the number of missing shared resources in social media corresponding to an event is directly proportional to its age. To determine dates for each of the events, we extracted all the creation dates from all the tweet-based datasets and sorted them. For each event, we plotted a graph illustrating the number of tweets per day related to that event, as shown in Figure 1. Since the dataset is separated temporally into 3 partitions, and in order to display all the events on one graph, we reduced the size of the x-axis by removing the time periods not covered in our study.

Fig. 1. URIs shared per day corresponding to each event and showing the two peaks in the non-Syrian and non-Egyptian events

Upon examining the graph we found an interesting phenomenon in the non-Syrian and non-Egyptian events: each event has two peaks. Upon investigating history timelines we came to the conclusion that those peaks reflect a second wave of social media interaction as a result of a new incident within the same event after a period of time. For example, in the H1N1 dataset, the first peak illustrates the world-wide outbreak announcement while the second peak denotes the release of the vaccine. In the Iran dataset, the first peak shows the peak of the elections while the second peak pinpoints the Iranian trials. As for the MJ dataset, the first peak corresponds to his death and the second peak describes the rumors that Michael Jackson died of unnatural causes and a possible homicide. For the Obama dataset, the first peak reveals the announcement of his winning the prize while the second peak presents the award-giving ceremony in Oslo. For the Egyptian revolution, the resources are all within a small time slot of 2 weeks around the date 11th of February. As for the Syrian event, since the collection was very recent there were no obvious peaks.

Those peaks we examined will become temporal centroids of the social content collections (the datasets): MJ (June 25th & July 10th 2009), Iran (June 13th & 1st August 2009), H1N1 (September 11th & 5th October 2009), and Obama (October 9th & December 10th 2009). Egypt had one centroid (February 11th 2011) and the Syria dataset also had one centroid on March 27th 2012. We split each two-peak event according to its two centroids. Figure 1 shows those peaks and Table 6 shows the missing content and the archived content percentages corresponding to each centroid.

             MJ               Iran             H1N1             Obama            Egypt    Syria
% Missing    36.24%  31.62%   26.98%  24.47%   23.49%  25.64%   24.59%  26.15%   10.48%   7.04%
% Archived   39.45%  30.78%   43.08%  36.26%   41.65%  43.87%   47.87%  46.15%   20.18%   5.35%

Table 6. The Split Dataset

Fig. 2. Percentage of content missing and archived for the events as a function of time.

Figure 2 shows the missing and archived values from Table 6 as a function of time since shared. Equation 1 shows the modeled estimate for the percentage of shared resources lost, where Age is in days. While there is a less linear relationship between time and being archived, equation 2 shows the modeled estimate for the percentage of shared resources archived in a public archive.

    Content Lost Percentage = 0.02 (Age in days) + 4.20        (1)

    Content Archived Percentage = 0.04 (Age in days) + 6.74    (2)

Given these observations and our curve fitting we estimate that after a year from publishing about 11% of content shared in social media will be gone. After this point, we are losing roughly 0.02% of this content per day.
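
Equations 1 and 2 are straightforward to evaluate. The sketch below plugs in an age of 365 days; the function names are illustrative, and capping the result at 100% is an added assumption, since the text does not say how the linear model should behave for very old resources.

```python
def percent_lost(age_days):
    """Equation 1: modeled percentage of shared resources lost."""
    return min(100.0, 0.02 * age_days + 4.20)  # cap at 100% is an assumption


def percent_archived(age_days):
    """Equation 2: modeled percentage of shared resources archived."""
    return min(100.0, 0.04 * age_days + 6.74)  # cap at 100% is an assumption


lost_after_year = percent_lost(365)          # 0.02 * 365 + 4.20
archived_after_year = percent_archived(365)  # 0.04 * 365 + 6.74
```

After one year this gives about 11.5% lost and about 21.3% archived, in line with the rounded figures of roughly 11% and 20% quoted in the text, and the slope of Equation 1 is the 0.02% lost per additional day.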

6 Conclusions and Future Work

We can conclude that there is a nearly linear relationship between time of sharing in the social media and the percentage lost. Although not as linear, there is a similar relationship between the time of sharing and the expected percentage of coverage in the archives. To reach this conclusion, we extracted collections of tweets and other social media content that was posted and shared in relation to six different events that occurred in the time period from June 2009 to March 2012. Next we extracted the embedded resources within this social media content and tested their existence on the live web and in the archives. After analyzing the percentages lost and archived in relation to time and plotting them, we used a linear regression model to fit those points. Finally we presented two linear models that can estimate the existence of a resource that was posted or shared at one point of time in the social media, on the live web and in the archives, as a function of age in the social media.

In the next stage of our research we need to expand the datasets and import other similar datasets, especially in the uncovered temporal areas (e.g., the year of 2010 and before 2009). Examining more datasets across extended points in time could enable us to better model these two functions of time. Also, several other factors besides time will be analyzed to understand their effect on persistence on the live web and archiving coverage, such as publishing venue, rate of sharing, popularity of authors and the nature of the related event.

7 Acknowledgments

This work was supported in part by the Library of Congress and NSF IIS-1009392.

References

1. Ainsworth, Scott G. and Alsum, Ahmed and SalahEldeen, Hany and Weigle, Michele C. and Nelson, Michael L.: How Much of the Web Is Archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, (2011).
2. Bar-Yossef, Ziv and Broder, Andrei Z. and Kumar, Ravi and Tomkins, Andrew: Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay. In Proceedings of the 13th international conference on World Wide Web, WWW '04, pages 328-337, (2004).
3. F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida: Characterizing User Behavior in Online Social Networks. In Proc. of ACM SIGCOMM Internet Measurement Conference, SIGCOMM '09, pages 49-62, (2009).
4. Lee, Chei and Ma, Long and Goh, Dion: Why Do People Share News in Social Media? Active Media Technology, Springer Berlin/Heidelberg, pages 129-140, Volume 6890, (2011).
5. Facebook official fact sheet, http://newsroom.fb.com/content/default.aspx?NewsAreaId=22
6. Kwak, Haewoon and Lee, Changhyun and Park, Hosung and Moon, Sue: What is Twitter, a Social Network or a News Media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, (2010).
7. Gordon Mohr, Michele Kimpton, Michael Stack and Igor Ranitovic: Introduction to Heritrix, an Archival Quality Web Crawler. In 4th International Web Archiving Workshop, IWAW '04, (2004).
8. Frank McCown and Norou Diawara and Michael L. Nelson: Factors Affecting Website Reconstruction from the Web Infrastructure. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '07, pages 39-48, (2007).
9. Michael L. Nelson, B. Danette Allen: Object Persistence and Availability in Digital Libraries. D-Lib Magazine, Volume 8, Number 1, January (2002).
10. M. E. J. Newman and J. Park: Why social networks are different from other types of networks. Phys. Rev. E, 68(3):036122, September, (2003).
11. Alex Nunns and Nadia Idle: Tweets From Tahrir. ISBN-10: 1935928457.
12. T. A. Phelps and R. Wilensky: Robust Hyperlinks Cost Just Five Words Each. Technical Report, UCB/CSD-00-1091, EECS Department, University of California, Berkeley, (2000).
13. Hany M. SalahEldeen, Michael L. Nelson: Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
14. Robert Sanderson, Mark Phillips and Herbert Van de Sompel: Analyzing the Persistence of Referenced Web Resources with Memento. CoRR, arXiv:1105.3459, (2011).
15. Stanford SNAP Project Dataset, http://snap.stanford.edu/
16. Twitter numbers, http://blog.Twitter.com/2011/03/numbers.html
17. H. Van de Sompel, M. L. Nelson, R. Sanderson, L. L. Balakireva, S. Ainsworth, H. Shankar: Memento: Time Travel for the Web, Technical Report, arXiv:0911.1112, November, (2009).
18. Wan, X., Yang, J.: Wordrank-based Lexical Signatures for Finding Lost or Related Web Pages. In Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development, APWeb '06, pages 843-849, (2006).
19. C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao: User Interactions in Social Networks and their Implications. In Proceedings of the 4th ACM European conference on Computer systems, EuroSys '09, pages 205-218, (2009).
20. Wu, Shaomei and Hofman, Jake M. and Mason, Winter A. and Watts, Duncan J.: Who Says What to Whom on Twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, (2011).
21. Jaewon Yang and Jure Leskovec: Patterns of Temporal Variation in Online Media. In ACM International Conference on Web Search and Data Mining, WSDM '11, pages 177-186, (2011).
22. J. Yang and S. Counts: Predicting the Speed, Scale, and Range of Information Diffusion in Twitter. In 4th International AAAI Conference on Weblogs and Social Media, ICWSM '10, May, (2010).
23. D. Zhao and M. B. Rosson: How and Why People Twitter: The Role that Micro-blogging Plays in Informal Communication at Work. In Proceedings of the ACM 2009 international conference on Supporting group work, GROUP '09, pages 243-252, (2009).
