ReSurf: Reconstructing Web-Surfing Activity From Network Traffic

Guowu Xie (UC Riverside), Marios Iliofotou (Narus, Inc.), Thomas Karagiannis (Microsoft Research), Michalis Faloutsos (UC Riverside), Yaohui Jin (Shanghai Jiao Tong University)

ABSTRACT

More and more applications and services move to the web, and this has led to web traffic amounting to as much as 80% of all network traffic. At the same time, most traffic classification efforts stop once they correctly classify a flow as web or HTTP. In this paper, we focus on understanding what happens "under the hood" of HTTP traffic. One of our key contributions is ReSurf, a systematic approach to reconstruct web-surfing activity starting from raw network data. Even when HTTP traffic is unencrypted, this problem is far from trivial. A key challenge is that websites are complex: a single user request (think user click) creates many network-level flows to many different websites. ReSurf overcomes these challenges and reconstructs on average 91% of user requests with more than 95% precision. Our second contribution is an extensive analysis of web activity over four different network traces, including a residential ISP, a large university campus, and mobile data from a cellular provider. By utilizing ReSurf, we study the user behavior in terms of user requests issued and transitions between websites (e.g., the click-through history of following hyperlinks). In terms of user requests explicitly issued towards a site, Facebook dominates in our mobile trace with 38% of user requests compared to 11% for Google. A surprising result is the "shallowness" of the click-through stream, with a median of one website transition. Finally, we find that mobile user requests download one third of the objects and generate one tenth of the traffic compared to user requests on the wireline traces.
1. INTRODUCTION

HTTP is the new IP in the Web 2.0 world, and traffic analysis methods need to adapt to this new reality. First, web browsers are being used as the ubiquitous interface to a large number of services and applications, such as email, gaming, file sharing, video streaming, and social networking sites. Second, HTTP is today the most widely used protocol, contributing up to 80% of the traffic on some networks [12]. One implication of these trends is the limited relevance and applicability of traditional traffic analysis and characterization tools [9, 19]. Assigning flows to an HTTP category today conveys very limited information with regard to the usage of websites/services and web users' behaviors.

Given the above trends, it is increasingly important for network administrators to monitor and characterize web traffic for operational and security purposes. First, understanding traffic is important for managing and provisioning one's network. Second, such capabilities are important for security, since modern malware spreads via websites and botnet command & control channels utilize HTTP. Overall, the more information administrators have about the traffic, the more effectively they can manage the network, identify anomalies and prevent attacks. In addition, analyzing web traffic is important for researchers that want to study modern websites and understand their evolution [8, 6].
The overarching problem we address in this paper is the following. Given web traffic collected at a network link, we want to be able to look "under the hood" and reconstruct the user behavior. Here is a list of motivating questions: (a) What websites (e.g., facebook.com, cnn.com) are explicitly requested by a user, as opposed to being accessed automatically in the background? (b) How much traffic is generated by each request? and (c) What are the typical web-surfing user patterns and the typical referral relationships across websites? We want to answer these questions starting from raw network traffic, such as a tcpdump trace, or web-proxy records [8].

Surprisingly perhaps, answering these questions is challenging even when HTTP headers and payloads are not encrypted. First, users often browse multiple websites at the same time, which causes flows and HTTP requests to intermingle. Second, modern web pages are fairly complex [6]; often rendering a single page generates tens of HTTP requests (see footnote 1) towards different web servers. Third, many third-party services, such as content distribution networks (CDNs), web-ad servers, and web analytics services, are used by many websites and shared across several services. All the above make the problem of attributing individual HTTP requests to a user request and to the correct primary website quite complex. We discuss the challenges of this problem and the limitations of previous efforts in Sections 2 and 5.

Footnote 1: Each HTTP request corresponds to a web-object such as an image, video, or javascript. The initial HTTP request is typically a web page (e.g., *html), which can include other objects, which are then acquired by separate HTTP requests.
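[Figure 1 diagram. Top: a user request with HTTP requests numbered 1 to 5. Bottom: a click-through stream over Website A, Website B, and Website C, with edges labeled as user requests and click-throughs.]

Figure 1: Web-surfing activity: (top) A single user request first generates an HTTP request to cnn.com, and then multiple subsequent HTTP requests to other web servers (e.g., doubleclick.com); (bottom) An example of a click-through stream of user requests over different websites.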
To illustrate the complexity of the task, we show visually the technical questions we address through the toy examples of Figure 1. The graph at the top shows a user request towards www.cnn.com, which then triggers a sequence of requests towards a CDN server (akamai.com), a web-ad server (doubleclick.com), a web analytics server (google-analytics.com), and others. The requests and responses are numbered based on how they occur over time. For simplicity, only the first response from the cnn.com website is shown. Looking at each of these requests in isolation, it is hard to identify the primary website or service that triggered them. For example, it is hard to say that the request towards the CDN server actually serves the rendering of the cnn.com website. In fact, we find that simply using the hostname of the server to map requests to websites results in less than 40% accuracy.
Making the problem more specific, we can identify two sub-tasks: (a) we want to group HTTP requests generated by a single user request, such as a click, and associate them with the primary website requested by the user, e.g., cnn.com in Figure 1; and (b) we want to reconstruct the click-through stream, i.e., the referral relationship of the web-surfing, to capture whether a user's request to a website comes from a hyperlink clicked on an earlier website or from within the same website. A toy example of a click-through stream is shown at the bottom of Figure 1. Here, the user clicks from website A to website B and she also clicks from website A to website C. The graph also shows the user issuing two clicks inside website A and two clicks inside website B. Understanding web traffic at both the user request and click-through level provides visibility into the user's web-surfing activity.
The problem, as defined here, has not received much attention. Most existing web traffic studies focus on different problems, and most traffic classification efforts stop once they identify a flow as web traffic [9, 19], which is our starting point. Several studies [2, 6] focus on understanding the complexity of popular websites by analyzing their home page and their evolution [8], but their focus was not real user surfing behavior. Other work [5, 16] focuses only on how users interact with specific online social network websites. We discuss and compare with previous efforts in Sections 3.2 and 5.
In this paper, we make two main contributions: (a) we present ReSurf, an effective approach to reconstruct web-user behavior, and (b) we conduct an extensive measurement study with four real network traces. Our datasets include a residential ISP in Europe, a mobile provider offering 3G and 4G services in the US, a group of users from a research lab in the US, and a large university campus network in China. Our datasets have up to 19 billion HTTP transactions, and range in duration from three hours to six months. For evaluating and comparing ReSurf, we also use a synthetic dataset and a dataset based on accessing the most popular Alexa sites. We explain our two contributions in more detail below.
(a) ReSurf: reconstructing web-surfing activity. We develop a systematic methodology that operates in two steps: (i) we reconstruct user requests (e.g., clicks, refreshes), and (ii) we determine the click-through behavior of the user. To achieve this, we combine information from the HTTP header with timing information at the network level. In a nutshell, ReSurf uses the referrer relationships between related HTTP requests to trace each request back to the user request that generated it. It then uses the size and type of the downloaded objects, as well as explicit timing information between successive requests, in order to achieve the desirable precision. We provide the details of ReSurf, discuss limitations, and describe our evaluation in Section 3.
(b) Extensive measurements and validation. Our experiments with real data traces and our validation with both real and synthesized data provide the following highlights:

ReSurf can reconstruct web-surfing accurately. We show that our approach can identify and reconstruct user requests with more than 95% precision and 91% recall on all our traces. Our validation further highlights weaknesses in the current state-of-the-art methodologies that rely on web analytics beacons [8].

The usual suspects dominate: Google/Baidu, Facebook/Renren, and adult sites. We quantify the presence of the dominant players of web traffic. In our traces from Europe and the US, adult sites contribute 40% of the traffic. Google is the top referrer website in all our traffic traces, with 34% and 49% of all inter-site referrals in our ISP and mobile trace, respectively. Facebook leads the way with 38% of user requests in our mobile trace, although Google has a small lead in user requests in the ISP trace. In our Chinese trace, Baidu and Renren dominate, followed by Taobao, an online shopping portal. Interestingly, even in the Chinese dataset, Google is the second most active referrer site. Surprisingly, file sharing sites like bitshre.com, filesonic.com and 115.com account for 30-35% of the traffic volume in both the European residential ISP and the Chinese university campus traces.

Web caching reduces the number of downloaded web-objects by a factor of three. When accessing the same website several times, we observe that on average 2/3 of the objects are cached. Therefore, using synthetic traces that access popular Alexa websites [6] can overestimate the generated network traffic.

Click-through streams are "shallow." Surprisingly, the median number of websites in a click-through stream is just one. Moreover, we observed that only 5% of the click-through streams have more than three websites.

Mobile user requests generate one tenth of the traffic compared to wireline user requests. On average, the user requests in the mobile trace generate one third of the HTTP requests and an order of magnitude less web traffic compared to user requests in the wireline trace. This reflects the convergence of online services to mobile-equivalent services that are sensitive to the UI limitations of mobile devices and corresponding data charges.
The rest of the paper is structured as follows: In Section 2, we present the problem, provide the necessary background, and describe our datasets. In Section 3, we explain ReSurf, and evaluate its classification performance. The observations extracted from data traces are presented in Section 4. Finally, we discuss related work in Section 5.
2. PROBLEM DEFINITION AND TRACES

The goal of this section is to present the problem in more detail, describe its challenges, and define the terminology. We also present the datasets that we use here.

2.1 Problem definition and background

Web traffic is composed of a sequence of HTTP requests and responses occurring over time. We will refer to an HTTP request and its corresponding response as an HTTP transaction. Throughout this paper, we use the terms HTTP request and HTTP transaction interchangeably. When a user requests a website, it causes many HTTP transactions to be transferred across the network. Each HTTP transaction corresponds to a web-object, such as an image, video, HTML file, flash file, javascript, etc.
The main website requested by the user is called the primary website, e.g., google.com, facebook.com, cnn.com, to name a few. We refer to the first request towards the primary website as the head HTTP request, which typically retrieves an HTML or XML file. Usually, this HTML file includes other embedded objects, which are, in turn, acquired by separate HTTP requests initiated automatically by the web browser. We call these subsequent requests embedded HTTP requests and they are usually transparent to the user.

(1) Initial request to cnn.com
    GET / HTTP/1.1
    Host: www.cnn.com
    Accept-Language: en-us,en;q=0.5
    Connection: keep-alive
    ...

(2) An advertisement request caused by (1)
    GET /html.ng/site=cnn&cnnpagetype=main...
    Host: ads.cnn.com
    Referer: http://www.cnn.com/
    ...

(3) Request to a CDN server caused by (2)
    GET /cnn/.../advertisement.gif HTTP/1.1
    Host: i.cdn.turner.com
    Referer: ads.cnn.com
    ...

Figure 2: An example of some HTTP requests issued by a web browser during a visit to www.cnn.com. For simplicity, only parts of the HTTP headers are shown. The first request directly reflects the action of the user to request (e.g., click on) the cnn.com website; we call this the "head request." The second and third requests are "embedded requests," which are automatically initiated by the user's web browser software.
Ultimately, our goal is to assign each HTTP transaction observed in the network to the user request that generated it. Doing so enables us to measure the traffic generated by different user requests and helps us understand web users' behavior in the network.

In Figure 2 we present an example of three HTTP requests generated by a visit to cnn.com. Following the above terminology, the first HTTP request is the head request to the primary web page (i.e., cnn.com) and the other two are embedded HTTP requests. For each HTTP request, the domain name of the web server is located in the host field of the HTTP header. Even though all three requests are caused by the visit to cnn.com, in this example, only one has www.cnn.com as the host name. From this example, we see that by looking at an HTTP transaction in isolation, it is hard to know which visit generated the HTTP transaction.
Referrer: The referrer field in an HTTP header provides information as to which web-object led to the request for the current web-object. This previous web-object is called the referrer. For example, in Figure 2, we see that the second HTTP request towards the advertising server ads.cnn.com has the main web page (www.cnn.com) as its referrer. Similarly, the third HTTP request to the CDN server came from the advertising server and has the name of the advertising server (ads.cnn.com) as its referrer. As we explain in Section 3 in more detail, the referrer field can help in our task of attributing HTTP requests to their primary website.

Name                     LAB          ISP-1        MOB          SYN          ALE          CAM
Starting date            Oct 3 2010   Aug 25 2011  Jan 7 2011   Aug 11 2011  Aug 11 2011  Mar 9 2012
Duration                 6 mon        24 h         3 h          1 mon        -            2 mon
# of HTTP transactions   1.2M         1.7M         22.9M        186K         973K         19B
Ground truth available   No           No           No           Yes          Yes          No
Payload                  Full         Full         Full         Full         Full         HTTP header

Table 1: An overview of the web traffic traces used in our study. The LAB, ISP-1, MOB, and CAM traces contain real-world web activity from thousands of users over different countries. The SYN and ALE traces are traces that were specifically crafted to evaluate ReSurf in a controlled environment.
The referrer field has one additional key utility. It captures the clicks by the user from one web page that lead to another. For example, when a user visits web page B by clicking a link in web page A, the web browser will place the URL of web page A in the referrer field when generating the HTTP request for web page B. Below, we explain how this helps in identifying the click-through streams of different users in a trace.
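As a minimal sketch of how the host and referrer fields discussed above can be read from a captured request, the following Python snippet parses the header text of a single HTTP request. The function name `parse_request` and the shortened request path are illustrative assumptions, not part of ReSurf; note that the header is spelled `Referer` on the wire.

```python
from urllib.parse import urlsplit


def parse_request(raw):
    """Extract the request line, Host, and Referer from raw HTTP request headers."""
    lines = raw.strip().splitlines()
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    referer = headers.get("referer")
    referer_host = None
    if referer:
        # Tolerate scheme-less values such as "ads.cnn.com" (as abbreviated in Figure 2).
        target = referer if "//" in referer else "//" + referer
        referer_host = urlsplit(target).hostname
    return {"method": method, "path": path, "host": headers.get("host"),
            "referer": referer, "referer_host": referer_host}


# Hypothetical, shortened version of request (2) in Figure 2.
ad_request = (
    "GET /html.ng/site=cnn HTTP/1.1\r\n"
    "Host: ads.cnn.com\r\n"
    "Referer: http://www.cnn.com/\r\n"
)
info = parse_request(ad_request)
print(info["host"], "<-", info["referer_host"])
```

The host alone (`ads.cnn.com`) does not reveal the primary website in general; it is the referrer that links this request back to the page that triggered it.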
User Requests: Typically, a user request happens in the form of clicking hyperlinks, opening and refreshing web pages, writing a URL in a browser's address bar, submitting forms and so on. A user request generates a series of HTTP transactions. As we explained before, the first HTTP request is referred to as the head request. A click-through stream is a series of consecutive user requests which have a referring relationship between them. Figure 1 shows an example of a user request (top) and a click-through stream (bottom). In the click-through stream of Figure 1, we see two clicks (user requests) inside website A, two inside website B, and one in website C. In addition, we see that the user moved from website A to websites B and C by clicking on hyperlinks from inside A.
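The definition of a click-through stream can be illustrated with a small sketch. It assumes that user requests have already been identified and are given as time-ordered pairs of (URL, referrer URL); a user request whose referrer is not an earlier user request starts a new stream. The function names and the `*.example` hosts are hypothetical, and this is one plausible reading of the definition rather than the authors' implementation.

```python
from urllib.parse import urlsplit


def click_through_streams(user_requests):
    """Group time-ordered (url, referrer) user requests into click-through streams."""
    stream_of = {}  # url -> index of the stream that contains it
    streams = []
    for url, referrer in user_requests:
        if referrer in stream_of:
            idx = stream_of[referrer]      # continue the referrer's stream
        else:
            idx = len(streams)             # no known referrer: start a new stream
            streams.append([])
        streams[idx].append(url)
        stream_of[url] = idx
    return streams


def websites_in_stream(stream):
    """Distinct hostnames visited within one stream."""
    return {urlsplit(url).hostname for url in stream}


requests = [
    ("http://a.example/", None),
    ("http://a.example/page1", "http://a.example/"),
    ("http://b.example/", "http://a.example/page1"),
    ("http://c.example/", "http://a.example/"),
    ("http://d.example/", None),
]
streams = click_through_streams(requests)
print([sorted(websites_in_stream(s)) for s in streams])
```

In this toy input the first stream spans three websites (A, B, and C, mirroring Figure 1) and the second contains a single website.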
2.2 Data traces

The six web traffic traces used in our study are summarized in Table 1. Our data cover several thousands of millions of HTTP requests, over long periods of time, and at different locations. Details regarding the exact locations and the names of the providers for all our traces are intentionally kept anonymized due to privacy concerns and business agreements. We collected our data from a variety of sources: (a) from monitoring the web activity of users inside a university research lab over the length of six months and over a large university campus over two months; (b) a 24-hour trace collected at a residential ISP network; (c) a 3-hour trace from a 3G/4G mobile service provider; (d) a trace generated by replaying the browsing history from nine users over a period of one month in a controlled environment; and (e) a trace generated by issuing requests to popular websites in a controlled environment. We collected traces in both controlled and uncontrolled environments, which allows us both to examine user browsing activities in the wild and to verify the correctness of our methodology. The users in our traces are also diverse, covering academic users in a university lab, residential ADSL users, students and academic staff from a large university campus, as well as mobile device (smartphone and tablet) users. This allows us to compare the browsing patterns of different users.
Next, we provide details for each trace. For all traces, we use the same collection methodology: we collected all the IP packets (both header and payload) on TCP ports 80, 443, 8000 and 8080. Due to privacy concerns, the campus (CAM) trace is the only one that contains just the header part of the HTTP requests, without their payload.

LAB: We collected this traffic trace from a research lab in a university in the US. In the lab, there are about 15 graduate students and 20 laptops/desktops. The collection covered six non-consecutive months over the period of December 2010 until September 2011.

ISP-1: The trace was collected from an edge link of a European residential ISP. We were given access to only the first five packets of each unidirectional TCP flow. Given that most flows only transfer few HTTP transactions, we did not observe this to be problematic for our study.

MOB: We collected this trace from a 3G/4G mobile service provider in the US. The vast majority of the traffic is generated by the applications on mobile devices, such as smartphones and tablets.

CAM: The CAM trace was collected from a university campus in China. The trace contains the activity of approximately 28.2K users. Our monitor point sits on the edge gateway connecting the campus to the public Internet. All downloading and uploading traffic from the whole campus goes through the monitor point. We log all important HTTP header fields for all HTTP transactions on TCP port 80. Specifically, the fields we log are the following: timestamp of each request, client/server IPs, URL, referrer, content-type, content-length, HTTP response code and user-agent. To preserve privacy, client IPs are anonymized. We applied our method to several different days of traffic. The trends extracted from different days of traffic are very similar, so we only show the results for one workday in the rest of the paper.
SYN: The trace is generated in a controlled environment for the purpose of evaluating ReSurf. We generated the traffic by replaying users' browsing history from their Google Chrome browser. We use the web browsing history from nine users that volunteered for our experiments. We extracted the timestamp, referrer and URL field of each visit from their browsing history. Then, we replayed each visit using the procedure described in Figure 3. The replay of a URL works as follows. We remotely instruct the browser (Google Chrome) to open each URL separately. At the same time, we use packet capturing software (tcpdump) to collect all the traffic on TCP ports 80, 443, 8000 and 8080. Next, we close the browser after 60 seconds and save the collected HTTP traffic to an individual file. As the final step, we artificially adjust the timestamps and referrer fields to simulate what the traffic would look like if it came directly from the user's surfing activity. Both the timing information and referrer relationships are directly extracted from Chrome's browsing history records (see Figure 3). After replaying all visits, all these individual files are merged to form a complete traffic trace. Since each user activity was collected and stored separately, we effectively have the ground truth for each HTTP request in the trace.

ALE: The ALE trace is also artificially generated using the same methodology as described above. The only difference is that here we only visit the URLs of the home pages of the top 80,000 ranked websites in Alexa [1], and we do not artificially add any referrer fields or modify the timestamps.
3. THE ReSurf APPROACH

In this section, we present our ReSurf methodology, and we evaluate and compare it with existing solutions. In Section 3.3, we discuss the practical issues and limitations of our method.

3.1 The ReSurf Methodology

The goal of ReSurf is to group HTTP transactions into user requests (see definition in Section 2.1). Our approach works in two steps. First, we identify the head HTTP requests by using different features from each HTTP transaction. These features include: the size of the web-object, the type of the object, the timing between successive requests, and others. Second, we use the referring relationships (see definition in Section 2.1) to assign all the embedded HTTP transactions to their corresponding head request. We explain the methodology in more detail below.
[Figure 3 diagram. Boxes: Chrome history, Browser, Traffic dump file, Synthesized web traffic. Arrow labels: URL, Timestamp and referrer, Traffic.]

Figure 3: The diagram shows the steps we take to generate synthesized web traffic by replaying a user's browsing history.

Figure 4: An example of an HTTP referrer graph showing three user requests, one to cnn.com and two to gamestop.com. We represent the web-objects from head HTTP requests with circles and the web-objects downloaded from embedded HTTP requests with rectangles.
Referrer Graphs: To facilitate our methodology, we first represent all HTTP requests from the same client IP address as a graph. For generating the graph, we use the referrer and host fields from the HTTP requests (see examples in Figure 2). Using these fields, we generate a directed graph that captures the referring and timing relationship between downloaded web-objects. Figure 4 shows an example of the HTTP referrer graph of a single user accessing cnn.com and gamestop.com. For the purpose of exposition, we simplify the example and keep only a subset of nodes and edges. In reality, even within a few minutes the size of the referrer graph reaches several hundreds of nodes. For example, in the CAM trace, the median size of the referrer graph over a ten-minute interval is 200 nodes for an IP address.

The nodes in the HTTP referrer graph are web-objects annotated with their complete URI. Inside each node we also add the timestamp of its corresponding HTTP transaction to help us understand the ordering of the different requests in our trace. The directed edges capture the referring relationship between nodes. The directed edge from A to B means A is B's referrer. The label of a directed edge represents the timestamp difference between the requests for the two objects.
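The referrer graph described above can be sketched in a few lines of Python. This is an illustrative construction under simplifying assumptions: one node per URI (a repeated URI overwrites the earlier node), edges only to referrers that were actually observed, and hypothetical names (`Transaction`, `build_referrer_graph`) and URLs that do not come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    uri: str                        # complete URI of the web-object (node label)
    timestamp: float                # request time in seconds
    referrer: Optional[str] = None  # URI found in the referrer field, if any


def build_referrer_graph(transactions):
    """Return (nodes, out_edges, in_edges) for one client's transactions.

    An edge A -> B means A is B's referrer; its label is the time gap
    between the request for A and the request for B.
    """
    nodes = {}
    out_edges = defaultdict(dict)
    in_edges = defaultdict(dict)
    for t in sorted(transactions, key=lambda t: t.timestamp):
        nodes[t.uri] = t
        if t.referrer and t.referrer in nodes:
            gap = t.timestamp - nodes[t.referrer].timestamp
            out_edges[t.referrer][t.uri] = gap
            in_edges[t.uri][t.referrer] = gap
    return nodes, out_edges, in_edges


trace = [
    Transaction("http://www.cnn.com/", 0.0),
    Transaction("http://ads.cnn.com/frame", 0.25, "http://www.cnn.com/"),
    Transaction("http://i.cdn.turner.com/ad.gif", 0.5, "http://ads.cnn.com/frame"),
]
nodes, out_edges, in_edges = build_referrer_graph(trace)
print(dict(out_edges))
```

Keeping both `out_edges` and `in_edges` is a design convenience: the out-degree is needed to count embedded objects, while incoming edges are what a backward traversal follows.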
We provide an intuitive explanation of how our method works using the example of Figure 4. In the figure, we represent head HTTP requests with circles and the web-objects downloaded from embedded HTTP requests with rectangles. In Figure 4, we have three user requests with their corresponding head HTTP request and different embedded HTTP transactions. Note that, initially, the referrer graph does not distinguish between head and embedded HTTP requests, and has no information about which groups form user requests. This information is the outcome of ReSurf.

To detect the head requests, ReSurf exploits the following characteristics. First, the head requests of user requests are HTML or XML objects, and have very few incoming edges and many outgoing edges in the referrer graph. At the same time, the nodes in the same user request are very close together in time. On the other hand, HTTP transactions from different user requests are further away from each other in time. Finally, head requests are connected with an edge if there is a referring relationship among them.
In more detail, our approach has the following steps:

Step 1. We form the HTTP referrer graph. ReSurf builds an HTTP referrer graph for each host over a period of time (e.g., every five minutes). The creation of the HTTP referrer graph was explained earlier in this section, and an example is shown in Figure 4.

Step 2. We identify all the head HTTP request candidates. ReSurf selects head request candidates according to the following rules: (a) The candidate should be an HTML/XML object. Specifically, its content-type should be one of the following: "text/html", "text/xhtml", "text/shtml", and "text/xml." (b) Since most web pages are fairly complex, the size of candidates should be larger than V bytes. (c) Candidates should have at least K embedded objects. (d) The time gap between candidates and their referrers should be larger than a predefined threshold T. (e) As an optional refinement requirement, ReSurf filters out candidates based on the keywords in their URI. That is, a candidate's URI should not contain the keywords: adserver, ads, widget, and banner. We observed this keyword heuristic to boost the classification accuracy by 2-3% depending on the trace. We provide specific values for the parameters V, K, T next in our validation section.
Step 3. We finalize the identification of the head requests. We utilize the referring relationship between the head request candidates. Specifically, a candidate is classified as a head request if its referrer is also a head request or if it does not have any referrer. In the referrer graph, nodes with no referrers have no incoming edges. In Figure 4, the cnn.com node in "User Request 1" is an example of such a node with no referrer. Such nodes occur when a user, for example, opens a web page from a browser bookmark or by directly typing the URL in the browser's address bar. If the referrer is not empty, it means that the user navigated to a web page by following a link from a referrer web page. Following this logic implies that the referrer of a head request should also be a head request itself.
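Steps 2 and 3 can be sketched as a single pass over time-ordered requests. The code below is a hedged illustration rather than the authors' implementation: the record layout (dicts with `uri`, `ts`, `referrer`, `content_type`, `size`), the function name, and the sample URLs are assumptions, and a referrer that never appears in the trace is treated as "no referrer". The defaults T = 0.5 s, V = 3000 bytes, and K = 2 are the values the paper reports using.

```python
HEAD_TYPES = {"text/html", "text/xhtml", "text/shtml", "text/xml"}
AD_KEYWORDS = ("adserver", "ads", "widget", "banner")


def detect_head_requests(requests, T=0.5, V=3000, K=2):
    """Return the set of URIs classified as head requests (rules a-e, then step 3)."""
    by_time = sorted(requests, key=lambda r: r["ts"])
    ts_of, out_degree = {}, {}
    for r in by_time:
        ts_of.setdefault(r["uri"], r["ts"])
        if r["referrer"]:
            out_degree[r["referrer"]] = out_degree.get(r["referrer"], 0) + 1
    heads = set()
    for r in by_time:
        # Assumption: a referrer never seen in the trace counts as "no referrer".
        ref = r["referrer"] if r["referrer"] in ts_of else None
        ctype = r["content_type"].split(";")[0].strip().lower()
        is_candidate = (
            ctype in HEAD_TYPES                                      # (a) HTML/XML object
            and r["size"] > V                                        # (b) large enough
            and out_degree.get(r["uri"], 0) >= K                     # (c) enough embedded objects
            and (ref is None or r["ts"] - ts_of[ref] > T)            # (d) gap to referrer
            and not any(k in r["uri"].lower() for k in AD_KEYWORDS)  # (e) keyword filter
        )
        if is_candidate and (ref is None or ref in heads):           # step 3
            heads.add(r["uri"])
    return heads


CNN, STORY = "http://www.cnn.com/", "http://www.cnn.com/story.html"
sample = [
    {"uri": CNN, "ts": 0.0, "referrer": None, "content_type": "text/html; charset=utf-8", "size": 50000},
    {"uri": "http://i.cdn.turner.com/logo.gif", "ts": 0.1, "referrer": CNN, "content_type": "image/gif", "size": 2000},
    {"uri": "http://frames.example/frame.html", "ts": 0.2, "referrer": CNN, "content_type": "text/html", "size": 500},
    {"uri": STORY, "ts": 30.0, "referrer": CNN, "content_type": "text/html", "size": 40000},
    {"uri": "http://i.cdn.turner.com/photo.jpg", "ts": 30.1, "referrer": STORY, "content_type": "image/jpeg", "size": 9000},
    {"uri": "http://i.cdn.turner.com/style.css", "ts": 30.2, "referrer": STORY, "content_type": "text/css", "size": 1000},
]
heads = detect_head_requests(sample)
print(sorted(heads))
```

In this toy trace the small HTML frame is rejected by the size rule, while the story page qualifies because it is requested long after its referrer, which is itself a head request.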
Step 4. We assign embedded HTTP requests to head requests. ReSurf associates embedded HTTP requests to head requests by utilizing the timing information and referring relationship in the referrer graph. In fact, once we know the head request of a user request, it is easy to attribute the rest of the HTTP requests to user requests. For each HTTP transaction (node), we traverse the referrer graph backwards until we reach a head request. If an HTTP transaction (node) has more than one incoming edge, we follow the edge with the smallest time difference (i.e., smaller weight on the edge). In this way, the path will eventually lead back to the head request that was triggered by the user request.

Heuristics for detecting head HTTP requests:
a. Its content-type should be one of: "text/html", "text/xhtml", "text/xml", "application/xhtml", or "application/xml."
b. Its object size should be larger than V bytes.
c. It should have at least K embedded objects (out-degree in the referrer graph).
d. The time gap between the request under question and its parent request (i.e., its referrer) should be more than T.
e. Its URI should not contain any of the keywords: adserver, ads, widget, embed, or banner.
f. Its referrer is a head request or does not exist.

Table 2: The heuristic rules that ReSurf uses to identify head HTTP requests given a referrer graph. The default values for the above parameters used in ReSurf are T = 0.5 s, V = 3,000 bytes, and K = 2.

To summarize, the head request of a user request should meet all the requirements in Table 2.
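The backward traversal of Step 4 can be sketched as follows, assuming the head requests are already known and the incoming edges are stored as a mapping from each node to its referrers and time gaps. The function name, the short node labels, and the cycle guard are illustrative additions; requests that cannot be traced back are left unlabeled (returned as `None`).

```python
def assign_to_head(uri, in_edges, heads):
    """Walk the referrer graph backwards from `uri` until a head request is reached.

    in_edges maps a node to {referrer: time_gap}; when several referrers exist,
    the edge with the smallest time gap is followed.
    """
    seen = set()
    current = uri
    while current not in heads:
        parents = in_edges.get(current)
        if not parents or current in seen:
            return None  # no referrer to follow (or a cycle): leave unlabeled
        seen.add(current)
        current = min(parents, key=parents.get)
    return current


# Hypothetical toy graph: "gif" was referred by "ad" (0.1 s gap) and by "other" (5 s gap).
in_edges = {
    "ad": {"cnn": 0.2},
    "gif": {"ad": 0.1, "other": 5.0},
}
heads = {"cnn", "other"}
print(assign_to_head("gif", in_edges, heads))
```

Here the request for `gif` follows its closest referrer `ad`, which in turn leads to the head request `cnn`, even though `other` is also a head request with an edge to `gif`.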
3.2 Evaluation

We evaluate the accuracy of ReSurf using two different sources of ground truth: (a) our synthetically generated trace SYN, and (b) the web analytics beacons as proposed in StreamStructure [8]. We also examine the sensitivity of ReSurf to its parameters, and justify the values we choose. Finally, we compare ReSurf to the state of the art [8].

Evaluation metrics: We use the standard classification metrics of precision and recall. Precision is the number of true positives (TP) divided by the number of TP and false positives (FP), P = TP/(TP + FP). Recall is the number of TP divided by the number of TP and false negatives (FN), R = TP/(TP + FN). We also use the F1 score, which is the harmonic mean of P and R, specifically, F1 = 2*P*R/(P + R).
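For concreteness, the three metrics can be computed directly from the TP, FP, and FN counts. The helper below is a small sketch; returning 0.0 when a denominator is zero is a convention chosen here, and the counts in the example are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P = TP/(TP+FP), R = TP/(TP+FN), and F1 = 2*P*R/(P+R)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Illustrative counts: 90 head requests found correctly, 10 false alarms, 30 missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))
```

With these counts, precision is 0.9 and recall is 0.75, and the F1 score falls between the two, closer to the smaller value, as expected of a harmonic mean.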
To evaluate the performance of ReSurf, we ask the following complementary but slightly different questions.

Q.1: How accurately can ReSurf identify head HTTP requests? We want to quantify how effectively ReSurf identifies the head requests from a large set of requests. Given that the number of head requests is usually much less than the total number of requests, this question allows us to focus only on head requests. For example, if out of 100 requests one is a head and the others are embedded, a classifier that reports all the requests as embedded would be correct 99% of the time, but would offer limited utility in solving our problem. For this reason, we report the P and R on head requests separately.

Q.2: How accurately can ReSurf classify head and embedded requests? We want to quantify how effectively our approach classifies each HTTP request as a head or an embedded HTTP request. Unlike Q.1, we report results over all HTTP requests and not only over the head requests. That is, precision represents the number of correctly classified HTTP requests compared to the total number of HTTP requests classified by our algorithm. Note that ReSurf may leave some requests unlabeled (a.k.a. unknown). Recall expresses the total number of classified HTTP requests compared to the total number of existing HTTP requests in the trace.

Q.3: How accurately can ReSurf associate HTTP requests to their corresponding user request? This is a more demanding question than the classification for Q.1 and Q.2: we want to associate each HTTP request with the generating user request. This is a multi-class classification problem, where each user request is a separate class. For example, if an embedded HTTP request R is correctly identified as embedded, but it is associated with the wrong user request, we will consider it a misclassification. The precision captures the number of correctly classified HTTP requests compared to the total number of HTTP requests classified. The recall reports the correctly classified HTTP requests compared to the total number of HTTP requests in the trace.

We use the following values for the parameters in ReSurf: T = 0.5 s, V = 3000 bytes and K = 2. We justify this selection later in this section.
A key issue in evaluating any classifier is how to determine the ground truth in the datasets. To address this challenge, we use two different approaches: (a) using a synthesized trace, and (b) using the labels from a classifier that is based on web analytics beacons.

(a) Validation using ground truth from the SYN trace. To evaluate our approach, we created a synthesized trace under a controlled environment. During the creation of this trace, at each point in time we knew exactly which website was being visited, and what requests were generated by the visits to those websites. Details regarding the generation of the SYN trace are given in Section 2.2. Even though we know the ground truth for the ALE trace as well, we do not show results with that trace here since it does not represent real browsing activity. Figure 5 shows the precision, recall and F1 score when we apply ReSurf on the SYN trace, for all three questions, Q.1-Q.3. As we see, all metrics are above 90%, showing that ReSurf can successfully identify the originating website for the vast majority of HTTP requests. Moreover, we see that the precision of ReSurf is very high, 96% and above, implying high confidence in our classification of requests.
(b) Validation using web analytics beacons as ground truth. For the LAB, ISP-1, CAM, and MOB traces, we do not have the ground truth. Therefore, we evaluate the classification performance of ReSurf against the predictions given by the StreamStructure [8] method. This method is based on the observation that many websites use web analytics beacons to track their web pages and objects. Intuitively, the web analytics beacons report to the analytics server, and this helps us detect which is the head request and towards which primary website. We consider web analytics beacons from three major services: google-analytics.com, pixel.quantserve.com and yieldmanager.com [8, 10, 11].

[Figure 5 plot. Precision, Recall, and F1 (%, 0 to 100) for Q.1 Header Detection, Q.2 Request binary Classification, and Q.3 Request Association.]

Figure 5: The precision, recall and F1 score in the SYN trace.
Here, we give a more detailed explanation of the beacon method (i.e., StreamStructure), using the google-analytics beacon as an example. Once the object or web page tracked by google-analytics is requested, a beacon is generated based on the requested object's URI and sent to a google-analytics server in the form of a "special" HTTP GET request. Unlike regular HTTP GET requests, their URIs encode various information about the user request, including the URI of the requested object/page. Therefore, after some careful parsing of the HTTP requests, we can identify the beacons, and from there we can identify the URL of the main object, which leads us to the primary website of the user request. We refer the reader to [8] for more details about StreamStructure.
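To make the idea of "careful parsing" concrete, the sketch below recovers the tracked page from a beacon URL. It is not the StreamStructure implementation: it assumes the legacy google-analytics `__utm.gif` request format, in which the query parameters `utmhn` and `utmp` carry the tracked page's hostname and path. These parameter names, the function name, and the sample URL are assumptions made for illustration; other analytics services encode this information differently.

```python
from urllib.parse import parse_qs, urlsplit


def page_from_beacon(beacon_url):
    """Return 'host/path' of the tracked page, or None if this is not a beacon.

    Assumes the legacy google-analytics format with utmhn (hostname)
    and utmp (page path) query parameters.
    """
    parts = urlsplit(beacon_url)
    if not parts.hostname or not parts.hostname.endswith("google-analytics.com"):
        return None
    query = parse_qs(parts.query)  # also decodes percent-encoding
    host = query.get("utmhn", [None])[0]
    path = query.get("utmp", ["/"])[0]
    return host + path if host else None


beacon = ("http://www.google-analytics.com/__utm.gif"
          "?utmwv=4.9&utmhn=www.cnn.com&utmp=%2Fworld%2Findex.html")
print(page_from_beacon(beacon))
```

The recovered page identifies the head request and, through its hostname, the primary website of the user request that fired the beacon.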
As we will discuss later in this section, StreamStructure can be used for only a fraction of the requests, since only a small percentage of requests use beacons. However, this set of requests can help us determine the effectiveness of ReSurf by providing an additional ground truth set. To achieve this, we first use StreamStructure to identify as many head requests as possible using beacons. We refer to this set of identified head requests as S. Then, we compare how well ReSurf performs over the known set S. Figure 6 shows the precision, recall and F1 for head detection (Q.1) using beacons as ground truth. We observe that ReSurf achieves above 96% precision in all traces and 91-98% recall. The results show that our approach performs consistently well across all the datasets, which are collected in different continents and during different time periods. Note that we only use web analytics beacons here to establish the ground truth; ReSurf does not use beacon information during its classification process.
Evaluating ReSurf over a different range of parameters: We examine the effect of different parameters on the performance of ReSurf. We only show the plots for Q.1 for brevity; the performance for all questions is qualitatively the same. We use the SYN trace to set our parameters and then apply them to the rest of the traces.

[Figure 6 plot. Precision, Recall, and F1 (%, 0 to 100) for the LAB, ISP-1, MOB, SYN, and CAM traces.]

Figure 6: The precision and recall for detecting head requests (Q.1) using web analytics beacons as ground truth.

[Figure 7 plot. F1 (%, 0 to 100) versus V (bytes, 0 to 7000), with one line each for K=1, K=2, K=3, and K=5.]

Figure 7: The F1 score of detecting head requests (Q.1) as a function of parameter V and for different values of parameter K in the SYN trace.
Figure 7 shows the F1 metric for detecting head requests (Q.1) using different values for the volume V and the out-degree K over the SYN trace. We observe that the precision increases and the recall decreases as we increase the value of V. Intuitively, large html/xml files are more likely to be the webpage of an actual user request compared to shorter ones. Short html/xml files typically carry advertising-related content and are triggered by embedded requests. At the same time, by further increasing V, we start considering only very large html/xml files and we start missing requests, which results in lower recall. As we see from Figure 7, the combined behavior of P and R, captured by the F1 score, exhibits good performance for V in the range of 3000 to 5000 bytes. To achieve both good precision and recall, we choose V=3000. In the same figure, different lines show how performance changes as the out-degree K varies from 1 to 5. We find that the values 2 and 3 gave the best results, with K=2 performing slightly better over the examined range of parameter V. Regarding T, we found that our approach exhibits good performance as long as T is less than 1 second and more than 0.1 second. The results are not shown due to space limitations. In the rest of the paper, we use this parameter setting: T=0.5 seconds, V=3000 bytes and K=2.
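To make the role of the three parameters concrete, the sketch below applies them as simple thresholds to one candidate request. It is only an illustration: the record fields, the content-type list, and the way each threshold is applied (minimum size V, minimum out-degree K in the referrer graph, minimum idle gap T) are assumptions made here, and the authoritative rules are the ones given in the description of ReSurf earlier in the paper.

```python
from dataclasses import dataclass

# Parameter setting reported in the text.
T_SECONDS = 0.5
V_BYTES = 3000
K_OUT_DEGREE = 2


@dataclass
class Request:
    """Hypothetical per-request record; field names are illustrative."""
    url: str
    content_type: str
    size_bytes: int
    out_degree: int          # requests that name this one as their referrer
    idle_gap_seconds: float  # gap to the preceding request (assumed meaning of T)


def is_head_candidate(req, t=T_SECONDS, v=V_BYTES, k=K_OUT_DEGREE):
    """Assumed threshold test: an html/xml object that is large enough,
    refers enough embedded objects, and follows a long enough idle gap."""
    is_page = req.content_type in ("text/html", "text/xml", "application/xml")
    return (
        is_page
        and req.size_bytes >= v
        and req.out_degree >= k
        and req.idle_gap_seconds >= t
    )


page = Request("http://example.com/", "text/html", 12000, 8, 2.0)
ad_frame = Request("http://ads.example.net/f", "text/html", 400, 0, 0.05)
```

Under these assumptions `page` passes the test while the small advertising frame does not, mirroring the intuition that short html/xml objects are usually embedded content.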
Using web-analytic beacons is not enough. A natural question is why we don't just use web analytics beacons exclusively for user request reconstruction. Even though the use of beacons gives good results for those websites that use them, we identify several limitations. The majority of user requests (80%) do not have a beacon in our data traces. We find that less than 20% of the user requests that were found by ReSurf have beacons in the LAB, ISP-1, CAM, and MOB traces. Given the precision and recall of ReSurf in the controlled datasets, we are confident that this percentage is a reasonably accurate estimate for the requests in the trace. To further verify this, we used the SYN trace, for which we have the ground truth. We plot the results comparing ReSurf with the StreamStructure approach [8] in Figure 8. We observe that for the SYN trace, less than 22% of user requests are detected using beacons. To summarize, we observed that using beacons, we can only successfully identify approximately 20% of the user requests, compared to above 91% that we achieve with ReSurf. A comparable statistic, that 23.9% of head requests have beacons, is also reported in another study [8]. An additional complication is that sometimes one user request has multiple beacons, which could confuse beacon-based reconstruction solutions.
Figure 8 shows the recall for detecting head requests in the SYN and ALE traces using StreamStructure and ReSurf. As we see, with StreamStructure the recall is 22% and 60% for the SYN and ALE traces, respectively. The higher recall in the ALE trace is due to the wider adoption of web analytics by very popular websites. By contrast, ReSurf works consistently well in both traces, with recall above 92%. Unfortunately, for the LAB, ISP-1, MOB and CAM traces, we cannot repeat the same experiment since we do not have ground truth. Overall, we observed that ReSurf identifies double the number of head requests in these traces compared to StreamStructure.
3.3 Discussion
What about encrypted web traffic? ReSurf uses information from the HTTP header; therefore, if the web traffic is encrypted (e.g., using HTTPS), our approach will not classify those flows. However, by analyzing our real-world traces (see Table 1), we observed that encrypted traffic only accounts for 2% to 8% of the total web traffic. The lowest percentage corresponds to the mobile trace, suggesting that encryption in smartphone applications is not popular. Overall, we observed that unencrypted web traffic is the norm today and we believe it will continue to account for a significant portion of the traffic in the future. The analysis of encrypted web traffic remains an interesting, open problem.
Figure 8: The recall for detecting head HTTP requests (Q.1) using beacons and ReSurf in different traces.
Figure 9: The CDF of the number of downloaded objects per user request over different traces.
How is ReSurf affected by users behind network address translation (NAT)? Having users behind NATs is very similar to having users with very high activity. Since referrer graphs are built per IP, NAT users will appear as one "heavy user" with a complex referrer graph, for user accesses within the same time window. There are two cases here. If different NAT users browse completely different websites, their referrer graphs will not be connected and ReSurf will distinguish the different requests. On the contrary, in the worst case where two users request the same webpage at the same time, ReSurf will combine them into one large request. However, it will still be able to attribute their traffic to the originating website. Finally, there may be cases where some embedded requests are "multiplexed" between more than one user request and disambiguating them is hard; however, we have not observed that to be a problem in our study. Note that our goal is twofold: i) group HTTP requests to identify the initial user-requested page, and ii) identify the user click-through stream. Hence, having users behind NATs does not affect the first goal, while the second is impacted if users follow the same stream of pages at the same time.
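The two NAT cases can be illustrated with a small sketch that builds an undirected referrer graph for the requests seen from one IP address and splits it into connected components. This is a simplified stand-in for the per-IP referrer graph, not the paper's implementation: the input format (pairs of URL and referrer) and the function name are assumptions.

```python
from collections import defaultdict


def referrer_components(requests):
    """Split requests from one IP into connected components of the referrer graph.

    requests: iterable of (url, referrer) pairs, where referrer may be None.
    """
    neighbors = defaultdict(set)
    for url, referrer in requests:
        neighbors[url]  # make sure requests without a referrer become nodes
        if referrer is not None:
            neighbors[url].add(referrer)
            neighbors[referrer].add(url)

    seen = set()
    components = []
    for start in list(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Two NAT users browsing unrelated sites: the graphs stay disconnected.
different_sites = [
    ("a.example/", None), ("a.example/x.js", "a.example/"),
    ("b.example/", None), ("b.example/y.png", "b.example/"),
]
# Two NAT users loading the same page at the same time: the graphs merge.
same_page = [
    ("a.example/", None), ("a.example/x.js", "a.example/"),
    ("a.example/", None), ("a.example/x.js", "a.example/"),
]
```

With the first input the sketch yields two components, one per user; with the second it yields a single component, which corresponds to the worst case of two users being combined into one large request.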
Can ReSurf classify traffic in real time? Our current implementation does not support real-time classification. In Step 1, we require the collection of traffic for several minutes before we analyze the referrer graph and classify the different requests. Therefore, our approach can classify requests several minutes after their creation. As mentioned earlier, off-line analysis of web traffic is useful to operators who want to understand how their network is being used, as well as to researchers who want to study modern trends and changes in web activity. Real-time classification can be important to network operators who want to enforce different policies, and achieving this requirement is left as future work.
4. USING ReSurf ON REAL WEB TRAFFIC
In this section, we use ReSurf and analyze the four real-world web traffic traces: LAB, ISP-1, MOB and CAM. First, we group HTTP transactions into user requests with ReSurf. Then, we analyze web traffic at two levels: (a) website, and (b) click-through stream. At the website level, we study how much traffic is caused by different user requests and which are the most popular websites. At the click-through level, we analyze how users behave and how they move from one website to another. Finally, we present differences between mobile and wireline web traffic.
4.1 Analyzing requests at the website level
In Figure 9, we show the CDFs of the number of downloaded objects by all the user requests in the four traces. As a first observation, we see significant differences between mobile and wireline traces. The median number of web objects per user request is 4 for the mobile trace compared to 11 for the three wireline traces. Similar trends are also observed for the volume of generated traffic, the number of generated TCP flows, contacted IPs, and autonomous system numbers (ASNs), which are not shown here due to limited space. We also observe that approximately 40% of the user requests in the wireline traces result in the download of 20 objects, corresponding to 100 KBytes of traffic (median). This shows that user requests to modern websites trigger a fairly large number of HTTP requests (web objects) as well as high traffic volume. Finally, we observed a significant similarity between the CDFs in the LAB, ISP-1, and CAM traces, especially for the lower 50% of user requests. This shows that user requests in wireline traces have similar characteristics, even when the users are in different countries with different websites being popular. The two most similar traces are LAB and CAM. This suggests that even with the small number of users we have in the LAB trace, it captures behaviors that closely match a trace of thousands of users in a large university campus (CAM).
Figure 10: The number of downloaded objects in different visits towards the top 80,000 websites reported by Alexa.
4.1.1 Caveats of using synthesized data
To further understand user requests towards popular websites, we compare the properties of the ALE trace with our other traces that represent real user behavior. Popular Alexa [1] websites have also been used by other studies [6] that aim to understand the complexity of modern websites. Applying ReSurf on the ALE trace, we observed some significant differences between this trace and our real-world traces. Our hypothesis for this difference is that the browsers' local cache may affect web traffic measurements significantly. To quantify the effect of local caching, we visit the homepage of each of the top 80,000 websites in Alexa [1] four times. The time gap between successive visits is ten minutes. In Figure 10, we show the CDF of the number of downloaded objects in different visits. On average, the second visit only downloads one third of the objects of the first visit, and only generates about one third of the network traffic. In other words, about two thirds of objects are cached locally after the first visit.
The third visit requests even fewer objects, which suggests additional objects being cached by the second visit. Interestingly, there is no significant difference between the third, fourth, and subsequent visits in the number of downloaded objects and network traffic. Even though the number of objects decreases significantly between successive visits, we have not observed this to be true for the total number of unique server IPs being contacted. In fact, the total number of contacted IPs in the first visit is just 9% more than in the following visits. As expected, since the number of objects downloaded decreases, we observe the average number of downloaded objects per server IP to decrease significantly as well. After further investigation, we observed that the different IPs often correspond to third-party analytics and advertising servers, which serve content that is not usually kept in the local cache. Therefore, those IPs are contacted during every visit.
Figure 11: The top websites in terms of user requests. Panels: (a) ISP-1, (b) MOB, (c) CAM; y-axis: user requests (log scale).
Key takeaway: Overall, Alexa-based studies seem to overestimate the number of downloaded objects and generated traffic by as much as three times, when compared to actual user traffic, due to local-cache effects. On the other hand, the number of contacted IPs and domains does not seem to be affected by local caching. It is therefore important to have these two facts in mind when analyzing trends by synthesizing requests to popular websites.
4.1.2 Website popularity
We now turn our attention to website popularity in terms of user requests, external referrers, traffic volume, and network flows. Figure 11 shows the top 20 websites in terms of user requests in the ISP-1, MOB, and CAM traces. As expected, the "usual suspects" are in the top places in the ISP-1 and MOB traces, e.g., Google, facebook.com, wikipedia.org and youtube.com. The CAM trace was collected in China, which explains the differences in website popularity. Other top websites represent local preferences, such as news sites and portals. This is especially visible in the ISP-1 and CAM traces. The specific local websites in the ISP-1 trace are kept anonymized because of a business agreement. Additionally, Figure 11 shows that a large percentage of traffic is caused by "Other" websites, which shows that web traffic does not consist of just a handful of popular websites.
Figure 12: The top external referrers in different traces. Panels: (a) ISP-1, (b) MOB, (c) CAM; y-axis: percentage of external referrals.
Besides popularity, understanding how users reach a particular website is of interest to both website operators and designers. Figure 12 shows the top 20 external referrer websites in the ISP-1, MOB, and CAM traces. A website is considered an external referrer if it refers users to other websites. It is worth mentioning that we aggregate all international versions of websites into one, e.g., google.it and google.br are aggregated together as Google.
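The aggregation of international versions can be sketched as a simple hostname normalization. The brand list below and the function name are hypothetical, and the matching rule (a brand name appearing as one of the dot-separated labels) is an assumption made for illustration; a production version would more likely rely on a public-suffix list to separate the registrable name from the country suffix.

```python
def aggregate_site(hostname, brands=("google",)):
    """Map a hostname to an aggregated site name.

    brands: hypothetical list of names whose international versions
    (e.g., google.it, google.br) are collapsed into one entry.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for brand in brands:
        if brand in labels:
            return brand
    # Fallback: keep the hostname, dropping only a leading "www" label.
    if labels and labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels)
```

For example, `aggregate_site("www.google.it")` and `aggregate_site("google.br")` both return `"google"`, while a hostname outside the brand list, such as `www.sina.com.cn`, is kept as `sina.com.cn`.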
The largest external referrer in all traces collected in Europe and in the US is Google. It accounts for over 30% and 45% of all external referrers, respectively. In the LAB trace, the percentage is as high as 80%. The plot for the LAB trace is not included here because the user population is too small to draw meaningful conclusions at this level. The second largest external referrer in the MOB trace, cellmania.com, is a portal website for mobile devices. It integrates news, map and weather, wireless search and email for mobile users. Figure 12(c) shows the referrers for the Chinese dataset (CAM). In CAM, the largest referrer website is baidu.com, which is the most popular search engine in China. Interestingly, Google is the second most popular referrer site in CAM. In summary, the major external referrers are search engines, portals and social networking websites in both wireline and mobile traces.
Figure 13: The top websites in terms of traffic volume. Panels: (a) ISP-1, (b) MOB, (c) CAM; y-axis: bytes.
Figure 13 shows the top 20 websites in terms of aggregated traffic volume in the ISP-1, MOB, and CAM traces. Nine out of the top 20 websites in the ISP-1 trace and 16 of 20 in the MOB trace are adult websites. In fact, adult websites in both traces account for more than 40% of total traffic. The ratio of adult websites in the MOB trace (16/20) is much higher than the one (9/20) in the ISP-1 trace. One possible explanation is that people prefer browsing adult websites through personal mobile devices for privacy reasons. In contrast to the ISP-1 and MOB traces, in the CAM trace none of the top sites is an adult site. We attribute this to stricter control and censorship in this network. Finally, we see that the second group of big players in the ISP-1 trace are online file-sharing websites, e.g., bitshare.com, filesonic.com and fileserve.com, which account for roughly 30% of the total traffic. The site with the largest traffic volume in CAM is the file-sharing website 115.com, which accounts for roughly 35% of total traffic. Online file-sharing websites are significant contributors of HTTP traffic [3]. Overall, online file-sharing websites and adult websites seem to be the top sources of web traffic in our traces.
Finally, Figure 14 shows the top 20 websites in terms of flows. In both the ISP-1 and MOB traces, Facebook is number one. There are two possible reasons. First, Facebook is highly popular among users. Second, Facebook is a web 2.0 website and adopts AJAX techniques: Facebook pages periodically update themselves even when users leave Facebook pages in the background. Moreover, the continued interaction of users with Facebook and the extensive use of load balancing through CDNs result in multiple connections being created over time. Finally, the popular Google search and YouTube video services also rank near the top in both the ISP-1 and MOB traces. In Figure 14(c), the CAM trace again shows some differences from the traces from Europe and the US. The website with the most flows is baidu.com, a popular search engine. Furthermore, the social network site renren.com and the shopping portal taobao.com (think Amazon) are also among the top websites in terms of flows.
Figure 14: The top websites in terms of flows. Panels: (a) ISP-1, (b) MOB, (c) CAM; y-axis: flows (log scale).
Key takeaway: Overall, examining basic popularity trends in this section highlights that different statistics provide different views. For example, looking at user requests versus generated network traffic or network flows results in differences in the popular services, which shows the importance of accurately reconstructing user requests. This highlights the advantage of having a methodology such as ReSurf that can provide such key insights into web traffic.
4.2 Click-through streams
Having examined the basic properties of user requests and website popularity, we now use the referring relationship between webpages to better understand user browsing patterns. We focus on two main questions: (a) How long is a click-through stream in terms of user requests, i.e., do user clicks take them through several websites? and (b) What are the typical transitions in these click-through streams, i.e., which websites refer to other websites?
Figure 15: The number of user requests in click-through streams (timeout T=1800s).
Figure 16: The number of websites in click-through streams.
In Figure 15 we show the number of user requests in click-through streams with a timeout parameter of half an hour. That is, we consider a user request to be a part of a click-through stream if and only if it is less than half an hour away from the previous user request in the stream. We experimented with different timeouts in the range of five minutes up to one hour with very similar results, which are not shown here for brevity.
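The timeout rule above can be sketched as a single pass over the time-ordered user requests of one stream candidate. The sketch covers only the timeout criterion; it assumes the requests have already been linked by their referring relationship, and the function name and timestamp representation (seconds) are illustrative.

```python
def split_streams(timestamps, timeout=1800.0):
    """Split user-request timestamps (in seconds) into click-through streams.

    A request joins the current stream only if it is less than `timeout`
    seconds after the previous request in that stream.
    """
    streams = []
    for ts in sorted(timestamps):
        if streams and ts - streams[-1][-1] < timeout:
            streams[-1].append(ts)
        else:
            streams.append([ts])
    return streams


# Gaps of 600 s and 1400 s stay in one stream; the 3000 s gap starts a new one.
streams = split_streams([0, 600, 2000, 5000, 5100])
```

Changing `timeout` to values between 300 and 3600 seconds reproduces the kind of sensitivity experiment mentioned in the text.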
Figure 15 shows that the mean (median) number of user requests in click-through streams is 4.5 (3). The 95th percentile of the number of user requests is about 11. This implies that, typically, click-through streams are short, with the users giving up browsing after a small number of user requests. In Figure 16, we show the distribution of the number of websites that are visited during a single click-through stream. The median number of websites in a click-through stream is two for the wireline users and one for the mobile users. This observation suggests that mobile users are less likely to click on links that take them to different websites, which might be due to lower available download rates. Intuitively, this suggests that mobile users have an application in mind when using the Internet and are less likely to "surf around." Finally, it is interesting to observe that for all traces the 95th percentile is only three. Perhaps the use of specialized web services, such as social networking and search engines, explains this behavior.
4.2.1 Transitions between websites
Click-through streams also allow us to track the transitions across different websites. To this end, Figure 17 shows the 20 most frequent website transitions in different traces. The figure shows that transitions depend on the trace examined. This probably reflects both the different types of collection environments and locality characteristics. Specifically, in the LAB trace (not shown due to space limitations), most transitions are from Google to academic and programming-related websites, reflecting users who are graduate students in a research lab. As shown in Figure 17, website transitions in the ISP-1, MOB, and CAM traces are much more diverse because of the larger user populations. In the ISP-1 trace, most website transitions are from Google, Facebook and online forums to video and game websites. In MOB, most website transitions are from search engine and portal websites to video and news/blog websites. In CAM, there is a considerable percentage (roughly 10%) of website transitions from portal websites like 360.cn, hao123.com and 2345.com to other websites, which we did not observe in the other traces.
Figure 17: The top website transitions over three traces. Panels: (a) ISP-1, (b) MOB, (c) CAM; each bar is a (from website, to website) pair; y-axis: percentage of transitions.
Not surprisingly, our measurements show that "referrer" websites are usually search engines, social networking and online forum websites, while "referred-to" websites are content providers, like Wikipedia, YouTube and news/blog websites.
4.3 Characteristics of mobile web traffic
Here, we focus on web traffic generated by mobile users and place further emphasis on websites that offer dedicated smartphone applications for their users [20]. With these proprietary applications, mobile device users can access websites without traditional web browsers. As we explain below, identifying the traffic from those applications requires a less complicated approach than ReSurf. To analyze mobile traffic, we further decomposed the MOB trace into three categories: (a) traffic generated by user requests towards regular webpages using web browsers on mobile devices, (b) traffic generated by user requests towards customized mobile webpages using web browsers, and (c) traffic generated by user requests using smartphone applications (e.g., Facebook Touch). We distinguish the traffic generated by smartphone applications by examining the user-agent field in the HTTP headers. Further, to distinguish the traffic generated by (a) and (b), we first reconstruct user requests by applying ReSurf. Then, we match the head request's URL against a list of keywords. If a head request's URL matches one of the following patterns: m.*, *.mobi, */wap, or mobile.*, we consider the user request to be towards a mobile customized webpage. Otherwise, it is a regular webpage. For example, facebook.com is a regular webpage while m.facebook.com is the customized version. We will refer to them as "Browsers→Regular" and "Browsers→Customized" throughout the remainder of the paper.
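The URL matching step can be sketched as follows. The reading of the four patterns is an assumption made here: `m.*` and `mobile.*` are taken as hostname prefixes, `*.mobi` as a hostname suffix, and `*/wap` as a path ending in `/wap`. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_mobile_customized(url):
    """Return True if a head request URL looks like a mobile-customized page."""
    # urlsplit only fills in the hostname when a scheme separator is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.lower().rstrip("/")
    return (
        host.startswith("m.")          # m.*
        or host.startswith("mobile.")  # mobile.*
        or host.endswith(".mobi")      # *.mobi
        or path.endswith("/wap")       # */wap
    )


label = "Browsers→Customized" if is_mobile_customized("m.facebook.com") else "Browsers→Regular"
```

With this interpretation, `m.facebook.com` is labeled as customized and `facebook.com` as regular, matching the example in the text.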
Identifying the traffic from smartphone applications. We observed that HTTP headers generated by these applications only include the user agent, host, URI, content-type, and length fields. That is, other important fields like the referrer and cache control information are missing in the majority of cases. Therefore, ReSurf is not applicable here. In a nutshell, we classify the flows generated by smartphone applications using the user agent information in the HTTP headers. We first extract all user agents present in the MOB trace. In total, there are 534 different user agents after removing version numbers. Then, we manually compile a keyword list for all the smartphone applications, which covers 118 smartphone applications' user agents. Finally, we classify all HTTP flows into applications by searching for these keywords in user agents. The most popular applications in terms of the number of network flows are Facebook, YouTube and Pandora. Finally, we simply use timing information to group HTTP transactions into user requests. A user request expires if it is idle for more than 30 seconds. We observed qualitatively similar results by using different timeouts in the range of a few seconds to a few minutes.
Figure 18: The number of objects downloaded by user requests initiated by traditional web browsers towards regular and customized Facebook webpages, as well as by Facebook smartphone applications.
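The user-agent steps described above (removing version numbers, then keyword matching) can be sketched as below. The regular expression, the three-entry keyword table, and the sample user-agent string are all hypothetical; the keyword list in the study was compiled manually and is not reproduced here.

```python
import re

# Hypothetical keyword list mapping a user-agent keyword to an application.
APP_KEYWORDS = {
    "facebook": "Facebook",
    "youtube": "YouTube",
    "pandora": "Pandora",
}

# Assumed notion of a "version number": digits (optionally dotted, optionally
# prefixed by "v") that follow a slash or a space.
_VERSION = re.compile(r"[/ ]v?\d+(?:\.\d+)*")


def strip_versions(user_agent):
    """Remove version numbers so that releases of one client collapse together."""
    return _VERSION.sub("", user_agent)


def classify_user_agent(user_agent):
    """Return the application whose keyword appears in the user agent, or None."""
    normalized = strip_versions(user_agent).lower()
    for keyword, app in APP_KEYWORDS.items():
        if keyword in normalized:
            return app
    return None


app = classify_user_agent("Pandora/3.1.2 (Android 2.3)")
```

Counting the distinct values of `strip_versions` over a trace gives the number of different user agents after version removal; flows whose user agent matches no keyword would fall back to the browser categories.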
For presentation purposes, we focus on the comparison of the traffic generated by Facebook smartphone applications and traditional web browsers, both for "Browsers→Regular" and "Browsers→Customized". We observed qualitatively similar results for other popular services and we use Facebook here as a representative example. Figure 18 shows the CDF of downloaded objects in user requests initiated by smartphone applications and browsers in the MOB trace. We observed that user requests initiated by smartphone applications (Applications→FB) and by traditional browsers (Browsers→Customized FB) towards customized Facebook pages are similar in terms of downloaded objects. However, user requests initiated by web browsers (Browsers→Regular FB) towards regular Facebook webpages download on average almost twice as many objects, and generate four times as much network traffic. This shows how mobile users can benefit from the use of such applications when accessing their favorite web service by downloading only relevant content.
5. RELATED WORK
The recent trend of network services over HTTP has attracted the interest of the research community. Labovitz et al. [12] brought to light the fact that most inter-domain traffic is HTTP. Schatzmann et al. [15] present a methodology to identify web-based mail servers, and distinguish between services such as Gmail and Yahoo mail. Erman et al. [7] analyze traffic from residential users and find that a significant part of HTTP traffic is generated by hand-held devices and home appliances, while a large fraction is machine generated (e.g., OS/anti-virus updates, ads). Li et al. [13] present methods to identify the type of the object transferred over HTTP (e.g., video, xml, jpeg). The recent work from Schneider et al. [17] characterizes the inconsistencies between observed HTTP traffic and what is advertised in its HTTP header. All this previous work is complementary to our work, as it focuses on different problems, and not on the reconstruction of web-surfing at the level of user requests and target websites.
There are three categories of user request reconstruction methods in the existing literature. The first category assumes that any HTTP request for an HTML object is the head HTTP request of a user request. The second category is based on the timing information of HTTP requests [18, 4, 14]: if the idle time between two HTTP requests is smaller than a predefined threshold, they belong to the same user request. Both of these categories of methods were effective in the early days of the web, but are no longer effective due to the complexity of the web 2.0 world. The most relevant work to ours is the one that focuses on the evolution of web traffic [8] starting from logs of web proxy servers. To understand modern web traffic and measure webpage complexity, they propose a method, StreamStructure, to detect "primary" webpages requested by users. A key limitation of StreamStructure is its dependence on Google Analytics beacons, which seem to form the basis of their reconstruction algorithm and without which accuracy drops significantly. Recall that about 80% of user requests do not have web analytics beacons. In tangentially related work, [10, 11] study the privacy issues arising from the use of web analytics beacons in webpages. Finally, Xu et al. [20] study the diverse usage patterns of general smartphone applications from mobile network traces. Besides the different focus of that work, our study further compares various web-traffic characteristics across mobile and wireline traffic.
6. CONCLUSIONS
Web traffic dominates current network traffic, with HTTP being ubiquitous across different applications. We frame and address a relatively novel problem: reconstructing web-surfing behavior from network data. The problem is far from trivial given the complex and interconnected websites of today. As a key contribution, we develop ReSurf, which can reconstruct user requests with more than 95% precision and 91% recall. As our second contribution, we showcase interesting results that one can obtain from raw network traffic by analyzing a number of network traces including a residential ISP and mobile user data.
A surprising result is the "shallowness" of the click-through stream of users accessing websites, with a median of one website transition.
Considering the recent trends of web browsing through custom applications, we expect this shallowness to be the norm of user browsing patterns in the future. We also quantify differences between mobile and wireline web-access patterns: mobile user requests download one third of the objects and generate one tenth of the traffic compared to user requests on the wireline traces. Such findings are just a sample of the results and analysis that ReSurf can enable. In the big scheme of things, ReSurf and similar future algorithms represent an enabling capability for ISPs and network administrators who want to manage their networks effectively, as well as for network researchers who want to analyze and study modern web traffic.
7. REFERENCES
[1] http://www.alexa.com/topsites.
[2] AGER, B., MÜHLBAUER, W., SMARAGDAKIS, G., AND UHLIG, S. Web content cartography. In ACM IMC (2011).
[3] ANTONIADES, D., MARKATOS, E., AND DOVROLIS, C. One-click hosting services: a file-sharing hideout. In ACM IMC (2009).
[4] BARFORD, P., AND CROVELLA, M. Generating representative web workloads for network and server performance evaluation. In ACM SIGMETRICS Performance Evaluation Review (1998).
[5] BENEVENUTO, F., RODRIGUES, T., CHA, M., AND ALMEIDA, V. Characterizing user behavior in online social networks. In ACM IMC (2009).
[6] BUTKIEWICZ, M., MADHYASTHA, H., AND SEKAR, V. Understanding website complexity: Measurements, metrics, and implications. In ACM IMC (2011).
[7] ERMAN, J., GERBER, A., AND SEN, S. HTTP in the home: it is not just about PCs. In ACM SIGCOMM Computer Communication Review (2011).
[8] IHM, S., AND PAI, V. Towards understanding modern web traffic. In ACM IMC (2011).
[9] KARAGIANNIS, T., PAPAGIANNAKI, K., AND FALOUTSOS, M. BLINC: multilevel traffic classification in the dark. In ACM SIGCOMM (2005).
[10] KRISHNAMURTHY, B., AND WILLS, C. Generating a privacy footprint on the internet. In ACM IMC (2006).
[11] KRISHNAMURTHY, B., AND WILLS, C. Privacy diffusion on the web: A longitudinal perspective. In ACM WWW (2009).
[12] LABOVITZ, C., IEKEL-JOHNSON, S., OBERHEIDE, J., AND JAHANIAN, F. Internet Inter-Domain Traffic. In ACM SIGCOMM (2010).
[13] LI, W., MOORE, A., AND CANINI, M. Classifying HTTP traffic in the new age. In ACM SIGCOMM Poster (2008).
[14] MAH, B. An empirical model of HTTP network traffic. In IEEE INFOCOM (1997).
[15] SCHATZMANN, D., MÜHLBAUER, W., SPYROPOULOS, T., AND DIMITROPOULOS, X. Digging into HTTPS: Flow-Based Classification of Webmail Traffic. In ACM IMC (2010).
[16] SCHNEIDER, F., FELDMANN, A., KRISHNAMURTHY, B., AND WILLINGER, W. Understanding online social network usage from a network perspective. In ACM IMC (2009).
[17] SCHNEIDER, F., AGER, B., AND MAIER, G. Pitfalls in HTTP traffic measurements and analysis. In International Conference on Passive and Active Measurement (PAM) (2012).
[18] SMITH, F., CAMPOS, F., JEFFAY, K., AND OTT, D. What TCP/IP protocol headers can tell us about the web. In ACM SIGMETRICS Performance Evaluation Review (2001).
[19] TRESTIAN, I., RANJAN, S., KUZMANOVIC, A., AND NUCCI, A. Unconstrained endpoint profiling (googling the internet). In ACM SIGCOMM (2008).
[20] XU, Q., ERMAN, J., GERBER, A., MAO, Z., PANG, J., AND VENKATARAMAN, S. Identifying diverse usage behaviors of smartphone apps. In ACM IMC (2011).