Towards a Model of DNS Client Behavior

Kyle Schomp, Michael Rabinovich, Mark Allman
Case Western Reserve University, Cleveland, OH, USA
International Computer Science Institute, Berkeley, CA, USA

Abstract. The Domain Name System (DNS) is a critical component of the Internet infrastructure as it maps human-readable hostnames into the IP addresses the network uses to route traffic. Yet, the DNS behavior of individual clients is not well understood. In this paper, we present a characterization of DNS clients with an eye towards developing an analytical model of client interaction with the larger DNS ecosystem. While this is initial work and we do not arrive at a DNS workload model, we highlight a variety of behaviors and characteristics that enhance our mental models of how DNS operates and move us towards an analytical model of client-side DNS operation.

1 Introduction

The modern Internet relies on the Domain Name System (DNS) for two main functions. First, the DNS allows people to leverage human-friendly hostnames (e.g., "www.cnn.com") instead of obtuse IP addresses to identify a host. Second, hostnames provide a layer of abstraction such that the IP address assigned to a hostname can vary over time. In particular, Content Distribution Networks (CDNs) employ this late binding to direct users to the best content replica. Previous work shows that DNS lookups precede over 60% of TCP connections [14]. As a result, individual clients issue large numbers of DNS queries. Yet, our understanding of DNS query streams is largely based on aggregate populations of clients, e.g., at an organizational [6] or residential level [3], leaving our knowledge of individual client behavior limited.

This paper represents an initial step towards understanding individual client DNS behavior. We monitor DNS transactions between a population of thousands of clients and their local resolver such that we are able to directly tie lookups to individual clients. Our ultimate goal is an analytical model of DNS client behavior that can be used for everything from workload generation to resource provisioning to anomaly detection. In this paper we provide a characterization of DNS behavior along the dimensions our model will ultimately cover and also anecdotally show promising modeling approaches.

Note, one view holds that DNS is a "side service" and should not be directly modeled, but rather can be well understood by deriving the DNS workload from applications such as web browsing and email transmission. However, deriving a DNS workload from application behavior is at best difficult because (i) client caching policies impact what DNS queries are actually sent in response to an application event, (ii) some applications selectively use pre-fetching to look up names before they are needed and (iii) such a derivation would entail understanding many applications to pull together a reasonable DNS workload. Therefore, we take the approach that focusing on the DNS traffic itself is the most tractable way to understand, and eventually model, name lookups.

This work was funded in part by NSF grant CNS-1213157.

To motivate the need for a model, we provide an exemplar from our previous work. In [14], we propose that clients should directly resolve hostnames instead of using a recursive resolver. Ideally, an evaluation of this end system-based mechanism would be conducted in the context of end systems themselves. However, the best data we could obtain was at the level of individual households, which we know to include multiple hosts behind a NAT. Therefore, the results of our trace-driven simulations are at best an approximation of the impact of the mechanism we were investigating. Our results would have been more precise had we been able to leverage a model of individual client DNS behavior.

Broadly, the remainder of this paper follows the contours of what a model would capture. We first focus on understanding the nature of the clients themselves in §3, finding that while most are traditional user-facing devices, there are others that interact with the DNS in distinct ways. Next we observe in §4 that DNS queries often occur closely spaced in time (e.g., driven by loading objects for a single web page from disparate servers) and therefore we develop a method to gather queries into clusters. We then assess the number and spacing of queries in §5 and finally tackle the patterns in what hostnames individual clients look up in §6. We find that clients have fairly distinct "working sets" of names, and also that hostname popularity has power law properties.

2 Dataset

Our dataset comes from two packet taps at Case Western Reserve University (CWRU) that monitor the links connecting the two data centers that house all five of the University's DNS resolvers, i.e., between client devices and their recursive DNS resolvers. We collect full payload packet traces of all UDP traffic involving port 53 (the default DNS port). The campus wireless network situates client devices behind NATs and therefore we cannot isolate DNS traffic to individual clients. Hence, we do not consider this traffic in our study (although future work remains to better understand DNS usage on mobile devices). The University Acceptable Use Policy prohibits the use of NAT on its wired networks while offering wireless access throughout the campus, and therefore we believe the traffic we capture from the wired network does represent individual clients. Our dataset includes all DNS traffic from two separate weeks and is partitioned by client location: the residential or office portions of the network. Details of the datasets are given in Table 1 including the number of queries, the number of clients that issue those queries, and the number of hostnames queried.

Validation: During the February data collection, we collect query logs from the five campus DNS resolvers to validate our datasets (we prefer traces over logs due to the better timestamp resolution: msec vs. sec). Comparing the packet traces and logs, we find loss rates of 0.6% and 1.8% in the Feb:Residential and Feb:Office datasets, respectively. We believe these losses are an artifact of our measurement apparatus given that the loss rate is correlated with traffic volume.

Table 1. Details of the datasets used in this study.

Dataset                    Dates                   Queries  Clients       Hostnames
Feb:Residential            Feb. 26-Mar. 4          32.5M    1359 (IPs)    652K
Feb:Residential (filter)   Feb. 26-27, Mar. 2-4    16.4M    1262 (MACs)   505K
Feb:Residential:Users                              15.3M    1033          499K
Feb:Residential:Others                             1.11M    229           7.94K
Feb:Office                 Feb. 26-Mar. 4          232M     8770 (IPs)    1.98M
Feb:Office (filter)        Feb. 26-27, Mar. 2-4    143M     8690 (MACs)   1.87M
Feb:Office:Users                                   118M     5986          1.52M
Feb:Office:Others                                  25.0M    2704          158K
Jun:Residential            Jun. 23-Jun. 29         11.7M    345 (IPs)     140K
Jun:Residential (filter)   Jun. 23-26, 29          6.22M    334 (MACs)    120K
Jun:Residential:Users                              5.81M    204           116K
Jun:Residential:Others                             408K     130           4.13K
Jun:Office                 Jun. 23-Jun. 29         245M     8335 (IPs)    1.61M
Jun:Office (filter)        Jun. 23-26, 29          133M     8286 (MACs)   1.52M
Jun:Office:Users                                   108M     5495          1.42M
Jun:Office:Others                                  25.0M    2791          63.1K

Tracking Clients: We aim to track individual clients in the face of dynamic address assignment. Simultaneously with the DNS packet trace, we gather logs from the University's three DHCP servers. Therefore, we can track DNS activity based on MAC addresses. Note, we could not map 1.3% of the queries across our datasets to a MAC address because the source IP address in the query never appears in the DHCP logs. These likely represent static IP address allocations. Further, without any DHCP assignments we are confident that these IPs represent a single host.
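
The mapping from a query's source IP address to a MAC address can be sketched as a lookup into lease intervals. This is a minimal illustration, not the authors' tooling: the lease record layout `(ip, start, end, mac)`, the function names, and the assumption that leases for one IP do not overlap are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_lease_index(leases):
    """Group hypothetical (ip, start, end, mac) lease records by IP, sorted by start time."""
    index = defaultdict(list)
    for ip, start, end, mac in leases:
        index[ip].append((start, end, mac))
    for records in index.values():
        records.sort()
    return index


def mac_for_query(index, ip, timestamp):
    """Return the MAC that held `ip` at `timestamp`, or None if no lease covers it."""
    records = index.get(ip, [])
    starts = [start for start, _, _ in records]
    # Latest lease that started at or before the query timestamp.
    pos = bisect_right(starts, timestamp) - 1
    if pos < 0:
        return None
    _, end, mac = records[pos]
    return mac if timestamp <= end else None


leases = [
    ("10.0.0.5", 0, 100, "aa:aa:aa:00:00:01"),
    ("10.0.0.5", 150, 300, "aa:aa:aa:00:00:02"),
]
index = build_lease_index(leases)
print(mac_for_query(index, "10.0.0.5", 200))
```

Queries whose source IP never appears in the index return `None`, which corresponds to the unmapped (likely static) addresses discussed above.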

Filtering Datasets: We find two anomalies that skew the data in ways that are not indicative of user behavior. First, we find roughly 25% of the queries request the TXT record for debug.opendns.com. (The next most popular record represents less than 1% of the lookups!) We find this query is not in response to users' actions, but is automatically issued to determine whether the client is using the OpenDNS resolver (indicated in the answer) [1]. We observe 298 clients querying this record, which we assume use OpenDNS on other networks or used OpenDNS in the past. We remove these queries from further analysis. The second anomaly involves 18 clients whose prominent behavior is to query for debug.opendns.com and other domains repeatedly without evidence of accomplishing much work. The campus information technology department verified that these clients serve an operational purpose and are not user-facing devices. Therefore, we remove the 18 clients as they are likely unique to this network and do not represent users. We do not attempt to further filter misbehaving hosts (e.g., infected or misconfigured hosts) as we consider them part of the DNS workload (e.g., since a resolver would be required to cope with their requests).

Timeframe: To more directly compare residential and office settings we exclude Saturday and Sunday from our datasets.

Table 1 shows the magnitude of our filtering.
We find commonality across the partitions of the data, so we focus on the Feb:Residential:Users dataset for conciseness and discuss how other datasets differ as appropriate.

Table 2. Feb:Residential clients that fit markers for general purpose devices.

Marker              Clients  %
All                 1262     100%
Google analytics    983      78%
Search engine       1010     80%
  Google            1006     80%
  Any other         602      48%
Gmail               881      70%
LDAP Login          840      66%
Any                 1033     82%

3 Identifying Types of Clients

Since our focus is on characterizing general purpose user-facing devices, we aim to separate them from other types of end systems. We expect general-purpose systems are involved in tasks such as (i) web browsing, (ii) accessing search engines, (iii) using email, and (iv) conducting institution-specific tasks (in our case, campus-life tasks, e.g., checking the course materials portal). Therefore, we develop the following markers to identify general-purpose hosts:

Browsing: A large number of web sites embed Google Analytics [8] in their pages, thus there is a high likelihood that regular users will query for Google Analytics hostnames on occasion.

Searching: We detect web search activity via DNS queries for the largest search engines: Google, Yahoo, Bing, AOL, Ask, DuckDuckGo, Altavista, Baidu, Lycos, Excite, Naver, and Yandex.

Email: CWRU uses Google to manage campus email and therefore we use queries for "mail.google.com" to indicate email use.

Institution-Specific Tasks: CWRU uses a single sign-on system for authenticating users before they perform a variety of tasks and therefore we use queries for the corresponding hostname as indicative of user behavior.

Table 2 shows the breakdown of the clients in the Feb:Residential dataset. Of the 1,262 clients we identify 1,033 as user-facing based on at least one of the above markers. Intuitively we expect that multiple markers likely apply to most general purpose systems and in fact we find at least two markers apply to 991 of the clients in our dataset. Results for our other datasets are similar.

We next turn to the 229 clients (≈18%) that do not match any of our markers for user-facing clients. To better understand these clients we aggregate them based on the vendor portion of their MAC addresses. First, we find a set of vendors and query streams that indicate special-purpose devices: (i) 48 Microsoft devices that query for names within the xboxlive.com domain, which we conclude are Xbox gaming consoles, (ii) 33 Sony devices that query for names within the playstation.net domain, which we conclude are Sony Playstation gaming consoles, (iii) 16 Apple devices that have an average of 11K queries for the apple.com domain (representing 96% of their lookups), even though the average across all devices that look up an apple.com name is 262 queries, which we conclude are Apple TV devices and (iv) 7 Linksys devices that issue queries for es-uds.usatech.com, which we conclude are transaction systems attached to the laundry machines in the residence halls (!).

In addition to these, we find devices that we cannot pinpoint explicitly, but that do not seem to be general-purpose client systems. We find 41 Dell devices that differ from the larger population of hosts in that they query for more PTR records than A records. A potential explanation is that these devices are servers obtaining hostnames for clients that connect to them (e.g., as part of sshd's verification steps or to log client connects). We also identify 12 Kyocera devices that issue queries for only the campus NTP and SMTP servers. We conclude that these are copy machines that also offer emailing of scanned documents. For the IP addresses that do not appear in the DHCP logs (i.e., addresses statically configured on the hosts), we cannot obtain a vendor ID. However, we note that 97% of the queries and 96% of the unique domain names from these machines involve CWRU domains and therefore we conclude that they serve some administrative function and are not general purpose clients. The remaining 61 devices are distributed among 42 hardware vendors.

In the remainder of the paper we consider the general purpose clients (Users) and the special purpose clients (Others) separately, as we detail in Table 1. We find that our high-level observations hold across all of the Users datasets, and thus present results for the Feb:Residential:Users dataset only.
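
The marker-based split into Users and Others described in this section can be sketched as follows. The hostname lists here are illustrative assumptions, not the authors' exact match rules: `SEARCH_HOSTNAMES` covers only a few of the listed engines, and `SSO_HOSTNAME` is a hypothetical placeholder because the text does not name the single sign-on host.

```python
# Illustrative marker definitions (assumptions, not the paper's exact lists).
ANALYTICS_SUFFIX = "google-analytics.com"
SEARCH_HOSTNAMES = {"www.google.com", "www.bing.com", "search.yahoo.com", "duckduckgo.com"}
EMAIL_HOSTNAME = "mail.google.com"          # given in the text
SSO_HOSTNAME = "login.example.edu"          # hypothetical placeholder


def markers_for(hostnames):
    """Return the set of general-purpose markers matched by a client's queried hostnames."""
    found = set()
    for raw in hostnames:
        name = raw.lower().rstrip(".")
        if name == ANALYTICS_SUFFIX or name.endswith("." + ANALYTICS_SUFFIX):
            found.add("browsing")
        if name in SEARCH_HOSTNAMES:
            found.add("searching")
        if name == EMAIL_HOSTNAME:
            found.add("email")
        if name == SSO_HOSTNAME:
            found.add("institutional")
    return found


def classify(hostnames):
    """A client matching at least one marker is treated as user-facing."""
    return "Users" if markers_for(hostnames) else "Others"


print(classify(["www.google-analytics.com", "mail.google.com"]))
print(classify(["example.xboxlive.com"]))
```

Requiring at least one marker mirrors the "Any" row of Table 2; a stricter variant could require two or more markers.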

4 Query Clusters

Applications often call for multiple DNS queries in rapid succession, e.g., as part of loading all objects on a web page, or prefetching names for links users may click. In this section, we quantify this behavior using the DBSCAN algorithm [4] to construct clusters of DNS queries that likely share an application event. The DBSCAN algorithm uses two parameters to form clusters: a minimum cluster size M and a distance ε that controls the addition of samples to a cluster. We use the absolute difference in the query timestamps as the distance metric.

Our first task is to choose suitable parameters. Our strategy is to start with a range of parameters and determine whether there is a point of convergence where the results of clustering do not change greatly with the parameters. Based on the strategy in [4], we start with an M range of 3-6 and an ε range of 0.5-5 seconds. Note that M = 2 simplifies to threshold-based clustering, but does not produce a point of convergence. We find that 96% of the clusters we identify with M = 6 are exactly found when M = 3 and hence at M = 3 we have converged on a reasonably stable answer, which we use in the subsequent analysis. Additionally, we find that for ε ∈ [2.5, 5], the total number of clusters, the distribution of cluster sizes, and the assignment of queries to clusters remain similar irrespective of the ε value and therefore use ε = 2.5 seconds in our analysis. We define the first DNS query per cluster as the root and all subsequent queries in the cluster as dependents.
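
Because the distance metric is one-dimensional (timestamps), DBSCAN reduces to a simple scan over sorted times. The following is a minimal sketch under that observation, not the authors' implementation; the function name and the choice to attach a border point to its nearest core point are assumptions (standard DBSCAN attaches border points to whichever cluster reaches them first).

```python
from bisect import bisect_left, bisect_right


def dbscan_1d(times, eps=2.5, min_pts=3):
    """Cluster timestamps with 1-D DBSCAN.

    Returns labels aligned with sorted(times); -1 marks noise.
    """
    times = sorted(times)
    n = len(times)
    # Core point: at least min_pts points (itself included) within eps.
    is_core = [
        bisect_right(times, t + eps) - bisect_left(times, t - eps) >= min_pts
        for t in times
    ]
    labels = [-1] * n
    cluster = -1
    prev_core_time = None
    for i in range(n):
        if not is_core[i]:
            continue
        # In 1-D, consecutive core points belong together iff their gap is <= eps.
        if prev_core_time is None or times[i] - prev_core_time > eps:
            cluster += 1
        labels[i] = cluster
        prev_core_time = times[i]
    # Border points: non-core points within eps of a core point (nearest one wins).
    core_idx = [i for i in range(n) if is_core[i]]
    core_times = [times[i] for i in core_idx]
    for i in range(n):
        if is_core[i]:
            continue
        j = bisect_left(core_times, times[i])
        best = None
        for k in (j - 1, j):
            if 0 <= k < len(core_idx):
                dist = abs(core_times[k] - times[i])
                if dist <= eps and (best is None or dist < best[0]):
                    best = (dist, labels[core_idx[k]])
        if best is not None:
            labels[i] = best[1]
    return labels


labels = dbscan_1d([0.0, 0.5, 1.0, 10.0, 20.0, 20.4, 21.0, 22.0])
print(labels)
```

Under the definition in the text, the root of each cluster is the earliest timestamp carrying that cluster's label, and the remaining members are its dependents.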

In the Feb:Residential:Users dataset, we find 1M clusters that encompass 80% of the roughly 15M queries in the dataset.

To validate the clustering algorithm we first inspect the 67K unique hostnames the algorithm labels as noise. We find a variety of hostnames with the most frequent being: WPAD [7] queries for discovering proxies, Google Mail and Google Docs, software update polling (e.g., McAfee and Symantec), heartbeat signals for gaming applications (e.g., Origin, Steam, Blizzard, Riot), video streaming (e.g., Netflix, YouTube, Twitch), and the Network Time Protocol (NTP). All of these names can intuitively come from applications that require only sporadic DNS queries, as they are either making quick checks every once in a while, or are using long-lived sessions that leverage DNS only when starting.

Fig. 1. Number of queries, hostnames, and SLDs per cluster.

Fig. 2. Queries issued by each client per day.

To validate the clusters themselves, we observe that there are frequently occurring roots. Indeed, the 1M clusters have only 72K unique roots, with the 100 most frequently occurring roots accounting for 395K (40%) of the clusters. Further, the 100 most popular roots include popular web sites (e.g., www.facebook.com, www.google.com). These are the type of names we would expect to be roots in the context of web browsing. Another common root is safebrowsing.google.com [9], a blacklist directory used by some web browsers to determine if a given web site is safe to retrieve. This is a distinctly different type of root than a popular web site because the root is not directly related to the dependents by the page content, but rather via a process running on the clients. This in some sense means SafeBrowsing-based clusters have two roots. While use of SafeBrowsing is fairly common in our dataset, we do not find additional prevalent cases of this "two roots" phenomenon. From a modeling standpoint we have not yet determined whether "two roots" clusters would need special treatment.

Figure 1 shows the distribution of queries per cluster. The majority of clusters are small, and relatively few are large. We find that 90% of clusters contain at most 26 queries for at most 22 hostnames. Additionally, we find 90% of the clusters encompass at most 10 SLDs. The largest cluster spans 95 seconds and consists of 9,366 queries for names that match to the 3rd level label. The second largest cluster consists of 6,211 queries for myapps.developer.ubuntu.com, which is likely an Ubuntu bug.

5 Query Timing

Next we tackle the question of when and how many queries clients issue. We begin with the distribution of the average number of queries that clients issue per day, as given in Figure 2. We find that clients in Users issue 2K lookups per day at the median and 90% of clients in Users issue fewer than 6.7K queries per day. The Others datasets show greater variability, where relatively few clients generate the lion's share of queries, i.e., the top 5% of clients produce roughly as many total DNS queries per day as the bottom 95% in the Feb:Residential:Others dataset.

Fig. 3. Time between queries from the same client in aggregate and per client.

Fig. 4. Duration of clusters, inter-cluster query time and intra-cluster query time.

A related metric is the time between subsequent queries from the same client, or inter-query times. Figure 3 shows the distribution of the inter-query times. The "Aggregate" line shows the distribution across all clients. The area "90%" shows the range within which 90% of the individual client inter-query time distributions fall. The majority of inter-query times are short, with 50% of lookups occurring within 34 milliseconds of the previous query. However, we also find a heavy tail, with 0.1% of inter-query times being over 25 minutes. Intuitively, long inter-query times represent off periods when the client's user is away from the keyboard (e.g., asleep or at class). The Others datasets show wide-ranging behavior, suggesting that they are less amenable to succinct description in an aggregate model.

For the Users dataset, we are able to model the aggregate inter-query time distribution using the Weibull distribution for the body and the Pareto distribution for the heavy tail. We find that partitioning the data at an inter-query time of 22 seconds minimizes the mean squared error between the data and the two analytical distributions. Next, we fit the analytical distributions, split at 22 seconds, to each of the individual client inter-query time distributions. We find that while the parameters vary per client, the empirical data is well represented by the analytical models as the mean squared error for 90% of clients is less than 0.0014. Thus, parameters for a model of query inter-arrivals will vary per client, but the distribution is invariant.
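
One way such a two-part model could drive a workload generator is sketched below. The text does not give the fitted parameters or state how the body and tail are combined, so everything beyond the 22-second split is an assumption: the mixture weight `p_tail`, the truncation of the Weibull body at the split, and all parameter values in the usage example.

```python
import random

SPLIT = 22.0  # seconds; the partition point reported in the text


def sample_inter_query_time(rng, p_tail, weibull_scale, weibull_shape, pareto_shape):
    """Draw one inter-query time from a Weibull body / Pareto tail mixture.

    p_tail is the (assumed) probability that a gap falls in the heavy tail.
    """
    if rng.random() < p_tail:
        # paretovariate returns values >= 1, so the tail starts at SPLIT.
        return SPLIT * rng.paretovariate(pareto_shape)
    # Body: Weibull restricted to [0, SPLIT) via rejection sampling.
    while True:
        gap = rng.weibullvariate(weibull_scale, weibull_shape)
        if gap < SPLIT:
            return gap


rng = random.Random(1)
# Hypothetical per-client parameters, for illustration only.
gaps = [sample_inter_query_time(rng, 0.02, 0.5, 0.4, 1.2) for _ in range(1000)]
print(min(gaps), max(gaps))
```

Since the parameters vary per client while the distribution family does not, a generator would hold one parameter tuple per simulated client.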

Next, we move from focusing on individual lookups to focusing on timing related to the 1M lookup clusters that encompass 12M (80%) of the queries in our dataset (see §4). Figure 4 shows our results. The "Intra-cluster time" line shows the distribution of the time between successive queries within the same cluster. This time is bounded to ε = 2.5 seconds by construction, but over 90% of the inter-arrivals are less than 1 second. On the other hand, the line "Inter-cluster time" shows the time between the last query of a cluster and the first query of the next cluster. Again, most clusters are separated from each other by much more than ε time, the minimum separation by construction. The line "Cluster duration" shows the time between the first and last query in each cluster. Most clusters are short, with 99% less than 18 seconds. Additionally, we find that most of client DNS traffic occurs in short clusters: 50% of clustered queries belong to clusters with duration less than 4.6 seconds and 90% are in clusters with duration less than 20 seconds. For the Others datasets, a smaller percentage of DNS queries occur in clusters, e.g., 60% in the Feb:Residential:Others dataset.

Fig. 5. Fraction of queries issued for each hostname per client.

Fig. 6. Fraction of clients issuing queries for each hostname and SLD.

6 Query Targets

Finally, we tackle the queries themselves including relationships between queries.

Popularity of Names: We analyze the popularity of hostnames using two methods: how often the name is queried across the dataset and how many clients query for it. Figure 5 shows the fraction of queries for each hostname (with the hostnames sorted by decreasing popularity) in the Feb:Residential:Users dataset. As in §5, we plot the aggregate distribution and a range that encompasses 90% of the individual client distributions. Of the 499K unique hostnames within our dataset, 256K (51%) are looked up only once. Meanwhile, the top 100 hostnames account for 28% of DNS queries. Figure 6 shows the fraction of clients that query for each name. We find that 77% of hostnames are queried by only a single client. However, over 90% of the clients look up the 14 most popular hostnames. Additionally, 13 of these hostnames are Google services and the remaining one is www.facebook.com. The plot shows similar results for second-level domains (SLDs), where 66% of the SLDs are looked up by a single client. The distributions of both queries per name and clients per name demonstrate power law behavior in the tail. Interestingly, the Pearson correlation between these two metrics (popularity by queries and popularity by clients) is only 0.54, indicating that a domain name with many queries is not necessarily queried by a large fraction of the client population and vice versa. As an example, update-keepalive.mcafee.com is the 19th most queried hostname but is only queried by 8.1% of the clients. At the same time, 55% of the clients query for s2.symcb.com, but in terms of total queries this hostname ranks as only the 1215th most popular. This phenomenon may be partially explained by differences in TTL. The record for s2.symcb.com has a one hour TTL, limiting the query frequency. Meanwhile, update-keepalive.mcafee.com has a 1 minute TTL. Given this short TTL and that the name implies polling activity, the large number of queries from a given client is unsurprising. Thus, a model of DNS client behavior must account for the popularity of hostnames in terms of both queries and clients. The heavy tails of the popularity distributions represent a large fraction of DNS transactions. However, we cannot disregard unpopular names, even those queried just once, because together they are responsible for the majority of DNS activity and therefore impact the entire DNS ecosystem (e.g., cache behavior).
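
The two popularity metrics and their correlation can be computed as in the following sketch. The input layout (an iterable of `(client, hostname)` pairs) and the function names are assumptions for illustration; the text does not say whether the correlation was taken over raw counts or fractions, which does not matter for Pearson's coefficient because it is scale invariant.

```python
from collections import Counter, defaultdict
from math import sqrt


def popularity(queries):
    """Return (queries per hostname, distinct clients per hostname)."""
    query_counts = Counter()
    clients = defaultdict(set)
    for client, name in queries:
        query_counts[name] += 1
        clients[name].add(client)
    client_counts = {name: len(members) for name, members in clients.items()}
    return query_counts, client_counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


queries = [("c1", "a"), ("c1", "a"), ("c1", "a"), ("c2", "b"), ("c3", "b"), ("c1", "c")]
by_queries, by_clients = popularity(queries)
names = sorted(by_queries)
print(pearson([by_queries[n] for n in names], [by_clients[n] for n in names]))
```

In this toy input, hostname "a" is the most queried but is used by a single client, the same kind of mismatch the polling example above illustrates.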

Co-occurrence Name Relationships: In addition to understanding popularity, we next assess the relationships between names, as these have implications on how to model client behavior. The crucial relationship between two names that we seek to quantify is frequent querying for the pair together. We begin with the request clusters (§4) and leverage the intuition that the first query within a cluster triggers the subsequent queries in the cluster and is therefore the root lookup. This follows from the structure of modern web pages, with a container page calling for additional objects from a variety of servers, e.g., an average web page uses objects from 16 different hostnames [10]. Finding co-occurrence is complicated due to client caching. That is, we cannot expect to see the entire set of dependent lookups each time we observe some root lookup.

Our methodology for detecting co-occurrence is as follows. First, we define clusters(r) as the number of clusters with r as the root across our dataset and pairs(r, d) as the number of clusters with root r that include dependent d. Second, we limit our analysis to the case when clusters(r) ≥ 10 to reduce the potential for false positive relationships based on too few samples.
In the Feb:Residential:Users dataset, we find 7.1K (9.9%) of the 72K unique roots meet this criterion. Within the clusters of these roots we find 7.5M dependent queries and 2.2M unique (r, d) pairs.
Third, for each pair (r, d), we compute the co-occurrence as C = pairs(r, d) / clusters(r), i.e., the fraction of the clusters with root r that include d.
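
The three steps above can be sketched directly from the definitions of clusters(r) and pairs(r, d). The input layout (each cluster as a non-empty list of hostnames in query order, root first) and the function name are assumptions for illustration.

```python
from collections import Counter


def co_occurrence(clusters, min_clusters=10):
    """Compute C = pairs(r, d) / clusters(r) for roots with enough samples.

    clusters: list of non-empty hostname lists; the first entry is the root.
    """
    cluster_count = Counter()   # clusters(r)
    pair_count = Counter()      # pairs(r, d)
    for names in clusters:
        root = names[0]
        cluster_count[root] += 1
        # Count each dependent once per cluster, however often it repeats.
        for dependent in set(names[1:]) - {root}:
            pair_count[(root, dependent)] += 1
    return {
        (root, dependent): count / cluster_count[root]
        for (root, dependent), count in pair_count.items()
        if cluster_count[root] >= min_clusters
    }


example = [["r", "a", "b"]] * 2 + [["r", "a"]] * 2
scores = co_occurrence(example, min_clusters=4)
print(scores[("r", "a")], scores[("r", "b")])
```

Because of client caching, a dependent that is always fetched with its root can still score well below 1.0, so C is best read as a lower bound on the underlying relationship.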

Co-occurrence of most pairs is low, with 2.0M (93%) pairs having a C much less than 0.1. We focus on the 78K pairs that have high C, greater than 0.2. These pairs include 98% of the roots we identify, i.e., nearly all roots have at least one dependent with which they co-occur frequently. Also, these pairs comprise 28% of the 7.5M dependent queries we study.

We note that intuitively dependent names could be expected to share labels with their roots (e.g., www.facebook.com and star.c10r.facebook.com) and this could be a further way to assess co-occurrence. However, we find that only 27% of the pairs within clusters with co-occurrence of at least 0.2 share the same SLD and 11% share the 3rd level label as the cluster root. This suggests that while not rare, counting on co-occurring names to be from the same zone to build clusters is dubious. As an extreme example, Google Analytics is a dependent of 1,049 unique cluster roots, most of which are not Google names.

Fig. 7. Cosine similarity between the query vectors for the same client.

Fig. 8. Cosine similarity between the query vectors for different clients.

Finally, we cannot test the majority of the clusters and pairs for co-occurrence because of limited samples. However, we hypothesize that our results apply to all clusters. We note that the distribution of the number of queries per cluster in Figure 1 is similar to the distribution of the number of dependents per root where the co-occurrence fraction is greater than 0.2. Combining our observations that 80% of queries occur in clusters, 28% of the dependent queries within clusters have high co-occurrence with the root, and the average cluster has 1 root and 10 dependents, we estimate that at a minimum 80% × 28% × 10/11 ≈ 20% of DNS queries are driven by co-occurrence relationships. We conclude that co-occurrence relationships are common, though the relationships do not always manifest as requests on the wire due to caching.

Temporal Locality: We next explore how the set of names a client queries changes over time. As a foundation, we construct a vector V_{c,d} for each client c and each day d in our dataset, which represents the fraction of lookups for each name we observe in our dataset. Specifically, we start from an alphabetically ordered list of all hostnames looked up across all clients in our dataset, N. We initially set each V_{c,d} to a vector of |N| zeros. We then iterate through N and set the corresponding position in each V_{c,d} as the total number of queries client c issues for name N_i on day d divided by the total number of queries c issues on day d. Thus, an example V_{c,d} would be (0, 0.25, 0, 0.5, 0.25) in the case where there are five total names in the dataset and on day d the client queries for the second name once, the fourth name twice and the fifth name once. We repeat this process using only the SLDs from each query, as well.

We first investigate whether clients' queries tend to remain stable across days in the dataset. For this, we compute the minimum cosine similarity of the query vectors for each client across all pairs of consecutive days. Figure 7 shows the distribution of minimum cosine similarity per client in the Feb:Residential:Users dataset. In general, the cosine similarity values are high (greater than 0.5 for 80% of clients for unique hostnames), indicating that clients query for a similar set of names in similar relative frequencies across days. Given this result, it is unsurprising that the figure also shows high similarity across SLDs.
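
The query vectors and the per-client stability measure can be sketched as follows. As a simplifying assumption, the vectors are stored sparsely as dictionaries keyed by hostname rather than as dense vectors of length |N|; the cosine similarity is identical because absent names contribute zero. Function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def query_vector(names):
    """Fraction of a client's lookups on one day that go to each hostname."""
    counts = Counter(names)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(name, 0.0) for name, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def min_consecutive_similarity(daily_names):
    """Minimum similarity over consecutive days; needs at least two days."""
    vectors = [query_vector(day) for day in daily_names]
    return min(cosine_similarity(a, b) for a, b in zip(vectors, vectors[1:]))


# The worked example from the text: second name once, fourth twice, fifth once.
print(query_vector(["n2", "n4", "n4", "n5"]))
```

The same functions apply to SLDs by mapping each hostname to its second-level domain before building the vectors.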

Fig. 9. Mean hostnames and SLDs queried by each client per day.

Fig. 10. Mean and median stack distance for each client.

Next we assess whether different clients query for similar sets of names. We compute the cosine similarity across all pairs of clients and for all days of our dataset. Figure 8 shows the distribution of the maximum similarity per client pair from any day. When considering hostnames, we find lower similarity values than when focusing on a single client (with only 3% showing similarity of at least 0.5), showing that each client queries for a fairly distinct set of hostnames. The similarity between clients is also low for sets of SLDs, with 55% of the pairs showing a maximum similarity less than 0.5. Thus, clients query for different specific hostnames and distinct sets of SLDs. These results show that a client DNS model must ensure that (i) each client tends to stay similar across time and also that (ii) clients must be distinct from one another.

A final aspect we explore is how quickly a client repeats a query. As we show in Figure 2, 50% of the clients send less than 2K queries per day on average. Figure 9 shows the distribution of the average number of unique hostnames that clients query per day. The number of names is less than the overall number of lookups, indicating the presence of repeat queries. For instance, at the median, a client queries for 400 unique hostnames and 150 SLDs each day. To assess the temporal locality of re-queries, we compute the stack distance [12] for each query: the number of unique queries since the last query for the given name. Figure 10 shows the distributions of the mean and median stack distance per client. We find the stack distance to be relatively short in most cases, with over 85% of the medians being less than 100. However, the longer means show that the re-use rate is not always short. Our results show that variation in requerying behavior exists among clients, with some clients revisiting names frequently and others querying a larger set of names with less frequency.
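
Stack distance as defined above can be computed with a most-recently-used list, as in this minimal sketch. It is a straightforward quadratic-time illustration rather than an efficient implementation, and the function name is hypothetical; first-time queries have no previous occurrence and are skipped.

```python
def stack_distances(names):
    """For each repeated query, count distinct names queried since its last occurrence."""
    stack = []       # distinct names, most recently queried last
    distances = []
    for name in names:
        if name in stack:
            position = stack.index(name)
            # Names after `position` were queried since the last lookup of `name`.
            distances.append(len(stack) - 1 - position)
            stack.pop(position)
        stack.append(name)
    return distances


print(stack_distances(["a", "b", "c", "a", "a", "b"]))  # [2, 0, 2]
```

Per-client mean and median values, as plotted in Figure 10, follow by applying `statistics.mean` and `statistics.median` to each client's list of distances.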

7 Related Work

Models of various protocols have been constructed for understanding, simulating and predicting traffic (e.g., [13] for a variety of traditional protocols and [2] as an example of HTTP modeling). Additionally, there is previous work on characterizing DNS traffic (e.g., [11, 6]), which focuses on the aggregate traffic of a population of clients, in contrast to our focus on individual clients. Finally, we note, as we discuss in §1, that several recent studies involving DNS make assumptions about the behavior of individual clients or need to analyze data for specific information before proceeding. For instance, the authors of [5] model DNS hierarchical cache performance using an analytical arrival process, while in [14], the authors use simulation to explore changes to the resolution path. Both studies would benefit from a greater understanding of DNS client behavior.

8 Conclusion

This work is an initial step towards richly understanding individual DNS client behavior. We characterize client behavior in ways that will ultimately inform an analytical model. We find that different types of clients interact with the DNS in distinct ways. Further, DNS queries often occur in short clusters of related names. As a step towards an analytical model, we show that the client query arrival process is well modeled by a combination of the Weibull and Pareto distributions. In addition, we find that clients have a "working set" of names that is both fairly stable over time and fairly distinct from other clients. Finally, our high-level results hold across both time and qualitatively different user populations: student residential vs. University office. This is an initial indication that the broad properties we illuminate hold the promise to be invariants.

References

1. OpenDNS. http://www.opendns.com/.
2. P. Barford and M. Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. In ACM SIGMETRICS, 1998.
3. T. Callahan, M. Allman, and M. Rabinovich. On Modern DNS Behavior and Properties. ACM SIGCOMM Computer Communication Review, July 2013.
4. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In AAAI International Conference on Knowledge Discovery and Data Mining, 1996.
5. N. C. Fofack and S. Alouf. Modeling Modern DNS Caches. In ACM International Conference on Performance Evaluation Methodologies and Tools, 2013.
6. H. Gao, V. Yegneswaran, Y. Chen, et al. An Empirical Re-examination of Global DNS Behavior. In ACM SIGCOMM, 2013.
7. P. Gauthier, J. Cohen, and M. Dunsmuir. The Web Proxy Auto-Discovery Protocol. IETF Internet Draft. https://tools.ietf.org/html/draft-ietf-wrec-wpad-01 (work in progress), 1999.
8. Websites Using Google Analytics. http://trends.builtwith.com/analytics/Google-Analytics.
9. Google Safe Browsing. https://developers.google.com/safe-browsing.
10. HTTP Archive. http://httparchive.org.
11. J. Jung, A. W. Berger, and H. Balakrishnan. Modeling TTL-Based Internet Caches. In IEEE International Conference on Computer Communications, 2003.
12. R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation Techniques for Storage Hierarchies. IBM Systems Journal, 1970.
13. V. Paxson. Empirically Derived Analytic Models of Wide-Area TCP Connections. IEEE/ACM Transactions on Networking, 1994.
14. K. Schomp, M. Allman, and M. Rabinovich. DNS Resolvers Considered Harmful. In ACM Workshop on Hot Topics in Networks, 2014.
