Towards a Model of DNS Client Behavior
Kyle Schomp, Michael Rabinovich, Mark Allman
Case Western Reserve University, Cleveland, OH, USA
International Computer Science Institute, Berkeley, CA, USA

Abstract. The Domain Name System (DNS) is a critical component of the Internet infrastructure as it maps human-readable hostnames into the IP addresses the network uses to route traffic. Yet, the DNS behavior of individual clients is not well understood. In this paper, we present a characterization of DNS clients with an eye towards developing an analytical model of client interaction with the larger DNS ecosystem. While this is initial work and we do not arrive at a DNS workload model, we highlight a variety of behaviors and characteristics that enhance our mental models of how DNS operates and move us towards an analytical model of client-side DNS operation.
1 Introduction

The modern Internet relies on the Domain Name System (DNS) for two main functions. First, the DNS allows people to leverage human-friendly hostnames (e.g., "www.cnn.com") instead of obtuse IP addresses to identify a host. Second, hostnames provide a layer of abstraction such that the IP address assigned to a hostname can vary over time. In particular, Content Distribution Networks (CDNs) employ this late binding to direct users to the best content replica.

Previous work shows that DNS lookups precede over 60% of TCP connections [14]. As a result, individual clients issue large numbers of DNS queries. Yet, our understanding of DNS query streams is largely based on aggregate populations of clients (e.g., at an organizational [6] or residential level [3]), leaving our knowledge of individual client behavior limited.

This paper represents an initial step towards understanding individual client DNS behavior. We monitor DNS transactions between a population of thousands of clients and their local resolver such that we are able to directly tie lookups to individual clients. Our ultimate goal is an analytical model of DNS client behavior that can be used for everything from workload generation to resource provisioning to anomaly detection. In this paper we provide a characterization of DNS behavior along the dimensions our model will ultimately cover and also anecdotally show promising modeling approaches.
Note that one view holds that DNS is a "side service" and should not be directly modeled, but rather can be well understood by deriving the DNS workload from applications such as web browsing and email transmission. However, deriving a DNS workload from application behavior is at best difficult because (i) client caching policies impact what DNS queries are actually sent in response to an application event, (ii) some applications selectively use pre-fetching to look up names before they are needed, and (iii) such a derivation would entail understanding many applications to pull together a reasonable DNS workload. Therefore, we take the approach that focusing on the DNS traffic itself is the most tractable way to understand, and eventually model, name lookups.

This work was funded in part by NSF grant CNS-1213157.
To motivate the need for a model, we provide an exemplar from our previous work. In [14], we propose that clients should directly resolve hostnames instead of using a recursive resolver. Ideally, an evaluation of this end system-based mechanism would be conducted in the context of end systems themselves. However, the best data we could obtain was at the level of individual households, which we know to include multiple hosts behind a NAT. Therefore, the results of our trace-driven simulations are at best an approximation of the impact of the mechanism we were investigating. Our results would have been more precise had we been able to leverage a model of individual client DNS behavior.
Broadly, the remainder of this paper follows the contours of what a model would capture. We first focus on understanding the nature of the clients themselves in §3, finding that while most are traditional user-facing devices, there are others that interact with the DNS in distinct ways. Next we observe in §4 that DNS queries often occur closely spaced in time (e.g., driven by loading objects for a single web page from disparate servers), and therefore we develop a method to gather queries together into clusters. We then assess the number and spacing of queries in §5 and finally tackle the patterns in what hostnames individual clients look up in §6. We find that clients have fairly distinct "working sets" of names, and also that hostname popularity has power law properties.
2 Dataset

Our dataset comes from two packet taps at Case Western Reserve University (CWRU) that monitor the links connecting the two data centers that house all five of the University's DNS resolvers, i.e., between client devices and their recursive DNS resolvers. We collect full payload packet traces of all UDP traffic involving port 53 (the default DNS port). The campus wireless network situates client devices behind NATs and therefore we cannot isolate DNS traffic to individual clients. Hence, we do not consider this traffic in our study (although future work remains to better understand DNS usage on mobile devices). The University Acceptable Use Policy prohibits the use of NAT on its wired networks while offering wireless access throughout the campus, and therefore we believe the traffic we capture from the wired network does represent individual clients.

Our dataset includes all DNS traffic from two separate weeks and is partitioned by client location: the residential or office portions of the network. Details of the datasets are given in Table 1, including the number of queries, the number of clients that issue those queries, and the number of hostnames queried.
Validation: During the February data collection, we collect query logs from the five campus DNS resolvers to validate our datasets.¹ Comparing the packet traces and logs, we find loss rates of 0.6% and 1.8% in the Feb:Residential and Feb:Office datasets, respectively. We believe these losses are an artifact of our measurement apparatus given that the loss rate is correlated with traffic volume.

¹ We prefer traces over logs due to the better timestamp resolution (msec vs. sec).

Dataset                     Dates                  Queries  Clients       Hostnames
Feb:Residential             Feb. 26-Mar. 4         32.5M    1359 (IPs)    652K
Feb:Residential (filter)    Feb. 26-27, Mar. 2-4   16.4M    1262 (MACs)   505K
Feb:Residential:Users                              15.3M    1033          499K
Feb:Residential:Others                             1.11M    229           7.94K
Feb:Office                  Feb. 26-Mar. 4         232M     8770 (IPs)    1.98M
Feb:Office (filter)         Feb. 26-27, Mar. 2-4   143M     8690 (MACs)   1.87M
Feb:Office:Users                                   118M     5986          1.52M
Feb:Office:Others                                  25.0M    2704          158K
Jun:Residential             Jun. 23-Jun. 29        11.7M    345 (IPs)     140K
Jun:Residential (filter)    Jun. 23-26, 29         6.22M    334 (MACs)    120K
Jun:Residential:Users                              5.81M    204           116K
Jun:Residential:Others                             408K     130           4.13K
Jun:Office                  Jun. 23-Jun. 29        245M     8335 (IPs)    1.61M
Jun:Office (filter)         Jun. 23-26, 29         133M     8286 (MACs)   1.52M
Jun:Office:Users                                   108M     5495          1.42M
Jun:Office:Others                                  25.0M    2791          63.1K
Table 1. Details of the datasets used in this study.
Tracking Clients: We aim to track individual clients in the face of dynamic address assignment. Simultaneously with the DNS packet trace, we gather logs from the University's three DHCP servers, which allows us to attribute DNS activity to MAC addresses rather than IP addresses. Note that we could not map 1.3% of the queries across our datasets to a MAC address because the source IP address in the query never appears in the DHCP logs. These likely represent static IP address allocations. Further, since these IPs have no DHCP assignments, we are confident that each represents a single host.
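The IP-to-MAC attribution above amounts to an interval lookup over DHCP leases. The Python sketch below is one way to do it, not the study's tooling: the `Lease` record, the `LeaseIndex` class, and the `ip:` fallback key are hypothetical names, DHCP server logs are assumed to have already been parsed into lease intervals, and leases for the same IP are assumed not to overlap.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Lease:
    ip: str
    mac: str
    start: float  # lease start, epoch seconds
    end: float    # lease end (exclusive), epoch seconds

class LeaseIndex:
    """Maps (source IP, query timestamp) to the MAC address holding the lease."""

    def __init__(self, leases):
        self._by_ip = {}
        for lease in leases:
            self._by_ip.setdefault(lease.ip, []).append(lease)
        for ip_leases in self._by_ip.values():
            ip_leases.sort(key=lambda l: l.start)
        self._starts = {ip: [l.start for l in ls] for ip, ls in self._by_ip.items()}

    def mac_for(self, ip: str, ts: float) -> Optional[str]:
        """MAC leasing `ip` at time `ts`, or None if no lease covers it."""
        leases = self._by_ip.get(ip)
        if not leases:
            return None
        # Latest lease that started at or before ts (assumes no overlaps per IP).
        i = bisect.bisect_right(self._starts[ip], ts) - 1
        if i >= 0 and leases[i].start <= ts < leases[i].end:
            return leases[i].mac
        return None

    def client_id(self, ip: str, ts: float) -> Optional[str]:
        """MAC if the IP is DHCP-managed; the IP itself if it never appears."""
        if ip not in self._by_ip:
            return "ip:" + ip  # likely a static allocation, treated as one host
        return self.mac_for(ip, ts)
```

Queries whose source IP never appears in the DHCP data keep their IP as the client key, matching the treatment of static addresses described above.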
Filtering Datasets: We find two anomalies that skew the data in ways that are not indicative of user behavior. First, we find roughly 25% of the queries request the TXT record for debug.opendns.com. (The next most popular record represents less than 1% of the lookups!) We find this query is not in response to users' actions, but is automatically issued to determine whether the client is using the OpenDNS resolver (indicated in the answer) [1]. We observe 298 clients querying this record, which we assume use OpenDNS on other networks or used OpenDNS in the past. We remove these queries from further analysis. The second anomaly involves 18 clients whose prominent behavior is to query for debug.opendns.com and other domains repeatedly without evidence of accomplishing much work. The campus information technology department verified that these clients serve an operational purpose and are not user-facing devices. Therefore, we remove the 18 clients as they are likely unique to this network and do not represent users. We do not attempt to further filter misbehaving hosts (e.g., infected or misconfigured hosts), as we consider them part of the DNS workload (e.g., since a resolver would be required to cope with their requests).
Timeframe: To more directly compare residential and office settings, we exclude Saturday and Sunday from our datasets. Table 1 shows the magnitude of our filtering.
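The three filters (the debug.opendns.com TXT probe, the 18 operational hosts, and weekends) can be expressed as a per-query predicate. This is a minimal sketch assuming a hypothetical `Query` record; weekdays are computed in UTC by default, although the campus is in US Eastern time, so a real analysis would pass a local `tz` (for example a `zoneinfo.ZoneInfo` object).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Query:
    ts: float     # capture timestamp, epoch seconds
    client: str   # client key, e.g. a MAC address
    qname: str
    qtype: str    # "A", "AAAA", "TXT", "PTR", ...

def keep(q, excluded_clients, tz=timezone.utc):
    """True if the query survives the filters described above."""
    name = q.qname.lower().rstrip(".")
    if q.qtype == "TXT" and name == "debug.opendns.com":
        return False  # automatic OpenDNS check, not user-driven
    if q.client in excluded_clients:
        return False  # known operational, non-user-facing hosts
    # weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return datetime.fromtimestamp(q.ts, tz=tz).weekday() < 5

def filter_queries(queries, excluded_clients=frozenset(), tz=timezone.utc):
    return [q for q in queries if keep(q, excluded_clients, tz)]
```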
We find commonality across the partitions of the data, so we focus on the Feb:Residential:Users dataset for conciseness and discuss how other datasets differ as appropriate.
Marker               Clients   %
All                  1262      100%
Google Analytics     983       78%
Search engine        1010      80%
  Google             1006      80%
  Any other          602       48%
Gmail                881       70%
LDAP Login           840       66%
Any                  1033      82%
Table 2. Feb:Residential clients that fit markers for general purpose devices.
3 Identifying Types of Clients

Since our focus is on characterizing general purpose user-facing devices, we aim to separate them from other types of end systems. We expect general-purpose systems are involved in tasks such as (i) web browsing, (ii) accessing search engines, (iii) using email, and (iv) conducting institution-specific tasks.² Therefore, we develop the following markers to identify general-purpose hosts:
Browsing: A large number of websites embed Google Analytics [8] in their pages, thus there is a high likelihood that regular users will query for Google Analytics hostnames on occasion.
Searching: We detect web search activity via DNS queries for the largest search engines: Google, Yahoo, Bing, AOL, Ask, DuckDuckGo, Altavista, Baidu, Lycos, Excite, Naver, and Yandex.
Email: CWRU uses Google to manage campus email and therefore we use queries for "mail.google.com" to indicate email use.
Institution-Specific Tasks: CWRU uses a single sign-on system for authenticating users before they perform a variety of tasks and therefore we use queries for the corresponding hostname as indicative of user behavior.
Table 2 shows the breakdown of the clients in the Feb:Residential dataset. Of the 1,262 clients, we identify 1,033 as user-facing based on at least one of the above markers. Intuitively, we expect that multiple markers apply to most general purpose systems, and in fact we find at least two markers apply to 991 of the clients in our dataset. Results for our other datasets are similar.
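The marker test reduces to checking whether a client's queried hostnames fall at or under any marker hostname. In this sketch the hostname lists are illustrative assumptions (the paper names the search engines and mail.google.com but not the exact hostnames matched), and login.example.edu is a hypothetical stand-in for the undisclosed single sign-on hostname.

```python
# Hypothetical marker hostnames: the study's exact lists are not given.
MARKERS = {
    "browsing": ("google-analytics.com",),
    "searching": ("www.google.com", "www.bing.com", "search.yahoo.com",
                  "duckduckgo.com", "www.baidu.com", "yandex.ru"),
    "email": ("mail.google.com",),
    "institutional": ("login.example.edu",),  # placeholder SSO hostname
}

def matches(name, suffix):
    """True if `name` is `suffix` itself or a subdomain of it."""
    return name == suffix or name.endswith("." + suffix)

def markers_for(hostnames):
    """Set of marker categories that a client's hostnames satisfy."""
    names = {h.lower().rstrip(".") for h in hostnames}
    return {marker for marker, suffixes in MARKERS.items()
            if any(matches(n, s) for n in names for s in suffixes)}

def classify(client_names, min_markers=1):
    """Split clients into (users, others); `client_names` maps client -> hostnames."""
    users, others = set(), set()
    for client, names in client_names.items():
        target = users if len(markers_for(names)) >= min_markers else others
        target.add(client)
    return users, others
```

Setting `min_markers=2` corresponds to the stricter "at least two markers" count reported above.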
We next turn to the 229 clients (≈18%) that do not match any of our markers for user-facing clients. To better understand these clients, we aggregate them based on the vendor portion of their MAC addresses. First, we find a set of vendors and query streams that indicate special-purpose devices: (i) 48 Microsoft devices that query for names within the xboxlive.com domain, which we conclude are Xbox gaming consoles; (ii) 33 Sony devices that query for names within the playstation.net domain, which we conclude are Sony PlayStation gaming consoles; (iii) 16 Apple devices that issue an average of 11K queries for the apple.com domain (96% of their lookups), compared with an average of 262 queries across all devices that look up an apple.com name, which we conclude are Apple TV devices; and (iv) 7 Linksys devices that issue queries for esuds.usatech.com, which we conclude are transaction systems attached to the laundry machines in the residence halls (!).
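Grouping the unmatched clients by vendor relies on the first three octets of the MAC address (the OUI), and statements such as "96% of their lookups are for the apple.com domain" are per-client domain shares. The helpers below sketch both, assuming queries are already keyed by MAC; translating an OUI into a vendor name would additionally require the IEEE OUI registry, which is not shown.

```python
from collections import defaultdict

def oui(mac):
    """Vendor prefix (first three octets) of a MAC, normalized to lowercase."""
    return ":".join(mac.lower().replace("-", ":").split(":")[:3])

def group_by_vendor(macs):
    """Map each OUI to the list of client MACs that carry it."""
    groups = defaultdict(list)
    for mac in macs:
        groups[oui(mac)].append(mac)
    return dict(groups)

def domain_share(names, domain):
    """Fraction of a client's queries for names at or under `domain`."""
    if not names:
        return 0.0
    hits = sum(1 for n in names if n == domain or n.endswith("." + domain))
    return hits / len(names)
```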
² In our case, these are campus-life tasks, e.g., checking the course materials portal.
In addition to these, we find devices that we cannot pinpoint explicitly, but that do not seem to be general-purpose client systems. We find 41 Dell devices that differ from the larger population of hosts in that they query for more PTR records than A records. A potential explanation is that these devices are servers obtaining hostnames for clients that connect to them (e.g., as part of sshd's verification steps or to log client connections). We also identify 12 Kyocera devices that issue queries only for the campus NTP and SMTP servers. We conclude that these are copy machines that also offer emailing of scanned documents.
For the IP addresses that do not appear in the DHCP logs (i.e., addresses statically configured on the hosts), we cannot obtain a vendor ID. However, we note that 97% of the queries and 96% of the unique domain names from these machines involve CWRU domains, and therefore we conclude that they serve some administrative function and are not general purpose clients. The remaining 61 devices are distributed among 42 hardware vendors.
In the remainder of the paper, we consider the general purpose clients (Users) and the special purpose clients (Others) separately, as we detail in Table 1. We find that our high-level observations hold across all of the Users datasets, and thus present results for the Feb:Residential:Users dataset only.
4 Query Clusters

Applications often call for multiple DNS queries in rapid succession, e.g., as part of loading all objects on a web page, or prefetching names for links users may click. In this section, we quantify this behavior using the DBSCAN algorithm [4] to construct clusters of DNS queries that likely share an application event.
The DBSCAN algorithm uses two parameters to form clusters: a minimum cluster size M and a distance ε that controls the addition of samples to a cluster. We use the absolute difference in the query timestamps as the distance metric. Our first task is to choose suitable parameters. Our strategy is to start with a range of parameters and determine whether there is a point of convergence where the results of clustering do not change greatly with the parameters. Based on the strategy in [4], we start with an M range of 3–6 and an ε range of 0.5–5 seconds. Note that M = 2 simplifies to threshold-based clustering, but does not produce a point of convergence. We find that 96% of the clusters we identify with M = 6 are exactly found when M = 3, and hence at M = 3 we have converged on a reasonably stable answer, which we use in the subsequent analysis. Additionally, we find that for ε ∈ [2.5, 5], the total number of clusters, the distribution of cluster sizes, and the assignment of queries to clusters remain similar irrespective of the ε value, and therefore we use ε = 2.5 seconds in our analysis.
We define the first DNS query per cluster as the root and all subsequent queries in the cluster as dependents. In the Feb:Residential:Users dataset, we find 1M clusters that encompass 80% of the roughly 15M queries in the dataset.
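Because the only distance is the absolute difference between timestamps, DBSCAN over one client's queries reduces to a sweep over sorted times. The following is a minimal one-dimensional DBSCAN sketch using M = 3 and ε = 2.5 seconds as chosen above (M is passed as `min_pts`). It assumes timestamps are pre-sorted, and attaching a border point to its nearest core point is our own choice, since standard DBSCAN assigns such points in visit order.

```python
import bisect

def dbscan_1d(times, eps=2.5, min_pts=3):
    """DBSCAN on one client's sorted query timestamps (seconds).

    Returns one label per timestamp: a cluster id >= 0, or -1 for noise.
    A point is a core point if at least `min_pts` timestamps (itself
    included) lie within `eps` seconds of it.
    """
    n = len(times)
    core = [
        bisect.bisect_right(times, t + eps) - bisect.bisect_left(times, t - eps) >= min_pts
        for t in times
    ]
    labels = [-1] * n
    cluster = -1
    prev_core_time = None
    for i, t in enumerate(times):
        if core[i]:
            # In 1-D, consecutive core points more than eps apart cannot be
            # density-connected, so a gap > eps starts a new cluster.
            if prev_core_time is None or t - prev_core_time > eps:
                cluster += 1
            labels[i] = cluster
            prev_core_time = t
    # Attach each non-core (border) point to its nearest core point within eps.
    core_times = [t for t, c in zip(times, core) if c]
    core_labels = [lab for lab, c in zip(labels, core) if c]
    for i, t in enumerate(times):
        if core[i]:
            continue
        j = bisect.bisect_left(core_times, t)
        candidates = [k for k in (j - 1, j) if 0 <= k < len(core_times)]
        best = min(candidates, key=lambda k: abs(core_times[k] - t), default=None)
        if best is not None and abs(core_times[best] - t) <= eps:
            labels[i] = core_labels[best]
    return labels

def roots_and_dependents(times, names, eps=2.5, min_pts=3):
    """Group queries into clusters; the earliest query in each is the root."""
    clusters = {}
    for name, lab in zip(names, dbscan_1d(times, eps, min_pts)):
        if lab >= 0:
            clusters.setdefault(lab, []).append(name)
    return [(qs[0], qs[1:]) for _, qs in sorted(clusters.items())]
```

For instance, `roots_and_dependents([0.0, 0.1, 0.2, 10.0, 10.5, 11.0, 11.3, 30.0], list("abcdefgh"))` yields two clusters rooted at `a` and `d`, while the isolated query at 30 seconds is labeled noise.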
To validate the clustering algorithm, we first inspect the 67K unique hostnames the algorithm labels as noise. We find a variety of hostnames, with the most frequent being: WPAD [7] queries for discovering proxies, Google Mail and Google Docs, software update polling (e.g., McAfee and Symantec), heartbeat signals for gaming applications (e.g., Origin, Steam, Blizzard, Riot), video streaming (e.g., Netflix, YouTube, Twitch), and the Network Time Protocol (NTP). All of these names can intuitively come from applications that require only sporadic DNS queries, as they are either making quick checks every once in a while, or are using long-lived sessions that leverage DNS only when starting.

Fig. 1. Number of queries, hostnames, and SLDs per cluster.
Fig. 2. Queries issued by each client per day.
To validate the clusters themselves, we observe that there are frequently occurring roots. Indeed, the 1M clusters have only 72K unique roots, with the 100 most frequently occurring roots accounting for 395K (40%) of the clusters. Further, the 100 most popular roots include popular websites (e.g., www.facebook.com, www.google.com). These are the type of names we would expect to be roots in the context of web browsing. Another common root is safebrowsing.google.com [9], a blacklist directory used by some web browsers to determine if a given website is safe to retrieve. This is a distinctly different type of root than a popular website because the root is not directly related to the dependents by the page content, but rather via a process running on the clients. This in some sense means Safe Browsing-based clusters have two roots. While use of Safe Browsing is fairly common in our dataset, we do not find additional prevalent cases of this "two roots" phenomenon. From a modeling standpoint, we have not yet determined whether "two roots" clusters would need special treatment.
Figure 1 shows the distribution of queries per cluster. While the majority of clusters are small, there are a few relatively large clusters. We find that 90% of clusters contain at most 26 queries for at most 22 hostnames. Additionally, we find 90% of the clusters encompass at most 10 SLDs. The largest cluster spans 95 seconds and consists of 9,366 queries for names that match to the 3rd level label. The second largest cluster consists of 6,211 queries for myapps.developer.ubuntu.com, which is likely an Ubuntu bug.
5 Query Timing

Next we tackle the question of when and how many queries clients issue. We begin with the distribution of the average number of queries that clients issue per day, as given in Figure 2. We find that clients in Users issue 2K lookups per day at the median and 90% of clients in Users issue fewer than 6.7K queries per day. The Others datasets show greater variability, where relatively few clients generate the lion's share of queries; e.g., the top 5% of clients produce roughly as many total DNS queries per day as the bottom 95% in the Feb:Residential:Others dataset.

Fig. 3. Time between queries from the same client in aggregate and per client.
Fig. 4. Duration of clusters, inter-cluster query time and intra-cluster query time.
A related metric is the time between subsequent queries from the same client, or inter-query times. Figure 3 shows the distribution of the inter-query times. The "Aggregate" line shows the distribution across all clients. The area "90%" shows the range within which 90% of the individual client inter-query time distributions fall. The majority of inter-query times are short, with 50% of lookups occurring within 34 milliseconds of the previous query. However, we also find a heavy tail, with 0.1% of inter-query times being over 25 minutes. Intuitively, long inter-query times represent off periods when the client's user is away from the keyboard (e.g., asleep or at class). The Others datasets show wide-ranging behavior, suggesting that they are less amenable to succinct description in an aggregate model.
For the Users dataset, we are able to model the aggregate inter-query time distribution using the Weibull distribution for the body and the Pareto distribution for the heavy tail. We find that partitioning the data at an inter-query time of 22 seconds minimizes the mean squared error between the data and the two analytical distributions. Next, we fit the analytical distributions, split at 22 seconds, to each of the individual client inter-query time distributions. We find that while the parameters vary per client, the empirical data is well represented by the analytical models, as the mean squared error for 90% of clients is less than 0.0014. Thus, parameters for a model of query inter-arrivals will vary per client, but the form of the distribution is invariant.
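The spliced model can be written as a single CDF: a Weibull body truncated at the split point and a Pareto tail whose scale equals the split. The sketch below evaluates that CDF and the mean squared error against an empirical CDF. The truncate-and-rescale construction, the use of the empirical body mass `p_body`, and this particular MSE definition are assumptions, and parameter fitting itself is not shown; scanning candidate split values (such as 22 seconds) and keeping the one with the smallest error would be one way to reproduce the split selection described above.

```python
import math

def weibull_cdf(x, k, lam):
    """Weibull CDF with shape k and scale lam."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0

def spliced_cdf(x, split, p_body, k, lam, alpha):
    """Weibull(k, lam) body below `split`, Pareto(alpha) tail above it.

    p_body is the probability mass at or below the split (e.g., the empirical
    fraction of inter-query times <= split); the Weibull part is truncated and
    rescaled so the CDF equals p_body exactly at the split.
    """
    if x <= split:
        return p_body * weibull_cdf(x, k, lam) / weibull_cdf(split, k, lam)
    # Pareto tail with scale x_m = split: P(X > x | X > split) = (split / x) ** alpha
    return p_body + (1.0 - p_body) * (1.0 - (split / x) ** alpha)

def mse_vs_empirical(samples, cdf):
    """Mean squared error between the empirical CDF of `samples` and `cdf`."""
    xs = sorted(samples)
    n = len(xs)
    return sum((cdf(x) - (i + 1) / n) ** 2 for i, x in enumerate(xs)) / n
```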
Fig. 5. Fraction of queries issued for each hostname per client.
Fig. 6. Fraction of clients issuing queries for each hostname and SLD.

Next, we move from focusing on individual lookups to focusing on timing related to the 1M lookup clusters that encompass 12M (80%) of the queries in our dataset (see §4). Figure 4 shows our results. The "Intra-cluster time" line shows the distribution of the time between successive queries within the same cluster. This time is bounded to ε = 2.5 seconds by construction, but over 90% of the inter-arrivals are less than 1 second. On the other hand, the line "Inter-cluster time" shows the time between the last query of a cluster and the first query of the next cluster. Most clusters are separated from each other by much more than ε, the minimum separation by construction. The line "Cluster duration" shows the time between the first and last query in each cluster. Most clusters are short, with 99% lasting less than 18 seconds. Additionally, we find that most client DNS traffic occurs in short clusters: 50% of clustered queries belong to clusters with duration less than 4.6 seconds and 90% are in clusters with duration less than 20 seconds.

For the Others datasets, a smaller percentage of DNS queries occur in clusters, e.g., 60% in the Feb:Residential:Others dataset.
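The three timing series in Figure 4 and the share of queries in short clusters follow directly from one client's clusters. A minimal sketch, assuming each cluster is given as a sorted list of query timestamps (in seconds) and clusters are listed in time order:

```python
def cluster_timing(clusters):
    """Return (durations, intra_gaps, inter_gaps) for one client's clusters."""
    durations = [c[-1] - c[0] for c in clusters]
    # Gaps between successive queries inside each cluster.
    intra = [b - a for c in clusters for a, b in zip(c, c[1:])]
    # Gap from the last query of one cluster to the first query of the next.
    inter = [nxt[0] - cur[-1] for cur, nxt in zip(clusters, clusters[1:])]
    return durations, intra, inter

def share_in_short_clusters(clusters, max_duration):
    """Fraction of clustered queries that sit in clusters shorter than max_duration."""
    total = sum(len(c) for c in clusters)
    short = sum(len(c) for c in clusters if c[-1] - c[0] < max_duration)
    return short / total if total else 0.0
```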
6 Query Targets

Finally, we tackle the queries themselves, including relationships between queries.

Popularity of Names: We analyze the popularity of hostnames using two methods: how often the name is queried across the dataset and how many clients query for it. Figure 5 shows the fraction of queries for each hostname (with the hostnames sorted by decreasing popularity) in the Feb:Residential:Users dataset. Per §5, we plot the aggregate distribution and a range that encompasses 90% of the individual client distributions. Of the 499K unique hostnames within our dataset, 256K (51%) are looked up only once. Meanwhile, the top 100 hostnames account for 28% of DNS queries. Figure 6 shows the fraction of clients that query for each name. We find that 77% of hostnames are queried by only a single client. However, over 90% of the clients look up the 14 most popular hostnames. Additionally, 13 of these hostnames are Google services and the remaining one is www.facebook.com. The plot shows similar results for second-level domains (SLDs), where 66% of the SLDs are looked up by a single client.
The distributions of both queries per name and clients per name demonstrate power law behavior in the tail. Interestingly, the Pearson correlation between these two metrics (popularity by queries and popularity by clients) is only 0.54, indicating that a domain name with many queries is not necessarily queried by a large fraction of the client population and vice versa. As an example, update-keepalive.mcafee.com is the 19th most queried hostname but is only queried by 8.1% of the clients. At the same time, 55% of the clients query for s2.symcb.com, but in terms of total queries this hostname ranks as only the 1215th most popular. This phenomenon may be partially explained by differences in TTL. The record for s2.symcb.com has a one hour TTL, limiting the query frequency. Meanwhile, update-keepalive.mcafee.com has a 1 minute TTL. Given this short TTL and that the name implies polling activity, the large number of queries from a given client is unsurprising. Thus, a model of DNS client behavior must account for the popularity of hostnames in terms of both queries and clients.
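The two popularity measures and their Pearson correlation can be computed from (client, hostname) query records as sketched below. The record format is an assumption, and `pearson` is a plain textbook implementation so the block needs no third-party packages.

```python
import math
from collections import Counter, defaultdict

def popularity(queries):
    """`queries` is an iterable of (client, hostname) pairs.

    Returns {hostname: (num_queries, num_distinct_clients)}.
    """
    qcount = Counter()
    clients = defaultdict(set)
    for client, name in queries:
        qcount[name] += 1
        clients[name].add(client)
    return {n: (qcount[n], len(clients[n])) for n in qcount}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Correlating `[q for q, _ in pop.values()]` with `[c for _, c in pop.values()]` over all hostnames gives the kind of query-versus-client correlation discussed above.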
The heavy tails of the popularity distributions represent a large fraction of DNS transactions. We therefore cannot disregard unpopular names, even those queried just once, because together they are responsible for the majority of DNS activity and thus impact the entire DNS ecosystem (e.g., cache behavior).
Co-occurrence Name Relationships: In addition to understanding popularity, we next assess the relationships between names, as these have implications for how to model client behavior. The crucial relationship between two names that we seek to quantify is frequent querying for the pair together. We begin with the request clusters (§4) and leverage the intuition that the first query within a cluster triggers the subsequent queries in the cluster and is therefore the root lookup. This follows from the structure of modern web pages, with a container page calling for additional objects from a variety of servers; e.g., an average web page uses objects from 16 different hostnames [10]. Finding co-occurrence is complicated by client caching. That is, we cannot expect to see the entire set of dependent lookups each time we observe some root lookup.
Our methodology for detecting co-occurrence is as follows. First, we define clusters(r) as the number of clusters with r as the root across our dataset and pairs(r, d) as the number of clusters with root r that include dependent d. Second, we limit our analysis to the case when clusters(r) ≥ 10 to reduce the potential for false positive relationships based on too few samples. In the Feb:Residential:Users dataset, we find 7.1K (9.9%) of the unique cluster roots meet this criterion. Within the clusters rooted at these names, we find 7.5M dependent queries and 2.2M unique (r, d) pairs. Third, for each pair (r, d), we compute the co-occurrence as C = pairs(r, d)/clusters(r), i.e., the fraction of the clusters with root r that include d. Co-occurrence of most pairs is low, with 2.0M (93%) pairs having a C much less than 0.1. We focus on the 78K pairs that have high C (greater than 0.2). These pairs include 98% of the roots we identify, i.e., nearly all roots have at least one dependent with which they co-occur frequently. Also, these pairs comprise 28% of the 7.5M dependent queries we study.
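The three-step co-occurrence procedure maps onto two counters. This sketch assumes clusters have already been reduced to (root, dependents) tuples (for example by the clustering of §4) and counts a dependent at most once per cluster, which is our reading of "the number of clusters with root r that include dependent d".

```python
from collections import Counter

def co_occurrence(clusters, min_clusters=10, threshold=0.2):
    """Root/dependent co-occurrence C = pairs(r, d) / clusters(r).

    `clusters` is an iterable of (root, dependents) tuples, one per cluster.
    Returns {(r, d): C} for roots seen in at least `min_clusters` clusters
    and pairs whose C is strictly greater than `threshold`.
    """
    n_clusters = Counter()
    n_pairs = Counter()
    for root, dependents in clusters:
        n_clusters[root] += 1
        for d in set(dependents):  # count each dependent once per cluster
            n_pairs[(root, d)] += 1
    result = {}
    for (r, d), k in n_pairs.items():
        if n_clusters[r] >= min_clusters:
            c = k / n_clusters[r]
            if c > threshold:
                result[(r, d)] = c
    return result
```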
We note that, intuitively, dependent names could be expected to share labels with their roots (e.g., www.facebook.com and star.c10r.facebook.com), and this could be a further way to assess co-occurrence. However, we find that only 27% of the pairs within clusters with co-occurrence of at least 0.2 share the same SLD as the cluster root, and only 11% share the same 3rd level label. This suggests that, while not rare, counting on co-occurring names to be from the same zone to build clusters is dubious. As an extreme example, Google Analytics is a dependent of 1,049 unique cluster roots, most of which are not Google names.
Fig. 7. Cosine similarity between the query vectors for the same client.
Fig. 8. Cosine similarity between the query vectors for different clients.
Finally, we cannot test the majority of the clusters and pairs for co-occurrence because of limited samples. However, we hypothesize that our results apply to all clusters. We note that the distribution of the number of queries per cluster in Figure 1 is similar to the distribution of the number of dependents per root where the co-occurrence fraction is greater than 0.2. Combining our observations that 80% of queries occur in clusters, 28% of the dependent queries within clusters have high co-occurrence with the root, and the average cluster has 1 root and 10 dependents, we estimate that at a minimum $0.80 \times 0.28 \times \frac{10}{11} \approx 20\%$ of DNS queries are driven by co-occurrence relationships. We conclude that co-occurrence relationships are common, though the relationships do not always manifest as requests on the wire due to caching.
Temporal Locality: We next explore how the set of names a client queries changes over time. As a foundation, we construct a vector $V_{c,d}$ for each client $c$ and each day $d$ in our dataset, which represents the fraction of lookups for each name we observe in our dataset. Specifically, we start from an alphabetically ordered list $N$ of all hostnames looked up across all clients in our dataset. We initially set each $V_{c,d}$ to a vector of $|N|$ zeros. We then iterate through $N$ and set the corresponding position in each $V_{c,d}$ to the total number of queries client $c$ issues for name $N_i$ on day $d$ divided by the total number of queries $c$ issues on day $d$. Thus, an example would be $V_{c,d} = \langle 0, 0.25, 0, 0.5, 0.25 \rangle$ in the case where there are five total names in the dataset and on day $d$ the client queries for the second name once, the fourth name twice, and the fifth name once. We repeat this process using only the SLDs from each query, as well.
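The construction of the per-client, per-day vectors and the five-name example can be checked with a few lines of Python. This sketch keeps the dense |N|-length layout described above (a sparse dictionary would be more practical for hundreds of thousands of names), and its `sld` helper is a naive assumption that takes the last two labels, ignoring multi-label public suffixes such as co.uk.

```python
from collections import Counter

def query_vector(names_on_day, vocabulary):
    """Per-name query fractions for one client on one day.

    `names_on_day` lists every name the client queried that day (repeats
    included); `vocabulary` is the alphabetically sorted list N of all names.
    """
    counts = Counter(names_on_day)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[name] / total for name in vocabulary]

def sld(hostname):
    """Naive SLD: the last two labels (ignores suffixes like co.uk)."""
    return ".".join(hostname.rstrip(".").split(".")[-2:])
```

With five names and queries for the second name once, the fourth twice, and the fifth once, `query_vector` returns `[0.0, 0.25, 0.0, 0.5, 0.25]`; mapping each query through `sld` first yields the SLD variant.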
We first investigate whether clients' queries tend to remain stable across days in the dataset. For this, we compute the minimum cosine similarity of the query vectors for each client across all pairs of consecutive days. Figure 7 shows the distribution of minimum cosine similarity per client in the Feb:Residential:Users dataset. In general, the cosine similarity values are high (greater than 0.5 for 80% of clients for unique hostnames), indicating that clients query for a similar set of names in similar relative frequencies across days. Given this result, it is unsurprising that the figure also shows high similarity across SLDs.
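The stability test compares each client's query vectors on consecutive days and keeps the minimum. A minimal sketch over dense vectors; it treats consecutive entries of the client's day list as consecutive days, which is an assumption given that weekends were removed from the data.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def min_consecutive_similarity(daily_vectors):
    """Minimum cosine similarity over consecutive-day pairs for one client.

    `daily_vectors` is the client's list of per-day query vectors, ordered by day.
    """
    sims = [cosine(a, b) for a, b in zip(daily_vectors, daily_vectors[1:])]
    return min(sims) if sims else None
```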
Fig. 9. Mean hostnames and SLDs queried by each client per day.
Fig. 10. Mean and median stack distance for each client.
Next we assess whether different clients query for similar sets of names. We compute the cosine similarity across all pairs of clients and for all days of our dataset. Figure 8 shows the distribution of the maximum similarity per client pair from any day. When considering hostnames, we find lower similarity values than when focusing on a single client (only 3% of pairs show a similarity of at least 0.5), showing that each client queries for a fairly distinct set of hostnames. The similarity between clients is also low for sets of SLDs, with 55% of the pairs showing a maximum similarity less than 0.5. Thus, clients query for different specific hostnames and distinct sets of SLDs. These results show that a client DNS model must ensure that (i) each client stays similar to itself across time and (ii) clients are distinct from one another.
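For the cross-client comparison, the sketch below computes, for every client pair, the maximum cosine similarity over days. It assumes clients are compared on the same day (the text does not say whether cross-day pairs were considered) and uses sparse {name: fraction} dictionaries, which give the same cosine similarity as the dense vectors.

```python
import math
from itertools import combinations

def cosine_sparse(u, v):
    """Cosine similarity of sparse vectors given as {name: fraction} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def max_pairwise_similarity(vectors):
    """For each client pair, the maximum same-day cosine similarity.

    `vectors` maps (client, day) -> sparse query vector.
    """
    by_day = {}
    for (client, day), vec in vectors.items():
        by_day.setdefault(day, {})[client] = vec
    best = {}
    for clients in by_day.values():
        for a, b in combinations(sorted(clients), 2):
            s = cosine_sparse(clients[a], clients[b])
            best[(a, b)] = max(best.get((a, b), 0.0), s)
    return best
```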
A final aspect we explore is how quickly a client repeats a query. As we show in Figure 2, 50% of the clients send fewer than 2K queries per day on average. Figure 9 shows the distribution of the average number of unique hostnames that clients query per day. The number of names is less than the overall number of lookups, indicating the presence of repeat queries. For instance, at the median, a client queries for 400 unique hostnames and 150 SLDs each day. To assess the temporal locality of re-queries, we compute the stack distance [12] for each query: the number of unique names queried since the previous query for the given name. Figure 10 shows the distributions of the mean and median stack distance per client. We find the stack distance to be relatively short in most cases, with over 85% of the medians being less than 100. However, the longer means show that re-use distances are not always short. Our results show that variation in re-querying behavior exists among clients, with some clients revisiting names frequently and others querying a larger set of names with less frequency.
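Stack distance as defined above (the number of distinct names queried since the previous query for the same name) can be computed with an LRU list. This is a straightforward O(n·k) sketch, with k the number of distinct names; it reports 0 for an immediate repeat, so under the common 1-based convention Mattson et al.'s stack position would be this value plus one.

```python
def stack_distances(names):
    """LRU stack distance of each query in one client's query sequence.

    For a repeated name, the distance is the number of distinct other names
    queried since its previous occurrence; first-time queries get None.
    """
    stack = []  # least recently used first, most recently used last
    out = []
    for name in names:
        if name in stack:
            pos = stack.index(name)
            out.append(len(stack) - 1 - pos)  # names used more recently than `name`
            stack.pop(pos)
        else:
            out.append(None)
        stack.append(name)
    return out
```

For example, `stack_distances(["a", "b", "c", "a", "a", "b"])` returns `[None, None, None, 2, 0, 2]`; per-client means and medians over the non-None values correspond to the quantities plotted in Figure 10.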
7 Related Work

Models of various protocols have been constructed for understanding, simulating, and predicting traffic (e.g., [13] for a variety of traditional protocols and [2] as an example of HTTP modeling). Additionally, there is previous work on characterizing DNS traffic (e.g., [11, 6]), which focuses on the aggregate traffic of a population of clients, in contrast to our focus on individual clients. Finally, we note, as we discuss in §1, that several recent studies involving DNS make assumptions about the behavior of individual clients or need to analyze data for specific information before proceeding. For instance, the authors of [5] model DNS hierarchical cache performance using an analytical arrival process, while in [14], the authors use simulation to explore changes to the resolution path. Both studies would benefit from a greater understanding of DNS client behavior.
8 Conclusion

This work is an initial step towards richly understanding individual DNS client behavior. We characterize client behavior in ways that will ultimately inform an analytical model. We find that different types of clients interact with the DNS in distinct ways. Further, DNS queries often occur in short clusters of related names. As a step towards an analytical model, we show that the client query arrival process is well modeled by a combination of the Weibull and Pareto distributions. In addition, we find that clients have a "working set" of names that is both fairly stable over time and fairly distinct from other clients. Finally, our high-level results hold across both time and qualitatively different user populations: student residential vs. University office. This is an initial indication that the broad properties we illuminate hold the promise to be invariants.
References
1. OpenDNS. http://www.opendns.com/.
2. P. Barford and M. Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. In ACM SIGMETRICS, 1998.
3. T. Callahan, M. Allman, and M. Rabinovich. On Modern DNS Behavior and Properties. ACM SIGCOMM Computer Communication Review, July 2013.
4. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In AAAI International Conference on Knowledge Discovery and Data Mining, 1996.
5. N. C. Fofack and S. Alouf. Modeling Modern DNS Caches. In ACM International Conference on Performance Evaluation Methodologies and Tools, 2013.
6. H. Gao, V. Yegneswaran, Y. Chen, et al. An Empirical Re-examination of Global DNS Behavior. In ACM SIGCOMM, 2013.
7. P. Gauthier, J. Cohen, and M. Dunsmuir. The Web Proxy Auto-Discovery Protocol. IETF Internet Draft, https://tools.ietf.org/html/draft-ietf-wrec-wpad-01 (work in progress), 1999.
8. Websites Using Google Analytics. http://trends.builtwith.com/analytics/Google-Analytics.
9. Google Safe Browsing. https://developers.google.com/safe-browsing.
10. HTTP Archive. http://httparchive.org.
11. J. Jung, A. W. Berger, and H. Balakrishnan. Modeling TTL-Based Internet Caches. In IEEE International Conference on Computer Communications, 2003.
12. R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation Techniques for Storage Hierarchies. IBM Systems Journal, 1970.
13. V. Paxson. Empirically Derived Analytic Models of Wide-Area TCP Connections. IEEE/ACM Transactions on Networking, 1994.
14. K. Schomp, M. Allman, and M. Rabinovich. DNS Resolvers Considered Harmful. In ACM Workshop on Hot Topics in Networks, 2014.