Additionalp2pover官网
p2pover官网 时间:2021-05-24 阅读:(
)
Graph-basedP2PTrafcClassicationattheInternetBackboneMariosIliofotou,Hyun-chulKim,MichalisFaloutsos,MichaelMitzenmacher§,PrashanthPappu,andGeorgeVargheseUniversityofCalifornia,RiversideCAIDAandSeoulNationalUniversityUniversityofCalifornia,SanDiego§HarvardUniversityConviva,Inc.
Abstract—Monitoringnetworktrafcandclassifyingappli-cationsareessentialfunctionsfornetworkadministrators.
Inthispaper,weconsidertheuseofTrafcDispersionGraphs(TDGs)toclassifynetworktrafc.
Givenasetofows,aTDGisagraphwithanedgebetweenanytwoIPaddressesthatcommunicate;thusTDGscapturenetwork-wideinteractions.
UsingTDGs,wedevelopanapplicationclassicationframeworkdubbedGraption(Graph-basedclassication).
Ourframeworkprovidesasystematicwaytoharnessthepowerofnetwork-widebehavior,ow-levelcharacteristics,anddataminingtechniques.
Asaproofofconcept,weinstantiateourframeworktodetectP2Papplications,andshowthatitcanidentifyP2Ptrafcwithrecallandprecisiongreaterthan90%inbackbonetraces,whichareparticularlychallengingforothermethods.
I.
INTRODUCTIONAnimportanttaskwhenmonitoringandmanaginglargenetworksisclassifyingowsaccordingtotheapplicationthatgeneratesthem.
Suchinformationcanbeutilizedfornetworkplanninganddesign,QoSandtrafcshaping,andsecurity.
Inparticular,detectingP2PtrafcisapotentiallyimportantproblemforISPsthatwanttomanagesuchtrafc,andforspecicgroupssuchastheentertainmentindustryinlegalandcopyrightdisputes.
DetectingP2PtrafcalsohasparticularinterestsinceitrepresentsalargeportionoftheInternettrafc,withmorethan40%oftheoverallvolumeinsomenetworks[11].
Mostcurrentapplicationclassicationmethodscanbenat-urallycategorizedaccordingtotheirlevelofobservation:payload-basedsignature-matchingmethods[16],[14],ow-levelstatisticalapproaches[6],[18],orhost-levelmethods,suchasBLINC[13],[24].
Eachexistingapproachhasitsownprosandcons,andnosinglemethodclearlyemergesasawinner.
Relevantproblemsthatneedtobeconsideredincludeidentifyingapplicationsthatarenew,andthuswithoutaknownprole;operatingatbackbonelinks[2],[13];anddetectingapplicationsthatintentionallyaltertheirbehavior.
Flow-levelandpayload-basedapproachesrequireperapplicationtrainingandwillthusnotdetecttrafcfromemergingprotocols.
Host-basedapproachescandetecttrafcfromnewprotocols[13],buthaveweakperformancewhenappliedatthebackbone[2].
Inaddition,mosttoolsincludingBLINC[13](whichhas28parameters)requirene-tuningandcarefulselectionofparameters[2].
Wediscussthelimitationsofpreviousmethodsinmoredetailin§IV.
Inthispaper,weusethenetwork-widebehaviorofanapplicationtoassistinclassifyingitstrafc.
Tomodelthisbehavior,weusegraphswhereeachnodeisanIPaddress,andeachedgerepresentsatypeofinteractionbetweentwonodes.
WeusethetermTrafcDispersionGraphorTDGtorefertosuchagraph[10].
Whilewerecognizethatsomepreviousefforts[3],[5]haveusedgraphstodetectwormactivity,theyhavenotexploredthefullcapabilitiesofTDGsforapplicationclassication.
Weproposeaclassicationframework,dubbedGraption,asasystematicwaytocombinenetwork-widebehaviorandow-levelcharacteristics.
Graptionrstgroupsowsusingow-levelfeatures,inanunsupervisedandagnosticway,i.
e.
,withoutusingapplication-specicknowledge.
ItthenusesTDGstoclassifyeachgroupofows.
Asaproofofconcept,weinstantiateourframeworkanddevelopaP2Pdetectionmethod,whichwecall"Graption-P2P".
Comparedtoothermethods,Graption-P2Piseasytocongureandrequiresverylittleaprioriknowledge(mainlyafewintuitiveparameters).
Theexperimentalpartofourpapershowsthat:Graption-P2Pidentiesover90%ofP2Ptrafcwithprecisiongreaterthan95%inbackbonetraces.
Graption-P2PperformsbetterthanBLINCinP2Pidenti-cationatthebackbone.
Forexample,Graption-P2Piden-ties95%ofBitTorrenttrafcwhileBLINCidentiesonly25%.
EvenasinglebackbonelinkcontainsenoughinformationtogenerateTDGsthatcanbeusedtoclassifytrafc.
Inaddition,TDGsofthesameapplicationseemfairlyconsistentacrossdifferenttimesandlocations.
Therestofthepaperisorganizedasfollows.
In§IIwedeneTDGs,andidentifyTDG-basedmetricsthatdifferen-tiatebetweenapplications.
In§IIIwepresenttheGraptionframeworkandourinstantiation,Graption-P2P.
In§IVwediscussrelatedwork.
In§Vweconcludethepaper.
II.
TRAFFICDISPERSIONGRAPHSDenition.
Throughoutthispaper,weassumethatpacketscanbegroupedintoowsusingthestandard5-tuple{srcIP,NameDate/TimeDurationFlowsTR-PAY12004-04-21/17:591hour38,808,604TR-PAY22004-04-21/19:001hour37,612,752TR-ABIL2002-09/(N/A)1month2,057,729TABLEISETOFBACKBONETRACESFROMTHECOOPERATIVEASSOCIATIONFORINTERNETDATAANALYSIS(CAIDA).
STATISTICSFORTHETR-ABILTRACE,AREREPORTEDONLYFORTHEFIRSTFIVE-MINUTEINTERVAL.
srcPort,dstIP,dstPort,protocol}.
GivenagroupofowsS,collectedoveraxed-lengthtimeinterval,wedenethecorrespondingTDGtobeadirectedgraphG(V,E),wherethesetofnodesVcorrespondstothesetofIPaddressesinS,andthereisalink(u,v)∈Efromutovifthereisaowf∈Sbetweenthem.
Inthispaper,weconsiderbidirectionalows.
WedeneaTCPowtostartontherstpacketwiththeSYNagset(referredtoastheSYN-packet),sothattheinitiatorandtherecipientoftheowaredenedforthepurposesofdirection.
ForUDPows,directionisdecidedupontherstpacketoftheow.
DataSet.
TostudyTDGs,weusethreebackbonetracesfromaTier-1ISPandtheAbilene(Internet2)network.
ThesetracesaresummarizedinTableI.
AlldataareIPanonymizedandcontaintrafcfrombothdirectionsofthelink.
TheTR-PAY1andTR-PAY21traceswerecollectedfromanOC48linkofacommercialUSTier-1ISPatthePaloAltoInterneteXchange(PAIX).
TheTR-ABILtraceisapubliclyavailabledatasetcollectedfromtheAbilene(Internet2)academicnet-workconnectingIndianapoliswithKansasCity.
TheAbilenetraceconsistsofverandomlyselectedve-minutesamplestakeneverydayforonemonth,andcoversbothdayandnighthoursaswellasweekdaysandweekends.
GroundTruth.
WeusedaPayload-basedClassier(PC)toestablishthegroundtruthofowsfortheTR-PAY1andTR-PAY2traces.
Bothtracescontainupto16bytesofpayloadineachpacket,therebyallowingthelabelingofowsusingthesignaturematchingtechniquesdescribedin[2],[13].
RunningthePCovertheTR-PAY1andTR-PAY2traceswend14%ofthetrafctobeP2P,28%Web,6%DNS,andtheresttobelongtootherapplications,suchasEmail,FTP,NTP,SNMP,etc.
Forourstudy,weremovethe2%oftrafcthatremainedunclassiedandthe28%thatcontainednopayload.
A.
IdentifyingP2PTDGsIdentifyingtherightmetricstocomparegraphstructuresisachallengingquestionthatarisesinmanydisciplines[17].
Ourapproachistoconsiderseveralgraphmetrics,eachcapturingapotentiallyusefulcharacteristic,untilasetofmetricsisfoundthatdistinguishesthetargetgraphs.
Toselectanappropriatesetofmetrics,wegeneratealargenumberofTDGsusingallourtraces(TableI),thusobservingTDGsovertwodifferentlocationsatthebackbone.
Forthe1TheauthorsthankCAIDAforprovidingthissetoftrafctraces.
AdditionalinformationforthesetracescanbefoundintheDatCat,InternetMeasurementDataCatalog[26],indexedunderthelabel"PAIX".
TR-PAY1andTR-PAY2traces,weusethepayload-basedclassier(PC)inordertoselectwhichowsbelongtoeachTDG.
SincetheTR-ABILtracedoesnothaveanypayloadinformation,weuseportnumbers[2]toassignowstoapplications.
WecanuseportnumbersfortheTR-ABILtracesinceitwascollectedin2002wheremostP2Papplicationsusedtheirdefaultportnumbers[7],[12].
WeonlyusetheTR-ABILtracetoverifyourTDGobservationsoverasecondlocationinthebackboneandwedonotuseitinthenalevaluationofourclassier.
Byusingthemonth-longTR-ABILtrace,wecanstudytheconsistencyofTDGsoverdifferenttimesofthedayandoverweekdaysandweekends.
WeobserveTDGsover5-minuteintervals.
ThisintervallengthgivesgoodclassicationresultsandstabilityofTDGmetricsovertime.
ForeachTDGwegenerateadiversesetofmetrics.
OurmetricscapturevariousaspectsofTDGsin-cludingthedegreedistribution,degreecorrelations,connectedcomponents,anddistancedistribution.
Foradditionaldetailsaboutthesemetricswereferthereaderto[9],[17].
Toselecttherightsetofmetricsweusevariousgraphvisu-alizationsandtrialanderror.
Findingalessadhocapproachisbeyondthescopeofthiswork.
TwoTDGvisualizationexamplesareshowninFigure1.
WeseethatFastTrack(P2P)hasadensergraphthanHTTPS,orahigheraveragedegree,wheretheaveragenodedegreekisgivenbyk=2|E|/|V|.
Weutilizetwoothermetricsthatcapturethedirectionalityoftheedgesinthegraphandthedistancesbetweennodes.
Thedirectionalityisusefulsinceweknowthatpureclientsonlyinitiatetrafc,pureserversshouldneverinitiatetrafc,andthatsomeP2Pnodesplaybothroles.
Tocapturethisquantitatively,wedeneInOtobethepercentageofnodesinthegraphthathavebothincomingandoutgoingedges.
Thedistancebetweentwonodesisdenedasthelengthoftheirshortestpathinthegraph.
Thediameterofagraphisdenedasthemaximumdistancebetweenallpairsofnodes,whichissensitiveasametric[17].
Foramorerobustmetric,weusetheeffectivediameter(EDiam),whichwedeneasthe90-thpercentileofallpairwisedistancesinthegraph.
Fromourmeasurements,weempiricallyderivethefollowingtworulesfordetectingP2Pactivity.
Rule1:k>2.
8andInO>1%;Rule2:InO>1%andEDiam>11.
Withthesesimplerules,wecancorrectlyidentifyallP2PTDGsfrombothbackbonelocations(AbilenebackboneandTier-1ISP).
Intuitively,P2Phostsneedtobeconnectedwithalargesetofpeersinordertoperformtaskssuchasansweringcontentqueriesandsharingles,whichcanexplainthehigheraveragedegreecomparedtoclient-serverapplications.
AnadditionalcharacteristicofP2Papplicationsisthedualityofroles,withmanyhostsactingbothasclientandserver.
ThedualityofrolesisinturncapturedbythehighInOvalue.
WefurtherspeculatethatthedecentralizedarchitectureofsomeP2Papplications(suchasBitTorrrent),canexplainthehighdiametersinsomeP2PTDGs.
AdditionalspeculationsonwhythesethreemetricseffectivelycaptureP2Pbehaviorisprovidedin[9]andareomittedduetospacelimitations.
121127951054910415782045848791995621348517161314601111210151046170015161001718312122233692627127396707242517323021034351703863233394094114361488616622731836112515181676414263053069811454546633104217045152495053541473541701555660616267685256364404747127073746970680834855891124818167980287777859723757635010668182324557106176517538586899016527687882282571784919295961159158916601697171193319376134516509997984644794521111451171181191661180120121126130131132133136137121013413543943140897945146714414391431411423221291148149377146150151155156292111285157158159663801691711726681080173174861877892174817517918017717818518618826477211741271183184135618739397011861911929601959161200193194343104890819820220337154362968117292094232122172182152161474223221222224225802226227232233114611811283231241237238245246249163254258294295296535617662827839893951971973103310551152121212111221124412511298131213741383139814161471176818651868194219432522534002552562672689612652662632712723032692703407147912771161404148827928040114242822831882281113728428569472883128612042892902882922932972983009533013021108152518693063093107571429146331231347011203189683163173143155113232511724320321325110160933033133233518246833333443197533633763413173383413461533339342160034834942434542624334435135235538508356357108935835966611173635013643653663673684625896383723703873753733743783793818833857893883899743923983991566402403409407408411414412413197173341741842142242542743051410314284294388961922434435156416404403442144216361867443444193344644565076844744845545345448645645773246113794584594634696074954671065478482483154248448949011151314156015901686178348767594849610055025031273500137116265101099509515516519520517518526527528531534129109654454154254554613739545485495685521592436550995555553554556121556256356056156556413461514186056719573626574575208126057732757958358258158058558658759259159459559859913956006036046058986089521446610611612613618619884620621625153762362464115486396406441757645646647649931989652653651656657654655658661665667670147366967483267867967749968268368468768868912556956967926916927501529699700697172570270370470518057081027091723596711712713716717140871871919542737227247267277307459431455746748749752755756758759760761166919107671077667697707747757787767817827797807845729826717837857867877217887907937941209139719047967978038018048078088148158168178258268282068858298308358378388438468448488498508528538628638648718728731041611875876882126613851248868878598951780899900901902910907905906912913911959917918915169192292092192593293993850494211391531946947958962964963965307328980164719051268981987988985990991476994996997107499899910017531006113310041955100910101007100810141019102013891027102819161032118310341035103610395333941041116916481893104710511053163190384210561057105818081060106410691088109010931094109710981100110173311021103110511061201200111311141118111911211124115311291127112811261131113211341135113810841141115111491150114711481935450449513115412051437116211631164117011731177117811821184118511911189119011871188119511921197119611991202120312061207120866312161217121812191194122012221223122612287401231928123312341235123712421029124612501252125612581259126112621267127273512741279128012781282128412861288128912921352129713011310131113091315131613181320132413271325132813291330133652293413351761133963114561018135513541357135813601361168113641384138713888221390139113941392139313961399138140114021403141214131410141114191423142514281430143114341441144414451449145314591464146114651466172211661469147015281826147214751478147914761486148514831484148714891491149214931494149915031504150515061508150915101512156918381515152015261534153515361538154015411544154515506151551164113051553155215551554155915621568156715711572114215741577157615781580158215831586159115931594159615971160130716141616161816171619162116221625163715221638164216431644164516531656178216611665167016711672167316801294177417891683168416871685169016891692169313431695169616991702170717091710171417151717171811551720172617281735173817391740234174417461747175117521754175517564191759176417651766176717691778177617791785178617871788179317951802180318061807181418151817181818271221834183518401841184318471858185918611863186246518641866187118721874187518734721877189218901891188918961897189819021906190713801917191952319211926192719251630192819341939194619471951195216028191961(a)TheFastTrack(KaZaa)TDG(P2Papplication).
1516252689799215521565159818632681336133653940444549502856499536364109230758586200887367288232239198824793059309695969798107108110111202089112113117118208712012182518471301313021321332088159160161162167168169170135130401791811821831841871881757189190193194498104721321421821922622724224524624324430442477782481392552562612622652841421972822832672803741489220223853106290801293299300297298752301447305306184626783073083093103163193203233243255327328329330331334335173363483493673723731174438382384385391392399403404405406407554414415416417156142842946043243388043644644844996045345445946746846947047647747848248648727511544497499500918505610421514152127951751851952052152352452752853854154520654455155356135691625187719432292253833025775855865996006066076116176182440633638640641642547153644647648656745660650661664665668669674675687688704711722723229472473638737738744188061374816552931320234207516367687697837847948008178188198288321974835836842847845846858859345886087186987015348728738758769228818828792718838894028868878991299418032185238024042685277728393076321133753538915928929233279234633735259441759951216174955956957493961194726823306962963968969983988989100099199710081009164101210131327101934210231024103010311034104210461050105110531054250526522411106210631070717107710781082108310841085109410951117253112511471530164424271126112711431144114511481149115111591160116411671173117511761199120012031204122120519073204120619512101842189012171361121812191229123112321243124433712451283131422752322262932581246125521371268126912741275127812841285129713011308130921922946133213331344452741135627171359136013621685136685713671380110213861784242313871395830139914001401141314271428189831661432143314941434143514481449145014591460146214631470147114801481375148226573032149214931496151315141515151615191520152415281531153215331372153524722906153715381542155615571558156615674831572157515761579157715781582158313781588158915961608124916091610161316141624162716371980163916402464164916501661166233991666166536316832228168816861687145317041707170817111712172417251727817341737173817401260174917521753175617694561778177917801781179818011318180518061141828351183418353211845185918601807187918851886167172018891902190819091913193020319362224226829561949657196526481972925197319791189199620002001200320042005200620071466200920131944201420232038204090620492050205111234341062068208520862094209520962101210216752106240621092121212314412127213223682139214021462147151721482163216421762177218021812200190015992201169022092210221622172218222022213782223222622271995224322442260226112362263323920631717227415252285228622872288230123032304231123152316746221523232324233123322338233923502372238323849423202399241424152420242124222435243924522453245424571541246324682469247324772478248524862489249050324933314249624972498252325241442252725342535802545202725502554256125622567256825732574257874025792580258125822584572757258925912230260026012604260521026062610261126262627804262826302632263326362637264426462647264926502655265626602661266526662671267222026792688268926925782694269927005972703270627135112715271627202721273627372752276127621855276327712782278327882799273018202804280528122813281634328172822282328322843284428452846589285228532856285928607802877627287821042018289128962897290213052903290429142915291829262927115729292935293611971545295229552917785296435892976297729812982298329841001238229942998299986830073008301330143026302930303036303325213042304330453049305030543062306330778153081587220530823090309130923094309531093108312231233124312831293131313231343135316931703067316831723184318531983199321029432403241324632473245325232533257118732643265327132723275176532773282634329933073312331333172023328333433363337334018003342334333453349335033533354336333641635162133987913411341223783422342634273437344027693465346722783270347734783479348099348862434893497349831135051920351835223523352833093532157353135333534353635613574357543035771831358435853582358336835923591(b)TheHTTPSTDG(client-serverapplication).
Fig.
1.
TwoTDGvisualizationcontrastingaP2Pwithaclient-serverapplication.
Largestcomponentiswithboldedges.
DistinguishingcollaborativeapplicationsfromP2P:Somewell-knownapplicationsotherthanP2Pexhibitcollaborativebehavior,suchasDNSandSMTP.
Thisisnotsurprisingsinceintheseapplicationsserverscommunicatewitheachotherandwithotherclients(highk),andserversactbothasclientsandservers(highInO).
Thisisexactlywhatourmetricsaresetouttodetect.
Ithasbeenreportedrecently[2]thatportnumbersarefairlyaccurateinidentifyingsuchlegacyapplications,althoughtheyfailtoidentifyP2Pandotherapplicationswithdynamicuseofportnumbers.
Therefore,onecoulduselegacyportstopinpointandisolatesuchcollaborativeapplicationsandthenusegraphmetricsontheremainingtrafc.
Inadditiontoportinspection,wecanalsoexaminethepayloadofaowinordertoverifythatitfollowstheexpectedapplication-layerinteractions.
Asafuturework,ourgoalistoselectmetricsthatcanfurtherhelptoseparatebetweencollaborativeapplications(e.
g.
,DNS)andP2P.
Wediscusssimilartopicsagainis§III-C.
Wedonotclaimthatourthresholdsareuniversal,butourmeasurementssuggestthatsmalladjustmentstothesesimpleparametersallowourmethodologytoworkondifferentback-bonelinks.
Furthermore,thethreethresholds(InO,EDiam,andaveragedegree)areobservedtoremainstableovertime.
III.
THEGRAPTIONFRAMEWORKTheGraptionframeworkconsistsofthefollowingthreesteps.
Step1.
FlowIsolation.
Theinputisnetworktrafcintheformofowsasdenedin§II.
Thegoalofthisrstoptionalstepistoutilizeexternalinformationtoisolateanyowsthatcanalreadybeclassied.
Thisknowledgecouldbebasedonpayloadsignatures,portnumbers,orIPaddress(e.
g.
,excludeowsfromaparticulardomainsuchasgoogle.
com).
Step2.
FlowGrouping.
Weusesimilarityattheowandpacketleveltogroupows.
Thedenitionofsimilarityisexibleinourproposedmethodology.
Wecanuseowstatistics(duration,packetsizes,etc.
)orpayloadifthisisavailable.
Eventually,theoutputofthisstepisasetofgroupswitheachgroupideallycontainingowsfromasingleapplication(e.
g.
,Gnutella,NTP,etc.
).
However,atthisstep,theexactapplicationofeachgroupisnotknown.
Step3.
GroupClassier.
Foreachgroupofows,weconstructaTDG.
Next,wequantifyeachTDGusingvariousmetrics.
Theclassierusesthesemetricstoidentifytheapplicationforeachgroupofows.
Fortheclassicationdecision,weuseasetofruleswhichingeneraldependonthefocusofthestudy.
AlthoughthispaperfocusesonP2Pdetection,Graptioncanbeusedforgeneralapplicationclassicationbychoosingmetricsandparametersappropriately.
WenextdescribehowwespecializedGraptiontodetectP2Ptrafc(Graption-P2P).
A.
ImplementationDetailsofGraption-P2PStep1.
Thisisanoptionalstepinourmethodology.
Experimentswithoutthissteparediscussedlaterinthesection.
Recentwork[2]suggeststhatport-basedclassicationworksverywellforlegacyapplications,aslegacyapplicationsusetheirdefaultportsandtunnelingofP2Patsuchportsisnotverycommon.
Thus,inthisstudy,weisolatedowswithport80forWeb,port53forDNS,andport25forSMTP.
Theseapplicationsturnouttobeabout65%ofthetotalnumberofows.
Inourtraces,theproportionofP2Pactuallyusingoneoftheseportsisaslowas0.
1%.
Step2.
Toimplementowgroupingweusethefactthatapplication-levelheadersarelikelytorecuracrossowsfromthesameapplication.
Therefore,payloadsimilaritycanbeusedtogroupows.
InGraption-P2P,weonlyusetherstsixteenbytesfromeachow.
Asweshow,sixteenbytesaresufcient80828486889092949604080120160200240ClusteringPerformance(%)NumberofClusters(k)Precision(TR-PAY1)Recall(TR-PAY1)Precision(TR-PAY2)Recall(TR-PAY2)Fig.
2.
EvaluatingK-means.
togiveverygoodclassicationresults.
Thisobservationagreeswithndingsin[16],[14].
Eventhoughweusethepayloadbytes,ourgroupingisagnostictoapplicationsemantics,aseachbyteisconsideredasasingleindependentcategoricalfeature.
Weconsidereachbyteasasinglecategoricalfeatureintherange{0,1,.
.
.
,255}.
Theowgroupingstepcomprisestwosubsteps:clusterformationandclustermerging.
a.
FormingClusters.
Giventhesetofdiscriminatingfea-tures,thenextstepistocluster"similar"owstogether.
Weusethetermclustertodescribetheoutcomeofaninitialgroupingusingtheselectedfeatures.
Clustersmaybemergedinthenextfunctionofthissteptoformgroups,whichproducesthenaloutput.
Feature-basedclusteringisawell-denedstatisticaldataminingproblem.
ForthistaskweusedthepopularK-meansalgorithm[22].
Thisalgorithmhasbeencommonlyusedforunsupervisedclusteringofnetworkows[6],[16],withverygoodresultsandlowcomputationalcost.
K-meansoperateswithasingleparameterthatselectsthenumberofnalclusters(k).
Asweshowlaterinourevaluation,ourclassiergivesverygoodresultsoveralargerangeofk.
ThesimilaritybetweentwoowsismeasuredbytheHam-ming[22]distancecalculatedoverthe16categoricalfeatures(i.
e.
,thepayloadbytes).
Eventhoughmoreinvolvedsimilaritymeasuressuchasedit-distance(alsoknownasLevenshteindistance)exist,Hammingdistancehasbeenusedsuccessfullybefore[8]andperformsverywellinourapplication.
b.
Clustermerging.
Duringclustering,itislikelythatthesameapplicationgeneratesmultipleclusters.
Forexample,manyP2Pprotocolsexhibitavarietyofinteractionpatterns,suchasqueries(UDPows)andletransfers(TCPows),eachwithsignicantlydifferentowandpacketcharacteristics[12].
Thismotivatesmergingsclustersthatweexpecttobelongtothesameapplicationintogroups.
Thisgroupingprovidesamorecompleteviewoftheapplicationandaidesinunder-standingthestructureoftheP2Pprotocol,asweshowin§III-B.
Clustermergingcannotbebasedonthechosensetofow-levelfeaturesthatwerealreadyusedtocreatetheclusters606570758085909510000.
10.
20.
30.
40.
50.
60.
70.
80.
91F-Measure(%)SimilarityThresholdk=120k=160k=240Fig.
3.
Graption-P2Pachieves>90%F-Measureoveralargerangeofsimilaritythresholdsandnumberofclusters(k).
originally.
Instead,inthecaseofaP2Pprotocol,itisnaturaltoassumethattheTDGscorrespondingtoeachclusterofthesameprotocolwouldsharealargenumberofcommonnodes(IPaddresses).
Basedontheseobservations,weuseanAgglomerative(Hierarchical)ClusteringAlgorithmthatrecursivelymergesclusterswithsignicantsimilarityinIPaddresses.
Weusedthefollowingmetrictocalculatesimilaritybetweenclusters:Sim(C1,C2)=(NumberofowshavingtheirsourceordestinationIPspresentinbothclusters)/(Thenumberofowsofthesmallercluster).
Theclustermergingprocessstartsbyhierarchicallymergingclusterswithhighsimilarityandstopswhenthesimilaritybetweenallnewclusterpairsisbelowasimilaritythreshold(ST).
Asweshowlaterinourevaluation,ourclassiergivesverygoodresultsoveralargerangeofsimilaritythresholds.
Step3.
Theoutcomeofthepreviousstepisasetofgroupsofows,witheachgroupconsistingofowsthatwehopestemfromasingleapplication.
Inordertoclassifyeachgroup,wegenerateaTDGonthegroupinthesamewayasdescribedin§II.
EachgroupyieldsaTDGthatcanbesummarizedusinggraphmetrics.
ToidentifyP2PTDGs,weusedtherulesextractedfrom§II-A.
WhenagroupislabeledasP2PthenalltheowsofthatgroupareclassiedasP2Pows.
B.
EvaluatingGraption-P2PToevaluateGraption-P2P,weusetracesTR-PAY1andTR-PAY2,wherewehavethegroundtruthusingthepayloadclas-sier(§II).
WecomputetheTruePositives,FalsePositives,andFalseNegatives.
TheTruePositives(TP)measureshowmanyinstancesofagivenclassarecorrectlyclassied;theFalsePositives(FP)measureshowmanyinstancesofotherclassesareconfusedwithagivenclass;andtheFalseNegatives(FN)measuresthenumberofmisclassiedinstancesofaclass.
Inourcomparisons,weusedthefollowingstandardmetrics:Precision(P),denedasP=TP/(TP+FP);Recall(R),denedasR=TP/(TP+FN);andtheF-Measure[22],denedasF=2P·R/(P+R),combiningPrecisionandRecall.
WersttesttheabilityofK-meanstogenerateclusterswithowsfromasingleapplication.
Afterformingtheclusters606570758085909510080100120140160180200220240F-Measure(%)Clusters(k)inK-meansGroundTruthGraption-P2P:MergingGraption-P2P:NoMergingFig.
4.
Graption-P2Pwithandwithoutclustermerging.
Resultsarealsocomparedwithclusterlabelingbasedongroundtruth.
withK-means,weusethegroundtruthandlabeleachclusterasbelongingtotheapplicationwiththemajorityofows.
Alltheowsofaclusterarethenclassiedtobelongtothisdominantapplication.
ThePandRofK-meansasweincreasekforbothtracesareshowninFigure2.
Weobservethatwithsufcientlylargek(>120)weachieveverygoodresultswithPandRabove90%.
UsingGraption-P2P,weachievehighF-Measureoverarangeofvaluesofk(K-means)andsimilaritythresholds(ST).
WeshowthisinFigure3,wherewevarytheSTfrom0.
01to1anduseasufcientlylargek(seeFigure2).
Allexperimentsareaveragedovereachdisjoint5minuteintervalofbothtraces.
Intuitively,byusingaverylargeST,theclustersofanapplicationarenotgroupedtogether,whichresultstoTDGsthatarehardertoclassifyasP2P.
Ontheotherhand,withaverysmallST,clustersbelongingtodifferentapplicationsaremergedtogetherleadingtopoorerclassicationperformance.
TheresultsinFigure3showthatweachievegoodclassica-tionperformance(>90%F-Measure),overalargerangeofsimilaritythresholdsandnumberofclusters(k).
InFigure4,wecompareourapproachwithlabelingeachclusterusingthegroundtruth(i.
e.
withoutmerginganyclustersandlabelingeachclusterbasedonthedominantapplication).
Intuitively,foragivenclusteringofows,thegroundtruthshowsthebestthatanyclusterlabelingmecha-nismcanachieve.
Formerging,weuseanSTof0.
5.
FromFigure4,weseethatGraption-P2Pdeviatesonlyslightlyfromlabelingclustersusingthegroundtruth.
Inthesameplot,wealsocompareGraption-P2Pwithouttheclustermergingstep,highlightingthebenetofmergingclustersofthesameapplicationtogether.
UsingaSTof0.
5andk=160,Graption-P2Pachievesabove90%Recallandabove95%Precisionoveralldisjoint5minuteintervalsforbothtraces.
ToapplyGraption-P2Ptootherbackbonelink,thesameselectionprocessedcanberepeatedtoadjustthevaluesofSTandk.
Ourexperimentsshowthattheclassicationperformancecandegradewithaverybadchoiceofparameters.
However,asshowninFigures2,3,and4,forreasonablechoicesforkandST,ourmethodprovidesverygoodresults.
C.
DiscussionComparisonwithBLINC[13].
WeusedBLINCtoclassifytrafcoverbothTR-PAY1andTR-PAY2traces.
BLINCwasoptimizedafterseveraltrialanderroreffortstoachieveitsbestaccuracyoverthesetraces,asdescribedin[2].
TheRecallandPrecisionforBLINCare84%and89%respectively.
Inparticular,BLINChassignicantlylowerperformanceforsomeP2Papplications.
Forexample,Graption-P2Pdetects95%ofBitTorrenttrafc,whileBLINCdetectsonly25%!
OurexperimentssuggestthatBLINCandpossibleotherhostbasedapproachesworkwellwhenappliedattheedge,wherealargefractionofhostowsareobservedandhenceenoughevidenceiscollectedtoproleeachnode.
However,thisisnotalwaystrueforbackbonemonitoringpointswhichcanexplainBLINC'slowerperformance.
Theseobservationsarealsosupportedbyndingsin[2].
OtherCongurations.
WehavetestedGraption-P2Pwith-outusingpayloadundertheassumptionthatpayloadisen-crypted.
Forgroupingowsweusedpacketsizeinformation(i.
e.
min,max,andthesizeoftherstvepackets)andprotocol(UDPorTCP).
OurmethodperformedcomparablywellwithRandPabove88%inbothtraces.
WealsoexperimentedwithoutusingFlowIsolation.
Toachievegoodresults(P,R>85%)wehadtoincreasek(>300)inK-means,whichmadeclustermergingmorechallenging.
Moredetailsareomittedduetospacelimitations.
EvaluatingGraptionwithothercongurationsanddifferentdataminingclusteringalgorithmsisincludedinourfuturework.
EnhancingIsolation.
Toimproveisolationwecanenforcepayloadinspectioninadditiontoport-basedltering.
Forexample,wecantestallDNSowsatport53toseeiftheyalsohaveaDNSpayloadsignatureorifanotherprotocolistunnelingitstrafcundertheDNSport.
Ifpayloadisencrypted,thenwecanchoosetouseow-levelfeaturesuchaspacketsizes[2]orwhite-listedIPaddresses[21].
IV.
RELATEDWORKTrafcClassication.
Asanalternativetoport-basedmeth-ods,someworksusedpayload[16],[14].
OtherapproachesuseMachineLearning(ML)methodstoclassifytrafcusingowfeatures(e.
g,packetsizes).
ForanexhaustivelistandcomparisonofMLmethodswereferthereaderto[18]and[2].
Ourworkhasmoreincommonwithunsuperviseddataminingmethodswhichgroupsimilarowstogether.
Allpreviousmethods[15],[6]requiremanuallabelingofclusters.
Ourworkbridgesthisgapbyprovidingamethodtoautomaticallylabelclustersofowsbasedontheirnetwork-widebehavior.
InBLINC[13],theauthorscharacterizetheconnectionpatterns(e.
g.
,ifitbehaveslikeusingP2P)ofasinglehostattheTransportLayerandusethesepatternstolabeltheowsofeachhost.
BLINCusesgraphmodelscalledgraphletstomodelahost'sconnectionpatternsusingportandIPcardinalities.
UnlikeTDGs,graphletsdonotrepresentnetwork-widehostinteraction.
Insomesense,TDGsrepresentafurtherlevelofaggregation,byaggregatingacrosshostsaswell.
ThusitisperhapsfairtosaythatwhileBLINChintsatthebenetofanalyzingthenode'sinteractionatthe"social"level,itultimatelyfollowsadifferentpaththatfocusesonthebehaviorofindividualnodes.
Asweshow,ourapproachperformsbetterthanBLINCinourbackbonetraces.
SimilartoBLINC,otherhost-basedmethod[1],[23]targettheidenticationofP2Pusersinsideauniversitycampus(i.
e.
,networkedge).
UnlikeGraption,in[1],[23],[13]theydonotusenetwork-widehostinteraction.
In[4],theauthorsuseaport-basedmethodtoidentifyP2Pusers,usingtheirtemporalappearanceandconnectionpatternsinatrace.
ThemostrecenthostprolingmethodisbyTrestianetal.
[21].
TheyusedreadilyavailableinformationfromtheWebtoclassifytrafcusingtheGooglesearchengine.
Theyshowverygoodresultsforclassifyingowsforlegacyapplication,buttheirresultsarenotpromisingforP2PdetectionbecauseofthedynamicnatureofP2PIPhosts.
Ourmethodcanbeusedtocomplementtheworkin[21].
WormDetection.
Graphshavebeenusedfordetectingwormactivitieswithinenterprisenetworks[5].
Theirmaingoalwastodetectthetree-likecommunicationstructureofwormpropagation.
Thischaracteristicofwormswasalsousedforpost-mortemtraceanalysis(fortheidenticationofthesourceofawormoutbreak,theso-calledpatientzero)usingbackbonetraces[25].
Morerecentstudiesusegraphtechniquestodetecthit-listwormswithinanenterprisenetwork,basedontheobservationthatanattackerwillaltertheconnectedcomponentsinthenetwork[3].
Measurements.
Statisticalmethodsareusedin[24]forautomatingtheprolingofnetworkhostsandportsnumbers.
Theconnectivitybehaviorandhabitsofuserswithinenterprisenetworksisthefocusofmanypapers,including[20].
In[19],theauthorsstudyP2Poverlaysusingpassivemeasurements,buttargetmainlytheprolingofP2Phosts.
NoneoftheabovepaperstargetsP2Pdetection.
V.
SUMMARYANDCONCLUSIONSTheunderlyingthemeofourworkistogobeyondjustmonitoringindividualpackets,ows,orhosts,toalsomoni-toringtheinteractionsofagroupofhosts.
Towardsthisend,weintroduceTDGsandshowtheirpotentialtogeneratenoveltoolsfortrafcclassication.
BasedonTDGs,wedevelopedGraption,agraph-basedframework,whichwethenspecialize(Graption-P2P)forthedetectionofP2Papplications.
WeshowthatGraption-P2Pclassiesmorethan90%ofP2Ptrafcwithabove95%precisionwhentestedonreal-worldbackbonetraces.
ACKNOWLEDGMENTSThisworkwassupportedbyNSFgrantNETS-0721889,aCiscoSystems,Inc.
URPgrant,theITR&Dprogram[2007-F-038-02,FundamentalTechnologiesfortheFutureInternet],andtheITRCsupportprogram[IITA-2009-C1090-0902-0006]ofMKE/IITA.
TheSDSC'sTeraGridandcomputeresourcesaresupportedbytheNSFgrantCONMICRI-0551542.
MichaelMitzenmacherwassupportedinpartbyNSFgrantCNS-0721491andresearchgrantfromCiscoSystems,Inc.
SupportforCAIDA'sInternettracesisprovidedbytheNationalScienceFoundation,theUSDepartmentofHomelandSecurity,CAIDAMembers,andtheDatCatsystem.
REFERENCES[1]BARLETT,G.
,HEIDEMANN,J.
,ANDPAPADOPOULOS,C.
InherentBehaviorsofOn-lineDetectionofPeer-to-PeerFileSharing.
InIEEEGlobalInternet(2007).
[2]HYUN-CHULKIM,H.
,CLAFFY,K.
,FOMENKOV,M.
,BARMAN,D.
,FALOUTSOS,M.
,ANDLEE,K.
InternetTrafcClassicatoinDemysti-ed:Myths,Caveats,andtheBestPractices.
InACMCoNEXT(2008).
[3]COLLINS,M.
P.
,ANDREITER,M.
K.
Hit-ListWormDetectionandBotIdenticationinLargeNetworksUsingProtocolGraphs.
InRAID(2007).
[4]CONSTANTINOU,F.
,ANDMAVROMMATIS,P.
Identifyingknownandunknownpeer-to-peertrafc.
InIEEENCA(2006).
[5]ELLIS,D.
,AIKEN,J.
,ATTWOOD,K.
,ANDTENARGLIA,S.
ABehav-ioralApproachtoWormDetection.
InACMCCSWORM(2004).
[6]ERMAN,J.
,ARLITT,M.
,ANDMAHANTI,A.
Trafcclassicationusingclusteringalgorithms.
InACMMineNet(2006).
[7]GERBER,A.
,HOULE,J.
,NGUYEN,H.
,ROUGHAN,M.
,ANDSEN,S.
P2P,theGorillaintheCable.
InNCTA(2003).
[8]HAFFNER,P.
,SEN,S.
,SPATSCHECK,O.
,ANDWANG,D.
ACAS:automatedconstructionofapplicationsignatures.
InACMMineNet(2005).
[9]ILIOFOTOU,M.
,PAPPU,P.
,FALOUTSOS,M.
,MITZENMACHER,M.
,KIM,H.
,ANDVARGHESE,G.
Graption:AutomatedDetectionofP2PApplicationsintheInternetBackbone.
Tech.
rep.
,UCRiverside,2008.
[10]ILIOFOTOU,M.
,PAPPU,P.
,FALOUTSOS,M.
,MITZENMACHER,M.
,SINGH,S.
,ANDVARGHESE,G.
NetworkMonitoringUsingTrafcDispersionGraphs(TDGs).
InACMIMC(2007).
[11]IPOQUE.
Internetstudy2007lehosterslikerapidshareandstreamingserviceslikeyoutube.
2007.
[12]KARAGIANNIS,T.
,BROIDO,A.
,BROWNLEE,N.
,KCCLAFFY,ANDFALOUTSOS,M.
Isp2pdyingorjusthidingInIEEEGLOBECOM(2004).
[13]KARAGIANNIS,T.
,PAPAGIANNAKI,K.
,ANDFALOUTSOS,M.
BLINC:Multi-levelTrafcClassicationintheDark.
InACMSIGCOMM(2005).
[14]MA,J.
,LEVCHENKO,K.
,KREIBICH,C.
,SAVAGE,S.
,ANDVOELKER,G.
M.
Unexpectedmeansofprotocolinference.
InACMIMC(2006).
[15]MCGREGOR,A.
,HALL,M.
,LORIER,P.
,ANDBRUNSKILL,J.
FlowClusteringUsingMachineLearningTechniques.
InPAM(2004).
[16]MOORE,A.
,ANDPAPAGIANNAKI,K.
Towardtheaccurateidentica-tionofnetworkapplications.
InPAM(2005).
[17]NEWMAN,M.
E.
J.
Thestructureandfunctionofcomplexnetworks.
SIAMReview45(2003),167.
[18]NGUYEN,T.
T.
,ANDARMITAGE,G.
AsurveyoftechniquesforInter-nettrafcclassicationusingmachinelearning.
IEEECommunicationsSurveysandTutorials4thedition(March2008).
[19]SEN,S.
,ANDWANG,J.
Analyzingpeer-to-peertrafcacrosslargenetworks.
IEEE/ACMTransactiononNetworking12,2(2004),219–232.
[20]TAN,G.
,POLETTO,M.
,GUTTAG,J.
,ANDKAASHOEK,F.
Roleclassicationofhostswithinenterprisenetworksbasedonconnectionpatterns.
InUSENIXATC(2003).
[21]TRESTIAN,I.
,RANJAN,S.
,KUZMANOVIC,A.
,ANDNUCCI,A.
Un-constrainedendpointproling(GooglingtheInternet).
InACMSIG-COMM(2008).
[22]WITTEN,I.
H.
,ANDFRANK,E.
DataMining:Practicalmachinelearningtoolsandtechniques,MorganKaufmann,2005.
[23]W.
JohnandS.
Tafvelin.
HeuristicstoclassifyInternetBackboneTrafcbasedonConnectionPatterns.
InICOIN,2008.
[24]XU,K.
,ZHANG,Z.
,ANDBHATTACHARYYA,S.
ProlingInternetbackbonetrafc:Behaviormodelsandapplications.
InACMSIGCOMM(2005).
[25]Y.
Xieetal.
Forensicanalysisofepidemicattacksinfederatednetworks.
InIEEEICNP,2006.
[26]DatCat,InternetMeasurementDataCatalog.
http://www.
datcat.
org.
41云怎么样?41云是国人主机品牌,目前经营产品有国内外云服务器、CDN(高防CDN)和物理机,其中国内外云服务器又细分小类有香港限流量VPS、香港大带宽VPS、香港弹性自选VPS、香港不限流VPS、香港BGP线路VPS、香港Cera+大带宽机器、美国超防VPS、韩国原生VPS、仁川原生VPS、日本CN2 VPS、枣庄高防VPS和金华高防VPS;物理机有美国Cera服务器、香港单程CN2服务器、香...
Hostodo近日发布了美国独立日优惠促销活动,主要推送了四款特价优惠便宜的VPS云服务器产品,基于KVM虚拟架构,NVMe阵列,1Gbps带宽,默认分配一个IPv4+/64 IPv6,采用solusvm管理,赠送收费版DirectAdmin授权,服务有效期内均有效,大致约为7折优惠,独立日活动时间不定,活动机型售罄为止,有需要的朋友可以尝试一下。Hostodo怎么样?Hostodo服务器好不好?...
百驰云成立于2017年,是一家新国人IDC商家,且正规持证IDC/ISP/CDN,商家主要提供数据中心基础服务、互联网业务解决方案,及专属服务器租用、云服务器、云虚拟主机、专属服务器托管、带宽租用等产品和服务。百驰云提供源自大陆、香港、韩国和美国等地骨干级机房优质资源,包括BGP国际多线网络,CN2点对点直连带宽以及国际顶尖品牌硬件。专注为个人开发者用户,中小型,大型企业用户提供一站式核心网络云端...
p2pover官网为你推荐
支持ipad支持ipad支持ipad支持ipadxp如何关闭445端口系统怎么关闭445端口iphonewifi苹果手机怎么扫二维码连wifi360chrome360Chrome 世界之窗极速浏览器 ChromePlus迅雷快鸟用迅雷快鸟提示:您所在的网络暂不支持迅雷快鸟迅雷下载速度迅雷下载快慢和什么有关电信版iphone4s4和苹果iPhone 4S 电信版有什么区别
fdcservers idc测评网 空间打开慢 12306抢票助手 512m内存 蜗牛魔方 空间出租 七夕促销 腾讯实名认证中心 免费活动 流量计费 美国网站服务器 美国堪萨斯 安徽双线服务器 cloudlink 上海电信测速 英国伦敦 石家庄服务器托管 apache启动失败 WHMCS 更多