Detecting Probe-resistant Proxies

Sergey Frolov, University of Colorado Boulder, sergey.frolov@colorado.edu
Jack Wampler, University of Colorado Boulder, jack.wampler@colorado.edu
Eric Wustrow, University of Colorado Boulder, ewust@colorado.edu

Network and Distributed Systems Security (NDSS) Symposium 2020, 23-26 February 2020, San Diego, CA, USA. ISBN 1-891562-61-4. https://dx.doi.org/10.14722/ndss.2020.23087 www.ndss-symposium.org

Abstract: Censorship circumvention proxies have to resist active probing attempts, where censors connect to suspected servers and attempt to communicate using known proxy protocols. If the server responds in a way that reveals it is a proxy, the censor can block it with minimal collateral risk to other non-proxy services. Censors such as the Great Firewall of China have previously been observed using basic forms of this technique to find and block proxy servers as soon as they are used. In response, circumventors have created new "probe-resistant" proxy protocols, including obfs4, shadowsocks, and Lampshade, that attempt to prevent censors from discovering them. These proxies require knowledge of a secret in order to use, and the servers remain silent when probed by a censor that doesn't have the secret in an attempt to make it more difficult for censors to detect them. In this paper, we identify ways that censors can still distinguish such probe-resistant proxies from other innocuous hosts on the Internet, despite their design. We discover unique TCP behaviors of five probe-resistant protocols used in popular circumvention software that could allow censors to effectively confirm suspected proxies with minimal false positives. We evaluate and analyze our attacks on hundreds of thousands of servers collected from a 10 Gbps university ISP vantage point over several days as well as active scanning using ZMap. We find that our attacks are able to efficiently identify proxy servers with only a handful of probing connections, with negligible false positives. Using our datasets, we also suggest defenses to these attacks that make it harder for censors to distinguish proxies from other common servers, and we work with proxy developers to implement these changes in several popular circumvention tools.

I. INTRODUCTION

Internet censorship continues to be a pervasive problem online, with users and censors engaging in an ongoing cat-and-mouse game. Users attempt to circumvent censorship using proxies, while censors try to find and block new proxies to prevent their use. This arms race has led to an evolution in circumvention strategies as toolmakers have developed new ways to hide proxies from censors while still being accessible to users.

In recent years, censors have learned to identify and block common proxy protocols using deep packet inspection (DPI), prompting circumventors to create new protocols such as obfsproxy [15] and Scramblesuit [47] that encrypt traffic to be indistinguishable from random. These protocols are designed to have no discernible fingerprints or header fields, making them difficult for censors to passively detect. However, censors such as the Great Firewall of China (GFW) have started actively probing suspected proxies by connecting to them and attempting to communicate using their custom protocols [18]. If a suspected server responds to a known circumvention protocol, the censor can block them as confirmed proxies.

Active probing can be especially effective at detecting suspected proxies, because censors can discover new servers as they are used. Previous work has shown that China employs an extensive active probing architecture that is successful in blocking older circumvention protocols like vanilla Tor, obfs2 and obfs3 [18], [4], [45].

In response to this technique, circumventors have developed probe-resistant proxy protocols, such as obfs4 [6], Shadowsocks [2], and Lampshade [26], that try to prevent censors from using active probing to discover proxies. These protocols generally require that clients prove knowledge of a secret before the server will respond. The secret is distributed along with the server address out-of-band, for example through Tor's BridgeDB [38] or by email, and is unknown to the censor. Without the secret, censors will not be able to get the server to respond to their own active probes, making it difficult for the censor to confirm what protocol the server speaks.

In this paper, we investigate several popular probe-resistant proxy protocols in use today. Despite being designed to be probe-resistant, we discover new attacks that allow a censor to positively identify proxy servers using only a handful of specially-crafted probes. We compare how known proxy servers respond to our probes with the responses from legitimate servers on the Internet (collected from a passive network tap and active scanning), and find that we can easily distinguish between the two with negligible false positives in our dataset.

We analyze five popular proxy protocols: obfs4 [6] used by Tor, Lampshade [26] used in Lantern, Shadowsocks [2], MTProto [11] used in Telegram, and Obfuscated SSH [27] used in Psiphon. For each of these protocols, we find ways to actively probe servers of these protocols, and distinguish them from non-proxy hosts with a low false positive rate.

Our attacks center around the observation that never responding with data is unusual behavior on the Internet. By sending probes comprised of popular protocols as well as random data, we can elicit responses from nearly all endpoints (94%) in our Tap dataset of over 400k IP/port pairs. For endpoints that do not respond, we find TCP behavior such as timeouts and data thresholds that are unique to the proxies compared to other servers, which provide a viable attack for identifying all existing probe-resistant proxy implementations.

We use our dataset to discover the most common responses and TCP behavior of servers online, and use this data to inform how probe-resistant proxies can better blend in with other services, making them harder to block. We work with several proxy developers including Psiphon, Lantern, obfs4, and Outline Shadowsocks to deploy our suggested changes.
II. BACKGROUND

In this section, we describe the censor threat model, and provide background on the probe-resistant circumvention protocols we study, focusing on how each prevents censors from probing.

A. Censor Threat Model

We assume that censors can observe network traffic and can perform follow-up probes to servers they see other clients connecting to. The censor can also perform large active network scans (e.g. using ZMap [16]). We assume the censor knows and understands the circumvention protocols and can run local instances of the servers themselves, or can discover subsets of (but not all) real proxy servers through legitimate use of the proxy system.

In this paper, we focus on identifying potential proxies via active probing. Although censors may have other techniques for discovering and blocking proxies (e.g. passive analysis [42] or distribution system enumeration [14]), these attacks are beyond the scope of this paper. While successful circumvention systems must protect against these and other attacks, it is also necessary for the proxies to be resistant to active probes, which is the focus of our paper.

We assume that the censor does not know the secrets distributed out-of-band for every proxy server. Although the censor can use the distribution system to learn small subsets of proxies (and their secrets) and block them, full enumeration of the distribution system is out of scope.
B. Probe-resistant Protocols

Circumvention protocols must withstand two main threats to avoid being easily blocked by censors: passive detection and active probing. If a censor can passively detect a protocol and distinguish it from legitimate ones, the protocol can be blocked with minimal collateral damage. Circumvention protocols attempt to avoid passive identification by trying to blend in with other protocols or implementations [17], [32], [15], [22], or by creating randomized protocols with no discernible fingerprints [6], [26], [2], [47].

Alternatively, censors can identify or confirm suspected proxies by actively probing them. The Great Firewall of China (GFW) has previously been observed sending follow-up probes to suspected Tor bridges [41], [45], [18], [4]. These probes were designed to confirm whether a proxy is a Tor bridge (by attempting to use it to access Tor), and to block the IP if it is.

To combat this style of active identification, circumvention protocols now attempt to be probe-resistant. For instance, in obfs4, clients are given a secret key that allows them to connect to the provided server. Without this key, active probing censors cannot complete the initial handshake, and the obfs4 server will close the connection after a brief timeout.

We now describe each of the five proxy protocols we study. An overview of how each protocol attempts to thwart active probing is given in Table I.
1) obfs4: obfs4 [6] is a generic protocol obfuscation layer, frequently used as a Pluggable Transport [39] for Tor bridges. obfs4 proxies are distributed to users along with a corresponding 20-byte node ID and Curve25519 public key. Upon connecting to an obfs4 server, the client generates their own ephemeral Curve25519 key pair, and sends an initial message containing the client public key, random padding, and an HMAC over the server's public key, node ID, and client public key (encoded using Elligator [10] to be indistinguishable from random), with a second HMAC including a timestamp. The server only responds if the HMACs are valid. Since computing the HMACs requires knowledge of the (secret) server public key and node ID, censors cannot elicit a response from the obfs4 server. If a client sends invalid HMACs (e.g. a probing censor), the server closes the connection after a server-specific random delay.
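The proof-of-knowledge step can be illustrated with a minimal sketch that follows the simplified notation of Table I (K = server pubkey | NODEID, first message M | HMAC_K(M | timestamp)). It is not the obfs4 wire format: the function names, the use of HMAC-SHA256 truncated to 16 bytes, the byte-string timestamp, and the random bytes standing in for an Elligator-encoded public key are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

MAC_LEN = 16  # assumption: HMAC-SHA256 output truncated to 16 bytes


def _mac(key: bytes, data: bytes) -> bytes:
    """Truncated HMAC used for both the inner and the outer tag."""
    return hmac.new(key, data, hashlib.sha256).digest()[:MAC_LEN]


def build_first_message(server_pubkey: bytes, node_id: bytes,
                        client_repr: bytes, padding: bytes,
                        timestamp: bytes) -> bytes:
    """Client side: M | HMAC_K(M | timestamp), with K = server pubkey | NODEID."""
    key = server_pubkey + node_id
    m = client_repr + padding + _mac(key, client_repr)
    return m + _mac(key, m + timestamp)


def server_accepts(server_pubkey: bytes, node_id: bytes,
                   message: bytes, timestamp: bytes) -> bool:
    """Server side: stay silent unless the trailing tag verifies."""
    if len(message) <= MAC_LEN:
        return False
    key = server_pubkey + node_id
    m, tag = message[:-MAC_LEN], message[-MAC_LEN:]
    return hmac.compare_digest(tag, _mac(key, m + timestamp))
```

A prober that lacks the server public key and node ID can only send bytes for which `server_accepts` returns False, which is the case the rest of the paper studies.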
2) Lampshade: Lampshade [26] is a protocol used by the Lantern censorship circumvention system, and uses RSA to encrypt an initial message from the client. Clients are given the server's RSA public key out of band, and clients use it to encrypt the initial plaintext message containing a 256-bit random seed, the protocol versions, and ciphers that the client supports. Subsequent messages are encrypted/decrypted using the specified cipher (usually AES-GCM) and the client-chosen seed, with prepended payload lengths encrypted using ChaCha20. Since the censor does not know a server's RSA public key, they will be unable to send a valid initial message with a known seed, and any subsequent messages will be ignored. If the server fails to decrypt the expected 256-byte ciphertext (based on an expected magic value in the plaintext), it will close the connection.
3) Shadowsocks: Shadowsocks is a popular protocol developed and used in China, and has many different interoperable implementations available [25], [50], [13]. Shadowsocks does not host or distribute servers itself. Instead, users must launch their own private proxy instances on cloud providers, and configure them with a user-chosen password and timeout. Users can then tunnel their traffic to their private proxy using the Shadowsocks protocol.

Shadowsocks clients generate a key from a secret password and random salt, and then send the random salt and encrypted payload to the server using an authenticated (AEAD) cipher. A censor without the secret password will not be able to generate valid authenticated ciphertexts, and their invalid ciphertexts will be ignored by the server.

There are numerous implementations of the Shadowsocks protocol [2]; our analysis covers two: the event-driven Shadowsocks implementation in Python [13], and Jigsaw's Outline [25]. We refer to these as shadowsocks-python and shadowsocks-outline respectively. In AEAD mode, both server implementations wait to receive 50 bytes from the client, and if the AEAD tag is invalid, the server closes the connection.
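To make the 50-byte figure concrete, the following sketch derives a per-connection subkey in the simplified form used in Table I (K = HKDF(password, salt, "ss-subkey")) and adds up the fields a server needs before it can check the first AEAD tag. The helper names are hypothetical, the 32-byte salt and 16-byte tag are assumptions that match the chacha20-ietf-poly1305 configuration, and feeding the password straight into HKDF follows the table's notation rather than any particular implementation, which may first stretch the password into a master key.

```python
import hashlib
import hmac

SALT_LEN = 32    # assumption: salt length for a 32-byte-key AEAD cipher
LENGTH_LEN = 2   # encrypted 2-byte payload length
TAG_LEN = 16     # assumption: AEAD authentication tag length


def hkdf_sha1(key: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (extract-then-expand, RFC 5869) instantiated with HMAC-SHA1."""
    prk = hmac.new(salt, key, hashlib.sha1).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha1).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_subkey(password: bytes, salt: bytes) -> bytes:
    """Per-connection key, following the simplified form in Table I."""
    return hkdf_sha1(password, salt, b"ss-subkey", 32)


def first_read_length() -> int:
    """Bytes needed before the first tag can be verified."""
    return SALT_LEN + LENGTH_LEN + TAG_LEN
```

Under these assumptions `first_read_length()` is 32 + 2 + 16 = 50, which is why a probe shorter than 50 bytes leaves the server waiting while a 50-byte probe of random data triggers an immediate tag failure.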
4) MTProto: MTProto [1] is a proprietary protocol designed and used by the Telegram [3] secure messaging app. In countries that block the service, Telegram employs the MTProto Proxy protocol, which only adds obfuscation, and will be referred to as simply MTProto for the remainder of the paper. MTProto derives a secret key from the hash of a random seed and a secret distributed out-of-band. The key is then used to encrypt a 4-byte magic value using AES-CTR. If the server does not decrypt the client's first message to the expected magic value, it will not respond. Since the censor does not know the secret, they will be unable to construct a ciphertext that decrypts to the proper 4-byte value. Upon handshake failure, the server keeps the connection open forever but does not respond.

There are many unofficial implementations of MTProto servers, including versions in Python [11], Golang [36], and NodeJS [20], but we only investigate servers included with the Telegram Android application by default, which are more likely to be an official implementation.

TABLE I: Probe-resistant Protocols. In this table we list the first message sent by the client in the probe-resistant proxy protocols we study, and the server's corresponding parsing behavior. Blue text denotes secrets distributed to the client out-of-band. Servers typically close the connection after they receive an invalid handshake message; however, precisely when a server closes a failed connection can reveal a unique fingerprint that could be used to distinguish them from other services.

Protocol: obfs4 [6]
Client's first message: K = server pubkey | NODEID; M = client pubkey | padding | HMAC_K(client pubkey); send: M | HMAC_K(M | timestamp)
Server behavior: Reads 8192 bytes (max handshake padding length). If no valid HMAC is found, server reads a random 0-8192 additional bytes, then closes the connection.

Protocol: Lampshade [26]
Client's first message: send: RSAencrypt(server pubkey, ... | seed)
Server behavior: Reads 256 bytes (corresponding to RSA ciphertext length) and attempts to decrypt. If it fails, the server closes the connection.

Protocol: Shadowsocks [2]
Client's first message: K = HKDF(password, salt, "ss-subkey"); send: salt | AEAD_K(payload len) | AEAD_K(payload)
Server behavior: Reads 50 bytes (corresponding to salt, 2-byte length, and AEAD tag). If the tag is invalid, server closes the connection.

Protocol: MTProto Proxy [1]
Client's first message: K = sha256(seed | secret); send: seed | IV | E_K(magic)
Server behavior: Does not close the connection on invalid handshake.

Protocol: OSSH [27]
Client's first message: K = SHA1^1000(seed | secret); send: seed | E_K(magic | payload)
Server behavior: Reads 24 bytes, and closes the connection if the expected magic value is not decrypted.
5) Obfuscated SSH: Obfuscated SSH [27] (OSSH) is a simple protocol that wraps the SSH protocol in a layer of encryption, obfuscating its easily-identified headers. Clients send a 16-byte seed, which is hashed along with a secret keyword thousands of times using SHA-1 to derive a key. The key is used to encrypt a 4-byte magic value along with any payload data (i.e. SSH traffic) using RC4. The server re-derives the key from the seed and secret keyword and decrypts to verify the magic value. We specifically looked at Psiphon's OSSH implementation [35], which uses a 32-byte secret keyword distributed to clients. Without knowing the keyword, censors cannot create valid ciphertext.¹ If an OSSH server receives an invalid first message, it closes the connection.
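The OSSH check can be sketched in a few lines, following the Table I form K = SHA1^1000(seed | secret) and send: seed | E_K(magic | payload). The function names, the 16-byte key truncation, the way the hash is iterated, and the particular magic constant are assumptions for illustration; only the overall structure (iterated SHA-1, RC4, a 4-byte magic value) comes from the description above.

```python
import hashlib

MAGIC = bytes.fromhex("0bf5ca7e")  # assumption: illustrative 4-byte magic value
SEED_LEN = 16


def derive_key(seed: bytes, keyword: bytes, iterations: int = 1000) -> bytes:
    """Iterated SHA-1 over seed | keyword (iteration scheme is an assumption)."""
    digest = hashlib.sha1(seed + keyword).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha1(digest).digest()
    return digest[:16]


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def build_first_message(seed: bytes, keyword: bytes, payload: bytes) -> bytes:
    """Client side: seed | E_K(magic | payload)."""
    return seed + rc4(derive_key(seed, keyword), MAGIC + payload)


def server_accepts(keyword: bytes, message: bytes) -> bool:
    """Server side: re-derive the key and compare the decrypted magic value."""
    if len(message) < SEED_LEN + len(MAGIC):
        return False
    seed = message[:SEED_LEN]
    ciphertext = message[SEED_LEN:SEED_LEN + len(MAGIC)]
    return rc4(derive_key(seed, keyword), ciphertext) == MAGIC
```

Because only 4 bytes are compared, a uniformly random probe passes this check with probability 2^-32, roughly 1 in 4.3 billion.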
III. PROBE DESIGN

To detect probe-resistant proxy servers, we consider what each protocol has in common. Specifically, each protocol requires knowledge of a secret that is proven cryptographically. If the client does not know the secret, the server will simply not respond, and eventually close the connection.

We start with the intuition that such non-response is uncommon on the Internet, especially to a large corpus of probes in popular protocols. For instance, it's trivial to distinguish between all HTTP servers and probe-resistant proxies, as HTTP servers will respond to HTTP requests while the proxies will not.

Following this intuition, we create several protocol-specific probes for well-known protocols unrelated to censorship circumvention, and send them to a large sample of Internet hosts to see how they respond. If most or all of the hosts respond, or otherwise close the connection in a way distinguishable from the proxy servers we study, then a censor could use this as an effective strategy for identifying proxy servers and distinguishing them from legitimate hosts.

¹ We note that a 4-byte magic value may be insufficient to validate knowledge of the keyword; 1 in 4 billion random ciphertexts may decrypt to contain any 4-byte value, though this is likely infeasible for censors to practically use.
A. Basic Probes

Our probes are data that we send to a server after connecting over TCP. We limit our probes to TCP as all of the proxy protocols we study use TCP, though our techniques could be expanded to UDP. We started with 6 basic probes: HTTP, TLS, MODBUS, S7, random bytes, and an empty probe that sends no data after connecting. For each probe, we record how a server responds in terms of the data (if any) it replies with, the time at which it closes the connection (if it does), and how it closes the connection (TCP FIN or RST). We briefly describe each of our initial probes.

a) HTTP: For HTTP, we send a simple HTTP/1.1 GET request with a host header of example.com. As HTTP remains one of the most popular protocols on the Internet, we expect many servers will respond with HTTP responses, redirects, or error pages. We note that, as with any protocol-specific probe, even servers that are not HTTP may respond to this probe with an error message native to their protocol.

b) TLS: We send a TLS Client Hello message that is typically generated by the Chromium version 71 browser. This includes popular cipher suites and extensions, though we note that even if there is no mutual support between an actual TLS server and our Client Hello, the server should respond with a TLS Alert message, allowing us to distinguish it from silent proxy servers.

c) Modbus: Modbus is commonly used by programmable logic controllers (PLC) and other embedded devices in supervisory control and data acquisition (SCADA) environments. We used a probe defined in ZGrab [51] that sends a 3-byte command that requests a device info descriptor from the remote host.

d) S7: S7 is a proprietary protocol used by Siemens PLC devices. We again used the probe defined in ZGrab [51], which sends a request for device identifier over the COTP/S7 protocol.

e) Random bytes: We also send several probes with differing amounts of random bytes, with the hope that servers that attempt to parse this data will fail and respond with an error message or close the connection in a way that distinguishes them from proxy servers.

f) Empty probe: We also have a specific "probe" that sends no data after connecting. Some protocols (such as SSH) have the server communicate first (or simultaneously) with the client. For other protocols, implementations may have different timeouts for when the client has sent some initial data compared to when no data is sent.

We initially probed a small sample of about 50,000 server endpoints collected from our passive tap (see Section III-B) with these initial 6 probes, and compared how they responded (data, connection close type and timing) with how instances of our proxy servers responded. With only these probes, there were still hundreds of servers that responded identically to the obfs4 servers we probed.
After manual analysis of these servers, we added two additional probes based on the types of servers we identified:

g) DNS AXFR: Although DNS requests are typically done over UDP, DNS zone transfers are carried out over TCP. We identified several hosts in our initial sample that appeared to be DNS servers based on our manual probing using nmap [29]. To detect these using our probes, we crafted a DNS AXFR (zone transfer) query probe based on the DNS specification [31].

h) STUN: We discovered several endpoints on TCP port 5004 that we were unable to directly identify. We found many of the hosts also had port 443 open and responded with TLS self-signed certificates that suggested they were Cisco WebEx devices. While this confirmed these were unlikely to be proxy servers, we were also able to find a more direct way to identify these hosts. We used our passive network tap to look at data sent over TCP port 5004, and used the data we collected to help us identify what protocol these devices support and how to generate a probe for them. Although an uncommon port (we only saw 4 data-carrying packets over several days of collection on a 10 Gbps tap), we were able to identify this port as supporting Session Traversal Utilities for NAT (STUN) protocols based on a magic cookie value included in the data. With this knowledge, we implemented a STUN probe based on the ZGrab Golang library, which we confirmed elicits responses from these remaining hosts, allowing us to directly distinguish them from proxies.
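The per-probe measurement described in this section (response data, close type, and close time) can be sketched as a small helper around a connected TCP socket. This is a simplified illustration, not the tooling used for the measurements: the names `run_probe`, `probe_endpoint`, and `HTTP_PROBE` are hypothetical, and mapping a zero-length read to FIN and a connection-reset error to RST is an assumption about how the operating system surfaces the two close types to an application.

```python
import socket
import time

# HTTP probe as described above: a GET request with a Host header of example.com.
HTTP_PROBE = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"


def run_probe(sock: socket.socket, payload: bytes, timeout: float = 300.0):
    """Send payload and return (close_type, bytes_received, whole_seconds)."""
    start = time.monotonic()
    received = 0
    try:
        sock.settimeout(timeout)
        if payload:
            sock.sendall(payload)
        while True:
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                raise socket.timeout()
            sock.settimeout(remaining)
            chunk = sock.recv(4096)
            if not chunk:            # orderly shutdown by the peer
                close_type = "FIN"
                break
            received += len(chunk)
    except socket.timeout:           # peer neither replied further nor closed
        close_type = "TIMEOUT"
    except (ConnectionResetError, BrokenPipeError):
        close_type = "RST"
    return close_type, received, int(time.monotonic() - start)


def probe_endpoint(host: str, port: int, payload: bytes, timeout: float = 300.0):
    """Open a fresh TCP connection for a single probe."""
    with socket.create_connection((host, port), timeout=10) as sock:
        return run_probe(sock, payload, timeout)
```

Calling `probe_endpoint(host, port, b"")` gives the empty probe, and passing random bytes of different lengths gives the random-data probes; each probe uses its own connection so that one probe's outcome does not affect the next.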
B. Comparison Dataset

We create a dataset composed of known proxy endpoints (IP/port pairs) and common (non-proxy) endpoints. For each proxy we investigate, we collect a sample set of active endpoints from its respective proxy distribution system (e.g. BridgeDB), by contacting the developers, or by running our own instance. We collect over 20 obfs4 proxy endpoints from Tor's BridgeDB, and receive 3 Lampshade proxies from Lantern developers. We obtain 3 OSSH proxy endpoints from Psiphon developers, and discover 3 endpoints of MTProto by using the Telegram application. As Shadowsocks is designed for users to run their own proxies, we set up our own shadowsocks-python instance (configured using the chacha20-ietf-poly1305 cipher) and received an address of shadowsocks-outline from developers.
Gathering a realistic "common" (non-proxy) endpoints dataset is trickier. Ideally, we want a large set of endpoints that contains a diverse set of non-proxy hosts. We could potentially create our own set of endpoints by running known implementations of non-proxy services ourselves, such as popular web servers, mail servers, and other network services. But this would fail to capture proprietary servers as well as the long tail of obscure servers that are present in the real world.

Instead, we collect likely non-proxy endpoints from two sources: active network scans using ZMap [16], and passive collection of netflow data. We use ZMap to send a SYN packet to 20,000 hosts on every TCP port (0-65535) for a total of 1.3 billion probes. We discover 1.5 million endpoints that responded with SYN-ACKs, which we label as our ZMap dataset.

For our passive dataset, we collect endpoints by sampling netflow data from a 10 Gbps router at the University of Colorado. Our intuition is that due to its network position in a country that does not censor its Internet, the vast majority of traffic seen from this vantage point will not contain proxies. This ISP's users have little motivation to use censorship circumvention proxies, so endpoints collected here are likely to be predominantly other services. Moreover, these hosts are more representative of useful services, as opposed to endpoints in the ZMap scan that may not have any actual clients connect to them beside our scanner.

Over a 3-day time span, we collected over 550,000 unique server IP/port endpoints from our ISP that we observed sending data, and sent follow-up connections from our scanning server. Of these, 433,286 (79%) hosts accepted our connection (responded with a TCP SYN-ACK), with the remaining majority simply timing out during the attempted connection. We believe this response rate can be explained by two reasons. First, our follow-up scans occurred up to several days after the connections were observed, and some servers could have moved IPs or been taken offline in the meantime. Second, servers might be configured with firewalls that only allow access from certain IPs, potentially blocking our ZMap scanning host. Nonetheless, we are still able to capture over 400,000 unique endpoints in this dataset that we know are servers that have been observed sending data.

Both datasets might still contain some amount of actual proxy endpoints in them, which we investigate further in Section V-D. Nonetheless, these two datasets provide a diverse list of common endpoints for us to compare with our limited proxies.
IV. IDENTIFYING PROXIES

A censor's goal is to find ways to differentiate proxy servers from other benign servers on the Internet, in order to block proxies without blocking other services. In this section, we discuss techniques for differentiating servers from one another for the purpose of uniquely identifying proxy servers.

At a high level, our goal is to identify ways to evoke unique responses from proxy servers in comparison to non-proxy servers. If we are able to get a proxy server implementation to respond in a distinct way from every other server on the Internet, censors could use their unique responses to identify and block proxies. We identify three critical features in the ways that servers respond to probes that can be used to fingerprint modern proxies: response data, connection timeouts, and close thresholds, which we detail next.

A. Response Data

Most servers will respond with some kind of data when sent a probe. For instance, HTTP servers will respond to our HTTP probe, but many other protocols will respond with error messages or information when they receive application layer data that they do not understand.

On the flip side, we observe that none of our proxy servers respond with any data for any of the probes. This is due to the proxy strategy of remaining silent unless the client proves knowledge of the secret, which a probing censor is unable to do. However, if the censor has a set of probes that can coax at least one response from all hosts that aren't proxies, then non-response could inform a censor that the host is a proxy and safe to block. Proxies might try to provide some form of response, though this likely commits the proxy to a particular protocol, which has been shown to be a difficult strategy to employ correctly [24].

Modern probe-resistant proxies never respond with data to any of our probes, so we can easily mark any servers that respond with data to any of our probes as a non-proxy endpoint.
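This first filter is simple enough to state as code. The sketch below assumes a hypothetical data layout in which each endpoint maps to the number of response bytes it returned for each probe; the function name and the layout are illustrative and not taken from the measurement tooling.

```python
def possible_proxies(bytes_received):
    """Keep only endpoints that returned zero bytes to every probe.

    bytes_received maps an (ip, port) endpoint to a dict of
    probe name -> number of response bytes observed.
    """
    return sorted(
        endpoint
        for endpoint, per_probe in bytes_received.items()
        if all(count == 0 for count in per_probe.values())
    )


observations = {
    ("192.0.2.10", 443): {"http": 0, "tls": 7, "empty": 0},   # TLS alert
    ("192.0.2.11", 8443): {"http": 0, "tls": 0, "empty": 0},  # silent
}
candidates = possible_proxies(observations)
```

Here only the silent endpoint remains a candidate; the endpoint that answered the TLS probe is ruled out regardless of how it behaved on the other probes.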
B. Timeouts

Even if a server does not reply with any data, it might still close the connection in unique ways. For example, some servers have an application-specified timeout, after which they will close a connection. Timeouts might also be different depending on the state that a server is in. For instance, a server might time out a connection after 10 seconds if it hasn't received any data, but time out after 60 seconds if it has received some initial data from the client.

Servers might also differ in the way that they close the connection after the timeout. TCP sessions end after either a TCP FIN or TCP RST packet is sent, as determined by the interactions between the application and the underlying operating system.
C. Buffer Thresholds

Finally, we discover another implementation-specific behavior that can distinguish endpoints even if they have identical timeouts and never respond with data. Consider a server that reads N bytes from the client, and attempts to parse it as a protocol header. If the parsing fails (e.g. invalid fields, checksums or MACs), the server may simply close the connection. However, if the client sends only N - 1 bytes, the server might keep the connection open and wait for additional data before it attempts to parse.

Fig. 1: TCP Threshold. Many TCP server applications close the connection after receiving a certain threshold of bytes from the client (e.g. protocol headers or expected payload chunks) when they fail to parse them. Servers that close a connection after a threshold will send a FIN packet. However, if the application has not read (by calling recv) all of the data from the OS's connection buffer when it calls close, the server will send a RST instead. These thresholds can be learned remotely (via binary search using random/invalid data) and used as a way to identify or fingerprint server implementations.
We term such a data limit the close threshold of a server. If a client (or probing censor) sends less than this number of bytes, the server will wait, even if those bytes are random and non-protocol compliant. However, as soon as the client sends data beyond the threshold limit, the server will attempt to parse the data (and likely fail in the case of random data), and close the connection with either FIN or RST.

Not every server implementation has a close threshold. Some implementations may instead read from the connection forever after an error is encountered, or only close the connection after a data-independent timeout. Nonetheless, servers that do have a threshold provide an additional way for censors to identify and distinguish them from other servers and implementations.

We also discover a second type of identifying threshold for some servers that close the connection after reading a threshold number of bytes, which we confirmed happens in a typical Linux application. When a program closes a TCP connection, the operating system normally sends a FIN packet and completes a 4-way closing handshake with the remote end. However, in certain cases the connection will be closed with a RST packet instead. On Linux, we find if there is any unread data in the connection buffer, the operating system will send a RST packet.

We define FIN threshold and RST threshold as the amount of bytes needed to be sent to a server in order to specifically trigger the FIN or RST, while close threshold refers to whichever occurs first (lower number of bytes).

Figure 1 illustrates the FIN and RST thresholds and how they commonly relate. Data sent up to the FIN threshold will cause the server to keep the connection open, while data sent beyond that will cause the server to close the connection with a FIN. If enough data is sent to exceed the RST threshold, the server will close with a RST.
These limits are due to application-specific behavior. While prior work has demonstrated ways to measure TCP behavior of the underlying OS (e.g. via congestion control [34]), to the best of our knowledge, we are the first to identify and measure these application-specific thresholds in TCP programs.

TABLE II: Proxy timeouts and thresholds. We measured how long each probe-resistant proxy lets a connection stay open before it times out and closes it (Connection Timeout), and how many bytes we can send to a proxy before it immediately closes the connection with a FIN or RST. obfs4's RST Threshold is the next value after the FIN threshold that is divisible by 1448.

Proxy                 Connection Timeout   FIN Threshold   RST Threshold
obfs4                 60-180s              8-16 KB         next mod 1448
Lampshade             90/135s              256 bytes       257 bytes
shadowsocks-python    configurable         50 bytes        -
shadowsocks-outline   configurable         50 bytes        51 bytes
MTProto               -                    -               -
OSSH                  30s                  24 bytes        25 bytes
As an example, obfs4 has a randomized close threshold between 8192 and 16384 bytes. Since the obfs4 handshake can be up to 8192 bytes in length, the server reads this many bytes before determining the client is invalid, and entering a closeAfterDelay function. This function either closes the connection after a random delay (30-90 seconds) or after the server has read an additional N bytes, for N chosen randomly between 0 and 8192 at server startup. However, each of these reads is done using a 1448-byte buffer. This means that obfs4 servers will send a FIN if they are sent 8192 + N bytes, and send a RST if they are sent 8192 + N - ((8192 + N) mod 1448) + 1448 bytes, a behavior that appears to be unique to obfs4.
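The two obfs4 thresholds follow directly from the formulas above, and a short calculation makes the relationship concrete. The function and constant names are hypothetical; the arithmetic is exactly the 8192 + N and "next multiple of 1448" rule stated in the text.

```python
MAX_HANDSHAKE = 8192   # bytes read before the handshake is declared invalid
READ_SIZE = 1448       # size of each subsequent read


def obfs4_thresholds(n: int):
    """Return (FIN threshold, RST threshold) for a server-specific N."""
    if not 0 <= n <= MAX_HANDSHAKE:
        raise ValueError("N is chosen between 0 and 8192")
    fin = MAX_HANDSHAKE + n
    # Next value strictly greater than the FIN threshold divisible by 1448.
    rst = fin - (fin % READ_SIZE) + READ_SIZE
    return fin, rst


low = obfs4_thresholds(0)       # (8192, 8688)
high = obfs4_thresholds(8192)   # (16384, 17376)
```

The largest possible RST threshold is 17376 bytes, so a random-data probe of a little over 17 KB is enough to exceed the RST threshold of any obfs4 server under this model.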
Table II shows the timeouts and thresholds for the probe-resistant proxies we study.

1) Threshold Detector: We developed a tool to binary search for the close and RST threshold for a given endpoint. Our tool starts by connecting and sending 4,096 bytes of random data, and seeing if the connection is closed (FIN or RST) within 3 seconds. If it is, we make subsequent connections, halving the number of bytes we send until we reach a value that the server does not close the connection within a second. If even the original 4,096 byte probe does not result in a close, we reconnect and send twice the amount of data (up to 1 MB) until we find a close. Once we have identified an upper and lower bound for a given server, we binary search between these bounds to obtain the exact value of the close threshold.

Once a close threshold is found, we reconfirm using two follow-up probes: one immediately below and one immediately above the found threshold. We use 3 different seeds to generate the random data in those confirmation probes. If the behavior is identical with all 3 seeds, we mark it as having a stable close threshold. Otherwise, we mark the endpoint as unstable and ignore the threshold value.
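The search strategy can be sketched independently of the networking code by treating the endpoint as an oracle. In the sketch below, `closes(n)` is a hypothetical callable that opens a fresh connection, sends n bytes of random data, and reports whether the server closed the connection within the wait window; `closes_with_seed(n, seed)` is the same with a chosen random seed. Treating the threshold as the smallest byte count that triggers a close, and checking the values threshold - 1 and threshold for stability, are assumptions about details the description leaves open.

```python
def find_close_threshold(closes, start=4096, limit=1 << 20):
    """Return the smallest byte count that triggers a close, or None."""
    if closes(start):
        # Halve until a size that no longer triggers a close (or zero).
        hi, lo = start, start // 2
        while lo > 0 and closes(lo):
            hi, lo = lo, lo // 2
    else:
        # Double (up to the limit) until a size that triggers a close.
        lo, hi = start, start * 2
        while hi <= limit and not closes(hi):
            lo, hi = hi, hi * 2
        if hi > limit:
            return None
    # Invariant: hi bytes trigger a close, lo bytes do not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if closes(mid):
            hi = mid
        else:
            lo = mid
    return hi


def is_stable(closes_with_seed, threshold, seeds=(1, 2, 3)):
    """Re-check just below and at the threshold with several random seeds."""
    return all(
        closes_with_seed(threshold, seed)
        and not closes_with_seed(threshold - 1, seed)
        for seed in seeds
    )
```

Each oracle call costs one connection, so a threshold below 1 MB is located in a few dozen connections at most, before the confirmation probes.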
V. EVALUATION

To evaluate our competency at distinguishing proxies from other servers, we sent our probes to over 1.9 million endpoints contained in our "common servers" dataset, and observed the ways in which their responses differ from those of known proxies. As a reminder, this dataset contains over 500,000 endpoints observed at our passive ISP tap, and approximately 1.5 million endpoints collected using ZMap on random IPs and ports.

TABLE III: Percent of endpoints responding with data. For both our ISP passive (Tap) and active (ZMap) datasets, we report the percent of endpoints that respond to each of our probes, as well as the percent that responded to at least one (Any).

Probe      Tap      ZMap
TLS        87.8%    0.90%
HTTP       64.6%    0.95%
STUN       52.5%    0.56%
Empty      8.4%     0.23%
S7         56.9%    0.66%
Modbus     51.4%    0.54%
DNS-AXFR   58.8%    0.67%
Any        94.0%    1.16%

Fig. 2: Response set CDF. We group endpoints based on how they respond to our 7 non-random probes, capturing the number of bytes the endpoint replies with, how long it keeps the connection open (binned to seconds), and how it closed the connection. Over 42% of endpoints respond the same (timeout after 300 seconds) in our ZMap dataset, which we identify as a common firewall behavior. Despite being smaller, the Tap dataset is much more diverse (129k vs 31k unique response sets).
We send 13 probes to each endpoint (our 7 probes from Section III-A and 6 probes with random data ranging from 23 bytes to over 17 KB) and record if and when the server responds with data or closes the connection. If the server closes, we record if it used a FIN or RST. If the server does not respond or close the connection, we time out after 300 seconds and mark the result as a TIMEOUT. In addition to sending probes, we also use our threshold detector on each endpoint, recording their close thresholds.

Table III shows each of our (non-random) probes, and the percent of endpoints in each of our two "common endpoints" datasets that respond to the probes with data. Since probe-resistant proxies never respond with data, we can immediately discard endpoints that reply to any of our probes as non-proxies. In our passive Tap dataset, this rules out 94% of hosts, leaving only 26,021 potential proxies. On the other hand, in our ZMap dataset, the overwhelming majority of hosts do not respond with data to any probes, allowing us to discard only 1.2% of endpoints based solely on response data.

We find a significant reason for this discrepancy is that ZMap identifies a large number of firewalls that employ common "chaff" strategies, where they respond to every SYN they receive on all ports and IPs in their particular subnet [48]. We observe over 42% of endpoints in our ZMap dataset behave identically by never sending data and never closing the connection (up to our 300 second timeout) to our probes.²

To understand the diversity of responses, we clustered responses from endpoints by constructing response sets that capture the response type (FIN, RST, timeout), number of bytes they respond with (possibly 0), and the time of connection close or timeout (binned to integer seconds) for each non-random probe we send. If two endpoints have identical response sets (for instance they both respond to probe-A with a FIN after 3 seconds and a 10-byte response, and probe-B with a RST after 9 seconds with no data, etc), we say they are in the same response group. In this clustering, we ignore any actual content and only compare lengths if any response data is sent.
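The grouping step can be sketched as follows. The data layout (a per-endpoint dict mapping probe name to a tuple of close type, response length, and elapsed seconds) and the function names are hypothetical; the binning to integer seconds and the use of response length rather than content follow the description above.

```python
from collections import defaultdict


def response_set(results, probe_order):
    """Build a hashable summary: (close type, byte count, whole seconds) per probe."""
    return tuple(
        (results[probe][0], results[probe][1], int(results[probe][2]))
        for probe in probe_order
    )


def group_endpoints(observations, probe_order):
    """Map each distinct response set to the list of endpoints that share it."""
    groups = defaultdict(list)
    for endpoint, results in observations.items():
        groups[response_set(results, probe_order)].append(endpoint)
    return dict(groups)


probes = ["probe-A", "probe-B"]
observations = {
    "198.51.100.1:443": {"probe-A": ("FIN", 10, 3.2), "probe-B": ("RST", 0, 9.7)},
    "198.51.100.2:443": {"probe-A": ("FIN", 10, 3.9), "probe-B": ("RST", 0, 9.1)},
    "198.51.100.3:22": {"probe-A": ("TIMEOUT", 0, 300.0), "probe-B": ("TIMEOUT", 0, 300.0)},
}
groups = group_endpoints(observations, probes)
```

In this toy input the first two endpoints fall into the same response group because their timings agree once binned to whole seconds, while the third forms its own group.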
The most popular response group in our Tap dataset comprises only 3.0% of endpoints, and appears to be TLS servers (99.9% are on endpoints with port 443) in Cloudflare's network. These hosts respond to our TLS probe with a handshake alert, due to the lack of Server Name Indication (SNI) in our client hello probe.

Figure 2 shows a CDF of unique response sets (sorted by popularity) for our ZMap and Tap dataset, and illustrates the larger diversity in our Tap dataset. The top 10 response sets comprise over 80% of endpoints in the ZMap dataset, but only 13% of endpoints collected at our Tap.
A. Designing a decision tree

We now turn to distinguishing proxies from the remaining set of common non-proxy endpoints that did not respond with any data.
A natural choice for distinguishing proxies from common hosts is to use machine learning to automate the synthesis of a classifier model.
We evaluated automatically generated decision trees in Appendix A, but did not find them to provide higher accuracy, and found that they require more manual labor than the decision trees we build manually.
In this section we instead focus on manually creating a decision tree that can distinguish proxies from common endpoints, as the resulting trees are easier to interpret and are as effective as the automatically generated ones.

For each proxy protocol, we manually created a decision tree from analysis of the proxy's source code and its observed behavior in response to our probes.
Since none of the proxies send data in response to any of our probes, the first decision layer is to mark endpoints that respond with any data as non-proxies.
As shown in Table III, this eliminates 94% of endpoints in our Tap dataset, but only 1.2% of endpoints in our ZMap dataset.
The next layers of each decision tree are protocol specific, and we describe each in Figures 3-8.
In each of these figures, each box lists a set of probes at the top and the expected response at the bottom.
Each expected response is in the form of a close type (FIN, RST, or TIMEOUT) and a time bound (in integer seconds).
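One way to read these figures is as a list of boxes that must all match. The sketch below encodes that reading under stated assumptions: the `Box` and `Response` types, the probe names, and the numeric bounds are hypothetical and are not taken from Figures 3-8, and the time bound is modelled as an inclusive range of integer seconds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Response:
    close_type: str       # "FIN", "RST", or "TIMEOUT"
    seconds: int          # close or timeout time, in integer seconds
    data_bytes: int = 0   # bytes of response data received

@dataclass(frozen=True)
class Box:
    probes: tuple         # probes this box applies to
    close_type: str       # expected close type
    min_seconds: int      # inclusive lower time bound (assumed form)
    max_seconds: int      # inclusive upper time bound (assumed form)

def matches(tree, responses):
    """Return True if an endpoint's responses satisfy every box in the tree."""
    # First decision layer: any data response marks the endpoint as a non-proxy.
    if any(r.data_bytes > 0 for r in responses.values()):
        return False
    for box in tree:
        for probe in box.probes:
            r = responses.get(probe)
            if r is None or r.close_type != box.close_type:
                return False
            if not (box.min_seconds <= r.seconds <= box.max_seconds):
                return False
    return True

# Hypothetical two-box tree, for illustration only.
tree = [
    Box(probes=("probe-A",), close_type="RST", min_seconds=0, max_seconds=1),
    Box(probes=("probe-B",), close_type="TIMEOUT", min_seconds=60, max_seconds=180),
]
candidate = {"probe-A": Response("RST", 0), "probe-B": Response("TIMEOUT", 90)}
```

Here `matches(tree, candidate)` is true, while an endpoint that returns any data, or closes with a different type or outside the bound for any listed probe, is rejected.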
For example, Figure 3 shows the decision [...]
² Excluding our random data probes rand-17k [...] RST and [...]
We are unable to confirm whether these endpoints are running obfs4 servers, but conclude that it is unlikely given the RST threshold results and their location inside a censored country, where they are unlikely to be useful for censorship circumvention.

Lampshade - For Lampshade, only one endpoint was identified in our ZMap dataset.
This endpoint did not have a stable close threshold, and is therefore not a Lampshade instance.
It is running on a host that also serves a login page for Traccar, an open source tool for interfacing with GPS trackers.

Shadowsocks - Our decision tree identifies 8 endpoints in our ZMap dataset as shadowsocks-python, all of which have a 50-byte FIN threshold, suggestive of the proxy.
We performed manual follow-up scans, and found that all but one of these endpoints also run SSH on the same host, with the same version banner.
These hosts are scattered around the world in various hosting networks, though none are in censored countries.
We cannot conclude for certain that these are all shadowsocks servers, but given the threshold results, their similarity, and their network locations, we believe they likely are.

If we extrapolate from our small ZMap scan to the rest of the Internet, we would estimate that there are on the order of 1 million shadowsocks-python endpoints running worldwide.
Upon further investigation of the identified shadowsocks endpoints, 5 are in the same hosting provider (xTom) and each has a distinct set of 700 sequential TCP ports open that all exhibit identical behavior consistent with shadowsocks.
For example, one IP has TCP ports 30000–30699 open, all with seemingly identical behavior.
If 5 out of every 8 shadowsocks-python servers had 700 ports open in this manner (and the others had only a single port), our 1 million shadowsocks-python endpoints would extrapolate to about 2285 shadowsocks servers (unique IPs) worldwide.
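The extrapolation from endpoints to unique servers is a division by the mean number of open ports per server. The sketch below spells this out; the function name and parameters are illustrative, and the inputs are the figures quoted in the text.

```python
def estimate_servers(total_endpoints, multi_port_fraction,
                     ports_per_multi_server, ports_per_single_server=1):
    """Estimate unique servers from an endpoint count.

    A fraction of servers exposes many ports; the rest expose
    ports_per_single_server each. The endpoint total is divided by
    the mean number of ports per server.
    """
    mean_ports = (multi_port_fraction * ports_per_multi_server
                  + (1 - multi_port_fraction) * ports_per_single_server)
    return total_endpoints / mean_ports

# 1 million endpoints, 5 of every 8 servers with 700 ports, the rest with 1.
servers = estimate_servers(1_000_000, 5 / 8, 700)
```

With these inputs the mean is 437.875 ports per server, giving roughly 2,284 servers. Ignoring the single-port servers' own ports (a mean of 437.5) gives about 2,286, so both variants agree with the "about 2285" figure quoted above.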
Fig. 10: Connection Timeouts. For the subset of servers that never responded to any probe within 300 s, we performed a follow-up scan allowing the connection to stay open for an extended period of time to identify a fair representative of an infinite timeout. The results after 300 s are dominated by client timeout, suggesting that this is a reasonable approximation of unlimited timeout. The strategy employed by MTProto, to never respond and wait for clients to time out, is well represented in common server behavior.

Fig. 11: Close Thresholds. We measured the close threshold of over 400,000 server endpoints observed at our ISP. With the exception of MTProto, most proxy thresholds are relatively unique. We show the thresholds for OSSH (24), Shadowsocks (50), Lampshade (256), obfs4 (8-16 KB), and MTProto (above 1 MB).
Our decision tree also identifies 7 endpoints in the ZMap dataset as shadowsocks-outline.
Six of those endpoints are in Netropy IP blocks in South Korea, with the remaining one in Cogent's network.
By the time we performed manual analysis on these endpoints, they were no longer up, preventing follow-up analysis.

MTProto - Over 3,000 endpoints were classified as MTProto in our datasets, though likely few (if any) of these are truly MTProto servers.
This over-count is due to the simple decision tree used to classify MTProto: many endpoints simply never time out and do not have any close thresholds, making them difficult to distinguish from one another.
These endpoints represent 0.56% and 0.02% of our Tap and ZMap datasets respectively.
This suggests that MTProto's strategy of camouflage is effective at evading active probing, because these endpoints offer truly no response, even at the TCP level.
We provide in-depth discussion of effective defense strategies in Section VI.

Fig. 12: Types of close. We measured the threshold behavior of each of over 400,000 server endpoints. As expected, most servers close the connection with a FIN and then, if additional data is sent, a RST (data left in the buffer) (FIN before RST). The majority of servers send only a FIN, meaning our test did not send enough data to elicit a RST or the server never sends a RST. Less commonly, we find (standard non-conforming) servers that only close with a RST, or servers that interleave use of RST and FIN.
OSSH - We classify 8 endpoints in our Tap dataset as OSSH.
We followed up with Psiphon, a popular circumvention tool that commonly uses OSSH servers, and identified that 7 of these were their own servers, confirming they were indeed OSSH endpoints.
The remaining one was hosted in Linode's network on port 443, but we cannot confirm whether it is running OSSH or an unrelated service.
TABLE IV: Decision Tree Results. Column groups, in order: Endpoints w/ Threshold (Tap, ZMap), Decision Tree Labeled Proxy (Tap, ZMap). Row values are given as they appear in the source.

obfs4         3556520
Lampshade     2101
Shadowsocks   301808
MTProto       13k 106k 3144 296
OSSH          70580

We applied our manually-created decision trees to both the Tap (433k endpoints) and ZMap (1.5M endpoints) datasets.
We expect the decision trees to label very few or no endpoints as proxies.
Indeed, with the exception of MTProto, our decision tree finds very few or no proxies.
In some instances, such as OSSH, 7 of the 8 endpoints found in our Tap dataset were confirmed to be actual OSSH servers by their developers.
We also present the number of endpoints that have the same close threshold as the proxies we study, with the data showing that thresholds alone are not as discerning as timeouts for identifying proxies.
a) Summary: Our manually-crafted decision tree is generally effective at distinguishing proxy servers from common hosts in both our Tap and ZMap datasets (with the exception of MTProto).
In some cases, the handful of endpoints that were classified as proxies turned out to be actual proxies, which we confirmed through private conversation with their developers.
In other cases, such as Shadowsocks, we have circumstantial evidence that supports the claim that these endpoints are proxies, but no definitive way to confirm.

Despite our low false-positive rate (conservatively, less than 0.001% for all protocols besides MTProto), we note that the base rate of proxies as compared to common endpoints is an important consideration for would-be censors: even seemingly negligible false positive rates can be too high for censors to suffer [42].
We also observe that MTProto demonstrates an effective behavior that makes it more difficult to distinguish from a small but non-negligible fraction of non-proxy endpoints (0.56% and 0.02% of the Tap and ZMap datasets), offering a potential defense to other proxies, which we further investigate in Section VI.
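The base-rate concern can be made concrete with a short calculation. The sketch below uses the 0.001% false-positive bound quoted above; the base rate of one proxy per million endpoints and the perfect true-positive rate are hypothetical values chosen only to illustrate the effect, not measurements from this work.

```python
def precision(base_rate, false_positive_rate, true_positive_rate=1.0):
    """Fraction of endpoints flagged as proxies that really are proxies."""
    true_positives = base_rate * true_positive_rate
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical base rate: 1 proxy per 1,000,000 endpoints.
# False-positive rate: 0.001% (1e-5), the bound quoted in the text.
p = precision(base_rate=1e-6, false_positive_rate=1e-5)
```

Under these assumed numbers only about 1 in 11 flagged endpoints would be a real proxy, which is why a censor that cares about collateral damage may still find such a rate too high.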
VI. DEFENSE EVALUATION

Given the result that most of the probe-resistant proxies can be identified with a handful of probes, we now turn to discuss how to improve these protocols to protect them from such threats: how should probe-resistant proxies respond to best camouflage themselves among the most servers?
To answer this, we look in our datasets for the most common responses to our probes.
If proxies respond the same way as thousands of other endpoints, they will better blend in with common hosts on the Internet, making them harder for censors to identify and block.

We note that proxies should not attempt to directly mimic common data-carrying responses to our probes.
Sending any response data commits a proxy to a particular protocol, which introduces significant challenges in faithfully mimicking the protocol [24], [22].
Thus, despite TLS errors being the most common response to our probes, we rule out these and other endpoints that respond to our probes with data.
Fig. 13: Probe-indifferent Server Timeouts. We define an endpoint as probe-indifferent if it responds to all of our probes in the same way (i.e., with only one of FIN, RST, or TIMEOUT at (approximately) the same time). We compare the probe-indifferent timeouts for our Tap dataset (a) and our ZMap dataset (b). The most popular behavior, shared by both datasets, is to never respond to any of our probes, as shown by our 300+ second TIMEOUT in grey.

Ruling out the endpoints that respond with data eliminates 407k (94%) endpoints from our tap dataset, and nearly 9k (1.1%) endpoints from our ZMap dataset.
However, many of the remaining servers still close the connection at different timeouts depending on the probe we send.
For example, servers that have a close threshold will close the connection with FIN or RST at varying times, depending on the length of the probe we send.
It is possible that other probes beyond our own could elicit other timeouts or even data responses from these servers.
We thus only consider servers that are probe-indifferent, in that they respond to all of our probes with the same response type (FIN or RST) at a similar time (within 500 milliseconds of their other responses, to allow for network jitter).
We exclude the empty probe (where we do not send data) response times from our probe-indifferent endpoints, as these servers are still waiting for data and might time out at a different time.
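The probe-indifference test can be sketched as a small predicate. The input layout (probe name mapped to a close type and a close time in seconds) and the probe names, including `"empty"` for the empty probe, are assumptions of this example; the 500 ms tolerance comes from the text.

```python
def is_probe_indifferent(results, jitter_seconds=0.5, empty_probe="empty"):
    """Check whether an endpoint responds to every data probe the same way.

    results maps a probe name to (close_type, seconds). The empty probe is
    excluded, since a server still waiting for data may time out at a
    different time.
    """
    considered = [r for probe, r in results.items() if probe != empty_probe]
    if not considered:
        return False
    close_types = {close_type for close_type, _ in considered}
    if len(close_types) != 1:
        return False
    times = [seconds for _, seconds in considered]
    # All close times must lie within the jitter tolerance of each other.
    return max(times) - min(times) <= jitter_seconds

# Hypothetical measurements for two endpoints.
indifferent = {"tls": ("FIN", 90.1), "http": ("FIN", 90.4), "empty": ("FIN", 120.0)}
threshold_like = {"tls": ("FIN", 30.0), "http": ("RST", 0.2), "empty": ("FIN", 30.0)}
```

The first endpoint qualifies despite its different empty-probe time; the second does not, because its close type depends on the probe.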
Figure 13 shows the response type and timeout of the 6,956 probe-indifferent endpoints in our tap dataset (568,121 in ZMap).
In both datasets, the overwhelmingly popular response type is timeout, which indicates the endpoint did not respond within the 5 minute limit for our scanner.
As measured in Figure 10, these endpoints predominantly never time out, and instead read from the connection forever without responding.
Over 0.7% of our tap dataset endpoints (42% of ZMap) exhibit such "infinite timeout" behavior.
Due to this relative ubiquity, we recommend proxy developers implement unlimited timeouts for failed client handshakes, keeping the connection open rather than closing it.
This strategy is already employed by the MTProto servers we probed, and we have made recommendations to other probe-resistant proxy developers to implement this as well.
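A minimal sketch of the recommended behavior, assuming a Python `asyncio` server: after a failed handshake the server keeps reading and discarding data, sends nothing, and never closes first. The function names are illustrative and are not taken from any of the proxies studied.

```python
import asyncio

async def drain_until_peer_closes(reader: asyncio.StreamReader,
                                  chunk_size: int = 4096) -> int:
    """Read and discard data with no timeout; return only when the peer closes."""
    discarded = 0
    while True:
        try:
            data = await reader.read(chunk_size)
        except ConnectionError:
            return discarded      # the peer reset the connection
        if not data:              # b"" means the peer sent EOF
            return discarded
        discarded += len(data)

async def handle_failed_handshake(reader: asyncio.StreamReader,
                                  writer: asyncio.StreamWriter) -> None:
    """Intended to be called once the client has failed to prove the secret."""
    await drain_until_peer_closes(reader)
    writer.close()                # the peer closed first, so release the socket
```

Because the unread receive buffer is always drained and the server never initiates the close, this behavior has neither a timeout nor a byte-count close threshold for a probe to measure. The cost is that each failed connection holds a socket until the client gives up.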
However, not all proxies may be willing to keep unused or failed connections open forever.
In addition, it may be beneficial for circumvention tools to employ multiple strategies, such as selecting between no timeouts and other popular finite timeouts on a per-server basis.

For proxies that must time out, a naive strategy would be to look at common (finite) response times in Figure 13, which shows the distribution of probe-indifferent timeouts (i.e., how long a server waited to time out for all our probes).
However, many popular response times are shared by groups of servers that share other important characteristics that could be difficult for proxies to mimic.
For example, many probe-indifferent endpoints responded with a FIN to all our probes after 90 seconds, but manual investigation reveals that all of these endpoints are in the same /16 and running on the same port (9933).
In addition, each of the servers has the same additional port open (9443), which returns identical HTTP 503 errors carrying the name of a Canadian video game developer.
If probe-resistant proxies attempted to mimic these endpoints by only responding to all failed handshakes with a FIN after 90 seconds, an observant censor could distinguish them from the other common endpoints.
Another popular response is a FIN 3 seconds after our probe, but we observe that almost all of these hosts appear to be infrastructure associated with the Chandra X-ray Observatory [5].

One response exhibited by a heterogeneous mix of IP subnets and ports in both datasets is to close the connection with either a RST⁴ or FIN at 0 seconds, meaning these servers closed the connection right away after our probes.
While intuitive, applications must be careful to ensure they only send FINs or only send RSTs on invalid handshakes, regardless of client probe size.
In addition, proxy applications must choose how long to wait if no data is sent when a connection is opened.
We find that for endpoints in our tap dataset that send FINs right away to our data probes, the most common behaviors for empty probes (where we send no data) are to close the connection with a FIN after 120, 2, or 60 seconds.
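For servers that choose the always-RST variant, the footnoted mechanism (the `SO_LINGER` socket option with a zero timeout) can be set as in the sketch below. The helper name is illustrative, and the two-integer `struct linger` layout is the Linux one.

```python
import socket
import struct

def enable_abortive_close(sock: socket.socket) -> None:
    """Make close() abort the connection with a RST instead of a FIN.

    Sets SO_LINGER with l_onoff=1 and l_linger=0, so the close type no
    longer depends on whether unread data is left in the receive buffer.
    """
    linger = struct.pack("ii", 1, 0)  # struct linger { int l_onoff; int l_linger; }
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
```

Calling this on each accepted connection before closing it yields a RST on every invalid handshake. The FIN-only alternative instead requires reading all pending data before closing, so that no unread bytes remain to trigger a RST.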
We caution that our manual analysis of these endpoints is not exhaustive: there may be other probes that allow censors to distinguish between proxies and common endpoints, and we recommend proxies choose no timeout over a specific finite one.
However, if proxies must choose finite timeouts, these values may provide the best cover, provided the proxy also sends FINs right away for any data received.
VII. RELATED WORK

Several prior works have identified ways to passively identify proxy protocols, allowing censors to differentiate proxy traffic from other network traffic.
For example, Wang et al. [42] use entropy tests to distinguish obfs4 flows (and other proxies) from common network traffic, observing that it is unusual for normal traffic to have high entropy, particularly early on in the connection.
Since even encrypted protocols like SSH and TLS have low-entropy headers, the high-entropy bytes in obfs4 connections are an effective signal.
Previously, Wiley [46] used Bayesian models to identify OSSH from other traffic.
Finally, Shahbar et al. [37] use a classifier over several traffic features (e.g., packet size, timings, etc.) to identify several proxies, including obfs3, a predecessor of obfs4 that is not probe-resistant.

⁴ Accomplished in Linux by setting the SO_LINGER socket option with a zero timeout.
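To illustrate why high entropy early in a connection stands out, the sketch below computes the Shannon entropy of a byte string. It is a generic illustration of the idea and not the specific statistical test used by Wang et al. [42]; the function name is an assumption of this example.

```python
import math
from collections import Counter

def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(data).values())

# A plaintext protocol banner uses few distinct byte values...
banner_entropy = shannon_entropy_bits_per_byte(b"SSH-2.0-OpenSSH_8.9\r\n")
# ...while uniformly distributed bytes reach the 8-bit maximum.
uniform_entropy = shannon_entropy_bits_per_byte(bytes(range(256)))
```

A randomized transport's first packet looks like the second case, whereas protocols with structured headers look more like the first.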
Our active probing attack complements these existing passive detectors for two reasons.
First, because the base rate of "normal" (non-proxy) traffic is significantly higher than that of proxy traffic, even with low false positive rates the majority of flows identified as proxies by a censor may actually be false positives [42].
Thus, our active approach could help censors confirm suspected proxies found using passive techniques.
Second, proxies can thwart such passive analysis by adding random timing and/or padding to their traffic [12].
However, even with such defenses, existing proxies could still be vulnerable to active probing.
Therefore, we argue that both passive and active attacks must be addressed, and focus on the active attack in this paper.

One of the main motivations for probe-resistant protocols to use randomized streams comes from Houmansadr et al. [24] ("The Parrot is Dead").
This work argued that mimicry of existing protocols such as Skype or HTTP (employed by proxies at the time) is difficult to do correctly, as even small implementation subtleties can reveal differences between true clients and proxies that attempt to mimic them.
The authors reveal several active and passive attacks to distinguish SkypeMorph [32], StegoTorus [44], and CensorSpoofer [43] from the protocols (e.g., Skype, HTTP) they attempt to use or mimic.

Finally, there are circumvention tools that do not require endpoints to remain hidden from censors.
For example, meek (domain fronting) [19] and TapDance (Refraction Networking) [49], [21] both have users connect to "decoy" sites that are not complicit in the circumvention system, and use network infrastructure (i.e., load balancers or ISP taps) to act as proxies that redirect traffic to its intended destination.
Even if a censor discovers the decoy sites (which are public), they cannot block them without also blocking legitimate access to those sites.
Flashproxies [33], and more recently Snowflake [40], create short-lived proxies in web browsers of users that visit particular websites.
These websites serve JavaScript or WebRTC-based proxies that run in the visitor's browser, and can transit traffic for censored users for as long as the visitor remains on the page.
These short-lived proxies can still be blocked by censors, but doing so reliably is difficult due to their ephemeral nature.
VIII. DISCUSSION

A. Responsible disclosure

We shared our findings with developers of the probe-resistant proxies we studied, who acknowledged the issue and in most instances made changes to address it.
Specifically, we reached out to developers of Psiphon for OSSH (who pushed a fix on May 13, 2019), obfs4 (fixed June 21, 2019, version 0.0.11), Shadowsocks Outline (fixed September 4, 2019, version 1.0.7), and Lantern's Lampshade (fixed October 31, 2019).
Psiphon's fix for OSSH is to read forever after an initial handshake failure, while the others read until data is not sent for a certain period of time, after which the connection is closed.
All of these fixes remove the close threshold behavior from these probe-resistant proxies, though they may still be observable due to their timeout behavior.
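The second style of fix (read until no data has arrived for some period, then close) can be sketched as below, assuming a Python `asyncio` server. This is an illustration of the described behavior, not code from any of the named projects, and the idle period is a parameter the operator would have to choose.

```python
import asyncio

async def drain_until_idle(reader: asyncio.StreamReader,
                           idle_seconds: float,
                           chunk_size: int = 4096) -> int:
    """Discard incoming data; return once the peer closes or goes idle.

    Each read restarts the idle timer, so the close time depends only on
    when data stops arriving, not on how many bytes were sent.
    """
    discarded = 0
    while True:
        try:
            data = await asyncio.wait_for(reader.read(chunk_size),
                                          timeout=idle_seconds)
        except asyncio.TimeoutError:
            return discarded      # idle period elapsed with no new data
        if not data:
            return discarded      # peer closed the connection
        discarded += len(data)
```

Because everything received is consumed before the close, no unread data remains in the buffer, which removes the byte-count close threshold. The idle period itself remains observable, as noted above.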
B. Future Work

Looking forward, we believe that there are several avenues for extending this work to strengthen both the attack and defenses.
First, enriching the set of probes that we use could help to elicit responses from more non-proxy endpoints, ultimately improving our attack.
To discover or even automatically create new probes, we could extract data from live connections in our ISP tap.
Watching what data is sent in connections could allow us to infer the protocols being used, allowing us to synthesize additional probes for those protocols.
Doing so requires overcoming privacy challenges, ensuring that private information is not inadvertently collected and then sent in probes to other endpoints.

As another improvement to active attacks, future work could investigate the other ports that are open on a suspected proxy host.
For instance, many hosts in our ZMap dataset responded as open on all TCP ports when scanned using nmap, a common firewall behavior that is intended to thwart active scanning.
However, our detector could be extended to collect and use this data as additional information in identifying proxies.
C. Long-term defenses

Section VI details and evaluates immediate small changes that existing probe-resistant proxies can make to help address our immediate attacks.
However, solving this problem fully in the long term may require rethinking the overall design of probe-resistant proxies.

Wang et al. [42] categorize circumvention techniques into three categories: Randomizing, Protocol Mimicry, and Tunnelling.
Randomizing transports attempt to hide application-layer fingerprints and essentially have "no fingerprint" by sending messages of randomized size that are encrypted to be indistinguishable from random bytes.
All probe-resistant transports we cover in this paper fall into this category.
These transports detect probing by testing the client's knowledge of a secret, and do not respond if it fails.
As we show, not responding to any probes is rare and is itself a fingerprint that probing censors could use to block such transports.

Both Protocol Mimicry and Tunnelling approaches attempt to make traffic look like it belongs to a certain protocol or application, but with an important distinction: Mimicry involves implementing a look-alike version of the protocol, while Tunnelling leverages an existing popular implementation of a protocol or application and tunnels circumvention traffic over it.
Prior work [24] has shown that Protocol Mimicry is difficult, due to the complexity of features in popular implementations.
Protocol Mimicry transports like SkypeMorph [32] attempt to respond to censor probes with realistic common responses, but minor differences from a target protocol or popular implementation can yield unique fingerprints that censors can use to block them [24], [23].
Thus, Protocol Mimicry transports are difficult to use in probe-resistant proxies.

On the other hand, Tunnelling protocols like DeltaShaper [8], Facet [28], and CovertCast [30] tunnel circumvention traffic over existing implementations and services.
Censors that probe these servers receive responses from legitimate implementations, making them difficult to distinguish from benign (non-proxy) services.
Several Tunnelling protocols have been proposed in the literature [8], [28], [30] and deployed in practice [19], demonstrating their strong potential.
Tunnelling transports can defend against active probing by looking like legitimate services, even evading censors that whitelist popular protocols [7].
While recent work has demonstrated the feasibility of detecting Tunnelling transports using machine learning [9], to the best of our knowledge these techniques have yet to be employed by censors, possibly due to the challenging combination of false positive rates of the detection algorithms and base rates of legitimate traffic [42].
IX. ACKNOWLEDGEMENTS

We wish to thank the many people that helped make this work possible, especially the University of Colorado IT Security and Network Operations for providing us access to the network tap used in this paper, and J. Alex Halderman for providing ZMap data.
We also thank the many proxy developers with whom we discussed this paper and who provided proxies to test against, including Ox Cart at Lantern, Michael Goldberger and Rod Hynes at Psiphon, and Vinicius Fortuna and Ben Schwartz from Google Jigsaw (Outline).
We are also grateful to Prasanth Prahladan for his initial discussion and participation on this work, and we thank David Fifield for his valuable comments, feedback, and suggestions on the paper in several drafts.
X. CONCLUSION

Probe resistance is necessary for modern proxy implementations to avoid being blocked by censors.
In this work, we demonstrate that existing probe-resistant strategies are generally insufficient to stop censors from identifying proxies via active probes.
We identify several low-level choices that proxy developers make that leave them vulnerable to active probing attacks.
In particular, proxy servers reveal developer choices about when connections are closed in terms of timing or bytes read, allowing censors to fingerprint and differentiate them from other non-proxy servers.
We evaluate the effectiveness of identifying proxies by probing endpoints collected from both passive tap observations and active ZMap scans, and find that our attacks are able to identify most proxies with negligible false positive rates that make these attacks practical for a censor to use today.
Leveraging our datasets, we make recommendations to existing circumvention projects to defend against these potential attacks.
REFERENCES

[1] "Mtproto mobile protocol: Detailed description," https://core.telegram.org/mtproto/description.
[2] "Shadowsocks: A secure socks5 proxy," https://shadowsocks.org/assets/whitepaper.pdf.
[3] "Telegram: a new era of messaging," https://telegram.org/.
[4] "How the great firewall of china is blocking tor," in Presented as part of the 2nd USENIX Workshop on Free and Open Communications on the Internet. Bellevue, WA: USENIX, 2012. [Online]. Available: https://www.usenix.org/conference/foci12/workshop-program/presentation/Winter
[5] "Chandra X-ray Observatory," Oct. 2019. [Online]. Available: http://cxc.harvard.edu
[6] Y. Angel, "obfs4 (The obfourscator) specification," https://gitlab.com/yawning/obfs4/blob/master/doc/obfs4-spec.txt.
[7] S. Aryan, H. Aryan, and J. A. Halderman, "Internet censorship in Iran: A first look," in 3rd USENIX Workshop on Free and Open Communications on the Internet, Washington, D.C., 2013.
[8] D. Barradas, N. Santos, and L. Rodrigues, "DeltaShaper: Enabling unobservable censorship-resistant TCP tunneling over videoconferencing streams," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 5–22, 2017.
[9] D. Barradas, N. Santos, and L. Rodrigues, "Effective detection of multimedia protocol tunneling using machine learning," in 27th USENIX Security Symposium. USENIX Association, 2018.
[10] D. J. Bernstein, M. Hamburg, A. Krasnova, and T. Lange, "Elligator: Elliptic-curve points indistinguishable from uniform random strings," in Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 2013, pp. 967–980.
[11] A. Bersenev, "Async MTProto proxy for Telegram in Python," https://github.com/alexbers/mtprotoproxy.
[12] X. Cai, R. Nithyanand, and R. Johnson, "CS-BuFLO: A congestion sensitive website fingerprinting defense," in Proceedings of the 13th Workshop on Privacy in the Electronic Society. ACM, 2014, pp. 121–130.
[13] clowwindy, "shadowsocks (python)," https://github.com/shadowsocks/shadowsocks, Jun 2019.
[14] R. Dingledine, "Strategies for getting more bridge addresses," https://blog.torproject.org/strategies-getting-more-bridge-addresses, 2011.
[15] R. Dingledine, "Obfsproxy: the next step in the censorship arms race," https://blog.torproject.org/obfsproxy-next-step-censorship-arms-race, 2012.
[16] Z. Durumeric, E. Wustrow, and J. A. Halderman, "ZMap: Fast internet-wide scanning and its security applications," in 22nd USENIX Security Symposium (USENIX Security '13), 2013, pp. 605–620.
[17] K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton, "Protocol misidentification made easy with format-transforming encryption," in Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 2013, pp. 61–72.
[18] R. Ensafi, D. Fifield, P. Winter, N. Feamster, N. Weaver, and V. Paxson, "Examining how the great firewall discovers hidden circumvention servers," in Proceedings of the 2015 Internet Measurement Conference. ACM, 2015, pp. 445–458.
[19] D. Fifield, C. Lan, R. Hynes, P. Wegmann, and V. Paxson, "Blocking-resistant communication through domain fronting," Proceedings on Privacy Enhancing Technologies, vol. 2015, no. 2, pp. 46–64, 2015.
[20] FreedomPrevails, "High Performance NodeJS MTProto Proxy," https://github.com/FreedomPrevails/JSMTProxy.
[21] S. Frolov, F. Douglas, W. Scott, A. McDonald, B. VanderSloot, R. Hynes, A. Kruger, M. Kallitsis, D. G. Robinson, S. Schultze et al., "An ISP-scale deployment of TapDance," in 7th USENIX Workshop on Free and Open Communications on the Internet (FOCI '17), 2017.
[22] S. Frolov and E. Wustrow, "The use of TLS in censorship circumvention," in Proc. Network and Distributed System Security Symposium (NDSS), 2019.
[23] J. Geddes, M. Schuchard, and N. Hopper, "Cover your ACKs: Pitfalls of covert channel censorship circumvention," in Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 2013, pp. 361–372.
[24] A. Houmansadr, C. Brubaker, and V. Shmatikov, "The parrot is dead: Observing unobservable network communications," in Security and Privacy (SP), 2013 IEEE Symposium on. IEEE, 2013, pp. 65–79.
[25] Jigsaw Operations LLC, "Outline VPN." [Online]. Available: https://www.getoutline.org/en/home
[26] Lantern Project, "Lampshade: a transport between Lantern clients and proxies," https://godoc.org/github.com/getlantern/lampshade.
[27] B. Leidl, "Obfuscated OpenSSH," https://github.com/brl/obfuscated-openssh/blob/master/README.obfuscation.
[28] S. Li, M. Schliep, and N. Hopper, "Facet: Streaming over videoconferencing for censorship circumvention," in Proceedings of the 13th Workshop on Privacy in the Electronic Society. ACM, 2014, pp. 163–172.
[29] G. F. Lyon, Nmap network scanning: The official Nmap project guide to network discovery and security scanning. Insecure, 2009.
[30] R. McPherson, A. Houmansadr, and V. Shmatikov, "CovertCast: Using live streaming to evade internet censorship," Proceedings on Privacy Enhancing Technologies, vol. 2016, no. 3, pp. 212–225, 2016.
[31] P. Mockapetris, "Domain names - implementation and specification," IETF, RFC 1035, Nov. 1987. [Online]. Available: http://tools.ietf.org/rfc/rfc1035.txt
[32] H. Mohajeri Moghaddam, B. Li, M. Derakhshani, and I. Goldberg, "SkypeMorph: Protocol obfuscation for Tor bridges," in Proceedings of the 2012 ACM conference on Computer and communications security. ACM, 2012, pp. 97–108.
[33] A. Moshchuk, S. D. Gribble, and H. M. Levy, "Flashproxy: transparently enabling rich web content via remote execution," in Proceedings of the 6th international conference on Mobile systems, applications, and services. ACM, 2008, pp. 81–93.
[34] J. Padhye and S. Floyd, "On inferring TCP behavior," ACM SIGCOMM Computer Communication Review, vol. 31, no. 4, pp. 287–298, 2001.
[35] Psiphon Inc, "Psiphon tunnel core," https://github.com/Psiphon-Labs/psiphon-tunnel-core, 2019.
[36] Sergey Arkhipov, "MTProto proxy for Telegram in Golang," https://github.com/9seconds/mtg.
[37] K. Shahbar and A. N. Zincir-Heywood, "An analysis of Tor pluggable transports under adversarial conditions," in 2017 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2017, pp. 1–7.
[38] Tor Project, "Bridgedb," https://bridges.torproject.org/.
[39] Tor Project, "Tor: Pluggable Transports," https://www.torproject.org/docs/pluggable-transports.html.
[40] Tor Project, "Snowflake: pluggable transport that proxies traffic through temporary proxies using webrtc." https://trac.torproject.org/projects/tor/wiki/doc/Snowflake, 2018.
[41] twilde, "Knock Knock Knockin' on Bridges' Doors," https://blog.torproject.org/knock-knock-knockin-bridges-doors, 2012.
[42] L. Wang, K. P. Dyer, A. Akella, T. Ristenpart, and T. Shrimpton, "Seeing through network-protocol obfuscation," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 2015, pp. 57–69.
[43] Q. Wang, X. Gong, G. T. Nguyen, A. Houmansadr, and N. Borisov, "Censorspoofer: asymmetric communication using IP spoofing for censorship-resistant web browsing," in Proceedings of the 2012 ACM conference on Computer and communications security. ACM, 2012, pp. 121–132.
[44] Z. Weinberg, J. Wang, V. Yegneswaran, L. Briesemeister, S. Cheung, F. Wang, and D. Boneh, "StegoTorus: a camouflage proxy for the tor anonymity system," in Proceedings of the 2012 ACM conference on Computer and communications security. ACM, 2012, pp. 109–120.
[45] T. Wilde, "Great firewall Tor probing circa 09 DEC 2011," https://gist.github.com/da3c7a9af01d74cd7de7, 2011.
[46] B. Wiley, "Blocking-resistant protocol classification using bayesian model selection," Technical report, University of Texas at Austin, Tech. Rep., 2011.
[47] P. Winter, T. Pulls, and J. Fuss, "ScrambleSuit: A polymorphic network protocol to circumvent censorship," in Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society. ACM, 2013, pp. 213–224.
[48] R. Wright and A. Wick, "CyberChaff: Confounding and detecting adversaries," https://galois.com/project/cyberchaff/, 2019.
[49] E. Wustrow, C. Swanson, and J. A. Halderman, "TapDance: End-to-middle anticensorship without flow blocking." in 23rd USENIX Security Symposium, Aug. 2014.
[50] L. Yang, M. Lv, and C. Windy, "Shadowsocks-libev libev port of shadowsocks," https://github.com/shadowsocks/shadowsocks-libev, 2014.
[51] ZMap project, "ZGrab: Application layer scanner that operates with ZMap," https://github.com/zmap/zgrab.

APPENDIX A
AUTOMATED PROXY CLASSIFICATION

While we used manually-crafted decision trees to distinguish proxies from common (non-proxy) endpoints, we also created automatically generated decision trees.
In this section, we present the challenges in using machine learning in this context, and compare the results to our manual approach.

We start by filtering our Tap and ZMap datasets to exclude endpoint samples that respond with data to our probes, as they are trivial to classify as non-proxy results, even without knowing any details of probe-resistant proxy protocols.
This leaves 25k samples from our Tap dataset (776k from ZMap) that we use for training and testing our automated classifier.

Even after removing these trivially classified samples, our datasets remain extremely unbalanced, as we have only a handful of positive proxy samples (compared to tens or hundreds of thousands of non-proxy samples).
To address this imbalance, we synthesize proxy samples based on our understanding from manual inspection of their source code.
To simulate network measurement, we add a random 20–500 milliseconds of latency to the timeouts specified by the proxy's server timeout code.
We emphasize that, while necessary to balance our datasets, understanding proxy behavior at this level of detail is already sufficient to create the manual decision trees.

We must be careful about the data imbalance in our dataset as we synthesize the samples.
If we synthesize too many, proxies will be a large cluster in an otherwise heterogeneous population.
If we synthesize too few, the tree will overfit to the small sample of proxies and not generalize.
We chose to generate a number of proxy samples equal to 1% of the number of non-proxy samples.
As a result, for each proxy label we generate a total of 258 samples for the Tap dataset, and 7767 samples for the ZMap dataset.
This helps to balance the dataset while still conveying that proxies are relatively uncommon on the Internet.
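The synthesis step can be sketched as follows. The behavior specification, the probe names, and the per-probe (rather than per-sample) latency draw are assumptions of this example. The non-proxy totals used below (25,876 and 776,726) are inferred from the rounded counts quoted in this appendix so that 1% reproduces the stated 258 and 7767, with rounding down also assumed.

```python
import random

def synthetic_count(non_proxy_samples: int) -> int:
    """Number of synthetic samples per proxy label: 1% of non-proxy samples."""
    return non_proxy_samples // 100

def synthesize(spec, count, rng):
    """Create `count` synthetic proxy samples from a behavior specification.

    spec maps a probe name to (close_type, seconds) as dictated by the
    proxy's timeout code. Each probe's time gets 20-500 ms of simulated
    network latency added.
    """
    samples = []
    for _ in range(count):
        samples.append({
            probe: (close_type, seconds + rng.uniform(0.020, 0.500))
            for probe, (close_type, seconds) in spec.items()
        })
    return samples

# Hypothetical specification for illustration; not a real proxy's timeouts.
spec = {"probe-A": ("FIN", 30.0), "probe-B": ("RST", 0.0)}
rng = random.Random(0)  # seeded for reproducibility
tap_samples = synthesize(spec, synthetic_count(25_876), rng)
```

Keeping the synthetic class at 1% lets the tree see enough proxy examples to learn from, without making proxies look like a dominant cluster.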
We then trained and tested an automated multi-class decision tree on our ZMap and Tap datasets, including in each the synthetic samples we generated to balance the datasets.
We also build a manual multi-class classifier based on the conditions from Figures 3-8 for each proxy label, and classify any unmatched samples as not proxies.
Since all proxies have mutually exclusive trees, we can check them in any order.

The resulting accuracy for our automated decision tree is shown in Table V.
To evaluate our automated decision tree on datasets it has not seen before, we train on one dataset (Tap or ZMap), and test on the other (ZMap or Tap).
Fig. 14: Overfit subtree. Our automated decision tree (trained on our ZMap dataset with synthetic samples) shows evidence of overfitting. At the root node of this subtree, there are 1171 shadowsocks and 4 non-proxy samples left to classify. Rather than deciding on something inherent to the shadowsocks protocol, the classifier divides samples based on extrinsic response latency differences to the TLS probe (TLS abort time). Parts of the tree divide samples into endpoints that responded to our TLS probe between 336 and 338 milliseconds, and to our rand-17410 probe between 334 and 338 milliseconds. We confirmed none of these times are intrinsic to the shadowsocks implementations.

TABLE V: Accuracy of decision trees

                 Evaluated on
Trained on       ZMap       Tap
ZMap             0.99959    0.88180
Tap              0.99017    0.98910
manual           0.99962    0.88386

We evaluated the accuracy of our manual and automated decision trees, trained on our ZMap and Tap datasets (including synthetic samples). We excluded endpoints that respond with data to any of our probes, yielding 25k samples from our Tap dataset and 776k from the ZMap dataset. We used 5-fold cross-validation to train and test the automated decision trees when training and evaluating on the same dataset. The majority of inaccuracies for both automated and manual decision trees stem from misclassifications of MTProto.

We also use 5-fold cross-validation when we train and test using subsets of the same dataset, partitioning the set into distinct subsets for training and testing.
Thereare3233(12.
5%)non-proxyendpointsinourTaplearningdatasetthatneverclosetheconnection5.
ThisbehaviorissharedbyMTProto,andthedecisionwhetherornottoclassifythoseendpointsasMTProtoaffectsaccuracythemostfortheTapdatasets.
ThemanualdecisiontreeandautomatedclassierlearnedonZMapdatabothclassifytheseendpointsasMTProto,whiletheautomateddecisiontreelearnedonTapdataclassiedthemasnot-proxies.
ThehighnumberofMTProto-likesamplesinthenon-proxyTapdatasetandtheTap-trainedtree'sdecisiontoclassifythemasnon-proxiesexplainsthehigheraccuracyofthedecisiontreethatwastrainedand5-foldcross-validatedonTapdata.
Toprovideafullsummaryofbothcorrectandincorrectpredictionsthat5ordosoafterour300secondprobingutilitytimeout15PredictedasActuallabelnot-proxylampshademtprotoobfs4osshss-pythonnot-proxy77642100000Lampshade177670000MTProto29607767000obfs4000776700OSSH000077670ss-python800007767(a)manual,testedonZMapdatasetPredictedasActuallabelnot-proxyLampshadeMTProtoobfs4OSSHss-pythonnot-proxy2272100000Lampshade02580000MTProto31450258000obfs420025800OSSH80002580ss-python00000258(b)manual,testedonTapdatasetPredictedasActuallabelnot-proxyLampshadeMTProtoobfs4OSSHss-pythonnot-proxy2266500000Lampshade22580000MTProto31820258000obfs4190025800OSSH80002580ss-python00000258(c)automated,trainedonZMap,testedonTapPredictedasActuallabelnot-proxyLampshadeMTProtoobfs4OSSHss-pythonnot-proxy776663877672412231Lampshade1177590000MTProto000000obfs42500774300OSSH600076450ss-python2100007736(d)automated,trainedonTap,testedonZMapTABLEVI:ConfusionMatrices—Confusionmatricesforourmanualandautomatically-generateddecisiontreesforbothourTapandZMapdatasets.
Weseegoodperformanceoverallforidentifyingproxies,thoughnoteallclassiersstruggletodistinguishMTProto,duetotheubiquityofendpointsthathavenoconnectiontimeout.
ourmanualandautomateddecisiontreesmade,weincludetheirConfusionMatricesinTableVI.
Nonetheless, in all other cases, our manual decision tree slightly out-performs the automated decision tree.
We present our automated multi-class decision tree trained on our Tap (and synthetic data) dataset in Figure 15.
We conclude that automated decision trees may be a viable way to allow proxy developers to quickly test if their servers have responses that stand out, but are no more accurate than our manually created decision trees, while requiring no less manual labor.
We note our manually created trees have the advantage that they were built using only domain knowledge of the specific proxies; any updates to proxies can be directly encoded.
On the other hand, our "automated" decision trees must both be given updated domain knowledge (for synthetic samples) and be retrained if even unrelated non-proxy traffic changes; they also need to be retrained if even one of the proxies changes its behavior.
Fig. 15: Automated Decision Tree. We trained a multi-class decision tree classifier on the 25k endpoints in our Tap dataset (and 1k synthetically-generated proxy samples).
Each node contains an array of the number of samples yet to be classified at that point in the tree: [not proxy, lampshade, mtproto, obfs4, ossh, shadowsocks-python].
Nodes are labeled with the current classification, and colored according to how confident a decision could be made at that point.
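One way to read the per-node arrays in such a tree is sketched below. It assumes, without confirmation from the paper, that a node's label is its majority class and that the confidence driving the coloring is that class's share of the remaining samples. The function `summarize_node` and the example counts are hypothetical and do not come from Figure 15.

```python
CLASSES = ["not proxy", "lampshade", "mtproto", "obfs4", "ossh",
           "shadowsocks-python"]


def summarize_node(counts):
    """Return (majority class, its share of the samples) for one tree node.

    `counts` follows the caption's ordering: one entry per class, giving the
    number of samples not yet classified at this node.
    """
    if len(counts) != len(CLASSES):
        raise ValueError("expected one count per class")
    total = sum(counts)
    if total == 0:
        raise ValueError("node has no samples")
    best = max(range(len(counts)), key=counts.__getitem__)
    return CLASSES[best], counts[best] / total


# Made-up counts for illustration only: a node still mixing two classes.
label, confidence = summarize_node([100, 0, 50, 0, 0, 0])
print(label, f"{confidence:.2f}")
```

A node whose confidence is close to 1.0 is effectively decided, while a node near an even split is one where further probes are needed to separate the classes.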