RESEARCH Open Access

Cloud repository as a malicious service: challenge, identification and implication

Xiaojing Liao1,2*, Sumayah Alrwais1, Kan Yuan1, Luyi Xing1, XiaoFeng Wang1, Shuang Hao1 and Raheem Beyah1

Abstract
The popularity of cloud hosting services also brings in new security challenges: it has been reported that these services are increasingly utilized by miscreants for their malicious online activities. Mitigating this emerging threat, posed by such "bad repositories" (simply Bar), is challenging due to the hosting strategy that differs from traditional hosting services, the lack of direct observations of the repositories by those outside the cloud, the reluctance of the cloud provider to scan its customers' repositories without their consent, and the unique evasion strategies employed by the adversary. In this paper, we took the first step toward understanding and detecting this emerging threat. Using a small set of "seeds" (i.e., confirmed Bars), we identified a set of collective features from the websites they serve (e.g., attempts to hide Bars), which uniquely characterize the Bars. These features were utilized to build a scanner that detected over 600 Bars on leading cloud platforms like Amazon and Google, together with 150K sites, including popular ones like groupon.com, that use them. Highlights of our study include the pivotal roles these repositories play in malicious infrastructures, along with other previously unreported discoveries such as how the adversary exploited legitimate cloud repositories and why the adversary uses Bars in the first place. These findings bring such malicious services to the spotlight and contribute to a better understanding and ultimately eliminating this new threat.
Keywords: Cloud, Cybercrime, Malicious hosting service

Introduction
Cloud hosting service today is serving over a billion users world-wide, providing them stable, low-cost, reliable, high-speed and globally available resource access. For example, Amazon Simple Storage Service (S3) is reported to store over 2 trillion objects for web and image hosting, system backup, etc. In addition to storing data, these services are moving toward a more active role in supporting their customers' computing missions, through sharing the repositories (a.k.a. bucket for Google Cloud (Buckets n.d.)) hosting various dynamic content and programming tools. A prominent example is Google's Hosted Libraries (Google. Google hosted libraries 2015), a content distribution network (CDN) for disseminating the most popular, open-source JavaScript resources, which web developers can easily incorporate into their websites through a simple code snippet.

In addition to benign users, the popularity of these services has also attracted cybercriminals. Compared with dedicated underground hosting services, repositories on legitimate commercial clouds are more reliable and harder to blacklist. They are also much cheaper: for example, it is reported that 15 GB on the dark net is sold at $15 per month (Servnet n.d.), which is actually offered for free by Google to every Google Drive user. Indeed, it has been reported (solutionary 2015) that malware distributors are increasingly using the commercial clouds to process and deploy malicious content.
Understanding bad cloud repositories: challenges
Although there have been indications of cloud hosting misuse, understanding how such services are abused is challenging. Service providers, who are bound by their privacy commitments and ethical concerns, tend to avoid inspecting the content of their customers' repositories in the absence of proper consent. Even when the providers are willing to do so, determining whether a repository involves malicious content is by no means trivial: nuts and bolts for malicious activities could appear perfectly innocent before they are assembled into an attack machine; examples include image files for Spam and Phishing as shown in Fig. 1. Actually, even for the repository confirmed to serve malicious content like malware, today's cloud providers tend to only remove that specific content, instead of terminating the whole account, to avoid collateral damage (e.g., compromised legitimate repositories).

Exploring the issue becomes even more difficult for the third party, who does not have the ability to directly observe the repositories and can only access them through the websites or sources that utilize the storage services. Further adding to the complexity of finding such a repository are the diverse roles it may play in attack infrastructures (e.g., serving malware for one attack and serving Phishing content for another), due to the mixed content a single repository may host: e.g., malware together with Phishing images. As a result, existing techniques (e.g., those for detecting dedicated malicious services (Li et al. 2014; Nelms et al. 2015)) cannot be directly applied to capture the repository, simply because their original targets often contain more homogeneous content (e.g., just malware) and contribute to different campaigns in the same way. So far, little has been done to understand the scope and magnitude of malicious or compromised repositories on legitimate clouds (called Bad Repository or simply Bar in our research) and the technical details about their services to the adversary, not to mention any effort to mitigate the threat they pose.

*Correspondence: xliao@indiana.edu
1 Indiana University, King Saud University, University of Texas at Dallas, Georgia Institute of Technology, Atlanta, USA
2 Department of Computer Science, Indiana University Bloomington, Bloomington, USA
Cybersecurity. © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Liao et al. Cybersecurity (2018) 1:14 https://doi.org/10.1186/s42400-018-0015-6
Finding "bars" online
In this paper, we present the first systematic study on the abuses of cloud repositories on the legitimate cloud platforms as a malicious service, which was found to be highly pervasive, acting as a backbone for large-scale malicious web campaigns (Section "Measurement and Discoveries"). Our study was bootstrapped by a set of "seeds": 100 confirmed malicious or compromised buckets (Buckets n.d.), each of which is a cloud resource repository with stored objects (often of different types) organized under a unique identification key. These buckets were collected from Spam messages or the malicious URLs cached by a popular malware scanner. Comparing them with those known to be legitimate, we found that despite the various roles each bucket plays in different types of attacks (due to the diversity in the content it serves), the websites connecting to those buckets still exhibit prominent common features (see Section "Features of Bad Repositories"), particularly, the presence of "gatekeeper" sites that cover the Bars (a valuable asset for the adversary) and remarkably homogeneous redirection behavior (i.e., fetching repository resources indirectly through other sites' references) and sometimes similar content organizations, due to the same attack payload the compromised sites upload from their back-end (i.e., the Bars), or the templates the bucket provides to the adversary for quick deployment of her attack sites. By comparison, a legitimate bucket (e.g., reputable jQuery repository) tends to be directly accessed by the websites with highly diverse content.

Based on this observation, we developed BarFinder, a scanner that automatically detects Bars through inspecting the topological relations between websites and the cloud bucket they use, in an attempt to capture Bars based on the external features of the websites they serve. More specifically, for all the sites connecting to a repository, our approach correlates the domains and URLs (particularly those related to cloud repositories) across their redirection chains and content features across their DOM structures to identify the presence of gatekeepers and evading behavior, and also measure the diversity of their content organization. A set of new collective features generated in this way, including bucket usage similarity, connection ratio, landing similarity and others (Section "Features of Bad Repositories"), are further utilized by a classifier to find out suspicious buckets. Running the scanner over all the data collected by the Common Crawl (Crawl 2015), which indexed five billion web pages, for those associated with all major cloud storage providers (including Amazon S3, Cloudfront, Google Drive, etc.), we found around 1 million sites utilizing 6885 repositories hosted on these clouds. Among them, BarFinder identified 694 malicious or compromised repositories, involving millions of files, with a precision of 95% and a coverage of 90% against our ground-truth set.
Our discoveries
Looking into the Bars identified by our scanner, we are surprised by the scope and the magnitude of the threat. These buckets are hosted by the most reputable cloud service providers. For example, 13.7% of Amazon S3 repositories and 5.5% of Google repositories that we inspected turned out to be either compromised or completely malicious. Among those compromised are popular cloud repositories such as Groupon's official bucket. Altogether, 472 such legitimate repositories were considered to be contaminated, due to a misconfiguration flaw never reported before, which allows arbitrary content to be uploaded and existing data to be modified without proper authorization. The impact of these Bars is significant, infecting 1306 legitimate websites, including Alexa top 300 sites like groupon.com, Alexa top 5000 sites like space.com, etc. We reported our findings to Amazon and leading organizations affected by the infections. Groupon has already confirmed the compromise we discovered and awarded us for our help.

Fig. 1 Example of deceptive images in Amazon S3 bucket cicloudfront used for malvertising. The image was shown at the bottom of a web page as an update notification to lure visitors to download malware

When it comes to malicious buckets, our study brings to light new insights into this new wave of repository based cyber-attacks, including the importance of Bars to malicious web activities and the challenges in defending against this new threat. More specifically, we found that on average, one Bar serves 152 malicious or compromised sites. In one of the large campaigns discovered in our research, the Bar cloudfront_file.enjin.com hosts a malicious script that was injected into at least 1020 websites (Section "Prevalence and sharing"). These Bars sit right at the center of the attack infrastructure, supporting and coordinating other malicious actors' operations at different stages of a campaign. Interestingly, we found that they could be strategically placed on different cloud platforms, making them hard to block (due to the popularity of their hosting clouds like Google) and detect (scattered across different providers), and easy to share across multiple campaigns. As an example, the Potentially Unwanted Programs (PUP) campaign we found first loads a redirection script from a Bar on Akamaihd (the world's largest CDN platform) to lead the victim to the attack website, then fetches Phishing pictures from an Amazon S3 Bar, and finally delivers the malware stored on Cloudfront to the target systems (Section "Case Studies").

In the presence of such meticulously planned attacks, the cloud service providers apparently are inadequately prepared, possibly due to the privacy constraints in touching their customers' repositories. We found that many Bars survive a much longer lifetime than that of the malicious content hosted on websites. Further complicating the mission of Bar identification are other evasion techniques the adversary employs, including code obfuscation and use of a redirection chain and cloaking techniques to avoid exposing malicious payloads to a malware scanner.
Contributions
The contributions of the paper are as follows:

New understanding. We performed the first systematic study on cloud repositories as a malicious service, an emerging security threat. For the first time, our study reveals the scope and magnitude of the threat and its significant impact, particularly on the infrastructures of illicit web activities. These findings bring to the spotlight this important yet understudied problem and lead to a better understanding of the techniques the adversary employs and their weaknesses. This will contribute to better defense against and ultimate elimination of the threat.

New technique. Based on our understanding of bad cloud repositories, we take a first step toward automatically detecting them. The technique we developed relies on the topological relationship between a cloud repository and the websites it serves, which is difficult to change and effective at capturing malicious or compromised buckets. Our evaluation over a large number of popular websites demonstrates the potential of the technique, which could be utilized by both cloud providers and third parties to identify the threats posed by Bars.
Background
Cloud hosting
Cloud hosting is a type of infrastructure as a service (IaaS), which is rented by the cloud user to host her web assets (e.g., HTML, JavaScript, CSS, and image files). These web assets are organized into cloud repositories referred to as buckets, which are identified by unique, user-assigned keys that are mapped as sub-domains. For example, the subdomain aws-publicdatasets.s3.amazonaws.com identifies Amazon S3 as the cloud platform and aws-publicdatasets as the user's cloud bucket and repository. Such name assignment is labeled as s3.amazonaws.com_aws-publicdatasets throughout this paper. Also, each bucket is protected by an access control list configured by the user to authorize requests for her resources.

In recent years, we have seen an increase in popularity of these services. A key feature of cloud hosting is built-in site publishing (Google. Publish website content 2015), where the web assets in the bucket can be served directly to users via file names in a relative path in the bucket (i.e., cloud URL). For instance, JavaScript files hosted in the cloud bucket can be directly run in the browser. Also, the pay-as-you-go hosting is well received as an economic and flexible computing solution. As an example, Google Drive today offers a free web hosting service with 15 GB of storage, and an additional 100 GB for $1.99/month, and GoDaddy's web hosting starts at $1/month for 100 GB.

Besides such front-end websites, mainstream cloud providers today (Amazon S3, Microsoft Azure, Google Drive, etc.) all allow their customers to store different kinds of web content and other resources in their cloud buckets, serving as back-end repositories that can be easily accessed by front-end applications (like the website) and shared across different parties. Figure 2 illustrates an example, in which the resource owner creates a bucket on the cloud hosting platform and uploads a script there (1); this resource (i.e., the script) is made public through a cloud URL, which can be embedded into any website (2); whenever the site is visited (3), requests will be generated for fetching the script (4) and delivering it to the visitor's browser (5). The bucket in the example is typical of a service repository, whose resources can be fetched and updated through a cloud URL: for example, the visitor statistics of a website can be collected through a link (s3.amazonaws.com/trk.cetrk.com/t.js), which downloads a tracking script from s3.amazonaws.com_trk.cetrk.com, a bucket owned by the tracking website Crazy Egg. This is different from a "self-serving" bucket, whose resources can only be accessed by the bucket owner's sites. Note that our study focuses on abuses of this type of cloud repositories, regardless of the additional functionalities they may have (e.g., CDNs, DDoS protection, etc.), since these functionalities do not affect the way the repositories are used by either legitimate or malicious parties.
Adversary model
In our research, we consider the adversary who tries to use cloud buckets on legitimate cloud platforms as service repositories for illicit activities. For this purpose, the attacker could build her own malicious bucket or compromise legitimate ones, and store various attack vectors there, including Spam, Phishing, malware, click-hijacking and others. These buckets are connected to front-end websites, which could be malicious, compromised or legitimate ones contaminated only by the Bar.
Finding bars online
In this section, we elaborate on our analysis of a set of known Bars (the seed set) and the features identified for differentiating benign repositories and Bars. These features are utilized in our research to build a simple web scanner, BarFinder, for detecting other malicious or compromised high-profile, previously-unknown repositories and the malicious campaigns in which they serve.

Features of bad repositories
Our study is based on a small set of confirmed good and bad repositories and their related domains, which we analyzed to find out how Bars (bad repositories) differ from legitimate repositories. In the absence of direct access to these buckets, good or bad, all we can do is to infer their legitimacy from who uses them and how they are used (by different domains), that is, the features of the domains and their interactivities on the redirection paths leading to the cloud repository. Of particular interest here are a set of collective properties identified from the resource fetching chains (a.k.a., redirection chains) for serving the content of Bars, which are hard for the adversary to change, compared with the content features of individual Bars. Below, we elaborate on the way such data was collected and the salient features discovered in our research, which describe how the adversary attempts to hide Bars or use them to cover other attack assets, a redirection pattern never observed on legitimate repositories.

Fig. 2 Overview of the cloud hosting process
Data collection
To build the seed set, we collected a set of confirmed malicious or compromised buckets (called Badset) and legitimate buckets (called Goodset) as well as their related domains, as illustrated in Table 1.

Badset. We utilized two feeds as the ground truth for gathering bad cloud buckets: the Spamtrap feed and the CleanMX feed (Clean-MX 2015). The former comes from a Spam honeypot we constructed (A. authors n.d.) that receives around 10K Spam emails per day, from which we extracted the cloud URLs promoted by the emails; these may include spam resources such as HTML, images, and scripts. The latter includes the historical data of CleanMX, a popular domain scanning engine, from which cloud-related URLs were collected. For both feeds, we further validated the URLs with VirusTotal (VirusTotal 2015) and manual inspections (e.g., looking for Phishing content) to ensure that they were indeed bad (to avoid contaminating the dataset with legitimate buckets used in malicious activities). Using the collected set of malicious cloud URLs from both feeds, we extracted their repositories, which led to 100 confirmed Bars.

Goodset. The good buckets were gathered from the Alexa top 3K websites, which are considered to be mostly clean. To this end, we visited each website using a crawler (as a Firefox add-on) to record the HTTP traffic triggered by the visit, including network requests, responses, browser events, etc. From the collected traffic, we extracted the HTTP cloud request URLs corresponding to 300 cloud buckets hosted on 20 leading cloud hosting services like Amazon S3, Google Drive, etc. (see Table 6 in Appendix for the complete list). Note that even though some of them provide CDN service or DDoS protection, they all provide hosting services that act as cloud repositories.

Bucket-served sites and their HTTP traffic. We collected HTTP traffic using the crawler mentioned above to visit a list of websites using buckets for feature extraction. Rather than blindly crawling the web to find those sites, we adopted a more targeted strategy by crawling the sites found to contain links to the cloud in the past. We built the site list with the help of Common Crawl (Crawl 2015), a public big data project that crawls about 5 billion webpages each month through a large-scale Hadoop-based crawler and maintains lists of the crawled websites and their embedded links. Searching the Common Crawl (Crawl 2015) dataset, collected in February 2015, for the websites loading content from the 400 clean and malicious buckets identified above, we found 141,149 websites, which were then visited by our crawler.
Topological features
We first inspected the topology of the redirection infrastructure associated with a specific bucket. Such an infrastructure is a collection of redirection paths, with each node being a Fully Qualified Domain Name (FQDN). On each path, the bucket is either a node when it directly participates in a redirection (e.g., its cloud URL delivers a redirection script to the visitor's browser) or simply a passive repository providing resources like pictures to other domains. Figure 3 illustrates examples of redirection paths leading to two real-world repositories, one for a legitimate bucket cloudfront.net_d24n15hnbwhuhn and the other for a Bar s3.amazonaws.com_cicloudfront.

A key observation from our study is that the redirection infrastructure leading to a Bar tends to include the features for protecting the Bar from being detected by web scanners, presumably due to the fact that the repository is often considered to be a valuable asset for the adversary. Specifically, we found that typically, there are a few gatekeeper nodes sitting in front of a Bar, serving as an intermediary to proxy the attempts to get resources from the Bar. Examples of the gatekeepers include fp125.mediaoptout.com and its downstream nodes in Fig. 3b. On the topology of such an infrastructure, these gatekeepers are the hubs receiving a lot of resource-access connections from entry sites (the first node on a redirection path, see Fig. 3). Also interestingly, our research shows that some gatekeepers can access the Bar through multiple paths. For example, in Fig. 3b, krd.semantichelper.com can either go straight to s3.amazonaws.com_cicloudfront or take a detour through p306.atemada.com. This structure could be caused by the cloaking of the gatekeeper for hiding the Bar, or constructed to maintain access to the repository even when nodes (like 1.semantichelper.com) are down (detected, cleaned, etc.). Note that such a protection structure does not exist on the paths to a benign repository (Fig. 3a): normally, the resources hosted in a repository (e.g., jQuery) are directly fetched by the website using it, without going through any redirection; even in the presence of redirections, there will not be any gatekeeper, not to mention attempts to cloak or build a backup path.

Table 1 Summary results of the seed dataset
Set | # of buckets | # of linked websites | # of average linked websites | # of redirection paths
Badset | 100 | 12,468 | 133 | 468,480
Goodset | 300 | 128,681 | 864 | 2,659,304
To identify this unique "protection" structure, we utilize two collective features: bucket usage similarity (BUS) that captures the topology involving hubs (gatekeepers) and connection ratio (CR) that measures the interactivities across different redirection paths (which point to the existence of cloaking behavior or the attempts to maintain back-up paths to the Bar). Specifically, consider a redirection graph $G = (V, E)$ (as illustrated in Fig. 3), where $V$ is the set of nodes (the FQDNs involved in a redirection) and $E$ is a set of edges from one node to the next one on individual paths: $E = \{e_{i,j} \mid \text{node } i \text{ precedes node } j \text{ on a path}\}$. The BUS is measured by $1 - i/s$, where $i$ is the number of immediate predecessor nodes to a repository (the domains connecting to the repository) and $s$ is the total number of entries of the repository's redirection graph. To find out the CR, we first remove the bucket $b$ and all the edges to which it is attached (if they exist) to get another graph $G' = G - G_b$, where $G_b = (b, E_b)$ and $E_b = \{e_{b,j}\}$. Note that each graph $G'$ is associated with one bucket. Then, from $G'$, we find out the number of connected components $n$ and calculate $CR = 1 - n/|V|$ (see Fig. 3 for an example).
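The two ratios can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes each redirection path is a list of FQDNs whose first element is the entry site, that $s$ counts distinct entry sites, and that $|V|$ counts the nodes of the original graph including the bucket. The function names and the sample domains are hypothetical.

```python
from collections import defaultdict


def bucket_usage_similarity(paths, bucket):
    """BUS = 1 - i/s (assumed reading of the definition in the text).

    i: distinct nodes that immediately precede the bucket on some path.
    s: distinct entry sites (first node of each path).
    """
    predecessors = set()
    entries = set()
    for path in paths:
        entries.add(path[0])
        for prev, node in zip(path, path[1:]):
            if node == bucket:
                predecessors.add(prev)
    return 1 - len(predecessors) / len(entries)


def connection_ratio(paths, bucket):
    """CR = 1 - n/|V|, with n counted after removing the bucket.

    Edges are treated as undirected when counting connected components;
    |V| is taken over the original graph (an assumption of this sketch).
    """
    all_nodes = {node for path in paths for node in path}
    remaining = all_nodes - {bucket}
    adjacency = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            if bucket in (a, b):
                continue  # drop every edge attached to the bucket
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = 0
    for start in remaining:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return 1 - components / len(all_nodes)


# Hypothetical Bar-like topology: three entry sites funnel through one gatekeeper.
bar_paths = [
    ["a.example", "gate.example", "bucket"],
    ["b.example", "gate.example", "bucket"],
    ["c.example", "gate.example", "bucket"],
    ["d.example", "bucket"],
]
# Hypothetical benign topology: every site fetches from the bucket directly.
benign_paths = [["x.example", "bucket"], ["y.example", "bucket"], ["z.example", "bucket"]]
```

With these toy inputs the gatekeeper topology scores higher on both ratios than the direct-access one, which is the direction of the difference the text reports between the Bad and Good sets.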
Both collective features were found to be discriminative in our research. Figure 4a and b compare the cumulative distributions (CDF) of the ratios between Bad and Good sets. As we can see from the figures, Bars tend to have higher ratios than benign ones: the average BUS is 0.87 for the Bars and 0.79 for the legitimate repositories, and the CR is 0.85 for the bad repositories and 0.67 for the good ones. As mentioned earlier, this is caused by the fact that a small set of gatekeeper nodes are often placed there for protecting the Bars while the redirection chains towards the good repositories are much more direct and independent: different organizations typically do not go through an intermediary to indirectly access the public repository like jQuery, and even within the same organization, use of such a resource is often direct. Although there can be exceptions, our measurement study shows that in general, the structural differences between malicious and legitimate repositories are stark.

Also, we found that occasionally, a Bar itself may serve as a gatekeeper, running scripts to hide more valuable attack assets, such as the attack server or other malicious landing sites. When this happens, almost always the Bar leads to a small set of successors on redirection paths (e.g., attack servers, landing sites). This is very different from the redirection performed by the script from a benign repository, for example, cloudfront.net_d24n15hnbwhuhn. In such cases, the targets of redirections are often very diverse. Based on this observation, we further measure the landing similarity, $LS = 1 - l/s$, where $l$ is the number of the unique last nodes on the redirection paths associated with a repository and $s$ is the number of entries defined above. Again, as illustrated in Fig. 4c, our study shows that redirection paths involving Bars end at fewer distinct nodes than those of legitimate repositories, and therefore, the related redirection graphs (for Bars) have a higher landing similarity (0.94 vs 0.88).
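As a rough illustration of the landing similarity, the sketch below assumes the form $LS = 1 - l/s$, with $l$ the number of distinct last nodes and $s$ the number of distinct entry sites across a repository's redirection paths. The function name and the sample paths are hypothetical.

```python
def landing_similarity(paths):
    """LS = 1 - l/s under the assumptions stated in the lead-in.

    l: distinct final nodes of the redirection paths.
    s: distinct entry sites (first node of each path).
    """
    entries = {path[0] for path in paths}
    landings = {path[-1] for path in paths}
    return 1 - len(landings) / len(entries)


# Hypothetical case: a Bar acting as gatekeeper sends four entry sites
# to only two landing sites.
gatekeeper_paths = [
    ["a.example", "bucket", "attack.example"],
    ["b.example", "bucket", "attack.example"],
    ["c.example", "bucket", "landing.example"],
    ["d.example", "bucket", "attack.example"],
]
# Hypothetical benign case: each entry site is redirected somewhere different.
diverse_paths = [
    ["a.example", "bucket", "shop-a.example"],
    ["b.example", "bucket", "shop-b.example"],
    ["c.example", "bucket", "shop-c.example"],
    ["d.example", "bucket", "shop-d.example"],
]
```

The concentrated case yields a higher value than the diverse one, matching the direction of the 0.94 vs 0.88 comparison in the text.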
Content and network features
In addition to their distinctive topological features, we found that the nodes on the redirection paths attached to a Bar often exhibit remarkable homogeneity in their content and network properties. Particularly, for the websites directly connecting to the repository, we found that they typically use a small set of templates (like WordPress) to build up their web pages, include similar DOM positions for script injection, carry similar IP addresses or even have the same content management system (CMS) vulnerabilities, etc. These properties turn out to be very diverse among those utilizing a legitimate cloud repository. For example, all websites linking to a Google Drive Bar have their malicious cloud URL (for injecting a script) placed at the bottom of the DOM of each website. In another example, we found that the front-end sites using a Cloudfront Bar actually all include a vulnerable JCE Joomla extension.

Fig. 3 Example of the redirection infrastructure leading to the legitimate bucket cloudfront.net_d24n15hnbwhuhn (a) and the Bar s3.amazonaws.com_cicloudfront (b), which are in RED color

To better understand the diversity of such websites, we try to compare them according to a set of content and network properties. In our research, we utilized the properties extracted by WhatWeb (WhatWeb 2015), a popular web page scanner. WhatWeb is designed to identify the web technologies deployed, including those related to web content and communication: e.g., CMS, blogging platforms, statistic/analytics packages, JavaScript libraries, social media plugins, etc. For example, from the content we obtain the property $p$ as a key-value pair $p = (k, v) = (\text{wordpress}, \text{opensearch})$, which indicates the website using the wordpress plugin opensearch.

From our seed dataset, the scanner automatically extracted 372 keys of 1,596,379 properties, and then we clustered the keys into 15 classes such as Analytics and tracking, CMS and plugin, Meta-data information, etc., following the categories used by BuiltWith, a web technology search engine (BuiltWith 2015). Some examples of these properties are presented in Table 2. In addition to these properties extracted by WhatWeb, we added the following properties to characterize cloud URLs, including the position of the URL, the order in which different buckets appear in the web content and the number of cloud platforms used in a page.

Based on these properties, again we utilized a topological metric to measure the overall similarity across sites. Specifically, the relations among all the sites (connecting to the same bucket) in the same category (Analytics and tracking, CMS and plugin, etc.) are modeled as a graph $G' = (V', E', P)$, where $V'$ is the set of the websites, which are characterized by a collection of properties $P$, and $E'$ is the set of edges: $E' = \{e_{i,j} \mid \text{website } i \text{ and } j \text{ share } p \in P\}$, that is, both sites having a common property. Over this graph, the site similarity is calculated as $SiS = 1 - n/|V'|$. Here $n$ is the number of connected components in the graph.

In our research, we computed SiS across all the categories summarized from the seed dataset, and compared those with Bars against those with the legitimate buckets. Again, the sites using Bars are found to share many more properties and therefore achieve a much higher similarity value than those linking to a good bucket. This is likely caused by mass-production of malicious sites using the same resources (templates, pictures, etc.) provided by a Bar or utilization of the same exploit tool stored in a Bar for compromising the sites with the same vulnerabilities. Therefore, such similarity is inherent to the attack strategies and can be hard to change.
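The site similarity for one property category can be sketched as follows, assuming $SiS = 1 - n/|V'|$ with two sites linked whenever they share at least one (key, value) property. The union-find helper, the function name and the sample properties are illustrative choices of this sketch, not part of the described system.

```python
def site_similarity(site_properties):
    """SiS = 1 - n/|V'| for one property category.

    site_properties maps a site name to its set of (key, value) pairs.
    Sites sharing any property end up in the same connected component.
    """
    parent = {site: site for site in site_properties}

    def find(site):
        # Walk to the component root, compressing the path on the way.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    first_site_with = {}
    for site, properties in site_properties.items():
        for prop in properties:
            if prop in first_site_with:
                parent[find(site)] = find(first_site_with[prop])
            else:
                first_site_with[prop] = site

    components = len({find(site) for site in site_properties})
    return 1 - components / len(site_properties)


# Hypothetical CMS-and-plugin properties for four sites using the same bucket.
cms_properties = {
    "a.example": {("wordpress", "opensearch")},
    "b.example": {("wordpress", "opensearch"), ("joomla", "jce")},
    "c.example": {("joomla", "jce")},
    "d.example": {("drupal", "7")},
}
```

Here the first three sites collapse into one component through shared properties and the fourth stays isolated, so the graph has two components over four sites.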
BarFinder
Design
The design of BarFinder includes a web crawler, a feature analyzer, and a detector. The crawler automatically scans the web for cloud buckets (embedded in web content) and then clusters websites according to the buckets they use. From each cluster, the analyzer constructs a redirection graph and a content graph as described earlier (Section "Features of Bad Repositories"), on which it further calculates the values for a set of collective features including disconnection ratio ($D$), bucket usage similarity ($B$), landing similarity ($L$) and a series of content property/network property similarities ($S_1 \cdots S_n$) for $n$ web-technology categories (e.g., analytics and tracking, CMS and plugin, meta-data information, etc.). The output of this feature analysis is then passed to the detector, which maintains a model (trained on the seed dataset) to determine whether a bucket is malicious, based on its collective features.

Fig. 4 Bars show smaller topological diversity. a Cumulative distribution of bucket usage similarity per cloud bucket. b Cumulative distribution of connected ratio per cloud bucket. c Cumulative distribution of landing similarity per cloud bucket
Specifically, the crawler visits each website, inspecting its content, triggering events, recording the redirection paths it observes and parsing URLs encountered using the patterns of known cloud platforms to recognize cloud buckets. For example, the repository on Amazon S3 is accessed through the URL formatted as \w+.s3\w+[].amazonaws.com, and Amazon CloudFront produces resource URLs in the form of \w+.cloudfront.net. In our research, 20 cloud platforms were examined to identify the buckets they host.
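A pattern-based recognizer of this kind might look like the sketch below. The regular expressions are assumptions written for this illustration and are not the patterns used by BarFinder, whose exact form is not recoverable from the text; the output follows the platform_bucket labeling convention introduced in the Background section. Only two of the 20 platforms are covered, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns: subdomain-style S3, path-style S3, and CloudFront hosts.
S3_SUBDOMAIN = re.compile(r"^(?P<bucket>[\w.-]+)\.s3[\w.-]*\.amazonaws\.com$")
S3_PATH_HOST = re.compile(r"^s3[\w.-]*\.amazonaws\.com$")
CLOUDFRONT = re.compile(r"^(?P<bucket>\w+)\.cloudfront\.net$")


def bucket_label(url):
    """Return a 'platform_bucket' label for a cloud URL, or None if unrecognized."""
    # Scheme-less links such as "s3.amazonaws.com/x/y.js" need a "//" prefix
    # so that urlparse treats the first component as the host.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower()

    match = S3_SUBDOMAIN.match(host)
    if match:
        return "s3.amazonaws.com_" + match.group("bucket")
    if S3_PATH_HOST.match(host):
        # Path-style access: the bucket is the first path segment.
        first_segment = parsed.path.lstrip("/").split("/", 1)[0]
        return "s3.amazonaws.com_" + first_segment if first_segment else None
    match = CLOUDFRONT.match(host)
    if match:
        return "cloudfront.net_" + match.group("bucket")
    return None
```

Applied to the examples quoted in the paper, the subdomain aws-publicdatasets.s3.amazonaws.com and the link s3.amazonaws.com/trk.cetrk.com/t.js map to the labels s3.amazonaws.com_aws-publicdatasets and s3.amazonaws.com_trk.cetrk.com respectively.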
At the feature-analysis stage, for each bucket, BarFinder inspects all its redirection paths, converts every node into an FQDN to compute their topological features, and then connects different nodes according to their content and network properties to find out their site similarities, as described in Section "Features of Bad Repositories". Next, each cloud bucket $i$ is uniquely characterized by a vector $\langle D_i, B_i, L_i, S_{i,1} \cdots S_{i,n} \rangle$, with each element a collective feature. Individual features have different power in differentiating good and bad buckets, which we measured using the F-Score (Bishop 2006) (see Table 3). Note that a feature with a large score can better classify these vectors than one with a small score. Therefore, a binary classifier with a model for weighing the features and other parameters can be used to classify the vector set and determine whether individual buckets are legitimate or not. Such a model is learned from the seed dataset. In our research, we utilized a Support Vector Machine (SVM) as the classifier, which showed the best performance among the classification algorithms we tried (see Table 4). Its classification model is built upon the F-Scores for the collective features ($D$, $B$, etc.) and a threshold set according to the false positive and false negative rates we expect to achieve. For each bucket classified, the SVM can also report the confidence of the classification.
Implementation
This simple design was implemented in our study as a prototype system. The web crawler was built as a Firefox add-on. In total, 20 such crawlers were deployed. We further developed a tool in Python to recover cloud URLs from the web content gathered by Common Crawl. The feature analyzer includes around 500 lines of Python code for processing the data collected by the crawler and computing the collective features (Section "Features of Bad Repositories"). Each feature in the vector is normalized using the L1 norm before being passed to the SVM classifier. In our system, we incorporated the SVM provided by the scikit-learn open-source machine learning library (Sklearn 2015).
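The normalization step can be illustrated with a small helper. The text does not say whether the L1 norm is taken over each bucket's feature vector or over each feature across buckets; this sketch assumes the former, so that the absolute values of one bucket's features sum to 1. The function name and the sample values are hypothetical.

```python
def l1_normalize(vector):
    """Scale a feature vector so its absolute values sum to 1.

    A zero vector is returned unchanged to avoid dividing by zero.
    """
    total = sum(abs(value) for value in vector)
    if total == 0:
        return list(vector)
    return [value / total for value in vector]


# Hypothetical feature vector <D, B, L, S_1> for one bucket.
normalized = l1_normalize([0.5, 0.25, 0.25, 1.0])
```

The normalized vectors would then be the rows handed to the classifier.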
Evaluation
Here we report our evaluation of BarFinder on both the ground truth and the Unknown sets. All the experiments were conducted within an Amazon EC2 C4.8xlarge instance equipped with Intel Xeon E5-2666 36 vCPU and 60 GiB of memory.

Table 2 Examples of content and network features
Category | Feature | Example
Content | CMS platform information and their plugin | (wordpress, all in one SEO pack)
Content | Meta-data information | (meta generator, drupal 7)
Content | Cloud URL information | (position, bottom)
Content | Advertising | (adsense, asynchronous)
Content | Javascript library | (JQuery, 1.9.1)
Content | Analytics and tracking | (Google-Analytics, UA-2410076-31)
Content | Widget | (addthis, welcome bar)
Content | Doc Info technologies | (open graph protocol, null)
Network | Identity | (IP, 216.58.216.78)
Network | Cookie | (Cookie, harbor.session)
Network | Server framework version | (Apache, 2.4.12)
Network | Custom HTTP header | (X-hacker, If youre..)

Table 3 F-score of features

Evaluation on the seed set
We tested the effectiveness of BarFinder over our ground-truth dataset (i.e., the seed set) through the standard five-fold cross validation: that is, 4/5 of the data was used for training the SVM and the remaining 1/5 for evaluating the accuracy of Bar detection. Specifically, we randomly chose 80 Bars (out of 100) from the Badset and 240 (out of 300) legitimate buckets from the Goodset, together with the related websites (out of 141,149). These data were first processed by our prototype to adjust the weights and other parameters for its model. Then we tested the model on the remaining dataset (20 Bars, 60 legitimate buckets). The process is then repeated 5 times.

BarFinder achieved both a low false discovery rate (FDR: 1 - precision) and a high recall in detection: only 5.6% of reported Bars turned out to be legitimate (i.e., 1.6% of false positive rate), and over 89.3% of the Bars were detected. We further show the Area Under Curve (AUC) of the Receiver Operating Characteristics (ROC) graph, which comes very close to 1 (0.96), demonstrating the good balance we strike between the FD rate and the coverage. This preliminary analysis shows that the collective features of the sites connecting to cloud repositories are promising in detecting Bars.
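The evaluation procedure and its metrics can be sketched with the standard library alone. The fold construction below is a plain shuffled k-fold split, not necessarily how the authors partitioned their data, and the confusion counts in the example are invented for illustration; they are not results from the paper. Function names are hypothetical.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists; every item appears in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[index::k] for index in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [item for index, fold in enumerate(folds)
                 if index != held_out for item in fold]
        yield train, test


def detection_metrics(tp, fp, fn, tn):
    """Return the three rates the text reports: FDR, recall and FPR."""
    precision = tp / (tp + fp)
    return {
        "fdr": 1 - precision,      # share of reported Bars that are legitimate
        "recall": tp / (tp + fn),  # share of true Bars that were reported
        "fpr": fp / (fp + tn),     # share of legitimate buckets that were flagged
    }


# Splitting 100 hypothetical Bar identifiers gives five 80/20 partitions.
splits = list(k_fold_splits(range(100)))

# Invented counts for one fold of 20 Bars and 60 legitimate buckets.
example = detection_metrics(tp=18, fp=1, fn=2, tn=59)
```

Note that FDR and the false positive rate use different denominators (reported Bars versus legitimate buckets), which is why the text can quote 5.6% for one and 1.6% for the other.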
Evaluation on the unknown set
We now use BarFinder to scan an unknown set. This unknown set contains HTTP traffic collected using a crawler, as described in Section "Features of Bad Repositories", to visit a list of websites. This list of websites is also extracted from Common Crawl (Crawl 2015) by searching for websites that have loaded some content in the past from the cloud platforms listed in Table 6 in Appendix. As a result, the unknown dataset contained HTTP traffic generated from dynamically visiting 1M websites loading content from 20 cloud platforms and 6885 cloud buckets.
To validate our evaluation results, we employ a methodology that combines anti-virus (AV) scanning, blacklist checking, and manual analysis. Specifically, for the Bars flagged by our system, we first scan their cloud URLs with VirusTotal for malware and check them against the list of suspicious cloud URLs collected from our Spamtrap honeypot for Spam, Phishing, blackhat Search Engine Optimization (SEO), etc. In the case of VirusTotal, a URL is considered to be suspicious if at least two scanners raise the alarm. All such suspicious URLs (from either VirusTotal or the Spamtrap list) are cross-checked against the blacklist of CleanMX. Only those also found there are reported to be true positives. Once a URL is confirmed malicious, its corresponding bucket is labeled as bad. Those unlabeled but flagged (by BarFinder) buckets are further validated manually.
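The labeling rule above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name, the in-memory dictionaries and sets standing in for VirusTotal results, the Spamtrap list and the CleanMX blacklist are all hypothetical, and no real scanner API is called.

```python
def validate_bucket(cloud_urls, vt_alarm_counts, spamtrap_urls, cleanmx_blacklist,
                    min_scanners=2):
    """Label a flagged bucket "bad" if any of its URLs is confirmed, else "manual".

    vt_alarm_counts maps a URL to the number of scanners that raised an alarm
    (hypothetical, pre-fetched data); the other two arguments are plain sets.
    """
    for url in cloud_urls:
        suspicious = (
            vt_alarm_counts.get(url, 0) >= min_scanners  # at least two scanners
            or url in spamtrap_urls                       # or seen by the Spamtrap
        )
        # Only suspicious URLs that also appear on the blacklist count as confirmed.
        if suspicious and url in cleanmx_blacklist:
            return "bad"
    return "manual"  # flagged by the detector but unconfirmed: manual validation
```

A bucket whose URLs trigger only one scanner, or that are suspicious but absent from the blacklist, falls through to manual validation, as in the procedure described above.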
In the experiment, BarFinder reported a total of 730 Bars, about 10.6% of the 6885 buckets. Among them, the AV scanning and blacklist verification confirmed that 502 buckets were indeed bad. The remaining 228 were manually analyzed through, e.g., inspecting the resources in the buckets for phishing or scam content and running scripts in the VM to capture binary code downloads. This validation further confirmed 192 Bars. The FDR was found to be at most 5% (assuming those not confirmed to be legitimate), in line with the finding from the seed set.
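The arithmetic behind these figures can be checked directly from the counts reported above. The variable names are ours; the bound treats every unconfirmed bucket as a false detection, which is the assumption stated in the text.

```python
flagged = 730           # buckets reported by BarFinder
total_buckets = 6885    # buckets in the unknown set
confirmed_auto = 502    # confirmed by AV scanning and blacklist verification
confirmed_manual = 192  # confirmed by manual analysis

manually_reviewed = flagged - confirmed_auto   # buckets left for manual analysis
confirmed = confirmed_auto + confirmed_manual  # Bars confirmed overall
unconfirmed = flagged - confirmed              # treated as legitimate (worst case)

flag_rate = flagged / total_buckets            # share of buckets flagged
fdr_upper_bound = unconfirmed / flagged        # worst-case false discovery rate
```

With these counts, 228 buckets go to manual review, 694 are confirmed, and the 36 unconfirmed buckets bound the FDR just below 5%.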
Measurement and discoveries
Based on the discoveries made by BarFinder, we further conducted a measurement study to better understand the fundamental issues about Bar-based malicious services, particularly how the cloud repositories help facilitate malicious activities, how the adversary exploited legitimate cloud buckets and why the adversary uses Bars in the first place. Our research shows that on the infrastructure, Bars play a pivotal role, compared with the content kept on other malicious or compromised sites, possibly because they are hosted on popular cloud services, and are therefore hard to blacklist and also easy to share across different campaigns. Also, in a malicious campaign, the adversary may take advantage of multiple Bars, at different attack stages, to construct a complicated infrastructure that supports her mission (Section "Prevalence and sharing"). More importantly, we discovered that the adversary effectively exploited misconfigured legitimate buckets to infect a large number of their front-end web services (Section "Bucket Pollution"). Such observations, together with the challenge in blocking Bars, offer insights into the motivation for moving toward this new trend of repository-based attacks.
Prevalence and sharing
Landscape
As mentioned earlier, BarFinder reported 730 suspicious repositories from 6885 cloud buckets over 20 cloud platforms. Among them, we utilized 694 confirmed Bars (through AV/blacklist scanning or manual validation, see Section "BarFinder") for the measurement study. These Bars were found to directly serve 156,608 domains (i.e., front-end websites), through which they are further attached to 6,513,519 redirection paths involving 166,772 domains. Figure 5 illustrates the number of Bars we found on different cloud platforms. Among them, Amazon S3 is the most popular one in our dataset, hosting the most Bars (45%), followed by CloudFront (Amazon's CDN) with 25.1% and Akamaihd with 9.3%. Note that of these 20 clouds, seven provide free storage services (e.g., 15 GB free space on Google Drive, 5 GB for Amazon S3), and therefore easily become the ideal platforms for low-budget miscreants to distribute their illicit content. Also, eleven of them support HTTPS, on which malicious activities are difficult to catch by existing signature-based intrusion detection systems like snort and Shadow (Snort 2015; Symantec 2015). Interestingly, on some of the most prominent platforms, the miscreants are found to take advantage of the cloud providers' reputations to make their Phishing campaigns look more credible: for example, we found that the adversary continuously spoofed Gmail's login page on Google Drive, and the software download page for Amazon Fire TV in an Amazon S3 bucket.
Figure 6 shows the distribution of Bars' front-end websites across 81 countries, as determined by the geolocations of the sites. The number of Bars' front-end sites in each country is ranked and described with different levels of darkness in the figure. We observe that most of these front-ends stay in the United States (14%), followed by Germany (7%) and the United Kingdom (5%).

Table 4 Performance comparison under five-fold cross validation
Classifier | Precision | Recall
SVM | 0.94 | 0.89
Decision Tree | 0.9 | 0.83
Logistic Regression | 0.91 | 0.87
Naive Bayes | 0.9 | 0.79
Random Forest | 0.85 | 0.82
Content sharing
Our research reveals that Bars have been extensively shared among malicious or compromised websites, also across different positions on malicious redirection chains. Figure 7c illustrates the cumulative distribution of Bars' in-degrees in their individual redirection graphs: that is, the number of the sites utilizing these Bars. On average, each Bar shows up on 252 sites and 12% of them are used by more than 200 websites. Table 5 lists the 10 most popular Bars we found. Among them, eight, including s3.amazonaws.com_content.sitezoogle.com, s3.amazonaws.com_publisher_configurations.shareaholic, etc., host services for website generation, blackhat SEO or Spam. Particularly, akamaihd.net_cdncache3-a turns out to be a distributor of Adware, whose scripts are loaded into the victim's browser to redirect it to other sites for downloading different Adware. Also, we found that another Bar, s3.amazonaws.com_files.enjin.com, hosts exploits utilized by 1020 bad sites. Finding Bars can help to effectively detect more sites with malicious contents.
Another interesting observation is that malicious content is also extensively shared across different Bars. To understand such content reuse, we grouped the malicious programs retrieved from different Bars based on the similarity of their code in terms of edit distance. Specifically, we removed the spaces within the programs and ran the Python library scipy.cluster.hierarchy.linkage (Scipy 2015) to hierarchically cluster them (now in the form of strings) according to their Jaro scores (Cohen et al. 2003). In this way, we were able to discover three types of content sharing: intra-bucket sharing, cross-bucket sharing, and cross-platform sharing.
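The grouping step can be sketched as follows. The Jaro similarity below follows the standard definition; the grouping, however, is a simplified stand-in for the hierarchical clustering named in the text: it merges any two scripts whose Jaro score reaches a threshold (single-linkage style, via union-find) instead of calling scipy. The function names and the threshold of 0.9 are assumptions made for illustration.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def group_similar(scripts, threshold=0.9):
    """Group script indices whose whitespace-stripped code is Jaro-similar."""
    cleaned = ["".join(s.split()) for s in scripts]  # remove all whitespace
    parent = list(range(len(cleaned)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            if jaro(cleaned[i], cleaned[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(cleaned)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Two redirect scripts that differ only in their target URL and spacing end up in one group, while an unrelated script stays on its own; with real data the pairwise Jaro distances (1 minus the score) could instead be passed to a hierarchical clustering routine.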
Specifically, within the Akamai bucket akamaihd.net_asrv-a, we found that many of its cloud URLs are in the form of http://asrv-a.akamaihd.net/sd/[num]/[num].js. The JavaScript code turns out to be all identical, except that each script redirects the visitor to a different website. Similar code also appears in another Akamai bucket, akamaihd.net_cdncache-a. As another example, we discovered the same malicious JavaScript (JS.ExploitBlacole.zm) from the Bars on CloudFront and Qiniudn respectively, even under the same path (i.e., media/system/js/modal.js). Moreover, we found that attackers used a sub-domain generation algorithm to automatically generate sub-domains for Bars, then further reused the same malicious contents for these Bars. Specifically, we found that 28 content-sharing Bars on Akamaihd have the same format in their names. Attackers utilized a word-bank-based sub-domain generation algorithm (damballa 2015), which concatenates fixed terms and a series of domain names (with the dots removed), then truncates the string if its length is over 13, e.g., apismarterpoweru-a (truncated from smarterpowerunite.com). The common patterns of Bars indicate the potential of developing an accurate detection procedure.

Fig. 5 Top 10 cloud platforms with most Bars, compared with their total number of cloud buckets in our dataset
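The naming pattern of these sub-domains can be reproduced with a few lines. This is our reading of the single example given in the text, not a recovered attacker algorithm: we assume that the fixed term ("api") is a prefix, that the 13-character limit applies to the domain name after its dots are removed, and that "-a" is a fixed suffix. The function name bar_subdomain is hypothetical.

```python
def bar_subdomain(fixed_term, domain, max_len=13, suffix="-a"):
    """Build a Bar sub-domain label from a fixed term and a domain name.

    Assumed scheme (inferred from one example): strip the dots from the
    domain, keep at most max_len characters, then add prefix and suffix.
    """
    stem = domain.replace(".", "")[:max_len]
    return f"{fixed_term}{stem}{suffix}"


label = bar_subdomain("api", "smarterpowerunite.com")
```

Under these assumptions the call reproduces the label apismarterpoweru-a quoted above. A detector could run the same construction in reverse, checking whether a bucket name decomposes into a known fixed term plus a truncated domain.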
Correlation
We further studied the relationships between different Bars fetched by the same websites. From our dataset, 11,442 (3.5%) websites are found to access at least two Bars. Among them, 8283 served as front-end websites, and 3159 were other sites on redirection chains. Also, 60.9% of these sites link to the repositories on the same cloud platforms and 39.1% use those on different platforms. In some cases, two buckets are often used together. For example, we found that a click-hijacking program was separated into the code part and the configuration part: the former is kept on CloudFront while the latter is on Akamaihd; the two buckets always show up together on redirection chains. Such a separation seems to be done deliberately, in an attempt to evade detection. Also we saw that Bars carrying the same attack vectors are often used together, which are apparently deliberately put there to serve parties of the same interests: as another example, a compromised website was observed to access four different Bars on different cloud platforms, redirecting its visitors to different places for downloading Adware to the visitor's system. Our findings show that Bars are widely deployed in attacks and serve in a complex infrastructure.

Fig. 6 Impact of Bars' front-end websites around the globe
Fig. 7 Bars play critical roles in attack infrastructures. a Cumulative distribution of degrees per sites. b Percentage of Bars in each position of redirection path number of in-degrees per Bar. (Ignoring those traces with length of 2). c Cumulative distribution of position of redirection path number of in-degrees per Bar. (Ignoring those traces with length of 2)
Bar-based malicious web infrastructure
Role in attack infrastructures
Actually, most nodes on a malicious infrastructure are the malicious websites with newly registered domains and those that are compromised. To better understand the critical roles of Bars, we compared those nodes with the bad cloud buckets. Specifically, we first identified both types of nodes from the redirection paths and then analyzed the number of unique paths each member in either category is associated with and the position of the member on the path. Figure 7a presents the cumulative distribution of the paths going through a Bar and that of a compromised or malicious site. As seen in the figure, compared with other nodes on the infrastructure, Bars clearly sit on many more paths (47.4 on average vs. 8.6), indicating their importance.
Further, Fig. 7b shows the histogram of position distributions (again, Bars vs. bad sites). The observation is that more Bars (41%, 11%) show up at the beginnings and the ends of the paths than bad websites (22%, 5%), which demonstrates that they often act as first-hop redirectors or attack-payload repositories. For example, in our three-month-long monitoring of the campaign based on the Spyware distribution Bar akamaihd.net_rvar-a, we found that besides the Bar, 320 newly-registered websites participated in the attack; here the Bar acted very much like a dispatcher: providing JavaScript that identified the victim's geolocation and then using an iframe to redirect her to a selected bad site.
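The two per-node measurements used in this comparison, the number of unique paths through a node and its position on each path, can be computed as in the sketch below. The representation of a redirection path as a list of node names and the function name path_statistics are assumptions; the paper does not describe its data structures.

```python
from collections import defaultdict


def path_statistics(paths):
    """Count, per node, the unique redirection paths it sits on and where it sits."""
    unique_paths = {tuple(path) for path in paths}  # drop duplicate traces
    path_count = defaultdict(int)
    positions = defaultdict(lambda: {"first": 0, "middle": 0, "last": 0})
    for path in unique_paths:
        for node in set(path):
            path_count[node] += 1  # each unique path counts once per node
        for i, node in enumerate(path):
            if i == 0:
                positions[node]["first"] += 1
            elif i == len(path) - 1:
                positions[node]["last"] += 1
            else:
                positions[node]["middle"] += 1
    return dict(path_count), {node: dict(c) for node, c in positions.items()}
```

Averaging the path counts separately over Bars and over other bad sites gives the kind of comparison reported above, and normalizing the position counters gives the position histogram.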
Attack vectors and payloads
In a malicious infrastructure, the attackers run an attack vector to compromise the victim's systems (browser, client, server, etc.), and then deliver an attack payload (e.g., malware). We found that Bars often serve the infrastructure as repositories for such vectors and payloads. In our study, we ran AV/blacklist scanners and also checked fingerprints collected through manual analysis on the websites in our dataset (Section "BarFinder"). We identify attack vectors using the labels generated by the AV/blacklist scanners whenever available. Figure 8a and b illustrate our findings, in which Spam, Phishing, Fake AV scams and vulnerability exploits are considered to be the attack vectors, and virus, Spyware, Trojan, malicious scripts and other malware-related content are treated as payloads. As we can see from the figure, both types of malicious content are extensively hosted by Bars. On the other hand, malicious or compromised websites typically only serve attack vectors (and retrieve the payload from dedicated servers or the cloud repositories) (Li et al. 2014; Li et al. 2013). Interestingly, we found a malicious payload (CVE-2015-0029) on an S3 Bar, which was uploaded in 2013 while the vulnerability was released in February 2015. The malicious payload can stay active for a relatively long time in Bars.
Revenue estimate
We also investigated the revenues that could be received by the adversary leveraging malicious infrastructure. To this end, we utilized a model as proposed in the prior research (Alrwais et al. 2014; Moore et al. 2011): \(R(t) = N_v(t) \cdot P_a \cdot R_a\), where the total revenue \(R(t)\) during the time period \(t\) is calculated from the total number of actions taken (i.e., click-through number, number of visitors \(N_v(t)\) weighed by the probability that the visitors take action \(P_a\)) and the average revenue per action \(R_a\). Here, we assume the price model as pay per action: that is, the adversary gets paid only when a specified action (e.g., software installation) is taken by the victims. Using a Passive DNS dataset (DNSDB 2015), which contains DNS lookups recorded by the Security Information Exchange, we estimated the daily number of visitors in February 2015 (data crawling time, i.e., attack time) as \(N_v\) = 32M. Note that the probability of a visitor taking action \(P_a\) is difficult to estimate due to the various malicious pages on different sites. According to the prior works (Alrwais et al. 2014), we set \(P_a\) = 0.02 and \(R_a\) = $1.25. These parameters yield a daily revenue for attackers utilizing Bars of 32M × 0.02 × $1.25 = 0.8 million US dollars per day, which corresponds to a huge amount of illicit profit.

Table 5 Top 10 most popular Bars
Rank | Cloud bucket | # of front-end sites | Avg path len | Popularity
1 | s3.amazonaws.com_content.sitezoogle.com | 4429 | 2.9 | 2.8%
2 | cloudfront.net_d3n8a8pro7vhmx | 1829 | 3.3 | 1.4%
3 | s3.amazonaws.com_assets.ngin.com | 1643 | 3.2 | 1.2%
4 | s3.amazonaws.com_publisher_configurations.shareaholic | 1434 | 2.7 | 0.9%
5 | cloudfront.net_d2e48ltfsb5exy | 1340 | 4.0 | 0.9%
6 | cloudfront.net_d1t3gia0in9tdj | 1297 | 3.2 | 0.9%
7 | cloudfront.net_d2i2wahzwrm1n5 | 1249 | 2.5 | 0.8%
8 | cloudfront.net_d202m5krfqbpi5 | 1062 | 2.8 | 0.8%
9 | s3.amazonaws.com_files.enjin.com | 1020 | 7.1 | 0.7%
10 | akamaihd.net_cdncache3-a | 976 | 6.4 | 0.6%
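The estimate is a direct product of three quantities, which the following lines reproduce with the parameter values given in the text. The function name daily_revenue is ours, and the result inherits all the uncertainty of the assumed action probability and payout.

```python
def daily_revenue(visitors_per_day, action_probability, revenue_per_action):
    """Pay-per-action model: R = N_v * P_a * R_a, in US dollars per day."""
    return visitors_per_day * action_probability * revenue_per_action


# Values from the text: N_v = 32M visitors per day, P_a = 0.02, R_a = $1.25.
estimate = daily_revenue(32_000_000, 0.02, 1.25)
```

Because the model is linear in each factor, halving the assumed action probability halves the estimate, so the figure is best read as an order-of-magnitude indication.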
Bucket pollution
Polluted repositories
To find polluted buckets, we searched the Alexa top 20K websites for the Bars in our dataset and 276 Bars were found. When a legitimate site links to a Bar, the reason might be that either the website or the repository is hacked. Differentiating these two situations with certainty is hard, and in some cases, it may not be possible. All we could do is to get an idea about the prevalence of such bucket pollution, based on the intuition that if a website is less vulnerable, then it is less likely to be compromised. To this end, we ran WhatWeb, a powerful web vulnerability scanner, on these sites and found that 134 Bars' front-end websites contain various flaws, such as using a CMS in a vulnerable version (e.g., wordpress 3.9), vulnerable plugins (e.g., JCE Extension 2.0.10) and vulnerable software (e.g., Apache 2.2). The remaining 142 Bars' front-end websites look pretty solid in web protection and therefore it is likely that the Bars they include were polluted. This set of potentially compromised buckets takes 19% of all the Bars flagged by BarFinder. These buckets, together with the additional 30 randomly sampled from the set, went through a manual analysis, which shows that indeed they were legitimate buckets contaminated with malicious content.
Misconfiguration and impact
It is even more challenging to determine how these buckets were compromised, which could be caused by exploiting either the cloud platform vulnerabilities or the bucket misconfigurations. Without an extensive test on the cloud platform and the repositories, which requires at least direct access to them, a comprehensive study on the issue is impossible. Nevertheless, we were able to identify a misconfiguration problem widely existing in popular buckets. This flaw has never been reported before but was likely known to the underground community and has already been utilized to exploit these repositories. We reported the flaws to the vendors and they confirmed our finding.
Specifically, on Amazon S3, one can configure the access policies for her bucket to define which AWS accounts or groups are granted access and the type of access (i.e., list, upload/modify, delete and download): this can be done through specifying an access control list on the AWS Management Console. Once this happens, the cloud verifies the content of the authorization field within the client's HTTP request header before the requested access is allowed to go through. However, we found that by default, the policy is not in place, and in this case, the cloud only checks whether the authorization key (i.e., access key and secret key) belongs to an S3 user, not the authorized party for this specific bucket: in other words, anyone, as long as she is a legitimate user of S3, has the right to upload/modify, delete and list the resources in the bucket and download the content. Note that this does not mean that the bucket can be directly touched through the browser, since the browser does not put anything into the authorization field. However, the adversary can easily build his own HTTP header, filling in his own S3 key, as illustrated in Fig. 10, to gain access to the misconfigured repository. In our research, we verified that all such operations can be performed on any repositories with the configuration flaw, which suggests that site operators need to take more caution when setting the configuration rules.
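A bucket owner can look for this kind of exposure by auditing the bucket's access control list. The sketch below is an assumption-laden illustration rather than the tool used in this study: it presumes that the ACL has already been retrieved and is represented as a dictionary with a "Grants" list (the shape in which AWS SDKs commonly return a bucket ACL), and it flags grants to S3's two predefined broad groups. Whether a grant to the authenticated-users group is exactly the flaw described above is our assumption; it is one configuration that lets any AWS account holder operate on a bucket.

```python
# S3's predefined groups that cover far more than the bucket owner's own accounts.
BROAD_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers": "any AWS account",
    "http://acs.amazonaws.com/groups/global/AllUsers": "anyone",
}


def risky_grants(acl):
    """Return (audience, permission) pairs for grants made to broad groups.

    `acl` is a hypothetical, already-fetched ACL dictionary with a "Grants"
    list; no request to the cloud is made here.
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        audience = BROAD_GROUPS.get(grantee.get("URI"))
        if grantee.get("Type") == "Group" and audience:
            findings.append((audience, grant.get("Permission")))
    return findings
```

An empty result means that no broad group grant is present in the ACL; it does not rule out other exposure paths, such as permissive bucket policies, which this sketch does not inspect.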
Fig. 8 Distribution of attack vectors and payloads of Bars. a Distribution of attack vectors of Bars. b Distribution of payloads of Bars

To understand the impact of this problem, we developed a simple web testing tool, which checked a bucket's configuration using our own S3 key. By scanning all 6885 repositories (including both Bars and legitimate buckets), we discovered that 472 are vulnerable, which were associated with 1306 front-end websites. The Alexa global ranks and the bounce rates of their front-end websites are illustrated in Fig. 9a and b. Sixty-three percent of them have bounce rates from 20% to 60%; 9 sites are ranked within the Alexa top 5000 (e.g., groupon.com, space.com). Focusing on the 104 bad buckets with the flaws, we further manually sampled 50 and confirmed that these buckets were indeed legitimate, including high-profile ones like s3.amazonaws.com_groupon. Further, looking into these buckets' file uploading time (retrieved from the buckets through the flaw), we found that in some cases, the attack has been there for six years. Particularly, the Amazon bucket s3.amazonaws.com_groupon, Groupon's official bucket, was apparently compromised five times between 2012 and 2015 (see Section "Case Studies" for details), according to the changes to the bucket we observed from the bucket historical dataset we collected from archive.org. We also estimated the volume of traffic to those Bar-related sites using a Passive DNS dataset (DNSDB 2015), which contains DNS lookups recorded by the Security Information Exchange. Figure 9c illustrates the traffic of the websites during the time period when their buckets were compromised, which was increased significantly compared with what those sites received before their compromise, indicating that they likely received a lot of visits. This provides evidence that the impact of such compromised buckets is indeed significant.
Case studies
In this section, we discuss two prominent examples.
PUP campaign
Our study reveals a malicious web campaign dubbed Potentially Unwanted Programs (PUP) distribution: the attack redirects the victim to an attack page, which shows her fake system diagnosis results or patch requirements through the images fetched from a Bar, in an attempt to cheat the victim into downloading "unwanted programs" such as Spyware, Adware or a virus. This campaign was first discovered in our dataset. Altogether, at least 11 Bars from 3 different cloud platforms and 772 websites (not hosted on the cloud) were involved.
Through analyzing the redirection traces of the campaign, we found that two Akamai Bars, akamaihd.net_cdncache3-a and akamaihd.net_asrv-a, frequently inject scripts into compromised websites, which serve as first-hop redirectors to move a visitor down the redirection chain before hitting malicious landing pages (that serve malicious content). Interestingly, all the follow-up redirectors are compromised or malicious websites that are not hosted on the cloud. The scripts in the Bars were found to change over time, redirecting the visitor to different next-hop sites (also redirectors). On average, the lifespan of such sites is only 120 h, but the Bar was still alive when we submitted this paper. Such redirections end at at least 216 malicious landing sites, which all retrieve deceptive images from an Amazon S3 bucket s3.amazonaws.com_cicloud-front (a Bar that was never reported before and is still alive). An example is a system update warning, as shown in Fig. 1. From the repository, we collected 134 images, including those for free software installation, updates on all main-stream OSes, browsers and some popular applications. If the victim clicks and downloads the program promoted on the site, the code will be fetched from multiple Bars, such as s3.amazonaws.com_wbt_media where the PUP puts a Bitcoin miner on the victim's system, and cloudfront.net_d12mrm7igk59vq, whose program modifies Chrome's security setting.
Fig. 9 Alexa global rank, bounce rate and traffic increase rate of Bar's front-end websites. a Cumulative distribution of Alexa global ranks per Bar's front-end sites. b Cumulative distribution of Alexa global ranks per Bars' Alexa bounce rate. c Cumulative distribution of Bar's traffic increase rate per Bar's front-end sites

Groupon Bar
We discovered that a misconfigured Amazon S3 bucket s3.amazonaws.com_groupon belongs to Groupon (Alexa global rank 265) (Fig. 10), a global e-commerce marketplace serving 48.1 million customers worldwide. The bucket was used as the resource repository for Groupon's official website (i.e., groupon.com) as well as its marketing sites (12 websites observed in our dataset). When tracking its historical content from archive.org, we were surprised to see that the Groupon S3 bucket has been compromised at least eight times in the past five years (e.g., 2015/08/06, 2014/12/18, 2014/06/25, 2014/01/27, 2014/02/26, 2013/06/23, 2011/11/08, 2010/09/28). These attacks caused different types of malicious payloads to be uploaded to their repository, including Adware, Trojan, virus and others. Even though the bucket owner changed the access control policy in 2012 to prevent the unauthorized party from directly listing the bucket content through browser, it remained accessible by our tool mentioned in Section "Bucket Pollution", which constructs an Authorization field in HTTP header, and unauthorized listing, upload and even modification can still occur.
Discussion
Limitations of BarFinder
As mentioned earlier, Bar detection is hard, since cloud repositories cannot be directly accessed by the parties outside the cloud. Therefore, the goal of BarFinder is to leverage the sites served by Bars to find suspicious repositories. For this purpose, we chose to utilize the collective features of these sites, such as their topological relations, content shared across sites, etc. This strategy could make the approach more robust, as the collective features are more difficult to evade compared with those from individual sites. On the other hand, it requires that the party running the system first make efforts to gather the sites using cloud buckets, the more the better. Further, there are repositories that only serve a small set of front-end sites: e.g., we found that among the Alexa top 3K sites, 67 sites are connecting to the cloud buckets only used by themselves. Those "self-serving" buckets are rather popular in reputable websites, such as appspot.com_android-site only used by android.com, s3.amazonaws.com_ttv-backgroundart only used by twitch.tv, etc. This fact makes the bad apples among them hard to catch by BarFinder simply because not enough sites using them are out there to allow us to differentiate these two types of repositories. Detection techniques covering this type of Bars need to be developed in the follow-up research.
Other defenses against Bars
Besides the detection effort made by the third party, as BarFinder does, more can be done to mitigate the threats posed by Bars, from the ends of the website owner, the bucket owner and the service provider. The website owner could perform integrity checks on the resources her website retrieves from the bucket, making sure that they are not compromised. The cloud bucket owner should carefully configure her cloud bucket to avoid the issue we found and other misconfiguration flaws. In this case, an automatic configuration checker could be helpful. Most importantly, the cloud provider does have the responsibility to move more aggressively on detecting and removing Bars from their systems. This, however, is non-trivial, given the privacy concern and the fact that some Bars can only be considered to be malicious by looking at the malicious activities they are involved in, such as those hosting Phishing pictures. Further research is needed to better understand what the provider can do to address the issue.
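The integrity check suggested for website owners can be as simple as pinning a cryptographic digest of each resource and comparing it whenever the resource is fetched. The sketch below is one possible form of such a check, not a mechanism evaluated in this paper. It formats the digest as "sha384-" followed by the base64 hash, the same shape used by browser Subresource Integrity attributes; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def integrity_value(content: bytes) -> str:
    """Return a digest string of the form "sha384-<base64 of SHA-384 hash>"."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def is_unmodified(content: bytes, expected: str) -> bool:
    """Check fetched bucket content against the digest pinned at publish time."""
    return hmac.compare_digest(integrity_value(content), expected)


# Example: pin the digest when the resource is published, verify it after download.
published = b"console.log('hello');"
pinned = integrity_value(published)
```

A script altered in the bucket after publication no longer matches the pinned value, so the front-end site can refuse to serve or load it. The check only helps when the digest is stored outside the bucket it protects.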
Ethical issues
Most findings of the paper were made through analyzing the data crawled from the public domain. Regarding the study on the misconfiguration problem we found, our scanner was designed to minimize the privacy impacts on vulnerable repositories: specifically, it only tried out the functionality like file listing, uploading and downloading. The impact of such operations is very much in line with that of running online web testing tools (e.g., Sucuri (Sucuri 2015)) on others' websites. Most importantly, we did this with the full intention to protect such repositories from future exploits, and also carefully avoided changing any existing content there and deleted from our system all the files downloaded. Further, we have already contacted the major vendors such as Groupon and the cloud providers like Amazon about those security breaches, and will continue to notify others and help them fix the configuration problem. So far, Groupon has acknowledged the importance of our findings and expressed gratitude for our help.
Related work
Cybercrime hosting service
The cybercrime hosting infrastructure is a basic building block of the cybercrime ecosystem. It provides servers and networking infrastructure for the cybercriminals, and it also persists in the face of takedown attempts and complaints of illicit activities. Alrwais et al. (Alrwais et al. 2014) investigated a trending and stealthy cybercrime hosting service, in which malicious activities were hosted on the sub-allocations of legitimate service provider networks. Li et al. (Li et al. 2013) also found that Internet Service Providers (ISPs) were used for abusive hosting. Compared with those prior studies, we shed light on an emerging and evasive cybercrime hosting platform: cloud service, which has never been studied before.

Fig. 10 Constructed request header
Bad site detection
Malicious web activities have been extensively studied (Invernizzi et al. 2012; Invernizzi et al. 2014; Moore et al. 2011; Nelms et al. 2015). Most related to our work here is the use of HTML content and redirection paths to detect malicious or compromised websites. Examples for the content-based detection include a DOM-based clustering system for monitoring Scam websites (Der et al. 2014), classification of websites for predicting whether some of them will turn bad based on the features extracted from HTML sources, and a monitoring mechanism (Borgolte et al. 2013) (called Delta) to keep track of the changes made to the content of a website for detecting script-injection campaigns. For those using malicious redirection paths, prominent prior approaches use short redirection sequences to capture bad sites (Li et al. 2014), unique properties of malicious infrastructure (its density) for detecting drive-by downloads (Invernizzi et al. 2014) or malware distribution (Stringhini et al. 2013) and a trace-back mechanism that goes through the redirection paths (Nelms et al. 2015) for labeling malware download sites. Compared with those prior studies, which all rely on the properties of the targets they try to capture, BarFinder utilizes the features found from the front-end websites using cloud buckets, as those repositories may not be directly accessible. Also, our approach leverages a set of unique collective features, based on the connected components of a graph, which, to our knowledge, has never been used before.
Cloud security
Previous studies on security and privacy issues in cloud storage primarily focus on the confidentiality of the data stored in the cloud or the attacks targeting the cloud computing infrastructure. Examples include the study on co-locating attack virtual machines (VM) with the target one on Amazon EC2 (Ristenpart et al. 2009), which enables a cache-based side-channel attack to infer sensitive user information from the target (Zhang et al. 2014), and the work on controlled-channel attacks on multi-user cloud hosting environment (Xu et al. 2015), which allows an untrusted VM to extract sensitive information from protected applications. More recently, attention has moved to abuse of cloud-based services for fraudulent activities. For example, prior research (Mulazzani et al. n.d.) analyzed Dropbox client software and discovered that it can be exploited to hide files with unlimited storage capacity. Additionally, (Han et al. 2015) studied the use of Amazon EC2 to host malicious domains acting as command and control centers and exploit servers, by downloading malware samples and executing them in sandbox environments to analyze their interactions with the cloud. (Liao et al. 2016) studied the long-tail SEO spam on cloud platforms and measured its effectiveness. Our study differs from these works by proposing BarFinder to identify malicious cloud repositories and providing an in-depth analysis of the use of cloud repositories in malicious campaigns and how they correlate with the websites they serve. In another study, researchers (Idziorek et al. 2011) inspected the fraudulent traffic to cloud-hosted pages for the purpose of squandering the user's resources and raising her cloud-usage cost. They also proposed detection methods based on the consistency of the requests. Unlike these works, our research investigated the abuse of cloud buckets as a malicious service, an emerging new cloud-based security threat that has never been studied before.
Conclusion
The emergence of using cloud repositories as a malicious service presents a new challenge to web security. This new threat, however, has not been extensively studied and little is known about its scope and magnitude and the techniques the adversary employs. In this paper, we report the first systematic study on malicious and compromised cloud repositories and the illicit online activities built around them. We collected a small set of seeding Bars and identified a set of collective features from the websites connecting to them. These features describe the effort made by the adversary to protect Bars and utilize them to quickly build up a large campaign. Using these features, we developed a new scanner that detected over 600 Bars on top-of-the-line cloud platforms, including Google, Amazon, and others. Over these Bars, we performed a large-scale measurement study that led to surprising findings of such attacks. Examples include the central roles those buckets play at each stage of a web attack (redirection, displaying Phishing content, exploits, attack payload delivery, etc.), the strategy to separate malware code and configuration files to avoid detection, and a configuration flaw never reported before that was likely exploited to compromise many cloud buckets. Our findings made an important step toward better understanding and effective mitigation of this new security threat.
Endnotes
1 We have manually examined and confirmed all those instances.
2 The terms repositories and buckets are used interchangeably throughout this paper.

Acknowledgements
We thank our anonymous reviewers for their useful comments.

Funding
This work was supported in part by the National Science Foundation (grants CNS-1223477, 1223495, 1527141 and 1618493).

Availability of data and materials
All public dataset sources are as described in the paper.

Authors' contributions
XL: design and experiment; KY, LX: experiment; SA, XW, SH, RB: design. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Received:24May2018Accepted:10September2018ReferencesHao,Shuang,etal.
(2013)Understandingthedomainregistrationbehaviorofspammers.
Proceedingsofthe2013conferenceonInternetmeasurementconference.
ACMAlrwaisS,YuanK,AlowaisheqE,LiZ,WangX(2014)Understandingthedarksideofdomainparking.
In:Proceedingsofthe23rdUSENIXsecuritysymposiumBishopCM(2006)Patternrecognitionandmachinelearning.
springerBorgolteK,KruegelC,VignaG(2013)Delta:automaticidentificationofunknownweb-basedinfectioncampaigns.
In:Proceedingsofthe2013ACMSIGSACconferenceonComputer&communicationssecurity,ACM,pp109–120Bucketsn.
d.
.
https://cloud.
google.
com/storage/docs/json_api/v1/buckets.
Appendix

Table 6 A list of cloud hosting platforms

Cloud platform    Domain
heroku            herokuapp.com
amazon S3         s3.amazonaws.com
cloudfront        cloudfront.net
windowsnet        windows.net
azure             azurewebsites.net
google            googledrive.com
appspot           appspot.com
msecdn            msecdn.net
bitbucket         bitbucket.org
github            github.io
sina              sinaapp.com
olympe            olympe.in
rackcdn           rackcdn.com
baiduyun          duapp.com
qiniu             qiniucdn.com
akamaihd          akamaihd.net
yahoo             hostingprod.com
sogo              sogoucdn.com
go2cloud          go2cloud.org
aliyun            aliyuncs.com

Liao et al. Cybersecurity (2018) 1:14 Page 17 of 18
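As an illustration of how a list such as Table 6 might be applied, the sketch below labels a URL with a cloud platform when its host falls under one of the listed domains. This is a minimal sketch, not the matching procedure used in the study: the function name `cloud_platform`, the dictionary `CLOUD_SUFFIXES`, and the suffix-matching rule are assumptions introduced here.

```python
from urllib.parse import urlsplit

# Domain suffixes taken from Table 6. Storing them as suffix -> platform label
# is an assumption made for this sketch.
CLOUD_SUFFIXES = {
    "herokuapp.com": "heroku",
    "s3.amazonaws.com": "amazon S3",
    "cloudfront.net": "cloudfront",
    "windows.net": "windowsnet",
    "azurewebsites.net": "azure",
    "googledrive.com": "google",
    "appspot.com": "appspot",
    "msecdn.net": "msecdn",
    "bitbucket.org": "bitbucket",
    "github.io": "github",
    "sinaapp.com": "sina",
    "olympe.in": "olympe",
    "rackcdn.com": "rackcdn",
    "duapp.com": "baiduyun",
    "qiniucdn.com": "qiniu",
    "akamaihd.net": "akamaihd",
    "hostingprod.com": "yahoo",
    "sogoucdn.com": "sogo",
    "go2cloud.org": "go2cloud",
    "aliyuncs.com": "aliyun",
}


def cloud_platform(url):
    """Return the Table 6 platform label for the URL's host, or None.

    The URL must carry a scheme (e.g. "http://"); otherwise urlsplit
    finds no hostname and the function returns None.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    for suffix, platform in CLOUD_SUFFIXES.items():
        # Match the domain itself or any subdomain of it, so that a look-alike
        # such as "notherokuapp.com" is not counted.
        if host == suffix or host.endswith("." + suffix):
            return platform
    return None
```

Matching on a leading dot rather than a bare string suffix keeps unrelated registrations that merely end in the same characters out of the result.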