Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale
Kai Chen, Chinese Academy of Sciences and Indiana University; Peng Wang, Yeonjoon Lee, Xiaofeng Wang, and Nan Zhang, Indiana University; Heqing Huang, The Pennsylvania State University; Wei Zou, Chinese Academy of Sciences; Peng Liu, The Pennsylvania State University
https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/chen-kai
USENIX Association, 24th USENIX Security Symposium, p. 659
{chenkai, zouwei}@iie.ac.cn, {pw7, yl52, xw7, nz3}@indiana.edu, hhuang@cse.psu.edu, pliu@ist.psu.edu
Indiana University, Bloomington; State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; College of IST, Penn State University
Abstract
An app market's vetting process is expected to be scalable and effective. However, today's vetting mechanisms are slow and less capable of catching new threats. In our research, we found that a more effective solution can be built by exploiting the way Android malware is constructed and disseminated, which is typically through repackaging legitimate apps with similar malicious components. As a result, such attack payloads often stand out from those of the same repackaging origin and also show up in apps that are not supposed to be related to each other. Based upon this observation, we developed a new technique, called MassVet, for vetting apps at a massive scale, without knowing what malware looks like and how it behaves. Unlike existing detection mechanisms, which often utilize heavyweight program analysis techniques, our approach simply compares a submitted app with all those already on a market, focusing on the difference between those sharing a similar UI structure (indicating a possible repackaging relation), and the commonality among those seemingly unrelated. Once public libraries and other legitimate code reuse are removed, such diff/common program components become highly suspicious. In our research, we built this "DiffCom" analysis on top of an efficient similarity comparison algorithm, which maps the salient features of an app's UI structure or a method's control-flow graph to a value for a fast comparison. We implemented MassVet over a stream processing engine and evaluated it on nearly 1.2 million apps from 33 app markets around the world, the scale of Google Play. Our study shows that the technique can vet an app within 10 seconds at a low false detection rate. Also, it outperformed all 54 scanners in VirusTotal (NOD32, Symantec, McAfee, etc.) in terms of detection coverage, capturing over a hundred thousand malicious apps, including over 20 likely zero-day malware and those installed millions of times. A close look at these apps brings to light intriguing new observations, e.g., Google's detection strategy and malware authors' countermoves that cause the mysterious disappearance and reappearance of some Google Play apps.
1 Introduction
The phenomenal growth of Android devices brings in a vibrant application ecosystem. Millions of applications (apps for short) have been installed by Android users around the world from various app markets. Prominent examples include Google Play, Amazon Appstore, Samsung Galaxy Apps, and tens of smaller third-party markets. With this prosperity, the ecosystem is tainted by the rampancy of Android malware, which masquerades as a useful program, often by repackaging a legitimate app, to wreak havoc, e.g., intercepting one's messages, stealing personal data, sending premium SMS messages, etc. Countering this menace primarily relies on the effort from the app markets, since they are in a unique position to stop the spread of malware in the first place. Accomplishing this mission, however, is by no means trivial, as highlighted by a recent report [8] that 99% of mobile malware runs on Android devices.
Challenges in app vetting. More specifically, the protection today's app markets put in place is a vetting process, which screens uploaded apps by analyzing their code and operations for suspicious activities. Particularly, Google Play operates Bouncer [24], a security service that statically scans an app for known malicious code and then executes it within a simulated environment on Google's cloud to detect hidden malicious behavior. The problem here is that the static approach does not work on new threats (i.e., zero-day malware), while the dynamic one can be circumvented by an app capable of fingerprinting the testing environment, as discovered by a prior study [30]. Also, the dynamic analysis can be heavyweight, which makes it hard to explore all execution paths of an app.
New designs of vetting techniques have recently been proposed by the research community [57, 28] for capturing new apps associated with known suspicious behavior, such as dynamic loading of binary code from a remote untrusted website [57], operations related to component hijacking [28], Intent injection [12], etc. All these approaches involve a heavyweight information-flow analysis and require a set of heuristics that characterize the known threats. They often need a dynamic analysis in addition to the static inspection performed on app code [57] and further human intervention to annotate the code or even participate in the analysis [14]. Moreover, the emulators that most dynamic analysis tools employ can be detected and evaded by malware [23]. Also importantly, none of them has been put to a market-scale test to understand their effectiveness, nor has their performance been clearly measured.
Catching unknown malice. In fact, the vast majority of Android malware are repackaged apps [56], whose authors typically attach the same attack payload to different legitimate apps. In this way, not only do they hide their malicious program logic behind the useful functionalities of these apps, but they can also automate the repackaging process to quickly produce and distribute a large number of Trojans¹. On the other hand, this practice makes such malware stand out from other repackaged apps, which typically incorporate nothing but advertising libraries [2]. Also as a result of this practice, similar code (typically in terms of Java methods) shows up in unrelated apps that are not supposed to share anything except popular libraries.
These observations present a new opportunity to catch malicious repackaged apps, the mainstay of Android malware, without using any heuristics to model their behavior. What we can do is simply compare the code of related apps (an app and its repackaged versions, or those repackaged from the same app) to check the parts in which they differ, and the code of unrelated apps (those of different origins, signed by different parties) to inspect the parts they share, so as to identify suspicious code segments (at the method level). These segments, once found to be inexplicable (e.g., not common libraries), are almost certain to be malicious, as discovered in our study (Section 4.2). This DiffCom analysis is well suited for finding previously unknown malicious behavior and can also be done efficiently, without resorting to any heavyweight information-flow technique.
Mass vetting at scale. Based on this simple idea, we developed a novel, highly scalable vetting mechanism for detecting repackaged Android malware on one market or across markets. We call the approach mass vetting, or simply MassVet, as it does not use malware signatures or any models of expected malicious operations, and instead relies solely on the features of existing apps on a market to vet new ones uploaded there. More specifically, to inspect a new app, MassVet runs a highly efficient DiffCom analysis on it against the whole market. Any existing app related to the new one (i.e., sharing the same repackaging origin) is quickly identified from the structural similarity of their user interfaces (a.k.a. views), which are known to be largely preserved during repackaging (Section 2). Then, once a match is found, a differential analysis is performed on the apps sharing the similar view structure (indicating a repackaging relation between them). Also, an intersection analysis is performed to compare the new app against those with different view structures and signed by different certificates. The code components of interest discovered in this way, either the common (or similar) methods (through the intersection analysis) or the different ones (through the differential analysis), are further inspected to remove common code reuse (libraries, sample code, etc.) and to collect evidence of their security risks (dependence on other code, resource-access API calls, etc.) before a red flag is raised.
¹ Those Trojans are typically signed by different keys to avoid blocking of a specific signer.
Supporting this mass vetting mechanism is a suite of techniques for high-performance view/code comparisons. In particular, innovations are made to achieve a scalable analysis of different apps' user interfaces (Section 3.2). The idea is to project a set of salient features of an app's view graph (i.e., the interconnections between its user interfaces), such as types of widgets and events, onto a single dimension, using a unique index to represent the app's location within the dimension and the similarity of its interface structure to those of others. In our research, we calculated this index as a geometric center of a view graph, called v-core. The v-cores of all the apps on the market are sorted to enable a binary search during the vetting of a new app, which makes this step highly scalable. The high-level idea here was previously applied to application clone detection [7], a technique that has been utilized in our research (mapping the features of a Java method to an index, called m-core in our research) for finding common methods across different apps (Section 3.3). It is important to note that for the view-graph comparison, new techniques are needed to handle the structural changes caused by repackaging, e.g., when advertisement interfaces are added (Section 3.2).
Our findings. We implemented MassVet on a cloud platform and evaluated it on nearly 1.2 million real-world apps collected from 33 app markets around the world. Our experimental study demonstrates that MassVet vetted apps within ten seconds, with a low false positive rate. Most importantly, from the 1.2 million apps, our approach discovered 127,429 malware: among them, at least 20 are likely zero-day and 34,026 were missed by the majority of the malware scanners run by VirusTotal, a website that syndicates 54 different antivirus products [43]. Our study further shows that MassVet achieved a better detection coverage than any individual scanner within VirusTotal, such as Kaspersky, Symantec, McAfee, etc. Other highlights of our findings include the discovery of malicious apps in leading app markets (30,552 from Google Play), and Google's strategies to remove malware and malware authors' countermoves, which cause the mysterious disappearance and reappearance of apps on the Play Store.
Contributions. The contributions of the paper are summarized as follows:
New techniques. We developed a novel mass vetting approach that detects new threats using nothing but the code of the apps already on a market. An innovative differential-intersection analysis (i.e., DiffCom) is designed to exploit the unique features of repackaging malware, catching the malicious apps even when their behavior has not been profiled a priori. This analysis is made scalable by its simple, static nature and the feature projection techniques that enable a cloud-based, fast search for view/code differences and similarities. Note that when the v-core and m-core datasets (only 100 GB for 1.2 million apps) are shared among multiple markets, MassVet can help one market to detect malicious submissions using the apps hosted by all these markets.
New discoveries. We implemented MassVet and evaluated it using nearly 1.2 million apps, a scale unparalleled in any prior study on Android malware detection, to our knowledge, and on a par with that of Google Play, the largest app market in the world with 1.3 million apps [39]. Our system captured tens of thousands of malware, including those slipping under the radar of most or all existing scanners, achieved a higher detection coverage than all popular malware scanners within VirusTotal, and vetted new apps within ten seconds. Some of the malware have been installed millions of times, and 5,000 malware samples were each installed over 10,000 times, impacting hundreds of millions of mobile devices. A measurement study further sheds light on such important issues as how effective Google Play is in screening submissions, how malware authors hide and distribute their attack payloads, etc.
2 Background
Android app markets. Publishing an app on a market requires going through an approval process. A submission will be inspected for purposes such as quality control, censorship, and also security protection. Since 2012, Google Play has been under the protection of Bouncer. This mechanism apparently contributes to the reduction of malware on the Play store, to about 0.1% of all apps there, as discovered by F-Secure [15]. On the other hand, this security vetting mechanism was successfully circumvented by an app that fingerprints its simulator and strategically adjusts its behavior [33]. Compared with the official Android market, how third-party markets review submitted apps is less clear. The picture painted by F-Secure, however, is quite dark: notable markets like Mumayi, AnZhi, Baidu, etc. were all found riddled with malware infiltrations [16].
Attempts to enhance the current secure vetting mechanisms mainly resort to conventional malware detection techniques. Most of these approaches, such as VetDroid [52], rely on tracking information flows within an app and on modeling malicious behavior to detect malware. When the market does not know in advance what the malware will do, these approaches no longer help. Further, analyzing information flows requires semantically interpreting each instruction, as well as carefully designed techniques to avoid false positives, both of which are often heavyweight. This casts doubt on the feasibility of applying these techniques to large-scale app vetting.
Repackaging. App repackaging is a process that modifies an app developed by another party and already released on markets, to add in some new functionalities before redistributing the new app to Android users. According to Trend Micro (July 15, 2014), nearly 80% of the top 50 free apps on Google Play have repackaged versions [49]. Even the Play store itself is reported to host 1.2% repackaged apps [58]. This ratio becomes 5% to 13% for third-party markets, according to a prior study [55]. These bogus apps are built for two purposes: either for getting advertisement revenues or for distributing malware [7]. For example, one can wrap AngryBird with ad libraries, including his own advertising ID, to benefit from its advertising revenue. Malware authors also found that leveraging those popular legitimate apps is the most effective and convenient avenue to distribute their attack payloads: repackaging saves them a lot of effort to build the useful functionalities of a Trojan, and the process can also be automated using tools like smali/baksmali [36]; more importantly, they can free-ride the popularity of these apps to quickly infect a large number of victims. Indeed, research shows that the vast majority of Android malware is repackaged apps, about 86% according to a study [56]. A prominent feature shared by all these repackaged apps, malicious or not, is that they tend to keep the original user interfaces intact, so as to impersonate popular legitimate apps.
Scope and assumptions. MassVet is designed to detect repackaged Android malware. We do not consider the situation in which the malware author makes his malicious payload an inseparable part of the repackaged app, which would require much more effort to understand the legitimate app than he spends today. Also, MassVet can handle the typical code obfuscation used in most Android apps (Section 3). However, we assume that the code has not been obfuscated to the extent that even disassembly cannot go through. When this happens, not only our approach but also most other static analyses will fail. Finally, we assume that the app market under protection accommodates a large number of highly diverse apps, so that for a malicious repackaged app that is uploaded, the market hosts either another app sharing its repackaging origin or one incorporating the same attack payload. To make this more likely to happen, different markets can share the feature datasets of their apps (i.e., v-cores and m-cores) with each other. Note that such datasets are very compact, only 100 GB for 1.2 million apps.
Figure 1: The Architecture of MassVet.
3 MassVet: Design and Implementation
3.1 Overview
Design and architecture. To detect unknown malware at a large scale, we came up with the design illustrated in Figure 1. It includes three key components: a preprocessing module, a feature database system and a DiffCom module. The preprocessing module automatically analyzes a submitted app, which includes extracting the features of its view structure and methods, and then summarizing them into the app's v-cores and m-cores respectively. The DiffCom component then works on these features, searching for them within the app market's v-core and m-core databases. Matches found there are used to identify suspicious different or common methods, which are further screened to remove false positives.
How it works. Here we use an example to walk through the workflow of the system. MassVet first processes all the apps on a market to create a v-core database for view structures and an m-core database for Java methods (Section 3.4). Both databases are sorted to support a binary search and are used for vetting new apps submitted to the market. Consider a repackaged AngryBird. Once uploaded to the market, it is first automatically disassembled at the preprocessing stage into a smali representation, from which its interface structures and methods are identified. Their features (for views: user interfaces and the types of widgets and events; for methods: control-flow graphs and bytecode) are mapped to a set of v-cores (Section 3.2) and m-cores (Section 3.3) by calculating the geometric centers of the view graphs and control-flow graphs respectively. The app's v-cores are first used to query the database through a binary search. Once a match is found, which happens when there exists another app with a similar AngryBird user-interface structure, the repackaged app is compared with the app already on the market at the method level to identify their differences. Such different methods (diffs for short) are then automatically analyzed to ensure that they are not ad libraries and are indeed suspicious, and if so, are reported to the market (Section 3.2). When the search on the v-core database comes back with nothing², MassVet continues to look for the AngryBird's m-cores in the method database. If a similar method is found, our approach tries to confirm that the app including the method is indeed unrelated to the submitted AngryBird and that the match is not a legitimate code reuse (Section 3.3). In this case, MassVet reports that a suspicious app is found. All these steps are fully automated, without human intervention.
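The decision flow above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not MassVet's implementation: the `App` record and the `vet` function are hypothetical, a single shared v-core stands in for the full view-graph match (τ = 0, without the θ test), m-cores are compared by exact equality, and the later filters (connectivity and dangerous-API checks) are left out.

```python
from dataclasses import dataclass, field

# Illustrative sketch: App and vet are hypothetical names, not MassVet's code.


@dataclass
class App:
    name: str
    signer: str                                  # signing certificate
    v_cores: set = field(default_factory=set)    # one per view subgraph
    m_cores: set = field(default_factory=set)    # one per method


def vet(new_app, market, whitelist):
    """Return (kind, other_app_name, suspicious_m_cores) findings."""
    methods = new_app.m_cores - whitelist        # drop library methods first
    findings = []

    # Differential analysis: apps with matching views but another signer.
    related = [a for a in market
               if a.signer != new_app.signer and a.v_cores & new_app.v_cores]
    for other in related:
        diff = methods - other.m_cores           # methods only the new app has
        if diff:
            findings.append(("diff", other.name, diff))
    if related:
        return findings

    # Intersection analysis: apps with different views and another signer.
    for other in market:
        if other.signer == new_app.signer or other.v_cores & new_app.v_cores:
            continue                             # same party or related app
        common = methods & other.m_cores
        if common:
            findings.append(("common", other.name, common))
    return findings
```

For example, an app that reuses an existing app's views but adds one extra method is reported through the "diff" branch. An app with new views whose only non-library method also appears in an unrelated app is reported through the "common" branch.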
3.2 Fast User-Interface Analysis
As discussed before, the way MassVet vets an app depends on whether it is related to any other app already on the market. In our research, such a relation is established through a quick inspection of apps' user interfaces (UI) to identify those with similar view structures. When such apps are not "officially" connected (e.g., produced by the same party), chances are that they share the same repackaging origin, and therefore their diffs become interesting for malicious code detection.
This interface-based relation identification is an alternative to code-based identification: a malicious repackaged app can be obfuscated, and junk code can be easily added to make it look very different from the original version in terms of the similarity between their code (e.g., the percentage of similar methods shared between them). On the other hand, a significant change to the user interface needs more effort and, most importantly, affects user experience, making it more difficult for the adversary to free-ride the popularity of the original app. Therefore, most repackaged apps preserve their original UI structures, as found by prior research [50]. In our research, we further discovered that many repackaged apps incorporate a large amount of new code, even more than that in their original versions, but still keep those apps' UI structures largely intact.
The idea of using view structures to detect repackaged apps has been preliminarily explored in prior research [50], which utilizes subgraph isomorphism algorithms to measure the similarity between two apps. However, that approach is less effective for apps with relatively simple user-interface structures and, most importantly, agonizingly slow: it took 11 seconds to compare a pair of apps [50], which would mean 165 days to analyze one submission against all 1.3 million apps on the Google Play store. In the following, we elaborate our new solution, designed for an accurate and high-performance app-view analysis.
² The market can also choose to perform both differential and intersection analyses for all new apps (Section 3.3).
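The 165-day estimate follows directly from the per-pair cost reported in [50], assuming the pairwise comparisons run one after another on a single machine:

```latex
11\,\mathrm{s} \times 1.3 \times 10^{6} = 1.43 \times 10^{7}\,\mathrm{s},
\qquad
\frac{1.43 \times 10^{7}\,\mathrm{s}}{86\,400\,\mathrm{s/day}} \approx 165.5\ \mathrm{days}
```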
Feature extraction. An app's user interface consists of a set of views. Each view contains one or more visual widgets such as Button, ListView, TextView, etc. These UI components respond to users' input events (e.g., tapping and swiping) with the operations specified by the app developer. Such responses may cause visible changes to the current view or transitions to other views. This interconnection structure, together with the layouts and functionalities of individual views, was found to be sufficiently unique for characterizing each app [50].
In our research, we model such a UI structure as a view graph, which is a directed weighted graph including all views within an app and the navigation relations (that is, the transitions from one view to another) among them. On such a graph, each node is a view, with the number of its active widgets (those with proper event-response operations) as its weight, and the arcs connecting the nodes describe the navigation relations (triggered by the input events) among them. According to the types of the events (e.g., onClick, onFocusChange, onTouch, etc.), edges can be differentiated from each other.
Such a view graph can effectively describe an app with a reasonably complicated UI structure. However, it becomes less effective for small apps with only a couple of views and a rather straightforward connection structure. To address this issue, we enrich the view graph with additional features, including other UIs and the types of widgets that show up in a view. Specifically, in addition to views, which are displayed through the invocation of an Android Activity, UIs such as AlertDialog are also treated as nodes of the graph. Custom dialogs can be handled by analyzing class inheritance. Further, each widget type is given a unique value, with the sole purpose of differentiating it from other types. In this way, we can calculate a UI node's weight by adding together the values associated with the widgets it carries, which makes a small view graph more distinctive from others. An example is illustrated in Figure 2.
Note that we avoid using text labels on UI elements or other attributes like size or color. All the features selected here, including UIs and the types of widgets and events that cause transitions among UIs, are less manipulable: in the absence of serious effort, any significant change to them (e.g., adding junk widgets, modifying the widget types, altering the transitions among views) will perceivably affect user experience. This makes it difficult for an adversary to alter these features while still impersonating a popular app.
Figure 2: A view-graph example. Ac: Activity; Da: AlertDialog; Dt: TimePickerDialog; Dp: ProgressDialog; Dd: DatePickerDialog.
To construct the view graph for a submitted app, the preprocessing module automatically analyzes its code to recover all UI-related inter-process communication (IPC), the channel through which an Android app invokes user interfaces. Such IPC calls include startActivity and startActivityForResult. For each call, our approach locates it within a UI and further identifies the UI it triggers. Specifically, the program location of the IPC is examined to determine whether it is inside a UI-related class $v$. Its parameter is parsed to find out the class it calls ($v'$). In this case, nodes are created on the view graph for both classes (UIs), and an edge is added to link $v$ to $v'$. Also, the type of the edge is determined by the event handler in which the IPC is located: for example, when the call is found inside the onClick function for a button, we know that this widget is used to cause a view transition. All such widgets are identified from each class to determine the weight of its node.
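The graph-building step can be sketched as follows. The sketch assumes the disassembled code has already been reduced to one hypothetical record per UI-related IPC call site. Each record holds the class containing the call, the event handler it sits in, the widget bound to that handler (if known), and the target class parsed from the call's parameter. The record layout and `build_view_graph` are illustrative assumptions, not MassVet's data model.

```python
from collections import defaultdict

# Illustrative sketch: the record layout and build_view_graph are hypothetical.


def build_view_graph(calls, ui_classes):
    """Build a view graph from UI-related IPC call sites.

    calls: iterable of (src_class, handler, widget, dst_class) records, one
           per startActivity / startActivityForResult call site.
    ui_classes: classes known to be UIs (Activities, dialogs, ...).
    Returns (nodes, edges, active_widgets).
    """
    nodes, edges = set(), set()
    active_widgets = defaultdict(list)
    for src, handler, widget, dst in calls:
        # Only calls made inside a UI class and targeting a UI class count.
        if src not in ui_classes or dst not in ui_classes:
            continue
        nodes.update((src, dst))
        edges.add((src, dst, handler))          # edge type = event handler
        if widget is not None:
            active_widgets[src].append(widget)  # feeds the node's weight
    return nodes, edges, dict(active_widgets)
```

Keeping the handler name on each edge is what lets the graph distinguish, say, an onClick transition from an onTouch one between the same two views.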
Design for scale. Once a view graph is recovered from an app, we want to quickly compare it across a market (or markets) to identify the apps related to it. This operation needs to be of high performance, capable of processing over one million apps within seconds. To this end, we applied a recently proposed similarity comparison algorithm, called Centroids [7], to the view-graph analysis. Centroids maps the features of a program's control-flow graph (CFG) into a value, which is calculated as the geometric center of the program. This value has a monotonicity property: that is, whenever a minor change happens to the CFG, the value changes accordingly at a small scale, reflecting the level of the difference made to the program. This property localizes the global comparison to a small number of "neighbors" to achieve high scalability without losing accuracy.
The approach was used for the method comparison in our research (Section 3.3). However, it cannot be directly adopted for analyzing the UI structure, as the view graph is quite different from the CFG. Also, an app's graph is often fragmented due to the unusual ways in which some of its modules are triggered: e.g., most advertisement views are invoked through callbacks using the APIs of their libraries; as a result, their graph becomes separated from that of the main program. Here we describe how we address these issues.
Given a set of subgraphs for an app UI, $G_i$ ($i = 1, \dots, n$), our preprocessing module analyzes them one by one to calculate their individual geometric centers, i.e., v-cores. For a subgraph $G_i$, the first thing that needs to be done is to convert the features of each of its nodes (i.e., views) into a three-dimensional vector $c = \{\alpha, \beta, \gamma\}$. Here $\alpha$ is a sequence number assigned to each node in $G_i$ through an ordered depth-first traversal of $G_i$: starting from its main view, we select nodes to visit in the order of the sizes of their subtrees, and use their individual weights to break a tie; each node traversed in this way gets a number based upon the order in which it is visited. If two subtrees have the same size (and weight), we select one according to their node types. In this way, we ensure that the assignment of sequence numbers is unique and depends only on the structure of the directed weighted graph.
The second element, $\beta$, in the vector is the outdegree of the node: that is, the number of UIs the node can lead to. Finally, $\gamma$ is the number of "transition loops" the current node is involved in, i.e., navigation paths that start from the node, visit each node on the path only once, and bring the user back to the current view. Figure 2 presents an example that shows how such a vector is constructed.
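The construction of $c = \{\alpha, \beta, \gamma\}$ can be sketched in Python as below. Several details are assumptions, since the text leaves them open:
- subtree size is approximated by the number of views reachable from a child;
- larger subtrees are visited first;
- ties fall back to weight and then to the node-type name;
- every view in the subgraph is assumed reachable from its main view.

All function names are hypothetical.

```python
# Illustrative sketch: all function names are hypothetical, and the DFS
# ordering rule below is an assumed reading of the text.


def reachable_count(graph, start):
    """Number of views reachable from start, start included."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


def sequence_numbers(graph, weight, node_type, main):
    """alpha: 1-based visiting order of an ordered depth-first traversal."""
    alpha = {}

    def visit(node):
        alpha[node] = len(alpha) + 1
        # Assumed ordering: larger reachable set first, then heavier
        # node, then node-type name as the final tie-breaker.
        children = sorted(
            (n for n in graph.get(node, []) if n not in alpha),
            key=lambda n: (-reachable_count(graph, n), -weight[n], node_type[n]),
        )
        for child in children:
            if child not in alpha:       # may already be reached via a sibling
                visit(child)

    visit(main)
    return alpha


def loops_through(graph, node):
    """gamma: number of simple directed cycles passing through node.

    Enumerating simple cycles is exponential in the worst case; this is
    only meant for small graphs.
    """
    count = 0

    def walk(current, on_path):
        nonlocal count
        for nxt in graph.get(current, []):
            if nxt == node:
                count += 1               # navigated back to the start view
            elif nxt not in on_path:
                on_path.add(nxt)
                walk(nxt, on_path)
                on_path.remove(nxt)

    walk(node, {node})
    return count


def feature_vectors(graph, weight, node_type, main):
    """Map each view to (alpha, beta, gamma); beta is the outdegree."""
    alpha = sequence_numbers(graph, weight, node_type, main)
    return {n: (alpha[n], len(set(graph.get(n, []))), loops_through(graph, n))
            for n in alpha}
```

For a main view that links to Settings and About, where Settings links back to Main and on to Detail (which returns to Settings), Settings gets $\gamma = 2$ because it lies on two distinct loops.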
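After every node $k$ on $G_i$ has been given a vector $c_k$, we can calculate the geometric center of $G_i$, i.e., the v-core $vc_i$, as follows:
$$ vc_i = \frac{\sum_{e(p,q) \in G_i} (w_p c_p + w_q c_q)}{\sum_{e(p,q) \in G_i} (w_p + w_q)} $$
where $e(p,q)$ denotes an edge in $G_i$ from node $p$ to $q$ and $w_p$ is the weight of node $p$.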
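A direct transcription of the v-core formula into Python is shown below. It assumes nodes carry the three-component vectors $c_k$ and weights $w_k$ defined above. Because each $c_k$ is a vector, the result is a point in three dimensions. `v_core` is a hypothetical helper name.

```python
# Illustrative sketch: v_core is a hypothetical helper name.


def v_core(edges, weight, vector):
    """Weighted geometric center of a view subgraph.

    edges:  iterable of (p, q) pairs, one per directed edge e(p, q).
    weight: node -> w_p.   vector: node -> c_p = (alpha, beta, gamma).
    """
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for p, q in edges:
        for d in range(3):
            num[d] += weight[p] * vector[p][d] + weight[q] * vector[q][d]
        den += weight[p] + weight[q]
    if den == 0:
        raise ValueError("subgraph has no edges or only zero-weight nodes")
    return tuple(x / den for x in num)
```

With a single edge from A (weight 1, $c = (1, 1, 0)$) to B (weight 3, $c = (2, 0, 0)$), the center is pulled toward the heavier node: $(7/4, 1/4, 0)$.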
With the monotonicity of v-cores, we can sort them for a large number of apps to support a binary search. In this way, the subgraph $G_i$ can be quickly compared with millions of graphs to identify similar ones. Specifically, given another graph $G_t$ with a v-core $vc_t$, we consider that it matches $G_i$ if $|vc_i - vc_t| \le \tau$, where $\tau$ is a threshold.
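The binary-search lookup can be sketched with Python's standard `bisect` module. For simplicity, this sketch assumes each v-core has already been reduced to a single sortable number. The text does not spell out how a vector-valued v-core is ordered, so the scalar key and the `VCoreIndex` class are illustrative assumptions.

```python
import bisect

# Illustrative sketch: VCoreIndex and the scalar key are hypothetical.


class VCoreIndex:
    """Sorted v-core keys over all apps, built once before vetting."""

    def __init__(self, entries):
        entries = sorted(entries)            # (v_core_key, app_id) pairs
        self.keys = [k for k, _ in entries]
        self.apps = [a for _, a in entries]

    def matches(self, key, tau=0.0):
        """App ids whose key lies within tau of key: |vc_i - vc_t| <= tau."""
        lo = bisect.bisect_left(self.keys, key - tau)
        hi = bisect.bisect_right(self.keys, key + tau)
        return self.apps[lo:hi]
```

Sorting is done once, offline. Each query then costs two binary searches plus the size of the result, which is what keeps a comparison against a million-app index fast.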
Further, given two apps sharing a subset of their view graphs $G_{i(l)}$ ($l = 1, \dots, m$), we consider these two apps similar in their UI structure when the following holds for at least one app: $\sum_{l} |G_{i(l)}| / \sum_{i} |G_i| \ge \theta$, that is, most of the app's view structures also appear in the other app (with $\theta$ being a threshold). This ensures that even when the adversary adds many new views to an app (e.g., through fake advertisements), the relation between the repackaged app and the original one can still be identified.
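The θ test can be expressed as a short Python function. It assumes $|G_i|$ is measured as the number of views in a subgraph (the text does not fix this). It uses θ = 0.8, the value selected by the threshold training described in this section. `ui_similar` and its arguments are hypothetical.

```python
# Illustrative sketch: ui_similar and its arguments are hypothetical;
# |G_i| is assumed to be the number of views in subgraph G_i.


def ui_similar(sizes_a, sizes_b, shared, theta=0.8):
    """True if, for at least one app, matched subgraphs cover >= theta.

    sizes_a, sizes_b: subgraph id -> |G_i| for each of the two apps.
    shared: (id_in_a, id_in_b) pairs whose v-cores matched.
    """
    covered_a = sum(sizes_a[i] for i in {a for a, _ in shared})
    covered_b = sum(sizes_b[j] for j in {b for _, b in shared})
    ratio_a = covered_a / sum(sizes_a.values())
    ratio_b = covered_b / sum(sizes_b.values())
    return max(ratio_a, ratio_b) >= theta
```

A repackaged app that doubles its view count with advertisement screens still matches. The original's views are fully covered (ratio 1.0), even though only half of the repackaged app's views are.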
In our research, such thresholds were determined from a training process using 50,000 randomly selected apps (Section 3.3). We set different thresholds and measured the corresponding false positive/negative rates. For false positives, we randomly sampled 50 app pairs detected by our approach under each threshold and manually checked their relations. For false negatives, we utilized 100 app pairs known to have repackaging relations as the ground truth, to find out the number of pairs our approach identified with different thresholds. The study shows that with $\tau = 0$ and $\theta = 0.8$, we got both a low false positive rate (4%) and a low false negative rate (6%). Among these 50,000 apps, we found that 26,317 app pairs had repackaging relations, involving 3,742 apps in total.
Effectiveness of the view-graph analysis. Compared with existing code-based approaches [7], the view-graph analysis turns out to be more effective at detecting apps of the same repackaging origin. Specifically, we randomly selected 10,000 app pairs (involving 17,964 apps) from those repackaged from the same programs, as discovered from the 1.2 million apps we collected (Section 4.1). Many of these repackaging pairs involve apps whose code differs significantly. Particularly, in 14% of these pairs, the two apps were found to have less than 50% of their individual code in common. This could be caused by a large library (often malicious) added to an app during repackaging, or by junk code inserted for the purpose of obfuscation. Since these apps look so different based upon their code, their repackaging relations cannot be easily determined by program analysis. However, they were all caught by our approach, simply because the apps' view graphs were almost identical.
3.3 DiffCom Analysis at Scale
For an app going through the mass vetting process, the view-graph analysis first determines whether it is related to any app already on the market. If so, these two apps will be further compared to identify their diffs for a malware analysis. Otherwise, the app is checked against the whole market at the method level, in an attempt to find the program components it shares with other apps. The diffs and common components are further inspected to remove common code reuse (libraries, sample code, etc.) and to collect evidence of their security risks. This "difference-commonality" analysis is performed by the DiffCom module. We also present the brick and mortar for an efficient code-similarity analyzer and discuss the evasion of DiffCom.
The brick and mortar. To vet apps at the market scale, DiffCom needs a highly efficient code-similarity analyzer. In our research, we chose Centroids [7] as this building block. As discussed before, this approach projects the CFG of a program to its geometric center, in a way similar to the view-graph analysis. More specifically, the algorithm takes a basic block as a node, i.e., a sequence of consecutive statements with a single entry and a single exit. The weight of the block is the number of statements it contains. For each node on the CFG, a sequence number is assigned, together with the count of the branches it connects and the number of loops it is involved in. These parameters are used to calculate the geometric center of the program.
To prepare for mass vetting, our approach first goes through all the apps on a market and breaks them into methods. After removing common libraries, the preprocessing module analyzes their code, calculates the geometric centers for individual methods (i.e., the m-cores), and then sorts them before storing the results in the database. During the vetting process, if a submitted app is found to share the view graph with another app, their diffs are quickly identified by comparing the m-cores of their individual methods. When the app needs to go through the intersection step, its methods are used for a binary search on the m-core database, which quickly discovers those also included in existing apps. Here we elaborate how these operations are performed. Their overhead is measured in Section 4.2.
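The offline index and the intersection lookup can be sketched together in Python. The `MCoreDatabase` class is hypothetical. It assumes scalar m-cores and exact matching, and it drops whitelisted library methods before sorting, as the preprocessing step above describes.

```python
import bisect

# Illustrative sketch: MCoreDatabase is hypothetical; m-cores are assumed
# to be scalars compared by exact equality.


class MCoreDatabase:
    """Sorted (m_core, app_id) pairs for every non-library method on a market."""

    def __init__(self, apps, whitelist):
        pairs = sorted((m, app_id)
                       for app_id, cores in apps.items()
                       for m in cores
                       if m not in whitelist)   # libraries removed up front
        self.keys = [m for m, _ in pairs]
        self.owners = [a for _, a in pairs]

    def apps_containing(self, m_core):
        """Binary search for every app holding a method with this m-core."""
        lo = bisect.bisect_left(self.keys, m_core)
        hi = bisect.bisect_right(self.keys, m_core)
        return set(self.owners[lo:hi])

    def intersect(self, methods):
        """m_core -> existing apps sharing it, for a submitted app's methods."""
        hits = {}
        for m in methods:
            owners = self.apps_containing(m)
            if owners:
                hits[m] = owners
        return hits
```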
Analyzing diffs. Whenever an app is found to be related to another one through their common view graph, we want to inspect the parts of their code that differ in order to identify suspicious activities. The rationale is that repackaged apps are the mainstay of Android malware, and malicious payloads are often injected automatically (using tools like smali/baksmali) without any significant changes to the code of the original app, so the payloads can be located by looking at the diffs between the apps of the same repackaging origin. Such diffs are quickly identified by comparing these two apps' m-cores: given two ordered sequences of m-cores $L$ and $L'$, the diff between the apps at the method level is found by merging these two lists according to the order of their elements and then removing those matching their counterparts on the other list; this takes a single linear pass over both lists, i.e., at most $|L| + |L'|$ steps.
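The merge-based diff can be sketched in Python as follows (`method_diff` is a hypothetical name, and exact m-core equality is assumed). Each comparison advances at least one of the two cursors, so the pass takes at most $|L| + |L'|$ steps.

```python
# Illustrative sketch: method_diff is a hypothetical name; m-cores are
# compared by exact equality.


def method_diff(L, L2):
    """Merge two sorted m-core lists; return (only_in_L, only_in_L2)."""
    i = j = 0
    only_l, only_l2 = [], []
    while i < len(L) and j < len(L2):
        if L[i] == L2[j]:        # matching method on both sides: drop it
            i += 1
            j += 1
        elif L[i] < L2[j]:
            only_l.append(L[i])
            i += 1
        else:
            only_l2.append(L2[j])
            j += 1
    only_l.extend(L[i:])         # leftovers exist on one side only
    only_l2.extend(L2[j:])
    return only_l, only_l2
```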
However, similarity of apps' UIs does not always indicate a repackaging relation between them. The problem arises with apps produced by the same party, whether an individual developer or an organization. In this case, it is understandable that the same libraries and UIs could be reused among their different products. It is even possible that one app is actually an updated version of the other. Also, among different developers, open UI SDKs such as Appcelerator [3] and templates like Envato market [13] are popular, which could cause the view structures of unrelated apps to look similar. Further, even when the apps are indeed repackaged, the difference between them could be just advertisement (ad) libraries instead of malicious payloads. A challenge here is how to identify these situations and avoid raising false alarms.
To address these issues, MassVet first cleans up a submitted app's methods, removing ad and other libraries, before vetting the app against the market. Specifically, we maintain a whitelist of legitimate ad libraries based on [6], which includes popular mobile ad platforms such as MobWin, Admob, etc. To identify lesser-known ones, we analyzed a training set of 50,000 apps randomly sampled from three app markets, with half of them from Google Play. From these apps, our analysis discovered 34,886 methods shared by at least 27,057 apps signed by different parties. For each of these methods, we further scanned its hosting apps using VirusTotal. If none of them were found to be malicious, we placed the method on the whitelist. In a similar way, popular view graphs among these apps were identified, and the libraries associated with these views were whitelisted to avoid detecting false repackaging relations during the view-graph analysis. Other common libraries such as Admob were removed during this process as well, which we elaborate later. Given the significant size of the training set (50,000 randomly selected apps), most if not all legitimate libraries are almost certain to be identified. This is particularly true for those associated with advertising, as they need a certain popularity to remain profitable.
On the other hand, it is possible that the approach lets some zero-day malware fall through the cracks. In our research, we further randomly selected 50 ad-related methods on the list, searched for them on the Web, and confirmed that all of them were indeed legitimate. Even with this false-negative risk, our approach still achieved a high detection coverage, higher than any scanner integrated in VirusTotal (Section 4.2).
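The whitelisting rule can be sketched as a filter over a method-to-apps map. The sketch assumes a hypothetical popularity cut-off `min_signers` (the number of distinct signing certificates a method must appear under) and a precomputed set of apps flagged by any scanner. `build_whitelist` and its parameters are illustrative, not MassVet's code.

```python
# Illustrative sketch: build_whitelist and min_signers are hypothetical.


def build_whitelist(method_apps, signer, flagged, min_signers):
    """Methods popular across signers whose hosting apps all scanned clean.

    method_apps: m_core -> set of app ids containing that method.
    signer:      app id -> signing certificate.
    flagged:     app ids reported malicious by any scanner.
    min_signers: popularity cut-off (distinct certificates).
    """
    whitelist = set()
    for m, apps in method_apps.items():
        if len({signer[a] for a in apps}) < min_signers:
            continue            # not shared across enough parties
        if apps & flagged:
            continue            # some hosting app looks malicious
        whitelist.add(m)
    return whitelist
```

A single flagged hosting app is enough to keep a method off the list, which errs toward treating shared code as suspicious rather than as a library.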
When it comes to apps produced by the same party, the code they share is less popular and therefore may not be identified by this approach. The simplest solution here is to look at similar apps' signatures: those signed by the same party are not considered to be suspicious, because they do have a good reason to be related. This simple treatment works most of the time, since legitimate app vendors typically sign their products using the same certificate. However, there are situations in which two legitimate apps are signed by different certificates but actually come from the same source. When this happens, the diffs of the apps will be reported and investigated as suspicious code.
Toavoidthefalsealarm,wetookacloselookatthelegitimatediffs,whicharecharacterizedbytheirin-tensiveconnectionswithotherpartoftheapp.
Theyareinvokedbyothermethodsandinthemeantimecallthecommonpartofthecodebetweentheapps.
Ontheotherhand,themaliciouspayloadpackagedtoalegitimateapptendstostandalone,andcanonlybetriggeredfromafew(typicallyjustone)programlocationsandrarelycallthecomponentswithintheoriginalprogram.
Inourresearch,weleveragedthisobservationtodif-ferentiatesuspiciousdiffsfromthoselikelytobelegiti-mate.
Foreachdiffdetected,theDiffComanalyzerlooksforthecallsitmakestowardtherestoftheprogramandinspectsthesmalicodeoftheapptoidentifytheref-erencestothemethodswithinthediff.
Thesemethodswillgothroughafurtheranalysisonlywhensuchinter-766624thUSENIXSecuritySymposiumUSENIXAssociationactionsareverylimited,typicallyjustaninwardinvoca-tion,withoutanyoutboundcall.
Notethatcurrentmal-wareauthorsdonotmaketheircodemoreconnectedtothelegitimateapptheyrepackage,mainlybecausemoreeffortisneededtounderstandthecodeoftheappandcarefullyconstructtheattack.
Afurtherstudyisneededtounderstandtheadditionalcostrequiredtobuildmoresophisticatedmalwaretoevadeourdetection.
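The stand-alone test can be sketched as a count of call edges crossing the boundary of the diff. This is a minimal sketch, assuming the call edges have already been extracted from the app's smali code; the function name, the `max_inbound` parameter and the rule that calls into framework APIs are ignored are our assumptions, chosen to mirror "typically just one inward invocation without any outbound call".

```python
def is_standalone(diff, app_methods, call_edges, max_inbound=1):
    """Return True if a diff looks bolted on rather than woven into the app.

    diff:        set of method names present only in the submitted app
    app_methods: set of all method names defined in the app (diff included)
    call_edges:  iterable of (caller, callee) pairs extracted from the smali code
    max_inbound: hypothetical cap on entry points from the rest of the app
    """
    inbound = outbound = 0
    for caller, callee in call_edges:
        if caller not in diff and callee in diff:
            inbound += 1  # the rest of the app invokes the diff
        elif caller in diff and callee in app_methods and callee not in diff:
            outbound += 1  # the diff calls back into the shared code
        # Calls to framework APIs (callee not in app_methods) are ignored.
    return inbound <= max_inbound and outbound == 0


methods = {"Main.onCreate", "Main.render", "Payload.start", "Payload.leak"}
payload = {"Payload.start", "Payload.leak"}
edges = [
    ("Main.onCreate", "Main.render"),
    ("Main.onCreate", "Payload.start"),
    ("Payload.start", "Payload.leak"),
    ("Payload.leak", "TelephonyManager.getSimSerialNumber"),
]
print(is_standalone(payload, methods, edges))  # True
```

A diff that also called `Main.render`, or that was entered from several places in the app, would fail the test and be treated as likely legitimate.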
For a diff found in this way, DiffCom takes further measures to determine its risk. A simple approach used in our implementation is to check for the presence of API calls (either Android framework APIs or those associated with popular libraries) related to operations considered dangerous. Examples include getSimSerialNumber, sendTextMessage and getLastKnownLocation. Such findings indicate that the diff code indeed has the means to cause damage to the mobile user's information assets, though exactly how this can happen is not specified. This is different from existing behavior-based detection [27], which looks for much more specific operation sequences such as "reading from the contact list and then sending it to the Internet". Such a treatment helps suppress false alarms while preserving the generality of our design, which aims at detecting unknown malicious activities.
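The presence check can be sketched as a scan over the diff's smali code for invocations of the listed APIs. This is a minimal sketch: the regular expression assumes the usual baksmali `invoke-*` syntax, only the three APIs named in the text are listed, and matching on the method name alone (ignoring the declaring class) is a simplification of our own.

```python
import re

# The three APIs named in the text; a deployed list would be longer and curated.
DANGEROUS_APIS = {"getSimSerialNumber", "sendTextMessage", "getLastKnownLocation"}

# A smali invocation looks like: invoke-virtual {v0}, Lpkg/Class;->name(args)ret
INVOKE_RE = re.compile(r"invoke-\S+\s+\{[^}]*\},\s*L[^;]+;->([\w$<>]+)\(")


def dangerous_calls(smali_lines):
    """Return the names of dangerous APIs invoked anywhere in the given smali lines."""
    found = set()
    for line in smali_lines:
        match = INVOKE_RE.search(line)
        if match and match.group(1) in DANGEROUS_APIS:
            found.add(match.group(1))
    return found


diff_code = [
    "invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getSimSerialNumber()Ljava/lang/String;",
    "invoke-virtual/range {v0 .. v5}, Landroid/telephony/SmsManager;->sendTextMessage(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V",
    "invoke-static {v1}, Ljava/lang/String;->valueOf(I)Ljava/lang/String;",
]
print(sorted(dangerous_calls(diff_code)))  # ['getSimSerialNumber', 'sendTextMessage']
```

As in the text, the result only says that the diff has the means to cause harm; it does not reconstruct any specific operation sequence.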
Analyzing intersections.
When no apparent connection has been found between an app and those already on the market, the vetting process needs to go through an intersection analysis. This also happens when DiffCom is configured to perform the analysis on an app that was not found to be malicious at the differential step. Identifying the common methods that a newly submitted app carries is rather straightforward: each method of the app is mapped to its m-core, which is used to search against the m-core database. As discussed before, this can be done through a binary search. Once a match is found, DiffCom further inspects it, removing legitimate connections between the apps, and reports the findings to the market.
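The m-core lookup can be sketched with Python's standard `bisect` module over a sorted array of cores. This assumes m-cores are plain floating-point values; the class and function names are hypothetical, and `tau` plays the role of the similarity threshold τ, which the prototype sets to zero (footnote 4).

```python
from bisect import bisect_left, bisect_right


class MCoreIndex:
    """Sorted (m_core, app_id, method) entries with a parallel array of keys."""

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.keys = [entry[0] for entry in self.entries]  # sorted m-core values

    def lookup(self, m_core, tau=0.0):
        """All entries whose m-core lies within tau of m_core: O(log n) to locate."""
        lo = bisect_left(self.keys, m_core - tau)
        hi = bisect_right(self.keys, m_core + tau)
        return self.entries[lo:hi]


def shared_methods(app_cores, index, app_id):
    """Map each m-core of the submitted app to matching methods in *other* apps."""
    hits = {}
    for core in app_cores:
        others = [(app, method) for _, app, method in index.lookup(core) if app != app_id]
        if others:
            hits[core] = others
    return hits


index = MCoreIndex([(1.5, "appA", "m1"), (2.25, "appB", "m2"), (2.25, "appC", "m9"), (3.0, "appA", "m3")])
print(shared_methods([2.25, 9.0], index, "appX"))  # {2.25: [('appB', 'm2'), ('appC', 'm9')]}
```

Each hit would then go through the relation checks described next before anything is reported.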
Again, the main challenge here is to determine whether two apps are indeed unrelated. A simple signature check removes most such connections, but not all. The "stand-alone" test, which checks whether a set of methods intensively interacts with the rest of an app, does not work for the intersection test. The problem is that the common methods between two repackaged apps may not be the complete picture of a malicious payload, which makes them different from the diff identified in the differential-analysis step: different malware authors often use some common toolkits in their attack payloads, which show up in the intersection between their apps, yet these modules still interact heavily with other components of the malware that are not found inside the intersection. As a result, this feature, which works well on diffs, cannot help capture suspicious common code among apps.
An alternative solution is to look at how the seemingly unrelated apps are actually connected. As discussed before, the problem is caused by developers or organizations that reuse code internally (e.g., a proprietary SDK) but sign their apps using different certificates. Once such a relation is identified, we can be more confident about whether two apps sharing code are independent of each other. In that case, the common code becomes suspicious after all public libraries (e.g., those on the list used in prior research [6]) and code templates have been removed. Here we describe a simple technique for detecting such a hidden relation.
From our training dataset, we found that most code reused legitimately in this situation involves user interfaces: developers tend to leverage existing view designs to quickly build new apps. With this practice, even though two apps may not appear similar enough in terms of their complete UI structures (and are therefore considered "unrelated" by the view-graph analysis), a close look at the subgraphs of their views may reveal that they actually share a significant portion of their views and even subgraphs. Specifically, among the 50,000 apps in our training set, after removing public libraries, we found 30,286 sharing at least 30% of their views with other apps, 16,500 sharing 50% and 8,683 containing no less than 80% common views. By randomly sampling these apps (10 each time) and analyzing them manually, we confirmed that when the portion goes above 50%, almost all the apps and their counterparts are either from the same developers or organizations, or have the same repackaging origins. Also, once the shared views reach 80% or more, the apps are almost always involved in repackaging. Based on this observation, we run an additional correlation check on a pair of apps with common code: DiffCom compares their individual subgraphs again, and if a significant portion (50%) is found to be similar, the apps are considered related and their intersection is therefore not reported to the market.
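The correlation check can be sketched as a set comparison of the two apps' view-subgraph v-cores. The text fixes only the 50% threshold; dividing by the smaller app's subgraph count, treating v-cores as exact-match keys, and the function names are all assumptions made for this sketch.

```python
def shared_view_ratio(vcores_a, vcores_b):
    """Fraction of the smaller app's distinct view-subgraph v-cores also found in the other app."""
    a, b = set(vcores_a), set(vcores_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def report_intersection(common_methods, vcores_a, vcores_b, threshold=0.5):
    """Drop the shared methods when the two apps look related (same developer or origin)."""
    if shared_view_ratio(vcores_a, vcores_b) >= threshold:
        return set()  # related apps: their shared code is not reported
    return set(common_methods)


print(report_intersection({"Lcom/x/Pay;->send"}, [0.1, 0.2, 0.3, 0.4], [0.3, 0.4, 0.9]))  # set()
print(report_intersection({"Lcom/x/Pay;->send"}, [0.1, 0.2, 0.3, 0.4], [0.4, 0.7, 0.8, 0.9]))
```

In the first call two of the smaller app's three subgraphs match, so the pair counts as related and nothing is reported; in the second only one of four matches, so the shared method is passed on.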
After the correlation check, all the apps going through the intersection analysis are very likely to be unrelated. Therefore, legitimate code shared between them, if any, is almost always public libraries or templates. As described before, we removed such common code by whitelisting popular libraries and further complemented the list with those discovered from the training set: methods in at least 2,363 apps were considered legitimate public resources if all these apps were cleared by VirusTotal. Such code was further sampled and manually analyzed in our study to ensure that it indeed did not involve any suspicious activities. With all such libraries removed, the shared code, particularly methods with dangerous APIs (e.g., getSimSerialNumber), is reported as a possible malicious payload.
Evading MassVet.
To evade MassVet, the adversary could try to obfuscate app code, which can be tolerated to some degree by the similarity comparison algorithm we use [7]. For example, common obfuscation techniques such as variable/method renaming do not affect centroids. Also, the commonality analysis can only be defeated when the adversary is willing to significantly alter the attack payload (e.g., a malicious SDK) each time he uses it for repackaging. This is not supported by existing automatic tools like ADAM [53] and DroidChameleon [32], which always convert the same code to the same obfuscated form. Further, deep obfuscation will arouse suspicion when the app is compared with its repackaging origin, which has not been obfuscated.
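Why renaming leaves centroids unchanged can be seen from a simplified centroid computation. This sketch is only loosely modeled on Centroids [7]: each basic block gets a point (DFS preorder number, out-degree, and a 0/1 in-loop flag standing in for loop depth), weighted by its instruction count, and the weighted mean is returned. The exact coordinates and weighting used by Centroids and by MassVet's m-cores and v-cores differ, and all names here are hypothetical.

```python
def centroid(cfg, entry, weights):
    """Simplified, Centroids-style fingerprint of a control-flow graph.

    cfg:     dict node -> list of successor nodes, in code order
    entry:   entry node
    weights: dict node -> instruction count
    Node names never enter the arithmetic, only the graph's shape does.
    """
    order, stack, seen = {}, [entry], set()
    while stack:  # iterative DFS assigning preorder numbers
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order[node] = len(order)
        stack.extend(reversed(cfg.get(node, [])))  # visit successors in code order

    def in_loop(node):
        # A node is in a loop if it can reach itself again.
        todo, visited = list(cfg.get(node, [])), set()
        while todo:
            current = todo.pop()
            if current == node:
                return True
            if current not in visited:
                visited.add(current)
                todo.extend(cfg.get(current, []))
        return False

    total = sum(weights[n] for n in order)
    sx = sum(weights[n] * order[n] for n in order)
    sy = sum(weights[n] * len(cfg.get(n, [])) for n in order)
    sz = sum(weights[n] * in_loop(n) for n in order)
    return (sx / total, sy / total, sz / total)


cfg1 = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}
w1 = {"entry": 3, "loop": 2, "body": 5, "exit": 1}
rename = {"entry": "a", "loop": "b", "body": "c", "exit": "d"}  # an obfuscator's renaming
cfg2 = {rename[k]: [rename[s] for s in v] for k, v in cfg1.items()}
w2 = {rename[k]: v for k, v in w1.items()}
print(centroid(cfg1, "entry", w1) == centroid(cfg2, "a", w2))  # True
```

Moving such a point noticeably requires changing the graph itself (blocks, edges, loops), which is why the text argues that only substantial, per-repackaging changes to the payload would defeat the comparison.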
The adversary may also attempt to obfuscate the app's view graphs. This, however, is harder than obfuscating code, as key elements in the graph, like the event functions OnClick, OnDrag, etc., are hardcoded within the Android framework and cannot be modified. Adding junk views can also be more difficult than it appears: the adversary cannot simply throw in views disconnected from existing subgraphs, as they will not affect how MassVet determines whether two view graphs match (Section 3.2); instead, he would have to connect such views to existing subgraphs (potentially allowing them to be visited from the existing UI), which requires understanding a legitimate app's UI structure to avoid affecting the user experience.
We further analyzed the effectiveness of existing obfuscation techniques against our view-graph approach over 100 randomly selected Google Play apps. Popular obfuscation tools such as DexGuard [37] and ProGuard [38] only work on Java bytecode, not on the Dalvik bytecode of these commercial apps. In our research, we used ADAM [53] and DroidChameleon [32], which are designed for Dalvik bytecode and are highly effective according to prior studies [53, 32]. Presumably they can also work on the view-related code within those apps. However, after running them on the apps, we found that the apps' v-cores did not change at all compared with those before the obfuscation. This demonstrates that such obfuscation is not effective against our view-graph approach.
On the other hand, we acknowledge that a new obfuscation tool could be built to defeat MassVet, particularly its view-graph search and the Com step. The cost of doing so, however, is less clear and requires further effort to understand (Section 5).
3.4 System Building
In our research, we implemented a prototype of MassVet using C and Python and applied it to nearly 1.2 million apps collected from 33 markets, including over 400,000 from Google Play (Section 4.1). Before these apps can be used to vet new submissions, they need to be inspected to detect malicious code already there. Analyzing apps at this scale and using them for real-time vetting require carefully designed techniques and proper infrastructure support, which we elaborate on in this section.
System bootstrapping and malware detection.
To bootstrap our system, a market first needs to go through all its existing apps using our techniques in an efficient way. The APKs of these apps are decompiled into smali (using the tool baksmali [36]³) to extract their view graphs and individual methods, which are further converted into v-cores and m-cores, respectively. We use NetworkX [29] to handle graphs and find loops. These features (i.e., cores) are then sorted and indexed before being stored in their individual databases. In our implementation, such data are kept in the SQLite database system, which is characterized by its small size and efficiency. For all these apps, 1.5 GB was generated for v-cores and 97 GB for m-cores.
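Storing sorted, indexed cores in SQLite can be sketched with Python's standard `sqlite3` module. The schema (a table named `m_cores` with one row per method, and the index name) is our assumption for illustration; the text only states that cores are sorted, indexed and kept in SQLite.

```python
import sqlite3


def open_core_db(path=":memory:"):
    """Open (or create) a database holding one row per method m-core."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS m_cores (core REAL, app TEXT, method TEXT)")
    # A B-tree index on the core value keeps each lookup logarithmic in the table size.
    db.execute("CREATE INDEX IF NOT EXISTS idx_m_cores_core ON m_cores (core)")
    return db


def add_cores(db, rows):
    """rows: iterable of (core, app, method) tuples."""
    db.executemany("INSERT INTO m_cores VALUES (?, ?, ?)", rows)
    db.commit()


def apps_with_core(db, core):
    """Return (app, method) pairs whose m-core equals `core` exactly."""
    cur = db.execute(
        "SELECT app, method FROM m_cores WHERE core = ? ORDER BY app, method", (core,)
    )
    return cur.fetchall()


db = open_core_db()
add_cores(db, [(2.5, "appB", "m1"), (2.5, "appA", "m7"), (4.0, "appC", "m2")])
print(apps_with_core(db, 2.5))  # [('appA', 'm7'), ('appB', 'm1')]
```

A v-core table would follow the same pattern, with one row per view subgraph instead of per method.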
The next step is to scan all these 1.2 million apps for malicious content. A straightforward approach is to inspect them one by one through the binary search, which would take tens of millions of comparison and analysis steps. Our implementation includes an efficient alternative. Specifically, on the v-core database, our system goes through the whole sequence from one end (the smallest element) to the other, evaluating the elements along the way to form equivalence groups: all those with identical v-cores are assigned to the same group⁴. All the subgraphs within the same group match each other. However, assembling them to determine the similarity between two apps turns out to be tricky. This is because the UI of each app is typically broken into around 20 subgraphs distributed across the whole ordered v-core sequence. As such, any comparison between two apps requires going through all equivalence groups. The fastest way to do that is to maintain a table of the number of view subgraphs shared between any two apps. However, this simple approach requires a huge table, half of 1.2 million by 1.2 million entries in the worst case, which cannot be completely loaded into memory. In our implementation, a trade-off is made to save space by inspecting only 20,000 apps against the remaining 1.2 million each time, which requires going through the equivalence groups 60 times and uses about 100 GB of memory in each round.
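The equivalence-group pass and the batched pairwise counting can be sketched as below, assuming v-cores are exact-match keys (τ = 0) and each app's subgraphs are listed as (v_core, app_id) pairs; the function names are hypothetical. In the prototype, batches of 20,000 apps against roughly 1.2 million give the 60 rounds mentioned above (1,200,000 / 20,000 = 60).

```python
from collections import Counter
from itertools import groupby


def equivalence_groups(sorted_vcores):
    """One pass over (v_core, app_id) pairs sorted by v_core.

    Consecutive entries with identical v-cores (threshold tau = 0) form one group.
    """
    return [[app for _, app in grp] for _, grp in groupby(sorted_vcores, key=lambda e: e[0])]


def shared_subgraph_counts(groups, batch):
    """Count view subgraphs shared between each app in `batch` and every other app.

    Only pairs involving the batch are kept, so memory grows with the batch size rather
    than with the square of the whole market. Each group is counted once per app pair.
    """
    counts = Counter()
    for members in groups:
        present = set(members)
        for a in present & batch:
            for b in present - {a}:
                counts[(a, b)] += 1
    return counts


def rounds(groups, apps, batch_size):
    """Yield one count table per batch; the batches together cover every app."""
    ordered = sorted(apps)
    for start in range(0, len(ordered), batch_size):
        yield shared_subgraph_counts(groups, set(ordered[start:start + batch_size]))


data = sorted([(0.1, "A"), (0.1, "B"), (0.2, "A"), (0.2, "B"), (0.2, "C"), (0.3, "C")])
groups = equivalence_groups(data)
print(shared_subgraph_counts(groups, {"A"}))  # Counter({('A', 'B'): 2, ('A', 'C'): 1})
```

Each round re-reads all equivalence groups but keeps only the rows for its own batch, which is the space-for-passes trade-off the text describes.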
The inspection on m-cores is much simpler and does not need to compare one app against all others. This is because all we care about here are the common methods that already show up within individual equivalence groups. Those methods are then further analyzed to detect suspicious ones.
³ Very few apps (0.01%) in our dataset could not be decompiled due to the limitations of the tool.
⁴ In our implementation, we set the threshold τ to zero, which can certainly be adjusted to tolerate some minor variations among similar methods.
Figure 3: Cloud framework for MassVet.
Cloud support.
To support high-performance vetting of apps, MassVet is designed to work in the cloud, running on top of a stream processing framework (Figure 3). Specifically, our implementation was built on Storm [40], an open-source stream-processing engine that also powers leading web services such as WebMD, Alibaba, Yelp, etc. Storm supports large-scale analysis of a data stream by a set of worker units that connect to each other, forming a topology. In our implementation, the workflow of the whole vetting process is converted into such a topology: a submitted app is first disassembled to extract view graphs and methods, which are checked against the whitelist to remove legitimate libraries and templates; then the app's v-cores and m-cores are calculated, and a binary search on the v-core database is performed; depending on the results of the search, the differential analysis is run first, which can be followed by the intersection analysis. Each operation here is delegated to a worker unit on the topology, and all the data associated with the app are in a single stream. The Storm engine is designed to support concurrent processing of multiple streams, which enables a market to efficiently vet a large number of submissions.
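The staged workflow can be mimicked in plain Python to show its shape: an ordered list of stages applied to each app's state, with several apps processed concurrently. This is not Storm's API (real Storm topologies are built from spouts and bolts); the stage functions below are hypothetical toy stand-ins, and the core computation and v-core/m-core searches are omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def run_topology(stages, submissions, workers=4):
    """Push each submitted app through `stages` in order; apps are processed concurrently.

    Each stage takes and returns a dict holding that app's stream state.
    """
    def process(app):
        state = {"app": app}
        for stage in stages:
            state = stage(state)
            if state.get("done"):
                break  # e.g. the differential step already settled the verdict
        return state

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, submissions))


# Hypothetical toy stages standing in for the real worker units.
def disassemble(state):
    state["methods"] = {"ad.init", state["app"] + ".main"}
    if state["app"] == "evil":
        state["methods"].add("pay.leak")
    return state


def strip_whitelisted(state):
    state["methods"] -= {"ad.init"}  # pretend "ad.init" is a whitelisted ad library
    return state


def differential(state):
    state["flagged"] = "pay.leak" in state["methods"]  # stand-in for the diff analysis
    state["done"] = state["flagged"]
    return state


def intersection(state):
    state.setdefault("flagged", False)  # the m-core search would happen here
    return state


results = run_topology([disassemble, strip_whitelisted, differential, intersection], ["evil", "clean"])
print([(r["app"], r["flagged"]) for r in results])  # [('evil', True), ('clean', False)]
```

Keeping all of an app's data in one state object mirrors the "single stream per app" design, while the thread pool stands in for Storm's concurrent processing of multiple streams.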
4 Evaluation and Measurement
4.1 Setting the Stage
App collection.
We collected 1.2 million apps from 33 Android markets, including over 400,000 from Google Play, 596,437 from 28 app stores in China, 61,866 from European stores and 27,047 from other US stores, as elaborated in Table 5 in the Appendix. We removed duplicate apps according to their MD5. All the apps we downloaded from Google Play have complete metadata (upload date, size, number of downloads, developer, etc.), while those from third-party markets do not. The apps from Google Play were selected from 42 categories, including Entertainment, Tools, Social, Communication, etc. From each category, we first went for its top 500 popular apps (in terms of number of installs) and then randomly picked 1,000 to 30,000 across the whole category. For each third-party market, we just randomly collected a set of apps (190 to 108,736, depending on market size; Table 5). Our collection includes high-profile apps such as Facebook, Skype, Yelp, Pinterest, WeChat, etc., as well as less popular ones. Their sizes range from less than 1 MB to more than 100 MB.
Validation.
We validated the suspicious apps reported by our prototype through VirusTotal and manual evaluations. VirusTotal is the most powerful public malware detection system; it is a collection of 54 anti-malware products, including the most high-profile commercial scanners. It also provides a scanning service for mobile apps [44]. VirusTotal has two modes: complete scanning (which we call new scan) and using cached results (called cached scan). The latter is fast, capable of going through 200 apps every minute, but only covers those that have been scanned before; for programs it has never seen or that were uploaded only recently, the outcome is just "unknown". The former determines whether an app is malicious by running all 54 scanners integrated within VirusTotal on it. Its result is more up to date, but the operation is much slower, taking 5 minutes per app. To validate the tens of thousands of suspicious cases detected from the 1.2 million apps (Section 4.2), we first performed the cached scan to confirm that most of our findings were indeed malicious. The apps reported to be "unknown" were further randomly sampled for a new scan.
For all the apps that VirusTotal did not find malicious, we further picked a few samples for manual analysis. In particular, we clustered all suspicious apps identified by the intersection analysis according to their shared code. Within each cluster, whenever we observed that most members were confirmed malware, we concluded that the remaining apps there were highly likely to be suspicious as well, even if they were not flagged by VirusTotal. The common code of these apps was further inspected for suspicious activities such as information leaks. A similar approach was employed to understand the diff code extracted during the differential analysis: we manually identified the activities performed by the code and labeled it as suspicious when they could lead to damage to the user's information assets.
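The cluster-based labeling can be sketched as follows. Two simplifications are ours: each app is keyed by a single identifier of the shared code that flagged it (real apps may share several methods), and "most members" is read as a strict majority controlled by a hypothetical `majority` parameter.

```python
from collections import defaultdict


def propagate_cluster_labels(shared_code_of, confirmed, majority=0.5):
    """Return apps inferred suspicious from their cluster's confirmed members.

    shared_code_of: dict app_id -> key of the shared code that flagged it (e.g. an m-core)
    confirmed:      set of app_ids confirmed malicious (e.g. by VirusTotal)
    majority:       hypothetical cut-off for "most members"
    """
    clusters = defaultdict(set)
    for app, key in shared_code_of.items():
        clusters[key].add(app)

    inferred = set()
    for members in clusters.values():
        if len(members & confirmed) / len(members) > majority:
            inferred |= members - confirmed  # unflagged apps in a mostly-malicious cluster
    return inferred


codes = {"a": "k1", "b": "k1", "c": "k1", "d": "k2", "e": "k2"}
print(propagate_cluster_labels(codes, confirmed={"a", "b", "d"}))  # {'c'}
```

Cluster k2 is only half confirmed, so "e" is not labeled; such apps would be left for the manual inspection of their common code.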
4.2 Effectiveness and Performance
Malware found and coverage.
From our collection, MassVet reported 127,429 suspicious apps (10.93%). 10,202 of them were caught by "Diff" and the rest were discovered by "Com". These suspicious apps are from different markets: 30,552 from Google Play and 96,877 from the third-party markets, as illustrated in Table 5.
We first validated these findings by uploading them to VirusTotal for a cached scan (i.e., quickly checking the apps against the checksums cached by VirusTotal), which came back with 91,648 confirmed cases (72%), 17,061 presumable false positives (13.38%, that is, apps whose checksums were in the cache but which were not found to be malicious when they were scanned) and 13,492 unknown (10.59%, that is, apps whose checksums were not in VirusTotal's cache).
Table 1: The coverage of other leading AV scanners.
| AV name | # of detections | Coverage (% of 281 apps) |
|---|---|---|
| Ours (MassVet) | 197 | 70.11 |
| ESET-NOD32 | 171 | 60.85 |
| VIPRE | 136 | 48.40 |
| NANO-Antivirus | 120 | 42.70 |
| AVware | 87 | 30.96 |
| Avira | 79 | 28.11 |
| Fortinet | 71 | 25.27 |
| AntiVir | 60 | 21.35 |
| Ikarus | 60 | 21.35 |
| TrendMicro-HouseCall | 59 | 21.00 |
| F-Prot | 47 | 16.73 |
| Sophos | 46 | 16.37 |
| McAfee | 45 | 16.01 |
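Each percentage in Table 1 is that scanner's detection count divided by 281, the number of apps that VirusTotal's 54 scanners flagged collectively in the 2,700-app Google Play sample described later in this section. The short check below recomputes a few rows; only a subset of Table 1 is copied, and the names are ours.

```python
# Detection counts copied from Table 1; 281 is the number of apps that
# VirusTotal's 54 scanners flagged collectively in the 2,700-app sample.
FLAGGED_COLLECTIVELY = 281
DETECTIONS = {"Ours (MassVet)": 197, "ESET-NOD32": 171, "VIPRE": 136, "McAfee": 45}


def coverage(detected, total=FLAGGED_COLLECTIVELY):
    """Percentage of the collectively flagged apps that one scanner caught."""
    return round(100 * detected / total, 2)


for name, count in DETECTIONS.items():
    print(f"{name}: {coverage(count)}%")
```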
We further randomly selected 2,486 samples from the unknown set and 1,045 from the "false-positive" set and submitted them to VirusTotal again for a new scan (i.e., running all the scanners, with the most up-to-date malware signatures, on the apps). It turned out that 2,340 (94.12%) of the unknown cases and 349 (33.40%) of the "false positives" are actually malicious apps, according to the new analysis. Based solely on VirusTotal's scan results, this gives us a false detection rate (FDR: false positives vs. all detected) of 9.46% and a false positive rate (FPR: false positives vs. all apps analyzed) of 1%. Note that the Com step found more malware than Diff, as Diff relies on the presence of two apps of the same repackaging origin in the dataset, while Com only looks for common attack payloads shared among apps. It turns out that many malicious apps use the same malicious SDKs, which makes them easier to catch.
We further randomly sampled 40 false positives reported by the new scan for manual validation and found that 20 of them are actually highly suspicious. Specifically, three of them load and execute suspicious code dynamically; one takes pictures stealthily; one performs a sensitive operation that modifies the booting sequence of other apps; seven of them obtain sensitive user information such as the SIM card serial number and telephone number/ID; and several aggressive adware apps turn out to add phishing plug-ins and app installation bars without the user's consent. The presence of these activities makes us believe that they are very likely zero-day malware. We have reported all of them to four antivirus software vendors, such as Norton and F-Secure, for further analysis. If all these cases are confirmed, the FDR of MassVet could be further reduced to 4.73%.
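The two error metrics and the projected improvement reduce to simple ratios. The definitions below follow the text; the rescaling rule (multiply the FDR by the share of sampled "false positives" that manual review did not confirm) is our reading of how 9.46% becomes 4.73% when 20 of 40 samples turn out suspicious, not a formula stated by the authors. The toy inputs to the first two functions are illustrative only.

```python
def false_detection_rate(false_positives, reported):
    """FDR: false positives over all apps reported as malicious."""
    return false_positives / reported


def false_positive_rate(false_positives, analyzed):
    """FPR: false positives over all apps analyzed."""
    return false_positives / analyzed


def adjusted_fdr(fdr_percent, sampled, confirmed):
    """Rescale an FDR when manual review confirms `confirmed` of `sampled` 'false positives'."""
    return fdr_percent * (1 - confirmed / sampled)


print(round(adjusted_fdr(9.46, sampled=40, confirmed=20), 2))  # 4.73
```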
To understand the coverage of our approach, we randomly sampled 2,700 apps from Google Play and scanned them using MassVet and the 54 scanners within VirusTotal. Altogether, VirusTotal detected 281 apps, and among them our approach caught 197. The coverage of MassVet with regard to the collective result of all 54 scanners is 70.1%, better than what could be achieved by any individual scanner integrated within VirusTotal, including such top-of-the-line antivirus systems as NOD32 (60.8%), Trend (21.0%), Symantec (5.3%) and McAfee (16%). Most importantly, MassVet caught at least 11% of the malware that those scanners missed. The details of the study are presented in Table 1 (top 12).
Table 2: Performance (times in seconds). "# Apps" refers to the number of concurrently submitted apps.
| # Apps | Pre-processing | v-core database search | Differential analysis | m-core database search (intersection) | Sum |
|---|---|---|---|---|---|
| 1 | 5.84 | 0.15 | 0.33 | 1.80 | 8.12 |
| 50 | 5.85 | 0.15 | 0.34 | 1.99 | 8.33 |
| 100 | 5.85 | 0.14 | 0.35 | 2.23 | 8.57 |
| 200 | 5.88 | 0.16 | 0.35 | 3.13 | 9.52 |
| 500 | 5.88 | 0.16 | 0.35 | 3.56 | 9.95 |
Vetting delay.
We measured the performance of our technique on a server with 260 GB of memory, 40 cores at 2.8 GHz and 28 TB of hard drives. Running on top of the Storm stream processor, our prototype was tested against 1 to 500 concurrently submitted apps. The average delay we observed is 9 seconds from the submission of an app to the completion of the whole process on it. This vetting operation was performed against all 1.2 million apps. Table 2 further shows the breakdown of the vetting time across the vetting stages, including preprocessing (v-core and m-core generation), search across the v-core database, the differential analysis, search over the m-core database and the intersection analysis. Overall, this shows that MassVet is indeed capable of scaling to the level of real-world markets to provide a real-time vetting service.
4.3 Measurement and Findings
On the 127,429 malicious apps detected in our study, we performed a measurement study that brings to light a few interesting observations important for understanding the seriousness of the malware threat to the Android ecosystem, as elaborated below.
Landscape.
The malware we found is distributed across the world: 35,473 apps from North America, 4,852 from Europe and 87,104 from Asia. In terms of the portion of malicious code within all apps, Chinese app markets take the lead with 12.90%, followed by the US with 8.28%. This observation points to a possible lack of regulation and proper security protection in many Chinese markets compared with those in other countries. Even among the apps downloaded from Google Play, over 7.61% are malicious, which differs from a prior report of only 0.1% malware discovered there [15]. Note that most of the malware here has been confirmed by VirusTotal. This indicates that the portion of apps with suspicious activities on leading app stores could indeed be higher than previously thought. Detailed numbers of malicious apps are shown in the Appendix (Table 5).
Figure 4: The distribution of downloads for malicious or suspicious apps in Google Play.
Figure 5: The distribution of ratings for malicious or suspicious apps in Google Play.
Figure 6: The distribution of the average number of downloads for malicious or suspicious apps in Google Play.
We observed that most scanners react slowly to the emergence of new malware. Of all 91,648 malicious apps confirmed by VirusTotal, only 4.1% were flagged by at least 25 of the 54 scanners it hosts. The results are presented in Figure 7. This finding also demonstrates the capability of MassVet to capture new malicious content missed by most commercial scanners.
Figure 7: Number of malware detected by VirusTotal.
The impact of these malicious apps is significant. Over 5,000 such apps have already been installed over 10,000 times each (Figure 4). There are also a few extremely popular ones, with install counts reaching 1 million or even more. In addition, the Google Play ratings of the suspicious APKs are high (most of them ranging from 3.6 to 4.6; Figure 5), with each being downloaded many times (100,000 to 250,000) on average (Figure 6). This suggests that hundreds of millions of mobile devices might already have been infected.
Existing defense and disappeared apps.
Apparently, Google Play does make an effort to mitigate the malware threat. However, our measurement study also shows the challenge of this mission. As Figure 8 illustrates, most malware we discovered was uploaded in the past 14 months. Also, the more recently an app showed up, the more likely it was to be problematic. This indicates that Google Play continuously inspects the apps it hosts to remove suspicious ones. Apps that have already been there for a while are likely to be legitimate, with only 4.5% found to be malicious. On the other hand, newly released apps are much less trustworthy, with 10.69% of them being suspicious. Also, these malicious apps have a rather long shelf time, as Google needs up to 14 months to remove most of them. Among the malware we discovered, 3 apps uploaded in Dec. 2010 are still on Google Play.
Interestingly, 40 days after uploading 3,711 apps to VirusTotal (those we asked VirusTotal to run a new scan on, as mentioned earlier), we found that 250 of them had disappeared from Google Play. 90 days later, another 129 apps disappeared. Among the 379 disappeared apps, 54 (14%) were detected by VirusTotal. Apparently, Google does not run VirusTotal for its vetting but pays close attention to the new malware it finds.
We further identified 2,265 developers of the 3,711 suspicious apps using the apps' metadata and monitored all their apps over the following 15 weeks (November 2014 to February 2015). Within this period, we observed that an additional 204 apps under these developers disappeared, all of which were detected by MassVet due to the suspicious methods they shared with the malware we caught before that period. The interesting part is that we did not scan these apps with VirusTotal, which suggests that Google Play likely also looked into their malicious components and used them to check all other apps under the same developers. However, Google apparently did not do this across the whole marketplace, because we found that other apps carrying those methods were still on Google Play. If these apps were missed due to the cost of scanning all the apps on the Play Store, MassVet might actually be useful here: our prototype is able to compare a method across all 1.2 million apps within 0.1 second.
Another interesting finding is that some of these developers uploaded the same or similar malicious apps again after they were removed. Indeed, among the 2,125 reappeared apps, 604 confirmed malware (28.4%) showed up in the Play Store unchanged, with the same MD5 and the same names. Further, those developers also published 829 apps with the same malicious code (as that of the malware) but under different names. The fact that apps with known malicious payloads still slipped in suggests that Google might not pay adequate attention even to known malware.
Repackaging malware and malicious payload.
Among the small set of repackaging malware captured by the differential analysis, most are from third-party stores (92.35%). Interestingly, we rarely observed malware authors repackaging Google Play apps and distributing them to third-party stores in China. Instead, malware repackaging appears to be quite localized, mostly between app stores in the same region or even within the same store. A possible explanation is the effort malware authors need to make on an original app so that it works for a new audience, which is certainly higher than simply repackaging a popular app in the local markets.
Figure 8: Number of malicious apps over time.
Figure 9 illustrates the distribution of common code across malware, as discovered by the intersection analysis. A relatively small set of methods has been reused by a large number of malicious apps. The leading one has been used by 9,438 Google Play malware apps and by 144 suspicious apps in the third-party markets. This method turns out to be part of a library ("com/startapp") extensively used by malware. Over 98% of the apps integrating this library were flagged as malicious by VirusTotal, and the rest were also found to be suspicious through our manual validation. This method sends users' fine-grained location information to a suspicious website. Similarly, all other popular methods are apparently also part of malware-building toolkits. Examples include "guohead", "purchasesdk" and "SDKUtils". The malware integrating such libraries is signed by thousands of different parties. One observation is that the use of these malicious SDKs is rather regional: "purchasesdk" is popular in Chinese markets, while "startapp" is widely used in the US markets. We also noticed that a number of libraries have been obfuscated. A close look at these attack SDKs shows that they are used for getting sensitive information such as phone numbers, downloading files and loading code dynamically.
Signatures and identities.
For each confirmed malicious app, we took a look at its "signature", that is, the public key on its X.509 certificate used for verifying the integrity of the app. Some signatures have been used by more than 1,000 malware apps each: apparently, some malware authors have produced a large number of malicious apps and successfully disseminated them across different markets (Table 3).
Further, when we checked the metadata of the malware discovered on Google Play, we found that a few signatures have been associated with many identities (e.g., the creator field in the metadata). In particular, one signature has been linked to 604 identities, which indicates that the adversary might have created many accounts to distribute his apps (Table 4).
Figure 9: The distribution of common code across malware.
Table 3: Top 5 signatures used in apps.
| Signature | # of malicious apps |
|---|---|
| c673c8a5f021a5bdc5c036ee30541dde | 1644 |
| a2993eaecf1e3c2bcad4769cb79f1556 | 1258 |
| 3be7d6ee0dca7e8d76ec68cf0ccd3a4a | 615 |
| f8956f66b67be5490ba6ac24b5c26997 | 559 |
| 86c2331f1d3bb4af2e88f485ca5a4b3d | 469 |
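The per-signature figures behind Table 3 and Table 4 below amount to two simple aggregations. In this sketch, a signature is an opaque identifier string (how the hex values in the tables were derived from the certificates is not stated here), and the record shapes and function names are our assumptions.

```python
from collections import Counter, defaultdict


def top_signatures(malicious_apps, k=5):
    """malicious_apps: iterable of (app_id, signature). Most common signing keys, as in Table 3."""
    return Counter(sig for _, sig in malicious_apps).most_common(k)


def identities_per_signature(records, k=5):
    """records: iterable of (signature, creator identity) pairs from market metadata (Table 4)."""
    identities = defaultdict(set)
    for sig, identity in records:
        identities[sig].add(identity)
    ranked = sorted(((sig, len(ids)) for sig, ids in identities.items()), key=lambda x: -x[1])
    return ranked[:k]


print(top_signatures([("a1", "s1"), ("a2", "s1"), ("a3", "s2")]))  # [('s1', 2), ('s2', 1)]
```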
Case studies.
Among the suspicious apps MassVet reported is a set of APKs that not even VirusTotal found to be malicious. We analyzed 40 samples randomly chosen from this set and, through manual analysis, concluded that 20 of them were indeed problematic and likely to be zero-day malware. We have reported them to 4 malware companies (F-Secure, Norton, Kaspersky, Trend Micro) for further validation. The behaviors of these apps include installing apps without the user's consent, collecting the user's private data (e.g., taking screenshots of other apps) even though such information does not serve the apps' stated functionalities, and loading and executing native binaries for command and control.
These apps use various techniques to evade detection. For example, some hide the suspicious functionality for weeks before starting to run it. "Durak card game" is such a game, which has been downloaded over 5,000,000 times. It was on Google Play before the BBC reported it on February 4th, 2015 [25]. So far, only two scanners hosted by VirusTotal can detect it. This malware disguises itself as warning messages when the user unlocks her Android smartphone. It waits for several weeks before performing malicious activities, and its advertisements do not show up until at least one reboot. Although Google removed "Durak card game", other apps with similar functionalities are still on the Play Store now.
We also found that some malicious apps conceal their program logic inside native binaries. Some even encrypt the binaries and dynamically decrypt them for execution. Further, some use Java reflection and other obfuscation techniques to cover their malicious code.
Table 4: Top 5 signatures used by different identities.
| Signature | # of different identities |
|---|---|
| 02d98ddfbcd202b13c49330182129e05 | 604 |
| a2993eaecf1e3c2bcad4769cb79f1556 | 447 |
| 82fd3091310ce901a889676eb4531f1e | 321 |
| 9187c187a43b469fa1f995833080e7c3 | 294 |
| c0520c6e71446f9ebdf8047705b7bda9 | 145 |
5 Discussion
As discussed before, MassVet targets repackaging malware, the mainstay of potentially harmful mobile apps: malware authors typically cannot afford to spend a lot of time and money building a popular app just to spread malware, only to be forced to do it all over again once it gets caught. Our technique exploits a weakness of their business model, which relies on repackaging popular apps with a similar attack payload to keep the cost of malware distribution low. Despite the fundamental nature of this weakness and the effectiveness of the technique against such malware, our current implementation is still limited, particularly when it comes to defending against evasion.
Specifically, though simply adding junk views connected to an existing app's view graph can affect the user experience and therefore may not work well (Section 3.3), a more effective alternative is to obfuscate the links between views (calls like StartActivity). However, this treatment renders an app's UI structure less clear to our analyzer, which is itself highly suspicious, as the vast majority of apps' view graphs can be directly extracted. What we could do is perform a dynamic analysis on such an app, using tools like Monkey to explore the connections between different views. Note that the overall performance impact here can still be limited, simply because most apps submitted to an app store are legitimate and their UI structures can be statically determined.
Further, to evade the commonality analysis, the adversary could obfuscate the malicious methods. As discussed earlier (Section 3.3), this attempt itself is nontrivial, as the m-cores of those methods can only be moved significantly away from their original values through substantial changes to their CFGs each time a legitimate app is repackaged. This can be done by adding a large amount of junk code to the CFGs. Our current implementation does not detect such an attack, since it has not yet appeared in real-world malware code. On the other hand, further studies are certainly needed to better understand and mitigate such a threat.
Critical to the success of our DiffCom analysis is the removal of legitimate libraries. As an example, we could use a crawler to periodically gather shared libraries and code templates from the Web to update our whitelists. Further, a set of high-profile legitimate apps can be analyzed to identify shared code missed by the crawler. Also useful here are a few unique resources in the app market's possession. For example, the market knows the account from which apps are uploaded, even when they are signed with different certificates. Legitimate organizations are likely to maintain only one account, and even when they do have multiple ones, they will not conceal the relations among them. Using such information, the market can find out whether two apps are actually related and thereby identify the internal libraries they share. In general, given that MassVet uses a large number of existing apps (most of which are legitimate) to vet a small set of submissions, it is in the right position to identify and remove most if not all legitimate shared code.
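Using the uploading account alongside the signing certificate to decide which apps are related can be sketched with a union-find structure: apps that share either a certificate or an account (directly or transitively) fall into one group, whose internal code sharing would then not be reported. This is only a sketch of the idea in the text; the input format and all names are hypothetical.

```python
class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def related_groups(apps):
    """apps: dict app_id -> (signing_cert, uploader_account).

    Apps linked through a shared certificate or a shared account end up in one group.
    """
    uf = UnionFind()
    for app, (cert, account) in apps.items():
        uf.union(("app", app), ("cert", cert))
        uf.union(("app", app), ("acct", account))
    groups = {}
    for app in apps:
        groups.setdefault(uf.find(("app", app)), set()).add(app)
    return list(groups.values())


apps = {"x": ("c1", "acc1"), "y": ("c2", "acc1"), "z": ("c3", "acc2")}
print(sorted(sorted(g) for g in related_groups(apps)))  # [['x', 'y'], ['z']]
```

Here "x" and "y" are signed with different certificates but uploaded from the same account, so they are grouped together, which is exactly the case a signature check alone would miss.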
6 Related Work
Malicious app detection.
App vetting largely relies on techniques for detecting Android malware. Most existing approaches identify malicious apps based either on what they look like (i.e., content-based signatures) [20, 27, 21, 45, 51, 57, 19, 54, 17, 22, 4] or on how they act (behavior-based signatures) [11, 31, 48, 47, 42, 18, 34]. These approaches typically rely on heavyweight static or dynamic analysis techniques and cannot detect unknown malware whose behavior has not been modeled a priori. MassVet is designed to address these issues by leveraging unique properties of repackaging malware.
Most related to our work is PiggyApp [54], which uses features (permissions, APIs, etc.) identified from a major component shared between two apps to find other apps that also include this component, then clusters the remaining part of these apps' code, called piggybacked payloads, and samples from individual clusters to manually determine whether the payloads there are indeed malicious. In contrast, MassVet automatically detects malware by inspecting the code diff among apps with a similar UI structure and the common methods shared between unrelated apps. As for the scale of our study, ANDRUBIS [26, 46] dynamically examined the operations of over 1 million apps over four years. Different from ANDRUBIS, which is an offline analyzer for recovering the detailed behavior of individual malicious apps, MassVet is meant to be a fast online scanner for identifying malware without knowing its behavior. It went through 1.2 million apps within a short period of time.
Repackaging and code reuse detection. Related to our work is research on repackaging and code reuse detection [55,21,1,9,10,41,35,5]. Most relevant to MassVet is the Centroids similarity comparison [7], which was also proposed for detecting code reuse. Although it is a building block of our technique, the approach by itself does not detect malicious apps. Significant effort went into building view-graph and code analysis on top of it to achieve an accurate malware scan.
Also, to defeat code obfuscation, a recent proposal leverages the similarity between repackaged apps' UIs to detect their relations [50]. However, it is too slow for market-scale vetting, requiring 11 seconds to process a pair of apps. In our research, we developed a more efficient UI comparison technique that maps the features of view graphs to their geometric centers, as Centroids does. This significantly improves the performance of the UI-based approach, enabling it to help vet a large number of apps in real time.
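The geometric-center (centroid) idea mentioned above can be illustrated with a short Python sketch. It assumes that each view-graph node already carries three numeric features and a weight, and that the centroid is the weight-averaged position of all edge endpoints; these definitions, the tolerance `tol`, the choice to sort on the total weight, and the helper names `centroid`, `build_index`, and `find_similar` are illustrative assumptions, not the exact formulas of Centroids [7] or MassVet. The point is that each graph collapses to a few numbers, so candidates can be located with a sorted index and binary search instead of pairwise graph comparison.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Centroid = Tuple[float, float, float, float]  # (cx, cy, cz, total weight)
Entry = Tuple[float, str, Centroid]


def centroid(coords: Dict[str, Coord], weights: Dict[str, float],
             edges: Sequence[Tuple[str, str]]) -> Centroid:
    """Weighted geometric center of a graph.

    Each edge (p, q) contributes both endpoints, weighted by the node
    weight; node names never enter the result, only features and structure.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for p, q in edges:
        for node in (p, q):
            w = weights[node]
            total += w
            for i in range(3):
                acc[i] += w * coords[node][i]
    if total == 0:
        return (0.0, 0.0, 0.0, 0.0)
    return (acc[0] / total, acc[1] / total, acc[2] / total, total)


def build_index(cents: Dict[str, Centroid]) -> Tuple[List[float], List[Entry]]:
    """Sort apps by total weight so candidates can be found by binary search."""
    entries = sorted((c[3], app, c) for app, c in cents.items())
    return [e[0] for e in entries], entries


def find_similar(query: Centroid, keys: List[float],
                 entries: List[Entry], tol: float = 0.01) -> List[str]:
    """Return apps whose centroid is within tol of the query in every component."""
    lo = bisect.bisect_left(keys, query[3] - tol)
    hi = bisect.bisect_right(keys, query[3] + tol)
    return [app for _, app, c in entries[lo:hi]
            if all(abs(a - b) <= tol for a, b in zip(c, query))]
```

Because node names are ignored, two graphs with identical features and structure but different labels (for instance, after identifiers are renamed) yield identical centroids, and a lookup costs a binary search plus a scan of the few entries inside the tolerance window.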
7 Conclusion

We present MassVet, an innovative malware detection technique that compares a submitted app with all other apps on a market, focusing on its diffs with those having a similar UI structure and its intersections with others. Our implementation was used to analyze nearly 1.2 million apps, a scale on par with that of Google Play, and discovered 127,429 malicious apps, 20 of which are likely zero-day. The approach also achieves higher coverage than leading anti-malware products on the market.
Acknowledgement

We thank our shepherd Adam Doupé and the anonymous reviewers for their valuable comments. We also thank Dr. Sencun Zhu and Dr. Fangfang Zhang for sharing their ViewDroid code, which helped us understand how their system works, and VirusTotal for helping validate over 100,000 apps discovered in our study. IU authors were supported in part by NSF 1117106, 1223477 and 1223495. Kai Chen was supported in part by NSFC 61100226, 61170281 and the Strategic Priority Research Program of CAS (XDA06030600). Peng Liu was supported by NSF CCF-1320605 and ARO W911NF-09-1-0525 (MURI).
References

[1] ANDROGUARD. Reverse engineering, malware and goodware analysis of Android applications ... and more. http://code.google.com/p/androguard/, 2013.
[2] APPBRAIN. Ad networks - Android library statistics - AppBrain.com. http://www.appbrain.com/stats/libraries/ad. (Visited on 11/11/2014).
[3] APPCELERATOR. 6 steps to great mobile apps. http://www.appcelerator.com/, 2014.
[4] ARZT, S., RASTHOFER, S., FRITZ, C., BODDEN, E., BARTEL, A., KLEIN, J., LE TRAON, Y., OCTEAU, D., AND MCDANIEL, P. FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In PLDI (2014), ACM, p. 29.
[5] BAYER, U., COMPARETTI, P. M., HLAUSCHEK, C., KRUEGEL, C., AND KIRDA, E. Scalable, behavior-based malware clustering. In NDSS (2009), vol. 9, Citeseer, pp. 8–11.
[6] CHEN, K. A list of shared libraries and ad libraries used in Android apps. http://sites.psu.edu/kaichen/2014/02/20/a-list-of-shared-libraries-and-ad-libraries-used-in-android-apps.
[7] CHEN, K., LIU, P., AND ZHANG, Y. Achieving accuracy and scalability simultaneously in detecting application clones on Android markets. In ICSE (2014).
[8] CISCO. Cisco 2014 annual security report. http://www.cisco.com/web/offer/gistty2asset/Cisco2014ASR.pdf, 2014.
[9] CRUSSELL, J., GIBLER, C., AND CHEN, H. Attack of the clones: Detecting cloned applications on Android markets. ESORICS (2012), 37–54.
[10] CRUSSELL, J., GIBLER, C., AND CHEN, H. Scalable semantics-based detection of similar Android applications. In ESORICS (2013).
[11] ENCK, W., GILBERT, P., CHUN, B.-G., COX, L. P., JUNG, J., MCDANIEL, P., AND SHETH, A. TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones. In OSDI (2010), vol. 10, pp. 1–6.
[12] ENCK, W., OCTEAU, D., MCDANIEL, P., AND CHAUDHURI, S. A study of Android application security. In USENIX Security Symposium (2011), vol. 2, p. 2.
[13] ENVATO MARKET. Android notification templates library. http://codecanyon.net/item/android-notification-templates-library/5292884, 2014.
[14] ERNST, M. D., JUST, R., MILLSTEIN, S., DIETL, W. M., PERNSTEINER, S., ROESNER, F., KOSCHER, K., BARROS, P., BHORASKAR, R., HAN, S., ET AL. Collaborative verification of information flow for a high-assurance app store.
[15] F-SECURE. F-Secure: Internet security for all devices. http://f-secure.com, 2014.
[16] F-SECURE. Threat report H2 2013. Tech. rep., F-Secure, http://www.f-secure.com/documents/996508/1030743/ThreatReportH22013.pdf, 2014.
[17] FENG, Y., ANAND, S., DILLIG, I., AND AIKEN, A. Apposcopy: Semantics-based detection of Android malware through static analysis. In SIGSOFT FSE (2014).
[18] GILBERT, P., CHUN, B.-G., COX, L. P., AND JUNG, J. Vision: automated security validation of mobile apps at app markets. In Proceedings of the second international workshop on Mobile cloud computing and services (2011), ACM, pp. 21–26.
[19] GRACE, M., ZHOU, Y., ZHANG, Q., ZOU, S., AND JIANG, X. RiskRanker: scalable and accurate zero-day Android malware detection. In Proceedings of the 10th international conference on Mobile systems, applications, and services (2012), ACM, pp. 281–294.
[20] GRIFFIN, K., SCHNEIDER, S., HU, X., AND CHIUEH, T.-C. Automatic generation of string signatures for malware detection. In Recent Advances in Intrusion Detection (2009), Springer, pp. 101–120.
[21] HANNA, S., HUANG, L., WU, E., LI, S., CHEN, C., AND SONG, D. Juxtapp: A scalable system for detecting code reuse among Android applications. In DIMVA (2012).
[22] HUANG, H., CHEN, K., REN, C., LIU, P., ZHU, S., AND WU, D. Towards discovering and understanding unexpected hazards in tailoring antivirus software for Android. In AsiaCCS (2015), ACM, pp. 7–18.
[23] JING, Y., ZHAO, Z., AHN, G.-J., AND HU, H. Morpheus: automatically generating heuristics to detect Android emulators. In Proceedings of the 30th Annual Computer Security Applications Conference (2014), ACM, pp. 216–225.
[24] KASSNER, M. Google Play: Android's Bouncer can be pwned. http://www.techrepublic.com/blog/it-security/-google-play-androids-bouncer-can-be-pwned/, 2012.
[25] KELION, L. Android adware 'infects millions' of phones and tablets. http://www.bbc.com/news/technology-31129797, 2015.
[26] LINDORFER, M., NEUGSCHWANDTNER, M., WEICHSELBAUM, L., FRATANTONIO, Y., VAN DER VEEN, V., AND PLATZER, C. Andrubis - 1,000,000 apps later: A view on current Android malware behaviors. In Proceedings of the 3rd International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS) (2014).
[27] LINDORFER, M., VOLANIS, S., SISTO, A., NEUGSCHWANDTNER, M., ATHANASOPOULOS, E., MAGGI, F., PLATZER, C., ZANERO, S., AND IOANNIDIS, S. AndRadar: Fast discovery of Android applications in alternative markets. In DIMVA (2014).
[28] LU, L., LI, Z., WU, Z., LEE, W., AND JIANG, G. CHEX: statically vetting Android apps for component hijacking vulnerabilities. In Proceedings of the 2012 ACM conference on Computer and communications security (2012), ACM, pp. 229–240.
[29] NETWORKX. Python package for creating and manipulating graphs and networks. https://pypi.python.org/pypi/networkx/1.9.1, 2015.
[30] OBERHEIDE, J., AND MILLER, C. Dissecting the Android Bouncer. SummerCon 2012, New York (2012).
[31] RASTOGI, V., CHEN, Y., AND ENCK, W. AppsPlayground: Automatic security analysis of smartphone applications. In Proceedings of the third ACM conference on Data and application security and privacy (2013), ACM, pp. 209–220.
[32] RASTOGI, V., CHEN, Y., AND JIANG, X. Catch me if you can: Evaluating Android anti-malware against transformation attacks. Information Forensics and Security, IEEE Transactions on 9, 1 (2014), 99–108.
[33] READING, I. D. Google Play exploits bypass malware checks. http://www.darkreading.com/risk-management/google-play-exploits-bypass-malware-checks/d/d-id/1104730, 6 2012.
[34] REINA, A., FATTORI, A., AND CAVALLARO, L. A system call-centric analysis and stimulation technique to automatically reconstruct Android malware behaviors. EuroSec, April (2013).
[35] REN, C., CHEN, K., AND LIU, P. Droidmarking: resilient software watermarking for impeding Android application repackaging. In ASE (2014), ACM, pp. 635–646.
[36] SMALI. An assembler/disassembler for Android's dex format. http://code.google.com/p/smali/, 2013.
[37] SQUARE, G. DexGuard. https://www.saikoa.com/dexguard, 2015.
[38] SQUARE, G. ProGuard. https://www.saikoa.com/proguard, 2015.
[39] STATISTA. Statista: The statistics portal. http://www.statista.com/, 2014.
[40] STORM, A. Storm, distributed and fault-tolerant realtime computation. https://storm.apache.org/.
[41] VIDAS, T., AND CHRISTIN, N. Sweetening Android lemon markets: measuring and combating malware in application marketplaces. In Proceedings of the third ACM conference on Data and application security and privacy (2013), ACM, pp. 197–208.
[42] VIDAS, T., TAN, J., NAHATA, J., TAN, C. L., CHRISTIN, N., AND TAGUE, P. A5: Automated analysis of adversarial Android applications. In Proceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices (2014), ACM, pp. 39–50.
[43] VIRUSTOTAL. VirusTotal - free online virus, malware and URL scanner. https://www.virustotal.com/, 2014.
[44] VIRUSTOTAL. VirusTotal for Android. https://www.virustotal.com/en/documentation/mobile-applications/, 2015.
[45] WALENSTEIN, A., AND LAKHOTIA, A. The software similarity problem in malware analysis. Internat. Begegnungs- und Forschungszentrum für Informatik, 2007.
[46] WEICHSELBAUM, L., NEUGSCHWANDTNER, M., LINDORFER, M., FRATANTONIO, Y., VAN DER VEEN, V., AND PLATZER, C. Andrubis: Android malware under the magnifying glass. Vienna University of Technology, Tech. Rep. TRISECLAB-0414-001 (2014).
[47] WU, C., ZHOU, Y., PATEL, K., LIANG, Z., AND JIANG, X. AirBag: Boosting smartphone resistance to malware infection. In NDSS (2014).
[48] YAN, L. K., AND YIN, H. DroidScope: seamlessly reconstructing the OS and Dalvik semantic views for dynamic Android malware analysis. In USENIX Security (2012).
[49] YAN, P. A look at repackaged apps and their effect on the mobile threat landscape. http://blog.trendmicro.com/trendlabs-security-intelligence/a-look-into-repackaged-apps-and-its-role-in-the-mobile-threat-landscape/, 7 2014. (Visited on 11/10/2014).
[50] ZHANG, F., HUANG, H., ZHU, S., WU, D., AND LIU, P. ViewDroid: Towards obfuscation-resilient mobile application repackaging detection. In Proceedings of the 7th ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec 2014). ACM (2014).
[51] ZHANG, Q., AND REEVES, D. S. MetaAware: Identifying metamorphic malware. In Computer Security Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual (2007), IEEE, pp. 411–420.
[52] ZHANG, Y., YANG, M., XU, B., YANG, Z., GU, G., NING, P., WANG, X. S., AND ZANG, B. Vetting undesirable behaviors in Android apps with permission use analysis. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security (2013), ACM, pp. 611–622.
[53] ZHENG, M., LEE, P. P., AND LUI, J. C. ADAM: An automatic and extensible platform to stress test Android anti-virus systems. In Detection of Intrusions and Malware, and Vulnerability Assessment (2013), pp. 82–101.
[54] ZHOU, W., ZHOU, Y., GRACE, M., JIANG, X., AND ZOU, S. Fast, scalable detection of piggybacked mobile applications. In CODASPY (2013).
[55] ZHOU, W., ZHOU, Y., JIANG, X., AND NING, P. Detecting repackaged smartphone applications in third-party Android marketplaces. In Proceedings of the second ACM conference on Data and Application Security and Privacy (2012), ACM, pp. 317–326.
[56] ZHOU, Y., AND JIANG, X. Dissecting Android malware: Characterization and evolution. In Security and Privacy (SP), 2012 IEEE Symposium on (2012), IEEE, pp. 95–109.
[57] ZHOU, Y., WANG, Z., ZHOU, W., AND JIANG, X. Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets. In NDSS (2012).
[58] ZORZ, Z. 1.2 info. http://www.net-security.org/secworld.php?id=15976, 11 2013. (Visited on 11/10/2014).
8 Appendix

App store           Malicious apps  Apps studied  Malicious (%)  Country
Anzhi                        17921         46055          38.91  China
Yidong                        1088          3026          35.96  China
yy                           13882         82950          28.07  China
Anfen                          365          1572          23.22  China
Slideme                       3285         15367          21.38  US
Android Leyuan                 997          6053          16.47  China
gfun                         17779        108736          16.35  China
16apk                         4008         25714          15.59  China
Pandaapp                      1577         10679          14.77  US
Lenovo                        9799         68839          14.23  China
Haozhuo                       1100          8052          13.66  China
Dangle                        2992         22183          13.49  China
3533world                     1331          9886          13.46  China
Appchina                      8396         62449          13.44  China
Wangyi                          85           663          12.82  China
Youyi                          408          3628          11.25  China
Nduo                            20           190          10.53  China
Sogou                         2414         23774          10.15  China
Huawei                         148          1466          10.10  China
Yingyongbao                    272          2812           9.67  China
Android Ruanjian               198          2308           8.58  China
Anji                          3467         41607           8.33  China
Android Market                1997         24332           8.21  China
Opera                         4852         61866           7.84  Europe
Mumayi                        6129         79594           7.70  China
Google                       30552        401549           7.61  US
Xiaomi                         832         12139           6.85  China
others                        2377         38648           6.15  China
Amazon                          59          1001           5.89  US
Baidu                          831         21122           3.93  China
7xiazi                         898         26195           3.43  China
Liqu                           394         26392           1.49  China
Gezila                          30          5000           0.60  China

Table 5: App Collection & Malware in Different Markets.
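The Percentage column of Table 5 is the number of malicious apps divided by the number of apps studied in that market, expressed in percent. The short Python check below recomputes it for three rows copied from the table; the helper name `malicious_share` and the rounding to two decimals are illustrative choices.

```python
# A few rows copied from Table 5:
# (app store, # of malicious apps, # of apps studied, reported percentage)
ROWS = [
    ("Anzhi", 17921, 46055, 38.91),
    ("Google", 30552, 401549, 7.61),
    ("Amazon", 59, 1001, 5.89),
]


def malicious_share(malicious: int, total: int) -> float:
    """Percentage of studied apps flagged as malicious, rounded to two decimals."""
    return round(100 * malicious / total, 2)


for store, malicious, total, reported in ROWS:
    print(f"{store}: {malicious_share(malicious, total):.2f}% (reported {reported:.2f}%)")
```

For these rows the recomputed values agree with the reported column.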