Mahindru & Sangal, International Journal on Emerging Technologies 11(3): 516-525 (2020)

International Journal on Emerging Technologies 11(3): 516-525 (2020)
ISSN No. (Print): 0975-8364
ISSN No. (Online): 2249-3255

DLDroid: Feature Selection based Malware Detection Framework for Android Apps developed during COVID-19

Arvind Mahindru1,2 and A. L. Sangal3
1 Research Scholar, Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar-144001 (Punjab), India.
2 Assistant Professor, Department of Computer Science and Applications, DAV University, Jalandhar-144012 (Punjab), India.
3 Professor, Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar-144001 (Punjab), India.

(Corresponding author: Arvind Mahindru)
(Received 20 April 2020, Revised 14 May 2020, Accepted 17 May 2020)
(Published by Research Trend, Website: www.researchtrend.net)

ABSTRACT: COVID-19 has acted as a window of opportunity for cybercriminals to develop malware-infected apps. During the lockdown period, people stay at home and interact with others mostly through smartphones. With the exponential increase in Android apps, and hence in Android malware, securing users' privacy has become a real challenge. For this purpose, academicians and researchers have proposed various signature-based and machine learning approaches to detect Android malware. Signature-based approaches can detect only known malware whose signature definitions are already present in their database. The machine learning approaches proposed in the literature, on the other hand, were either developed with irrelevant features or are not able to detect malware developed during the COVID-19 pandemic. To overcome these issues, it is essential to develop an effective and efficient Android malware detection model. Therefore, in this research paper, 11,000 distinct Android apps belonging to twelve different categories are collected. A total of 1844 unique features are extracted from these apps, and irrelevant features are removed by using ten distinct feature selection approaches. An Android malware detection framework is then developed by using the significant features as input and a Deep Neural Network (DNN) as the machine learning technique. The experimental results reveal that the model developed by using rough set analysis as the feature selection approach along with DNN can detect 97.9% of malware from real-world apps.

Keywords: Android apps, Permissions model, API calls, Deep Neural Network (DNN), Feature selection, Intrusion-detection, Cyber security, smartphone.
I.
INTRODUCTION
COVID-19 is a global calamity that started in December 2019 in Wuhan, Hubei, China [1], on an unbelievable scale and with devastating consequences. It has affected not only the health industry but also other sectors such as Education, Banking, IT and Business. To fight this novel disease, public health officials and local communities suggest "contact tracing" smartphone apps. The Indian government released "Aarogya Setu" [2], WHO released MyHealth [3], the Italian government launched "Immuni" [4], and the Singaporean government released "TraceTogether" [5]. These smartphone apps demand permissions related to approximate location, precise location, Bluetooth and data sharing. The proper functioning of an Android app depends upon the permission model. Therefore, permissions play a vital role in the study of smartphone security, as cybercriminals use these permissions to steal sensitive or personal information from users' smartphones.

The growth of Android malware has become a serious threat to users' sensitive information and privacy. According to the report published by G DATA [6], cybercrooks create more than 10,000 malware-infected apps on a daily basis, which means that a malware-infected app is developed roughly every 8 seconds. Google introduced Google Bouncer [7] in 2012 for scanning the existing and new apps in its official play store. However, Google Bouncer has a number of limitations [8] and has failed to achieve a good detection rate. Later on, Google introduced Google Play Protect in the play store for scanning Android apps at the time of downloading and installation. According to the report published by McAfee [9] in the first quarter of 2020, 1,000,000 new malware samples were detected in Q4 of 2019.

To address this issue, a number of authors have proposed signature-based [10] and machine learning approaches [11-13] for detecting malware on Android devices. Signature-based approaches can identify only malware whose signature is already present in their database. The machine-learning approaches proposed by academicians and researchers, on the other hand, have been examined on limited datasets. So, to build an effective and efficient Android malware detection model, in this research paper we collect 11,000 distinct Android apps that belong to twelve different categories. We extract 1844 unique features from these apps and divide them into thirty different feature sets. The performance of a machine-learning algorithm depends on the features with which it is trained.
To remove irrelevant features and reduce misclassification errors, in this research paper we build and compare models by using ten different feature selection approaches. In the past few years, malware detection models based on a Deep Neural Network (DNN) have achieved better detection rates. A DNN has the ability to learn from features and perform classification simultaneously. Motivated by this, in this study we use permissions, API calls, the number of users who downloaded the app, and the rating of the app as input features to train the DNN. The main reason for considering permissions as features is that, through permissions, a cybercriminal can easily gain access to a user's information and steal sensitive information from the user's smartphone.

In the current situation, most organizations have asked their workforce to work from home. Several countries such as India, China, Italy, France, Poland, New Zealand and the UK have gone into full lockdown, and people are forced to stay indoors. People are therefore entirely dependent upon mobile apps for communication, news, entertainment, business, medical, health & fitness, dating, social interactions, etc. As a result, COVID-19 has become a new weapon for cyber attackers, who develop malware-infected Android apps in the name of COVID-19 and spread ransomware, trojans and adware. Smartphone security thus becomes highly important during this time.

The unique and novel contributions of this paper are as follows:
– To the best of our knowledge, this is the first research work in which 11,000 distinct Android apps [14] developed during the COVID-19 pandemic are collected.
– In this study, ten distinct feature selection approaches are used to remove irrelevant features. To build an effective and efficient malware detection model, we consider a Deep Neural Network (DNN) as the machine-learning algorithm.
– The collected apps belong to twelve different categories of Android apps, from which 1844 unique features are extracted to build an effective and efficient Android malware detection model.
– The proposed malware detection approach is able to detect malware in less time than the anti-virus scanners available in the market with which it is compared.
The rest of the paper is organized as follows. Section II describes the related work in the field of Android malware detection and the gaps present in the literature. Section III presents the formulation of the experimental dataset and the creation of feature sets. Feature selection approaches are discussed in Section IV. Section V discusses the machine learning technique used in this research paper. Section VI discusses the methods against which we compare our proposed model. Performance parameters for evaluating our proposed model are discussed in Section VII. Sections VIII and IX discuss the experimental setup and the results of the experiment. Section X presents the conclusion and future work.
II.
RELATED WORK
In this section, we discuss the previous approaches and frameworks developed for Android malware detection. Faruki et al. (2013) proposed AndroSimilar, which automatically generates signatures by extracting statistically robust syntactic features that are used for malware detection [10]. Andrubis [15] is a web-based malware analysis platform to which the user can submit apps through a web service; after analyzing the app's behavior, it reports whether the app is benign or malware. Aurasium [16] takes control of the execution of apps by applying arbitrary security policies at run-time. It repackages Android apps to include code for policy enforcement, and any privacy violations are reported to the user. Aurasium has a limitation: it cannot detect malicious behavior if an app changes its signature. CopperDroid [17] performs system call-centric dynamic analysis of Android apps using Virtual Machine Introspection. Its authors experimented with more than 2900 Android malware samples, and the proposed technique shows conclusive detection of malware behavior.

Mahindru and Singh (2017) extracted 123 dynamic permissions from 11,000 distinct Android apps and applied five different machine-learning algorithms, i.e., Naive Bayes, Random Forest, Simple Logistic, Decision Tree, and k-star. Out of the five implemented machine learning algorithms, Simple Logistic performed best in detecting malware from real-world apps [13]. Mahindru and Sangal (2019) proposed "DeepDroid", which uses a Deep Neural Network (DNN) with Principal Component Analysis (PCA) as the feature selection method. The experiment was performed on 120,000 Android apps and achieved a detection rate of 94% [18].
CrowDroid [19] is a behavior-based malware detection system that works with two components, i.e., a crowdsourcing app that needs to be installed on user devices and a remote server for malware detection. With the help of the crowdsourcing app, CrowDroid sends behavioral data in the form of a log file to the remote server. At the remote server, the collected behavioral data is processed to create feature vectors, and a 2-means clustering algorithm is used to predict whether the app is malicious or benign. Its limitation is that the CrowDroid app continuously drains the available device resources.

Mahindru and Sangal (2020) proposed "PerbDroid", which can detect a limited number of malware families [12]. Features were selected by implementing six distinct feature-ranking approaches (i.e., Principal Component Analysis (PCA), Gain Ratio, Chi-squared test, Information gain feature evaluation, OneR feature evaluation, and Logistic regression analysis). With the selected features, they developed sixty distinct models by using ten different machine-learning algorithms. The model developed by using a Deep Neural Network and PCA achieved a detection rate of 97.8% on 200,000 different Android apps.

TaintDroid [20] tracks privacy-sensitive information leakage in third-party developer apps. Whenever sensitive data leaves the smartphone, TaintDroid records the label of the data and the app that sent it, along with its destination address. Mahindru and Sangal (2020) compared the performance of supervised and semi-supervised machine learning algorithms by using feature subset selection approaches [11]. They implemented LLGC as a semi-supervised machine-learning algorithm and achieved a higher detection rate on a moderately sized dataset. Classification algorithms have also achieved a high prediction rate on disease datasets [21].

Table 1 summarizes the feature selection approaches and datasets used by different researchers and academicians in their work. From Table 1, it is seen that researchers have applied a limited number of feature selection approaches to their collected datasets. Since significant features play a major role in developing a malware detection model, in this study ten distinct feature selection approaches are implemented to select significant features. Based on the literature review, we consider the following research questions in this research paper.
A.
Research Questions
RQ1. Which feature selection approach is most effective for detecting malware from Android apps?
To examine this question, we apply ten distinct feature selection approaches and develop models by considering DNN as the machine-learning algorithm. The performance of the developed models is then compared by using two distinct performance parameters, i.e., F-measure and accuracy.
RQ2. Do feature selection approaches have an effect on the outcome of the machine-learning algorithm?
To answer this question, we compare the performance of models built with the feature selection approaches against the model built with all extracted feature sets.
Table 1: Feature selection approaches implemented in the literature.
Proposed Framework | Feature Selection Technique Used
PerbDroid [12] | Principal Component Analysis (PCA), Gain Ratio, OneR feature evaluation, Information gain feature evaluation, Logistic regression analysis, and Chi-square test
Mahindru and Sangal (2020) [11] | Consistency Sub-set Evaluation Approach, Filtered Sub-set Evaluation, Rough Set Analysis Approach and Approach Based on Correlation
Azmoodeh et al. [22] | Information Gain
Shabtai et al. [23] | Fisher score, Chi-square and Information Gain
Mas'ud et al. [24] | Information gain and Chi-square
MKLDroid [25] | Chi-squared

III.
FORMULATION OF EXPERIMENTAL DATASET AND CREATION OF FEATURE SETS
To reduce the effect of Android malware and to build an effective model that is capable of detecting malware from real-world apps, in this research paper we collect 11,000 distinct Android apps that belong to twelve different categories. We collected Android application packages (.apk) from the official Google play store and from third-party app stores, i.e., APKMirror [26] and AllFreeAPK [27]. These .apk files were published in these repositories from December 2019 to April 2020. The dataset is publicly available [28].
Table 2: Number of Android apps used in this research work.
ID | Category | Google play, Third-party app store
DS1 | Business | 3291750
DS2 | Education | 1352000
DS3 | Game | 1001820
DS4 | Entertainment | 183152
DS5 | Social Media | 128760
DS6 | Travel & Local | 1232
DS7 | Food & Drink | 17665
DS8 | Finance | 184250
DS9 | Medical | 187560
DS10 | Health & Fitness | 1201000
DS11 | News & Magazine | 139500
DS12 | Dating | 185980

Table 2 shows the category-wise number of .apk files considered in this study. Out of the 11,000 collected .apk packages, 5,500 are malware infected. Malware packages are identified with VirusTotal [29].
A.
Creation of feature sets
After collecting Android apps from the above-mentioned repositories, we extract the permissions and API calls that are demanded by the apps during installation and at run-time. To extract features from the apps, we used Android Studio as an emulator together with a self-written Java program, as described in [13]. We extract 1532 unique permissions and 310 API calls for developing the malware detection model. The lists of extracted permissions and API calls are available to researchers and academicians [30]. Each app is represented as an 1844-dimensional Boolean vector, where "1" implies that the app requires the feature and "0" implies that the feature is not required. It is very common for benign and malware apps to request a similar set of permissions and API calls for their execution.
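The Boolean encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_app` and the three-entry vocabulary are hypothetical, whereas the vocabulary used in the paper has 1844 entries and is extracted with the authors' own Java tooling.

```python
from typing import Iterable, List, Sequence


def encode_app(requested: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector aligned with `vocabulary` (1 = the app requests the feature)."""
    requested_set = set(requested)
    return [1 if name in requested_set else 0 for name in vocabulary]


# Hypothetical three-feature vocabulary, for illustration only.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.CAMERA",
]

# Features requested by one hypothetical app.
app_features = ["android.permission.READ_SMS", "android.permission.INTERNET"]

vector = encode_app(app_features, VOCABULARY)
print(vector)  # [1, 1, 0]
```

Because a benign and a malicious app can map to the same vector, the encoding alone does not separate the classes; the classifier has to learn which combinations of features matter.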
The permissions overview given by Google [31] is used to describe the behavior of a permission, i.e., "dangerous" or "normal". After extracting the permissions and API calls, we divide them into thirty different feature sets, which are shown in Table 3. In this research paper, we also consider the rating of an app and the number of users who downloaded the app as features. To normalize the data, we used the Min-max approach. This approach is based on a linear transformation that maps each data point of a feature to a normalized value that lies between 0 and 1. The following equation is used to find the normalized value of a data point D of attribute Q:
\[ \text{Normalized}(D) = \frac{D - \min(Q)}{\max(Q) - \min(Q)} \]
where min(Q) and max(Q) are the minimum and maximum values of attribute Q, respectively.
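As a worked example of Min-max normalization, the following Python sketch rescales one attribute to the range 0 to 1. The sample ratings are hypothetical, and mapping a constant attribute to 0 is our own assumption, since the paper does not discuss that case.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Linearly rescale `values` so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical app ratings on a 1-5 scale.
ratings = [1.0, 3.0, 5.0]
print(min_max_normalize(ratings))  # [0.0, 0.5, 1.0]
```

Normalizing the rating and download count keeps them on the same scale as the Boolean permission and API-call features.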
Table 3: Formulation of sets containing (App downloaded by number of users, permissions, API calls and rating of the App) as features.
No. | Description related to
FS1 | Phone State and Phone Connection
FS2 | Audio and Video
FS3 | Bundle
FS4 | Log File
FS5 | Synchronization Data
FS6 | Contact Information
FS7 | System Settings
FS8 | Browser Information
FS9 | Calendar Information
FS10 | Account Settings
FS11 | Location Information
FS12 | Widget
FS13 | System Tools
FS14 | Network Information and Bluetooth Information
FS15 | Unique Identifier
FS16 | File Information
FS17 | Services That Cost You Money
FS18 | Phone Calls
FS19 | Database Information
FS20 | Image
FS21 | Contains information related to API calls
FS22 | Contains information related to rating and downloads
FS23 | Your Accounts
FS24 | Storage File
FS25 | SMS MMS
FS26 | Read
FS27 | Access Action
FS28 | Read and Write
FS29 | Hardware Controls
FS30 | Default group

IV.
FEATURE SELECTION APPROACHES
In this paper, we implement ten distinct feature selection approaches on a large collection of 1844 features (divided into thirty distinct feature sets) to identify the best subset of features, one that helps us detect malware with a better detection rate and also minimizes the number of misclassification errors. Table 4 lists the feature selection approaches used in this study.
Table 4: Feature selection approaches.
Name of the feature selection approach | Description
Gain-ratio feature selection approach [12] | This approach works on the prediction of the gain ratio in relation to the class to which the app belongs. The gain ratio of a feature Z is measured as
\[ \text{Gain ratio}(Z) = \frac{\text{Gain}(Z)}{\text{SplitInfo}_Z(A)}, \qquad \text{Gain}(Z) = I(A) - E(Z), \]
where A represents the feature set containing X instances having n distinct classes.
Chi-Square feature selection approach [12] | This test is used to investigate the independence between two events. In our study, features are ranked by the significance of their statistic with respect to the class. A higher calculated value implies rejection of the independence hypothesis, and as a result the selected features can be considered more relevant for detecting malware-infected apps.
Information-gain feature selection approach [12] | In Info-gain, features are selected on the basis of their relation to the class to which the app belongs.
OneR feature selection approach [12] | The OneR feature selection approach is used for ranking the features. It uses a classification mechanism to rank individual features: feature values are treated as continuous and the set of values is partitioned into a few disjoint intervals by a straightforward approach. In this study, we consider only features that have better classification rates.
Principal Component Analysis (PCA) [12] | Attribute reduction is accomplished by applying PCA to our collected dataset. PCA transforms a high-dimensional data space into a low-dimensional data space. The features present in the low-dimensional space are of high importance in detecting malware.
Logistic regression analysis [12] | For feature ranking, Univariate Logistic Regression (ULR) analysis is used to verify the degree of importance of every feature set.
Filtered subset evaluation [11] | Based on the principle of applying a random subset evaluator to the dataset obtained by applying an arbitrary filtering approach.
Consistency subset evaluation approach [11] | This technique measures the importance of a subset of attributes by the level of consistency in the class values when the training instances are projected onto the subset of attributes.
Rough set analysis [11] | This approach is an approximation of a conventional set in terms of a pair of feature sets that provide the upper and the lower approximation of the original dataset.
Correlation based feature selection [11] | This approach is based on correlation and selects a subset of features that are particularly related to the class (i.e., benign or malware).
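To make the information-gain and gain-ratio rankings in Table 4 concrete, the sketch below scores a single Boolean feature against the class label. It follows the textbook definitions (class entropy minus conditional entropy, divided by the entropy of the feature's own values) and is not taken from the authors' implementation; the function names and the four-app toy data are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Class entropy minus the class entropy remaining after splitting on `feature`."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Information gain normalized by the split information of the feature."""
    split_info = entropy(feature)
    if split_info == 0.0:
        return 0.0  # a constant feature cannot split the data
    return information_gain(feature, labels) / split_info


# Toy data: 1 = malware, 0 = benign (hypothetical).
labels = [1, 1, 0, 0]
separating = [1, 1, 0, 0]      # requested only by the malware apps
uninformative = [1, 0, 1, 0]   # requested equally by both classes

print(gain_ratio(separating, labels))     # 1.0
print(gain_ratio(uninformative, labels))  # 0.0
```

Ranking all features by such a score and keeping the top-ranked ones is what distinguishes the feature-ranking approaches (FR) from the subset-selection approaches (FS) used later in the paper.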
V.
MACHINE LEARNING TECHNIQUE
To develop an effective and efficient Android malware detection model, we consider a deep learning model (i.e., DNN) as the machine learning technique. In the literature, a number of authors have constructed deep learning models with different deep architectures, i.e., Convolutional Neural Networks (CNN) and Deep Belief Networks (DBN) [12, 18]. In the present paper, we select the DBN architecture to develop our deep learning model. Fig. 1 shows the architecture of the deep learning method. Training is divided into two stages: an unsupervised pre-training stage followed by a supervised back-propagation stage. In the pre-training stage, Restricted Boltzmann Machines (RBM) are used, together with the deep neural network, to train the model. In this step, an iterative process is used to build the model with unlabeled Android apps. In the back-propagation stage, the pre-trained DBN is fine-tuned with labeled Android apps in a supervised manner. A model built with this deep learning method therefore uses Android apps in both stages of the training process.
Fig. 1. DNN Model.
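The unsupervised pre-training stage can be illustrated with a single Restricted Boltzmann Machine trained by one step of contrastive divergence (CD-1). The paper does not state which RBM training rule, layer sizes, or learning rate were used, so everything below (the class name `TinyRBM`, CD-1, the learning rate, the toy vectors) is an assumption chosen for illustration; supervised fine-tuning by back-propagation is not shown.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """A minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.b_hidden[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hidden))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b_visible[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_visible))]

    def cd1_update(self, v0: List[float], lr: float = 0.1) -> None:
        """Update weights and biases from one unlabeled feature vector."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)   # reconstruction of the input
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_visible[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hidden[j] += lr * (h0[j] - h1[j])


# Hypothetical unlabeled Boolean feature vectors (4 features per app).
unlabeled_apps = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
rbm = TinyRBM(n_visible=4, n_hidden=2)
for _ in range(10):                      # a few passes over the unlabeled data
    for app_vector in unlabeled_apps:
        rbm.cd1_update(app_vector)
print(rbm.hidden_probs([1, 0, 1, 0]))    # learned hidden representation
```

In a DBN, the hidden activations of one trained RBM become the input of the next RBM, and the stacked weights then initialize the network that is fine-tuned with labeled apps.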
VI.
COMPARISON OF PROPOSED MODEL WITH DIFFERENT EXISTING TECHNIQUES
To examine whether our developed framework achieves a higher detection rate, in this research paper we compare the outcome of our proposed model in the two ways described below:
(a) Comparison of results with previously used classifiers: To verify whether our developed model detects malware at least as well as previously used classifiers, we evaluate it on two performance parameters, i.e., F-measure and Accuracy.
(b) Comparison of results with different anti-virus scanners: To analyze the performance of our model for malware detection, we chose ten available anti-virus scanners and compare their detection rates with the detection rate of the proposed model.
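VII.
EVALUATION OF PERFORMANCE PARAMETERS
In this section, we give the definitions of the performance parameters used to evaluate our proposed malware detection model. All of these parameters are calculated from the confusion matrix, which contains the actual and detected classifications produced by a detection model. Table 5 shows the confusion matrix for the malware detection model. In this study, two performance parameters, namely F-measure and Accuracy, are employed for measuring the performance of malware detection models. With N_{X→Y} denoting the number of apps of actual class X that are detected as class Y, Accuracy and F-measure are evaluated as:
\[ \text{Accuracy} = \frac{N_{\text{Malware}\rightarrow\text{Malware}} + N_{\text{Benign}\rightarrow\text{Benign}}}{N_{\text{Malware}\rightarrow\text{Malware}} + N_{\text{Malware}\rightarrow\text{Benign}} + N_{\text{Benign}\rightarrow\text{Malware}} + N_{\text{Benign}\rightarrow\text{Benign}}} \]
and
\[ \text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{Precision} = \frac{N_{\text{Malware}\rightarrow\text{Malware}}}{N_{\text{Malware}\rightarrow\text{Malware}} + N_{\text{Benign}\rightarrow\text{Malware}}}, \quad \text{Recall} = \frac{N_{\text{Malware}\rightarrow\text{Malware}}}{N_{\text{Malware}\rightarrow\text{Malware}} + N_{\text{Malware}\rightarrow\text{Benign}}} \]
Table 5: Confusion matrix used in this study (rows: actual class, columns: detected class).
 | Malware | Benign
Malware | Malware→Malware | Malware→Benign
Benign | Benign→Malware | Benign→Benign

VIII.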
EXPERIMENTAL SETUP
In this section, we describe the experimental setup used to evaluate the developed malware detection models. DNN is applied to 11,000 Android apps that belong to the twelve categories of Android apps listed in Table 2. Each of these datasets has a different number of benign and malware apps, which is adequate for our analysis. Fig. 2 shows the framework of DLDroid. The following steps are used to choose a subset of features and to develop a malware detection model that decides whether an app belongs to the benign or the malware class. Feature selection approaches are applied to the 12 different categories of Android apps. Hence, a total of 132 different detection models ((1 set of all extracted features + 10 feature selection approaches) * 12 datasets * 1 detection method) have been developed in this research paper; the feature subsets are specific to each dataset and are determined after conducting feature selection.
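The count of 132 models follows directly from enumerating every combination of feature configuration, dataset, and classifier. The short sketch below does this with the abbreviations used in the paper's result tables (AF, FR1 to FR6, FS1 to FS4, DS1 to DS12); the variable names are our own.

```python
from itertools import product

# 1 configuration with all features + 10 feature selection approaches.
feature_configs = ["AF", "FR1", "FR2", "FR3", "FR4", "FR5", "FR6",
                   "FS1", "FS2", "FS3", "FS4"]
datasets = [f"DS{i}" for i in range(1, 13)]   # 12 app categories
classifiers = ["DNN"]                         # 1 detection method

models = list(product(feature_configs, datasets, classifiers))
print(len(models))  # 132
```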
The subsets of features obtained from the aforementioned procedure are given as input to the machine learning classifier. To compare the developed models, we use the 20-fold cross-validation method. Cross-validation is a statistical learning approach that is used to evaluate and compare models by dividing the data into two portions: one portion is used to train the model and the remaining portion is used to validate the model built during training. The data is initially separated into K equally sized segments (folds). K-1 folds are used to train the model and the remaining fold is used for testing.
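The K-fold procedure described above can be sketched as an index generator. This is a minimal illustration, not the authors' MATLAB code: it uses contiguous folds without shuffling (in practice the data would normally be shuffled first), and the sample count of 100 is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 20-fold cross-validation over a hypothetical dataset of 100 apps.
for fold, (train, test) in enumerate(k_fold_indices(100, 20), start=1):
    if fold == 1:
        print(len(train), len(test))  # 95 5
```

Every app appears in exactly one test fold, so each app is used for both training and testing over the 20 rounds.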
Fig. 2. Framework of DLDroid.
K-fold cross-validation is significant because it uses the whole dataset for both testing and training. For this study, 20-fold cross-validation is used to analyze the models, i.e., the datasets are divided into 20 portions. The outcomes of all the malware detection models built are compared with each other by employing two distinct performance parameters: F-measure and Accuracy.
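Both performance parameters can be computed from the four counts of a confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall and F-measure with malware as the positive class; the counts in the example are hypothetical and do not come from the paper's experiments.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all apps that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the malware class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: tp = malware detected as malware, fn = malware detected as benign,
# fp = benign detected as malware, tn = benign detected as benign.
tp, fn, fp, tn = 90, 10, 5, 95
print(accuracy(tp, tn, fp, fn))          # 0.925
print(round(f_measure(tp, fp, fn), 4))   # 0.9231
```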
IX.
RESULTS OF PERFORMED EXPERIMENT
In this section, the relationship between the different feature sets and malware detection at the class level is presented. The set of features is used as input, and each experiment reflects the ratio of benign and malware apps in its dataset. F-measure and Accuracy are used as performance assessment parameters to compare the Android malware detection models developed by using supervised machine learning algorithms.
A.
Feature selection approach
Fig. 3 shows the significant features that help us build the malware detection model. A black circle denotes a significant feature set and a blank rectangle denotes an insignificant feature set.
B.
Machine Learning Techniques
Eleven subsets of features (1 considering all sets of extracted features + 10 resulting from the implemented feature selection approaches) are used as input to build models for malware detection. The hardware used to carry out this study is an Intel Core i9 processor with a 1 TB hard disk as secondary memory and 16 GB of primary memory. Models are developed in the MATLAB environment. The performance of each detection model is measured by using two distinct performance parameters, i.e., F-measure and Accuracy. Tables 6 and 7 present the outcomes obtained for the distinct datasets by using DNN. The abbreviations used in this study are: FS1: Correlation-based Feature Selection, FS2: Classifier Subset Evaluation, FS3: Filtered Subset Evaluation, FS4: Rough Set Analysis (RSA), FR1: Chi-Squared test, FR2: Gain Ratio Feature Evaluation, FR3: OneR Feature Evaluation, FR4: Information Gain Feature Evaluation, FR5: Logistic regression analysis, FR6: Principal Component Analysis (PCA), and AF: All Extracted Features.
From Tables 6 and 7, it may be concluded that:
– The model developed by considering the features selected by Rough Set Analysis (FS4) as input detects malware more effectively than the model developed by using all extracted feature sets.
– From Tables 6 and 7, we see that the feature selection approach has a substantial effect on the outcome of the model developed for malware detection.
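Rough set analysis (FS4), the selector singled out above, rests on the lower and upper approximations mentioned in Table 4. The sketch below computes both for a target class, following the standard rough-set definitions: apps that are indistinguishable on the chosen features form equivalence classes; the lower approximation is the union of classes that lie entirely inside the target, and the upper approximation is the union of classes that overlap it. The function name and the four toy apps are assumptions, and the authors' actual RSA procedure may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple


def approximations(objects: Dict[str, Dict[str, int]],
                   features: Sequence[str],
                   target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) approximations of `target` with respect to `features`."""
    # Group apps that have identical values on the chosen features.
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[tuple(attrs[f] for f in features)].add(name)
    lower: Set[str] = set()
    upper: Set[str] = set()
    for members in classes.values():
        if members <= target:      # class lies entirely inside the target
            lower |= members
        if members & target:       # class overlaps the target
            upper |= members
    return lower, upper


# Hypothetical apps described by two Boolean features.
apps = {
    "a": {"SMS": 1, "NET": 1},
    "b": {"SMS": 1, "NET": 1},
    "c": {"SMS": 1, "NET": 0},
    "d": {"SMS": 0, "NET": 0},
}
malware = {"a", "c"}

lower, upper = approximations(apps, ["SMS", "NET"], malware)
print(sorted(lower))  # ['c']
print(sorted(upper))  # ['a', 'b', 'c']
```

A feature subset is more useful the closer the lower approximation comes to the upper one, since fewer apps are then left in the ambiguous boundary region.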
Fig. 3. Feature ranking approaches: (a) Chi-square, (b) Gain ratio, (c) Information gain, (d) Logistic regression analysis, (e) OneR, (f) PCA, (g) Classifier, (h) Correlation based feature selection, (i) Filtered, (j) RSA.
Table 6: Accuracy (in %) measured using different feature selection approaches.
ID | AF | FR1 | FR2 | FR3 | FR4 | FR5 | FR6 | FS1 | FS2 | FS3 | FS4
DS1 | 68.3 | 81 | 84 | 86 | 86 | 82 | 83 | 85 | 86 | 83 | 89.8
DS2 | 65 | 80.8 | 84 | 87 | 86 | 85 | 82 | 85 | 86 | 89 | 91.8
DS3 | 67 | 81 | 84 | 87 | 82 | 85 | 83 | 81 | 84 | 89 | 90.8
DS4 | 62.8 | 78 | 81 | 89 | 86 | 83 | 89 | 85 | 87 | 89 | 90.7
DS5 | 68.8 | 81 | 83 | 80 | 81 | 86 | 87 | 82 | 83 | 85 | 89.8
DS6 | 67.9 | 85 | 87 | 86 | 85 | 84 | 88 | 89 | 92 | 94 | 96.7
DS7 | 78 | 81 | 85 | 88 | 89 | 89.6 | 88.7 | 86 | 86.8 | 89.7 | 93.8
DS8 | 65 | 78 | 75 | 78 | 82 | 84 | 85 | 86 | 87 | 88 | 91
DS9 | 68 | 84 | 87 | 92 | 91 | 83 | 84 | 96 | 95 | 93 | 86
DS10 | 66.8 | 78 | 86 | 88 | 89 | 82 | 89 | 89 | 89.8 | 89.7 | 97
DS11 | 79 | 88 | 88 | 86 | 86 | 89 | 89 | 80 | 86 | 88 | 98
DS12 | 66.8 | 82 | 88 | 82 | 86 | 81 | 83 | 88 | 87 | 89 | 90

Table 7: F-measure measured using different feature selection approaches.
ID | AF | FR1 | FR2 | FR3 | FR4 | FR5 | FR6 | FS1 | FS2 | FS3 | FS4
DS1 | 0.79 | 0.81 | 0.85 | 0.83 | 0.81 | 0.83 | 0.85 | 0.82 | 0.87 | 0.81 | 0.89
DS2 | 0.75 | 0.82 | 0.86 | 0.85 | 0.84 | 0.81 | 0.85 | 0.83 | 0.85 | 0.81 | 0.87
DS3 | 0.78 | 0.87 | 0.86 | 0.85 | 0.83 | 0.85 | 0.86 | 0.85 | 0.84 | 0.87 | 0.89
DS4 | 0.72 | 0.80 | 0.88 | 0.84 | 0.87 | 0.86 | 0.86 | 0.87 | 0.81 | 0.86 | 0.88
DS5 | 0.67 | 0.80 | 0.81 | 0.82 | 0.83 | 0.83 | 0.84 | 0.85 | 0.86 | 0.87 | 0.90
DS6 | 0.69 | 0.88 | 0.85 | 0.86 | 0.87 | 0.87 | 0.85 | 0.88 | 0.87 | 0.88 | 0.90
DS7 | 0.70 | 0.86 | 0.85 | 0.84 | 0.87 | 0.89 | 0.86 | 0.87 | 0.82 | 0.81 | 0.89
DS8 | 0.67 | 0.81 | 0.81 | 0.88 | 0.85 | 0.84 | 0.83 | 0.84 | 0.84 | 0.88 | 0.89
DS9 | 0.78 | 0.89 | 0.92 | 0.94 | 0.93 | 0.92 | 0.96 | 0.99 | 0.91 | 0.92 | 0.91
DS10 | 0.70 | 0.82 | 0.81 | 0.88 | 0.86 | 0.87 | 0.85 | 0.88 | 0.82 | 0.88 | 0.96
DS11 | 0.72 | 0.87 | 0.86 | 0.86 | 0.85 | 0.87 | 0.85 | 0.84 | 0.82 | 0.85 | 0.93
DS12 | 0.75 | 0.80 | 0.81 | 0.82 | 0.81 | 0.81 | 0.82 | 0.86 | 0.86 | 0.84 | 0.89

C.
Evaluation of DLDroid with existing techniques available in the literature
(i) Comparison of results with previously used classifiers: In this study, we also compare our model with the supervised machine learning approaches most often used in the literature, such as SVM with three distinct kernels (i.e., linear, polynomial and RBF), the Naive Bayes classifier, Decision tree analysis, Logistic regression and Neural network. Fig. 4 shows the box-plot diagrams for the F-measure and Accuracy of these commonly used classifiers.
Fig. 4. Box-plot diagrams showing the performance of different classifiers, panels (a) and (b).
On the basis of Fig. 4, we observe that DLDroid (DNN + FS4) has a higher median value, along with some outliers.
(ii) Comparison of results with different anti-virus scanners: Although our proposed model gives a better performance than the machine learning techniques used in the literature, in the end it must also be comparable with the common anti-virus products available in practice for Android malware detection. For this experiment, we select 10 different anti-virus scanners that are available in the market and apply them to our collected dataset. We consider only Android apps whose size is less than 50 MB. The performance of the proposed framework is comparatively better than that of many of the anti-virus scanners in the experiment. Table 8 shows the results of the experiment with anti-virus scanners.
Table 8: Comparison with distinct anti-virus scanners.
Name of the Anti-virus | Detection rate (in %) | Speed to detect malware (in sec)
Cyren | 82 | 60
Ikarus | 82.68 | 62
VIPRE | 89 | 40
McAfee | 89 | 30
AVG | 90 | 32
AVware | 92.8 | 30
ESET NOD32 | 92.9 | 20
CAT QuickHeal | 96.9 | 32
AegisLab | 97.1 | 30
NANO Antivirus | 96.2 | 20
DLDroid (our proposed framework) | 97.9 | 12

The detection rate of the anti-virus scanners varies considerably. The best anti-virus scanner detected 97.1% of the Android malware, while certain scanners identified only 82% of the malicious samples, likely because they are not specialized in detecting Android malware. On the 11,000 Android apps, DNN gives a detection rate of 97.9% and outperforms the anti-virus scanners considered. From this, we can say that our proposed framework is more efficient in detecting malware than the manually created definitions of the anti-virus scanners.
(iii) Experimental findings: The overall conclusions of our experimental work are presented in this section. The empirical study was performed on twelve distinct categories of Android apps by considering supervised machine learning techniques. Based on the experimental results, this research paper is able to answer the questions stated in Section II.
RQ1: To address RQ1, Tables 6 and 7 were analyzed. It is found that the model built by using FS4 is able to detect more malware from Android apps than the models built with the other approaches.
RQ2: In the present paper, feature selection approaches are used to identify a smaller subset of features. In this way, we obtain the best possible subsets of features, which help to develop a model that identifies whether an app is benign or malware. The experimental results in Tables 6 and 7 indicate that in a number of cases there exists a reduced subset of features that is better for building a detection model than the full set of extracted features.

X.
CONCLUSION AND FUTURE SCOPE
This work focuses on developing a malware detection framework by using a selected set of features that helps us identify whether an Android app belongs to the malware class or the benign class. The experiment was performed on twelve distinct categories of Android apps. Our conclusions after performing the experiment are the following:
– Empirical results show that it is feasible to identify a small subset of features. A malware detection model developed by considering a small set of features is able to detect malware and benign apps with fewer misclassification errors and better accuracy.
– Based on the experimental findings, we observe that feature selection approaches help to reduce the feature sets. Models built by using feature selection approaches perform better than the model built with all extracted feature sets.
– Based on the proposed detection framework, it is seen that the model built by using FS4 is capable of detecting 97.9% of unknown malware from real-world apps.
In this research paper, we proposed a malware detection model that detects only whether an app is malware or benign. This work can be extended to develop a model that predicts whether a particular feature is capable of detecting malware or not. Moreover, this study can be replicated on other Android app repositories by using soft computing models to attain a better malware detection rate.

Conflict of Interest. No conflict of interest.
REFERENCES
[1]. https://www.up.ac.za/news/post_2880755-covid-19-why-it-matters-that-scientists-continue-their-search-for-source-of-patient-zeros-infection-
[2]. https://www.mygov.in/aarogya-setu-app/
[3]. https://www.who.int/mediacentre/multimedia/app/en/
[4]. https://www.aa.com.tr/en/europe/italy-to-use-app-to-track-coronavirus-contacts/1808841
[5]. https://www.businessinsider.in/tech/news/singapore-is-using-a-high-tech-surveillance-app-to-track-the-coronavirus-keeping-schools-and-businesses-open-heres-how-it-works-/articleshow/74797714.cms
[6]. https://www.gdatasoftware.co.uk/news/2019/07/35228-mobile-malware-report-no-let-up-with-android-malware
[7]. https://en.wikipedia.org/wiki/Google_Play
[8]. https://www.eweek.com/security/google-bouncer-vulnerabilities-probed-by-security-researchers
[9]. https://www.mcafee.com/content/dam/consumer/en-us/docs/2020-Mobile-Threat-Report.pdf
[10]. Faruki, P., Ganmoor, V., Laxmi, V., Gaur, M. S., & Bharmal, A. (2013). AndroSimilar: robust statistical feature signature for Android malware detection. In Proceedings of the 6th International Conference on Security of Information and Networks, 152-159.
[11]. Mahindru, A., & Sangal, A. L. (2020). Feature-Based Semi-supervised Learning to Detect Malware from Android. In Automated Software Engineering: A Deep Learning-Based Approach, 93-118.
[12]. Mahindru, A., & Sangal, A. L. (2020). PerbDroid: Effective Malware Detection Model Developed Using Machine Learning Classification Techniques. In A Journey Towards Bio-inspired Techniques in Software Engineering, 103-139.
[13]. Mahindru, A., & Singh, P. (2017). Dynamic permissions based android malware detection using machine learning techniques. In Proceedings of the 10th Innovations in Software Engineering Conference, 202-210.
[14]. http://dx.doi.org/10.17632/k4rt99sfbt.2
[15]. http://anubis.iseclab.org/
[16]. Xu, R., Saïdi, H., & Anderson, R. (2012). Aurasium: Practical policy enforcement for android applications. In 21st USENIX Security Symposium (USENIX Security 12), 539-552.
[17]. http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php
[18]. Mahindru, A., & Sangal, A. L. (2019). DeepDroid: Feature Selection approach to detect Android malware using Deep Learning. In 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), 16-19.
[19]. Burguera, I., Zurutuza, U., & Nadjm-Tehrani, S. (2011). Crowdroid: behavior-based malware detection system for android. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, 15-26.
[20]. Enck, W., Gilbert, P., Han, S., Tendulkar, V., Chun, B. G., Cox, L. P., & Sheth, A. N. (2014). TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2), 1-29.
[21]. Lydia, E. L., Sharmil, N., Shankar, K., & Maseleno, A. (2019). Analysing the Performance of Classification Algorithms on Diseases Datasets. International Journal on Emerging Technologies, 10(3), 224–230.
[22]. Azmoodeh, A., Dehghantanha, A., & Choo, K. K. R. (2018). Robust malware detection for internet of (battlefield) things devices using deep eigenspace learning. IEEE Transactions on Sustainable Computing, 4(1), 88-95.
[23]. Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C., & Weiss, Y. (2012). "Andromaly": a behavioral malware detection framework for android devices. Journal of Intelligent Information Systems, 38(1), 161-190.
[24]. Mas'ud, M. Z., Sahib, S., Abdollah, M. F., Selamat, S. R., & Yusof, R. (2014, May). Analysis of features selection and machine learning classifier in android malware detection. In 2014 International Conference on Information Science & Applications (ICISA), 1-5.
[25]. Narayanan, A., Chandramohan, M., Chen, L., & Liu, Y. (2018). A multi-view context-aware approach to Android malware detection and malicious code localization. Empirical Software Engineering, 23(3), 1222-1274.
[26]. https://www.apkmirror.com/
[27]. https://www.allfreeapk.com/
[28]. http://dx.doi.org/10.17632/k4rt99sfbt.2
[29]. https://www.virustotal.com/gui/home
[30]. http://dx.doi.org/10.17632/b4mxg7ydb7.3
[31]. https://developer.android.com/guide/topics/permissions/overview

How to cite this article: Mahindru, A. and Sangal, A. L. (2020). DLDroid: Feature Selection based Malware Detection Framework for Android Apps developed during COVID-19. International Journal on Emerging Technologies, 11(3): 516–525.