
Processing of massive audit data streams for real-time anomaly intrusion detection

Wei Wang a,*, Xiaohong Guan a,b, Xiangliang Zhang a

a State Key Laboratory for Manufacturing Systems (SKLMS) and MOE Key Lab for Intelligent Networks and Network Security (KLINNS), Xi'an Jiaotong University, Xi'an 710049, China
b Center for Intelligent and Networked Systems, Tsinghua University, Beijing 100080, China

Received 26 October 2006; accepted 2 October 2007
Available online 13 October 2007

Abstract

Intrusion detection is an important technique in the defense-in-depth network security framework. Most current intrusion detection models lack the ability to process massive audit data streams for real-time anomaly detection. In this paper, we present an effective anomaly intrusion detection model based on Principal Component Analysis (PCA). The model considers the frequency property of audit events rather than the transition property or the correlation property. This makes it better suited to high-speed, real-time processing of massive data streams from various data sources. It can serve as a general framework within which practical Intrusion Detection Systems (IDSs) can be implemented in various computing environments. In this method, a multi-pronged anomaly detection model is used to monitor various computer system and network behaviors. Three sources of data are used to validate the model and the method:
- system call data from the University of New Mexico (lpr) and from the KLINNS Lab of Xi'an Jiaotong University (ftp);
- shell command data from the AT&T Research laboratory;
- network data from MIT Lincoln Lab.
Each input data vector is built from one of the following: the frequencies of individual system calls generated by one process, the frequencies of individual commands embedded in one command block, or the features extracted from one network connection. Our method reduces these high-dimensional data vectors, so that detection is carried out in a lower dimension with high efficiency and low use of system resources. The distance between a vector and its reconstruction in the reduced subspace is used for anomaly detection. Empirical results show that our model is promising in terms of detection accuracy and computational efficiency, and is thus amenable to real-time intrusion detection.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Intrusion detection; Principal Component Analysis; Hidden Markov models; Network security; Data streams

1. Introduction

Network-borne attacks are currently major threats to information security.
With the rapid growth of unauthorized activities on the network, Intrusion Detection Systems (IDSs) have become very important. Intrusion detection is a technology for detecting hostile attacks against computer network systems, from both outside and inside. In general, intrusion detection techniques fall into two major categories, depending on the modeling methods used: signature-based detection and anomaly detection. Signature-based detection identifies malicious behavior by matching it against predefined descriptions of attacks, or signatures. Although signature-based detection is effective against known intrusion types, it cannot detect new attacks that were not predefined. Anomaly detection, on the other hand, defines a profile of a subject's normal activities (a normal profile) and attempts to identify any unacceptable deviation as possibly the result of an attack. Anomaly detection may be able to detect new attacks.
However, it may also cause a significant number of false alarms, because normal behavior varies widely and obtaining complete descriptions of normal behavior is often difficult.

0140-3664/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.comcom.2007.10.010
* Corresponding author. Tel.: +33299127045.
E-mail addresses: wei.wang.email@gmail.com (W. Wang), xhguan@tsinghua.edu.cn (X. Guan), xlzhang@lri.fr (X. Zhang).
Computer Communications 31 (2008) 58–72. Available online at www.sciencedirect.com; www.elsevier.com/locate/comcom
Anomaly detection has been an active research area for more than a decade since it was originally proposed by Denning [1]. Many methods have been developed for anomaly detection, such as machine learning, data mining, neural networks and statistical methods. In real computer network systems, anomaly detection models can be built on multiple prongs. Many sources of data are therefore used for anomaly detection, including shell commands, audit events, keystroke records, system calls and network packets. Early studies [2–4] on anomaly detection mainly focused on modeling system or user behavior from monitored system logs or accounting logs. These logs include CPU usage, time of login, duration of user sessions and the names of files accessed.
Schonlau and Theus [5] attempted to detect masquerades by building normal user behavior from truncated command sequences. Experiments with six masquerade detection techniques [6] were performed and compared: Bayes one-step Markov, Hybrid multi-step Markov, IPAM, Uniqueness, Sequence-Match and Compression. Maxion and Townsend [7] applied the Naïve Bayes classification algorithm to detect masquerades based on the same data. Lane and Brodley [8] proposed a learning algorithm that analyzes user shell command history to build normal user behavior and detect anomalies. It attempts to address the "concept drift" problem that arises when normal user behavior changes. Recently, Oka et al. [9] used layered networks for masquerade detection based on the Eigen Co-occurrence Matrix (ECM).
In recent years, much research has focused on learning program behavior and building profiles with system call sequences as data sources. In 1996, Forrest et al. [10] introduced a simple anomaly detection method called time-delay embedding (tide), based on monitoring system calls invoked by active and privileged processes. Profiles of normal behavior were built by enumerating all distinct contiguous system call sequences of a fixed length that occur in the training data sets. Sequences that do not match the profile in actual detection are considered anomalous. In subsequent research, the approach was extended by various methods. For example, Lee and Stolfo [11] used a data mining approach to study a sample of system call data and characterized the sequences contained in normal data by a small set of rules. Sequences violating those rules were then treated as anomalies for monitoring and detection purposes. Warrender et al. [12] proposed a Hidden Markov Model (HMM) based method for modeling and evaluating invisible events. This method was further studied by many other researchers [13–16].
Wespi et al. [17] extended Forrest's idea and proposed a variable-length approach. Asaka et al. [18] developed an approach based on the discriminant method. An optimal classification surface was first learned from samples of properly labeled normal and abnormal system call sequences, and this surface was then used to decide whether a new system call sequence is normal. Yeung and Ding [14] and Lee et al. [19] used information-theoretic measures for anomaly detection. Liao et al. [20] used a K-Nearest Neighbor (K-NN) classifier for intrusion detection based on system call data, and Wu et al. [21] applied robust Support Vector Machines (SVM) to the same task. Both approaches modeled program behavior and classified each process as normal or abnormal when it terminated. Cho [22] applied soft computing techniques for anomaly detection. Our research group [23] employed Rough Set Theory (RST) to learn and model normal behavior, improving detection accuracy while using much smaller training data sets. Our research group has also developed two other intrusion detection methods. One [24] is based on plan recognition and predicts intrusion intentions by observing system calls. The other [25] uses Non-negative Matrix Factorization (NMF) to profile program and user behavior. Other methods and techniques are also used for system call based intrusion detection, such as first-order Markov Chain Models [26], high-order Markov Models [27], EWMA [28], Decision trees [29], Chi-Square [30] and Neural Networks [31].
In a multi-layered or multi-pronged IDS, monitoring network traffic behavior is as important as monitoring program behavior and user behavior. EMERALD [32] used statistical anomaly detection modules to monitor network traffic. Lee et al. [33,34] extracted features from network data and built signature-based detection models. These models generalized rules that classify the data by the values of the extracted features. Liu et al. [35] proposed a new Genetic Clustering (GC) based anomaly detection algorithm. The method can establish clusters and detect network intrusions by labeling normal and abnormal groups. Eskin et al. [36] proposed a geometric framework for unsupervised anomaly detection and used three algorithms for classification: Cluster, K-Nearest Neighbor (K-NN) and Support Vector Machine (SVM). Shyu et al. [37] proposed a Principal Component Classifier (PCC) for network intrusion detection. They detected anomalies by measuring the Mahalanobis distance of each observation from the center of the data. This distance is computed from the sum of squares of the standardized principal component scores. Heywood et al. [38] used a hierarchical neural network approach for network intrusion detection, based on Self-Organizing Maps (SOM) and potential function clustering. Sarasamma et al. [39] proposed a multilevel Hierarchical Kohonen Net (HKN) for anomaly network intrusion detection. Each level of the hierarchical map is modeled as a simple winner-take-all Kohonen net. Other methods have also been applied to network intrusion detection, such as neural networks [40] and the fusion of multiple neural network classifiers [41].
Although existing efforts in anomaly detection have made impressive progress, many issues remain to be resolved. First, a computer system in daily operation can produce massive data streams from various data sources. Moreover, these sources of data are typically high dimensional. For example, when the system calls of sendmail were collected on a host machine, only 112 messages produced a combined trace of over 1.5 million system calls [10]. Each trace of the data may contain more than 40 distinct system calls, resulting in a high-dimensional data set. Likewise, in the experiments MIT Lincoln Lab carried out for the 1998 DARPA evaluation [42], network traffic over 7 weeks amounted to 5 GB of compressed binary tcpdump data. These data were processed into about five million connection records. The network data are also high dimensional, since each network connection contains 41 features. Given these figures, a practical IDS must in most cases process high-dimensional, massive audit data at high speed, so that response actions can be taken as soon as possible.
However, many current anomaly detection models implicitly assume that the data are relatively low dimensional, or that only a small amount of data is used. This oversimplification limits the effectiveness of these models. In addition, many intrusion detection models take too long to train because they must process a large amount of data. For example, training an HMM-based anomaly detection model on a large data set took approximately two months [12]. Since computing environments change rapidly, an effective intrusion detection model should be retrained periodically to achieve real-time, self-adaptive intrusion detection. A model that spends so much time on training is clearly inadequate for this purpose.

Applicability to various data sources is another issue. Many IDSs can only handle one particular audit data source [33,34]. For example, system call based intrusion detection methods may not be applicable to other sources of data, such as shell command data or network data. Activities at different penetration points are normally recorded in different audit data sources. An IDS therefore often needs additional modules that specialize in certain components of the network system, such as hosts and subnets [33]. It is therefore crucial to develop a general framework within which practical IDSs can be implemented on various data sources.
An IDS is in nature a recognition system. Effective real-time anomaly intrusion detection therefore requires choosing and extracting suitable features and designing effective classification algorithms. In practice, it is always a challenge to choose features that best characterize the behavioral patterns of a subject (e.g., a program, a user or a network element), so that abnormal activity can be clearly distinguished from normal activity. In general, activities in computer systems have three categories of attributes: the transition property, the frequency property and the correlation property of audit events. All three properties have been widely studied for intrusion detection.

Intrusion detection methods that consider the transition property of audit events extract the transition information between the elements in the audit data. These methods often prepare the data by dividing it into short sequences with sliding windows. Such methods in the literature include [8,10–17,23,24] and [26,27]. Intrusion detection methods that consider the frequency property of audit events compute the distribution of the audit data instead. They do not focus on temporal variations in the data. For data preparation, some of these methods also use sliding windows to partition the data into short sequences, while others do not. Such methods in the literature include [3,5,7,14,20,21,25].
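The difference between the transition and frequency properties can be made concrete with a small example. The minimal Python sketch below prepares one toy trace in both ways. The traces and the helper names `sliding_windows` and `frequency_profile` are hypothetical and do not come from the data sets used in this paper. Reordering the same events changes the sliding-window (transition) view but leaves the frequency view unchanged.

```python
from collections import Counter


def sliding_windows(trace, length):
    """Transition view: every contiguous short sequence of a fixed length."""
    return [tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)]


def frequency_profile(trace):
    """Frequency view: relative frequency of each distinct event in the trace."""
    total = len(trace)
    return {event: count / total for event, count in Counter(trace).items()}


trace = [5, 3, 3, 6, 5, 3]
reordered = [3, 3, 5, 5, 6, 3]  # same events, different order

print(sliding_windows(trace, 3) == sliding_windows(reordered, 3))  # False
print(frequency_profile(trace) == frequency_profile(reordered))    # True
```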
There are also intrusion detection methods that use the correlation property of audit events. These methods capture the correlation information embedded in the audit data. For example, Ref. [9] captures user behavior by correlating not only connected events but also events that are not adjacent to each other.

In this paper, we first conduct experiments to evaluate how intrusion detection performance relates to the properties of audit events that are considered. The comparative studies then serve as references for choosing suitable features of audit events for real-time intrusion detection. Hidden Markov Models (HMMs) are commonly used to model temporal variation in the data, so we use them to capture the transition property of audit events. We propose a Principal Component Analysis (PCA) method that uses the frequency property of audit events for intrusion detection. We also compare the testing results of the PCA method with those of the HMM method and of the ECM method, which considers the correlation information of audit events.
The novelty of our work lies in the following three aspects. First, we evaluate how intrusion detection performance relates to the properties of audit events that are considered. The evaluation results can serve as an important reference for effective real-time intrusion detection. Second, the proposed PCA-based method achieves real-time intrusion detection through dimensionality reduction and a simple classifier. Third, the proposed method can process various kinds of data streams. It can therefore serve as a general framework within which practical IDSs can be implemented in various environments. The testing results show that considering the frequency property of audit events is more suitable for real-time intrusion detection than considering the transition or correlation property. It provides adequate detection performance at low computational overhead. The PCA-based intrusion detection method is effective and suitable for high-speed processing of massive data streams from various data sources. In this method, a multi-pronged intrusion detection model is used to monitor various computer network behaviors, and three sources of data are used to validate the model and the method. Empirical results show that the method is promising in terms of detection accuracy and computational efficiency, and is thus amenable to real-time intrusion detection.
The remainder of this paper is organized as follows. Section 2 describes the HMM method, which considers the transition properties of audit events, and the related method that uses their correlation properties for intrusion detection. Section 3 briefly introduces PCA and describes the proposed intrusion detection model based on the frequencies of audit events. Section 4 presents and analyzes empirical results on three sources of data to illustrate the effectiveness and efficiency of the proposed method. Section 5 gives concluding remarks.
2. Intrusion detection methods based on the transition properties and correlation properties of audit events

2.1. HMM-based intrusion detection method considering the transition properties of audit events

HMMs are dynamic models widely used to capture the transition property of events. An HMM describes a doubly stochastic process. Each HMM contains a finite number of unobservable (or hidden) states. Transitions among the states are governed by a set of probabilities called transition probabilities. An HMM defines two concurrent stochastic processes: the sequence of HMM states and a set of state output processes. Given an input sequence of observations $O = (O_1, \ldots, O_t)$, an HMM models the sequence with three parameters [43,44]:
- the state transition probability distribution $A$;
- the observation symbol probability distribution $B$;
- the initial state distribution $\pi$.
A sequence can therefore be modeled as $\lambda = (A, B, \pi)$ using these characteristic parameters.
There are three central issues in HMMs: the evaluation problem, the decoding problem and the learning problem. Given an HMM model $\lambda$ and a sequence of observations $O = (O_1, \ldots, O_t)$, the evaluation problem is to compute $P(O|\lambda)$, the probability that the model generates the observations. The decoding problem is to determine the most likely state sequence that produced the observations. The learning problem is the training process and is therefore very important: it maximizes $P(O|\lambda)$ by adjusting the parameters of the model $\lambda$. HMM learning can be carried out with the Baum-Welch (BW), or forward-backward, algorithm, an instance of the generalized Expectation-Maximization (EM) algorithm [45].

Standard HMMs have a fixed number of states, so the size of the model must be decided before training. Previous research indicates that a good choice for this application is a number of states roughly equal to the number of distinct elements in the audit data [12]. The number of states therefore depends on the data set used in the experiments. Once the HMMs are learned on the training set, normal behavior is profiled by the HMM parameters $\lambda = (A, B, \pi)$. Given a test sequence of length $S$, we move a sliding window of length $L$ along the trace and obtain $S - L + 1$ short sequences $O_i$ ($1 \le i \le S - L + 1$). The forward algorithm [45] then evaluates, under the normal model $\lambda = (A, B, \pi)$ learned as described above, the probability that a given observation sequence $O$ is generated by the model. In the experiments, we use the log-probability $\log P(O|\lambda)$ instead of $P(O|\lambda)$, because these probabilities are typically very small and the logarithm brings them to a workable numerical scale.
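The evaluation step can be illustrated with the standard forward algorithm. The Python sketch below runs it in log space and assumes a discrete HMM whose observation symbols have already been mapped to integer indices 0..M−1. The function names and the toy two-state model are hypothetical, and their parameters are not those learned in the experiments. The learning (Baum-Welch) step is not shown.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising an error."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log_prob(obs, A, B, pi):
    """Return log P(O | lambda) for a discrete HMM lambda = (A, B, pi).

    A[i][j]: probability of a transition from state i to state j
    B[i][o]: probability of emitting symbol o while in state i
    pi[i]:   probability of starting in state i
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1), stored as logarithms.
    log_alpha = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1}).
    for symbol in obs[1:]:
        log_alpha = [
            log_sum_exp([log_alpha[i] + safe_log(A[i][j]) for i in range(n)])
            + safe_log(B[j][symbol])
            for j in range(n)
        ]
    # Termination: P(O | lambda) = sum_i alpha_T(i).
    return log_sum_exp(log_alpha)


# Toy two-state model with two observation symbols (hypothetical values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
print(math.exp(forward_log_prob([0, 1], A, B, pi)))  # about 0.209
```

Keeping every quantity in log space avoids the numerical underflow that multiplying many small probabilities would otherwise cause on long sequences.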
Ideally, a well-trained HMM gives sufficiently high likelihood only to sequences that correspond to normal behavior. Sequences that correspond to abnormal behavior should give significantly lower likelihood values [14]. The HMM-based anomaly detection method in this paper relies on this property. We compare the probability of each sequence $O$ in a test trace with a predetermined threshold $\varepsilon_a$. If the probability is below the threshold, the sequence $O$ is flagged as a mismatch. We count the mismatches and define the anomaly index as the ratio of the number of mismatches to the number of all sequences in the test trace. The classification rule is thus given by the following equation:

$$\text{Anomaly index} = \frac{\text{Number of mismatches}}{\text{Number of all sequences in the test trace}} > \varepsilon_h \qquad (1)$$

If (1) holds, the trace containing the test sequences is considered a possible intrusion. Otherwise it is considered normal.
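The mismatch counting behind Eq. (1) takes only a few lines of Python. In this hypothetical sketch, `score` stands for any function that returns $\log P(O|\lambda)$ for a short sequence, for example the forward algorithm under the trained model. The toy scorer, the traces and both thresholds are illustrative assumptions, not values from the experiments. $\varepsilon_h$ is expressed as a fraction rather than a percentage.

```python
def anomaly_index(trace, window, score, eps_a):
    """Fraction of the S - L + 1 sliding-window sequences scoring below eps_a."""
    sequences = [trace[i:i + window] for i in range(len(trace) - window + 1)]
    if not sequences:
        raise ValueError("trace is shorter than the sliding window")
    mismatches = sum(1 for seq in sequences if score(seq) < eps_a)
    return mismatches / len(sequences)


def is_possible_intrusion(trace, window, score, eps_a, eps_h):
    """Classification rule (1): flag the trace when its anomaly index exceeds eps_h."""
    return anomaly_index(trace, window, score, eps_a) > eps_h


def toy_score(seq):
    """Hypothetical stand-in for log P(O | lambda): call 99 never occurs in training."""
    return -50.0 if 99 in seq else -5.0


normal_trace = [5, 3, 3, 6, 5, 3, 3, 6]
attack_trace = [5, 3, 99, 6, 5, 3, 3, 6]
print(anomaly_index(normal_trace, 3, toy_score, eps_a=-20.0))  # 0.0
print(anomaly_index(attack_trace, 3, toy_score, eps_a=-20.0))  # 0.5
```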
2.1.1. Data sets

In the experiments, we use system call data to evaluate the HMM-based intrusion detection model. We do not consider the arguments to system calls, which would supply additional information (e.g., [46,47]). To assess the model, we used two data sets for profiling program behaviors.

The first is the lpr data collected at MIT Lincoln Lab and also by Warrender et al. [12]. It can be downloaded at http://www.cs.unm.edu/~immsec, and the procedures for generating the data are also described on that website. The data set includes 2703 traces of normal data and 1001 traces of intrusion data. We used the first 600 traces of normal data and the first 300 traces of intrusion data in the experiments.

The second data set was collected on an actual system in our KLINNS lab at Xi'an Jiaotong University. We collected live ftp system call sequences on a Red Hat Linux system with kernel 2.4.7-10 over a period of about 2 weeks. The normal data are traces of system calls generated by authentic users under various conditions. The intrusion traces come from exploits of a widely known Wu-Ftpd vulnerability [48], which allows remote attackers to execute arbitrary commands on the victim host. The ftp system call data generated in the experiments include 549 normal traces and six intrusion traces.

The statistics of the system call data used in the experiments are shown in Table 1.
2.1.2. Testing results based on the HMM method

In the experiments, we group the system calls invoked by the same process into one trace and classify whether the process behavior is normal. For the lpr data, 200 traces of the normal data are randomly selected for training. The remaining 400 normal traces and the 300 intrusion traces are used for detection. For the live ftp data, 70 traces of the normal data are randomly selected for training. The remaining 479 normal traces and the six intrusion traces are used for detection. The training set of normal data contains 41 distinct system calls for lpr and 58 for live ftp. We therefore use HMMs with 41 states to profile the program behavior of lpr and HMMs with 58 states to profile that of ftp.

In our experiments, each trace of the system call data is first converted into short sequences of fixed length. To compare the testing results, we use window sizes of 3 and 6. Table 2 summarizes the average anomaly indexes of the normal and intrusion traces of the two data sets. Table 3 summarizes the False Alarm Rates (FAR) and Detection Rates (DR). For clear comparison with other methods, the CPU time required for training and detection on the lpr data is summarized in the next section.
Tables 2 and 3 lead to two observations:
(1) For both data sets, the anomaly indexes of abnormal traces are significantly higher than those of normal traces. The HMM-based method is thus an effective method for intrusion detection.
(2) Different window sizes give different anomaly indexes for each trace. As the window size increases, the anomaly index of a trace tends to increase too. This is not surprising. A larger sliding window contains more transition information between system calls, and this information is valuable for classification.

Intrusion detection performance thus depends on the window size, so practical IDSs should choose the window size carefully.
2.2. The method considering the correlation properties of audit events

Few intrusion detection methods consider the correlation properties of audit events. Recently, Oka et al. [9] proposed the Eigen Co-occurrence Matrix (ECM) method. It correlates not only connected events but also events that are not adjacent but appear within a certain distance of each other. Their method creates a so-called "co-occurrence matrix" by correlating each command in a sequence with any following commands that appear within a certain distance. User behavior is then built from the "eigen co-occurrence matrix", which is created by extracting the principal features of the co-occurrence matrices. The ECM method is a typical intrusion detection method that considers the correlation property of audit events. In this paper, we compare its testing results with those of our method.
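The co-occurrence idea can be illustrated by counting how often one command is followed by another within a fixed distance (the scope). The sketch below is a simplified, unweighted version written for illustration only. The function name and the toy command sequence are hypothetical. Oka et al.'s ECM construction may weight or normalize the counts differently before the principal features are extracted.

```python
def co_occurrence_matrix(commands, vocabulary, scope):
    """Count pairs (a, b) where b appears 1..scope positions after a.

    matrix[i][j] counts how often vocabulary[j] follows vocabulary[i]
    within the scope. Unweighted simplification, not the exact ECM recipe.
    """
    index = {cmd: i for i, cmd in enumerate(vocabulary)}
    size = len(vocabulary)
    matrix = [[0] * size for _ in range(size)]
    for pos, first in enumerate(commands):
        for offset in range(1, scope + 1):
            if pos + offset >= len(commands):
                break
            second = commands[pos + offset]
            matrix[index[first]][index[second]] += 1
    return matrix


vocabulary = ["cd", "ls", "vi"]
commands = ["ls", "cd", "ls", "vi"]
for name, row in zip(vocabulary, co_occurrence_matrix(commands, vocabulary, scope=2)):
    print(name, row)
# cd [0, 1, 1]
# ls [1, 1, 1]
# vi [0, 0, 0]
```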
3. The proposed intrusion detection method based on Principal Component Analysis

3.1. Principal Component Analysis

Principal Component Analysis (PCA, also called the Karhunen-Loève transform) is one of the most widely used dimension reduction techniques for data analysis and compression. It transforms a relatively large number of variables into a smaller number of uncorrelated variables by finding a few orthogonal linear combinations of the original variables with the largest variance.
Table 1
Descriptions of system call data in the experiments

Data set                        Number of       Number of distinct   Number of normal      Number of intrusion
                                system calls    system calls         traces (processes)    traces (processes)
MIT lpr data                    842,279         41                   600                   300
Live ftp data from KLINNS Lab   5,699,277       58                   549                   6

Table 2
The average anomaly indexes of normal and abnormal traces of the two sets of system call data

                                                Anomaly index (%)
Data set        System call sequences   Window size = 3   Window size = 6
Lpr data        Intrusion               5.9524            10.3659
                Normal                  1.7736            1.8665
Live ftp data   Intrusion               11.42             16.25
                Normal                  0.1021            0.1978

Table 3
The detection rates and false alarm rates of the two sets of system call data

Data set        Window size   $\varepsilon_a$   $\varepsilon_h$ (%)   FAR (%)   DR (%)
Lpr data        3             32                2.48                  5.5       100
                3             32                5.85                  1.1       99.7
                6             40                3.16                  2.3       100
                6             40                6.55                  0.5       99.7
Live ftp data   3             40                2.03                  0         100
                6             70                4.07                  0         100

The first principal component of the transformation is the linear combination of the original variables with the largest variance. The second principal component is the linear combination with the second largest variance that is orthogonal to the first principal component, and so on.
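In many data sets, the first few principal components account for most of the variance in the original data. The remaining components can therefore be disregarded with minimal loss of variance, which reduces the dimension of the data [45,51]. PCA has been successfully applied in many areas, such as face recognition [50], image processing, text categorization and gene expression analysis. The transformation works as follows.

Given a set of observations $x_1, x_2, \ldots, x_n$, suppose each observation is represented by a row vector of length $m$. The data set is thus represented by a matrix $X_{n\times m}$:

$$X_{n\times m} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \qquad (2)$$

The average observation is defined as

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (3)$$

The deviation of an observation from the average is defined as

$$\Phi_i = x_i - \mu \qquad (4)$$

The sample covariance matrix of the data set is defined as

$$C = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T = \frac{1}{n}\sum_{i=1}^{n}\Phi_i\Phi_i^T = \frac{1}{n}AA^T \qquad (5)$$

where $A = [\Phi_1, \Phi_2, \ldots, \Phi_n]$.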
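Eqs. (3)–(5) translate directly into code. The following dependency-free Python sketch computes the mean vector and the sample covariance matrix, treating each row of $X$ as one observation. The function names are illustrative. As in Eq. (5), it normalizes by $1/n$ rather than the unbiased $1/(n-1)$.

```python
def mean_vector(rows):
    """Eq. (3): mu = (1/n) * sum_i x_i, computed column by column."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def covariance_matrix(rows):
    """Eqs. (4)-(5): C = (1/n) * sum_i Phi_i Phi_i^T with Phi_i = x_i - mu."""
    n, m = len(rows), len(rows[0])
    mu = mean_vector(rows)
    deviations = [[x - u for x, u in zip(row, mu)] for row in rows]  # Phi_i
    return [
        [sum(phi[a] * phi[b] for phi in deviations) / n for b in range(m)]
        for a in range(m)
    ]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
print(mean_vector(X))        # [3.0, 2.0]
print(covariance_matrix(X))  # approximately [[2.667, -1.333], [-1.333, 2.667]]
```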
To apply PCA, the eigenvalues and corresponding eigenvectors of the sample covariance matrix $C$ are usually computed with the Singular Value Decomposition (SVD) [51]. Suppose $(\lambda_1, u_1), (\lambda_2, u_2), \ldots, (\lambda_m, u_m)$ are the $m$ eigenvalue-eigenvector pairs of $C$. We choose the $k$ eigenvectors with the largest eigenvalues.
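The paper obtains the eigenpairs through SVD. The sketch below instead uses a different, classical technique: the cyclic Jacobi eigenvalue method for real symmetric matrices such as $C$. This is useful when no linear algebra library is at hand. The function returns the pairs sorted by decreasing eigenvalue, so the first $k$ can be kept directly. It is a minimal, unoptimized illustration, and the function name and tolerances are assumptions. In practice, a library symmetric eigensolver or SVD routine would be used instead.

```python
import math


def jacobi_eigen(matrix, tol=1e-24, max_sweeps=100):
    """Eigenpairs of a real symmetric matrix via cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue, where
    vectors[i] is the unit eigenvector paired with values[i].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)  # squared Frobenius norm
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: mix columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A: mix rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[r][i] for r in range(n)] for i in order]
    return values, vectors


values, vectors = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
print(values)  # approximately [3.0, 1.0]
```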
Often there are just a few large eigenvalues. In that case, $k$ is the inherent dimensionality of the subspace governing the "signal", while the remaining $(m - k)$ dimensions generally contain noise [45]. With the eigenvalues sorted in decreasing order, the dimensionality $k$ of the subspace can be determined by [49]

$$\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{m}\lambda_i} \ge \alpha \qquad (6)$$

where $\alpha$ is the ratio of the variation in the subspace to the total variation in the original space. If $\alpha$ is chosen as 99.9%, the subspace spanned by the first $k$ eigenvectors loses only 0.1% of the variation in the original space.
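Eq. (6) amounts to a cumulative sum over the sorted eigenvalues, stopping at the first $k$ that reaches $\alpha$. A minimal Python sketch follows. The function name and the example spectrum are hypothetical.

```python
def choose_k(eigenvalues, alpha):
    """Smallest k whose top-k eigenvalues keep at least a fraction alpha of the variance (Eq. (6))."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    retained = 0.0
    for k, value in enumerate(values, start=1):
        retained += value
        if retained / total >= alpha:
            return k
    return len(values)  # guards against rounding when alpha is 1.0


spectrum = [5.0, 3.0, 1.5, 0.5]  # hypothetical eigenvalues
print(choose_k(spectrum, 0.90))   # 3
print(choose_k(spectrum, 0.999))  # 4
```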
We form an $m \times k$ matrix $U$ whose columns are the $k$ eigenvectors. The data are represented by their principal components, that is, they are projected onto the $k$-dimensional subspace according to the following rule [45]:

$$y_i = U^T(x_i - \mu) = U^T\Phi_i \qquad (7)$$

3.2. Intrusion detection model based on PCA

To detect intrusions across the board, we propose the typical multi-pronged IDS shown in Fig. 1.
In the multi-pronged IDS, the behaviors of a networked computer system are monitored according to the impact order of the attacks and divided into three prongs: network behavior, user behavior and system behavior. The audit data streams obtained in these three prongs include network packets, keystroke records, file system activity, command sequences and system calls. Processing them for intrusion detection usually requires various methods.

Fig. 1. A multi-pronged IDS. [Figure: a data collection system on the key host machine, which is connected to the Internet, produces three streams: a network behavioral stream (network data), a user behavioral stream (keystroke records, shell command data) and a system behavioral stream (system call data, CPU time, memory, file access). Each source feeds its own analysis module: network connection, keystroke records, command, system call, CPU time/memory and file access analysis. Together these drive data analysis, intrusion detection and the intrusion alarms report.]
In this paper, we propose a general framework for building intrusion detection models for a multi-pronged IDS. The model is tested in four experiments using three sources of data: system call data, command data and network connection data. Building the intrusion detection model involves three steps:
1. data preparation;
2. dimension reduction and feature extraction;
3. classification.
3.2.1. Data preparation

In data preparation, each source of observation data is divided into smaller data blocks using a specified scheme. For example, system call data are divided by process, network data by connection, and shell command data into consecutive blocks of fixed length. Instead of the transition information in the data, we use frequencies: the frequencies of system calls characterize program behavior, and the frequencies of shell commands characterize user behavior. For network data, we use the features extracted from each network connection to characterize network traffic behavior.

To illustrate the data preparation step in detail, we give an example for system call data. The system calls invoked by the same process are first grouped into one trace representing that process. For example, the system call sequence invoked by Process 7174 in the lpr data is shown below.

Process ID: 7174
536767513967624465596765367676536767661061051051071061051051071061050510710610510510710610510510710610510507106105105107106105105107106105105107460005333333333651595590891681688159365193624156551593205128676940113209420101518910114412986901221136382831946578571201204444433385365449469106661691094954

In this sequence, each system call is represented by a number. A separate table maps system call numbers to the actual system call names. For example, the number 5 represents the system call "open" and the number 3 represents the system call "read". Most intrusion detection methods [10–18,23,24] and the HMM method of Section 2.1 use short sequences of system calls. Instead, we use each whole trace as an observation. In the Linux/Unix environment, executing a program can generate one or more processes. Each process produces a single trace of system calls, from the beginning of its execution to the end. Treating each trace as an observation therefore allows program behavior to be profiled for anomaly detection.
For each trace, the frequencies of individual system calls are calculated. For example, the frequency of system call number 5 in process 7174 is 0.086. Each trace of system call data is thus transformed into a data vector, and a whole system call data set is represented by a matrix, as follows.

Suppose an observation data set is divided into $n$ blocks and contains a total of $m$ distinct elements (e.g., for system call data and command data) or features (e.g., for network data). The observed data can then be expressed as $n$ vectors, each containing $m$ observations. We then construct an $n \times m$ matrix $X$. Each element $X_{ij}$ is the frequency of the $j$th distinct element (e.g., for system call data and command data) or feature (e.g., for network data) in the $i$th block. The observed data set represented by the matrix $X_{n\times m}$ can be written as in Eq. (2), where the row vectors $x_1, x_2, \ldots, x_n$ represent the corresponding blocks of the original data.
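The data preparation step for system call data can be illustrated as follows. This hypothetical helper, not the authors' code, turns each trace into a row of relative frequencies over a fixed, ordered vocabulary of system call numbers. The result is the $n \times m$ matrix $X$ of Eq. (2). Passing the vocabulary explicitly, for example the distinct calls of the training set, keeps the columns aligned between training and test data. In this sketch, calls outside the vocabulary are simply skipped. The text does not specify this behavior, so it is an assumption.

```python
from collections import Counter


def frequency_matrix(traces, vocabulary=None):
    """Return (vocabulary, X) where X[i][j] is the relative frequency of
    vocabulary[j] in traces[i]. Calls not in the vocabulary are skipped."""
    if vocabulary is None:
        vocabulary = sorted({call for trace in traces for call in trace})
    column = {call: j for j, call in enumerate(vocabulary)}
    matrix = []
    for trace in traces:
        row = [0.0] * len(vocabulary)
        for call, count in Counter(trace).items():
            if call in column:
                row[column[call]] = count / len(trace)
        matrix.append(row)
    return list(vocabulary), matrix


vocab, X = frequency_matrix([[5, 3, 3, 6], [5, 5, 3, 90]])
print(vocab)  # [3, 5, 6, 90]
print(X)      # [[0.5, 0.25, 0.25, 0.0], [0.25, 0.5, 0.0, 0.25]]
```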
3.2.2. Dimension reduction and feature extraction

Given a training set of data vectors $x_1, x_2, \ldots, x_n$, the average vector $\mu$ and each mean-adjusted vector are computed by (3) and (4). The $m$ eigenvalue-eigenvector pairs $(\lambda_1, u_1), (\lambda_2, u_2), \ldots, (\lambda_m, u_m)$ of the sample covariance matrix $C$ are then calculated. The number of principal eigenvectors $u_1, u_2, \ldots, u_k$ ($k \ll m$) used to represent the distribution of the original data is usually determined by (6). Any data vector in the training set can then be approximated by a linear combination of the $k$ eigenvectors. This reduces the dimensionality of the data and extracts its features.
3.2.3. Classification

A test data vector $t$, representing a test block of data, can be projected onto the $k$-dimensional subspace by rule (7). The distance between the test data vector and its reconstruction in the subspace is simply the distance between the mean-adjusted input vector $\Phi = t - \mu$ and its reconstruction

$$\Phi_f = UU^T(t - \mu) = Uy \qquad (8)$$

where $y = U^T(t - \mu)$. Suppose the test data vector is normal, that is, very similar to the training vectors that correspond to normal behavior. Then the vector and its reconstruction will be very similar, and the distance between them will be very small [49,50]. Based on this property, normal program, user and network behaviors can all be profiled for anomaly detection [52,53].
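Eqs. (7) and (8) amount to two matrix-vector products. The following Python sketch projects a mean-adjusted test vector onto the subspace and maps it back. The names are illustrative. $U$ is stored as a list of $m$ rows with $k$ entries each and is assumed to have orthonormal columns.

```python
import math


def project(U, phi):
    """Eq. (7): y = U^T phi for an m x k matrix U with orthonormal columns."""
    m, k = len(U), len(U[0])
    return [sum(U[r][c] * phi[r] for r in range(m)) for c in range(k)]


def reconstruct(U, y):
    """Eq. (8): phi_f = U y, back in the original m-dimensional space."""
    return [sum(U[r][c] * y[c] for c in range(len(y))) for r in range(len(U))]


# Hypothetical one-dimensional subspace spanned by (1, 1, 0) / sqrt(2).
h = 1.0 / math.sqrt(2.0)
U = [[h], [h], [0.0]]
mu = [0.0, 0.0, 0.0]
t = [2.0, 0.0, 1.0]
phi = [a - b for a, b in zip(t, mu)]  # Phi = t - mu
phi_f = reconstruct(U, project(U, phi))
print(phi_f)  # approximately [1.0, 1.0, 0.0]
```

A vector that already lies in the subspace is reproduced exactly, while any component orthogonal to the subspace is lost. That lost part is what the distance measures detect.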
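In the experiments presented here, three measures quantify the distance or similarity between these two vectors, so that their testing results can be compared: the squared Euclidean distance $\varepsilon_e$, the Cosine distance $\varepsilon_c$ and the Signal-to-Noise Ratio (SNR) $\varepsilon_s$.

$$\varepsilon_e = \|\Phi - \Phi_f\|^2 \qquad (9)$$

$$\varepsilon_c = \frac{\Phi^T\Phi_f}{\|\Phi\|\,\|\Phi_f\|} \qquad (10)$$

$$\varepsilon_s = 10\log\left(\frac{\|\Phi\|^2}{\|\Phi - \Phi_f\|^2}\right) \qquad (11)$$

In anomaly detection, $\varepsilon_e$, $\varepsilon_c$ and $\varepsilon_s$ serve as anomaly indexes. The test data $t$ are classified as normal if $\varepsilon_e$ or $\varepsilon_c$ is below a predetermined threshold, or if $\varepsilon_s$ is above one. Otherwise they are treated as anomalous.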
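The three anomaly indexes of Eqs. (9)–(11) can be computed as below. This sketch rests on two assumptions. First, the logarithm in Eq. (11) is taken as base 10, the usual decibel convention. Second, the classification helper uses only $\varepsilon_e$, the measure recommended later in the experiments, and its threshold value is hypothetical. As written, Eq. (10) evaluates the cosine of the angle between $\Phi$ and $\Phi_f$.

```python
import math


def squared_euclidean(phi, phi_f):
    """Eq. (9): eps_e = ||phi - phi_f||^2."""
    return sum((a - b) ** 2 for a, b in zip(phi, phi_f))


def cosine_measure(phi, phi_f):
    """Eq. (10): eps_c = phi^T phi_f / (||phi|| ||phi_f||); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(phi, phi_f))
    return dot / (math.sqrt(sum(a * a for a in phi)) * math.sqrt(sum(b * b for b in phi_f)))


def snr_db(phi, phi_f):
    """Eq. (11) with a base-10 log: 10 log10(||phi||^2 / ||phi - phi_f||^2).

    Undefined when the reconstruction is exact (zero residual).
    """
    return 10.0 * math.log10(sum(a * a for a in phi) / squared_euclidean(phi, phi_f))


def is_normal(phi, phi_f, threshold):
    """Normal if the squared Euclidean anomaly index is below the threshold."""
    return squared_euclidean(phi, phi_f) < threshold


phi, phi_f = [2.0, 0.0, 1.0], [1.0, 1.0, 0.0]
print(squared_euclidean(phi, phi_f))         # 3.0
print(is_normal(phi, phi_f, threshold=0.5))  # False
```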
4. Experiments and testing

We tested the anomaly detection model on four data sets drawn from three data sources:
- lpr system call data from the University of New Mexico;
- ftp system call data from the KLINNS lab of Xi'an Jiaotong University;
- shell command data from the AT&T Research lab;
- network connection data from MIT Lincoln Lab.
4.1. Experiments on system call data

4.1.1. Data sets

To facilitate comparison, we use the same system call data that were used to evaluate the HMM-based intrusion detection method in Section 2.1.

4.1.2. Testing results and analysis

The model gives good testing results.
Fig. 2 shows the detection results on the lpr data with the squared Euclidean distance measure. For this test, 200 traces randomly selected from the normal data were used for training and another 700 traces for detection. Abnormal data are clearly easy to distinguish from normal data by the anomaly index. In the experiments, the ratio $\alpha$ defined in (6) is set to 99.9%. The testing results, including Detection Rates (DR) and False Alarm Rates (FAR), are summarized in Table 4 for comparison. The squared Euclidean distance measure gives the best detection results in terms of DR and FAR. This is because PCA by nature seeks the projection that best represents the original data in the least-squares sense. The Cosine distance and SNR give similar results.
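DR and FAR can be computed from per-trace anomaly indexes as follows. This minimal sketch assumes the usual definitions: DR is the fraction of intrusion traces flagged, and FAR is the fraction of normal traces flagged. It also assumes an index where larger means more anomalous, such as $\varepsilon_e$. The scores and labels are made up.

```python
def detection_and_false_alarm_rates(scores, labels, threshold):
    """Return (DR, FAR) for anomaly indexes where larger means more anomalous.

    labels[i] is True for an intrusion trace and False for a normal trace;
    a trace is flagged when its score exceeds the threshold.
    """
    attacks = [s for s, is_attack in zip(scores, labels) if is_attack]
    normals = [s for s, is_attack in zip(scores, labels) if not is_attack]
    dr = sum(s > threshold for s in attacks) / len(attacks)
    far = sum(s > threshold for s in normals) / len(normals)
    return dr, far


scores = [0.10, 0.20, 0.90, 0.80, 0.05, 0.60]
labels = [False, False, True, True, False, False]
print(detection_and_false_alarm_rates(scores, labels, threshold=0.5))  # (1.0, 0.25)
```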
To evaluate the impact of the ratio α defined in (6) on the performance of the detection model, we run the tests with 200 normal data traces for training and the squared Euclidean distance as the anomaly index. The testing results are shown in Table 5. The false alarm rate is lowest when α = 99.92%: it first decreases and then tends to increase as α increases. When α is relatively small, the reduced subspace cannot adequately represent the variance in the data set. Some valuable information in the original data may then be discarded, which leads to relatively high false alarm rates. On the other hand, when α is large and close to 100%, the reduced subspace contains noise [45,51] that reduces the effectiveness of the intrusion detection. In addition, the threshold becomes smaller and smaller as α increases, which makes detection difficult. Based on these testing results, we suggest using α = 99.9% for feature extraction and the squared Euclidean distance as the anomaly index for anomaly detection in real environments. This setting reduces the data substantially while giving good testing results. We will also verify this suggestion with network data in Section 4.3.
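The link between α and the subspace dimension k in Table 5 can be illustrated with a short helper. The helper assumes that α is the fraction of total variance carried by the k largest eigenvalues of the normal-data covariance; this is our reading of (6), which is not reproduced in this section. It returns the smallest k that reaches α. Its name is hypothetical.

```python
def smallest_k_for_ratio(eigenvalues, alpha):
    """Smallest k whose k largest eigenvalues carry at least `alpha` of the total."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    retained = 0.0
    for k, value in enumerate(values, start=1):
        retained += value
        if retained / total >= alpha:
            return k
    return len(values)
```

Section 4.3 reports variance shares for the network data: 90.079% for the first component and 99.953% for the first two. Lumping the remaining 0.047% into one illustrative value, `smallest_k_for_ratio([90.079, 9.874, 0.047], 0.999)` selects k = 2.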
The model is also effective for the system call data from our KLINNS lab. Fig. 3 shows the testing results of our model on the ftp system call data, using α = 99.9% and the squared Euclidean distance measure. For this experiment, 70 normal data traces were used for training and another 485 data traces for testing. All six intrusion traces are detected without any false alarms.
Comparing the results of the PCA method with those of the HMM method (shown in Tables 2 and 3), HMM is more accurate than PCA on the lpr data with window size 6. On the Live ftp data, however, the detection accuracy of HMM is the same as that of PCA. The HMM method detected all the intrusions with a 2.3% false alarm rate on the lpr data and without false alarms on the Live ftp data. The PCA method also detected all the intrusions in the Live ftp data, but it reached a 2.8% false alarm rate on the lpr data. These results show that focusing on the transition property of audit events can achieve better detection accuracy than focusing on the frequency property. However, using the frequency property of audit events can also yield satisfactory results in intrusion detection.
Our method is computationally efficient. During the detection stage, the squared Euclidean distance between a test vector and its reconstruction onto the subspace is used for detection. The calculation for each test block of data takes O(mk), where m is the dimension of the vector representing each block of data and k is the number of principal components used in the model.

Fig. 2. Testing results on the lpr system call data. The y-axis represents the anomaly index and the x-axis represents the system call trace number. The stars (*) in the gray shading stand for abnormal traces and the dots (•) with no shading stand for normal traces. The y-axis is expanded for readability.

W. Wang et al. / Computer Communications 31 (2008) 58–72 65
Experimental results show that after the high-dimensional data are reduced, the original data can be represented by a linear combination of only a very small number of principal components without sacrificing valuable information. In the experiments, for example, only six principal components out of 41 dimensions represent the original data with less than 0.1% loss of the total variation. The original data can therefore be reduced substantially for intrusion detection, and k is very small. Because the subspace is low dimensional and the classifier is simple, detection requires little computational effort. Moreover, system resources can be largely saved, because low-dimensional data are convenient to store and transmit.
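The O(mk) cost per test vector follows from an identity for orthonormal bases. Let P be the m×k matrix of retained principal directions, so that P^T P = I, and let d be a mean-centered test vector. Then ||d − P P^T d||² = ||d||² − 2 d^T P P^T d + d^T P P^T P P^T d = ||d||² − ||P^T d||². Only the k projection coefficients P^T d (m·k multiply-adds) and one squared norm (m more) are needed. The NumPy sketch below checks that both forms agree. It uses a random orthonormal P, and its function names are hypothetical.

```python
import numpy as np


def residual_from_coefficients(d, P):
    """||d||^2 - ||P^T d||^2: O(mk) for the k coefficients plus O(m) for ||d||^2.

    Valid only when the columns of P are orthonormal.
    """
    coefficients = P.T @ d
    return float(d @ d - coefficients @ coefficients)


def residual_from_reconstruction(d, P):
    """The same squared distance, computed from the explicit reconstruction P P^T d."""
    r = d - P @ (P.T @ d)
    return float(r @ r)


rng = np.random.default_rng(1)
m, k = 41, 6                                   # sizes echo the example in the text
P, _ = np.linalg.qr(rng.normal(size=(m, k)))   # random m x k orthonormal basis
d = rng.normal(size=m)                         # a mean-centered test vector
print(residual_from_coefficients(d, P), residual_from_reconstruction(d, P))
```

The coefficient form never builds the m-dimensional reconstruction. This is convenient when only the anomaly index, not the reconstructed vector, is needed.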
In the experiments, we compare the computational performance of our PCA model with that of the HMM method described in Section 2.1 and the tide method reported in [10]. The comparison covers the training time for building the models and the test time for detection. The experiments are conducted on a 2.4-GHz Pentium computer with 512 MB DDR memory, and the results are shown in Table 6. Training the PCA model requires only 5 s, versus up to 4632 s for training the HMM method on the same amount of data. Tide is usually regarded as an efficient method for real-time intrusion detection [12]. However, it takes about 33 s for training, more time than our PCA model needs. The detection times are consistent with the training times. The HMM method required about 949 s to process about 680,000 system calls with a window size of 3. By comparison, our PCA model requires only about 14 s for the same amount of data, versus about 356 s for the tide method. These comparative studies show that utilizing the transition property of audit events can produce good detection performance, but only at high computational expense.
Relying on the frequency property of events, on the other hand, is very suitable for real-time intrusion detection. It provides adequate detection performance at very low computational overhead.

Table 4
Detection Rates (DR) and False Alarm Rates (FAR) with different distance or similarity measures

| Squared Euclidean: e_e (×10^3) | FAR (%) | DR (%) | Cosine: e_c | FAR (%) | DR (%) | SNR: e_s | FAR (%) | DR (%) |
|---|---|---|---|---|---|---|---|---|
| 0.273 | 2.8 | 100 | 0.983 | 10.3 | 100 | 49.830 | 10.3 | 100 |
| 1.200 | 0.5 | 99.7 | 0.989 | 4.5 | 99.7 | 38.791 | 4.5 | 99.7 |

Table 5
Detection Rates (DR) and False Alarm Rates (FAR) using different dimensionality of the reduced subspace

| Dimensionality of subspace k | Rate α (%) | e_e (×10^3) | FAR (%) | DR (%) |
|---|---|---|---|---|
| 1 | 98.34 | 0.795 | 13.5 | 100 |
| | | 5.200 | 1.3 | 99.7 |
| 2 | 99.22 | 0.606 | 10.3 | 100 |
| | | 1.600 | 5.3 | 99.7 |
| 5 | 99.87 | 0.279 | 3.8 | 100 |
| | | 1.300 | 0.8 | 99.7 |
| 6 | 99.92 | 0.273 | 2.8 | 100 |
| | | 1.200 | 0.5 | 99.7 |
| 7 | 99.95 | 0.205 | 3 | 100 |
| | | 1.100 | 0.5 | 99.7 |
| 10 | 99.99 | 0.135 | 3.3 | 100 |
| | | 0.213 | 1.5 | 99.7 |
| 15 | 99.99 | 0.048 | 3.5 | 100 |
| | | 0.123 | 1 | 99.7 |
| 32 | 99.99 (a) | 6.3×10^-6 | 3.75 | 100 |
| 35 | 99.99 (a) | 2.9×10^-7 | 3.75 | 100 |
| 36 | 100 | 3.21×10^-29 | 35 | 100 |
| 41 | 100 | 3.20×10^-29 | 35 | 100 |

(a) Rate α here is very close to 100%.

Fig. 3. Testing results on the ftp system call data from our KLINNS Lab. The y-axis represents the anomaly index and the x-axis represents the system call trace number. The stars (*) in the gray shading stand for abnormal traces and the dots (•) with no shading stand for normal traces. The y-axis is expanded for readability.

Table 6
Training and detection times with system call data

| Method | Number of system calls for training | CPU time for training (s) | Number of system calls for detection | CPU time for detection (s) |
|---|---|---|---|---|
| HMM | 159,642 | 4632 | 682,637 | 949 (window size = 3); 1662 (window size = 6) |
| Tide | 159,642 | 33 | 682,637 | 356 |
| PCA | 159,642 | 5 | 682,637 | 14 |
The PCA method, which considers the frequency property of the audit events, is thus very suitable for processing massive audit data streams for real-time intrusion detection. Most current intrusion detection methods based on the transition property of events first divide sequences of system calls with a fixed-length sliding window for data preparation. Detection performance has been shown to be sensitive to the window size [54]. As the window size increases, the detection performance improves, but only at considerable computational expense. Moreover, the question of how to choose the window size has not been sufficiently addressed. The PCA method takes into account the frequency property of system calls and treats each process as an observation. It thus avoids choosing a sliding-window size, which is often done by experience. In the PCA method, each process is represented by a data vector, where each entry is the frequency of a distinct system call during the execution of the process. In this way, the anomaly intrusion detection problem is transformed into the simpler problem of classifying these vectors as normal or abnormal. The PCA method uses a simple classifier and can achieve good real-time detection performance.
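The per-process representation described above can be sketched in a few lines of Python. The sketch makes three assumptions, as do its function names. Frequencies are raw counts. The vocabulary is built from the training traces. Calls that never occur in training are simply ignored, although a real deployment might prefer to treat such calls as suspicious.

```python
from collections import Counter


def build_vocabulary(training_traces):
    """Sorted list of the distinct system calls seen in the training traces."""
    return sorted({call for trace in training_traces for call in trace})


def frequency_vector(trace, vocabulary):
    """One entry per vocabulary call: how often the process issued it."""
    counts = Counter(trace)
    return [counts[call] for call in vocabulary]
```

Each process yields one fixed-length vector, whatever the length of its trace. No window size has to be chosen.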
4.2. Experiments on shell command data

4.2.1. Data sets

The shell command data used for testing come from a UNIX server at AT&T's Shannon Research Laboratory. The testing data consist of user names and the associated command sequences (without arguments), available at http://www.schonlau.net/intrusion.html. Fifty users are included, with 15,000 consecutive commands for each user divided into 150 blocks of 100 commands. The first 50 blocks are uncontaminated and are used as training data. Masquerading command blocks, randomly drawn from users outside the 50, are inserted into the remaining 100 blocks of the 50 users' command sequences. The details of the contamination procedure can also be found on the website.

The goal of the experiments is to correctly detect masquerading blocks. Each data block of a user is transformed into a vector that contains the frequencies of the individual commands in the block. A test data vector representing a data block of a user is then used as input for anomaly detection by (8) and (9).
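The block structure of the data set described above can be sketched as follows. Each user has 15,000 commands split into 150 blocks of 100, and the first 50 blocks are used for training. The constants restate the text. The helper names, the use of raw counts as frequencies and the caller-supplied vocabulary are assumptions.

```python
from collections import Counter

BLOCK_SIZE = 100       # commands per block in the data set described above
TRAINING_BLOCKS = 50   # the first 50 blocks of each user are uncontaminated


def split_blocks(commands, block_size=BLOCK_SIZE):
    """Cut one user's command stream into consecutive, non-overlapping blocks."""
    if len(commands) % block_size:
        raise ValueError("stream length must be a multiple of the block size")
    return [commands[i:i + block_size] for i in range(0, len(commands), block_size)]


def block_vectors(blocks, vocabulary):
    """Frequency of each vocabulary command inside each block."""
    vectors = []
    for block in blocks:
        counts = Counter(block)
        vectors.append([counts[cmd] for cmd in vocabulary])
    return vectors


def split_train_test(blocks, n_train=TRAINING_BLOCKS):
    """First n_train blocks for training, the rest for testing."""
    return blocks[:n_train], blocks[n_train:]
```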
4.2.2. Testing results and analysis

We conduct the experiments on all 50 users, and the testing results for most users are promising. The testing results for User 24 are shown in Fig. 4 as an example. The simulated masquerade data are located at Blocks 69–89 (gray shading in Fig. 4), and our model easily catches them all.

To evaluate the detection and computational performance of the model on more data, we restructure the data for profiling one user's behavior in the experiments. We randomly select the data sets of two users. The first 50 data blocks of the first user are used for training. The remaining 100 data blocks of the first user are considered normal, and the 150 blocks of the second user are considered abnormal. User 5 and User 32 are selected in the experiments. We use α = 99.9% and the squared Euclidean distance as the anomaly index, and the testing results are shown in Fig. 5.

Table 7 shows the CPU times required for training and detection by five methods. These are Fully-Connected HMMs (FC-HMM), Left-to-Right HMMs (LR-HMM) and the Cross Entropy (CE) method reported in [14], the ECM method reported in [9], and our PCA method. The FC-HMM and LR-HMM methods are based on the transition property of audit events, and the ECM method is based on the correlation information of audit events. The CE method and our PCA method are based on the frequency property of audit events.
Our method is tested on a computer with a 2.4 GHz Pentium CPU and 512 MB DDR memory. The ECM method was tested on a workstation with a 3.2 GHz CPU and 4 GB memory [9], while the methods reported in [14] were tested on an UltraSPARC 30 workstation.

Fig. 4. Testing results of User 24. The y-axis represents the anomaly index and the x-axis represents the command block number. The stars (*) in the gray shading indicate simulated masquerade data and the dots (•) with no shading stand for normal data.

Fig. 5. Testing results of the combined data of User 5 and User 32. The y-axis represents the anomaly index and the x-axis represents the command block number. All the data blocks of Users 5 and 32 are uncontaminated; therefore the first 100 data blocks from User 5 are treated as normal (•) and blocks 101–250 from User 32 are considered abnormal (*, gray shading).
Fig. 5 shows that our model distinguishes 100% of the abnormal data from the normal data without any false alarms. Table 7 also shows that our method trains much faster than the FC-HMM, LR-HMM and ECM methods, while the CE method requires no training time. The detection time of our method is also shorter than that of the other four methods. This shows that exploiting the correlation information of audit events is as computationally costly as exploiting the transition property. Taking account of the frequency property of audit events, on the other hand, requires low overhead for both training and detection. It is thus suitable for processing massive data streams for real-time intrusion detection.
4.3. Experiments on network data

4.3.1. Data sets

The network data used for testing were distributed by MIT Lincoln Lab for the 1998 DARPA evaluation [42]. The data contain traffic in a simulated military network consisting of hundreds of hosts. The data include 7 weeks of training data and 2 weeks of test data, and the test data were not drawn from the same probability distribution as the training set. Because the distributions differ, our experiments use only the training set: one part of it is sampled for training and another, different part for testing. The raw training set contains about 4 GB of compressed binary tcpdump data of network traffic. Lee et al. [33,34] pre-processed it into about 5 million connection records as part of the UCI KDD archive [55]. A connection is a sequence of TCP packets starting and ending at well-defined times, between which data flow from a source IP address to a target IP address under a well-defined protocol [55]. In the 10% subset of the data, each network connection is labeled either as normal or as exactly one specific kind of attack. The subset contains 22 types of attacks in total.
These attacks fall into one of the following four categories:
DOS: denial-of-service (e.g., teardrop).
R2L: unauthorized access from a remote machine (e.g., password guessing).
U2R: unauthorized access to local superuser (root) privileges by a local unprivileged user (e.g., buffer overflow attacks).
PROBE: surveillance and other probing (e.g., port scanning).
Each connection in the network data has 41 features. Lee et al. extracted these features from the raw data. They fall into three groups: basic features of individual TCP connections, traffic features, and content features within a connection suggested by domain knowledge [33,34]. Of these 41 features, 34 are numeric and 7 are symbolic. Only the 34 numeric features were used in the experiments, so each connection in the data set is transformed into a 34-dimensional input vector for detection. The training set contains 494,021 connection records, of which 97,277 are normal and 396,744 are attacks. From the normal data, we randomly selected 7000 connections for training the normal model and 10,000 for detection. All the attack data are used for detection. The data used in the experiments are described in Table 8.
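Preparing the network vectors involves two steps: discarding the symbolic fields of each connection record, and sampling disjoint normal subsets for training and detection. The sketch below assumes comma-separated records with the label already removed. It takes the indices of the symbolic columns as an input rather than hard-coding them. The helper names and the fixed seed are hypothetical.

```python
import random


def numeric_vector(record, symbolic_columns):
    """Keep only the numeric fields of one comma-separated connection record.

    record: feature values of one connection, label already removed
    symbolic_columns: 0-based indices of the symbolic features to drop
    """
    fields = record.strip().split(",")
    return [float(value) for i, value in enumerate(fields) if i not in symbolic_columns]


def sample_normal(records, n_train, n_detect, seed=0):
    """Draw disjoint random subsets of normal records for training and detection."""
    chosen = random.Random(seed).sample(records, n_train + n_detect)
    return chosen[:n_train], chosen[n_train:]
```

With the counts given in the text, the sampling call would be `sample_normal(normal_records, 7000, 10000)`, where `normal_records` is a hypothetical list of normal records.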
Table 7
Training and detection times with shell command data

| Method | Number of shell commands for training | CPU time for training (s) | Number of shell commands for detection | CPU time for detection (s) |
|---|---|---|---|---|
| FC-HMM | 10,826 | 32,696 | 10,981 | 20 |
| LR-HMM | 10,826 | 33,532 | 10,981 | 12 |
| CE | 10,826 | 0 | 10,981 | 14 |
| ECM | 250,000 | 50,444 | 100 | 22.15 |
| PCA | 10,000 | 6 | 11,000 | 5 |

Fig. 6. ROC curves for different ratios α used in the network intrusion detection experiments.
4.3.2. Testing results and analysis

In the experiments, we use Receiver Operating Characteristic (ROC) curves to evaluate the network intrusion detection performance of our model. The ROC curve plots DR against FAR. There is a trade-off between DR and FAR, and the ROC curve is obtained by setting different thresholds on the anomaly index defined by (8).
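An ROC curve of this kind can be traced by sweeping the threshold over every distinct index value, as in the sketch below. It assumes a distance-type index, where an item is flagged when its score is at or above the threshold, and it reports percentages. The sketch favors clarity over speed, rescanning the scores for each threshold. The function name is hypothetical.

```python
def roc_points(scores, labels):
    """(FAR, DR) pairs in percent, one per distinct threshold, plus the (0, 0) origin.

    An item is flagged as anomalous when its score is >= the threshold, so
    lowering the threshold can only raise both DR and FAR.
    """
    n_attack = sum(1 for is_attack in labels if is_attack)
    n_normal = len(labels) - n_attack
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        detected = sum(1 for s, a in zip(scores, labels) if a and s >= threshold)
        false_alarms = sum(1 for s, a in zip(scores, labels) if not a and s >= threshold)
        points.append((100.0 * false_alarms / n_normal, 100.0 * detected / n_attack))
    return points
```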
To further investigate the impact of the ratio α defined in (6) on the performance of the intrusion detection model, we also use different numbers of principal components in the experiments. The testing results on the overall data are shown in Fig. 6. In these experiments, one principal component alone accounts for 90.079% of the total variation, and two principal components account for 99.953%. The ROC curves in Fig. 6 show that the testing results depend on the ratio α. The results are best when the ratio is about 99.9%, which is consistent with our earlier results on system call data in Section 4.1. Testing results on all the network data are shown in Fig. 7.
To investigate the performance of our model on different categories of attack data, we run the experiments on each category of attack data separately. Fig. 8 shows the ROC curves of our model on the four categories of attack data as well as on the overall data. Fig. 7 and the ROC curves in Fig. 8 show that our model detects most attacks with low false alarm rates. In more detail, our model detects a very high percentage of DOS and U2R attacks, while missing a small number of R2L and PROBE attacks.
To compare our method with other methods, Table 9 summarizes the DR and FAR for the four attack categories, as well as overall, alongside five other methods reported in [35–37]. We also measure the CPU times for training and detection on a computer with a 2.4 GHz Pentium CPU and 512 MB DDR memory; they are shown in Table 10.
In the experiments, we used randomly selected normal data from the training set to establish normal behavior, and used nearly all the other data in the same set for detection. The GC method [35] also used the training set for both training and detection, but with only a very small part of the data. For the Cluster, K-NN and SVM methods [36], many attack data were filtered out, so that the resulting data set consisted of 1% to 1.5% attack and 98.5–99% normal instances for unsupervised anomaly detection. The PCC method [37] used the same data set as ours for both training and detection, but also with a smaller data size. Table 9 shows that our model outperforms the first four methods (GC, Cluster, K-NN and SVM) in terms of detection rates and false alarm rates.
The DM method [33,34], the HKN method [39] and the SOM method [38] used both the normal data and the attack data of the training set to define attack signatures or build detection models, and used the test set for detection. The DM method achieved an 80.2% detection rate, and the HKN method achieved a 93.46% detection rate at a 3.99% false alarm rate. The SOM method obtained an 89% detection rate at a 4.6% false alarm rate. The detection performance of the PCC method is almost the same as that of our PCA model.
Fig. 7. Testing results of all the network data. The y-axis represents the anomaly index and the x-axis represents the network connection number. The stars (*) in the gray shading stand for attack connections and the dots (•) stand for normal connections. The y-axis is compressed for readability.

Fig. 8. ROC curves for four categories of attack data and overall data.

Table 8
Descriptions of network data in the experiments

| Data category | Total number of network connections | Number of network connections used |
|---|---|---|
| Normal | 97,277 | 7000 for training; 10,000 for testing |
| Attack: DOS | 391,458 | 391,458 |
| Attack: R2L | 1126 | 1126 |
| Attack: U2R | 52 | 52 |
| Attack: PROBE | 4107 | 4107 |

The PCC method measured the Mahalanobis distance of each observation from the center of the data, and any observation whose distance exceeds a threshold is considered an anomaly. The Mahalanobis distance is computed from the sum of squares of the standardized principal component scores. The PCC method used both the principal components and the minor components of the sample in the detection stage. Our PCA method instead reduces the high-dimensional data directly into a low-dimensional space. It uses the distance between each observation and its reconstruction in the reduced subspace for anomaly detection. Only principal components are required to form the subspace, and the detection scheme is straightforward and easy to handle. The PCC method used five principal components and 6–7 minor components in the experiments, while our PCA method used only two principal components and achieved better detection results. The PCC method assumes that the sum of squares of several standardized principal components follows a χ² distribution. Our model makes no distributional assumption about the data and can be more practical in application.
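For contrast, the index attributed above to the PCC method can be sketched as follows. It is a sum of squared standardized principal component scores. With all components included it equals the squared Mahalanobis distance, since Σ_i (v_i^T d)²/λ_i = d^T V Λ^{-1} V^T d = d^T Σ^{-1} d. The sketch does not reproduce which major and minor components PCC selects, or how its χ²-based threshold is set. The function names and the `components` argument are hypothetical.

```python
import numpy as np


def principal_axes(X_train):
    """Mean, eigenvalues (descending) and eigenvectors of the training covariance."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]


def standardized_score_sum(x, mean, eigvals, eigvecs, components):
    """Sum over the chosen components of (v_i^T (x - mean))^2 / lambda_i."""
    idx = np.asarray(components)
    scores = eigvecs[:, idx].T @ (x - mean)
    return float(np.sum(scores ** 2 / eigvals[idx]))
```

Dividing by small eigenvalues makes minor components weigh heavily. The PCA index above uses only the retained principal directions and needs no such standardization.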
In the experiments, our model is evaluated on 10,000 normal network connections and all the attack connections, so over 400,000 network connections are included. Table 10 shows that training and detection are very efficient. For example, less than 1 second is required to detect about 15,000 network connections. This shows that our model is suitable for real-time anomaly detection on network data.
5. Concluding remarks

In this paper, we present an effective anomaly intrusion detection model based on Principal Component Analysis (PCA). The model is better suited to high-speed, real-time processing of massive data streams than models that use the transition property or the correlation property. In our model, the data block for a process, command block or network connection is associated with a data vector. This vector represents the frequencies, or other extracted features, of the individual elements in the data block. Large amounts of data are thus significantly reduced. The anomaly intrusion detection problem is converted into classifying these vectors as normal or abnormal. The detection model provides a general framework for building a practical IDS in various environments. It can process many sources of audit data of large size, such as system calls, UNIX commands and network data, and is applicable to a broad range of anomaly intrusion detection tasks.
Data used in intrusion detection problems are high dimensional in nature. Our model applies PCA to reduce the dimensionality of the data. The anomaly index of a data block is a single number: the Euclidean distance between the data vector and its reconstruction in the reduced subspace. As a result, normal behavior is easily profiled and anomaly detection is easily implemented without any additional classifier. The model can therefore process large amounts of audit data in real time with low overhead, and is suitable for real-time intrusion detection.
A hacker could escape detection by never letting the process terminate. However, the model can still be made effective for real-time anomaly detection. An attack usually produces one or more programs, and each program produces one or more processes. If one process is detected as anomalous, the program containing that process is classified as anomalous and an intrusion alarm is reported. In addition, one can specify a maximum number of system calls per process, for example 1500 [25]. Detection then uses only that limited length of the system call sequence, without waiting for the process to end.
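The two safeguards above can be sketched as follows. The first caps the number of system calls examined per process. The second raises a program-level alarm when any of the program's processes is anomalous. The cap of 1500 is the example length from the text. `is_process_anomalous` stands in for the thresholded PCA index, and all names are hypothetical.

```python
from collections import Counter
from itertools import islice

MAX_CALLS = 1500  # example cap on system calls examined per process (from the text)


def capped_frequency_vector(call_stream, vocabulary, max_calls=MAX_CALLS):
    """Count calls from a possibly unbounded iterable, consuming at most max_calls items."""
    counts = Counter(islice(call_stream, max_calls))
    return [counts[call] for call in vocabulary]


def program_is_anomalous(process_vectors, is_process_anomalous):
    """Flag the program as soon as any one of its processes is judged anomalous."""
    return any(is_process_anomalous(v) for v in process_vectors)
```

Because `islice` stops after `max_calls` items, the vector can be scored even for a process that never terminates.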
Models based on the frequency property of system behavior, such as PCA, also have disadvantages. Suppose a hostile program or an unauthentic user generates system calls or commands whose frequencies are very similar to those of normal programs or authentic users, even though the sequences are quite different. In that case PCA can hardly detect the anomalies. Methods based on transition or correlation analysis may detect such anomalies without difficulty.

Table 9
The Detection Rates (DR) and False Alarm Rates (FAR) in comparison with other methods

| Method | Overall DR (%) | Overall FAR (%) | DoS DR (%) | DoS FAR (%) | R2L DR (%) | R2L FAR (%) | U2R DR (%) | U2R FAR (%) | Probe DR (%) | Probe FAR (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| GC (Liu) [35] | 59.4 | 0.4 | 56 | – | 66 | – | 78 | – | 44 | – |
| Cluster (Eskin) [36] | 93 | 10 | | | | | | | | |
| K-NN (Eskin) [36] | 91 | 8 | | | | | | | | |
| SVM (Eskin) [36] | 98 | 10 | | | | | | | | |
| PCC (Shyu) [37] | 97.89 | 0.92 | | | | | | | | |
| PCA | 98.8 | 0.4 | 99.2 | 0.2 | 94.5 | 4 | 88.5 | 0.6 | 80.7 | 4 |

Table 10
Training and detection times with network data. Training uses normal data only; each testing category of attack data also includes 10,000 normal network connections for anomaly detection.
Number of network connections: 7000 (training, normal data); 401,458 (DOS attack); 11,126 (R2L attack); 10,052 (U2R attack); 14,107 (PROBE attack)
CPU time (s): 36658 (training, normal data, and DOS attack); 0.33 (R2L attack); 0.25 (U2R attack); 0.45 (PROBE attack)
Four data sets are used to validate the model: the system call data from UNM and our KLINNS lab, the shell command data from AT&T Research lab, and the network data from MIT Lincoln Lab. Extensive experiments are conducted to test our model and to compare it with many other methods. The testing results show that the model is promising in terms of detection accuracy, computational efficiency and implementation for real-time intrusion detection. In further work, we are investigating ways to combine the frequency properties with the transition properties and correlation information of system and network behavior, in order to achieve lower false alarm rates and higher detection rates.
Acknowledgements

We thank Dr. Weixiang Liu, Graduate School at Shenzhen, Tsinghua University, for fruitful suggestions and comments. We thank Ms. Mizuki Oka, Department of Computer Science, University of Tsukuba, Japan, for valuable discussions. The research presented in this paper was supported in part by the NSFC (60736027, 60574087), the 863 High Tech Development Plan (2007AA01Z475, 2007AA04Z154, 2007AA01Z480, 2007AA01Z464) and the 111 International Collaboration Program of China.
References

[1] D.E. Denning, An intrusion-detection model, IEEE Transactions on Software Engineering 13 (2) (1987) 222–232.
[2] S.E. Smaha, Haystack: An intrusion detection system, in: Proceedings of the IEEE Fourth Aerospace Computer Security Applications Conference, 1988.
[3] T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, P. Neumann, H. Javitz, A. Valdes, T. Garvey, A real-time intrusion detection expert system (IDES) – final technical report, Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, February 1992.
[4] D. Anderson, T. Frivold, A. Valdes, Next-generation intrusion detection expert system (NIDES): a summary, Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, Menlo Park, California, May 1995.
[5] M. Schonlau, M. Theus, Detecting masquerades in intrusion detection based on unpopular commands, Information Processing Letters 76 (2000) 33–38.
[6] M. Schonlau, W. Dumouchel, W.-H. Ju, A.F. Karr, M. Theus, Y. Vardi, Computer intrusion: detecting masquerades, Statistical Science 16 (1) (2001) 58–74.
[7] R.A. Maxion, T.N. Townsend, Masquerade detection using truncated command lines, in: Proceedings of the International Conference on Dependable Systems and Networks (DSN'02), IEEE Computer Society Press, Washington, D.C., Los Alamitos, California, 2002, pp. 219–228.
[8] T. Lane, C.E. Brodley, Temporal sequence learning and data reduction for anomaly detection, in: Proceedings of the Fifth ACM Conference on Computer and Communications Security, 1998.
[9] M. Oka, Y. Oyama, H. Abe, K. Kato, Anomaly detection using layered networks based on eigen co-occurrence matrix, in: Proceedings of the Seventh International Symposium on Recent Advances in Intrusion Detection (RAID'2004), Springer, LNCS-3224, 2004, pp. 223–237.
[10] S. Forrest, S.A. Hofmeyr, A. Somayaji, T.A. Longstaff, A sense of self for Unix processes, in: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, Los Alamos, CA, 1996, pp. 120–128.
[11] W. Lee, S. Stolfo, Data mining approaches for intrusion detection, in: Proceedings of the Seventh USENIX Security Symposium, Usenix Association, 1998, pp. 79–94.
[12] C. Warrender, S. Forrest, B. Pearlmutter, Detecting intrusions using system calls: alternative data models, in: Proceedings of the 1999 IEEE Symposium on Security and Privacy, 1999, pp. 133–145.
[13] Q. Yan, W. Xie, B. Yan, G. Song, An anomaly intrusion detection method based on HMM, Electronics Letters 38 (13) (2002) 663–664.
[14] D.Y. Yeung, Y. Ding, Host-based intrusion detection using dynamic and static behavioral models, Pattern Recognition 36 (1) (2003) 229–243.
[15] S.B. Cho, H.J. Park, Efficient anomaly detection by modeling privilege flows using hidden Markov model, Computers and Security 22 (1) (2003) 5–55.
[16] W. Wang, X. Guan, X. Zhang, Modeling program behaviors by hidden Markov models for intrusion detection, in: Proceedings of the Third International Conference on Machine Learning and Cybernetics (ICMLC'2004), 2004, pp. 2830–2835.
[17] A. Wespi, M. Dacier, H. Debar, Intrusion detection using variable-length audit trail patterns, in: Proceedings of the Third International Workshop on the Recent Advances in Intrusion Detection (RAID'2000), LNCS-1907, 2000.
[18] M. Asaka, T. Onabuta, T. Inoue, S. Okazawa, S. Goto, A new intrusion detection method based on discriminant analysis, IEICE Transactions on Information and Systems E84D (5) (2001) 570–577.
[19] W. Lee, D. Xiang, Information-theoretic measures for anomaly detection, in: Proceedings of the 2001 IEEE Symposium on Security and Privacy, Oakland, CA, May 2001.
[20] Y.H. Liao, V.R. Vemuri, Use of k-nearest neighbor classifier for intrusion detection, Computers and Security 21 (5) (2002) 439–448.
[21] W. Hu, Y. Liao, V.R. Vemuri, Robust support vector machines for anomaly detection in computer security, in: Proceedings of the 2003 International Conference on Machine Learning and Applications (ICMLA'03), Los Angeles, California, 2003.
[22] S.B. Cho, Incorporating soft computing techniques into a probabilistic intrusion detection system, IEEE Transactions on Systems, Man, and Cybernetics – Part C 32 (2) (2002) 154–160.
[23] Z. Cai, X. Guan, P. Shao, Q. Peng, G. Sun, A rough set theory based method for anomaly intrusion detection in computer networks, Expert Systems 18 (5) (2003) 251–259.
[24] L. Feng, X. Guan, S. Guo, Y. Gao, P. Liu, Predicting the intrusion intentions by observing system call sequences, Computers and Security 23 (5) (2004) 241–252.
[25] W. Wang, X. Guan, X. Zhang, Profiling program and user behaviors for anomaly intrusion detection based on non-negative matrix factorization, in: Proceedings of the 43rd IEEE Conference on Decision and Control (CDC'2004), Atlantis, Paradise Island, Bahamas, 2004, pp. 99–104.
[26] N. Ye, Y. Zhang, C.M. Borror, Robustness of the Markov chain model for cyber attack detection, IEEE Transactions on Reliability 53 (1) (2004) 116–121.
[27] W.-H. Ju, Y. Vardi, A hybrid high-order Markov chain model for computer intrusion detection, Journal of Computational and Graphical Statistics 10 (2) (2001) 277–295.
[28] N. Ye, Q. Chen, Computer intrusion detection through EWMA for autocorrelated and uncorrelated data, IEEE Transactions on Reliability 52 (1) (2003) 73–82.
[29] N. Ye, X. Li, Q. Chen, S.M. Emran, M. Xu, Probabilistic techniques for intrusion detection based on computer audit data, IEEE Transactions on Systems, Man, and Cybernetics – Part A 31 (4) (2001) 266–274.
[30] N. Ye, Q. Chen, An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems, Quality and Reliability Engineering International 17 (2) (2001) 105–112.
[31] A.K. Ghosh, A. Schwartzbard, M. Schatz, Learning program behavior profiles for intrusion detection, in: Proceedings of the First USENIX Workshop on Intrusion Detection and Network Monitoring, 1999, pp. 51–62.
[32] P.A. Porras, P.G. Neumann, EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances, in: Proceedings of the National Information Systems Security Conference, Baltimore, MD, 1997.
[33] W. Lee, S. Stolfo, K. Mok, A data mining framework for adaptive intrusion detection, in: Proceedings of the 1999 IEEE Symposium on Security and Privacy, Los Alamos, CA, 1999, pp. 120–132.
[34] W. Lee, S. Stolfo, A framework for constructing features and models for intrusion detection systems, ACM Transactions on Information and System Security 3 (4) (2000) 227–261.
[35] Y. Liu, K. Chen, X. Liao, et al., A genetic clustering method for intrusion detection, Pattern Recognition 37 (5) (2004) 927–942.
[36] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, S. Stolfo, A geometric framework for unsupervised anomaly detection, in: Applications of Data Mining in Computer Security, Kluwer Academics, Dordrecht, 2002.
[37] M. Shyu, S. Chen, K. Sarinnapakorn, L. Chang, A novel anomaly detection scheme based on principal component classifier, in: Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM'2003), 2003, pp. 172–179.
[38] H. Kayacik, A. Zincir-Heywood, M. Heywood, On the capability of an SOM based intrusion detection system, in: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'2003), 2003, pp. 1808–1813.
[39] S.T. Sarasamma, Q.A. Zhu, J. Huff, Hierarchical Kohonenen net for anomaly detection in network security, IEEE Transactions on Systems, Man and Cybernetics, Part B 35 (2) (2005) 302–312.
[40] S.C. Lee, D.V. Heinbuch, Training a neural-network based intrusion detector, IEEE Transactions on Systems, Man and Cybernetics 31 (4) (2001) 294–299.
[41] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detection in computer networks, Pattern Recognition Letters 24 (5) (2003) 1795–1803.
[42] MIT Lincoln Laboratory – DARPA Intrusion Detection Evaluation Documentation, 1999.
[43] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989).
[44] L.R. Rabiner, B.H. Juang, An introduction to hidden Markov models, IEEE ASSP Magazine (1986).
[45] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., China Machine Press, Beijing, Feb. 2004.
[46] C. Kruegel, D. Mutz, F. Valeur, G. Vigna, On the detection of anomalous system call arguments, in: Eighth European Symposium on Research in Computer Security (ESORICS'2003), LNCS, Norway, 2003, pp. 101–118.
[47] D. Mutz, F. Valeur, C. Kruegel, G. Vigna, Anomalous system call detection, ACM Transactions on Information and System Security 9 (1) (2006) 61–93.
[48] CERT Advisory CA-2001-07 File Globbing Vulnerabilities in Various FTP Servers, 2001.
[49] I.T. Jolliffe, Principal Component Analysis, second ed., Springer-Verlag, NY, 2002.
[50] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.
[51] G.H. Golub, C.F. van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1996.
[52] W. Wang, X. Guan, X. Zhang, A novel intrusion detection method based on principal component analysis in computer security, in: Advances in Neural Networks – ISNN 2004, International IEEE Symposium on Neural Networks, Dalian, China, LNCS-3174, August 2004, pp. 657–662.
[53] W. Wang, R. Battiti, Identifying intrusions in computer networks with principal component analysis, in: Proceedings of the First International Conference on Availability, Reliability and Security (ARES 2006), IEEE Press Society, Vienna, Austria, April 2006, pp. 270–277.
[54] K.M.C. Tan, R.A. Maxion, "Why 6?" Defining the operational limits of stide, an anomaly-based intrusion detector, in: Proceedings of the 2002 IEEE Symposium on Security and Privacy, 2002, pp. 188–201.
[55] KDD Cup 1999 Data, 1999.
Wei Wang (wei.wang.email@gmail.com) received his B.S. degree in process equipment and control engineering and his M.S. degree in mechanical and electronic engineering from Xi'an Shiyou University, Xi'an, China, in 1997 and 2000, respectively, and his Ph.D. degree in Control Science and Engineering from Xi'an Jiaotong University, Xi'an, China, in 2005. He was a research fellow from July 2005 to February 2006 and a postdoctoral research fellow from February 2006 to July 2006 in the Department of Information and Communication, University of Trento, Italy. He is a postdoctoral research fellow in the RSM department at GET-ENST Bretagne-Campus Rennes, France, in 2007, and will join INRIA (French National Institute for Research in Computer Science and Control), France, in 2008. His research interests currently focus on computer networked systems and computer network security.

Xiaohong Guan (xhguan@tsinghua.edu.cn) received his B.S. and M.S. degrees in Control Engineering from Tsinghua University, Beijing, China, in 1982 and 1985, respectively, and his Ph.D. degree in Electrical Engineering from the University of Connecticut in 1993. He was a senior consulting engineer with PG&E from 1993 to 1995. From 1985 to 1988 and since 1995 he has been with the Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China, and he is currently the Cheung Kong Professor of Systems Engineering and Director of the National Lab for Manufacturing Systems. He is also the Chair of the Department of Automation and Director of the Center for Intelligent and Networked Systems, Tsinghua University, China. He visited the Division of Engineering and Applied Science, Harvard University, from January 1999 to February 2000. He is an IEEE Fellow. His research interests include computer network security, wireless sensor networks, and economics and security of complex networked systems.

Xiangliang Zhang (xlzhang@lri.fr) received her B.S. degree in Information and Communication Engineering and her M.S. degree in Electronic Engineering from Xi'an Jiaotong University, Xi'an, China, in 2003 and 2006, respectively. She was an intern in the Department of Information and Communication, University of Trento, Italy, from February 2006 to May 2006. She is currently a Ph.D. student at the Laboratoire de Recherche en Informatique, a joint unit of the French National Institute for Research in Computer Science and Control (INRIA), the National Center for Scientific Research (CNRS) and the University of Paris-Sud 11, France. Her research interests include network security, machine learning, data mining and their applications, e.g., computer security, complex system modeling and grid management.