Chapter 21

DATA CORPORA FOR DIGITAL FORENSICS EDUCATION AND RESEARCH

York Yannikos, Lukas Graner, Martin Steinebach, and Christian Winter

Abstract

Data corpora are very important for digital forensics education and research. Several corpora are available to academia; these range from small manually-created data sets of a few megabytes to many terabytes of real-world data. However, different corpora are suited to different forensic tasks. For example, real data corpora are often desirable for testing forensic tool properties such as effectiveness and efficiency, but these corpora typically lack the ground truth that is vital to performing proper evaluations. Synthetic data corpora can support tool development and testing, but only if the methodologies for generating the corpora guarantee data with realistic properties.

This paper presents an overview of the available digital forensic corpora and discusses the problems that may arise when working with specific corpora. The paper also describes a framework for generating synthetic corpora for education and research when suitable real-world data is not available.

Keywords: Forensic data corpora, synthetic disk images, model-based simulation

1. Introduction

A digital forensic investigator must have a broad knowledge of forensic methodologies and experience with a wide range of tools. This includes multi-purpose forensic suites with advanced functionality and good usability as well as small tools for special tasks that may have moderate to low usability. Gaining expert-level skills in the operation of forensic tools requires a substantial amount of time. Additionally, advances in analysis methods, tools and technologies require continuous learning to maintain currency.
G. Peterson and S. Shenoi (Eds.): Advances in Digital Forensics X, IFIP AICT 433, pp. 309–325, 2014. © IFIP International Federation for Information Processing 2014

In digital forensics education, it is important to provide insights into specific technologies and how forensic methods must be applied to perform thorough and sound analyses. It is also very important to provide a rich learning environment where students can use forensic tools to rigorously analyze suitable test data.

The same is true in digital forensics research. New methodologies and new tools have to be tested against well-known data corpora. This provides a basis for comparing methodologies and tools so that the advantages and shortcomings can be identified. Forensic investigators can use the results of such evaluations to make informed decisions about the methodologies and tools that should be used for specific tasks. This helps increase the efficiency and the quality of forensic examinations while allowing objective evaluations by third parties.

The paper provides an overview of several real-world and synthetic data corpora that are available for digital forensics education and research. Also, it highlights the potential risks and problems encountered when using data corpora, along with the capabilities of existing tools that allow the generation of synthetic data corpora when real-world data is not available. Additionally, the paper describes a custom framework for synthetic data generation and evaluates the performance of the framework.
2. Available Data Corpora

Several data corpora have been made available for public use. While some of the corpora are useful for digital forensics education and research, others are suited to very specific areas such as network forensics and forensic linguistics. This section presents an overview of the most relevant corpora.
2.1 Real Data Corpus

A few real-world data corpora are available to support digital forensics education and research. Garfinkel, et al. [7] have created the Real Data Corpus from used hard disks that were purchased from around the world. In a later work, Garfinkel [5] described the challenges and lessons learned while handling the Real Data Corpus, which by then had grown to more than 30 terabytes. As of September 2013, the Real Data Corpus incorporated 1,289 hard disk images, 643 flash memory images and 98 optical discs. However, because this corpus was partly funded by the U.S. Government, access to the corpus requires the approval of an institutional review board in accordance with U.S. legislation. Additional information about the corpus and its access requirements is available at [6].

A smaller corpus, which includes specific scenarios created for educational purposes [25], can be downloaded without any restrictions. This smaller corpus contains:

- Three test disk images created especially for educational and testing purposes (e.g., file system analysis, file carving and handling encodings).
- Four realistic disk image sets created from USB memory sticks, a digital camera and a Windows XP computer.
- A set of almost 1,000,000 files, including 109,282 JPEG files.
- Five phone images from four different cell phone models.
- Mixed data corresponding to three fictional scenarios for educational purposes, including multiple network packet dumps and disk images.

Due to the variety of data it contains, the Real Data Corpus is a valuable resource for educators and researchers in the areas of multimedia forensics, mobile phone forensics and network forensics. To our knowledge, it is the largest publicly-available corpus in the area of digital forensics.
2.2 DARPA Intrusion Detection Data Sets

In 1998 and 1999, researchers at MIT Lincoln Laboratory [12, 13] created a simulation network in order to produce network traffic and audit logs for evaluating intrusion detection systems. The simulated infrastructure was attacked using well-known techniques as well as new techniques that were specially developed for the evaluation. In 2000, additional experiments were performed involving specific scenarios, including two DDoS attacks and an attack on a Windows NT system. The data sets for all three experiments are available at [11]; they include network traffic data in tcpdump format, audit logs and file system snapshots.

The methodologies employed in the 1998 and 1999 evaluations were criticized by McHugh [16]. McHugh states that the evaluation results miss important details and that portions of the evaluation procedures are unclear or inappropriate. Additionally, Garfinkel [4] points out that the data sets do not represent real-world traffic because they lack complexity and heterogeneity. Therefore, this corpus has limited use in network forensics research.
2.3 MemCorp Corpus

The MemCorp Corpus [22] contains memory images created from several virtual and physical machines. In particular, the corpus contains images extracted from 87 computer systems running various versions of Microsoft Windows; the images were extracted using common memory imaging tools. The corpus includes the following images:

- 53 system memory images created from virtual machines.
- 23 system memory images created from physical machines with factory default configurations (i.e., with no additional software installed).
- 11 system memory images created from machines under specific scenarios (e.g., after malware was installed).

This corpus supports education and training efforts focused on memory analysis using tools such as the Volatility Framework [23]. However, as noted by the corpus creator [22], the corpus does not contain images created from real-world systems or images from operating systems other than Microsoft Windows, which reduces its applicability. The creator of the MemCorp Corpus provides access to the images upon request.
2.4 MORPH Corpus

Several corpora have been created in the area of face recognition [8]. Since a large corpus with facial images tagged with age information would be very useful for multimedia forensics, we have picked a sample corpus that could be a valuable resource for research (e.g., for detecting illegal multimedia content such as child pornography).

The MORPH Corpus [20] comprises 55,000 unique facial images of more than 13,000 individuals. The ages of the individuals range from 16 to 77 with a median age of 33. Four images on average were taken of each individual with an average time of 164 days between each image.

Facial images annotated with age information are useful for developing automated age detection systems. Currently, no reliable methods (i.e., with low error rates) exist for age identification. Steinebach, et al. [21] have employed face recognition techniques to identify known illegal multimedia content, but they did not consider age classification.
2.5 Enron Corpus

The Enron Corpus introduced in 2004 is a well-known corpus in the area of forensic linguistics [9]. In its raw form, the corpus contains 619,446 email messages from 158 executives of Enron Corporation; the email messages were seized during the investigation of the 2001 Enron scandal. After data cleansing, the corpus contains 200,399 messages. The Enron Corpus is one of the most referenced mass collections of real-world email data that is publicly available.

The corpus provides a valuable basis for research on email classification, an important area in forensic linguistics. Klimt and Yang [10] suggest using thread membership detection for email classification and provide the results of baseline experiments conducted with the Enron Corpus. Data sets from the Enron Corpus are available at [3].
2.6 Global Intelligence Files

In February 2012, WikiLeaks started publishing the Global Intelligence Files, a large corpus of email messages gathered from the intelligence company Stratfor. WikiLeaks claims to possess more than 5,000,000 email messages dated between July 2004 and December 2011. As of September 2013, almost 3,000,000 of these messages were available for download by the public [24]. WikiLeaks continues to release new email messages from the corpus on an almost daily basis.

Like the Enron Corpus, the Global Intelligence Files would provide a valuable basis for research in forensic linguistics. However, we are not aware of any significant research conducted using the Global Intelligence Files.
2.7 Computer Forensic Reference Data Sets

The Computer Forensic Reference Data Sets maintained by NIST [19] form a small data corpus created for training and testing purposes. The data sets include test cases for file carving, system memory analysis and string search using different encodings. The corpus contains the following data:

- One hacking case scenario.
- Two images for Unicode string searches.
- Four images for file system analysis.
- One image for mobile device analysis.
- One image for system memory analysis.
- Two images for verifying the results of forensic imaging tools.

This corpus provides a small but valuable reference set for tool developers. It is also suitable for training in forensic analysis methods.
3. Pitfalls of Data Corpora

Forensic corpora are very useful for education and research, but they have certain pitfalls.

- Solution Specificity: While a corpus is very valuable when developing methodologies and tools that solve research problems in digital forensics, it is difficult to find general solutions that are not somehow tailored to the corpus. Even when a solution is intended to work in general (with different corpora and in the real world), research and development efforts often slowly adapt the solution to the corpus over time, probably without the researchers even noticing. For example, the Enron Corpus is widely used by the forensic linguistics community as a single basis for research on email classification. It would be difficult to show that the research results based on this corpus apply to general email classification problems.

  This could also become an issue if, for instance, a general methodology or tool that solves a specific problem already exists, and another research group is working to enhance the solution. Using only one corpus during development increases the risk of crafting a solution that may be more effective and efficient than previous solutions, but only when used with that specific corpus.

- Legal Issues: The data in corpora such as Garfinkel's Real Data Corpus, which was created from used hard disks bought on the secondary market, may be subject to intellectual property and personal privacy laws. Even if the country that hosts the real-world corpus allows its use for research, legal restrictions could be imposed by a second country in which the research that uses the corpus is being conducted. The worst case is when local laws completely prohibit the use of the corpus.

- Relevance: Data corpora are often created as snapshots of specific scenarios or environments. The data contained in corpora often loses its relevance as it ages. For example, network traffic from the 1990s is quite different from current network traffic, a fact that was pointed out for the DARPA Intrusion Detection Data Sets [4, 16]. Another example is a data corpus containing data extracted from mobile phones. Such a corpus must be updated very frequently with data from the latest devices if it is to be useful for mobile phone forensics.

- Transferability: Many data corpora are created or taken from specific local environments. The email messages in the Enron Corpus are in English. While this corpus is valuable to forensic linguists in English-speaking countries, its value to researchers focused on other languages is debatable. Indeed, many important properties that are relevant to English and used for email classification may not be applicable to Arabic or Mandarin Chinese.

  Likewise, corpora developed for testing forensic tools that analyze specific applications (e.g., instant messaging software and chat clients) may not be useful in other countries because of differences in jargon and communication patterns. Also, a corpus that mostly includes Facebook posts and IRC logs may not be of much value in a country where these services are not popular.

[Figure 1. Generating synthetic data based on a real-world scenario. Diagram stages: Purpose, Scenario, Model, Simulation, Synthetic Data.]
4. Synthetic Data Corpus Generation

Aside from methodologies for creating synthetic data corpora by manually reproducing real-world actions, little research has been done related to tool-supported synthetic data corpus generation. Moch and Freiling [17] have developed Forensig2, a tool that generates synthetic disk images using virtual machines. While the process for generating disk images has to be programmed in advance, the tool allows randomness to be introduced in order to create similar, but not identical, disk images. In a more recent work, Moch and Freiling [18] present the results of an evaluation of Forensig2 applied to student education scenarios.

A methodology for generating a synthetic data corpus for forensic accounting is proposed in [14] and evaluated in [15]. The authors demonstrate how to generate synthetic data containing fraudulent activities from smaller collections of real-world data. The data is then used for training and testing a fraud detection system.
5. Corpus Generation Process

This section describes the process for generating a synthetic data corpus using the model-based framework presented in [27].

Figure 1 presents the synthetic data generation process. The first step in generating a synthetic data corpus is to define the data use cases. For example, in a digital forensics class, where students will be tested on their knowledge about hard disk analysis, one or more suitable disk images would be required for each student. The students would have to search the disk images for traces of malware or recover multimedia data fragments using tools such as Foremost [1] and Sleuth Kit [2].

The disk images could be created in a reasonable amount of time manually or via scripting. However, if every student should receive different disk images for analysis, then significant effort may have to be expended to insert variations in the images. Also, if different tasks are assigned to different students (e.g., one student should recover JPEG files and another student should search for traces of a rootkit), more significant variations would have to be incorporated in the disk images.

The second step in the corpus generation process is to specify a real-world scenario in which the required kind of data is typically created. One example is a computer that is used by multiple individuals, who typically install and remove software, and download, copy, delete and overwrite files.

The third step is to create a model to match this scenario and serve as the basis of a simulation, which is the last step. A Markov chain consisting of states and state transitions can be created to model user behavior. The states correspond to the actions performed by the users and the transitions specify the actions that can be performed after the preceding actions.
5.1 Scenario Modeling using Markov Chains

Finite discrete-time Markov chains as described in [26] are used for synthetic data generation. One Markov chain is created for each type of subject whose actions are to be simulated. A subject corresponds to a user who performs actions on a hard disk such as software installations and file deletions. The states in the Markov chain correspond to the actions performed by the subject in the scenario.

In order to construct a suitable model, it is necessary to first define all the actions (states) that cause data to be created and deleted. The transitions between actions are then defined. Following this, the probability of each action is specified (state probability) along with the probability of each transition between two actions (transition probability); the probabilities are used during the Markov chain simulation to generate realistic data. The computation of feasible transition probabilities given state probabilities can involve some effort, but the process has been simplified in [28].

Next, the number of subjects who perform the actions is specified (e.g., number of individuals who share the computer). Finally, the details of each possible action are specified (e.g., what exactly happens during a download file action or a delete file action).
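
One way to make "feasible" concrete is sketched below. It assumes that the state probabilities are meant as the stationary distribution of the chain, which is an interpretation rather than something stated in this section; the procedure the authors actually use is the one described in [28]. Here $\pi_i$ stands for the state probability of action $i$, $p_{ij}$ for the probability of a transition from action $i$ to action $j$, and $n$ for the number of actions.

```latex
% Conditions on a transition matrix P = (p_{ij}) for given state probabilities \pi
\begin{align*}
  p_{ij} &\ge 0                          && \text{for all } i, j, \\
  \sum_{j=1}^{n} p_{ij} &= 1             && \text{for all } i, \\
  \sum_{i=1}^{n} \pi_i \, p_{ij} &= \pi_j && \text{for all } j.
\end{align*}
% A solution that always exists: choose p_{ij} = \pi_j for every i, because
% \sum_i \pi_i \pi_j = \pi_j \sum_i \pi_i = \pi_j.
```

Under this reading, a matrix whose rows all equal the state probability vector is always feasible, although it makes the next action independent of the previous one.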
5.2 Model-Based Simulation

Having constructed a model of the desired real-world scenario, it is necessary to conduct a simulation based on the model. The number of actions to be performed by each user is specified and the simulation is then started. At the end of the simulation, the disk image contains synthetic data corresponding to the modeled real-world scenario.
5.3 Sample Scenario and Model

To demonstrate the synthetic data generation process, we consider a sample scenario. The purpose for generating the synthetic data is to test how different file carvers deal with fragmented data. The real-world scenario involves an individual who uses a USB memory stick to transfer large numbers of files, mainly photographs, between computers.

In the following, we define all the components in a model that would facilitate the creation of a synthetic disk image of a USB memory stick containing a large number of files, deleted files and file fragments. The resulting disk image would be used to test the ability of file carvers to reconstruct fragmented data.

- States: In the sample model, the following four actions are defined as Markov chain states:

  1. Add Document File: This action adds a document file (e.g., PDF or DOC) to the file system of the synthetic disk image. It is equivalent to copying a file from one hard disk to another using the Linux cp command.
  2. Add Image File: This action adds an image file (e.g., JPEG, PNG or GIF) to the file system. Again, it is equivalent to using the Linux cp command.
  3. Write Fragmented Data: This action takes a random image file, cuts it into multiple fragments and writes the fragments to the disk image, ignoring the file system. It is equivalent to using the Linux dd command for each file fragment.
  4. Delete File: This action removes a random file from the file system. It is equivalent to using the Linux rm command.

- Transitions: Next, the transitions between the actions are defined. Since the transitions are not really important in the scenario, the Markov chain is simply constructed as a complete digraph (Figure 2). The state numbers in the Markov chain correspond to the state numbers specified above.

[Figure 2. Markov chain used to generate a synthetic disk image (states 1 to 4).]
- State Probabilities: Next, the probability $\pi_i$ of each action (state) $i$ to be performed during a Markov chain simulation is specified. We chose the following probabilities for the actions to ensure that a large number of files and file fragments are added to the synthetic disk image and only a maximum of about half of the added files are deleted:

  \[ \pi = (\pi_1, \ldots, \pi_4) = (0.2, 0.2, 0.4, 0.2). \]

- State Transition Probabilities: Finally, the feasible probabilities for the transitions between the actions are computed. The framework is designed to compute the transition probabilities automatically. One possible result is the simple set of transition probabilities specified in the matrix:

  \[
  P = \begin{pmatrix}
  0.2 & 0.2 & 0.4 & 0.2 \\
  0.2 & 0.2 & 0.4 & 0.2 \\
  0.2 & 0.2 & 0.4 & 0.2 \\
  0.2 & 0.2 & 0.4 & 0.2
  \end{pmatrix}
  \]

  where $p_{ij}$ denotes the probability of a transition from action $i$ to action $j$.
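
The sample model can be turned into a small simulation to see which action sequence it produces. The following is a minimal sketch in Python, not the authors' Java implementation: the function name `simulate`, the action labels and the use of a fixed seed are illustrative assumptions, and the actions are only recorded, not executed on a disk image.

```python
import random
from collections import Counter

# The four actions of the sample model, in the order of states 1 to 4.
ACTIONS = ["add_document_file", "add_image_file",
           "write_fragmented_data", "delete_file"]

# State probabilities and the transition matrix from the sample model.
PI = [0.2, 0.2, 0.4, 0.2]
P = [
    [0.2, 0.2, 0.4, 0.2],
    [0.2, 0.2, 0.4, 0.2],
    [0.2, 0.2, 0.4, 0.2],
    [0.2, 0.2, 0.4, 0.2],
]


def simulate(p, pi, steps, seed=None):
    """Return a list of state indices visited by the Markov chain.

    The first state is drawn from pi; every later state is drawn from
    the row of p that belongs to the current state.
    """
    rng = random.Random(seed)
    states = range(len(pi))
    current = rng.choices(states, weights=pi)[0]
    trace = [current]
    for _ in range(steps - 1):
        current = rng.choices(states, weights=p[current])[0]
        trace.append(current)
    return trace


trace = simulate(P, PI, steps=4000, seed=1)
counts = Counter(ACTIONS[s] for s in trace)
frequencies = {name: counts[name] / len(trace) for name in ACTIONS}
for name in ACTIONS:
    print(f"{name:22s} {counts[name]:5d}  {frequencies[name]:.3f}")
```

With 4,000 steps the printed relative frequencies should lie close to the state probabilities; in a full generator each recorded state would trigger the corresponding file operation instead of being counted.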
6. Corpora Generation Framework

The framework developed for generating synthetic disk images is implemented in Java 1.7. It uses a modular design with a small set of core components, a graphical user interface (GUI) and modules that provide specific functionality. The GUI provides a model building interface that allows a model to be created quickly for a specific scenario using the actions available in the framework. Additionally, an image viewer is implemented to provide detailed views of the generated synthetic disk images.

New actions in the framework can be added by implementing a small number of interfaces that require minimal programming effort. Since the framework supports the specification and execution of an abstract synthetic data generation process, new actions can be implemented independently of a scenario for which a synthetic disk image is being created. For example, it is possible to work on a completely different scenario where financial data is to be created in an enterprise relationship management system. The corresponding actions that are relevant to creating the financial data can be implemented in a straightforward manner.

The screenshot in Figure 3 shows the model builder component of the framework. The Markov chain used for generating data corresponding to the sample scenario is shown in the center of the figure (green box).

[Figure 3. Screenshot of the model builder.]
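
The idea of adding actions through a small interface can be illustrated with a toy plug-in design. The sketch below is written in Python for brevity and is purely hypothetical: the paper does not show the framework's Java interfaces, so the names `Action`, `AddFile`, `DeleteFile` and `execute`, as well as the use of a dictionary as a stand-in for a disk image, are assumptions made for illustration only.

```python
import random
from abc import ABC, abstractmethod


class Action(ABC):
    """Hypothetical action interface: one method that changes the image."""

    @abstractmethod
    def execute(self, image, rng):
        """Apply the action to the image and return a short log entry."""


class AddFile(Action):
    """Copy a randomly chosen file from a source collection to the image."""

    def __init__(self, source):
        self.source = source  # dict: file name -> file content (bytes)

    def execute(self, image, rng):
        name = rng.choice(sorted(self.source))
        image[name] = self.source[name]
        return ("add", name)


class DeleteFile(Action):
    """Remove a randomly chosen file from the image, if there is one."""

    def execute(self, image, rng):
        if not image:
            return ("delete", None)
        name = rng.choice(sorted(image))
        del image[name]
        return ("delete", name)


# Usage: the image is modeled as a plain dict (name -> content).
source = {"a.pdf": b"%PDF", "b.jpg": b"\xff\xd8", "c.png": b"\x89PNG"}
rng = random.Random(3)
image = {}
log = [action.execute(image, rng)
       for action in (AddFile(source), AddFile(source), DeleteFile())]
print(log)
```

Because every action only depends on the shared `execute` signature, a simulation loop can select and run actions without knowing what they do, which mirrors the scenario independence described above.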
7. Framework Evaluation

This section evaluates the performance of the framework. The sample model described above is executed to simulate a computer user who performs write and delete actions on a USB memory stick. The evaluation setup is as follows:

- Model: Described in Section 5.3.
- Discrete Simulation Steps: 4,000 actions.
- Synthetic Disk Image Size: 2,048 MiB (USB memory stick).
- File System: FAT32 with 4,096-byte cluster size.
- Add Document File Action: A document (e.g., DOC, PDF or TXT) file is randomly copied from a local file source containing 139 document files.
- Add Image File Action: An image (e.g., PNG, JPEG or GIF) file is randomly copied from a local file source containing 752 image files.
- Delete File Action: A file is randomly chosen and deleted from the file system of the synthetic disk image without overwriting.
- Write Fragmented Data Action: An image file is randomly chosen from the local file source containing 752 image files. The file is written to the file system of the synthetic disk image using a random number of fragments between 2 and 20, a random fragment size corresponding to a multiple of the file system cluster size and randomly-selected cluster-aligned locations for fragment insertion.
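
The Write Fragmented Data action can be sketched as follows. This is an illustrative Python sketch under several assumptions, not the framework's code: the function names `split_into_fragments` and `write_fragments` are hypothetical, the disk image is a small in-memory `bytearray` rather than a 2,048 MiB FAT32 image, and the sketch only avoids overlaps between its own fragments instead of consulting a file system.

```python
import random

CLUSTER = 4096  # cluster size in bytes, as in the evaluation setup


def split_into_fragments(data, rng, min_frags=2, max_frags=20, cluster=CLUSTER):
    """Cut data at cluster boundaries into a random number of fragments.

    Every fragment except possibly the last has a size that is a
    multiple of the cluster size.
    """
    n_clusters = max(1, -(-len(data) // cluster))  # ceiling division
    n_frags = min(rng.randint(min_frags, max_frags), n_clusters)
    cuts = sorted(rng.sample(range(1, n_clusters), n_frags - 1))
    bounds = [0] + cuts + [n_clusters]
    return [data[a * cluster:b * cluster] for a, b in zip(bounds, bounds[1:])]


def write_fragments(image, fragments, rng, cluster=CLUSTER, max_tries=1000):
    """Write fragments at random cluster-aligned, non-overlapping offsets."""
    total_clusters = len(image) // cluster
    used = set()
    offsets = []
    for frag in fragments:
        need = -(-len(frag) // cluster)
        for _ in range(max_tries):
            start = rng.randrange(0, total_clusters - need + 1)
            span = set(range(start, start + need))
            if not span & used:
                break
        else:
            raise RuntimeError("no free location found for fragment")
        used |= span
        image[start * cluster:start * cluster + len(frag)] = frag
        offsets.append(start * cluster)
    return offsets


# Usage with a 1 MiB stand-in image and a random payload of about 11 clusters.
rng = random.Random(7)
image = bytearray(256 * CLUSTER)
payload = bytes(rng.randrange(256) for _ in range(10 * CLUSTER + 123))
fragments = split_into_fragments(payload, rng)
offsets = write_fragments(image, fragments, rng)
print(len(fragments), "fragments written at offsets", offsets)
```

Keeping the returned offsets alongside the fragment order gives the ground truth that a file carver's output can later be compared against.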
Twenty simulations of the model were executed using the setup. After each run, the time needed to completely generate the synthetic disk image was assessed, along with the amount of disk space used, number of files deleted, number of files still available in the file system and number of different file fragments written to the image.

Figure 4(a) shows the time required by the framework to run each simulation. On average, a simulation run was completed in 2 minutes and 21 seconds.

Figure 4(b) presents an overview of the numbers of files that were allocated in and deleted from the synthetic disk images. Note that the allocated (created) files are shown in light gray while the deleted files are shown in dark gray; the average value is shown as a gray line. On average, a disk image contained 792 allocated files and 803 deleted files, which is expected given the probabilities chosen for the actions in the model.
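
The expected file counts can be checked with a short calculation. It assumes that each of the 4,000 steps selects an action according to the state probabilities of the sample model and that a file is always available when a delete action occurs; both are simplifications made here for illustration.

```latex
\begin{align*}
  E[\text{files added}]           &= 4000\,(\pi_1 + \pi_2) = 4000 \times 0.4 = 1600, \\
  E[\text{files deleted}]         &= 4000\,\pi_4 = 4000 \times 0.2 = 800, \\
  E[\text{files still allocated}] &= 1600 - 800 = 800.
\end{align*}
```

The reported averages of 792 allocated files and 803 deleted files are close to these values.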
Figure 5(a) shows the used disk space in the synthetic image corresponding to allocated files (light gray), deleted files (gray) and file fragments (dark gray). The used space differs considerably over the simulation runs because only the numbers of files to be written and deleted from the disk image were defined (individual file sizes were not specified). Since the files were chosen randomly during the simulation runs, the file sizes and, therefore, the disk space usage differ. On average, 57% of the available disk space was used.

[Figure 4. Evaluation results for 20 simulation runs. (a) Time required for each simulation run. (b) Numbers of allocated files and deleted files. Axes: Simulation Run, Number of Files.]

Figure 5(b) shows the average number of file fragments per file type over all 20 simulation runs. The writing of fragmented data used a dedicated file source containing only pictures; this explains the large numbers of JPEG and PNG fragments.

Figure 6 shows a screenshot of the image viewer provided by the framework. Information such as the data type, fragment size and file system status (allocated or deleted) is provided for each block.
ConclusionsTheframeworkpresentedinthispaperiswell-suitedtoscenario-basedmodelbuildingandsyntheticdatageneration.
Inparticular,itprovidesaexibleandecientapproachforgeneratingsyntheticdatacorpora.
The322ADVANCESINDIGITALFORENSICSX34.
7784.
3854.
0252.
06UnusedDiskSpace(%)71.
6274.
7351.
5262.
9968.
8546.
4743.
2759.
9759.
8935.
7158.
3264.
1161.
1239.
3647.
4167.
90100500(a)Useddiskspacecorrespondingtoallocatedles,deletedlesandlefragments.
bmpepsgifjpgmovmp4pdfpngsvgtifzip10210310428601514866112492242FileType13,5463,061(b)Averagenumberoffragmentsperletype.
Figure5.
Evaluationresultsfor20simulationruns.
experimentalevaluationofcreatingasyntheticdiskimagefortestingthefragmentrecoveryperformanceoflecarversdemonstratestheutilityfortheframework.
Unlikereal-worldcorpora,syntheticcorporaprovidegroundtruthdatathatisveryimportantindigitalforensicseducationandresearch.
Thisenablesstudentsaswellasdevelopersandtesterstoacquiredetailedunderstandingofthecapabilitiesandperformanceofdigitalforensictools.
Theabilityoftheframeworktogeneratesyntheticcorporabasedonrealisticscenarioscansatisfytheneedfortestdatainapplicationsforwhichsuitablereal-worlddatacorporaarenotavailable.
Moreover,theframeworkisgenericenoughtoproducesyntheticcorporaforavarietyofdomains,includingforensicaccountingandnetworkforensics.
Yannikos,Graner,Steinebach&Winter323Figure6.
Screenshotoftheimageviewer.
Acknowledgement

This research was supported by the Center for Advanced Security Research Darmstadt (CASED).
References

[1] Air Force Office of Special Investigations, Foremost (foremost.sourceforge.net), 2001.
[2] B. Carrier, The Sleuth Kit (www.sleuthkit.org/sleuthkit), 2013.
[3] W. Cohen, Enron Email Dataset, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania (www.cs.cmu.edu/~enron), 2009.
[4] S. Garfinkel, Forensic corpora, a challenge for forensic research, unpublished manuscript, 2007.
[5] S. Garfinkel, Lessons learned writing digital forensics tools and managing a 30 TB digital evidence corpus, Digital Investigation, vol. 9(S), pp. S80–S89, 2012.
[6] S. Garfinkel, Digital Corpora (digitalcorpora.org), 2013.
[7] S. Garfinkel, P. Farrell, V. Roussev and G. Dinolt, Bringing science to digital forensics with standardized forensic corpora, Digital Investigation, vol. 6(S), pp. S2–S11, 2009.
[8] M. Grgic and K. Delac, Face Recognition Homepage, Zagreb, Croatia (www.face-rec.org/databases), 2013.
[9] B. Klimt and Y. Yang, Introducing the Enron Corpus, presented at the First Conference on Email and Anti-Spam, 2004.
[10] B. Klimt and Y. Yang, The Enron Corpus: A new dataset for email classification research, Proceedings of the Fifteenth European Conference on Machine Learning, pp. 217–226, 2004.
[11] Lincoln Laboratory, Massachusetts Institute of Technology, DARPA Intrusion Detection Data Sets, Lexington, Massachusetts (www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data), 2013.
[12] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. Cunningham and M. Zissman, Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation, Proceedings of the DARPA Information Survivability Conference and Exposition, vol. 2, pp. 12–26, 2000.
[13] R. Lippmann, J. Haines, D. Fried, J. Korba and K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks, vol. 34(4), pp. 579–595, 2000.
[14] E. Lundin, H. Kvarnstrom and E. Jonsson, A synthetic fraud data generation methodology, Proceedings of the Fourth International Conference on Information and Communications Security, pp. 265–277, 2002.
[15] E. Lundin Barse, H. Kvarnstrom and E. Jonsson, Synthesizing test data for fraud detection systems, Proceedings of the Nineteenth Annual Computer Security Applications Conference, pp. 384–394, 2003.
[16] J. McHugh, Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory, ACM Transactions on Information and System Security, vol. 3(4), pp. 262–294, 2000.
[17] C. Moch and F. Freiling, The Forensic Image Generator Generator (Forensig2), Proceedings of the Fifth International Conference on IT Security Incident Management and IT Forensics, pp. 78–93, 2009.
[18] C. Moch and F. Freiling, Evaluating the Forensic Image Generator Generator, Proceedings of the Third International Conference on Digital Forensics and Cyber Crime, pp. 238–252, 2011.
[19] National Institute of Standards and Technology, The CFReDS Project, Gaithersburg, Maryland (www.cfreds.nist.gov), 2013.
[20] K. Ricanek and T. Tesafaye, Morph: A longitudinal image database of normal adult age-progression, Proceedings of the Seventh International Conference on Automatic Face and Gesture Recognition, pp. 341–345, 2006.
[21] M. Steinebach, H. Liu and Y. Yannikos, FaceHash: Face detection and robust hashing, presented at the Fifth International Conference on Digital Forensics and Cyber Crime, 2013.
[22] T. Vidas, MemCorp: An open data corpus for memory analysis, Proceedings of the Forty-Fourth Hawaii International Conference on System Sciences, 2011.
[23] Volatility, The Volatility Framework (code.google.com/p/volatility), 2014.
[24] WikiLeaks, The Global Intelligence Files (wikileaks.org/the-gifiles.html), 2013.
[25] K. Woods, C. Lee, S. Garfinkel, D. Dittrich, A. Russell and K. Kearton, Creating realistic corpora for security and forensic education, Proceedings of the ADFSL Conference on Digital Forensics, Security and Law, 2011.
[26] Y. Yannikos, F. Franke, C. Winter and M. Schneider, 3LSPG: Forensic tool evaluation by three layer stochastic process-based generation of data, Proceedings of the Fourth International Conference on Computational Forensics, pp. 200–211, 2010.
[27] Y. Yannikos and C. Winter, Model-based generation of synthetic disk images for digital forensic tool testing, Proceedings of the Eighth International Conference on Availability, Reliability and Security, pp. 498–505, 2013.
[28] Y. Yannikos, C. Winter and M. Schneider, Synthetic data creation for forensic tool testing: Improving performance of the 3LSPG Framework, Proceedings of the Seventh International Conference on Availability, Reliability and Security, pp. 613–619, 2012.
